Automate Invoice Processing with AI: A Developer's Guide
Manual invoice processing costs businesses an average of $15 per invoice. For a company processing 500 invoices per month, that's $7,500/month in labor, just to copy numbers from PDFs into a system.
This guide shows how to automate the entire process with a few lines of code using ScoutExtract.
The Code
import base64

import requests

API_KEY = "YOUR_API_KEY"  # Get a free key at extract.ramlabs.dev/dashboard


def process_invoice(file_path):
    """Extract structured data from an invoice file."""
    # Send the file contents as a base64-encoded string
    with open(file_path, "rb") as f:
        content = base64.b64encode(f.read()).decode()

    response = requests.post(
        "https://api.ramlabs.dev/v1/extract",
        headers={"Authorization": f"Bearer {API_KEY}"},
        json={
            "document": content,
            "documentType": "pdf",
            "schema": "invoice",
        },
        timeout=30,  # do not hang forever on a stalled connection
    )
    return response.json()


# Process a single invoice
result = process_invoice("invoice.pdf")
data = result["data"]
print(f"Invoice: {data['invoice_number']['value']}")
print(f"Vendor: {data['vendor']['value']}")
print(f"Total: ${data['total']['value']}")
What Gets Extracted
The built-in invoice schema extracts these fields automatically:
| Field | Type | Example |
|---|---|---|
| invoice_number | string | "INV-2024-0892" |
| date | string (ISO) | "2024-03-15" |
| vendor | string | "CloudStack Solutions" |
| line_items | array | [{description, qty, price, amount}] |
| subtotal | number | 3597.00 |
| tax | number | 319.23 |
| total | number | 3916.23 |
| payment_terms | string | "Net 30" |
Every field includes a confidence score, so you know which extractions to trust and which to review.
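As a rough illustration of using those scores, the sketch below lists the fields that fall under a chosen threshold. It assumes each field is a dict with `value` and `confidence` keys, which is the shape the code in this guide reads. The helper name `fields_needing_review`, the 0.9 default threshold, and the confidence numbers in the sample are illustrative assumptions, not part of the API.

```python
def fields_needing_review(data, threshold=0.9):
    """Return the names of fields whose confidence is below `threshold`.

    Hypothetical helper: assumes each field looks like
    {"value": ..., "confidence": ...}.
    """
    flagged = []
    for name, field in data.items():
        # Skip anything that does not carry its own confidence score
        if isinstance(field, dict) and "confidence" in field:
            if field["confidence"] < threshold:
                flagged.append(name)
    return flagged


# Sample data with made-up confidence values, for illustration only
sample = {
    "invoice_number": {"value": "INV-2024-0892", "confidence": 0.99},
    "vendor": {"value": "CloudStack Solutions", "confidence": 0.84},
    "total": {"value": 3916.23, "confidence": 0.97},
}

print(fields_needing_review(sample))  # ['vendor']
```

A reviewer can then be shown only the flagged fields instead of the whole invoice.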
Batch Processing
import os

invoice_dir = "./invoices/"

# Reuses process_invoice() from the first example
for filename in sorted(os.listdir(invoice_dir)):
    # Match the extension case-insensitively so "SCAN.PDF" is not skipped
    if not filename.lower().endswith(".pdf"):
        continue
    result = process_invoice(os.path.join(invoice_dir, filename))
    if result.get("success"):
        data = result["data"]
        print(f"[OK] {filename}: #{data['invoice_number']['value']} - ${data['total']['value']}")
        # save_to_database(data)
    else:
        print(f"[ERR] {filename}: {result.get('message')}")
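The `save_to_database(data)` call above is left as a placeholder. One simple stand-in, sketched here under assumptions, is to flatten each result into a CSV row. The helper names `to_row` and `write_csv` are hypothetical, the column list is taken from the field table in this guide, and `line_items` is skipped because it does not fit in a single cell.

```python
import csv
import io

# Scalar fields from the invoice schema table; line_items is left out
CSV_FIELDS = ["invoice_number", "date", "vendor", "subtotal", "tax", "total", "payment_terms"]


def to_row(filename, data):
    """Flatten extracted fields to a flat dict, using '' for missing fields."""
    row = {"file": filename}
    for name in CSV_FIELDS:
        field = data.get(name)
        row[name] = field.get("value", "") if isinstance(field, dict) else ""
    return row


def write_csv(rows, out):
    """Write rows to any file-like object, header first."""
    writer = csv.DictWriter(out, fieldnames=["file"] + CSV_FIELDS)
    writer.writeheader()
    writer.writerows(rows)


# Illustrative data in the assumed {"value": ...} shape
sample = {
    "invoice_number": {"value": "INV-2024-0892"},
    "vendor": {"value": "CloudStack Solutions"},
    "total": {"value": 3916.23},
}

buffer = io.StringIO()
write_csv([to_row("invoice.pdf", sample)], buffer)
print(buffer.getvalue())
```

In the batch loop, the rows could be collected in a list and written once at the end to a file opened with `newline=""`.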
Smart Routing with Confidence Scores
def process_with_review(data):
    """Route an extracted invoice based on extraction confidence."""
    # auto_approve, approve_with_flag and queue_for_review are placeholders
    # for your own workflow functions.
    total_confidence = data["total"]["confidence"]
    vendor_confidence = data["vendor"]["confidence"]

    if total_confidence > 0.95 and vendor_confidence > 0.9:
        auto_approve(data)  # High confidence: process automatically
        return "auto_approved"

    if total_confidence > 0.8:
        approve_with_flag(data)  # Medium confidence: approve but flag
        return "flagged"

    queue_for_review(data)  # Low confidence: send to human review
    return "needs_review"
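Confidence scores could be paired with a simple arithmetic cross-check before auto-approving. The sketch below tests whether subtotal plus tax matches the total. The function name `totals_are_consistent` and the one-cent tolerance are assumptions for illustration; the field shape follows the rest of this guide.

```python
from decimal import Decimal


def totals_are_consistent(data, tolerance="0.01"):
    """Return True when subtotal + tax matches total within `tolerance`."""
    # Go through str() so float values such as 319.23 convert cleanly
    subtotal = Decimal(str(data["subtotal"]["value"]))
    tax = Decimal(str(data["tax"]["value"]))
    total = Decimal(str(data["total"]["value"]))
    return abs(subtotal + tax - total) <= Decimal(tolerance)


# Values taken from the example column of the field table
sample = {
    "subtotal": {"value": 3597.00},
    "tax": {"value": 319.23},
    "total": {"value": 3916.23},
}

print(totals_are_consistent(sample))  # True
```

An invoice that fails this check could be sent to human review regardless of its confidence scores.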
Cost Comparison
| Method | Cost per invoice | Setup time | Maintenance |
|---|---|---|---|
| Manual data entry | $10-20 | None | Ongoing labor |
| Custom OCR pipeline | $0.50-2 | 2-4 weeks | High |
| ScoutExtract (Starter) | $0.049 | 15 minutes | None |
On the Starter plan ($49/mo for 1,000 extractions), each invoice costs $0.049. At the $15 average cost of manual processing, the monthly fee is covered once ScoutExtract replaces just 3-4 manually processed invoices.
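The arithmetic behind these figures can be checked with a few lines. The function names below are illustrative; the inputs are the numbers quoted in this guide.

```python
import math


def monthly_cost(invoices, cost_per_invoice):
    """Total monthly spend for a given volume and per-invoice cost."""
    return invoices * cost_per_invoice


def cost_per_extraction(plan_price, included_extractions):
    """Per-invoice cost when the whole plan allowance is used."""
    return plan_price / included_extractions


def break_even_invoices(plan_price, manual_cost_per_invoice):
    """Smallest number of manual invoices whose cost covers the plan price."""
    return math.ceil(plan_price / manual_cost_per_invoice)


print(monthly_cost(500, 15))         # 7500: manual cost for 500 invoices at $15
print(cost_per_extraction(49, 1000))  # Starter plan cost per invoice
print(break_even_invoices(49, 15))   # 4: $49 / $15 is about 3.3, rounded up
```

The per-invoice figure assumes all 1,000 included extractions are used; at lower volumes the effective cost per invoice is higher.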