ScoutExtract

AI Structured Data Extraction

Automate Invoice Processing with AI: A Developer's Guide

By RamLabs Team · April 2026 · 5 min read

Manual invoice processing costs businesses an average of $15 per invoice. For a company processing 500 invoices per month, that's $7,500/month in labor, just to copy numbers from PDFs into a system.

This guide shows how to automate the entire process with a few lines of code using ScoutExtract.

The Code

import base64
import requests

API_KEY = "YOUR_API_KEY"  # Get free at extract.ramlabs.dev/dashboard

def process_invoice(file_path):
    """Extract structured data from an invoice file."""
    # The request body carries the document as a base64-encoded string
    with open(file_path, "rb") as f:
        content = base64.b64encode(f.read()).decode()

    response = requests.post(
        "https://api.ramlabs.dev/v1/extract",
        headers={"Authorization": f"Bearer {API_KEY}"},
        json={
            "document": content,
            "documentType": "pdf",
            "schema": "invoice"
        },
        timeout=60,  # fail instead of hanging on a stalled connection
    )
    return response.json()

# Process a single invoice
result = process_invoice("invoice.pdf")

if result.get("success"):
    data = result["data"]
    # Each field is an object; the extracted value sits under "value"
    print(f"Invoice: {data['invoice_number']['value']}")
    print(f"Vendor:  {data['vendor']['value']}")
    print(f"Total:   ${data['total']['value']}")
else:
    print(f"Extraction failed: {result.get('message')}")

What Gets Extracted

The built-in invoice schema extracts these fields automatically:

| Field | Type | Example |
| --- | --- | --- |
| invoice_number | string | "INV-2024-0892" |
| date | string (ISO) | "2024-03-15" |
| vendor | string | "CloudStack Solutions" |
| line_items | array | [{description, qty, price, amount}] |
| subtotal | number | 3597.00 |
| tax | number | 319.23 |
| total | number | 3916.23 |
| payment_terms | string | "Net 30" |

Every field includes a confidence score, so you know which extractions to trust and which to review.
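As a rough illustration of how those scores can be used, the sketch below lists the fields that fall under a chosen threshold. It assumes each field is an object with `value` and `confidence` keys, as the other snippets in this guide suggest. The helper name `fields_needing_review`, the 0.9 threshold, and the confidence numbers in the sample are made up for the example and are not part of the ScoutExtract API.

```python
def fields_needing_review(data, threshold=0.9):
    """Return the names of fields whose confidence is below `threshold`."""
    flagged = []
    for name, field in data.items():
        # Skip entries that are not {value, confidence} objects
        # (line_items, for instance, may be a plain list).
        if isinstance(field, dict) and "confidence" in field:
            if field["confidence"] < threshold:
                flagged.append(name)
    return flagged


# Hypothetical sample shaped like result["data"]; the scores are invented.
sample = {
    "invoice_number": {"value": "INV-2024-0892", "confidence": 0.99},
    "vendor": {"value": "CloudStack Solutions", "confidence": 0.84},
    "total": {"value": 3916.23, "confidence": 0.97},
    "payment_terms": {"value": "Net 30", "confidence": 0.72},
}

print(fields_needing_review(sample))  # ['vendor', 'payment_terms']
```

A per-field list like this makes it easier to show a reviewer only the values that need a second look rather than the whole invoice.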

Batch Processing

import os

invoice_dir = "./invoices/"

# Reuses process_invoice() from the first example
for filename in sorted(os.listdir(invoice_dir)):
    if filename.lower().endswith(".pdf"):  # also matches .PDF
        result = process_invoice(os.path.join(invoice_dir, filename))

        if result.get("success"):
            data = result["data"]
            print(f"[OK] {filename}: #{data['invoice_number']['value']} - ${data['total']['value']}")
            # save_to_database(data)
        else:
            print(f"[ERR] {filename}: {result.get('message')}")
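If a database is not available yet, one possible stand-in for the commented-out `save_to_database` call is to collect the batch results and write them to a CSV file. The following is a minimal sketch under that assumption. `write_results_csv` and the chosen columns are hypothetical, and the function expects `(filename, result)` pairs with the same `success` / `data` shape used in the loop above.

```python
import csv


def write_results_csv(results, csv_path):
    """Write one row per successful extraction; return the number of rows."""
    columns = ["invoice_number", "date", "vendor", "total"]
    rows_written = 0
    with open(csv_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["filename"] + columns)
        for filename, result in results:
            if not result.get("success"):
                continue  # failed extractions are left out of the CSV
            data = result["data"]
            # A missing field becomes an empty cell instead of a KeyError
            row = [data.get(col, {}).get("value", "") for col in columns]
            writer.writerow([filename] + row)
            rows_written += 1
    return rows_written
```

Inside the batch loop, this would mean appending `(filename, result)` to a list and calling `write_results_csv(collected, "invoices.csv")` once the loop finishes.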

Smart Routing with Confidence Scores

def process_with_review(data):
    total_confidence = data["total"]["confidence"]
    vendor_confidence = data["vendor"]["confidence"]

    if total_confidence > 0.95 and vendor_confidence > 0.9:
        auto_approve(data)       # High confidence - auto process
        return "auto_approved"

    if total_confidence > 0.8:
        approve_with_flag(data)  # Medium - approve but flag
        return "flagged"

    queue_for_review(data)       # Low - human review
    return "needs_review"
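Confidence scores do not have to be the only routing signal. Because the invoice schema returns `subtotal`, `tax`, and `total`, a simple arithmetic cross-check can catch a misread amount even when the reported confidence is high. The sketch below assumes the `{value, confidence}` field shape used throughout this guide. The helper `totals_are_consistent`, the one-cent tolerance, and the confidence numbers in the sample are illustrative choices, not part of the ScoutExtract API.

```python
from decimal import Decimal


def totals_are_consistent(data, tolerance=Decimal("0.01")):
    """Return True when subtotal + tax matches total within `tolerance`."""
    # Going through str() turns floats like 319.23 into exact decimals
    subtotal = Decimal(str(data["subtotal"]["value"]))
    tax = Decimal(str(data["tax"]["value"]))
    total = Decimal(str(data["total"]["value"]))
    return abs(subtotal + tax - total) <= tolerance


# Amounts taken from the example table; the confidence scores are invented.
sample = {
    "subtotal": {"value": 3597.00, "confidence": 0.98},
    "tax": {"value": 319.23, "confidence": 0.97},
    "total": {"value": 3916.23, "confidence": 0.99},
}

print(totals_are_consistent(sample))  # True
```

A failed check is better treated as a reason to queue the invoice for human review than as an outright rejection, since real invoices can carry discounts or shipping charges that this three-field sum does not account for.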

Cost Comparison

| Method | Cost per invoice | Setup time | Maintenance |
| --- | --- | --- | --- |
| Manual data entry | $10-20 | None | Ongoing labor |
| Custom OCR pipeline | $0.50-2 | 2-4 weeks | High |
| ScoutExtract (Starter) | $0.049 | 15 minutes | None |

The Starter plan costs $49/mo for 1,000 extractions, which works out to $0.049 per invoice at full usage. At the $15 average cost of manual processing, the plan pays for itself once it replaces just 3-4 manually processed invoices per month.
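The arithmetic behind these figures can be checked with a few lines of Python. This sketch uses only numbers quoted in this guide ($15 per manual invoice, $49/mo for 1,000 extractions). The function names are hypothetical, and it assumes a flat monthly fee with no overage pricing, which the guide does not describe.

```python
import math

MANUAL_COST_PER_INVOICE = 15.00   # average quoted at the top of this guide
STARTER_PLAN_PRICE = 49.00        # USD per month
STARTER_PLAN_EXTRACTIONS = 1000   # extractions included per month


def effective_cost_per_invoice(invoices_per_month):
    """Flat plan price spread over the invoices actually processed."""
    if not 0 < invoices_per_month <= STARTER_PLAN_EXTRACTIONS:
        raise ValueError("volume must be between 1 and the plan limit")
    return STARTER_PLAN_PRICE / invoices_per_month


def break_even_invoices():
    """Smallest number of manual invoices whose labor cost covers the plan."""
    return math.ceil(STARTER_PLAN_PRICE / MANUAL_COST_PER_INVOICE)


def monthly_savings(invoices_per_month):
    """Manual labor cost avoided minus the plan price."""
    return invoices_per_month * MANUAL_COST_PER_INVOICE - STARTER_PLAN_PRICE


print(break_even_invoices())             # 4
print(effective_cost_per_invoice(1000))  # 0.049
print(effective_cost_per_invoice(500))   # 0.098
print(monthly_savings(500))              # 7451.0
```

Note that the $0.049 figure applies only when all 1,000 extractions are used. At the 500 invoices/month volume from the introduction, the effective cost is about $0.098 per invoice, still far below the manual figure.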

Start Automating Today

25 free extractions/month. No credit card required.
