ScoutExtract

AI Structured Data Extraction

Build a Resume Parser in 10 Lines of Code

By RamLabs Team · April 2026 · 4 min read

If you're building an ATS, HR tool, or recruiting platform, you need to parse resumes. Candidates upload PDFs and images in countless layouts — and your system needs structured data.

Here's how to build a production-ready resume parser in 10 lines of Python.

The Code

import base64, requests

def parse_resume(file_path):
    with open(file_path, "rb") as f:
        content = base64.b64encode(f.read()).decode()

    resp = requests.post(
        "https://api.ramlabs.dev/v1/extract",
        headers={"Authorization": "Bearer YOUR_API_KEY"},
        json={"document": content, "documentType": "pdf", "schema": "resume"},
        timeout=60,
    )
    resp.raise_for_status()  # surface auth/quota errors instead of a KeyError below

    return resp.json()["data"]

# Parse a resume
candidate = parse_resume("sarah_chen_resume.pdf")
print(f"Name: {candidate['name']['value']}")
print(f"Email: {candidate['email']['value']}")
print(f"Skills: {', '.join(candidate['skills']['value'])}")

That's it. No NLP libraries, no training data, no regex.

What Gets Extracted

{
  "name": { "value": "Sarah Chen", "confidence": 0.99 },
  "email": { "value": "sarah.chen@email.com", "confidence": 0.99 },
  "phone": { "value": "(415) 555-0142", "confidence": 0.97 },
  "location": { "value": "San Francisco, CA", "confidence": 0.95 },
  "experience": {
    "value": [
      {
        "title": "Senior Software Engineer",
        "company": "Stripe",
        "period": "2021 - Present",
        "highlights": [
          "Architected payment processing pipeline handling 10M+ transactions/day",
          "Led migration from monolith to microservices"
        ]
      }
    ],
    "confidence": 0.94
  },
  "education": {
    "value": [
      { "degree": "B.S. Computer Science", "institution": "Stanford University", "year": "2016" }
    ],
    "confidence": 0.96
  },
  "skills": {
    "value": ["TypeScript", "Python", "Go", "React", "Node.js", "AWS", "Kubernetes"],
    "confidence": 0.93
  }
}
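Every field comes back with a confidence score, which means you don't have to trust extractions blindly — you can route uncertain fields to human review. A minimal sketch (the 0.9 threshold is an arbitrary choice, not an API default):

```python
def fields_needing_review(data, threshold=0.9):
    """Return the names of extracted fields whose confidence falls below threshold."""
    return [
        field for field, payload in data.items()
        if payload["confidence"] < threshold
    ]

# Abbreviated sample response for illustration
sample = {
    "name": {"value": "Sarah Chen", "confidence": 0.99},
    "experience": {"value": [], "confidence": 0.81},
}
print(fields_needing_review(sample))  # -> ['experience']
```

In practice you'd queue flagged fields for a reviewer and auto-accept the rest.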

Traditional vs ScoutExtract

Traditional approach

  1. Install NLP libraries (spaCy, NLTK)
  2. Build text extraction pipeline
  3. Train NER model
  4. Write regex for phones, emails, dates
  5. Handle edge cases forever

Time: 2-4 weeks
Accuracy: 70-85%

ScoutExtract approach

  1. Sign up for API key
  2. Send resume + schema
  3. Get structured JSON

Time: 15 minutes
Accuracy: 90-98%

Batch Processing

import os, json

results = []
for filename in os.listdir("./resumes/"):
    if filename.endswith((".pdf", ".png", ".jpg")):
        data = parse_resume(f"./resumes/{filename}")
        results.append({
            "file": filename,
            "name": data["name"]["value"],
            "email": data["email"]["value"],
            "skills": data["skills"]["value"],
        })
        print(f"Parsed: {data['name']['value']} - {len(data['skills']['value'])} skills")

with open("parsed_candidates.json", "w") as f:
    json.dump(results, f, indent=2)
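The loop above parses one file at a time. Since each call is I/O-bound (waiting on the API), you can parallelize large batches with a thread pool. A sketch — `fake` below is a stand-in so the example runs on its own; swap in `parse_resume` from earlier for real use:

```python
from concurrent.futures import ThreadPoolExecutor

def parse_batch(paths, parse_fn, max_workers=8):
    """Parse many resumes concurrently. pool.map preserves input order,
    so results line up with paths. Keep max_workers under your rate limit."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(parse_fn, paths))

# Stand-in parser for illustration only
fake = lambda path: {"file": path, "name": {"value": "?"}}
results = parse_batch(["a.pdf", "b.pdf"], fake)
print(len(results))  # -> 2
```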

Smart Candidate Screening

def screen_candidate(data):
    skills = [s.lower() for s in data["skills"]["value"]]
    required = ["python", "react", "aws"]
    matched = [s for s in required if s in skills]

    score = len(matched) * 20
    if len(data["experience"]["value"]) >= 3:
        score += 20

    avg_confidence = sum(
        data[f]["confidence"] for f in ["name", "skills", "experience"]
    ) / 3

    return {
        "score": score,
        "matched_skills": matched,
        "confidence": avg_confidence,
        "auto_qualify": score >= 60 and avg_confidence > 0.85
    }
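To see how the scoring plays out: a candidate whose resume matches python and aws (2 of the 3 required skills) and lists at least 3 roles works out as follows — same arithmetic as `screen_candidate` above:

```python
# Worked example of the scoring logic (sketch, mirrors screen_candidate)
matched = ["python", "aws"]   # 2 of the 3 required skills found
score = len(matched) * 20     # 40 points from matched skills
score += 20                   # +20 bonus: 3 or more roles listed
print(score)  # -> 60, exactly the auto_qualify cutoff
```

So a candidate needs either all three required skills, or two skills plus the experience bonus, to auto-qualify on score alone.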

Parse Your First Resume

25 free extractions/month. No credit card required.

Get Your API Key →