Build a Resume Parser in 10 Lines of Code
If you're building an ATS, HR tool, or recruiting platform, you need to parse resumes. Candidates upload PDFs in hundreds of different layouts, and your system needs structured data out of every one of them.
Here's how to build a production-ready resume parser in 10 lines of Python.
The Code
```python
import base64, requests

def parse_resume(file_path):
    with open(file_path, "rb") as f:
        content = base64.b64encode(f.read()).decode()
    resp = requests.post(
        "https://api.ramlabs.dev/v1/extract",
        headers={"Authorization": "Bearer YOUR_API_KEY"},
        json={"document": content, "documentType": "pdf", "schema": "resume"},
    )
    return resp.json()["data"]

# Parse a resume
candidate = parse_resume("sarah_chen_resume.pdf")
print(f"Name: {candidate['name']['value']}")
print(f"Email: {candidate['email']['value']}")
print(f"Skills: {', '.join(candidate['skills']['value'])}")
```
That's it. No NLP libraries, no training data, no regex.
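The ten-liner above assumes every request succeeds and every response carries a `data` key. Before trusting the payload in production, it's worth validating it. A minimal sketch, assuming the response shape shown in this article (an `error` field on failure is an assumption; adjust to what the API actually returns):

```python
# Hypothetical helper: validate the parsed response body before using it.
# The "data" key matches the response shape shown in this article; the
# "error" key is an assumed failure field.

def unwrap(payload: dict) -> dict:
    """Return payload["data"], or raise with a useful message."""
    if "data" not in payload:
        raise ValueError(f"extraction failed: {payload.get('error', 'unknown error')}")
    return payload["data"]
```

Then `return unwrap(resp.json())` instead of indexing blindly, so a failed extraction surfaces as a clear exception rather than a `KeyError`.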
What Gets Extracted
```json
{
  "name": { "value": "Sarah Chen", "confidence": 0.99 },
  "email": { "value": "sarah.chen@email.com", "confidence": 0.99 },
  "phone": { "value": "(415) 555-0142", "confidence": 0.97 },
  "location": { "value": "San Francisco, CA", "confidence": 0.95 },
  "experience": {
    "value": [
      {
        "title": "Senior Software Engineer",
        "company": "Stripe",
        "period": "2021 - Present",
        "highlights": [
          "Architected payment processing pipeline handling 10M+ transactions/day",
          "Led migration from monolith to microservices"
        ]
      }
    ],
    "confidence": 0.94
  },
  "education": {
    "value": [
      { "degree": "B.S. Computer Science", "institution": "Stanford University", "year": "2016" }
    ],
    "confidence": 0.96
  },
  "skills": {
    "value": ["TypeScript", "Python", "Go", "React", "Node.js", "AWS", "Kubernetes"],
    "confidence": 0.93
  }
}
```
Traditional vs ScoutExtract
Traditional approach
- Install NLP libraries (spaCy, NLTK)
- Build text extraction pipeline
- Train NER model
- Write regex for phones, emails, dates
- Handle edge cases forever
Time: 2-4 weeks
Accuracy: 70-85%
ScoutExtract approach
- Sign up for API key
- Send resume + schema
- Get structured JSON
Time: 15 minutes
Accuracy: 90-98%
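"Handle edge cases forever" is the real cost of the traditional route. As one illustration (not from the article, just a typical hand-rolled pattern): a US-style phone regex matches the sample resume's number but silently misses common international formats.

```python
import re

# A typical hand-rolled US phone pattern -- the kind of regex a
# traditional pipeline accumulates. It matches "(415) 555-0142"
# but silently misses a UK-formatted number.
US_PHONE = re.compile(r"\(?\d{3}\)?[ -]?\d{3}-\d{4}")

print(bool(US_PHONE.search("(415) 555-0142")))    # US format: matched
print(bool(US_PHONE.search("+44 20 7946 0958")))  # UK format: missed
```

Each miss like this becomes another regex branch to write, test, and maintain.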
Batch Processing
```python
import os, json

results = []
for filename in os.listdir("./resumes/"):
    if filename.endswith((".pdf", ".png", ".jpg")):
        data = parse_resume(f"./resumes/{filename}")
        results.append({
            "file": filename,
            "name": data["name"]["value"],
            "email": data["email"]["value"],
            "skills": data["skills"]["value"],
        })
        print(f"Parsed: {data['name']['value']} - {len(data['skills']['value'])} skills")

with open("parsed_candidates.json", "w") as f:
    json.dump(results, f, indent=2)
```
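One post-processing step worth adding (an assumption on my part, not something the batch loop above does): candidates often upload the same resume more than once, so collapse the results on normalized email before loading them anywhere.

```python
# Hypothetical dedup pass over the batch results: key on a
# case-folded, whitespace-stripped email and keep the first
# occurrence of each address.

def dedupe_by_email(results: list[dict]) -> list[dict]:
    seen = set()
    unique = []
    for row in results:
        key = row["email"].strip().lower()
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique
```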
Smart Candidate Screening
```python
def screen_candidate(data):
    # Skill match: 20 points per required skill found on the resume
    skills = [s.lower() for s in data["skills"]["value"]]
    required = ["python", "react", "aws"]
    matched = [s for s in required if s in skills]
    score = len(matched) * 20

    # Experience bonus: three or more roles listed
    if len(data["experience"]["value"]) >= 3:
        score += 20

    # Average extraction confidence across the key fields
    avg_confidence = sum(
        data[f]["confidence"] for f in ["name", "skills", "experience"]
    ) / 3

    return {
        "score": score,
        "matched_skills": matched,
        "confidence": avg_confidence,
        "auto_qualify": score >= 60 and avg_confidence > 0.85,
    }
```
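The flat 20-points-per-skill scheme treats every required skill equally. A natural next step, sketched here as a hypothetical variant (the weights are assumptions to tune per role, not part of the API): weight must-have skills more heavily than nice-to-haves.

```python
# Hypothetical weighted variant of the skill-matching step above:
# each required skill carries its own point value instead of a flat
# 20. Matching is case-insensitive, like in screen_candidate.

def weighted_skill_score(candidate_skills: list[str], weights: dict[str, int]) -> int:
    """weights maps required skill -> points awarded when present."""
    have = {s.lower() for s in candidate_skills}
    return sum(
        points for skill, points in weights.items() if skill.lower() in have
    )

# Example weighting for a backend role (assumed values):
# weighted_skill_score(["Python", "React", "Go"], {"python": 30, "aws": 20, "react": 10})
# scores Python (30) + React (10) and skips the missing AWS.
```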