How To Add New Parsers

Scope

In this repo, "new parsers" usually means improving an existing extraction stage for a new carrier or layout pattern rather than creating a second end-to-end parser stack.

Start With The Right Diagnosis

| Symptom | Likely file |
| --- | --- |
| guide/sample content leaking into output | `parsing/page_classifier.py` |
| missing claim/property metadata | `extract/metadata.py` |
| weak totals | `extract/totals.py` |
| broken detail rows | `extract/line_items.py` |
| weak roof metrics | `extract/measurements.py` |

Recommended Process

1. Identify which stage is actually failing.
2. Capture one or more real examples of the failing pattern.
3. Inspect `*.canonical.json` and warnings before touching XML output.
4. Add the smallest rule that fixes the pattern.
5. Run tests.
6. Re-check a layout that already worked.
7. Document the new behavior if coverage changed materially.
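As a rough illustration of the inspection step, the sketch below walks an output directory and prints the warnings recorded in each `*.canonical.json` file. It assumes the canonical file is a JSON object with a top-level `warnings` list; that key name, and the helper names `collect_warnings` and `report`, are hypothetical and should be adjusted to the real schema.

```python
import json
from pathlib import Path


def collect_warnings(canonical):
    """Return the warnings stored in one canonical estimate dict."""
    # Assumption: warnings live in a top-level "warnings" list.
    # This key name is hypothetical; adjust it to the real schema.
    return list(canonical.get("warnings", []))


def report(output_dir):
    """Print a short warning summary for every *.canonical.json under output_dir."""
    for path in sorted(Path(output_dir).rglob("*.canonical.json")):
        canonical = json.loads(path.read_text(encoding="utf-8"))
        warnings = collect_warnings(canonical)
        print(f"{path.name}: {len(warnings)} warning(s)")
        for warning in warnings:
            print(f"  - {warning}")
```

Running something like `report("out/")` before and after a rule change gives a quick view of whether warnings went away for the right reason.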

Parser Contribution Checklist

- Did the change improve the canonical estimate?
- Did the change avoid hardcoding fake values?
- Did the change preserve existing working layouts?
- Did the change produce clearer warnings when recovery is incomplete?
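One way to check that existing layouts are preserved is to compare the canonical estimate produced before and after the change for a layout that already worked. The following is a minimal sketch, assuming both estimates are available as plain dicts loaded from JSON; `diff_canonical` is a hypothetical helper, not part of this repo, and the field names in the usage note are illustrative only.

```python
def diff_canonical(before, after, prefix=""):
    """Return dotted paths whose values differ between two canonical dicts."""
    changed = []
    if isinstance(before, dict) and isinstance(after, dict):
        for key in sorted(set(before) | set(after)):
            path = f"{prefix}.{key}" if prefix else str(key)
            if key not in before or key not in after:
                # The key was added or removed by the change.
                changed.append(path)
            else:
                changed.extend(diff_canonical(before[key], after[key], path))
    elif before != after:
        # Lists and scalars are compared as whole values.
        changed.append(prefix or "<root>")
    return changed
```

For a previously working layout, an empty result means the change left it untouched. For example, comparing `{"totals": {"rcv": 100.0}}` with `{"totals": {"rcv": 100.0, "acv": 90.0}}` reports `["totals.acv"]`, which is then a deliberate improvement to confirm rather than an accident.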

Important Rule

If the parser learns something new, that knowledge should appear in the canonical estimate first. Do not patch XML output to compensate for parsing gaps.
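The rule can be pictured as a one-way flow: extraction stages write into the canonical estimate, and the XML writer only reads from it. The sketch below is an assumption about shape, not the repo's actual writer; `render_xml`, the `metadata` key, and the element names are all hypothetical, and it assumes metadata keys are valid XML tag names.

```python
import xml.etree.ElementTree as ET


def render_xml(canonical):
    """Render XML strictly from the canonical estimate, with no parsing fixes."""
    root = ET.Element("estimate")
    # Hypothetical schema: a flat "metadata" mapping of field name to value.
    for key, value in canonical.get("metadata", {}).items():
        if value is None:
            # A gap stays a gap here; fix it in the extraction stage instead.
            continue
        ET.SubElement(root, key).text = str(value)
    return ET.tostring(root, encoding="unicode")
```

Because the writer never invents or repairs values, a missing field in the XML always points back to a missing field in the canonical estimate, which is where the fix belongs.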

What To Watch Out For

- Overfitting a rule to one sample
- Reintroducing sample/guide-page pollution
- Distorting line-item math for layouts that already worked
- Converting warnings into silent wrong data
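To illustrate the last point, here is a minimal sketch of a narrow extraction rule that returns a warning instead of guessing when it cannot recover a value cleanly. The regular expression, the `extract_total` name, and the warning strings are hypothetical; the real rules in `extract/totals.py` may look quite different.

```python
import re
from decimal import Decimal

# Hypothetical pattern: a line consisting only of "Total", an optional colon,
# an optional dollar sign, and an amount with two decimal places.
TOTAL_RE = re.compile(
    r"^\s*Total\s*:?\s*\$?([\d,]+\.\d{2})\s*$",
    re.IGNORECASE | re.MULTILINE,
)


def extract_total(text):
    """Return (total, warnings); total is None unless exactly one line matches."""
    matches = TOTAL_RE.findall(text)
    if len(matches) == 1:
        return Decimal(matches[0].replace(",", "")), []
    if not matches:
        return None, ["totals: no 'Total' line found"]
    # Several candidates: report the ambiguity rather than picking one.
    return None, [f"totals: {len(matches)} candidate 'Total' lines, refusing to guess"]
```

Returning `None` plus a warning keeps an incomplete recovery visible downstream, whereas returning `0` or the first candidate would quietly put a wrong number in the canonical estimate.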
