In this repo, "new parsers" usually means improving an existing extraction stage for a new carrier or layout pattern rather than creating a second end-to-end parser stack.
| Symptom | Likely file |
|---|---|
| guide/sample content leaking into output | `parsing/page_classifier.py` |
| missing claim/property metadata | `extract/metadata.py` |
| weak totals | `extract/totals.py` |
| broken detail rows | `extract/line_items.py` |
| weak roof metrics | `extract/measurements.py` |
- identify which stage is actually failing
- capture one or more real examples of the failing pattern
- inspect `*.canonical.json` and warnings before touching XML output
- add the smallest rule that fixes the pattern
- run tests
- re-check a layout that already worked
- document the new behavior if coverage changed materially
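
The "re-check a layout that already worked" step can be partly automated by diffing canonical output before and after a change. The following is a minimal sketch, assuming canonical estimates are plain JSON objects; `diff_canonical`, `load_canonical`, and the field names in the usage note are hypothetical and not taken from this repo.

```python
import json
from pathlib import Path


def load_canonical(path: Path) -> dict:
    """Load one *.canonical.json file as a dict."""
    return json.loads(path.read_text(encoding="utf-8"))


def diff_canonical(expected: dict, actual: dict, prefix: str = "") -> list:
    """Return dotted paths whose values differ between two canonical estimates."""
    changed = []
    for key in sorted(set(expected) | set(actual)):
        path = f"{prefix}{key}"
        old, new = expected.get(key), actual.get(key)
        if isinstance(old, dict) and isinstance(new, dict):
            # Recurse so the report points at the exact nested field.
            changed.extend(diff_canonical(old, new, prefix=f"{path}."))
        elif old != new:
            changed.append(path)
    return changed
```

For a layout that already worked, an empty result from `diff_canonical(baseline, current)` suggests the new rule left it alone; any reported path (for example a hypothetical `totals.grand_total`) is worth inspecting by hand before merging.
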
- did the change improve the canonical estimate?
- did the change avoid hardcoding fake values?
- did the change preserve existing working layouts?
- did the change produce clearer warnings when recovery is incomplete?
If the parser learns something new, that knowledge should appear in the canonical estimate first. Do not patch XML output to compensate for parsing gaps.
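
As an illustration of that rule, the sketch below renders XML purely from a canonical dict and never fills in values the parser did not recover. It assumes a canonical estimate with a `totals` mapping; the element names and `render_xml` itself are hypothetical, not this repo's actual output code.

```python
import xml.etree.ElementTree as ET


def render_xml(canonical: dict) -> str:
    """Render XML from the canonical estimate only; gaps stay gaps."""
    root = ET.Element("estimate")
    for name, value in canonical.get("totals", {}).items():
        if value is None:
            # The parser did not recover this field. Fix the extraction
            # stage instead of inventing a value here.
            continue
        ET.SubElement(root, name).text = str(value)
    return ET.tostring(root, encoding="unicode")
```

With this shape, a missing total shows up as a missing element and a warning upstream, which makes the failing stage easier to locate than a plausible-looking value patched in at render time.
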
- overfitting a rule to one sample
- reintroducing sample/guide-page pollution
- distorting line-item math for layouts that already worked
- converting warnings into silent wrong data
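
The last risk is the easiest to introduce by accident, usually through a "safe" default. A minimal sketch of the alternative, assuming amounts arrive as strings and warnings are collected in a list; `parse_amount` and the field label are hypothetical names used only for illustration.

```python
import re
from decimal import Decimal
from typing import List, Optional

# Accepts "1234", "1,234", "$1,234.50"; rejects anything else.
_AMOUNT = re.compile(r"^\$?\s*(\d{1,3}(?:,\d{3})*|\d+)(\.\d{2})?$")


def parse_amount(raw: str, field: str, warnings: List[str]) -> Optional[Decimal]:
    """Parse a currency string, or record a warning and return None."""
    text = raw.strip()
    if not _AMOUNT.match(text):
        # Do not fall back to 0 or a guessed value: that would turn a
        # visible warning into silent wrong data.
        warnings.append(f"{field}: could not parse amount {raw!r}")
        return None
    return Decimal(text.lstrip("$").strip().replace(",", ""))
```

Returning `None` alongside a warning keeps incomplete recovery visible in the canonical estimate, whereas a default of `0` would pass through line-item math unnoticed.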