FAQ

Is this a real working app?

Yes. It is a runnable Windows desktop app with a real PDF-processing pipeline and real output artifacts.

Does it use cloud OCR or cloud AI APIs?

No. The current implementation is local-first and does not depend on cloud parsing services.

Does it support scanned PDFs?

Yes, to a degree. It detects text-poor pages and can apply local OCR when dependencies are available. Scanned PDFs remain harder than native-text PDFs.
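As a rough illustration of what "text-poor" detection can look like, the sketch below flags pages whose extracted text layer is nearly empty. The threshold, the function names, and the idea of counting non-whitespace characters are assumptions made for this example; the project's actual heuristic may differ.

```python
from typing import List

# Hypothetical cutoff: pages with fewer non-whitespace characters than this
# are treated as text-poor. The project's real threshold is not stated here.
MIN_CHARS_PER_PAGE = 40


def is_text_poor(page_text: str, min_chars: int = MIN_CHARS_PER_PAGE) -> bool:
    """Return True when a page's extracted text layer is too thin to trust."""
    visible = "".join(page_text.split())  # drop all whitespace
    return len(visible) < min_chars


def pages_needing_ocr(page_texts: List[str]) -> List[int]:
    """Return zero-based indexes of pages that would be routed to OCR."""
    return [i for i, text in enumerate(page_texts) if is_text_poor(text)]
```

Pages returned by `pages_needing_ocr` would be candidates for the local OCR step; pages with a healthy text layer can skip it.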

Does it write a native Xactimate ESX file?

Not in the proprietary XACTDOC.ZIPXML sense. It writes a standards-based .esx zip package containing XACTDOC.XML, canonical_estimate.json, and manifest.json.
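Because the .esx output is an ordinary zip archive, it can be checked with the Python standard library. The sketch below reports which of the three documented members are missing; it assumes they sit at the archive root, and the function name is made up for this example.

```python
import zipfile
from typing import List

# Member names taken from the answer above; root-level placement is assumed.
EXPECTED_MEMBERS = ("XACTDOC.XML", "canonical_estimate.json", "manifest.json")


def missing_members(esx_path: str) -> List[str]:
    """Return the expected member names that are absent from the package."""
    with zipfile.ZipFile(esx_path) as package:
        names = set(package.namelist())
    return [name for name in EXPECTED_MEMBERS if name not in names]
```

An empty list means all three files are present; it says nothing about whether their contents are valid.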

Why is there a `*.canonical.json` output?

Because the project uses a canonical estimate boundary between parsing and export. That file is the easiest way to inspect what the parser actually produced before looking at XML.
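A quick way to see what the parser produced is to summarize the top-level keys of the canonical file. The helper below is a minimal sketch that makes no assumption about the schema; its name and output format are illustrative only.

```python
import json
from typing import Any, Dict


def summarize_canonical(path: str) -> Dict[str, str]:
    """Map each top-level key to a short description of its value."""
    with open(path, "r", encoding="utf-8") as handle:
        data: Any = json.load(handle)
    if not isinstance(data, dict):
        # Unexpected root type: report it instead of guessing at structure.
        return {"<root>": type(data).__name__}
    summary: Dict[str, str] = {}
    for key, value in data.items():
        if isinstance(value, (list, dict)):
            summary[key] = f"{type(value).__name__} with {len(value)} entries"
        else:
            summary[key] = repr(value)
    return summary
```

Printing the result for a generated `*.canonical.json` gives a compact view of which sections the parser filled in and how large they are.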

Why not convert directly from PDF to XML?

Because estimate PDFs are noisy and inconsistent. The canonical model isolates parser uncertainty from export logic and makes the codebase much easier to debug and evolve.
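The separation can be pictured as two functions that only meet at a shared data model. Everything in the sketch below is hypothetical (the class names, the fields, the row format, and the XML element names); it is not the project's schema or the Xactimate format, only an illustration of the boundary.

```python
from dataclasses import dataclass, field
from typing import List
import xml.etree.ElementTree as ET


@dataclass
class LineItem:
    description: str
    total: float


@dataclass
class CanonicalEstimate:
    line_items: List[LineItem] = field(default_factory=list)
    warnings: List[str] = field(default_factory=list)


def parse_rows(rows: List[str]) -> CanonicalEstimate:
    """Parser side: tolerate noisy rows and record what could not be read."""
    estimate = CanonicalEstimate()
    for row in rows:
        # Treat the last space-separated token as the amount.
        description, _, amount = row.rpartition(" ")
        try:
            value = float(amount.replace(",", ""))
        except ValueError:
            estimate.warnings.append(f"unparsed row: {row!r}")
            continue
        estimate.line_items.append(LineItem(description.strip(), value))
    return estimate


def export_xml(estimate: CanonicalEstimate) -> str:
    """Export side: sees only the canonical model, never raw PDF text."""
    root = ET.Element("Estimate")
    for item in estimate.line_items:
        node = ET.SubElement(root, "Item", total=f"{item.total:.2f}")
        node.text = item.description
    return ET.tostring(root, encoding="unicode")
```

With this shape, a parsing problem shows up in the canonical model (for example in `warnings`) before any XML is written, so the two halves can be debugged separately.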

Where do I start if parsing is wrong?

Start with:

./05_TESTING_AND_DEBUG/DEBUGGING_GUIDE.md
source log: logs/pdf_to_esx_agent.log
packaged log: %LOCALAPPDATA%\PDF-TO-ESX-Agent\logs\pdf_to_esx_agent.log
the generated *.canonical.json
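To locate the packaged log without typing the path by hand, the `%LOCALAPPDATA%` variable can be expanded in Python. This is a small sketch built from the path listed above; the function name is made up for this example.

```python
import os
from pathlib import PureWindowsPath
from typing import Mapping, Optional


def packaged_log_path(
    env: Mapping[str, str] = os.environ,
) -> Optional[PureWindowsPath]:
    """Build the packaged-app log path, or None if LOCALAPPDATA is unset."""
    local_app_data = env.get("LOCALAPPDATA")
    if not local_app_data:
        return None  # typically the case on non-Windows machines
    return PureWindowsPath(
        local_app_data, "PDF-TO-ESX-Agent", "logs", "pdf_to_esx_agent.log"
    )
```

When running from source instead of the packaged build, the log is the relative path `logs/pdf_to_esx_agent.log` listed above.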

Where do I add support for a new carrier layout?

Usually in:

src/pdf_to_esx_agent/extract/metadata.py
src/pdf_to_esx_agent/extract/line_items.py
src/pdf_to_esx_agent/extract/totals.py
src/pdf_to_esx_agent/parsing/page_classifier.py

See:

./06_CONTRIBUTING/HOW_TO_ADD_NEW_PARSERS.md

Where do I look if multi-PDF jobs behave strangely?

Start with:

Why was SALES FORCE AGENT referenced?

The source repo contained useful patterns for OCR-aware ingestion, staged parsing, canonical normalization, and ESX structure references. This project reused those ideas but deliberately did not import unrelated Salesforce runtime and platform systems.

Is the project cross-platform?

The current target is Windows + local Python + VS Code. The code is mostly Python, but the app and scripts are documented and validated primarily for Windows.

Can I ship this to a non-developer without asking them to install Python?

Yes. The repo now includes a PyInstaller-based Windows onedir build that produces dist\PDF-TO-ESX-Agent\PDF-TO-ESX-Agent.exe.

What are the best first contributions?

parser coverage for new layouts
OCR-heavy recovery improvements
fixture-based regression tests
better ESX compatibility evidence and validation

Where should I start if I only want the shortest path into the repo?

Read in this order:

Why are sample PDFs not in this repo?

Real estimate PDFs may have sharing, privacy, or licensing constraints. The project documents the fixture behavior and paths used during development without assuming those inputs can be redistributed publicly.
