Yes. It is a runnable Windows desktop app with a real PDF-processing pipeline and real output artifacts.
No. The current implementation is local-first and does not depend on cloud parsing services.
Yes, to a degree. It detects text-poor pages and can apply local OCR when dependencies are available. Scanned PDFs remain harder than native-text PDFs.
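As a rough illustration of what "text-poor" detection can look like, the sketch below flags pages whose extracted text has very few visible characters. The function name `find_text_poor_pages` and the `min_chars` threshold are assumptions made for this example; the project's actual heuristic may differ.

```python
def find_text_poor_pages(page_texts, min_chars=40):
    """Return indexes of pages with too little extracted text.

    page_texts: one extracted-text string per page.
    min_chars: hypothetical threshold, not taken from the project.
    """
    poor = []
    for index, text in enumerate(page_texts):
        # Count only non-whitespace characters so blank layout noise is ignored.
        visible = sum(1 for ch in text if not ch.isspace())
        if visible < min_chars:
            poor.append(index)
    return poor
```

Pages returned by a check like this would be the candidates for OCR; pages with plenty of native text can skip it.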
Not in the proprietary XACTDOC.ZIPXML sense. It writes a standards-based .esx zip package containing XACTDOC.XML, canonical_estimate.json, and manifest.json.
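Because the output is an ordinary zip archive, its contents can be checked with Python's standard `zipfile` module. The sketch below is a minimal example: the helper `missing_esx_members` is hypothetical, and it assumes the three files named above sit at the archive root.

```python
import zipfile

# Member names come from the answer above; root-level placement is an assumption.
EXPECTED_MEMBERS = {"XACTDOC.XML", "canonical_estimate.json", "manifest.json"}


def missing_esx_members(esx_path):
    """Return the expected member names that are absent from the .esx archive."""
    with zipfile.ZipFile(esx_path) as archive:
        present = set(archive.namelist())
    return sorted(EXPECTED_MEMBERS - present)
```

An empty list means all three expected files are present.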
Because the project uses a canonical estimate boundary between parsing and export. That file is the easiest way to inspect what the parser actually produced before looking at XML.
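One quick way to inspect a canonical file without knowing its schema is to list its top-level keys and their JSON types. The helper name `summarize_top_level` is hypothetical, and the sketch makes no assumption about which fields the project actually writes.

```python
import json


def summarize_top_level(canonical_path):
    """Map each top-level key of a canonical JSON file to its Python type name."""
    with open(canonical_path, encoding="utf-8") as handle:
        data = json.load(handle)
    if isinstance(data, dict):
        return {key: type(value).__name__ for key, value in data.items()}
    # Fall back to describing the root value when it is not an object.
    return {"<root>": type(data).__name__}
```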
Because estimate PDFs are noisy and inconsistent. The canonical model isolates parser uncertainty from export logic and makes the codebase much easier to debug and evolve.
Start with:
- ./05_TESTING_AND_DEBUG/DEBUGGING_GUIDE.md
- source log: logs/pdf_to_esx_agent.log
- packaged log: %LOCALAPPDATA%\PDF-TO-ESX-Agent\logs\pdf_to_esx_agent.log
- the generated *.canonical.json
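The log locations listed above can be checked with a small helper. This is a sketch, not part of the project: `candidate_log_paths` and `tail` are hypothetical names, and `repo_root` is assumed to be the directory that contains `logs/`.

```python
import os
from collections import deque
from pathlib import Path

LOG_NAME = "pdf_to_esx_agent.log"


def candidate_log_paths(repo_root="."):
    """Return the source-run log path, plus the packaged path if LOCALAPPDATA is set."""
    paths = [Path(repo_root) / "logs" / LOG_NAME]
    local_app_data = os.environ.get("LOCALAPPDATA")
    if local_app_data:
        paths.append(Path(local_app_data) / "PDF-TO-ESX-Agent" / "logs" / LOG_NAME)
    return paths


def tail(path, line_count=20):
    """Return the last line_count lines of a text file without trailing newlines."""
    with open(path, encoding="utf-8", errors="replace") as handle:
        return [line.rstrip("\n") for line in deque(handle, maxlen=line_count)]
```

For example, `[tail(p) for p in candidate_log_paths() if p.exists()]` collects the most recent entries from whichever logs are present.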
Usually in:
- src/pdf_to_esx_agent/extract/metadata.py
- src/pdf_to_esx_agent/extract/line_items.py
- src/pdf_to_esx_agent/extract/totals.py
- src/pdf_to_esx_agent/parsing/page_classifier.py
See:
Start with:
- ./02_ARCHITECTURE/MERGE_AND_RECONCILIATION.md
- ./02_ARCHITECTURE/CANONICAL_MODEL.md
- ./05_TESTING_AND_DEBUG/COMMON_FAILURE_PATTERNS.md
The source repo contained useful patterns for OCR-aware ingestion, staged parsing, canonical normalization, and ESX structure references. This project reused those ideas but deliberately did not import unrelated Salesforce runtime and platform systems.
The current target is Windows + local Python + VS Code. The code is mostly Python, but the app and scripts are documented and validated primarily for Windows.
Yes. The repo now includes a PyInstaller-based Windows onedir build that produces dist\PDF-TO-ESX-Agent\PDF-TO-ESX-Agent.exe.
- parser coverage for new layouts
- OCR-heavy recovery improvements
- fixture-based regression tests
- better ESX compatibility evidence and validation
Read in this order:
- ./00_START_HERE/PROJECT_OVERVIEW.md
- ./00_START_HERE/QUICK_START_FOR_DEVELOPERS.md
- ./02_ARCHITECTURE/SYSTEM_ARCHITECTURE.md
- ./06_CONTRIBUTING/CODEBASE_TOUR.md
Real estimate PDFs may have sharing, privacy, or licensing constraints. The project documents the fixture behavior and paths used during development without assuming those inputs can be redistributed publicly.