Simple, powerful UBL XML to JSON/CSV converter with built-in exception handling
ublkit is a lightweight wrapper that converts UBL XML documents (Invoice, CreditNote, Order, DespatchAdvice, etc.) to JSON or CSV format with a simple, clean API.
- Zero Configuration - Works out of the box with sensible defaults
- Flexible Output - Convert to JSON or flattened CSV format
- Single File or Batch - Process one file or entire directories
- Parallel Processing - Fast batch conversion with multithreading
- CSV File Splitting - Automatically split large CSVs into manageable chunks
- Robust Error Handling - Never crashes, always provides detailed error info
- Comprehensive Logging - Uses py-logex for production-grade logging
- YAML Configuration - Easy, flexible configuration
- Data Preservation - Prevents Excel from corrupting your data
- Detailed Summaries - File-by-file status and aggregate statistics
pip install ublkit

Requirements:
- Python >= 3.8
- lxml >= 4.9.0
- polars >= 0.19.0
- pyyaml >= 6.0
- py-logex-enhanced >= 0.1.0
from ublkit import convert_file
# Convert to JSON
result = convert_file(
    xml_path="invoice.xml",
    output_format="json",
    config_path="./config/ublkit.yaml"
)
# Result contains everything in memory
if result["success"]:
    print(f"UBL Type: {result['ubl_document_type']}")
    print(f"Processing time: {result['processing_time_seconds']:.2f}s")
    data = result["content"]  # Your converted data
else:
    print(f"Error: {result['error_message']}")

from ublkit import convert_batch
# Convert entire directory to CSV
summary = convert_batch(
    input_dir="./xml_files",
    output_dir="./output",
    output_format="csv",
    config_path="./config/ublkit.yaml"
)
print(f"Processed: {summary.total_files}")
print(f"Successful: {summary.successful}")
print(f"Failed: {summary.failed}")

Create ublkit.yaml in your project root:
# Logging configuration (uses py-logex library)
logging:
  level: "INFO"
  file: "ublkit.log"
  rotation: "500 MB"
  retention: "10 days"
  compression: "zip"
# Processing configuration
processing:
  max_workers: 4  # Parallel threads
  encoding: "utf-8"
# CSV output configuration
csv:
  max_records_per_file: 50000  # Split large CSVs
  preservation_method: "apostrophe"  # Prevent Excel corruption
  key_separator: " | "
xml:
  preserve_namespace_prefix: true
json:
  flatten: true  # true for flattened JSON, false for nested JSON
  separator: "/"
# Output directories
output:
  summary_dir: "./summaries"
  logs_dir: "./logs"
# Feature flags
features:
  enable_dry_run: false

Prevent Excel from corrupting your data:
- apostrophe: Prepends ' to values (Excel standard)
- quotes: Wraps values in double quotes
- brackets: Wraps values in []
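The three preservation methods can be illustrated with a small sketch. This is not ublkit's internal code; the function name `preserve` is hypothetical, and the behavior is assumed purely from the descriptions in the list above.

```python
def preserve(value: str, method: str = "apostrophe") -> str:
    """Mark a cell value as text so a spreadsheet does not reinterpret it.

    Hypothetical helper: the behavior is assumed from the method descriptions,
    not taken from ublkit itself.
    """
    if method == "apostrophe":
        return "'" + value          # leading apostrophe
    if method == "quotes":
        return '"' + value + '"'    # wrapped in double quotes
    if method == "brackets":
        return "[" + value + "]"    # wrapped in square brackets
    raise ValueError(f"unknown preservation method: {method!r}")


# Values such as zero-padded IDs are the typical candidates for protection.
print(preserve("00123"))
print(preserve("00123", "quotes"))
print(preserve("00123", "brackets"))
```

Without some marker like this, a spreadsheet application may parse a value such as 00123 as the number 123 and drop the leading zeros.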
Convert a single XML file (in-memory, no disk writes).
result = convert_file(
    xml_path: str,       # Path to UBL XML file
    output_format: str,  # "json" or "csv"
    config_path: str     # Path to ublkit.yaml (required)
) -> dict

Returns:
{
    "success": bool,
    "error_message": str,
    "processing_time_seconds": float,
    "source_file": str,
    "file_size_bytes": int,
    "ubl_document_type": str,
    "output_format": str,
    "content": dict | list  # Converted data
}

Convert multiple XML files (writes to disk).
summary = convert_batch(
    input_dir: str,      # Directory containing XML files
    output_dir: str,     # Output directory
    output_format: str,  # "json" or "csv"
    config_path: str     # Path to ublkit.yaml (required)
) -> ProcessingSummary

Returns: ProcessingSummary object with:
- total_files: Total files processed
- successful: Successfully converted
- failed: Failed conversions
- results: List of per-file results
- start_time, end_time: Processing timestamps
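A short sketch of how a summary might be inspected after a batch run. It assumes that each entry in `results` is a dict with the same keys that `convert_file` returns, which the text above does not confirm. The helper `report_failures` and the stand-in `demo` object are hypothetical and exist only so the example runs without ublkit installed.

```python
from types import SimpleNamespace


def report_failures(summary) -> list:
    """Return (source_file, error_message) pairs for every failed result.

    Assumption: each item in summary.results is a dict shaped like the
    convert_file() return value.
    """
    return [
        (r["source_file"], r["error_message"])
        for r in summary.results
        if not r["success"]
    ]


# Stand-in for demonstration only; a real summary would come from convert_batch().
demo = SimpleNamespace(
    total_files=2,
    successful=1,
    failed=1,
    results=[
        {"success": True, "source_file": "a.xml", "error_message": ""},
        {"success": False, "source_file": "b.xml", "error_message": "not well-formed"},
    ],
)

for path, error in report_failures(demo):
    print(f"{path}: {error}")
```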
# Single file to JSON
ublkit convert invoice.xml --format json --output output.json --config ublkit.yaml
# Batch to CSV
ublkit batch ./xml_files ./output --format csv --config ublkit.yaml
# Dry run (preview without writing)
ublkit batch ./xml_files ./output --dry-run --config ublkit.yaml

UBLKit flattens nested XML into key-value pairs:
Key,Value,Filename
Invoice | ID | value,'INV-001',invoice_001.xml
Invoice | IssueDate | value,'2024-12-27',invoice_001.xml
Invoice | AccountingSupplierParty | Party | PartyName | Name | value,'ACME Corp',invoice_001.xml

Benefits:
- See all data at a glance
- Easy validation and debugging
- Works with any UBL document type
- Automatic file splitting for large datasets
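The flattening into key-value pairs shown above can be sketched in a few lines. This is an illustration only, not ublkit's implementation: it uses the standard library's `xml.etree.ElementTree` instead of lxml so it runs without extra dependencies, the function names are hypothetical, the namespace URI is a placeholder, and attributes and repeated elements are ignored for brevity.

```python
import xml.etree.ElementTree as ET


def local_name(tag: str) -> str:
    """Strip the '{namespace}' part that ElementTree puts in front of tag names."""
    return tag.rsplit("}", 1)[-1]


def flatten(elem, prefix: str = "", sep: str = " | ") -> list:
    """Return (key, value) pairs for every element that carries text."""
    name = local_name(elem.tag)
    key = f"{prefix}{sep}{name}" if prefix else name
    rows = []
    text = (elem.text or "").strip()
    if text:
        rows.append((f"{key}{sep}value", text))
    for child in elem:
        rows.extend(flatten(child, key, sep))
    return rows


# Placeholder namespace, used only for this example.
sample = """<Invoice xmlns="urn:example:invoice">
  <ID>INV-001</ID>
  <IssueDate>2024-12-27</IssueDate>
  <AccountingSupplierParty><Party><PartyName>
    <Name>ACME Corp</Name>
  </PartyName></Party></AccountingSupplierParty>
</Invoice>"""

rows = flatten(ET.fromstring(sample))
for key, value in rows:
    print(f"{key},{value}")
```

The separator defaults to " | " to match the `key_separator` value in the sample configuration.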
# Clone repository
git clone https://github.com/sherozshaikh/ublkit.git
cd ublkit
# Create virtual environment
python -m venv venv
source venv/bin/activate # Windows: venv\Scripts\activate
# Install in development mode
pip install -e ".[dev]"
# Run tests
pytest
# Run tests with coverage
pytest --cov=ublkit --cov-report=html
# Format code
black src tests
isort src tests
# Type checking
mypy src

UBLKit works with any UBL 2.x document type:
- Invoice
- CreditNote
- DebitNote
- Order
- OrderResponse
- DespatchAdvice
- ReceiptAdvice
- ApplicationResponse
- And more...
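In UBL, the document type is the local name of the XML root element, so a value like `ubl_document_type` can plausibly be derived as in the sketch below. How ublkit actually determines it is not stated here; the function `detect_document_type` is hypothetical and uses only the standard library.

```python
import xml.etree.ElementTree as ET


def detect_document_type(xml_text: str) -> str:
    """Return the root element's local name, e.g. 'Invoice' or 'CreditNote'."""
    root = ET.fromstring(xml_text)
    # ElementTree reports namespaced tags as '{namespace}LocalName'.
    return root.tag.rsplit("}", 1)[-1]


doc = '<CreditNote xmlns="urn:oasis:names:specification:ubl:schema:xsd:CreditNote-2"/>'
print(detect_document_type(doc))
```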
Contributions welcome! Please:
- Fork the repository
- Create a feature branch
- Make your changes
- Run tests: pytest
- Submit a pull request
MIT License - see LICENSE file for details.
- Built with lxml for robust XML processing
- Uses polars for efficient CSV operations
- Powered by py-logex for production logging
- Issues: GitHub Issues
- PyPI: https://pypi.org/project/ublkit/
Made with ❤️ for the UBL community