Skip to content

sherozshaikh/ublkit

Folders and files

NameName
Last commit message
Last commit date

Latest commit

Β 

History

15 Commits
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 

Repository files navigation

ublkit

Simple, powerful UBL XML to JSON/CSV converter with built-in exception handling

PyPI version Python Versions License: MIT

ublkit is a lightweight wrapper that converts UBL XML documents (Invoice, CreditNote, Order, DespatchAdvice, etc.) to JSON or CSV format with a simple, clean API.


✨ Features

  • πŸš€ Zero Configuration - Works out of the box with sensible defaults
  • πŸ“ Flexible Output - Convert to JSON or flattened CSV format
  • 🎯 Single File or Batch - Process one file or entire directories
  • πŸ”„ Parallel Processing - Fast batch conversion with multithreading
  • πŸ“Š CSV File Splitting - Automatically split large CSVs into manageable chunks
  • πŸ›‘οΈ Robust Error Handling - Never crashes, always provides detailed error info
  • πŸ“ Comprehensive Logging - Uses py-logex for production-grade logging
  • βš™οΈ YAML Configuration - Easy, flexible configuration
  • 🎨 Data Preservation - Prevents Excel from corrupting your data
  • πŸ“‹ Detailed Summaries - File-by-file status and aggregate statistics

πŸ“¦ Installation

pip install ublkit

Requirements:

  • Python >= 3.8
  • lxml >= 4.9.0
  • polars >= 0.19.0
  • pyyaml >= 6.0
  • py-logex-enhanced >= 0.1.0

πŸš€ Quick Start

Single File Conversion

from ublkit import convert_file

# Convert to JSON
result = convert_file(
    xml_path="invoice.xml",
    output_format="json",
    config_path="./config/ublkit.yaml"
)

# Result contains everything in memory
if result["success"]:
    print(f"UBL Type: {result['ubl_document_type']}")
    print(f"Processing time: {result['processing_time_seconds']:.2f}s")
    data = result["content"]  # Your converted data
else:
    print(f"Error: {result['error_message']}")

Batch Processing

from ublkit import convert_batch

# Convert entire directory to CSV
summary = convert_batch(
    input_dir="./xml_files",
    output_dir="./output",
    output_format="csv",
    config_path="./config/ublkit.yaml"
)

print(f"Processed: {summary.total_files}")
print(f"Successful: {summary.successful}")
print(f"Failed: {summary.failed}")

βš™οΈ Configuration

Create ublkit.yaml in your project root:

# Logging configuration (uses py-logex library)
logging:
  level: "INFO"
  file: "ublkit.log"
  rotation: "500 MB"
  retention: "10 days"
  compression: "zip"

# Processing configuration
processing:
  max_workers: 4                   # Parallel threads
  encoding: "utf-8"

# CSV output configuration
csv:
  max_records_per_file: 50000       # Split large CSVs
  preservation_method: "apostrophe" # Prevent Excel corruption
  key_separator: " | "

xml:
  preserve_namespace_prefix: true

json:
  flatten: true                  # flattened or nested json
  separator: "/"

# Output directories
output:
  summary_dir: "./summaries"
  logs_dir: "./logs"

# Feature flags
features:
  enable_dry_run: false

CSV Preservation Methods

Prevent Excel from corrupting your data:

  • apostrophe: Prepends ' to values (Excel standard)
  • quotes: Wraps values in double quotes
  • brackets: Wraps values in [ ]

🎯 API Reference

convert_file()

Convert a single XML file (in-memory, no disk writes).

result = convert_file(
    xml_path: str,              # Path to UBL XML file
    output_format: str,         # "json" or "csv"
    config_path: str            # Path to ublkit.yaml (required)
) -> dict

Returns:

{
    "success": bool,
    "error_message": str,
    "processing_time_seconds": float,
    "source_file": str,
    "file_size_bytes": int,
    "ubl_document_type": str,
    "output_format": str,
    "content": dict | list      # Converted data
}

convert_batch()

Convert multiple XML files (writes to disk).

summary = convert_batch(
    input_dir: str,             # Directory containing XML files
    output_dir: str,            # Output directory
    output_format: str,         # "json" or "csv"
    config_path: str            # Path to ublkit.yaml (required)
) -> ProcessingSummary

Returns: ProcessingSummary object with:

  • total_files: Total files processed
  • successful: Successfully converted
  • failed: Failed conversions
  • results: List of per-file results
  • start_time, end_time: Processing timestamps

πŸ› οΈ CLI Usage

# Single file to JSON
ublkit convert invoice.xml --format json --output output.json --config ublkit.yaml

# Batch to CSV
ublkit batch ./xml_files ./output --format csv --config ublkit.yaml

# Dry run (preview without writing)
ublkit batch ./xml_files ./output --dry-run --config ublkit.yaml

πŸ“Š CSV Output Format

UBLKit flattens nested XML into key-value pairs:

Key,Value,Filename
Invoice | ID | value,'INV-001',invoice_001.xml
Invoice | IssueDate | value,'2024-12-27',invoice_001.xml
Invoice | AccountingSupplierParty | Party | PartyName | Name | value,'ACME Corp',invoice_001.xml

Benefits:

  • βœ… See all data at a glance
  • βœ… Easy validation and debugging
  • βœ… Works with any UBL document type
  • βœ… Automatic file splitting for large datasets

πŸ§ͺ Development

# Clone repository
git clone https://github.com/sherozshaikh/ublkit.git
cd ublkit

# Create virtual environment
python -m venv venv
source venv/bin/activate  # Windows: venv\Scripts\activate

# Install in development mode
pip install -e ".[dev]"

# Run tests
pytest

# Run tests with coverage
pytest --cov=ublkit --cov-report=html

# Format code
black src tests
isort src tests

# Type checking
mypy src

πŸ“– Supported UBL Document Types

UBLKit works with any UBL 2.x document type:

  • Invoice
  • CreditNote
  • DebitNote
  • Order
  • OrderResponse
  • DespatchAdvice
  • ReceiptAdvice
  • ApplicationResponse
  • And more...

🀝 Contributing

Contributions welcome! Please:

  1. Fork the repository
  2. Create a feature branch
  3. Make your changes
  4. Run tests: pytest
  5. Submit a pull request

πŸ“„ License

MIT License - see LICENSE file for details.


πŸ™ Acknowledgments

  • Built with lxml for robust XML processing
  • Uses polars for efficient CSV operations
  • Powered by py-logex for production logging

πŸ“§ Support


Made with ❀️ for the UBL community