Queue Management Guide
This guide explains how to manage and recover BMLibrarian agent queues when processes are interrupted or encounter problems.
BMLibrarian uses a SQLite-based queue system to process large numbers of documents efficiently. When you submit tasks (like scoring thousands of documents), they're stored in a queue and processed in the background. This prevents memory issues and allows you to track progress.
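To make the idea of a SQLite-backed queue concrete, here is a minimal sketch using Python's standard `sqlite3` module. The table layout and the function names (`open_queue`, `enqueue`, `claim_next`, `complete`) are assumptions made for illustration; BMLibrarian's actual schema and API may differ.

```python
import sqlite3

# Hypothetical schema for illustration only; the real queue tables may differ.
SCHEMA = """
CREATE TABLE IF NOT EXISTS tasks (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    payload TEXT NOT NULL,
    status TEXT NOT NULL DEFAULT 'pending'
)
"""


def open_queue(path=":memory:"):
    """Open (or create) the queue database and make sure the table exists."""
    conn = sqlite3.connect(path)
    conn.execute(SCHEMA)
    return conn


def enqueue(conn, payload):
    """Store one task in the queue and return its id."""
    cur = conn.execute("INSERT INTO tasks (payload) VALUES (?)", (payload,))
    conn.commit()
    return cur.lastrowid


def claim_next(conn):
    """Mark the oldest pending task as 'processing'; return (id, payload) or None."""
    row = conn.execute(
        "SELECT id, payload FROM tasks WHERE status = 'pending' ORDER BY id LIMIT 1"
    ).fetchone()
    if row is None:
        return None
    conn.execute("UPDATE tasks SET status = 'processing' WHERE id = ?", (row[0],))
    conn.commit()
    return row


def complete(conn, task_id):
    """Mark a task as successfully finished."""
    conn.execute("UPDATE tasks SET status = 'completed' WHERE id = ?", (task_id,))
    conn.commit()
```

Because only the task currently being worked on needs to be loaded, memory use stays flat however many tasks are queued. This sketch assumes a single worker; claiming tasks safely from several processes would need extra locking.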
You might need queue recovery in these situations:
- Process interrupted: You stopped the application while it was processing tasks
- System crash: Your computer crashed or lost power during processing
- Stuck tasks: Tasks that seem to run forever without completing
- Application freeze: The application stopped responding
See what's currently in your queue:
python -m bmlibrarian.queue_cli status
This shows you:
- How many tasks are pending, processing, completed, or failed
- Whether any tasks are stuck or orphaned
- When tasks were created
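Conceptually, a per-status summary like the one listed above is a grouped count over the task table. The sketch below assumes a hypothetical `tasks` table with a `status` column; it is not the `status` command's actual implementation.

```python
import sqlite3


def status_counts(conn):
    """Return {status: count} for every status present in the tasks table."""
    rows = conn.execute(
        "SELECT status, COUNT(*) FROM tasks GROUP BY status"
    ).fetchall()
    return dict(rows)


# Small in-memory demo with a hypothetical table layout.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tasks (id INTEGER PRIMARY KEY, status TEXT NOT NULL)")
conn.executemany(
    "INSERT INTO tasks (status) VALUES (?)",
    [("pending",), ("pending",), ("processing",), ("completed",), ("failed",)],
)
print(status_counts(conn))
```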
Problem: Tasks are stuck (processing for too long)
# Reset stuck tasks to retry them
python -m bmlibrarian.queue_cli recover --timeout 30
# Or mark stuck tasks as failed if they can't be recovered
python -m bmlibrarian.queue_cli recover --timeout 30 --mark-failed
Problem: Process crashed and left orphaned tasks
# Clean up tasks from dead processes
python -m bmlibrarian.queue_cli cleanup-dead
Problem: Want to start fresh
# Cancel all pending tasks
python -m bmlibrarian.queue_cli cancel
# Remove old completed/failed tasks
python -m bmlibrarian.queue_cli cleanup-old --hours 0
What happened: You pressed Ctrl+C or closed the terminal while documents were being processed.
What to do:
- Check the queue status:
  python -m bmlibrarian.queue_cli status
- If you see "processing" tasks that are no longer running, recover them:
  python -m bmlibrarian.queue_cli recover --timeout 5
- Restart your application to continue processing.
What happened: Your computer crashed, lost power, or the application crashed unexpectedly.
What to do:
- Check for orphaned tasks:
  python -m bmlibrarian.queue_cli status
- Clean up tasks from dead processes:
  python -m bmlibrarian.queue_cli cleanup-dead
- Recover any stuck tasks:
  python -m bmlibrarian.queue_cli recover --timeout 10
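If you run this crash-recovery sequence often, the three commands can be wrapped in a short script. The sketch below uses only the subcommands shown in the steps above. The helper names (`build_command`, `run_steps`) are hypothetical, and it assumes the CLI exits with a non-zero code when a step fails, which this guide does not confirm.

```python
import subprocess
import sys

# Subcommands taken from the steps above, in the same order.
CRASH_RECOVERY_STEPS = [
    ["status"],
    ["cleanup-dead"],
    ["recover", "--timeout", "10"],
]


def build_command(args, python=sys.executable):
    """Prefix a queue_cli subcommand with the interpreter and module name."""
    return [python, "-m", "bmlibrarian.queue_cli", *args]


def run_steps(steps, runner=subprocess.run):
    """Run each step in order; stop and return False at the first failure."""
    for args in steps:
        result = runner(build_command(args))
        if result.returncode != 0:
            return False
    return True
```

Calling `run_steps(CRASH_RECOVERY_STEPS)` from the environment where BMLibrarian is installed runs the sequence. The `runner` parameter exists so the logic can be exercised without touching a real queue.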
What happened: Some tasks have been "processing" for hours without completing.
What to do:
- Check which tasks are stuck:
  python -m bmlibrarian.queue_cli list --status processing
- If they're genuinely stuck, recover them:
  # Try to retry them
  python -m bmlibrarian.queue_cli recover --timeout 60
  # Or mark as failed if they keep getting stuck
  python -m bmlibrarian.queue_cli recover --timeout 60 --mark-failed
What happened: You want to cancel everything and start fresh.
What to do:
# Cancel all pending tasks
python -m bmlibrarian.queue_cli cancel
# Remove all completed/failed tasks
python -m bmlibrarian.queue_cli cleanup-old --hours 0
When you run python -m bmlibrarian.queue_cli status, you'll see:
- Pending: Tasks waiting to be processed
- Processing: Tasks currently being worked on
- Completed: Successfully finished tasks
- Failed: Tasks that encountered errors
- Cancelled: Tasks that were manually cancelled
- Stuck tasks: Tasks processing longer than 30 minutes
- Orphaned tasks: Tasks from processes that no longer exist
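The last two categories can be illustrated with a small sketch. It assumes each task records a start timestamp and the PID of the worker that claimed it; how BMLibrarian actually tracks this is not described here, so treat the function names and inputs as hypothetical.

```python
import os
import time

STUCK_AFTER_SECONDS = 30 * 60  # the 30-minute threshold mentioned above


def is_stuck(started_at, now=None, threshold=STUCK_AFTER_SECONDS):
    """A processing task counts as stuck once it has run longer than the threshold."""
    now = time.time() if now is None else now
    return (now - started_at) > threshold


def pid_alive(pid):
    """Best-effort check that a process exists.

    POSIX only: on Windows, os.kill(pid, 0) terminates the process instead.
    """
    try:
        os.kill(pid, 0)  # signal 0 performs error checking without sending a signal
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True


def is_orphaned(worker_pid):
    """A task is orphaned when the process that claimed it no longer exists."""
    return not pid_alive(worker_pid)
```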
Use this when you had a planned interruption and want to continue where you left off:
python -m bmlibrarian.queue_cli recover --timeout 30
Good for: Intentional stops, system updates, planned restarts
Use this when tasks are genuinely broken and shouldn't be retried:
python -m bmlibrarian.queue_cli recover --timeout 30 --mark-failed
Good for: System crashes, out-of-memory errors, corrupted data
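The difference between retrying and marking as failed can be sketched as a single update over tasks that have been processing longer than the timeout. The table layout (`status`, `started_at`) and the function names below are assumptions for illustration, not the tool's real implementation.

```python
import sqlite3
import time


def make_demo_db(now):
    """Build an in-memory queue with one old and one recent processing task."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE tasks (id INTEGER PRIMARY KEY, status TEXT, started_at REAL)"
    )
    conn.executemany(
        "INSERT INTO tasks (status, started_at) VALUES (?, ?)",
        [("processing", now - 45 * 60), ("processing", now - 5 * 60)],
    )
    conn.commit()
    return conn


def recover_stuck(conn, timeout_minutes, mark_failed=False, now=None):
    """Reset (or fail) tasks processing longer than the timeout; return how many changed."""
    now = time.time() if now is None else now
    cutoff = now - timeout_minutes * 60
    new_status = "failed" if mark_failed else "pending"
    cur = conn.execute(
        "UPDATE tasks SET status = ?, started_at = NULL "
        "WHERE status = 'processing' AND started_at < ?",
        (new_status, cutoff),
    )
    conn.commit()
    return cur.rowcount
```

With a 30-minute timeout, only the task started 45 minutes ago is touched. It returns to `pending` by default and becomes `failed` when `mark_failed=True`, mirroring the two options above.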
Use this when you want to completely reset:
python -m bmlibrarian.queue_cli cancel
python -m bmlibrarian.queue_cli cleanup-old --hours 0
Good for: Testing, major changes, debugging
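As a rough mental model of the two reset commands, the sketch below cancels pending tasks and then deletes completed or failed tasks older than a cutoff. The schema and function names are hypothetical. Whether the real `cleanup-old` also removes cancelled tasks is not stated in this guide, so the sketch leaves them in place.

```python
import sqlite3
import time

# Hypothetical table layout used only for this illustration.
SCHEMA = "CREATE TABLE tasks (id INTEGER PRIMARY KEY, status TEXT, finished_at REAL)"


def cancel_pending(conn):
    """Mark every pending task as cancelled; return how many were changed."""
    cur = conn.execute("UPDATE tasks SET status = 'cancelled' WHERE status = 'pending'")
    conn.commit()
    return cur.rowcount


def cleanup_old(conn, hours, now=None):
    """Delete completed/failed tasks that finished at least `hours` ago.

    hours=0 removes every completed or failed task.
    """
    now = time.time() if now is None else now
    cutoff = now - hours * 3600
    cur = conn.execute(
        "DELETE FROM tasks WHERE status IN ('completed', 'failed') AND finished_at <= ?",
        (cutoff,),
    )
    conn.commit()
    return cur.rowcount
```

In this model a one-week retention window is `cleanup_old(conn, hours=7 * 24)`, that is, 168 hours.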
# List all tasks
python -m bmlibrarian.queue_cli list
# List only pending tasks
python -m bmlibrarian.queue_cli list --status pending
# List only failed tasks
python -m bmlibrarian.queue_cli list --status failed
# List tasks for one agent
python -m bmlibrarian.queue_cli list --agent document_scoring_agent
# Show more detail for each task
python -m bmlibrarian.queue_cli list --verbose
Save your queue data to a JSON file for analysis or backup:
python -m bmlibrarian.queue_cli export my_queue_backup.json
This creates a file with all your task data that you can examine or share for troubleshooting.
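One way to examine an exported file is to count tasks per status. The export layout is not documented in this guide, so the sketch below assumes either a JSON array of task objects or an object with a `"tasks"` array, where each task has a `"status"` key. Adjust the key names to match your actual file.

```python
import json
from collections import Counter


def summarize_export(path):
    """Count tasks per status in an exported queue file (assumed layout, see above)."""
    with open(path, encoding="utf-8") as fh:
        data = json.load(fh)
    # Accept either a bare list of tasks or {"tasks": [...]}; both are assumptions.
    tasks = data if isinstance(data, list) else data.get("tasks", [])
    return Counter(task.get("status", "unknown") for task in tasks)
```

For example, `summarize_export("my_queue_backup.json")` returns a `Counter` mapping each status to the number of tasks found.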
Check queue status periodically during long processing runs:
# In another terminal window
python -m bmlibrarian.queue_cli status
If you know your tasks typically take 10 minutes, set recovery timeout to 20-30 minutes:
python -m bmlibrarian.queue_cli recover --timeout 20
Clean up old completed tasks periodically to keep your queue database small:
# Remove tasks older than 1 week
python -m bmlibrarian.queue_cli cleanup-old --hours 168
Make sure you're running the commands from the correct directory, or specify the database path:
python -m bmlibrarian.queue_cli status -d /path/to/your/agent_queue.db
Make sure you have BMLibrarian installed and are using the correct Python environment:
# Check if module is installed
python -c "import bmlibrarian; print('Installed successfully')"
This might indicate:
- Tasks are too complex for available memory
- Network issues (if using remote services like Ollama)
- Database performance problems
Try:
- Reducing batch sizes in your processing
- Checking system resources (memory, CPU)
- Verifying external services are running
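For the last item, a quick way to confirm that an external service is accepting connections is a plain TCP check. This is a generic sketch and not part of BMLibrarian. The port in the usage note assumes Ollama's default of 11434 on the local machine; change the host and port to match your setup.

```python
import socket


def service_reachable(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False
```

For example, `service_reachable("localhost", 11434)` returning `False` suggests the service is not running or is listening elsewhere. A successful connection only shows the port is open, not that the service is healthy.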
The system may have already cleaned up automatically. BMLibrarian includes automatic cleanup when processes shut down gracefully.
If you encounter problems not covered in this guide:
- Check the queue status first:
  python -m bmlibrarian.queue_cli status
- Export queue data:
  python -m bmlibrarian.queue_cli export debug.json
- Check application logs for error messages
- Try the recovery demo:
  python examples/queue_recovery_demo.py
The recovery system is designed to be safe - it won't delete your data unless you explicitly ask it to. When in doubt, try the gentler recovery options first (like recover without --mark-failed).