Getting Started
This guide will help you install and configure BMLibrarian for the first time.
Before installing BMLibrarian, ensure you have the following:
- Python 3.12 or higher
  python --version  # Should show 3.12.0 or higher
- PostgreSQL 14 or higher with the pgvector extension
  psql --version  # Should show PostgreSQL 14.0 or higher
- Ollama for local LLM inference
  ollama --version  # Install from https://ollama.ai
- uv package manager (recommended)
  curl -LsSf https://astral.sh/uv/install.sh | sh
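You can check these prerequisites in one go with the minimal Python sketch below. It is a hypothetical helper, not part of BMLibrarian, and it uses only the standard library. It compares the running interpreter against 3.12 and looks up psql, ollama and uv on your PATH with shutil.which. It also parses the output of psql --version. That command reports the client version, which is not necessarily the version of the PostgreSQL server you will connect to.

```python
from __future__ import annotations

import re
import shutil
import subprocess
import sys

MIN_PYTHON = (3, 12)
MIN_POSTGRES = 14
REQUIRED_TOOLS = ("psql", "ollama", "uv")


def python_ok(current: tuple[int, int] = sys.version_info[:2],
              minimum: tuple[int, int] = MIN_PYTHON) -> bool:
    """True if the interpreter version is at least `minimum`."""
    return tuple(current) >= minimum


def parse_pg_major(version_output: str) -> int | None:
    """Extract the major version from `psql --version` output,
    e.g. 'psql (PostgreSQL) 16.2' -> 16. Returns None if not recognised."""
    match = re.search(r"PostgreSQL\)\s+(\d+)", version_output)
    return int(match.group(1)) if match else None


def find_tools(names=REQUIRED_TOOLS) -> dict[str, str | None]:
    """Map each tool name to its full path on PATH, or None if missing."""
    return {name: shutil.which(name) for name in names}


def main() -> None:
    status = "OK" if python_ok() else "too old"
    print(f"Python {sys.version.split()[0]}: {status}")
    tools = find_tools()
    for name, path in tools.items():
        print(f"{name}: {path or 'NOT FOUND'}")
    if tools["psql"]:
        # This is the client version; the server may differ.
        out = subprocess.run([tools["psql"], "--version"],
                             capture_output=True, text=True).stdout
        major = parse_pg_major(out)
        ok = major is not None and major >= MIN_POSTGRES
        print(f"PostgreSQL client {major}: {'OK' if ok else 'too old or unknown'}")


if __name__ == "__main__":
    main()
```

Save it under any name you like (for example check_prereqs.py, an arbitrary choice) and run it with uv run python check_prereqs.py.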
System requirements:
- RAM: 16GB minimum, 32GB recommended (for running local LLM models)
- Storage: 500GB+ recommended for the full PubMed database
- OS: Linux, macOS, or Windows (WSL recommended on Windows)
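The sketch below checks whether a disk has room for the storage figure above. It uses only the standard library (shutil.disk_usage). The 500GB threshold comes from the recommendation above, and the directory to check is an assumption you should adjust. PostgreSQL's data directory usually matters most, because the database holds the imported records.

```python
from __future__ import annotations

import shutil
from pathlib import Path

GB = 1000 ** 3  # decimal gigabytes, matching the figures in this guide


def free_gb(path: str | Path) -> float:
    """Free space, in GB, on the filesystem that contains `path`."""
    return shutil.disk_usage(Path(path).expanduser()).free / GB


def has_room(path: str | Path, required_gb: float) -> bool:
    """True if that filesystem has at least `required_gb` GB free."""
    return free_gb(path) >= required_gb


if __name__ == "__main__":
    # Adjust to wherever PostgreSQL stores its data and PDFs will live.
    target = Path.home()
    print(f"{free_gb(target):.0f} GB free on the filesystem holding {target}")
    print("Room for a full PubMed mirror (500GB+):", has_room(target, 500))
```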
Clone the repository:
git clone https://github.com/hherb/bmlibrarian.git
cd bmlibrarian
Install the dependencies using uv (recommended):
uv sync
Or using pip:
pip install -e .
Create a PostgreSQL database with the pgvector extension:
# Create database
createdb knowledgebase
# Enable pgvector extension
psql knowledgebase -c "CREATE EXTENSION IF NOT EXISTS vector;"
Copy the example environment file:
cp test_database.env.example .env
Edit .env with your database credentials:
POSTGRES_DB=knowledgebase
POSTGRES_USER=your_username
POSTGRES_PASSWORD=your_password
POSTGRES_HOST=localhost
POSTGRES_PORT=5432
PDF_BASE_DIR=~/knowledgebase/pdf
Create the configuration directory:
mkdir -p ~/.bmlibrarian
Then create ~/.bmlibrarian/config.json:
{
"database": {
"name": "knowledgebase",
"user": "your_username",
"password": "your_password",
"host": "localhost",
"port": "5432"
},
"ollama": {
"host": "http://localhost:11434"
},
"agents": {
"query": {
"model": "medgemma4B_it_q8:latest",
"temperature": 0.1,
"top_p": 0.9
},
"scoring": {
"model": "gpt-oss:20b",
"temperature": 0.1,
"top_p": 0.9
},
"citation": {
"model": "gpt-oss:20b",
"temperature": 0.1,
"top_p": 0.9
},
"reporting": {
"model": "gpt-oss:20b",
"temperature": 0.2,
"top_p": 0.9
}
}
}
Download the recommended models:
# Fast model for query generation
ollama pull medgemma4B_it_q8:latest
# Powerful model for analysis and reporting
ollama pull gpt-oss:20b
Run the setup script to create the database schema and import initial data:
# Full setup with medRxiv and PubMed sample data
uv run python initial_setup_and_download.py .env
# Quick setup (schema only, no data import)
uv run python initial_setup_and_download.py .env --skip-medrxiv --skip-pubmed
# Limited import for testing
uv run python initial_setup_and_download.py .env --medrxiv-days 7 --pubmed-max-results 1000
This will:
- Create the database schema
- Import medRxiv preprints (optional)
- Import PubMed articles (optional)
- Generate embeddings for semantic search (optional)
Note: A full PubMed import requires about 400GB of storage and can take several days, so start with the quick setup for testing.
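If the setup script cannot connect to PostgreSQL, it helps to confirm what your .env file actually contains. The sketch below is a hypothetical helper, not part of BMLibrarian. It parses simple KEY=VALUE lines (it ignores inline comments and does not expand variables). It then builds a libpq-style connection URI from the POSTGRES_* variables, percent-encoding the user name and password. You can pass that URI to psql to test the credentials independently of BMLibrarian, for example psql "postgresql://..." -c "SELECT 1;".

```python
from __future__ import annotations

from pathlib import Path
from urllib.parse import quote


def parse_env(text: str) -> dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    env: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding quotes, if any.
        env[key.strip()] = value.strip().strip('"').strip("'")
    return env


def postgres_dsn(env: dict[str, str]) -> str:
    """Build a postgresql:// URI from the POSTGRES_* variables."""
    user = quote(env["POSTGRES_USER"], safe="")
    password = quote(env.get("POSTGRES_PASSWORD", ""), safe="")
    host = env.get("POSTGRES_HOST", "localhost")
    port = env.get("POSTGRES_PORT", "5432")
    auth = f"{user}:{password}" if password else user
    return f"postgresql://{auth}@{host}:{port}/{env['POSTGRES_DB']}"


if __name__ == "__main__":
    env_file = Path(".env")  # run from the project root
    if env_file.exists():
        env = parse_env(env_file.read_text())
        if "PDF_BASE_DIR" in env:
            pdf_dir = Path(env["PDF_BASE_DIR"]).expanduser()
            print(f"PDF_BASE_DIR exists: {pdf_dir.is_dir()} ({pdf_dir})")
        # Note: the URI contains your password; avoid shared terminals.
        print("Connection URI:", postgres_dsn(env))
    else:
        print("No .env found in the current directory")
```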
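Verify the installation. First, check that the database is reachable and count the imported documents:
uv run python -c "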
from bmlibrarian.database import get_db_manager
db = get_db_manager()
with db.get_connection() as conn:
with conn.cursor() as cur:
cur.execute('SELECT COUNT(*) FROM document;')
print(f'Documents in database: {cur.fetchone()[0]}')
"
Check that Ollama is running:
curl http://localhost:11434/api/tags
This should return a JSON list of installed models.
Check that the configuration loads:
uv run python -c "
from bmlibrarian.config import get_config
config = get_config()
print('Configuration loaded successfully!')
print(f'Database: {config[\"database\"][\"name\"]}')
print(f'Ollama: {config[\"ollama\"][\"host\"]}')
"
Launch the desktop application:
uv run python bmlibrarian_qt.py
The GUI provides:
- Research tab for automated literature research
- Configuration tab for settings
- Query Lab for interactive query development
- PICO Lab for systematic review components
- Document Interrogation for AI-powered document Q&A
Launch the command-line interface:
uv run python bmlibrarian_cli.py
Follow the prompts to:
- Enter your research question
- Review and edit the generated database query
- Score documents for relevance
- Extract citations
- Generate a comprehensive report
Validate biomedical statements with the fact checker:
# Create a sample statements file
cat > statements.json << 'EOF'
[
{
"id": 1,
"statement": "Aspirin reduces the risk of cardiovascular events.",
"expected_answer": "yes"
},
{
"id": 2,
"statement": "Vitamin C prevents the common cold.",
"expected_answer": "no"
}
]
EOF
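For larger hand-written statement sets, a quick structural check before a long fact-checking run can save time. This sketch is hypothetical, not BMLibrarian code. It checks that each entry has the id, statement and expected_answer fields used in the sample above, that ids are unique, and that answers are "yes" or "no". The allowed answer set is an assumption taken from the sample; the fact checker may accept other labels.

```python
from __future__ import annotations

import json
from pathlib import Path

# Answer labels seen in the sample file; widen if the fact checker accepts more.
ALLOWED_ANSWERS = {"yes", "no"}
REQUIRED_KEYS = ("id", "statement", "expected_answer")


def validate_statements(items: list[dict]) -> list[str]:
    """Return a list of problems; an empty list means the entries look well-formed."""
    problems: list[str] = []
    seen_ids: set = set()
    for index, item in enumerate(items):
        for key in REQUIRED_KEYS:
            if key not in item:
                problems.append(f"entry {index}: missing '{key}'")
        item_id = item.get("id")
        if item_id is not None:
            if item_id in seen_ids:
                problems.append(f"entry {index}: duplicate id {item_id}")
            seen_ids.add(item_id)
        answer = item.get("expected_answer")
        if answer is not None and answer not in ALLOWED_ANSWERS:
            problems.append(f"entry {index}: unexpected answer {answer!r}")
    return problems


if __name__ == "__main__":
    path = Path("statements.json")
    if path.exists():
        issues = validate_statements(json.loads(path.read_text()))
        print("\n".join(issues) or f"{path} looks well-formed")
```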
# Run fact checker
uv run python fact_checker_cli.py statements.json -o results.json
# Review results in GUI
uv run python fact_checker_review_gui.py --input-file results.db
Troubleshooting
If BMLibrarian cannot connect to Ollama, make sure the Ollama server is running:
ollama serve
If the database connection fails, check that PostgreSQL is running and that your credentials are correct:
psql -h localhost -U your_username -d knowledgebase
If the pgvector extension is not available, install it:
# Ubuntu/Debian (replace 15 with your PostgreSQL major version)
sudo apt install postgresql-15-pgvector
# macOS (using Homebrew)
brew install pgvector
# Then enable in your database
psql knowledgebase -c "CREATE EXTENSION IF NOT EXISTS vector;"
If a model is reported as missing, pull it:
ollama pull medgemma4B_it_q8:latest
ollama pull gpt-oss:20b
If a full import is too large or too slow for your system, import data in smaller batches:
# Import only recent medRxiv preprints
uv run python medrxiv_import_cli.py update --days 30
# Import PubMed by specific queries instead of bulk download
uv run python pubmed_import_cli.py search "COVID-19" --max-results 10000
How much data to import depends on your use case. For testing and development:
# Import last 7 days of medRxiv
uv run python medrxiv_import_cli.py update --days 7
# Import specific PubMed articles
uv run python pubmed_import_cli.py search "machine learning medical imaging" --max-results 1000
For typical research use:
# Import last 90 days of medRxiv
uv run python medrxiv_import_cli.py update --days 90 --download-pdfs
# Import PubMed articles by topic
uv run python pubmed_import_cli.py search "cardiovascular disease" --max-results 50000
uv run python pubmed_import_cli.py search "diabetes treatment" --max-results 50000
For a complete PubMed mirror (~400GB):
# Download complete PubMed baseline
uv run python pubmed_bulk_cli.py download-baseline
# Import baseline into database (takes several days)
uv run python pubmed_bulk_cli.py import --type baseline
# Set up daily updates
uv run python pubmed_bulk_cli.py sync --updates-only
Now that BMLibrarian is installed and configured, here are some next steps:
- Learn the Basics
  - Read the User Guide for a comprehensive tutorial
  - Explore the Qt GUI Guide for desktop application features
  - Check the CLI Reference for command-line tools
- Try Advanced Features
  - Multi-Model Query Generation - Use multiple AI models for better results
  - Fact Checker Guide - Validate biomedical statements
  - Query Optimization - Improve search quality
- Customize BMLibrarian
  - Configuration Guide - Tune parameters for your needs
  - Plugin Development - Extend the Qt GUI
  - Agent Development - Create custom AI agents
- Join the Community
  - Report bugs on GitHub Issues
  - Share your research workflows
  - Contribute to the project (see the Contributing Guidelines)
Quick command reference:
# Launch Qt GUI
uv run python bmlibrarian_qt.py
# Launch CLI
uv run python bmlibrarian_cli.py
# Run fact checker
uv run python fact_checker_cli.py statements.json -o results.json
# Import medRxiv preprints
uv run python medrxiv_import_cli.py update --days 30 --download-pdfs
# Import PubMed articles
uv run python pubmed_import_cli.py search "your query" --max-results 10000
# Generate embeddings
uv run python embed_documents_cli.py embed --source medrxiv --limit 1000
# Check import status
uv run python medrxiv_import_cli.py status
uv run python pubmed_import_cli.py status
Important file locations:
- Configuration: ~/.bmlibrarian/config.json
- GUI settings: ~/.bmlibrarian/gui_config.json
- Database environment: .env (in the project root)
- PDF storage: ~/knowledgebase/pdf/ (configurable)
- Log files: ~/.bmlibrarian/logs/
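The configuration file listed above also determines which Ollama models the agents need. The following minimal sketch is hypothetical and uses only the standard library. It reads ~/.bmlibrarian/config.json and collects the model named under each entry in agents. It then compares those names with what the Ollama server reports at /api/tags, the same endpoint queried with curl earlier in this guide. Names are compared exactly, so the tag (for example :latest or :20b) must match what ollama pull installed.

```python
from __future__ import annotations

import json
import urllib.request
from pathlib import Path

CONFIG_PATH = Path("~/.bmlibrarian/config.json").expanduser()


def agent_models(config: dict) -> set[str]:
    """Collect the model names configured under config['agents']."""
    return {agent["model"] for agent in config.get("agents", {}).values()
            if "model" in agent}


def installed_models(tags: dict) -> set[str]:
    """Model names from an Ollama /api/tags response: {'models': [{'name': ...}]}."""
    return {model["name"] for model in tags.get("models", [])}


def fetch_tags(host: str, timeout: float = 5.0) -> dict:
    """GET {host}/api/tags from the Ollama server and decode the JSON body."""
    with urllib.request.urlopen(f"{host.rstrip('/')}/api/tags", timeout=timeout) as resp:
        return json.load(resp)


if __name__ == "__main__":
    if CONFIG_PATH.exists():
        config = json.loads(CONFIG_PATH.read_text())
        host = config.get("ollama", {}).get("host", "http://localhost:11434")
        try:
            missing = agent_models(config) - installed_models(fetch_tags(host))
        except OSError as exc:  # urllib's URLError is a subclass of OSError
            print(f"Could not reach Ollama at {host}: {exc}")
        else:
            for name in sorted(missing):
                print(f"Missing model, run: ollama pull {name}")
            if not missing:
                print("All configured agent models are installed.")
```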
If you encounter issues:
- Check the Troubleshooting Guide
- Search GitHub Issues
- Review relevant documentation pages
- Ask in GitHub Discussions
- Report bugs with detailed error messages
Congratulations! 🎉 BMLibrarian is now ready to use. Happy researching!