Getting Started with BMLibrarian

This guide walks you through installing and configuring BMLibrarian for the first time.

Prerequisites

Before installing BMLibrarian, ensure you have the following:

Required Software

Python 3.12 or higher

python --version  # Should show 3.12.0 or higher

PostgreSQL 14 or higher with the pgvector extension

psql --version  # Should show PostgreSQL 14.0 or higher

Ollama for local LLM inference

ollama --version  # Install from https://ollama.ai

uv package manager (recommended)

curl -LsSf https://astral.sh/uv/install.sh | sh

System Requirements

RAM: 16GB minimum, 32GB recommended (for local LLM models)
Storage: 500GB+ recommended for full PubMed database
OS: Linux, macOS, or Windows (WSL recommended for Windows)
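
If you prefer to check the prerequisites in one go, the following is a minimal sketch rather than part of BMLibrarian. It assumes the tools are expected on your PATH under the names psql, ollama and uv; the helper names python_ok and missing_tools are made up for this example.

```python
import shutil
import sys

# Minimum Python version taken from the prerequisites above.
MIN_PYTHON = (3, 12)
# Executable names are assumptions: the usual command names for each tool.
REQUIRED_TOOLS = ("psql", "ollama")
OPTIONAL_TOOLS = ("uv",)  # recommended, not required


def python_ok(version=None, minimum=MIN_PYTHON):
    """Return True if the (major, minor) version meets the minimum."""
    version = sys.version_info[:2] if version is None else tuple(version[:2])
    return version >= minimum


def missing_tools(tools, which=shutil.which):
    """Return the names in `tools` that cannot be found on PATH."""
    return [name for name in tools if which(name) is None]


print("Python OK:", python_ok())
print("Missing required tools:", missing_tools(REQUIRED_TOOLS))
print("Missing optional tools:", missing_tools(OPTIONAL_TOOLS))
```

An empty list for the required tools means the commands were found; it does not confirm their versions, so the --version checks above are still worth running.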

Installation

Step 1: Clone the Repository

git clone https://github.com/hherb/bmlibrarian.git
cd bmlibrarian

Step 2: Install Dependencies

Using uv (recommended):

uv sync

Using pip:

pip install -e .

Step 3: Set Up PostgreSQL Database

Create a PostgreSQL database with the pgvector extension:

# Create database
createdb knowledgebase

# Enable pgvector extension
psql knowledgebase -c "CREATE EXTENSION IF NOT EXISTS vector;"

Step 4: Configure Environment

Copy the example environment file and edit it:

cp test_database.env.example .env

Edit .env with your database credentials:

POSTGRES_DB=knowledgebase
POSTGRES_USER=your_username
POSTGRES_PASSWORD=your_password
POSTGRES_HOST=localhost
POSTGRES_PORT=5432

PDF_BASE_DIR=~/knowledgebase/pdf
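
To sanity-check the file before running anything against the database, you can parse it yourself. The sketch below is illustrative only: how BMLibrarian itself reads .env is not shown in this guide, and parse_env and build_dsn are hypothetical helpers. It reads KEY=VALUE lines and assembles a standard postgresql:// connection URI from the POSTGRES_* variables above.

```python
from urllib.parse import quote


def parse_env(text):
    """Parse KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop optional surrounding quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def build_dsn(env):
    """Build a postgresql:// URI from the POSTGRES_* keys."""
    # Percent-encode credentials so characters like '@' do not break the URI.
    user = quote(env["POSTGRES_USER"], safe="")
    password = quote(env["POSTGRES_PASSWORD"], safe="")
    host = env.get("POSTGRES_HOST", "localhost")
    port = env.get("POSTGRES_PORT", "5432")
    return f"postgresql://{user}:{password}@{host}:{port}/{env['POSTGRES_DB']}"
```

For example, build_dsn(parse_env(open(".env").read())) gives a URI you can pass to psql to confirm the credentials work.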

Step 5: Configure BMLibrarian

Create the configuration file:

mkdir -p ~/.bmlibrarian

Create ~/.bmlibrarian/config.json:

{
  "database": {
    "name": "knowledgebase",
    "user": "your_username",
    "password": "your_password",
    "host": "localhost",
    "port": "5432"
  },
  "ollama": {
    "host": "http://localhost:11434"
  },
  "agents": {
    "query": {
      "model": "medgemma4B_it_q8:latest",
      "temperature": 0.1,
      "top_p": 0.9
    },
    "scoring": {
      "model": "gpt-oss:20b",
      "temperature": 0.1,
      "top_p": 0.9
    },
    "citation": {
      "model": "gpt-oss:20b",
      "temperature": 0.1,
      "top_p": 0.9
    },
    "reporting": {
      "model": "gpt-oss:20b",
      "temperature": 0.2,
      "top_p": 0.9
    }
  }
}
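
A typo in this file is easy to miss, so a quick structural check can save time. The following sketch only checks the keys that appear in the sample above; whether BMLibrarian requires all of them is an assumption, and config_problems and load_config are hypothetical helpers, not part of the BMLibrarian API.

```python
import json
from pathlib import Path

# Keys taken from the sample config above; treating all of them as
# required is an assumption made for this sketch.
REQUIRED_DB_KEYS = ("name", "user", "password", "host", "port")
AGENT_NAMES = ("query", "scoring", "citation", "reporting")


def load_config(path=Path.home() / ".bmlibrarian" / "config.json"):
    """Read and parse the JSON config file."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)


def config_problems(config):
    """Return a list of readable problems; an empty list means none found."""
    problems = []
    database = config.get("database", {})
    for key in REQUIRED_DB_KEYS:
        if not database.get(key):
            problems.append(f"database.{key} is missing or empty")
    host = config.get("ollama", {}).get("host", "")
    if not host.startswith(("http://", "https://")):
        problems.append("ollama.host should be an http(s) URL")
    agents = config.get("agents", {})
    for name in AGENT_NAMES:
        agent = agents.get(name)
        if agent is None:
            problems.append(f"agents.{name} is missing")
            continue
        if not agent.get("model"):
            problems.append(f"agents.{name}.model is missing")
        top_p = agent.get("top_p")
        if top_p is not None and not 0 <= top_p <= 1:
            problems.append(f"agents.{name}.top_p should be between 0 and 1")
    return problems
```

Calling config_problems(load_config()) after saving the file prints nothing alarming if the structure matches the sample; a JSON syntax error surfaces earlier, as a json.JSONDecodeError from load_config.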

Step 6: Install Ollama Models

Download the recommended models:

# Fast model for query generation
ollama pull medgemma4B_it_q8:latest

# Powerful model for analysis and reporting
ollama pull gpt-oss:20b

Step 7: Initialize Database and Import Data

Run the setup script to create the database schema and import initial data:

# Full setup with medRxiv and PubMed sample data
uv run python initial_setup_and_download.py .env

# Quick setup (schema only, no data import)
uv run python initial_setup_and_download.py .env --skip-medrxiv --skip-pubmed

# Limited import for testing
uv run python initial_setup_and_download.py .env --medrxiv-days 7 --pubmed-max-results 1000

This will:

Create the database schema
Import medRxiv preprints (optional)
Import PubMed articles (optional)
Generate embeddings for semantic search (optional)

Note: A full PubMed import requires ~400GB of storage and takes several days. Start with the quick setup for testing.
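
Before committing to a full import, it can help to confirm that the target disk has room. This is a rough sketch: the 400GB figure comes from the note above, the 10% headroom is an arbitrary assumption, and the path you check should be wherever your PostgreSQL data directory lives (the "." default is only a placeholder).

```python
import shutil

FULL_IMPORT_GB = 400  # approximate figure quoted in the note above


def free_gb(path="."):
    """Free space on the filesystem containing `path`, in gigabytes."""
    return shutil.disk_usage(path).free / 1e9


def enough_space(free, required=FULL_IMPORT_GB, headroom=0.1):
    """True if `free` GB covers `required` GB plus a safety margin."""
    return free >= required * (1 + headroom)


print(f"Free space: {free_gb():.0f} GB")
print("Enough for a full import:", enough_space(free_gb()))
```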

Verification

Test Database Connection

uv run python -c "
from bmlibrarian.database import get_db_manager
db = get_db_manager()
with db.get_connection() as conn:
    with conn.cursor() as cur:
        cur.execute('SELECT COUNT(*) FROM document;')
        print(f'Documents in database: {cur.fetchone()[0]}')
"

Test Ollama Connection

curl http://localhost:11434/api/tags

This should return a JSON list of the installed models.
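
To check that response against the models named in your configuration, you could parse it as in the sketch below. It relies on the /api/tags response being a JSON object with a "models" array whose entries carry a "name" field; the helper names and the REQUIRED_MODELS tuple are illustrative, not part of BMLibrarian.

```python
import json
import urllib.request

# Model names taken from the sample config in this guide.
REQUIRED_MODELS = ("medgemma4B_it_q8:latest", "gpt-oss:20b")


def installed_models(tags_payload):
    """Extract model names from a parsed /api/tags response."""
    return {entry["name"] for entry in tags_payload.get("models", [])}


def missing_models(tags_payload, required=REQUIRED_MODELS):
    """Return the required models that Ollama does not report as installed."""
    installed = installed_models(tags_payload)
    return [name for name in required if name not in installed]


def fetch_tags(host="http://localhost:11434", timeout=5):
    """GET /api/tags from a running Ollama server and parse the JSON body."""
    with urllib.request.urlopen(f"{host}/api/tags", timeout=timeout) as response:
        return json.load(response)
```

With Ollama running, missing_models(fetch_tags()) lists any model you still need to pull.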

Test Configuration

uv run python -c "
from bmlibrarian.config import get_config
config = get_config()
print('Configuration loaded successfully!')
print(f'Database: {config[\"database\"][\"name\"]}')
print(f'Ollama: {config[\"ollama\"][\"host\"]}')
"

First Run

Option 1: Qt GUI (Recommended for New Users)

Launch the desktop application:

uv run python bmlibrarian_qt.py

The GUI provides:

Research tab for automated literature research
Configuration tab for settings
Query Lab for interactive query development
PICO Lab for systematic review components
Document Interrogation for AI-powered document Q&A

Option 2: Interactive CLI

Launch the command-line interface:

uv run python bmlibrarian_cli.py

Follow the prompts to:

Enter your research question
Review and edit the generated database query
Score documents for relevance
Extract citations
Generate a comprehensive report

Option 3: Fact Checker

Validate biomedical statements:

# Create a sample statements file
cat > statements.json << 'EOF'
[
  {
    "id": 1,
    "statement": "Aspirin reduces the risk of cardiovascular events.",
    "expected_answer": "yes"
  },
  {
    "id": 2,
    "statement": "Vitamin C prevents the common cold.",
    "expected_answer": "no"
  }
]
EOF

# Run fact checker
uv run python fact_checker_cli.py statements.json -o results.json

# Review results in GUI
uv run python fact_checker_review_gui.py --input-file results.db
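
For larger statement files, a quick check of the structure before a long run can catch mistakes early. The sketch below assumes only what the sample file shows (a JSON array of objects with id, statement and expected_answer); treating "yes" and "no" as the only valid answers is an assumption, and statement_errors is a hypothetical helper.

```python
import json

# Assumption: only the two values that appear in the sample file.
ALLOWED_ANSWERS = {"yes", "no"}


def statement_errors(statements):
    """Return a list of problems found in a parsed statements file."""
    if not isinstance(statements, list):
        return ["top-level value must be a JSON array"]
    errors = []
    seen_ids = set()
    for index, item in enumerate(statements):
        if not isinstance(item, dict):
            errors.append(f"entry {index}: expected an object")
            continue
        ident = item.get("id")
        if ident is None:
            errors.append(f"entry {index}: missing id")
        elif ident in seen_ids:
            errors.append(f"entry {index}: duplicate id {ident!r}")
        seen_ids.add(ident)
        if not str(item.get("statement", "")).strip():
            errors.append(f"entry {index}: empty statement")
        answer = item.get("expected_answer")
        if answer is not None and answer not in ALLOWED_ANSWERS:
            errors.append(f"entry {index}: unexpected expected_answer {answer!r}")
    return errors


def check_file(path):
    """Load a statements file and return its problems."""
    with open(path, encoding="utf-8") as handle:
        return statement_errors(json.load(handle))
```

An empty list from check_file("statements.json") means the file matches the shape of the sample.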

Common Issues and Solutions

Issue: "Connection to Ollama failed"

Solution: Ensure Ollama is running:

ollama serve

Issue: "Database connection failed"

Solution: Check that PostgreSQL is running and that your credentials are correct:

psql -h localhost -U your_username -d knowledgebase
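
If psql itself cannot connect, it helps to separate "the server is not reachable" from "the credentials are wrong". A minimal sketch, using only the Python standard library, that tests whether anything is listening on the PostgreSQL port (the function name port_is_open is made up for this example):

```python
import socket


def port_is_open(host="localhost", port=5432, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name-resolution failures.
        return False
```

If port_is_open() returns False, PostgreSQL is probably not running or is listening on a different host or port. If it returns True and psql still fails, look at the username, password and database name instead.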

Issue: "pgvector extension not found"

Solution: Install the pgvector extension:

# Ubuntu/Debian (replace 15 with your PostgreSQL major version)
sudo apt install postgresql-15-pgvector

# macOS (using Homebrew)
brew install pgvector

# Then enable in your database
psql knowledgebase -c "CREATE EXTENSION IF NOT EXISTS vector;"

Issue: "Model not found in Ollama"

Solution: Pull the model:

ollama pull medgemma4B_it_q8:latest
ollama pull gpt-oss:20b

Issue: "Out of memory during import"

Solution: Import data in smaller batches:

# Import only recent medRxiv preprints
uv run python medrxiv_import_cli.py update --days 30

# Import PubMed by specific queries instead of bulk download
uv run python pubmed_import_cli.py search "COVID-19" --max-results 10000

Data Import Options

Quick Start (Minimal Data)

For testing and development:

# Import last 7 days of medRxiv
uv run python medrxiv_import_cli.py update --days 7

# Import specific PubMed articles
uv run python pubmed_import_cli.py search "machine learning medical imaging" --max-results 1000

Moderate Setup (Recommended)

For typical research use:

# Import last 90 days of medRxiv
uv run python medrxiv_import_cli.py update --days 90 --download-pdfs

# Import PubMed articles by topic
uv run python pubmed_import_cli.py search "cardiovascular disease" --max-results 50000
uv run python pubmed_import_cli.py search "diabetes treatment" --max-results 50000
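
If you import many topics, generating the commands from a list keeps them consistent. The sketch below only assembles the same command lines shown above (the search subcommand and the --max-results flag); the TOPICS mapping and the helper name are illustrative.

```python
import shlex

# Topic -> maximum number of results, mirroring the examples above.
TOPICS = {
    "cardiovascular disease": 50000,
    "diabetes treatment": 50000,
}


def pubmed_search_command(query, max_results):
    """Return the argv list for one PubMed topic import."""
    return [
        "uv", "run", "python", "pubmed_import_cli.py",
        "search", query, "--max-results", str(max_results),
    ]


for topic, limit in TOPICS.items():
    # shlex.join quotes the query so the line is safe to paste into a shell.
    print(shlex.join(pubmed_search_command(topic, limit)))
```

The same argv lists can be passed to subprocess.run(..., check=True) from the project root if you would rather run the imports from Python than paste them into a shell.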

Full Mirror (Advanced)

For a complete PubMed mirror (~400GB):

# Download complete PubMed baseline
uv run python pubmed_bulk_cli.py download-baseline

# Import baseline into database (takes several days)
uv run python pubmed_bulk_cli.py import --type baseline

# Set up daily updates
uv run python pubmed_bulk_cli.py sync --updates-only

Next Steps

Now that BMLibrarian is installed and configured:

Learn the Basics

- Read the User Guide for a comprehensive tutorial
- Explore the Qt GUI Guide for desktop application features
- Check the CLI Reference for command-line tools

Try Advanced Features

- Multi-Model Query Generation - Use multiple AI models for better results
- Fact Checker Guide - Validate biomedical statements
- Query Optimization - Improve search quality

Customize BMLibrarian

- Configuration Guide - Tune parameters for your needs
- Plugin Development - Extend the Qt GUI
- Agent Development - Create custom AI agents

Join the Community

- Report bugs on GitHub Issues
- Share your research workflows
- Contribute to the project - see Contributing Guidelines

Quick Reference

Common Commands

# Launch Qt GUI
uv run python bmlibrarian_qt.py

# Launch CLI
uv run python bmlibrarian_cli.py

# Run fact checker
uv run python fact_checker_cli.py statements.json -o results.json

# Import medRxiv preprints
uv run python medrxiv_import_cli.py update --days 30 --download-pdfs

# Import PubMed articles
uv run python pubmed_import_cli.py search "your query" --max-results 10000

# Generate embeddings
uv run python embed_documents_cli.py embed --source medrxiv --limit 1000

# Check import status
uv run python medrxiv_import_cli.py status
uv run python pubmed_import_cli.py status

Important File Locations

Configuration: ~/.bmlibrarian/config.json
GUI Settings: ~/.bmlibrarian/gui_config.json
Database Environment: .env (in project root)
PDF Storage: ~/knowledgebase/pdf/ (configurable)
Log Files: ~/.bmlibrarian/logs/

Getting Help

If you encounter issues:

Check the Troubleshooting Guide
Search GitHub Issues
Review relevant documentation pages
Ask in GitHub Discussions
Report bugs with detailed error messages

Congratulations! 🎉 BMLibrarian is now ready to use. Happy researching!

BMLibrarian | GitHub | Issues | Version 0.6+
