Contributing to BMLibrarian

Thank you for your interest in contributing to BMLibrarian!

This document provides guidelines for contributing to the project. All contributions are welcome, from bug reports to new features.

Code of Conduct

Be respectful, inclusive, and professional. We welcome contributors of all backgrounds and experience levels.

Getting Started

Prerequisites

Python 3.12 or higher
PostgreSQL 14+ with pgvector extension
Ollama for local LLM inference
Git for version control
uv package manager (recommended)

Development Setup

Fork the repository on GitHub

Clone your fork:

git clone https://github.com/YOUR_USERNAME/bmlibrarian.git
cd bmlibrarian

Add upstream remote:

git remote add upstream https://github.com/hherb/bmlibrarian.git

Install dependencies:

uv sync --dev  # Includes development dependencies

Set up database:

createdb bmlibrarian_dev  # Use separate dev database
cp test_database.env.example .env
# Edit .env with dev database credentials

Initialize database:

uv run python initial_setup_and_download.py .env --skip-medrxiv --skip-pubmed

Run tests to verify setup:
```
uv run python -m pytest tests/ -v
```

How to Contribute

Reporting Bugs

Search existing issues to avoid duplicates
Use the bug report template on GitHub Issues
Include:
- Clear description of the bug
- Steps to reproduce
- Expected vs. actual behavior
- Python version, OS, database version
- Error messages and stack traces
- Screenshots if applicable

Suggesting Features

Check whether the feature already exists or is planned
Use the feature request template
Describe:
- Use case and motivation
- Proposed solution or API
- Alternatives considered
- Impact on existing functionality

Contributing Code

Areas where contributions are especially welcome:

New Agents - Create specialized AI agents
Qt Plugins - Extend the GUI with new tabs
Importers - Add support for new data sources
Documentation - Improve guides and examples
Tests - Increase test coverage
Bug Fixes - Fix reported issues

Development Workflow

Branch Strategy

main - Stable release branch
develop - Development branch (base for PRs)
feature/your-feature - Feature branches
bugfix/issue-number - Bug fix branches
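
The naming scheme above can be checked mechanically, for example in a local Git hook. The following is a minimal sketch and not part of BMLibrarian: the function name `is_valid_branch_name` and the exact patterns (lowercase words joined by hyphens for features, a numeric issue number for bug fixes) are assumptions made for illustration.

```python
import re

# Assumed patterns derived from the branch list above; adjust to project practice.
_BRANCH_PATTERNS = (
    re.compile(r"main|develop"),
    re.compile(r"feature/[a-z0-9]+(-[a-z0-9]+)*"),
    re.compile(r"bugfix/\d+"),
)


def is_valid_branch_name(name: str) -> bool:
    """Return True if ``name`` fully matches one of the assumed branch patterns."""
    return any(pattern.fullmatch(name) for pattern in _BRANCH_PATTERNS)
```

With these assumptions, `feature/your-feature-name` and `bugfix/123` are accepted, while `Feature/X` is rejected.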

Creating a Feature Branch

# Update your fork
git fetch upstream
git checkout develop
git merge upstream/develop

# Create feature branch
git checkout -b feature/your-feature-name

Making Changes

Make small, focused commits:

git add src/bmlibrarian/agents/new_agent.py
git commit -m "Add new agent for X functionality"

Follow commit message conventions:
- Use present tense ("Add feature" not "Added feature")
- Be concise but descriptive
- Reference issues: "Fix #123: Description"
Keep commits atomic: one logical change per commit
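
The conventions above could be approximated by a small checker. This is only a sketch: the 72-character limit, the list of past-tense words, and the name `check_commit_subject` are assumptions for illustration, not project rules.

```python
import re
from typing import List

MAX_SUBJECT_LENGTH = 72  # Assumed limit; the guide only asks for concise subjects.
# Heuristic: common past-tense first words that suggest "Added" instead of "Add".
PAST_TENSE_WORDS = frozenset({"added", "fixed", "updated", "removed", "changed"})
ISSUE_PREFIX = re.compile(r"^Fix #\d+: ")


def check_commit_subject(subject: str) -> List[str]:
    """Return a list of problems found in a commit subject line (empty if none)."""
    if not subject.strip():
        return ["subject is empty"]
    problems: List[str] = []
    if len(subject) > MAX_SUBJECT_LENGTH:
        problems.append(f"subject exceeds {MAX_SUBJECT_LENGTH} characters")
    # Skip an optional "Fix #123: " prefix before checking the first word's tense.
    words = ISSUE_PREFIX.sub("", subject).split()
    first_word = words[0].lower() if words else ""
    if first_word in PAST_TENSE_WORDS:
        problems.append(f"use present tense instead of '{first_word}'")
    return problems
```

A word list like this only catches the most common slips; it does not replace a reviewer reading the message.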

Testing Your Changes

# Run all tests
uv run python -m pytest tests/ -v

# Run specific test file
uv run python -m pytest tests/test_my_agent.py -v

# Run with coverage
uv run python -m pytest tests/ --cov=bmlibrarian --cov-report=html

# Test Qt GUI (requires X server or Xvfb)
uv run python -m pytest tests/gui/qt/ -v

Updating Your Branch

# Fetch latest changes
git fetch upstream

# Rebase on develop
git rebase upstream/develop

# Or merge if rebasing is problematic
git merge upstream/develop

Coding Standards

Python Style

We follow PEP 8, with one exception (line length) and two project conventions:

Line length: 100 characters (not 79)
Imports: Group by standard library, third-party, local
Docstrings: Google style (see below)

Type Hints (MANDATORY)

All functions and methods must include type hints:

from typing import Any, Callable, Dict, List, Optional, Tuple

def process_documents(
    documents: List[Dict[str, Any]],
    min_score: float = 0.7,
    callback: Optional[Callable[[int, int], None]] = None
) -> Tuple[List[Dict], Dict[str, float]]:
    """
    Process documents and return scored results.

    Args:
        documents: List of document dictionaries
        min_score: Minimum relevance score (0.0-1.0)
        callback: Optional progress callback (current, total)

    Returns:
        Tuple of (scored_documents, statistics)
    """
    # Implementation...
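
To show how the `callback` parameter in a signature like the one above is meant to be used, here is one possible body. It is a sketch under stated assumptions: the `score` key on each document and the contents of the statistics dictionary are invented for illustration and are not taken from the BMLibrarian code base.

```python
from typing import Any, Callable, Dict, List, Optional, Tuple


def process_documents(
    documents: List[Dict[str, Any]],
    min_score: float = 0.7,
    callback: Optional[Callable[[int, int], None]] = None,
) -> Tuple[List[Dict[str, Any]], Dict[str, float]]:
    """Keep documents whose assumed ``score`` field is at least ``min_score``."""
    total = len(documents)
    kept: List[Dict[str, Any]] = []
    for index, document in enumerate(documents, start=1):
        # Assumption: each document carries a precomputed float under "score".
        if float(document.get("score", 0.0)) >= min_score:
            kept.append(document)
        if callback is not None:
            callback(index, total)  # Report progress as (current, total).
    statistics = {"total": float(total), "kept": float(len(kept))}
    return kept, statistics


# Usage: collect progress reports in a list instead of printing them.
progress: List[Tuple[int, int]] = []
docs = [{"id": 1, "score": 0.9}, {"id": 2, "score": 0.4}]
scored, stats = process_documents(
    docs, callback=lambda current, total: progress.append((current, total))
)
```

Because the callback is typed as `Callable[[int, int], None]`, a type checker can flag callers that pass a function with the wrong arity.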

Docstrings (MANDATORY)

Use Google-style docstrings for all public functions and classes:

from typing import Any, Dict, List


def search_literature(
    query: str,
    max_results: int = 100
) -> List[Dict[str, Any]]:
    """
    Search biomedical literature databases.

    Performs full-text search across PubMed and medRxiv databases
    using PostgreSQL text search.

    Args:
        query: Natural language search query
        max_results: Maximum number of results to return (default: 100)

    Returns:
        List of document dictionaries with keys:
        - id: Document ID
        - title: Document title
        - abstract: Document abstract
        - publication_date: Publication date (ISO format)

    Raises:
        ValueError: If query is empty or max_results <= 0
        ConnectionError: If database is unavailable

    Examples:
        >>> docs = search_literature("COVID-19 vaccine", max_results=10)
        >>> print(f"Found {len(docs)} documents")
        Found 10 documents

    Note:
        Results are ordered by publication date (newest first).
    """

No Magic Numbers

Use named constants or configuration:

# BAD
if score > 0.7:
    documents = documents[:50]

# GOOD
MIN_RELEVANCE_THRESHOLD = 0.7
DEFAULT_MAX_DOCUMENTS = 50

if score > MIN_RELEVANCE_THRESHOLD:
    documents = documents[:DEFAULT_MAX_DOCUMENTS]
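
When the values need to vary between callers, the named constants can serve as defaults of a small settings object. The sketch below is illustrative only: `FilterSettings` and `filter_documents` are hypothetical names, not part of the BMLibrarian API, and documents are assumed to carry a `score` key.

```python
from dataclasses import dataclass
from typing import Any, Dict, List

MIN_RELEVANCE_THRESHOLD = 0.7
DEFAULT_MAX_DOCUMENTS = 50


@dataclass(frozen=True)
class FilterSettings:
    """Hypothetical settings object; the constants above act as defaults."""

    min_relevance: float = MIN_RELEVANCE_THRESHOLD
    max_documents: int = DEFAULT_MAX_DOCUMENTS


def filter_documents(
    documents: List[Dict[str, Any]],
    settings: FilterSettings = FilterSettings(),
) -> List[Dict[str, Any]]:
    """Return at most ``max_documents`` documents scoring above the threshold."""
    relevant = [doc for doc in documents if doc["score"] > settings.min_relevance]
    return relevant[: settings.max_documents]
```

The dataclass is frozen, so sharing one instance as a default argument is safe.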

Logging (MANDATORY)

Use Python's logging module, never print():

import logging
from typing import Any, List

logger = logging.getLogger(__name__)


def process_data(data: List[Any]) -> Any:
    logger.info(f"Processing {len(data)} items")
    try:
        result = complex_operation(data)  # Placeholder for the real work
        logger.info("Processing completed successfully")
        return result
    except Exception as e:
        logger.error(f"Processing failed: {e}", exc_info=True)
        raise
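
Library modules should only obtain a logger; handlers and levels are usually configured once, in the application's entry point. The sketch below illustrates that split. The logger name and the helper `configure_logging` are hypothetical, and the format string is only one reasonable choice.

```python
import logging
from typing import Any, List

logger = logging.getLogger("bmlibrarian.example")  # Hypothetical logger name.


def configure_logging(level: int = logging.INFO) -> None:
    """Configure the root logger once, from the application's entry point."""
    logging.basicConfig(
        level=level,
        format="%(asctime)s %(name)s %(levelname)s %(message)s",
    )


def count_items(items: List[Any]) -> int:
    """Log with %-style arguments so the message is only built if it is emitted."""
    logger.info("Processing %d items", len(items))
    return len(items)
```

Note that `logging.basicConfig` has no effect if the root logger already has handlers, which is why it belongs in the entry point and not in library code.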

Import Order

# Standard library imports
import json
import logging
from pathlib import Path
from typing import Dict, List, Optional

# Third-party imports
from PySide6.QtWidgets import QWidget, QVBoxLayout
from PySide6.QtCore import Signal
import psycopg

# BMLibrarian imports
from bmlibrarian.config import get_config
from bmlibrarian.database import get_db_manager
from bmlibrarian.agents.base import BaseAgent

Code Organization

"""
Module-level docstring describing purpose.
"""

# Standard library imports
# Third-party imports
# BMLibrarian imports

# Module-level constants
DEFAULT_BATCH_SIZE = 50
MIN_CONFIDENCE_THRESHOLD = 0.7

# Module-level logger
logger = logging.getLogger(__name__)


class MyClass:
    """Class definition with docstring."""

    def __init__(self, ...):
        """Constructor docstring."""
        # Implementation

    def public_method(self, ...) -> ReturnType:
        """Public method with full docstring."""
        # Implementation

    def _private_method(self, ...) -> ReturnType:
        """Private method (still needs docstring)."""
        # Implementation

Testing Guidelines

Test Structure

"""
Unit tests for CustomAgent.
"""

import unittest
from unittest.mock import Mock, patch
from bmlibrarian.agents.custom_agent import CustomAgent


class TestCustomAgent(unittest.TestCase):
    """Test suite for CustomAgent."""

    def setUp(self):
        """Set up test fixtures."""
        self.agent = CustomAgent(
            model="gpt-oss:20b",
            temperature=0.1,
            show_model_info=False
        )

    def test_agent_initialization(self):
        """Test agent initializes correctly."""
        self.assertEqual(self.agent.model, "gpt-oss:20b")
        self.assertEqual(self.agent.temperature, 0.1)

    @patch('bmlibrarian.agents.base.BaseAgent._make_ollama_request')
    def test_process_data(self, mock_request):
        """Test data processing."""
        # Setup mock
        mock_request.return_value = {"result": "success"}

        # Run test
        result = self.agent.process({"data": "test"})

        # Assertions
        self.assertEqual(result["result"], "success")
        mock_request.assert_called_once()

    def tearDown(self):
        """Clean up after tests."""
        pass


if __name__ == '__main__':
    unittest.main()

What to Test

Unit Tests - Test individual functions/methods in isolation
Integration Tests - Test component interactions
Edge Cases - Test boundary conditions
Error Handling - Test exception cases
Mock External Dependencies - Mock Ollama, database calls
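
As a compact illustration of the last three points, the sketch below tests a hypothetical `summarize` function against a mocked client. Neither `summarize` nor the client's `generate` method is part of BMLibrarian; they stand in for any code that calls an external service.

```python
import unittest
from typing import Any
from unittest.mock import Mock


def summarize(text: str, client: Any) -> str:
    """Hypothetical function under test: asks an LLM client for a summary."""
    if not text.strip():
        raise ValueError("text must not be empty")
    return client.generate(text)["response"]


class TestSummarize(unittest.TestCase):
    """Covers the normal path, an edge case, and error handling."""

    def test_returns_client_response(self) -> None:
        client = Mock()
        client.generate.return_value = {"response": "short summary"}
        self.assertEqual(summarize("long text", client), "short summary")
        client.generate.assert_called_once_with("long text")

    def test_whitespace_only_text_raises(self) -> None:
        client = Mock()
        with self.assertRaises(ValueError):
            summarize("   ", client)
        # The external dependency must not be called for invalid input.
        client.generate.assert_not_called()
```

Passing the client in as an argument keeps the test free of `patch` decorators; patching, as in the test structure shown earlier, is the alternative when the dependency cannot be injected.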

Test Coverage

Aim for >80% code coverage for new code:

uv run python -m pytest tests/ --cov=bmlibrarian --cov-report=html
open htmlcov/index.html  # macOS; use xdg-open on Linux

Documentation

When to Document

Document:

All public APIs
Plugin development guides
Agent development guides
Configuration options
New features

Where to Document

Docstrings - In-code documentation
User Guides - doc/users/ directory
Developer Guides - doc/developers/ directory
Wiki - High-level guides and tutorials
README - Project overview and quick start
CHANGELOG - Version history and changes

Documentation Standards

Use Markdown for all documentation
Include code examples
Add screenshots for GUI features
Keep documentation up-to-date with code changes
Use clear, concise language
Avoid jargon (or explain it)

Pull Request Process

Before Submitting

Update your branch with latest develop:

git fetch upstream
git rebase upstream/develop

Run tests:
```
uv run python -m pytest tests/ -v
```

Check code style:

uv run python -m black src/bmlibrarian tests/   # Reformats files in place
uv run python -m flake8 src/bmlibrarian tests/  # Reports style violations

Update documentation if needed
Add tests for new functionality

Creating the Pull Request

Push your branch:

git push origin feature/your-feature-name

Create PR on GitHub:
- Base branch: develop
- Compare branch: feature/your-feature-name
- Use the PR template
- Link related issues
PR Description Should Include:
- Summary of changes
- Motivation and context
- Type of change (bug fix, feature, etc.)
- Testing performed
- Checklist completion

PR Template

## Description
Brief description of changes

## Motivation
Why is this change needed?

## Type of Change
- [ ] Bug fix (non-breaking change)
- [ ] New feature (non-breaking change)
- [ ] Breaking change
- [ ] Documentation update

## Testing
- [ ] Tests added/updated
- [ ] All tests pass
- [ ] Manual testing performed

## Checklist
- [ ] Code follows style guidelines
- [ ] Documentation updated
- [ ] No new warnings
- [ ] Added/updated tests
- [ ] All tests pass

Review Process

Automated checks must pass (CI/CD)
Code review by maintainers
Address feedback with new commits
Squash and merge when approved

After Merge

Delete your feature branch:

git branch -d feature/your-feature-name
git push origin --delete feature/your-feature-name

Update your fork:

git fetch upstream
git checkout develop
git merge upstream/develop

Development Areas

Creating a New Agent

See Plugin Development Guide for Qt plugins.

For a new AI agent:

Inherit from BaseAgent:

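from typing import Any, Dict

from bmlibrarian.agents.base import BaseAgent


class MyAgent(BaseAgent):
    """Example agent skeleton."""

    def get_agent_type(self) -> str:
        return "my_agent"

    def process(self, data: Dict[str, Any]) -> Dict[str, Any]:
        # Implementation goes here
        raise NotImplementedError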

Add configuration to config.py
Write tests in tests/test_my_agent.py
Add documentation in doc/developers/
Update README with usage example

Creating a Qt Plugin

See Plugin Development Guide for complete details.

Quick start:

Create directory: src/bmlibrarian/gui/qt/plugins/my_plugin/
Implement plugin.py with create_plugin() function
Inherit from BaseTabPlugin
Add to gui_config.json
Test in Qt GUI

Adding a New Data Source

Create importer: src/bmlibrarian/importers/new_source_importer.py
Add source to database: INSERT INTO source ...
Implement import logic
Create CLI tool: new_source_import_cli.py
Update documentation
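
For the CLI step, a skeleton built on `argparse` might look like the sketch below. Everything in it is hypothetical: the argument names (`input_path`, `--limit`, `--dry-run`) and the exit codes are assumptions for illustration and do not describe an existing BMLibrarian tool.

```python
import argparse
from typing import List, Optional


def build_parser() -> argparse.ArgumentParser:
    """Build the argument parser for a hypothetical importer CLI."""
    parser = argparse.ArgumentParser(
        prog="new_source_import_cli.py",
        description="Import documents from a new data source (illustrative only).",
    )
    parser.add_argument("input_path", help="File or directory to import from")
    parser.add_argument(
        "--limit", type=int, default=None, help="Maximum number of records to import"
    )
    parser.add_argument(
        "--dry-run", action="store_true", help="Parse input without writing to the database"
    )
    return parser


def main(argv: Optional[List[str]] = None) -> int:
    """Parse arguments and return a process exit code."""
    args = build_parser().parse_args(argv)
    if args.limit is not None and args.limit <= 0:
        return 2  # Reject a non-positive limit.
    # Placeholder: the real import logic would run here.
    return 0
```

Accepting `argv` as a parameter lets tests call `main([...])` directly without spawning a subprocess.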

Getting Help

Questions: GitHub Discussions
Bugs: GitHub Issues
Chat: Check repository for community chat links
Documentation: BMLibrarian Wiki

License

By contributing, you agree that your contributions will be licensed under the same license as the project.

Thank you for contributing to BMLibrarian! 🙏

Your contributions help make biomedical research more accessible and efficient for everyone.

BMLibrarian | GitHub | Issues | Version 0.6+
