
subhakantrout/rosetta-engine

Rosetta Engine — Translational Intelligence Command Center

Rosetta Engine is an open-source, AI-powered drug repurposing pipeline. It federates live data across 8 biomedical databases in real time to identify, score, and rank drug repurposing candidates for neglected tropical diseases and rare genetic disorders.

                                         ┌─────────────────┐
                                         │   Open Targets  │
                                         │   (GraphQL)     │
                                         └────────┬────────┘
                                                  │
                    ┌─────────────────────────────┼─────────────────────────────┐
                    │                             │                             │
                    ▼                             ▼                             ▼
            ┌──────────────┐             ┌──────────────┐             ┌──────────────┐
            │   ChEMBL     │             │  UniProt     │             │   STRING-DB  │
            │ (Bioactivity)│             │ (Proteins)   │             │    (PPI)     │
            └──────────────┘             └──────────────┘             └──────────────┘
                    │                             │                             │
                    └─────────────────────────────┼─────────────────────────────┘
                                                  │
                                                  ▼
                                    ┌────────────────────────┐
                                    │  ClinicalTrials.gov    │
                                    │  PubMed / NCBI         │
                                    │  OpenFDA               │
                                    │  Zen AI (DeepSeek V4)  │
                                    └────────────────────────┘
                                                  │
                                                  ▼
                                    ┌────────────────────────┐
                                    │   3D Knowledge Graph   │
                                    │   + Ranked Candidates  │
                                    └────────────────────────┘

Features

  • 🔬 Multi-Database Live Query — Searches Open Targets, ChEMBL, UniProt, STRING-DB, ClinicalTrials.gov, PubMed, OpenFDA simultaneously
  • 🧠 AI-Powered Analysis — DeepSeek V4 Flash generates mechanistic summaries and ranks repurposing candidates
  • 🌐 Interactive 3D Knowledge Graph — Visualize disease-target-drug-PPI networks in the browser
  • 📊 FAIR Data Export — Download results as CSV, JSON, or GraphML (Cytoscape/Gephi compatible)
  • 🖥️ Dual Interface — Streamlit dashboard or standalone HTML/JS frontend served by FastAPI
  • 🔑 Dynamic API Key — Enter your Zen API key in the UI or set it via .env — no hardcoded credentials

Quick Start

Option 1: Docker (Recommended)

git clone https://github.com/subhakantrout/rosetta-engine.git
cd rosetta-engine

# Create environment file
cp .env.example .env
# Edit .env and add your ZEN_API_KEY (get one free at https://opencode.ai)

# Boot the engine
docker-compose up -d

# Open http://localhost:8501

Option 2: Local Python

# Requires Python 3.10+
python -m venv venv
source venv/bin/activate   # Windows: venv\Scripts\activate
pip install -r requirements.txt

# Configure environment
cp .env.example .env
# Edit .env with your API keys

# Run Streamlit UI
streamlit run src/ui/app.py

# OR run FastAPI server (serves HTML frontend + REST API)
uvicorn src.api.server:app --reload --port 8501

Configuration

Environment Variables (.env)

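| Variable | Required | Default | Description |
|---|---|---|---|
| ZEN_API_BASE | No | https://opencode.ai/zen/v1 | Zen AI API base URL |
| ZEN_API_KEY | Yes* | "" | Zen AI API key (free keys at https://opencode.ai) |
| ZEN_FREE_MODEL | No | deepseek-v4-flash-free | Default LLM model |
| ZEN_FALLBACK_MODEL | No | big-pickle | Model used when the default model is rate-limited |
| NCBI_EMAIL | No | your_email@example.com | Contact email sent with PubMed (E-utilities) requests, as NCBI asks |
| NCBI_API_KEY | No | "" | Optional NCBI API key; raises the E-utilities limit from 3 to 10 requests/second |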

*You can also enter your API key directly in the UI — it overrides the .env value.
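The variables above and the UI override can be tied together roughly as follows. This is a minimal sketch assuming an `os.environ`-based loader; `Settings`, `load_settings`, `resolve_api_key`, `RateLimitError` and `complete_with_fallback` are hypothetical names, not the contents of src/config/settings.py or src/api/zen_client.py, and the single-retry fallback is an assumed reading of "model used when the default model is rate-limited".

```python
# Hypothetical sketch: names below are illustrative, not the project's actual API.
import os
from dataclasses import dataclass
from typing import Callable, Mapping, Optional


@dataclass(frozen=True)
class Settings:
    zen_api_base: str
    zen_api_key: str
    zen_free_model: str
    zen_fallback_model: str
    ncbi_email: str
    ncbi_api_key: str


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Read the documented variables, falling back to the documented defaults."""
    return Settings(
        zen_api_base=env.get("ZEN_API_BASE", "https://opencode.ai/zen/v1"),
        zen_api_key=env.get("ZEN_API_KEY", ""),
        zen_free_model=env.get("ZEN_FREE_MODEL", "deepseek-v4-flash-free"),
        zen_fallback_model=env.get("ZEN_FALLBACK_MODEL", "big-pickle"),
        ncbi_email=env.get("NCBI_EMAIL", "your_email@example.com"),
        ncbi_api_key=env.get("NCBI_API_KEY", ""),
    )


def resolve_api_key(ui_key: Optional[str], settings: Settings) -> str:
    """A non-blank key typed into the UI wins; otherwise use the .env value."""
    if ui_key and ui_key.strip():
        return ui_key.strip()
    if not settings.zen_api_key:
        raise ValueError("No Zen API key: set ZEN_API_KEY or enter one in the UI")
    return settings.zen_api_key


class RateLimitError(Exception):
    """Hypothetical error a chat client might raise on HTTP 429."""


def complete_with_fallback(send: Callable[[str], str], settings: Settings) -> str:
    """Call `send` with the default model; on a rate limit, retry once with the fallback."""
    try:
        return send(settings.zen_free_model)
    except RateLimitError:
        return send(settings.zen_fallback_model)
```

In this sketch a blank UI field falls through to ZEN_API_KEY, and `send` (a stand-in for whatever actually calls the chat endpoint) is retried once with ZEN_FALLBACK_MODEL.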


Architecture

Project Structure

rosetta-engine/
├── src/
│   ├── api/                    # FastAPI backend
│   │   ├── server.py           # REST API endpoints
│   │   └── zen_client.py       # Zen AI / DeepSeek LLM client
│   ├── frontend/               # Standalone HTML/JS frontend
│   │   ├── index.html          # Main page
│   │   └── static/
│   │       ├── css/style.css   # Monochrome design system
│   │       └── js/app.js       # Application logic + 3D graph
│   ├── hypothesis/
│   │   └── engine.py           # Core analysis pipeline
│   ├── ingestion/              # 8 data source clients
│   │   ├── opentargets_client.py
│   │   ├── chembl_client.py
│   │   ├── clinical_trials_client.py
│   │   ├── pubmed_client.py
│   │   ├── uniprot_client.py
│   │   ├── string_client.py
│   │   ├── openfda_client.py
│   │   ├── pubchem_client.py
│   │   └── gene_resolver.py
│   ├── ui/
│   │   ├── app.py              # Streamlit dashboard (835 lines)
│   │   └── visuals.py          # PyVis, Plotly, GraphML generators
│   └── config/
│       └── settings.py         # Environment loader
├── tests/                      # 22 unit/integration tests
├── lib/                        # Vendored JS libraries
├── Dockerfile
├── docker-compose.yml
└── .env.example

Data Flow

  1. User enters a disease (e.g., "Chagas disease")
  2. Disease Resolver → Open Targets GraphQL → resolves to EFO ID
  3. Target Discovery → Fetches top 10 associated gene targets ranked by association score
  4. Drug Mapping → For each target, queries Open Targets for known drugs + mechanisms
  5. Concurrent Enrichment (8 parallel workers):
    • ClinicalTrials.gov → active/completed trials
    • PubMed → relevant publications
    • UniProt → protein annotations, GO terms, PDB structures
    • STRING-DB → protein-protein interactions + pathway enrichment
    • OpenFDA → adverse event safety signals
    • Zen AI → LLM mechanistic hypothesis + ranking
  6. Visualization → Interactive 3D knowledge graph + ranked candidate cards
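Step 5 is a fan-out/fan-in over independent sources. The sketch below assumes a `concurrent.futures.ThreadPoolExecutor` with 8 workers; `enrich` and the fetcher callables are hypothetical stand-ins for the clients in src/ingestion/. Capturing each source's error instead of raising is a design assumption, chosen so one slow or failing API does not discard the other results.

```python
# Hypothetical sketch of the concurrent enrichment step, not the engine's actual code.
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Any, Callable, Dict, List

# Assumed shape: each source is a callable that takes the target gene symbols.
Fetcher = Callable[[List[str]], Any]


def enrich(targets: List[str], fetchers: Dict[str, Fetcher], max_workers: int = 8) -> Dict[str, Any]:
    """Run every enrichment source concurrently and collect results by source name."""
    results: Dict[str, Any] = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Map each submitted future back to the name of its source.
        futures = {pool.submit(fn, targets): name for name, fn in fetchers.items()}
        for future in as_completed(futures):
            name = futures[future]
            try:
                results[name] = future.result()
            except Exception as exc:  # record the failure and keep the other sources
                results[name] = {"error": str(exc)}
    return results
```

For example, `enrich(["GENE1"], {"trials": fetch_trials, "ppi": fetch_ppi})` returns a dict keyed by `"trials"` and `"ppi"`, where `fetch_trials` and `fetch_ppi` are whatever callables wrap the real clients. Threads suit this workload because each fetch spends most of its time waiting on network I/O.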

API Reference

POST /api/analyze

Run the full analysis pipeline.

Request:

{
  "disease": "malaria",
  "api_key": "sk-..."  // optional, overrides .env
}

Response:

{
  "disease": "Malaria",
  "disease_id": "EFO_0000429",
  "total_targets": 42,
  "candidates": [
    {
      "drug": "Artemisinin",
      "target": "TARGET1",
      "mechanism": "Endoperoxide activator",
      "target_association_score": 0.95,
      "max_phase": 4,
      "drug_type": "Small molecule",
      "sources": ["Open Targets"]
    }
  ],
  "trials": [...],
  "literature": [...],
  "ai_summary": "...",
  "proteins": {...},
  "ppi_network": [...],
  "enrichment": [...],
  "safety": {...}
}

GET /api/health

{ "status": "ok", "engine": "rosetta", "version": "1.0.0" }
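A minimal standard-library client for POST /api/analyze, assuming the FastAPI server from Option 2 is listening on localhost:8501 and returns the fields shown above. `build_analyze_request`, `analyze` and `top_candidates` are hypothetical helpers; the ordering in `top_candidates` (clinical phase first, then target association score) is an illustrative choice, not the engine's own ranking, and the 300-second timeout is an arbitrary allowance for the full pipeline.

```python
# Hypothetical client sketch for the REST API described above.
import json
import urllib.request
from typing import Any, Dict, List, Optional


def build_analyze_request(disease: str, api_key: Optional[str] = None,
                          base_url: str = "http://localhost:8501") -> urllib.request.Request:
    """Build (but do not send) a POST /api/analyze request."""
    body: Dict[str, Any] = {"disease": disease}
    if api_key:  # optional; overrides the server's .env key
        body["api_key"] = api_key
    return urllib.request.Request(
        f"{base_url}/api/analyze",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def analyze(disease: str, api_key: Optional[str] = None, timeout: float = 300.0) -> Dict[str, Any]:
    """Send the request and decode the JSON response (requires a running server)."""
    with urllib.request.urlopen(build_analyze_request(disease, api_key), timeout=timeout) as resp:
        return json.load(resp)


def top_candidates(result: Dict[str, Any], n: int = 5) -> List[Dict[str, Any]]:
    """Sort candidates by max_phase, then target_association_score, highest first."""
    return sorted(
        result.get("candidates", []),
        key=lambda c: (c.get("max_phase") or 0, c.get("target_association_score") or 0.0),
        reverse=True,
    )[:n]
```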

Data Sources

| Source | Type | Data | License |
|---|---|---|---|
| Open Targets Platform | GraphQL | Disease-gene associations, drug targets | CC0 / Apache 2.0 |
| ChEMBL | REST | Bioactivity, mechanisms, compounds | CC-BY-SA 3.0 |
| UniProt | REST | Protein annotations, GO terms, PDB | CC-BY 4.0 |
| STRING-DB | REST | Protein-protein interactions, enrichment | CC-BY 4.0 |
| ClinicalTrials.gov | REST v2 | Clinical study registry | Public Domain |
| PubMed (NCBI) | E-utilities | Biomedical literature | NIH Public Access |
| OpenFDA | REST | Adverse events, drug labels | Public Domain |
| Zen AI (DeepSeek V4) | Chat API | LLM analysis & ranking | API Access |
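Steps 2 and 3 of the data flow go through the Open Targets GraphQL API. The sketch below only builds request bodies. The endpoint URL and the `search` and `disease.associatedTargets` fields reflect our understanding of the public v4 schema and should be checked against the Open Targets API documentation; the function and query names are hypothetical and not taken from opentargets_client.py.

```python
# Hypothetical payload builders; schema details are assumptions to verify.
import json
from typing import Any, Dict

# Assumed public endpoint of the Open Targets Platform API (v4).
OPEN_TARGETS_URL = "https://api.platform.opentargets.org/api/v4/graphql"

# Step 2: resolve a free-text disease name to a disease ID (first hit only).
SEARCH_DISEASE = """
query SearchDisease($q: String!) {
  search(queryString: $q, entityNames: ["disease"], page: {index: 0, size: 1}) {
    hits { id name }
  }
}
"""

# Step 3: fetch the top associated targets for that disease ID.
TOP_TARGETS = """
query TopTargets($id: String!, $size: Int!) {
  disease(efoId: $id) {
    name
    associatedTargets(page: {index: 0, size: $size}) {
      count
      rows { score target { id approvedSymbol } }
    }
  }
}
"""


def graphql_payload(query: str, variables: Dict[str, Any]) -> bytes:
    """Encode a standard GraphQL-over-HTTP POST body."""
    return json.dumps({"query": query, "variables": variables}).encode("utf-8")


def resolve_payload(disease_name: str) -> bytes:
    return graphql_payload(SEARCH_DISEASE, {"q": disease_name})


def top_targets_payload(disease_id: str, size: int = 10) -> bytes:
    # size=10 mirrors the "top 10 associated gene targets" in the data flow.
    return graphql_payload(TOP_TARGETS, {"id": disease_id, "size": size})
```

Each payload would be sent as a POST to `OPEN_TARGETS_URL` with `Content-Type: application/json`; the first search hit's `id` then feeds `top_targets_payload`.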

Export Formats

| Format | Description | Compatible With |
|---|---|---|
| CSV | Tabular drug candidates | Excel, R, Python pandas |
| JSON | Complete pipeline output | Any JSON parser |
| GraphML | Network topology | Cytoscape, Gephi, yEd |
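GraphML is plain XML, so a disease-target-drug network can be written with the standard library alone. This sketch assumes candidate records shaped like the `candidates` entries in the /api/analyze response (`drug`, `target`, `mechanism`); `candidates_to_graphml` and the `kind`/`label` attribute keys are hypothetical and need not match what src/ui/visuals.py produces.

```python
# Hypothetical GraphML exporter sketch using only the standard library.
import xml.etree.ElementTree as ET
from typing import Dict, List

GRAPHML_NS = "http://graphml.graphdrawing.org/xmlns"


def candidates_to_graphml(disease: str, candidates: List[Dict[str, str]]) -> str:
    """Return a GraphML string with disease -> target and drug -> target edges."""
    root = ET.Element("graphml", attrib={"xmlns": GRAPHML_NS})
    # GraphML expects attribute keys to be declared before the graph element.
    ET.SubElement(root, "key", attrib={"id": "kind", "for": "node",
                                       "attr.name": "kind", "attr.type": "string"})
    ET.SubElement(root, "key", attrib={"id": "label", "for": "edge",
                                       "attr.name": "label", "attr.type": "string"})
    graph = ET.SubElement(root, "graph", attrib={"id": "G", "edgedefault": "directed"})
    seen_nodes = set()

    def add_node(node_id: str, kind: str) -> None:
        if node_id not in seen_nodes:  # each node is written once
            seen_nodes.add(node_id)
            node = ET.SubElement(graph, "node", attrib={"id": node_id})
            ET.SubElement(node, "data", attrib={"key": "kind"}).text = kind

    def add_edge(source: str, target: str, label: str) -> None:
        edge = ET.SubElement(graph, "edge", attrib={"source": source, "target": target})
        ET.SubElement(edge, "data", attrib={"key": "label"}).text = label

    disease_id = f"disease:{disease}"
    add_node(disease_id, "disease")
    linked_targets = set()
    for cand in candidates:
        # Prefix IDs with their kind so a drug and a gene with the same name cannot collide.
        target_id, drug_id = f"target:{cand['target']}", f"drug:{cand['drug']}"
        add_node(target_id, "target")
        add_node(drug_id, "drug")
        if target_id not in linked_targets:  # one disease -> target edge per target
            linked_targets.add(target_id)
            add_edge(disease_id, target_id, "associated_with")
        add_edge(drug_id, target_id, cand.get("mechanism", ""))
    return ET.tostring(root, encoding="unicode")
```

Writing the returned string to a `.graphml` file should make it importable in Cytoscape, Gephi or yEd, where the `kind` attribute can be mapped to node colors or shapes.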

Testing

# Run all tests (22 tests)
python -m pytest tests/ -v

# Run specific test file
python -m pytest tests/test_engine.py -v

Tests are fully mocked — no API keys or network access required.
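The mocking approach can be illustrated with `unittest.mock`: patch the HTTP call so a client function receives canned JSON and never touches the network. `fetch_json` and the test below are hypothetical examples of the pattern, not excerpts from tests/.

```python
# Hypothetical example of an offline, fully mocked client test.
import json
import urllib.request
from unittest import mock


def fetch_json(url: str) -> dict:
    """Tiny stand-in for an ingestion client: GET a URL and decode JSON."""
    with urllib.request.urlopen(url, timeout=30) as resp:
        return json.load(resp)


def test_fetch_json_without_network():
    fake_resp = mock.MagicMock()
    fake_resp.read.return_value = b'{"status": "ok"}'
    fake_resp.__enter__.return_value = fake_resp  # support the with-statement
    with mock.patch("urllib.request.urlopen", return_value=fake_resp) as fake_urlopen:
        assert fetch_json("https://example.org/api") == {"status": "ok"}
    fake_urlopen.assert_called_once_with("https://example.org/api", timeout=30)
```

Saved as a `test_*.py` file, the test function is collected by `python -m pytest` like any other.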


Security

See SECURITY.md for our security policy and responsible disclosure guidelines.

Key security practices:

  • No hardcoded credentials — API keys go in .env (gitignored) or UI input
  • API keys never logged or exposed in responses
  • CSP headers recommended for production deployment
  • All external APIs use HTTPS
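One way to back the "API keys never logged" practice is a logging filter that masks key-like strings before a record is emitted. This is a hypothetical sketch: `RedactKeysFilter` is an illustrative name and the `sk-` pattern is an assumption based on the request example above; real keys may look different.

```python
# Hypothetical redaction filter; the key pattern is an assumption.
import logging
import re

# Assumed key shape: "sk-" followed by 8 or more token characters.
KEY_PATTERN = re.compile(r"sk-[A-Za-z0-9_\-]{8,}")


class RedactKeysFilter(logging.Filter):
    """Mask API-key-like substrings in log messages."""

    def filter(self, record: logging.LogRecord) -> bool:
        message = record.getMessage()  # merges msg % args first
        record.msg = KEY_PATTERN.sub("sk-***REDACTED***", message)
        record.args = ()  # args are already merged into msg
        return True  # never drop the record, only rewrite it
```

Attach it to a handler (`handler.addFilter(RedactKeysFilter())`) so it applies to records from every logger routed through that handler. It only rewrites the message text; exception tracebacks attached to a record are not scanned.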

Contributing

We welcome contributions! Please see CONTRIBUTING.md.

  • Bug reports → GitHub Issues
  • Feature requests → Discussion via Issues
  • Pull requests → Fork + PR against main

License

Copyright 2026 Rosetta Engine Contributors

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.

About

AI-powered drug repurposing pipeline that federates live data across 8 biomedical databases in real time, ranks candidates via DeepSeek V4 LLM analysis, and visualizes target–drug–protein networks in an interactive 3D knowledge graph.
