Rosetta Engine is an open-source, AI-powered drug repurposing pipeline. It federates live data across 8 biomedical databases in real time to identify, score, and rank drug repurposing candidates for neglected tropical diseases and rare genetic disorders.

                                ┌─────────────────┐
                                │  Open Targets   │
                                │    (GraphQL)    │
                                └────────┬────────┘
                                         │
           ┌─────────────────────────────┼─────────────────────────────┐
           │                             │                             │
           ▼                             ▼                             ▼
    ┌──────────────┐              ┌──────────────┐              ┌──────────────┐
    │    ChEMBL    │              │   UniProt    │              │  STRING-DB   │
    │ (Bioactivity)│              │  (Proteins)  │              │    (PPI)     │
    └──────────────┘              └──────────────┘              └──────────────┘
           │                             │                             │
           └─────────────────────────────┼─────────────────────────────┘
                                         │
                                         ▼
                             ┌────────────────────────┐
                             │  ClinicalTrials.gov    │
                             │  PubMed / NCBI         │
                             │  OpenFDA               │
                             │  Zen AI (DeepSeek V4)  │
                             └────────────────────────┘
                                         │
                                         ▼
                             ┌────────────────────────┐
                             │  3D Knowledge Graph    │
                             │  + Ranked Candidates   │
                             └────────────────────────┘

- 🔬 Multi-Database Live Query — Searches Open Targets, ChEMBL, UniProt, STRING-DB, ClinicalTrials.gov, PubMed, OpenFDA simultaneously
- 🧠 AI-Powered Analysis — DeepSeek V4 Flash generates mechanistic summaries and ranks repurposing candidates
- 🌐 Interactive 3D Knowledge Graph — Visualize disease-target-drug-PPI networks in the browser
- 📊 FAIR Data Export — Download results as CSV, JSON, or GraphML (Cytoscape/Gephi compatible)
- 🖥️ Dual Interface — Streamlit dashboard or standalone HTML/JS frontend served by FastAPI
- 🔑 Dynamic API Key — Enter your Zen API key in the UI or set it via .env; no hardcoded credentials

    git clone https://github.com/your-org/rosetta-engine.git
    cd rosetta-engine
    # Create environment file
    cp .env.example .env
    # Edit .env and add your ZEN_API_KEY (get one free at https://opencode.ai)
    # Boot the engine
    docker-compose up -d
    # Open http://localhost:8501

    # Requires Python 3.10+
    python -m venv venv
    source venv/bin/activate  # Windows: venv\Scripts\activate
    pip install -r requirements.txt
    # Configure environment
    cp .env.example .env
    # Edit .env with your API keys
    # Run Streamlit UI
    streamlit run src/ui/app.py
    # OR run FastAPI server (serves HTML frontend + REST API)
    uvicorn src.api.server:app --reload --port 8501

| Variable | Required | Default | Description |
|---|---|---|---|
| ZEN_API_BASE | No | https://opencode.ai/zen/v1 | Zen AI API base URL |
| ZEN_API_KEY | Yes* | "" | Zen AI API key (get a free key at https://opencode.ai) |
| ZEN_FREE_MODEL | No | deepseek-v4-flash-free | Default LLM model |
| ZEN_FALLBACK_MODEL | No | big-pickle | Fallback model on rate limit |
| NCBI_EMAIL | No | your_email@example.com | Email for PubMed API (increases rate limits) |
| NCBI_API_KEY | No | "" | NCBI API key (optional) |

*You can also enter your API key directly in the UI — it overrides the .env value.
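
The variables above could be loaded along the following lines. This is a minimal sketch, assuming plain environment lookups with the documented defaults; the `Settings` dataclass and the `load_settings` function are hypothetical names for illustration and are not taken from `src/config/settings.py`.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class Settings:
    """Hypothetical container for the documented environment variables."""
    zen_api_base: str
    zen_api_key: str
    zen_free_model: str
    zen_fallback_model: str
    ncbi_email: str
    ncbi_api_key: str


def load_settings(env=None, ui_api_key=None):
    """Build Settings from an environment mapping, using the documented defaults.

    A non-empty ui_api_key takes precedence over ZEN_API_KEY, mirroring the
    note that a key entered in the UI overrides the .env value.
    """
    env = os.environ if env is None else env
    return Settings(
        zen_api_base=env.get("ZEN_API_BASE", "https://opencode.ai/zen/v1"),
        zen_api_key=ui_api_key or env.get("ZEN_API_KEY", ""),
        zen_free_model=env.get("ZEN_FREE_MODEL", "deepseek-v4-flash-free"),
        zen_fallback_model=env.get("ZEN_FALLBACK_MODEL", "big-pickle"),
        ncbi_email=env.get("NCBI_EMAIL", "your_email@example.com"),
        ncbi_api_key=env.get("NCBI_API_KEY", ""),
    )
```

Passing the environment as a mapping keeps the loader easy to exercise in tests without touching the real process environment.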

    rosetta-engine/
    ├── src/
    │   ├── api/                        # FastAPI backend
    │   │   ├── server.py               # REST API endpoints
    │   │   └── zen_client.py           # Zen AI / DeepSeek LLM client
    │   ├── frontend/                   # Standalone HTML/JS frontend
    │   │   ├── index.html              # Main page
    │   │   └── static/
    │   │       ├── css/style.css       # Monochrome design system
    │   │       └── js/app.js           # Application logic + 3D graph
    │   ├── hypothesis/
    │   │   └── engine.py               # Core analysis pipeline
    │   ├── ingestion/                  # 8 data source clients
    │   │   ├── opentargets_client.py
    │   │   ├── chembl_client.py
    │   │   ├── clinical_trials_client.py
    │   │   ├── pubmed_client.py
    │   │   ├── uniprot_client.py
    │   │   ├── string_client.py
    │   │   ├── openfda_client.py
    │   │   ├── pubchem_client.py
    │   │   └── gene_resolver.py
    │   ├── ui/
    │   │   ├── app.py                  # Streamlit dashboard (835 lines)
    │   │   └── visuals.py              # PyVis, Plotly, GraphML generators
    │   └── config/
    │       └── settings.py             # Environment loader
    ├── tests/                          # 22 unit/integration tests
    ├── lib/                            # Vendored JS libraries
    ├── Dockerfile
    ├── docker-compose.yml
    └── .env.example

- User enters a disease (e.g., "Chagas disease")
- Disease Resolver → Open Targets GraphQL → resolves to EFO ID
- Target Discovery → Fetches top 10 associated gene targets ranked by association score
- Drug Mapping → For each target, queries Open Targets for known drugs + mechanisms
- Concurrent Enrichment (8 parallel workers):
  - ClinicalTrials.gov → active/completed trials
  - PubMed → relevant publications
  - UniProt → protein annotations, GO terms, PDB structures
  - STRING-DB → protein-protein interactions + pathway enrichment
  - OpenFDA → adverse event safety signals
  - Zen AI → LLM mechanistic hypothesis + ranking
- Visualization → Interactive 3D knowledge graph + ranked candidate cards
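
The concurrent enrichment step can be sketched with a thread pool. This is an illustration under stated assumptions, not the code in `src/hypothesis/engine.py`: the `enrich_concurrently` function and the stand-in fetchers are hypothetical, and the only detail taken from the pipeline description is the pool of 8 workers.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def enrich_concurrently(target, fetchers, max_workers=8):
    """Run each fetcher(target) in a thread pool and collect results by source name.

    A failing source is recorded under "errors" instead of aborting the run,
    so one unavailable database does not discard the other results.
    """
    results, errors = {}, {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Map each submitted future back to the name of its data source.
        futures = {pool.submit(fn, target): name for name, fn in fetchers.items()}
        for future in as_completed(futures):
            name = futures[future]
            try:
                results[name] = future.result()
            except Exception as exc:
                errors[name] = str(exc)
    return {"results": results, "errors": errors}


# Stand-in fetchers (hypothetical); real clients would call the remote APIs.
def fake_trials(target):
    return [f"trial for {target}"]


def fake_pubmed(target):
    raise TimeoutError("simulated timeout")


if __name__ == "__main__":
    outcome = enrich_concurrently(
        "TARGET1",
        {"clinical_trials": fake_trials, "pubmed": fake_pubmed},
    )
    print(outcome)
```

Threads suit this step because the work is dominated by waiting on HTTP responses rather than by CPU time.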
Run the full analysis pipeline.
Request:

    {
      "disease": "malaria",
      "api_key": "sk-..."  // optional, overrides .env
    }

Response:

    {
      "disease": "Malaria",
      "disease_id": "EFO_0000429",
      "total_targets": 42,
      "candidates": [
        {
          "drug": "Artemisinin",
          "target": "TARGET1",
          "mechanism": "Endoperoxide activator",
          "target_association_score": 0.95,
          "max_phase": 4,
          "drug_type": "Small molecule",
          "sources": ["Open Targets"]
        }
      ],
      "trials": [...],
      "literature": [...],
      "ai_summary": "...",
      "proteins": {...},
      "ppi_network": [...],
      "enrichment": [...],
      "safety": {...}
    }

Status response:

    { "status": "ok", "engine": "rosetta", "version": "1.0.0" }

| Source | Type | Data | License |
|---|---|---|---|
| Open Targets Platform | GraphQL | Disease-gene associations, drug targets | CC0 / Apache 2.0 |
| ChEMBL | REST | Bioactivity, mechanisms, compounds | CC-BY-SA 3.0 |
| UniProt | REST | Protein annotations, GO terms, PDB | CC-BY 4.0 |
| STRING-DB | REST | Protein-protein interactions, enrichment | CC-BY 4.0 |
| ClinicalTrials.gov | REST v2 | Clinical study registry | Public Domain |
| PubMed (NCBI) | E-utilities | Biomedical literature | NIH Public Access |
| OpenFDA | REST | Adverse events, drug labels | Public Domain |
| Zen AI (DeepSeek V4) | Chat API | LLM analysis & ranking | API Access |
| Format | Description | Compatible With |
|---|---|---|
| CSV | Tabular drug candidates | Excel, R, Python pandas |
| JSON | Complete pipeline output | Any JSON parser |
| GraphML | Network topology | Cytoscape, Gephi, yEd |
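
As an illustration of the CSV and GraphML formats, the sketch below serializes a `candidates` array shaped like the one in the example response, using only the Python standard library. The column order, the `kind` node attribute, and both function names are assumptions made for this example; the generators in `src/ui/visuals.py` may produce a different layout.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Assumed column set, taken from the fields of the example response.
CSV_FIELDS = [
    "drug", "target", "mechanism",
    "target_association_score", "max_phase", "drug_type",
]


def candidates_to_csv(candidates):
    """Serialize candidate dicts to CSV text, ignoring keys outside CSV_FIELDS."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=CSV_FIELDS, extrasaction="ignore")
    writer.writeheader()
    writer.writerows(candidates)
    return buffer.getvalue()


def candidates_to_graphml(candidates):
    """Build a minimal GraphML document with one drug -> target edge per candidate."""
    root = ET.Element("graphml", xmlns="http://graphml.graphdrawing.org/xmlns")
    ET.SubElement(root, "key", {
        "id": "kind", "for": "node", "attr.name": "kind", "attr.type": "string",
    })
    graph = ET.SubElement(root, "graph", edgedefault="directed")
    seen = set()
    for cand in candidates:
        # Add each drug and target node once, tagged with its kind.
        for node_id, kind in ((cand["drug"], "drug"), (cand["target"], "target")):
            if node_id not in seen:
                seen.add(node_id)
                node = ET.SubElement(graph, "node", id=node_id)
                ET.SubElement(node, "data", key="kind").text = kind
        ET.SubElement(graph, "edge", source=cand["drug"], target=cand["target"])
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    sample = [{
        "drug": "Artemisinin", "target": "TARGET1",
        "mechanism": "Endoperoxide activator",
        "target_association_score": 0.95, "max_phase": 4,
        "drug_type": "Small molecule", "sources": ["Open Targets"],
    }]
    print(candidates_to_csv(sample))
    print(candidates_to_graphml(sample))
```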

    # Run all tests (22 tests)
    python -m pytest tests/ -v
    # Run specific test file
    python -m pytest tests/test_engine.py -v

Tests are fully mocked; no API keys or network access are required.
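
A mocked test in this style might look like the sketch below. The `top_target_symbols` helper and the `fetch_targets` client method are hypothetical and exist only to show the pattern: the network client is replaced by a `unittest.mock.Mock`, so the test needs neither credentials nor connectivity.

```python
from unittest.mock import Mock


def top_target_symbols(client, disease_id, limit=10):
    """Hypothetical helper: return the highest-scoring target symbols for a disease."""
    rows = client.fetch_targets(disease_id)
    ranked = sorted(rows, key=lambda row: row["score"], reverse=True)
    return [row["symbol"] for row in ranked[:limit]]


def test_top_target_symbols_uses_mocked_client():
    # The mock stands in for a real API client; no request is ever sent.
    client = Mock()
    client.fetch_targets.return_value = [
        {"symbol": "GENE_A", "score": 0.2},
        {"symbol": "GENE_B", "score": 0.9},
    ]

    assert top_target_symbols(client, "EFO_0000429", limit=1) == ["GENE_B"]
    client.fetch_targets.assert_called_once_with("EFO_0000429")
```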
See SECURITY.md for our security policy and responsible disclosure guidelines.
Key security practices:
- No hardcoded credentials: API keys go in .env (gitignored) or UI input
- API keys never logged or exposed in responses
- CSP headers recommended for production deployment
- All external APIs use HTTPS
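
One way to back up the "never logged" practice is a logging filter that redacts anything resembling a key before a record is emitted. This is a defensive sketch, not a description of what the repository does: the `RedactKeyFilter` class is hypothetical, and the `sk-` prefix is an assumption based on the request example.

```python
import logging
import re


class RedactKeyFilter(logging.Filter):
    """Replace anything that looks like an API key before a record is emitted."""

    # Assumption: keys look like "sk-..." as in the request example.
    KEY_PATTERN = re.compile(r"sk-[A-Za-z0-9_\-]+")

    def filter(self, record):
        message = record.getMessage()  # applies %-style args first
        record.msg = self.KEY_PATTERN.sub("[REDACTED]", message)
        record.args = ()               # message is already fully formatted
        return True                    # never drop the record, only rewrite it


logger = logging.getLogger("rosetta.example")
logger.addFilter(RedactKeyFilter())
```

A filter like this is a safety net only; the primary control is still to avoid passing keys to log calls at all.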
We welcome contributions! Please see CONTRIBUTING.md.
- Bug reports → GitHub Issues
- Feature requests → Discussion via Issues
- Pull requests → Fork + PR against main
Copyright 2026 Rosetta Engine Contributors
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.