Drug → Target → Keyword Pipeline

This repository contains a modular Python application that:

- Retrieves all approved drugs from the ChEMBL database
- Keeps those first approved in 2019 or later, sorted by approval year and then by name
- Fetches the UniProt accession numbers of each drug's protein targets
- Retrieves the UniProt keywords (functional annotations) for each target
- Writes a consolidated CSV table linking drugs → targets → keywords

This pipeline demonstrates how to integrate two public REST APIs (ChEMBL and UniProt) into an end‑to‑end data workflow, with caching, progress reporting, and clean, reusable code.

📋 Prerequisites

- Python 3.8+
- Git (optional, for version control)
- Internet access to query the ChEMBL and UniProt services

🛠 Installation

Clone this repository (or download the source files):
```
git clone <your-repo-url>
cd <your-repo-folder>
```

Create and activate a Python virtual environment:

```
python3 -m venv venv
# macOS/Linux
source venv/bin/activate
# Windows (PowerShell)
.\venv\Scripts\Activate.ps1
```

Install the required packages:
```
pip install -r requirements.txt
```

⚙️ Usage

Run the pipeline with:

```
python -m src.main
```

This will:

- Fetch all approved drugs (max_phase=4) from ChEMBL
- Keep those first approved in 2019 or later and sort them by approval year, then name
- For each drug, retrieve all UniProt target accessions
- For each accession, fetch the UniProt keywords
- Write the results to drugs_targets_keywords.csv in the project root

A progress bar and INFO‑level logs will report progress and elapsed time.
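
The filter-and-sort step is described as using pandas; the sketch below shows the same logic with the standard library only, so it runs without extra dependencies. It assumes each drug record is a dict carrying the ChEMBL molecule fields `first_approval` and `pref_name`. The function name `filter_recent` and the sample records (including the placeholder IDs) are hypothetical and not taken from this repository.

```python
from typing import Any, Dict, List


def filter_recent(drugs: List[Dict[str, Any]], cutoff_year: int = 2019) -> List[Dict[str, Any]]:
    """Keep drugs first approved in cutoff_year or later, sorted by year, then name."""
    recent = [
        d for d in drugs
        # Records without an approval year are skipped rather than guessed at.
        if d.get("first_approval") is not None and d["first_approval"] >= cutoff_year
    ]
    return sorted(recent, key=lambda d: (d["first_approval"], d.get("pref_name") or ""))


# Made-up sample records, for illustration only.
sample = [
    {"molecule_chembl_id": "CHEMBL_B", "pref_name": "DRUG B", "first_approval": 2020},
    {"molecule_chembl_id": "CHEMBL_A", "pref_name": "DRUG A", "first_approval": 2020},
    {"molecule_chembl_id": "CHEMBL_C", "pref_name": "DRUG C", "first_approval": 2015},
    {"molecule_chembl_id": "CHEMBL_D", "pref_name": "DRUG D", "first_approval": None},
]

recent_ids = [d["molecule_chembl_id"] for d in filter_recent(sample)]
```

Sorting on a `(year, name)` tuple gives the year-then-name ordering in a single pass.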

📂 Project Structure

```
.
├── README.md
├── requirements.txt
├── drugs_targets_keywords.csv   # (generated output)
└── src
    ├── main.py                  # entry point & pipeline orchestration
    ├── chembl_client
    │   ├── __init__.py
    │   └── client.py            # ChemblClient: approved drugs & targets
    └── uniprot_client
        ├── __init__.py
        └── client.py            # UniProtClient: keyword retrieval
```

🧩 Module Descriptions

`chembl_client/client.py`

ChemblClient
- get_approved_drugs(max_phase=4, fields=…)
  Returns a list of approved drugs with specified fields.
- get_target_accessions(molecule_chembl_id)
  Uses the ChEMBL /target/{tid} endpoint to retrieve all protein components for each mechanism‑of‑action entry and extracts UniProt accessions.
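
As a rough illustration of the accession-extraction step, the sketch below pulls UniProt accessions out of a single target record. It assumes the record has the shape of a ChEMBL target JSON document, with a `target_components` list whose items carry an `accession` field. The helper name `extract_accessions` is hypothetical and the sample record is made up; the actual `ChemblClient` code may differ.

```python
from typing import Any, Dict, List


def extract_accessions(target_record: Dict[str, Any]) -> List[str]:
    """Return the unique UniProt accessions of a target record, in first-seen order."""
    accessions: List[str] = []
    for component in target_record.get("target_components") or []:
        accession = component.get("accession")
        # Skip components without an accession and avoid duplicates.
        if accession and accession not in accessions:
            accessions.append(accession)
    return accessions


# Made-up record for illustration; real records contain many more fields.
example_target = {
    "target_chembl_id": "CHEMBL_T",
    "target_components": [
        {"accession": "P00533"},
        {"accession": None},
        {"accession": "P00533"},
        {"accession": "P04626"},
    ],
}

example_accessions = extract_accessions(example_target)
```

Keeping the parsing separate from the HTTP call makes it possible to test this step without network access.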

`uniprot_client/client.py`

UniProtClient
- get_keywords(accession)
  Calls the EBI Proteins REST API (/proteins/{acc}) to fetch curated keywords for the given UniProt accession.
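
The keyword lookup and the caching mentioned in the overview could be combined roughly as below. This is a sketch under assumptions: it takes the response to be a JSON object with a `keywords` list of `{"value": ...}` items, and it receives the HTTP call as a `fetch` function so the example runs offline. `parse_keywords`, `make_cached_lookup`, and `fake_fetch` are hypothetical names, not the repository's API.

```python
from functools import lru_cache
from typing import Any, Callable, Dict, List, Tuple


def parse_keywords(entry: Dict[str, Any]) -> List[str]:
    """Extract keyword strings from a protein entry (assumed response shape)."""
    return [kw["value"] for kw in entry.get("keywords") or [] if kw.get("value")]


def make_cached_lookup(fetch: Callable[[str], Dict[str, Any]]) -> Callable[[str], Tuple[str, ...]]:
    """Wrap a fetch function so each accession is requested at most once."""

    @lru_cache(maxsize=None)
    def lookup(accession: str) -> Tuple[str, ...]:
        # A tuple is returned so cached results cannot be mutated by callers.
        return tuple(parse_keywords(fetch(accession)))

    return lookup


# Stand-in for the real HTTP request; it records how often it is called.
calls: List[str] = []


def fake_fetch(accession: str) -> Dict[str, Any]:
    calls.append(accession)
    return {"accession": accession, "keywords": [{"value": "Kinase"}, {"value": "Membrane"}]}


lookup = make_cached_lookup(fake_fetch)
first = lookup("P00533")
second = lookup("P00533")  # served from the cache, no second fetch
```

Caching by accession pays off here because several drugs can share the same protein target.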

`main.py`

Orchestrates the full workflow, including:
- Logging setup (INFO & WARNING)
- Progress reporting with tqdm
- Data loading & filtering with pandas
- Writing drugs_targets_keywords.csv

💾 Output

drugs_targets_keywords.csv
A CSV table with columns:
```
ChEMBL ID,Drug Name,Approval Year,UniProt Accession,UniProt Keywords
```
Each row links one drug to one UniProt target and its list of keywords.
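
A minimal sketch of writing rows in this layout with the standard `csv` module follows. The column names come from the header above; the `"; "` separator used to pack the keyword list into one cell is an assumption, as are the helper name `write_rows` and the sample row.

```python
import csv
import io
from typing import Iterable, Sequence, TextIO, Tuple

HEADER = ["ChEMBL ID", "Drug Name", "Approval Year", "UniProt Accession", "UniProt Keywords"]

Row = Tuple[str, str, int, str, Sequence[str]]


def write_rows(handle: TextIO, rows: Iterable[Row]) -> None:
    """Write one CSV line per (drug, target) pair, joining keywords into one cell."""
    writer = csv.writer(handle, lineterminator="\n")
    writer.writerow(HEADER)
    for chembl_id, name, year, accession, keywords in rows:
        # Assumed convention: keywords are joined with "; " inside a single cell.
        writer.writerow([chembl_id, name, year, accession, "; ".join(keywords)])


# Made-up row, written to an in-memory buffer instead of a file.
buffer = io.StringIO()
write_rows(buffer, [("CHEMBL_A", "DRUG A", 2020, "P00533", ["Kinase", "Membrane"])])
csv_lines = buffer.getvalue().splitlines()
```

A drug with several targets produces several rows that repeat the drug columns, which keeps the table flat and easy to filter.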

🔍 Next Steps & Extensions

- Configuration: externalize parameters (e.g. cutoff year) into config.yaml
- Unit Tests: add pytest tests for each client and the end‑to‑end script
- Logging to File: configure logging.FileHandler for persistent logs
- Dockerization: wrap the pipeline in a Docker container for reproducibility

📖 References

ChEMBL Web Services: https://chembl.gitbook.io/chembl-interface-documentation/web-services/chembl-data-web-services
EBI Proteins API: https://www.ebi.ac.uk/proteins/api/doc/#!/proteins/get_proteins__accession_
chembl_webresource_client Python package: https://pypi.org/project/chembl_webresource_client/
