A modular, explainable AI system that identifies phonetically and orthographically similar drug name pairs at the point of care, before a dangerous substitution reaches the patient.
Live Demo: https://ai-based-detection-of-confusable-drug.onrender.com/
- Why This Project Matters
- Problem Statement
- Key Features
- System Architecture
- Project Structure
- Installation
- Usage
- API Reference
- Risk Scoring Logic
- Model Performance
- Feature Engineering
- Tech Stack
- Supported Diagnoses
- Configuration
- Troubleshooting
- Future Improvements
- Research Background
- Contributing
- Disclaimer
Medication errors are the third leading cause of preventable death in hospital settings. Among them, LASA errors, where a clinician orders, transcribes, or dispenses the wrong drug simply because its name looks or sounds like another, are uniquely dangerous because they are invisible to standard dose-range checking systems.
A pharmacist can catch these manually. A busy ICU nurse at 3 a.m. often cannot.
This project demonstrates that a lightweight, interpretable machine learning system built on established string-similarity methods can replicate pharmacist-level LASA detection with an AUC of approximately 0.97 on the held-out test set, and do it in real time, at every clinical interaction, while also cross-checking whether the prescribed drug even makes sense for the patient's diagnosis.
The goal is not to replace clinical judgment. The goal is to provide one more check (automated, instant, and explainable) between a confusable drug name and a patient.
Look-Alike Sound-Alike (LASA) drug name confusion occurs when two drug names are sufficiently similar in spelling or pronunciation that one is substituted for the other during prescribing, transcription, dispensing, or administration.
The Institute for Safe Medication Practices (ISMP) maintains an official list of over 400 high-alert LASA pairs. These are not edge cases: they represent recurring, documented harm events from hospitals worldwide.
Representative dangerous pairs:
| Drug A | Drug B | Drug Classes | Clinical Risk |
|---|---|---|---|
| dopamine | dobutamine | Both vasopressors | Different hemodynamic effects; overdose risk |
| hydroxyzine | hydralazine | Antihistamine vs. antihypertensive | Wrong drug class entirely |
| vincristine | vinblastine | Both vinca alkaloids | A 10x dose difference between them is fatal |
| metformin | methergine | Antidiabetic vs. uterotonic | Life-threatening if given to a non-obstetric patient |
| lorazepam | alprazolam | Both benzodiazepines | Dose and indication differ significantly |
| cisplatin | carboplatin | Both platinum agents | Toxicity profiles and dosing are not interchangeable |
- LASA errors account for approximately 25% of all reported medication errors in acute care settings (ISMP, 2023)
- Many electronic health record systems lack phonetic similarity detection
- Existing drug-checking software typically flags only exact name matches or known pairs, not novel confusable names
| Feature | Description |
|---|---|
| PDF-Based Dataset Construction | Parses ISMP drug confusion alert PDFs via pdfplumber to extract labeled positive LASA pairs |
| 9-Dimensional Feature Engineering | Per-pair similarity computed across Levenshtein, WRatio, Token Sort Ratio, Soundex, Metaphone, bi/trigram Jaccard, prefix match, and length ratio |
| Ensemble ML Classification | Both Random Forest and Gradient Boosting trained per run; best AUC model is automatically selected and serialized |
| Clinical NLP Drug Extraction | Regex + curated dictionary NER extracts drug name, dosage, and route from free-form clinical text |
| Speech-to-Text Transcription | OpenAI Whisper integration for voice-based input; graceful mock fallback when Whisper is unavailable |
| Patient Context Validation | Drug class is cross-referenced against a diagnosis taxonomy; a prescribed antihypertensive for an anxiety patient triggers an independent mismatch flag |
| Multi-Signal Decision Engine | Aggregates base ML score, ISMP-known-pair boost, context mismatch penalty, and STT confidence into a single, stratified risk level |
| Explainable Output | Every response includes a structured reasons array, not just a score |
| FastAPI Web Application | Three REST endpoints with automatic OpenAPI docs; Jinja2-rendered dark-themed clinical UI |
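The regex-plus-dictionary extraction listed in the table above can be sketched roughly as follows. This is a minimal illustration, not the code in modules/05_nlp_drug_extractor.py: the four-entry DRUG_DICTIONARY, the unit and route keyword lists, and the function name extract_order are all assumptions made for the example.

```python
import re

# Hypothetical mini-dictionary; the project uses a much larger curated list.
DRUG_DICTIONARY = {"metformin", "hydroxyzine", "hydralazine", "vincristine"}

# Dose: a number followed by a unit, e.g. "500mg" or "2.5 mcg" (assumed units).
DOSE_RE = re.compile(r"(\d+(?:\.\d+)?)\s*(mcg|mg|ml|g|units?)\b", re.IGNORECASE)
# Route: a small, assumed set of route keywords.
ROUTE_RE = re.compile(r"\b(iv|im|po|sc|drip|oral)\b", re.IGNORECASE)


def extract_order(text: str) -> dict:
    """Return the first known drug, dose, and route found in free text."""
    tokens = re.findall(r"[a-z]+", text.lower())
    drug = next((t for t in tokens if t in DRUG_DICTIONARY), None)
    dose = DOSE_RE.search(text)
    route = ROUTE_RE.search(text)
    return {
        "drug": drug,
        "dose": f"{dose.group(1)}{dose.group(2).lower()}" if dose else None,
        "route": route.group(1).lower() if route else None,
    }


print(extract_order("Push 500mg of metformin immediately."))
# {'drug': 'metformin', 'dose': '500mg', 'route': None}
```

A dictionary lookup keeps the extractor predictable, at the cost of missing misspelled or unlisted drug names.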
+---------------------------+
| CLINICAL INPUT |
| (Text sentence or audio) |
+---------------------------+
|
v
+---------------------------+ OpenAI Whisper
| Module 06 | <--- (or mock fallback)
| Speech-to-Text |
| Returns: text, language |
+---------------------------+
|
v
+---------------------------+ Regex + drug dictionary
| Module 05 | <---
| NLP Drug Extractor |
| Returns: drug, dose, |
| route |
+---------------------------+
|
v
+---------------------------+ compute_features_pair()
| Module 04 | <--- scores query vs. all
| LASA Inference Engine | drugs in drug_list.txt
| Returns: ranked hits |
| + ISMP flags |
+---------------------------+
|
v
+---------------------------+ DRUG_CLASS_MAP
| Module 07 | <--- DIAGNOSIS_CLASS_MAP
| Patient Context |
| Validator |
| Returns: mismatch, |
| drug_class |
+---------------------------+
|
v
+---------------------------+ Score aggregation
| Module 08 | <--- + risk stratification
| Decision Engine | + reason generation
| Returns: risk_level, |
| message, |
| reasons |
+---------------------------+
|
v
+---------------------------+
| Module 09 (app.py) |
| FastAPI Web Application |
| GET / |
| POST /analyze |
| POST /voice |
+---------------------------+
data/raw/*.pdf
|
v
Module 01: Data Preprocessing
- Extract positive pairs from ISMP PDFs
- Generate negative samples (random non-confusable pairs)
- Output: drug_pairs.csv, training_dataset.csv
|
v
Module 02: Feature Engineering
- Compute 9 similarity features per pair
- Output: feature_matrix.csv
|
v
Module 03: Model Training
- Train RandomForestClassifier (n=200, balanced weights)
- Train GradientBoostingClassifier (n=200, lr=0.05)
- Evaluate both on stratified 80/20 split
- Save best AUC model as lasa_classifier.pkl
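The negative-sample step in Module 01 can be sketched as below. This is an illustrative outline only: the function name sample_negative_pairs, the fixed seed, and the assumption that any pair absent from the ISMP positives counts as non-confusable are not taken from the project code.

```python
import random
from itertools import combinations


def sample_negative_pairs(drugs, positive_pairs, n, seed=42):
    """Return n label-0 rows: random drug pairs absent from the positive set."""
    known = {frozenset(pair) for pair in positive_pairs}
    # Enumerate every candidate pair once, then drop the known LASA pairs.
    candidates = [
        (a, b)
        for a, b in combinations(sorted(set(drugs)), 2)
        if frozenset((a, b)) not in known
    ]
    if n > len(candidates):
        raise ValueError("not enough non-confusable pairs to sample from")
    rng = random.Random(seed)  # fixed seed keeps the dataset reproducible
    return [(a, b, 0) for a, b in rng.sample(candidates, n)]


drugs = ["dopamine", "dobutamine", "metformin", "cisplatin", "lorazepam"]
positives = [("dopamine", "dobutamine")]
negatives = sample_negative_pairs(drugs, positives, n=3)
print(len(negatives))
```

Sampling as many negatives as there are positives would give the balanced dataset described for training_dataset.csv.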
lasa_detection/
|
+-- data/
| +-- raw/ # Place ISMP source PDFs here
| +-- processed/
| | +-- drug_pairs.csv # Labeled LASA pairs (drug1, drug2, label)
| | +-- training_dataset.csv # Balanced dataset with positive + negative samples
| | +-- feature_matrix.csv # 9-dimensional feature vectors for every pair
| +-- drug_list.txt # Master drug name list (~300+ entries, one per line)
|
+-- modules/
| +-- 01_data_preprocessing.py # PDF parsing + negative sample generation
| +-- 02_feature_engineering.py # Per-pair similarity feature computation
| +-- 03_model_training.py # Ensemble training, AUC comparison, best model save
| +-- 04_lasa_engine.py # Real-time inference: query drug vs. full drug list
| +-- 05_nlp_drug_extractor.py # Clinical text -> drug name, dose, route
| +-- 06_speech_to_text.py # Whisper wrapper with mock fallback
| +-- 07_patient_context.py # Drug class vs. diagnosis mismatch detection
| +-- 08_decision_engine.py # Multi-signal score aggregation + reason generation
| +-- modules_utils.py # Shared compute_features_pair() (training + inference)
|
+-- models/
| +-- lasa_classifier.pkl # Serialized model artifact (joblib dict)
| +-- confusion_matrix.png # Auto-generated after each training run
|
+-- app/
| +-- app.py # FastAPI application (3 endpoints)
| +-- static/ # CSS, JS assets
| +-- templates/
| +-- index.html # Jinja2 dark-themed clinical UI
|
+-- notebooks/ # EDA and prototyping notebooks
+-- bootstrap.py # Creates full directory structure from scratch
+-- run_all.py # Sequential pipeline runner (preprocess -> train)
+-- run_pipeline.bat # One-click Windows launcher
+-- test_pdf.py # Diagnostic: verify pdfplumber PDF parsing
+-- requirements.txt # All Python dependencies
- Python 3.10 or higher
- pip
- ~500 MB free disk space (more if downloading the Whisper base model)
run_pipeline.bat
This script sequentially creates a virtual environment, installs all dependencies, runs the full training pipeline, and starts the web server at http://localhost:8000.
# Step 1: Create and activate a virtual environment
python -m venv venv
# Windows
venv\Scripts\activate
# macOS / Linux
source venv/bin/activate
# Step 2: Install all dependencies
pip install -r requirements.txt
# Step 3: (Optional) Place ISMP PDFs in data/raw/, then run the training pipeline
python run_all.py
# Step 4: Start the web server
cd app
python -m uvicorn app:app --host 0.0.0.0 --port 8000 --reload
Open http://localhost:8000 in your browser.
Note: Although Step 3 is marked optional, the training pipeline must be run to generate models/lasa_classifier.pkl. Without it, the /analyze and /voice endpoints will fail with a RuntimeError. The NLP and context validation modules operate independently and do not require the trained model.
Navigate to http://localhost:8000/. The UI provides two input modes:
Text Mode
| Field | Example |
|---|---|
| Clinical Text | Administer 25mg hydroxyzine IV for patient anxiety |
| Patient Diagnosis | anxiety |
Voice Mode
Upload a .wav or .mp3 audio file. The system transcribes it via Whisper, extracts the drug name, and returns a full risk analysis alongside the raw transcript.
Text: "Push 500mg of metformin immediately."
Diagnosis: diabetes
Response:
{
"status": "ok",
"drug": "metformin",
"extracted": {
"drug": "metformin",
"dose": "500mg",
"route": null
},
"decision": {
"risk_level": "LOW",
"top_match": "methergine",
"lasa_prob": 0.9912,
"mismatch": false,
"message": "Safe to administer. No significant LASA risk or context mismatch detected.",
"reasons": [
"Base model probability calculated at 99%."
]
}
}
Explanation: The model identified methergine as the closest confusable name at 99% similarity. However, metformin (an antidiabetic) is validated against the diabetes diagnosis with no mismatch. The decision engine correctly returns LOW risk: a 99% similarity score alone is not sufficient to generate an actionable alert when clinical context confirms the drug is appropriate. This is intentional: the system is designed to suppress non-actionable alerts and prevent alert fatigue.
Text: "Start a drip of hydralazine."
Diagnosis: anxiety
Response:
{
"status": "ok",
"drug": "hydralazine",
"extracted": {
"drug": "hydralazine",
"dose": null,
"route": "drip"
},
"decision": {
"risk_level": "HIGH",
"top_match": "hydroxyzine",
"lasa_prob": 0.9741,
"mismatch": true,
"message": "Caution: Potential LASA confusion with 'hydroxyzine' (Similarity score: 99%).",
"reasons": [
"High phonetic similarity matches found.",
"96% high string similarity index.",
"Base model probability calculated at 97%.",
"Context Mismatch: 'hydralazine' (antihypertensive) may not be indicated for 'anxiety'."
]
}
}
Explanation: Two independent signals both fire. First, the LASA engine scores hydroxyzine at 97% similarity, and the pair is a known ISMP-documented confusion pair. Second, the context validator identifies that hydralazine (an antihypertensive) is pharmacologically inconsistent with an anxiety diagnosis. Both signals are surfaced individually in the reasons array, giving the reviewing clinician full transparency into why the alert was generated.
curl -X POST "http://localhost:8000/analyze" \
-F "text=Prepare vincristine 2mg IV for oncology patient" \
-F "diagnosis=cancer"
curl -X POST "http://localhost:8000/voice" \
-F "file=@prescription_note.wav" \
-F "diagnosis=cardiac_arrest"

| Method | Endpoint | Description |
|---|---|---|
| GET | / | Serves the Jinja2-rendered HTML frontend |
| POST | /analyze | Text analysis: extracts drug name and returns LASA risk assessment |
| POST | /voice | Audio upload: transcribes, extracts, and returns LASA risk assessment |
Interactive API documentation is auto-generated at http://localhost:8000/docs (Swagger UI) and http://localhost:8000/redoc.
POST /analyze request (multipart/form-data)
| Field | Type | Required | Description |
|---|---|---|---|
| text | string | Yes | Clinical sentence or drug name |
| diagnosis | string | No | Patient diagnosis for context validation |
Response Schema
{
"status": "ok",
"drug": "<extracted drug name>",
"extracted": {
"drug": "<string>",
"dose": "<string or null>",
"route": "<string or null>"
},
"decision": {
"risk_level": "LOW | MEDIUM | HIGH",
"lasa_prob": "<float, 0-1>",
"top_match": "<most similar drug name>",
"mismatch": "<boolean>",
"message": "<human-readable warning string>",
"reasons": ["<list of explanation strings>"],
"details": {
"top_lasa_hits": ["<top 5 hit objects>"],
"context_result": {
"mismatch": "<boolean>",
"drug_class": "<string or null>",
"expected_classes": ["<list of strings>"],
"note": "<string>"
},
"stt_confidence": "<float or null>"
}
},
"lasa_hits": [
{
"candidate": "<drug name>",
"lasa_prob": "<float>",
"risk_level": "LOW | MEDIUM | HIGH",
"known_in_ismp": "<boolean>",
"features": {
"levenshtein_norm": "<float>",
"jaro_winkler": "<float>",
"token_sort_ratio": "<float>",
"ngram_bigram": "<float>",
"ngram_trigram": "<float>",
"soundex_match": "<0 or 1>",
"metaphone_match": "<0 or 1>",
"prefix5_match": "<0 or 1>",
"length_ratio": "<float>"
}
}
]
}
POST /voice request (multipart/form-data)
| Field | Type | Required | Description |
|---|---|---|---|
| file | UploadFile | Yes | Audio file (.wav, .mp3, or any format supported by Whisper) |
| diagnosis | string | No | Patient diagnosis for context validation |
Response Schema
Identical to /analyze, with one additional top-level field:
{
"status": "ok",
"transcript": "<Whisper transcription text>",
"drug": "...",
"extracted": { "..." : "..." },
"decision": { "..." : "..." },
"lasa_hits": ["..."]
}
Note on Whisper availability: If openai-whisper is not installed, the module falls back to a mock implementation that returns the audio filename as the transcript text. The rest of the pipeline (NLP extraction, LASA scoring, context validation) continues to function normally.
The decision engine in modules/08_decision_engine.py does not rely on a single threshold. It aggregates four independent signals:
adjusted_score = base_lasa_prob
+ 0.15 if drug is a documented ISMP historical pair
+ 0.10 if patient diagnosis mismatch is flagged
+ 0.05 if STT transcription confidence < 0.60
(capped at 1.0)
Risk level assignment:
| Risk Level | Conditions |
|---|---|
| HIGH | Diagnosis mismatch detected AND adjusted score > 0.75 |
| MEDIUM | Mismatch with score in range (0.45, 0.75], OR known ISMP pair with score > 0.80, OR low STT confidence with high score |
| LOW | No mismatch detected, or all scores below thresholds |
Design rationale: The asymmetry between the HIGH and LOW branches is deliberate. A high LASA similarity score without a clinical context mismatch does not produce a HIGH alert, because a drug that is phonetically similar to another but appropriate for the patient's condition is not a dangerous substitution in context. The system rewards validated clinical intent and penalizes only combinations of similarity and contextual inappropriateness. This mirrors how a pharmacist would actually reason through a LASA pair.
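The aggregation and stratification rules above can be written out as a short sketch. It follows the boosts and thresholds stated in this section, but it is not the code in modules/08_decision_engine.py: the function names are hypothetical, and because the README does not define "high score" for the low-STT-confidence branch, the sketch assumes the same 0.80 cut-off used for ISMP pairs.

```python
ISMP_BOOST = 0.15
MISMATCH_BOOST = 0.10
LOW_STT_BOOST = 0.05
STT_CONFIDENCE_FLOOR = 0.60
HIGH_SCORE = 0.80  # assumption: reuses the ISMP threshold for the STT branch


def adjusted_score(base_prob, known_ismp, mismatch, stt_confidence=None):
    """Add the three boosts to the base model probability, capped at 1.0."""
    score = base_prob
    if known_ismp:
        score += ISMP_BOOST
    if mismatch:
        score += MISMATCH_BOOST
    if stt_confidence is not None and stt_confidence < STT_CONFIDENCE_FLOOR:
        score += LOW_STT_BOOST
    return min(score, 1.0)


def risk_level(base_prob, known_ismp, mismatch, stt_confidence=None):
    """Map the adjusted score and the context flags to LOW, MEDIUM, or HIGH."""
    score = adjusted_score(base_prob, known_ismp, mismatch, stt_confidence)
    low_stt = stt_confidence is not None and stt_confidence < STT_CONFIDENCE_FLOOR
    if mismatch and score > 0.75:
        return "HIGH"
    if mismatch and score > 0.45:  # score in (0.45, 0.75]
        return "MEDIUM"
    if known_ismp and score > 0.80:
        return "MEDIUM"
    if low_stt and score > HIGH_SCORE:
        return "MEDIUM"
    return "LOW"


# hydralazine ordered for an anxiety patient: ISMP pair plus context mismatch
print(risk_level(0.9741, known_ismp=True, mismatch=True))    # HIGH
# metformin ordered for a diabetes patient: no ISMP flag assumed, no mismatch
print(risk_level(0.9912, known_ismp=False, mismatch=False))  # LOW
```

Note that HIGH is reachable only through the mismatch branch, which is the asymmetry the design rationale describes.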
Both RandomForestClassifier and GradientBoostingClassifier are trained on every pipeline run. The one with the higher AUC-ROC on the stratified 20% held-out test set is serialized to models/lasa_classifier.pkl.
Held-out test set performance (best model):
| Metric | Value |
|---|---|
| AUC-ROC | ~0.97 |
| Accuracy | ~92% |
| Precision (LASA = 1) | ~91% |
| Recall (LASA = 1) | ~93% |
| F1-Score | ~92% |
Baseline comparison:
A simple Jaro-Winkler threshold rule (flag if score > 0.85) was used as a fuzzy-matching baseline. The ML ensemble outperforms it substantially by incorporating phonetic encoding and structural features that a single continuous distance metric cannot capture, particularly for pairs that are phonetically similar but orthographically distinct (e.g., morphine vs. hydromorphone).
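For reference, a threshold baseline of this kind can be sketched in pure Python. This is a textbook Jaro-Winkler implementation written for illustration; the project itself may compute the score with a library, and the names jaro, jaro_winkler, and baseline_flag are assumptions.

```python
def jaro(a: str, b: str) -> float:
    """Jaro similarity: matched characters within a sliding window."""
    if a == b:
        return 1.0
    if not a or not b:
        return 0.0
    window = max(max(len(a), len(b)) // 2 - 1, 0)
    a_hit = [False] * len(a)
    b_hit = [False] * len(b)
    matches = 0
    for i, ch in enumerate(a):
        for j in range(max(0, i - window), min(len(b), i + window + 1)):
            if not b_hit[j] and b[j] == ch:
                a_hit[i] = b_hit[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    a_seq = [c for c, hit in zip(a, a_hit) if hit]
    b_seq = [c for c, hit in zip(b, b_hit) if hit]
    transpositions = sum(x != y for x, y in zip(a_seq, b_seq)) // 2
    return (matches / len(a) + matches / len(b)
            + (matches - transpositions) / matches) / 3


def jaro_winkler(a: str, b: str, prefix_scale: float = 0.1) -> float:
    """Boost the Jaro score for a shared prefix of up to four characters."""
    score = jaro(a, b)
    prefix = 0
    for x, y in zip(a[:4], b[:4]):
        if x != y:
            break
        prefix += 1
    return score + prefix * prefix_scale * (1 - score)


def baseline_flag(a: str, b: str, threshold: float = 0.85) -> bool:
    """Flag a pair as confusable when the similarity exceeds the threshold."""
    return jaro_winkler(a.lower(), b.lower()) > threshold


print(baseline_flag("hydroxyzine", "hydralazine"))  # True (score is about 0.89)
print(baseline_flag("metformin", "cisplatin"))      # False
```

A single cut-off like this has no access to phonetic codes or n-gram overlap, which is the gap the nine-feature classifier is meant to close.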
Feature importances (approximate, by Gini impurity):
| Rank | Feature | Contribution |
|---|---|---|
| 1 | levenshtein_norm | Highest: captures raw character-level edit proximity |
| 2 | jaro_winkler | Strong: rewards prefix matches common in drug names |
| 3 | metaphone_match | Strong: phonetic equivalence is a direct LASA risk signal |
| 4 | ngram_bigram | Moderate: captures shared substrings |
| 5 | soundex_match | Moderate: broader phonetic bucketing |
A confusion matrix is automatically saved to models/confusion_matrix.png after each training run.
Nine similarity features are computed for every drug pair (A, B) during training (in modules/02_feature_engineering.py) and at inference time (in modules/modules_utils.py). All features are computed in lowercase.
| Feature | Formula / Method | Type | Range |
|---|---|---|---|
| levenshtein_norm | 1 - edit_distance(A, B) / max(len(A), len(B)) | Continuous | 0 to 1 |
| jaro_winkler | rapidfuzz.fuzz.WRatio(A, B) / 100 | Continuous | 0 to 1 |
| token_sort_ratio | rapidfuzz.fuzz.token_sort_ratio(A, B) / 100 | Continuous | 0 to 1 |
| ngram_bigram | Jaccard index of character bigram sets of A and B | Continuous | 0 to 1 |
| ngram_trigram | Jaccard index of character trigram sets of A and B | Continuous | 0 to 1 |
| soundex_match | int(jellyfish.soundex(A) == jellyfish.soundex(B)) | Binary | 0 or 1 |
| metaphone_match | int(jellyfish.metaphone(A) == jellyfish.metaphone(B)) | Binary | 0 or 1 |
| prefix5_match | int(A[:5] == B[:5]) | Binary | 0 or 1 |
| length_ratio | min(len(A), len(B)) / max(len(A), len(B)) | Continuous | 0 to 1 |
Why this feature set?
Drug name confusability operates on at least three distinct dimensions simultaneously:
- Orthographic similarity: how similarly the names are spelled (levenshtein_norm, ngram_bigram, ngram_trigram, prefix5_match, length_ratio)
- Phonetic similarity: how similarly the names are pronounced (soundex_match, metaphone_match)
- Fuzzy string alignment: how a human (or OCR system) might partially match or transpose them (jaro_winkler, token_sort_ratio)
No single feature from any one dimension is sufficient. A pair like clonidine / clonazepam shares the four-letter prefix clon but differs in phonetic encoding (and prefix5_match is still 0, because the fifth characters differ). A pair like morphine / hydromorphone scores low on prefix match but higher on n-gram overlap. The ensemble model learns the combined decision boundary across all nine dimensions.
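To make the formulas in the table concrete, here is a standard-library sketch of five of the nine features. It is not the project's compute_features_pair(): the rapidfuzz-based and jellyfish-based features are left out so that the example runs without third-party packages, and the function names are hypothetical.

```python
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def ngram_jaccard(a: str, b: str, n: int) -> float:
    """Jaccard index of the character n-gram sets of a and b."""
    grams_a = {a[i:i + n] for i in range(len(a) - n + 1)}
    grams_b = {b[i:i + n] for i in range(len(b) - n + 1)}
    union = grams_a | grams_b
    return len(grams_a & grams_b) / len(union) if union else 0.0


def stdlib_features(a: str, b: str) -> dict:
    """Compute the orthographic subset of the feature vector, in lowercase."""
    a, b = a.lower(), b.lower()
    longest = max(len(a), len(b)) or 1  # avoid dividing by zero on empty input
    return {
        "levenshtein_norm": 1 - levenshtein(a, b) / longest,
        "ngram_bigram": ngram_jaccard(a, b, 2),
        "ngram_trigram": ngram_jaccard(a, b, 3),
        "prefix5_match": int(a[:5] == b[:5]),
        "length_ratio": min(len(a), len(b)) / longest,
    }


features = stdlib_features("hydroxyzine", "hydralazine")
print(round(features["levenshtein_norm"], 3))  # 0.727 (3 edits over 11 characters)
print(features["prefix5_match"])               # 0 ("hydro" vs "hydra")
```

Sharing one function between training and inference, as the project does with modules_utils.py, avoids the two code paths drifting apart.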
| Library | Version | Role |
|---|---|---|
| scikit-learn | 1.x | RandomForestClassifier, GradientBoostingClassifier, train/test split, evaluation metrics |
| pandas | 2.x | Feature matrix construction, CSV I/O |
| numpy | 1.x | Numerical computation |
| joblib | 1.x | Model serialization and deserialization |
| tqdm | 4.x | Progress reporting during pipeline processing |
| Library | Version | Role |
|---|---|---|
| rapidfuzz | 3.x | Levenshtein distance, WRatio, Token Sort Ratio |
| jellyfish | 0.x | Soundex and Metaphone phonetic encoding |
| pdfplumber | 0.x | PDF text and table extraction for ISMP source documents |
| openai-whisper | 1.x | Speech-to-text transcription (optional) |
| soundfile | 0.x | Audio file I/O dependency for Whisper |
| Library | Version | Role |
|---|---|---|
| fastapi | 0.135 | REST API framework with automatic OpenAPI documentation |
| uvicorn | 0.x | ASGI server |
| jinja2 | 3.x | Server-side HTML template rendering |
| python-multipart | 0.x | Form and file upload parsing |
| Library | Version | Role |
|---|---|---|
| matplotlib | 3.x | Confusion matrix plot generation |
| seaborn | 0.x | Heatmaps and correlation analysis (EDA notebooks) |
The Patient Context Validator (modules/07_patient_context.py) maps diagnoses to expected pharmacological drug classes. A mismatch flag means the prescribed drug's class is not in the expected class list for that diagnosis.
| Diagnosis | Expected Drug Classes |
|---|---|
| cardiac_arrest | vasopressor, antiarrhythmic |
| hypertension | beta_blocker, calcium_channel_blocker, antihypertensive |
| atrial_fibrillation | antiarrhythmic, anticoagulant, beta_blocker |
| diabetes | antidiabetic |
| pain | opioid, nsaid |
| infection | antibiotic |
| anxiety | benzodiazepine, antihistamine |
| cancer | vinca_alkaloid, platinum_agent, corticosteroid |
| septic_shock | vasopressor, antibiotic |
| seizure | anticonvulsant, benzodiazepine |
| inflammation | corticosteroid, nsaid |
| heart_failure | vasopressor, inotropic, beta_blocker, diuretic, antihypertensive |
To extend the taxonomy, add entries to DRUG_CLASS_MAP and DIAGNOSIS_CLASS_MAP in modules/07_patient_context.py. No model retraining is required, because the context validator is a rule-based module independent of the ML pipeline.
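A rule-based check of this kind can be sketched as follows. The two map names come from this README, but their contents here are a small excerpt, the function name check_context is hypothetical, and the choice to report no mismatch when a drug or diagnosis is unmapped is an assumption based on the troubleshooting note about drugs absent from DRUG_CLASS_MAP.

```python
# Excerpt only: the real maps in modules/07_patient_context.py are larger.
DRUG_CLASS_MAP = {
    "hydralazine": "antihypertensive",
    "hydroxyzine": "antihistamine",
    "metformin": "antidiabetic",
}
DIAGNOSIS_CLASS_MAP = {
    "anxiety": ["benzodiazepine", "antihistamine"],
    "diabetes": ["antidiabetic"],
    "hypertension": ["beta_blocker", "calcium_channel_blocker", "antihypertensive"],
}


def check_context(drug: str, diagnosis: str) -> dict:
    """Flag a mismatch when the drug's class is not expected for the diagnosis."""
    drug_class = DRUG_CLASS_MAP.get(drug.lower())
    expected = DIAGNOSIS_CLASS_MAP.get(diagnosis.lower(), [])
    if drug_class is None or not expected:
        # Assumed behavior: without a mapping there is nothing to compare.
        mismatch, note = False, "No mapping available; context not validated."
    elif drug_class in expected:
        mismatch, note = False, "Drug class is expected for this diagnosis."
    else:
        mismatch = True
        note = f"'{drug}' ({drug_class}) may not be indicated for '{diagnosis}'."
    return {
        "mismatch": mismatch,
        "drug_class": drug_class,
        "expected_classes": expected,
        "note": note,
    }


print(check_context("hydralazine", "anxiety")["mismatch"])  # True
print(check_context("hydroxyzine", "anxiety")["mismatch"])  # False
```

Because the check is a pair of dictionary lookups, adding a new drug or diagnosis is a data change rather than a model change.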
No configuration file is required. All paths resolve relative to the project root at runtime. To override the defaults, edit the following constants:
modules/04_lasa_engine.py and app/app.py:
DRUG_LIST_PATH = "data/drug_list.txt" # One drug name per line
MODEL_PATH = "models/lasa_classifier.pkl" # Serialized joblib artifact
DRUG_PAIRS = "data/processed/drug_pairs.csv" # ISMP pairs for ISMP-boost scoring
Model artifact structure:
The .pkl file is a Python dict with the following keys:
{
"model": <fitted sklearn estimator>,
"feature_cols": ["levenshtein_norm", "jaro_winkler", ...], # 9 features in order
"model_name": "RandomForest" | "GradientBoosting",
"auc": <float>
}
This structure allows you to inspect or swap the underlying estimator without modifying downstream inference code.
| Symptom | Likely Cause | Resolution |
|---|---|---|
| RuntimeError: Model not trained yet | models/lasa_classifier.pkl does not exist | Run python run_all.py to execute the full training pipeline |
| ModuleNotFoundError: No module named 'whisper' | Whisper not installed | Run pip install openai-whisper; the system uses a mock STT fallback automatically |
| HTTP 500 on /analyze | Empty drug extraction result | Ensure input contains a recognizable drug name; inspect modules/05_nlp_drug_extractor.py dictionary |
| pdfplumber extracts zero pairs | ISMP PDF format changed | Run python test_pdf.py to verify table structure; update selectors in 01_data_preprocessing.py |
| Port already in use | Process already bound to port 8000 | Add --port 8001 to the uvicorn command, or kill the conflicting process |
| No mismatch flagged for an incorrect drug | Drug absent from DRUG_CLASS_MAP | Add the mapping (drug_name: drug_class) in modules/07_patient_context.py |
| Low recall on custom drug list | Training set does not cover new drugs | Add pairs to drug_pairs.csv, recompute features, retrain with python modules/03_model_training.py |
The following extensions would meaningfully increase the system's clinical utility and research validity:
| Improvement | Description | Priority |
|---|---|---|
| Transformer-based drug NER | Replace regex extraction with a fine-tuned BioBERT or ClinicalBERT NER model trained on i2b2 or n2c2 datasets | High |
| SNOMED-CT / RxNorm integration | Replace the custom drug taxonomy with standardized medical ontologies for diagnosis-class mapping | High |
| Expanded drug list | Incorporate the full FDA-approved drug database (~20,000+ entries) for comprehensive coverage | High |
| Confidence calibration | Apply Platt scaling or isotonic regression to convert raw model probabilities to calibrated confidence estimates | Medium |
| Real-time microphone input | Replace file-upload voice mode with WebSocket-based streaming audio for live LASA checking during verbal orders | Medium |
| EHR integration layer | Add FHIR R4 API compatibility to enable direct integration with hospital EHR systems | Medium |
| Audit logging | Persist every query, extracted drug, and risk decision to a structured database for retrospective analysis | Medium |
| Active learning loop | Collect pharmacist overrides of alerts and use them as labeled feedback to retrain the model | Low |
| Multi-language support | Extend STT and NLP to non-English clinical contexts using multilingual Whisper and translated drug taxonomies | Low |
This system implements and validates concepts from the published medication safety literature:
- ISMP (2023). ISMP List of Confused Drug Names. Institute for Safe Medication Practices. Primary source for positive LASA pair labels.
- Cohen MR (1999). Medication Errors. American Pharmaceutical Association. Foundational taxonomy of LASA error types and severity classification.
- Bates DW et al. (1995). Incidence of adverse drug events and potential adverse drug events. JAMA, 274(1), 29-34. Statistical grounding for medication error frequency estimates.
- Winkler WE (1990). String comparator metrics and enhanced decision rules in the Fellegi-Sunter model of record linkage. Theoretical basis for the Jaro-Winkler metric used in feature engineering.
- Feature engineering methodology informed by clinical NLP research on drug name recognition in electronic health records and discharge summaries.
Contributions are welcome. To propose a change:
- Fork the repository
- Create a feature branch: git checkout -b feature/your-feature-name
- Commit your changes with a descriptive message
- Push to your fork and open a Pull Request against main
Common contribution areas:
- Additional drug-class mappings in modules/07_patient_context.py
- New drugs in data/drug_list.txt (one per line, lowercase)
- New LASA pairs in data/processed/drug_pairs.csv (columns: drug1, drug2, label); rerun python modules/03_model_training.py after adding pairs
- Bug reports with a minimal reproducible example
Please keep all contributions focused on correctness and clinical plausibility. This is a medical safety project, so speculative or unvalidated additions to the drug taxonomy should be flagged clearly in the PR description.
This system is provided for educational and research purposes only.
It has not been clinically validated, is not FDA-cleared or CE-marked, and has not undergone the regulatory review required for medical device software. It must not be used as the sole basis for any medication prescribing, dispensing, or administration decision in a clinical environment.
All outputs produced by this system, including risk levels, alerts, and reasons, are experimental and may be incorrect. Drug safety decisions must always be made by licensed healthcare professionals following institutional protocols, formulary guidelines, and direct patient assessment.
The authors assume no liability for any clinical outcome arising from the use or misuse of this software.
Built with ❤️ for patient safety research by Saptarshi Sadhu.
The best medication error is the one that never reaches the patient.