Driftium - MLOps Drift Monitor & LLM Observability Platform

Driftium is an open-source MLOps telemetry console and LLM observability platform. It is designed to trace statistical feature drift in tabular data pipelines and semantic distribution shifts in Large Language Model (LLM) responses. By analyzing incoming production telemetry against reference baselines, Driftium translates abstract distribution deltas and topic shifts into developer-friendly, LLM-generated Root Cause Analysis (RCA) summaries.

Overview

What Problem the Project Solves

As machine learning models and LLMs run in production, they inevitably suffer from silent degradation. Shifted demographic distributions, system pipeline bugs, or changing prompt topics introduce data drift and semantic drift. Driftium detects these regressions before they corrupt downstream applications by providing:

Continuous Statistical Alarms for structured, tabular feature datasets.
Semantic Observability to detect topic shifts, vocabulary shifts, and system performance anomalies in LLM response distributions.
Contextual Explanations that translate raw numerical scores into natural language diagnostics using a local LLM agent.

Key Objectives

Tabular Diagnostics: Detect continuous and categorical feature drift feature-by-feature using mathematical distribution tests.
LLM Observability: Monitor LLM semantic response variance by mapping outputs into high-dimensional vector spaces and analyzing population distances.
Automated Root Cause Analysis (RCA): Explain why drift has occurred rather than just flagging that it happened, comparing telemetry patterns against stable baselines.
Sandbox Experimentation: Provide an interactive Playground to engineer prompts, generate real-time responses, and set stable baseline metrics.

Brief Architecture Overview

Driftium runs as a single FastAPI backend that combines two subsystems: the legacy statistical (tabular) data monitoring app and a semantic LLM observability sub-app mounted onto it. The LLM subsystem uses Sentence Transformers (all-MiniLM-L6-v2) and a local Qdrant vector database to vectorize and store telemetry. The React frontend talks to this unified server on port 8000 to render dynamic health indexes, stability trend lines, and sample comparisons.

Features

Implemented Features (Active in Codebase)

LLM Observability Dashboard:
- Health Index: A 100-point scale evaluating prompt-to-response alignment based on average cosine distance.
- Maximum Mean Discrepancy (MMD): Kernel-based MMD scoring using Radial Basis Functions (RBF) to measure distribution-level semantic shifts.
- Stability Trend Line: Graph tracing the historical health index and MMD scores over the last 20 calculations.
- Response Comparer: Side-by-side view of active baseline responses vs. current telemetry responses.
- Light/Dark Theme Toggle: Accessible sun/moon toggle in the navigation header that switches seamlessly between the default dark cyberpunk theme and a clean, light-mode enterprise SaaS dashboard, persisted in localStorage.
Interactive Prompt Playground:
- Direct Generation: Send custom prompt instructions to local Ollama (smollm:135m) and view responses instantly.
- Baseline Promotion: Promote current telemetry response lists to the baseline pool with a single click (POST /baseline).
- Auto-Sync: Automatically refreshes the main dashboard telemetry state when a baseline is promoted or new responses are generated.
Agentic AI Root Cause Analysis (RCA):
- Collaborative Multi-Agent Flow: Sequentially routes drift telemetry through specialized agents (Triage Agent, Diagnosis Agent, and Recommendation Agent) via an Orchestrator to compile a final structured RCA report.
- Severity-Aware Action Items: Automatically formulates concrete next actions matching the exact drift severity level (LOW, MEDIUM, HIGH, CRITICAL) to prevent generic recommendations on stable baselines.
- Developer Trace Console: Displays a retro, terminal-style step-by-step developer log of agent communication and reasoning directly in the UI.
- Formatting Sanitization & Offline Fallbacks: Filters out raw JSON/dictionaries from model outputs to present clean, readable lists of actions in the UI, and falls back to pre-defined structured recommendation items if Ollama is unreachable.
Persistent SQLite Storage:
- Drift History Persistence: Automatically persists calculated drift metrics to a local SQLite database (drift_history.db).
- Auto-Recovery & Maintenance: Automatically initializes tables on start and limits table size to the latest 1000 records. Timestamps are stored in UTC ISO format.
Persistent Qdrant Local Vector Store:
- Baseline Vector Persistence: Persists baseline response embeddings to disk under qdrant_db/, allowing reuse across server restarts.
- Robust Environment Isolation: Detects test runs (pytest runs or the PYTEST_CURRENT_TEST environment variable) and routes test data to an isolated qdrant_test_db/ folder. The check deliberately ignores incidental standard-library imports such as unittest, which would otherwise misclassify a normal server as a test run, so dev servers and test suites can run concurrently without file-lock conflicts.
Tabular Data Drift Monitoring:
- Kolmogorov-Smirnov (KS) Test: Compares continuous/numeric fields to detect distribution variances.
- Chi-Square & Cramer's V: Evaluates categorical fields and ranks them by effect size.
- Interactive Simulation: Filters baseline demographics to simulate shifts (e.g. age < 35) or accepts custom CSV uploads.
- Tabular LLM Explainer: Summarizes flagged tabular drift anomalies in natural language.
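A minimal sketch of the drift-history persistence described above, using the standard-library `sqlite3` module. The table name `drift_history`, its columns, and the helpers `init_db`, `record_drift`, and `load_history` are hypothetical; only the 1000-record cap, the UTC ISO timestamps, and the 20-point trend window come from this README.

```python
import sqlite3
from datetime import datetime, timezone

MAX_ROWS = 1000       # retention cap described in this README
HISTORY_WINDOW = 20   # points shown on the stability trend line


def init_db(conn: sqlite3.Connection) -> None:
    """Create the (hypothetical) drift_history table if it does not exist yet."""
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS drift_history (
            id INTEGER PRIMARY KEY AUTOINCREMENT,
            timestamp TEXT NOT NULL,
            centroid_score REAL NOT NULL,
            mmd_score REAL NOT NULL,
            severity TEXT NOT NULL
        )
        """
    )
    conn.commit()


def record_drift(conn: sqlite3.Connection, centroid_score: float, mmd_score: float, severity: str) -> None:
    """Insert one drift calculation, then trim the table to the newest MAX_ROWS rows."""
    ts = datetime.now(timezone.utc).isoformat()  # UTC ISO-8601, e.g. 2024-01-01T12:00:00+00:00
    conn.execute(
        "INSERT INTO drift_history (timestamp, centroid_score, mmd_score, severity) VALUES (?, ?, ?, ?)",
        (ts, centroid_score, mmd_score, severity),
    )
    conn.execute(
        "DELETE FROM drift_history WHERE id NOT IN "
        "(SELECT id FROM drift_history ORDER BY id DESC LIMIT ?)",
        (MAX_ROWS,),
    )
    conn.commit()


def load_history(conn: sqlite3.Connection, limit: int = HISTORY_WINDOW) -> list:
    """Return the latest `limit` records as dicts, oldest first (ready for a trend chart)."""
    rows = conn.execute(
        "SELECT timestamp, centroid_score, mmd_score, severity FROM drift_history "
        "ORDER BY id DESC LIMIT ?",
        (limit,),
    ).fetchall()
    keys = ("timestamp", "centroid_score", "mmd_score", "severity")
    return [dict(zip(keys, row)) for row in reversed(rows)]
```

Connecting with `sqlite3.connect("drift_history.db")` instead of `":memory:"` is what would let the history survive server restarts.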

Planned Features (Roadmap)

Persistent Prompt Pools: Migrate raw in-memory response lists to a persistent database schema.
Automated Scraper Cron Jobs: Schedule background scrapers to pull model logs automatically instead of relying on manual playground generations.
Alert Webhook Integrations: Route critical drift severity alerts directly to Slack, Teams, or PagerDuty.

Tech Stack

Frontend

Core: React 18 (Functional components, hooks, local state mapping)
Tooling: Vite (Hot module reloading, dev server, compilation bundling)
Styling: Vanilla CSS (Modern dot-matrix theme, glassmorphic grids, custom cards)
Icons: Lucide React

Backend

Framework: FastAPI (Asynchronous ASGI server routing)
Web Server: Uvicorn
Data Processing: Pandas, NumPy, SciPy (for Kolmogorov-Smirnov and Chi-square statistics)

Database

Vector Database: Qdrant (Client running in persistent local directory mode at qdrant_db/ or qdrant_test_db/ for isolated unit tests)
Relational Database: SQLite for persistent historical drift logs (drift_history.db)
Data Storage: In-memory telemetry queues and persistent historical trend records

MLOps / AI Components

Embedding Pipeline: Sentence Transformers (all-MiniLM-L6-v2 generating 384-dimensional vector embeddings)
Local LLM Host: Ollama (smollm:135m footprint optimized to ~240MB RAM usage)
Multi-Agent System: Sequential agent orchestrator (Triage, Diagnosis, Recommendation agents)

Infrastructure

CI & Automation: GitHub Actions workflow (automating linting, formatting, and unit tests)
Testing: Pytest framework (automated suite running 28 validation checks covering agents, DB persistence, scorers, and vector storage)

System Architecture

Data Flow

Ingestion: The user either loads a simulated demographic dataset, uploads a custom CSV, or executes prompts in the Playground.
Vectorization: For LLM telemetry, the backend passes generated texts to the Sentence Transformer model to compute 384-dimensional vector embeddings.
Indexing: Embeddings are stored in separate baseline and current collections in the persistent Qdrant vector database.
Metric Computation: The backend runs the Centroid Cosine Distance and MMD tests on the vector distributions.
RCA Generation: If drift is present, the backend sends sample outputs to Ollama to summarize the semantic shift, falling back to a rule-based generator if Ollama is unreachable.
Visualization: The React UI polls/fetches these endpoints and updates metrics, historical trends, and text cards dynamically.

Component Interactions

```mermaid
graph TD
    User([User / Engineer]) -->|Interacts| UI[React Frontend Dashboard]
    UI -->|GET /drift & GET /samples| API[FastAPI Server :8000]
    UI -->|POST /generate| API
    UI -->|POST /baseline| API

    subgraph Tabular Telemetry
        API -->|Simulate / Upload| TabEngine[Tabular Drift Engine]
        TabEngine -->|KS & Chi-Square| Stats[SciPy / Pandas]
        TabEngine -->|RCA prompt| LLMExp[LLM Explainer]
    end

    subgraph LLM Telemetry
        API -->|GET /drift/rca| LlmRca[LLM RCA Handler]
        API -->|Ollama Client| Ollama[Ollama Server :11434]
        API -->|Text List| Embedder[Sentence Transformer]
        Embedder -->|Vectors| Qdrant[(Qdrant local store)]
        Qdrant -->|Retrieve| Scorer[Centroid & MMD Scorer]
        Scorer -->|Drift Scores| API
    end

    Ollama -->|smollm:135m| LlmRca
    Ollama -->|smollm:135m| API
```

Project Structure

```
mlops-drift-monitor/
├── frontend/                     # React / Vite SPA frontend
│   ├── src/
│   │   ├── App.jsx               # Dashboard application, layout grids, and sections
│   │   ├── api.js                # API client integration communicating with FastAPI
│   │   ├── main.jsx              # React app entry point
│   │   └── styles.css            # Theme-aware styles (supporting Light/Dark modes), layouts, and animations
│   ├── package.json              # Frontend node packages & run scripts
│   └── vite.config.js            # Vite configurations
├── src/                          # Backend source code modules
│   ├── llm_monitoring/           # LLM Observability & Semantic Drift Engine
│   │   ├── agents/               # Multi-Agent RCA workflow package
│   │   │   ├── __init__.py
│   │   │   ├── triage_agent.py   # Analyzes numerical severity thresholds
│   │   │   ├── diagnosis_agent.py # Analyzes topic/semantic shift logs
│   │   │   ├── recommendation_agent.py # Formulates action recommendations
│   │   │   └── orchestrator.py   # Orchestrates sequential agent invocations
│   │   ├── api.py                # LLM API endpoints (/generate, /drift, /samples, /drift/rca)
│   │   ├── embedder.py           # Sentence Transformers embedding mapping
│   │   ├── llm_drift_scorer.py   # Centroid distance and MMD calculation
│   │   ├── simulator.py          # Standalone simulation testing logic
│   │   ├── inference_server.py   # Dedicated LLM completion server
│   │   └── vector_store.py       # Persistent Qdrant client utility
│   ├── monitoring/               # Tabular Feature Drift Engine
│   │   ├── api.py                # Main FastAPI Server app (with LLM sub-app mounted)
│   │   ├── drift_detection.py    # Statistical tests (KS, Chi2, Cramer's V)
│   │   └── service.py            # Data loading, payload building, and simulations
│   └── llm/                      # Tabular LLM RCA agent
│       └── llm_explainer.py      # Ollama helper for tabular RCA
├── tests/                        # Automated Pytest suite
│   ├── test_agentic_rca.py       # Unit tests for multi-agent RCA system
│   ├── test_drift_detection.py   # Statistical check validations
│   ├── test_vector_store.py      # Qdrant vector store upsert/retrieval checks
│   ├── test_monitoring_service.py # Tabular service integrations
│   └── test_persistence.py       # SQLite database persistence tests
├── main.py                       # CLI workflow entry point
├── pytest.ini                    # Pytest framework configurations
└── requirements.txt              # Python requirements and package list
```

Directory Explanation

frontend/: The client application. Built as a Single Page Application (SPA) using React, custom CSS, and Lucide React.
src/llm_monitoring/: Houses the LLM monitoring endpoints, the Sentence Transformers embedding pipeline, the persistent local Qdrant client, the multi-agent RCA package, and the Centroid/MMD scoring math.
src/monitoring/: Houses the main FastAPI server setup, mounting configurations, and statistical drift test pipelines for structured tabular data.
src/llm/: Hosts the Ollama-based explainer (llm_explainer.py) that describes tabular data drift anomalies in natural language.
tests/: Verification scripts covering statistical accuracy, vector ingestion, and edge cases.

Setup & Installation

Prerequisites

Python: Version 3.10+
Node.js: Node 18+ (with npm)
Ollama: Installed and running locally. Pull the model using:
```
ollama pull smollm:135m
```

Backend Setup

Navigate to the root directory and create a virtual environment:
```
python -m venv venv
```
Activate the virtual environment:
- Windows:
```
.\venv\Scripts\activate
```
- macOS/Linux:
```
source venv/bin/activate
```
Install dependencies:
```
pip install -r requirements.txt
```

Frontend Setup

Navigate to the frontend folder:
```
cd frontend
```
Install npm packages:
```
npm install
```

Environment Variables

To customize the connection URL, create a .env file in the frontend folder:

```
VITE_API_BASE_URL=http://127.0.0.1:8000
```

Running the Application

Start the FastAPI backend server:

```
uvicorn src.monitoring.api:app --reload --port 8000
```

In a separate terminal, launch the React development server:
```
cd frontend
npm run dev
```
Navigate to http://127.0.0.1:5173 to view the Driftium Dashboard.

API Endpoints

| Endpoint | Method | Purpose | Request Body | Response Shape |
| --- | --- | --- | --- | --- |
| `/api/health` | `GET` | API status and availability check | None | `{"status": "ok", "generated_at": "..."}` |
| `/api/monitoring/simulated` | `GET` | Retrieve tabular simulated drift payload | Query parameters: `age_threshold` (int), `p_threshold` (float) | `{"generated_at": "...", "summary": {...}, "display_rows": [...]}` |
| `/api/monitoring/upload` | `POST` | Upload a custom CSV for tabular drift analysis | Binary CSV payload | `{"generated_at": "...", "summary": {...}, "display_rows": [...]}` |
| `/api/rca` | `POST` | Generate tabular RCA explaining numerical shifts | `{"feature": "balance", "drift_rows": [...], "incoming_source_description": "..."}` | `{"available": true, "content": "...", "error": null, "model": "..."}` |
| `/generate` | `POST` | Get an LLM response completion and append it to the telemetry pool | `{"prompt": "text"}` | `{"id": "...", "prompt": "...", "response": "...", "timestamp": "..."}` |
| `/baseline` | `POST` | Promote current telemetry responses to the baseline pool | None | `{"message": "Baseline set successfully", "baseline_size": 4}` |
| `/drift` | `GET` | Calculate LLM semantic drift scores and append them to history | None | `{"status": "not_initialized" \| "waiting_for_baseline" \| "waiting_for_telemetry" \| "ready", "centroid_score": 0.22, "mmd_score": 0.05, "severity": "LOW", "timestamp": "...", "message": "..."}` |
| `/drift/history` | `GET` | Retrieve the last 20 drift history calculation logs | None | `{"history": [{"timestamp": "...", "centroid_score": 0.22, "mmd_score": 0.05, "severity": "LOW"}]}` |
| `/samples` | `GET` | Fetch active baseline and current telemetry text samples | None | `{"baseline": ["..."], "current": ["..."]}` |
| `/drift/rca` | `GET` | Compare response pools and return semantic topic shift analysis | None | `{"available": false, "message": "Not enough data for RCA."}` or `{"available": true, "baseline_size": 4, "telemetry_size": 3, "severity": "CRITICAL", "summary": "...", "possible_cause": "..."}` |
| `/drift/agentic-rca` | `GET` | Execute the collaborative Multi-Agent RCA workflow and return a structured report | None | `{"triage": {...}, "diagnosis": {...}, "recommendations": [...], "agent_collaboration_log": [...]}` |
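A minimal sketch of a standard-library client for the LLM endpoints above. It assumes the backend runs at http://127.0.0.1:8000 as in this README and that payloads follow the response shapes in the table; the helpers `call` and `summarize_drift` are hypothetical names, not part of the codebase.

```python
import json
import urllib.request

BASE_URL = "http://127.0.0.1:8000"  # assumption: default backend address from this README


def call(path, method="GET", body=None, base_url=BASE_URL, timeout=30):
    """Send an optional JSON body to the Driftium API and decode the JSON response."""
    data = json.dumps(body).encode("utf-8") if body is not None else None
    req = urllib.request.Request(
        base_url + path,
        data=data,
        method=method,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.load(resp)


def summarize_drift(payload):
    """Turn a /drift payload into a one-line summary, treating empty states as normal."""
    status = payload.get("status")
    if status != "ready":
        # not_initialized, waiting_for_baseline and waiting_for_telemetry are valid states, not errors.
        return f"{status}: {payload.get('message', 'no drift scores yet')}"
    return (
        f"{payload['severity']}: centroid={payload['centroid_score']:.3f}, "
        f"mmd={payload['mmd_score']:.3f}"
    )
```

With the backend and Ollama running, a typical session would be a few `call("/generate", "POST", {"prompt": "..."})` requests, then `call("/baseline", "POST")`, then `print(summarize_drift(call("/drift")))`.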

Drift Monitoring Workflow

1. Feature Drift Detection Flow (Tabular)

Ingest Production Batch: Production telemetry data is ingested via simulated subsets (e.g. filtering age) or custom CSV uploads.
Numerical Statistics: Continuous features are evaluated using the Kolmogorov-Smirnov (KS) test comparing incoming data to baseline reference data. Columns with a p-value below the target threshold (default 0.05) are flagged.
Categorical Statistics: Categorical features are analyzed with a Chi-square contingency test. Cramer's V is computed to rank the severity of the shift.
Tabular RCA: Flagged columns are sent to the tabular LLM explainer where a prompt compiles reference ranges and current statistics to output a natural language explanation.
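The two per-feature checks can be sketched without SciPy, as below. This is an illustrative stdlib version, not the code in `drift_detection.py`: it computes the two-sample KS statistic with the classic asymptotic (Kolmogorov distribution) p-value, and the Pearson chi-square statistic with Cramér's V from a category-by-sample contingency table. The function names are hypothetical. SciPy's `ks_2samp` may use an exact method for small samples and `chi2_contingency` applies Yates' correction to 2×2 tables by default, so its numbers can differ slightly.

```python
import math
from collections import Counter


def ks_statistic(reference, current):
    """Two-sample KS statistic: the largest gap between the two empirical CDFs."""
    a, b = sorted(reference), sorted(current)
    n, m = len(a), len(b)
    i = j = 0
    d = 0.0
    while i < n and j < m:
        x = min(a[i], b[j])
        # Step past every value equal to x in both samples so ties are handled correctly.
        while i < n and a[i] <= x:
            i += 1
        while j < m and b[j] <= x:
            j += 1
        d = max(d, abs(i / n - j / m))
    return d


def ks_pvalue(d, n, m, max_terms=100):
    """Asymptotic p-value from the Kolmogorov distribution (Numerical Recipes form)."""
    en = math.sqrt(n * m / (n + m))
    lam = (en + 0.12 + 0.11 / en) * d
    total = 0.0
    for k in range(1, max_terms + 1):
        term = 2.0 * (-1) ** (k - 1) * math.exp(-2.0 * k * k * lam * lam)
        total += term
        if abs(term) < 1e-10:
            return min(max(total, 0.0), 1.0)
    return 1.0  # series did not converge: D is tiny, so there is no evidence of drift


def flag_numeric(reference, current, p_threshold=0.05):
    """Flag a continuous feature when the KS p-value falls below the threshold."""
    d = ks_statistic(reference, current)
    p = ks_pvalue(d, len(reference), len(current))
    return {"statistic": d, "p_value": p, "drift": p < p_threshold}


def contingency(reference, current):
    """Rows are categories; columns are (reference count, current count)."""
    cats = sorted(set(reference) | set(current))
    ref_counts, cur_counts = Counter(reference), Counter(current)
    return [[ref_counts[c], cur_counts[c]] for c in cats]


def chi2_and_cramers_v(table):
    """Pearson chi-square statistic (no Yates correction) and Cramér's V effect size."""
    row_sums = [sum(row) for row in table]
    col_sums = [sum(col) for col in zip(*table)]
    n = sum(row_sums)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_sums[i] * col_sums[j] / n
            if expected > 0:
                chi2 += (observed - expected) ** 2 / expected
    k = min(len(row_sums), len(col_sums)) - 1
    v = math.sqrt(chi2 / (n * k)) if k > 0 else 0.0
    return chi2, v
```

Cramér's V rescales chi-square to the 0 to 1 range, which is what makes it usable for ranking categorical features whose category counts differ.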

2. LLM Drift Evaluation Flow (Semantic)

Playground Ingestion: Prompt logs are routed to /generate where the local Ollama instance outputs text responses and appends them to the telemetry pool.
Embedding Generation: The baseline and current response pools are passed to the all-MiniLM-L6-v2 encoder model, converting sentences into 384-dimensional dense vectors.
Vector Database Indexing: Embeddings are stored in separate baseline and current collections in the persistent local Qdrant vector database.
Divergence Metric Scoring:
- Centroid Distance: Computes the cosine distance between the mean vectors of the baseline and current populations.
- Maximum Mean Discrepancy (MMD): Calculates kernel-based distribution distance to identify subtle structural variations.
Semantic RCA Summary: The system triggers Ollama to compare text samples directly, identifying topic shifts, keyword deviations, or tone shifts.
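Both divergence metrics can be sketched on plain Python lists standing in for the 384-dimensional embeddings. This is an illustration, not `llm_drift_scorer.py`: the RBF bandwidth `gamma`, the choice of the biased (V-statistic) MMD² estimator, and the function names are assumptions, since this README does not say how the project parameterizes its kernel.

```python
import math


def mean_vector(vectors):
    """Component-wise mean of a list of equal-length embedding vectors."""
    dim = len(vectors[0])
    return [sum(v[k] for v in vectors) / len(vectors) for k in range(dim)]


def cosine_distance(u, v):
    """1 - cosine similarity; 0 means same direction, 1 means orthogonal."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    if norm == 0:
        raise ValueError("cosine distance is undefined for a zero vector")
    return 1.0 - dot / norm


def centroid_distance(baseline, current):
    """Cosine distance between the mean baseline and mean current embeddings."""
    return cosine_distance(mean_vector(baseline), mean_vector(current))


def rbf_kernel(x, y, gamma):
    """k(x, y) = exp(-gamma * ||x - y||^2)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)


def mmd_squared(baseline, current, gamma=1.0):
    """Biased estimate: mean k(X, X) + mean k(Y, Y) - 2 * mean k(X, Y)."""
    def mean_kernel(xs, ys):
        return sum(rbf_kernel(x, y, gamma) for x in xs for y in ys) / (len(xs) * len(ys))

    value = (
        mean_kernel(baseline, baseline)
        + mean_kernel(current, current)
        - 2.0 * mean_kernel(baseline, current)
    )
    return max(value, 0.0)  # clip tiny negative values caused by floating-point error
```

The centroid distance only compares population means, so two pools can share a centroid while differing in spread or clustering; MMD compares the whole distributions through the kernel, which is why the two scores complement each other.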

Screenshots / Demo

Product Landing Page

LLM Observability Dashboard

RCA Reports

Interactive Playground

Highlights

Built an End-to-End MLOps Drift Monitor & LLM Observability Platform utilizing FastAPI, React (Vite), Sentence Transformers, and Qdrant to detect tabular and semantic drift in real-time.
Designed a Theme-Aware Dashboard Interface featuring a fully accessible Light/Dark theme toggle with localStorage persistence and smooth transition animations.
Designed a Collaborative Multi-Agent AI System incorporating Triage, Diagnosis, and Recommendation agents orchestrated sequentially to automatically troubleshoot and explain semantic drift.
Engineered Statistical Analysis Engines in Python using SciPy to perform Kolmogorov-Smirnov (KS) and Chi-square contingency tests, flagging feature deviations in incoming dataset telemetry.
Implemented High-Dimensional LLM Observability by computing centroid cosine distance and Maximum Mean Discrepancy (MMD) scores over 384-dimensional response embeddings.
Integrated Local LLM Agents utilizing Ollama (smollm:135m) to compare response pools and perform automated, natural language Root Cause Analysis (RCA) on detected semantic anomalies.
Designed a Robust Unified Server Architecture using FastAPI sub-app mounting to bundle tabular and LLM observability engines under a single CORS configuration on port 8000.
Authored Exhaustive Automated Test Suites in Pytest validating vector database writes, statistical thresholds, agent orchestration, and SQLite persistence.

Overview

Why the Project Was Built

ML and LLM deployments suffer from silent degradation. Standard APM tools (e.g. Datadog) are built around system signals such as CPU usage or latency and do not, on their own, detect statistical data drift or semantic topic shifts. Driftium was built to bridge this gap, giving AI engineers a unified visual dashboard that alerts on distribution shifts and explains why the data shifted.

Design Decisions

Collaborative Multi-Agent Architecture: Replaced the legacy single-prompt RCA with a collaborative multi-agent setup. Running specialized agents (Triage, Diagnosis, Recommendations) in sequence enforces separation of concerns, improves structured reasoning, and produces highly detailed next steps.
Model Optimization for Local Hosts: Shifted LLM reasoning from phi3:mini (2.2 GB) to smollm:135m (91 MB). This optimization reduced local memory usage from 3.8 GiB to 242 MiB RAM, preventing out-of-memory errors on evaluators' systems while maintaining execution latency under 1 second.
Embedding Model Selection: Sentence Transformers (all-MiniLM-L6-v2) was selected because it generates compact 384-dimensional vector embeddings, significantly reducing memory footprint and processing latency compared to larger models, while maintaining rich semantic density.
FastAPI Sub-App Mounting: Rather than running multiple backend servers with separate ports and CORS setups, Driftium mounts the LLM observability app at the root path / of the tabular monitoring app, so the whole backend runs on a single port.
Rule-Based Exception Handling: Locally hosted LLM services can be unreliable. The RCA module detects Ollama timeouts or failure codes and falls back to deterministic rule-based analysis, so RCA requests still return a result when the model is unavailable.
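The sequential Triage, Diagnosis, and Recommendation flow, together with the offline fallback, can be sketched as plain functions sharing a log. Everything here is hypothetical: the real agents in `src/llm_monitoring/agents/` call Ollama, whereas this sketch injects a `generate` callable so that any connection failure lands in the deterministic, severity-specific fallback. The fallback wording is a placeholder, and the JSON filter is a naive line-level check.

```python
import re
from typing import Callable, Dict, List

# Placeholder severity-specific fallback lists (not the project's actual wording).
FALLBACK_ACTIONS: Dict[str, List[str]] = {
    "LOW": ["Keep monitoring; the baseline looks stable."],
    "MEDIUM": ["Review recent prompts for emerging topics.", "Re-run drift once more telemetry arrives."],
    "HIGH": ["Compare baseline and current samples side by side.", "Promote a new baseline only if the shift is intended."],
    "CRITICAL": ["Pause downstream consumers of the model output.", "Investigate upstream prompt or model changes now."],
}
# URLError, refused connections and socket timeouts are all OSError subclasses.
LLM_ERRORS = (OSError, ValueError)


def sanitize_actions(text: str) -> List[str]:
    """Strip list markers and drop lines that look like raw JSON or dicts."""
    actions = []
    for line in text.splitlines():
        line = re.sub(r"^(?:[-*]|\d+[.)])\s*", "", line.strip())
        if line and not line.startswith(("{", "[")):
            actions.append(line)
    return actions


def triage_agent(drift: dict, log: List[str]) -> dict:
    log.append(f"[triage] severity={drift['severity']}")
    return {"severity": drift["severity"]}


def diagnosis_agent(baseline, current, generate, log: List[str]) -> dict:
    prompt = f"Describe the topic shift.\nBaseline: {baseline}\nCurrent: {current}"
    try:
        summary = generate(prompt)
        log.append("[diagnosis] summary from LLM")
    except LLM_ERRORS as exc:
        summary = "LLM unavailable: compare the baseline and current samples manually."
        log.append(f"[diagnosis] fallback ({type(exc).__name__})")
    return {"summary": summary}


def recommendation_agent(severity, diagnosis, generate, log: List[str]) -> List[str]:
    # The severity goes into the prompt so suggestions match the drift level.
    prompt = f"Severity: {severity}. Diagnosis: {diagnosis['summary']}. List next actions for this severity only."
    try:
        actions = sanitize_actions(generate(prompt))
    except LLM_ERRORS as exc:
        actions = []
        log.append(f"[recommendation] fallback ({type(exc).__name__})")
    return actions or FALLBACK_ACTIONS[severity]


def orchestrate(drift: dict, baseline, current, generate: Callable[[str], str]) -> dict:
    """Run the agents in sequence, feeding each one the previous agent's output."""
    log: List[str] = []
    triage = triage_agent(drift, log)
    diagnosis = diagnosis_agent(baseline, current, generate, log)
    recommendations = recommendation_agent(triage["severity"], diagnosis, generate, log)
    return {
        "triage": triage,
        "diagnosis": diagnosis,
        "recommendations": recommendations,
        "agent_collaboration_log": log,
    }
```

In the running app, `generate` would wrap a request to Ollama's `/api/generate` endpoint on port 11434; the returned keys mirror the `/drift/agentic-rca` response shape listed in the API table.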

Challenges Solved

Accessible Light/Dark Theme Integration: Integrated a clean, theme-aware CSS custom properties layout system that preserves the dark theme pixel-identically while supporting a premium light-theme SaaS layout. Addressed accessibility constraints with visible focus states and screen-reader compliant aria-labels.
Stale Dashboard Telemetry Trends & SQLite Persistence: Resolved a bug where the /drift evaluation endpoint generated metrics but failed to store them. Implemented a persistent SQLite database storage mechanism (drift_history.db) in the backend to record and load scores across server restarts, resolving the flat dashboard trend line and maintaining historical data.
Playground Prop Desynchronization: Fixed a frontend bug where the prompt playground component was not properly bound to the parent refresh context. Corrected the signature to accept and trigger the reload prop on successful generation/promotion, ensuring drift metrics refresh instantly on tab switch.
Sub-App Route Conflicts: Solved routing blockages by arranging endpoints such that specific static routes take precedence, while mounting the sub-app as a root-level fallback.
Severity-Aware Recommendations & Formatting: Fixed a bug where recommendation action items were uniform across low/medium/high/critical severities and UI displayed raw dictionaries. Developed a severity-tiered routing prompt for the Recommendation Agent, sanitizers to filter out raw dictionaries from the response, and standard severity-specific fallback lists.
Standardized Empty States & Unified Status Enum: Resolved a startup issue where opening the dashboard on a fresh installation triggered red "Failed to fetch" error banners due to empty baseline/telemetry response lists. Standardized all LLM monitoring endpoints to return a unified status enum (not_initialized, waiting_for_baseline, waiting_for_telemetry, ready) as valid JSON payloads rather than throwing exceptions or returning unhandled formats. Integrated clean, informative cyan banners in the React frontend to welcome users and guide onboarding during empty states.
Concurrent Database Locks in Local Qdrant Storage: Local-mode Qdrant locks its storage folder, which caused conflicts between parallel test runs and the auto-reloading dev server. Moved init_collection into the FastAPI lifespan handler and added an environment check (is_testing) that routes tests to qdrant_test_db without false positives from the implicit standard-library import of unittest inside the uvicorn backend process.
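A minimal sketch of an `is_testing`-style check and the resulting store path. The signature (injectable `environ` and `modules` for testability) and the helper `qdrant_path` are assumptions; only the pytest/PYTEST_CURRENT_TEST signals, the refusal to key off `unittest`, and the two folder names come from this README.

```python
import os
import sys
from pathlib import Path


def is_testing(environ=None, modules=None) -> bool:
    """Return True when running under pytest.

    PYTEST_CURRENT_TEST is only set while a test is executing, so the check also
    looks for pytest in sys.modules (true during collection and fixture setup).
    It deliberately ignores "unittest": that module can be imported as a side
    effect inside a normal uvicorn process, which would misroute the dev server.
    """
    environ = os.environ if environ is None else environ
    modules = sys.modules if modules is None else modules
    return "PYTEST_CURRENT_TEST" in environ or "pytest" in modules


def qdrant_path(base_dir=".", environ=None, modules=None) -> Path:
    """Pick the isolated test store or the regular store for local-mode Qdrant."""
    folder = "qdrant_test_db" if is_testing(environ, modules) else "qdrant_db"
    return Path(base_dir) / folder
```

Inside a FastAPI lifespan handler, the path would then be passed to the qdrant-client local mode, e.g. `QdrantClient(path=str(qdrant_path()))`, so a test run and a dev server never open the same locked folder.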

Scalability Considerations

Vector Query Scaling: In production, distance queries over millions of stored responses can become slow. Partitioning data into separate collections and relying on Qdrant's HNSW index for approximate nearest-neighbor search would help keep lookups fast.
Batching Embeddings: Rather than vectorizing incoming responses individually, batching text arrays before sending them to the encoder pipeline minimizes redundant GPU/CPU overhead.
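The batching idea can be sketched as a small buffer that calls the encoder once per batch instead of once per response. The class `EmbeddingBatcher` and its flush policy are hypothetical; `encode` is any callable that maps a list of texts to a list of vectors.

```python
from typing import Callable, List, Sequence


class EmbeddingBatcher:
    """Buffer incoming texts and embed them with one encoder call per batch."""

    def __init__(self, encode: Callable[[List[str]], Sequence], batch_size: int = 32):
        self.encode = encode
        self.batch_size = batch_size
        self.vectors: list = []
        self._buffer: List[str] = []

    def add(self, text: str) -> None:
        """Queue one response; encode automatically once a full batch is buffered."""
        self._buffer.append(text)
        if len(self._buffer) >= self.batch_size:
            self.flush()

    def flush(self) -> None:
        """Embed whatever is buffered, e.g. on a timer or right before scoring drift."""
        if self._buffer:
            batch, self._buffer = self._buffer, []
            self.vectors.extend(self.encode(batch))
```

With Sentence Transformers, `EmbeddingBatcher(SentenceTransformer("all-MiniLM-L6-v2").encode)` would work as-is, since `encode` accepts a list of strings and returns one vector per string (it also batches internally via its own `batch_size` argument).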

Author

Shravani Rane

License

Driftium is released under the MIT License.
