Driftium is an open-source MLOps telemetry console and LLM observability platform. It is designed to trace statistical feature drift in tabular data pipelines and semantic distribution shifts in Large Language Model (LLM) responses. By analyzing incoming production telemetry against reference baselines, Driftium translates abstract distribution deltas and topic shifts into developer-friendly, LLM-generated Root Cause Analysis (RCA) summaries.
As machine learning models and LLMs run in production, they inevitably suffer from silent degradation. Shifted demographic distributions, system pipeline bugs, or changing prompt topics introduce data drift and semantic drift. Driftium detects these regressions before they corrupt downstream applications by providing:
- Continuous Statistical Alarms for structured, tabular feature datasets.
- Semantic Observability to detect topic shifts, vocabulary shifts, and system performance anomalies in LLM response distributions.
- Contextual Explanations that translate raw numerical scores into natural language diagnostics using a local LLM agent.
- Tabular Diagnostics: Detect continuous and categorical feature drift feature-by-feature using mathematical distribution tests.
- LLM Observability: Monitor LLM semantic response variance by mapping outputs into high-dimensional vector spaces and analyzing population distances.
- Automated Root Cause Analysis (RCA): Explain why drift has occurred rather than just flagging that it happened, comparing telemetry patterns against stable baselines.
- Sandbox Experimentation: Provide an interactive Playground to engineer prompts, generate real-time responses, and set stable baseline metrics.
Driftium features a single-server backend built with FastAPI that exposes two mounted subsystems: a legacy statistical data monitoring service and a semantic LLM observability application. The subsystems utilize Sentence Transformers (all-MiniLM-L6-v2) and a local Qdrant vector database to vectorize and store telemetry. The React frontend communicates with the unified server on port 8000 to render dynamic health indexes, stability trend lines, and sample comparisons.
- LLM Observability Dashboard:
- Health Index: A 100-point scale evaluating prompt-to-response alignment based on average cosine distance.
- Maximum Mean Discrepancy (MMD): Kernel-based MMD scoring using Radial Basis Functions (RBF) to measure distribution-level semantic shifts.
- Stability Trend Line: Graph tracing the historical health index and MMD scores over the last 20 calculations.
- Response Comparer: Side-by-side view of active baseline responses vs. current telemetry responses.
- Light/Dark Theme Toggle: Accessible sun/moon toggle in the navigation header that switches between the default dark cyberpunk theme and a clean, light-mode enterprise SaaS dashboard, persisted in `localStorage`.
- Interactive Prompt Playground:
- Direct Generation: Send custom prompt instructions to local Ollama (`smollm:135m`) and view responses instantly.
- Baseline Promotion: Promote current telemetry response lists to the baseline pool with a single click (`POST /baseline`).
- Auto-Sync: Automatically refreshes the main dashboard telemetry state when a baseline is promoted or new responses are generated.
- Agentic AI Root Cause Analysis (RCA):
- Collaborative Multi-Agent Flow: Sequentially routes drift telemetry through specialized agents (Triage Agent, Diagnosis Agent, and Recommendation Agent) via an Orchestrator to compile a final structured RCA report.
- Severity-Aware Action Items: Automatically formulates concrete next actions matching the exact drift severity level (LOW, MEDIUM, HIGH, CRITICAL) to prevent generic recommendations on stable baselines.
- Developer Trace Console: Displays a retro, terminal-style step-by-step developer log of agent communication and reasoning directly in the UI.
- Formatting Sanitization & Offline Fallbacks: Filters out raw JSON/dictionaries from model outputs to present clean, readable lists of actions in the UI, and falls back to pre-defined structured recommendation items if Ollama is unreachable.
- Persistent SQLite Storage:
- Drift History Persistence: Automatically persists calculated drift metrics to a local SQLite database (`drift_history.db`).
- Auto-Recovery & Maintenance: Automatically initializes tables on start and limits table size to the latest 1000 records. Timestamps are stored in UTC ISO format.
- Persistent Qdrant Local Vector Store:
- Baseline Vector Persistence: Persists baseline response embeddings to disk under `qdrant_db/`, allowing reuse across server restarts.
- Robust Environment Isolation: Automatically detects testing states (`pytest` runs or `PYTEST_CURRENT_TEST` env vars) to route test data to an isolated `qdrant_test_db/` folder, avoiding false positives from implicit standard library imports and allowing dev servers to run concurrently without file lock conflicts.
- Tabular Data Drift Monitoring:
- Kolmogorov-Smirnov (KS) Test: Compares continuous/numeric fields to detect distribution variances.
- Chi-Square & Cramer's V: Evaluates categorical fields and ranks them by effect size.
- Interactive Simulation: Filters baseline demographics to simulate shifts (e.g. `age < 35`) or accepts custom CSV uploads.
- Tabular LLM Explainer: Summarizes flagged tabular drift anomalies in natural language.
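The environment isolation described in the feature list above could be sketched as follows. This is a minimal illustration, not the project's actual code: the function signatures and the `qdrant_path` helper are assumptions, and only the `is_testing` name, the `PYTEST_CURRENT_TEST` variable, and the two folder names come from this README.

```python
import os
import sys
from typing import Mapping, Optional

PROD_DIR = "qdrant_db"
TEST_DIR = "qdrant_test_db"


def is_testing(
    environ: Optional[Mapping[str, str]] = None,
    loaded_modules: Optional[Mapping[str, object]] = None,
) -> bool:
    """Return True only when pytest appears to be driving the process."""
    environ = os.environ if environ is None else environ
    loaded_modules = sys.modules if loaded_modules is None else loaded_modules
    # pytest sets PYTEST_CURRENT_TEST while a test is executing.
    if "PYTEST_CURRENT_TEST" in environ:
        return True
    # Check for "pytest", not "unittest": the latter can be imported
    # implicitly by other packages and would cause false positives.
    return "pytest" in loaded_modules


def qdrant_path(
    environ: Optional[Mapping[str, str]] = None,
    loaded_modules: Optional[Mapping[str, object]] = None,
) -> str:
    """Pick the on-disk folder for the local Qdrant store (hypothetical helper)."""
    return TEST_DIR if is_testing(environ, loaded_modules) else PROD_DIR
```

Passing the environment and module table as arguments keeps the check easy to unit test without mutating global state.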
- Persistent Prompt Pools: Migrate raw in-memory response lists to a persistent database schema.
- Automated Scraper Cron Jobs: Schedule background scrapers to pull model logs automatically instead of relying on manual playground generations.
- Alert Webhook Integrations: Route critical drift severity alerts directly to Slack, Teams, or PagerDuty.
- Core: React 18 (Functional components, hooks, local state mapping)
- Tooling: Vite (Hot module reloading, dev server, compilation bundling)
- Styling: Vanilla CSS (Modern dot-matrix theme, glassmorphic grids, custom cards)
- Icons: Lucide React
- Framework: FastAPI (Asynchronous ASGI server routing)
- Web Server: Uvicorn
- Data Processing: Pandas, NumPy, SciPy (for Kolmogorov-Smirnov and Chi-square statistics)
- Vector Database: Qdrant (Client running in persistent local directory mode at `qdrant_db/` or `qdrant_test_db/` for isolated unit tests)
- Relational Database: SQLite for persistent historical drift logs (`drift_history.db`)
- Data Storage: In-memory telemetry queues and persistent historical trend records
- Embedding Pipeline: Sentence Transformers (`all-MiniLM-L6-v2` generating 384-dimensional vector embeddings)
- Local LLM Host: Ollama (`smollm:135m` footprint optimized to ~240MB RAM usage)
- Multi-Agent System: Sequential agent orchestrator (Triage, Diagnosis, Recommendation agents)
- CI & Automation: GitHub Actions workflow (automating linting, formatting, and unit tests)
- Testing: Pytest framework (automated suite running 28 validation checks covering agents, DB persistence, scorers, and vector storage)
- Ingestion: The user either loads a simulated demographic dataset, uploads a custom CSV, or executes prompts in the Playground.
- Vectorization: For LLM telemetry, the backend passes generated texts to the Sentence Transformer model to compute 384-dimensional vector embeddings.
- Indexing: Embeddings are stored in separate baseline and current collections in the persistent Qdrant vector database.
- Metric Computation: The backend runs the Centroid Cosine Distance and MMD tests on the vector distributions.
- RCA Generation: If drift is present, the backend sends sample outputs to Ollama to summarize the semantic shift, falling back to a rule-based generator if Ollama is unreachable.
- Visualization: The React UI polls/fetches these endpoints and updates metrics, historical trends, and text cards dynamically.
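The fallback behaviour in the RCA Generation step can be illustrated with a short sketch. It assumes Ollama's standard `/api/generate` endpoint on the default port; the function names, the prompt wording, and the vocabulary-based fallback rule are hypothetical and stand in for whatever the backend actually does.

```python
import json
import urllib.request
from typing import Callable, Dict, List

OLLAMA_URL = "http://127.0.0.1:11434/api/generate"  # default local Ollama port


def ask_ollama(prompt: str, model: str = "smollm:135m", timeout: float = 10.0) -> str:
    """Send one non-streaming generation request to a local Ollama server."""
    body = json.dumps({"model": model, "prompt": prompt, "stream": False}).encode("utf-8")
    request = urllib.request.Request(
        OLLAMA_URL, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request, timeout=timeout) as reply:
        return json.loads(reply.read())["response"]


def rule_based_summary(baseline: List[str], current: List[str]) -> str:
    """Hypothetical deterministic fallback: report vocabulary unseen in the baseline."""
    known = {word.lower() for text in baseline for word in text.split()}
    seen = {word.lower() for text in current for word in text.split()}
    new_words = sorted(seen - known)
    if not new_words:
        return "No new vocabulary detected in current responses."
    return "New vocabulary in current responses: " + ", ".join(new_words[:5])


def summarize_shift(
    baseline: List[str],
    current: List[str],
    generate: Callable[[str], str] = ask_ollama,
) -> Dict[str, str]:
    """Try the LLM first; fall back to the rule-based summary on any failure."""
    prompt = (
        "Compare these response sets and describe any topic shift.\n"
        f"Baseline: {baseline}\nCurrent: {current}"
    )
    try:
        return {"source": "ollama", "summary": generate(prompt)}
    except (OSError, ValueError, KeyError) as exc:
        # OSError covers refused connections and timeouts; ValueError and
        # KeyError cover malformed or unexpected JSON replies.
        return {
            "source": "rule_based",
            "summary": rule_based_summary(baseline, current),
            "error": str(exc),
        }
```

Injecting `generate` as a parameter lets tests simulate an unreachable Ollama instance without touching the network.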
graph TD
User([User / Engineer]) -->|Interacts| UI[React Frontend Dashboard]
UI -->|GET /drift & GET /samples| API[FastAPI Server :8000]
UI -->|POST /generate| API
UI -->|POST /baseline| API
subgraph Tabular Telemetry
API -->|Simulate / Upload| TabEngine[Tabular Drift Engine]
TabEngine -->|KS & Chi-Square| Stats[SciPy / Pandas]
TabEngine -->|RCA prompt| LLMExp[LLM Explainer]
end
subgraph LLM Telemetry
API -->|GET /drift/rca| LlmRca[LLM RCA Handler]
API -->|Ollama Client| Ollama[Ollama Server :11434]
API -->|Text List| Embedder[Sentence Transformer]
Embedder -->|Vectors| Qdrant[(Qdrant local store)]
Qdrant -->|Retrieve| Scorer[Centroid & MMD Scorer]
Scorer -->|Drift Scores| API
end
Ollama -->|smollm:135m| LlmRca
Ollama -->|smollm:135m| API
mlops-drift-monitor/
├── frontend/ # React / Vite SPA frontend
│ ├── src/
│ │ ├── App.jsx # Dashboard application, layout grids, and sections
│ │ ├── api.js # API client integration communicating with FastAPI
│ │ ├── main.jsx # React app entry point
│ │ └── styles.css # Theme-aware styles (supporting Light/Dark modes), layouts, and animations
│ ├── package.json # Frontend node packages & run scripts
│ └── vite.config.js # Vite configurations
├── src/ # Backend source code modules
│ ├── llm_monitoring/ # LLM Observability & Semantic Drift Engine
│ │ ├── agents/ # Multi-Agent RCA workflow package
│ │ │ ├── __init__.py
│ │ │ ├── triage_agent.py # Analyzes numerical severity thresholds
│ │ │ ├── diagnosis_agent.py # Analyzes topic/semantic shift logs
│ │ │ ├── recommendation_agent.py # Formulates action recommendations
│ │ │ └── orchestrator.py # Orchestrates sequential agent invocations
│ │ ├── api.py # LLM API endpoints (/generate, /drift, /samples, /drift/rca)
│ │ ├── embedder.py # Sentence Transformers embedding mapping
│ │ ├── llm_drift_scorer.py # Centroid distance and MMD calculation
│ │ ├── simulator.py # Standalone simulation testing logic
│ │ ├── inference_server.py # Dedicated LLM completion server
│ │ └── vector_store.py # Persistent Qdrant client utility
│ ├── monitoring/ # Tabular Feature Drift Engine
│ │ ├── api.py # Main FastAPI Server app (with LLM sub-app mounted)
│ │ ├── drift_detection.py # Statistical tests (KS, Chi2, Cramer's V)
│ │ └── service.py # Data loading, payload building, and simulations
│ └── llm/ # Tabular LLM RCA agent
│ └── llm_explainer.py # Ollama helper for tabular RCA
├── tests/ # Automated Pytest suite
│ ├── test_agentic_rca.py # Unit tests for multi-agent RCA system
│ ├── test_drift_detection.py # Statistical check validations
│ ├── test_vector_store.py # Qdrant vector store upsert/retrieval checks
│ ├── test_monitoring_service.py # Tabular service integrations
│ └── test_persistence.py # SQLite database persistence tests
├── main.py # CLI workflow entry point
├── pytest.ini # Pytest framework configurations
└── requirements.txt # Python requirements and package list
- `frontend/`: The client application. Built as a Single Page Application (SPA) using React, custom CSS, and Lucide React.
- `src/llm_monitoring/`: Houses the LLM monitoring endpoints, Sentence Transformers embeddings pipeline, persistent local Qdrant client, and the Centroid/MMD scoring math.
- `src/monitoring/`: Houses the main FastAPI server setup, mounting configurations, and statistical drift test pipelines for structured tabular data.
- `src/llm/`: Hosts the prompt engineering models and agents that explain tabular data drift anomalies.
- `tests/`: Verification scripts covering statistical accuracy, vector ingestion, and edge cases.
- Python: Version 3.10+
- Node.js: Node 18+ (with `npm`)
- Ollama: Installed and running locally. Pull the model using:
ollama pull smollm:135m
- Navigate to the root directory and create a virtual environment:
python -m venv venv
- Activate the virtual environment:
- Windows:
.\venv\Scripts\activate
- macOS/Linux:
source venv/bin/activate
- Install dependencies:
pip install -r requirements.txt
- Navigate to the `frontend` folder:
cd frontend
- Install npm packages:
npm install
To customize the connection URL, create a `.env` file in the `frontend` folder:
VITE_API_BASE_URL=http://127.0.0.1:8000
- Start the FastAPI backend server:
uvicorn src.monitoring.api:app --reload --port 8000
- In a separate terminal, launch the React development server:
cd frontend
npm run dev
- Navigate to `http://127.0.0.1:5173` to view the Driftium Dashboard.
| Endpoint | Method | Purpose | Request Body | Response Shape |
|---|---|---|---|---|
| `/api/health` | GET | API status and availability check | None | `{"status": "ok", "generated_at": "..."}` |
| `/api/monitoring/simulated` | GET | Retrieve tabular simulated drift payload | Query parameters: `age_threshold` (int), `p_threshold` (float) | `{"generated_at": "...", "summary": {...}, "display_rows": [...]}` |
| `/api/monitoring/upload` | POST | Upload a custom CSV for tabular drift analysis | Binary CSV payload | `{"generated_at": "...", "summary": {...}, "display_rows": [...]}` |
| `/api/rca` | POST | Generate tabular RCA explaining numerical shifts | `{"feature": "balance", "drift_rows": [...], "incoming_source_description": "..."}` | `{"available": true, "content": "...", "error": null, "model": "..."}` |
| `/generate` | POST | Get LLM response completion and append to telemetry pool | `{"prompt": "text"}` | `{"id": "...", "prompt": "...", "response": "...", "timestamp": "..."}` |
| `/baseline` | POST | Promote current telemetry responses to the baseline pool | None | `{"message": "Baseline set successfully", "baseline_size": 4}` |
| `/drift` | GET | Calculate LLM semantic drift scores and append to history | None | `{"status": "not_initialized" \| "waiting_for_baseline" \| "waiting_for_telemetry" \| "ready", "centroid_score": 0.22, "mmd_score": 0.05, "severity": "LOW", "timestamp": "...", "message": "..."}` |
| `/drift/history` | GET | Retrieve last 20 drift history calculation logs | None | `{"history": [{"timestamp": "...", "centroid_score": 0.22, "mmd_score": 0.05, "severity": "LOW"}]}` |
| `/samples` | GET | Fetch active baseline and current telemetry text samples | None | `{"baseline": ["..."], "current": ["..."]}` |
| `/drift/rca` | GET | Compare response pools and return semantic topic shift analysis | None | `{"available": false, "message": "Not enough data for RCA."}` or `{"available": true, "baseline_size": 4, "telemetry_size": 3, "severity": "CRITICAL", "summary": "...", "possible_cause": "..."}` |
| `/drift/agentic-rca` | GET | Execute the collaborative Multi-Agent RCA workflow and return structured report | None | `{"triage": {...}, "diagnosis": {...}, "recommendations": [...], "agent_collaboration_log": [...]}` |
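As a rough client-side illustration of the `/drift` response shape listed above, the sketch below fetches the payload and turns the status enum into a short message. The helper names and the hint texts are assumptions; only the endpoint, the field names, and the four status values come from the table.

```python
import json
import urllib.request
from typing import Any, Dict

# Hypothetical onboarding hints for the non-ready states.
EMPTY_STATE_HINTS = {
    "not_initialized": "No data yet. Generate responses in the Playground.",
    "waiting_for_baseline": "Promote current responses to a baseline first.",
    "waiting_for_telemetry": "Baseline set. Generate new responses to compare.",
}


def fetch_drift(base_url: str = "http://127.0.0.1:8000") -> Dict[str, Any]:
    """GET /drift from a running Driftium backend and decode the JSON body."""
    with urllib.request.urlopen(f"{base_url}/drift", timeout=10) as reply:
        return json.loads(reply.read())


def describe_drift(payload: Dict[str, Any]) -> str:
    """Map a /drift payload to a one-line, human-readable status."""
    status = payload.get("status")
    if status == "ready":
        return (
            f"{payload['severity']}: "
            f"centroid={payload['centroid_score']:.2f}, "
            f"mmd={payload['mmd_score']:.2f}"
        )
    if status in EMPTY_STATE_HINTS:
        # Prefer the server-provided message when one is present.
        return payload.get("message") or EMPTY_STATE_HINTS[status]
    raise ValueError(f"Unexpected drift status: {status!r}")
```

With the backend running, `print(describe_drift(fetch_drift()))` would show either the current scores or the relevant empty-state hint.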
- Ingest Production Batch: Production telemetry data is ingested via simulated subsets (e.g. filtering age) or custom CSV uploads.
- Numerical Statistics: Continuous features are evaluated using the Kolmogorov-Smirnov (KS) test comparing incoming data to baseline reference data. Columns with a p-value below the target threshold (default `0.05`) are flagged.
- Categorical Statistics: Categorical features are analyzed with a Chi-square contingency test. Cramer's V is computed to rank the severity of the shift.
- Tabular RCA: Flagged columns are sent to the tabular LLM explainer where a prompt compiles reference ranges and current statistics to output a natural language explanation.
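The two statistical checks in the tabular workflow above can be sketched with SciPy as shown below. The function names and return shapes are illustrative assumptions rather than the contents of `drift_detection.py`; the tests themselves (`ks_2samp`, `chi2_contingency`) and the 0.05 default come from the description.

```python
from collections import Counter
from typing import Dict, Sequence

import numpy as np
from scipy.stats import chi2_contingency, ks_2samp


def numeric_drift(reference: Sequence[float], incoming: Sequence[float],
                  p_threshold: float = 0.05) -> Dict[str, object]:
    """Two-sample KS test; flag the column when p falls below the threshold."""
    result = ks_2samp(reference, incoming)
    return {
        "statistic": float(result.statistic),
        "p_value": float(result.pvalue),
        "drifted": bool(result.pvalue < p_threshold),
    }


def categorical_drift(reference: Sequence[str], incoming: Sequence[str],
                      p_threshold: float = 0.05) -> Dict[str, object]:
    """Chi-square contingency test plus Cramer's V as an effect size."""
    categories = sorted(set(reference) | set(incoming))
    if len(categories) < 2:
        # A single shared category cannot shift in distribution.
        return {"p_value": 1.0, "cramers_v": 0.0, "drifted": False}
    ref_counts, inc_counts = Counter(reference), Counter(incoming)
    # Rows: reference vs incoming; columns: one per category.
    table = np.array([[ref_counts[c] for c in categories],
                      [inc_counts[c] for c in categories]])
    chi2, p_value, _, _ = chi2_contingency(table)
    # Cramer's V = sqrt(chi2 / (n * (min(rows, cols) - 1)))
    cramers_v = float(np.sqrt(chi2 / (table.sum() * (min(table.shape) - 1))))
    return {"p_value": float(p_value), "cramers_v": cramers_v,
            "drifted": bool(p_value < p_threshold)}
```

Ranking flagged categorical columns by `cramers_v` rather than by p-value keeps the ordering meaningful on large samples, where even tiny shifts become statistically significant.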
- Playground Ingestion: Prompt logs are routed to `/generate` where the local Ollama instance outputs text responses and appends them to the telemetry pool.
- Embedding Generation: The baseline and current response pools are passed to the `all-MiniLM-L6-v2` encoder model, converting sentences into 384-dimensional dense vectors.
- Vector Database Indexing: Embeddings are stored in separate baseline and current collections in the persistent local Qdrant vector database.
- Divergence Metric Scoring:
- Centroid Distance: Computes the cosine distance between the mean vectors of the baseline and current populations.
- Maximum Mean Discrepancy (MMD): Calculates kernel-based distribution distance to identify subtle structural variations.
- Semantic RCA Summary: The system triggers Ollama to compare text samples directly, identifying topic shifts, keyword deviations, or tone shifts.
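For the two divergence metrics named above, a minimal NumPy sketch might look like this. It uses the biased (V-statistic) estimator of squared MMD and defaults the RBF bandwidth to `gamma = 1 / dimension`; both choices, along with the function names, are assumptions and may differ from `llm_drift_scorer.py`.

```python
from typing import Optional

import numpy as np


def centroid_cosine_distance(baseline: np.ndarray, current: np.ndarray) -> float:
    """Cosine distance (1 - cosine similarity) between the two mean vectors."""
    mu_b = baseline.mean(axis=0)
    mu_c = current.mean(axis=0)
    cosine = mu_b @ mu_c / (np.linalg.norm(mu_b) * np.linalg.norm(mu_c))
    return float(1.0 - cosine)


def rbf_kernel(x: np.ndarray, y: np.ndarray, gamma: float) -> np.ndarray:
    """Pairwise RBF kernel matrix: exp(-gamma * ||x_i - y_j||^2)."""
    sq_dists = (
        (x ** 2).sum(axis=1)[:, None]
        + (y ** 2).sum(axis=1)[None, :]
        - 2.0 * x @ y.T
    )
    # Clip tiny negative values caused by floating-point cancellation.
    return np.exp(-gamma * np.maximum(sq_dists, 0.0))


def mmd_squared(baseline: np.ndarray, current: np.ndarray,
                gamma: Optional[float] = None) -> float:
    """Biased estimate of squared MMD: E[k(b,b')] + E[k(c,c')] - 2 E[k(b,c)]."""
    if gamma is None:
        gamma = 1.0 / baseline.shape[1]  # assumed default bandwidth
    k_bb = rbf_kernel(baseline, baseline, gamma).mean()
    k_cc = rbf_kernel(current, current, gamma).mean()
    k_bc = rbf_kernel(baseline, current, gamma).mean()
    return float(k_bb + k_cc - 2.0 * k_bc)
```

The centroid distance only reacts when the average embedding moves, whereas MMD also responds to changes in spread or shape, which is why the two scores are reported side by side.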
- Built an End-to-End MLOps Drift Monitor & LLM Observability Platform utilizing FastAPI, React (Vite), Sentence Transformers, and Qdrant to detect tabular and semantic drift in real-time.
- Designed a Theme-Aware Dashboard Interface featuring a fully accessible Light/Dark theme toggle with localStorage persistence and smooth transition animations.
- Designed a Collaborative Multi-Agent AI System incorporating Triage, Diagnosis, and Recommendation agents orchestrated sequentially to automatically troubleshoot and explain semantic drift.
- Engineered Statistical Analysis Engines in Python using SciPy to perform Kolmogorov-Smirnov (KS) and Chi-square contingency tests, flagging feature deviations in incoming dataset telemetry.
- Implemented High-Dimensional LLM Observability by computing centroid cosine distance and Maximum Mean Discrepancy (MMD) scores over 384-dimensional response embeddings.
- Integrated Local LLM Agents utilizing Ollama (`smollm:135m`) to compare response pools and perform automated, natural language Root Cause Analysis (RCA) on detected semantic anomalies.
- Designed a Robust Unified Server Architecture using FastAPI sub-app mounting to bundle tabular and LLM observability engines under a single CORS configuration on port 8000.
- Authored Exhaustive Automated Test Suites in Pytest validating vector database writes, statistical thresholds, agent orchestration, and SQLite persistence.
ML and LLM deployments suffer from silent degradation. Standard APM tools (e.g. Datadog) monitor system status like CPU or latency, but cannot spot mathematical data drift or semantic topic shifts. Driftium was built to bridge this gap, giving AI engineers a unified visual dashboard that alerts on distribution shifts and provides prompt explanations of why the data shifted.
- Collaborative Multi-Agent Architecture: Replaced the legacy single-prompt RCA with a collaborative multi-agent setup. Running specialized agents (Triage, Diagnosis, Recommendations) in sequence enforces separation of concerns, improves structured reasoning, and produces highly detailed next steps.
- Model Optimization for Local Hosts: Shifted LLM reasoning from `phi3:mini` (2.2 GB) to `smollm:135m` (91 MB). This optimization reduced local memory usage from 3.8 GiB to 242 MiB RAM, preventing out-of-memory errors on evaluators' systems while maintaining execution latency under 1 second.
- Embedding Model Selection: Sentence Transformers (`all-MiniLM-L6-v2`) was selected because it generates compact 384-dimensional vector embeddings, significantly reducing memory footprint and processing latency compared to larger models, while maintaining rich semantic density.
- FastAPI Sub-App Mounting: Rather than managing multiple backend servers, ports, and CORS setups, mounting the LLM observatory at the root `/` of the tabular monitoring app enables single-port execution.
- Rule-Based Exception Handling: LLM services hosted locally can be volatile. The RCA module detects Ollama timeouts or failure codes and gracefully falls back to deterministic rule-based analysis, ensuring high system uptime.
- Accessible Light/Dark Theme Integration: Integrated a clean, theme-aware CSS custom properties layout system that preserves the dark theme pixel-identically while supporting a premium light-theme SaaS layout. Addressed accessibility constraints with visible focus states and screen-reader compliant aria-labels.
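To make the sequential multi-agent design concrete, here is a deliberately simplified sketch in which each agent is a plain function that enriches a shared report. The severity thresholds and action texts are hypothetical placeholders, and the real agents presumably call the LLM; only the agent roles, the severity levels, and the report keys follow this README.

```python
from typing import Any, Callable, Dict, Sequence

Report = Dict[str, Any]

# Hypothetical MMD cut-offs; the actual thresholds are not stated here.
SEVERITY_THRESHOLDS = [(0.05, "LOW"), (0.15, "MEDIUM"), (0.30, "HIGH")]

# Hypothetical severity-specific fallback actions.
ACTIONS = {
    "LOW": ["Keep monitoring; no action required."],
    "MEDIUM": ["Review recent prompt or data changes."],
    "HIGH": ["Inspect sampled responses against the baseline."],
    "CRITICAL": ["Pause rollout and re-validate the baseline."],
}


def triage_agent(report: Report) -> Report:
    """Map the numeric score to a severity label."""
    severity = "CRITICAL"
    for limit, label in SEVERITY_THRESHOLDS:
        if report["mmd_score"] < limit:
            severity = label
            break
    report["triage"] = {"severity": severity}
    return report


def diagnosis_agent(report: Report) -> Report:
    """Record whether a semantic shift needs explaining."""
    report["diagnosis"] = {
        "semantic_shift_detected": report["triage"]["severity"] != "LOW",
        "centroid_score": report["centroid_score"],
    }
    return report


def recommendation_agent(report: Report) -> Report:
    """Pick actions that match the triaged severity."""
    report["recommendations"] = list(ACTIONS[report["triage"]["severity"]])
    return report


def orchestrate(
    centroid_score: float,
    mmd_score: float,
    agents: Sequence[Callable[[Report], Report]] = (
        triage_agent, diagnosis_agent, recommendation_agent),
) -> Report:
    """Run the agents in order, logging each hand-off."""
    report: Report = {
        "centroid_score": centroid_score,
        "mmd_score": mmd_score,
        "agent_collaboration_log": [],
    }
    for agent in agents:
        report = agent(report)
        report["agent_collaboration_log"].append(f"{agent.__name__}: done")
    return report
```

Because each agent reads only what earlier agents wrote, the order of the `agents` tuple is the whole control flow, which keeps the separation of concerns easy to test.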
- Stale Dashboard Telemetry Trends & SQLite Persistence: Resolved a bug where the `/drift` evaluation endpoint generated metrics but failed to store them. Implemented a persistent SQLite database storage mechanism (`drift_history.db`) in the backend to record and load scores across server restarts, resolving the flat dashboard trend line and maintaining historical data.
- Playground Prop Desynchronization: Fixed a frontend bug where the prompt playground component was not properly bound to the parent refresh context. Corrected the signature to accept and trigger the `reload` prop on successful generation/promotion, ensuring drift metrics refresh instantly on tab switch.
- Sub-App Route Conflicts: Solved routing blockages by arranging endpoints such that specific static routes take precedence, while mounting the sub-app as a root-level fallback.
- Severity-Aware Recommendations & Formatting: Fixed a bug where recommendation action items were uniform across low/medium/high/critical severities and UI displayed raw dictionaries. Developed a severity-tiered routing prompt for the Recommendation Agent, sanitizers to filter out raw dictionaries from the response, and standard severity-specific fallback lists.
- Standardized Empty States & Unified Status Enum: Resolved a startup issue where opening the dashboard on a fresh installation triggered red "Failed to fetch" error banners due to empty baseline/telemetry response lists. Standardized all LLM monitoring endpoints to return a unified status enum (`not_initialized`, `waiting_for_baseline`, `waiting_for_telemetry`, `ready`) as valid JSON payloads rather than throwing exceptions or returning unhandled formats. Integrated clean, informative cyan banners in the React frontend to welcome users and guide onboarding during empty states.
- Concurrent Database Locks in Local Qdrant Storage: Addressed parallel test-suite and reload conflicts by shifting `init_collection` to FastAPI lifespans and implementing a robust environment detection check (`is_testing`) that avoids false-positive folder locks on `qdrant_test_db` caused by implicit standard-library imports of `unittest` in the backend uvicorn process.
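The SQLite persistence fix mentioned in this list (auto-created table, UTC ISO timestamps, a cap on stored rows) could be approximated as below. The table name, column names, and helper functions are assumptions made for illustration; the 1000-row cap, the 20-entry history window, and the `drift_history.db` filename are taken from this README.

```python
import sqlite3
from datetime import datetime, timezone
from typing import Dict, List

MAX_ROWS = 1000


def init_db(path: str = "drift_history.db") -> sqlite3.Connection:
    """Open the database and create the history table if it is missing."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS drift_history (
               id INTEGER PRIMARY KEY AUTOINCREMENT,
               timestamp TEXT NOT NULL,
               centroid_score REAL NOT NULL,
               mmd_score REAL NOT NULL,
               severity TEXT NOT NULL)"""
    )
    conn.commit()
    return conn


def record_drift(conn: sqlite3.Connection, centroid_score: float,
                 mmd_score: float, severity: str, max_rows: int = MAX_ROWS) -> str:
    """Insert one measurement, then trim the table to the newest max_rows."""
    timestamp = datetime.now(timezone.utc).isoformat()
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT INTO drift_history (timestamp, centroid_score, mmd_score, severity) "
            "VALUES (?, ?, ?, ?)",
            (timestamp, centroid_score, mmd_score, severity),
        )
        conn.execute(
            "DELETE FROM drift_history WHERE id NOT IN "
            "(SELECT id FROM drift_history ORDER BY id DESC LIMIT ?)",
            (max_rows,),
        )
    return timestamp


def recent_history(conn: sqlite3.Connection, limit: int = 20) -> List[Dict[str, object]]:
    """Return the latest entries, oldest first, for the trend line."""
    rows = conn.execute(
        "SELECT timestamp, centroid_score, mmd_score, severity "
        "FROM drift_history ORDER BY id DESC LIMIT ?",
        (limit,),
    ).fetchall()
    keys = ("timestamp", "centroid_score", "mmd_score", "severity")
    return [dict(zip(keys, row)) for row in reversed(rows)]
```

Trimming inside the same transaction as the insert means the table never exceeds the cap, even if the process stops between writes.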
- Vector Query Scaling: In production, querying vector distances over millions of runs can be slow. Partitioning collections and tuning Qdrant's HNSW index can keep distance lookups fast at that scale.
- Batching Embeddings: Rather than vectorizing incoming responses individually, batching text arrays before sending them to the encoder pipeline minimizes redundant GPU/CPU overhead.
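The batching idea can be sketched independently of any particular encoder. In the example below, `encode` is a placeholder for whatever callable turns a list of texts into vectors, and the helper names and the default batch size of 32 are assumptions.

```python
from typing import Callable, Iterator, List, Sequence

Vector = List[float]


def batched(items: Sequence[str], batch_size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of at most batch_size items."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(items), batch_size):
        yield list(items[start:start + batch_size])


def embed_in_batches(
    texts: Sequence[str],
    encode: Callable[[List[str]], List[Vector]],
    batch_size: int = 32,
) -> List[Vector]:
    """Call the encoder once per batch instead of once per text."""
    vectors: List[Vector] = []
    for batch in batched(texts, batch_size):
        vectors.extend(encode(batch))
    return vectors
```

With Sentence Transformers, `SentenceTransformer.encode` already accepts a list of sentences and a `batch_size` argument, so in practice it may be enough to pass the whole response pool in one call rather than looping over single texts.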
Shravani Rane
Driftium is released under the MIT License.