Production NLP system that makes education-policy discourse comparable across England, Scotland and Ireland — using a deployed BERTopic pipeline as the production spine, with NMF as a convergent-validity baseline.
The headline isn't a topic model — it's the dual-pipeline design: two structurally different methods (semantic-embedding BERTopic and lexical NMF) are run over the same corpus, and they independently reproduce the same cross-national finding. That convergence is what makes the result trustworthy rather than an artefact of one model.
Core finding: the three systems foreground genuinely different things — England accountability & structures (Ofsted, academies), Scotland equity & rights, Ireland teaching, curriculum & inclusion. The difference is statistically strong (Cramér's V ≈ 0.28, p ≪ 0.001), survives controlling for who is speaking (V = 0.431 within government), and is reproduced by both methods.
- Inference API (FastAPI on Render): https://atlased-epoch-api.onrender.com/health
- Weekly pipeline: GitHub Actions (self-healing scrape → LLM relevance gate → inference → drift monitoring)
- Dashboard: Chart.js front end served by the API (`/` + `/api/data`)
weekly scrape ──► Supabase (raw)
      │   self-healing watermark (GitHub Actions)
      ▼
LLM relevance gate (gpt-4o-mini, frozen rubric, cost-capped)
      │
      ▼
BERTopic inference ──► Supabase (epoch_* topic tables) ──► dashboard + /predict API
      │   cosine vs 138 frozen topic centroids        │
      └──────────────────────────────────────────────────────────► drift monitoring (monthly)
- Canonical store: one Supabase (Postgres) source of truth that every component reads from.
- Inference: documents are chunked (~100 words), embedded (MiniLM), and scored by cosine similarity to 138 frozen topic centroids — fast, CPU-only, no GPU.
- Serving: a Dockerised FastAPI app on Render with the embedder baked into the image (no runtime model download).
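The inference step described above (chunk, embed, score against frozen centroids) can be sketched in a few lines. This is a minimal illustration, not the project's code: the function names `chunk_words`, `cosine` and `assign_topic` are hypothetical, the 3-dimensional vectors stand in for real MiniLM embeddings, and the two toy centroids stand in for the 138 frozen ones.

```python
import math

def chunk_words(text, size=100):
    """Split text into consecutive chunks of at most `size` words."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def assign_topic(chunk_vec, centroids):
    """Return (topic_id, similarity) for the nearest frozen centroid."""
    scores = {topic_id: cosine(chunk_vec, c) for topic_id, c in centroids.items()}
    best = max(scores, key=scores.get)
    return best, scores[best]

# Toy stand-ins: real centroids would be MiniLM-sized vectors loaded from disk.
centroids = {0: [1.0, 0.0, 0.0], 1: [0.0, 1.0, 0.0]}
chunk_vec = [0.9, 0.1, 0.0]  # pretend embedding of one ~100-word chunk

topic_id, sim = assign_topic(chunk_vec, centroids)
print(topic_id, round(sim, 3))
```

Because the centroids are frozen, scoring a new document reduces to arithmetic of this kind, which is why the step can run on CPU only.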
MiniLM sentence-embeddings → UMAP → HDBSCAN → c-TF-IDF → frozen centroids → cosine inference
Three per-country models (England 75 / Scotland 30 / Ireland 33 topics) plus a combined model, rolled up to a curated 20-category crosswalk. Chosen for the comparative task: a shared embedding space lets equivalents align despite different national vocabulary (England SEND ↔ Scotland ASN at cosine 0.78).
Country-specific TF-IDF + NMF models. Transparent, lexical, CPU-cheap. Retained deliberately — not as a discarded first attempt — as the independent check.
NMF, sharing no architecture with BERTopic, reproduces the same national fingerprints → the cross-national finding is method-robust, not a BERTopic artefact. This is convergent validity, and it's the project's real contribution.
- Three national registers, triple-validated (by measure, by actor, by method).
- Statistical: χ² p ≪ 0.001, Cramér's V ≈ 0.284, bootstrap CIs that don't overlap across countries.
- Robustness: leave-one-source-out shifts no category by more than ~3 percentage points; the country effect survives within the government-only stratum (V = 0.431).
- Diversity: > 0.93 per country; inference agrees ~71% with an LLM judge (Claude Haiku, category level) and ~86% with the model's own HDBSCAN labels.
- Monitoring: content drift and model drift are tracked separately; current verdict is no retrain (fit stable across 14 quarters).
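For readers unfamiliar with the effect-size statistic quoted above, here is a minimal sketch of how χ² and Cramér's V are computed from a country × category contingency table. The counts are made up for illustration and are not project data; `cramers_v` is a hypothetical helper name, and the project itself may well use a library routine instead.

```python
import math

def cramers_v(table):
    """Return (chi2, V) for a contingency table given as a list of rows."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    # V normalises chi2 by sample size and the smaller table dimension minus one.
    k = min(len(table), len(table[0])) - 1
    return chi2, math.sqrt(chi2 / (n * k))

# Illustrative counts only: rows = countries, columns = categories.
counts = [
    [60, 20, 20],
    [20, 60, 20],
    [20, 20, 60],
]
chi2, v = cramers_v(counts)
print(round(chi2, 3), round(v, 3))  # 96.0 0.4
```

V ranges from 0 (no association) to 1 (perfect association), so it stays comparable across tables of different sizes, unlike χ² itself.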
Full write-up: docs/results_analysis_bertopic.md.
- Self-healing weekly scrape (watermark-based; catches up after any missed run).
- Decoupled LLM relevance gate — drains ungated rows independently; a failure just retries.
- Idempotent Supabase upserts, versioned API I/O contract (HTTP 409 on model-version mismatch).
- Monitoring + failure alerting (GitHub Actions opens an issue on any pipeline failure).
- Tests + CI, SBOM, secrets baseline.
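The watermark idea behind the self-healing scrape can be sketched as follows. This is an assumption-laden illustration rather than the repository's implementation: `windows_to_scrape`, `run_scrape` and the `scrape_window` callback are hypothetical names, and the 7-day window is inferred from the weekly schedule.

```python
from datetime import date, timedelta

def windows_to_scrape(watermark, today, step_days=7):
    """List (start, end) windows covering watermark..today, oldest first."""
    windows = []
    start = watermark
    while start < today:
        end = min(start + timedelta(days=step_days), today)
        windows.append((start, end))
        start = end
    return windows

def run_scrape(watermark, today, scrape_window):
    """Scrape every outstanding window; return the new watermark."""
    for start, end in windows_to_scrape(watermark, today):
        try:
            scrape_window(start, end)  # hypothetical callback doing the real work
        except Exception:
            break  # keep the watermark at the last success; the next run retries
        watermark = end
    return watermark

# Two missed runs: three weekly windows are outstanding and all get processed.
seen = []
new_mark = run_scrape(date(2025, 1, 1), date(2025, 1, 22),
                      lambda s, e: seen.append((s, e)))
print(len(seen), new_mark)  # 3 2025-01-22
```

Advancing the watermark only after a window succeeds is what lets a later run catch up, and idempotent upserts mean re-scraping a partly completed window does not duplicate rows.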
atlas-ed/
├── pipelines/
│   ├── bertopic_epoch/    # PRODUCTION: api, training, inference, monitoring, models, outputs, sql, tests, notebooks
│   └── nmf_baseline/      # BASELINE: NMF pipeline (convergent-validity control)
├── src/atlased/           # shared, installable package (inference core, preprocessing, path resolver)
├── ingestion/             # scrape → gate (feeds both pipelines)
├── dashboard/epoch/       # Chart.js dashboard + build_data.py (data.json generator)
├── requirements/          # api.txt, scraping.txt
├── docs/                  # architecture, governance (model card / datasheet / DPIA), methods, results
├── experiments/           # MLflow runs + scratch
├── Dockerfile · render.yaml · pyproject.toml
└── .github/workflows/     # weekly scrape · gate · inference · monthly monitor · alert
pip install -e . -r requirements/api.txt # install the `atlased` core + API deps
uvicorn main:app --app-dir pipelines/bertopic_epoch/api # serves API + dashboard at /
# score a document
curl -s -X POST localhost:8000/predict -H 'Content-Type: application/json' \
-d '{"docs":[{"doc_id":"t1","text":"Ofsted inspection of academy trusts...","country":"eng"}]}'
Quick tests: pip install -e . -r requirements/api.txt && pytest
Model/API tests load the frozen BERTopic artefacts and sentence-transformer, so they are opt-in:
pytest -m model pipelines/bertopic_epoch/tests
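The `/predict` request shown with curl can also be built from Python using only the standard library. A minimal sketch: `build_predict_request` is a hypothetical helper, the base URL assumes the local uvicorn server on port 8000, and the response schema is not documented here, so the example prints the raw body rather than parsing assumed fields.

```python
import json
import urllib.request

def build_predict_request(docs, base_url="http://localhost:8000"):
    """Build a POST request for /predict carrying the same JSON body as the curl call."""
    body = json.dumps({"docs": docs}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/predict",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

request = build_predict_request(
    [{"doc_id": "t1", "text": "Ofsted inspection of academy trusts...", "country": "eng"}]
)

# To send it against a running server, uncomment:
# with urllib.request.urlopen(request) as response:
#     print(response.read().decode("utf-8"))
```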
- Model Card · Datasheet · DPIA
- Source discretion: the system surfaces analysis and links, never the source text or document URLs.
- Fairness framing: representational (whose discourse is amplified); all comparison is within-country share, never raw counts.
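The within-country share rule can be illustrated with a short sketch. The counts, the category names and the `sco` country code are made up for the example (only `eng` appears in the API example), and `within_country_share` is a hypothetical helper name.

```python
def within_country_share(counts):
    """Convert raw per-country category counts into within-country shares."""
    shares = {}
    for country, by_category in counts.items():
        total = sum(by_category.values())
        shares[country] = (
            {cat: n / total for cat, n in by_category.items()} if total else {}
        )
    return shares

# Illustrative counts only: one corpus is ten times larger than the other.
counts = {
    "eng": {"accountability": 300, "equity": 100},
    "sco": {"accountability": 10, "equity": 30},
}
shares = within_country_share(counts)
print(shares["eng"]["accountability"], shares["sco"]["equity"])  # 0.75 0.75
```

Raw counts would make the larger corpus dominate every category; shares put each country on the same 0 to 1 scale, so the comparison reflects emphasis rather than corpus size.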
Python · BERTopic · sentence-transformers (MiniLM) · UMAP / HDBSCAN · scikit-learn (NMF) · FastAPI · Supabase (Postgres) · Docker · Render · GitHub Actions · MLflow · Chart.js
UCL Institute of Education · Education Research Programme · funded by UCL Grand Challenges · Level 6 ML Engineering Apprenticeship · 2025–2026