AI research intelligence — aggregate, score, cluster, summarise, and semantically search content from across the AI landscape.
Athena ingests papers and posts from ArXiv, Semantic Scholar, Papers With Code, RSS feeds, Substack, LessWrong and Playwright-scraped sites, enriches them through a Celery pipeline (embed → score → cluster → summarise), and serves ranked, semantically searchable results — with AI summaries, Q&A, topic clusters, and saved lists — to a React UI.
Data flows through the following stages:
Scrapers → PostgreSQL → Celery workers (embed · score · cluster · summarise) → Qdrant → FastAPI → React UI
- Scrapers normalise raw content into PostgreSQL.
- Celery workers embed (OpenAI → Qdrant), score (6 weighted signals), cluster (UMAP + K-Means), and summarise (OpenAI).
- FastAPI serves the ranked feed, semantic search, clusters, Q&A, sources, and saved lists.
- Frontend — React 19 + TypeScript + Vite (Zustand, TanStack Query).
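The scoring step above combines six weighted signals into a single rank. One common way to do that is a normalised weighted sum, sketched below. The signal names and weights are hypothetical placeholders, since this README does not name the signals Athena actually uses.

```python
# Hypothetical signal names and weights, for illustration only;
# Athena's real six signals and their weights are not listed in this README.
WEIGHTS = {
    "recency": 0.25,
    "citations": 0.20,
    "source_quality": 0.20,
    "engagement": 0.15,
    "novelty": 0.10,
    "relevance": 0.10,
}


def weighted_score(signals: dict[str, float], weights: dict[str, float] = WEIGHTS) -> float:
    """Combine per-signal values in [0, 1] into a single score in [0, 1]."""
    total_weight = sum(weights.values())
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")
    score = 0.0
    for name, weight in weights.items():
        # Missing signals count as 0; values are clamped to [0, 1].
        value = min(max(signals.get(name, 0.0), 0.0), 1.0)
        score += weight * value
    # Dividing by the total keeps the result in [0, 1] even if the weights
    # do not sum exactly to 1.
    return score / total_weight


if __name__ == "__main__":
    item = {"recency": 0.9, "citations": 0.4, "relevance": 0.8}
    print(round(weighted_score(item), 3))
```

Normalising by the total weight means individual weights can be tuned without rescaling the others.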
Prerequisites: Docker, and a .env file — copy .env.example and set at minimum OPENAI_API_KEY.
Athena ships two launchers:
run_docker.sh runs everything (Postgres, Redis, Qdrant, API, worker, frontend) via Docker Compose. Best for a production-like run.
cp .env.example .env # then set OPENAI_API_KEY
./run_docker.sh # start with pre-built GHCR images
./run_docker.sh --build # build images locally instead
./run_docker.sh --logs # follow worker logs after starting
./run_docker.sh --down # stop & remove the stack
run_native.sh runs infra (Postgres, Redis, Qdrant) in Docker and the API, Celery worker, and frontend as local host processes. Best for development (hot reload, fast iteration).
pip install -r requirements.txt && playwright install chromium # one-time
cp .env.example .env
./run_native.sh # start infra + app processes
./run_native.sh --stop # stop everything
Needs python3, uvicorn, celery, npm on PATH; logs land in ${TMPDIR:-/tmp}/athena-local/.
Once up — Frontend http://localhost:5173 · API http://localhost:8000 (/docs).
| Service | Port |
|---|---|
| Frontend (Vite) | 5173 |
| FastAPI | 8000 |
| PostgreSQL | 5432 |
| Redis | 6379 |
| Qdrant | 6333 |
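To confirm that each service in the table is reachable after starting either launcher, a quick TCP probe can help. This is a minimal sketch that is not part of the repository; it assumes all services are published on localhost, and the helper names `port_open` and `check_services` are made up for illustration.

```python
import socket

# Ports taken from the table above; the host is assumed to be localhost.
SERVICES = {
    "Frontend (Vite)": 5173,
    "FastAPI": 8000,
    "PostgreSQL": 5432,
    "Redis": 6379,
    "Qdrant": 6333,
}


def port_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


def check_services(host: str = "localhost") -> dict[str, bool]:
    """Map each service name to whether its port accepts connections."""
    return {name: port_open(host, port) for name, port in SERVICES.items()}


if __name__ == "__main__":
    for name, up in check_services().items():
        print(f"{name:<16} {'up' if up else 'down'}")
```

An open port only shows that something is listening; it does not prove the service behind it is healthy.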
All configuration is via environment variables (athena/api/config.py, Pydantic settings). Copy .env.example → .env. Minimum required: OPENAI_API_KEY. DATABASE_URL, REDIS_URL, and QDRANT_URL default to the Compose service addresses.
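The required-versus-default behaviour described above can be illustrated with a small standard-library sketch. It is not the contents of athena/api/config.py, which uses Pydantic settings; the `Settings` dataclass, the `load_settings` helper, and the default URLs below are all hypothetical stand-ins.

```python
import os
from dataclasses import dataclass
from typing import Mapping

# Hypothetical defaults: the real ones live in athena/api/config.py and point
# at the Compose service addresses, which are not shown in this README.
DEFAULTS = {
    "DATABASE_URL": "postgresql://postgres:postgres@postgres:5432/athena",
    "REDIS_URL": "redis://redis:6379/0",
    "QDRANT_URL": "http://qdrant:6333",
}


@dataclass(frozen=True)
class Settings:
    openai_api_key: str
    database_url: str
    redis_url: str
    qdrant_url: str


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Build Settings from environment variables, failing fast if the key is missing."""
    api_key = env.get("OPENAI_API_KEY", "").strip()
    if not api_key:
        raise RuntimeError("OPENAI_API_KEY is required; set it in .env")
    return Settings(
        openai_api_key=api_key,
        # Optional values fall back to the (hypothetical) defaults above.
        database_url=env.get("DATABASE_URL", DEFAULTS["DATABASE_URL"]),
        redis_url=env.get("REDIS_URL", DEFAULTS["REDIS_URL"]),
        qdrant_url=env.get("QDRANT_URL", DEFAULTS["QDRANT_URL"]),
    )
```

Passing a plain dict instead of `os.environ` makes the loader easy to exercise in tests, for example `load_settings({"OPENAI_API_KEY": "test-key"})`.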
# Backend tests (SQLite — no live services required)
DATABASE_URL="sqlite:///test.db" REDIS_URL="redis://localhost:6379/0" \
QDRANT_URL="http://localhost:6333" OPENAI_API_KEY="test-key" \
pytest tests/test_scoring.py tests/test_preprocessing.py tests/test_connectors.py -v
# Lint (max line length 120)
flake8 . --max-line-length=120 --exclude=".venv,venv,frontend/node_modules"
# Frontend
cd frontend && npm install && npm run dev
CI (ci.yml) runs flake8 + pytest on every push to main/develop; CD (cd.yml) builds and publishes the API, worker, and frontend images to GHCR on push to main.
athena/
  scrapers/   source collectors (ArXiv, RSS, Substack, Playwright, …)
  database/   SQLAlchemy engine + all SQL operations
  pipeline/   Celery workers: embedding, scoring, clustering, summarisation
  api/        FastAPI app + routers
  core/       models & schemas
frontend/     React 19 + TypeScript + Vite
docker/       Dockerfiles + compose files
scripts/      setup, crawl, backfill, maintenance
Branch from main → Conventional Commits (feat:, fix:, chore:) → open a PR against main. CI must pass.
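To check a commit subject locally before opening a PR, a small validator along these lines could be used. This is a sketch rather than a tool shipped with the repository: `is_conventional` is a hypothetical helper, and restricting the allowed types to the three named above is an assumption (the Conventional Commits specification permits others).

```python
import re

# Assumption: only the three types named above are accepted. The Conventional
# Commits spec allows other types too, so extend this tuple as needed.
ALLOWED_TYPES = ("feat", "fix", "chore")

# type, optional (scope), optional "!" for breaking changes, then ": description"
_PATTERN = re.compile(
    r"^(?P<type>" + "|".join(ALLOWED_TYPES) + r")"
    r"(?:\((?P<scope>[^()\s]+)\))?"
    r"(?P<breaking>!)?: (?P<description>\S.*)$"
)


def is_conventional(subject: str) -> bool:
    """Return True if a commit subject line matches the pattern above."""
    return _PATTERN.match(subject) is not None


if __name__ == "__main__":
    for subject in ("feat: add RSS scraper", "update readme"):
        print(subject, "->", is_conventional(subject))
```

Only the subject line is checked here; commit bodies and footers are out of scope for this sketch.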
MIT © 2025 Thisen Ekanayake
