Real-time semantic clustering of YouTube live chat via Gemini embeddings, pgvector cosine search, and a 6-stage Redis worker pipeline — with RAG-augmented answer generation and WebSocket delivery.
Built by Siddharth Patel and Sarthak Chauhan
Teachers running live YouTube sessions are bombarded with hundreds of chat messages per minute — most are noise (emojis, greetings, spam), but buried in the flood are genuine student questions. A teacher can't possibly read every message, let alone answer the important ones. Questions get lost, students feel ignored, and the learning experience suffers.
StreamMind solves this. It watches the live chat in real time, uses Gemini AI to separate actual questions from the noise, clusters similar questions together (so "What is recursion?" and "Can you explain recursive functions?" become one group), and generates grounded answers using the teacher's own uploaded materials. The result is a real-time dashboard that turns an unreadable chat stream into an organized, actionable Q&A feed.
Screenshots: landing page (light and dark mode) and live dashboard (real-time question clustering, AI answers, and YouTube integration).
The teacher authenticates with Google OAuth, links their YouTube live stream, and starts a new session. The system captures the live chat ID and begins monitoring.
A YouTube polling worker hits the YouTube Data API every second, pulling new chat messages into a Redis ZSET queue with priority scoring. Messages are deduplicated and timestamped before entering the pipeline.
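The ingestion step above can be sketched without a live Redis instance. The block below is only an illustration under stated assumptions: `InMemoryZSet` is a hypothetical stand-in for a Redis sorted set, and the scoring rule in `priority_score` (messages containing a question mark jump ahead by 60 seconds) is invented for the example, since the actual scoring formula is not described here.

```python
class InMemoryZSet:
    """Hypothetical stand-in for a Redis sorted set: member -> score."""

    def __init__(self):
        self._scores = {}

    def add_if_absent(self, member, score):
        # Behaves like ZADD with the NX flag: existing members are left untouched.
        if member in self._scores:
            return False
        self._scores[member] = score
        return True

    def pop_min(self):
        # Behaves like ZPOPMIN: remove and return the lowest-scored member.
        if not self._scores:
            return None
        member = min(self._scores, key=self._scores.get)
        return member, self._scores.pop(member)


def priority_score(published_at, looks_like_question):
    # Lower score is dequeued sooner. The 60-second boost is an assumption
    # made for this sketch, not the project's real scoring rule.
    return published_at - (60.0 if looks_like_question else 0.0)


def enqueue(queue, seen_ids, message):
    """Deduplicate by message id, then enqueue with a timestamp-based score."""
    if message["id"] in seen_ids:
        return False  # already ingested during an earlier poll
    seen_ids.add(message["id"])
    score = priority_score(message["published_at"], "?" in message["text"])
    queue.add_if_absent(message["id"], score)
    return True
```

Keeping a separate `seen_ids` set matters because a popped member leaves the sorted set, so the set alone cannot catch a message that a later poll returns again.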
The classification worker sends each message to Gemini AI with a carefully crafted prompt that distinguishes genuine student questions from noise (greetings, emojis, off-topic messages, spam). Only messages classified as questions proceed to the next stage. A content moderation layer filters out inappropriate content before and after classification.
The embeddings worker converts each classified question into a 768-dimensional vector using Gemini's embedding model. These vectors capture the semantic meaning of the question — so "What is a linked list?" and "Explain linked lists" produce vectors that are close together in vector space. Embeddings are stored in PostgreSQL using the pgvector extension.
The clustering worker groups questions as they arrive, using an online nearest-centroid algorithm:
- Take the new question's embedding vector
- Compute cosine distance against all existing cluster centroids in the session using pgvector
- If the nearest centroid is within a similarity threshold → assign the question to that cluster and recompute the centroid as the running mean
- If no cluster is close enough → seed a new cluster with this question as the initial centroid
This all happens in a single atomic database transaction — no batch reprocessing, no scheduled re-clustering jobs. Every question is clustered the moment it arrives.
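The steps above can be sketched in plain Python. This is a minimal in-memory illustration, not the worker's actual code: the real system computes distances with pgvector inside a database transaction, and the `max_distance` value of 0.25 is an assumed threshold chosen only for the example. Embeddings are assumed to be non-zero vectors.

```python
import math


def cosine_distance(a, b):
    """1 - cosine similarity; assumes neither vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm


def assign(clusters, embedding, max_distance=0.25):
    """Assign an embedding to the nearest cluster or seed a new one.

    clusters is a list of dicts with keys "centroid" and "count".
    Returns the index of the cluster the embedding ended up in.
    """
    best_idx, best_dist = None, float("inf")
    for i, cluster in enumerate(clusters):
        dist = cosine_distance(cluster["centroid"], embedding)
        if dist < best_dist:
            best_idx, best_dist = i, dist

    if best_idx is not None and best_dist <= max_distance:
        cluster = clusters[best_idx]
        n = cluster["count"]
        # Running mean: new_centroid = (old_centroid * n + x) / (n + 1)
        cluster["centroid"] = [
            (m * n + x) / (n + 1) for m, x in zip(cluster["centroid"], embedding)
        ]
        cluster["count"] = n + 1
        return best_idx

    # No cluster is close enough: this question becomes a new centroid.
    clusters.append({"centroid": list(embedding), "count": 1})
    return len(clusters) - 1
```

The running-mean update is what makes the algorithm online: the centroid is updated from its previous value and the member count alone, so earlier questions never need to be re-read.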
Answer generation triggers automatically at milestone counts (3, 10, 25 questions in a cluster), ensuring answers are generated only when enough similar questions accumulate to warrant a response.
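One way to express the milestone check is to compare a cluster's size before and after an assignment. The function name and the before/after comparison are assumptions made for this sketch; only the milestone values (3, 10, 25) come from the description above.

```python
MILESTONES = (3, 10, 25)


def crossed_milestone(previous_count, new_count):
    """Return the highest milestone passed by this update, or None."""
    hit = [m for m in MILESTONES if previous_count < m <= new_count]
    return max(hit) if hit else None
```

Comparing the two counts, rather than testing `new_count in MILESTONES`, means each milestone fires at most once per cluster even if a message is retried after the count has already been updated.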
The answer generation worker doesn't just ask Gemini to answer the question — it uses Retrieval-Augmented Generation (RAG):
- Takes the cluster's centroid vector (not individual question vectors — the centroid represents the cluster's theme better)
- Searches the teacher's uploaded documents (PDF, DOCX, TXT) using pgvector cosine similarity
- Retrieves the most relevant document chunks
- Sends the question + retrieved context to Gemini to generate a grounded answer
Critically, RAG retrieval is teacher-scoped — it only searches documents uploaded by the session's owner, ensuring data isolation between teachers.
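The retrieval step and its teacher scoping can be illustrated with an in-memory list of chunks. This is a sketch only: the chunk dictionary layout, `retrieve_chunks`, and the `top_k` default are hypothetical, and the real system performs the similarity search with pgvector rather than in Python.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity; assumes neither vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


def retrieve_chunks(chunks, centroid, teacher_id, top_k=3):
    """Return the top_k chunks most similar to the cluster centroid.

    Chunks owned by other teachers are filtered out before ranking,
    so they can never appear in the context sent to the model.
    """
    owned = [c for c in chunks if c["teacher_id"] == teacher_id]
    ranked = sorted(
        owned,
        key=lambda c: cosine_similarity(c["embedding"], centroid),
        reverse=True,
    )
    return ranked[:top_k]
```

Filtering by owner before ranking, rather than after, is the design point: a highly similar chunk from another teacher is excluded before any scoring happens.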
Generated answers are pushed to the teacher's dashboard over WebSocket with exponential backoff reconnection. The teacher can:
- Review clustered questions and their AI-generated answers
- Approve answers to be posted back to the YouTube live chat
- Upload additional reference documents to improve answer quality
- View analytics on question patterns and engagement
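The exponential backoff used for WebSocket reconnection follows a simple schedule. The dashboard itself is a React app, so the Python below only illustrates the arithmetic; the base delay of 1 second and the 30-second cap are assumed values, not settings taken from the project.

```python
def reconnect_delay(attempt, base=1.0, cap=30.0):
    """Seconds to wait before reconnect attempt number `attempt` (0-based)."""
    return min(cap, base * 2 ** attempt)


def reconnect_schedule(max_attempts, base=1.0, cap=30.0):
    """Full list of delays for a run of consecutive failed attempts."""
    return [reconnect_delay(n, base, cap) for n in range(max_attempts)]
```

With these assumed defaults, six consecutive failures wait 1, 2, 4, 8, 16, and then 30 seconds. The cap keeps a long outage from pushing the retry interval out indefinitely.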
YouTube Live Chat
│
▼
youtube_polling worker ──► Redis ZSET Queue (priority scoring)
│
┌───────────┼───────────┐
▼ ▼ ▼
classification embeddings (retry/DLQ)
│ │
▼ ▼
Gemini AI pgvector (768-dim)
(question?) (vector store)
│ │
└─────┬─────┘
▼
clustering worker
(nearest-centroid, cosine distance)
│
milestone trigger (3/10/25)
│
▼
answer_generation worker
(RAG: centroid → doc search → Gemini)
│ │
▼ ▼
WebSocket push youtube_posting worker
│ │
▼ ▼
Teacher Dashboard YouTube Live Chat
The system runs 6 independent workers connected by Redis ZSET queues:
| Worker | Input | Output | What It Does |
|---|---|---|---|
| `youtube_polling` | YouTube API | Redis queue | Polls live chat every second, deduplicates messages |
| `classification` | Raw messages | Classified messages | Gemini determines if message is a genuine question |
| `embeddings` | Questions | 768-dim vectors | Gemini generates semantic embedding vectors |
| `clustering` | Vectors | Cluster assignments | Nearest-centroid grouping via pgvector cosine distance |
| `answer_generation` | Cluster milestones | AI answers | RAG retrieval + Gemini answer generation |
| `youtube_posting` | Approved answers | YouTube chat | Posts teacher-approved answers back to the stream |
Every worker has:
- Circuit breaker on Gemini API calls — trips open on sustained failures, exports state to Prometheus
- Dead Letter Queue (DLQ) after 3 retries
- Priority scoring in Redis ZSET queues
- Prometheus metrics for monitoring throughput, latency, and error rates
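The circuit breaker behaviour listed above can be sketched as a small state machine. This is an illustrative version under assumptions: the class name, the `failure_threshold` and `reset_timeout` defaults, and the use of `RuntimeError` are all hypothetical, and the Prometheus export is omitted.

```python
import time


class CircuitBreaker:
    """Closed -> open after repeated failures -> half-open after a timeout."""

    def __init__(self, failure_threshold=5, reset_timeout=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self._clock = clock  # injectable so tests can control time
        self._failures = 0
        self._opened_at = None

    @property
    def state(self):
        if self._opened_at is None:
            return "closed"
        if self._clock() - self._opened_at >= self.reset_timeout:
            return "half_open"  # allow one trial call through
        return "open"

    def call(self, func, *args, **kwargs):
        if self.state == "open":
            raise RuntimeError("circuit open: call skipped")
        try:
            result = func(*args, **kwargs)
        except Exception:
            self._failures += 1
            # Trip on reaching the threshold, or re-open after a failed trial call.
            if self._failures >= self.failure_threshold or self._opened_at is not None:
                self._opened_at = self._clock()
            raise
        # Any success closes the circuit and clears the failure count.
        self._failures = 0
        self._opened_at = None
        return result
```

A worker would wrap each outbound model call as `breaker.call(some_gemini_request, payload)`, where `some_gemini_request` is a placeholder for whatever client function the worker uses. While the circuit is open, calls fail immediately instead of waiting on an unhealthy upstream.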
| Layer | Technology | Why |
|---|---|---|
| Backend API | FastAPI (Python) | Async-first, auto-generated OpenAPI docs, WebSocket support |
| Database | PostgreSQL + pgvector | ACID transactions + vector similarity search in one database |
| Queue | Redis (ZSET) | Priority queues, pub/sub for WebSocket events, rate limiting |
| AI | Google Gemini | Classification, embeddings (768-dim), and answer generation |
| Frontend | React 19 + Vite | Component-based UI with real-time WebSocket updates |
| Chrome Extension | TypeScript + Vite | Currently in development — browser-native YouTube integration |
| Auth | JWT + bcrypt | Stateless authentication with token blacklisting |
| Infrastructure | Docker Compose | Single-command local development stack |
| Cloud (IaC) | Terraform | Infrastructure definitions for API, DB, Redis, monitoring |
| Observability | Prometheus + Grafana | Metrics, alerting rules, and dashboards |
| Migrations | Alembic | Version-controlled database schema changes |
| Feature | Details |
|---|---|
| Real-time question clustering | Online nearest-centroid algorithm — no batch jobs, clusters update with every incoming question |
| RAG-augmented answers | Answers grounded in teacher-uploaded documents (PDF, DOCX, TXT), scoped per teacher |
| YouTube integration | OAuth-based live chat polling + answer posting back to the stream |
| Content moderation | Gemini-powered filtering at two stages: before classification and before YouTube posting |
| WebSocket dashboard | Real-time updates with exponential backoff reconnection and 100-message cap |
| Teacher data isolation | Every endpoint enforces ownership; RAG retrieval is scoped per teacher's documents |
| Circuit breaker pattern | All Gemini calls protected — trips open on sustained failures, auto-recovers |
| Observability | Prometheus metrics on every worker, structured JSON logging, Grafana dashboards |
| Scheduled maintenance | Automatic daily YouTube quota reset and hourly expired token cleanup |
| Chrome extension | In development — TypeScript extension for direct YouTube page integration |
We're actively building a Chrome extension that integrates directly into the YouTube page:
- Background service workers for auth, WebSocket connection, YouTube polling, and quota management
- Content script injection into YouTube live stream pages
- Dashboard UI built with React + TypeScript
- OAuth flow handled natively in the browser
The extension will allow teachers to use StreamMind without leaving the YouTube page — the clustering dashboard overlays directly on the stream.
chrome-extension/
├── manifest.json # Extension manifest
├── src/
│ ├── background/ # Service workers (auth, websocket, polling)
│ ├── content/ # YouTube page injection
│ ├── dashboard/ # React dashboard components
│ ├── api/ # Backend API client
│ └── types/ # Shared TypeScript types
teachers
├── id, email, hashed_password
└── created_at
streaming_sessions
├── id, teacher_id (FK)
├── youtube_video_id, live_chat_id
├── title, status (active/ended)
└── created_at, ended_at
comments
├── id, session_id (FK), cluster_id (FK)
├── author, text, youtube_message_id
├── is_question, embedding (vector 768)
└── created_at
clusters
├── id, session_id (FK)
├── label, summary
├── centroid (vector 768), question_count
└── created_at
answers
├── id, cluster_id (FK)
├── text, status (pending/approved/posted)
├── milestone_trigger, sources
└── created_at
rag_documents
├── id, teacher_id (FK)
├── filename, content_chunks
├── chunk_embeddings (vector 768)
└── uploaded_at
StreamMind/
├── backend/
│ ├── app/
│ │ ├── api/v1/ # REST + WebSocket endpoints
│ │ ├── core/ # Config, security, middleware, rate limiting
│ │ ├── db/ # Models, session management
│ │ ├── schemas/ # Pydantic request/response models
│ │ ├── services/ # Gemini, RAG, YouTube, WebSocket, moderation
│ │ ├── tasks/ # Scheduled jobs (quota reset, token cleanup)
│ │ └── utils/ # Retry logic
│ ├── alembic/ # Database migrations
│ ├── tests/ # API and integration tests
│ └── requirements.txt
├── frontend/
│ ├── src/
│ │ ├── components/ # Dashboard, Auth, Layout, Toast
│ │ ├── context/ # Auth + Theme providers
│ │ ├── hooks/ # useWebSocket, useAuth, useToast
│ │ ├── pages/ # Landing, Dashboard, Login, Register, Settings
│ │ └── services/ # API client
│ └── package.json
├── chrome-extension/ # TypeScript Chrome extension (in development)
├── workers/
│ ├── classification/ # Question vs noise classifier
│ ├── embeddings/ # Vector embedding generator
│ ├── clustering/ # Nearest-centroid clustering
│ ├── answer_generation/ # RAG + Gemini answer generation
│ ├── youtube_polling/ # Live chat ingestion
│ ├── youtube_posting/ # Answer posting back to YouTube
│ ├── scheduler/ # Cron-based maintenance tasks
│ ├── common/ # Shared DB, Redis, queue, metrics
│ └── tests/ # Worker unit + integration tests
├── shared/ # Contracts, schemas, constants
├── infra/
│ ├── docker/ # Dockerfiles for API and workers
│ ├── terraform/ # Cloud infrastructure definitions
│ └── prometheus/ # Alert rules
├── scripts/ # Migration, seeding, load testing
├── docs/ # Comprehensive documentation
├── docker-compose.yml
├── Makefile
└── start_dev.sh # tmux-based dev launcher (9 panes)
cp .env.example .env
# Fill in GEMINI_API_KEY, SECRET_KEY, YouTube OAuth credentials
docker-compose up
cd backend && alembic upgrade head

# Prerequisites: Python 3.13+, Node.js 20+, PostgreSQL 15+ (pgvector), Redis 7+
cp .env.example .env.development
cd backend && python -m venv venv && source venv/bin/activate && pip install -r requirements.txt
cd ../frontend && npm install
make migrate
./start_dev.sh  # Opens tmux with 9 panes: API + 6 workers + scheduler + Vite

Visit http://localhost:5173
| Variable | Description |
|---|---|
| `SECRET_KEY` | Random secret for JWT signing |
| `GEMINI_API_KEY` | Google Gemini API key |
| `YOUTUBE_CLIENT_ID` | Google OAuth client ID |
| `YOUTUBE_CLIENT_SECRET` | Google OAuth client secret |
| `DATABASE_URL` | PostgreSQL connection string |
| `REDIS_URL` | Redis connection string |
See .env.example for the full list.
| Method | Path | Description |
|---|---|---|
| `GET` | `/health` | Health check |
| `POST` | `/api/v1/auth/register` | Register a new teacher |
| `POST` | `/api/v1/auth/login` | Authenticate, returns JWT |
| `GET` | `/api/v1/auth/me` | Get current authenticated teacher |
| `GET` | `/api/v1/sessions` | List teacher's sessions |
| `POST` | `/api/v1/sessions` | Create a new streaming session |
| `GET` | `/api/v1/sessions/{id}/clusters` | List question clusters for a session |
| `GET` | `/api/v1/sessions/{id}/analytics` | Get aggregate session analytics |
| `POST` | `/api/v1/dashboard/sessions/{id}/manual-question` | Submit a manual question |
| `POST` | `/api/v1/dashboard/answers/{id}/approve` | Approve an AI-generated answer |
| `GET` | `/api/v1/dashboard/sessions/{id}/stats` | Get session stats |
| `POST` | `/api/v1/rag/documents` | Upload a document for RAG retrieval |
| `GET` | `/api/v1/youtube/auth/url` | Start YouTube OAuth flow |
| `WS` | `/ws/{session_id}` | Real-time event stream |
Full interactive API docs at http://localhost:8000/docs when running.
make format # auto-format code
make lint # run linters
make test     # run test suite

- No production deployment config: docker-compose is development-oriented; nginx and a production Dockerfile are not included
- Chrome extension: functional but still in active development
- YouTube quota: the YouTube Data API v3 has daily quota limits; high-traffic sessions may hit them
- Single-region: no multi-region or horizontal scaling configuration
MIT

