sidd707/streammind-live-qa

StreamMind

Real-time semantic clustering of YouTube live chat via Gemini embeddings, pgvector cosine search, and a 6-stage Redis worker pipeline — with RAG-augmented answer generation and WebSocket delivery.

Built by Siddharth Patel and Sarthak Chauhan

The Problem

Teachers running live YouTube sessions are bombarded with hundreds of chat messages per minute — most are noise (emojis, greetings, spam), but buried in the flood are genuine student questions. A teacher can't possibly read every message, let alone answer the important ones. Questions get lost, students feel ignored, and the learning experience suffers.

StreamMind solves this. It watches the live chat in real time, uses Gemini AI to separate genuine questions from the noise, clusters similar questions together (so "What is recursion?" and "Can you explain recursive functions?" become one group), and generates grounded answers using the teacher's own uploaded materials. The result is a real-time dashboard that turns an unreadable chat stream into an organized, actionable Q&A feed.

Screenshots

Landing page (light and dark mode)

Live dashboard: real-time question clustering, AI answers, and YouTube integration

How It Works — Step by Step

1. Connect YouTube

The teacher authenticates with Google OAuth, links their YouTube live stream, and starts a new session. The system captures the live chat ID and begins monitoring.

2. Ingest Chat Messages

A YouTube polling worker hits the YouTube Data API every second, pulling new chat messages into a Redis ZSET queue with priority scoring. Messages are deduplicated and timestamped before entering the pipeline.

3. Classify Messages

The classification worker sends each message to Gemini AI with a carefully crafted prompt that distinguishes genuine student questions from noise (greetings, emojis, off-topic messages, spam). Only messages classified as questions proceed to the next stage. A content moderation layer filters out inappropriate content before and after classification.
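The actual prompt is not shown here, so the following is a hypothetical sketch of the classify step: build a prompt that asks for a JSON verdict, call the model, and parse defensively so a malformed reply drops the message instead of crashing the worker. `generate` stands in for whatever Gemini client call the worker makes; it is injected as a plain callable (an assumption) so the sketch runs without an API key.

```python
import json
from typing import Callable

FENCE = "`" * 3  # models sometimes wrap JSON replies in a Markdown code fence

CLASSIFY_PROMPT = """You are filtering a YouTube live chat for a teacher.
Decide whether the message below is a genuine student question about the lesson.
Greetings, emojis, spam and off-topic chatter are NOT questions.
Reply with JSON only: {{"is_question": true}} or {{"is_question": false}}.

Message: {message}"""


def classify(message: str, generate: Callable[[str], str]) -> bool:
    """Return True only if the model explicitly labels the message a question."""
    reply = generate(CLASSIFY_PROMPT.format(message=message)).strip()
    if reply.startswith(FENCE):
        # Strip the fence and an optional "json" language tag.
        reply = reply.strip("`").removeprefix("json").strip()
    try:
        verdict = json.loads(reply)
    except json.JSONDecodeError:
        return False  # unparseable reply: treat as noise rather than crash
    return isinstance(verdict, dict) and verdict.get("is_question") is True
```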

4. Generate Embeddings

The embeddings worker converts each classified question into a 768-dimensional vector using Gemini's embedding model. These vectors capture the semantic meaning of the question — so "What is a linked list?" and "Explain linked lists" produce vectors that are close together in vector space. Embeddings are stored in PostgreSQL using the pgvector extension.
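"Close together" here means small cosine distance. pgvector's `<=>` operator returns cosine distance, defined as 1 minus cosine similarity, so paraphrases land near 0 and unrelated messages near 1. A pure-Python equivalent, plus the text literal format pgvector accepts for a `vector` value, makes the numbers concrete. The 3-dimensional vectors are toy values chosen for illustration; the real embeddings have 768 dimensions.

```python
import math


def cosine_distance(a: list[float], b: list[float]) -> float:
    """Same quantity as pgvector's `a <=> b`: 1 - cosine similarity."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def to_pgvector_literal(v: list[float]) -> str:
    """Text form pgvector parses for a vector column, e.g. '[0.1,0.2,0.3]'."""
    return "[" + ",".join(repr(float(x)) for x in v) + "]"


# Toy 3-d stand-ins for two paraphrases and one unrelated message.
linked_list_q = [0.9, 0.1, 0.0]         # "What is a linked list?"
explain_linked_lists = [0.8, 0.2, 0.1]  # "Explain linked lists"
unrelated = [0.0, 0.1, 0.9]             # "lol nice stream"

print(round(cosine_distance(linked_list_q, explain_linked_lists), 3))  # 0.016
print(round(cosine_distance(linked_list_q, unrelated), 3))             # 0.988
print(to_pgvector_literal(linked_list_q))                              # [0.9,0.1,0.0]
```

In SQL, the same comparison is expressed as `ORDER BY embedding <=> '[...]'`, which pgvector can accelerate with an index on the vector column.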

5. Cluster Similar Questions

The clustering worker, the core of the pipeline, uses an online nearest-centroid algorithm:

  1. Take the new question's embedding vector
  2. Compute cosine distance against all existing cluster centroids in the session using pgvector
  3. If the nearest centroid's cosine distance is within the threshold → assign the question to that cluster and update the centroid to the running mean of the cluster's question embeddings
  4. If no cluster is close enough → seed a new cluster with this question as the initial centroid

This all happens in a single atomic database transaction — no batch reprocessing, no scheduled re-clustering jobs. Every question is clustered the moment it arrives.
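The four steps can be sketched in memory as follows. The threshold value (0.25 cosine distance) and the `Cluster` / `SessionClusters` names are assumptions; in the project the nearest-centroid lookup is a pgvector query inside one database transaction, which this sketch does not model. The centroid update uses the incremental form of the running mean, c_n = c_(n-1) + (x - c_(n-1)) / n, which equals the plain average of all n member vectors.

```python
import math
from dataclasses import dataclass, field

DISTANCE_THRESHOLD = 0.25  # hypothetical: max cosine distance for joining a cluster


def cosine_distance(a: list[float], b: list[float]) -> float:
    """1 - cosine similarity, the quantity pgvector's <=> operator returns."""
    dot = sum(x * y for x, y in zip(a, b))
    norms = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norms


@dataclass
class Cluster:
    id: int
    centroid: list[float]
    question_count: int = 1


@dataclass
class SessionClusters:
    """All clusters of one streaming session, held in memory for this sketch."""
    clusters: list[Cluster] = field(default_factory=list)

    def assign(self, embedding: list[float]) -> Cluster:
        # Steps 1-2: find the nearest existing centroid by cosine distance.
        best, best_dist = None, math.inf
        for cluster in self.clusters:
            dist = cosine_distance(embedding, cluster.centroid)
            if dist < best_dist:
                best, best_dist = cluster, dist
        if best is not None and best_dist <= DISTANCE_THRESHOLD:
            # Step 3: join the cluster and move its centroid to the new running mean.
            best.question_count += 1
            n = best.question_count
            best.centroid = [c + (x - c) / n for c, x in zip(best.centroid, embedding)]
            return best
        # Step 4: nothing is close enough, so seed a new cluster from this question.
        new = Cluster(id=len(self.clusters) + 1, centroid=list(embedding))
        self.clusters.append(new)
        return new
```

Because cosine distance ignores vector length, the averaged centroid does not need re-normalizing before the next lookup.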

Answer generation triggers automatically at milestone counts (3, 10, 25 questions in a cluster), ensuring answers are generated only when enough similar questions accumulate to warrant a response.

6. Generate RAG-Augmented Answers

The answer generation worker doesn't just ask Gemini to answer the question — it uses Retrieval-Augmented Generation (RAG):

  1. Takes the cluster's centroid vector (not individual question vectors — the centroid represents the cluster's theme better)
  2. Searches the teacher's uploaded documents (PDF, DOCX, TXT) using pgvector cosine similarity
  3. Retrieves the most relevant document chunks
  4. Sends the question + retrieved context to Gemini to generate a grounded answer

Critically, RAG retrieval is teacher-scoped — it only searches documents uploaded by the session's owner, ensuring data isolation between teachers.
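A sketch of the retrieval and prompt-assembly steps, with the pgvector search replaced by an in-memory scan so it runs standalone. The `DocumentChunk` type, `top_k=3`, and the prompt wording are assumptions. Filtering on `teacher_id` before ranking mirrors the teacher-scoped retrieval described above; in SQL that would correspond to a `WHERE teacher_id = ...` clause alongside `ORDER BY embedding <=> centroid LIMIT k`.

```python
import math
from dataclasses import dataclass


@dataclass
class DocumentChunk:
    teacher_id: int
    filename: str
    text: str
    embedding: list[float]


def cosine_similarity(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def retrieve(chunks: list[DocumentChunk], teacher_id: int,
             centroid: list[float], top_k: int = 3) -> list[DocumentChunk]:
    """Rank only this teacher's chunks by similarity to the cluster centroid."""
    own = [c for c in chunks if c.teacher_id == teacher_id]  # isolation before ranking
    own.sort(key=lambda c: cosine_similarity(c.embedding, centroid), reverse=True)
    return own[:top_k]


def build_prompt(question: str, context: list[DocumentChunk]) -> str:
    """Combine the question with retrieved chunks, labelled by source file."""
    sources = "\n\n".join(f"[{c.filename}]\n{c.text}" for c in context)
    return (
        "Answer the student question using only the course material below. "
        "If the material does not cover it, say so.\n\n"
        f"Course material:\n{sources}\n\nQuestion: {question}"
    )
```

Keeping the filenames in the prompt also gives the answer a natural way to cite its sources.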

7. Deliver in Real-Time

Generated answers are pushed to the teacher's dashboard over a WebSocket, and the dashboard reconnects with exponential backoff if the connection drops. The teacher can:

  • Review clustered questions and their AI-generated answers
  • Approve answers to be posted back to the YouTube live chat
  • Upload additional reference documents to improve answer quality
  • View analytics on question patterns and engagement

Architecture

YouTube Live Chat
      │
      ▼
youtube_polling worker ──► Redis ZSET queues (priority scoring; retry → DLQ)
                                 │
                                 ▼
                        classification worker ──► Gemini AI (question or noise?)
                                 │  questions only
                                 ▼
                        embeddings worker ──► Gemini embedding model
                                 │
                                 ▼
                        pgvector (768-dim vector store)
                                 │
                                 ▼
                        clustering worker
                        (nearest-centroid, cosine distance)
                                 │
                        milestone trigger (3/10/25)
                                 │
                                 ▼
                        answer_generation worker
                        (RAG: centroid → doc search → Gemini)
                                 │
                                 ▼
                        WebSocket push ──► Teacher Dashboard
                                                 │  teacher approves
                                                 ▼
                                          youtube_posting worker
                                                 │
                                                 ▼
                                          YouTube Live Chat

Worker Pipeline Deep Dive

The system runs 6 independent workers connected by Redis ZSET queues:

| Worker | Input | Output | What It Does |
|---|---|---|---|
| youtube_polling | YouTube API | Redis queue | Polls live chat every second, deduplicates messages |
| classification | Raw messages | Classified messages | Gemini determines if message is a genuine question |
| embeddings | Questions | 768-dim vectors | Gemini generates semantic embedding vectors |
| clustering | Vectors | Cluster assignments | Nearest-centroid grouping via pgvector cosine distance |
| answer_generation | Cluster milestones | AI answers | RAG retrieval + Gemini answer generation |
| youtube_posting | Approved answers | YouTube chat | Posts teacher-approved answers back to the stream |

Every worker has:

  • Circuit breaker on Gemini API calls — trips open on sustained failures, exports state to Prometheus
  • Dead Letter Queue (DLQ) for messages that still fail after 3 retries
  • Priority scoring in Redis ZSET queues
  • Prometheus metrics for monitoring throughput, latency, and error rates

Tech Stack

| Layer | Technology | Why |
|---|---|---|
| Backend API | FastAPI (Python) | Async-first, auto-generated OpenAPI docs, WebSocket support |
| Database | PostgreSQL + pgvector | ACID transactions + vector similarity search in one database |
| Queue | Redis (ZSET) | Priority queues, pub/sub for WebSocket events, rate limiting |
| AI | Google Gemini | Classification, embeddings (768-dim), and answer generation |
| Frontend | React 19 + Vite | Component-based UI with real-time WebSocket updates |
| Chrome Extension | TypeScript + Vite | Currently in development; browser-native YouTube integration |
| Auth | JWT + bcrypt | Stateless authentication with token blacklisting |
| Infrastructure | Docker Compose | Single-command local development stack |
| Cloud (IaC) | Terraform | Infrastructure definitions for API, DB, Redis, monitoring |
| Observability | Prometheus + Grafana | Metrics, alerting rules, and dashboards |
| Migrations | Alembic | Version-controlled database schema changes |

Features

| Feature | Details |
|---|---|
| Real-time question clustering | Online nearest-centroid algorithm; no batch jobs, clusters update with every incoming question |
| RAG-augmented answers | Answers grounded in teacher-uploaded documents (PDF, DOCX, TXT), scoped per teacher |
| YouTube integration | OAuth-based live chat polling + answer posting back to the stream |
| Content moderation | Gemini-powered filtering at two stages: before classification and before YouTube posting |
| WebSocket dashboard | Real-time updates with exponential backoff reconnection and 100-message cap |
| Teacher data isolation | Every endpoint enforces ownership; RAG retrieval is scoped per teacher's documents |
| Circuit breaker pattern | All Gemini calls protected; trips open on sustained failures, auto-recovers |
| Observability | Prometheus metrics on every worker, structured JSON logging, Grafana dashboards |
| Scheduled maintenance | Automatic daily YouTube quota reset and hourly expired token cleanup |
| Chrome extension | In development; TypeScript extension for direct YouTube page integration |

Chrome Extension (In Development)

We're actively building a Chrome extension that integrates directly into the YouTube page:

  • Background service workers for auth, WebSocket connection, YouTube polling, and quota management
  • Content script injection into YouTube live stream pages
  • Dashboard UI built with React + TypeScript
  • OAuth flow handled natively in the browser

The extension will allow teachers to use StreamMind without leaving the YouTube page — the clustering dashboard overlays directly on the stream.

chrome-extension/
├── manifest.json              # Extension manifest
├── src/
│   ├── background/            # Service workers (auth, websocket, polling)
│   ├── content/               # YouTube page injection
│   ├── dashboard/             # React dashboard components
│   ├── api/                   # Backend API client
│   └── types/                 # Shared TypeScript types

Database Schema

teachers
├── id, email, hashed_password
└── created_at

streaming_sessions
├── id, teacher_id (FK)
├── youtube_video_id, live_chat_id
├── title, status (active/ended)
└── created_at, ended_at

comments
├── id, session_id (FK), cluster_id (FK)
├── author, text, youtube_message_id
├── is_question, embedding (vector 768)
└── created_at

clusters
├── id, session_id (FK)
├── label, summary
├── centroid (vector 768), question_count
└── created_at

answers
├── id, cluster_id (FK)
├── text, status (pending/approved/posted)
├── milestone_trigger, sources
└── created_at

rag_documents
├── id, teacher_id (FK)
├── filename, content_chunks
├── chunk_embeddings (vector 768)
└── uploaded_at

Project Structure

StreamMind/
├── backend/
│   ├── app/
│   │   ├── api/v1/            # REST + WebSocket endpoints
│   │   ├── core/              # Config, security, middleware, rate limiting
│   │   ├── db/                # Models, session management
│   │   ├── schemas/           # Pydantic request/response models
│   │   ├── services/          # Gemini, RAG, YouTube, WebSocket, moderation
│   │   ├── tasks/             # Scheduled jobs (quota reset, token cleanup)
│   │   └── utils/             # Retry logic
│   ├── alembic/               # Database migrations
│   ├── tests/                 # API and integration tests
│   └── requirements.txt
├── frontend/
│   ├── src/
│   │   ├── components/        # Dashboard, Auth, Layout, Toast
│   │   ├── context/           # Auth + Theme providers
│   │   ├── hooks/             # useWebSocket, useAuth, useToast
│   │   ├── pages/             # Landing, Dashboard, Login, Register, Settings
│   │   └── services/          # API client
│   └── package.json
├── chrome-extension/           # TypeScript Chrome extension (in development)
├── workers/
│   ├── classification/        # Question vs noise classifier
│   ├── embeddings/            # Vector embedding generator
│   ├── clustering/            # Nearest-centroid clustering
│   ├── answer_generation/     # RAG + Gemini answer generation
│   ├── youtube_polling/       # Live chat ingestion
│   ├── youtube_posting/       # Answer posting back to YouTube
│   ├── scheduler/             # Cron-based maintenance tasks
│   ├── common/                # Shared DB, Redis, queue, metrics
│   └── tests/                 # Worker unit + integration tests
├── shared/                    # Contracts, schemas, constants
├── infra/
│   ├── docker/                # Dockerfiles for API and workers
│   ├── terraform/             # Cloud infrastructure definitions
│   └── prometheus/            # Alert rules
├── scripts/                   # Migration, seeding, load testing
├── docs/                      # Comprehensive documentation
├── docker-compose.yml
├── Makefile
└── start_dev.sh               # tmux-based dev launcher (9 panes)

Quick Start

With Docker

cp .env.example .env
# Fill in GEMINI_API_KEY, SECRET_KEY, YouTube OAuth credentials
docker-compose up -d    # start the stack in the background so the next command can run
cd backend && alembic upgrade head

Without Docker

# Prerequisites: Python 3.13+, Node.js 20+, PostgreSQL 15+ (pgvector), Redis 7+
cp .env.example .env.development
cd backend && python -m venv venv && source venv/bin/activate && pip install -r requirements.txt
cd ../frontend && npm install && cd ..    # back to the repo root for make and start_dev.sh
make migrate
./start_dev.sh    # Opens tmux with 9 panes: API + 6 workers + scheduler + Vite

Visit http://localhost:5173

Environment Variables

| Variable | Description |
|---|---|
| SECRET_KEY | Random secret for JWT signing |
| GEMINI_API_KEY | Google Gemini API key |
| YOUTUBE_CLIENT_ID | Google OAuth client ID |
| YOUTUBE_CLIENT_SECRET | Google OAuth client secret |
| DATABASE_URL | PostgreSQL connection string |
| REDIS_URL | Redis connection string |

See .env.example for the full list.
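A hypothetical sketch of fail-fast loading for the variables listed above, using only the standard library. The `Settings` dataclass and `load_settings` helper are illustrative; the backend's own config module under `backend/app/core/` may work differently.

```python
import os
from collections.abc import Mapping
from dataclasses import dataclass

REQUIRED = (
    "SECRET_KEY",
    "GEMINI_API_KEY",
    "YOUTUBE_CLIENT_ID",
    "YOUTUBE_CLIENT_SECRET",
    "DATABASE_URL",
    "REDIS_URL",
)


@dataclass(frozen=True)
class Settings:
    # Field order matches REQUIRED so the values can be passed positionally.
    secret_key: str
    gemini_api_key: str
    youtube_client_id: str
    youtube_client_secret: str
    database_url: str
    redis_url: str


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Fail at startup, listing every missing variable, rather than on first use."""
    missing = [name for name in REQUIRED if not env.get(name)]
    if missing:
        raise RuntimeError("missing environment variables: " + ", ".join(missing))
    return Settings(*(env[name] for name in REQUIRED))
```

One way to generate a value for `SECRET_KEY` is `python -c "import secrets; print(secrets.token_urlsafe(32))"`.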

API Endpoints

| Method | Path | Description |
|---|---|---|
| GET | /health | Health check |
| POST | /api/v1/auth/register | Register a new teacher |
| POST | /api/v1/auth/login | Authenticate, returns JWT |
| GET | /api/v1/auth/me | Get current authenticated teacher |
| GET | /api/v1/sessions | List teacher's sessions |
| POST | /api/v1/sessions | Create a new streaming session |
| GET | /api/v1/sessions/{id}/clusters | List question clusters for a session |
| GET | /api/v1/sessions/{id}/analytics | Get aggregate session analytics |
| POST | /api/v1/dashboard/sessions/{id}/manual-question | Submit a manual question |
| POST | /api/v1/dashboard/answers/{id}/approve | Approve an AI-generated answer |
| GET | /api/v1/dashboard/sessions/{id}/stats | Get session stats |
| POST | /api/v1/rag/documents | Upload a document for RAG retrieval |
| GET | /api/v1/youtube/auth/url | Start YouTube OAuth flow |
| WS | /ws/{session_id} | Real-time event stream |

Full interactive API docs at http://localhost:8000/docs when running.
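A sketch of calling the API from Python with only the standard library: log in, then list sessions with the returned JWT. The request body fields (`email`, `password`), the `access_token` response key, and the `Authorization: Bearer` header are assumptions based on common FastAPI conventions; the exact schemas are listed in the interactive docs.

```python
import json
import urllib.request
from typing import Any

BASE_URL = "http://localhost:8000"


def login_request(email: str, password: str) -> urllib.request.Request:
    # Assumed JSON body; check /docs for the real login schema.
    body = json.dumps({"email": email, "password": password}).encode()
    return urllib.request.Request(
        f"{BASE_URL}/api/v1/auth/login",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def sessions_request(token: str) -> urllib.request.Request:
    # Assumed bearer-token scheme for the JWT returned by the login endpoint.
    return urllib.request.Request(
        f"{BASE_URL}/api/v1/sessions",
        headers={"Authorization": f"Bearer {token}"},
    )


def list_sessions(email: str, password: str) -> Any:
    """Log in, then fetch the teacher's sessions (requires the API to be running)."""
    with urllib.request.urlopen(login_request(email, password)) as resp:
        token = json.load(resp)["access_token"]  # assumed response key
    with urllib.request.urlopen(sessions_request(token)) as resp:
        return json.load(resp)
```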

Development

make format   # auto-format code
make lint     # run linters
make test     # run test suite

Known Limitations

  • No production deployment config: docker-compose is development-oriented, and no nginx configuration or production Dockerfiles are included
  • Chrome extension: functional but currently in active development
  • YouTube quota: the YouTube Data API v3 has daily quota limits, and high-traffic sessions may hit them
  • Single-region: no multi-region or horizontal scaling configuration

Authors

Siddharth Patel and Sarthak Chauhan

License

MIT

About

Real-time AI-powered Q&A system for YouTube live streams — clusters student questions using Gemini embeddings and pgvector, generates RAG-augmented answers, and delivers via WebSocket dashboard.
