StreamMind

Real-time semantic clustering of YouTube live chat via Gemini embeddings, pgvector cosine search, and a 6-stage Redis worker pipeline — with RAG-augmented answer generation and WebSocket delivery.

Built by Siddharth Patel and Sarthak Chauhan

The Problem

Teachers running live YouTube sessions are bombarded with hundreds of chat messages per minute — most are noise (emojis, greetings, spam), but buried in the flood are genuine student questions. A teacher can't possibly read every message, let alone answer the important ones. Questions get lost, students feel ignored, and the learning experience suffers.

StreamMind solves this. It watches the live chat in real-time, uses Gemini AI to identify actual questions from the noise, clusters similar questions together (so "What is recursion?" and "Can you explain recursive functions?" become one group), and generates grounded answers using the teacher's own uploaded materials. The result is a real-time dashboard that turns an unreadable chat stream into an organized, actionable Q&A feed.

Screenshots


Landing page — light & dark mode

Live dashboard — real-time question clustering, AI answers, and YouTube integration

How It Works — Step by Step

1. Connect YouTube

The teacher authenticates with Google OAuth, links their YouTube live stream, and starts a new session. The system captures the live chat ID and begins monitoring.

2. Ingest Chat Messages

A YouTube polling worker hits the YouTube Data API every second, pulling new chat messages into a Redis ZSET queue with priority scoring. Messages are deduplicated and timestamped before entering the pipeline.
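The ingest step can be pictured with a small in-memory stand-in for the queue. This is only a sketch: the real pipeline uses a Redis ZSET, and the `ChatQueue` class, the `score` formula, and the priority values below are illustrative assumptions rather than StreamMind's actual code. In Redis terms, `push` would roughly correspond to `ZADD` and `pop` to `ZPOPMIN`.

```python
import heapq
from dataclasses import dataclass, field


@dataclass
class ChatQueue:
    """In-memory stand-in for a Redis ZSET queue (lowest score pops first)."""

    _heap: list = field(default_factory=list)
    _seen: set = field(default_factory=set)

    @staticmethod
    def score(priority: int, timestamp: float) -> float:
        # Hypothetical scoring: a lower priority number wins,
        # and ties within a priority band break by arrival time.
        return priority * 1e12 + timestamp

    def push(self, message_id: str, text: str, timestamp: float, priority: int = 1) -> bool:
        """Enqueue a message; return False if its ID was already seen."""
        if message_id in self._seen:
            return False  # duplicate from an overlapping poll
        self._seen.add(message_id)
        heapq.heappush(self._heap, (self.score(priority, timestamp), message_id, text))
        return True

    def pop(self):
        """Return (message_id, text) with the lowest score, or None when empty."""
        if not self._heap:
            return None
        _, message_id, text = heapq.heappop(self._heap)
        return message_id, text
```

Deduplicating on the YouTube message ID before scoring keeps a message that appears in two overlapping polls from entering the pipeline twice.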

3. Classify Messages

The classification worker sends each message to Gemini AI with a carefully crafted prompt that distinguishes genuine student questions from noise (greetings, emojis, off-topic messages, spam). Only messages classified as questions proceed to the next stage. A content moderation layer filters out inappropriate content before and after classification.
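A rough sketch of the classification call, under stated assumptions: the project's real prompt is not shown here, so `PROMPT_TEMPLATE` is a hypothetical placeholder, and `ask_model` is an injected callable standing in for whatever Gemini client the worker uses.

```python
from typing import Callable

# Hypothetical prompt; the project's real prompt is not shown in this README.
PROMPT_TEMPLATE = (
    "You are filtering a live class chat. Reply with exactly one word, "
    "QUESTION or NOISE.\n\nMessage: {message}"
)


def is_question(message: str, ask_model: Callable[[str], str]) -> bool:
    """Return True only when the model clearly labels the message a question."""
    text = message.strip()
    if not text:
        return False  # nothing to classify, so skip the model call
    reply = ask_model(PROMPT_TEMPLATE.format(message=text))
    # Fail closed: anything other than an exact QUESTION label counts as noise.
    return reply.strip().upper() == "QUESTION"
```

Passing the model in as a callable makes the parsing logic testable without network access, and treating any unexpected reply as noise keeps malformed model output out of the later stages.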

4. Generate Embeddings

The embeddings worker converts each classified question into a 768-dimensional vector using Gemini's embedding model. These vectors capture the semantic meaning of the question — so "What is a linked list?" and "Explain linked lists" produce vectors that are close together in vector space. Embeddings are stored in PostgreSQL using the pgvector extension.
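To make "close together in vector space" concrete, here is a plain-Python version of the cosine measure. pgvector computes this inside PostgreSQL (its `<=>` operator returns cosine distance), so the functions below are for illustration only and use toy 2-dimensional vectors instead of 768-dimensional embeddings.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine distance, defined as 1 - cosine similarity (range 0 to 2)."""
    return 1.0 - cosine_similarity(a, b)
```

Because the measure depends only on direction, two phrasings of the same question can score as near-identical even when their embeddings differ in magnitude.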

5. Cluster Similar Questions

The clustering worker uses an online nearest-centroid algorithm. For each new question, it performs these steps:

Take the new question's embedding vector
Compute cosine distance against all existing cluster centroids in the session using pgvector
If the nearest centroid is within the distance threshold → assign the question to that cluster and update the centroid as the running mean of its members
If no cluster is close enough → seed a new cluster with this question as the initial centroid

This all happens in a single atomic database transaction — no batch reprocessing, no scheduled re-clustering jobs. Every question is clustered the moment it arrives.
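The assignment steps above can be sketched in plain Python. This is an in-memory illustration, not the worker's code: the real lookup runs in pgvector inside a database transaction, and the `Cluster` class, the `assign` function, and the `max_distance=0.25` threshold are assumptions chosen for the example. Embeddings are assumed to be non-zero vectors.

```python
import math
from dataclasses import dataclass


def cosine_distance(a, b):
    """1 - cosine similarity; assumes neither vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norms = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norms


@dataclass
class Cluster:
    centroid: list
    question_count: int = 1


def assign(embedding, clusters, max_distance=0.25):
    """Assign an embedding to the nearest cluster or seed a new one.

    Returns the index of the cluster the question ended up in.
    """
    best_index, best_distance = None, float("inf")
    for index, cluster in enumerate(clusters):
        distance = cosine_distance(embedding, cluster.centroid)
        if distance < best_distance:
            best_index, best_distance = index, distance

    # No cluster yet, or the nearest one is too far away: seed a new cluster.
    if best_index is None or best_distance > max_distance:
        clusters.append(Cluster(centroid=list(embedding)))
        return len(clusters) - 1

    cluster = clusters[best_index]
    n = cluster.question_count
    # Running mean: new_centroid = (old_centroid * n + embedding) / (n + 1)
    cluster.centroid = [(c * n + e) / (n + 1) for c, e in zip(cluster.centroid, embedding)]
    cluster.question_count = n + 1
    return best_index
```

The running-mean update only needs the stored centroid and the member count, which is why no earlier question has to be re-read when a new one arrives.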

Answer generation triggers automatically at milestone counts (3, 10, 25 questions in a cluster), ensuring answers are generated only when enough similar questions accumulate to warrant a response.
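One way the milestone check might look, assuming the cluster's question count grows by one per assignment. The function name and signature are hypothetical; only the counts 3, 10, and 25 come from the description above.

```python
MILESTONES = (3, 10, 25)  # counts taken from the description above


def milestone_reached(previous_count: int, new_count: int):
    """Return the milestone crossed when the count grows, or None."""
    for milestone in MILESTONES:
        if previous_count < milestone <= new_count:
            return milestone
    return None
```

Comparing the previous and new counts, instead of testing only `new_count in MILESTONES`, means a retried clustering job that leaves the count unchanged does not trigger a second answer.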

6. Generate RAG-Augmented Answers

The answer generation worker doesn't just ask Gemini to answer the question — it uses Retrieval-Augmented Generation (RAG):

Takes the cluster's centroid vector (not individual question vectors — the centroid represents the cluster's theme better)
Searches the teacher's uploaded documents (PDF, DOCX, TXT) using pgvector cosine similarity
Retrieves the most relevant document chunks
Sends the question + retrieved context to Gemini to generate a grounded answer

Critically, RAG retrieval is teacher-scoped — it only searches documents uploaded by the session's owner, ensuring data isolation between teachers.
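The retrieval flow and its teacher scoping can be sketched in memory as follows. The real search runs in pgvector; the `Chunk` class, `retrieve`, `build_prompt`, the `top_k=3` default, and the prompt wording are all assumptions made for this example.

```python
import math
from dataclasses import dataclass


@dataclass
class Chunk:
    teacher_id: int
    text: str
    embedding: list


def _cosine_distance(a, b):
    """1 - cosine similarity; assumes neither vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norms = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norms


def retrieve(centroid, chunks, teacher_id, top_k=3):
    """Return the top_k chunks closest to the centroid, owned by teacher_id."""
    # Teacher scoping: drop other teachers' chunks before ranking.
    owned = [c for c in chunks if c.teacher_id == teacher_id]
    owned.sort(key=lambda c: _cosine_distance(centroid, c.embedding))
    return owned[:top_k]


def build_prompt(question, context_chunks):
    """Combine the retrieved context and the question into one prompt."""
    context = "\n---\n".join(c.text for c in context_chunks)
    return (
        "Answer the student's question using only the context below.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )
```

Filtering by owner before ranking, as opposed to ranking everything and filtering afterwards, means another teacher's document can never occupy a top-k slot.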

7. Deliver in Real-Time

Generated answers are pushed to the teacher's dashboard over WebSocket, and the dashboard reconnects with exponential backoff if the connection drops. The teacher can:

Review clustered questions and their AI-generated answers
Approve answers to be posted back to the YouTube live chat
Upload additional reference documents to improve answer quality
View analytics on question patterns and engagement
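The exponential backoff used for WebSocket reconnection follows a simple doubling schedule. The sketch below shows the idea in Python for consistency with the backend; the `base`, `cap`, and `jitter` parameters and their defaults are assumptions, not values taken from the project.

```python
import random


def backoff_delay(attempt: int, base: float = 1.0, cap: float = 30.0, jitter: float = 0.0) -> float:
    """Delay in seconds before reconnect attempt number `attempt` (0-based)."""
    if attempt < 0:
        raise ValueError("attempt must be non-negative")
    # Double the delay on each attempt, up to `cap`. The exponent is clamped
    # so very large attempt numbers cannot overflow the float conversion.
    delay = min(cap, base * 2 ** min(attempt, 32))
    # Optional jitter spreads reconnects so many clients do not retry in lockstep.
    return delay + random.uniform(0.0, jitter * delay)
```

With the defaults, the delays for the first six attempts are 1, 2, 4, 8, 16, and 30 seconds.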

Architecture

YouTube Live Chat
      │
      ▼
youtube_polling worker  ──► Redis ZSET Queue (priority scoring)
                                │
                    ┌───────────┼───────────┐
                    ▼           ▼           ▼
            classification  embeddings  (retry/DLQ)
                    │           │
                    ▼           ▼
            Gemini AI       pgvector (768-dim)
            (question?)     (vector store)
                    │           │
                    └─────┬─────┘
                          ▼
                    clustering worker
                    (nearest-centroid, cosine distance)
                          │
                    milestone trigger (3/10/25)
                          │
                          ▼
               answer_generation worker
               (RAG: centroid → doc search → Gemini)
                    │           │
                    ▼           ▼
            WebSocket push   youtube_posting worker
                    │               │
                    ▼               ▼
            Teacher Dashboard   YouTube Live Chat

Worker Pipeline Deep Dive

The system runs 6 independent workers connected by Redis ZSET queues:

Worker	Input	Output	What It Does
`youtube_polling`	YouTube API	Redis queue	Polls live chat every second, deduplicates messages
`classification`	Raw messages	Classified messages	Gemini determines if message is a genuine question
`embeddings`	Questions	768-dim vectors	Gemini generates semantic embedding vectors
`clustering`	Vectors	Cluster assignments	Nearest-centroid grouping via pgvector cosine distance
`answer_generation`	Cluster milestones	AI answers	RAG retrieval + Gemini answer generation
`youtube_posting`	Approved answers	YouTube chat	Posts teacher-approved answers back to the stream

Every worker has:

Circuit breaker on Gemini API calls — trips open on sustained failures, exports state to Prometheus
Dead Letter Queue (DLQ) after 3 retries
Priority scoring in Redis ZSET queues
Prometheus metrics for monitoring throughput, latency, and error rates
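A minimal sketch of the circuit breaker pattern listed above, assuming a consecutive-failure threshold and a fixed cooldown. The class, its parameters, and the state names are illustrative; the project's own breaker and its Prometheus export are not shown here.

```python
import time


class CircuitBreaker:
    """Closed -> open after N consecutive failures -> half-open after a cooldown."""

    def __init__(self, failure_threshold=5, reset_timeout=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self._clock = clock  # injectable so tests can control time
        self._failures = 0
        self._opened_at = None

    @property
    def state(self):
        if self._opened_at is None:
            return "closed"
        if self._clock() - self._opened_at >= self.reset_timeout:
            return "half_open"  # cooldown elapsed: allow one trial call
        return "open"

    def call(self, func, *args, **kwargs):
        """Run func through the breaker, failing fast while the circuit is open."""
        if self.state == "open":
            raise RuntimeError("circuit open: call skipped")
        try:
            result = func(*args, **kwargs)
        except Exception:
            self._failures += 1
            # A failed trial call, or too many consecutive failures, (re)opens the circuit.
            if self._opened_at is not None or self._failures >= self.failure_threshold:
                self._opened_at = self._clock()
            raise
        # Any success closes the circuit and clears the failure count.
        self._failures = 0
        self._opened_at = None
        return result
```

A worker would wrap each Gemini call as `breaker.call(some_gemini_function, payload)`, where `some_gemini_function` is a placeholder name, and could expose `breaker.state` as a metric.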

Tech Stack

Layer	Technology	Why
Backend API	FastAPI (Python)	Async-first, auto-generated OpenAPI docs, WebSocket support
Database	PostgreSQL + pgvector	ACID transactions + vector similarity search in one database
Queue	Redis (ZSET)	Priority queues, pub/sub for WebSocket events, rate limiting
AI	Google Gemini	Classification, embeddings (768-dim), and answer generation
Frontend	React 19 + Vite	Component-based UI with real-time WebSocket updates
Chrome Extension	TypeScript + Vite	Currently in development — browser-native YouTube integration
Auth	JWT + bcrypt	Stateless authentication with token blacklisting
Infrastructure	Docker Compose	Single-command local development stack
Cloud (IaC)	Terraform	Infrastructure definitions for API, DB, Redis, monitoring
Observability	Prometheus + Grafana	Metrics, alerting rules, and dashboards
Migrations	Alembic	Version-controlled database schema changes

Features

Feature	Details
Real-time question clustering	Online nearest-centroid algorithm — no batch jobs, clusters update with every incoming question
RAG-augmented answers	Answers grounded in teacher-uploaded documents (PDF, DOCX, TXT), scoped per teacher
YouTube integration	OAuth-based live chat polling + answer posting back to the stream
Content moderation	Gemini-powered filtering at two stages: before classification and before YouTube posting
WebSocket dashboard	Real-time updates with exponential backoff reconnection and 100-message cap
Teacher data isolation	Every endpoint enforces ownership; RAG retrieval is scoped per teacher's documents
Circuit breaker pattern	All Gemini calls protected — trips open on sustained failures, auto-recovers
Observability	Prometheus metrics on every worker, structured JSON logging, Grafana dashboards
Scheduled maintenance	Automatic daily YouTube quota reset and hourly expired token cleanup
Chrome extension	In development — TypeScript extension for direct YouTube page integration

Chrome Extension (In Development)

We're actively building a Chrome extension that integrates directly into the YouTube page:

Background service workers for auth, WebSocket connection, YouTube polling, and quota management
Content script injection into YouTube live stream pages
Dashboard UI built with React + TypeScript
OAuth flow handled natively in the browser

The extension will allow teachers to use StreamMind without leaving the YouTube page — the clustering dashboard overlays directly on the stream.

chrome-extension/
├── manifest.json              # Extension manifest
├── src/
│   ├── background/            # Service workers (auth, websocket, polling)
│   ├── content/               # YouTube page injection
│   ├── dashboard/             # React dashboard components
│   ├── api/                   # Backend API client
│   └── types/                 # Shared TypeScript types

Database Schema

teachers
├── id, email, hashed_password
└── created_at

streaming_sessions
├── id, teacher_id (FK)
├── youtube_video_id, live_chat_id
├── title, status (active/ended)
└── created_at, ended_at

comments
├── id, session_id (FK), cluster_id (FK)
├── author, text, youtube_message_id
├── is_question, embedding (vector 768)
└── created_at

clusters
├── id, session_id (FK)
├── label, summary
├── centroid (vector 768), question_count
└── created_at

answers
├── id, cluster_id (FK)
├── text, status (pending/approved/posted)
├── milestone_trigger, sources
└── created_at

rag_documents
├── id, teacher_id (FK)
├── filename, content_chunks
├── chunk_embeddings (vector 768)
└── uploaded_at
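
Against this schema, the nearest-centroid lookup might be expressed as below. The table and column names come from the schema above and `<=>` is pgvector's cosine distance operator; the helper names and the psycopg-style `%(name)s` placeholders are assumptions, and the project's actual query may differ.

```python
def to_pgvector(values) -> str:
    """Format a sequence of numbers as a pgvector text literal, e.g. '[0.1,0.2]'."""
    return "[" + ",".join(repr(float(v)) for v in values) + "]"


# Finds the closest cluster centroid within one session.
NEAREST_CENTROID_SQL = """
SELECT id, centroid <=> %(embedding)s::vector AS distance
FROM clusters
WHERE session_id = %(session_id)s
ORDER BY distance
LIMIT 1
"""


def nearest_centroid_params(embedding, session_id):
    """Build the parameter mapping for NEAREST_CENTROID_SQL."""
    return {"embedding": to_pgvector(embedding), "session_id": session_id}
```

Filtering on `session_id` keeps clusters from different streams separate, so a question is only ever compared with centroids from its own session.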

Project Structure

StreamMind/
├── backend/
│   ├── app/
│   │   ├── api/v1/            # REST + WebSocket endpoints
│   │   ├── core/              # Config, security, middleware, rate limiting
│   │   ├── db/                # Models, session management
│   │   ├── schemas/           # Pydantic request/response models
│   │   ├── services/          # Gemini, RAG, YouTube, WebSocket, moderation
│   │   ├── tasks/             # Scheduled jobs (quota reset, token cleanup)
│   │   └── utils/             # Retry logic
│   ├── alembic/               # Database migrations
│   ├── tests/                 # API and integration tests
│   └── requirements.txt
├── frontend/
│   ├── src/
│   │   ├── components/        # Dashboard, Auth, Layout, Toast
│   │   ├── context/           # Auth + Theme providers
│   │   ├── hooks/             # useWebSocket, useAuth, useToast
│   │   ├── pages/             # Landing, Dashboard, Login, Register, Settings
│   │   └── services/          # API client
│   └── package.json
├── chrome-extension/           # TypeScript Chrome extension (in development)
├── workers/
│   ├── classification/        # Question vs noise classifier
│   ├── embeddings/            # Vector embedding generator
│   ├── clustering/            # Nearest-centroid clustering
│   ├── answer_generation/     # RAG + Gemini answer generation
│   ├── youtube_polling/       # Live chat ingestion
│   ├── youtube_posting/       # Answer posting back to YouTube
│   ├── scheduler/             # Cron-based maintenance tasks
│   ├── common/                # Shared DB, Redis, queue, metrics
│   └── tests/                 # Worker unit + integration tests
├── shared/                    # Contracts, schemas, constants
├── infra/
│   ├── docker/                # Dockerfiles for API and workers
│   ├── terraform/             # Cloud infrastructure definitions
│   └── prometheus/            # Alert rules
├── scripts/                   # Migration, seeding, load testing
├── docs/                      # Comprehensive documentation
├── docker-compose.yml
├── Makefile
└── start_dev.sh               # tmux-based dev launcher (9 panes)

Quick Start

With Docker

cp .env.example .env
# Fill in GEMINI_API_KEY, SECRET_KEY, YouTube OAuth credentials
docker-compose up -d    # detached, so the migration command below can run in the same shell
cd backend && alembic upgrade head

Without Docker

# Prerequisites: Python 3.13+, Node.js 20+, PostgreSQL 15+ (pgvector), Redis 7+
cp .env.example .env.development
cd backend && python -m venv venv && source venv/bin/activate && pip install -r requirements.txt
cd ../frontend && npm install
make migrate
./start_dev.sh    # Opens tmux with 9 panes: API + 6 workers + scheduler + Vite

Visit http://localhost:5173

Environment Variables

Variable	Description
`SECRET_KEY`	Random secret for JWT signing
`GEMINI_API_KEY`	Google Gemini API key
`YOUTUBE_CLIENT_ID`	Google OAuth client ID
`YOUTUBE_CLIENT_SECRET`	Google OAuth client secret
`DATABASE_URL`	PostgreSQL connection string
`REDIS_URL`	Redis connection string

See .env.example for the full list.

API Endpoints

Method	Path	Description
`GET`	`/health`	Health check
`POST`	`/api/v1/auth/register`	Register a new teacher
`POST`	`/api/v1/auth/login`	Authenticate, returns JWT
`GET`	`/api/v1/auth/me`	Get current authenticated teacher
`GET`	`/api/v1/sessions`	List teacher's sessions
`POST`	`/api/v1/sessions`	Create a new streaming session
`GET`	`/api/v1/sessions/{id}/clusters`	List question clusters for a session
`GET`	`/api/v1/sessions/{id}/analytics`	Get aggregate session analytics
`POST`	`/api/v1/dashboard/sessions/{id}/manual-question`	Submit a manual question
`POST`	`/api/v1/dashboard/answers/{id}/approve`	Approve an AI-generated answer
`GET`	`/api/v1/dashboard/sessions/{id}/stats`	Get session stats
`POST`	`/api/v1/rag/documents`	Upload a document for RAG retrieval
`GET`	`/api/v1/youtube/auth/url`	Start YouTube OAuth flow
`WS`	`/ws/{session_id}`	Real-time event stream

Full interactive API docs at http://localhost:8000/docs when running.
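
As a rough illustration of calling these endpoints from Python, the sketch below builds two requests with the standard library without sending them. The paths come from the table above; the JSON body fields (`email`, `password`) and the `Bearer` authorization scheme are assumptions, so check the interactive docs for the actual request shapes.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000"  # same host as the interactive docs


def login_request(email: str, password: str) -> urllib.request.Request:
    """Build (but do not send) a login request. Body field names are assumptions."""
    body = json.dumps({"email": email, "password": password}).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/api/v1/auth/login",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def clusters_request(session_id: str, token: str) -> urllib.request.Request:
    """Build a request listing a session's clusters, authenticated with a JWT."""
    return urllib.request.Request(
        f"{BASE_URL}/api/v1/sessions/{session_id}/clusters",
        headers={"Authorization": f"Bearer {token}"},  # scheme is an assumption
        method="GET",
    )
```

With the API running, either request can be sent with `urllib.request.urlopen(request)`.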

Development

make format   # auto-format code
make lint     # run linters
make test     # run test suite

Known Limitations

No production deployment config — docker-compose is development-oriented; nginx and production Dockerfile are not included
Chrome extension — functional but currently in active development
YouTube quota — the YouTube Data API v3 has daily quota limits; high-traffic sessions may hit limits
Single-region — no multi-region or horizontal scaling configuration

Authors

Siddharth Patel
Sarthak Chauhan

License

MIT
