Conversational RAG With PDF Uploads and Chat History

A Streamlit web application that lets you upload PDF documents and have a conversational Q&A session with their content. Built using Retrieval-Augmented Generation (RAG) with LangChain, Groq, and ChromaDB.

How It Works

1. PDF Upload: Upload one or more PDF files through the web interface.
2. Document Processing: PDFs are parsed, split into chunks (5,000 characters with a 500-character overlap), embedded with HuggingFace's all-MiniLM-L6-v2 model, and stored in a ChromaDB vector store.
3. Question Contextualization: Each new question is reformulated into a standalone query using the chat history, so follow-up questions like "tell me more about that" resolve correctly.
4. Answer Generation: Relevant document chunks are retrieved from the vector store and passed to the Groq LLM (llama-3.1-8b-instant) to generate a concise answer.
5. Chat History: Conversation history is maintained per session, enabling multi-turn dialogue about the uploaded documents.
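
To make the chunking parameters concrete, here is a minimal sketch of fixed-size character chunking with overlap. It is a simplification, not the app's code: the app relies on `langchain-text-splitters`, whose splitters generally try to break on separators such as paragraphs rather than at exact character offsets. The function name `chunk_text` is hypothetical.

```python
def chunk_text(text: str, chunk_size: int = 5000, overlap: int = 500) -> list[str]:
    """Split text into fixed-size character windows that share `overlap` characters."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in [0, chunk_size)")

    step = chunk_size - overlap  # each window starts this far after the previous one
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reached the end of the text
    return chunks


# A 12,000-character sample produces windows starting at offsets 0, 4500, and 9000.
sample = "".join(chr(97 + i % 26) for i in range(12000))
chunks = chunk_text(sample)
print([len(c) for c in chunks])  # [5000, 5000, 3000]
```

The overlap means the last 500 characters of one chunk reappear at the start of the next, so a sentence that straddles a boundary is still fully contained in at least one chunk.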

Architecture

User Question
      |
      v
[ Contextualize Question ] -- chat history --> Standalone Question
      |
      v
[ ChromaDB Retriever ] -- vector search --> Relevant Document Chunks
      |
      v
[ Groq LLM (llama-3.1-8b-instant) ] -- context + question --> Answer
      |
      v
  Response displayed in Streamlit UI
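
The flow in the diagram can be sketched as three pluggable steps. This is an illustration of the data flow only, assuming each stage is a plain callable; `answer_question` and the `fake_*` stand-ins are hypothetical names, and the real app wires these stages together with LangChain, Groq, and ChromaDB rather than with these functions.

```python
from typing import Callable

Message = tuple[str, str]  # (role, text), e.g. ("human", "What is RAG?")


def answer_question(
    question: str,
    history: list[Message],
    contextualize: Callable[[str, list[Message]], str],
    retrieve: Callable[[str], list[str]],
    generate: Callable[[str, list[str]], str],
) -> str:
    """Run one turn: contextualize, retrieve, generate, then record the turn."""
    standalone = contextualize(question, history)  # rewrite using chat history
    chunks = retrieve(standalone)                  # vector search in the real app
    answer = generate(standalone, chunks)          # LLM call in the real app
    history.append(("human", question))
    history.append(("ai", answer))
    return answer


# Hypothetical stand-ins so the sketch runs without any external service.
seen_queries: list[str] = []


def fake_contextualize(question: str, history: list[Message]) -> str:
    previous = [text for role, text in history if role == "human"]
    return " ".join(previous[-1:] + [question])  # prepend the last human turn


def fake_retrieve(query: str) -> list[str]:
    seen_queries.append(query)
    return [f"chunk about: {query}"]


def fake_generate(query: str, chunks: list[str]) -> str:
    return f"answer from {len(chunks)} chunk(s)"


history: list[Message] = []
answer_question("What is RAG?", history, fake_contextualize, fake_retrieve, fake_generate)
answer_question("Tell me more", history, fake_contextualize, fake_retrieve, fake_generate)
print(seen_queries)  # ['What is RAG?', 'What is RAG? Tell me more']
```

The second query shows why contextualization matters: "Tell me more" alone carries no topic, so the retriever needs the rewritten question to find relevant chunks.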

Prerequisites

Python 3.10+
A Groq API key

Setup

Clone the repository

git clone https://github.com/sothulthorn/RAG-Q-A-Conversation.git
cd RAG-Q-A-Conversation

Create and activate a virtual environment

python -m venv .venv

# Windows
.venv\Scripts\activate

# macOS/Linux
source .venv/bin/activate

Install dependencies
```
pip install -r requirements.txt
```
Create a .env file (optional, for HuggingFace token)
```
HF_TOKEN=your_huggingface_token_here
```
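
For readers unfamiliar with `.env` files, the sketch below shows roughly what loading one involves for simple `KEY=VALUE` lines. It is a standard-library stand-in, not the app's code: the app lists `python-dotenv` as a dependency, which handles more syntax than this. The helper names `parse_env` and `load_env_file` are hypothetical.

```python
import os
from pathlib import Path


def parse_env(text: str) -> dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blanks, comments, and malformed lines."""
    values = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        key = key.strip()
        if key:
            values[key] = value.strip().strip("'\"")  # drop surrounding quotes
    return values


def load_env_file(path: str = ".env") -> None:
    """Copy values from the file into os.environ without overwriting existing ones."""
    env_path = Path(path)
    if not env_path.is_file():
        return  # the .env file is optional
    for key, value in parse_env(env_path.read_text(encoding="utf-8")).items():
        os.environ.setdefault(key, value)


load_env_file()
hf_token = os.environ.get("HF_TOKEN")  # None when no token is configured
```

Because the file is optional, code that reads `HF_TOKEN` should handle the case where it is missing.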

Usage

Start the application
```
streamlit run app.py
```
Enter your Groq API key in the password field.
Upload one or more PDF files using the file uploader.
Ask questions about the uploaded documents in the text input field.
Use Session IDs to maintain separate conversation threads (default: default_session).
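
The idea behind Session IDs can be sketched as a mapping from each ID to its own message list. This is a minimal in-memory illustration under that assumption; `SessionStore` is a hypothetical name, and how `app.py` actually stores histories is not shown in this README.

```python
class SessionStore:
    """Keep one independent message list per session ID."""

    def __init__(self) -> None:
        self._histories: dict[str, list[tuple[str, str]]] = {}

    def get_history(self, session_id: str = "default_session") -> list[tuple[str, str]]:
        # Create an empty history the first time a session ID is seen.
        return self._histories.setdefault(session_id, [])


store = SessionStore()
store.get_history().append(("human", "Summarize the first PDF."))
store.get_history("review-2").append(("human", "What does section 3 say?"))
print(len(store.get_history()), len(store.get_history("review-2")))  # 1 1
```

Switching the Session ID starts from a separate history, so follow-up questions in one thread are not contextualized against another thread's turns.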

Project Structure

RAG-Q-A-Conversation/
├── app.py              # Main Streamlit application
├── requirements.txt    # Python dependencies
├── .env                # Environment variables (create manually)
└── README.md           # This file

Dependencies

| Package | Purpose |
| --- | --- |
| `streamlit` | Web UI framework |
| `langchain-classic` | RAG chain orchestration |
| `langchain-chroma` | ChromaDB vector store integration |
| `langchain-community` | PDF loader, chat message history |
| `langchain-core` | Prompts, runnables, base classes |
| `langchain-groq` | Groq LLM integration |
| `langchain-huggingface` | HuggingFace embedding models |
| `langchain-text-splitters` | Document chunking |
| `python-dotenv` | Environment variable loading |
| `pypdf` | PDF parsing backend |
| `chromadb` | Vector database |
| `sentence-transformers` | Embedding model runtime |

Configuration

| Parameter | Value | Location |
| --- | --- | --- |
| Embedding model | `all-MiniLM-L6-v2` | `app.py:27` |
| LLM model | `llama-3.1-8b-instant` | `app.py:39` |
| Chunk size | 5000 characters | `app.py:67` |
| Chunk overlap | 500 characters | `app.py:67` |
| Max answer length | 3 sentences | System prompt |

Troubleshooting

Groq API errors

Ensure your Groq API key is valid and has available quota at console.groq.com.
