sothulthorn/RAG-Q-A-Conversation


Conversational RAG With PDF Uploads and Chat History

A Streamlit web application that lets you upload PDF documents and have a conversational Q&A session with their content. Built using Retrieval-Augmented Generation (RAG) with LangChain, Groq, and ChromaDB.

How It Works

  1. PDF Upload — Upload one or more PDF files through the web interface.
  2. Document Processing — PDFs are parsed, split into chunks (5000 chars with 500 overlap), and embedded into a ChromaDB vector store using HuggingFace's all-MiniLM-L6-v2 model.
  3. Question Contextualization — Each new question is reformulated into a standalone query using the chat history, so follow-up questions like "tell me more about that" resolve correctly.
  4. Answer Generation — Relevant document chunks are retrieved from the vector store and passed to the Groq LLM (llama-3.1-8b-instant) to generate a concise answer.
  5. Chat History — Conversation history is maintained per session, enabling multi-turn dialogue about the uploaded documents.
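
Steps 3 and 4 can be sketched as two prompt-building functions. This is a minimal, library-free illustration and not the code in app.py. The prompt wording, the (role, content) message format, and the names `build_contextualize_messages` and `build_qa_messages` are all hypothetical. The sketch also assumes that the chat history goes into both the reformulation prompt and the answer prompt, as is common in LangChain conversational-RAG setups, and that retrieved chunks are simply concatenated ("stuffed") into the answer prompt.

```python
from typing import List, Tuple

# Hypothetical message format: (role, content) pairs with "system"/"human"/"ai" roles.
Message = Tuple[str, str]

# Hypothetical prompt wording; the real prompts are defined in app.py.
CONTEXTUALIZE_SYSTEM = (
    "Given the chat history and the latest user question, which may refer to "
    "earlier turns, rewrite it as a standalone question. Do not answer it."
)
QA_SYSTEM = (
    "Answer the question using only the retrieved context below. If the answer "
    "is not in the context, say you don't know. Use at most three sentences.\n\n"
    "Context:\n{context}"
)


def build_contextualize_messages(history: List[Message], question: str) -> List[Message]:
    """Step 3: messages asking the LLM to turn a follow-up into a standalone query."""
    return [("system", CONTEXTUALIZE_SYSTEM), *history, ("human", question)]


def build_qa_messages(history: List[Message], question: str, chunks: List[str]) -> List[Message]:
    """Step 4: messages asking the LLM to answer from the retrieved chunks."""
    context = "\n\n".join(chunks)  # "stuff" all retrieved chunks into one context block
    return [("system", QA_SYSTEM.format(context=context)), *history, ("human", question)]


# Toy usage: a follow-up question that only makes sense with the history.
history: List[Message] = [
    ("human", "What is chapter 2 about?"),
    ("ai", "It describes the indexing pipeline."),
]
rewrite = build_contextualize_messages(history, "Tell me more about that.")
```

The LLM's reply to the first message list would become the standalone query used for retrieval. The second list, carrying the retrieved chunks, would then produce the final answer.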

Architecture

User Question
      |
      v
[ Contextualize Question ] -- chat history --> Standalone Question
      |
      v
[ ChromaDB Retriever ] -- vector search --> Relevant Document Chunks
      |
      v
[ Groq LLM (llama-3.1-8b-instant) ] -- context + question --> Answer
      |
      v
  Response displayed in Streamlit UI
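
The vector-search step in the diagram is a nearest-neighbor lookup. The standalone question is embedded and compared against the stored chunk embeddings. The sketch below illustrates the idea with brute-force cosine similarity over hand-written 3-dimensional toy vectors; it is not how ChromaDB works internally. ChromaDB uses an approximate (HNSW) index, its distance metric is configurable per collection (L2 by default), and real all-MiniLM-L6-v2 embeddings have 384 dimensions. The names `cosine` and `top_k` are hypothetical.

```python
import math
from typing import Dict, List, Tuple


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query_vec: List[float], store: Dict[str, List[float]], k: int = 2) -> List[Tuple[str, float]]:
    """Score every stored chunk against the query and return the k best, highest first."""
    scored = [(chunk_id, cosine(query_vec, vec)) for chunk_id, vec in store.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy "vector store": chunk ID -> embedding (made-up 3-d vectors).
store = {
    "intro": [1.0, 0.0, 0.0],
    "methods": [0.0, 1.0, 0.0],
    "results": [0.7, 0.7, 0.0],
}
hits = top_k([0.9, 0.1, 0.0], store, k=2)
```

Here the query vector points mostly along the first axis, so "intro" ranks first and "results" second. The text of the top-ranked chunks is what would be passed on to the LLM.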

Prerequisites

Python 3, a Groq API key (from console.groq.com), and optionally a HuggingFace access token.

Setup

  1. Clone the repository

    git clone https://github.com/sothulthorn/RAG-Q-A-Conversation.git
    cd RAG-Q-A-Conversation
  2. Create and activate a virtual environment

    python -m venv .venv
    
    # Windows
    .venv\Scripts\activate
    
    # macOS/Linux
    source .venv/bin/activate
  3. Install dependencies

    pip install -r requirements.txt
  4. (Optional) Create a .env file in the project root containing your HuggingFace token

    HF_TOKEN=your_huggingface_token_here
    

Usage

  1. Start the application

    streamlit run app.py
  2. Enter your Groq API key in the password field.

  3. Upload one or more PDF files using the file uploader.

  4. Ask questions about the uploaded documents in the text input field.

  5. Use Session IDs to maintain separate conversation threads (default: default_session).
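
Each Session ID maps to its own independent history. The sketch below assumes a plain in-memory dictionary, and the names `get_session_history` and `record_turn` are hypothetical. In LangChain, `RunnableWithMessageHistory` uses a callable like `get_session_history` to look up the history for a given session ID. In that setting the callable returns a chat-history object, such as `ChatMessageHistory` from langchain-community, rather than a list. The exact wiring in app.py is not shown here.

```python
from collections import defaultdict
from typing import DefaultDict, List, Tuple

Message = Tuple[str, str]  # (role, content)

# Hypothetical in-memory store: one message list per session ID.
# In a Streamlit app this dict would have to live in st.session_state
# so that it survives script reruns.
_store: DefaultDict[str, List[Message]] = defaultdict(list)


def get_session_history(session_id: str) -> List[Message]:
    """Return the history for session_id, creating an empty one on first use."""
    return _store[session_id]


def record_turn(session_id: str, question: str, answer: str) -> None:
    """Append one question/answer exchange to that session's history only."""
    history = get_session_history(session_id)
    history.append(("human", question))
    history.append(("ai", answer))


# Two sessions stay isolated from each other.
record_turn("default_session", "What is chapter 2 about?", "It describes the indexing pipeline.")
record_turn("thesis_review", "Who wrote this document?", "The title page lists the authors.")
```

A question asked under "thesis_review" never sees turns from "default_session", so each thread's follow-ups are reformulated against the right context.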

Project Structure

RAG-Q-A-Conversation/
├── app.py              # Main Streamlit application
├── requirements.txt    # Python dependencies
├── .env                # Environment variables (create manually)
└── README.md           # This file

Dependencies

Package                   Purpose
streamlit                 Web UI framework
langchain-classic         RAG chain orchestration
langchain-chroma          ChromaDB vector store integration
langchain-community       PDF loader, chat message history
langchain-core            Prompts, runnables, base classes
langchain-groq            Groq LLM integration
langchain-huggingface     HuggingFace embedding models
langchain-text-splitters  Document chunking
python-dotenv             Environment variable loading
pypdf                     PDF parsing backend
chromadb                  Vector database
sentence-transformers     Embedding model runtime

Configuration

Parameter          Value                 Location
Embedding model    all-MiniLM-L6-v2      app.py:27
LLM model          llama-3.1-8b-instant  app.py:39
Chunk size         5000 characters       app.py:67
Chunk overlap      500 characters        app.py:67
Max answer length  3 sentences           System prompt
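
With a 5000-character chunk size and a 500-character overlap, each chunk starts 4500 characters after the previous one. The first 500 characters of each chunk therefore repeat the end of the chunk before it. For a text of n > 5000 characters, this gives 1 + ceil((n - 5000) / 4500) chunks, so a 12000-character text yields 3. The sketch below illustrates this with a plain fixed-size window. It is a simplification: the splitter the app uses from langchain-text-splitters may break on paragraph or sentence boundaries and produce shorter chunks. The helpers `split_fixed` and `expected_chunks` are hypothetical.

```python
import math
from typing import List


def split_fixed(text: str, chunk_size: int = 5000, overlap: int = 500) -> List[str]:
    """Cut text into chunk_size-character windows; neighbors share `overlap` characters."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in [0, chunk_size)")
    step = chunk_size - overlap  # 4500 with the app's settings
    chunks = []
    start = 0
    while True:
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):  # this window reached the end of the text
            break
        start += step
    return chunks


def expected_chunks(n_chars: int, chunk_size: int = 5000, overlap: int = 500) -> int:
    """Closed-form chunk count for the fixed-window scheme above."""
    if n_chars <= chunk_size:
        return 1
    return 1 + math.ceil((n_chars - chunk_size) / (chunk_size - overlap))


# A 12000-character toy text splits into windows starting at 0, 4500 and 9000.
text = "".join(chr(97 + i % 26) for i in range(12000))
chunks = split_fixed(text)
```

The overlap means a sentence that straddles a chunk boundary still appears whole in at least one chunk, as long as it is shorter than 500 characters. The cost is some duplicated text in the vector store.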

Troubleshooting

Groq API errors

Ensure your Groq API key is valid and has available quota at console.groq.com.
