SmartB100 — Agriculture RAG Agent

Self-hostable RAG assistant for agricultural technical support: it answers questions grounded in your own PDF manuals, adapts the response to the reader's expertise, and tags every answer with a continuous 0.0–1.0 semantic-entropy hallucination score so users know when to double-check.

What It Does

SmartB100 turns a folder of agricultural PDFs into a question-answering service backed by a local LLM, grounding every answer in retrieved content.

Grounded Q&A — indexes PDF manuals into a vector database and answers questions from the retrieved chunks, not from model memory.
Expertise-adaptive answers — the same RAG context is rendered for beginner, intermediate, or expert readers via profile-aware system prompts.
Hallucination scoring — semantic entropy over multiple candidate answers produces a continuous 0.0–1.0 score flagging low-confidence responses.
Authenticated API — bcrypt password hashing + JWT-gated /chat, with per-IP rate limiting on login and registration.
Runs fully local — Ollama serves both chat and embeddings; no paid API key is required to operate the core pipeline.
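The hallucination score comes from semantic entropy: sample several answers to the same question, group the answers that mean the same thing, and measure how spread out those groups are. Below is a minimal sketch that makes two assumptions this README does not state. First, equivalence is a pluggable `are_equivalent` callable; the published method uses bidirectional entailment judged by an NLI model or an LLM. Second, the raw entropy is divided by its maximum, log(n), so the result falls between 0.0 and 1.0.

```python
import math
from collections.abc import Callable, Sequence


def semantic_entropy_score(
    answers: Sequence[str],
    are_equivalent: Callable[[str, str], bool],
) -> float:
    """Return 0.0 when all sampled answers agree and 1.0 when every answer differs."""
    if len(answers) < 2:
        return 0.0  # a single sample carries no disagreement signal
    # Greedy clustering: each answer joins the first cluster whose representative
    # it is equivalent to; otherwise it starts a new cluster.
    clusters: list[list[str]] = []
    for answer in answers:
        for cluster in clusters:
            if are_equivalent(cluster[0], answer):
                cluster.append(answer)
                break
        else:
            clusters.append([answer])
    n = len(answers)
    probs = [len(c) / n for c in clusters]
    # Shannon entropy over the empirical distribution of meaning clusters.
    entropy = sum(-p * math.log(p) for p in probs)
    # Normalize by the maximum possible entropy (every answer in its own cluster).
    return entropy / math.log(n)
```

With a simple case-insensitive equivalence, three samples in which two agree score about 0.58, and identical samples score 0.0. A score near 1.0 therefore means the model keeps changing its answer, which is the signal the README uses to flag low confidence.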

What It Is

SmartB100 is a REST API (FastAPI) with an optional Gradio web UI that converts a corpus of agricultural PDFs into a source-grounded chat service. It targets agricultural extension workers and agronomists who need fast, reliable answers about crop management, soil treatment, pest control, and planting schedules — without manually searching dense technical manuals.

Tech Stack

Layer	Technology
Language	Python 3.12+
API / Runtime	FastAPI, Uvicorn
UI	Gradio
Vector DB	Qdrant (`archives_v2`, 768-dim embeddings)
Inference	Ollama — `llama3.2:3b` (chat) + `nomic-embed-text` (embeddings)
Verification	Multi-provider semantic entropy (Groq / Ollama / OpenRouter)
Persistence	SQLite (auth + conversation history)
Auth	bcrypt + JWT (passlib, slowapi rate limiting)
Testing / CI	pytest, ruff, mypy `--strict`, GitHub Actions
Packaging	uv, Docker (multi-stage `Dockerfile.api`)

Architecture

Architectural Style

SmartB100 is a modular monolith with composed deployment:

One application process. api/main.py loads every domain module (api/routes/*, core/*, retrieval/*, memory/*, generation/*, verification/*, database/*) into a single FastAPI runtime. Inter-module communication is function calls inside the same Python interpreter — no RPC, no message broker, no queue.
Eight internal layers, one binary. The folder boundary is a convention for testability and review; it is not a network boundary.
External processes are limited to genuine third-party services. No domain code lives outside the API process.

External components (each runs in its own process):

Component	Role	Containerized?	Protocol
Qdrant	Vector DB (`archives_v2` collection, 768-dim embeddings)	Yes — `docker compose --profile infra`	HTTP REST `:6333` + gRPC `:6334`
Ollama	LLM chat (`llama3.2:3b`) + embeddings (`nomic-embed-text`)	No — runs on the host	HTTP REST `:11434` via `OLLAMA_HOST`
SQLite	Auth + conversation history	No (filesystem)	Bind-mount `./smartb100_v2.db:/app/smartb100_v2.db`

Client tier (two paths):

Gradio UI (ui/chat_ui.py) — stateless HTTP client containerized via docker compose --profile app. Calls only POST /chat. Does not import any domain module — it is a UI shell, not a microservice.
Direct HTTP — curl, scripts, future mobile clients. Same endpoint, same JSON contract.

Why not microservices. The RAG pipeline (embed → search → generate → verify) shares the same ChatRequest/ChatResponse model and runs synchronously within a single request. Splitting any step into its own service would add network latency between calls that are currently in-process, plus contract-versioning overhead, without delivering independent scaling benefit at current load.

When to reconsider. If verification/ (entropy sampling, the slowest step) needs to scale independently of generation/, or if the workload grows beyond ~500 req/s, the verification gate is the natural extraction point — it already has a clean async-friendly interface (evaluate(question, context, answer)).
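The `evaluate(question, context, answer)` interface can be sketched as a small gate class. This is a minimal sketch, not the code in `verification/`. It follows the behavior this README describes (retry when entropy is high, fall back to a neutral score instead of blocking the answer), but the 0.7 threshold, the single retry, the 0.5 neutral value, and the injected `generate`/`score` callables are all assumptions.

```python
from collections.abc import Callable
from dataclasses import dataclass

NEUTRAL_SCORE = 0.5  # hypothetical "unknown" score reported when verification fails


@dataclass(frozen=True)
class GateResult:
    answer: str
    hallucination_score: float


class VerificationGate:
    """Score an answer and regenerate while the score stays above a threshold."""

    def __init__(
        self,
        generate: Callable[[str, str], str],      # (question, context) -> new answer
        score: Callable[[str, str, str], float],  # (question, context, answer) -> 0.0-1.0
        threshold: float = 0.7,                   # hypothetical cut-off
        max_retries: int = 1,                     # hypothetical retry budget
    ) -> None:
        self._generate = generate
        self._score = score
        self.threshold = threshold
        self.max_retries = max_retries

    def evaluate(self, question: str, context: str, answer: str) -> GateResult:
        best = GateResult(answer, NEUTRAL_SCORE)
        try:
            best = GateResult(answer, self._score(question, context, answer))
            for _ in range(self.max_retries):
                if best.hallucination_score <= self.threshold:
                    break  # confident enough, no retry needed
                candidate = self._generate(question, context)
                candidate_score = self._score(question, context, candidate)
                if candidate_score < best.hallucination_score:
                    best = GateResult(candidate, candidate_score)  # keep the better answer
        except Exception:
            # Provider or network failure: never block the answer, report a neutral score.
            return GateResult(best.answer, NEUTRAL_SCORE)
        return best
```

Because the gate depends only on two callables, extracting it into its own service would mostly mean replacing those callables with HTTP clients, which is why it is the natural extraction point.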

flowchart TD
    subgraph CLIENT["Client"]
        GRADIO["Gradio UI\n:7860"]
        CURL["curl / HTTP"]
    end

    subgraph API["API Layer"]
        ENDPOINT["POST /chat"]
        AUTH["POST /auth/*"]
        HEALTH["GET /health"]
    end

    subgraph PIPELINE["RAG Pipeline"]
        EMBED["Embedder\nOllama nomic-embed-text\n768 dims"]
        SEARCH["Vector Search\nCosine Similarity"]
        MEMORY["ConversationBuffer\nFIFO deque (maxlen=10)"]
        PROFILE["Profiling\nbeginner | intermediate | expert"]
        LLM["LLM Generator\nOllama llama3.2:3b"]
    end

    subgraph VERIFY["Verification"]
        ENTROPY["Semantic Entropy\nMulti-provider (Groq/Ollama/OpenRouter)"]
        GATE["Verification Gate\nRetry + Fallback"]
    end

    subgraph DATA["Data Layer"]
        QDRANT[("Qdrant\n:6333\narchives_v2")]
        SQLITE[("SQLite\nusers / conversations")]
    end

    GRADIO -->|HTTP JSON| ENDPOINT
    CURL -->|HTTP JSON| ENDPOINT

    ENDPOINT --> EMBED
    EMBED --> SEARCH
    SEARCH --> QDRANT

    ENDPOINT --> MEMORY
    MEMORY -.->|history| LLM
    SEARCH -->|context| PROFILE
    PROFILE --> LLM

    LLM --> GATE
    GATE -->|verification_enabled| ENTROPY
    ENTROPY -->|score| GATE
    GATE -->|retry if high entropy| LLM

    GATE --> RESPONSE["ChatResponse\n{answer, hallucination_score}"]

    AUTH --> SQLITE
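The ConversationBuffer node is described as a FIFO deque with `maxlen=10`. A minimal sketch of that idea, keyed by `session_id`, is shown below. The method names (`add_turn`, `history`) and storing one `(role, content)` pair per entry are assumptions, not the API of `memory/`.

```python
from collections import defaultdict, deque


class ConversationBuffer:
    """Per-session FIFO history: once maxlen is reached, the oldest entry is dropped."""

    def __init__(self, maxlen: int = 10) -> None:
        self._maxlen = maxlen
        self._sessions: defaultdict[str, deque[tuple[str, str]]] = defaultdict(
            lambda: deque(maxlen=self._maxlen)
        )

    def add_turn(self, session_id: str, role: str, content: str) -> None:
        # deque(maxlen=...) silently evicts from the left when full.
        self._sessions[session_id].append((role, content))

    def history(self, session_id: str) -> list[tuple[str, str]]:
        # .get() avoids creating empty sessions; return a copy so callers cannot mutate it.
        return list(self._sessions.get(session_id, ()))
```

Bounding the history keeps the prompt size predictable, which matters for a 3B model with a limited context window that already carries the retrieved chunks.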

RAG Pipeline Flow:

sequenceDiagram
    participant C as Client
    participant A as API /chat
    participant E as Embedder
    participant Q as Qdrant
    participant G as LLM Generator
    participant V as Verification Gate

    C->>A: POST /chat {session_id, question, profile}
    A->>E: generate_embedding(question)
    E-->>A: vector[768]
    A->>Q: search_context(vector, top_k=3)
    Q-->>A: chunks[]
    A->>G: generate(question, context, history, profile)
    G-->>A: answer
    alt verification_enabled
        A->>V: evaluate(question, context, answer)
        V-->>A: {answer, hallucination_score}
    end
    A-->>C: ChatResponse {answer, hallucination_score}

Deployment Topology:

flowchart LR
    subgraph CLIENTS["Clients"]
        direction TB
        BROWSER["Browser"]
        SCRIPTS["curl / scripts"]
    end

    subgraph HOST["Developer host"]
        OLLAMA["Ollama :11434<br/>llama3.2:3b + nomic-embed-text"]
    end

    subgraph COMPOSE["docker-compose stack"]
        direction TB
        subgraph INFRA["profile: infra"]
            QDRANT[("Qdrant<br/>:6333 REST / :6334 gRPC")]
        end
        subgraph APP["profile: app"]
            API["FastAPI :8000<br/>monolith binary"]
            GRADIO["Gradio :7860"]
            SQLITE[("SQLite<br/>bind-mount")]
        end
    end

    BROWSER -->|HTTP| GRADIO
    SCRIPTS -->|HTTP /chat| API
    GRADIO -->|HTTP /chat| API
    API -->|HTTP REST| QDRANT
    API -->|HTTP /api/chat,<br/>/api/embeddings| OLLAMA
    API -. SQLAlchemy .-> SQLITE

The first two diagrams are logical views (the component flow and the request sequence); the last is topological (where each process runs). They complement each other rather than duplicate.

Engineering Decisions

A curated index of the most significant decisions; each row links the ADR that holds the full rationale, alternatives, and consequences.

Decision	Alternative considered	Rationale
Modular monolith	Microservice per RAG step	Shared request model, synchronous pipeline — ADR-0001
Semantic entropy for the hallucination score	Binary classifier / LLM-as-judge	Continuous `0.0–1.0` score with no labeled data — ADR-0002
Local-first inference via Ollama	Hosted embeddings / larger hosted model	Offline, free, stable embedding space — ADR-0003
Multi-provider verification dispatch	OpenAI-only verification	Removes the hard paid dependency — ADR-0004
Synchronous `/chat` handler	`async def` handler	Threadpool keeps the event loop free — ADR-0005
bcrypt + JWT gate on `/chat`	Session cookies / static API keys	Stateless, instantly revocable auth — ADR-0006
SQLite for persistence	PostgreSQL	Zero-ops at single-node scale — ADR-0007
Deepagents on LangGraph as the agent substrate	Raw LangGraph / hand-rolled loop	Built-in planning, sub-agents, filesystem; isolated behind `agent/` — ADR-0008
Hosted Groq (GPT-OSS) for the agent reasoning tier	Larger local model / Claude	No local GPU; reuses the default verification provider; reliable tool-calling — ADR-0009

Getting Started

Prerequisites

Python 3.12+ (download)
Docker Desktop (download) — for Qdrant
Ollama (download) — for local inference

Installation

git clone https://github.com/LukeSantossz/sb100_agents.git
cd sb100_agents

# Pull inference models
ollama pull llama3.2:3b && ollama pull nomic-embed-text

# Install dependencies
uv sync                            # or: python -m venv .venv && .venv/bin/pip install -e .

# Configure environment (defaults work for local dev)
cp .env.example .env

Running

# 1. Start Qdrant
docker compose --profile infra up -d

# 2. Index documents (first run only)
.venv/bin/python database/semantic_chunker.py index ./archives/

# 3. Start API
.venv/bin/python -m uvicorn api.main:app --reload

# 4. (Optional) Start Gradio UI
.venv/bin/python ui/chat_ui.py

Windows users: replace .venv/bin/python with .venv\Scripts\python.exe, or run .\start.bat / .\start.ps1 after installation.

Full Docker deployment: docker compose --profile infra --profile app up -d. The compose stack uses a multi-stage Dockerfile.api (no build-essential in the final image), healthchecks that gate depends_on ordering, and log rotation (max-size: 10m, max-file: 3). On Linux, the OLLAMA_HOST override is required (see SETUP.md §9.1); SETUP.md also covers remote Qdrant configuration.

Verify the stack is up:

curl http://localhost:6333/healthz           # Qdrant: "healthz check passed"
curl http://localhost:8000/health            # API: {"status":"ok"}
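Scripts or CI jobs that must wait until the API is ready can poll the same `/health` check from Python. This is a stdlib-only sketch; the function name, the 60 s overall timeout, and the 1 s polling interval are hypothetical. It only checks the API, since the Qdrant `/healthz` endpoint returns plain text rather than JSON.

```python
import json
import time
import urllib.error
import urllib.request


def wait_for_api(
    url: str = "http://localhost:8000/health",
    timeout_s: float = 60.0,
    interval_s: float = 1.0,
) -> bool:
    """Poll `url` until it returns {"status": "ok"}; give up after timeout_s seconds."""
    deadline = time.monotonic() + timeout_s
    while True:
        try:
            with urllib.request.urlopen(url, timeout=interval_s) as resp:
                body = json.load(resp)
                if isinstance(body, dict) and body.get("status") == "ok":
                    return True
        except (urllib.error.URLError, OSError, ValueError):
            pass  # not ready yet: connection refused, timeout, HTTP error, or non-JSON body
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval_s)
```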

Tests

pytest tests/ -m "not requires_infra"         # all tests except infra-bound ones (CI default)
ruff check .                                  # lint
mypy retrieval/ generation/ memory/ --strict  # type check

API Reference

Endpoint	Description
`POST /chat`	RAG query (requires JWT); returns answer with hallucination score
`POST /auth/register`	Creates a new user (rate limit: 3 per hour per IP)
`POST /auth/token`	OAuth2 password login; returns a JWT (rate limit: 5 per 15 minutes per IP)
`GET /health`	API health status
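Login and registration limits are enforced with slowapi (see Tech Stack). To make the numbers concrete, here is a stdlib sliding-window log keyed by client IP, configured with the same limits. It illustrates per-IP limiting in general and is not slowapi's internal algorithm or this project's configuration. The class and instance names are hypothetical.

```python
import time
from collections import defaultdict, deque
from collections.abc import Callable


class SlidingWindowLimiter:
    """Allow at most `limit` requests per `window_s` seconds for each client IP."""

    def __init__(self, limit: int, window_s: float, clock: Callable[[], float] = time.monotonic) -> None:
        self.limit = limit
        self.window_s = window_s
        self._clock = clock  # injectable so tests can control time
        self._hits: defaultdict[str, deque[float]] = defaultdict(deque)

    def allow(self, ip: str) -> bool:
        now = self._clock()
        hits = self._hits[ip]
        while hits and now - hits[0] >= self.window_s:
            hits.popleft()  # forget requests that have left the window
        if len(hits) >= self.limit:
            return False  # an HTTP API would typically answer 429 Too Many Requests
        hits.append(now)
        return True


login_limiter = SlidingWindowLimiter(limit=5, window_s=15 * 60)     # POST /auth/token
register_limiter = SlidingWindowLimiter(limit=3, window_s=60 * 60)  # POST /auth/register
```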

POST /chat:

TOKEN=$(curl -s -X POST "http://localhost:8000/auth/token" \
  -d "username=demo&password=long-enough-pw" | jq -r .access_token)

# The question below asks: "When is the ideal time to plant soybeans?"
curl -X POST "http://localhost:8000/chat" \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "session_id": "demo-session",
    "question": "Qual a época ideal de plantio da soja?",
    "profile": {"name": "User", "expertise": "beginner"}
  }'
# {"answer": "...", "hallucination_score": 0.18}

Without the Authorization header the API returns 401 Unauthorized.
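The same two calls can be made from Python using only the standard library. In this sketch the helper names are hypothetical, `BASE_URL` assumes the local default port, and the 600 s read timeout mirrors the `CHAT_TIMEOUT` default so that slow CPU inference does not abort the request.

```python
import json
import urllib.parse
import urllib.request

BASE_URL = "http://localhost:8000"  # assumed local default


def token_request(username: str, password: str) -> urllib.request.Request:
    # OAuth2 password flow: form-encoded body, same as `curl -d` above.
    body = urllib.parse.urlencode({"username": username, "password": password}).encode()
    return urllib.request.Request(
        f"{BASE_URL}/auth/token",
        data=body,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )


def chat_request(token: str, session_id: str, question: str,
                 expertise: str = "beginner", name: str = "User") -> urllib.request.Request:
    payload = {"session_id": session_id, "question": question,
               "profile": {"name": name, "expertise": expertise}}
    return urllib.request.Request(
        f"{BASE_URL}/chat",
        data=json.dumps(payload).encode(),
        headers={"Authorization": f"Bearer {token}", "Content-Type": "application/json"},
        method="POST",
    )


def send(req: urllib.request.Request) -> dict:
    with urllib.request.urlopen(req, timeout=600) as resp:  # CPU inference can be slow
        return json.load(resp)


# Usage (requires the running stack):
# token = send(token_request("demo", "long-enough-pw"))["access_token"]
# reply = send(chat_request(token, "demo-session", "Qual a época ideal de plantio da soja?"))
# print(reply["answer"], reply["hallucination_score"])
```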

Request Field	Type	Description
`session_id`	string	Conversation identifier that keeps history across turns (e.g. a UUID)
`question`	string	User query
`profile.name`	string	User's name
`profile.expertise`	enum	`beginner` \| `intermediate` \| `expert`

Response Field	Type	Description
`answer`	string	Generated response adapted to expertise level
`hallucination_score`	float	0.0 (grounded) to 1.0 (likely hallucinated)
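The request contract and the expertise-adaptive prompting can be sketched with the standard library. The real schemas are Pydantic models in `core/`, so this dataclass version is only a stand-in. The system-prompt wording and the rejection of empty questions are hypothetical. The sketch shows the rule the table implies: `expertise` must be one of the three levels, and that level selects the instructions wrapped around the same retrieved context.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Any


class Expertise(str, Enum):
    BEGINNER = "beginner"
    INTERMEDIATE = "intermediate"
    EXPERT = "expert"


# Hypothetical prompt wording, one per expertise level.
SYSTEM_PROMPTS: dict[Expertise, str] = {
    Expertise.BEGINNER: "Explain in plain language, define technical terms, give step-by-step advice.",
    Expertise.INTERMEDIATE: "Use standard agronomic terms and focus on practical recommendations.",
    Expertise.EXPERT: "Be concise and technical; state rates and thresholds found in the context.",
}


@dataclass(frozen=True)
class ChatRequest:
    session_id: str
    question: str
    name: str
    expertise: Expertise

    @classmethod
    def from_json(cls, data: dict[str, Any]) -> "ChatRequest":
        profile = data.get("profile") or {}
        question = str(data.get("question", "")).strip()
        if not question:
            raise ValueError("question must not be empty")
        return cls(
            session_id=str(data["session_id"]),
            question=question,
            name=str(profile.get("name", "")),
            # Raises ValueError for a missing or unknown level, e.g. "guru".
            expertise=Expertise(profile.get("expertise")),
        )


def build_system_prompt(req: ChatRequest, context: str) -> str:
    # Same retrieved context for every reader; only the instructions change.
    return f"{SYSTEM_PROMPTS[req.expertise]}\n\nAnswer only from this context:\n{context}"
```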

Project Structure

sb100_agents/
├── api/                            # FastAPI backend
│   ├── main.py                     # App entry (CORS + routers + lifespan)
│   └── routes/                     # chat.py, auth.py, health.py
├── core/                           # Pydantic schemas & configuration
├── retrieval/                      # Embeddings + Qdrant vector search
├── generation/                     # LLM response generation
├── memory/                         # Conversation buffer (FIFO)
├── verification/                   # Semantic entropy + verification gate
├── database/                       # SQLite + PDF semantic chunking
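├── eval/                           # 5-step evaluation pipeline
├── ui/                             # Gradio chat interface
├── agent/                          # Deepagents/LangGraph agent substrate (ADR-0008)
├── archives/                       # Source PDF manuals to index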
├── tests/                          # Unit + integration tests
├── .github/workflows/              # CI + Claude Code automation
├── Dockerfile.api                  # Multi-stage build (builder + runtime)
├── docker-compose.yml              # Qdrant (infra) + API+Gradio (app) with healthchecks
└── pyproject.toml
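`database/` owns the PDF semantic chunking that step 2 of Running invokes (`database/semantic_chunker.py index ./archives/`). A common form of semantic chunking splits text into sentences and starts a new chunk wherever the embedding similarity between neighboring sentences drops. The sketch below assumes that approach. Whether `semantic_chunker.py` actually works this way, the 0.75 threshold, and the regex sentence splitter are all assumptions. In the project, `embed` would presumably call `nomic-embed-text` via Ollama; here it is any callable.

```python
import math
import re
from collections.abc import Callable, Sequence


def _cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.hypot(*a) * math.hypot(*b)
    return dot / norm if norm else 0.0


def semantic_chunks(
    text: str,
    embed: Callable[[str], Sequence[float]],
    threshold: float = 0.75,  # hypothetical breakpoint
) -> list[str]:
    """Split text into sentences; open a new chunk when neighbor similarity < threshold."""
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    if not sentences:
        return []
    chunks = [[sentences[0]]]
    prev = embed(sentences[0])
    for sentence in sentences[1:]:
        vec = embed(sentence)
        if _cosine(prev, vec) < threshold:
            chunks.append([sentence])  # topic shift: start a new chunk
        else:
            chunks[-1].append(sentence)
        prev = vec
    return [" ".join(chunk) for chunk in chunks]
```

Compared with fixed-size splitting, this keeps one topic (e.g. a liming recommendation) inside one chunk, so the top-3 retrieval returns coherent passages.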

Project Status

Status: MVP complete — actively hardened.

Done

PDF indexing pipeline (semantic chunking → Qdrant)
RAG chat with expertise-adaptive responses
Semantic-entropy hallucination scoring (multi-provider)
bcrypt + JWT auth with per-IP rate limiting
Dockerized deployment (infra + app profiles, healthchecks, log rotation)
5-step offline evaluation pipeline (eval/)
Test suite (205 tests, ~83% coverage) with CI: ruff + mypy --strict + pytest

Pending

Raise critical-module coverage to a 70% CI gate
Optional Langfuse tracing for the RAG pipeline
Hybrid search (dense + sparse vectors, RRF fusion)
LangGraph migration (ReAct agent + agricultural intent filter)
Claim verification (atomic decomposition + RAG fact-checking)
Streaming responses (SSE)

The pending work is sequenced into delivery Waves in the agentic migration roadmap.

Known Issues & Limitations

CPU inference latency — llama3.2:3b with RAG context can take minutes per answer on CPU-only hosts. A configurable CHAT_TIMEOUT (default 600s) plus transient-error retries exist for this reason; the limitation disappears with a GPU or a hosted provider.
Single-node persistence — SQLite is single-writer. It fits one API process but does not support horizontal scaling; PostgreSQL is the migration path once writes contend.
Windows + Docker bind mount — if ./smartb100_v2.db does not already exist as a file, Docker Desktop may create it as a directory. Create the empty file before docker compose --profile app up; the API raises an explicit RuntimeError if it finds a directory.
Coverage gate is conservative — the CI coverage threshold is currently below the 70% target on critical modules. Raising it is in progress (see Project Status).
Breaking auth change — users created before the bcrypt + JWT gate (SHA-256 hashes) must be re-registered.
Verification adds latency — entropy sampling generates multiple candidate answers. It is opt-in via VERIFICATION_ENABLED and falls back to a neutral score on failure rather than blocking the answer.
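The runtime knobs named in this list (`CHAT_TIMEOUT`, `VERIFICATION_ENABLED`), plus `OLLAMA_HOST`, and the bind-mount pitfall can be sketched as follows. Only the 600 s default and the "directory instead of file" check come from this README. The function names, the accepted truthy strings, the `OLLAMA_HOST` fallback value, and `VERIFICATION_ENABLED` defaulting to off (consistent with it being opt-in) are assumptions.

```python
import os
from dataclasses import dataclass
from pathlib import Path


def _env_bool(name: str, default: bool) -> bool:
    raw = os.environ.get(name)
    if raw is None:
        return default
    return raw.strip().lower() in {"1", "true", "yes", "on"}


@dataclass(frozen=True)
class Settings:
    chat_timeout_s: float
    verification_enabled: bool
    ollama_host: str


def load_settings() -> Settings:
    return Settings(
        chat_timeout_s=float(os.environ.get("CHAT_TIMEOUT", "600")),          # README default: 600 s
        verification_enabled=_env_bool("VERIFICATION_ENABLED", False),       # opt-in, assumed off
        ollama_host=os.environ.get("OLLAMA_HOST", "http://localhost:11434"),  # assumed fallback
    )


def ensure_db_file(path: Path) -> Path:
    """Host-side guard for the ./smartb100_v2.db bind mount."""
    if path.is_dir():
        # Docker Desktop created a directory because the file did not exist yet.
        raise RuntimeError(f"{path} is a directory; delete it and create an empty file instead")
    path.touch(exist_ok=True)  # create the empty file before `docker compose --profile app up`
    return path
```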

Contributing

See CONTRIBUTING.md. Quick summary: fork, branch (type/NNN-short-description, NNN = issue number), tests, Conventional Commits, PR.
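The branch convention can be checked mechanically, for example in a pre-push hook. This is a sketch with a hypothetical helper. It assumes lowercase kebab-case descriptions and accepts any lowercase type word, because the allowed types are defined in CONTRIBUTING.md rather than here.

```python
import re

# type/NNN-short-description, e.g. feat/42-hybrid-search
BRANCH_RE = re.compile(r"(?P<type>[a-z]+)/(?P<issue>\d+)-(?P<slug>[a-z0-9]+(?:-[a-z0-9]+)*)")


def parse_branch(name: str) -> tuple[str, int, str]:
    """Return (type, issue number, description) or raise ValueError."""
    match = BRANCH_RE.fullmatch(name)
    if match is None:
        raise ValueError(f"{name!r} does not follow type/NNN-short-description")
    return match["type"], int(match["issue"]), match["slug"]
```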

License

MIT License
