Skip to content

GgauravJ05/build-your-own-agent

Repository files navigation

Mini-Copilot: A Framework-less AI Agent from Scratch 🤖

A hands-on implementation of a tool-using AI Agent (ReAct pattern) built using pure Python and the Google Gemini API.

This repository documents my exploration and learning journey of AI agent architectures, going under the hood to build a lightweight "Mini-Copilot" without relying on heavy frameworks like LangGraph or Microsoft Semantic Kernel.


💡 The Core Philosophy: Why Build From Scratch?

Frameworks like LangGraph (focused on state-driven cyclic graphs) and Semantic Kernel (Microsoft's enterprise-grade agent SDK) are powerful, but they often abstract away the actual mechanics of agentic behavior.

By building this agent in plain Python, I wanted to understand:

  1. How function calling works at the API protocol level.
  2. How the ReAct (Reasoning and Acting) loop orchestrates between reasoning, selecting tools, executing them, and feeding the observations back to the LLM.
  3. How persistent memory is maintained across multi-turn chats in a simple state machine.

🛠️ What This Agent Does

The agent runs a loop that allows it to:

  1. Deconstruct complex user prompts (e.g., "Search for the latest stock price, calculate 17.5% of it, and save the report").
  2. Execute tools dynamically:
    • 🔍 Web Search: Queries DuckDuckGo for live facts.
    • 🧮 Calculator: Parses and safely evaluates mathematical expressions.
    • 💾 File Writer: Saves generated reports directly to the local disk.
  3. Iterate autonomously until it achieves the goal or hits safety limits.

📁 Repository Structure

The files are structured chronologically to reflect the building steps:

File Type Description
step1_llm.py Reference Plain LLM interaction showing the limitations of a static model (no tool use or real-time info).
step2_tools.py Reference Implementing function declaration and parsing function calls requested by the model.
step3_agent.py Reference The heart of the agent: the autonomous ReAct tool execution loop.
step4_chat.py Reference Making the agent interactive with multi-turn chat memory in a terminal REPL.
step5_challenge.py Challenge Custom extensions to implement new tools (e.g., email drafting, system utilities).

🚀 Getting Started

1. Prerequisites & Setup

Clone the repository and install the dependencies:

pip install -r requirements.txt

2. Configure Environment

Create a .env file in the root directory:

GEMINI_API_KEY=your_gemini_api_key_here

🔑 Get a free key at Google AI Studio.

3. Running the Agent

Run the interactive multi-turn chat agent:

python step4_chat.py

Try asking it to perform research, compute a value, and save a file:

You: Find the top 3 AI agent frameworks in 2026.
You: Save the comparison of these frameworks to report.md.
You: Translate that report to Spanish and save it as report_es.md.

🧠 Behind the Scenes: The ReAct Loop

Every agentic turn follows this loop under the hood:

┌────────────────────────────────────────────────────────┐
│                                                        │
│   USER ──▶ LLM ──▶ "I need to call tool X(args)"       │
│             ▲                  │                       │
│             │                  ▼                       │
│             │             [run tool X]                 │
│             │                  │                       │
│             └──── observation ◀┘                       │
│                                                        │
│   ... repeat until LLM says "Here is the final answer"  │
└────────────────────────────────────────────────────────┘

🛡️ Key Takeaways & Best Practices

  • Safe Evaluation: Used safe parsing and constraints (empty __builtins__ and character whitelist) for the mathematical evaluator to prevent arbitrary code execution vulnerabilities.
  • API Limits: Handled rate limits and token considerations using the Gemini API.
  • Framework Comparisons:
    • vs LangGraph: Unlike LangGraph's complex state graphs, our state is managed via Gemini's built-in chat session history, making it perfect for rapid prototyping.
    • vs Semantic Kernel: Unlike SK's heavy focus on C# and enterprise plugins, our Python agent is extremely modular and easy to read.

📜 License

MIT — feel free to fork, adapt, and build your own custom tools!

About

A step-by-step hands-on guide to building a tool-using AI Agent from scratch in Python with the Google Gemini API. Zero heavy frameworks, just core concepts.

Topics

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors

Languages