A hands-on implementation of a tool-using AI Agent (ReAct pattern) built using pure Python and the Google Gemini API.
This repository documents my exploration and learning journey of AI agent architectures, going under the hood to build a lightweight "Mini-Copilot" without relying on heavy frameworks like LangGraph or Microsoft Semantic Kernel.
Frameworks like LangGraph (focused on state-driven cyclic graphs) and Semantic Kernel (Microsoft's enterprise-grade agent SDK) are powerful, but they often abstract away the actual mechanics of agentic behavior.
By building this agent in plain Python, I wanted to understand:
- How function calling works at the API protocol level.
- How the ReAct (Reasoning and Acting) loop orchestrates between reasoning, selecting tools, executing them, and feeding the observations back to the LLM.
- How persistent memory is maintained across multi-turn chats in a simple state machine.
The agent runs a loop that allows it to:
- Deconstruct complex user prompts (e.g., "Search for the latest stock price, calculate 17.5% of it, and save the report").
- Execute tools dynamically:
- 🔍 Web Search: Queries DuckDuckGo for live facts.
- 🧮 Calculator: Parses and safely evaluates mathematical expressions.
- 💾 File Writer: Saves generated reports directly to the local disk.
- Iterate autonomously until it achieves the goal or hits safety limits.
The files are structured chronologically to reflect the building steps:
| File | Type | Description |
|---|---|---|
| step1_llm.py | Reference | Plain LLM interaction showing the limitations of a static model (no tool use or real-time info). |
| step2_tools.py | Reference | Implementing function declaration and parsing function calls requested by the model. |
| step3_agent.py | Reference | The heart of the agent: the autonomous ReAct tool execution loop. |
| step4_chat.py | Reference | Making the agent interactive with multi-turn chat memory in a terminal REPL. |
| step5_challenge.py | Challenge | Custom extensions to implement new tools (e.g., email drafting, system utilities). |
Clone the repository and install the dependencies:
pip install -r requirements.txtCreate a .env file in the root directory:
GEMINI_API_KEY=your_gemini_api_key_here🔑 Get a free key at Google AI Studio.
Run the interactive multi-turn chat agent:
python step4_chat.pyTry asking it to perform research, compute a value, and save a file:
You: Find the top 3 AI agent frameworks in 2026.
You: Save the comparison of these frameworks to report.md.
You: Translate that report to Spanish and save it as report_es.md.
Every agentic turn follows this loop under the hood:
┌────────────────────────────────────────────────────────┐
│ │
│ USER ──▶ LLM ──▶ "I need to call tool X(args)" │
│ ▲ │ │
│ │ ▼ │
│ │ [run tool X] │
│ │ │ │
│ └──── observation ◀┘ │
│ │
│ ... repeat until LLM says "Here is the final answer" │
└────────────────────────────────────────────────────────┘
- Safe Evaluation: Used safe parsing and constraints (empty
__builtins__and character whitelist) for the mathematical evaluator to prevent arbitrary code execution vulnerabilities. - API Limits: Handled rate limits and token considerations using the Gemini API.
- Framework Comparisons:
- vs LangGraph: Unlike LangGraph's complex state graphs, our state is managed via Gemini's built-in chat session history, making it perfect for rapid prototyping.
- vs Semantic Kernel: Unlike SK's heavy focus on C# and enterprise plugins, our Python agent is extremely modular and easy to read.
MIT — feel free to fork, adapt, and build your own custom tools!