AI Debate Arena is a full-stack multi-agent system that simulates structured debates between AI agents representing real-world personas. The system orchestrates autonomous agents, grounds their reasoning using live web search, and evaluates outcomes using a structured three-judge panel.
The project is designed as a production-grade demonstration of:
- Multi-agent orchestration with CrewAI
- Web-grounded reasoning via DuckDuckGo
- Real-time event streaming with Server-Sent Events (SSE)
- Structured debate evaluation
- A card-based strategic UI abstraction
- Full Dockerized reproducibility
This repository showcases a complete production-style AI system, including:
- Autonomous AI agents debating sequentially over a fixed number of rounds
- Mandatory web grounding to incorporate current information
- Redis-backed streaming and concurrency control
- Celery-powered asynchronous execution
- SQLite persistence for debate history and analytics
- A modern Next.js frontend visualizing debates as tactical argument cards
- A three-agent judge panel scoring debates using structured rubrics
The entire stack runs locally using Docker Compose with minimal setup.
Instead of rendering debates as simple chat messages, the frontend abstracts each agent response into a structured argument card.
Each move includes:
- A Move Type (Attack, Defense, Refute, etc.)
- A Power Level representing argumentative strength
- The Generated Argument
- Web-grounded supporting context
- Structured judge evaluation
This approach transforms debate into a turn-based strategic exchange rather than a basic chatbot conversation. It makes agent reasoning more interpretable, comparable, and engaging.
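As a rough illustration of the card abstraction described above, the fields could be modeled as a small data class. This is a minimal sketch, not the project's actual schema: the names `ArgumentCard`, `MoveType`, and `judge_scores`, and the 0 to 100 power scale, are assumptions made for the example.

```python
from dataclasses import dataclass, field
from enum import Enum


class MoveType(Enum):
    # Only the move types named in the text; the real system may define more.
    ATTACK = "attack"
    DEFENSE = "defense"
    REFUTE = "refute"


@dataclass
class ArgumentCard:
    """Hypothetical shape of one debate move rendered as a card."""

    speaker: str
    move_type: MoveType
    power_level: int  # assumed scale: 0 (weak) to 100 (strong)
    argument: str
    sources: list = field(default_factory=list)       # web-grounded context
    judge_scores: dict = field(default_factory=dict)  # filled in after judging

    def __post_init__(self) -> None:
        # Reject values outside the assumed power scale.
        if not 0 <= self.power_level <= 100:
            raise ValueError("power_level must be between 0 and 100")

    def to_dict(self) -> dict:
        """Return a JSON-serializable view, e.g. for sending to a UI."""
        return {
            "speaker": self.speaker,
            "move_type": self.move_type.value,
            "power_level": self.power_level,
            "argument": self.argument,
            "sources": list(self.sources),
            "judge_scores": dict(self.judge_scores),
        }
```

Keeping the move type as an enum rather than a free-form string makes cards easy to compare and filter, which is the point of the abstraction.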
Debates follow a structured lifecycle:
- A user selects two debaters and a topic.
- The backend creates a debate session.
- Agents take sequential turns for a fixed number of rounds.
- Each turn is grounded using live DuckDuckGo search.
- Debate events are streamed in real time to the frontend.
- After final arguments, three independent judge agents evaluate performance.
- Results are stored and made available for analytics.
All agents use the same temperature configuration. Debates are time-limited and follow a deterministic structure.
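The lifecycle above can be sketched as a plain loop that alternates speakers for a fixed number of rounds and yields one event per turn. This is an illustrative outline only: `run_debate`, the event field names, and the stub `fake_search` and `fake_generate` functions are hypothetical stand-ins for the search and LLM calls, and the real flow is orchestrated with CrewAI and Celery.

```python
from typing import Callable, Iterator, List, Sequence


def run_debate(
    debaters: Sequence[str],
    topic: str,
    rounds: int,
    search: Callable[[str], List[str]],
    generate: Callable[[str, str, List[str], List[dict]], str],
) -> Iterator[dict]:
    """Yield one event per turn, then a final completion event."""
    transcript: List[dict] = []
    for round_no in range(1, rounds + 1):
        for speaker in debaters:
            # Ground every turn with fresh search results before generating.
            context = search(f"{topic} {speaker}")
            argument = generate(speaker, topic, context, transcript)
            event = {
                "type": "turn",
                "round": round_no,
                "speaker": speaker,
                "argument": argument,
                "sources": context,
            }
            transcript.append(event)
            yield event
    # Judging would start after this point in the described lifecycle.
    yield {"type": "debate_finished", "turns": len(transcript)}


def fake_search(query: str) -> List[str]:
    # Stand-in for a live web search call.
    return [f"result for: {query}"]


def fake_generate(speaker, topic, context, transcript) -> str:
    # Stand-in for an LLM call; reports how many turns came before.
    return f"{speaker} on {topic} (turn {len(transcript) + 1})"
```

Because the function is a generator, each turn can be forwarded to a stream as soon as it is produced instead of waiting for the whole debate to finish.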
The system consists of the following components:
- Next.js App Router
- Real-time SSE event consumption
- XState orchestration
- Card-based debate UI
- FastAPI API server
- CrewAI debate flow orchestration
- Celery worker for asynchronous execution
- Redis for broker, cache, locks, and streaming
- SQLite for persistence
- Docker + Docker Compose
- Separate dev and production configurations
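To make the streaming piece of this component list more concrete, here is a minimal sketch of how a debate event could be framed in the Server-Sent Events wire format and parsed back. The helper names `format_sse` and `parse_sse` and the event name `turn` are assumptions for illustration; the actual backend may frame its events differently.

```python
import json
from typing import Optional


def format_sse(event: str, data: dict, event_id: Optional[str] = None) -> str:
    """Serialize one event as an SSE frame terminated by a blank line."""
    lines = []
    if event_id is not None:
        lines.append(f"id: {event_id}")
    lines.append(f"event: {event}")
    # json.dumps emits a single line, so one data field is enough here.
    lines.append(f"data: {json.dumps(data)}")
    return "\n".join(lines) + "\n\n"


def parse_sse(frame: str) -> dict:
    """Parse a single-frame SSE string produced by format_sse."""
    fields = {}
    for line in frame.strip().split("\n"):
        name, _, value = line.partition(":")
        # The SSE format ignores one optional space after the colon.
        fields[name] = value[1:] if value.startswith(" ") else value
    if "data" in fields:
        fields["data"] = json.loads(fields["data"])
    return fields
```

In a browser, frames like these are what an `EventSource` listener receives, with the `event` field selecting which handler fires.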
The entire stack can be started locally with Docker.
git clone <repository-url>
cd <repository-root>
cp .env.example .env
Add required API keys to .env (for example, GROQ_API_KEY).
Then run:
docker compose up --build
Access the application at:
- Frontend: http://localhost:3000
- Backend API Docs: http://localhost:8000/docs
Running docker compose up starts the following services:
- FastAPI backend
- Celery worker
- Celery beat scheduler
- Redis
- Next.js frontend
No additional setup is required.
You can try:
Watch Elon Musk debate Donald Trump about DOGE (Department of Government Efficiency) performance and see a neutral three-judge AI panel score them.
This example demonstrates:
- Multi-agent coordination
- Real-time streaming
- Web-grounded reasoning
- Structured judge scoring
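The structured judge scoring step could be combined along the following lines. This is a sketch under stated assumptions: the rubric criteria in `RUBRIC`, the ballot layout, and the simple averaging rule are all hypothetical, since the README does not specify how the three judges' scores are merged.

```python
from statistics import mean

# Hypothetical rubric criteria; the project's real rubric is not specified here.
RUBRIC = ("logic", "evidence", "rhetoric")


def aggregate_scores(ballots: list) -> dict:
    """Average each judge's rubric mean per debater across all judges.

    Each ballot is assumed to look like {debater: {criterion: score}}.
    """
    per_debater = {}
    for ballot in ballots:
        for debater, scores in ballot.items():
            judge_mean = mean(scores[criterion] for criterion in RUBRIC)
            per_debater.setdefault(debater, []).append(judge_mean)
    return {name: round(mean(vals), 2) for name, vals in per_debater.items()}


def pick_winner(averages: dict) -> str:
    """Return the debater with the highest average (first one wins a tie)."""
    return max(averages, key=averages.get)
```

Averaging per judge first, then across judges, gives every judge equal weight regardless of how many criteria the rubric contains.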
backend/ # FastAPI + CrewAI orchestration
frontend/ # Next.js UI
docker-compose.yml
docker-compose.prod.yml
Each directory contains its own detailed README explaining service-specific configuration and development workflows.
The system tracks:
- Debate duration
- Token usage
- Cost estimation
- Judge scoring
- Historical sessions
SQLite is used for local persistence. For production environments, a client-server relational database such as PostgreSQL is recommended.
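As an illustration of the tracked metrics, the sketch below stores one row per debate in SQLite and derives a cost estimate from token counts. The table name `debates`, its columns, and the per-million-token prices passed to `estimate_cost` are assumptions for the example, not the project's actual schema or pricing.

```python
import sqlite3


def estimate_cost(prompt_tokens: int, completion_tokens: int,
                  usd_per_1m_prompt: float, usd_per_1m_completion: float) -> float:
    """Estimate spend in USD from token counts and per-million-token prices."""
    total = (prompt_tokens * usd_per_1m_prompt
             + completion_tokens * usd_per_1m_completion)
    return total / 1_000_000


def init_db(conn: sqlite3.Connection) -> None:
    # Hypothetical schema covering the metrics listed above.
    conn.execute(
        """CREATE TABLE IF NOT EXISTS debates (
               id INTEGER PRIMARY KEY AUTOINCREMENT,
               topic TEXT NOT NULL,
               winner TEXT,
               duration_s REAL,
               total_tokens INTEGER,
               cost_usd REAL
           )"""
    )


def record_debate(conn, topic, winner, duration_s, total_tokens, cost_usd) -> int:
    """Insert one finished debate and return its row id."""
    cur = conn.execute(
        "INSERT INTO debates (topic, winner, duration_s, total_tokens, cost_usd) "
        "VALUES (?, ?, ?, ?, ?)",
        (topic, winner, duration_s, total_tokens, cost_usd),
    )
    conn.commit()
    return cur.lastrowid


def summary(conn) -> dict:
    """Aggregate historical sessions for a simple analytics view."""
    row = conn.execute(
        "SELECT COUNT(*), COALESCE(SUM(total_tokens), 0), "
        "COALESCE(SUM(cost_usd), 0.0) FROM debates"
    ).fetchone()
    return {"debates": row[0], "tokens": row[1], "cost_usd": row[2]}
```

Using parameterized queries (`?` placeholders) keeps user-supplied topics from being interpreted as SQL.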
This system generates AI-simulated debates involving real public figures.
Important considerations:
- Outputs are generated by large language models.
- Content may contain inaccuracies or bias.
- Statements do not represent real individuals.
- The project is not affiliated with or endorsed by any real person referenced.
- The system is intended for educational and demonstration purposes only.
API keys should never be committed to the repository. Production deployments should use secure secret management systems.
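One common way to keep keys out of the repository is to read them from the environment at startup and fail fast when one is missing. The sketch below assumes `GROQ_API_KEY` is the only required key, as in the setup example; the helper names `missing_keys` and `require_keys` are hypothetical.

```python
import os

# Keys the process is assumed to need; extend as required.
REQUIRED_KEYS = ("GROQ_API_KEY",)


def missing_keys(env=None, required=REQUIRED_KEYS) -> list:
    """Return the required keys that are absent or blank in the environment."""
    env = os.environ if env is None else env
    return [key for key in required if not env.get(key, "").strip()]


def require_keys(env=None) -> None:
    """Raise at startup instead of failing later in the middle of a debate."""
    missing = missing_keys(env)
    if missing:
        raise RuntimeError(
            "Missing required environment variables: " + ", ".join(missing)
        )
```

Calling `require_keys()` once when a service boots surfaces configuration problems immediately, and the same check works whether values come from a local .env file or a secret manager.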
For production-grade deployment, consider:
- Replacing SQLite with PostgreSQL
- Using managed Redis with persistence policies
- Adding TLS termination and a reverse proxy
- Implementing proper authentication and authorization
- Adding structured logging and monitoring
- Implementing CI/CD pipelines
Contributions are welcome.
If you would like to improve the system:
- Create a feature branch.
- Make focused commits.
- Ensure Docker builds successfully.
- Open a pull request with a clear description of changes.
Please maintain architectural consistency and avoid breaking changes to core orchestration flows without discussion.