Commit b5178e0

feat: implement Gopher-Ops SRE Telegram bot with AI-driven infrastructure management and monitoring capabilities.
1 parent 3194699 commit b5178e0

18 files changed

Lines changed: 834 additions & 160 deletions

.gitignore

Lines changed: 5 additions & 0 deletions
```diff
@@ -8,6 +8,11 @@
 # Environment variables
 .env
 .env.*
+state.json
+
+# Binaries
+gopher-ops
+gopher-ops.exe
 
 # Go workspace
 go.work
```

LICENSE

Lines changed: 21 additions & 0 deletions
```diff
@@ -0,0 +1,21 @@
+MIT License
+
+Copyright (c) 2026 Gopher-Ops Contributors
+
+Permission is hereby granted, free of charge, to any person obtaining a copy
+of this software and associated documentation files (the "Software"), to deal
+in the Software without restriction, including without limitation the rights
+to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+copies of the Software, and to permit persons to whom the Software is
+furnished to do so, subject to the following conditions:
+
+The above copyright notice and this permission notice shall be included in all
+copies or substantial portions of the Software.
+
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+SOFTWARE.
```

Makefile

Lines changed: 34 additions & 0 deletions
```diff
@@ -0,0 +1,34 @@
+.PHONY: build run tidy tf-init tf-apply clean help
+
+# Variables
+BINARY_NAME=gopher-ops
+MAIN_PATH=cmd/main.go
+
+help:
+	@echo "🤖 Gopher-Ops Management Commands:"
+	@echo " make build - Build the Go binary"
+	@echo " make run - Run the bot directly"
+	@echo " make tidy - Clean up Go dependencies"
+	@echo " make tf-init - Initialize Terraform"
+	@echo " make tf-apply - Deploy the infrastructure lab"
+	@echo " make clean - Remove binary and state files"
+
+build:
+	go build -o $(BINARY_NAME) $(MAIN_PATH)
+
+run:
+	go run $(MAIN_PATH)
+
+tidy:
+	go mod tidy
+
+tf-init:
+	cd terraform && terraform init
+
+tf-apply:
+	cd terraform && terraform apply -auto-approve
+
+clean:
+	rm -f $(BINARY_NAME)
+	rm -f state.json
+	@echo "✅ Cleanup done."
```

README.md

Lines changed: 25 additions & 16 deletions
```diff
@@ -5,21 +5,22 @@
 ![Terraform](https://img.shields.io/badge/Terraform-IaC-7B42BC?style=flat&logo=terraform)
 ![Gemini AI](https://img.shields.io/badge/AI-Google_Gemini-8E75B2?style=flat&logo=google)
 ![CI/CD](https://img.shields.io/badge/CI%2FCD-GitHub_Actions-2088FF?style=flat&logo=github-actions)
+![License](https://img.shields.io/badge/License-MIT-green.svg)
 
-An AI-driven Site Reliability Engineering (SRE) bot powered by **Google Gemini** and **Telegram**. Gopher-Ops monitors your host system metrics (CPU, RAM) and allows you to perform basic triage and healing actions (start, stop, restart containers, clear cache) via a Telegram chat interface.
 
-Designed with modern DevOps principles, it features **Infrastructure as Code (IaC)** provisioning via Terraform and automated CI/CD pipelines.
+**Gopher-Ops** is a Secure AI SRE Telegram bot managing Docker, Kubernetes, and system metrics via natural language.
 
----
+## 🚀 Key Features
 
-## 🚀 Key Features (Portfolio Highlights)
-
-- **Infrastructure as Code (IaC):** Automated provisioning of a containerized lab environment (Nginx & stateful Redis cluster) using **Terraform**, complete with Custom Networks, Persistent Volumes, Variables, and Count loops for scaling.
-- **System Observability & SRE:** Developed in **Golang**, the bot fetches real-time host metrics (CPU/RAM) and container statuses directly via the Docker API.
-- **Generative AI ChatOps:** Integrated **Google Gemini (2.5-flash-lite)** to act as an intent parser. It understands natural language logs and triggers technical SRE actions in a casual "Gen-Z" persona to reduce operator cognitive load.
-- **Interactive Healing Actions:** Parses AI intents to generate Human-in-the-loop (HITL) inline Telegram buttons, allowing the operator to start, stop, restart, or clear cache directly from chat.
+- **AI ChatOps:** Powered by **Google Gemini (2.0/2.5-flash)** to parse intents and logs, answering infrastructure queries in a casual persona to reduce operator cognitive load.
+- **Telemetry & Observability:** Real-time monitoring of host OS (CPU/RAM), Docker, and Kubernetes states via **gopsutil**, **Docker SDK**, and **MCP**. Includes a 1-hour in-memory metric history for sustained high-load detection and proactive alerting.
+- **Guided Triage & HITL Execution:** Parses AI suggestions into clickable Telegram buttons for safe actions (Start/Stop/Restart) and interactive troubleshooting flows (Network triage & Configuration validation).
+- **Infrastructure as Code (IaC):** **Terraform** provisions a local microservices lab environment (Nginx, scalable/stateful Redis cluster, custom networks, and persistent volumes).
+- **Sec & Ops:** Zero-Trust ID gating via Telegram; Automated Docker image vulnerability scanning; and a robust **GitHub Actions** CI/CD pipeline for Go tests and Terraform validation.
+- **Kubernetes & MCP Support:** Seamlessly manages cluster operations using the **Model Context Protocol (MCP)**, bridging AI with Kubernetes native tools.
 - **Robust CI/CD Pipeline:** Configured with **GitHub Actions** for automated Go unit testing and Terraform validation/formatting upon every push/PR.
-- **Zero-Trust Security:** Hardcoded Telegram Chat ID gating ensuring only the authorized operator can view metrics or execute system-level docker commands.
+- **Zero-Trust & DevSecOps:** Hardcoded Telegram Chat ID gating ensuring only the authorized operator can execute commands. Includes **automated image vulnerability scanning** for outdated Docker tags/CVEs.
+
 
 ## 🏗️ Architecture Workflow
 
```
```diff
@@ -35,9 +36,9 @@ graph TD;
 
 ## 🛠️ Tech Stack
 
-- **Backend:** Go (Golang), Docker API SDK, gopsutil
-- **AI / NLP:** Google Generative AI (Gemini Flash-Lite)
-- **Infrastructure:** Docker, Terraform (HCL)
+- **Backend:** Go (Golang), Docker API SDK, gopsutil, **MCP Go SDK**
+- **AI / NLP:** Google Generative AI (Gemini 2.0 Flash)
+- **Infrastructure:** Docker, Kubernetes, Terraform (HCL), **MCP Server Kubernetes**
 - **CI/CD:** GitHub Actions
 - **Interface:** Telegram Bot API
 
```
```diff
@@ -84,8 +85,16 @@ go run cmd/main.go
 
 Once the bot is running, simply PM it on Telegram to start managing your infrastructure:
 - *"Bro, check system health jap"* ("Bro, check system health real quick") -> Bot reads live CPU/RAM and lists the Terraform-provisioned containers.
-- *"Tolong stop container gopher-ops-nginx-lab"* ("Please stop container gopher-ops-nginx-lab") -> Bot understands the intent, verifies state, and provides a clickable inline `🛑 Stop` button.
-- *"Minta restart redis satu"* ("Please restart one Redis") -> Re-initializes specific containers directly across the Docker daemon.
+- *"List pods dalam cluster k8s aku"* ("List the pods in my k8s cluster") -> Bot uses MCP to fetch real-time pod data from Kubernetes.
+- *"Kenapa pod database asyik restart?"* ("Why does the database pod keep restarting?") -> Bot triggers an automated `k8s-diagnose` workflow to find the root cause.
+
+## 📜 Credits & Acknowledgments
+
+The Kubernetes management capabilities of Gopher-Ops are powered by the **Model Context Protocol (MCP)** and the excellent [MCP Server Kubernetes](https://github.com/Flux159/mcp-server-kubernetes) community project. Special thanks to the authors for their work in bridging AI and Kubernetes.
 
 ## ⚠️ Disclaimer
-This project binds to the host's Docker socket to execute real container lifecycles. Please ensure your `AUTHORIZED_CHAT_ID` is strictly configured to prevent unauthorized infrastructure manipulation.
+This project binds to the host's Docker socket and Kubernetes API to execute real infrastructure lifecycles. Please ensure your `AUTHORIZED_CHAT_ID` is strictly configured to prevent unauthorized manipulation.
+
+## 📄 License
+This project is licensed under the **MIT License**. See the [LICENSE](LICENSE) file for more details.
+
```
RESUME_CONTEXT.md

Lines changed: 1 addition & 0 deletions
```diff
@@ -0,0 +1 @@
+My project is named Gopher-Ops. It is a Secure AI SRE Telegram bot managing Docker & metrics via natural language. Key Features are: AI ChatOps (Gemini 2.5 parses intents/logs, answering infra queries in a casual persona), Telemetry & Observability (Live host OS (CPU/RAM) & Docker states via gopsutil & Docker SDK; includes 1-hour in-memory metric history for sustained high-load detection), Guided Triage & HITL Execution (Parses AI suggestions into clickable Telegram buttons for safe actions (Start/Stop/Restart) and interactive troubleshooting flows), IaC (Terraform provisions a local microservices lab (Nginx, scalable/stateful Redis, networks, volumes)), Sec & Ops (Zero-Trust ID gating (Telegram); Automated Docker image vulnerability scanning; GitHub Actions CI/CD (Go tests & Terraform validation)). The stack includes Go, Gemini API, Docker SDK, Terraform, GitHub Actions, Telegram API.
```
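The resume context and the README's Zero-Trust & DevSecOps bullet mention automated image vulnerability scanning for outdated Docker tags. Matching images against CVEs needs a dedicated scanner (Trivy is one example); the hypothetical Go sketch below covers only the cheaper tag-hygiene half, flagging references that are untagged or pinned to `latest` and can therefore drift silently. `splitImageRef` and `tagFinding` are illustrative names, and the parser handles only the common `registry[:port]/repo[:tag][@digest]` shapes.

```go
package main

import (
	"fmt"
	"strings"
)

// splitImageRef splits "registry[:port]/repo[:tag][@digest]" into its parts.
// A colon only counts as a tag separator if it comes after the last "/",
// so registry ports such as "localhost:5000" are not mistaken for tags.
func splitImageRef(ref string) (repo, tag, digest string) {
	if i := strings.Index(ref, "@"); i >= 0 {
		ref, digest = ref[:i], ref[i+1:]
	}
	slash := strings.LastIndex(ref, "/")
	if colon := strings.LastIndex(ref, ":"); colon > slash {
		return ref[:colon], ref[colon+1:], digest
	}
	return ref, "", digest
}

// tagFinding returns a warning for references that are not pinned: no tag
// (Docker then pulls "latest") or an explicit "latest". Digest pins pass.
func tagFinding(ref string) string {
	_, tag, digest := splitImageRef(ref)
	if digest == "" && (tag == "" || tag == "latest") {
		return ref + ": unpinned tag, the image can drift or go stale unnoticed"
	}
	return ""
}

func main() {
	refs := []string{"nginx", "redis:7.2", "localhost:5000/lab/api:latest", "nginx@sha256:abc123"}
	for _, ref := range refs {
		if w := tagFinding(ref); w != "" {
			fmt.Println("WARN", w)
		} else {
			fmt.Println("ok  ", ref)
		}
	}
}
```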

TERRAFORM_NOTES.md

Lines changed: 31 additions & 0 deletions
```diff
@@ -0,0 +1,31 @@
+# 🛠 Terraform Context: What Have We Built?
+
+This document is a personal cheat sheet to help me remember all the crazy **Infrastructure as Code (IaC)** magic we implemented in the files under the `terraform/` folder.
+
+The project was originally just an AI (SRE) bot written in Go, but we gave it "Automated Environment Provisioning" powers with Terraform.
+
+## 1. Core Concepts (HCL & Docker Provider)
+- **Tools used:** `terraform init` (downloads the provider), `terraform plan` (previews the planned changes before anything is touched), and `terraform apply -auto-approve` (actually builds the containers).
+- **The Provider:** In `main.tf`, we "pull in" the `kreuzwerker/docker` provider. This tells Terraform that it has to talk to the Docker Engine on our host machine.
+- **Main file `main.tf`:** This is where we declare which containers should be UP. We list "Nginx" as the web server and "Redis" as the database/cache.
+
+## 2. Scale & Automate with Variables & Loops (Mid-Level Skill)
+We used to hardcode a single Nginx and a single Redis, but we upgraded to a **Dynamic Deployment**! 🚀
+- **`variables.tf`:** Where we keep parameters such as `redis_count = 2` and the `nginx_port`.
+- **The `count` argument:** In `main.tf`, the Redis resource sets `count = var.redis_count`. Terraform automatically loops and creates "gopher-ops-redis-node-1" and "gopher-ops-redis-node-2". No need to COPY-PASTE that code twice! No cap fr fr.
+
+## 3. Persistent State & Networking (Senior-Level Skill)
+Just like real microservices in *production*, we built two powerful "lifelines" into `main.tf`:
+- **Custom Docker Network:** Named `gopher_ops_network`. Nginx and Redis nodes 1 and 2 all live under one (isolated) roof, so Nginx can *ping* Redis directly by hostname, no IP address needed.
+- **Persistent Volume:** We added a dedicated `docker_volume` resource for the Redis `/data` folder. This makes Redis a **Stateful Service**. What does that mean? If a Redis container is killed (destroyed) by Terraform or by the Gopher-Ops bot, the cached DATA inside STILL EXISTS, as long as the volume itself survives (a full `terraform destroy` normally removes the volume along with the containers).
+
+## 4. Deployment "Receipt" via Outputs
+- **`outputs.tf`:** Once the deployment finishes (Apply complete), Terraform prints the Nginx port (`8080`) and an array of every running Redis hostname (`node-1`, `node-2`) to the terminal. Easy for the Go operator/bot to read!
+
+## 5. How Nginx/Redis Connect to Our Go Bot (The End Goal)
+Everything Terraform does (destroying/creating containers, scaling) is immediately _"detected"_ by **our Gopher-Ops Telegram bot**.
+This is the **Observability (via gopsutil + Docker API)** concept.
+SREs don't need to deploy the lab by hand; the AI SRE bot (Gemini) has *full context* on the containers Terraform helps *spin up*! 🔥🦅
+
+---
+*Done! These notes are for my own reference, or for when I get grilled about the Terraform flow in an interview.*
```
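Section 4 of the notes says the bot can read the Nginx port and Redis hostnames that Terraform reports after `apply`. `terraform output -json` emits an object keyed by output name, where each entry carries `sensitive`, `type` and `value`; the Go sketch below decodes only `value`. The output names `nginx_port` and `redis_hostnames` are assumptions, since `outputs.tf` is not shown in this commit view, and the sample JSON is illustrative rather than captured from the real lab.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// tfOutput is the part of each `terraform output -json` entry used here.
type tfOutput struct {
	Value json.RawMessage `json:"value"`
}

// labInfo holds the two values the notes describe.
type labInfo struct {
	NginxPort      int
	RedisHostnames []string
}

// decodeOutput unmarshals the "value" of one named output into dst.
func decodeOutput(outs map[string]tfOutput, name string, dst any) error {
	o, ok := outs[name]
	if !ok {
		return fmt.Errorf("output %q not found", name)
	}
	if err := json.Unmarshal(o.Value, dst); err != nil {
		return fmt.Errorf("output %q: %w", name, err)
	}
	return nil
}

// parseLabOutputs reads the hypothetical nginx_port and redis_hostnames outputs.
func parseLabOutputs(raw []byte) (labInfo, error) {
	var outs map[string]tfOutput
	if err := json.Unmarshal(raw, &outs); err != nil {
		return labInfo{}, err
	}
	var info labInfo
	if err := decodeOutput(outs, "nginx_port", &info.NginxPort); err != nil {
		return labInfo{}, err
	}
	if err := decodeOutput(outs, "redis_hostnames", &info.RedisHostnames); err != nil {
		return labInfo{}, err
	}
	return info, nil
}

func main() {
	// Illustrative sample in the shape printed by `terraform output -json`.
	sample := []byte(`{
  "nginx_port": {"sensitive": false, "type": "number", "value": 8080},
  "redis_hostnames": {"sensitive": false, "type": ["tuple", ["string", "string"]],
    "value": ["gopher-ops-redis-node-1", "gopher-ops-redis-node-2"]}
}`)
	info, err := parseLabOutputs(sample)
	if err != nil {
		panic(err)
	}
	fmt.Printf("nginx on :%d, redis nodes: %v\n", info.NginxPort, info.RedisHostnames)
}
```

In the bot, the raw bytes could come from running `terraform output -json` in the `terraform/` directory via `os/exec`.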

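Section 5 of the notes describes the bot noticing, through the Docker API, the containers that Terraform creates or destroys. A minimal sketch of the filtering step, assuming lab containers are recognised by the `gopher-ops-` name prefix used in these notes: `containerSummary` is a hypothetical stand-in carrying only the `Names` and `State` fields of a Docker Engine API container-list entry, where names are reported with a leading `/`. In the bot, the snapshot would come from the Docker SDK's container list call rather than a literal slice.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// containerSummary mirrors the two fields of a Docker API container-list
// entry used here; the API reports names with a leading "/".
type containerSummary struct {
	Names []string
	State string
}

// labContainers returns "name (state)" for every container whose name starts
// with prefix, sorted so the chat reply is stable between polls.
func labContainers(all []containerSummary, prefix string) []string {
	var out []string
	for _, c := range all {
		for _, n := range c.Names {
			name := strings.TrimPrefix(n, "/")
			if strings.HasPrefix(name, prefix) {
				out = append(out, fmt.Sprintf("%s (%s)", name, c.State))
				break
			}
		}
	}
	sort.Strings(out)
	return out
}

func main() {
	snapshot := []containerSummary{
		{Names: []string{"/gopher-ops-redis-node-2"}, State: "running"},
		{Names: []string{"/some-other-app"}, State: "running"},
		{Names: []string{"/gopher-ops-nginx-lab"}, State: "exited"},
		{Names: []string{"/gopher-ops-redis-node-1"}, State: "running"},
	}
	for _, line := range labContainers(snapshot, "gopher-ops-") {
		fmt.Println(line)
	}
}
```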