An AI-driven Site Reliability Engineering (SRE) bot powered by **Google Gemini** and **Telegram**. Gopher-Ops monitors your host system metrics (CPU, RAM) and allows you to perform basic triage and healing actions (start, stop, restart containers, clear cache) via a Telegram chat interface.

-
Designed with modern DevOps principles, it features **Infrastructure as Code (IaC)** provisioning via Terraform and automated CI/CD pipelines.
+
**Gopher-Ops** is a Secure AI SRE Telegram bot managing Docker, Kubernetes, and system metrics via natural language.

-
---
+
## 🚀 Key Features

-
## 🚀 Key Features (Portfolio Highlights)
-
-
- **Infrastructure as Code (IaC):** Automated provisioning of a containerized lab environment (Nginx & stateful Redis cluster) using **Terraform**, complete with Custom Networks, Persistent Volumes, Variables, and Count loops for scaling.
-
- **System Observability & SRE:** Developed in **Golang**, the bot fetches real-time host metrics (CPU/RAM) and container statuses directly via the Docker API.
-
- **Generative AI ChatOps:** Integrated **Google Gemini (2.5-flash-lite)** to act as an intent parser. It understands natural language logs and triggers technical SRE actions in a casual "Gen-Z" persona to reduce operator cognitive load.
-
- **Interactive Healing Actions:** Parses AI intents to generate Human-in-the-loop (HITL) inline Telegram buttons, allowing the operator to start, stop, restart, or clear cache directly from chat.
+
- **AI ChatOps:** Powered by **Google Gemini (2.0/2.5-flash)** to parse intents and logs, answering infrastructure queries in a casual persona to reduce operator cognitive load.
+
- **Telemetry & Observability:** Real-time monitoring of host OS (CPU/RAM), Docker, and Kubernetes states via **gopsutil**, **Docker SDK**, and **MCP**. Includes a 1-hour in-memory metric history for sustained high-load detection and proactive alerting.
+
- **Guided Triage & HITL Execution:** Parses AI suggestions into clickable Telegram buttons for safe actions (Start/Stop/Restart) and interactive troubleshooting flows (Network triage & Configuration validation).
+
- **Infrastructure as Code (IaC):** **Terraform** provisions a local microservices lab environment (Nginx, scalable/stateful Redis cluster, custom networks, and persistent volumes).
+
- **Sec & Ops:** Zero-Trust ID gating via Telegram; automated Docker image vulnerability scanning; and a robust **GitHub Actions** CI/CD pipeline for Go tests and Terraform validation.
+
- **Kubernetes & MCP Support:** Seamlessly manages cluster operations using the **Model Context Protocol (MCP)**, bridging AI with Kubernetes native tools.
- **Robust CI/CD Pipeline:** Configured with **GitHub Actions** for automated Go unit testing and Terraform validation/formatting upon every push/PR.
-
- **Zero-Trust Security:** Hardcoded Telegram Chat ID gating ensuring only the authorized operator can view metrics or execute system-level docker commands.
+
- **Zero-Trust & DevSecOps:** Hardcoded Telegram Chat ID gating ensuring only the authorized operator can execute commands. Includes **automated image vulnerability scanning** for outdated Docker tags/CVEs.
+

## 🏗️ Architecture Workflow

@@ -35,9 +36,9 @@ graph TD;

## 🛠️ Tech Stack

-
- **Backend:** Go (Golang), Docker API SDK, gopsutil
-
- **AI / NLP:** Google Generative AI (Gemini Flash-Lite)
-
- **Infrastructure:** Docker, Terraform (HCL)
+
- **Backend:** Go (Golang), Docker API SDK, gopsutil, **MCP Go SDK**
+
- **AI / NLP:** Google Generative AI (Gemini 2.0 Flash)
+
- **Infrastructure:** Docker, Kubernetes, Terraform (HCL), **MCP Server Kubernetes**
- **CI/CD:** GitHub Actions
- **Interface:** Telegram Bot API

@@ -84,8 +85,16 @@ go run cmd/main.go

Once the bot is running, simply PM it on Telegram to start managing your infrastructure:
- *"Bro, check system health jap"* (roughly "Bro, check the system health real quick") -> Bot reads live CPU/RAM and lists the Terraform-provisioned containers.
-
- *"Tolong stop container gopher-ops-nginx-lab"* ("Please stop the container gopher-ops-nginx-lab") -> Bot understands the intent, verifies state, and provides a clickable inline `🛑 Stop` button.
-
- *"Minta restart redis satu"* ("Please restart Redis one") -> Bot restarts the specified container through the Docker daemon.
+
- *"List pods dalam cluster k8s aku"* ("List the pods in my k8s cluster") -> Bot uses MCP to fetch real-time pod data from Kubernetes.
+
- *"Kenapa pod database asyik restart?"* ("Why does the database pod keep restarting?") -> Bot triggers an automated `k8s-diagnose` workflow to find the root cause.
+
+
## 📜 Credits & Acknowledgments
+
+
The Kubernetes management capabilities of Gopher-Ops are powered by the **Model Context Protocol (MCP)** and the excellent [MCP Server Kubernetes](https://github.com/Flux159/mcp-server-kubernetes) community project. Special thanks to the authors for their work in bridging AI and Kubernetes.

## ⚠️ Disclaimer
-
This project binds to the host's Docker socket to execute real container lifecycles. Please ensure your `AUTHORIZED_CHAT_ID` is strictly configured to prevent unauthorized infrastructure manipulation.
+
This project binds to the host's Docker socket and Kubernetes API to execute real infrastructure lifecycles. Please ensure your `AUTHORIZED_CHAT_ID` is strictly configured to prevent unauthorized manipulation.
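
To make the gating requirement concrete, here is a minimal sketch of how a single-operator check might look. It is not taken from the repository: it assumes `AUTHORIZED_CHAT_ID` is supplied as an environment variable (the README only says it must be configured), and the function names `loadAuthorizedChatID` and `isAuthorized` are hypothetical. The point it illustrates is failing closed: a missing or malformed ID stops the bot rather than letting every chat through.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
)

// loadAuthorizedChatID parses the configured operator chat ID.
// Assumption: the value arrives as a string (for example from an
// environment variable). Empty or malformed input is an error so the
// bot fails closed instead of accepting every chat.
func loadAuthorizedChatID(raw string) (int64, error) {
	if raw == "" {
		return 0, fmt.Errorf("AUTHORIZED_CHAT_ID is not set")
	}
	id, err := strconv.ParseInt(raw, 10, 64)
	if err != nil {
		return 0, fmt.Errorf("AUTHORIZED_CHAT_ID must be an integer: %w", err)
	}
	return id, nil
}

// isAuthorized reports whether an incoming chat ID matches the single
// allowed operator. A zero allowedID never matches anything.
func isAuthorized(chatID, allowedID int64) bool {
	return allowedID != 0 && chatID == allowedID
}

func main() {
	allowed, err := loadAuthorizedChatID(os.Getenv("AUTHORIZED_CHAT_ID"))
	if err != nil {
		fmt.Println("refusing to start:", err)
		return
	}
	// 123456789 stands in for the chat ID of an incoming Telegram update.
	fmt.Println("authorized:", isAuthorized(123456789, allowed))
}
```

In a handler, a check like `isAuthorized` would run before any metric lookup or container action, so unauthorized messages are dropped before they reach the Docker socket or the Kubernetes API.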
+
+
## 📄 License
+
This project is licensed under the **MIT License**. See the [LICENSE](LICENSE) file for more details.
My project is named Gopher-Ops. It is a Secure AI SRE Telegram bot managing Docker & metrics via natural language. Key Features are: AI ChatOps (Gemini 2.5 parses intents/logs, answering infra queries in a casual persona), Telemetry & Observability (Live host OS (CPU/RAM) & Docker states via gopsutil & Docker SDK; includes 1-hour in-memory metric history for sustained high-load detection), Guided Triage & HITL Execution (Parses AI suggestions into clickable Telegram buttons for safe actions (Start/Stop/Restart) and interactive troubleshooting flows), IaC (Terraform provisions a local microservices lab (Nginx, scalable/stateful Redis, networks, volumes)), Sec & Ops (Zero-Trust ID gating (Telegram); Automated Docker image vulnerability scanning; GitHub Actions CI/CD (Go tests & Terraform validation)). The stack includes Go, Gemini API, Docker SDK, Terraform, GitHub Actions, Telegram API.
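
The 1-hour in-memory metric history and sustained high-load detection mentioned above could be sketched roughly as follows. This is an illustration under assumptions, not the project's code: the `Sample` and `History` types, the `SustainedAbove` method, and the thresholds are all hypothetical, and the readings are simulated rather than fetched through gopsutil.

```go
package main

import (
	"fmt"
	"time"
)

// Sample is one CPU reading (hypothetical type; the real bot would
// presumably fill it from gopsutil).
type Sample struct {
	At         time.Time
	CPUPercent float64
}

// History keeps samples no older than Window, oldest first.
type History struct {
	Window  time.Duration
	samples []Sample
}

// Add appends a sample and evicts everything older than Window
// relative to the new sample's timestamp.
func (h *History) Add(s Sample) {
	h.samples = append(h.samples, s)
	cutoff := s.At.Add(-h.Window)
	i := 0
	for i < len(h.samples) && h.samples[i].At.Before(cutoff) {
		i++
	}
	h.samples = h.samples[i:]
}

// SustainedAbove reports whether every sample in the trailing period
// `span` is at or above threshold. It returns false when the history
// does not yet reach back far enough to cover that period.
func (h *History) SustainedAbove(threshold float64, span time.Duration) bool {
	if len(h.samples) == 0 {
		return false
	}
	latest := h.samples[len(h.samples)-1].At
	start := latest.Add(-span)
	if h.samples[0].At.After(start) {
		return false // not enough history yet
	}
	for _, s := range h.samples {
		if !s.At.Before(start) && s.CPUPercent < threshold {
			return false
		}
	}
	return true
}

func main() {
	h := &History{Window: time.Hour}
	base := time.Date(2024, 1, 1, 0, 0, 0, 0, time.UTC)
	// Simulate one reading per minute at 95% CPU for ten minutes.
	for m := 0; m <= 10; m++ {
		h.Add(Sample{At: base.Add(time.Duration(m) * time.Minute), CPUPercent: 95})
	}
	fmt.Println("sustained high load:", h.SustainedAbove(90, 5*time.Minute))
}
```

Requiring every sample in the span to exceed the threshold means a single dip resets the alert, which avoids paging on short spikes; an average-based check would be a reasonable alternative if readings are noisy.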
This document is a personal cheat sheet for recalling all the **Infrastructure as Code (IaC)** magic we implemented in the files under the `terraform/` folder.
+
+
The project started out as just an AI SRE bot written in Go, but we added "Automated Environment Provisioning" on top using Terraform.
+
+
## 1. Basic Concepts (HCL & Docker Provider)
+
- **Commands used:** `terraform init` (downloads the provider), `terraform plan` (previews the planned changes on our PC before anything is applied), and `terraform apply -auto-approve` (actually builds the containers).
+
- **The Provider:** In `main.tf`, we pull in the provider named `kreuzwerker/docker`. This tells Terraform that it needs to talk to the Docker Engine on our host PC.
+
- **Main file `main.tf`:** This is where we declare which containers should be UP. We list "Nginx" as the web server and "Redis" as the database/cache.
We used to hardcode just one Nginx and one Redis, but we upgraded to a **Dynamic Deployment**! 🚀
+
- **`variables.tf` file:** Where we keep parameters such as `redis_count = 2` and the `nginx_port` port.
+
- **The `count` argument:** In `main.tf`, the Redis resource sets `count = var.redis_count`. Terraform automatically loops and creates "gopher-ops-redis-node-1" and "gopher-ops-redis-node-2". No need to COPY-PASTE the code twice! No cap fr fr.
+
+
## 3. Persistent State & Networking (Senior-Level Skill)
+
Just like real microservices in *production*, we build two powerful "lifelines" in `main.tf`:
+
- **Custom Docker Network:** It is named `gopher_ops_network`. Nginx and Redis nodes 1 and 2 are all pulled in to live under one roof (isolated). Nginx can *ping* Redis directly by hostname, with no IP address needed.
+
- **Persistent Volume:** We add a `docker_volume` resource dedicated to the `/data` folder of those Redis nodes. This makes each one a **Stateful Service**. Meaning? If a Redis container is killed (destroyed) by Terraform or by the Gopher-Ops bot, the cached DATA inside STILL EXISTS.
+
+
## 4. Deployment "Receipt" via Outputs
+
- **`outputs.tf` file:** Once the deployment finishes (Apply complete), Terraform prints to the terminal the Nginx port it chose (`8080`) and an array listing the hostnames of all running Redis nodes (`node-1`, `node-2`). Easy for the Go operator/bot to read!
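
As a rough illustration of how a Go program could read that "receipt", the sketch below decodes the JSON that `terraform output -json` emits, where each output is an object with its data under a `value` key. The output names `nginx_port` and `redis_hostnames` are assumptions (the notes do not give the real names), the sample JSON is hand-written and trimmed, and `parseLabOutputs` is a hypothetical helper rather than anything from the bot itself.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// tfOutput mirrors one entry of `terraform output -json`; only the
// "value" field is read here.
type tfOutput struct {
	Value json.RawMessage `json:"value"`
}

// LabInfo holds the two outputs described in the notes.
type LabInfo struct {
	NginxPort      int
	RedisHostnames []string
}

// parseLabOutputs decodes the outputs document into LabInfo.
// The output names used below are hypothetical.
func parseLabOutputs(data []byte) (LabInfo, error) {
	var outs map[string]tfOutput
	if err := json.Unmarshal(data, &outs); err != nil {
		return LabInfo{}, fmt.Errorf("decode outputs: %w", err)
	}
	// decode looks up one named output and unmarshals its value into dst.
	decode := func(name string, dst interface{}) error {
		out, ok := outs[name]
		if !ok {
			return fmt.Errorf("missing output %q", name)
		}
		if err := json.Unmarshal(out.Value, dst); err != nil {
			return fmt.Errorf("decode output %q: %w", name, err)
		}
		return nil
	}
	var info LabInfo
	if err := decode("nginx_port", &info.NginxPort); err != nil {
		return LabInfo{}, err
	}
	if err := decode("redis_hostnames", &info.RedisHostnames); err != nil {
		return LabInfo{}, err
	}
	return info, nil
}

func main() {
	// Hand-written sample shaped like `terraform output -json`.
	sample := []byte(`{
  "nginx_port": {"sensitive": false, "value": 8080},
  "redis_hostnames": {"sensitive": false, "value": ["gopher-ops-redis-node-1", "gopher-ops-redis-node-2"]}
}`)
	info, err := parseLabOutputs(sample)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("nginx on port %d, redis nodes: %v\n", info.NginxPort, info.RedisHostnames)
}
```

In practice the bytes would come from running `terraform output -json` inside the `terraform/` folder; the parsing step stays the same.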
+
+
## 5. How Nginx/Redis Tie In With Our Go Bot (The End Goal)
+
Everything Terraform does (destroying/creating containers, scaling) is immediately _detected_ by **our Gopher-Ops Telegram bot**.
+
This is the concept of **Observability (via gopsutil + Docker API)**.
+
The SRE does not need to deploy the lab manually; the AI SRE bot (Gemini) has *full context* on the containers that Terraform *spins up*! 🔥🦅
+
+
---
+
*Done! These notes are for my own reference, or for when I get grilled about the Terraform flow in an interview.*