K8s LLM Observability Stack

Deploying a model is the easy part. Keeping it alive under load? That's where the engineering begins. Most people can run an LLM in a terminal. Few can run one on Kubernetes that doesn't melt under pressure.

This project demonstrates how to deploy an open-source LLM (TinyLlama) on a Kubernetes cluster using Kind, with monitoring provided by Prometheus and Grafana.

Prerequisites

Docker installed and running
kubectl installed
Helm installed
Python 3.8+ installed

1. Environment Cleanup and Cluster Setup

If K3s or another Kubernetes setup was previously installed on this machine, clean up its leftover configuration first to avoid port and permission conflicts.

# Remove old configuration directories
sudo rm -rf /etc/rancher

# Reset Kubeconfig environment variable
unset KUBECONFIG

# Install Kind binary (Linux AMD64)
curl -Lo ./kind https://kind.sigs.k8s.io/dl/v0.22.0/kind-linux-amd64
chmod +x ./kind
sudo mv ./kind /usr/local/bin/kind

# Create the cluster
kind create cluster --name llm-deploy

# Link kubectl to the new Kind cluster
kind export kubeconfig --name llm-deploy

2. Deploy LLM Infrastructure

We use Ollama to serve the LLM inside the cluster.

# Apply the Kubernetes manifest
kubectl apply -f llm-stack.yaml

# Monitor the pod status until it is 'Running'
kubectl get pods -w

# Download the TinyLlama model into the running pod
kubectl exec -it $(kubectl get pods -l app=ollama -o name) -- ollama pull tinyllama

3. Install Monitoring Stack

Use Helm to deploy Prometheus and Grafana through the kube-prometheus-stack chart, with a reduced Prometheus memory request so the stack fits on a local machine.

# Add and update the Prometheus community repo
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo update

# Install the monitoring stack with a set password and low memory requests
helm install obs prometheus-community/kube-prometheus-stack \
  --set prometheus.prometheusSpec.resources.requests.memory=300Mi \
  --set grafana.adminPassword=admin

4. Application Access (Port Forwarding)

Because the services are inside the Kubernetes network, you must forward the ports to access them locally. Open three separate terminal tabs for these commands:

Tab 1: LLM API

kubectl port-forward svc/ollama-service 11434:11434
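
With the Tab 1 port-forward running, the Ollama HTTP API should be reachable on localhost:11434. As a quick check before starting the UI, the sketch below sends one prompt to Ollama's `/api/generate` endpoint using only the Python standard library. The helper names (`build_generate_request`, `extract_reply`, `ask`) are illustrative and are not taken from `app.py`; the default model name assumes the `tinyllama` pull in step 2 succeeded.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434"  # assumes the Tab 1 port-forward is active


def build_generate_request(prompt, model="tinyllama", base_url=OLLAMA_URL):
    """Build (without sending) a POST request for Ollama's /api/generate."""
    # stream=False asks for one JSON object instead of a stream of chunks.
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        url=f"{base_url}/api/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def extract_reply(raw_body):
    """Pull the generated text out of a non-streaming /api/generate response."""
    return json.loads(raw_body)["response"]


def ask(prompt, timeout=120):
    """Send one prompt and return the model's reply (hypothetical helper)."""
    request = build_generate_request(prompt)
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return extract_reply(resp.read())
```

Calling `ask("Say hello in one sentence.")` from a Python shell should return the reply as a string. The first call can take noticeably longer while the model is loaded into memory, which is why the timeout here is generous.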

Tab 2: Grafana Dashboard

kubectl port-forward svc/obs-grafana 3000:80

Tab 3: Launch Chatbot UI

# Install UI dependencies
pip install streamlit requests

# Run the application
python3 -m streamlit run app.py

5. Usage and Monitoring

Open the Chatbot UI in your browser (usually http://localhost:8501).
Open Grafana in your browser at http://localhost:3000.
Log in to Grafana with:
- User: admin
- Password: admin
Navigate to Dashboards -> Kubernetes / Compute Resources / Pod and select the ollama pod.
Interact with the Chatbot and observe the CPU and Memory usage spikes in the Grafana dashboard.
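
A single chat message produces a short spike that is easy to miss between dashboard refreshes. To make the effect more visible, a small load generator can fire several prompts at once. The sketch below is one way to do that with a thread pool. `run_load`, `summarize`, and the `send` parameter are hypothetical names; `send` can be any function that takes a prompt and returns a reply, for example a thin wrapper around the Ollama API on localhost:11434.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def run_load(send, prompts, concurrency=4):
    """Call send(prompt) for every prompt using `concurrency` worker threads.

    Returns a list of (prompt, seconds, ok) tuples in the same order as `prompts`.
    """

    def timed_call(prompt):
        start = time.perf_counter()
        try:
            send(prompt)
            ok = True
        except Exception:
            # A failed or timed-out request is recorded, not raised,
            # so one bad call does not abort the whole run.
            ok = False
        return prompt, time.perf_counter() - start, ok

    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        # pool.map yields results in input order, regardless of finish order.
        return list(pool.map(timed_call, prompts))


def summarize(results):
    """Return (success_count, failure_count, slowest_seconds) for a run."""
    successes = sum(1 for _, _, ok in results if ok)
    slowest = max((seconds for _, seconds, _ in results), default=0.0)
    return successes, len(results) - successes, slowest
```

Start with a small `concurrency` value. On a CPU-only Kind cluster the model may end up serving requests mostly one after another, in which case extra concurrency shows up as longer latencies rather than higher throughput.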

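Each prompt should show up as a spike in the ollama pod's CPU and memory graphs. If the graphs stay in the red instead of settling back down after the response finishes, resource pressure on that pod is the likely reason your infra is struggling.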
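
The dashboard panels are backed by Prometheus queries, so the same numbers can be read directly from the Prometheus HTTP API. That is handy for checking whether usage really stays high once a prompt has finished. The sketch below builds an instant query for the ollama pod's CPU usage. It assumes Prometheus has been port-forwarded to localhost:9090 (this is not part of the steps above) and that the pod name starts with `ollama`; `cpu_query`, `query_url`, and `fetch_value` are illustrative names.

```python
import json
import urllib.parse
import urllib.request

PROMETHEUS_URL = "http://localhost:9090"  # assumption: a separate port-forward to Prometheus


def cpu_query(pod_prefix="ollama", window="1m"):
    """PromQL for approximate CPU cores used by pods whose name starts with pod_prefix."""
    # container!="" skips the pod-level aggregate series to avoid double counting.
    return (
        "sum(rate(container_cpu_usage_seconds_total"
        f'{{pod=~"{pod_prefix}.*",container!=""}}[{window}]))'
    )


def query_url(promql, base_url=PROMETHEUS_URL):
    """Build the URL for Prometheus's instant-query endpoint (/api/v1/query)."""
    return f"{base_url}/api/v1/query?" + urllib.parse.urlencode({"query": promql})


def fetch_value(promql):
    """Run the query and return the first sample as a float, or None if empty."""
    with urllib.request.urlopen(query_url(promql), timeout=10) as resp:
        result = json.load(resp)["data"]["result"]
    # Each instant-vector sample looks like {"metric": {...}, "value": [ts, "0.42"]}.
    return float(result[0]["value"][1]) if result else None
```

The Prometheus service name depends on the Helm release, so list the candidates with `kubectl get svc` before setting up the port-forward. With that in place, `fetch_value(cpu_query())` returns the current CPU usage in cores, or `None` if no matching pod is being scraped.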

🧹 The Clean-Up

Don't hog resources on your machine!

kind delete cluster --name llm-deploy

If you want to understand what we did in this setup in more depth, I wrote a deep-dive here: https://heyyayush.hashnode.dev/how-to-deploy-an-open-source-llm-reliably-on-kubernetes?utm_source=hashnode&utm_medium=feed

Made with ❤️ by Ayush More. If this helped you learn how to monitor AI, give it a ⭐ and find me on LinkedIn.
