
abhayapattanaik/kblame


kblame

When an alert fires, one command should tell you everything: what's failing (from alerts and logs), what changed, how they connect—across clusters, with history and learned patterns. That's kblame. That's the vision.

80% of Kubernetes outages trace back to recent changes (Komodor, 2025). Yet finding those changes is still a slog. kubectl events expire after an hour by default. ArgoCD shows syncs, not what changed. Slack becomes your search engine. Forty-five minutes later, someone finds the ConfigMap edit.

Our first release fills a gap: a unified change timeline across common Kubernetes resource types (images, ConfigMaps, Secrets, RBAC, HPA, NetworkPolicy, env vars, node events) in a single command. As far as we know, no open-source CLI offers this today.

$ kblame -n payments --since 30m

CHANGES (last 30m)                              NAMESPACE: payments
────────────────────────────────────────────────────────────────────
TIME           KIND          NAME              CHANGE
14:21:03 UTC   Deployment    payments-service  image: v2.3.1 → v2.3.2
14:21:14 UTC   Event         payments-svc-7f.. Failed to pull image "v2.3.2"
14:22:01 UTC   ConfigMap     payments-config   updated by payments-team (2 fields)
14:25:30 UTC   HPA           payments-hpa      scaled 3 → 5 replicas

Single binary. Client-side today: no agents, no RBAC changes, no persistent storage. An in-cluster controller may be introduced in V3 or later for persistent history and deeper correlation.

Here's how we're building toward the full vision:

Roadmap

| Version | Functionality | Status |
|---|---|---|
| V1 | What changed: unified change timeline across common Kubernetes resource types | Current |
| V2 | What changed + what's failing: logs, confirmed alert correlation | Planned |
| V3 | What changed + what's failing + how they connect: dependency-aware correlation, persistent history via in-cluster controller | Planned |
| V4 | Multi-cluster, web UI, learned patterns | Planned |
| V5 | Ideas fermenting... | |

Why kblame?

Several tools address parts of the incident response workflow. We believe kblame brings these capabilities together cohesively in a single CLI:

| Tool | What It Does | Where It Falls Short |
|---|---|---|
| kubectl get events | Raw event stream | Expires after 1 hour by default. Events, not changes. Wall of noise. |
| kubectl rollout history | Rollout revision history (Deployments, DaemonSets, StatefulSets) | Workload rollouts only, one resource at a time. No cross-resource view. |
| Sloop (Salesforce) | Event timeline web dashboard | Requires deploying a persistent service. Web UI, not CLI. |
| KHI (Google) | Audit log timeline visualization | Web app. Primarily GKE + Cloud Logging. |
| Robusta | In-cluster change tracking + alerts | Requires Helm installation, persistent cluster resources. |
| kubectl-blame | Field-level attribution (who edited field X) | Different problem: field audit, not change timeline. |
| Komodor | Commercial change intelligence | Proprietary, requires agent deployment, paid. |
| ArgoCD/Flux | GitOps sync history | Shows syncs, not what specifically changed or why it broke. |

kblame is different:

  • CLI-native — run from terminal, pipe to jq, script it
  • Client-side only — no agents, no RBAC changes, no persistent storage
  • Multi-resource — images, ConfigMaps, Secrets, RBAC, HPA, NetworkPolicy, env vars, node events in one view
  • Instant — no setup, no deployment, just run it

Install

Pre-built binaries — download and run, no Go required:

| Platform | Architecture | Download | Status |
|---|---|---|---|
| macOS | ARM64 (Apple Silicon) | kblame-darwin-arm64 | Available |
| macOS | AMD64 (Intel) | kblame-darwin-amd64 | Available |
| Linux | AMD64 | kblame-linux-amd64 | Available |
| Linux | ARM64 | kblame-linux-arm64 | Available |
| Windows | AMD64 | kblame-windows-amd64.exe | Available |

From source (requires Go 1.22+):

go install github.com/abcdedf/kblame/cmd/kblame@latest

# Or clone and build
git clone https://github.com/abcdedf/kblame.git
cd kblame
make build
# Binary at bin/kblame

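Copy bin/kblame to a directory on your PATH. To use it as a kubectl plugin, install it under the name kubectl-kblame; kubectl discovers any executable named kubectl-* on your PATH (the example below assumes ~/.local/bin is on your PATH):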

mv bin/kblame ~/.local/bin/kubectl-kblame
# Now run as: kubectl kblame

Usage

# Show changes in current namespace, last 30 minutes
kblame

# Show changes in a specific namespace, last 1 hour
kblame -n payments --since 1h

# All namespaces
kblame -A --since 15m

# Filter by resource kind
kblame -n payments --kind deployment,configmap

# JSON output (for scripting / piping to jq)
kblame -n payments --output json

# Markdown output (for Slack / GitHub issues)
kblame -n payments --output md

Alert Correlation (Experimental)

kblame can correlate changes with Prometheus Alertmanager alerts. This feature already exists in V1 but is intentionally experimental: it ranks candidates with temporal and label-based scoring and is not yet the primary value. Confirmed alert correlation is planned for V2, and dependency-aware correlation for V3.

# Correlate changes with active alerts
kblame -n payments --alerts --since 30m

# Specify the Alertmanager URL
kblame --alerts --alertmanager-url http://alertmanager.monitoring:9093

When --alerts is enabled, kblame scores each (change, alert) pair using temporal proximity (40%), namespace match (20%), label overlap (20%), and change severity (20%), then shows the top 3 candidates. Use this to spot potential connections, but verify them manually.

What It Detects

| Change Type | Detection Method | Severity (0 to 1) |
|---|---|---|
| Image updates | ReplicaSet revision diffing | 0.9 |
| ConfigMap changes | managedFields timestamps | 0.8 |
| Secret changes | managedFields (values redacted) | 0.8 |
| Environment variables | Pod spec diffing | 0.8 |
| Resource limits | Pod spec diffing | 0.7 |
| RBAC changes | Role/RoleBinding creation/modification | 0.7 |
| HPA scaling | Kubernetes events | 0.6 |
| NetworkPolicy changes | Resource creation/modification | 0.9 |
| Node issues | Node events (OOMKill, NotReady, etc.) | 0.6 |
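Image updates are detected by ReplicaSet revision diffing. A minimal sketch of that idea, assuming the ReplicaSets owned by one Deployment have already been listed: order them by the deployment.kubernetes.io/revision annotation that the Deployment controller sets, then compare container images between consecutive revisions. The ReplicaSet and ImageChange types are hypothetical, trimmed stand-ins for the apps/v1 objects, not kblame's code.

```go
package main

import (
	"fmt"
	"sort"
	"strconv"
	"time"
)

// ReplicaSet is a trimmed, hypothetical stand-in for an apps/v1 ReplicaSet.
type ReplicaSet struct {
	Name        string
	Annotations map[string]string
	Created     time.Time
	Images      map[string]string // container name -> image
}

type ImageChange struct {
	Time      time.Time
	Container string
	From, To  string
}

const revisionAnnotation = "deployment.kubernetes.io/revision"

// revision returns the numeric revision, or 0 if the annotation is missing.
func revision(rs ReplicaSet) int {
	n, err := strconv.Atoi(rs.Annotations[revisionAnnotation])
	if err != nil {
		return 0
	}
	return n
}

// DiffImages reports each container whose image differs between consecutive revisions.
// Caveat: Created only approximates rollout time; on a rollback Kubernetes reuses an
// older ReplicaSet and bumps its revision, so a real tool needs a better timestamp.
func DiffImages(sets []ReplicaSet) []ImageChange {
	sorted := append([]ReplicaSet(nil), sets...)
	sort.Slice(sorted, func(i, j int) bool { return revision(sorted[i]) < revision(sorted[j]) })
	var out []ImageChange
	for i := 1; i < len(sorted); i++ {
		prev, cur := sorted[i-1], sorted[i]
		// Walk containers in name order so the output is deterministic.
		names := make([]string, 0, len(cur.Images))
		for name := range cur.Images {
			names = append(names, name)
		}
		sort.Strings(names)
		for _, name := range names {
			if old, ok := prev.Images[name]; ok && old != cur.Images[name] {
				out = append(out, ImageChange{cur.Created, name, old, cur.Images[name]})
			}
		}
	}
	return out
}

func main() {
	base := time.Date(2025, 1, 1, 14, 0, 0, 0, time.UTC)
	sets := []ReplicaSet{
		{Name: "rs-rev2", Annotations: map[string]string{revisionAnnotation: "2"}, Created: base.Add(21*time.Minute + 3*time.Second), Images: map[string]string{"payments": "payments:v2.3.2"}},
		{Name: "rs-rev1", Annotations: map[string]string{revisionAnnotation: "1"}, Created: base, Images: map[string]string{"payments": "payments:v2.3.1"}},
	}
	for _, c := range DiffImages(sets) {
		fmt.Printf("%s  %s: %s → %s\n", c.Time.Format("15:04:05 MST"), c.Container, c.From, c.To)
	}
}
```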

Limitations

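  • Event TTL: Kubernetes events expire after 1 hour by default (kube-apiserver --event-ttl). Changes that kblame derives from events, such as HPA scaling and node issues, are only visible within that retention window, so a --since longer than 1 hour may miss them unless your cluster retains events longer.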
  • Client-side only: No persistent history. Each run queries the live cluster. For persistent change tracking, consider Sloop or an in-cluster controller (planned for V3).
  • Not a monitoring tool: kblame complements your monitoring stack (Datadog, Grafana, PagerDuty). It does not replace it.
  • No application-level bugs: kblame detects infrastructure changes (deploys, config, scaling). It cannot detect application logic bugs.

Development

# Build
make build

# Run tests
make test

# Lint
make lint

# Cross-compile for all platforms
make cross-compile

Demo Environment

Set up a local kind cluster with sample services:

chmod +x scripts/demo-setup.sh
./scripts/demo-setup.sh

# Then run kblame against it
go run ./cmd/kblame -n demo --since 5m

License

MIT

About

git blame for Kubernetes production incidents — a kubectl plugin that shows what changed in your cluster and correlates changes with alerts
