Status: manifests flipped to Raft (in this PR). The HelmRelease now runs 3-node Integrated Storage (Raft); the
standalone + file backend is gone. This is a deliberately disruptive switch, taken now because nothing critical is stored yet: the cluster fresh-inits an empty raft (no data to migrate) and re-seeds from the `vault-config` / `vault-seed` Jobs. The fresh-init / unseal / join flow and the network prerequisite are below. The data-migration paths are kept for the future: once real secrets exist, changing the storage backend again needs `bao operator migrate` or a snapshot restore, never a blind flip.

| | Current | Target |
|---|---|---|
| Mode | server.standalone.enabled: true | server.ha.enabled + ha.raft.enabled |
| Storage | storage "file" (/openbao/data) | storage "raft" (Integrated Storage) |
| Replicas | 1 (openbao_replicas: "1") | 3 (Raft quorum minimum) |
| Seal | Shamir 1-of-1, key in openbao-unseal Secret, postStart auto-unseal | unchanged (or transit auto-unseal, see hardening) |

Raft is a different storage backend. New raft pods initialize an empty store;
the existing file-storage PVC data is orphaned. Data must be carried over with
either bao operator migrate (server offline) or a fresh init + snapshot
restore. The Shamir unseal key and root token survive both paths (migrate
preserves the seal; a snapshot is taken post-unseal).
Replace the server.standalone block in
k8s/bases/infrastructure/controllers/openbao/helm-release.yaml with:
```yaml
server:
  # standalone and ha are mutually exclusive; standalone MUST be disabled.
  standalone:
    enabled: false
  ha:
    enabled: true
    replicas: ${openbao_replicas:=1} # set openbao_replicas: "3" in prod
    raft:
      enabled: true
      setNodeId: true # BAO_RAFT_NODE_ID = pod name (openbao-0/1/2)
      config: |
        ui = true
        disable_mlock = true # keep: matches the PSS-baseline constraint
        listener "tcp" {
          tls_disable = 1
          address = "[::]:8200"
          cluster_address = "[::]:8201"
        }
        storage "raft" {
          path = "/openbao/data"
          # Auto-join: each follower discovers the cluster via the headless
          # service so no manual `bao operator raft join` is needed.
          retry_join { leader_api_addr = "http://openbao-0.openbao-internal:8200" }
          retry_join { leader_api_addr = "http://openbao-1.openbao-internal:8200" }
          retry_join { leader_api_addr = "http://openbao-2.openbao-internal:8200" }
        }
        service_registration "kubernetes" {}
        # Re-declare the file audit device (currently in standalone.config).
        audit "file" {
          type = "file"
          options = { file_path = "/openbao/audit/audit.log" }
        }
```

Keep the existing readinessProbe.path, the unseal-keys volume + postStart
auto-unseal hook (it covers all 3 pods), dataStorage (now one RWO PVC per
replica: data-openbao-0/1/2), and auditStorage. The hetzner overlay's
storageClass: hcloud patch still applies. Set openbao_replicas: "3" in
k8s/clusters/prod/bootstrap/variables-cluster-config-map.yaml (and remove
openbao from the validate-replica-floor namespace exemptions) as the last
step, once the cluster is healthy.
Single-node only needed :8200. Raft peers talk on :8201. Before the
cutover:
- Cilium: allow pod-to-pod `:8201` among openbao pods in `k8s/bases/apps/.../networkpolicy` (or the openbao netpol): ingress on 8201 from the openbao pod selector.
- Talos firewall (block mode): if openbao pods can land on different nodes, ensure the node-to-node allowlist covers `:8201` (cf. the Cilium mutual-auth `:4250` precedent, `talos/.../NetworkRuleConfig`).

This netpol/firewall prep is safe to land ahead of time (it only permits traffic that doesn't exist yet at 1 node) and is the only part of this migration that can go in via a normal PR.
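
As a rough illustration of the Cilium rule described above, the sketch below writes a candidate CiliumNetworkPolicy to a local file for review. The policy name, the output file name, and the `app.kubernetes.io/name: openbao` label are assumptions, not values taken from this repo; match them to the labels the chart actually sets and to the existing openbao netpol before applying anything.

```shell
# Sketch only: the policy name, file name and pod label are assumptions.
policy_file="${POLICY_FILE:-openbao-raft-8201.yaml}"

cat > "$policy_file" <<'EOF'
apiVersion: cilium.io/v2
kind: CiliumNetworkPolicy
metadata:
  name: openbao-raft-peers              # hypothetical name
  namespace: openbao
spec:
  endpointSelector:
    matchLabels:
      app.kubernetes.io/name: openbao   # assumed pod label
  ingress:
    - fromEndpoints:
        - matchLabels:
            app.kubernetes.io/name: openbao
      toPorts:
        - ports:
            - port: "8201"              # Raft cluster port
              protocol: TCP
EOF

echo "wrote $policy_file"
# Review the file, then validate it against the cluster without persisting:
#   kubectl apply --dry-run=server -f "$policy_file"
```

If the openbao pods are not already selected by an ingress policy, a rule like this puts them into default-deny for any ingress it does not list, so it is probably better merged into the existing policy (which already has to allow `:8200`) than applied on its own.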
The fresh-init + snapshot-restore path is cleaner on Kubernetes than `bao operator migrate` (which wants a stopped
process with both filesystems mounted), and it exercises the DR restore path.
1. Snapshot the current vault (rollback point). With the single node unsealed:

   ```shell
   kubectl -n openbao exec openbao-0 -- bao operator raft snapshot save /tmp/pre-raft.snap 2>/dev/null \
     || kubectl -n openbao exec openbao-0 -- sh -c 'bao read -format=json sys/storage/raft/snapshot'
   # file-mode: use the vault-snapshot CronJob output instead
   ```

   For file storage there is no raft snapshot; rely on the existing `vault-backup` CronJob's latest snapshot, and `velero backup` the openbao namespace + PVCs. Verify the backup exists before proceeding.
2. Record the unseal key + root token (already in the `openbao-unseal` Secret; confirm you can read it out-of-band; the SOPS-encrypted copy is the source of truth).
3. Apply the Raft values (above) + `openbao_replicas: "3"` + the netpol. Flux brings up `openbao-0/1/2` with empty raft stores, sealed.
4. Initialise exactly once, on openbao-0:

   ```shell
   kubectl -n openbao exec -ti openbao-0 -- bao operator init -key-shares=1 -key-threshold=1
   # save the NEW unseal key + root token; update the openbao-unseal Secret (SOPS)
   kubectl -n openbao exec -ti openbao-0 -- bao operator unseal <new-unseal-key>
   ```

   With `retry_join`, openbao-1 and openbao-2 auto-join openbao-0; then unseal each (the `postStart` hook does this once the Secret carries the new key, or unseal manually):

   ```shell
   kubectl -n openbao exec -ti openbao-1 -- bao operator unseal <new-unseal-key>
   kubectl -n openbao exec -ti openbao-2 -- bao operator unseal <new-unseal-key>
   kubectl -n openbao exec -ti openbao-0 -- bao operator raft list-peers # expect 3 voters
   ```
5. Restore the data into the fresh raft cluster:

   ```shell
   kubectl -n openbao exec -ti openbao-0 -- bao operator raft snapshot restore /tmp/pre-raft.snap
   ```

   If migrating from file storage, where no raft snapshot exists, use `bao operator migrate` instead (see the alternative path below), or re-seed via the `vault-config` / `vault-seed` Jobs if the secret material is reproducible. Most KV/dynamic-secret config here is re-seedable from `vault-config`, but app data and rotated DB passwords are not, so prefer a real snapshot restore.
6. Verify: `bao status` on all 3 (unsealed, one leader), `bao secrets list`, and confirm a sample ExternalSecret re-syncs.
7. Land the manifest (values + `openbao_replicas: "3"` + drop the openbao namespace from `validate-replica-floor`) so the state is captured in Git.
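
The "unsealed on all 3" part of the verify step can be scripted. The helper below is a minimal sketch, not part of the repo: the function name `check_unsealed` is hypothetical, and it relies on `bao status` exiting 0 when the node is unsealed, 2 when it is sealed, and 1 on error. It does not check leadership or peer count; `bao operator raft list-peers` still covers that.

```shell
# check_unsealed NAMESPACE POD...
# Hypothetical helper: returns 0 only if `bao status` exits 0 (unsealed)
# inside every listed pod; any other exit code marks the run as failed.
check_unsealed() {
  ns="$1"; shift
  failed=0
  for pod in "$@"; do
    if kubectl -n "$ns" exec "$pod" -- bao status >/dev/null 2>&1; then
      echo "$pod: unsealed"
    else
      echo "$pod: sealed or unreachable"
      failed=1
    fi
  done
  return "$failed"
}
```

A typical call would be `check_unsealed openbao openbao-0 openbao-1 openbao-2 && echo "all unsealed"`.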
Take OpenBao offline, then run bao operator migrate with
storage_source "file" + storage_destination "raft" + cluster_addr. It
copies at the storage layer without decrypting, so the existing Shamir key +
root token keep working and all KV/auth/policies/DB-engine config carry over.
After migrate the raft cluster has a single node; bring up replicas 1–2, let them
retry_join, and unseal. Heavier on Kubernetes (needs a stopped process with
both backends mounted), so the snapshot-restore path above is preferred.
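
For orientation, a migrate configuration along the lines described above might look roughly like this; the sketch only writes the file. The destination path `/openbao/raft`, the `node_id`, and the file name are assumptions for illustration, and the source and destination are assumed to be separate directories mounted into the same stopped pod.

```shell
# Sketch only: destination path, node_id and file name are assumptions.
migrate_cfg="${MIGRATE_CFG:-migrate.hcl}"

cat > "$migrate_cfg" <<'EOF'
storage_source "file" {
  path = "/openbao/data"      # current file-backend path
}

storage_destination "raft" {
  path    = "/openbao/raft"   # assumed: empty directory, distinct from the source
  node_id = "openbao-0"       # assumed to match the pod name set by setNodeId
}

cluster_addr = "http://openbao-0.openbao-internal:8201"
EOF

echo "wrote $migrate_cfg"
# With the server stopped and both paths mounted:
#   bao operator migrate -config="$migrate_cfg"
```

If the destination directory differs from `/openbao/data`, the `storage "raft"` `path` in the server config would have to point at that directory before the pod is started again.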
| Risk | Mitigation |
|---|---|
| Flip values without migrating → empty cluster. | This runbook: snapshot/backup first, init once, restore. Never merge the values flip on its own, without the migration steps. |
| Double-init split-brain (init races on >1 pod → two 1-node clusters). | bao operator init exactly once on openbao-0; let others retry_join. |
| Unseal-before-join (a follower unsealed with mismatched keys before joining corrupts membership). | Join first (retry_join handles it), then unseal followers with the new key. |
| Even-node quorum loss. | Always odd (3). 3 voters tolerate 1 failure; the chart's default PDB maxUnavailable: 1 is correct. |
| 8201 blocked → joins wedge. | Open Cilium netpol + Talos firewall for :8201 (above) before cutover. |
| Audit fail-closed × 3. | The file audit device fails closed when its PVC fills; now there are 3 audit-openbao-* PVCs — monitor all three kubelet_volume_stats_available_bytes. |
| No pre-cutover backup → unrecoverable. | Confirm a fresh vault-backup snapshot and a Velero backup of the openbao namespace + PVCs before step 3. |
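
The "always odd" row follows from majority arithmetic: a Raft cluster of N voters needs floor(N/2) + 1 of them to agree, so it tolerates floor((N - 1)/2) failures. The short sketch below just prints that arithmetic for small clusters; the function names are illustrative.

```shell
# quorum N    -> votes needed for a majority of N voters
# tolerance N -> voters that can fail while a majority remains
quorum()    { echo $(( $1 / 2 + 1 )); }
tolerance() { echo $(( ($1 - 1) / 2 )); }

for n in 1 2 3 4 5; do
  printf 'nodes=%d quorum=%d tolerates=%d\n' "$n" "$(quorum "$n")" "$(tolerance "$n")"
done
```

A fourth voter raises the quorum from 2 to 3 without raising the tolerated failures above 1, which is why an even count adds cost but no resilience.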
A single 1-of-1 Shamir key in a Kubernetes Secret is a single point of
compromise. HA is a good moment to move to transit auto-unseal (a small
separate OpenBao/Vault via a seal "transit" {} stanza) or cloud KMS, and a
higher key-share threshold — this also removes the postStart unseal hook
dependency. Track separately.