SDK: default-protect dedicated (M10+) Atlas tiers from destructive actions #38

Description

@mikeumus

Summary

The SDK currently lets a destructive Atlas action (pause-cluster, disconnect, delete) be configured against a dedicated production-tier (M10+) cluster with no guardrail. This sharp edge caused two production MongoDB outages for the Divinci dogfood account (2026-06-12 and 2026-06-20). A placeholder mongodbStorageSizeGB: 10 threshold on an M30 cluster already using ~16.8 GB tripped a cost rule wired to pause-cluster, taking prod offline.

Crucially, pausing a fixed-tier (M10+) cluster does not save cost — the dedicated tier bills the same hourly rate regardless of storage/usage — so a destructive action there is all downside.

Proposed safe-by-default

Treat dedicated (M10+) Atlas clusters as protected-from-destructive-actions by default. A pause-cluster / disconnect / delete rule targeting a dedicated-tier cluster should be refused (or downgraded to alert-only) unless the operator sets an explicit, loud opt-in, e.g.:

{ type: "pause-cluster", target: "...", allowProductionPause: true }

  • Shared/serverless (M0/M2/M5, Flex) tiers keep today's behavior, since pausing there genuinely saves cost.
  • Dedicated tiers default to alert-only (snapshot); destructive actions require the explicit flag.
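
A minimal sketch of how such a default could be enforced when a rule is resolved. None of this is the SDK's actual API: the RuleAction shape, resolveAction, isDedicatedTier, and the use of "snapshot" as the alert-only action type are assumptions for illustration. Reusing the single allowProductionPause flag for disconnect and delete is also an assumption, and this sketch shows the downgrade variant rather than an outright refusal.

```typescript
// Hypothetical types; the real SDK rule shape may differ.
type ActionType = "pause-cluster" | "disconnect" | "delete" | "snapshot";

interface RuleAction {
  type: ActionType;
  target: string;
  // Explicit opt-in proposed in this issue. Assumed here to cover
  // all three destructive action types, not only pause-cluster.
  allowProductionPause?: boolean;
}

const DESTRUCTIVE_ACTIONS: ReadonlySet<string> = new Set([
  "pause-cluster",
  "disconnect",
  "delete",
]);

// Treats "M<n>" with n >= 10 as a dedicated tier.
// M0/M2/M5 and "Flex" fall through as non-dedicated.
function isDedicatedTier(tier: string): boolean {
  const match = /^M(\d+)$/.exec(tier);
  return match !== null && Number(match[1]) >= 10;
}

// Returns the action that should actually run for a cluster on this tier.
// Destructive actions on dedicated tiers are downgraded to alert-only
// ("snapshot") unless the operator set the explicit opt-in.
function resolveAction(action: RuleAction, tier: string): RuleAction {
  const needsOptIn =
    DESTRUCTIVE_ACTIONS.has(action.type) && isDedicatedTier(tier);
  if (!needsOptIn || action.allowProductionPause === true) {
    return action;
  }
  return { type: "snapshot", target: action.target };
}

// Usage ("example-cluster" is a placeholder target name):
const unguarded: RuleAction = { type: "pause-cluster", target: "example-cluster" };
const optedIn: RuleAction = { ...unguarded, allowProductionPause: true };

console.log(resolveAction(unguarded, "M30").type); // "snapshot"
console.log(resolveAction(optedIn, "M30").type);   // "pause-cluster"
console.log(resolveAction(unguarded, "M0").type);  // "pause-cluster"
```

With a guard of this kind, the misconfigured threshold from the incident would still fire on the M30 cluster, but the rule would resolve to an alert instead of a pause.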

This single default would have prevented both outages regardless of the threshold misconfiguration.

Context

  • Root-cause write-up: notebooks/2026-06-23-divinci-prod-atlas-pause-incident.md
  • Live remediation (done): prod cluster added to protectedServices, all prod rules are alert-only snapshot, and the stray divinci-stage-atlas-* rules (one still carrying pause-cluster) have been deleted.
