bug: auto-disconnect uses the SAME threshold as alerting (caused a prod outage) #21

Description

@mikeumus

Priority: HIGH — this caused a production incident.

Gating auto-disconnect on the SAME threshold as alerting forces a bad choice: either (a) keep the threshold low and every over-threshold service is auto-killed, or (b) raise the threshold to protect services and lose early alerting.

It took down a live service for us: a legitimate high-traffic Durable Object (a voice agent, ~1.5M reqs/day = $0.23/day) was auto-disconnected because the default DO threshold (1M reqs) doubled as the kill threshold. A $0.23 cost signal caused a 2am outage.
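To make the failure mode concrete, here is a minimal sketch of a single shared gate, using the numbers from the incident above. The function and constant names are hypothetical and are not taken from the package; the 10M value in case (b) is only an illustrative raised threshold.

```typescript
// Hypothetical sketch of the coupled behavior described above.
// Names are illustrative, not the package's real API.
type CoupledAction = "none" | "alert_and_disconnect";

// One threshold gates both the alert and the kill action.
function coupledDecision(requests: number, threshold: number): CoupledAction {
  return requests > threshold ? "alert_and_disconnect" : "none";
}

const DEFAULT_DO_THRESHOLD = 1_000_000; // default DO threshold cited in this issue
const voiceAgentRequests = 1_500_000;   // ~1.5M reqs/day from the incident

// (a) At the default threshold the legitimate service is disconnected.
const atDefault = coupledDecision(voiceAgentRequests, DEFAULT_DO_THRESHOLD);

// (b) Raising the threshold (10M is an arbitrary example value) keeps the
// service up, but the early alert disappears along with the kill.
const atRaised = coupledDecision(voiceAgentRequests, 10_000_000);

console.log({ atDefault, atRaised });
```

Neither outcome is what an operator wants for a busy but healthy service: the useful result would be an alert without a disconnect, which a single threshold cannot express.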

Fix we adopted: a DISCONNECT_THRESHOLD_MULTIPLIER (default 10). Alerts fire at 1× the threshold; auto-disconnect/delete fires only above N× the threshold, where N is the multiplier. We strongly recommend that the self-hosted package either default to alert-only or ship this multiplier with a high default. (This is a simpler cousin of the managed product's metric-category gating.)
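A rough sketch of how the multiplier separates the two decisions. Only the name DISCONNECT_THRESHOLD_MULTIPLIER and its default of 10 come from this issue; the `decide` function, the `Decision` type, the strict `>` comparisons, and the validation of the multiplier are assumptions for illustration and may differ from the reference implementation.

```typescript
// Hypothetical sketch; only the constant's name and default come from this issue.
const DISCONNECT_THRESHOLD_MULTIPLIER = 10;

interface Decision {
  alert: boolean;      // notify operators
  disconnect: boolean; // run the kill action (disconnect/delete)
}

function decide(
  usage: number,
  alertThreshold: number,
  multiplier: number = DISCONNECT_THRESHOLD_MULTIPLIER,
): Decision {
  // Assumption: a multiplier below 1 would let the kill fire before the
  // alert, so it is rejected here. NaN is rejected by the same check.
  if (!(multiplier >= 1)) {
    throw new RangeError("multiplier must be >= 1");
  }
  return {
    alert: usage > alertThreshold,                   // 1x the threshold
    disconnect: usage > alertThreshold * multiplier, // N x the threshold
  };
}

// The voice agent from the incident: alert only, no outage.
console.log(decide(1_500_000, 1_000_000));

// A runaway workload well past 10x the threshold: alert and disconnect.
console.log(decide(12_000_000, 1_000_000));
```

With the default multiplier, the 1.5M reqs/day service still raises an alert at the 1M threshold but would need to exceed 10M requests before the kill action runs. An alert-only default could be modeled in the same shape by never setting `disconnect` unless the operator opts in.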

Reference implementation (built + running in prod at Divinci): Divinci-AI/cloudflare-billing-kill-switch.
From FEEDBACK-from-divinci-deployment.md — real-world findings from the Divinci self-hosted deployment, 2026-06-17.

Metadata

Labels

- area:auto-disconnect (Kill actions, restore, route removal)
- bug (Something isn't working)
- priority:high (Critical - outage risk or biggest coverage gap)