
Delete a Query's messages from the memory backend when the Query is deleted #2533

@CDimonaco

Description

Follow-up to the broker backend roadmap (#2153). Tie the message lifecycle to the Query lifecycle: when a Query is deleted, its messages are deleted from the memory backend.

Why

Now that messages can live in a durable backend (Phase 1, #2418, messages on Postgres), they outlive their owner. When a Query is deleted (from the dashboard, via the API, with kubectl, or automatically on TTL expiry), nothing removes the messages it produced. They remain in the backend until their own expires_at, and when a user deletes a Query before its TTL expires, they stay fully visible and queryable with no owner.

This is the asymmetry raised in #2153 (the "delete-event / cascade-delete" question): chunk cleanup is anchored to a positive fact (POST /messages), but there is no equivalent cleanup anchored to the deletion of a Query. The in-memory backend made this mostly invisible (state was lost on restart anyway), but with a persistent backend the leftover messages become orphaned conversational data whose owner no longer exists. That matters for predictability, for the sessions read-model, and for any "forget this query" / erasure expectation.

The goal: deleting a Query is the single lifecycle event that also clears the messages it generated, for every deletion path.

What it does

When a Query is finalized (any deletion path), the controller asks the Query's memory backend to delete that query's messages. Concretely:

  • Where: the existing Query finalizer in the controller. This is the only chokepoint that intercepts all deletions — dashboard, API, kubectl, and the TTL-expiry path (which already routes through the same finalizer). Placing it anywhere else (e.g. only in ark-api) would miss kubectl and TTL.
  • What gets deleted: messages only, scoped to the query. Events and chunks are out of scope for now (in-memory / ephemeral; the same pattern can be applied later if needed).
  • How hard: a physical hard-delete of the query's messages, not a soft-delete — this is about removing orphaned data, not hiding it.

How

Delete becomes a first-class operation of the memory contract

Today the memory contract (the broker's HTTP API, and the Go MemoryInterface on the executor side) has no notion of "delete a query's messages" — only a conversation-scoped delete used elsewhere. Because the broker is just one implementation of the memory interface, this cleanup should be a standard memory operation, not a broker-specific endpoint:

  • REST contract: a new query-scoped resource, DELETE /queries/{queryId}/messages, sitting naturally alongside the existing DELETE /conversations/{conversationId}/queries/{queryId}/messages. Query-scoped (not conversation-scoped) because the query is the lifecycle key we are reacting to, and it does not depend on a conversation id being known.
  • Go parity: add the same operation to the MemoryInterface so the executor side stays consistent with the wire contract (real implementation for the HTTP-backed memory, no-op for the noop memory).
  • All message backends covered: Postgres deletes by query_id directly using the existing index; the in-memory / JSON-file backend (the same generic stream) already supports predicate deletion and persists it, so it is aligned for free.

Controller calls the backend itself, statelessly

The controller does not currently talk to the broker (only A2A to executors), and it must not import the executor package (wrong dependency direction: it would pull in executor-only concerns and risk cycles). So the controller performs its own small, stateless HTTP DELETE against the resolved memory address. Extracting a shared Go memory client between controller and executor was considered and deliberately deferred. It is only worth doing if the controller grows further interactions with the memory; for a single delete it would be over-engineering, and it would touch the executor's hot path.

The broker URL is resolved the same way the executor resolves it: from the Memory resource the Query references — status.lastResolvedAddress when available, falling back to resolving the spec.address value source.
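The address lookup order can be sketched as follows. `Memory` and `ValueSource` here are trimmed, hypothetical shapes, not the real CRD types, and `resolve` stands in for whatever shared value-source resolution the executor already uses; real value sources can reference more than a literal.

```go
package main

import (
	"errors"
	"fmt"
)

// ValueSource is a hypothetical stand-in for the Memory's spec.address value
// source; only a literal value is modelled here.
type ValueSource struct {
	Value string
}

// Memory is a trimmed, hypothetical view of the Memory resource.
type Memory struct {
	SpecAddress         ValueSource // spec.address
	LastResolvedAddress string      // status.lastResolvedAddress
}

// memoryAddress prefers the already-resolved status address and falls back
// to resolving spec.address, mirroring the executor's order.
func memoryAddress(m Memory, resolve func(ValueSource) (string, error)) (string, error) {
	if m.LastResolvedAddress != "" {
		return m.LastResolvedAddress, nil
	}
	addr, err := resolve(m.SpecAddress)
	if err != nil {
		return "", fmt.Errorf("resolve spec.address: %w", err)
	}
	if addr == "" {
		return "", errors.New("memory has no resolvable address")
	}
	return addr, nil
}

func main() {
	literal := func(v ValueSource) (string, error) { return v.Value, nil }
	fresh := Memory{SpecAddress: ValueSource{Value: "http://broker.example:8080"}}
	ready := Memory{SpecAddress: fresh.SpecAddress, LastResolvedAddress: "http://10.0.0.7:8080"}
	fmt.Println(memoryAddress(fresh, literal)) // falls back to spec.address
	fmt.Println(memoryAddress(ready, literal)) // status.lastResolvedAddress wins
}
```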

Resolving which memory a Query used

Messages exist only if the Query used a memory, and the resolution must mirror the executor's: use spec.memory when set, otherwise fall back to a Memory named default in the namespace; if neither exists, the Query never wrote messages and there is nothing to clean up. The finalizer replicates this resolution rather than assuming spec.memory is set.
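The resolution order can be expressed as one small function. `resolveQueryMemory` and the `memoryExists` lookup are hypothetical names. The issue does not say how the finalizer should treat a `spec.memory` that points at a missing Memory; this sketch surfaces it as an error so the failure policy can handle it, which is an assumption.

```go
package main

import "fmt"

const defaultMemoryName = "default"

// memoryExists is a hypothetical lookup of a Memory by namespace and name.
type memoryExists func(namespace, name string) (bool, error)

// resolveQueryMemory mirrors the executor: spec.memory if set, else a Memory
// named "default" in the namespace, else "" (no memory, nothing to clean up).
func resolveQueryMemory(namespace, specMemory string, exists memoryExists) (string, error) {
	if specMemory != "" {
		found, err := exists(namespace, specMemory)
		if err != nil {
			return "", err
		}
		if !found {
			// Assumed policy for this sketch: report the dangling reference.
			return "", fmt.Errorf("memory %s/%s referenced by query not found", namespace, specMemory)
		}
		return specMemory, nil
	}
	found, err := exists(namespace, defaultMemoryName)
	if err != nil || !found {
		return "", err // no default memory: the Query never wrote messages
	}
	return defaultMemoryName, nil
}

func main() {
	memories := map[string]bool{"team-a/default": true, "team-a/chat-history": true}
	exists := func(ns, name string) (bool, error) { return memories[ns+"/"+name], nil }

	fmt.Println(resolveQueryMemory("team-a", "chat-history", exists)) // explicit spec.memory
	fmt.Println(resolveQueryMemory("team-a", "", exists))             // default fallback
	fmt.Println(resolveQueryMemory("team-b", "", exists))             // no memory at all
}
```

These three calls correspond to the three resolution paths the controller tests are meant to cover.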

Failure policy

Deletion is part of finalization, so it must not strand a Query forever if the backend is unreachable. The finalizer retries (the delete-by-query is idempotent, so retries and HA are safe) and blocks the Query's deletion until cleanup succeeds, up to a grace period. After that it gives up, logging an error and emitting a Kubernetes event, so a long backend outage cannot permanently block Query deletion. Backends that do not implement the operation (e.g. a custom memory) respond with a not-implemented status, which is treated as best-effort success rather than an error.

Testing approach

  • Unit coverage on the backend for the new delete-by-query (Postgres against a real container; the in-memory path via predicate) and on the Go memory parity method.
  • Controller tests covering all three memory-resolution paths — explicit spec.memory, fallback to the default memory, and no memory at all (no call made) — plus the failure-policy behaviours (retry/block, give-up after grace, referenced memory not found).
  • End-to-end (chainsaw) scenarios mirroring the existing query-broker tests, with a durable backend, asserting messages are present after the query completes and gone after the Query is deleted — one scenario with an explicit spec.memory and one relying on the default memory fallback.

Out of scope

  • Events, chunks, and traces cleanup (messages only for now).
  • Exposing a query-scoped delete through ark-api for external/dashboard callers (the controller calls the backend directly; revisit if there is a UX need).
  • Extracting a shared Go memory client between controller and executor (deferred until justified).

Metadata

Labels: production (Production-readiness work)

Type

No type
No fields configured for issues without a type.

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions