fix(server): treat released query runner as a transient import error #21800

Closed
thomtrp wants to merge 2 commits into main from tt-fix-calendar-released-query-runner

@thomtrp thomtrp commented Jun 18, 2026

Problem

Sentry is reporting calendar import failures like:

Unknown error importing calendar events for calendar channel … : Query runner already released. Cannot run queries anymore.

The underlying error is TypeORM's QueryRunnerAlreadyReleasedError, thrown when a query runs on a query runner whose connection was already torn down.

Root cause

The workspace datasource pool sets node-postgres query_timeout. Calendar events are saved inside a transaction whose reads run on the transaction's dedicated query runner. When a query on that runner exceeds query_timeout, node-postgres destroys the underlying connection mid-flight; a follow-up read on the now-released runner throws QueryRunnerAlreadyReleasedError. The Sentry breadcrumb fits exactly — it fails right after a successful GET .../events [200], i.e. during the save/import DB work.
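
The failure sequence described above can be modeled without a database. The sketch below is a simulation only: `SimulatedQueryRunner`, `destroyConnection`, and `runImportReads` are hypothetical names, and the error class is a local stand-in for TypeORM's `QueryRunnerAlreadyReleasedError`. It assumes, as the description states, that a timed-out query leaves the runner released before the next read is issued.

```typescript
// Local stand-in for TypeORM's QueryRunnerAlreadyReleasedError (not the real class).
class QueryRunnerAlreadyReleasedError extends Error {
  constructor() {
    super("Query runner already released. Cannot run queries anymore.");
    this.name = "QueryRunnerAlreadyReleasedError";
  }
}

// Hypothetical, in-memory model of a transaction's dedicated query runner.
class SimulatedQueryRunner {
  private isReleased = false;

  // Models the connection being destroyed after a query exceeds query_timeout.
  destroyConnection(): void {
    this.isReleased = true;
  }

  // Any query on a released runner fails; otherwise it "succeeds".
  query(sql: string): string {
    if (this.isReleased) {
      throw new QueryRunnerAlreadyReleasedError();
    }
    return `ok: ${sql}`;
  }
}

// Hypothetical import step: a first query may time out, then a follow-up read runs
// on the same runner.
function runImportReads(
  runner: SimulatedQueryRunner,
  firstQueryTimesOut: boolean,
): string {
  if (firstQueryTimesOut) {
    runner.destroyConnection();
  }
  return runner.query("SELECT existing calendar events");
}
```

With `firstQueryTimesOut` set to `false` the follow-up read returns normally; with `true` it throws the released-runner error, which is the case the Sentry report corresponds to.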

This is a transient connection-lifecycle error, but it currently falls through to the unknown-error path, which marks the channel FAILED, flushes the pending events to import, and fires a Sentry alert. The sibling error "Query read timeout" (same query_timeout mechanism) is already classified as a temporary, retryable error — released-runner was just never mapped.

Fix

Classify the released-runner error as transient/retryable, mirroring the existing QUERY_READ_TIMEOUT handling:

  • computeTwentyORMException maps QueryRunnerAlreadyReleasedError → new TwentyORMExceptionCode.QUERY_RUNNER_RELEASED (the central error path all workspace query-builder operations flow through).
  • Added the QUERY_RUNNER_RELEASED code + user-friendly message.
  • Calendar and messaging import exception handlers treat the code as a temporary error.
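
The three changes above can be sketched roughly as follows. This is an illustration under assumptions, not the code in this PR: the error class is a local stand-in for TypeORM's `QueryRunnerAlreadyReleasedError`, `computeTwentyORMExceptionSketch` and `isTemporaryImportError` are hypothetical names, and the real `TwentyORMException` and import handlers carry more fields and cases than shown here.

```typescript
// Local stand-in for TypeORM's QueryRunnerAlreadyReleasedError (not the real class).
class QueryRunnerAlreadyReleasedError extends Error {
  constructor() {
    super("Query runner already released. Cannot run queries anymore.");
    this.name = "QueryRunnerAlreadyReleasedError";
  }
}

// Only the two codes discussed in this PR are listed; the real enum has more.
enum TwentyORMExceptionCode {
  QUERY_READ_TIMEOUT = "QUERY_READ_TIMEOUT",
  QUERY_RUNNER_RELEASED = "QUERY_RUNNER_RELEASED",
}

// Simplified exception type carrying the classification code.
class TwentyORMException extends Error {
  readonly code: TwentyORMExceptionCode;

  constructor(message: string, code: TwentyORMExceptionCode) {
    super(message);
    this.name = "TwentyORMException";
    this.code = code;
  }
}

// Hypothetical reduction of the central mapping: a released-runner error gets
// the new code, anything else is passed through unchanged.
function computeTwentyORMExceptionSketch(error: unknown): unknown {
  if (error instanceof QueryRunnerAlreadyReleasedError) {
    return new TwentyORMException(
      error.message,
      TwentyORMExceptionCode.QUERY_RUNNER_RELEASED,
    );
  }
  return error;
}

// Hypothetical check used by an import exception handler: both codes are
// treated as temporary, so the import is retried rather than failed.
function isTemporaryImportError(error: unknown): boolean {
  if (!(error instanceof TwentyORMException)) {
    return false;
  }
  switch (error.code) {
    case TwentyORMExceptionCode.QUERY_READ_TIMEOUT:
    case TwentyORMExceptionCode.QUERY_RUNNER_RELEASED:
      return true;
    default:
      return false;
  }
}
```

Keeping the classification in one mapping function means the calendar and messaging handlers only need to recognize the code, not the underlying TypeORM error type.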

Now an occurrence increments throttleFailureCount and reschedules instead of failing the channel + flushing events. If it genuinely persists past the throttle max attempts, it still escalates — so a real bug isn't hidden forever.
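
A minimal sketch of that retry-then-escalate behavior, assuming a simple counter compared against a maximum. `ChannelSyncState`, `handleTemporaryImportError`, and the value of `MAX_THROTTLE_ATTEMPTS` are hypothetical; only `throttleFailureCount` and the FAILED status come from the description above.

```typescript
// Hypothetical, reduced view of a channel's sync state.
type ChannelSyncState = {
  throttleFailureCount: number;
  syncStatus: "ACTIVE" | "FAILED";
};

type TemporaryErrorOutcome = "RESCHEDULED" | "ESCALATED";

// Assumed limit for illustration; the real threshold lives in the codebase.
const MAX_THROTTLE_ATTEMPTS = 3;

// On a temporary error: count the failure and reschedule, unless the channel
// has already used up its attempts, in which case it is marked FAILED.
function handleTemporaryImportError(
  channel: ChannelSyncState,
  maxAttempts: number = MAX_THROTTLE_ATTEMPTS,
): { channel: ChannelSyncState; outcome: TemporaryErrorOutcome } {
  if (channel.throttleFailureCount >= maxAttempts) {
    return {
      channel: { ...channel, syncStatus: "FAILED" },
      outcome: "ESCALATED",
    };
  }
  return {
    channel: {
      ...channel,
      throttleFailureCount: channel.throttleFailureCount + 1,
    },
    outcome: "RESCHEDULED",
  };
}
```

The function returns a new state instead of mutating its input, which keeps the sketch easy to test; the actual handlers persist these changes through the channel services.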

Scope

This stops the false failures/alerts for a transient error. Fully preventing the timeout would mean reworking how long-running import transactions interact with query_timeout — a larger, riskier change left for a separate effort if these timeouts prove frequent rather than sporadic.

Testing

  • New unit test for the QueryRunnerAlreadyReleasedError mapping (passes).
  • Lint + format clean on all changed files.


A torn-down DB connection (e.g. node-postgres query_timeout destroying a
pooled connection mid-transaction) surfaces as TypeORM's
QueryRunnerAlreadyReleasedError. During calendar/message import this fell
through to the unknown-error path, marking the channel FAILED, flushing
pending events, and firing a Sentry alert for what is a transient issue.

Map QueryRunnerAlreadyReleasedError to a new retryable
TwentyORMExceptionCode.QUERY_RUNNER_RELEASED in computeTwentyORMException
and handle it as a temporary error in the calendar and messaging import
exception handlers, mirroring the existing QUERY_READ_TIMEOUT handling.
@twenty-ci-bot-public

👋 Thanks for contributing to Twenty!

Your PR has been set to draft while you work on it. Once you're done, mark it as Ready for review and our automated checks will run.

Looking forward to your contribution!

@cubic-dev-ai cubic-dev-ai Bot left a comment

No issues found across 5 files

…e root cause

The released-runner error is a transient lifecycle race, so it is retried
like other temporary errors instead of hard-failing the channel. The exact
operation that escapes its transaction boundary could not be pinned
statically, and the call site was masked because the error was re-wrapped
twice, discarding the original stack.

Preserve the original stack and cause when wrapping in
computeTwentyORMException, and report the released-runner error to the
exception handler on every occurrence so the originating query surfaces in
monitoring and the root cause can be located.
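
One way to keep the originating stack through repeated wrapping is sketched below. The names `WrappedOrmException` and `wrapPreservingOrigin` are hypothetical, and the actual change in `computeTwentyORMException` may differ; runtimes that support ES2022 could also use the standard `cause` option of the `Error` constructor instead of a custom field.

```typescript
// Hypothetical wrapper exception that keeps a reference to what it wraps
// and reuses the wrapped error's stack trace.
class WrappedOrmException extends Error {
  readonly code: string;
  readonly originalError: unknown;

  constructor(message: string, code: string, originalError: unknown) {
    super(message);
    this.name = "WrappedOrmException";
    this.code = code;
    this.originalError = originalError;

    // Reuse the wrapped error's stack so the originating call site is not
    // replaced by the location where the wrapping happened.
    if (originalError instanceof Error && originalError.stack) {
      this.stack = originalError.stack;
    }
  }
}

// Hypothetical helper: wrap any thrown value under a classification code.
function wrapPreservingOrigin(error: unknown, code: string): WrappedOrmException {
  const message = error instanceof Error ? error.message : String(error);
  return new WrappedOrmException(message, code, error);
}
```

Because each wrapper copies the stack of the error it wraps, wrapping twice still leaves the stack of the first error on the outermost exception, which is the property the commit message is after.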
@twenty-ci-bot-public

🔍 Automated Pre-Review

No issues detected - This PR is ready for human review.


Automated pre-review — human approval still required.

@twenty-eng-sync twenty-eng-sync Bot closed this Jun 18, 2026
@twenty-eng-sync

Auto-closed by the Build Companion: this PR has only the -PR: draft label and has had no update in over 8 hours. Please reopen it once you have the bandwidth to take it forward.
