fix(server): treat released query runner as a transient import error#21800
Closed
thomtrp wants to merge 2 commits into
Closed
fix(server): treat released query runner as a transient import error#21800thomtrp wants to merge 2 commits into
thomtrp wants to merge 2 commits into
Conversation
A torn-down DB connection (e.g. node-postgres query_timeout destroying a pooled connection mid-transaction) surfaces as TypeORM's QueryRunnerAlreadyReleasedError. During calendar/message import this fell through to the unknown-error path, marking the channel FAILED, flushing pending events, and firing a Sentry alert for what is a transient issue. Map QueryRunnerAlreadyReleasedError to a new retryable TwentyORMExceptionCode.QUERY_RUNNER_RELEASED in computeTwentyORMException and handle it as a temporary error in the calendar and messaging import exception handlers, mirroring the existing QUERY_READ_TIMEOUT handling.
|
👋 Thanks for contributing to Twenty! Your PR has been set to draft while you work on it. Once you're done, mark it as Ready for review and our automated checks will run. Looking forward to your contribution! |
…e root cause The released-runner error is a transient lifecycle race, so it is retried like other temporary errors instead of hard-failing the channel. The exact operation that escapes its transaction boundary could not be pinned statically, and the call site was masked because the error was re-wrapped twice, discarding the original stack. Preserve the original stack and cause when wrapping in computeTwentyORMException, and report the released-runner error to the exception handler on every occurrence so the originating query surfaces in monitoring and the root cause can be located.
🔍 Automated Pre-Review✅ No issues detected - This PR is ready for human review. Automated pre-review — human approval still required. |
|
Auto-closed by the Build Companion: this PR has only the |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Problem
Sentry is reporting calendar import failures like:
The underlying error is TypeORM's
QueryRunnerAlreadyReleasedError, thrown when a query runs on a query runner whose connection was already torn down.Root cause
The workspace datasource pool sets node-postgres
query_timeout. Calendar events are saved inside a transaction whose reads run on the transaction's dedicated query runner. When a query on that runner exceedsquery_timeout, node-postgres destroys the underlying connection mid-flight; a follow-up read on the now-released runner throwsQueryRunnerAlreadyReleasedError. The Sentry breadcrumb fits exactly — it fails right after a successfulGET .../events [200], i.e. during the save/import DB work.This is a transient connection-lifecycle error, but it currently falls through to the unknown-error path, which marks the channel FAILED, flushes the pending events to import, and fires a Sentry alert. The sibling error "Query read timeout" (same
query_timeoutmechanism) is already classified as a temporary, retryable error — released-runner was just never mapped.Fix
Classify the released-runner error as transient/retryable, mirroring the existing
QUERY_READ_TIMEOUThandling:computeTwentyORMExceptionmapsQueryRunnerAlreadyReleasedError→ newTwentyORMExceptionCode.QUERY_RUNNER_RELEASED(the central error path all workspace query-builder operations flow through).QUERY_RUNNER_RELEASEDcode + user-friendly message.Now an occurrence increments
throttleFailureCountand reschedules instead of failing the channel + flushing events. If it genuinely persists past the throttle max attempts, it still escalates — so a real bug isn't hidden forever.Scope
This stops the false failures/alerts for a transient error. Fully preventing the timeout would mean reworking how long-running import transactions interact with
query_timeout— a larger, riskier change left for a separate effort if these timeouts prove frequent rather than sporadic.Testing
QueryRunnerAlreadyReleasedErrormapping (passes).