fix(benchmarks): robust retry/backoff for transient provider 503s #88
Merged
…backoff retry

Real Gemini pilot produced 4 transient 503 UNAVAILABLE failures; current retry_max=2 with a flat ~1s backoff was too tight. Add a small, scientifically controlled retry layer:

- providers.base: TransientProviderError + is_transient_error classifier for HTTP 429/500/502/503/504, gRPC UNAVAILABLE / RESOURCE_EXHAUSTED / DEADLINE_EXCEEDED, and read-timeout signatures.
- gemini_adapter: wraps SDK exceptions and re-raises transient ones as TransientProviderError so retries only fire on retryable conditions.
- executor: exponential backoff with jitter, capped at retry_backoff_max_s; retries only on transient errors; permanent errors abort immediately. Log retried_attempts, per-attempt error class/type/sleep, final_error_class, and cumulative_retry_delay_s in both raw_outputs.jsonl and errors.jsonl.
- runner CLI: --retry-max default 5 (cap 8), --retry-backoff default 2s (cap 10s), --retry-backoff-max default 30s (cap 30s), --retry-jitter 0.25.
- workflow: expose retry_max/retry_backoff/retry_backoff_max inputs with the same caps; surface them in the job summary.
- tests: classifier coverage, retry-after-503-then-success, persistent failure surfaces with full trace, no retry on permanent auth/config errors, runner caps, manifest records retry settings (18 new tests, 66 total pass; zero network calls in CI).

Failures are still written to errors.jsonl as before; retries do not hide them.
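For readers who want a concrete picture of the classifier described above, here is a minimal sketch. Only the names `TransientProviderError` and `is_transient_error` and the list of status codes and gRPC names come from this PR; the matching strategy (status attributes, then message text), the attribute names `status_code`/`code`, and the exact text signatures are assumptions for illustration and may differ from the merged code.

```python
import re


class TransientProviderError(Exception):
    """Provider failure where a retry can plausibly succeed."""


# Codes and gRPC names are taken from the PR description.
_TRANSIENT_HTTP = {429, 500, 502, 503, 504}
_TRANSIENT_GRPC = ("UNAVAILABLE", "RESOURCE_EXHAUSTED", "DEADLINE_EXCEEDED")
# Assumed text signatures; the real classifier may match other strings.
_TRANSIENT_TEXT = ("read timed out", "read timeout", "overloaded", "high demand")
_STATUS_RE = re.compile(r"\b(\d{3})\b")


def is_transient_error(exc: BaseException) -> bool:
    """Return True if `exc` looks like a retryable provider failure."""
    if isinstance(exc, TransientProviderError):
        return True
    # Hypothetical: some SDK exceptions expose a numeric status attribute.
    for attr in ("status_code", "code"):
        value = getattr(exc, attr, None)
        if isinstance(value, int) and value in _TRANSIENT_HTTP:
            return True
    message = str(exc)
    # Crude: any standalone 3-digit number in the message counts as a status.
    if any(int(m) in _TRANSIENT_HTTP for m in _STATUS_RE.findall(message)):
        return True
    if any(name in message.upper() for name in _TRANSIENT_GRPC):
        return True
    return any(sig in message.lower() for sig in _TRANSIENT_TEXT)
```

Matching on message text is fragile by nature, so a classifier like this is best kept conservative: anything it does not recognise is treated as permanent and aborts immediately.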
Summary
The first real v4.1 Gemini pilot produced 4 transient `503 UNAVAILABLE` high demand failures. The existing retry was `retry_max=2` with a flat ~1s backoff and no error classification, so transient infrastructure flakes surfaced as noisy benchmark failures.

This PR adds a small, scientifically controlled retry layer before any larger pilot/full runs:
- Classifier (`providers.base.is_transient_error` + `TransientProviderError`) recognises HTTP `429/500/502/503/504`, gRPC `UNAVAILABLE`/`RESOURCE_EXHAUSTED`/`DEADLINE_EXCEEDED`, read timeouts, and overloaded/high-demand signatures.
- Gemini adapter wraps SDK exceptions and re-raises transient ones as `TransientProviderError` so the executor only retries on conditions where a retry can plausibly succeed.
- Executor uses exponential backoff with jitter, capped at `retry_backoff_max_s`. Permanent errors (auth, config, schema) abort immediately and are still recorded; retries do not hide them.
- Logging records `retried_attempts`, per-attempt `error_class`/`error_type`/`sleep_s`, `final_error_class`, and `cumulative_retry_delay_s` in both `raw_outputs.jsonl` and `errors.jsonl`. The run manifest records the full retry configuration.
- Runner CLI: `--retry-max` (default 5, cap 8), `--retry-backoff` (default 2s, cap 10s), `--retry-backoff-max` (default 30s, cap 30s), `--retry-jitter` (default 0.25).
- Workflow exposes `retry_max`, `retry_backoff`, `retry_backoff_max` workflow_dispatch inputs, validates and caps them, and surfaces them in the job summary.

Concurrency defaults stay low (default 1, workflow cap 2); the retry layer prefers waiting over hammering.
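The executor behaviour described above (exponential backoff with jitter, a hard cap, retries only on transient errors) can be sketched as follows. This is an illustration under stated assumptions, not the merged implementation: the helper names `backoff_delay` and `call_with_retry`, the symmetric multiplicative jitter, and the reading of `retry_max` as "number of retries after the first attempt" are all hypothetical. The trace keys mirror the field names listed in this PR.

```python
import random
import time


class TransientProviderError(Exception):
    """Provider failure where a retry can plausibly succeed."""


def backoff_delay(attempt, base_s=2.0, max_s=30.0, jitter=0.25, rng=random):
    """Delay in seconds before retry number `attempt` (1-based)."""
    raw = min(base_s * 2 ** (attempt - 1), max_s)
    # Assumed jitter model: scale by a factor in [1 - jitter, 1 + jitter],
    # then re-apply the cap so the sleep never exceeds max_s.
    jittered = raw * (1.0 + rng.uniform(-jitter, jitter))
    return min(max(jittered, 0.0), max_s)


def call_with_retry(fn, retry_max=5, sleep=time.sleep, **backoff):
    """Call fn(); retry only on TransientProviderError.

    Any other exception propagates immediately (permanent errors abort).
    Returns (result, retry_info) on success; re-raises the last transient
    error once retry_max retries have been used up.
    """
    trace = []
    total_delay = 0.0
    for attempt in range(1, retry_max + 2):
        try:
            result = fn()
        except TransientProviderError as exc:
            if attempt > retry_max:
                raise
            delay = backoff_delay(attempt, **backoff)
            trace.append({
                "error_class": "transient",
                "error_type": type(exc).__name__,
                "sleep_s": delay,
            })
            total_delay += delay
            sleep(delay)
        else:
            return result, {
                "retried_attempts": len(trace),
                "attempts": trace,
                "cumulative_retry_delay_s": total_delay,
            }
```

With the defaults from this PR and jitter disabled, the successive delays would be 2s, 4s, 8s, 16s, 30s, so five retries wait at most about a minute in total before the failure is surfaced.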
Testing
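- `python3 -m pytest benchmarks/v4.1/tests/ -q` → 66 passed (18 new in `test_retry_backoff.py`).
- New tests cover the classifier, retry-after-503-then-success, persistent failures surfacing with a full trace, no retry on permanent auth/config errors, and runner caps on `--retry-max`/`--retry-backoff-max`; the manifest records retry settings.
- `yaml.safe_load` validates the updated workflow.

No publish / no tag / no release / no Zenodo / no npm / no PyPI.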
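As an illustration of what the runner-cap tests exercise, the sketch below parses the four retry flags and clamps them to the caps stated in this PR. The flag names, defaults, and caps come from the PR description; the function names `build_parser` and `parse_retry_settings` are hypothetical, and whether the real runner clamps out-of-range values or rejects them is an assumption here.

```python
import argparse

# Caps from the PR description: retry_max 8, retry_backoff 10s,
# retry_backoff_max 30s. No cap is stated for --retry-jitter.
CAPS = {"retry_max": 8, "retry_backoff": 10.0, "retry_backoff_max": 30.0}


def build_parser():
    """Hypothetical parser exposing only the retry-related flags."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--retry-max", type=int, default=5)
    parser.add_argument("--retry-backoff", type=float, default=2.0)
    parser.add_argument("--retry-backoff-max", type=float, default=30.0)
    parser.add_argument("--retry-jitter", type=float, default=0.25)
    return parser


def parse_retry_settings(argv):
    """Parse argv and clamp capped settings into [0, cap]."""
    settings = dict(vars(build_parser().parse_args(argv)))
    for name, cap in CAPS.items():
        # Assumption: silently clamp rather than raise on out-of-range input.
        settings[name] = max(0, min(settings[name], cap))
    return settings
```

For example, `parse_retry_settings(["--retry-max", "50"])` would yield `retry_max == 8`, and the resulting dictionary is the kind of record a run manifest could store.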
🤖 Generated by Computer