test(pool): concurrent-client HTTP load test for the DB connection pool #225
Merged
Follow-up to the connection pool (#210, shipped in #221). Unit coverage existed (pooled_database_shared_across_threads, concurrent_reader_writer_no_locks) but exercised the Database directly, not the HTTP/SSE transport, and the pool knobs were hard-coded. As a result, the durability/throughput characteristics under sustained concurrent client load were unverified.

Two changes:

1. Make the pool tunable via env (db.rs): MIMIR_POOL_MAX_SIZE (default 16) and MIMIR_BUSY_TIMEOUT_MS (default 5000). Defaults preserve prior behavior; this lets operators size the pool to their workload and lets the load test sweep the knobs.

2. Add an #[ignore]d load test (transport.rs) that drives the REAL HTTP transport (the same init_transport_state + build_transport_router + axum::serve path main.rs uses) with N concurrent ureq clients interleaving writes (mimir_remember, with unique high-entropy bodies so each is a real create, not a dedup) and reads (mimir_recall + mimir_context). It asserts the four properties #223 calls out: no `database is locked` / SQLITE_BUSY after the busy_timeout, no lost writes (rows persisted == writes issued), no deadlock (all clients join), and reports p50/p99/max latency + throughput.

It is #[ignore]d on purpose: it is a load/soak test, not a CI correctness gate (the contention characteristics "can't be proven by CI"). Run it explicitly and sweep:

    cargo test --release pool_load_test_http_transport -- --ignored --nocapture
    MIMIR_POOL_MAX_SIZE=4 MIMIR_LOADTEST_CLIENTS=32 cargo test ... -- --ignored

Verified on x86_64-pc-windows-msvc (default 16x16 pass; pool=2 sweep pass).

Closes #223

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Closes #223.
Follow-up to the connection pool (#210, shipped in #221). The existing unit tests (`pooled_database_shared_across_threads`, `concurrent_reader_writer_no_locks`) exercise `Database` directly, not the HTTP/SSE transport, and the pool knobs were hard-coded. As a result, the durability/throughput characteristics under sustained concurrent client load were unverified.

## Changes
### 1. Make the pool tunable via env (`src/db.rs`)

- `MIMIR_POOL_MAX_SIZE` (default `16`)
- `MIMIR_BUSY_TIMEOUT_MS` (default `5000`)

Defaults preserve prior behavior. This lets operators size the pool to their workload and lets the load test sweep the knobs the issue asks about.
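A minimal sketch of reading the two knobs with the documented defaults. `PoolConfig`, `from_env`, and `from_lookup` are hypothetical names, not the PR's `db.rs`. Falling back to the default on unparsable or zero values is this sketch's choice and not necessarily the PR's behavior. Taking the lookup as a closure lets a test exercise overrides without mutating the process environment (`std::env::set_var` is `unsafe` in the 2024 edition).

```rust
use std::time::Duration;

/// Hypothetical pool settings; the PR's actual db.rs types may differ.
#[derive(Debug, Clone, PartialEq)]
pub struct PoolConfig {
    pub max_size: u32,
    pub busy_timeout: Duration,
}

impl PoolConfig {
    pub const DEFAULT_MAX_SIZE: u32 = 16;
    pub const DEFAULT_BUSY_TIMEOUT_MS: u64 = 5000;

    /// Read the knobs from the real process environment.
    pub fn from_env() -> Self {
        Self::from_lookup(|key| std::env::var(key).ok())
    }

    /// Same as `from_env`, but the variable lookup is injected so tests
    /// can supply values without touching the environment.
    pub fn from_lookup(lookup: impl Fn(&str) -> Option<String>) -> Self {
        let max_size = lookup("MIMIR_POOL_MAX_SIZE")
            .and_then(|v| v.trim().parse::<u32>().ok())
            // A zero-sized pool could never hand out a connection (sketch's choice).
            .filter(|&n| n > 0)
            .unwrap_or(Self::DEFAULT_MAX_SIZE);
        let busy_ms = lookup("MIMIR_BUSY_TIMEOUT_MS")
            .and_then(|v| v.trim().parse::<u64>().ok())
            .unwrap_or(Self::DEFAULT_BUSY_TIMEOUT_MS);
        Self {
            max_size,
            busy_timeout: Duration::from_millis(busy_ms),
        }
    }
}
```

If the pool is built on `r2d2` with `rusqlite` (an assumption; the PR does not name the crates), `max_size` would feed `r2d2::Builder::max_size` and `busy_timeout` would be applied per connection via `rusqlite::Connection::busy_timeout`.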
### 2. Add an `#[ignore]`d load test (`src/transport.rs`)

Drives the real HTTP transport (the same `init_transport_state` + `build_transport_router` + `axum::serve` path `main.rs` uses) with N concurrent `ureq` clients interleaving:

- **writes:** `mimir_remember` with unique high-entropy bodies, so each is a real create rather than a near-duplicate dedup (otherwise `persisted == issued` wouldn't test durability)
- **reads:** `mimir_recall` + `mimir_context`

It asserts the four properties #223 calls out:
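- no `database is locked` / `SQLITE_BUSY` after the `busy_timeout`
- no lost writes (rows persisted == writes issued)
- no deadlock (all clients join)
- p50/p99/max latency and throughput are reported

### Why `#[ignore]`

It's a load/soak test, not a CI correctness gate: the contention characteristics "can't be proven by CI" (per the issue). Run it explicitly with `cargo test --release pool_load_test_http_transport -- --ignored --nocapture`, and sweep the knobs with, for example, `MIMIR_POOL_MAX_SIZE=4 MIMIR_LOADTEST_CLIENTS=32 cargo test ... -- --ignored`.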
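The four asserted properties map onto a small harness shape. This sketch is std-only and swaps the real HTTP server for a hypothetical in-memory `FakeStore` (the actual test uses `ureq` against the axum router). `FakeStore`, `Tally`, `unique_body`, and `run_load` are illustrative names, not the PR's code. It shows three things: bodies that differ in most of their text, so a dedup cannot mask a lost write; errors classified by the `database is locked` / `SQLITE_BUSY` message; and a channel read with a deadline in place of `JoinHandle::join` (which has no timeout), so a deadlock fails the run instead of hanging it.

```rust
use std::collections::hash_map::RandomState;
use std::collections::HashSet;
use std::hash::BuildHasher;
use std::sync::{mpsc, Arc, Mutex};
use std::thread;
use std::time::{Duration, Instant};

/// Hypothetical stand-in for the server: a HashSet drops identical bodies,
/// the way a dedup-on-write store would.
#[derive(Default)]
struct FakeStore {
    rows: Mutex<HashSet<String>>,
}

impl FakeStore {
    fn remember(&self, body: &str) -> Result<(), String> {
        let mut rows = self.rows.lock().map_err(|e| e.to_string())?;
        rows.insert(body.to_owned());
        Ok(())
    }
    fn recall(&self, needle: &str) -> Result<(), String> {
        let rows = self.rows.lock().map_err(|e| e.to_string())?;
        let _hits = rows.iter().filter(|r| r.contains(needle)).count();
        Ok(())
    }
}

/// Per-client counters, merged once every client has reported back.
#[derive(Default, Debug)]
struct Tally {
    lock_errors: usize,
    other_errors: usize,
    latencies: Vec<Duration>,
}

impl Tally {
    fn record(&mut self, started: Instant, res: Result<(), String>) {
        self.latencies.push(started.elapsed());
        match res {
            Ok(()) => {}
            Err(e) if e.contains("database is locked") || e.contains("SQLITE_BUSY") => {
                self.lock_errors += 1
            }
            Err(_) => self.other_errors += 1,
        }
    }
}

/// Unique prefix plus eight random 64-bit hex words, so most of the text differs.
fn unique_body(client: usize, i: usize) -> String {
    let seed = RandomState::new();
    let words: Vec<String> = (0..8u64)
        .map(|k| format!("{:016x}", seed.hash_one((client, i, k))))
        .collect();
    format!("c{client}-w{i} {}", words.join(" "))
}

/// Returns (rows persisted, merged tally). Panics if any client fails to
/// report within `deadline` (deadlock) or drops its sender (panic).
fn run_load(clients: usize, writes: usize, reads: usize, deadline: Duration) -> (usize, Tally) {
    let store = Arc::new(FakeStore::default());
    let (tx, rx) = mpsc::channel::<Tally>();
    for c in 0..clients {
        let (store, tx) = (Arc::clone(&store), tx.clone());
        thread::spawn(move || {
            let mut t = Tally::default();
            for i in 0..writes.max(reads) {
                if i < writes {
                    let body = unique_body(c, i); // built outside the timed region
                    let started = Instant::now();
                    t.record(started, store.remember(&body));
                }
                if i < reads {
                    let started = Instant::now();
                    t.record(started, store.recall("c0-"));
                }
            }
            let _ = tx.send(t);
        });
    }
    drop(tx); // only the clients' senders remain

    let stop_at = Instant::now() + deadline;
    let mut merged = Tally::default();
    for _ in 0..clients {
        let left = stop_at.saturating_duration_since(Instant::now());
        let t = rx.recv_timeout(left).expect("a client did not report back (deadlock or panic)");
        merged.lock_errors += t.lock_errors;
        merged.other_errors += t.other_errors;
        merged.latencies.extend(t.latencies);
    }
    let persisted = store.rows.lock().unwrap().len();
    (persisted, merged)
}
```

In the real test the same `persisted == clients * writes` comparison would be made against the database after the HTTP clients finish, rather than against an in-memory set.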
Tunables (env): `MIMIR_LOADTEST_CLIENTS` (16), `MIMIR_LOADTEST_WRITES` / `MIMIR_LOADTEST_READS` per client (25 / 75).

### Verification (x86_64-pc-windows-msvc, debug)
Default `16` clients × `16` pool: 2800 requests, 400/400 writes persisted, 0 lock errors, 0 other errors, p50 ≈ 1 ms / p99 ≈ 27 ms. The default `cargo test` run shows it as `ignored` (it does not gate CI); existing pool/transport tests still pass.

🤖 Generated with Claude Code
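The reported p50/p99/max and throughput figures reduce to sorting the per-request latencies. This sketch uses nearest-rank percentiles. The PR does not say which percentile definition the load test uses, so treat that choice, and the names `percentile`, `summarize`, and `expected_requests`, as hypothetical. It also checks the headline arithmetic. Assuming each read iteration issues one `mimir_recall` and one `mimir_context` request (an inference, not stated in the PR), 16 clients × (25 writes + 2 × 75 reads) = 2800 requests, which matches the reported total.

```rust
use std::time::Duration;

/// Nearest-rank percentile (p in 0..=100) over samples sorted ascending.
/// Returns None when there are no samples.
fn percentile(sorted: &[Duration], p: f64) -> Option<Duration> {
    if sorted.is_empty() {
        return None;
    }
    // rank = ceil(p/100 * n), clamped to 1..=n; computed as p*n/100 to keep it exact.
    let rank = (p * sorted.len() as f64 / 100.0).ceil() as usize;
    Some(sorted[rank.clamp(1, sorted.len()) - 1])
}

struct Summary {
    p50: Duration,
    p99: Duration,
    max: Duration,
    req_per_sec: f64,
}

/// Sorts the samples and derives the reported figures; `wall` is the
/// elapsed time of the whole run (from first request to last client join).
fn summarize(mut samples: Vec<Duration>, wall: Duration) -> Option<Summary> {
    samples.sort_unstable();
    Some(Summary {
        p50: percentile(&samples, 50.0)?,
        p99: percentile(&samples, 99.0)?,
        max: *samples.last()?,
        req_per_sec: samples.len() as f64 / wall.as_secs_f64(),
    })
}

/// Expected total requests, assuming one write request per write and
/// two requests (recall + context) per read iteration.
fn expected_requests(clients: u64, writes: u64, reads: u64) -> u64 {
    clients * (writes + 2 * reads)
}
```

Nearest-rank always returns an observed sample, which keeps p99 honest at small sample counts; an interpolating definition would report values that no request actually took.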