
[forge] saturate encrypted max-load test (backlog 100 -> 4000)#19892

Merged: ibalajiarun merged 4 commits into main from ibalajiarun/forge-100-node-encrypted-tps on Jun 2, 2026

ibalajiarun commented May 26, 2026

Summary

  • realistic_env_max_load_encrypted had mempool_backlog = 100, which throttled the inner traffic well below the ~2k TPS encrypted peak; despite its name, it wasn't actually a max-load test.
  • Bumped to 4000 (~2s of work outstanding at peak), so the test now actually saturates the encrypted-txn pipeline.
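
The sizing in the second bullet is a throughput-times-time product (Little's law). A minimal sketch of that arithmetic, assuming the ~2k TPS peak from the summary; `backlog_for` is a hypothetical helper, not a Forge API:

```rust
/// Hypothetical helper: how many transactions to keep outstanding so that
/// `secs_outstanding` seconds of work are queued at `peak_tps`
/// (Little's law: L = λ · W).
fn backlog_for(peak_tps: f64, secs_outstanding: f64) -> u64 {
    (peak_tps * secs_outstanding).round() as u64
}

fn main() {
    let peak_tps = 2000.0; // ~2k peak encrypted TPS from the summary
    // The old backlog of 100 corresponds to only 100 / 2000 = 0.05 s of work.
    assert_eq!(backlog_for(peak_tps, 0.05), 100);
    // ~2 s of outstanding work gives the new setting.
    println!("backlog = {}", backlog_for(peak_tps, 2.0));
}
```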

Run at 100 validators by passing FORGE_NUM_VALIDATORS=100 to the adhoc-forge workflow. No separate test variant is needed: mainnet_calibrated_for_validator_count only branches at >100 validators, so the chaos setup is identical at 5 and 100.
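
A minimal sketch of why 5 and 100 validators behave the same, assuming the only relevant branch is the strict `> 100` check described above. `ChaosProfile` and `chaos_profile` are hypothetical stand-ins; the real `mainnet_calibrated_for_validator_count` is not reproduced here:

```rust
// Hypothetical stand-in for the choice made inside
// `mainnet_calibrated_for_validator_count`; only the `> 100` threshold is
// taken from the PR description.
#[derive(Debug, PartialEq, Eq)]
enum ChaosProfile {
    Baseline,
    LargeValidatorSet,
}

fn chaos_profile(validator_count: usize) -> ChaosProfile {
    if validator_count > 100 {
        ChaosProfile::LargeValidatorSet
    } else {
        ChaosProfile::Baseline
    }
}

fn main() {
    // 5 and 100 validators land on the same side of the strict `> 100` check;
    // 101 is the first count that would switch profiles.
    for n in [5, 100, 101] {
        println!("{n} validators -> {:?}", chaos_profile(n));
    }
}
```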

Test plan

  • Adhoc forge run: FORGE_TEST_SUITE=realistic_env_max_load_encrypted, FORGE_NUM_VALIDATORS=100
  • Confirm the inner traffic saturates and the reported TPS is near 2k

🤖 Generated with Claude Code

ibalajiarun added the CICD:build-images label May 26, 2026
ibalajiarun marked this pull request as ready for review June 1, 2026 18:19
ibalajiarun enabled auto-merge (squash) June 1, 2026 18:20

cursor Bot left a comment

Cursor Bugbot has reviewed your changes and found 1 potential issue.

Reviewed by Cursor Bugbot for commit 885180e. Configure here.

```rust
// Diff excerpt: the leading `}))` closes the preceding builder call.
}))
.with_pfn_override_node_config_fn(Arc::new(|config, _| {
    config.api.allow_encrypted_txns_submission = true;
}))
```

PFN config missing consensus observer and connectivity settings

Medium Severity

The newly added with_pfn_override_node_config_fn only sets allow_encrypted_txns_submission, but it is missing the standard PFN settings that every other test with PFNs includes: enable_validator_pfn_connections, observer_enabled, and the consensus observer fallback thresholds. The validator override is also missing enable_validator_pfn_connections = true. Since ENABLE_ON_PUBLIC_FULLNODES is false, observer_enabled won't auto-enable, so the 3 PFNs likely can't sync via consensus observer, which could leave them non-functional.
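
A sketch of the override shape the review asks for. The structs below are hypothetical, simplified stand-ins for the real node config types: only the field names quoted in the review are reused, their real nesting may differ, and the fallback thresholds are omitted because the review does not name them.

```rust
use std::sync::Arc;

// Hypothetical, simplified stand-ins for the config sections named in the
// review. Field names come from the review; the real nesting may differ.
#[derive(Debug, Default)]
struct ApiConfig {
    allow_encrypted_txns_submission: bool,
}

#[derive(Debug, Default)]
struct ConsensusObserverConfig {
    observer_enabled: bool,
}

#[derive(Debug, Default)]
struct NodeConfig {
    api: ApiConfig,
    consensus_observer: ConsensusObserverConfig,
    enable_validator_pfn_connections: bool,
}

// Same closure shape as the diff (`Arc::new(|config, _| { ... })`); the
// unused second argument is modeled as `()` in this sketch.
type OverrideFn = Arc<dyn Fn(&mut NodeConfig, ()) + Send + Sync>;

fn pfn_override() -> OverrideFn {
    Arc::new(|config: &mut NodeConfig, _: ()| {
        // The one setting the PR's PFN override applies.
        config.api.allow_encrypted_txns_submission = true;
        // Settings the review says other PFN tests also apply.
        config.enable_validator_pfn_connections = true;
        config.consensus_observer.observer_enabled = true;
        // Fallback thresholds are left out: the review does not name them.
    })
}

fn main() {
    let mut config = NodeConfig::default();
    pfn_override()(&mut config, ());
    println!("{config:?}");
}
```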

ibalajiarun and others added 4 commits June 1, 2026 17:07
mempool_backlog=100 was throttling the inner traffic well below peak
encrypted TPS (~2k). Bump to 4000 so the test actually measures max
throughput.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

At 100 validators with backlog 4000, encrypted MaxLoad committed
~960 TPS with P50 latency 3.6s; the backlog saturated below peak. Bump
to 8000 to give the pipeline more headroom for benchmarking peak
throughput on longer runs.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

At 100 validators with backlog 8000, execution wasn't saturated and no
backpressure was firing; the bottleneck was the single fullnode submission
path. Add 3 PFNs so the emitter can spread submissions and actually
load consensus/execution.

PFN config also opts into encrypted txn submission.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Backlog 8000 overran the chain's saturation point (~960 TPS at 100
validators) and triggered an unrecoverable expiration cascade. 4000
sits right at the saturation knee with stable commit throughput.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
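
A rough Little's-law reading of the commit messages above, assuming the emitter keeps the backlog full and commits hold near the reported ~960 TPS saturation point. At 4000 outstanding, each transaction spends about 4.2 s in flight, the same ballpark as the 3.6 s P50 reported in the second commit; at 8000 that roughly doubles to about 8.3 s, which is consistent with (though not proof of) the expiration cascade reported in the last commit. The PR does not give the expiration timeout, so the sketch does not model expiry; `implied_secs_in_flight` is a hypothetical helper:

```rust
/// Steady-state time in flight implied by keeping `backlog` transactions
/// outstanding while the chain commits `committed_tps` (W = L / λ).
fn implied_secs_in_flight(backlog: u64, committed_tps: f64) -> f64 {
    backlog as f64 / committed_tps
}

fn main() {
    // ~960 TPS: saturation point reported at 100 validators.
    let saturation_tps = 960.0;
    for backlog in [4000u64, 8000] {
        let secs = implied_secs_in_flight(backlog, saturation_tps);
        println!("backlog {backlog}: ~{secs:.1} s in flight");
    }
}
```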
ibalajiarun force-pushed the ibalajiarun/forge-100-node-encrypted-tps branch from 885180e to dea3dd8 June 2, 2026 00:07

github-actions Bot commented Jun 2, 2026

✅ Forge suite compat success on 519f9927274a9f0da56887013c25bae8fcfea048 ==> dea3dd8f2c259ef74184161fa900aea28e1d2aad

Compatibility test results for 519f9927274a9f0da56887013c25bae8fcfea048 ==> dea3dd8f2c259ef74184161fa900aea28e1d2aad (PR)
1. Check liveness of validators at old version: 519f9927274a9f0da56887013c25bae8fcfea048
compatibility::simple-validator-upgrade::liveness-check : committed: 14968.78 txn/s, latency: 2299.14 ms, (p50: 2200 ms, p70: 2500, p90: 3300 ms, p99: 4200 ms), latency samples: 489840
2. Upgrading first Validator to new version: dea3dd8f2c259ef74184161fa900aea28e1d2aad
compatibility::simple-validator-upgrade::single-validator-upgrade : committed: 6088.22 txn/s, latency: 5486.56 ms, (p50: 6100 ms, p70: 6200, p90: 6300 ms, p99: 6400 ms), latency samples: 212740
3. Upgrading rest of first batch to new version: dea3dd8f2c259ef74184161fa900aea28e1d2aad
compatibility::simple-validator-upgrade::half-validator-upgrade : committed: 6266.33 txn/s, latency: 5379.04 ms, (p50: 5900 ms, p70: 6100, p90: 6200 ms, p99: 6400 ms), latency samples: 217540
4. upgrading second batch to new version: dea3dd8f2c259ef74184161fa900aea28e1d2aad
compatibility::simple-validator-upgrade::rest-validator-upgrade : committed: 9747.45 txn/s, latency: 3477.95 ms, (p50: 3800 ms, p70: 3900, p90: 4000 ms, p99: 4100 ms), latency samples: 323160
5. check swarm health
Compatibility test for 519f9927274a9f0da56887013c25bae8fcfea048 ==> dea3dd8f2c259ef74184161fa900aea28e1d2aad passed
Test Ok

github-actions Bot commented Jun 2, 2026

✅ Forge suite framework_upgrade success on 519f9927274a9f0da56887013c25bae8fcfea048 ==> dea3dd8f2c259ef74184161fa900aea28e1d2aad

Compatibility test results for 519f9927274a9f0da56887013c25bae8fcfea048 ==> dea3dd8f2c259ef74184161fa900aea28e1d2aad (PR)
Upgrade the nodes to version: dea3dd8f2c259ef74184161fa900aea28e1d2aad
framework_upgrade::framework-upgrade::full-framework-upgrade : committed: 2283.61 txn/s, submitted: 2289.80 txn/s, failed submission: 6.19 txn/s, expired: 6.19 txn/s, latency: 1245.82 ms, (p50: 1200 ms, p70: 1300, p90: 1800 ms, p99: 2700 ms), latency samples: 206681
framework_upgrade::framework-upgrade::full-framework-upgrade : committed: 2315.32 txn/s, submitted: 2323.89 txn/s, failed submission: 8.57 txn/s, expired: 8.57 txn/s, latency: 1218.97 ms, (p50: 1200 ms, p70: 1300, p90: 1800 ms, p99: 2300 ms), latency samples: 210742
5. check swarm health
Compatibility test for 519f9927274a9f0da56887013c25bae8fcfea048 ==> dea3dd8f2c259ef74184161fa900aea28e1d2aad passed
Upgrade the remaining nodes to version: dea3dd8f2c259ef74184161fa900aea28e1d2aad
framework_upgrade::framework-upgrade::full-framework-upgrade : committed: 2365.60 txn/s, submitted: 2371.36 txn/s, failed submission: 5.76 txn/s, expired: 5.76 txn/s, latency: 1220.57 ms, (p50: 1200 ms, p70: 1200, p90: 1700 ms, p99: 2100 ms), latency samples: 213621
Test Ok
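
In the three framework_upgrade runs above, submitted throughput equals committed plus expired (for example 2283.61 + 6.19 = 2289.80), and failed submission equals expired. A small sketch that parses such a line and checks that balance; the parser is a hypothetical helper written against the line format shown here, not part of Forge:

```rust
/// Throughput fields (txn/s) from one Forge result line.
#[derive(Debug)]
struct Throughput {
    committed: f64,
    submitted: f64,
    expired: f64,
}

/// Returns the number that immediately follows `key` in `line`, if any.
fn field(line: &str, key: &str) -> Option<f64> {
    let start = line.find(key)? + key.len();
    let rest = &line[start..];
    let end = rest
        .find(|c: char| !(c.is_ascii_digit() || c == '.'))
        .unwrap_or(rest.len());
    rest[..end].parse().ok()
}

/// Extracts committed/submitted/expired; `None` if any field is missing.
fn parse(line: &str) -> Option<Throughput> {
    Some(Throughput {
        committed: field(line, "committed: ")?,
        submitted: field(line, "submitted: ")?,
        expired: field(line, "expired: ")?,
    })
}

/// Submitted throughput not accounted for by committed + expired.
fn unaccounted(t: &Throughput) -> f64 {
    t.submitted - (t.committed + t.expired)
}

fn main() {
    let line = "committed: 2283.61 txn/s, submitted: 2289.80 txn/s, \
                failed submission: 6.19 txn/s, expired: 6.19 txn/s";
    let t = parse(line).expect("line has all three fields");
    println!("{t:?} unaccounted: {:.2} txn/s", unaccounted(&t));
}
```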

github-actions Bot commented Jun 2, 2026

✅ Forge suite realistic_env_max_load success on dea3dd8f2c259ef74184161fa900aea28e1d2aad

two traffics test: inner traffic : committed: 15598.45 txn/s, latency: 1088.72 ms, (p50: 1000 ms, p70: 1100, p90: 1300 ms, p99: 1600 ms), latency samples: 5825380
two traffics test : committed: 99.98 txn/s, latency: 845.97 ms, (p50: 800 ms, p70: 900, p90: 1000 ms, p99: 1400 ms), latency samples: 1760
Latency breakdown for phase 0: ["MempoolToBlockCreation: max: 0.286, avg: 0.248", "ConsensusProposalToOrdered: max: 0.111, avg: 0.107", "ConsensusOrderedToCommit: max: 0.146, avg: 0.133", "ConsensusProposalToCommit: max: 0.254, avg: 0.240"]
Max non-epoch-change gap was: 2 rounds at version 6616071 (avg 0.00) [limit 4], 1.17s no progress at version 6616071 (avg 0.06s) [limit 15].
Max epoch-change gap was: 0 rounds at version 0 (avg 0.00) [limit 4], 0.25s no progress at version 3045644 (avg 0.25s) [limit 16].
Test Ok
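
In the phase-0 latency breakdown, the average ConsensusProposalToCommit (0.240 s) equals ConsensusProposalToOrdered plus ConsensusOrderedToCommit (0.107 + 0.133 s), while the maxima do not add (0.254 s versus 0.111 + 0.146 = 0.257 s). That is what one would expect if the stages are measured over the same blocks, since the worst case of each stage need not fall on the same block. A sketch of that check, assuming that interpretation of the metrics; `Stage` and `stages_consistent` are hypothetical:

```rust
/// (max, avg) latency in seconds for one stage of the breakdown.
#[derive(Clone, Copy, Debug)]
struct Stage {
    max: f64,
    avg: f64,
}

/// Checks how ProposalToOrdered and OrderedToCommit relate to
/// ProposalToCommit, assuming all three are measured over the same blocks:
/// averages add, while summed maxima only bound the end-to-end maximum.
fn stages_consistent(p2o: Stage, o2c: Stage, p2c: Stage, tol: f64) -> bool {
    let avg_adds = (p2o.avg + o2c.avg - p2c.avg).abs() <= tol;
    let max_bounded = p2c.max <= p2o.max + o2c.max + tol;
    avg_adds && max_bounded
}

fn main() {
    // Phase-0 values copied from the breakdown above.
    let p2o = Stage { max: 0.111, avg: 0.107 };
    let o2c = Stage { max: 0.146, avg: 0.133 };
    let p2c = Stage { max: 0.254, avg: 0.240 };
    // 0.107 + 0.133 = 0.240, and 0.254 <= 0.111 + 0.146 = 0.257.
    println!("consistent: {}", stages_consistent(p2o, o2c, p2c, 0.001));
}
```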

ibalajiarun merged commit e057acc into main Jun 2, 2026
141 of 151 checks passed
ibalajiarun deleted the ibalajiarun/forge-100-node-encrypted-tps branch June 2, 2026 17:49

Labels

CICD:build-images when this label is present github actions will start build+push rust images from the PR.