setup turbo benchmarks #24
These benchmarks are real-world scripts extracted from mainnet, with the script context already applied. They were produced by https://github.com/r2rationality/turbocardano.

This commit adds a preliminary setup to run those benchmarks and measure our performance against real data. Using only data from epochs 519, 520 and 521, we already run into cases where the VM crashes. There are two possible causes for a crash: a bug in the VM implementation, or a bug in the tool that produced the benchmark. Either way, the VM should not crash but fail gracefully (or succeed, should the script actually be valid).

Also, I have restricted the benchmarks to V3 only, since the semantics for V1 and V2 are still to be implemented.

The goal from here would be to get all those benchmarks to pass, and ultimately to compare our results with the Haskell & C++ implementations.
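To illustrate the "fail gracefully" point, here is a minimal sketch of how a benchmark harness could tell a VM crash apart from a regular script failure. It is not this repository's code: `Outcome`, `run_guarded` and the `eval` closure are hypothetical names, and `eval` merely stands in for whatever entry point the VM exposes. It also assumes the crash is a Rust panic and that the build uses the default `panic = "unwind"` strategy, since `std::panic::catch_unwind` cannot intercept aborts.

```rust
use std::panic::{self, AssertUnwindSafe};

/// Outcome of running one benchmark script (hypothetical type, for illustration).
#[derive(Debug, PartialEq)]
pub enum Outcome {
    /// The script evaluated successfully.
    Success,
    /// The VM rejected the script with a regular error: the graceful path.
    Failure(String),
    /// The VM panicked: the case a benchmark run should report as a bug.
    Crash(String),
}

/// Runs `eval` on `script` and turns a panic into `Outcome::Crash`.
/// `eval` is an assumed stand-in for the VM entry point, not a real API.
pub fn run_guarded<F>(script: &[u8], eval: F) -> Outcome
where
    F: FnOnce(&[u8]) -> Result<(), String>,
{
    // AssertUnwindSafe: the closure's state is discarded after a panic,
    // so observing it in a broken state is not a concern here.
    match panic::catch_unwind(AssertUnwindSafe(|| eval(script))) {
        Ok(Ok(())) => Outcome::Success,
        Ok(Err(err)) => Outcome::Failure(err),
        Err(payload) => {
            // Panic payloads are usually `&'static str` or `String`.
            let message = if let Some(s) = payload.downcast_ref::<&str>() {
                (*s).to_string()
            } else if let Some(s) = payload.downcast_ref::<String>() {
                s.clone()
            } else {
                "unknown panic payload".to_string()
            };
            Outcome::Crash(message)
        }
    }
}
```

With a wrapper like this, a run over the whole dataset can keep going after a panic and count `Crash` results separately from `Failure` results, which makes it easier to see which scripts point at a VM bug rather than at an invalid script.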

Closed in favor of #26

@yHSJ Thank you for looking into this! Were you able to run all the scripts from the set? I believe the Rust implementation should be able to achieve performance comparable to the C++ version, since I didn’t apply any particularly aggressive optimizations there (e.g., direct compilation to machine code). Additional confirmation of this performance potential would also help in discussions with the Haskell/Plutus team regarding prioritizing optimization work on their side. P.S. Do you have any feedback on the dataset format or the way it was constructed (a random sample from mainnet)?

Hi @sierkov! To be honest, @jonathanlim222 did most of the work around this and I only assisted, so he may be better suited to answer your questions. As far as the dataset format goes, I think it makes perfect sense. I think we're definitely missing some of the possible "extremes" which may be interesting to evaluate, but this has been very helpful for sure.

cc @yHSJ @jonathanlim222 @sierkov