Commit 5d18443

[vm] Record txn read sets for hot state promotion
Hot state promotions (the `to_make_hot` set in the block epilogue) are part of consensus, so their inputs must be deterministic. This reworks how that set is derived.

## Why the old derivation was problematic

Promotions were fed from BlockSTM's read/write summary, which exists for conflict detection, not for this:

- **Path-dependent.** The summary differs between parallel and sequential execution (e.g. aggregator v1 reads served by delta resolution are dropped in parallel but kept in sequential), so the same block could promote different keys depending on how it happened to execute, which is a divergence in a consensus-agreed artifact.
- **Coupled to gas config.** It was only populated inside the `conflict_penalty_window` branch, so promotions silently depended on an unrelated block-gas knob.
- **Incomplete write exclusion.** Its write side misses in-place delayed-field rewrites, aggregator v1 writes/deltas and module writes, so a key the block writes could still be promoted by the epilogue even though the write already makes it hot.

## Approach

Record the read set at the VM boundary, where it is a pure function of the transaction and the pre-state, and therefore identical across parallel and sequential execution and independent of gas config:

- `StorageAdapter` records resource, resource-group, table-item, aggregator v1 and config reads. Respawned sessions share one recorder so all of a transaction's sessions accumulate into a single set.
- `ReadRecordingCodeStorage` wraps the code storage and records module fetches. It sits above the global module cache, so a module is recorded whether served from that cache, the per-block cache or storage.
- A transaction's output carries the data and module keys as two collections; they are disjoint by construction, so consumers iterate a chained view and nothing pays to merge them.
- Written keys are enumerated directly from the change set, covering every write kind the conflict summary missed.
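As a rough illustration of the derivation described above, the sketch below collects every key read in a block, drops any key the block also writes, and returns the result in sorted order so it does not depend on execution path. It is a minimal sketch under stated assumptions: `Key`, `TxnAccesses` and `keys_to_make_hot` are hypothetical names, plain strings stand in for `StateKey`, and sorting is only one possible way to impose a deterministic order; the commit's actual aggregation code is not shown here.

```rust
use std::collections::{BTreeSet, HashSet};

/// Hypothetical stand-in for a state key; the real type is `StateKey`.
type Key = String;

/// Reads and writes observed for one transaction (illustrative shape only).
pub struct TxnAccesses {
    pub reads: HashSet<Key>,
    pub writes: HashSet<Key>,
}

/// Collects keys that were read somewhere in the block but written nowhere in it.
/// Returning a `BTreeSet` yields an order that is independent of hash-set
/// iteration order and of how the block happened to execute (an assumption of
/// this sketch, not necessarily the ordering the commit uses).
pub fn keys_to_make_hot(block: &[TxnAccesses]) -> BTreeSet<Key> {
    // Every key written by any transaction in the block is excluded:
    // the write itself already makes the key hot.
    let written: HashSet<&Key> = block.iter().flat_map(|t| t.writes.iter()).collect();
    block
        .iter()
        .flat_map(|t| t.reads.iter())
        .filter(|k| !written.contains(k))
        .cloned()
        .collect()
}
```

Because the inputs are per-transaction sets that depend only on the transaction and the pre-state, two nodes that execute the same block differently would still feed identical data into a function of this shape.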

## Worth calling out

The new set intentionally differs from the old one (e.g. `exists<T>` now loads the resource and counts as a read), so it ships behind the existing hotness feature flag, which the mixed-version forge suites keep off (see the preceding forge commit) to avoid old/new nodes disagreeing on transaction output.

Adds a per-block promotions histogram and tests covering sequential/parallel parity, each read and write kind, and discard handling.

Follow-ups: read-kind/hotness tagging and an on-chain, byte-based promotion cap.

## Performance

The recording runs unconditionally, so its per-transaction cost has to stay small. Even a trivial transaction fetches 10+ framework modules and re-reads the same resources across its sessions, so the hot paths are engineered around interning, allocation and clone traffic:

- A module's `StateKey` is a pure function of `(address, name)`, yet interning one goes through the global, lock-guarded key registry. A bounded per-thread memo serves repeats, and module reads are recorded directly as memoized `StateKey`s: a transaction allocates no owned module ids, and extraction moves the set out instead of interning.
- The interpreter fetches the same module many times in a row. A last-recorded fast path (a reused string buffer, so module switches allocate nothing) skips the bookkeeping for such runs, and the recording wrapper's delegators are `#[inline]` so wrapping does not deoptimize the module fetch path.
- All recording sets are keyed by `StateKey` (which carries a precomputed 32-byte hash) under FxHash (via the maintained `rustc-hash`), pre-sized past the typical framework-module count, and keys are cloned only on first sighting; repeats, e.g. framework modules touched by every transaction, are the common case.
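The first two bullets can be sketched with standard-library types only. The following is a simplified stand-in, not the commit's code: `Key`, `intern_slow`, `memoized_key`, `Recorder` and `slow_path_calls` are hypothetical names, a `u64` stands in for `AccountAddress`, and `Arc<str>` stands in for an interned `StateKey`. It shows a bounded per-thread memo in front of an expensive interning step, plus a last-recorded fast path that reuses one `String` buffer.

```rust
use std::cell::RefCell;
use std::collections::{HashMap, HashSet};
use std::sync::Arc;

/// Hypothetical interned key: one shared allocation per distinct module.
type Key = Arc<str>;

/// Cap on memoized addresses (the commit uses `1 << 13`).
const MAX_ADDRESSES: usize = 1 << 13;

thread_local! {
    /// Per-thread memo: address -> module name -> interned key.
    static MEMO: RefCell<HashMap<u64, HashMap<String, Key>>> = RefCell::new(HashMap::new());
}

/// Stand-in for the expensive, lock-guarded interning step (assumption).
fn intern_slow(address: u64, name: &str) -> Key {
    Arc::from(format!("{address:x}::{name}"))
}

/// Serves repeats from the per-thread memo; only misses reach `intern_slow`.
pub fn memoized_key(address: u64, name: &str) -> Key {
    MEMO.with(|memo| {
        let mut memo = memo.borrow_mut();
        if let Some(key) = memo.get(&address).and_then(|names| names.get(name)) {
            return key.clone();
        }
        let key = intern_slow(address, name);
        // Only new addresses are gated by the cap.
        if memo.len() < MAX_ADDRESSES || memo.contains_key(&address) {
            memo.entry(address).or_default().insert(name.to_owned(), key.clone());
        }
        key
    })
}

pub struct Recorder {
    reads: HashSet<Key>,
    /// Last recorded (address, name); an empty name means "nothing yet".
    last: (u64, String),
    /// Counts calls that missed the fast path, for illustration only.
    pub slow_path_calls: usize,
}

impl Recorder {
    pub fn new() -> Self {
        Self { reads: HashSet::with_capacity(24), last: (0, String::new()), slow_path_calls: 0 }
    }

    pub fn record(&mut self, address: u64, name: &str) {
        // Fast path: a run of fetches of the same module does no further work.
        if self.last.0 == address && self.last.1 == name {
            return;
        }
        self.slow_path_calls += 1;
        self.reads.insert(memoized_key(address, name));
        // Reuse the buffer so switching modules does not allocate a new string.
        self.last.0 = address;
        self.last.1.clear();
        self.last.1.push_str(name);
    }

    pub fn into_reads(self) -> HashSet<Key> {
        self.reads
    }
}
```

Only an exact `(address, name)` repeat is skipped, so the recorded set is the same with or without the fast path; the sketch relies on module names never being empty for the initial sentinel to be safe.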

Measured on the single-node benchmark with interleaved paired runs, together with the preceding `to_make_hot` ordering commit: ~2-2.5% net mean TPS cost across representative workloads, and ~0.3-0.4 µs per transaction on the sequential move-e2e micros. An ablation build with recording no-oped attributes the entire cost to the recording machinery itself (diffuse per-transaction set and clone work rather than any single hot symbol) and confirms the accumulator commit offsets part of it.

Candidate follow-ups if more needs to come back: pointer-hashed recorder sets (`StateKey` equality is already pointer identity), pooling the per-transaction sets, and an indexed key registry to cut refcount traffic on hot shared keys.
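The pointer-hashed set mentioned as a follow-up could look roughly like the sketch below. This is an illustration of the general technique, not anything in the commit: `ByPtr` is a hypothetical wrapper, and `Arc<str>` stands in for an interned key. It is only sound when every distinct key value has exactly one shared allocation, which is what interning provides.

```rust
use std::collections::HashSet;
use std::hash::{Hash, Hasher};
use std::sync::Arc;

/// Hypothetical wrapper that hashes and compares an interned key by the
/// address of its allocation rather than by its contents.
#[derive(Clone)]
pub struct ByPtr(pub Arc<str>);

impl PartialEq for ByPtr {
    fn eq(&self, other: &Self) -> bool {
        Arc::ptr_eq(&self.0, &other.0)
    }
}

impl Eq for ByPtr {}

impl Hash for ByPtr {
    fn hash<H: Hasher>(&self, state: &mut H) {
        // Hash only the data address; clones of one `Arc` share it.
        (Arc::as_ptr(&self.0) as *const u8 as usize).hash(state);
    }
}

/// Inserts a key and reports whether it was new to the set.
pub fn record(set: &mut HashSet<ByPtr>, key: &Arc<str>) -> bool {
    set.insert(ByPtr(key.clone()))
}
```

Two separately allocated keys with equal contents are treated as different entries here, so the approach would only be valid behind a registry that guarantees one allocation per key.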
1 parent 1520f5a commit 5d18443

26 files changed

Lines changed: 1347 additions & 21 deletions

Cargo.lock

Lines changed: 2 additions & 0 deletions
Some generated files are not rendered by default.

aptos-move/aptos-vm-types/Cargo.toml

Lines changed: 1 addition & 0 deletions
@@ -29,6 +29,7 @@ move-core-types = { workspace = true }
 move-vm-runtime = { workspace = true }
 move-vm-types = { workspace = true }
 rand = { workspace = true }
+rustc-hash = { workspace = true }
 serde = { workspace = true }
 triomphe = { workspace = true }

aptos-move/aptos-vm-types/src/module_and_script_storage/mod.rs

Lines changed: 1 addition & 0 deletions
@@ -3,6 +3,7 @@

 pub mod code_storage;
 pub mod module_storage;
+pub mod read_recording;

 mod state_view_adapter;
 pub use state_view_adapter::{AptosCodeStorageAdapter, AsAptosCodeStorage};
Lines changed: 257 additions & 0 deletions
@@ -0,0 +1,257 @@
+// Copyright (c) Aptos Foundation
+// Licensed pursuant to the Innovation-Enabling Source Code License, available at https://github.com/aptos-labs/aptos-core/blob/main/LICENSE
+
+#![allow(clippy::duplicated_attributes)]
+
+use crate::{
+    module_and_script_storage::module_storage::AptosModuleStorage,
+    resolver::{ambassador_impl_BlockSynchronizationKillSwitch, BlockSynchronizationKillSwitch},
+};
+use ambassador::delegate_to_methods;
+use aptos_types::state_store::{state_key::StateKey, state_value::StateValueMetadata};
+use bytes::Bytes;
+use move_binary_format::{
+    errors::{PartialVMResult, VMResult},
+    file_format::CompiledScript,
+    CompiledModule,
+};
+use move_core_types::{
+    account_address::AccountAddress,
+    identifier::{IdentStr, Identifier},
+    language_storage::ModuleId,
+};
+use move_vm_runtime::{
+    ambassador_impl_LayoutCache, ambassador_impl_WithRuntimeEnvironment, LayoutCache,
+    LayoutCacheEntry, Module, ModuleStorage, RuntimeEnvironment, Script, StructKey,
+    WithRuntimeEnvironment,
+};
+use move_vm_types::code::{ambassador_impl_ScriptCache, Code, ScriptCache};
+use rustc_hash::{FxHashMap, FxHashSet};
+use std::{cell::RefCell, sync::Arc};
+
+thread_local! {
+    /// A module's `StateKey` is a pure function of its `(address, name)` and never changes, yet
+    /// interning one goes through the global, lock-guarded `StateKey` registry. Worker threads
+    /// re-read the same modules on every transaction, so memoize the interned keys per thread to
+    /// keep that lock off the extraction path. Bounded so an unbounded module working set can't
+    /// grow it without limit; the hot (framework) modules are seen first and stay cached.
+    static MODULE_STATE_KEYS: RefCell<FxHashMap<AccountAddress, FxHashMap<Identifier, StateKey>>> =
+        RefCell::new(FxHashMap::default());
+}
+
+/// Cap on distinct addresses memoized per thread, bounding the memo under a very large or
+/// adversarial module working set.
+const MODULE_STATE_KEY_CACHE_MAX_ADDRESSES: usize = 1 << 13;
+
+/// Interns a module `StateKey`, serving repeats from the per-thread memo so that the common case
+/// (a module already seen by this thread) avoids the global registry lock entirely.
+fn interned_module_state_key(address: &AccountAddress, name: &IdentStr) -> StateKey {
+    MODULE_STATE_KEYS.with(|cache| {
+        let mut cache = cache.borrow_mut();
+        if let Some(key) = cache.get(address).and_then(|names| names.get(name)) {
+            return key.clone();
+        }
+        let key = StateKey::module(address, name);
+        // Only new addresses are gated by the cap; an already-cached address keeps accumulating
+        // its modules so it is never left half-populated.
+        if cache.len() < MODULE_STATE_KEY_CACHE_MAX_ADDRESSES || cache.contains_key(address) {
+            cache
+                .entry(*address)
+                .or_default()
+                .insert(name.to_owned(), key.clone());
+        }
+        key
+    })
+}
+
+/// Wraps a code storage and records every module the VM fetches through it, so that module
+/// accesses become part of the transaction's observed read set (the basis for hot state
+/// promotion).
+///
+/// Recorded reads are kept directly as interned `StateKey`s served from the per-thread memo,
+/// so steady state a transaction's recording performs no allocation beyond growing its key
+/// set: no owned module ids, and no interning pass at extraction.
+///
+/// Scripts are not state items, so script cache accesses are not recorded.
+pub struct ReadRecordingCodeStorage<'a, C> {
+    code_storage: &'a C,
+    module_reads: RefCell<FxHashSet<StateKey>>,
+    /// The previously recorded `(address, name)`. The interpreter fetches the same module many
+    /// times in a row, so this lets a burst of accesses skip the memo and set lookups. A
+    /// `String` buffer rather than an `Identifier` so updating it on a module switch reuses
+    /// the allocation instead of making a fresh one.
+    last_recorded: RefCell<(AccountAddress, String)>,
+}
+
+impl<'a, C> ReadRecordingCodeStorage<'a, C> {
+    pub fn new(code_storage: &'a C) -> Self {
+        Self {
+            code_storage,
+            // Even a trivial transaction touches 10+ framework modules through its prologue
+            // and epilogue, so start with room for the typical count and skip the rehashes.
+            module_reads: RefCell::new(FxHashSet::with_capacity_and_hasher(24, Default::default())),
+            // Module names are never empty, so an empty name means "nothing recorded yet".
+            last_recorded: RefCell::new((AccountAddress::ZERO, String::new())),
+        }
+    }
+
+    /// Returns the state keys of modules fetched so far, deduplicated by key.
+    pub fn into_recorded_reads(self) -> FxHashSet<StateKey> {
+        self.module_reads.into_inner()
+    }
+
+    #[inline]
+    fn record(&self, address: &AccountAddress, module_name: &IdentStr) {
+        {
+            // Fast path: a run of accesses to the same module needs no further work. Only an
+            // exact (address, name) match is skipped, so the recorded set is identical either
+            // way.
+            let last = self.last_recorded.borrow();
+            if last.0 == *address && last.1.as_str() == module_name.as_str() {
+                return;
+            }
+        }
+        let key = interned_module_state_key(address, module_name);
+        self.module_reads.borrow_mut().insert(key);
+        let mut last = self.last_recorded.borrow_mut();
+        last.0 = *address;
+        last.1.clear();
+        last.1.push_str(module_name.as_str());
+    }
+}
+
+#[delegate_to_methods]
+#[delegate(
+    WithRuntimeEnvironment,
+    target_ref = "inner",
+    where = "C: WithRuntimeEnvironment"
+)]
+#[delegate(LayoutCache, target_ref = "inner", where = "C: LayoutCache")]
+#[delegate(
+    BlockSynchronizationKillSwitch,
+    target_ref = "inner",
+    where = "C: BlockSynchronizationKillSwitch"
+)]
+impl<C> ReadRecordingCodeStorage<'_, C> {
+    /// Returns the wrapped code storage.
+    fn inner(&self) -> &C {
+        self.code_storage
+    }
+}
+
+impl<C: ModuleStorage> ModuleStorage for ReadRecordingCodeStorage<'_, C> {
+    #[inline]
+    fn unmetered_check_module_exists(
+        &self,
+        address: &AccountAddress,
+        module_name: &IdentStr,
+    ) -> VMResult<bool> {
+        self.record(address, module_name);
+        self.code_storage
+            .unmetered_check_module_exists(address, module_name)
+    }
+
+    #[inline]
+    fn unmetered_get_module_bytes(
+        &self,
+        address: &AccountAddress,
+        module_name: &IdentStr,
+    ) -> VMResult<Option<Bytes>> {
+        self.record(address, module_name);
+        self.code_storage
+            .unmetered_get_module_bytes(address, module_name)
+    }
+
+    #[inline]
+    fn unmetered_get_module_hash_and_size(
+        &self,
+        address: &AccountAddress,
+        module_name: &IdentStr,
+    ) -> VMResult<Option<([u8; 32], usize)>> {
+        self.record(address, module_name);
+        self.code_storage
+            .unmetered_get_module_hash_and_size(address, module_name)
+    }
+
+    #[inline]
+    fn unmetered_get_module_size(
+        &self,
+        address: &AccountAddress,
+        module_name: &IdentStr,
+    ) -> VMResult<Option<usize>> {
+        self.record(address, module_name);
+        self.code_storage
+            .unmetered_get_module_size(address, module_name)
+    }
+
+    #[inline]
+    fn unmetered_get_deserialized_module(
+        &self,
+        address: &AccountAddress,
+        module_name: &IdentStr,
+    ) -> VMResult<Option<Arc<CompiledModule>>> {
+        self.record(address, module_name);
+        self.code_storage
+            .unmetered_get_deserialized_module(address, module_name)
+    }
+
+    #[inline]
+    fn unmetered_get_eagerly_verified_module(
+        &self,
+        address: &AccountAddress,
+        module_name: &IdentStr,
+    ) -> VMResult<Option<Arc<Module>>> {
+        self.record(address, module_name);
+        self.code_storage
+            .unmetered_get_eagerly_verified_module(address, module_name)
+    }
+
+    #[inline]
+    fn unmetered_get_lazily_verified_module(
+        &self,
+        module_id: &ModuleId,
+    ) -> VMResult<Option<Arc<Module>>> {
+        self.record(module_id.address(), module_id.name());
+        self.code_storage
+            .unmetered_get_lazily_verified_module(module_id)
+    }
+
+    #[cfg(fuzzing)]
+    #[inline]
+    fn unmetered_get_module_skip_verification(
+        &self,
+        address: &AccountAddress,
+        module_name: &IdentStr,
+    ) -> VMResult<Option<Arc<Module>>> {
+        self.record(address, module_name);
+        self.code_storage
+            .unmetered_get_module_skip_verification(address, module_name)
+    }
+}
+
+impl<C: AptosModuleStorage> AptosModuleStorage for ReadRecordingCodeStorage<'_, C> {
+    #[inline]
+    fn unmetered_get_module_state_value_metadata(
+        &self,
+        address: &AccountAddress,
+        module_name: &IdentStr,
+    ) -> PartialVMResult<Option<StateValueMetadata>> {
+        self.record(address, module_name);
+        self.code_storage
+            .unmetered_get_module_state_value_metadata(address, module_name)
+    }
+}
+
+#[delegate_to_methods]
+#[delegate(ScriptCache, target_ref = "as_script_cache")]
+impl<C> ReadRecordingCodeStorage<'_, C>
+where
+    C: ScriptCache<Key = [u8; 32], Deserialized = CompiledScript, Verified = Script>,
+{
+    /// Returns the wrapped script cache.
+    fn as_script_cache(
+        &self,
+    ) -> &dyn ScriptCache<Key = [u8; 32], Deserialized = CompiledScript, Verified = Script> {
+        self.code_storage
+    }
+}

aptos-move/aptos-vm-types/src/output.rs

Lines changed: 26 additions & 0 deletions
@@ -24,6 +24,7 @@ use move_core_types::{
 };
 use move_vm_runtime::execution_tracing::Trace;
 use move_vm_types::delayed_values::delayed_field_id::DelayedFieldID;
+use rustc_hash::FxHashSet;
 use std::{collections::BTreeMap, mem};

 /// Output produced by the VM after executing a transaction.
@@ -261,3 +262,28 @@ impl VMOutput {
         self.into_transaction_output()
     }
 }
+
+/// A transaction's read set, used for hot-state promotion. Unordered at the
+/// per-transaction level; ordering is imposed later when aggregating per-block.
+///
+/// Data and module keys are kept as recorded. Both sides are already deduplicated and
+/// they can never contain the same key (module and data state keys are disjoint), so
+/// merging them into one set would only re-hash every module key.
+#[derive(Clone, Debug, Default)]
+pub struct UnorderedReadSet {
+    data_keys: FxHashSet<StateKey>,
+    module_keys: FxHashSet<StateKey>,
+}
+
+impl UnorderedReadSet {
+    pub fn new(data_keys: FxHashSet<StateKey>, module_keys: FxHashSet<StateKey>) -> Self {
+        Self {
+            data_keys,
+            module_keys,
+        }
+    }
+
+    pub fn iter(&self) -> impl Iterator<Item = &StateKey> {
+        self.data_keys.iter().chain(self.module_keys.iter())
+    }
+}
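To see why the chained view in `UnorderedReadSet::iter` is enough, here is a small standalone sketch using only the standard library. `ReadSetSketch` is a hypothetical stand-in with `String` keys and std's `HashSet` in place of `StateKey` and `FxHashSet`; it assumes, as the doc comment states for the real type, that the two sets never share a key.

```rust
use std::collections::HashSet;

/// Hypothetical stand-in for `UnorderedReadSet`, with string keys.
#[derive(Clone, Debug, Default)]
pub struct ReadSetSketch {
    data_keys: HashSet<String>,
    module_keys: HashSet<String>,
}

impl ReadSetSketch {
    pub fn new(data_keys: HashSet<String>, module_keys: HashSet<String>) -> Self {
        Self { data_keys, module_keys }
    }

    /// Visits every key exactly once without building a merged set.
    /// This holds only because the two sets are disjoint by construction.
    pub fn iter(&self) -> impl Iterator<Item = &String> {
        self.data_keys.iter().chain(self.module_keys.iter())
    }
}
```

A consumer that needs a deterministic order would sort or otherwise order the keys when it aggregates across a block, since hash-set iteration order is unspecified.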

aptos-move/aptos-vm-types/src/resolver.rs

Lines changed: 2 additions & 0 deletions
@@ -1,6 +1,7 @@
 // Copyright (c) Aptos Foundation
 // Licensed pursuant to the Innovation-Enabling Source Code License, available at https://github.com/aptos-labs/aptos-core/blob/main/LICENSE

+use ambassador::delegatable_trait;
 use aptos_aggregator::resolver::{TAggregatorV1View, TDelayedFieldView};
 use aptos_types::{
     serde_helper::bcs_utils::size_u32_as_uleb128,
@@ -22,6 +23,7 @@ use std::collections::{BTreeMap, HashMap};
 /// Allows requesting an immediate interrupt to ongoing transaction execution. For example, this
 /// allows an early return from a useless speculative execution when block execution has already
 /// halted (e.g. due to gas limit, committing only a block prefix).
+#[delegatable_trait]
 pub trait BlockSynchronizationKillSwitch {
     fn interrupt_requested(&self) -> bool;
 }

aptos-move/aptos-vm/Cargo.toml

Lines changed: 1 addition & 0 deletions
@@ -73,6 +73,7 @@ once_cell = { workspace = true }
 ouroboros = { workspace = true }
 rand = { workspace = true }
 rayon = { workspace = true }
+rustc-hash = { workspace = true }
 serde = { workspace = true }
 triomphe = { workspace = true }
