wasmerio · theduke · May 7, 2026 · May 7, 2026 · May 7, 2026 · May 7, 2026
diff --git a/...ks/2026-05-07_fd-corruption-investigation/findings/1-fdlist-aggressive-reuse.md b/...ks/2026-05-07_fd-corruption-investigation/findings/1-fdlist-aggressive-reuse.md
@@ -0,0 +1,75 @@
+# Finding 1: `FdList` aggressively reuses the lowest freed descriptor
+
+## Verdict
+
+INVALID
+
+`FdList` reuses the lowest available descriptor on purpose, and the implementation is explicitly documented as matching Unix/WASI-style fd allocation semantics. By itself, this does not create wrong-file writes or fd corruption. It only makes any separate stale-fd bug easier to observe.
+
+## What the code does
+
+`lib/wasix/src/fs/fd_list.rs` is a small fd-slot allocator:
+
+- File header comment: "The Unix spec requires newly allocated FDs to always be the lowest-numbered FD available."
+- `insert_first_free()` fills `self.first_free` if a hole exists, otherwise appends (`fd_list.rs:67-84`).
+- `remove()` updates `self.first_free` to the lowest newly freed slot (`fd_list.rs:176-192`).
+- `create_fd_ext(..., idx: None, ...)` uses `guard.insert_first_free(fd)` for normal opens (`lib/wasix/src/fs/mod.rs:1838-1880`).
+- `clone_fd_ext()` uses `insert_first_free_after(...)`, which is the expected `dup`/`fcntl(F_DUPFD)` style behavior: lowest free fd at or above the requested minimum (`lib/wasix/src/fs/mod.rs:1887-1918`).
+
+The unit tests in `fd_list.rs` also intentionally lock this behavior in:
+
+- `can_append_in_holes()` expects a newly inserted fd to reuse slot `1` after `1` and `2` were freed.
+- `hole_moves_back_correctly()` expects `first_free` to move backward when a lower fd is freed.
+- `next_and_last_fd_reported_correctly()` expects `next_free_fd()` to become `3` after removing fd `3`, then `1` after removing fd `1`.
+
+So the "aggressive reuse" is not accidental; it is the designed and tested behavior.
+
+## Why this is not sufficient to prove corruption
+
+Reusing a closed fd number is normal. After `close(1)`, a later open returning `1` means fd `1` now refers to the new file description. A subsequent `fd_write(1, ...)` going to that new file is expected behavior, not corruption.
+
+Within this codebase, write operations generally use the current fd mapping or capture the underlying file handle explicitly:
+
+- `fd_write()` calls `state.fs.get_fd(fd)` first, clones the `Fd`, and then `fd_write_internal()` uses the captured inode/handle during the async write (`lib/wasix/src/syscalls/wasi/fd_write.rs:37-51`, `:128-225`).
+- `close_fd()` itself just removes the mapping from `fd_map`; it does not perform any write or flush by numeric fd after removal (`lib/wasix/src/fs/mod.rs:2303-2322`).
+
+That means `FdList` alone is only an allocator policy. It does not by itself create a path where bytes intended for one still-open file description are silently redirected into another file description.
+
+## Evidence that reuse is already assumed elsewhere
+
+Other code already treats fd-number reuse as normal and works around specific races that reuse can expose:
+
+- `fd_close()` has a special non-stdio path that captures the file handle *before* removing the fd, then flushes the captured handle afterward. The comment explicitly says this avoids an "fd-number reuse race" where async pre-close flush could otherwise affect a newly allocated descriptor with the same number (`lib/wasix/src/syscalls/wasi/fd_close.rs:57-60`).
+- `fd_renumber_internal()` holds one write lock across removing `to` and inserting the replacement specifically to stop another thread from allocating into the target slot between those operations (`lib/wasix/src/syscalls/wasi/fd_renumber.rs:64-67`).
+
+These are real reuse-sensitive sites, but they are separate suspects. Their existence is evidence that `FdList` reuse is expected platform behavior, not the bug by itself.
+
+## What would have to be true for reuse to matter
+
+For this suspect to become real corruption, another bug must already exist, for example:
+
+1. A code path keeps a raw numeric `WasiFd` after the logical file description has changed, then later uses that stale number.
+2. A code path does async work keyed by fd number instead of by a captured handle/inode.
+3. Internal runtime code incorrectly assumes fd `1`/`2` always means the original host stdout/stderr instead of the current guest mapping.
+
+I found nearby code that could fit those broader classes, but they are independent suspects:
+
+- `fd_close()` still uses `state.fs.flush(fd)` for stdio before removing the mapping, unlike the safer captured-handle path used for non-stdio (`lib/wasix/src/syscalls/wasi/fd_close.rs:47-79`). If that path races, the bug is in stdio close/flush ordering, not in `FdList`.
+- `/dev/stdout` and friends can resolve through `get_special_fd()` and then `clone_fd(fd)`, meaning they deliberately follow the current fd mapping (`lib/wasix/src/syscalls/wasix/path_open2.rs:337-349`).
+- `stderr_write()` writes via `WasiInodes::stderr_mut(&state.fs.fd_map)`, which also follows the current fd `2` mapping (`lib/wasix/src/syscalls/mod.rs:246-255`, `lib/wasix/src/fs/mod.rs:321-336`).
+
+Those behaviors may be surprising depending on intended stdio semantics, but they are not caused by `FdList` reusing the lowest free slot.
+
+## Bottom line
+
+`FdList` immediate reuse is expected and deliberately tested behavior. It does not independently explain fd corruption or wrong-file writes.
+
+Mark this suspect INVALID unless paired with another bug that mishandles stale numeric fds, async flush/write ordering, or stdio remapping semantics.
+
+## Best next investigation targets
+
+If a later agent wants a reproduction or a fix, the higher-value follow-ups are:
+
+- stdio-specific `fd_close()` ordering, because it still flushes by descriptor number before close;
+- any code that keeps raw `WasiFd` values across async boundaries;
+- helpers that treat current fd `1`/`2` as "stdout/stderr" (`/dev/stdout`, `stderr_write`, similar runtime helpers).
diff --git a/...d-corruption-investigation/findings/10-fd_write-clones-fd-before-async-write.md b/...d-corruption-investigation/findings/10-fd_write-clones-fd-before-async-write.md
@@ -0,0 +1,81 @@
+# Finding 10: `fd_write` clones `Fd` before async write
+
+## Verdict
+
+INVALID
+
+## Suspect
+
+`fd_write` obtains an `Fd` snapshot with `state.fs.get_fd(fd)` and passes that cloned `Fd` into `fd_write_internal()`. Inside the file path, `fd_write_internal()` clones the underlying `Arc<RwLock<Box<dyn VirtualFile>>>` before awaiting the actual write. Superficially, that means an in-flight write can continue to the old backing object even if the numeric fd is closed and reused while the write is suspended.
+
+## What the code actually does
+
+### 1. `fd_write` snapshots the descriptor entry once
+
+- `WasiFs::get_fd()` returns a cloned `Fd` from the fd table, not a live borrow into `fd_map` (`lib/wasix/src/fs/mod.rs:1514-1521`).
+- `fd_write()` captures that clone up front and passes it into `fd_write_internal()` (`lib/wasix/src/syscalls/wasi/fd_write.rs:37-51`).
+
+This means `fd_write_internal()` intentionally operates on a stable descriptor snapshot instead of re-looking up the numeric fd later.
+
+### 2. The file write path binds to the old backing handle, not the fd number
+
+In the `Kind::File` branch of `fd_write_internal()`:
+
+- it takes `fd_entry.inode.write()`,
+- clones `handle` out of `Kind::File { handle, .. }`,
+- drops the inode guard,
+- then performs the async write through that cloned handle (`lib/wasix/src/syscalls/wasi/fd_write.rs:152-159` and `:166-218`).
+
+After that point, the actual write no longer depends on the numeric fd slot. It is pinned to the previously captured `Arc<RwLock<Box<dyn VirtualFile>>>`.
+
+The tail of the function likewise updates cursor/stat state through the captured `fd_entry`, not by re-reading `fd_map` (`lib/wasix/src/syscalls/wasi/fd_write.rs:519-553`).
+
+### 3. Closing or reusing the numeric fd does not retarget that in-flight write
+
+`WasiFs::close_fd()` only removes the entry from `fd_map` (`lib/wasix/src/fs/mod.rs:2294-2312`).
+
+`FdList::remove()` correspondingly drops one inode handle count for the removed table entry (`lib/wasix/src/fs/fd_list.rs:176-191`), but that is only the fd-table ownership. The in-flight write still holds its own cloned `Arc` to the old `VirtualFile`, so the old backing object remains alive until the write finishes.
+
+When the same numeric fd is later reused, `FdList::insert*()` installs a new `Fd` entry and increments the new inode's handle count (`lib/wasix/src/fs/fd_list.rs:67-83`, `:140-174`). Reuse fills the slot with a different `Fd`; it does not mutate the `Fd` snapshot already captured by `fd_write`.
+
+### 4. This lifetime model is deliberate elsewhere too
+
+Two nearby code paths show that WASIX intentionally preserves the old backing object across async descriptor-table changes:
+
+- `fd_close()` explicitly captures the file handle before removing the fd so a delayed flush cannot affect a newly reused descriptor number (`lib/wasix/src/syscalls/wasi/fd_close.rs:69-79`).
+- `fd_renumber()` keeps remove/insert atomic under one lock, then flushes the old target handle afterward via a captured cloned handle (`lib/wasix/src/syscalls/wasi/fd_renumber.rs:64-125`).
+
+There is also an explicit regression test for the same design in `fd_close.rs`, `stdio_close_does_not_remove_replacement_fd`, which verifies that a delayed flush on the old stdout does not delete or disturb a replacement installed at fd 1 while the flush is pending (`lib/wasix/src/syscalls/wasi/fd_close.rs:399-476`).
+
+## Why this is not fd-corruption
+
+The suspicious behavior is real in the narrow sense: an async `fd_write` can continue against the old file object after the numeric fd has been closed or rebound.
+
+However, that is not descriptor corruption in this codebase. It is the mechanism that prevents descriptor corruption:
+
+- the in-flight write stays attached to the old file description,
+- the newly reused fd number points at a different `Fd` entry,
+- and the old write path never switches over to the new occupant of that fd slot.
+
+So the observed behavior is "write completes on the object that was open when the syscall started", not "write is redirected to whoever later acquired the same fd number".
+
+That matches the rest of WASIX's fd-table design:
+
+- descriptor duplication and renumbering intentionally share inode/offset state (`lib/wasix/src/fs/mod.rs:1877-1908`, `lib/wasix/src/syscalls/wasi/fd_renumber.rs:86-99`);
+- close/renumber already contain comments and tests specifically aimed at preventing fd-number reuse races.
+
+## Important nuance
+
+If someone expected "close in another thread immediately cancels a blocked write", this code does not provide that behavior. An in-flight `fd_write` may still complete on the old backing object.
+
+But that is a semantics question, not wrong-target corruption. By itself, this suspect does **not** explain writes landing in a newly reused descriptor.
+
+## Out-of-scope note
+
+`fd_write_internal()` does still use the numeric `fd` afterward for journal snapshotting under `#[cfg(feature = "journal")]` (`lib/wasix/src/syscalls/wasi/fd_write.rs:502-513`). The repository note for this investigation explicitly says journal/replay paths are out of scope, so that does not make this suspect valid for the current fd-corruption hunt.
+
+## Bottom line
+
+This suspect is **INVALID** for the corruption question.
+
+`fd_write` capturing the old `Fd`/handle before the async boundary is intentional and is the reason an fd-number reuse does **not** redirect the in-flight write to the new descriptor occupant.
diff --git a/...026-05-07_fd-corruption-investigation/findings/11-offset-updates-not-coupled.md b/...026-05-07_fd-corruption-investigation/findings/11-offset-updates-not-coupled.md
@@ -0,0 +1,170 @@
+# Suspect 11: offset updates are partly atomic but not coupled to file seek/write
+
+## Verdict
+
+VALID
+
+## Short conclusion
+
+This is a real bug for regular files that share one open-file description offset via dup-like operations. The code intentionally shares the logical cursor with `Arc<AtomicU64>`, but `fd_write` does not reserve and consume that cursor under the same critical section as the actual `seek`/`write`, and `fd_seek(Whence::Cur)` / `fd_seek(Whence::Set)` update only the atomic offset. As a result, concurrent operations on duplicated fds can write at stale positions, overwrite earlier data, and report a logical offset / `st_size` that does not match the real file contents.
+
+This does not need any separate bug such as fd-number reuse or close/reopen races. It is an intra-file corruption / wrong-position bug on its own.
+
+## Why duplicated fds really share one offset
+
+The fd layer stores the cursor as shared state:
+
+- `lib/wasix/src/fs/fd.rs:36-40` defines `FdInner.offset` as `Arc<AtomicU64>`.
+- `lib/wasix/src/fs/mod.rs:1887-1894` (`clone_fd_ext`) clones that same `offset` arc into the new fd.
+- `lib/wasix/src/syscalls/wasi/fd_renumber.rs:86-89` also clones the same `offset` arc.
+
+So this code is clearly trying to model shared-offset dup/dup2/renumber semantics, not independent cursors per numeric fd.
+
+## Relevant write/seek behavior
+
+### `fd_write` snapshots the offset before taking the file-handle lock
+
+`lib/wasix/src/syscalls/wasi/fd_write.rs:41-48`:
+
+- `fd_write()` loads `fd_entry.inner.offset` into a local `offset`.
+- It passes that captured value into `fd_write_internal()`.
+
+Then `lib/wasix/src/syscalls/wasi/fd_write.rs:167-178`:
+
+- The regular-file path takes `handle.write()` and seeks the underlying file handle to the previously captured `offset`.
+- In append mode it first reads `inode.stat.st_size`, stores that into the shared atomic offset, and then seeks to that value.
+
+Only after the async write finishes does `lib/wasix/src/syscalls/wasi/fd_write.rs:519-547`:
+
+- `fetch_add(bytes_written)` on the shared atomic offset, and
+- update `inode.stat.st_size`.
+
+That means the operation is split into:
+
+1. read offset,
+2. later seek/write under the handle lock,
+3. later update shared offset and cached size after the write.
+
+Those three steps are not one atomic transaction.
+
+### `fd_seek` is only partly coupled
+
+`lib/wasix/src/syscalls/wasi/fd_seek.rs:66-84`:
+
+- `Whence::Cur` updates only `fd_entry.inner.offset` with `fetch_add` / `fetch_sub`.
+- It does not lock or reposition the underlying file handle.
+
+`lib/wasix/src/syscalls/wasi/fd_seek.rs:138-142`:
+
+- `Whence::Set` only stores to the atomic offset.
+
+`lib/wasix/src/syscalls/wasi/fd_seek.rs:87-110`:
+
+- `Whence::End` does take the handle lock and performs `handle.seek(SeekFrom::End(...))`, then stores the result back into the shared atomic offset.
+
+So `Whence::End` is somewhat coupled to the handle, but `Cur` and `Set` are purely logical cursor mutations. In single-threaded use that can still work because later reads/writes explicitly seek from the atomic cursor. Under concurrency, it is not enough.
+
+## Concrete bad interleavings
+
+### 1. Two concurrent `fd_write`s can overwrite each other
+
+Precondition:
+
+- one regular file,
+- duplicated fd sharing the same `Arc<AtomicU64>`,
+- initial shared offset = 0.
+
+Interleaving:
+
+1. writer A enters `fd_write()` and loads offset `0`.
+2. writer B enters `fd_write()` and also loads offset `0`.
+3. A acquires the file-handle lock, seeks to `0`, writes 100 bytes, releases the handle lock.
+4. Before A runs the post-write `fetch_add(100)`, B acquires the file-handle lock, seeks to its stale captured offset `0`, writes 100 bytes, releases the handle lock.
+5. A and B each run `fetch_add(100)`, so the shared logical offset becomes `200`.
+
+Result:
+
+- the second write can overwrite the first at byte range `0..100`,
+- the logical cursor advances to `200`,
+- cached `st_size` may also advance to `200`,
+- but the actual file may still only contain 100 bytes of final data.
+
+This is not benign shared-offset behavior. Shared-offset semantics should serialize consumption of the current offset, not let two writers reserve the same start position.
+
+### 2. `fd_seek(Whence::Cur)` racing with `fd_write` can place the write at the wrong position
+
+Precondition:
+
+- shared offset = 0,
+- one thread does `fd_seek(fd, +4096, Cur)`,
+- another thread does `fd_write(fd, ...)`.
+
+Interleaving:
+
+1. writer loads offset `0` in `fd_write()`.
+2. seeker executes `fetch_add(4096)` in `fd_seek(Whence::Cur)`, so the shared logical cursor becomes `4096`.
+3. writer acquires the file-handle lock and seeks to its stale captured offset `0`.
+4. writer writes 4096 bytes.
+5. writer post-increments the shared offset by 4096, so the logical cursor becomes `8192`.
+
+Expected shared-cursor behavior would place that write at offset `4096`. Instead it can be written at `0`, while the logical cursor reports `8192`.
+
+The same stale-offset problem also exists for `Whence::Set`, because it only stores the new logical cursor without coordinating with in-flight writes that already captured the old value.
+
+### 3. Append mode is also vulnerable
+
+`lib/wasix/src/syscalls/wasi/fd_write.rs:169-172` uses cached `inode.stat.st_size` to choose the append position, but `st_size` is only updated later in `lib/wasix/src/syscalls/wasi/fd_write.rs:537-547`, after the write completes and outside the write critical section.
+
+So two concurrent append writers can both:
+
+1. observe the same old `st_size`,
+2. seek to the same end position,
+3. write overlapping data,
+4. then independently advance the logical cursor and cached `st_size`.
+
+That is a direct append corruption risk.
+
+## Why this is a real bug, not expected POSIX-like behavior
+
+Sharing one cursor across dup'd descriptors is expected. The bug is that the implementation shares only the numeric offset value, not the full "consume current offset and perform I/O there" operation.
+
+If the intended model is a shared open-file description, then the start position used by a write must be coordinated with the write itself. Here, the code:
+
+- reads the shared cursor too early,
+- performs I/O later under a different lock,
+- and publishes the cursor advance only afterward.
+
+That leaves a race window large enough for wrong-position writes and metadata divergence. So the suspect is VALID even without involving any other fd-corruption candidate.
+
+## Scope / preconditions
+
+- Affects regular-file operations, not stdio writes (stdio skips the regular-file seek path).
+- Requires concurrent operations on the same shared file description, for example after `fd_dup`, `fd_dup2`, or `fd_renumber`.
+- Most damaging cases are concurrent `fd_write`/`fd_write`, `fd_seek(Cur|Set)`/`fd_write`, and concurrent append writes.
+- Single-threaded use will usually look fine because each later operation re-seeks from the current atomic offset.
+
+## Reproduction guidance for a later agent
+
+Useful deterministic repro shapes:
+
+1. Open a regular file, duplicate the fd, then have two guest threads write distinct 100-byte patterns through the two fds at the same time. If the race lands, one pattern overwrites the other while the final logical offset reports 200.
+2. Open a regular file, duplicate the fd, have one thread do `fd_seek(Cur, big_delta)` while another thread writes once. Look for data written at the old offset while `fd_tell` / later writes behave as if the cursor advanced.
+3. Open a regular file with append semantics and race two append writers. Look for overlapping writes and `st_size` larger than the actual file content length.
+
+Instrumentation points:
+
+- entry offset load in `fd_write()` (`lib/wasix/src/syscalls/wasi/fd_write.rs:41`),
+- actual `handle.seek()` in `fd_write_internal()` (`lib/wasix/src/syscalls/wasi/fd_write.rs:175-178`),
+- post-write `fetch_add()` / `st_size` update (`lib/wasix/src/syscalls/wasi/fd_write.rs:520-547`),
+- `fd_seek(Whence::Cur)` atomic adjustment (`lib/wasix/src/syscalls/wasi/fd_seek.rs:67-84`).
+
+## Fix direction
+
+The later fix will likely need one shared critical section for regular files that covers:
+
+- choosing the start offset,
+- seeking / writing on the underlying file handle,
+- publishing the new logical offset,
+- and updating cached size / append position.
+
+An atomic integer alone is not sufficient once the real file position is maintained by a separate locked `VirtualFile` handle.