Skip to content
Draft
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
Original file line number Diff line number Diff line change
@@ -0,0 +1,75 @@
# Finding 1: `FdList` aggressively reuses the lowest freed descriptor

## Verdict

INVALID

`FdList` reuses the lowest available descriptor on purpose, and the implementation is explicitly documented as matching Unix/WASI-style fd allocation semantics. By itself, this does not create wrong-file writes or fd corruption. It only makes any separate stale-fd bug easier to observe.

## What the code does

`lib/wasix/src/fs/fd_list.rs` is a small fd-slot allocator:

- File header comment: "The Unix spec requires newly allocated FDs to always be the lowest-numbered FD available."
- `insert_first_free()` fills `self.first_free` if a hole exists, otherwise appends (`fd_list.rs:67-84`).
- `remove()` updates `self.first_free` to the lowest newly freed slot (`fd_list.rs:176-192`).
- `create_fd_ext(..., idx: None, ...)` uses `guard.insert_first_free(fd)` for normal opens (`lib/wasix/src/fs/mod.rs:1838-1880`).
- `clone_fd_ext()` uses `insert_first_free_after(...)`, which is the expected `dup`/`fcntl(F_DUPFD)` style behavior: lowest free fd at or above the requested minimum (`lib/wasix/src/fs/mod.rs:1887-1918`).

The unit tests in `fd_list.rs` also intentionally lock this behavior in:

- `can_append_in_holes()` expects a newly inserted fd to reuse slot `1` after `1` and `2` were freed.
- `hole_moves_back_correctly()` expects `first_free` to move backward when a lower fd is freed.
- `next_and_last_fd_reported_correctly()` expects `next_free_fd()` to become `3` after removing fd `3`, then `1` after removing fd `1`.

So the "aggressive reuse" is not accidental; it is the designed and tested behavior.

## Why this is not sufficient to prove corruption

Reusing a closed fd number is normal. After `close(1)`, a later open returning `1` means fd `1` now refers to the new file description. A subsequent `fd_write(1, ...)` going to that new file is expected behavior, not corruption.

Within this codebase, write operations generally use the current fd mapping or capture the underlying file handle explicitly:

- `fd_write()` calls `state.fs.get_fd(fd)` first, clones the `Fd`, and then `fd_write_internal()` uses the captured inode/handle during the async write (`lib/wasix/src/syscalls/wasi/fd_write.rs:37-51`, `:128-225`).
- `close_fd()` itself just removes the mapping from `fd_map`; it does not perform any write or flush by numeric fd after removal (`lib/wasix/src/fs/mod.rs:2303-2322`).

That means `FdList` alone is only an allocator policy. It does not by itself create a path where bytes intended for one still-open file description are silently redirected into another file description.

## Evidence that reuse is already assumed elsewhere

Other code already treats fd-number reuse as normal and works around specific races that reuse can expose:

- `fd_close()` has a special non-stdio path that captures the file handle *before* removing the fd, then flushes the captured handle afterward. The comment explicitly says this avoids an "fd-number reuse race" where async pre-close flush could otherwise affect a newly allocated descriptor with the same number (`lib/wasix/src/syscalls/wasi/fd_close.rs:57-60`).
- `fd_renumber_internal()` holds one write lock across removing `to` and inserting the replacement specifically to stop another thread from allocating into the target slot between those operations (`lib/wasix/src/syscalls/wasi/fd_renumber.rs:64-67`).

These are real reuse-sensitive sites, but they are separate suspects. Their existence is evidence that `FdList` reuse is expected platform behavior, not the bug by itself.

## What would have to be true for reuse to matter

For this suspect to become real corruption, another bug must already exist, for example:

1. A code path keeps a raw numeric `WasiFd` after the logical file description has changed, then later uses that stale number.
2. A code path does async work keyed by fd number instead of by a captured handle/inode.
3. Internal runtime code incorrectly assumes fd `1`/`2` always means the original host stdout/stderr instead of the current guest mapping.

I found nearby code that could fit those broader classes, but they are independent suspects:

- `fd_close()` still uses `state.fs.flush(fd)` for stdio before removing the mapping, unlike the safer captured-handle path used for non-stdio (`lib/wasix/src/syscalls/wasi/fd_close.rs:47-79`). If that path races, the bug is in stdio close/flush ordering, not in `FdList`.
- `/dev/stdout` and friends can resolve through `get_special_fd()` and then `clone_fd(fd)`, meaning they deliberately follow the current fd mapping (`lib/wasix/src/syscalls/wasix/path_open2.rs:337-349`).
- `stderr_write()` writes via `WasiInodes::stderr_mut(&state.fs.fd_map)`, which also follows the current fd `2` mapping (`lib/wasix/src/syscalls/mod.rs:246-255`, `lib/wasix/src/fs/mod.rs:321-336`).

Those behaviors may be surprising depending on intended stdio semantics, but they are not caused by `FdList` reusing the lowest free slot.

## Bottom line

`FdList` immediate reuse is expected and deliberately tested behavior. It does not independently explain fd corruption or wrong-file writes.

Mark this suspect INVALID unless paired with another bug that mishandles stale numeric fds, async flush/write ordering, or stdio remapping semantics.

## Best next investigation targets

If a later agent wants a reproduction or a fix, the higher-value follow-ups are:

- stdio-specific `fd_close()` ordering, because it still flushes by descriptor number before close;
- any code that keeps raw `WasiFd` values across async boundaries;
- helpers that treat current fd `1`/`2` as "stdout/stderr" (`/dev/stdout`, `stderr_write`, similar runtime helpers).
Original file line number Diff line number Diff line change
@@ -0,0 +1,81 @@
# Finding 10: `fd_write` clones `Fd` before async write

## Verdict

INVALID

## Suspect

`fd_write` obtains an `Fd` snapshot with `state.fs.get_fd(fd)` and passes that cloned `Fd` into `fd_write_internal()`. Inside the file path, `fd_write_internal()` clones the underlying `Arc<RwLock<Box<dyn VirtualFile>>>` before awaiting the actual write. Superficially, that means an in-flight write can continue to the old backing object even if the numeric fd is closed and reused while the write is suspended.

## What the code actually does

### 1. `fd_write` snapshots the descriptor entry once

- `WasiFs::get_fd()` returns a cloned `Fd` from the fd table, not a live borrow into `fd_map` (`lib/wasix/src/fs/mod.rs:1514-1521`).
- `fd_write()` captures that clone up front and passes it into `fd_write_internal()` (`lib/wasix/src/syscalls/wasi/fd_write.rs:37-51`).

This means `fd_write_internal()` intentionally operates on a stable descriptor snapshot instead of re-looking up the numeric fd later.

### 2. The file write path binds to the old backing handle, not the fd number

In the `Kind::File` branch of `fd_write_internal()`:

- it takes `fd_entry.inode.write()`,
- clones `handle` out of `Kind::File { handle, .. }`,
- drops the inode guard,
- then performs the async write through that cloned handle (`lib/wasix/src/syscalls/wasi/fd_write.rs:152-159` and `:166-218`).

After that point, the actual write no longer depends on the numeric fd slot. It is pinned to the previously captured `Arc<RwLock<Box<dyn VirtualFile>>>`.

The tail of the function likewise updates cursor/stat state through the captured `fd_entry`, not by re-reading `fd_map` (`lib/wasix/src/syscalls/wasi/fd_write.rs:519-553`).

### 3. Closing or reusing the numeric fd does not retarget that in-flight write

`WasiFs::close_fd()` only removes the entry from `fd_map` (`lib/wasix/src/fs/mod.rs:2294-2312`).

`FdList::remove()` correspondingly drops one inode handle count for the removed table entry (`lib/wasix/src/fs/fd_list.rs:176-191`), but that is only the fd-table ownership. The in-flight write still holds its own cloned `Arc` to the old `VirtualFile`, so the old backing object remains alive until the write finishes.

When the same numeric fd is later reused, `FdList::insert*()` installs a new `Fd` entry and increments the new inode's handle count (`lib/wasix/src/fs/fd_list.rs:67-83`, `:140-174`). Reuse fills the slot with a different `Fd`; it does not mutate the `Fd` snapshot already captured by `fd_write`.

### 4. This lifetime model is deliberate elsewhere too

Two nearby code paths show that WASIX intentionally preserves the old backing object across async descriptor-table changes:

- `fd_close()` explicitly captures the file handle before removing the fd so a delayed flush cannot affect a newly reused descriptor number (`lib/wasix/src/syscalls/wasi/fd_close.rs:69-79`).
- `fd_renumber()` keeps remove/insert atomic under one lock, then flushes the old target handle afterward via a captured cloned handle (`lib/wasix/src/syscalls/wasi/fd_renumber.rs:64-125`).

There is also an explicit regression test for the same design in `fd_close.rs`, `stdio_close_does_not_remove_replacement_fd`, which verifies that a delayed flush on the old stdout does not delete or disturb a replacement installed at fd 1 while the flush is pending (`lib/wasix/src/syscalls/wasi/fd_close.rs:399-476`).

## Why this is not fd-corruption

The suspicious behavior is real in the narrow sense: an async `fd_write` can continue against the old file object after the numeric fd has been closed or rebound.

However, that is not descriptor corruption in this codebase. It is the mechanism that prevents descriptor corruption:

- the in-flight write stays attached to the old file description,
- the newly reused fd number points at a different `Fd` entry,
- and the old write path never switches over to the new occupant of that fd slot.

So the observed behavior is "write completes on the object that was open when the syscall started", not "write is redirected to whoever later acquired the same fd number".

That matches the rest of WASIX's fd-table design:

- descriptor duplication and renumbering intentionally share inode/offset state (`lib/wasix/src/fs/mod.rs:1877-1908`, `lib/wasix/src/syscalls/wasi/fd_renumber.rs:86-99`);
- close/renumber already contain comments and tests specifically aimed at preventing fd-number reuse races.

## Important nuance

If someone expected "close in another thread immediately cancels a blocked write", this code does not provide that behavior. An in-flight `fd_write` may still complete on the old backing object.

But that is a semantics question, not wrong-target corruption. By itself, this suspect does **not** explain writes landing in a newly reused descriptor.

## Out-of-scope note

`fd_write_internal()` does still use the numeric `fd` afterward for journal snapshotting under `#[cfg(feature = "journal")]` (`lib/wasix/src/syscalls/wasi/fd_write.rs:502-513`). The repository note for this investigation explicitly says journal/replay paths are out of scope, so that does not make this suspect valid for the current fd-corruption hunt.

## Bottom line

This suspect is **INVALID** for the corruption question.

`fd_write` capturing the old `Fd`/handle before the async boundary is intentional and is the reason an fd-number reuse does **not** redirect the in-flight write to the new descriptor occupant.
Original file line number Diff line number Diff line change
@@ -0,0 +1,170 @@
# Suspect 11: offset updates are partly atomic but not coupled to file seek/write

## Verdict

VALID

## Short conclusion

This is a real bug for regular files that share one open-file description offset via dup-like operations. The code intentionally shares the logical cursor with `Arc<AtomicU64>`, but `fd_write` does not reserve and consume that cursor under the same critical section as the actual `seek`/`write`, and `fd_seek(Whence::Cur)` / `fd_seek(Whence::Set)` update only the atomic offset. As a result, concurrent operations on duplicated fds can write at stale positions, overwrite earlier data, and report a logical offset / `st_size` that does not match the real file contents.

This does not need any separate bug such as fd-number reuse or close/reopen races. It is an intra-file corruption / wrong-position bug on its own.

## Why duplicated fds really share one offset

The fd layer stores the cursor as shared state:

- `lib/wasix/src/fs/fd.rs:36-40` defines `FdInner.offset` as `Arc<AtomicU64>`.
- `lib/wasix/src/fs/mod.rs:1887-1894` (`clone_fd_ext`) clones that same `offset` arc into the new fd.
- `lib/wasix/src/syscalls/wasi/fd_renumber.rs:86-89` also clones the same `offset` arc.

So this code is clearly trying to model shared-offset dup/dup2/renumber semantics, not independent cursors per numeric fd.

## Relevant write/seek behavior

### `fd_write` snapshots the offset before taking the file-handle lock

`lib/wasix/src/syscalls/wasi/fd_write.rs:41-48`:

- `fd_write()` loads `fd_entry.inner.offset` into a local `offset`.
- It passes that captured value into `fd_write_internal()`.

Then `lib/wasix/src/syscalls/wasi/fd_write.rs:167-178`:

- The regular-file path takes `handle.write()` and seeks the underlying file handle to the previously captured `offset`.
- In append mode it first reads `inode.stat.st_size`, stores that into the shared atomic offset, and then seeks to that value.

Only after the async write finishes does `lib/wasix/src/syscalls/wasi/fd_write.rs:519-547`:

- `fetch_add(bytes_written)` on the shared atomic offset, and
- update `inode.stat.st_size`.

That means the operation is split into:

1. read offset,
2. later seek/write under the handle lock,
3. later update shared offset and cached size after the write.

Those three steps are not one atomic transaction.

### `fd_seek` is only partly coupled

`lib/wasix/src/syscalls/wasi/fd_seek.rs:66-84`:

- `Whence::Cur` updates only `fd_entry.inner.offset` with `fetch_add` / `fetch_sub`.
- It does not lock or reposition the underlying file handle.

`lib/wasix/src/syscalls/wasi/fd_seek.rs:138-142`:

- `Whence::Set` only stores to the atomic offset.

`lib/wasix/src/syscalls/wasi/fd_seek.rs:87-110`:

- `Whence::End` does take the handle lock and performs `handle.seek(SeekFrom::End(...))`, then stores the result back into the shared atomic offset.

So `Whence::End` is somewhat coupled to the handle, but `Cur` and `Set` are purely logical cursor mutations. In single-threaded use that can still work because later reads/writes explicitly seek from the atomic cursor. Under concurrency, it is not enough.

## Concrete bad interleavings

### 1. Two concurrent `fd_write`s can overwrite each other

Precondition:

- one regular file,
- duplicated fd sharing the same `Arc<AtomicU64>`,
- initial shared offset = 0.

Interleaving:

1. writer A enters `fd_write()` and loads offset `0`.
2. writer B enters `fd_write()` and also loads offset `0`.
3. A acquires the file-handle lock, seeks to `0`, writes 100 bytes, releases the handle lock.
4. Before A runs the post-write `fetch_add(100)`, B acquires the file-handle lock, seeks to its stale captured offset `0`, writes 100 bytes, releases the handle lock.
5. A and B each run `fetch_add(100)`, so the shared logical offset becomes `200`.

Result:

- the second write can overwrite the first at byte range `0..100`,
- the logical cursor advances to `200`,
- cached `st_size` may also advance to `200`,
- but the actual file may still only contain 100 bytes of final data.

This is not benign shared-offset behavior. Shared-offset semantics should serialize consumption of the current offset, not let two writers reserve the same start position.

### 2. `fd_seek(Whence::Cur)` racing with `fd_write` can place the write at the wrong position

Precondition:

- shared offset = 0,
- one thread does `fd_seek(fd, +4096, Cur)`,
- another thread does `fd_write(fd, ...)`.

Interleaving:

1. writer loads offset `0` in `fd_write()`.
2. seeker executes `fetch_add(4096)` in `fd_seek(Whence::Cur)`, so the shared logical cursor becomes `4096`.
3. writer acquires the file-handle lock and seeks to its stale captured offset `0`.
4. writer writes 4096 bytes.
5. writer post-increments the shared offset by 4096, so the logical cursor becomes `8192`.

Expected shared-cursor behavior would place that write at offset `4096`. Instead it can be written at `0`, while the logical cursor reports `8192`.

The same stale-offset problem also exists for `Whence::Set`, because it only stores the new logical cursor without coordinating with in-flight writes that already captured the old value.

### 3. Append mode is also vulnerable

`lib/wasix/src/syscalls/wasi/fd_write.rs:169-172` uses cached `inode.stat.st_size` to choose the append position, but `st_size` is only updated later in `lib/wasix/src/syscalls/wasi/fd_write.rs:537-547`, after the write completes and outside the write critical section.

So two concurrent append writers can both:

1. observe the same old `st_size`,
2. seek to the same end position,
3. write overlapping data,
4. then independently advance the logical cursor and cached `st_size`.

That is a direct append corruption risk.

## Why this is a real bug, not expected POSIX-like behavior

Sharing one cursor across dup'd descriptors is expected. The bug is that the implementation shares only the numeric offset value, not the full "consume current offset and perform I/O there" operation.

If the intended model is a shared open-file description, then the start position used by a write must be coordinated with the write itself. Here, the code:

- reads the shared cursor too early,
- performs I/O later under a different lock,
- and publishes the cursor advance only afterward.

That leaves a race window large enough for wrong-position writes and metadata divergence. So the suspect is VALID even without involving any other fd-corruption candidate.

## Scope / preconditions

- Affects regular-file operations, not stdio writes (stdio skips the regular-file seek path).
- Requires concurrent operations on the same shared file description, for example after `fd_dup`, `fd_dup2`, or `fd_renumber`.
- Most damaging cases are concurrent `fd_write`/`fd_write`, `fd_seek(Cur|Set)`/`fd_write`, and concurrent append writes.
- Single-threaded use will usually look fine because each later operation re-seeks from the current atomic offset.

## Reproduction guidance for a later agent

Useful deterministic repro shapes:

1. Open a regular file, duplicate the fd, then have two guest threads write distinct 100-byte patterns through the two fds at the same time. If the race lands, one pattern overwrites the other while the final logical offset reports 200.
2. Open a regular file, duplicate the fd, have one thread do `fd_seek(Cur, big_delta)` while another thread writes once. Look for data written at the old offset while `fd_tell` / later writes behave as if the cursor advanced.
3. Open a regular file with append semantics and race two append writers. Look for overlapping writes and `st_size` larger than the actual file content length.

Instrumentation points:

- entry offset load in `fd_write()` (`lib/wasix/src/syscalls/wasi/fd_write.rs:41`),
- actual `handle.seek()` in `fd_write_internal()` (`lib/wasix/src/syscalls/wasi/fd_write.rs:175-178`),
- post-write `fetch_add()` / `st_size` update (`lib/wasix/src/syscalls/wasi/fd_write.rs:520-547`),
- `fd_seek(Whence::Cur)` atomic adjustment (`lib/wasix/src/syscalls/wasi/fd_seek.rs:67-84`).

## Fix direction

The later fix will likely need one shared critical section for regular files that covers:

- choosing the start offset,
- seeking / writing on the underlying file handle,
- publishing the new logical offset,
- and updating cached size / append position.

An atomic integer alone is not sufficient once the real file position is maintained by a separate locked `VirtualFile` handle.
Loading
Loading