Skip to content

Commit ae567fd

Browse files
2a: model + prove round-trips for the four remaining TLS wire codecs
Model clientHello, serverHello, certificate, and newSessionTicket encoders/decoders in HOL4 and prove decode(encode x) = SOME x under explicit well-formedness predicates capturing the wire length bounds. New supporting lemmas: decodeW16s_loop_correct, encodeW16ListBody_length, decodeExtensionsR_correct, decodeCertEntries_loop_correct. Holmake builds all three theories clean; DB.axioms = 0; the new theorems carry only DISK_THM (no cheat oracle). PROOF_STATUS.md updated. Co-authored-by: Cursor <cursoragent@cursor.com>
1 parent 79c471b commit ae567fd

2 files changed

Lines changed: 631 additions & 37 deletions

File tree

proof/PROOF_STATUS.md

Lines changed: 50 additions & 20 deletions
Original file line numberDiff line numberDiff line change
@@ -50,30 +50,58 @@ Round-trip theorems `decode (encode x) = SOME (x, …)`:
5050
| `decode_encode_extensions` | **proved (bound)** | each `LENGTH data < 2^16` and total block `< 2^16` |
5151
| `finished_roundtrip` | **proved (bound)** | `verifyData <> []` (matches SML, which rejects empty) |
5252
| `certificateVerify_roundtrip` | **proved (bound)** | `LENGTH sigBytes < 2^16` |
53+
| `decode_encode_newSessionTicket` | **proved (bound)** | `wfNewSessionTicket` (see below) |
54+
| `decode_encode_certificate` | **proved (bound)** | `wfCertificate` (see below) |
55+
| `decode_encode_serverHello` | **proved (bound)** | `wfServerHello` (see below) |
56+
| `decode_encode_clientHello` | **proved (bound)** | `wfClientHello` (see below) |
5357

5458
The length side conditions are genuine: a 2-byte (resp. 3-byte) length field
5559
cannot represent a value `>= 2^16` (resp. `2^24`), so the round-trip can only
5660
hold under the bound. These are not weaknesses of the proof; they are facts
5761
about the format. (TLS itself bounds records below `2^14`.)
5862

59-
### Not modeled: `clientHello`, `serverHello`, `certificate`, `newSessionTicket`
60-
61-
The datatypes are declared (documenting the intended model), but their
62-
encoders/decoders and round-trip theorems are **not modeled in HOL4**. These
63-
structures carry nested, length-prefixed *lists* of sub-structures (cipher-
64-
suite lists, an optional trailing extension block, a list of `CertificateEntry`
65-
each with its own 3-byte length prefix). In the SML (`tls.sml`) the
66-
encoder/decoder pair is **not a clean inverse** in all cases (e.g. an absent
67-
trailing extension block decodes back to `[]`), so an unconditional round-trip
68-
theorem is simply false without extra well-formedness conditions.
69-
70-
These previously had `cheat`-ed round-trip "theorems". Those were **removed**:
71-
because the placeholder encoders returned `[]` and decoders returned `NONE`,
72-
the stated goals (`decode (encode x) = SOME x`) were *false*, and a `cheat` on
73-
a false goal makes the whole theory logically inconsistent. Removing them and
74-
recording the gap here keeps `tls_wireTheory` sound. Mechanizing these codecs
75-
(remainder-passing combinators + list-loop lemmas, building on the proved
76-
framing layer above) is tracked open work.
63+
### Now modeled: `newSessionTicket`, `certificate`, `serverHello`, `clientHello`
64+
65+
All four structures are now fully modeled with `encode<X>_def` / `decode<X>_def`
66+
mirroring the SML (`tls.sml`) field order and framing, a well-formedness
67+
predicate `wf<X>` capturing the on-the-wire length bounds, and a **proved**
68+
round-trip `wf<X> x ==> decode<X> (encode<X> x) = SOME x` (no `cheat`, no
69+
oracle beyond `DISK_THM`).
70+
71+
The nested length-prefixed lists are handled by generic combinators with their
72+
own proved loop-correctness lemmas, layered on the existing framing lemmas:
73+
74+
* `decodeW16s_loop_correct` — parses a length-prefixed list of `word16`
75+
(cipher-suite list) back to the original list.
76+
* `encodeW16ListBody_length` — the encoded suite body is `2 * LENGTH suites`.
77+
* `decodeExtensionsR_correct` — a remainder-passing extension-block decoder
78+
(returns `(exts, rest)`), used inside the per-entry certificate loop.
79+
* `decodeCertEntries_loop_correct` — parses the concatenation of encoded
80+
`CertificateEntry`s (each with its own 3-byte length prefix and nested
81+
extension block) back to the original list.
82+
83+
The exact well-formedness side conditions are:
84+
85+
* **`wfNewSessionTicket t`**: `LENGTH ticketNonce < 2^8`, `LENGTH ticket < 2^24`,
86+
each extension's `LENGTH data < 2^16`, and the encoded extension block
87+
`< 2^16`.
88+
* **`wfCertificate c`**: `LENGTH certificateRequestContext < 2^8`,
89+
`LENGTH (encodeCertEntries certificateList) < 2^24`, and per entry:
90+
`LENGTH certData < 2^24`, each extension's `LENGTH data < 2^16`, and the
91+
encoded extension block `< 2^16`.
92+
* **`wfServerHello sh`**: `LENGTH random = 32`, `LENGTH legacySessionId < 2^8`,
93+
each extension's `LENGTH data < 2^16`, and the encoded extension block
94+
`< 2^16`.
95+
* **`wfClientHello ch`**: `LENGTH random = 32`, `LENGTH legacySessionId < 2^8`,
96+
`LENGTH cipherSuites < 2^15` (so the 2-byte byte-count `2*n` stays `< 2^16`),
97+
`LENGTH legacyCompression < 2^8`, each extension's `LENGTH data < 2^16`, and
98+
the encoded extension block `< 2^16`.
99+
100+
These bounds are genuine consequences of the wire length fields (1-, 2-, and
101+
3-byte length prefixes), exactly in the honest-bound style of the simpler
102+
codecs above. The HOL4 decoders always parse the trailing extension block via
103+
`decodeExtensions` (matching the always-present length-prefixed block the
104+
encoders emit), so the round-trip holds unconditionally given `wf<X>`.
77105

78106
---
79107

@@ -194,8 +222,10 @@ There is **no mechanized refinement proof** linking the HOL4 spec to the SML
194222
with `tls.sml` **by manual inspection** of field order and framing. They are
195223
not proved equal to the SML functions, and there is no CakeML
196224
translator/extraction theorem connecting either to running code.
197-
* `clientHello` / `serverHello` / `certificate` / `newSessionTicket` codecs
198-
exist in the SML but are **not modeled** in HOL4.
225+
* `clientHello` / `serverHello` / `certificate` / `newSessionTicket` codecs are
226+
now modeled in HOL4 with proved round-trips (under `wf<X>` bounds), but they
227+
are still an *independent re-modeling* aligned with `tls.sml` by inspection —
228+
not proved equal to the SML functions.
199229
* The key schedule matches the RFC *structurally*; it is not validated against
200230
the RFC 8448 byte vectors (needs concrete crypto), and is not proved to match
201231
the SML `TlsKeySchedule` module.

0 commit comments

Comments
 (0)