Skip to content

Commit d86bf37

Browse files
P8: add SECURITY.md and update README with accurate project state
SECURITY.md gives the full, honest trust boundary: what is genuinely proved (wire round-trips, key-schedule structure, 8 handshake-safety invariants, all 0 axioms/cheats), what is empirically verified (RFC 8448 vectors, OpenSSL 3.6 differential, 33k+ fuzz inputs crash-free), and what is NOT proved (no constant-time execution, no mechanized spec-to-code refinement, crypto primitives are abstract, no memory zeroing, no audit). README warning and Status section updated to reflect the actual completed state: 280 tests, full 1-RTT + HRR + CertVerify + OpenSSL interop, CakeML L0-L4 port with in-process handshake, HOL4 proofs, and clear pointers to SECURITY.md and proof/PROOF_STATUS.md. Co-authored-by: Cursor <cursoragent@cursor.com>
1 parent 239ea62 commit d86bf37

2 files changed

Lines changed: 235 additions & 14 deletions

File tree

README.md

Lines changed: 27 additions & 14 deletions
Original file line numberDiff line numberDiff line change
@@ -9,13 +9,13 @@ machine. The caller owns the transport: feed bytes received from the peer to
99
them on the wire yourself. No sockets, no threads, no global state.
1010

1111
> [!WARNING]
12-
> **Experimental — not for production security.** This is a reference /
13-
> educational implementation. It is verified against the RFC 8448 test vectors,
14-
> but passing key-schedule vectors is not the same as being safe to terminate
15-
> real TLS traffic. Pure SML arithmetic is **not constant-time** (timing
16-
> side-channel exposure), the handshake and certificate-chain validation paths
17-
> are only partially exercised, and it has had no security audit. Do not use it
18-
> to protect real secrets or in any setting where an adversary is present.
12+
> **Not for production security.** This is a reference / research implementation.
13+
> Pure SML arithmetic is **not constant-time** (timing side-channel exposure on
14+
> every RSA, AES, X25519, and ChaCha20 operation). There is no mechanized proof
15+
> linking the HOL4 spec to the SML or CakeML code, and the library has had no
16+
> security audit. Do not use it to protect real secrets or in any setting where a
17+
> network adversary is present. See [`SECURITY.md`](SECURITY.md) for the full,
18+
> honest trust boundary.
1919
2020
Part of the `sjqtentacles` monorepo of SML libraries. It builds on the
2121
vendored crypto family:
@@ -49,13 +49,26 @@ vendored crypto family:
4949

5050
## Status
5151

52-
The record layer, alert codec, handshake message codecs, and the full
53-
key schedule are complete and tested against RFC 8448 vectors. The client
54-
and server state machines implement the 1-RTT handshake skeleton
55-
(ClientHello/ServerHello production, key-share extraction, traffic-key
56-
derivation). Record-layer AEAD encryption/decryption is delegated to the
57-
caller, who uses the traffic keys extracted from the state together with
58-
the vendored `sml-aead` -- this keeps the core sans-IO.
52+
**280 tests passing on MLton and Poly/ML.** Completed and interop-verified:
53+
54+
- Full 1-RTT handshake: ClientHello, ServerHello, EncryptedExtensions,
55+
Certificate, CertificateVerify (RSA-PSS, signed and verified), Finished.
56+
- HelloRetryRequest + cookie with the §4.4.1 synthetic transcript-hash
57+
substitution.
58+
- AEAD record protection (AES-128-GCM / AES-256-GCM / ChaCha20-Poly1305)
59+
built into the state machine.
60+
- X.509 certificate chain validation, hostname matching, validity window.
61+
- `signature_algorithms` enforcement; 0-RTT/early_data reject; KeyUpdate;
62+
NewSessionTicket; full alert state machine.
63+
- **Interop-verified against OpenSSL 3.6.0** (live TLS 1.3 handshake to
64+
CONNECTED, 6 real RFC-compliance bugs found and fixed by the differential).
65+
- **CakeML port (L0–L4):** the entire stack — crypto tower through the TLS
66+
state machine — is ported to and verified on the pinned CakeML v3400
67+
compiler. An in-process client↔server 1-RTT handshake runs natively.
68+
- **HOL4 formal proofs:** wire round-trips, key-schedule structure, and 8
69+
handshake-safety invariants machine-checked with 0 axioms / 0 cheats.
70+
See [`proof/PROOF_STATUS.md`](proof/PROOF_STATUS.md) for what is and is not
71+
proved. See [`SECURITY.md`](SECURITY.md) for the complete trust boundary.
5972

6073
## Portability
6174

SECURITY.md

Lines changed: 208 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,208 @@
1+
# Security and correctness boundary — sml-tls
2+
3+
This document describes precisely what security and correctness properties
4+
`sml-tls` establishes, what it assumes, and what remains out of scope. It is
5+
intended to be honest and non-overstated.
6+
7+
---
8+
9+
## What has been built and verified
10+
11+
### Functional completeness (pure SML, both MLton + Poly/ML)
12+
13+
A full TLS 1.3 (RFC 8446) client and server state machine in pure Standard ML:
14+
15+
- **Record layer** (RFC 8446 §5): `TLSPlaintext` / `TLSCiphertext` encode/decode,
16+
`record_overflow` enforcement, streaming fraining.
17+
- **Alert protocol** (§6): all RFC 8446 alert codes, correct wire bytes,
18+
round-trip tested.
19+
- **Handshake messages** (§4): ClientHello, ServerHello, HelloRetryRequest,
20+
EncryptedExtensions, Certificate, CertificateVerify, Finished,
21+
NewSessionTicket — encode/decode against the RFC 8446 wire format.
22+
- **Extensions** (§4.2): key_share, supported_versions, supported_groups,
23+
signature_algorithms, SNI, ALPN, cookie, pre_shared_key,
24+
psk_key_exchange_modes, early_data — codecs + negotiation helpers.
25+
- **Key schedule** (§7.1): full HKDF-based 1-RTT schedule, verified byte-for-byte
26+
against RFC 8448 published test vectors.
27+
- **AEAD record protection** (§5.2): AES-128-GCM / AES-256-GCM /
28+
ChaCha20-Poly1305 via vendored `sml-aead`, with nonce construction and
29+
sequence-number threading.
30+
- **Certificate chain validation**: X.509 chain traversal, RSA signature
31+
verification on each link, basic-constraints/path-length, hostname matching,
32+
validity window.
33+
- **CertificateVerify** (§4.4.3): RSA-PSS-SHA256 signing (server) and
34+
verification (client), with the correct RFC 8446 `0x00`-separated signed
35+
content and `signature_algorithms` enforcement.
36+
- **Handshake features**: full 1-RTT, HelloRetryRequest + cookie (with the
37+
§4.4.1 synthetic transcript-hash substitution), 0-RTT/early_data reject,
38+
PSK key-schedule primitives + explicit server-side PSK reject.
39+
- **Post-handshake**: KeyUpdate (§4.6.3, §7.2) with traffic-key ratchet;
40+
NewSessionTicket issuance.
41+
- **Alerts and robustness**: every error path raises a fatal alert. A
42+
catch-all backstop maps any unexpected exception to `decode_error`, so
43+
untrusted input cannot crash `step`.
44+
45+
**Test coverage:** 280 passing tests on both MLton and Poly/ML, including RFC
46+
8448 vector tests, round-trip tests, and negative / tamper tests throughout.
47+
48+
### Interoperability (J2)
49+
50+
A live OpenSSL 3.6.0 differential (`openssl s_server` ↔ the pure-SML client
51+
via a socket shim) confirmed a full TLS 1.3 handshake completes:
52+
X25519 + AES-128-GCM, real RSA-2048 certificate parsed, chain + hostname +
53+
validity verified, RSA-PSS `CertificateVerify` verified, server Finished MAC
54+
verified → **CONNECTED**.
55+
56+
Six real interoperability bugs were discovered and fixed by this differential
57+
(alert wire codes, certificate/CV field lengths, Finished handshake type, and
58+
coalesced server-flight draining), each captured by a regression test.
59+
60+
An AFL fuzz harness (33 000+ random inputs) ran crash-free on the parser
61+
entry points.
62+
63+
### CakeML port (L0–L4)
64+
65+
The entire stack — the crypto tower (sha256, bigint, aes, chacha20, x25519,
66+
asn1, pem, aead, kdf, rsa, x509) plus the TLS protocol logic — is ported to
67+
the CakeML dialect and verified for real on the pinned v3400 CakeML compiler
68+
(prebuilt bootstrapped `cake-arm8-64`, 2026-06-18, native arm64). An
69+
in-process client↔server 1-RTT handshake runs under CakeML: both sides reach
70+
CONNECTED, with the server's RSA-PSS CertificateVerify and Finished verified
71+
by the client.
72+
73+
Because the CakeML *compiler* (not just the runtime) is formally verified, the
74+
CakeML-compiled binary's operational behavior is backed by a proof chain from
75+
HOL4 semantics to machine code — **provided** the source program is also
76+
verified (see "What is not proved" below).
77+
78+
### HOL4 formal proofs (`proof/`)
79+
80+
`proof/` builds cleanly under HOL4 Trindemossen 2 (`Holmake cleanAll &&
81+
Holmake` → all three theories OK). Machine check: 0 axioms, 0 custom oracles.
82+
See [`proof/PROOF_STATUS.md`](proof/PROOF_STATUS.md) for theorem-by-theorem
83+
detail.
84+
85+
**What is genuinely proved:**
86+
87+
- **Wire round-trips** (RFC 8446 §4, §5): `decode(encode x) = SOME(x, …)` for
88+
all concrete record/framing structures (TLSPlaintext, TLSCiphertext,
89+
handshake framing, extension framing, Finished, CertificateVerify), under the
90+
honest length-bound side conditions imposed by the wire length fields.
91+
- **Key-schedule structure** (RFC 8446 §7.1): the `schedule` bundle wires its
92+
fields exactly as the staged Extract → Derive → Extract chaining prescribes;
93+
all output lengths are correct (modulo the abstract primitive contract below).
94+
- **Handshake safety invariants** (8 theorems): transcript append-only;
95+
ClientHello is the only first transition; Closed is absorbing; core inductive
96+
safety invariant over reachable states; `Connected` implies peer Finished was
97+
verified; client emits no application data before the server Finished is
98+
verified; keys are never installed in the abstract model.
99+
100+
---
101+
102+
## What is NOT proved (the honest trust boundary)
103+
104+
### 1. Cryptographic security
105+
106+
The cryptographic primitives — SHA-256, HMAC-SHA-256, HKDF, AES-GCM,
107+
ChaCha20-Poly1305, X25519, RSA-PSS — are **trusted black boxes**. Their
108+
correctness relative to the published standards is checked empirically (RFC
109+
vector tests and OpenSSL differential), **not proved**. Their security
110+
properties (collision resistance, PRF/HMAC security, IND-CCA2 of the AEAD
111+
schemes, CDH hardness of X25519, etc.) are **assumed, not derived**.
112+
113+
The HOL4 key-schedule theorems model `sha256` / `hmac_sha256` / `hkdfExpand`
114+
as abstract constants specified only by output length. Nothing about their
115+
values is proved.
116+
117+
### 2. Constant-time execution
118+
119+
Pure Standard ML arithmetic is **not constant-time**. Every conditional
120+
branch and table lookup in the RSA, AES, X25519, and ChaCha20 implementations
121+
can leak information through execution time and cache behavior. An adversary
122+
sharing a CPU with this code can, in principle, recover private key material
123+
through timing side channels. This is inherent to the language/runtime and
124+
cannot be fixed without leaving pure SML (e.g., assembly or hardware-backed
125+
cryptography via C FFI). Do not use this library where a timing adversary is
126+
present.
127+
128+
### 3. No mechanized refinement from spec to code
129+
130+
The HOL4 theories are an **independent RFC re-model**, aligned with `tls.sml`
131+
by manual inspection of field order and framing. There is **no mechanized
132+
proof** that the SML functions implement the HOL4 spec, and no
133+
CakeML-translator/extraction theorem connecting either to the running binary.
134+
135+
Specifically:
136+
- The HOL4 wire codecs are not proved equal to the SML encoder/decoder
137+
functions.
138+
- The HOL4 key-schedule definition is not validated against the RFC 8448 byte
139+
vectors (requires a concrete, verified SHA-256 plugged in — open work).
140+
- The HOL4 handshake automaton is **abstract**: message contents, key
141+
installation, and full server-flight granularity are not modeled. It is not
142+
refined to `tlsstate.sml`.
143+
- Four message codecs (`clientHello`, `serverHello`, `certificate`,
144+
`newSessionTicket`) are **not modeled** in HOL4 (their structured, nested,
145+
length-prefixed encoding does not admit a simple unconditional round-trip
146+
theorem without significant additional proof work).
147+
148+
### 4. No memory zeroing
149+
150+
Secret key material (traffic keys, private keys passed in `serverConfig`) lives
151+
on the SML heap and is subject to GC movement and eventual serialization
152+
without explicit zeroing. There is no guarantee that secrets are erased from
153+
memory after use.
154+
155+
### 5. No audit
156+
157+
This library has had no external security audit.
158+
159+
---
160+
161+
## What it is suitable for
162+
163+
- Reference implementation and educational study of TLS 1.3.
164+
- Testing and fuzzing other TLS implementations (via `sml-tls-tool`).
165+
- Automated RFC conformance verification (the 280-test suite and AFL harnesses
166+
cover the majority of the RFC's normative requirements).
167+
- Formal-methods research: extending the HOL4 spec toward a full mechanized
168+
refinement, or using the CakeML port as the target of `ml_translatorLib`
169+
extraction.
170+
171+
---
172+
173+
## What it is NOT suitable for
174+
175+
- Protecting real secrets or sensitive communications.
176+
- Any deployment where a network adversary is present.
177+
- Any context where timing side-channel resistance is required.
178+
- Drop-in replacement for a production TLS library (OpenSSL, BoringSSL,
179+
rustls, etc.).
180+
181+
---
182+
183+
## Open work toward production security
184+
185+
In approximate dependency order:
186+
187+
1. **Mechanized refinement**: prove the SML implementation implements the HOL4
188+
spec (wire codecs via a `decode ∘ encode = id` approach with formally
189+
verified combinator parsers; key schedule by `EVAL` with a concrete verified
190+
SHA-256).
191+
2. **Model the four remaining codecs** in HOL4 (ClientHello, ServerHello,
192+
Certificate, NewSessionTicket) with appropriate well-formedness side
193+
conditions.
194+
3. **Concrete verified SHA-256**: link the HOL4 key-schedule spec to the
195+
verified CakeML SHA-256 implementation via `ml_translatorLib`, discharging
196+
the RFC 8448 vector theorems.
197+
4. **CakeML translator proof** (J4 full): compile the TLS code via
198+
`ml_translatorLib` to get HOL4 theorems connecting the SML source to the
199+
CakeML semantics, then use the CakeML compiler's proof to close the chain
200+
to machine code.
201+
5. **Constant-time crypto**: replace the pure-SML arithmetic in the crypto
202+
primitives with C FFI calls to a verified or audited constant-time
203+
implementation (e.g. libsodium, HACL*).
204+
6. **Memory zeroing**: add an explicit key-erasure API and implement it via
205+
FFI.
206+
7. **PSK resumption** (accept path): implement the full server-side PSK accept
207+
flow (ticket storage, binder verification, 0-RTT key install) on top of the
208+
currently implemented key-schedule and binder primitives.

0 commit comments

Comments
 (0)