|
| 1 | +# Security and correctness boundary — sml-tls |
| 2 | + |
| 3 | +This document describes precisely what security and correctness properties |
| 4 | +`sml-tls` establishes, what it assumes, and what remains out of scope. It is |
| 5 | +intended to be honest and non-overstated. |
| 6 | + |
| 7 | +--- |
| 8 | + |
| 9 | +## What has been built and verified |
| 10 | + |
| 11 | +### Functional completeness (pure SML, both MLton + Poly/ML) |
| 12 | + |
| 13 | +A full TLS 1.3 (RFC 8446) client and server state machine in pure Standard ML: |
| 14 | + |
| 15 | +- **Record layer** (RFC 8446 §5): `TLSPlaintext` / `TLSCiphertext` encode/decode, |
| 16 | + `record_overflow` enforcement, streaming fraining. |
| 17 | +- **Alert protocol** (§6): all RFC 8446 alert codes, correct wire bytes, |
| 18 | + round-trip tested. |
| 19 | +- **Handshake messages** (§4): ClientHello, ServerHello, HelloRetryRequest, |
| 20 | + EncryptedExtensions, Certificate, CertificateVerify, Finished, |
| 21 | + NewSessionTicket — encode/decode against the RFC 8446 wire format. |
| 22 | +- **Extensions** (§4.2): key_share, supported_versions, supported_groups, |
| 23 | + signature_algorithms, SNI, ALPN, cookie, pre_shared_key, |
| 24 | + psk_key_exchange_modes, early_data — codecs + negotiation helpers. |
| 25 | +- **Key schedule** (§7.1): full HKDF-based 1-RTT schedule, verified byte-for-byte |
| 26 | + against RFC 8448 published test vectors. |
| 27 | +- **AEAD record protection** (§5.2): AES-128-GCM / AES-256-GCM / |
| 28 | + ChaCha20-Poly1305 via vendored `sml-aead`, with nonce construction and |
| 29 | + sequence-number threading. |
| 30 | +- **Certificate chain validation**: X.509 chain traversal, RSA signature |
| 31 | + verification on each link, basic-constraints/path-length, hostname matching, |
| 32 | + validity window. |
| 33 | +- **CertificateVerify** (§4.4.3): RSA-PSS-SHA256 signing (server) and |
| 34 | + verification (client), with the correct RFC 8446 `0x00`-separated signed |
| 35 | + content and `signature_algorithms` enforcement. |
| 36 | +- **Handshake features**: full 1-RTT, HelloRetryRequest + cookie (with the |
| 37 | + §4.4.1 synthetic transcript-hash substitution), 0-RTT/early_data reject, |
| 38 | + PSK key-schedule primitives + explicit server-side PSK reject. |
| 39 | +- **Post-handshake**: KeyUpdate (§4.6.3, §7.2) with traffic-key ratchet; |
| 40 | + NewSessionTicket issuance. |
| 41 | +- **Alerts and robustness**: every error path raises a fatal alert. A |
| 42 | + catch-all backstop maps any unexpected exception to `decode_error`, so |
| 43 | + untrusted input cannot crash `step`. |
| 44 | + |
| 45 | +**Test coverage:** 280 passing tests on both MLton and Poly/ML, including RFC |
| 46 | +8448 vector tests, round-trip tests, and negative / tamper tests throughout. |
| 47 | + |
| 48 | +### Interoperability (J2) |
| 49 | + |
| 50 | +A live OpenSSL 3.6.0 differential (`openssl s_server` ↔ the pure-SML client |
| 51 | +via a socket shim) confirmed a full TLS 1.3 handshake completes: |
| 52 | +X25519 + AES-128-GCM, real RSA-2048 certificate parsed, chain + hostname + |
| 53 | +validity verified, RSA-PSS `CertificateVerify` verified, server Finished MAC |
| 54 | +verified → **CONNECTED**. |
| 55 | + |
| 56 | +Six real interoperability bugs were discovered and fixed by this differential |
| 57 | +(alert wire codes, certificate/CV field lengths, Finished handshake type, and |
| 58 | +coalesced server-flight draining), each captured by a regression test. |
| 59 | + |
| 60 | +An AFL fuzz harness (33 000+ random inputs) ran crash-free on the parser |
| 61 | +entry points. |
| 62 | + |
| 63 | +### CakeML port (L0–L4) |
| 64 | + |
| 65 | +The entire stack — the crypto tower (sha256, bigint, aes, chacha20, x25519, |
| 66 | +asn1, pem, aead, kdf, rsa, x509) plus the TLS protocol logic — is ported to |
| 67 | +the CakeML dialect and verified for real on the pinned v3400 CakeML compiler |
| 68 | +(prebuilt bootstrapped `cake-arm8-64`, 2026-06-18, native arm64). An |
| 69 | +in-process client↔server 1-RTT handshake runs under CakeML: both sides reach |
| 70 | +CONNECTED, with the server's RSA-PSS CertificateVerify and Finished verified |
| 71 | +by the client. |
| 72 | + |
| 73 | +Because the CakeML *compiler* (not just the runtime) is formally verified, the |
| 74 | +CakeML-compiled binary's operational behavior is backed by a proof chain from |
| 75 | +HOL4 semantics to machine code — **provided** the source program is also |
| 76 | +verified (see "What is not proved" below). |
| 77 | + |
| 78 | +### HOL4 formal proofs (`proof/`) |
| 79 | + |
| 80 | +`proof/` builds cleanly under HOL4 Trindemossen 2 (`Holmake cleanAll && |
| 81 | +Holmake` → all three theories OK). Machine check: 0 axioms, 0 custom oracles. |
| 82 | +See [`proof/PROOF_STATUS.md`](proof/PROOF_STATUS.md) for theorem-by-theorem |
| 83 | +detail. |
| 84 | + |
| 85 | +**What is genuinely proved:** |
| 86 | + |
| 87 | +- **Wire round-trips** (RFC 8446 §4, §5): `decode(encode x) = SOME(x, …)` for |
| 88 | + all concrete record/framing structures (TLSPlaintext, TLSCiphertext, |
| 89 | + handshake framing, extension framing, Finished, CertificateVerify), under the |
| 90 | + honest length-bound side conditions imposed by the wire length fields. |
| 91 | +- **Key-schedule structure** (RFC 8446 §7.1): the `schedule` bundle wires its |
| 92 | + fields exactly as the staged Extract → Derive → Extract chaining prescribes; |
| 93 | + all output lengths are correct (modulo the abstract primitive contract below). |
| 94 | +- **Handshake safety invariants** (8 theorems): transcript append-only; |
| 95 | + ClientHello is the only first transition; Closed is absorbing; core inductive |
| 96 | + safety invariant over reachable states; `Connected` implies peer Finished was |
| 97 | + verified; client emits no application data before the server Finished is |
| 98 | + verified; keys are never installed in the abstract model. |
| 99 | + |
| 100 | +--- |
| 101 | + |
| 102 | +## What is NOT proved (the honest trust boundary) |
| 103 | + |
| 104 | +### 1. Cryptographic security |
| 105 | + |
| 106 | +The cryptographic primitives — SHA-256, HMAC-SHA-256, HKDF, AES-GCM, |
| 107 | +ChaCha20-Poly1305, X25519, RSA-PSS — are **trusted black boxes**. Their |
| 108 | +correctness relative to the published standards is checked empirically (RFC |
| 109 | +vector tests and OpenSSL differential), **not proved**. Their security |
| 110 | +properties (collision resistance, PRF/HMAC security, IND-CCA2 of the AEAD |
| 111 | +schemes, CDH hardness of X25519, etc.) are **assumed, not derived**. |
| 112 | + |
| 113 | +The HOL4 key-schedule theorems model `sha256` / `hmac_sha256` / `hkdfExpand` |
| 114 | +as abstract constants specified only by output length. Nothing about their |
| 115 | +values is proved. |
| 116 | + |
| 117 | +### 2. Constant-time execution |
| 118 | + |
| 119 | +Pure Standard ML arithmetic is **not constant-time**. Every conditional |
| 120 | +branch and table lookup in the RSA, AES, X25519, and ChaCha20 implementations |
| 121 | +can leak information through execution time and cache behavior. An adversary |
| 122 | +sharing a CPU with this code can, in principle, recover private key material |
| 123 | +through timing side channels. This is inherent to the language/runtime and |
| 124 | +cannot be fixed without leaving pure SML (e.g., assembly or hardware-backed |
| 125 | +cryptography via C FFI). Do not use this library where a timing adversary is |
| 126 | +present. |
| 127 | + |
| 128 | +### 3. No mechanized refinement from spec to code |
| 129 | + |
| 130 | +The HOL4 theories are an **independent RFC re-model**, aligned with `tls.sml` |
| 131 | +by manual inspection of field order and framing. There is **no mechanized |
| 132 | +proof** that the SML functions implement the HOL4 spec, and no |
| 133 | +CakeML-translator/extraction theorem connecting either to the running binary. |
| 134 | + |
| 135 | +Specifically: |
| 136 | +- The HOL4 wire codecs are not proved equal to the SML encoder/decoder |
| 137 | + functions. |
| 138 | +- The HOL4 key-schedule definition is not validated against the RFC 8448 byte |
| 139 | + vectors (requires a concrete, verified SHA-256 plugged in — open work). |
| 140 | +- The HOL4 handshake automaton is **abstract**: message contents, key |
| 141 | + installation, and full server-flight granularity are not modeled. It is not |
| 142 | + refined to `tlsstate.sml`. |
| 143 | +- Four message codecs (`clientHello`, `serverHello`, `certificate`, |
| 144 | + `newSessionTicket`) are **not modeled** in HOL4 (their structured, nested, |
| 145 | + length-prefixed encoding does not admit a simple unconditional round-trip |
| 146 | + theorem without significant additional proof work). |
| 147 | + |
| 148 | +### 4. No memory zeroing |
| 149 | + |
| 150 | +Secret key material (traffic keys, private keys passed in `serverConfig`) lives |
| 151 | +on the SML heap and is subject to GC movement and eventual serialization |
| 152 | +without explicit zeroing. There is no guarantee that secrets are erased from |
| 153 | +memory after use. |
| 154 | + |
| 155 | +### 5. No audit |
| 156 | + |
| 157 | +This library has had no external security audit. |
| 158 | + |
| 159 | +--- |
| 160 | + |
| 161 | +## What it is suitable for |
| 162 | + |
| 163 | +- Reference implementation and educational study of TLS 1.3. |
| 164 | +- Testing and fuzzing other TLS implementations (via `sml-tls-tool`). |
| 165 | +- Automated RFC conformance verification (the 280-test suite and AFL harnesses |
| 166 | + cover the majority of the RFC's normative requirements). |
| 167 | +- Formal-methods research: extending the HOL4 spec toward a full mechanized |
| 168 | + refinement, or using the CakeML port as the target of `ml_translatorLib` |
| 169 | + extraction. |
| 170 | + |
| 171 | +--- |
| 172 | + |
| 173 | +## What it is NOT suitable for |
| 174 | + |
| 175 | +- Protecting real secrets or sensitive communications. |
| 176 | +- Any deployment where a network adversary is present. |
| 177 | +- Any context where timing side-channel resistance is required. |
| 178 | +- Drop-in replacement for a production TLS library (OpenSSL, BoringSSL, |
| 179 | + rustls, etc.). |
| 180 | + |
| 181 | +--- |
| 182 | + |
| 183 | +## Open work toward production security |
| 184 | + |
| 185 | +In approximate dependency order: |
| 186 | + |
| 187 | +1. **Mechanized refinement**: prove the SML implementation implements the HOL4 |
| 188 | + spec (wire codecs via a `decode ∘ encode = id` approach with formally |
| 189 | + verified combinator parsers; key schedule by `EVAL` with a concrete verified |
| 190 | + SHA-256). |
| 191 | +2. **Model the four remaining codecs** in HOL4 (ClientHello, ServerHello, |
| 192 | + Certificate, NewSessionTicket) with appropriate well-formedness side |
| 193 | + conditions. |
| 194 | +3. **Concrete verified SHA-256**: link the HOL4 key-schedule spec to the |
| 195 | + verified CakeML SHA-256 implementation via `ml_translatorLib`, discharging |
| 196 | + the RFC 8448 vector theorems. |
| 197 | +4. **CakeML translator proof** (J4 full): compile the TLS code via |
| 198 | + `ml_translatorLib` to get HOL4 theorems connecting the SML source to the |
| 199 | + CakeML semantics, then use the CakeML compiler's proof to close the chain |
| 200 | + to machine code. |
| 201 | +5. **Constant-time crypto**: replace the pure-SML arithmetic in the crypto |
| 202 | + primitives with C FFI calls to a verified or audited constant-time |
| 203 | + implementation (e.g. libsodium, HACL*). |
| 204 | +6. **Memory zeroing**: add an explicit key-erasure API and implement it via |
| 205 | + FFI. |
| 206 | +7. **PSK resumption** (accept path): implement the full server-side PSK accept |
| 207 | + flow (ticket storage, binder verification, 0-RTT key install) on top of the |
| 208 | + currently implemented key-schedule and binder primitives. |
0 commit comments