A SWIM-style epidemic membership and failure-detection protocol — node
liveness states (Alive/Suspect/Dead) with incarnation-number refutation,
random-peer probing, suspicion timeouts, and infection-style dissemination —
modelled as a pure, deterministic state machine in Standard ML. ALL randomness
(which peer a node probes) is drawn from a seeded
sml-prng generator threaded
through the cluster; never ambient randomness. Time is a caller-supplied
logical tick, not a wall clock. No FFI and no external dependencies at runtime,
deterministic and byte-identical under both MLton and
Poly/ML.
- 39 assertions, green on MLton and Poly/ML.
- Basis-library + vendored
sml-prngonly; deterministic across compilers. - Vendors
sml-prng(Layout B) underlib/github.com/sjqtentacles/sml-prng/, so the repo builds standalone.
No FFI, no IO inside the library, no wall-clock, no ambient randomness, and
no threads. Every random choice flows from the seeded sml-prng state threaded
through the cluster, and every protocol round is driven by a logical tick the
caller supplies to step. Consequently a given make {nodes, seed, suspectTimeout} plus a given sequence of kill/step/inject calls always
produces the EXACT same simulation trace — byte-for-byte, across runs, machines,
and compilers. The test suite and demo are byte-identical under MLton and
Poly/ML, and the determinism test asserts that two same-seed runs reproduce the
full membership trace exactly.
SWIM (Scalable Weakly-consistent Infection-style process group Membership) combines two mechanisms, both simulated here:
- Failure detection. Each round, every live node picks a random peer (via
the threaded PRNG) and probes it. A peer that does not respond — because it
has actually failed in the simulation (
kill) — is recordedSuspect. A node that staysSuspectlonger thansuspectTimeoutlogical ticks is aged out toDead. - Epidemic dissemination. When two live nodes interact, they merge each other's membership tables, so an update started anywhere reaches the whole cluster in a logarithmic number of rounds.
Incarnation numbers (refutation). Each node carries a monotonic incarnation
counter. Updates about a node are ordered by incarnation first, then by status
severity (Alive < Suspect < Dead): a higher incarnation always wins, and at
equal incarnation the more-severe status wins. A live node that learns it is
being suspected refutes by bumping its incarnation and re-announcing itself
Alive; the higher incarnation overrides the stale suspicion as it spreads.
This is what prevents a stale Alive rumour from resurrecting a node that a
newer Suspect/Dead rumour has already condemned.
With smlpkg:
smlpkg add github.com/sjqtentacles/sml-gossip
smlpkg sync
Include the MLB from your own (it pulls in the vendored sml-prng):
local
$(SML_LIB)/basis/basis.mlb
lib/github.com/sjqtentacles/sml-gossip/... (via smlpkg)
in
...
end
This brings structure Gossip (and the vendored Prng) into scope.
(* a 6-node cluster, all alive; suspects age out to Dead after 2 ticks *)
val c0 = Gossip.make { nodes = 6, seed = 0wxC0FFEE, suspectTimeout = 2 }
(* run three healthy rounds at logical ticks 0,1,2 -- nothing changes *)
fun stepN c t 0 = c
| stepN c t n = stepN (Gossip.step c t) (t + 1) (n - 1)
val c3 = stepN c0 0 3
(* node 4 actually fails *)
val cK = Gossip.kill c3 4
(* keep stepping: the failure detector + gossip drive everyone to agree *)
val c = stepN cK 3 23
val () = (* SOME Dead once every live node agrees *)
case Gossip.agreedStatus c 4 of
SOME Gossip.Dead => () (* converged on the failure *)
| _ => ()
val view = Gossip.membershipOf c 0
(* [(0,Alive),(1,Alive),(2,Alive),(3,Alive),(4,Dead),(5,Alive)] *)
val ok = Gossip.converged c (* true *)type cluster
datatype status = Alive | Suspect | Dead
val make : { nodes : int, seed : Word64.word, suspectTimeout : int } -> cluster
val size : cluster -> int
val step : cluster -> int -> cluster (* advance one round at tick t *)
val kill : cluster -> int -> cluster (* actually fail a node *)
val inject : cluster -> int -> int -> int -> cluster (* plant a suspicion *)
val isAlive : cluster -> int -> bool (* actual liveness *)
val membershipOf : cluster -> int -> (int * status) list (* node i's view *)
val statusInView : cluster -> int -> int -> status
val incarnationInView : cluster -> int -> int -> int
val converged : cluster -> bool
val agreedStatus : cluster -> int -> status option
val statusToString : status -> string- Cluster model. Node ids are
0 .. nodes-1. Aclusteris an immutable value holding every node's local membership view, each node's actual liveness, the suspicion clocks, and the threaded PRNG state; every transition returns a freshcluster. - Logical time.
step c truns one protocol round "at" tickt. Suspicion timeouts are measured in these ticks: aSuspectentry older thansuspectTimeoutticks is aged out toDead. The caller chooses the tick sequence (typically0, 1, 2, …). - Seeded randomness only. Peer selection uses
Prng.SplitMix64seeded byseedand threaded through thecluster— there is no ambient randomness, so the trace is fully reproducible. - Merge ordering. Opinions about a node are ordered by incarnation, then by
severity (
Alive < Suspect < Dead);killdoes not touch the PRNG, andinjectmodels a (possibly false) suspicion so refutation can be exercised. - Convergence.
convergedis true when every actually-alive node agrees on every node's status;agreedStatus c jreturnsSOME swhen all live nodes assignjthe same statuss, elseNONE.
make test # MLton
make test-poly # Poly/ML
make all-tests # both
make example # build + run examples/demo.sml
make clean
Both compilers run the same strict-TDD suite (39 assertions):
- initial state — a fresh cluster is all-alive at incarnation 0 and
trivially converged; degenerate (0- and 1-node) clusters are handled, and
makerejects negative inputs; - healthy cluster — with no failures every probe succeeds, so after many rounds every view is still all-alive, no incarnation was ever bumped, and the cluster stays converged;
- failure propagation — a killed node is detected, aged from
SuspecttoDead, and epidemically markedDeadin every live node's view within a bounded number of rounds, while live nodes stayAlive; - incarnation / refutation — a falsely
injected suspicion of a live node is refuted (its incarnation bumps above 0, it returns toAliveeverywhere, and the higher incarnation propagates); and a staleAlive(incarnation 0) rumour never resurrects a node already agreedDead; - determinism — two runs from the same seed produce the EXACT same full membership trace (asserted by structural equality of every round's snapshot), while different seeds diverge; plus a pinned exact reference trace (statuses, membership lists, and incarnation numbers) that catches any change to the PRNG wiring, peer-selection arithmetic, or merge/age-out logic.
All assertions compare exact integers, statuses, incarnation numbers, and booleans — there are no reals in the protocol, so equality is exact (no epsilon).
This library depends on
sml-prng, whose sources are
vendored verbatim under lib/github.com/sjqtentacles/sml-prng/ (prng.sig,
prng.sml, and a prng.mlb; the dependency's own tests are not vendored).
src/gossip.mlb references that prng.mlb first, then gossip.sig/gossip.sml;
the Poly/ML use-chain loads the vendored signature and structure first, in
dependency order. sml.pkg records the dependency in its require block so
smlpkg sync can refresh it.
make example stands up a 6-node cluster, kills a node, and prints each node's
view as the failure is detected and disseminated (output is byte-identical under
MLton and Poly/ML):
=== sml-gossip demo ==========================================
Cluster of 6 nodes, seeded PRNG (all peer choices deterministic),
suspectTimeout = 2 logical ticks. Legend: A=Alive S=Suspect D=Dead,
trailing 'x' marks a node that has actually failed.
Initial views (everyone sees everyone Alive):
node 0 |A A A A A A
node 1 |A A A A A A
node 2 |A A A A A A
node 3 |A A A A A A
node 4 |A A A A A A
node 5 |A A A A A A
Killing node 4 at tick 3.
--- after tick 5 ------------------------------------
node 0 |A A A A S A
node 1 |A A A A S A
node 2 |A A A A S A
node 3 |A A A A S A
node 4x |A A A A A A
node 5 |A A A A S A
all live nodes agree node 4 is Suspect
--- after tick 12 ------------------------------------
node 0 |A A A A D A
node 1 |A A A A D A
node 2 |A A A A D A
node 3 |A A A A D A
node 4x |A A A A A A
node 5 |A A A A D A
all live nodes agree node 4 is Dead
--- after tick 25 ------------------------------------
node 0 |A A A A D A
node 1 |A A A A D A
node 2 |A A A A D A
node 3 |A A A A D A
node 4x |A A A A A A
node 5 |A A A A D A
all live nodes agree node 4 is Dead
converged = true
===============================================================
CI builds Poly/ML 5.9.1 from source rather than using the Ubuntu package
(Poly/ML 5.7.1), whose X86 code generator crashes (asGenReg raised while compiling) on some code. See .github/workflows/ci.yml.
MIT — see LICENSE.