A self-contained implementation of an Echo State Network (ESN), the canonical Reservoir Computing model (Jaeger, 2001). The ESN is a genuinely different training paradigm from everything else in this repository: there is no backpropagation-through-time. The recurrent core is a fixed, random, sparse matrix; only a single linear readout is ever trained.
For a 1-D driving signal x_t the reservoir of N units evolves as a leaky
integrator:
h_t = (1 - a) * h_{t-1} + a * tanh(W_in * x_t + W * h_{t-1})
a is the leak rate. W_in is a random input vector. W is a sparse random N x N matrix rescaled to a chosen spectral radius rho < 1. The rho < 1 condition is the standard working criterion for the echo-state property: the reservoir asymptotically forgets its initial state, so the same input drives it to the same state regardless of where it started. That is what makes a fixed random recurrence usable as a feature generator.
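The update rule and the forgetting behaviour can be illustrated with a minimal, dependency-free Python sketch. It is not the code in EchoStateNetwork.lpr: the reservoir size, density, seed and the helper names (make_reservoir, step, max_gap) are assumptions chosen for illustration. Instead of measuring the spectral radius, it caps the absolute row sums of W at 0.9, a cruder bound that also keeps the spectral radius at or below 0.9 and makes the update a contraction.

```python
import math
import random


def make_reservoir(n, density, row_sum_bound, rng):
    """Sparse random n x n matrix whose absolute row sums are <= row_sum_bound."""
    w = [[rng.uniform(-1.0, 1.0) if rng.random() < density else 0.0
          for _ in range(n)] for _ in range(n)]
    worst = max(sum(abs(v) for v in row) for row in w)
    scale = row_sum_bound / worst if worst > 0.0 else 1.0
    return [[v * scale for v in row] for row in w]


def step(h, x, w, w_in, leak):
    """One update: h_t = (1 - a) * h_{t-1} + a * tanh(W_in * x_t + W * h_{t-1})."""
    new_h = []
    for i, row in enumerate(w):
        pre = w_in[i] * x + sum(wij * hj for wij, hj in zip(row, h))
        new_h.append((1.0 - leak) * h[i] + leak * math.tanh(pre))
    return new_h


def max_gap(u, v):
    """Largest per-unit difference between two reservoir states."""
    return max(abs(a - b) for a, b in zip(u, v))


rng = random.Random(0)
N, LEAK = 20, 0.3  # illustrative values, not the example's N=100
W = make_reservoir(N, density=0.2, row_sum_bound=0.9, rng=rng)
W_in = [rng.uniform(-1.0, 1.0) for _ in range(N)]

# Two different starting states, driven by the same input signal.
h_a = [rng.uniform(-1.0, 1.0) for _ in range(N)]
h_b = [rng.uniform(-1.0, 1.0) for _ in range(N)]
gap_start = max_gap(h_a, h_b)
for t in range(200):
    x = math.sin(0.2 * t) + 0.3 * math.sin(0.31 * t)
    h_a = step(h_a, x, W, W_in, LEAK)
    h_b = step(h_b, x, W, W_in, LEAK)
gap_end = max_gap(h_a, h_b)

print(f"state gap before: {gap_start:.3e}  after 200 steps: {gap_end:.3e}")
```

With these settings each step shrinks the gap between the two trajectories by a factor of at most (1 - a) + a * 0.9 = 0.97, so the initial state is forgotten geometrically.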
We reuse the library's power-iteration helper TNNet.EstimateSpectralRadius to
measure the scale of W instead of running a full eigensolver. Its sibling
TNNet.EstimateSpectralNorm estimates the spectral norm (largest singular value
sigma_1) by alternating W*v and W^T*u steps. EstimateSpectralRadius instead
iterates only v := W*v / ‖W*v‖ (no transpose step) and returns the ratio
rho ≈ ‖W*v‖ at convergence, an estimate of the spectral radius |lambda|_max,
which is the quantity that actually governs the echo-state property. Because
rho <= sigma_1 holds for any square matrix, and the gap is usually wide for a
non-symmetric random W, scaling W := W * (rho_target / rho) targets the radius
directly, up to the accuracy of the power-iteration estimate. rho_target < 1
can therefore be set directly (here 0.9), with none of the under-scaling the
conservative sigma_1 upper bound would impose. The example also prints sigma_1
alongside rho to show rho <= sigma_1 on the raw W.
See the comments in EchoStateNetwork.lpr.
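The difference between the two estimators can be sketched in plain Python. This follows the description above, not the library's source: the function names, the iteration count and the 2 x 2 test matrix are assumptions made for illustration. Note that a plain power iteration of this kind assumes a real, simple dominant eigenvalue; when the dominant eigenvalues form a complex pair, ‖W*v‖ can oscillate instead of settling, and how the library helper treats that case is not covered here.

```python
import math


def matvec(w, v):
    return [sum(a * b for a, b in zip(row, v)) for row in w]


def norm(v):
    return math.sqrt(sum(x * x for x in v))


def estimate_spectral_radius(w, iters=200):
    """Power iteration v := W v / ||W v||; returns ||W v|| from the last step."""
    n = len(w)
    v = [1.0 / math.sqrt(n)] * n
    rho = 0.0
    for _ in range(iters):
        wv = matvec(w, v)
        rho = norm(wv)
        if rho == 0.0:
            return 0.0
        v = [x / rho for x in wv]
    return rho


def estimate_spectral_norm(w, iters=200):
    """Alternate u := W v and v := W^T u; returns sigma_1 ~= ||W v||."""
    wt = [list(col) for col in zip(*w)]
    n = len(wt)
    v = [1.0 / math.sqrt(n)] * n
    sigma = 0.0
    for _ in range(iters):
        u = matvec(w, v)
        sigma = norm(u)
        if sigma == 0.0:
            return 0.0
        z = matvec(wt, [x / sigma for x in u])
        nz = norm(z)
        if nz == 0.0:
            return 0.0
        v = [x / nz for x in z]
    return sigma


# Non-symmetric test matrix with eigenvalues 2 and 1 (so rho = 2).
W = [[2.0, 1.0], [0.0, 1.0]]
rho = estimate_spectral_radius(W)
sigma = estimate_spectral_norm(W)

# Rescale to a target radius of 0.9 and measure again.
RHO_TARGET = 0.9
W_scaled = [[v * RHO_TARGET / rho for v in row] for row in W]
rho_scaled = estimate_spectral_radius(W_scaled)

print(f"rho = {rho:.6f}  sigma_1 = {sigma:.6f}  rescaled rho = {rho_scaled:.6f}")
```

On this matrix sigma_1 is sqrt(3 + sqrt(5)), about 2.288, so scaling by sigma_1 instead of rho would leave the radius near 0.79 when 0.9 was intended.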
- Build W_in and a sparse W as plain Pascal arrays (a hand-rolled recurrence, never touched by a gradient).
- Rescale W to the target spectral radius using EstimateSpectralRadius.
- Run the reservoir forward (no gradient) over a training sequence and collect each state h_t into a TNNetVolumePair (input = h_t, target = x_{t+1}).
- Train only a TNNetFullConnectLinear(1) readout on those collected pairs. Two arms are trained and compared on the same reservoir, same collected states and same error metric:
  - an iterative, LR-sensitive SGD loop (a tiny L2-regularised linear fit);
  - the classic closed-form ridge (Tikhonov) solve: one shot, no LR.
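The iterative arm can be pictured as ordinary per-sample SGD on a linear model with a small L2 penalty. The Python sketch below is an assumption-laden stand-in, not the TNNet training loop: the function name sgd_linear_readout, the hyperparameters, and the toy data (a known linear map in place of the collected reservoir states) are all hypothetical.

```python
import random


def sgd_linear_readout(pairs, n_features, lr, l2, epochs, seed=0):
    """Per-sample SGD on 0.5 * (w.s + b - y)^2 + 0.5 * l2 * |w|^2."""
    rng = random.Random(seed)
    w = [0.0] * n_features
    b = 0.0
    order = list(range(len(pairs)))
    for _ in range(epochs):
        rng.shuffle(order)  # visit the pairs in a new order each epoch
        for k in order:
            s, y = pairs[k]
            err = sum(wi * si for wi, si in zip(w, s)) + b - y
            w = [wi - lr * (err * si + l2 * wi) for wi, si in zip(w, s)]
            b -= lr * err  # the bias is not penalised in this sketch
    return w, b


# Toy stand-in for the collected (h_t, x_{t+1}) pairs: a known linear map.
data_rng = random.Random(1)
TRUE_W, TRUE_B = [0.5, -0.3, 0.8], 0.1
pairs = []
for _ in range(100):
    s = [data_rng.uniform(-1.0, 1.0) for _ in TRUE_W]
    pairs.append((s, sum(wi * si for wi, si in zip(TRUE_W, s)) + TRUE_B))

w, b = sgd_linear_readout(pairs, n_features=3, lr=0.05, l2=1e-6, epochs=200)
print("learned weights:", [round(v, 4) for v in w], "bias:", round(b, 4))
```

The result depends on lr and epochs: too large a learning rate diverges and too small a one stops short of the optimum, which is the sensitivity the closed-form arm avoids.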
Because the readout is linear in the reservoir state, its optimal weights are
not something to chase with SGD — they are the one-shot ridge-regression
solution. Collect the state matrix S (rows = training timesteps, columns = the
N reservoir units plus a bias/intercept column of 1s) and the target
matrix Y (one column, x_{t+1}). The ridge readout minimises
||S·Wout − Y||² + lambda·||Wout||², whose normal equations are
(Sᵀ S + lambda·I) · Wout = Sᵀ Y -> A · Wout = B
The example forms A (size (N+1)×(N+1)) and B ((N+1)×1) and solves the
small dense system directly. neuralvolume.pas exposes no matrix
solve/inverse/Cholesky helper (verified by grepping for Solve/Inverse/
Cholesky/Gauss), so the example hand-rolls a GaussJordanSolve routine
(Gauss-Jordan elimination with partial pivoting), clearly commented and accurate
to floating-point rounding at this reservoir size. The solved Wout is then packed back into the same
Input(N)→FullConnectLinear(1) net shape as the SGD arm (reservoir weights into
the neuron's Weights, the intercept into its BiasWeight), so both arms are
evaluated by identical code. No learning rate, no epochs, no shuffling — it is a
single linear solve, deterministic and not LR-sensitive.
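As a rough illustration of the solve described above, here is a small Python sketch of the same idea: form A = SᵀS + lambda·I and B = SᵀY with a trailing bias column, then solve by Gauss-Jordan elimination with partial pivoting. It is not a transcription of GaussJordanSolve from the example; the function names and the four-row toy data set are assumptions.

```python
def gauss_jordan_solve(a, b):
    """Solve A x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    m = [list(a[i]) + [b[i]] for i in range(n)]  # augmented copy [A | b]
    for col in range(n):
        # Partial pivoting: pick the row with the largest entry in this column.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if m[pivot][col] == 0.0:
            raise ValueError("matrix is singular")
        m[col], m[pivot] = m[pivot], m[col]
        p = m[col][col]
        m[col] = [v / p for v in m[col]]
        # Eliminate this column from every other row (above and below).
        for r in range(n):
            if r != col and m[r][col] != 0.0:
                f = m[r][col]
                m[r] = [vr - f * vc for vr, vc in zip(m[r], m[col])]
    return [m[i][n] for i in range(n)]


def ridge_readout(states, targets, lam):
    """Return (weights, bias) solving (S^T S + lam * I) Wout = S^T Y."""
    rows = [list(s) + [1.0] for s in states]  # append the bias column of 1s
    d = len(rows[0])
    a = [[sum(r[i] * r[j] for r in rows) + (lam if i == j else 0.0)
          for j in range(d)] for i in range(d)]
    b = [sum(r[i] * y for r, y in zip(rows, targets)) for i in range(d)]
    wout = gauss_jordan_solve(a, b)
    return wout[:-1], wout[-1]


# Toy data generated by y = 2*s0 - 1*s1 + 0.5.
states = [[0.0, 1.0], [1.0, 0.0], [1.0, 1.0], [2.0, 1.0]]
targets = [2.0 * s[0] - 1.0 * s[1] + 0.5 for s in states]

w0, b0 = ridge_readout(states, targets, lam=0.0)
w1, b1 = ridge_readout(states, targets, lam=1.0)
print("lambda=0:", w0, b0)
print("lambda=1:", w1, b1)
```

Because lambda·I spans all N+1 columns, this sketch penalises the intercept together with the weights, as the normal equations above are written. At lambda = 0 the toy map is recovered up to rounding, and a positive lambda shrinks the whole solution vector.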
The ridge arm runs a small regularisation sweep lambda ∈ {0, 1e-6, 1e-4, 1e-2}
and prints the teacher-forced and free-run NRMSE for each. This shows the
regularisation behaviour directly: at lambda = 0 the readout nails the
teacher-forced one-step prediction, but an unregularised readout amplifies
small errors in the autonomous feedback loop, so its free-run NRMSE explodes; a modest
lambda damps the readout and stabilises the free-run. The headline picks the
lambda with the best free-run NRMSE (the metric that matters for autonomous
generation) and contrasts it with the SGD arm:
The closed-form ridge readout matches or beats the SGD readout in one shot, with no learning rate to tune.
One-step-ahead prediction of the deterministic series
sin(0.2 t) + 0.3 sin(0.31 t). After teacher-forced fitting the network
free-runs: its own prediction is fed back as the next input, and it
autonomously continues the waveform. An ASCII plot shows predicted (o) vs
true (.) over the free-run window.
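The difference between teacher forcing and free-running is only in what gets fed back as the next input. The Python sketch below shows the two loops with a deliberately simple stand-in model, not an ESN: its "state" is just the last four inputs and its "readout" is the exact 4th-order linear recurrence that any sum of two sinusoids satisfies. The names, warm-up length and horizon are assumptions for illustration.

```python
import math

W1, W2 = 0.2, 0.31


def series(t):
    return math.sin(W1 * t) + 0.3 * math.sin(W2 * t)


# Coefficients of x_{t+1} = C*x_t - D*x_{t-1} + C*x_{t-2} - x_{t-3}, from
# (z^2 - 2cos(W1) z + 1) * (z^2 - 2cos(W2) z + 1) = 0.
C = 2.0 * (math.cos(W1) + math.cos(W2))
D = 2.0 + 4.0 * math.cos(W1) * math.cos(W2)


def step(state, x):
    """Stand-in state update: remember the four most recent inputs."""
    return (x,) + state[:3]


def readout(state):
    """Stand-in linear readout: predict the next value from the state."""
    return C * state[0] - D * state[1] + C * state[2] - state[3]


def teacher_forced(state, inputs):
    """Feed the TRUE signal in; record the one-step prediction after each input."""
    preds = []
    for x in inputs:
        state = step(state, x)
        preds.append(readout(state))
    return preds, state


def free_run(state, x, n_steps):
    """Feed each PREDICTION back in as the next input."""
    preds = []
    for _ in range(n_steps):
        state = step(state, x)
        x = readout(state)
        preds.append(x)
    return preds


WARMUP, HORIZON = 50, 100
tf_preds, state = teacher_forced((0.0,) * 4, [series(t) for t in range(WARMUP)])
fr_preds = free_run(state, series(WARMUP), HORIZON)

# The first three teacher-forced predictions are skipped: the state is not full yet.
tf_err = max(abs(tf_preds[t] - series(t + 1)) for t in range(3, WARMUP))
fr_err = max(abs(fr_preds[k] - series(WARMUP + 1 + k)) for k in range(HORIZON))
print(f"teacher-forced max error: {tf_err:.2e}  free-run max error: {fr_err:.2e}")
```

With an exact predictor both errors stay at rounding level. A learned readout is only approximate, so in free-run its errors re-enter through the input and can compound, which is why the free-run NRMSE is the harder metric.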
- Teacher-forced one-step NRMSE well below the persistence baseline (the trivial x_{t+1} = x_t predictor).
- The good (rho < 1) reservoir free-runs accurately.
- Echo-state ablation: rebuilding the reservoir with rho driven above 1 makes the free-run prediction diverge (NRMSE explodes, often to NaN/Inf), which shows that rho < 1 is what makes the method work here.
- examples/DiagonalSSM trains its diagonal linear recurrence h_t = a*h_{t-1} + b*x_t end-to-end by gradient descent; the learned decay spectrum is the whole point. The ESN does the opposite: it freezes the recurrence at random init and trains only the readout.
- The causal-conv / attention baselines likewise learn their sequence mixer's parameters via backprop. The ESN's mixer (W, W_in) is never trained: its richness comes purely from being a large random nonlinear dynamical system held just inside the edge of stability (rho < 1).
This is a Lazarus project (.lpi + .lpr). It is pure CPU and dependency-free
and finishes in a few seconds on one thread.
Compile directly with FPC (units live in ../../neural):
fpc -O3 -Mobjfpc -Sc -Sh -Fu../../neural EchoStateNetwork.lpr
./EchoStateNetwork
(Or open EchoStateNetwork.lpi in Lazarus and run.)
Do not commit the compiled EchoStateNetwork binary or .o files — they
are covered by examples/.gitignore and the root .gitignore.
Echo State Network (Reservoir Computing, Jaeger 2001)
Reservoir N=100 leak=0.30 sparsity=0.10 target rho=0.90
Task: one-step prediction of sin(0.2 t) + 0.3 sin(0.31 t).
================================================================
[1] Building reservoir at rho=0.90 ...
measured raw W: spectral RADIUS rho = 1.7471 spectral NORM sigma_1 = 3.6757 (rho <= sigma_1)
-> W rescaled so its true spectral radius = 0.90
training the linear readout (600 epochs)...
teacher-forced one-step NRMSE = 0.0214
persistence baseline NRMSE = 0.2136
free-run (autonomous) NRMSE = 0.0793
Free-run waveform ( . = true o = predicted * = overlap ):
step |---------------------------------------------------|
0 | * |
3 | o. |
6 | o . |
...
----------------------------------------------------------------
[1b] Closed-form RIDGE readout Wout = (S^T S + lambda I)^-1 S^T Y
one-shot solve (no LR, no epochs); lambda regularisation sweep:
lambda teacher-NRMSE free-run-NRMSE
0.0E+000 0.0037 9.0620
1.0E-006 0.0038 5.7713
1.0E-004 0.0368 5.4790
1.0E-002 0.0064 0.0583
SGD-vs-ridge headline (same reservoir, same task):
SGD readout (600 epochs, LR=0.02): teacher 0.0214 free-run 0.0793
ridge readout (one-shot, lambda=1.0E-002): teacher 0.0064 free-run 0.0583
================================================================
[2] ABLATION - rebuilding reservoir at rho=1.80 (> 1, echo-state property BROKEN)
measured raw W: spectral RADIUS rho = 1.9087 spectral NORM sigma_1 = 3.9124
free-run (autonomous) NRMSE = Nan (expected to explode)
================================================================
Correctness checks:
PASS teacher-forced NRMSE 0.0214 < 0.5 x persistence 0.2136
PASS rho<1 free-run NRMSE 0.0793 < 0.5
PASS rho>1 free-run NRMSE Nan explodes vs rho<1 0.0793
================================================================
ALL CHECKS PASSED
(NRMSE = RMSE normalised by the target standard deviation; 1.0 means "no
better than predicting the mean".)
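For reference, the metric and the persistence baseline can be written in a few lines of Python. This is a sketch of the definition quoted above, not the example's code: it assumes the population standard deviation of the target (the source does not say which variant is used) and an arbitrary 1000-step window, so its persistence figure need not match the sample output exactly.

```python
import math


def nrmse(pred, target):
    """RMSE divided by the (population) standard deviation of the target."""
    n = len(target)
    mean = sum(target) / n
    var = sum((y - mean) ** 2 for y in target) / n
    mse = sum((p - y) ** 2 for p, y in zip(pred, target)) / n
    return math.sqrt(mse / var)


series = [math.sin(0.2 * t) + 0.3 * math.sin(0.31 * t) for t in range(1001)]
target = series[1:]        # x_{t+1}
persistence = series[:-1]  # trivial predictor: x_{t+1} = x_t
mean_pred = [sum(target) / len(target)] * len(target)

nrmse_mean = nrmse(mean_pred, target)
nrmse_persistence = nrmse(persistence, target)
print(f"predict-the-mean NRMSE = {nrmse_mean:.4f}")
print(f"persistence NRMSE      = {nrmse_persistence:.4f}")
```

Predicting the mean scores 1.0 by construction, so any useful predictor has to land well below that, and below the persistence figure as well.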