K=4 EM-warmup checkpoint (Pfam top-1000)

This release contains the trained TKF-DP checkpoint (the snapshot with the
best validation log-likelihood) and the full training log from a single SVI
run on the top-1000 Pfam families, with K_c=4, the multi-seed soft-EM
warm-start enabled, and no per-class-pair side potentials.

Result

  • Best val_LL = -298.41 at outer iter 55 (validation family PF00076)
  • Early-stopped at outer iter 85 (patience 6)
  • Total wall time: 16 751 s ≈ 4 h 39 min
  • Mean wall time per outer iter: 197 s
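
The timing figures above can be cross-checked with a few lines of arithmetic. This is a minimal sketch, assuming the per-outer mean is the total wall time divided by the 85 outer iterations completed before early stopping; the function name `summarize_timing` is hypothetical and not part of the released code.

```python
def summarize_timing(total_seconds: int, n_outer_iters: int):
    """Split a wall time into whole hours and minutes, plus the mean per iteration."""
    hours, remainder = divmod(total_seconds, 3600)
    minutes = remainder // 60  # leftover seconds are dropped, not rounded
    mean_seconds = total_seconds / n_outer_iters
    return hours, minutes, mean_seconds


# Values taken from the Result list; 85 iterations is an assumption (see above).
hours, minutes, mean_s = summarize_timing(16_751, 85)
print(f"{hours} h {minutes} min, {mean_s:.0f} s per outer iteration")
# prints "4 h 39 min, 197 s per outer iteration"
```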

Configuration

Flag                    Value
--processed-dir         data/pfam_processed_top1000
--n-families            1000
--K (K_c)               4
--em-warmup-iters       500
--em-warmup-seeds       50
--K-H-max               10 (= K_c (K_c+1) / 2)
--alpha-z               100
--val-families          PF00076
--use-side-potentials   off
Substitution model      LG08
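
The --K-H-max value follows from the formula given in the table. As a quick check, K_c (K_c+1) / 2 is also the number of unordered pairs, self-pairs included, that can be drawn from K_c items. Whether the model interprets K_H_max as a count of class pairs is an assumption here, not something the table states; the helper name is hypothetical.

```python
from itertools import combinations_with_replacement


def k_h_max(k_c: int) -> int:
    """Evaluate K_c (K_c + 1) / 2, the formula quoted in the configuration table."""
    return k_c * (k_c + 1) // 2


# Enumerate unordered pairs (i, j) with i <= j over K_c = 4 labels.
pairs = list(combinations_with_replacement(range(4), 2))
print(k_h_max(4), len(pairs))  # prints "10 10"
```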

Files in the tarball

Path                                      Size
_best_chkpt/state.npz                     2.6 MB
_best_chkpt/meta.json                     20 KB
_best_chkpt/trace.json                    10 KB
logs/exp2_v2_K4_top1000_tsb_emwarm.log    115 KB
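
To look inside the checkpoint without the training code, something like the following may be enough. It is a sketch only: the array names in state.npz and the layout of meta.json are not documented here, so the code lists whatever it finds rather than assuming any key. `summarize_checkpoint` is a hypothetical helper, and NumPy is assumed to be installed.

```python
import json
from pathlib import Path

import numpy as np


def summarize_checkpoint(chkpt_dir):
    """Return ({array name: (shape, dtype)}, parsed meta.json) for a checkpoint directory."""
    chkpt_dir = Path(chkpt_dir)
    # np.load on an .npz file returns a lazy archive; .files lists the stored array names.
    with np.load(chkpt_dir / "state.npz") as state:
        arrays = {name: (state[name].shape, str(state[name].dtype))
                  for name in state.files}
    meta = json.loads((chkpt_dir / "meta.json").read_text())
    return arrays, meta


# Only runs when the tarball has been extracted into the current directory.
if Path("_best_chkpt").is_dir():
    arrays, meta = summarize_checkpoint("_best_chkpt")
    for name, (shape, dtype) in sorted(arrays.items()):
        print(f"{name}: shape={shape}, dtype={dtype}")
    if isinstance(meta, dict):
        print("meta.json keys:", sorted(meta))
```

`np.load` defaults to `allow_pickle=False`, so this fails rather than unpickling if the archive holds object arrays.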

Reproduction

gh release download results/K4-emwarm-top1000-2026-05-09 -R evoldoers/tkfdp \
  -p 'K4-emwarm-top1000-2026-05-09.tar.gz'
tar xzf K4-emwarm-top1000-2026-05-09.tar.gz
# Then plug into the BAliBASE eval (recipe step h in REPRODUCTION_MANIFEST.md)
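
Before extracting, it can be worth confirming that the downloaded archive holds the four files listed under "Files in the tarball". The sketch below assumes nothing about a top-level directory inside the archive: it accepts a member if its path equals, or ends with, the expected relative path. `missing_members` is a hypothetical helper.

```python
import tarfile
from pathlib import Path

# Relative paths copied from the "Files in the tarball" table.
EXPECTED = (
    "_best_chkpt/state.npz",
    "_best_chkpt/meta.json",
    "_best_chkpt/trace.json",
    "logs/exp2_v2_K4_top1000_tsb_emwarm.log",
)


def missing_members(tar_path, expected=EXPECTED):
    """Return the expected paths that do not appear among the archive's regular files."""
    with tarfile.open(tar_path, "r:gz") as tar:
        names = [m.name for m in tar.getmembers() if m.isfile()]
    return [e for e in expected
            if not any(n == e or n.endswith("/" + e) for n in names)]


# Only runs when the tarball has been downloaded into the current directory.
tarball = Path("K4-emwarm-top1000-2026-05-09.tar.gz")
if tarball.is_file():
    missing = missing_members(tarball)
    print("all expected files present" if not missing else f"missing: {missing}")
```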

This checkpoint backs Figure 3 (Potts atom log-odds), Figure 4 (LAMA1
holmes-tile), Figure 5 (PF00053 MSA triangle), and the inf-PHMM-K=4 row of
Table 1.