Nanochat in MLX. Based on Awni Hannun - picochat-mlx; with modifications #393
ediestel started this conversation in Show and tell
Nanochat in MLX
Based on Awni Hannun's picochat, with modifications
https://github.com/ediestel/picochat-mlx
Created:
rustbpe/Cargo.toml — pyo3 0.27, cdylib+rlib, edition 2021, extension-module feature flag, LTO release profile
rustbpe/pyproject.toml — maturin build backend with extension-module feature
rustbpe/src/lib.rs — 578 lines with all 6 fixes applied:
Fixed:
A — count type │ pair_counts: AHashMap<Pair, i64>, widened delta as i64 * i64, .max(0) guard before as u64 cast
B — determinism │ Heap init sorts init_pairs by pair before pushing; count_pairs_parallel returns Vec with sort+dedup
C — lean heap nodes │ MergeJob { pair, count } only; pair_positions: AHashMap<Pair, Vec> side map
D — stale scan guard │ windows(2).any(...) pre-check before calling merge_pair
E — O(n log n) encode │ BinaryHeap<Reverse<(u32, usize)>> min-heap + prev/next linked arrays; u32::MAX sentinel marks deleted slots
F — comment │ Clarified tie-break comment in Ord impl
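To make fix E concrete, here is a minimal sketch of a heap-driven encode loop with prev/next linked arrays and a `u32::MAX` deleted-slot sentinel. It is not the code in `rustbpe/src/lib.rs`: the `encode` signature, the `merges` table shape (pair to `(rank, new_id)`), and the use of std `HashMap` in place of `AHashMap` are assumptions made to keep the example self-contained.

```rust
use std::cmp::Reverse;
use std::collections::{BinaryHeap, HashMap};

/// Sentinel for a slot whose token was absorbed by a merge.
/// Assumes no real token id equals u32::MAX.
const DELETED: u32 = u32::MAX;
/// Marks "no neighbour" in the prev/next arrays.
const NONE: usize = usize::MAX;

/// Hypothetical encode: `merges` maps a pair to (rank, new_id),
/// where a lower rank means the merge was learned earlier.
fn encode(ids: &[u32], merges: &HashMap<(u32, u32), (u32, u32)>) -> Vec<u32> {
    let n = ids.len();
    let mut tok: Vec<u32> = ids.to_vec();
    // Doubly linked list over slot indices, stored as two flat arrays.
    let mut prev: Vec<usize> = (0..n).map(|i| if i == 0 { NONE } else { i - 1 }).collect();
    let mut next: Vec<usize> = (0..n).map(|i| if i + 1 < n { i + 1 } else { NONE }).collect();

    // Min-heap of (rank, left slot); Reverse turns the std max-heap into a min-heap.
    let mut heap: BinaryHeap<Reverse<(u32, usize)>> = BinaryHeap::new();
    for i in 0..n.saturating_sub(1) {
        if let Some(&(rank, _)) = merges.get(&(tok[i], tok[i + 1])) {
            heap.push(Reverse((rank, i)));
        }
    }

    while let Some(Reverse((rank, i))) = heap.pop() {
        // Lazy deletion: skip entries whose slot is gone or whose pair has changed.
        if tok[i] == DELETED {
            continue;
        }
        let j = next[i];
        if j == NONE {
            continue;
        }
        let new_id = match merges.get(&(tok[i], tok[j])) {
            Some(&(r, id)) if r == rank => id,
            _ => continue,
        };

        // Merge slot j into slot i and unlink j from the list.
        tok[i] = new_id;
        tok[j] = DELETED;
        let k = next[j];
        next[i] = k;
        if k != NONE {
            prev[k] = i;
        }

        // Only the two pairs touching the merged slot can be new candidates.
        let p = prev[i];
        if p != NONE {
            if let Some(&(r, _)) = merges.get(&(tok[p], tok[i])) {
                heap.push(Reverse((r, p)));
            }
        }
        if k != NONE {
            if let Some(&(r, _)) = merges.get(&(tok[i], tok[k])) {
                heap.push(Reverse((r, i)));
            }
        }
    }

    // Drop deleted slots to get the final token sequence.
    tok.into_iter().filter(|&t| t != DELETED).collect()
}
```

Each merge pushes at most two new heap entries, so the heap holds O(n) entries in total and the loop runs in O(n log n). Stale entries are discarded on pop rather than removed eagerly, which is what keeps the heap nodes small.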
Modified:
pyproject.toml — added rustbpe = { path = "rustbpe/" } under [tool.uv.sources]
Fixed gaps:
C7 — run uv lock (lockfile is stale)
C3 — swap AHashMap → BTreeMap at materialization for deterministic word ordering
C4 — sort local_pos_updates before iterating
C1 — replace unsafe PyO3 0.23 iterator idioms with PyO3 0.27 safe API (try_iter() / unbind())
C2 — remove dead if current <= 0 { break; } or replace with a reachable equivalent
C8 — add rustbpe/.gitignore
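The determinism gaps C3 and C4 come down to never iterating a hash map or an unsorted buffer when the order affects the result. A minimal sketch of both ideas follows. The function names `materialize_sorted` and `sorted_updates`, and the element types, are hypothetical and not taken from `lib.rs`; std `HashMap` stands in for `AHashMap`.

```rust
use std::collections::{BTreeMap, HashMap};

/// C3 idea: move word counts into a BTreeMap at materialization time,
/// so iteration order is sorted by key and not by hash-map layout.
fn materialize_sorted(counts: HashMap<String, i64>) -> Vec<(String, i64)> {
    let ordered: BTreeMap<String, i64> = counts.into_iter().collect();
    ordered.into_iter().collect()
}

/// C4 idea: sort (pair, position) updates before iterating,
/// so they are applied in the same order on every run.
fn sorted_updates(mut updates: Vec<((u32, u32), usize)>) -> Vec<((u32, u32), usize)> {
    // Tuples compare lexicographically: by pair first, then by position.
    updates.sort_unstable();
    updates
}
```

Sorting costs an extra O(n log n) per batch, but it removes any dependence on hash seeds or thread scheduling, which is what makes two training runs produce the same merges.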
To build and test:
cd rustbpe
maturin develop --release --features extension-module
cd ..
python -c "import rustbpe; t = rustbpe.Tokenizer(); print('rustbpe ok:', t)"
uv lock