MemoryLab is the main operator-facing addition in this fork. Upstream autoresearch gives an agent a tight experiment loop around train.py and program.md. This fork adds a research-operations layer on top of that loop so experiments become legible, searchable, and resumable.
Upstream behavior:
- run short experiments against `train.py`
- inspect `val_bpb`
- keep or discard the resulting commit
This fork adds:
- a structured experiment ledger
- history-aware novelty checks against prior failures and successes
- policy modes for exploration, exploitation, and replication
- a run-centric champion/challenger registry
- decision packets with recommended next actions
- archived provenance for each run
- a human-readable morning report
The goal is not to replace the upstream research loop. The goal is to make the loop observable and reusable by a human operator or a future agent.
Primary CLI entrypoint: memorylab.py
Initializes the local MemoryLab store:
- `results/memorylab/experiments.jsonl`
- `results/memorylab/champion_challenger.json`
- `results/memorylab/reports/`
- `results/memorylab/runs/`
- compatibility `results.tsv`
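As a rough illustration of what initialization involves, the layout above could be created as follows. This is a minimal sketch, not the fork's actual code: the function name `init_store`, the placement of `results.tsv` at the repository root, and the empty `{}` registry are all assumptions.

```python
from pathlib import Path
from typing import List


def init_store(root: Path) -> List[Path]:
    """Create the MemoryLab layout under `root` without clobbering existing data."""
    base = root / "results" / "memorylab"

    # Directories for rendered reports and per-run archives.
    dirs = [base / "reports", base / "runs"]
    for directory in dirs:
        directory.mkdir(parents=True, exist_ok=True)

    # Append-only ledger plus the compatibility TSV.
    # Assumption: results.tsv lives at the repository root.
    files = [base / "experiments.jsonl", root / "results.tsv"]
    for file in files:
        file.touch(exist_ok=True)

    # Assumption: an empty JSON object is an acceptable initial registry.
    registry = base / "champion_challenger.json"
    if not registry.exists():
        registry.write_text("{}\n", encoding="utf-8")

    return dirs + files + [registry]
```

Because every step uses `exist_ok=True` or an existence check, calling the function a second time leaves an existing ledger untouched.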
History-aware novelty guard for a planned idea.
Inputs:
- `--description`
- `--tags`
- `--family`
- `--mode explore|exploit|replicate`
- `--threshold`
- `--limit`
- `--fail-on-similar`
Outputs:
- a novelty classification
- a policy decision
- the top matching prior runs
Novelty classifications:
- `novel`
- `known_success`
- `incremental_followup`
- `repeat_failure`
- `duplicate_run`
Policy decisions:
- `allow`
- `caution`
- `block`
Records a completed run, updates MemoryLab state, and optionally refreshes the morning report.
Inputs:
- run description and tags
- `status` in `keep|discard|crash`
- optional `family`, `hypothesis`, and `notes`
- `--summary` path from `AUTORESEARCH_SUMMARY_PATH`
- `--log` path for parsing/archiving
- novelty `--mode`
Side effects:
- appends a JSONL ledger row
- appends a compatibility row to `results.tsv`
- rebuilds the run-centric registry
- synthesizes a decision packet
- archives run artifacts under `results/memorylab/runs/<run_id>/`
Renders a human-readable morning report from the current ledger.
Outputs:
- `results/memorylab/reports/latest.md`
- `results/memorylab/reports/<timestamp>.md`
File: results/memorylab/experiments.jsonl
High-level fields:
- `run_id`
- `timestamp_utc`
- `branch`
- `commit`
- `parent_commit`
- `family`
- `status`
- `description`
- `hypothesis`
- `tags`
- `notes`
- `metrics`
- `error`
- `artifacts`
- `novelty_guard`
- `decision_packet`
Important semantics:
- `status=crash` stores nullable structured metrics when no measurement exists
- `results.tsv` remains compatibility-oriented even when the JSONL schema is richer
- records are run-centric, so multiple runs on the same commit remain distinct
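The ledger semantics can be sketched with a small append-and-read helper. This is an illustrative sketch only: the helper names `append_ledger_row` and `read_ledger` are hypothetical, and the real records carry more fields than the example uses.

```python
import json
from datetime import datetime, timezone
from pathlib import Path
from typing import List


def append_ledger_row(ledger: Path, record: dict) -> dict:
    """Append one run record as a single JSON line; earlier rows are never rewritten."""
    row = dict(record)
    # Stamp the row in UTC unless the caller already supplied a timestamp.
    row.setdefault("timestamp_utc", datetime.now(timezone.utc).isoformat())
    ledger.parent.mkdir(parents=True, exist_ok=True)
    with ledger.open("a", encoding="utf-8") as handle:
        handle.write(json.dumps(row, sort_keys=True) + "\n")
    return row


def read_ledger(ledger: Path) -> List[dict]:
    """Load every non-empty line of the ledger as a dict."""
    if not ledger.exists():
        return []
    with ledger.open(encoding="utf-8") as handle:
        return [json.loads(line) for line in handle if line.strip()]
```

Two runs on the same commit produce two rows with different `run_id` values, and a crash row can carry `"val_bpb": None`, which serializes to JSON `null`.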
Produced from the structured training sidecar or parsed from run.log.
Typical fields:
- `val_bpb`
- `training_seconds`
- `total_seconds`
- `peak_vram_mb`
- `mfu_percent`
- `total_tokens_M`
- `num_steps`
- `num_params_M`
- `depth`
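When no sidecar is available, the fallback of parsing `run.log` might look like the sketch below. It assumes the log contains `key: value` lines for the fields listed above. Both that line format and the function name `parse_metrics` are assumptions, not confirmed details of the fork.

```python
import re
from typing import Dict, Optional

METRIC_KEYS = (
    "val_bpb", "training_seconds", "total_seconds", "peak_vram_mb",
    "mfu_percent", "total_tokens_M", "num_steps", "num_params_M", "depth",
)

# Assumed format: "<key>: <number>" alone on a line, with optional padding.
_METRIC_LINE = re.compile(
    r"^\s*([A-Za-z_]+)\s*:\s*(-?\d+(?:\.\d+)?(?:[eE][-+]?\d+)?)\s*$"
)


def parse_metrics(log_text: str) -> Dict[str, Optional[float]]:
    """Extract known metrics from log text; metrics not found stay None."""
    metrics: Dict[str, Optional[float]] = {key: None for key in METRIC_KEYS}
    for line in log_text.splitlines():
        match = _METRIC_LINE.match(line)
        if match and match.group(1) in metrics:
            # A later occurrence overwrites an earlier one.
            metrics[match.group(1)] = float(match.group(2))
    return metrics
```

Starting every key at `None` keeps the result shape stable for crash runs, where no measurement exists.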
Only populated for crash runs.
Fields:
- `summary`
- `tail`
- `source`
Carries both the raw history classification and the mode-specific decision layer.
Fields:
- `classification`
- `probe_text`
- `threshold`
- `match_count`
- `counts`
- `top_matches`
- `mode`
- `effective_threshold`
- `policy`
Decision packets are the fork’s main synthesis layer.
Purpose:
- capture what happened
- describe what it means
- recommend what should happen next
Fields:
- `summary`
- `next_action`
- `priority`
- `rationale`
- `hypothesis_status`
- `run`
- `novelty`
- `comparison`
- `error`
Current next_action values:
- `promote`
- `branch_followup`
- `replicate`
- `retry`
- `abandon`
- `fix_and_retry`
- `investigate_crash`
Responsibilities:
- normalize free-text experiment ideas
- apply aliasing and concept extraction
- score similarity between a new idea and prior runs
- classify a proposal against history
- apply mode-specific novelty policy
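The first three responsibilities can be sketched with token-set (Jaccard) similarity. This stands in for whatever scoring the fork actually uses: the alias table, the function names, and the default threshold are all hypothetical.

```python
import re
from typing import List, Set, Tuple

# Hypothetical alias table: map shorthand onto one canonical concept.
ALIASES = {"lr": "learning_rate", "wd": "weight_decay"}


def normalize(text: str) -> Set[str]:
    """Lowercase, tokenize, and fold aliases into canonical tokens."""
    tokens = re.findall(r"[a-z0-9_]+", text.lower())
    return {ALIASES.get(token, token) for token in tokens}


def similarity(a: str, b: str) -> float:
    """Jaccard overlap of the normalized token sets, in [0, 1]."""
    tokens_a, tokens_b = normalize(a), normalize(b)
    if not tokens_a or not tokens_b:
        return 0.0
    return len(tokens_a & tokens_b) / len(tokens_a | tokens_b)


def top_matches(idea: str, prior_runs: List[dict],
                threshold: float = 0.5, limit: int = 3) -> List[Tuple[float, dict]]:
    """Return up to `limit` prior runs scoring at or above `threshold`, best first."""
    scored = [(similarity(idea, run["description"]), run) for run in prior_runs]
    hits = [pair for pair in scored if pair[0] >= threshold]
    hits.sort(key=lambda pair: pair[0], reverse=True)
    return hits[:limit]
```

With this alias table, "raise lr" and "raise learning_rate" normalize to the same token set, so a rephrased idea still matches its earlier run.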
Responsibilities:
- choose the current best run
- derive lineages from commit ancestry
- build challenger and lineage summaries
- cluster repeated failures
- render the morning report
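Choosing the current best run can be illustrated with a short selection function. The sketch assumes that only `keep` runs with a measured `val_bpb` are eligible and that a lower `val_bpb` is better. The eligibility rule and the name `choose_champion` are assumptions.

```python
from typing import List, Optional


def choose_champion(records: List[dict]) -> Optional[dict]:
    """Pick the kept run with the lowest measured val_bpb, or None if there is none."""
    candidates = [
        record for record in records
        # Assumption: discarded and crashed runs never become champion.
        if record.get("status") == "keep"
        and (record.get("metrics") or {}).get("val_bpb") is not None
    ]
    if not candidates:
        return None
    return min(candidates, key=lambda record: record["metrics"]["val_bpb"])
```

Because selection works on records rather than commits, two runs on the same commit compete as separate candidates.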
Responsibilities:
- compare the current run to prior and current champions
- interpret novelty and crash state
- emit a concise next-action recommendation
- render a skim-friendly markdown packet
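To make the shape of such a recommendation concrete, here is a deliberately simplified rule table. The rules are invented for illustration and are not the fork's decision logic. Only the returned strings come from the documented `next_action` values, and the function name and parameters are hypothetical.

```python
from typing import Optional


def recommend_next_action(status: str,
                          val_bpb: Optional[float],
                          champion_bpb: Optional[float],
                          classification: str) -> str:
    """Map a run outcome onto one next_action value (illustrative rules only)."""
    if status == "crash":
        return "investigate_crash"
    if val_bpb is None:
        # No measurement was recorded, so the run tells us nothing yet.
        return "retry"
    if champion_bpb is None or val_bpb < champion_bpb:
        # Assumption: beating the champion on val_bpb is enough to promote.
        return "promote"
    if classification == "repeat_failure":
        return "abandon"
    return "branch_followup"
```

A real packet would also carry the `rationale`, `priority`, and `comparison` fields. This sketch covers only the action choice.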
File producer: train.py
Environment variable:
AUTORESEARCH_SUMMARY_PATH
If set, train.py writes a machine-readable JSON summary of the final run metrics to that path. This summary is the bridge between the upstream training loop and the MemoryLab workflow: the fork can log experiments from structured data instead of relying only on scraped, human-oriented console output.
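The sidecar pattern can be sketched as a small helper that does nothing unless the variable is set. The function name `write_summary`, the atomic temp-file rename, and the JSON layout are assumptions. Only the environment variable name comes from the fork.

```python
import json
import os
from pathlib import Path
from typing import Mapping, Optional


def write_summary(metrics: dict,
                  env: Mapping[str, str] = os.environ) -> Optional[Path]:
    """Write final metrics as JSON if AUTORESEARCH_SUMMARY_PATH is set."""
    target = env.get("AUTORESEARCH_SUMMARY_PATH")
    if not target:
        # Variable unset or empty: behave exactly like the upstream loop.
        return None
    path = Path(target)
    path.parent.mkdir(parents=True, exist_ok=True)
    # Write to a temp file, then rename, so readers never see a partial file.
    tmp = path.with_suffix(path.suffix + ".tmp")
    tmp.write_text(json.dumps(metrics, indent=2, sort_keys=True) + "\n",
                   encoding="utf-8")
    os.replace(tmp, path)
    return path
```

Passing `env` explicitly makes the helper easy to exercise without touching the process environment.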
- Run `python memorylab.py check` before editing `train.py`.
- Run training with `AUTORESEARCH_SUMMARY_PATH=...`.
- Log the run with `python memorylab.py log`.
- Read the generated decision packet.
- Use the morning report to review overnight progress.
This is the core value proposition of the fork: the repo no longer just runs experiments; it keeps enough structured memory to guide the next experiment.