feat(phase-19): track D auto research lessons 54-57 by rohitg00 · Pull Request #203 · rohitg00/ai-engineering-from-scratch

rohitg00 · 2026-05-26T18:31:03Z

Summary

Track D auto-research loop deep capstones (Phase 19, lessons 54-57). Four self-contained Python lessons that compose into a self-running research loop.

54-paper-writer: LaTeX skeleton generator with figure injection, deterministic prose, and a manifest contract. Four validation gates fire before any file is written.
55-critic-loop: Multi-turn revision over five fixed dimensions (clarity, novelty, evidence, methodology, related-work). Three stop conditions in priority order: target met, plateau detected, budget exhausted.
56-iteration-scheduler: Async UCB1 scheduler with parallel slots, branch pruning, and fan-out triggers. Stdlib + numpy. Two budgets (max experiments, wall clock) enforced as hard limits.
57-end-to-end-research-demo: Composes 54, 55, 56 through importlib path injection. Three seed hypotheses, six experiments, one best branch, one paper draft. Deterministic under fixed seed.

Stats

4 lessons, 16 files added
48 unit tests, all passing
4 demos, all exit 0
4 docs, 932-1056 words each, mermaid diagrams only
4 quizzes, 6 questions each

Test plan

All four python3 code/main.py demos exit 0
python3 -m unittest discover -s code/tests passes per lesson (15 + 11 + 12 + 10 = 48)
Docs use mermaid only, all fences tagged, no em-dashes, no AI vocab
Atomic per-lesson commits
Stdlib + numpy only
No changes to site/, root README, or catalog.json

coderabbitai · 2026-05-26T18:31:16Z

📝 Walkthrough

Walkthrough

This PR adds four complete capstone lessons (54–57) to Phase 19, building an end-to-end research automation pipeline. It includes a LaTeX paper generator, an iterative refinement loop with heuristic scoring, a parallel hypothesis scheduler with UCB branch selection, and an orchestrator composing all three. The catalog is updated to reflect new lesson and file counts, and prior capstone projects are refactored with expanded module structures.

Changes

Catalog and Prior Capstone Refactoring

Layer / File(s)	Summary
Metadata and module structure updates `catalog.json`	Totals updated (lessons 435→439, code_files 487→515), phase 19 lesson_count increased (17→21), and capstone projects 9–11 refactored from single-file entrypoints to expanded `ts/src` modules with corresponding test files. Four new capstone lesson entries (54–57) appended with `main.py` and Python test files.

Lesson 54: Paper Writer

Layer / File(s)	Summary
LaTeX paper data models and validation `phases/19-capstone-projects/54-paper-writer/code/main.py`	Defines `BibEntry`, `Figure`, `Section`, `Paper` dataclasses; implements `_validate()` enforcing non-empty title/abstract, unique figure IDs, citation key existence, and figure reference resolution. Provides LaTeX character escaping and `render_latex()` and `render_bibtex()` functions for document and bibliography generation.
Manifest ingestion and paper writing `phases/19-capstone-projects/54-paper-writer/code/main.py`	Implements `read_experiment_manifest()` converting artifact dictionaries to stable `Figure` records with normalized paths; adds `MockProseGenerator` for deterministic prose from titles/outlines; provides `PaperWriter` class with `fill_prose()` and `write()` methods outputting `paper.tex`, `references.bib`, and `manifest.json`.
Tests, documentation, and quiz `phases/19-capstone-projects/54-paper-writer/code/tests/test_paper_writer.py`, `docs/en.md`, `quiz.json`	Unit tests validating LaTeX skeleton, escaping, validation errors, manifest deduplication, and file I/O. Comprehensive documentation of contract, validation gates, and outputs. Quiz covering structure-first validation and expected behavior.

Lesson 55: Critic Loop

Layer / File(s)	Summary
Loop data structures and protocols `phases/19-capstone-projects/55-critic-loop/code/main.py`	Defines `DIMENSIONS` constant (five scoring axes), `MiniSection` and `MiniPaper` models, `Suggestion` and `Critique` structures with serialization, `Critic` and `Reviser` call-signature protocols, and `LoopTrace` and `LoopResult` containers.
CriticLoop convergence engine `phases/19-capstone-projects/55-critic-loop/code/main.py`	Implements `CriticLoop` constructor with configuration defaults and `run()` method driving critic/reviser iteration with target-met convergence, plateau detection over sliding windows, per-round trace recording, and budget exhaustion.
Deterministic scorer, critic, and reviser `phases/19-capstone-projects/55-critic-loop/code/main.py`	Implements `deterministic_score()` computing dimension scores from section metadata, `deterministic_critic()` emitting suggestions for low-scoring dimensions, `deterministic_reviser()` applying in-place mutations (expand sections, bump originality, add references), and `demo()` runner.
Tests, documentation, and quiz `phases/19-capstone-projects/55-critic-loop/code/tests/test_critic_loop.py`, `docs/en.md`, `quiz.json`	Tests validate scoring bounds, suggestion coverage, monotone improvement, convergence types (target/plateau/budget), and reviser edits. Documentation specifies dimensions, convergence rules, and trace semantics. Quiz covers stop-condition priority and terminal verdicts.

Lesson 56: Iteration Scheduler

Layer / File(s)	Summary
Data models and UCB scoring `phases/19-capstone-projects/56-iteration-scheduler/code/main.py`	Defines `Hypothesis`, `Result`, `BranchStats` (with aggregates and flags), `TraceEvent`, and `SchedulerReport` dataclasses with serialization. Implements `ucb_score()` with special handling for zero-run branches.
IterationScheduler async orchestration `phases/19-capstone-projects/56-iteration-scheduler/code/main.py`	Implements `IterationScheduler` with asyncio slot dispatch, UCB-based hypothesis selection, per-branch stats aggregation, paper-trigger thresholding (once per branch), pruning of low-yield branches, optional fan-out expansion, and dual budget enforcement (max experiments and wall-time deadline).
Deterministic runner and demo `phases/19-capstone-projects/56-iteration-scheduler/code/main.py`	Implements `make_deterministic_runner()` computing bounded Gaussian rewards, `deterministic_expander()` generating follow-up hypotheses, and `demo()` orchestrating scheduler with seeded configuration.
Tests, documentation, and quiz `phases/19-capstone-projects/56-iteration-scheduler/code/tests/test_scheduler.py`, `docs/en.md`, `quiz.json`	Tests validate UCB edge cases, scheduling correctness, paper-trigger semantics, pruning, and budget termination. Documentation covers queue/slot/result architecture, UCB formula, asyncio concurrency, and trace semantics. Quiz addresses UCB handling, paper triggers, and budget enforcement.

Lesson 57: End-to-End Research Demo

Layer / File(s)	Summary
Cross-module imports and exception types `phases/19-capstone-projects/57-end-to-end-research-demo/code/main.py`	Sets up sys.path and dynamically loads paper-writer, critic-loop, and iteration-scheduler modules; defines `NoTriggerError` and `BestResultError` exception types.
Hypothesis seeding and best-branch selection `phases/19-capstone-projects/57-end-to-end-research-demo/code/main.py`	Implements `make_seed_hypotheses()` generating three seed branches (alpha, beta, gamma); `pick_best_branch()` selecting highest-mean triggered branch with alphabetical tie-breaking and typed error handling.
Paper model conversion and synthesis `phases/19-capstone-projects/57-end-to-end-research-demo/code/main.py`	Implements `build_mini_paper()` constructing a `MiniPaper` from branch/reward with derived originality tag and targets; `mini_to_full_paper()` promoting `MiniPaper` to `Paper` by aggregating citations, synthesizing bibliography, adding results figures, and mapping section structures.
Full pipeline orchestration `phases/19-capstone-projects/57-end-to-end-research-demo/code/main.py`	Implements `run_demo()` async function orchestrating five stages (seed → scheduler → picker → critic → paper writer), returning `DemoReport` with all upstream outputs. Provides synchronous `demo()` wrapper and `__main__` JSON output.
Tests, documentation, and quiz `phases/19-capstone-projects/57-end-to-end-research-demo/code/tests/test_e2e.py`, `docs/en.md`, `quiz.json`	Tests validate demo completion, paper file emission, determinism across seeds, error modes, validation propagation, seed shape, stop-reason membership, and scenario invariants. Documentation specifies five-stage composition, failure modes, and wiring. Quiz covers composition strategy and determinism contracts.

Sequence Diagram(s)

sequenceDiagram
  participant User
  participant E2E as End-to-End<br/>Demo
  participant Sched as Iteration<br/>Scheduler
  participant Critic as Critic<br/>Loop
  participant Writer as Paper<br/>Writer
  User->>E2E: run_demo(out_dir, seed=11)
  E2E->>E2E: make_seed_hypotheses()
  E2E->>Sched: IterationScheduler.run(seeds)
  Sched->>Sched: UCB branch selection<br/>parallel experiments
  Sched-->>E2E: SchedulerReport(best_branch, triggers)
  E2E->>E2E: pick_best_branch(report)
  E2E->>E2E: build_mini_paper(branch, reward)
  E2E->>Critic: CriticLoop.run(mini_paper)
  Critic->>Critic: deterministic_score()<br/>critic/reviser rounds
  Critic-->>E2E: LoopResult(converged_paper)
  E2E->>E2E: mini_to_full_paper(converged, branch)
  E2E->>Writer: PaperWriter.fill_prose(paper)
  Writer->>Writer: MockProseGenerator.generate()
  E2E->>Writer: writer.write(paper, out_dir)
  Writer->>Writer: render_latex(), render_bibtex()
  Writer-->>E2E: manifest.json
  E2E-->>User: DemoReport(scheduler+critic+paper)

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 9.16% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (4 passed)

Check name	Status	Explanation
Title check	✅ Passed	The title 'feat(phase-19): track D auto research lessons 54-57' clearly and accurately describes the main change: adding four new capstone lessons (54-57) for Phase 19 in the Track D auto-research curriculum. It is specific, concise, and directly summarizes the primary content of the changeset.
Description check	✅ Passed	The description is comprehensive and directly related to the changeset. It provides a clear summary of the four lessons being added (paper-writer, critic-loop, iteration-scheduler, end-to-end-research-demo), their individual purposes, file counts, test coverage, and documentation details. It also lists completed test plans and constraints, making it highly informative about the PR's scope and intent.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

📝 Generate docstrings

Create stacked PR
Commit on current branch

🧪 Generate unit tests (beta)

Create PR with unit tests
Commit unit tests in branch feat/phase-19-track-d2

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

coderabbitai

Actionable comments posted: 9

🧹 Nitpick comments (3)

phases/19-capstone-projects/54-paper-writer/code/tests/test_paper_writer.py (1)

70-99: ⚡ Quick win

Add a regression test for duplicate bibliography keys.

Given the validation gates, a dedicated test for repeated BibEntry.key values will prevent regressions in BibTeX integrity.

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@phases/19-capstone-projects/54-paper-writer/code/tests/test_paper_writer.py`
around lines 70 - 99, Add a new unit test in TestValidation that constructs a
Paper with two BibEntry objects sharing the same key (e.g., create a Paper via
_paper_min(), append two BibEntry(key="dup", ...)), then call render_latex and
assert it raises PaperValidationError (use assertRaisesRegex to match a message
like "duplicate bibliography key"); reference Paper, BibEntry, render_latex, and
PaperValidationError so the test mirrors the existing duplicate-figure and
unknown-citation tests to prevent regressions.

phases/19-capstone-projects/57-end-to-end-research-demo/code/main.py (1)

183-186: ⚡ Quick win

Avoid mutating MiniPaper during conversion.

On Line 185, sec.cites.append(...) mutates the input mini object in place. This converter should stay side-effect free and only shape the returned Paper.

♻️ Proposed fix

 def mini_to_full_paper(mini: MiniPaper, branch: str) -> Paper:
@@
-    if not bib:
-        bib = [BibEntry(
-            key=f"{branch}-baseline", entry_type="article",
-            fields={"title": "Baseline", "author": "Synthesised", "year": "2026"},
-        )]
-        for sec in mini.sections:
-            if sec.id == "intro":
-                sec.cites.append(f"{branch}-baseline")
-                break
+    baseline_key: str | None = None
+    if not bib:
+        baseline_key = f"{branch}-baseline"
+        bib = [BibEntry(
+            key=baseline_key, entry_type="article",
+            fields={"title": "Baseline", "author": "Synthesised", "year": "2026"},
+        )]
@@
     sections = []
     for s in mini.sections:
         figure_refs = [fig.id] if s.id == "results" else []
+        cites = list(s.cites)
+        if baseline_key is not None and s.id == "intro" and baseline_key not in cites:
+            cites.append(baseline_key)
         sections.append(Section(
             id=s.id, title=s.title, body=s.body,
-            cites=list(s.cites), figure_refs=figure_refs,
+            cites=cites, figure_refs=figure_refs,
         ))

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@phases/19-capstone-projects/57-end-to-end-research-demo/code/main.py` around
lines 183 - 186, The loop mutates the input MiniPaper by calling
sec.cites.append(...) on mini.sections; instead leave mini untouched and build
new Section objects (or shallow copies) for the output Paper, creating each
section's cites list as a copy of sec.cites plus the extra f"{branch}-baseline"
where needed; specifically, when iterating over mini.sections, do not call
sec.cites.append — construct a new section value (copying fields and using
sec.cites + [f"{branch}-baseline"] for the intro case) and add that to the
Paper.sections result so MiniPaper and mini remain side-effect free.

phases/19-capstone-projects/56-iteration-scheduler/code/tests/test_scheduler.py (1)

169-202: ⚡ Quick win

Add a regression test for trigger evaluation during drain.

Please add a deadline/max-budget scenario where a branch crosses paper_threshold only in drained results, then assert the trigger is still emitted.

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In
`@phases/19-capstone-projects/56-iteration-scheduler/code/tests/test_scheduler.py`
around lines 169 - 202, Add a new test (e.g., test_trigger_emitted_during_drain)
that constructs a seed with multiple Hypothesis objects and a slow runner that
delays long enough so some results finish only during the scheduler's drain
phase; configure IterationScheduler with a very short max_seconds (or a
max_experiments that causes early stop) and a paper_threshold such that the
target branch crosses the threshold only by results that complete during drain;
run sched.run(seed) and then assert the scheduler report contains the trigger
for that branch (for example by checking report.triggers or by asserting
any(t.branch == "b0" for t in report.triggers) — adapt the exact assertion to
the report field used in the codebase).

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@phases/19-capstone-projects/54-paper-writer/code/main.py`:
- Around line 88-94: The code currently only checks that cited keys exist but
does not detect duplicate BibEntry.key values in paper.bibliography; add a
validation before the cite-check loop that scans paper.bibliography (e.g., using
bib_keys and a seen set or a Counter) to detect any duplicate BibEntry.key
values and raise PaperValidationError listing the duplicate key(s) (include the
offending key(s) in the error message) so duplicate BibTeX entries cannot be
emitted; keep this check alongside the existing cite validation that uses
bib_keys, and reference the same symbols (paper.bibliography, BibEntry.key,
bib_keys, PaperValidationError, paper.sections, sec.cites).

In `@phases/19-capstone-projects/55-critic-loop/code/main.py`:
- Around line 339-345: The branch handling sug.edit ==
"add-related-work-section" can no-op when a "Related Work" section exists but
has an empty body; change it to check for an existing section (search
paper.sections for s.title.lower() == "related work") and if found and s.body is
empty or whitespace, set s.body = "We survey adjacent work. " + ("x" * 200);
otherwise (no existing section) append a new MiniSection(id="related-work",
title="Related Work", body=...). This ensures the suggestion actually fills an
empty section instead of repeating without improving score.

In `@phases/19-capstone-projects/55-critic-loop/docs/en.md`:
- Around line 76-84: The docs incorrectly state that clarity is driven by
section count; update the description to match deterministic_score() by stating
that clarity is computed from average section body length (clarity grows when
the average/mean section body length crosses the configured threshold), and keep
the other signal descriptions (novelty, evidence, methodology, related-work)
as-is; reference the deterministic_score() function and the clarity signal when
making this edit so the prose matches the implementation.
- Around line 57-59: The doc's convergence flow incorrectly states the stop rule
as "mean score >= target" while the implementation in CriticLoop._target_met
requires every one of the five dimensions to meet or exceed target_score; update
the diagram/text (the A decision node and the related paragraphs) to state that
convergence requires all five dimensions to be >= target_score (or explicitly
list the five dimensions and the "all" condition), and make the same correction
for the other references around the same section (the lines referenced after
66–69) so the documentation matches CriticLoop._target_met.

In `@phases/19-capstone-projects/55-critic-loop/quiz.json`:
- Around line 43-52: The quiz option and explanation are incorrect:
Reviser.__call__ accepts (paper, suggestions), so update the option currently
reading "Only the list of Suggestion records" to something like "The paper and
the list of Suggestion records" and change the explanation to state that the
reviser receives both the paper and the Suggestion records (each Suggestion has
dimension, target section, and an edit instruction) and does not receive scores;
ensure the correct answer index points to the updated option and that the
wording references Reviser.__call__.

In `@phases/19-capstone-projects/56-iteration-scheduler/code/main.py`:
- Around line 209-211: The code indexes stats with result.branch without
guarding against unknown branches, causing crashes if a runner returns an unseen
branch; update both locations where you do bs = stats[result.branch] (the block
updating bs.runs and bs.reward_sum and the similar block at the later
occurrence) to first check if result.branch exists in stats and if not insert a
new BranchStats instance (or use stats.setdefault(result.branch,
BranchStats(...))) and then update bs.runs and bs.reward_sum; reference the
symbols stats, result.branch, bs.runs, bs.reward_sum, and BranchStats when
applying the fix.
- Around line 261-275: Loop draining in_flight tasks updates stats but never
runs paper.trigger, so valid triggers are lost; after successfully awaiting a
task (in the for task in list(in_flight.keys()) block) and after updating
bs.runs and bs.reward_sum, call the trigger evaluation on the associated paper
(e.g., paper.trigger(...) or the existing trigger-evaluation helper used
elsewhere) with the branch and updated stats/reward so trigger conditions are
evaluated and any resulting TraceEvent(s) appended to trace before removing the
task from in_flight; ensure to preserve the current TraceEvent("result.drain")
logic and handle exceptions exactly as before.

In `@phases/19-capstone-projects/56-iteration-scheduler/docs/en.md`:
- Around line 70-71: Update the docs to match the implementation: change wording
that currently states the scheduler posts Results to an asyncio.Queue and uses
random runner delay; instead explain that the scheduler creates tasks with
asyncio.create_task to run the async experiment runner (which yields a Result),
waits on the set of in-flight tasks using asyncio.wait(...,
return_when=asyncio.FIRST_COMPLETED), and then applies a fixed sleep delay
between loop iterations; reference the experiment runner, Result,
asyncio.create_task, the in-flight tasks set, and asyncio.FIRST_COMPLETED so
readers understand the actual control flow and timing.

In `@phases/19-capstone-projects/57-end-to-end-research-demo/docs/en.md`:
- Around line 80-81: Update the stale documentation to match the current tests
and API: remove or rewrite the claim that a test in test_e2e.py asserts an
empty-seed short-circuit to a typed exception (fix the sentence around the
original lines 80-81 to reflect that no such test exists), change the “one
figure per high-yield branch” wording to state that mini_to_full_paper() creates
one figure for the selected branch (replace the text around line 94), and
replace any reference to build_paper_from_result with the actual exposed
functions build_mini_paper and mini_to_full_paper (update references in the
block around lines 98 and the 94-100 range) so the docs accurately reflect
current function names and behavior.

---

Nitpick comments:
In `@phases/19-capstone-projects/54-paper-writer/code/tests/test_paper_writer.py`:
- Around line 70-99: Add a new unit test in TestValidation that constructs a
Paper with two BibEntry objects sharing the same key (e.g., create a Paper via
_paper_min(), append two BibEntry(key="dup", ...)), then call render_latex and
assert it raises PaperValidationError (use assertRaisesRegex to match a message
like "duplicate bibliography key"); reference Paper, BibEntry, render_latex, and
PaperValidationError so the test mirrors the existing duplicate-figure and
unknown-citation tests to prevent regressions.

In
`@phases/19-capstone-projects/56-iteration-scheduler/code/tests/test_scheduler.py`:
- Around line 169-202: Add a new test (e.g., test_trigger_emitted_during_drain)
that constructs a seed with multiple Hypothesis objects and a slow runner that
delays long enough so some results finish only during the scheduler's drain
phase; configure IterationScheduler with a very short max_seconds (or a
max_experiments that causes early stop) and a paper_threshold such that the
target branch crosses the threshold only by results that complete during drain;
run sched.run(seed) and then assert the scheduler report contains the trigger
for that branch (for example by checking report.triggers or by asserting
any(t.branch == "b0" for t in report.triggers) — adapt the exact assertion to
the report field used in the codebase).

In `@phases/19-capstone-projects/57-end-to-end-research-demo/code/main.py`:
- Around line 183-186: The loop mutates the input MiniPaper by calling
sec.cites.append(...) on mini.sections; instead leave mini untouched and build
new Section objects (or shallow copies) for the output Paper, creating each
section's cites list as a copy of sec.cites plus the extra f"{branch}-baseline"
where needed; specifically, when iterating over mini.sections, do not call
sec.cites.append — construct a new section value (copying fields and using
sec.cites + [f"{branch}-baseline"] for the intro case) and add that to the
Paper.sections result so MiniPaper and mini remain side-effect free.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 1df3182b-0e2c-486c-aab4-e69bbca980b1

📥 Commits

Reviewing files that changed from the base of the PR and between c1374e1 and 49c52d4.

📒 Files selected for processing (17)

catalog.json
phases/19-capstone-projects/54-paper-writer/code/main.py
phases/19-capstone-projects/54-paper-writer/code/tests/test_paper_writer.py
phases/19-capstone-projects/54-paper-writer/docs/en.md
phases/19-capstone-projects/54-paper-writer/quiz.json
phases/19-capstone-projects/55-critic-loop/code/main.py
phases/19-capstone-projects/55-critic-loop/code/tests/test_critic_loop.py
phases/19-capstone-projects/55-critic-loop/docs/en.md
phases/19-capstone-projects/55-critic-loop/quiz.json
phases/19-capstone-projects/56-iteration-scheduler/code/main.py
phases/19-capstone-projects/56-iteration-scheduler/code/tests/test_scheduler.py
phases/19-capstone-projects/56-iteration-scheduler/docs/en.md
phases/19-capstone-projects/56-iteration-scheduler/quiz.json
phases/19-capstone-projects/57-end-to-end-research-demo/code/main.py
phases/19-capstone-projects/57-end-to-end-research-demo/code/tests/test_e2e.py
phases/19-capstone-projects/57-end-to-end-research-demo/docs/en.md
phases/19-capstone-projects/57-end-to-end-research-demo/quiz.json

Reject duplicate bibliography keys during validation so render_bibtex cannot emit duplicate BibTeX entries that would break downstream compilation.

- add-related-work-section now fills an empty existing section instead of no-op'ing, so convergence cannot stall when the title already exists - Convergence doc + diagram say all five dimensions must hit target_score, matching CriticLoop._target_met - Deterministic-scoring prose says clarity tracks average section body length (matching deterministic_score), not section count - Quiz: reviser receives (paper, suggestions), matching Reviser.__call__

- Result and drain paths now use stats.setdefault, so a runner that returns an unseen branch records the result instead of crashing - Drain phase evaluates paper.trigger after updating stats, so valid triggers fired post-budget/deadline are no longer dropped - Concurrency doc reflects the FIRST_COMPLETED in-flight wait + fixed delay_ms, matching the runner implementation

Align stale docs with current code/tests: - Stage-failure paragraph now references the real picker tests (NoTriggerError / BestResultError) instead of a non-existent empty-seed short-circuit test - mini_to_full_paper attaches one figure for the selected branch (not one per high-yield branch) - Replace stale build_paper_from_result reference with the actual exposed pair build_mini_paper + mini_to_full_paper

coderabbitai

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)

phases/19-capstone-projects/56-iteration-scheduler/code/main.py (1)
261-277: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Don't block past the deadline stop.

If stop_reason is "deadline", Lines 261-263 still await every in-flight task to completion. That can push wall_seconds well past max_seconds, so the scheduler no longer honors the advertised hard wall-clock budget. Split the drain path so deadline stops cancel or skip outstanding work, and keep full draining only for non-deadline exits.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@phases/19-capstone-projects/56-iteration-scheduler/code/main.py` around lines
261 - 277, The loop that awaits every task in in_flight regardless of
stop_reason can exceed the hard wall-clock budget; change the drain logic in
main.py so that when self.stop_reason == "deadline" you do not await all
outstanding tasks: iterate over in_flight keys and for each outstanding
asyncio.Task call task.cancel() (and optionally await them with
return_exceptions or simply pop them), removing them from the in_flight dict
immediately, whereas keep the existing await-and-process behavior only for
non-deadline exits so BranchStats updates, bs.runs/bs.reward_sum increments,
paper trigger checks (paper_threshold / bs.mean) and TraceEvent appends still
run in the normal draining path.

♻️ Duplicate comments (1)

phases/19-capstone-projects/57-end-to-end-research-demo/docs/en.md (1)
80-80: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Fix stale test names in the failure-mode paragraph.

The referenced test names do not match current code/tests/test_e2e.py (test_no_trigger_raises, test_orphan_trigger_raises_best_result_error). Please rename them here (and avoid claiming writer non-invocation is pinned unless a direct test exists).
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@phases/19-capstone-projects/57-end-to-end-research-demo/docs/en.md` at line
80, Update the failure-mode paragraph to use the current test names: replace
`test_no_triggers_raises_typed_error` with `test_no_trigger_raises` and
`test_best_picker_raises_when_no_triggers` with
`test_orphan_trigger_raises_best_result_error`, and remove or soften the
absolute claim that "the writer is never invoked" unless there is a direct test
asserting writer non-invocation (instead state that tests assert the picker
raises `NoTriggerError` / `BestResultError` when no branch fired a trigger).

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@phases/19-capstone-projects/55-critic-loop/docs/en.md`:
- Line 76: The opening sentence incorrectly states "three signals" but the
critic actually scores five dimensions; update that phrase in the paragraph to
"five signals" (or "five scored signals") so it matches the subsequent list
(average section body length (clarity), figure count and citation count
(evidence), `originality_tag` (novelty), and the methodology/related-work
signals) — locate the sentence mentioning "three signals" near the description
of the shipped critic and revise it accordingly.

---

Outside diff comments:
In `@phases/19-capstone-projects/56-iteration-scheduler/code/main.py`:
- Around line 261-277: The loop that awaits every task in in_flight regardless
of stop_reason can exceed the hard wall-clock budget; change the drain logic in
main.py so that when self.stop_reason == "deadline" you do not await all
outstanding tasks: iterate over in_flight keys and for each outstanding
asyncio.Task call task.cancel() (and optionally await them with
return_exceptions or simply pop them), removing them from the in_flight dict
immediately, whereas keep the existing await-and-process behavior only for
non-deadline exits so BranchStats updates, bs.runs/bs.reward_sum increments,
paper trigger checks (paper_threshold / bs.mean) and TraceEvent appends still
run in the normal draining path.

---

Duplicate comments:
In `@phases/19-capstone-projects/57-end-to-end-research-demo/docs/en.md`:
- Line 80: Update the failure-mode paragraph to use the current test names:
replace `test_no_triggers_raises_typed_error` with `test_no_trigger_raises` and
`test_best_picker_raises_when_no_triggers` with
`test_orphan_trigger_raises_best_result_error`, and remove or soften the
absolute claim that "the writer is never invoked" unless there is a direct test
asserting writer non-invocation (instead state that tests assert the picker
raises `NoTriggerError` / `BestResultError` when no branch fired a trigger).

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 03fd4c2e-6909-4d54-b61d-77756a8c0a4f

📥 Commits

Reviewing files that changed from the base of the PR and between 49c52d4 and 2556e28.

📒 Files selected for processing (7)

phases/19-capstone-projects/54-paper-writer/code/main.py
phases/19-capstone-projects/55-critic-loop/code/main.py
phases/19-capstone-projects/55-critic-loop/docs/en.md
phases/19-capstone-projects/55-critic-loop/quiz.json
phases/19-capstone-projects/56-iteration-scheduler/code/main.py
phases/19-capstone-projects/56-iteration-scheduler/docs/en.md
phases/19-capstone-projects/57-end-to-end-research-demo/docs/en.md

✅ Files skipped from review due to trivial changes (2)

phases/19-capstone-projects/56-iteration-scheduler/docs/en.md
phases/19-capstone-projects/55-critic-loop/quiz.json

coderabbitai · 2026-05-26T20:13:02Z

+
+## The deterministic critic in this lesson
+
+The lesson does not call a model. The shipped critic is a callable that scores a draft based on three signals: average section body length (clarity), figure count and citation count (evidence), and an `originality_tag` field on the paper metadata (novelty). The reviser knows how to push each score upward.


⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Fix scoring-signal count for consistency.

This sentence says “three signals,” but the section documents five scored dimensions/signals right below (including methodology and related-work). Please align the count in this line to avoid confusion.

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@phases/19-capstone-projects/55-critic-loop/docs/en.md` at line 76, The opening sentence incorrectly states "three signals" but the critic actually scores five dimensions; update that phrase in the paragraph to "five signals" (or "five scored signals") so it matches the subsequent list (average section body length (clarity), figure count and citation count (evidence), `originality_tag` (novelty), and the methodology/related-work signals) — locate the sentence mentioning "three signals" near the description of the shipped critic and revise it accordingly.

# Conflicts: # catalog.json

feat(phase-19): track D auto research lessons 54-57

rohitg00 added 4 commits May 26, 2026 19:22

feat(phase-19/54): add paper-writer deep capstone

89dce7d

feat(phase-19/55): add critic-loop deep capstone

adad488

feat(phase-19/56): add iteration-scheduler deep capstone

e63bd43

feat(phase-19/57): add end-to-end-research-demo deep capstone

15fc990

chore(catalog): auto-regen

49c52d4

vercel Bot deployed to Preview May 26, 2026 18:31 View deployment

coderabbitai Bot reviewed May 26, 2026

View reviewed changes

rohitg00 added 4 commits May 26, 2026 20:57

fix(phase-19/54): address CodeRabbit review

5ecb3c4

Reject duplicate bibliography keys during validation so render_bibtex cannot emit duplicate BibTeX entries that would break downstream compilation.

vercel Bot deployed to Preview May 26, 2026 20:02 View deployment

coderabbitai Bot reviewed May 26, 2026

View reviewed changes

Merge remote-tracking branch 'origin/main' into feat/phase-19-track-d2

73bd748

# Conflicts: # catalog.json

vercel Bot deployed to Preview May 27, 2026 09:14 View deployment

rohitg00 merged commit 01f8413 into main May 27, 2026
5 checks passed

rohitg00 deleted the feat/phase-19-track-d2 branch May 27, 2026 18:41

JerryC8080 pushed a commit to JerryC8080/ai-engineering-from-scratch-zh that referenced this pull request Jun 1, 2026

Merge pull request rohitg00#203 from rohitg00/feat/phase-19-track-d2

125b1f5

feat(phase-19): track D auto research lessons 54-57

JerryC8080 pushed a commit to JerryC8080/ai-engineering-from-scratch-zh that referenced this pull request Jun 1, 2026

Merge pull request rohitg00#203 from rohitg00/feat/phase-19-track-d2

ceb8f70

feat(phase-19): track D auto research lessons 54-57

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

feat(phase-19): track D auto research lessons 54-57#203

feat(phase-19): track D auto research lessons 54-57#203
rohitg00 merged 10 commits into
mainfrom
feat/phase-19-track-d2

rohitg00 commented May 26, 2026

Uh oh!

coderabbitai Bot commented May 26, 2026 •

edited

Loading

Walkthrough

Changes

Sequence Diagram(s)

Estimated code review effort

❌ Failed checks (1 warning)

Uh oh!

coderabbitai Bot left a comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

coderabbitai Bot left a comment

Uh oh!

coderabbitai Bot May 26, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant


		## The deterministic critic in this lesson

		The lesson does not call a model. The shipped critic is a callable that scores a draft based on three signals: average section body length (clarity), figure count and citation count (evidence), and an `originality_tag` field on the paper metadata (novelty). The reviser knows how to push each score upward.

Uh oh!

Conversation

rohitg00 commented May 26, 2026

Summary

Stats

Test plan

Uh oh!

coderabbitai Bot commented May 26, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Sequence Diagram(s)

Estimated code review effort

❌ Failed checks (1 warning)

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot May 26, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

coderabbitai Bot commented May 26, 2026 •

edited

Loading