Commit b61567b

chore(release): bump to 0.12.0
SM hardening release: cross-trace generalization gate, action-equivalence rule, atomicity check, ICL-grounded insight format, evidence-only tagging, broaden-via-comparison, prompt caching, removed harmful-count hard removal cap, behavior spec + harness. Drops Skillbook v1 legacy aliases. Submodule fix lets tau-bench retail produce real benchmark numbers.
1 parent 1c00e84 commit b61567b

3 files changed

Lines changed: 47 additions & 2 deletions

CHANGELOG.md

Lines changed: 45 additions & 0 deletions
@@ -7,6 +7,51 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 
 ## [Unreleased]
 
+## [0.12.0] - 2026-05-06
+
+### Added
+- **Cross-trace generalization gate** for the SkillManager — four-criterion check
+(≥3 instances across ≥2 domains, named slot, no API-specific params in the
+action, verifiable runtime trigger) that constrains when SM may write a broad
+skill subsuming existing narrow ones. Backed by [skill_generalization.md](ace-eval/research/skill_generalization.md)
+(14 cited sources).
+- **Action-equivalence rule** for within-run skill writing — splits on action,
+not on trigger surface. Prevents over-decomposition of structurally identical
+rules.
+- **Atomicity rule** in `insight` formatting — one trigger + one action per
+skill, with explicit good/bad shape examples in the prompt.
+- **Insight format guidance** in the SM prompt sourced from the in-context-
+learning research doc ([icl_skill_formatting.md](ace-eval/research/icl_skill_formatting.md)) — 15-50 word cap, imperative
+voice, positive framing default, examples only for format/shape rules.
+- **Evidence-only tagging** — SM tags only skills the reflection actually
+implicates, instead of iterating over every injected_skill_id.
+- **Broaden-via-comparison rule** for UPDATE — when two skills target the same
+root cause in different niches, broaden `issue` rather than adding a duplicate.
+- **Prompt caching for SM** via `CachePoint(ttl="5m")` mirroring RR's caching;
+cache_read/write tokens forwarded in run metadata.
+- **SM behavior spec + harness** — `ace-eval/scripts/sm_behavior_check.py`,
+`sm_iterative_check.py`, `sm_stability_check.py` and matching scenario
+fixtures cover replay stability, convergence, scope expansion, and the
+below-threshold gate boundary.
+
+### Changed
+- **`update_skills` signature** — `source` is now optional; `SkillbookView`
+was dropped from the parameter list (callers pass the real `Skillbook`
+directly).
+- **Hard removal cap removed** — SM no longer auto-removes skills whose
+`harmful_count >= 3`. Heavily-used skills can legitimately accumulate
+harmful tags without being net-negative; REMOVE now requires explicit
+reflection evidence.
+- **TauBench evaluator** — `evaluation_type=ALL_WITH_NL_ASSERTIONS` on both
+`run_task` and `run_tasks` call sites in
+`ace-eval/src/ace_eval/e2e/benchmarks/tau_bench.py`. Retail (and any future
+benchmark with `NL_ASSERTION` in `reward_basis`) now produces real reward
+numbers instead of crashing on every task during reward computation.
+
+### Removed
+- **Skillbook v1 legacy aliases** on `Skill` and `UpdateOperation` — v2 schema
+is now the only schema.
+
 ## [0.11.0] - 2026-04-29
 
 ### Added
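
Reviewer sketch (not part of the commit): the four criteria listed for the cross-trace generalization gate can be read as a simple conjunction. The snippet below restates them as a plain predicate to make the "below-threshold gate boundary" concrete. Every name in it (`GateEvidence`, `failed_criteria`, `passes_gate`, and all field names) is hypothetical; the changelog does not say how the SkillManager represents or evaluates this evidence, and the real check may live entirely in the SM prompt.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GateEvidence:
    """Hypothetical summary of the evidence behind a proposed broad skill."""

    instance_count: int          # traces in which the pattern was observed
    domain_count: int            # distinct domains those traces cover
    has_named_slot: bool         # the skill names the slot it generalizes over
    action_has_api_params: bool  # the action text mentions API-specific params
    trigger_is_verifiable: bool  # the trigger can be checked at runtime


def failed_criteria(ev: GateEvidence) -> list[str]:
    """Return the names of the criteria that `ev` does not meet, in order."""
    checks = {
        # Thresholds taken from the changelog: >=3 instances across >=2 domains.
        "instances_and_domains": ev.instance_count >= 3 and ev.domain_count >= 2,
        "named_slot": ev.has_named_slot,
        "no_api_specific_params": not ev.action_has_api_params,
        "verifiable_trigger": ev.trigger_is_verifiable,
    }
    return [name for name, ok in checks.items() if not ok]


def passes_gate(ev: GateEvidence) -> bool:
    """A broad skill is allowed only when all four criteria hold."""
    return not failed_criteria(ev)
```

Under these assumptions, evidence with two instances across two domains fails only the `instances_and_domains` check, which is the kind of boundary case the scenario fixtures are described as covering.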

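Reviewer sketch (not part of the commit): the "Hard removal cap removed" entry changes which signal can trigger REMOVE. The contrast below is illustrative only. `SkillStats`, `helpful_count`, and both function names are hypothetical; only `harmful_count` and the `>= 3` threshold come from the changelog, and how SM actually derives "explicit reflection evidence" is not described there.

```python
from dataclasses import dataclass

OLD_HARMFUL_CAP = 3  # threshold quoted in the changelog: harmful_count >= 3


@dataclass(frozen=True)
class SkillStats:
    """Hypothetical stand-in for the counters tracked on a skill."""

    helpful_count: int  # assumed counterpart to harmful_count, for illustration
    harmful_count: int


def should_remove_before(stats: SkillStats) -> bool:
    """Pre-0.12.0 behavior as described: the count alone triggers removal."""
    return stats.harmful_count >= OLD_HARMFUL_CAP


def should_remove_after(stats: SkillStats, reflection_supports_removal: bool) -> bool:
    """0.12.0 behavior as described: only reflection evidence triggers removal.

    `stats` is accepted but deliberately ignored, to show that the harmful
    count no longer decides the outcome on its own.
    """
    return reflection_supports_removal


# A heavily used skill that has collected a few harmful tags along the way.
busy_skill = SkillStats(helpful_count=40, harmful_count=3)
removed_before = should_remove_before(busy_skill)
removed_after = should_remove_after(busy_skill, reflection_supports_removal=False)
```

The `busy_skill` case mirrors the rationale in the entry: a skill with many helpful uses can reach three harmful tags without being net-negative, so the count by itself is a weak removal signal.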
pyproject.toml

Lines changed: 1 addition & 1 deletion
@@ -4,7 +4,7 @@ build-backend = "setuptools.build_meta"
 
 [project]
 name = "ace-framework"
-version = "0.11.0"
+version = "0.12.0"
 description = "Build self-improving AI agents that learn from experience"
 readme = "README.md"
 requires-python = ">=3.12"

uv.lock

Lines changed: 1 addition & 1 deletion
Generated file; diff not rendered by default.
