+description: Never use direct Anthropic API key or fall back to OpenAI — always use Bedrock via AWS_BEARER_TOKEN_BEDROCK
+type: feedback
+---
+
+Always use Bedrock for LLM calls. Never use the Anthropic API key directly, never fall back to OpenAI or any other provider.
+
+**Why:** The user has Bedrock configured with `AWS_BEARER_TOKEN_BEDROCK` and does not want direct Anthropic API usage (burns quota/money on the wrong account). Fallback logic is unacceptable — it silently uses the wrong provider.
+
+**How to apply:** In integration tests and any code that needs an LLM model string, use `bedrock/us.anthropic.claude-haiku-4-5-20251001-v1:0` (or similar Bedrock model). Never write fallback chains like "if ANTHROPIC_KEY else OPENAI". Just use Bedrock, period.
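
The "How to apply" rule above could look roughly like the following in test or setup code. This is a minimal sketch, not code from the repository: `resolve_llm_model` and `BEDROCK_MODEL` are hypothetical names, and raising when the token is missing is one assumed way to honour "no fallback chains".

```python
import os

# Model string copied from the guidance above.
BEDROCK_MODEL = "bedrock/us.anthropic.claude-haiku-4-5-20251001-v1:0"


def resolve_llm_model(env=None):
    """Return the Bedrock model string, or fail loudly.

    Hypothetical helper: it never consults ANTHROPIC_API_KEY or
    OPENAI_API_KEY, so there is no provider to silently fall back to.
    """
    env = os.environ if env is None else env
    if not env.get("AWS_BEARER_TOKEN_BEDROCK"):
        # Assumed policy: stop here instead of trying another provider.
        raise RuntimeError(
            "AWS_BEARER_TOKEN_BEDROCK is not set; refusing to fall back "
            "to another provider"
        )
    return BEDROCK_MODEL
```

Passing a plain dict as `env` keeps the helper easy to exercise in unit tests without touching the real environment.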
AGENTS.md: 40 additions & 54 deletions
@@ -7,72 +7,60 @@ This file provides guidance to coding agents working in this repository.
 ### Pipeline-First Development (MANDATORY)
 **All new functionality MUST be implemented as pipeline Steps composed via the Pipeline engine.** Do NOT write standalone scripts, ad-hoc loops, or inline logic that bypasses the pipeline. Before writing any code:
 
-1. Read `docs/design/PIPELINE_DESIGN.md` to understand the Step -> Pipeline -> Branch model.
+1. Read `docs/design/PIPELINE_DESIGN.md` to understand the Step → Pipeline → Branch model.
 2. Implement logic as a `Step` class with `requires`/`provides` declarations and a `__call__(self, ctx) -> ctx` method.
-3. Compose steps using `Pipeline().then(...)` and `.branch(...)`- never manual for-loops or direct function chaining.
-4. Use `StepContext.replace()` for immutable context updates - never mutate context directly.
+3. Compose steps using `Pipeline().then(...)` and `.branch(...)`— never manual for-loops or direct function chaining.
+4. Use `StepContext.replace()` for immutable context updates — never mutate context directly.
 5. Put integration-specific data in `metadata`, not new context fields, unless the field is shared across multiple pipelines.
 
 **Anti-patterns to reject:**
 - Writing a function that calls multiple steps manually instead of composing them in a Pipeline
-- Inline reflection/evaluation logic instead of creating a `ReflectStep` or `EvaluateStep`
+- Inline reflection/evaluation logic instead of creating a ReflectStep or EvaluateStep
 - Ad-hoc `ThreadPoolExecutor` usage instead of `async_boundary` and `max_workers` on steps
 - Standalone scripts that duplicate pipeline functionality without using the pipeline engine
 - Bypassing `requires`/`provides` contracts by accessing context fields not declared in `requires`
 
-If a task seems like it cannot fit the pipeline model, explain why to the user before proceeding - do not silently circumvent it.
+If a task seems like it cannot fit the pipeline model, explain why to the user before proceeding — do not silently circumvent it.
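
The Pipeline-First rules above (a step with `requires`/`provides`, composition via `.then(...)`, immutable updates via `replace()`, integration data in `metadata`) can be sketched with self-contained stand-ins. Everything named `Toy...`, along with `AnswerStep`, `TagStep`, and the `run` method, is hypothetical and only illustrates the shape of the contract; the repository's real `Step`, `Pipeline`, and `StepContext` are described in `docs/design/PIPELINE_DESIGN.md` and may differ.

```python
from dataclasses import dataclass, field, replace as dc_replace


@dataclass(frozen=True)
class ToyContext:
    """Stand-in for an immutable step context (not the real StepContext)."""
    question: str = ""
    answer: str = ""
    metadata: dict = field(default_factory=dict)

    def replace(self, **changes):
        # Build a new context instead of mutating this one.
        return dc_replace(self, **changes)


class AnswerStep:
    """Hypothetical step: reads `question`, writes `answer`."""
    requires = frozenset({"question"})
    provides = frozenset({"answer"})

    def __call__(self, ctx):
        return ctx.replace(answer=ctx.question.upper())


class TagStep:
    """Hypothetical step: stores integration-specific data in `metadata`."""
    requires = frozenset({"answer"})
    provides = frozenset({"metadata"})

    def __call__(self, ctx):
        return ctx.replace(metadata={**ctx.metadata, "answer_length": len(ctx.answer)})


class ToyPipeline:
    """Stand-in engine: runs steps in order and checks requires/provides."""

    def __init__(self):
        self._steps = []

    def then(self, step):
        self._steps.append(step)
        return self  # allows chained .then(...) calls

    def run(self, ctx, available):
        available = set(available)
        for step in self._steps:
            missing = step.requires - available
            if missing:
                raise ValueError(
                    f"{type(step).__name__} is missing required fields: {sorted(missing)}"
                )
            ctx = step(ctx)
            available |= step.provides
        return ctx


pipeline = ToyPipeline().then(AnswerStep()).then(TagStep())
start = ToyContext(question="hello")
result = pipeline.run(start, available={"question"})
```

In this sketch `start` is left untouched after the run, and composing the steps in the wrong order fails at the `requires` check rather than silently reading an empty field.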
 
 ### Core Code Protection
-**Do NOT modify core modules (`ace/`, `ace/core/`, `pipeline/`) without explicit user approval.** Before proposing any change to these directories:
+**Do NOT modify core modules (`ace/core/`, `pipeline/`) without explicit user approval.** Before proposing any change to these directories:
 1. Read the relevant design docs (`docs/design/ACE_ARCHITECTURE.md`, `docs/design/PIPELINE_DESIGN.md`) thoroughly.
-2. Evaluate whether the change is truly required or if it can be achieved outside the core (for example, in an integration, step, or example).
-3. Clearly explain the proposed change and its justification to the user before making any edits.
+2. Evaluate whether the change is truly required or if it can be achieved outside the core (e.g., in an integration, step, or example).
+3. Clearly explain the proposed change and its justification to the user **before** making any edits.
 4. Wait for the user to explicitly accept before proceeding.
 
 ### Documentation Maintenance
 Before working on code in `ace/`, read `docs/design/ACE_ARCHITECTURE.md` to understand the current architecture.
-Before working on code in `pipeline/`, read `docs/design/PIPELINE_DESIGN.md` to understand the pipeline engine.
-Before working on code in `ace/rr/`, read `docs/design/RR_DESIGN.md` to understand the recursive reflection design.
-Before working on code in `ace/cli/`, read `docs/design/CLI_DESIGN.md` to understand the CLI architecture.
+Before working on code in `pipeline/` or `ace/core/`, read `docs/design/PIPELINE_DESIGN.md` to understand the pipeline engine.
 
-**Docs MUST be kept in sync with code.** Any change that alters a public API, renames a concept, adds or removes a module, or changes execution flow requires a corresponding update to the relevant docs. Do not merge code changes that make the documentation inaccurate.
+**Docs MUST be kept in sync with code.** Any change that alters a public API, renames a concept, adds/removes a module, or changes execution flow **requires** a corresponding update to the relevant docs. Do not merge code changes that make the documentation inaccurate.
 
 Key design docs:
-- `docs/design/ACE_ARCHITECTURE.md` - core ACE architecture: roles, runners, skillbook, adaptation loops, integrations, and public API
+`sm_iterative_check.py`, `sm_stability_check.py` and matching scenario
+fixtures cover replay stability, convergence, scope expansion, and the
+below-threshold gate boundary.
+
+### Changed
+- **`update_skills` signature** — `source` is now optional; `SkillbookView`
+was dropped from the parameter list (callers pass the real `Skillbook`
+directly).
+- **Hard removal cap removed** — SM no longer auto-removes skills whose
+`harmful_count >= 3`. Heavily-used skills can legitimately accumulate
+harmful tags without being net-negative; REMOVE now requires explicit
+reflection evidence.
+- **TauBench evaluator** — `evaluation_type=ALL_WITH_NL_ASSERTIONS` on both
+`run_task` and `run_tasks` call sites in
+`ace-eval/src/ace_eval/e2e/benchmarks/tau_bench.py`. Retail (and any future
+benchmark with `NL_ASSERTION` in `reward_basis`) now produces real reward
+numbers instead of crashing on every task during reward computation.
+
+### Removed
+- **Skillbook v1 legacy aliases** on `Skill` and `UpdateOperation` — v2 schema
+is now the only schema.
+
+## [0.11.0] - 2026-04-29
+
+### Added
+- **`RecursiveAgent` core abstraction** — extracted from RR into `ace/core/recursive_agent.py`; provides a generic recursive PydanticAI agent with sandbox, microcompaction, default tool set, and depth-aware sub-agent registration. Reusable across roles beyond the Reflector.
+- **Skillbook v2 schema** — full rewrite of `ace/core/skillbook.py` with section-grouped storage, richer `InsightSource` provenance, and BM25-backed retrieval (`rank-bm25` runtime dependency).
+- **Agentic SkillManager** — `SkillManager` rewritten as a tool-calling loop (`ace/implementations/sm_tools.py`). Provenance is now populated by the SkillManager agent directly rather than a dedicated step.
+- **RR skillbook tools for the Reflector** — Reflector can introspect and propose updates to the skillbook from inside the recursive loop.
+- **Anthropic prompt caching enabled by default** for RR agents; `cache_read_tokens` and `cache_write_tokens` are forwarded in run metadata for cost accounting.
+- **Logfire spans around recursive agent sessions** for end-to-end observability of nested RR runs.
+- **Online / offline mode** in the ACE runner.
+- **`nest-asyncio`** added to the dev extra to support nested loops in notebooks and live test scripts.
+
+### Changed
+- **RR collapsed into a single `RRStep`** — the orchestrator/worker split, batch machinery, and `AttachInsightSourcesStep` have been removed. RR now runs as a true recursive loop with depth-bounded sub-agent delegation and microcompaction of stale tool results.
+- **Reflector prompts** simplified, deduplicated, and made input-agnostic; added early-skillbook-skim and parallel-tool guidance.
+- **`record_observation` tool renamed to `think`** to clarify it is a scratch reasoning channel, not persistent storage.
+- **Native evidence summaries** are produced inside RR before final synthesis.
+- **Skillbook prompt format is now markdown** — `Skillbook.as_prompt()` returns a section-grouped markdown list instead of TOON. The `python-toon` dependency has been dropped.
+- **`metered_model` and `sandbox`** moved from `ace/rr/` into `ace/core/` to reflect their cross-role use.
+- **Pytest defaults** — `uv run pytest` now excludes `integration` and `requires_api` markers by default; coverage flags removed from `addopts` (run with `--cov` explicitly when needed).
+- **Observability** — `tool_arguments` and `tool_response` are no longer scrubbed by the Logfire callback so tool I/O remains inspectable.
+
+### Removed
+- `ace/rr/` legacy package layout (`agent.py`, `runner.py`, `trace_context.py`, `message_trimming.py`, batch helpers). Functionality is now in `ace/core/recursive_agent.py` and `ace/implementations/rr/`.
+- `AttachInsightSourcesStep` and its pipeline wiring — provenance is attached by the SkillManager agent.
+- `python-toon` runtime dependency.
+- TAG handling from the SkillManager.
+- Citation scanning from the Reflector.
+
 ## [0.10.0] - 2026-04-13
 
 ### Added
@@ -32,7 +106,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 ## [0.9.2] - 2026-03-31
 
 ### Added
-- **Insight source provenance** — `InsightSource` typed model captures the origin of each skillbook update (trace ID, sample question, epoch/step, reflection summary, integration metadata); `AttachInsightSourcesStep` automatically enriches `UpdateBatch` operations with provenance and is wired into the default learning tail
+- **Insight source provenance** — `InsightSource` typed model captures the origin of each skillbook update (trace ID, sample question, epoch/step, reflection summary, integration metadata); provenance is now populated by the SkillManager agent directly
 - **Claude SDK step** — `ClaudeSDKStep` integration for running Claude Code sub-agents from within ACE pipelines
 - **RR sub-agent code execution** — Recursive Reflector can now delegate to code-execution sub-agents at runtime
 - **RR raw trace batch helpers** — `build_raw_trace_batches` and related runtime utilities for feeding raw traces directly into the RR pipeline
CLAUDE.md: 5 additions & 1 deletion
@@ -57,7 +57,7 @@ Key design docs:
 
 ### Commands
 - `uv sync` — install all dependencies
-- `uv run pytest` — run tests (coverage enforced `--cov-fail-under=25`)
+- `uv run pytest` — run tests (excludes `integration` and `requires_api` markers by default)
 - `uv run pytest -m unit` / `-m integration` / `-m slow` — run by marker
 - `uv run black ace/ tests/ examples/` — format code
 - `uv run mypy ace/` — type check
@@ -93,3 +93,7 @@ Key design docs:
 | `ACELangChain` | LangChain | Wrap chains/agents with learning |
 | `ACEBrowserUse` | browser-use | Browser automation with learning |
 | `ACEClaudeCode` | Claude Code CLI | Coding tasks with learning |
+
+NEVER USE FALLBACKS OR IMPLEMENT THINGS I NEVER ASKED FOR.
+
+Keep your answers concise and to the point. If you don't know something, say you don't know instead of making assumptions or fabricating information. Always ask clarifying questions if the user's request is ambiguous or lacks necessary details.