tcell: applying the autoresearch loop to cognitive bias detection in AI agents #425
VictorVVedtion
started this conversation in
Show and tell
Replies: 0 comments
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Uh oh!
There was an error while loading. Please reload this page.
-
autoresearch showed that an autonomous loop of modify → train → evaluate → keep/discard can produce real gains on val_bpb overnight. I've been exploring whether the same loop works for a different problem: catching cognitive biases in AI agent outputs.
The problem
I was using Opus 4.6 to generate SFT training data. The agent self-evaluated: "100% pass rate, quality score 1.000." I spawned an independent subagent (same model, fresh context) to double-check. It scored the data 5.5/10 and found 5 critical issues:
Key realization: context isolation, not model diversity, is what matters. Same model, fresh context, catches the blindspots.
How tcell maps to autoresearch
prepare.pyprepare.pytrain.pycritics/*.mdprogram.mdprogram.mdval_bpbdetection_rateresults.tsvresults.tsvCritics self-evolve through mutation → replay on known blindspots (canaries) → keep/discard based on detection rate improvement. Same loop, different domain.
After 5 evolution iterations, the overconfidence critic reached 80% detection rate on canaries with 0% false positives. Still in cold start (8 canaries, need 20 for full evolution mode).
Open question
autoresearch optimizes a fixed metric (val_bpb) on a fixed eval set. tcell's "eval set" (canaries) grows over time as new blindspots are discovered. How do you think about evolving the evaluation criteria alongside the system being optimized? Is that a feature or a bug?
Repo: https://github.com/VictorVVedtion/tcell
Would love feedback from anyone who's thought about applying the autoresearch pattern outside of model training.
Beta Was this translation helpful? Give feedback.
All reactions