
Commit e039893

feat(agent): Discover Claude Code models dynamically from the CLI (#244)
## Motivation

Previously, LeapMux used a static catalog to determine which Claude Code models and effort levels were available. This meant the catalog could fall out of sync with what the user's installed CLI actually supports -- new models would be missing, removed models would appear, and plan-tier differences were invisible to the system.

The Claude Code CLI already reports its full model list (including unavailable ones) in the `initialize` response. This change harvests that data to build a per-agent dynamic catalog at runtime, with the static catalog retained as a fallback for CLIs that predate the feature.

## Modifications

- Added `claudeCodeModelInfo` struct and `convertClaudeModels()` to parse the `models`/`unavailable_models` arrays from the CLI initialize response into LeapMux `AvailableModels`, including effort ordering, context-window inference from the `[1m]` marker, and deduplication.
- Introduced `effortResolver`, which wraps a dynamic catalog plus a static fallback for all capability queries (`supports`, `supportsUltracode`, `resolveEffort`, `contextWindow`). Constructed static-only at launch and promoted to dynamic+fallback post-init, so a model the live CLI dropped still resolves real capabilities from the fallback.
- Extracted `runStartupHandshake()` from `StartClaudeCode` and moved startup effort reconciliation onto `effortResolver` (`reconcileStartupFlags`, `reconciledEffortFlags`, `reconcileOmittedLaunch`).
- Added the "default" model sentinel (`DefaultModelSentinel`) -- a selectable entry that tracks the account's own default. Launch omits `--model`/`--effort` when it is selected so the CLI resolves the concrete model; the resolved identity is reported back via `get_settings`.
- Extended Ultracode (xhigh+workflow tier) to any model whose CLI effort levels include `xhigh`, removing the previous Opus-only restriction. The decode and live-update paths gate the ultracode strip on the model being *known*, so a model in neither catalog running ultracode is left as the CLI applied it rather than silently downgraded.
- Added `windowModel` to `contextUsageSnapshot` and three focused methods (`reseedWindow`, `adoptResultWindow`, `buildBroadcast`) to re-seed the context window on model change, omit it when unknown, and debounce the usage broadcast.
- Fixed `findPrimaryContextWindow` to use `normalizeClaudeCodeModel` equality instead of substring matching, eliminating false positives such as "opus" matching "opusplus".
- Replaced `buildShellWrappedCommand`'s positional parameter tail with a `shellWrapSpec` struct (all 9 provider `Start` functions migrated to named-field literals), and added a `ProbeThirdParty` field so Claude's default-model launch still detects a provider configured only in the user's shell profile.
- Added `defaultModelForList()` with 3-tier badge priority (operator env override -> provider-reported sentinel -> configured default); `reportModelChange()`/`buildSettingsChanges()` suppress spurious model-change events on re-spelling but report the sentinel resolving to a concrete model as a real transition; the live-update path persists the settled model/effort to prevent DB drift.
- `stripDefaultModelBadge()` strips the transient `IsDefault` badge and drops nil catalog entries before persistence, so the DB never stores a derived badge or a hollow `{}`.
- Frontend: `DEFAULT_CLAUDE_MODEL` changed from `'opus[1m]'` to `'default'`; the sentinel renders as "Default (recommended)". Fable added to the static fallback catalog.

## Result

The model picker in the Claude agent settings panel is now populated from the live CLI response rather than a hard-coded list. Models unavailable on the current plan are hidden automatically, and new models appear as soon as the CLI reports them -- no LeapMux update required.

Selecting "Default (recommended)" lets the CLI pick the concrete model based on the user's account and plan tier. The resolved model is reflected back to the UI via `get_settings` so the chat header and settings panel stay in sync. Old CLI versions that do not include a `models` array in their initialize response fall back to the static catalog transparently, with no change in behavior.
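The dynamic-first, static-fallback lookup described for `effortResolver` can be sketched roughly as follows. This is a minimal illustration, not the code in `claude.go` (whose diff is not rendered on this page): the `modelCaps` and `resolverSketch` types and their fields are hypothetical stand-ins for the real catalog entries.

```go
package main

// modelCaps is a hypothetical, simplified catalog entry (assumption: the
// real entries carry more fields than these two).
type modelCaps struct {
	Efforts       []string
	ContextWindow int64
}

// resolverSketch consults the dynamic catalog first and the static catalog
// second. dynamic stays nil until the CLI has reported its model list, which
// models the "static-only at launch, promoted post-init" behavior.
type resolverSketch struct {
	dynamic  map[string]modelCaps
	fallback map[string]modelCaps
}

// lookup returns the entry for model, preferring the dynamic catalog.
// Reading from a nil map is safe in Go and simply reports "not found".
func (r resolverSketch) lookup(model string) (modelCaps, bool) {
	if c, ok := r.dynamic[model]; ok {
		return c, true
	}
	c, ok := r.fallback[model]
	return c, ok
}

// supports reports whether model is known and lists effort among its levels.
func (r resolverSketch) supports(model, effort string) bool {
	c, ok := r.lookup(model)
	if !ok {
		return false
	}
	for _, e := range c.Efforts {
		if e == effort {
			return true
		}
	}
	return false
}

// contextWindow returns the known window, or 0 when the model is in
// neither catalog (0 meaning "unknown").
func (r resolverSketch) contextWindow(model string) int64 {
	c, _ := r.lookup(model)
	return c.ContextWindow
}
```

With this shape, a model that the live CLI no longer lists still answers capability queries from the fallback map, while a model in neither map reports no efforts and a window of 0.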
1 parent 86e90ac commit e039893

24 files changed

Lines changed: 3149 additions & 439 deletions

backend/internal/worker/agent/claude.go

Lines changed: 884 additions & 135 deletions
Large diffs are not rendered by default.

backend/internal/worker/agent/claude_livesettings_test.go

Lines changed: 60 additions & 0 deletions
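The first hunk of this file extends the `TestNormalizeClaudeCodeModel` table with version-first and mixed-case ids. One way to satisfy the listed cases is sketched here. The real `normalizeClaudeCodeModel` lives in `claude.go` and is not shown on this page, so `normalizeModelSketch` and `isAlphaSketch` are hypothetical names and the logic is an assumption reverse-engineered from the test table only.

```go
package main

import "strings"

// isAlphaSketch reports whether s is non-empty and made only of a-z.
func isAlphaSketch(s string) bool {
	if s == "" {
		return false
	}
	for _, r := range s {
		if r < 'a' || r > 'z' {
			return false
		}
	}
	return true
}

// normalizeModelSketch lowercases the id, splits off a bracket variant such
// as "[1m]", then returns the first alphabetic token that is not the
// "claude" vendor prefix, with the variant re-attached. Leading numeric
// version tokens ("3", "5") are skipped because they are not alphabetic.
func normalizeModelSketch(id string) string {
	lower := strings.ToLower(id)
	base, suffix := lower, ""
	if i := strings.Index(lower, "["); i >= 0 {
		base, suffix = lower[:i], lower[i:]
	}
	for _, tok := range strings.Split(base, "-") {
		if tok == "claude" {
			continue
		}
		if isAlphaSketch(tok) {
			return tok + suffix
		}
	}
	// No family token found: return the lowercased id unchanged.
	return lower
}
```

Under these assumptions, `normalizeModelSketch("claude-3-5-haiku-20241022")` yields `"haiku"` and `normalizeModelSketch("OPUS[1M]")` yields `"opus[1m]"`, matching the expectations in the table.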
@@ -210,6 +210,17 @@ func TestNormalizeClaudeCodeModel(t *testing.T) {
 		"claude-haiku-4-5-20251001": "haiku",
 		"claude-haiku-4-5": "haiku",
 		"claude-sonnet-4-5-20240101": "sonnet",
+		// Version-first ids (numeric tokens leading the family): the family alias is
+		// found by skipping the leading version tokens, where a from-position-0 alpha
+		// scan would return "" and leak the raw id.
+		"claude-3-5-sonnet": "sonnet",
+		"claude-3-5-haiku-20241022": "haiku",
+		"3-7-sonnet": "sonnet",
+		// Mixed-case CLI values collapse to the lowercase alias space (S3), so a
+		// running model still matches its own (lowercase) catalog entry.
+		"OPUS[1M]": "opus[1m]",
+		"Claude-Sonnet-4-6": "sonnet",
+		"Opus": "opus",
 		// Degenerate input.
 		"": "",
 		"unknown-thing": "unknown",
@@ -531,6 +542,55 @@ func TestHandlePendingControlResponse_ParsesInitializeFields(t *testing.T) {
 	assert.NotEmpty(t, result.RawResponse)
 }

+func TestHandlePendingControlResponse_ParsesModels(t *testing.T) {
+	a := &ClaudeCodeAgent{
+		processBase: processBase{agentID: "test"},
+		pendingControl: make(map[string]chan<- claudeCodeControlResult),
+	}
+
+	ch := make(chan claudeCodeControlResult, 1)
+	a.registerPendingControl("req-models", ch)
+
+	// An initialize response carrying the model catalog: a "default" alias
+	// sentinel, a disabled entry, the [1m] variants, an effort-bearing Fable, a
+	// no-effort Haiku, plus a separate unavailable_models list.
+	raw := []byte(`{"type":"control_response","response":{` +
+		`"subtype":"success","request_id":"req-models","response":{` +
+		`"models":[` +
+		`{"value":"default","displayName":"Default","description":"d"},` +
+		`{"value":"fable","displayName":"Fable 5","description":"Most powerful for the hardest problems","supportsEffort":true,"supportedEffortLevels":["low","medium","high","xhigh","max"]},` +
+		`{"value":"fable[1m]","displayName":"Fable 5 (1M context)","description":"Most powerful for the hardest problems","supportsEffort":true,"supportedEffortLevels":["low","medium","high","xhigh","max"]},` +
+		`{"value":"haiku","displayName":"Haiku","description":"Fastest for quick answers","supportsEffort":false},` +
+		`{"value":"internal-preview","displayName":"Internal","disabled":true}` +
+		`],` +
+		`"unavailable_models":[{"value":"zdr-blocked","displayName":"Blocked"}]` +
+		`}}}`)
+
+	line := &parsedLine{Type: "control_response", Raw: raw}
+	assert.True(t, a.handlePendingControlResponse(line))
+
+	result := <-ch
+	assert.True(t, result.Success)
+	require.Len(t, result.Models, 5, "all raw entries decode, including default/disabled")
+	require.Len(t, result.UnavailableModels, 1)
+	assert.Equal(t, "fable", result.Models[1].Value)
+	assert.True(t, result.Models[1].SupportsEffort)
+	assert.Equal(t, []string{"low", "medium", "high", "xhigh", "max"}, result.Models[1].SupportedEffortLevels)
+	assert.True(t, result.Models[4].Disabled)
+	assert.Equal(t, "zdr-blocked", result.UnavailableModels[0].Value)
+
+	// Conversion drops the disabled entry, surfaces the "default" sentinel as a
+	// selectable option, and surfaces Fable with its full effort menu and
+	// inferred context windows.
+	converted := claudeModelsByID(convertClaudeModels(result.Models, result.UnavailableModels))
+	require.Contains(t, converted, "fable")
+	require.Contains(t, converted, "default")
+	require.NotContains(t, converted, "internal-preview")
+	assert.Equal(t, "xhigh", converted["fable"].DefaultEffort)
+	assert.Equal(t, int64(1_000_000), converted["fable[1m]"].ContextWindow)
+	assert.Empty(t, converted["haiku"].SupportedEfforts)
+}
+
 func TestHandlePendingControlResponse_ParsesGetSettingsResponse(t *testing.T) {
 	a := &ClaudeCodeAgent{
 		processBase: processBase{agentID: "test"},
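The decode-then-convert flow that `TestHandlePendingControlResponse_ParsesModels` exercises can be approximated as below. The JSON field names are taken from the test fixture; everything else is an assumption: `cliModelSketch`, `catalogEntrySketch`, and `convertSketch` are hypothetical names, the sketch takes the inner response payload directly, it gives every entry (including "default") the 200K/1M estimate, and it does not attempt the default-effort selection that the real `convertClaudeModels` performs.

```go
package main

import (
	"encoding/json"
	"strings"
)

// cliModelSketch mirrors the per-model JSON fields seen in the fixture.
type cliModelSketch struct {
	Value                 string   `json:"value"`
	DisplayName           string   `json:"displayName"`
	SupportsEffort        bool     `json:"supportsEffort"`
	SupportedEffortLevels []string `json:"supportedEffortLevels"`
	Disabled              bool     `json:"disabled"`
}

// initializeSketch is the part of the initialize payload holding the lists.
type initializeSketch struct {
	Models            []cliModelSketch `json:"models"`
	UnavailableModels []cliModelSketch `json:"unavailable_models"`
}

// catalogEntrySketch is a simplified stand-in for an AvailableModel.
type catalogEntrySketch struct {
	ID            string
	Efforts       []string
	ContextWindow int64
}

// convertSketch decodes the payload and builds a catalog keyed by model id.
// It drops disabled and unavailable entries, keeps the first of any
// duplicate id, and infers the window from the "[1m]" marker.
func convertSketch(payload []byte) (map[string]catalogEntrySketch, error) {
	var resp initializeSketch
	if err := json.Unmarshal(payload, &resp); err != nil {
		return nil, err
	}
	unavailable := make(map[string]bool, len(resp.UnavailableModels))
	for _, m := range resp.UnavailableModels {
		unavailable[m.Value] = true
	}
	out := make(map[string]catalogEntrySketch)
	for _, m := range resp.Models {
		if m.Disabled || unavailable[m.Value] {
			continue
		}
		if _, dup := out[m.Value]; dup {
			continue
		}
		window := int64(200_000)
		if strings.Contains(strings.ToLower(m.Value), "[1m]") {
			window = 1_000_000
		}
		entry := catalogEntrySketch{ID: m.Value, ContextWindow: window}
		if m.SupportsEffort {
			entry.Efforts = m.SupportedEffortLevels
		}
		out[m.Value] = entry
	}
	return out, nil
}
```

A payload without a `models` array decodes to an empty catalog here, which is the situation where the commit falls back to the static catalog.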

backend/internal/worker/agent/claude_output.go

Lines changed: 140 additions & 71 deletions
@@ -57,7 +57,81 @@ type contextUsageSnapshot struct {
 	CacheCreationInputTokens int64
 	CacheReadInputTokens int64
 	ContextWindow int64
-	LastBroadcast time.Time
+	// windowModel is the model id ContextWindow was derived for. The snapshot
+	// outlives a model change (a live model switch, or the account-default sentinel
+	// resolving to a concrete model after startup), so the window is re-seeded from
+	// the catalog whenever the current model no longer matches this -- otherwise a
+	// session that began on the 200K sentinel placeholder (or a smaller-window model)
+	// would under-report a larger window until a result message happened to refresh it.
+	windowModel string
+	LastBroadcast time.Time
+}
+
+// reseedWindow updates the snapshot's catalog window estimate when the model it was
+// derived for no longer matches the current model: the snapshot outlives a model change
+// (a live switch, or the account-default sentinel resolving to a concrete model after
+// startup). It runs even when estimate is 0 (an unknown/unresolved model), so switching
+// to such a model CLEARS a stale larger window carried over from the previous model --
+// reverting to "unknown" rather than over-reporting -- and switching to a known model
+// picks up its estimate immediately. A result message's window stays authoritative for
+// its model because adoptResultWindow stamps windowModel too, so this estimate doesn't
+// clobber it for the same model.
+func (s *contextUsageSnapshot) reseedWindow(model string, estimate int64) {
+	s.mu.Lock()
+	defer s.mu.Unlock()
+	if s.windowModel != model {
+		s.ContextWindow = estimate
+		s.windowModel = model
+	}
+}
+
+// adoptResultWindow records the authoritative context window a result message reported
+// for model, stamping windowModel so the catalog re-seed (reseedWindow) won't overwrite
+// it for the same model. A non-positive window is ignored: top-level result messages
+// always carry the primary model's window, but a subagent result that slipped past the
+// parent_tool_use_id guard would not, and must not clear the real window.
+func (s *contextUsageSnapshot) adoptResultWindow(model string, cw int64) {
+	if cw <= 0 {
+		return
+	}
+	s.mu.Lock()
+	defer s.mu.Unlock()
+	s.ContextWindow = cw
+	s.windowModel = model
+}
+
+// buildBroadcast assembles the context_usage broadcast payload from the current
+// snapshot and reports whether it should be sent. It returns (nil, false) when no
+// token usage has been recorded yet, or when the 10s debounce window has not elapsed
+// for a non-result message; a result message always broadcasts. When it decides to
+// broadcast it stamps LastBroadcast and includes context_window only when known
+// (> 0), matching the "omit when unknown" contract reseedWindow/adoptResultWindow
+// maintain. Takes s.mu, so the caller must not already hold it. now is passed in so
+// the debounce is testable without a real clock.
+func (s *contextUsageSnapshot) buildBroadcast(msgType string, now time.Time) (map[string]interface{}, bool) {
+	s.mu.Lock()
+	defer s.mu.Unlock()
+	hasUsage := s.InputTokens > 0 || s.OutputTokens > 0 ||
+		s.CacheCreationInputTokens > 0 || s.CacheReadInputTokens > 0
+	if !hasUsage {
+		return nil, false
+	}
+	shouldBroadcast := msgType == claudeMsgTypeResult ||
+		now.Sub(s.LastBroadcast) >= 10*time.Second
+	if !shouldBroadcast {
+		return nil, false
+	}
+	s.LastBroadcast = now
+	usageMap := map[string]interface{}{
+		"input_tokens": s.InputTokens,
+		"output_tokens": s.OutputTokens,
+		"cache_creation_input_tokens": s.CacheCreationInputTokens,
+		"cache_read_input_tokens": s.CacheReadInputTokens,
+	}
+	if s.ContextWindow > 0 {
+		usageMap["context_window"] = s.ContextWindow
+	}
+	return usageMap, true
 }

 // HandleOutput processes a single NDJSON line from Claude Code.
@@ -627,7 +701,39 @@ func (a *ClaudeCodeAgent) extractAndBroadcastUsage(env *messageEnvelope, msgType
 		info["total_cost_usd"] = *env.CostUSD
 	}

+	// Snapshot a.model and the effort resolver under a.mu in one acquisition: this
+	// runs on the readOutputLoop goroutine while refreshSettingsFromAgent may
+	// concurrently rewrite a.model under the same lock. a.availableModels (which the
+	// resolver captures) is written once at startup -- before any assistant/result
+	// output arrives (those need a user turn) -- and never mutated. The startup write
+	// happens-before this read through the a.mu release/acquire pair: the startup
+	// goroutine LOCKS a.mu in refreshSettingsFromAgent (which runs after the field
+	// write) and the Lock below re-acquires it; the field need not be WRITTEN under the
+	// lock, only released-then-acquired across goroutines. So the resolver needs the
+	// lock only for the a.model read it pairs with here. The catalog entries are
+	// immutable shared data, so the window lookup is safe to compute after unlocking.
+	a.mu.Lock()
+	model := a.model
+	resolver := a.effortResolver()
+	a.mu.Unlock()
+	// The catalog window is an ESTIMATE inferred from the model id ("[1m]" => 1M, else
+	// 200K). resolver.contextWindow resolves it over the dynamic catalog with the static
+	// fallback -- the same dynamic-first-then-fallback the effort lookups use -- so a
+	// model the live CLI dropped from its list but a resumed session is still running
+	// keeps its known window instead of going dark. It is 0 only when the model has no
+	// known window in EITHER catalog: the unresolved account-default sentinel (its entry
+	// is a placeholder), or a model absent from both lists. We deliberately do NOT
+	// fabricate a window then -- 0 means "unknown" and the broadcast omits context_window
+	// below, matching the frontend, which likewise shows no window when it can't resolve
+	// one. For a concrete model absent from both catalogs, a result message's modelUsage
+	// supplies the real window once one arrives (findPrimaryContextWindow matches its
+	// concrete key). The unresolved sentinel ("default") can't: it matches no concrete
+	// usage key, so it stays unknown until the model resolves off the sentinel (a later
+	// refreshSettingsFromAgent), whose model change re-seeds the window here.
+	contextWindow := resolver.contextWindow(model)
+
 	snapshot := a.getOrCreateUsageSnapshot()
+	snapshot.reseedWindow(model, contextWindow)

 	if msgType == claudeMsgTypeAssistant && env.Message.Usage != nil {
 		u := env.Message.Usage
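The interplay between the catalog estimate (`reseedWindow`) and the authoritative result value (`adoptResultWindow`) is easiest to see as a sequence of calls. The sketch below isolates just those two rules; `windowSketch` and its method names are hypothetical, and the concrete numbers are illustrative rather than taken from the commit.

```go
package main

import "sync"

// windowSketch tracks a context window together with the model it was
// derived for, mirroring the ContextWindow/windowModel pairing.
type windowSketch struct {
	mu     sync.Mutex
	window int64
	model  string
}

// reseed applies the catalog estimate only when the model has changed.
// An estimate of 0 is applied too, which clears a stale window.
func (w *windowSketch) reseed(model string, estimate int64) {
	w.mu.Lock()
	defer w.mu.Unlock()
	if w.model != model {
		w.window = estimate
		w.model = model
	}
}

// adopt records a window reported by a result message and stamps the
// model, so a later reseed for the same model leaves it untouched.
// Non-positive values are ignored.
func (w *windowSketch) adopt(model string, cw int64) {
	if cw <= 0 {
		return
	}
	w.mu.Lock()
	defer w.mu.Unlock()
	w.window = cw
	w.model = model
}

// current returns the window as it would be broadcast (0 = unknown).
func (w *windowSketch) current() int64 {
	w.mu.Lock()
	defer w.mu.Unlock()
	return w.window
}
```

For example, after `reseed("opus", 200_000)` and `adopt("opus", 1_000_000)`, a repeated `reseed("opus", 200_000)` keeps the adopted 1,000,000, whereas `reseed("mystery", 0)` resets the window to unknown.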
@@ -640,104 +746,67 @@ func (a *ClaudeCodeAgent) extractAndBroadcastUsage(env *messageEnvelope, msgType
 	}

 	if msgType == claudeMsgTypeResult && env.ModelUsage != nil {
-		// Find the context window for the primary model in the usage map.
-		// Top-level result messages include cumulative session-level usage
-		// that always contains the primary model's entry. Subagent results
-		// (if they bypass the outer parent_tool_use_id guard) only contain
-		// the subagent's model and will not match the primary, so we skip
-		// the update to avoid overwriting with a smaller context window.
-		if cw := findPrimaryContextWindow(env.ModelUsage, a.model); cw > 0 {
-			snapshot.mu.Lock()
-			snapshot.ContextWindow = cw
-			snapshot.mu.Unlock()
-		}
+		// Find the context window for the primary model in the usage map. Top-level
+		// result messages include cumulative session-level usage that always contains
+		// the primary model's entry. Subagent results (if they bypass the outer
+		// parent_tool_use_id guard) only contain the subagent's model and will not
+		// match the primary; findPrimaryContextWindow returns 0 for that, and
+		// adoptResultWindow ignores it rather than overwriting with a smaller window.
+		snapshot.adoptResultWindow(model, findPrimaryContextWindow(env.ModelUsage, model))
 	}

-	snapshot.mu.Lock()
-	hasUsage := snapshot.InputTokens > 0 || snapshot.OutputTokens > 0 ||
-		snapshot.CacheCreationInputTokens > 0 || snapshot.CacheReadInputTokens > 0
-	if hasUsage {
-		now := time.Now()
-		shouldBroadcast := msgType == claudeMsgTypeResult ||
-			now.Sub(snapshot.LastBroadcast) >= 10*time.Second
-		if shouldBroadcast {
-			snapshot.LastBroadcast = now
-			usageMap := map[string]interface{}{
-				"input_tokens": snapshot.InputTokens,
-				"output_tokens": snapshot.OutputTokens,
-				"cache_creation_input_tokens": snapshot.CacheCreationInputTokens,
-				"cache_read_input_tokens": snapshot.CacheReadInputTokens,
-			}
-			if snapshot.ContextWindow > 0 {
-				usageMap["context_window"] = snapshot.ContextWindow
-			}
-			info["context_usage"] = usageMap
-		}
+	if usageMap, ok := snapshot.buildBroadcast(msgType, time.Now()); ok {
+		info["context_usage"] = usageMap
 	}
-	snapshot.mu.Unlock()

 	if len(info) > 0 {
 		a.sink.BroadcastSessionInfo(info)
 	}
 }

+// getOrCreateUsageSnapshot returns the usage snapshot, creating an empty one on
+// first use. The window is NOT seeded here: every caller calls reseedWindow
+// immediately afterward, which is the single source of the estimated window (it
+// also stamps windowModel, which a constructor seed cannot). a.contextUsage is only
+// ever touched from the readOutputLoop goroutine, so it needs no lock of its own;
+// the snapshot's own fields are guarded by snapshot.mu.
 func (a *ClaudeCodeAgent) getOrCreateUsageSnapshot() *contextUsageSnapshot {
 	if a.contextUsage == nil {
-		a.contextUsage = &contextUsageSnapshot{
-			ContextWindow: modelContextWindow(claudeCodeAvailableModels, a.model),
-		}
+		a.contextUsage = &contextUsageSnapshot{}
 	}
 	return a.contextUsage
 }

 // modelContextWindow looks up the context window for a model ID from a list
-// of available models. Returns 0 if the model is not found.
+// of available models. Returns 0 if the model is not found. Delegates to
+// FindAvailableModel so the nil-entry guard and id match live in one place
+// rather than a fourth hand-copied catalog walk.
 func modelContextWindow(models []*leapmuxv1.AvailableModel, modelID string) int64 {
-	for _, m := range models {
-		if m.Id == modelID {
-			return m.ContextWindow
-		}
+	if m := FindAvailableModel(models, modelID); m != nil {
+		return m.ContextWindow
 	}
 	return 0
 }

-// findPrimaryContextWindow extracts the context window for the primary model
-// from a modelUsage map. The modelUsage keys are full API model IDs (e.g.
-// "claude-opus-4-6[1m]") while shortModelID uses the short form (e.g.
-// "opus[1m]"). Returns 0 if the primary model is not found.
+// findPrimaryContextWindow extracts the context window for the primary model from a
+// modelUsage map. The modelUsage keys are full API model IDs (e.g.
+// "claude-opus-4-6[1m]") while shortModelID is the short alias (e.g. "opus[1m]").
+// Each key is collapsed into the alias space with normalizeClaudeCodeModel -- the
+// same normalization a.model and the catalog ids use -- and compared for EQUALITY,
+// so the match is exact rather than a substring scan: "opus" no longer matches an
+// unrelated "claude-opusplus-1" key, the "[1m]" variant is disambiguated by the
+// normalized suffix, and a "[1M]" spelling is handled (normalize lowercases). Returns
+// 0 if the primary model is not found.
 func findPrimaryContextWindow(modelUsage map[string]json.RawMessage, shortModelID string) int64 {
 	if shortModelID == "" {
-		// No primary model configured fall back to max across all models.
+		// No primary model configured -- fall back to max across all models.
 		return maxContextWindow(modelUsage)
 	}
-
-	// Extract the family prefix and optional variant suffix from the short
-	// model ID (e.g. "opus[1m]" → family "opus", suffix "[1m]").
-	family := shortModelID
-	suffix := ""
-	if idx := strings.Index(shortModelID, "["); idx >= 0 {
-		family = shortModelID[:idx]
-		suffix = shortModelID[idx:]
-	}
-
+	want := normalizeClaudeCodeModel(shortModelID)
 	for key, raw := range modelUsage {
-		if !strings.Contains(key, family) {
+		if normalizeClaudeCodeModel(key) != want {
 			continue
 		}
-		// When the short ID has a variant suffix (e.g. "[1m]"), the full
-		// API ID must also contain it. When there is no suffix, reject
-		// keys that contain a bracket-variant so "opus" does not match
-		// "claude-opus-4-6[1m]".
-		if suffix != "" {
-			if !strings.Contains(key, suffix) {
-				continue
-			}
-		} else {
-			if strings.Contains(key, "[") {
-				continue
-			}
-		}
-
 		if cw := contextWindowOf(raw); cw > 0 {
 			return cw
 		}
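The difference between the old substring scan and the new normalized-equality match in `findPrimaryContextWindow` can be shown with a small stand-alone comparison. This is a sketch under assumptions: `aliasOfSketch` is a simplified guess at what `normalizeClaudeCodeModel` does (the real function is not shown on this page), and the usage map holds plain integers instead of `json.RawMessage` values.

```go
package main

import "strings"

// aliasOfSketch collapses an id such as "claude-opus-4-6[1m]" to "opus[1m]":
// lowercase, keep any bracket variant, and take the first purely alphabetic
// token that is not the "claude" prefix. Hypothetical helper.
func aliasOfSketch(id string) string {
	lower := strings.ToLower(id)
	base, suffix := lower, ""
	if i := strings.Index(lower, "["); i >= 0 {
		base, suffix = lower[:i], lower[i:]
	}
	for _, tok := range strings.Split(base, "-") {
		if tok == "" || tok == "claude" || strings.Trim(tok, "abcdefghijklmnopqrstuvwxyz") != "" {
			continue
		}
		return tok + suffix
	}
	return lower
}

// findWindowSketch returns the window of the usage key whose alias EQUALS
// the alias of shortID, or 0 when no key matches.
func findWindowSketch(usage map[string]int64, shortID string) int64 {
	want := aliasOfSketch(shortID)
	for key, cw := range usage {
		if aliasOfSketch(key) == want && cw > 0 {
			return cw
		}
	}
	return 0
}

// substringMatchSketch reproduces the failure mode of a plain substring
// test: any key containing the family name is accepted.
func substringMatchSketch(key, family string) bool {
	return strings.Contains(key, family)
}
```

With keys `claude-opusplus-1`, `claude-opus-4-6`, and `claude-opus-4-6[1m]` in the map, `substringMatchSketch` accepts all three for `"opus"`, while `findWindowSketch` selects only the `claude-opus-4-6` entry for `"opus"` and only the `[1m]` entry for `"opus[1m]"`.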
