Budget and Cost Management
Monitor, control, and optimize the cost of every Claude agent query with per-request budgets, token tracking, cache metrics, and model selection strategies.
Every query to Claude has a cost determined by several factors:
| Factor | Impact |
|---|---|
| Model | Larger models (Opus) cost more per token than smaller models (Haiku) |
| Input tokens | The prompt, system prompt, and context sent to Claude |
| Output tokens | Claude's response text, tool calls, and reasoning |
| Thinking tokens | Extended thinking uses additional output tokens |
| Cache tokens | Cached input tokens are cheaper than fresh input tokens |
| Turns | More turns mean more API calls, each with its own token cost |
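To see how the factors in the table combine, here is a minimal sketch of a per-query cost estimate. The per-million-token rates are placeholder values chosen for illustration, not real pricing, and `estimateCostUsd()` is a hypothetical helper that is not part of the SDK.

```php
<?php
// Placeholder USD prices per one million tokens. These are NOT real rates;
// substitute the published pricing for the model you actually use.
const EXAMPLE_RATES = [
    'input'          => 2.00,
    'output'         => 10.00, // thinking tokens are counted as output tokens
    'cache_read'     => 0.20,
    'cache_creation' => 2.50,
];

/**
 * Estimate the cost of one query from its token counts.
 *
 * @param array<string, int>   $tokens Token counts keyed like EXAMPLE_RATES.
 * @param array<string, float> $rates  USD per one million tokens.
 */
function estimateCostUsd(array $tokens, array $rates = EXAMPLE_RATES): float
{
    $total = 0.0;
    foreach ($rates as $kind => $perMillion) {
        // Missing token kinds count as zero.
        $total += ($tokens[$kind] ?? 0) / 1_000_000 * $perMillion;
    }
    return $total;
}

$estimate = estimateCostUsd([
    'input'          => 2_000,
    'output'         => 1_000,
    'cache_read'     => 10_000,
    'cache_creation' => 0,
]);

printf('Estimated cost: $%.4f' . PHP_EOL, $estimate);
```

With these placeholder rates, the 10,000 cache-read tokens cost less than the 2,000 fresh input tokens, which is why caching matters for long, repeated prompts.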
The SDK gives you tools to set hard limits, read detailed cost breakdowns, and apply strategies that reduce spend without sacrificing quality.
Use maxBudgetUsd() to set a hard spending cap on a single query. The Claude CLI will stop processing if the budget is exceeded, returning whatever results have been gathered so far.
use ClaudeAgentSDK\Facades\ClaudeAgent;
use ClaudeAgentSDK\Options\ClaudeAgentOptions;
$options = ClaudeAgentOptions::make()
    ->maxBudgetUsd(2.00)
    ->tools(['Read', 'Grep', 'Glob']);
$result = ClaudeAgent::query('Analyze the entire codebase for security issues', $options);
if ($result->isSuccess()) {
    echo $result->text();
}
echo "Cost: \${$result->costUsd()}";

Warning: When the budget is exceeded, Claude stops mid-task. The result may be incomplete. Always check isSuccess() and handle partial results gracefully.
Set a global budget cap in your configuration file so every query has a safety net, even when individual queries do not specify one:
// config/claude-agent.php
'max_budget_usd' => env('CLAUDE_AGENT_MAX_BUDGET_USD', null),

# .env
CLAUDE_AGENT_MAX_BUDGET_USD=5.00

A per-query maxBudgetUsd() call overrides the config default for that specific query.
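The precedence described above can be sketched in plain PHP. Both functions are hypothetical helpers written for illustration; how the SDK parses the environment value and resolves the budget internally is not shown in this document and may differ.

```php
<?php
/**
 * Parse a budget value as it might arrive from the environment.
 * Hypothetical helper: returns null when the value is missing or not numeric.
 */
function parseBudgetUsd(?string $raw): ?float
{
    if ($raw === null || !is_numeric(trim($raw))) {
        return null;
    }
    return (float) trim($raw);
}

/**
 * Pick the budget that applies to one query: the per-query value wins,
 * then the configured default, and null means no cap at all.
 */
function effectiveBudgetUsd(?float $perQuery, ?float $configDefault): ?float
{
    return $perQuery ?? $configDefault;
}

$configDefault = parseBudgetUsd('5.00');             // value as written in .env

var_dump(effectiveBudgetUsd(2.00, $configDefault));  // per-query cap applies
var_dump(effectiveBudgetUsd(null, $configDefault));  // config default applies
```

Treating null as "no cap" mirrors the config default of null shown above, so a missing environment variable leaves queries unbounded unless they set their own budget.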
Each turn is a round-trip between Claude and the CLI (prompt, response, tool use, tool result). Limiting turns caps how many iterations Claude can perform, which indirectly limits cost:
$options = ClaudeAgentOptions::make()
    ->maxTurns(5)
    ->tools(['Read', 'Edit', 'Bash']);
$result = ClaudeAgent::query('Fix the failing test in UserTest.php', $options);
echo "Turns used: {$result->turns()}";

Tip: For simple read-only queries, maxTurns(3) is usually sufficient. For complex multi-file edits, you may need 10-20 turns.
Every QueryResult includes cost and usage information.
$result = ClaudeAgent::query('Explain the User model', $options);
echo "Cost: \${$result->costUsd()}"; // e.g. 0.0234
echo "Turns: {$result->turns()}"; // e.g. 3

When Claude uses multiple models (e.g., a primary model plus a subagent model), modelUsage() returns a breakdown keyed by model identifier:
use ClaudeAgentSDK\Data\ModelUsage;
$result = ClaudeAgent::query('Review and test the auth module', $options);
foreach ($result->modelUsage() as $modelId => $usage) {
    /** @var ModelUsage $usage */
    echo "{$modelId}:\n";
    echo " Input tokens: {$usage->inputTokens}\n";
    echo " Output tokens: {$usage->outputTokens}\n";
    echo " Cache read tokens: {$usage->cacheReadInputTokens}\n";
    echo " Cache creation tokens: {$usage->cacheCreationInputTokens}\n";
    echo " Web search requests: {$usage->webSearchRequests}\n";
    echo " Cost: \${$usage->costUsd}\n";
    echo " Context window: {$usage->contextWindow}\n";
    echo " Total input tokens: {$usage->totalInputTokens()}\n";
    echo " Cache hit rate: " . round($usage->cacheHitRate() * 100, 1) . "%\n";
}

Convenience methods sum cache tokens across all models:
echo "Total cache read tokens: {$result->cacheReadTokens()}";
echo "Total cache creation tokens: {$result->cacheCreationTokens()}";

The ModelUsage class breaks usage down into five counters (four token categories plus web search requests):
| Property | Description |
|---|---|
| $inputTokens | Fresh (non-cached) input tokens sent to the model |
| $outputTokens | Tokens generated by the model (response text + tool calls) |
| $cacheReadInputTokens | Input tokens served from cache (cheaper than fresh) |
| $cacheCreationInputTokens | Input tokens written to cache for future reuse |
| $webSearchRequests | Number of web searches performed |
Two computed methods provide higher-level insights:
| Method | Returns | Description |
|---|---|---|
| totalInputTokens() | int | Sum of inputTokens + cacheReadInputTokens + cacheCreationInputTokens |
| cacheHitRate() | float | Ratio of cache-read tokens to total input tokens (0.0 to 1.0) |
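The two definitions in the table can be written out as a small stand-alone class. `UsageSketch` is a hypothetical stand-in for illustration, not the SDK's ModelUsage class; in particular, returning 0.0 when there are no input tokens is an assumption about how the division-by-zero case might be handled.

```php
<?php
// Hypothetical stand-in mirroring the documented ModelUsage properties.
final class UsageSketch
{
    public function __construct(
        public int $inputTokens = 0,
        public int $cacheReadInputTokens = 0,
        public int $cacheCreationInputTokens = 0,
    ) {
    }

    /** Fresh + cache-read + cache-creation input tokens. */
    public function totalInputTokens(): int
    {
        return $this->inputTokens
            + $this->cacheReadInputTokens
            + $this->cacheCreationInputTokens;
    }

    /** Share of input tokens served from cache, between 0.0 and 1.0. */
    public function cacheHitRate(): float
    {
        $total = $this->totalInputTokens();

        // Assumption: report 0.0 rather than dividing by zero.
        return $total > 0 ? $this->cacheReadInputTokens / $total : 0.0;
    }
}

$usage = new UsageSketch(
    inputTokens: 200,
    cacheReadInputTokens: 700,
    cacheCreationInputTokens: 100,
);

printf(
    'Total input: %d, cache hit rate: %.1f%%' . PHP_EOL,
    $usage->totalInputTokens(),
    $usage->cacheHitRate() * 100
);
```

Note that cache-creation tokens count toward the denominator, so a query that writes a large prefix to the cache for the first time will show a low hit rate even though later queries benefit from it.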
Cache-read tokens are input tokens that the API has seen recently and can serve at a reduced cost, while cache-creation tokens are input tokens written to the cache for later reuse. A high cache hit rate means you are paying less per input token on average.
foreach ($result->modelUsage() as $modelId => $usage) {
    $rate = $usage->cacheHitRate();
    if ($rate > 0.5) {
        echo "{$modelId}: Good cache hit rate (" . round($rate * 100) . "%)\n";
    } else {
        echo "{$modelId}: Low cache hit rate (" . round($rate * 100) . "%) -- consider optimizing\n";
    }
}

Strategies to improve cache hit rates:
- Consistent system prompts -- Use the same system prompt across queries so the prompt prefix stays cached.
- Session resumption -- Resume existing sessions with resume($sessionId) instead of starting fresh. See Session Management.
- Stable tool configurations -- Keep your tools() and mcpServer() setup consistent across related queries.
Choose the right model for the task. Smaller models cost significantly less per token:
// Simple classification or extraction -- use a smaller model
$options = ClaudeAgentOptions::make()
    ->model('haiku')
    ->maxTurns(1);
$result = ClaudeAgent::query('Classify this support ticket: ' . $ticket->body, $options);

Use fallbackModel() to automatically downgrade when the primary model is unavailable or rate-limited:
$options = ClaudeAgentOptions::make()
    ->model('sonnet')
    ->fallbackModel('haiku');

Every tool call triggers an additional turn, and Claude must reason about which tools to use. Fewer available tools means fewer turns and less reasoning overhead:
// Read-only analysis -- don't give write tools
$options = ClaudeAgentOptions::make()
    ->tools(['Read', 'Grep', 'Glob']);
// Full edit workflow -- allow writes but cap turns
$options = ClaudeAgentOptions::make()
    ->tools(['Read', 'Edit', 'Bash'])
    ->maxTurns(10);

Set appropriate turn limits based on your use case:
| Use case | Suggested maxTurns |
|---|---|
| Single question, no tools | 1 |
| Read-only code analysis | 3-5 |
| Single file edit | 5-10 |
| Multi-file refactor | 10-20 |
| Complex multi-step task | 20-30 |
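One way to keep these suggestions in a single place is a small lookup helper. This is only a sketch: the use-case keys and the `suggestedMaxTurns()` function are hypothetical names, and the values are the upper bounds from the table above.

```php
<?php
// Upper bounds from the table above; the array keys are hypothetical labels.
const SUGGESTED_MAX_TURNS = [
    'single_question'     => 1,
    'read_only_analysis'  => 5,
    'single_file_edit'    => 10,
    'multi_file_refactor' => 20,
    'complex_multi_step'  => 30,
];

/**
 * Look up a turn limit for a use case, falling back to a default
 * when the use case is not in the table.
 */
function suggestedMaxTurns(string $useCase, int $default = 10): int
{
    return SUGGESTED_MAX_TURNS[$useCase] ?? $default;
}

printf('Read-only analysis: %d turns' . PHP_EOL, suggestedMaxTurns('read_only_analysis'));
```

The returned value could then be passed to maxTurns() when building options, so turn limits are tuned in one place rather than scattered across call sites.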
Extended thinking improves quality for complex reasoning but consumes additional output tokens. Cap thinking tokens when the task is straightforward:
$options = ClaudeAgentOptions::make()
    ->maxThinkingTokens(5000); // Limit thinking overhead

Note: Setting thinking tokens too low on complex tasks may reduce answer quality. Balance cost savings against the quality requirements of each use case.
When using Subagents, assign cheaper models to delegated tasks while keeping the primary agent on a more capable model:
use ClaudeAgentSDK\Agents\AgentDefinition;
use ClaudeAgentSDK\Options\ClaudeAgentOptions;
$options = ClaudeAgentOptions::make()
    ->model('sonnet')
    ->tools(['Read', 'Grep', 'Task'])
    ->agent('summarizer', new AgentDefinition(
        description: 'Summarizes code files',
        prompt: 'Summarize the given code file in 2-3 sentences.',
        tools: ['Read'],
        model: 'haiku', // Cheaper model for simple summarization
    ));

Track costs in your application logs for visibility and auditing:
$result = ClaudeAgent::query($prompt, $options);
logger()->info('Claude agent query completed', [
    'cost_usd' => $result->costUsd(),
    'turns' => $result->turns(),
    'duration_ms' => $result->durationMs(),
    'success' => $result->isSuccess(),
    'session_id' => $result->sessionId,
    'cache_read' => $result->cacheReadTokens(),
    'cache_creation' => $result->cacheCreationTokens(),
]);

Store query costs in your database for aggregation and reporting:
use App\Models\AgentQueryLog;
$result = ClaudeAgent::query($prompt, $options);
AgentQueryLog::create([
    'user_id' => auth()->id(),
    'prompt' => $prompt,
    'cost_usd' => $result->costUsd(),
    'turns' => $result->turns(),
    'duration_ms' => $result->durationMs(),
    'model_usage' => collect($result->modelUsage())->map(fn ($u) => [
        'input_tokens' => $u->inputTokens,
        'output_tokens' => $u->outputTokens,
        'cache_hit_rate' => $u->cacheHitRate(),
        'cost_usd' => $u->costUsd,
    ])->toArray(),
    'success' => $result->isSuccess(),
]);

Use a Laravel event listener to alert when individual queries exceed a threshold:
// app/Listeners/MonitorAgentCost.php
namespace App\Listeners;
use App\Events\AgentQueryCompleted;
use Illuminate\Support\Facades\Log;
use Illuminate\Support\Facades\Notification;
use App\Notifications\HighCostAgentAlert;
class MonitorAgentCost
{
    public function handle(AgentQueryCompleted $event): void
    {
        $cost = $event->result->costUsd();
        if ($cost > 5.00) {
            Log::warning('High-cost agent query detected', [
                'cost_usd' => $cost,
                'user_id' => $event->userId,
                'turns' => $event->result->turns(),
            ]);
            Notification::route('slack', config('services.slack.alerts_webhook'))
                ->notify(new HighCostAgentAlert($cost, $event->userId));
        }
    }
}

Tip: Combine budget limits (maxBudgetUsd) as a hard cap with monitoring and alerting as a soft warning system. The budget prevents runaway costs, while alerts help you tune your configurations over time.
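Per-query alerts do not catch many small queries adding up. As a rough sketch of the soft-warning side, the following plain-PHP helpers total logged costs per user and flag anyone over a threshold. The row shape (user_id, cost_usd) follows the database logging example above; the function names and the sample data are hypothetical.

```php
<?php
/**
 * Sum logged query costs per user.
 *
 * @param list<array{user_id: int, cost_usd: float}> $rows
 * @return array<int, float> Total cost in USD keyed by user ID.
 */
function totalCostByUser(array $rows): array
{
    $totals = [];
    foreach ($rows as $row) {
        $userId = $row['user_id'];
        $totals[$userId] = ($totals[$userId] ?? 0.0) + $row['cost_usd'];
    }
    return $totals;
}

/**
 * Return the IDs of users whose total spend exceeds the threshold.
 *
 * @param array<int, float> $totals
 * @return list<int>
 */
function usersOverThreshold(array $totals, float $thresholdUsd): array
{
    return array_keys(
        array_filter($totals, fn (float $total): bool => $total > $thresholdUsd)
    );
}

// Illustrative rows, as they might be read back from the query log table.
$rows = [
    ['user_id' => 1, 'cost_usd' => 2.50],
    ['user_id' => 2, 'cost_usd' => 0.25],
    ['user_id' => 1, 'cost_usd' => 3.00],
];

foreach (totalCostByUser($rows) as $userId => $total) {
    printf('User %d spent $%.2f' . PHP_EOL, $userId, $total);
}
```

In a real application the aggregation would more likely be a grouped database query, but the same threshold check applies to its result.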
- Configuration -- Set global defaults for budgets, models, and timeouts
- Options Reference -- Full list of all fluent options including cost-related methods
- Streaming -- Monitor costs in real time during streaming queries
- Subagents -- Assign cost-effective models to delegated tasks
- Session Management -- Resume sessions to benefit from token caching