
Budget and Cost Management

github-actions[bot] edited this page Feb 26, 2026 · 1 revision


Monitor, control, and optimize the cost of every Claude agent query with per-request budgets, token tracking, cache metrics, and model selection strategies.

Overview

Every query to Claude has a cost determined by several factors:

| Factor | Impact |
|---|---|
| Model | Larger models (Opus) cost more per token than smaller models (Haiku) |
| Input tokens | The prompt, system prompt, and context sent to Claude |
| Output tokens | Claude's response text, tool calls, and reasoning |
| Thinking tokens | Extended thinking is billed as additional output tokens |
| Cache tokens | Cache reads cost much less than fresh input tokens; cache writes cost somewhat more |
| Turns | More turns mean more API calls, and each call re-sends the conversation so far as input |
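To see how these factors combine, here is a rough per-call estimate written as plain PHP. It is a sketch, not an SDK API: the function name is hypothetical and the per-million-token prices are caller-supplied placeholders, so check current pricing for your model. A multi-turn query costs, roughly, the sum of such calls.

```php
<?php

/**
 * Rough cost estimate for one API call (illustrative sketch, hypothetical helper).
 * Prices are per million tokens and are caller-supplied assumptions.
 * Thinking tokens are billed at the output rate, so they are added to output.
 * If your output count already includes thinking tokens, pass 0 for $thinkingTokens.
 */
function estimateCallCostUsd(
    int $inputTokens,
    int $outputTokens,
    int $thinkingTokens,
    float $inputPricePerMTok,
    float $outputPricePerMTok
): float {
    $inputCost  = $inputTokens * $inputPricePerMTok / 1_000_000;
    $outputCost = ($outputTokens + $thinkingTokens) * $outputPricePerMTok / 1_000_000;

    return $inputCost + $outputCost;
}

// Hypothetical prices: $3 per million input tokens, $15 per million output tokens.
$cost = estimateCallCostUsd(10_000, 1_000, 2_000, 3.0, 15.0);
printf("Estimated cost: $%.4f\n", $cost);
```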

The SDK gives you tools to set hard limits, read detailed cost breakdowns, and apply strategies that reduce spend without sacrificing quality.

Setting Budget Limits

Per-Query Budget

Use maxBudgetUsd() to set a hard spending cap on a single query. The Claude CLI will stop processing if the budget is exceeded, returning whatever results have been gathered so far.

use ClaudeAgentSDK\Facades\ClaudeAgent;
use ClaudeAgentSDK\Options\ClaudeAgentOptions;

$options = ClaudeAgentOptions::make()
    ->maxBudgetUsd(2.00)
    ->tools(['Read', 'Grep', 'Glob']);

$result = ClaudeAgent::query('Analyze the entire codebase for security issues', $options);

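if ($result->isSuccess()) {
    echo $result->text();
} else {
    // The query stopped early (for example, because the budget was reached);
    // treat any output as partial.
    echo "Query did not complete successfully.\n";
}

echo "Cost: \${$result->costUsd()}\n";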

Warning: When the budget is exceeded, Claude stops mid-task. The result may be incomplete. Always check isSuccess() and handle partial results gracefully.

Config-Level Default

Set a global budget cap in your configuration file so every query has a safety net, even when individual queries do not specify one:

// config/claude-agent.php
'max_budget_usd' => env('CLAUDE_AGENT_MAX_BUDGET_USD', null),

# .env
CLAUDE_AGENT_MAX_BUDGET_USD=5.00

A per-query maxBudgetUsd() call overrides the config default for that specific query.
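The precedence can be expressed as a small resolver. This is a hypothetical helper that mirrors the documented behavior (per-query value first, then the config default, otherwise no cap); it is not the SDK's internal code, and $envValue stands in for the raw CLAUDE_AGENT_MAX_BUDGET_USD string.

```php
<?php

/**
 * Hypothetical resolver mirroring the documented precedence:
 * per-query maxBudgetUsd() > config default > no cap (null).
 * $envValue stands in for the raw CLAUDE_AGENT_MAX_BUDGET_USD string.
 */
function effectiveBudgetUsd(?float $perQuery, ?string $envValue): ?float
{
    if ($perQuery !== null) {
        return $perQuery;
    }

    // An unset or empty env var means "no default cap".
    if ($envValue === null || trim($envValue) === '') {
        return null;
    }

    return (float) $envValue;
}

var_dump(effectiveBudgetUsd(2.00, '5.00')); // per-query value wins
var_dump(effectiveBudgetUsd(null, '5.00')); // config default applies
var_dump(effectiveBudgetUsd(null, null));   // no cap
```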

Turn Limits as Indirect Cost Control

Each turn is a round-trip between Claude and the CLI (prompt, response, tool use, tool result). Limiting turns caps how many iterations Claude can perform, which indirectly limits cost:

$options = ClaudeAgentOptions::make()
    ->maxTurns(5)
    ->tools(['Read', 'Edit', 'Bash']);

$result = ClaudeAgent::query('Fix the failing test in UserTest.php', $options);
echo "Turns used: {$result->turns()}";

Tip: For simple read-only queries, maxTurns(3) is usually sufficient. For complex multi-file edits, you may need 10-20 turns.
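Turn count can drive cost faster than linearly because each API call re-sends the conversation so far as input. The sketch below models this with a fixed starting context and a fixed number of tokens added per turn; both numbers are illustrative assumptions, real growth depends on tool output sizes, and prompt caching lowers the price of the repeated prefix.

```php
<?php

/**
 * Illustrative model (hypothetical helper): total input tokens across $turns
 * calls when the context starts at $initialTokens and grows by $growthPerTurn.
 * Turn i (0-based) sends $initialTokens + i * $growthPerTurn tokens.
 */
function cumulativeInputTokens(int $initialTokens, int $growthPerTurn, int $turns): int
{
    $total = 0;
    for ($i = 0; $i < $turns; $i++) {
        $total += $initialTokens + $i * $growthPerTurn;
    }
    return $total;
}

foreach ([3, 5, 10, 20] as $turns) {
    printf("%2d turns: %d input tokens\n", $turns, cumulativeInputTokens(5_000, 2_000, $turns));
}
```

In this model, raising the limit from 10 to 20 turns takes total input from 140,000 to 480,000 tokens, more than three times as much.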

Reading Cost Data

Every QueryResult includes cost and usage information.

Total Cost

$result = ClaudeAgent::query('Explain the User model', $options);

echo "Cost: \${$result->costUsd()}";   // e.g. 0.0234
echo "Turns: {$result->turns()}";       // e.g. 3

Per-Model Breakdown

When Claude uses multiple models (e.g., a primary model plus a subagent model), modelUsage() returns a breakdown keyed by model identifier:

use ClaudeAgentSDK\Data\ModelUsage;

$result = ClaudeAgent::query('Review and test the auth module', $options);

foreach ($result->modelUsage() as $modelId => $usage) {
    /** @var ModelUsage $usage */
    echo "{$modelId}:\n";
    echo "  Input tokens:          {$usage->inputTokens}\n";
    echo "  Output tokens:         {$usage->outputTokens}\n";
    echo "  Cache read tokens:     {$usage->cacheReadInputTokens}\n";
    echo "  Cache creation tokens: {$usage->cacheCreationInputTokens}\n";
    echo "  Web search requests:   {$usage->webSearchRequests}\n";
    echo "  Cost:                  \${$usage->costUsd}\n";
    echo "  Context window:        {$usage->contextWindow}\n";
    echo "  Total input tokens:    {$usage->totalInputTokens()}\n";
    echo "  Cache hit rate:        " . round($usage->cacheHitRate() * 100, 1) . "%\n";
}

Aggregate Cache Metrics

Convenience methods sum cache tokens across all models:

echo "Total cache read tokens:     {$result->cacheReadTokens()}";
echo "Total cache creation tokens: {$result->cacheCreationTokens()}";

Understanding Token Usage

Alongside $costUsd and $contextWindow, the ModelUsage class reports four token counts and a web search counter:

| Property | Description |
|---|---|
| $inputTokens | Fresh (non-cached) input tokens sent to the model |
| $outputTokens | Tokens generated by the model (response text and tool calls) |
| $cacheReadInputTokens | Input tokens served from cache (billed well below the fresh-input rate) |
| $cacheCreationInputTokens | Input tokens written to cache for future reuse (billed at a premium over fresh input) |
| $webSearchRequests | Number of web searches performed |

Two computed methods provide higher-level insights:

| Method | Returns | Description |
|---|---|---|
| totalInputTokens() | int | Sum of inputTokens + cacheReadInputTokens + cacheCreationInputTokens |
| cacheHitRate() | float | Ratio of cache-read tokens to total input tokens (0.0 to 1.0) |
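The two formulas in the table can be written out as plain functions over raw token counts. This is an illustrative sketch rather than the SDK's implementation; in particular, returning 0.0 when there is no input at all is an assumption. The extra combinedCacheHitRate() helper (hypothetical, not part of ModelUsage) shows how to get an overall rate across several models by weighting by tokens instead of averaging per-model rates.

```php
<?php

/**
 * Sketch of the two computed ModelUsage values as plain functions.
 * Returning 0.0 when there is no input is an assumption.
 */
function totalInputTokens(int $input, int $cacheRead, int $cacheCreation): int
{
    return $input + $cacheRead + $cacheCreation;
}

function cacheHitRate(int $input, int $cacheRead, int $cacheCreation): float
{
    $total = totalInputTokens($input, $cacheRead, $cacheCreation);

    return $total > 0 ? $cacheRead / $total : 0.0;
}

/**
 * Hypothetical helper: token-weighted hit rate across several models.
 * Each entry is [input, cacheRead, cacheCreation]. Averaging per-model
 * rates instead would over-weight models that processed few tokens.
 */
function combinedCacheHitRate(array $perModel): float
{
    $read  = 0;
    $total = 0;
    foreach ($perModel as [$input, $cacheRead, $cacheCreation]) {
        $read  += $cacheRead;
        $total += totalInputTokens($input, $cacheRead, $cacheCreation);
    }

    return $total > 0 ? $read / $total : 0.0;
}

echo totalInputTokens(2_000, 18_000, 0), "\n";
printf("%.3f\n", cacheHitRate(2_000, 18_000, 0));
printf("%.3f\n", combinedCacheHitRate([[2_000, 18_000, 0], [1_000, 0, 1_000]]));
```

With these numbers the per-model rates are 90% and 0%, so a plain average would suggest 45%, while the token-weighted rate is about 82%.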

Cache Optimization

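Cache read tokens are input tokens, typically a repeated prompt prefix, that the API has seen recently and can serve at a reduced rate. A high cache hit rate means a larger share of your input is billed at that reduced rate, lowering your average cost per input token.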

foreach ($result->modelUsage() as $modelId => $usage) {
    $rate = $usage->cacheHitRate();

    if ($rate > 0.5) {
        echo "{$modelId}: Good cache hit rate (" . round($rate * 100) . "%)\n";
    } else {
        echo "{$modelId}: Low cache hit rate (" . round($rate * 100) . "%) -- consider optimizing\n";
    }
}

Strategies to improve cache hit rates:

  • Consistent system prompts -- Use the same system prompt across queries so the prompt prefix stays cached.
  • Session resumption -- Resume existing sessions with resume($sessionId) instead of starting fresh. See Session Management.
  • Stable tool configurations -- Keep your tools() and mcpServer() setup consistent across related queries.
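To see why the hit rate matters, the sketch below compares the input-side cost of the same 20,000 input tokens billed three ways. It assumes Anthropic's published multipliers for prompt caching with the default 5-minute lifetime (cache reads at 0.1x and cache writes at 1.25x the base input price); the function name and the $3 per million token base price are hypothetical, so confirm current rates before relying on the numbers.

```php
<?php

// Assumed multipliers relative to the base input price (5-minute cache lifetime).
const CACHE_READ_MULTIPLIER  = 0.1;
const CACHE_WRITE_MULTIPLIER = 1.25;

/**
 * Hypothetical helper: input-side cost in USD for one call, given token
 * counts and a base input price per million tokens.
 */
function inputCostUsd(int $fresh, int $cacheRead, int $cacheWrite, float $basePricePerMTok): float
{
    $weightedTokens = $fresh
        + $cacheRead * CACHE_READ_MULTIPLIER
        + $cacheWrite * CACHE_WRITE_MULTIPLIER;

    return $weightedTokens * $basePricePerMTok / 1_000_000;
}

// Same 20,000 input tokens, billed three ways at a hypothetical $3 per million.
$noCache   = inputCostUsd(20_000, 0, 0, 3.0);     // everything fresh
$firstCall = inputCostUsd(2_000, 0, 18_000, 3.0); // prefix written to cache
$cachedHit = inputCostUsd(2_000, 18_000, 0, 3.0); // prefix read from cache

printf("No cache:    $%.4f\n", $noCache);
printf("Cache write: $%.4f\n", $firstCall);
printf("Cache read:  $%.4f\n", $cachedHit);
```

The call that writes the cache costs a little more than no caching at all, but every later call that reads the same prefix costs a fraction of it, which is why keeping prompts and tool setups stable across related queries pays off.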

Cost Optimization Strategies

Model Selection

Choose the right model for the task. Smaller models cost significantly less per token:

// Simple classification or extraction -- use a smaller model
$options = ClaudeAgentOptions::make()
    ->model('haiku')
    ->maxTurns(1);

$result = ClaudeAgent::query('Classify this support ticket: ' . $ticket->body, $options);

Use fallbackModel() to switch automatically to another model when the primary model is overloaded or otherwise unavailable:

$options = ClaudeAgentOptions::make()
    ->model('sonnet')
    ->fallbackModel('haiku');
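If your application handles several kinds of tasks, a small routing function keeps model choice consistent. The helper and its task categories below are hypothetical (it requires PHP 8 for match); the returned alias would be passed to model().

```php
<?php

/**
 * Hypothetical routing table: pick a model alias by task type.
 * The task categories are illustrative; 'haiku' and 'sonnet' are the
 * aliases used elsewhere on this page.
 */
function modelForTask(string $taskType): string
{
    return match ($taskType) {
        'classification', 'extraction', 'summarization' => 'haiku',
        default => 'sonnet',
    };
}

echo modelForTask('classification'), "\n";
echo modelForTask('multi-file refactor'), "\n";
```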

Tool Restrictions

Each tool call adds a turn, and every additional tool gives Claude more options to reason about. Restricting tools to what the task actually needs tends to reduce both the number of turns and the reasoning overhead:

// Read-only analysis -- don't give write tools
$options = ClaudeAgentOptions::make()
    ->tools(['Read', 'Grep', 'Glob']);

// Full edit workflow -- allow writes but cap turns
$options = ClaudeAgentOptions::make()
    ->tools(['Read', 'Edit', 'Bash'])
    ->maxTurns(10);

Turn Limits

Set appropriate turn limits based on your use case:

| Use case | Suggested maxTurns |
|---|---|
| Single question, no tools | 1 |
| Read-only code analysis | 3-5 |
| Single file edit | 5-10 |
| Multi-file refactor | 10-20 |
| Complex multi-step task | 20-30 |
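If you want these suggestions in code, they fit in a simple lookup. The use-case keys and the fallback default of 5 are hypothetical choices, and each value is the upper end of the range in the table; the result would be passed to maxTurns().

```php
<?php

/**
 * The suggested turn limits as a lookup (upper end of each range).
 * Keys are hypothetical labels; unknown use cases get $default.
 */
const SUGGESTED_MAX_TURNS = [
    'single_question'     => 1,
    'read_only_analysis'  => 5,
    'single_file_edit'    => 10,
    'multi_file_refactor' => 20,
    'complex_multi_step'  => 30,
];

function suggestedMaxTurns(string $useCase, int $default = 5): int
{
    $table = SUGGESTED_MAX_TURNS;

    return $table[$useCase] ?? $default;
}

echo suggestedMaxTurns('single_file_edit'), "\n";
echo suggestedMaxTurns('unknown_task'), "\n";
```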

Thinking Token Limits

Extended thinking improves quality for complex reasoning but consumes additional output tokens. Cap thinking tokens when the task is straightforward:

$options = ClaudeAgentOptions::make()
    ->maxThinkingTokens(5000);  // Limit thinking overhead

Note: Setting thinking tokens too low on complex tasks may reduce answer quality. Balance cost savings against the quality requirements of each use case.
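Because thinking tokens are billed at the output-token rate, a thinking cap gives an approximate ceiling on what thinking can add to each model response. The sketch below computes that ceiling; the function name and the $15 per million output tokens price are hypothetical placeholders. The cap applies per response, so a multi-turn query may incur thinking on more than one turn.

```php
<?php

/**
 * Hypothetical helper: approximate upper bound on thinking cost for one
 * model response, given the thinking-token cap and an output price per
 * million tokens (thinking is billed at the output rate).
 */
function maxThinkingCostUsd(int $maxThinkingTokens, float $outputPricePerMTok): float
{
    return $maxThinkingTokens * $outputPricePerMTok / 1_000_000;
}

// With a hypothetical $15 per million output tokens:
printf("5,000-token cap:  $%.3f per response\n", maxThinkingCostUsd(5_000, 15.0));
printf("32,000-token cap: $%.3f per response\n", maxThinkingCostUsd(32_000, 15.0));
```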

Subagent Model Strategy

When using Subagents, assign cheaper models to delegated tasks while keeping the primary agent on a more capable model:

use ClaudeAgentSDK\Agents\AgentDefinition;
use ClaudeAgentSDK\Options\ClaudeAgentOptions;

$options = ClaudeAgentOptions::make()
    ->model('sonnet')
    ->tools(['Read', 'Grep', 'Task'])
    ->agent('summarizer', new AgentDefinition(
        description: 'Summarizes code files',
        prompt: 'Summarize the given code file in 2-3 sentences.',
        tools: ['Read'],
        model: 'haiku',  // Cheaper model for simple summarization
    ));

Logging and Monitoring

Logging Every Query

Track costs in your application logs for visibility and auditing:

$result = ClaudeAgent::query($prompt, $options);

logger()->info('Claude agent query completed', [
    'cost_usd'       => $result->costUsd(),
    'turns'          => $result->turns(),
    'duration_ms'    => $result->durationMs(),
    'success'        => $result->isSuccess(),
    'session_id'     => $result->sessionId,
    'cache_read'     => $result->cacheReadTokens(),
    'cache_creation' => $result->cacheCreationTokens(),
]);

Building a Cost Dashboard

Store query costs in your database for aggregation and reporting:

use App\Models\AgentQueryLog;
use ClaudeAgentSDK\Facades\ClaudeAgent;

$result = ClaudeAgent::query($prompt, $options);

// Assumes model_usage is a JSON column cast to 'array' on the AgentQueryLog model.
AgentQueryLog::create([
    'user_id'     => auth()->id(),
    'prompt'      => $prompt,
    'cost_usd'    => $result->costUsd(),
    'turns'       => $result->turns(),
    'duration_ms' => $result->durationMs(),
    'model_usage' => collect($result->modelUsage())->map(fn ($u) => [
        'input_tokens'   => $u->inputTokens,
        'output_tokens'  => $u->outputTokens,
        'cache_hit_rate' => $u->cacheHitRate(),
        'cost_usd'       => $u->costUsd,
    ])->toArray(),
    'success'     => $result->isSuccess(),
]);
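Once rows are stored, reporting is ordinary aggregation. In a Laravel app you would typically do this in the database (for example with groupBy() and a SUM aggregate); the plain-PHP sketch below shows the same per-user rollup over hypothetical rows shaped like the AgentQueryLog columns above, so it runs without a database. The costByUser() name is illustrative.

```php
<?php

/**
 * Hypothetical helper: roll up logged queries per user (total cost,
 * query count, failure count). Rows are plain arrays using the
 * column names from the AgentQueryLog example.
 */
function costByUser(array $rows): array
{
    $report = [];
    foreach ($rows as $row) {
        $userId = $row['user_id'];
        $report[$userId] ??= ['queries' => 0, 'cost_usd' => 0.0, 'failures' => 0];

        $report[$userId]['queries']++;
        $report[$userId]['cost_usd'] += $row['cost_usd'];
        if (! $row['success']) {
            $report[$userId]['failures']++;
        }
    }

    // Most expensive users first; uasort keeps the user ID keys.
    uasort($report, fn ($a, $b) => $b['cost_usd'] <=> $a['cost_usd']);

    return $report;
}

$rows = [
    ['user_id' => 1, 'cost_usd' => 0.42, 'success' => true],
    ['user_id' => 2, 'cost_usd' => 1.10, 'success' => false],
    ['user_id' => 1, 'cost_usd' => 0.35, 'success' => true],
];

print_r(costByUser($rows));
```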

Alerting on High Costs

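Use a Laravel event listener to alert when an individual query exceeds a cost threshold. Here AgentQueryCompleted is an event defined in your own application (App\Events), which you dispatch after each query with the result and user ID: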

// app/Listeners/MonitorAgentCost.php
namespace App\Listeners;

use App\Events\AgentQueryCompleted;
use Illuminate\Support\Facades\Log;
use Illuminate\Support\Facades\Notification;
use App\Notifications\HighCostAgentAlert;

class MonitorAgentCost
{
    public function handle(AgentQueryCompleted $event): void
    {
        $cost = $event->result->costUsd();

        if ($cost > 5.00) {
            Log::warning('High-cost agent query detected', [
                'cost_usd' => $cost,
                'user_id'  => $event->userId,
                'turns'    => $event->result->turns(),
            ]);

            Notification::route('slack', config('services.slack.alerts_webhook'))
                ->notify(new HighCostAgentAlert($cost, $event->userId));
        }
    }
}

Tip: Combine budget limits (maxBudgetUsd) as a hard cap with monitoring and alerting as a soft warning system. The budget prevents runaway costs, while alerts help you tune your configurations over time.

Next Steps

  • Configuration -- Set global defaults for budgets, models, and timeouts
  • Options Reference -- Full list of all fluent options including cost-related methods
  • Streaming -- Monitor costs in real time during streaming queries
  • Subagents -- Assign cost-effective models to delegated tasks
  • Session Management -- Resume sessions to benefit from token caching
