Lesson 7.2: Cost Discipline — Haiku Routing and Daily Caps

An AI integration with no cost controls is a liability waiting to express itself. A batch job that loops over 200 listings, a bug that sends the same request in a loop, a staff member clicking a button repeatedly — any of these can generate an unexpected bill. Cost discipline means the system can’t spend more than you’ve decided it should, and the decisions about what runs on which model are explicit.

The situation

The description generation batch job was the first AI feature that could generate significant cost: 166 listings, each requiring one API call. At Haiku pricing, that batch costs about $0.40. At Sonnet pricing, it costs roughly 15 times as much, about $6 at the blended rates in the cost table later in this lesson. Using the wrong model by accident, or having someone add a batch job that calls Sonnet without thinking, was a real risk.

Daily caps eliminate the category of “I didn’t know that was happening” cost overruns. Haiku vs. Sonnet routing ensures the expensive model is reserved for tasks that justify it.
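
The batch figure above is simple arithmetic: calls × tokens per call × rate. A minimal sketch, assuming roughly 3,000 tokens per call (a hypothetical figure chosen to be consistent with the $0.40 estimate, not a measured value) and the blended Haiku rate of $0.80 per 1M tokens from the cost table later in this lesson. The function name is hypothetical, and `acme_` stands in for the `[client]_` prefix:

```php
<?php
// Hypothetical helper: projects the cost of a batch before it runs.
// $rate_per_million is USD per 1M tokens (blended input + output estimate).
function acme_project_batch_cost( int $calls, int $tokens_per_call, float $rate_per_million ): float {
    $total_tokens = $calls * $tokens_per_call;
    return round( ( $total_tokens / 1000000 ) * $rate_per_million, 4 );
}

// 166 listings at an assumed ~3,000 tokens each, on the blended Haiku rate.
echo acme_project_batch_cost( 166, 3000, 0.80 ), "\n"; // prints 0.3984
```

Running the same projection with a more expensive model's rate before shipping a new batch job makes the routing decision explicit instead of accidental.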

What I did

Daily cap tracking

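The cap is checked before every AI call in the central wrapper. Daily spend is held in a transient keyed by task and date, so each day starts on a fresh key; on a cache miss the total is recomputed from the call log. Because the value is cached for an hour, the wrapper has to add each call's cost to the transient (or delete it) after logging, otherwise the check can lag behind real spend: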

PHP

function [client]_ai_budget_available( string $task ): bool {
    $caps = [
        'description'  => 2.00,  // $2/day max on description generation
        'nl_query'     => 1.00,  // $1/day max on search queries
        'lead_summary' => 0.50,  // $0.50/day max on lead summaries
        'general'      => 1.00,
    ];

    $daily_limit = $caps[ $task ] ?? 0.50;
    $spent_today = [client]_get_daily_spend( $task );

    return $spent_today < $daily_limit;
}

function [client]_get_daily_spend( string $task ): float {
    $cache_key = '[client]_ai_spend_' . $task . '_' . date( 'Y-m-d' );
    $cached    = get_transient( $cache_key );

    if ( $cached !== false ) {
        return (float) $cached;
    }

    // Recompute from log table
    global $wpdb;
    $spend = (float) $wpdb->get_var( $wpdb->prepare(
        "SELECT SUM(cost_usd) FROM [client]_ai_call_log
         WHERE task = %s AND called_at >= CURDATE() AND status = 'ok'",
        $task
    ) );

    set_transient( $cache_key, $spend, 3600 ); // Cache for 1 hour
    return $spend;
}
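
Because the spend total is cached for an hour, the cap check only stays accurate if the cached value moves as calls are logged. One way to do that, sketched under assumptions: after each successful call, add the call's cost to the cached value if one exists. `acme_record_spend` and the `acme_` prefix are hypothetical names, and the two shims at the top exist only so the sketch runs outside WordPress, which provides the real `get_transient` and `set_transient`.

```php
<?php
// Stand-ins for WordPress's transient API so this sketch runs on its own.
// Inside WordPress this block is skipped and the real functions are used.
if ( ! function_exists( 'get_transient' ) ) {
    $GLOBALS['acme_fake_transients'] = [];

    function get_transient( $key ) {
        return $GLOBALS['acme_fake_transients'][ $key ] ?? false;
    }

    function set_transient( $key, $value, $expiration = 0 ) {
        $GLOBALS['acme_fake_transients'][ $key ] = $value;
        return true;
    }
}

// Hypothetical helper: call after a successful request has been logged.
function acme_record_spend( string $task, float $cost_usd ): void {
    $cache_key = 'acme_ai_spend_' . $task . '_' . date( 'Y-m-d' );
    $cached    = get_transient( $cache_key );

    if ( false === $cached ) {
        // Nothing cached: the next read recomputes from the log table,
        // which already includes this call.
        return;
    }

    // Keep the cached total in step with the log for the rest of the hour.
    set_transient( $cache_key, (float) $cached + $cost_usd, 3600 );
}
```

The read-then-write is not atomic, so two simultaneous requests can lose an update. The recompute from the log table on the next cache miss corrects that drift.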

Graceful degradation on cap hit

When a cap is hit, the system returns a clear signal that the caller can handle without crashing:

PHP

// In the description generation caller:
$result = [client]_ai_request( [
    'task'     => 'description',
    'messages' => [ [ 'role' => 'user', 'content' => $prompt ] ],
] );

if ( ! $result['ok'] && $result['error'] === 'daily_cap' ) {
    // Schedule for tomorrow instead of failing hard
    wp_schedule_single_event(
        strtotime( 'tomorrow midnight' ) + rand( 0, 3600 ),
        '[client]_generate_description',
        [ $unit_id ]
    );
    return; // Don't alert — this is expected behavior
}

if ( ! $result['ok'] ) {
    [client]_alert_ai_failure( $unit_id, $result['error'] );
    return;
}
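
For the caller above to work, the wrapper has to return the cap signal rather than throw. A minimal sketch of that guard, assuming the result shape (`ok`, `error`, `content`) used by this lesson's callers. The function name and the two injected callables are hypothetical; they stand in for the budget check and the real HTTP call:

```php
<?php
// Hypothetical guard: returns a structured "daily_cap" result instead of
// calling the API when the task's budget is exhausted.
function acme_ai_request_guarded( array $args, callable $budget_available, callable $send ): array {
    $task = $args['task'] ?? 'general';

    // Check the cap before spending anything.
    if ( ! $budget_available( $task ) ) {
        return [ 'ok' => false, 'error' => 'daily_cap', 'content' => null ];
    }

    return $send( $args );
}

// Usage with stand-in callables: the cap is exhausted, so nothing is sent.
$result = acme_ai_request_guarded(
    [ 'task' => 'description', 'messages' => [] ],
    fn( string $task ): bool => false,
    fn( array $args ): array => [ 'ok' => true, 'error' => null, 'content' => 'unused' ]
);
echo $result['error'], "\n"; // prints daily_cap
```

Returning the same array shape for cap hits and API failures lets every caller branch on `error` without wrapping calls in try/catch.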

Cost estimation from token count

The wrapper needs to log cost without making an extra API call. Estimated cost is computed from token count and known pricing:

PHP

function [client]_estimate_cost( string $model, int $tokens ): float {
    // Prices in USD per 1M tokens (input + output blended estimate).
    // The values must be per-1M because the formula below divides by 1,000,000.
    $rates = [
        'claude-haiku-4-5-20251001' => 0.80,   // $0.80 / 1M tokens
        'claude-sonnet-4-6'         => 12.00,  // $12 / 1M tokens
        'claude-opus-4-7'           => 75.00,  // $75 / 1M tokens
    ];

    // Unknown models fall back to $10 / 1M so they are never logged as free.
    $rate = $rates[ $model ] ?? 10.00;
    return round( ( $tokens / 1000000 ) * $rate, 6 );
}
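
The token count itself can come from the response: the Messages API returns a `usage` object with `input_tokens` and `output_tokens`. A sketch of pulling the total out of a decoded response, with `acme_tokens_from_response` as a hypothetical helper name and a hand-written response fragment as input:

```php
<?php
// Hypothetical helper: total billable tokens from a decoded API response.
// Missing fields count as zero so a malformed response never causes a fatal error.
function acme_tokens_from_response( array $response ): int {
    $usage = $response['usage'] ?? [];

    return (int) ( $usage['input_tokens'] ?? 0 ) + (int) ( $usage['output_tokens'] ?? 0 );
}

// Example with made-up values.
$response = [ 'usage' => [ 'input_tokens' => 2400, 'output_tokens' => 600 ] ];
echo acme_tokens_from_response( $response ), "\n"; // prints 3000
```

A blended rate treats input and output tokens alike even though output tokens are priced higher, so the logged figure is an estimate. That is why the monthly reconciliation against the dashboard matters.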

Prompt caching for batch jobs

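When running a description batch job, the system prompt is identical for every unit. Anthropic’s prompt caching bills cached input tokens at about 10% of the normal input price when they are read back, a 90% discount on that portion of the request; the call that writes the cache pays a small premium: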

PHP

function [client]_generate_description_batch( array $unit_ids ): void {
    $system_prompt = [client]_get_description_system_prompt();

    foreach ( $unit_ids as $unit_id ) {
        $specs = [client]_get_unit_specs_for_description( $unit_id );

        $result = [client]_ai_request( [
            'task'         => 'description',
            'system'       => $system_prompt,
            'cache_system' => true,  // Cache the system prompt across batch calls
            'messages'     => [
                [
                    'role'    => 'user',
                    'content' => "Write a listing description for this boat:\n\n" . $specs,
                ],
            ],
        ] );

        if ( $result['ok'] ) {
            update_post_meta( $unit_id, '[client]_ai_description', $result['content'] );
            update_post_meta( $unit_id, '[client]_ai_description_date', current_time( 'mysql' ) );
        }
    }
}

Caching is not controlled by an HTTP header. The wrapper adds a cache_control field to the system block of the request body when cache_system is true:

PHP (inside [client]_ai_request)

if ( ! empty( $args['cache_system'] ) && ! empty( $args['system'] ) ) {
    $request_body['system'] = [
        [
            'type'          => 'text',
            'text'          => $args['system'],
            'cache_control' => [ 'type' => 'ephemeral' ],
        ]
    ];
}
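
Whether the cache is actually being used can be read from each response: with caching enabled, the `usage` object reports `cache_creation_input_tokens` (written to the cache) and `cache_read_input_tokens` (served from it). A small sketch for logging that, where the helper name and the returned labels are hypothetical:

```php
<?php
// Hypothetical helper: classifies a decoded response by its cache usage.
function acme_cache_status( array $response ): string {
    $usage   = $response['usage'] ?? [];
    $read    = (int) ( $usage['cache_read_input_tokens'] ?? 0 );
    $written = (int) ( $usage['cache_creation_input_tokens'] ?? 0 );

    if ( $read > 0 ) {
        return 'hit';   // Cached prefix was served at the discounted rate.
    }
    if ( $written > 0 ) {
        return 'write'; // Prefix was stored; later calls can read it.
    }
    return 'none';      // No caching took place on this call.
}

// Example with made-up values.
echo acme_cache_status( [ 'usage' => [ 'cache_read_input_tokens' => 1200 ] ] ), "\n"; // prints hit
```

A batch that logs `none` on every call is not being cached, which usually means the cached prefix is shorter than the model's minimum cacheable length or changes between calls.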

Why it matters

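A daily cap converts an open-ended cost exposure into a predictable one. Across the four named tasks the system can spend at most $4.50/day ($2 descriptions + $1 search + $0.50 summaries + $1 general). That’s about $135/month maximum, regardless of what the batch jobs do or how many leads come in. Two caveats: the check runs before each call, so a task can overshoot its cap by the cost of one call, and any task name missing from the table gets its own $0.50 default cap.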

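Prompt caching on batch jobs is the highest-leverage cost optimization available. A 166-unit description batch without caching costs about $0.40. With the system prompt cached after the first call, it costs under $0.05. The system prompt is ~800 tokens, and the 90% discount applies to those tokens on every subsequent call in the batch, so the saving grows with the system prompt's share of each request. Two things to verify: caching has a model-specific minimum prompt length (1,024 tokens or more), so a prompt this short may need more stable content, such as style rules and examples, before it is cached at all; and the usage block of each response reports cache reads, which confirms the discount is being applied.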
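
How much caching saves depends on how many tokens are cached and at what rate. A rough estimator, assuming cache writes cost 1.25× and cache reads 0.1× the base input price (the multipliers Anthropic publishes for the default cache lifetime; verify against current pricing). The function name is hypothetical, and the example inputs are illustrative rather than measurements from this build:

```php
<?php
// Hypothetical estimator: USD saved by caching a fixed prefix across a batch.
// The first call writes the cache; every later call reads it.
function acme_caching_saving(
    int $calls,
    int $cached_tokens,
    float $input_rate_per_million,
    float $write_multiplier = 1.25, // Assumed cache-write premium.
    float $read_multiplier = 0.10   // Assumed cache-read discount rate.
): float {
    if ( $calls < 1 ) {
        return 0.0;
    }

    $per_token = $input_rate_per_million / 1000000;

    // Cost of the prefix if it is sent at full price on every call.
    $uncached = $calls * $cached_tokens * $per_token;

    // Cost of the prefix with one write followed by ( $calls - 1 ) reads.
    $cached = $cached_tokens * $per_token
        * ( $write_multiplier + ( $calls - 1 ) * $read_multiplier );

    return round( $uncached - $cached, 4 );
}

// Illustrative inputs: 166 calls, an 800-token prefix, $0.80 per 1M input tokens.
echo acme_caching_saving( 166, 800, 0.80 ), "\n"; // prints 0.0949
```

The saving scales linearly with the size of the cached prefix, so moving stable content (style rules, examples) into the system prompt increases it.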

The Anchor build

The daily cap has been in production since the wrapper shipped. It fired three times: once when a staff member clicked “Generate All Descriptions” twice in the same day (second batch deferred to the next day, correct behavior), once during a code bug that would have sent 400 identical requests (cap stopped it after $2), and once during a period when inbound leads spiked and the lead summary task hit its cap by 3pm.

The log table answered all three incidents: what ran, how much it cost, when the cap hit.

Do this, not that

Set per-task daily caps, not a single global cap. A global cap on a day with a description batch might block the NL search feature that users are actively using. Per-task caps isolate the budget by use case.
Compute estimated cost from token count and a rate table. Don’t make an extra API call to check cost. Log estimated cost at call time; reconcile with the Anthropic dashboard monthly.
Use prompt caching for any batch job with a repeated system prompt. The savings are 80–90% on the system prompt portion. For batch jobs with 50+ calls, this is the highest-leverage optimization available.
Gracefully defer cap-hit requests, don’t fail hard. A batch job that hits the cap should schedule its remaining work for the next day, not throw an error that requires manual restart.
Treat model selection as a policy decision, not a per-call decision. Define the task→model mapping in one place. Callers pass a task name; the wrapper picks the model. This is the only way to maintain cost discipline as the number of AI features grows.
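
The task→model mapping can be a single lookup that the wrapper owns. A sketch, where the model IDs are the ones from this lesson's rate table but the assignment of tasks to models is an assumption made for illustration, as are the function name and the `acme_` prefix:

```php
<?php
// Hypothetical routing table: callers pass a task name, never a model ID.
function acme_model_for_task( string $task ): string {
    $routes = [
        'description'  => 'claude-haiku-4-5-20251001', // High-volume batch work.
        'nl_query'     => 'claude-haiku-4-5-20251001',
        'lead_summary' => 'claude-haiku-4-5-20251001',
        'general'      => 'claude-sonnet-4-6',         // Assumed: lower volume, harder tasks.
    ];

    // Unknown tasks get the cheapest model, so a new feature cannot
    // reach an expensive model without an explicit entry here.
    return $routes[ $task ] ?? 'claude-haiku-4-5-20251001';
}

echo acme_model_for_task( 'description' ), "\n"; // prints claude-haiku-4-5-20251001
```

Defaulting to the cheapest model means that promoting a task to a more expensive one is always a visible change to this table.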
