GC-301c · Module 1

Cost-Quality Tradeoffs & Fallback Chains


Cost-quality tradeoffs in model selection follow a predictable curve. Moving from Flash to Pro roughly triples cost per token while improving output quality by 15-30% on complex tasks but only 2-5% on simple ones. The marginal return justifies the cost on complex tasks and does not on simple ones. The optimal strategy is therefore a two-tier split: Pro for the roughly 30% of tasks that are genuinely complex, Flash for the remaining 70%. Compared with Pro-only usage, this typically cuts total cost by 40-50% with negligible quality impact.
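
The 40-50% figure can be checked with a few lines of arithmetic. The sketch below is illustrative only: it assumes Pro costs exactly three times Flash per token and that every task consumes about the same number of tokens regardless of model or complexity. The function names and the relative price units are hypothetical, not taken from any pricing page.

```python
def blended_cost(pro_share: float, pro_price: float = 3.0, flash_price: float = 1.0) -> float:
    """Average cost per task when `pro_share` of tasks go to Pro and the rest to Flash.

    Prices are relative units (Flash = 1, Pro = 3), an assumption for illustration.
    """
    if not 0.0 <= pro_share <= 1.0:
        raise ValueError("pro_share must be between 0 and 1")
    return pro_share * pro_price + (1.0 - pro_share) * flash_price


def savings_vs_pro_only(pro_share: float, pro_price: float = 3.0, flash_price: float = 1.0) -> float:
    """Fraction of cost saved relative to sending every task to Pro."""
    mixed = blended_cost(pro_share, pro_price, flash_price)
    return 1.0 - mixed / pro_price


if __name__ == "__main__":
    # 30% Pro, 70% Flash: 0.3 * 3 + 0.7 * 1 = 1.6 units versus 3 units for Pro-only.
    print(f"{savings_vs_pro_only(0.30):.1%}")  # prints 46.7%
```

If complex tasks consume more tokens than simple ones, which is likely in practice, the Pro share of total spend rises and the savings land lower in the range.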

Fallback chains add resilience to model selection. A fallback chain specifies a primary model and one or more fallback models that take over when the primary cannot serve the request (rate limiting, service disruption, or context overflow). The pattern is: attempt Pro, fall back to Flash, then fall back to a local model or queue the request. In custom commands, you can implement this with shell logic that checks model availability before proceeding. This is especially valuable for CI/CD pipelines and other automation, where one unavailable model should not block the entire workflow.

[command]
name = "resilient-review"
description = "Code review with model fallback"
prompt = """
Perform a thorough code review of the changes in this diff:

!{git diff HEAD~1}

Focus on:
1. Correctness — does the logic handle edge cases?
2. Security — any input validation gaps or data exposure risks?
3. Performance — any O(n²) patterns or unnecessary allocations?
4. Maintainability — is the code readable and well-structured?

If the diff is too large to review in full, limit the review
to correctness and security only.
"""
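
A prompt cannot observe a rate limit or an outage, because a rejected request never reaches the model. The switch between models therefore has to sit in the code that makes the call. The following is a minimal sketch of the Pro, then Flash, then queue pattern, assuming each backend is wrapped in a plain Python callable. `ModelUnavailable`, `run_with_fallback`, and the backend callables are hypothetical names for illustration, not part of any real SDK or CLI.

```python
from typing import Callable, Optional, Sequence, Tuple


class ModelUnavailable(Exception):
    """Hypothetical error a backend wrapper raises on rate limiting,
    service disruption, or context overflow."""


# A backend is a (name, callable) pair; the callable takes a prompt and returns text.
Backend = Tuple[str, Callable[[str], str]]


def run_with_fallback(
    prompt: str,
    chain: Sequence[Backend],
    enqueue: Optional[Callable[[str], None]] = None,
) -> Tuple[Optional[str], Optional[str]]:
    """Try each backend in order and return (model_name, output).

    If every backend is unavailable, pass the prompt to `enqueue` (when given)
    and return (None, None) so the caller can continue without blocking.
    """
    for name, call in chain:
        try:
            return name, call(prompt)
        except ModelUnavailable:
            continue  # this backend cannot serve the request; try the next one
    if enqueue is not None:
        enqueue(prompt)
    return None, None


if __name__ == "__main__":
    # Stub backends standing in for real model calls.
    def pro(prompt: str) -> str:
        raise ModelUnavailable("rate limited")

    def flash(prompt: str) -> str:
        return f"[flash] review of {len(prompt)} chars"

    print(run_with_fallback("diff...", [("pro", pro), ("flash", flash)]))
```

Only `ModelUnavailable` triggers a fallback here. Any other exception propagates, so genuine bugs are not silently masked by a switch to a cheaper model.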

Do This

  • Track per-task cost to identify where Pro is justified and where Flash suffices
  • Implement fallback chains for automated workflows that must not fail on model availability
  • Default to Flash and upgrade to Pro only for tasks that are genuinely complex; routing the routine majority to Flash is where the 40-50% savings come from
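
Per-task cost tracking can start as a small ledger keyed by task type and model. The sketch below is one possible shape, not a prescribed implementation: `CostLedger`, the task-type labels, and the relative prices in `PRICE_PER_1K` are all assumptions made for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass, field

# Hypothetical relative prices per 1,000 tokens (Flash = 1 unit, Pro = 3 units).
PRICE_PER_1K = {"flash": 1.0, "pro": 3.0}


@dataclass
class CostLedger:
    """Accumulates cost per (task_type, model) pair."""

    totals: dict = field(default_factory=lambda: defaultdict(float))

    def record(self, task_type: str, model: str, tokens: int) -> float:
        """Add one call to the ledger and return its cost."""
        cost = tokens / 1000 * PRICE_PER_1K[model]
        self.totals[(task_type, model)] += cost
        return cost

    def by_task(self) -> dict:
        """Total cost per task type, summed across models."""
        summary = defaultdict(float)
        for (task_type, _model), cost in self.totals.items():
            summary[task_type] += cost
        return dict(summary)


if __name__ == "__main__":
    ledger = CostLedger()
    ledger.record("security-review", "pro", 2000)
    ledger.record("rename-variable", "flash", 1000)
    print(ledger.by_task())
```

A ledger like this only shows where the money goes. Deciding whether Pro is justified for a task type still requires comparing those totals against a quality or rework measure for the same tasks.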

Avoid This

  • Optimize purely for cost by using the cheapest model everywhere — rework on complex tasks erases savings
  • Ignore cost entirely because "quality matters most" — the difference on routine tasks is negligible
  • Build automation that depends on a single model with no fallback — service disruptions will block your pipeline