CC-201a · Module 4

Multi-Model Routing

3 min read

Not every token needs to come from the most expensive model. Multi-model routing is the practice of directing different tasks to different AI models based on complexity, cost, and capability. The pattern is straightforward: use a frontier model like Claude Opus 4.6 for planning, architecture decisions, complex debugging, and strategic reasoning — tasks where depth of thought directly impacts outcome quality. Then route execution-heavy work — code generation, repetitive refactoring, front-end implementation, boilerplate creation — to a faster, cheaper model like Gemini 3.1 Pro. The expensive model produces the blueprint. The cheaper model builds from it. You pay premium rates only where premium reasoning matters.
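The routing decision itself can be as simple as a lookup keyed on task category. A minimal sketch, assuming illustrative model names and task categories (nothing here is a fixed API):

```python
# Complexity-based router: premium model for reasoning-heavy tasks,
# cheaper model for everything else. Names are illustrative assumptions.
PLANNING_MODEL = "claude-opus"    # frontier model: deep reasoning
EXECUTION_MODEL = "gemini-pro"    # cheaper model: high-volume codegen

# Task categories that justify premium reasoning rates.
PLANNING_TASKS = {"architecture", "system_design", "complex_debugging"}

def route(task_category: str) -> str:
    """Return the model to use for a given task category."""
    if task_category in PLANNING_TASKS:
        return PLANNING_MODEL
    return EXECUTION_MODEL  # codegen, refactoring, boilerplate, UI work
```

In practice the categories come from whatever task tracker or prompt classifier you already have; the point is that the mapping is explicit and cheap to audit.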

The implementation works through a two-phase workflow. Phase one: engage the planning model. Describe the full system — architecture, component relationships, data flow, edge cases. The planning model produces a detailed implementation specification: folder structure, interface contracts, module responsibilities, and build order. This specification is the handoff artifact. Phase two: switch to the execution model and feed it the specification. The execution model follows the blueprint, generating code module by module. Because it has an explicit, detailed plan to follow, it hallucinates less and stays on track. Tools like Google's Antigravity IDE enable this routing natively, letting you switch between providers within a single workspace. Claude Code sub-agents support this pattern directly — you can assign different models to different sub-agents, routing research to one provider and implementation to another.
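The two-phase workflow above can be sketched in code. The model calls below are stubbed stand-ins, not real provider SDKs; the shape of the spec dictionary is likewise an assumption, chosen to show that the handoff artifact — not the conversation history — is what crosses the model boundary:

```python
# Sketch of the plan -> spec -> execute workflow with stubbed model calls.
# call_planning_model / call_execution_model are hypothetical stand-ins
# for whatever provider SDKs you actually use.

def call_planning_model(system_description: str) -> dict:
    """Phase one: the frontier model turns a description into a spec."""
    # A real call would hit the expensive model; here we stub the shape
    # of the handoff artifact: structure, contracts, and build order.
    return {
        "modules": ["auth", "api", "ui"],
        "contracts": {"auth": "login(user, pw) -> Session"},
        "build_order": ["auth", "api", "ui"],
    }

def call_execution_model(spec: dict, module: str) -> str:
    """Phase two: the cheaper model builds one module from the spec."""
    contract = spec["contracts"].get(module, "")
    return f"# generated code for {module} ({contract})"

def build(system_description: str) -> list[str]:
    spec = call_planning_model(system_description)  # expensive call, once
    # The spec, not the chat transcript, is what the execution model sees.
    return [call_execution_model(spec, m) for m in spec["build_order"]]
```

Swapping in real API clients changes the two `call_*` functions and nothing else, which is the property that makes the handoff artifact worth writing down.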

Do This

  • Use the frontier model (Opus) for architecture design, system planning, and complex debugging
  • Use cheaper models (Gemini, Haiku) for code generation, UI implementation, and repetitive tasks
  • Create a detailed implementation spec in the planning phase before switching models
  • Assign different models to different Claude Code sub-agents based on task complexity
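The last point can be made concrete with a sub-agent definition. Claude Code reads sub-agents from Markdown files with YAML frontmatter; the file path, description text, and model alias below are illustrative, so check your version's documentation for the exact supported fields:

```markdown
<!-- .claude/agents/implementer.md (illustrative path and names) -->
---
name: implementer
description: Generates code from an existing implementation spec.
model: haiku
---
Follow the implementation spec you are given. Do not redesign the
architecture; flag gaps in the spec instead of improvising around them.
```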

Avoid This

  • Run every task through the most expensive model — you are paying premium rates for commodity work
  • Route architecture decisions to the cheap model to save money — bad plans cost more to fix than the planning tokens saved
  • Switch models mid-task without a written handoff artifact — the execution model needs explicit specs, not conversation history
  • Assume all models are interchangeable — each has distinct strengths in reasoning depth, UI generation, and code quality