GC-201a · Module 3

Model Selection & Thinking Budget

Gemini CLI can connect to several Gemini models. The default is Gemini 2.5 Pro, Google's most capable Gemini 2.5 model for coding, with a 1 million token context window. You can switch models via /settings or the --model (short form -m) CLI flag. Model selection is not a set-and-forget decision, because different tasks benefit from different models: heavy architecture analysis and complex debugging warrant the most capable model, while routine file edits and simple code generation work fine on faster, lighter variants such as Gemini 2.5 Flash.
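
The routing rule above can be sketched in Python. This is a minimal illustration, not a Gemini CLI feature: the Task categories, the MODEL_FOR_TASK table, and the gemini_command helper are hypothetical names, and the only CLI option the sketch relies on is the --model flag described in this module.

```python
from enum import Enum
from typing import List


class Task(Enum):
    # Hypothetical task categories drawn from the examples in the text.
    ARCHITECTURE_ANALYSIS = "architecture_analysis"
    COMPLEX_DEBUGGING = "complex_debugging"
    ROUTINE_EDIT = "routine_edit"
    SIMPLE_CODEGEN = "simple_codegen"


# Reasoning-heavy work goes to the most capable model; routine work to a faster one.
MODEL_FOR_TASK = {
    Task.ARCHITECTURE_ANALYSIS: "gemini-2.5-pro",
    Task.COMPLEX_DEBUGGING: "gemini-2.5-pro",
    Task.ROUTINE_EDIT: "gemini-2.5-flash",
    Task.SIMPLE_CODEGEN: "gemini-2.5-flash",
}


def gemini_command(task: Task) -> List[str]:
    """Build an argv list that launches Gemini CLI with a task-appropriate model."""
    return ["gemini", "--model", MODEL_FOR_TASK[task]]
```

Passing the result to subprocess.run(gemini_command(Task.ROUTINE_EDIT)) would start an interactive session on Gemini 2.5 Flash. Keeping the mapping in one table makes it easy to revisit as new model variants appear.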

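The thinking budget controls how many tokens the model may spend on internal reasoning before it responds. Higher budgets allow more thorough analysis: the model can weigh more alternatives and is more likely to catch edge cases, at the cost of extra latency and token usage. Lower budgets produce faster responses but may miss subtleties. The setting accepts "auto" (the model decides how much to think), a specific token count, or "off". In the Gemini API itself, the equivalent thinkingBudget field takes -1 for dynamic thinking, an explicit token count, or 0 to disable thinking; Gemini 2.5 Pro does not allow thinking to be disabled, so "off" applies only to models such as Gemini 2.5 Flash. For most development work, "auto" is the right choice. Override it when you know the task is simple (lower the budget) or deeply complex (raise it).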
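
To make the three settings concrete, here is a Python sketch that translates them into the thinkingBudget field of a Gemini API request. It is an illustration under stated assumptions, not how Gemini CLI parses its own settings: the helper names and the BUDGET_LIMITS table are hypothetical, and the per-model ranges are copied from Google's API documentation at the time of writing and may change. Raising an error for out-of-range values, rather than silently clamping them, is a design choice that surfaces misconfiguration early.

```python
from typing import Dict, Tuple, Union

# Assumed thinkingBudget ranges (min, max) from Google's Gemini API thinking
# docs at the time of writing; check the current docs before relying on them.
BUDGET_LIMITS: Dict[str, Tuple[int, int]] = {
    "gemini-2.5-pro": (128, 32768),  # thinking cannot be turned off
    "gemini-2.5-flash": (0, 24576),  # 0 disables thinking
}

DYNAMIC = -1  # API value that lets the model decide how much to think


def resolve_thinking_budget(model: str, setting: Union[str, int]) -> int:
    """Translate "auto", "off", or a token count into an API thinkingBudget."""
    if setting == "auto":
        return DYNAMIC
    budget = 0 if setting == "off" else setting
    # bool is a subclass of int, so reject it explicitly.
    if isinstance(budget, bool) or not isinstance(budget, int):
        raise TypeError(f"expected 'auto', 'off', or an int, got {setting!r}")
    low, high = BUDGET_LIMITS[model]
    if not low <= budget <= high:
        raise ValueError(f"{model} accepts budgets {low}..{high}, got {budget}")
    return budget


def generation_config(model: str, setting: Union[str, int]) -> dict:
    """Build the generationConfig fragment of a REST generateContent request."""
    return {"thinkingConfig": {"thinkingBudget": resolve_thinking_budget(model, setting)}}
```

For example, generation_config("gemini-2.5-flash", "off") produces {"thinkingConfig": {"thinkingBudget": 0}}, while asking Gemini 2.5 Pro for "off" raises ValueError instead of sending a value the model does not support.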

{
  "model": "gemini-2.5-pro",
  "thinkingBudget": "auto"
}

// Override the model for a single session with the --model flag,
// for example a faster model for routine tasks:
// gemini --model gemini-2.5-flash
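
The example above pairs a persistent default with a per-session override. That precedence (flag first, then the settings file, then the built-in default) can be sketched in Python as follows. effective_model and BUILT_IN_DEFAULT are hypothetical names, and the sketch reads only the flat "model" key used in this module's example rather than every settings layout Gemini CLI may accept.

```python
import json
from typing import Optional

BUILT_IN_DEFAULT = "gemini-2.5-pro"  # the default model this module names


def effective_model(settings_text: str, cli_model: Optional[str] = None) -> str:
    """Pick the session's model: --model flag, then settings file, then default."""
    if cli_model:
        return cli_model  # a per-session flag overrides everything else
    settings = json.loads(settings_text) if settings_text.strip() else {}
    # Assumes the flat "model" key shown in the example settings above.
    model = settings.get("model")
    return model if isinstance(model, str) else BUILT_IN_DEFAULT
```

In practice settings_text would come from the user settings file, for example Path.home() / ".gemini" / "settings.json"; taking the text as a parameter keeps the precedence logic testable without touching the disk.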

Do This

Use the "auto" thinking budget for most tasks; the model scales its reasoning to the request
Switch to a faster model for bulk operations such as documentation generation
Raise the thinking budget for complex debugging and architecture work

Avoid This

Use the most expensive model for every task regardless of complexity
Set the thinking budget to its maximum permanently; it wastes tokens and time on simple tasks
Ignore model selection entirely; the default is a good starting point but not always the best fit