GC-201a · Module 3
Model Selection & Thinking Budget
Gemini CLI connects to multiple Gemini models. The default is Gemini 2.5 Pro, which Google describes as its most capable coding model, with a 1-million-token context window. You can switch models with the /settings command or with the --model flag at launch. Model selection is not a set-and-forget decision, because different tasks benefit from different models. Heavy architecture analysis and complex debugging warrant the most capable model, while routine file edits and simple code generation work fine on faster, lighter variants such as Gemini 2.5 Flash.
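The task-based choice described above can be sketched as a small router. This is a minimal illustration under stated assumptions, not a Gemini CLI feature: the task category names and the helpers `pick_model` and `cli_command` are hypothetical, and only the `--model` flag and the two model names come from this module.

```python
from __future__ import annotations

# Hypothetical task router: choose a model per task category, then build
# the `gemini --model ...` command line for that session.
# The category names are illustrative assumptions, not Gemini CLI features.

DEFAULT_MODEL = "gemini-2.5-pro"   # the CLI default described above
FAST_MODEL = "gemini-2.5-flash"    # lighter, faster variant

# Routine work that a faster model handles well (assumed categories).
LIGHT_TASKS = {"file-edit", "simple-codegen", "docs-generation"}


def pick_model(task: str) -> str:
    """Return the model to use for a task category."""
    if task in LIGHT_TASKS:
        return FAST_MODEL
    # Heavy work (architecture analysis, complex debugging) and any
    # unrecognized task keep the most capable default model.
    return DEFAULT_MODEL


def cli_command(task: str) -> list[str]:
    """Build the argv for a CLI session that uses the chosen model."""
    return ["gemini", "--model", pick_model(task)]
```

For example, `cli_command("docs-generation")` returns `["gemini", "--model", "gemini-2.5-flash"]`, which could be passed to `subprocess.run` to start a session. Falling back to the default model for unknown tasks keeps the router conservative: a misclassified task costs more tokens rather than getting a weaker answer.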
The thinking budget controls how many tokens the model may spend on internal reasoning before it responds. Higher budgets tend to produce more thorough analysis: the model can weigh more alternatives, catch more edge cases, and give better-reasoned answers. Lower budgets respond faster and use fewer tokens, but may miss subtleties. The setting accepts "auto" (the model decides), a specific token count, or "off". Gemini 2.5 Pro itself cannot turn thinking off, so "off" only applies to models that support it, such as Gemini 2.5 Flash. For most development work, "auto" is the right choice. Override it when you know the task is simple (lower the budget) or deeply complex (raise it).
{
  "model": "gemini-2.5-pro",
  "thinkingBudget": "auto"
}
// Override the model for a single session with the CLI flag
// (a faster model for routine tasks):
// gemini --model gemini-2.5-flash
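To make the three budget values concrete, the sketch below maps "auto", a token count, and "off" onto the integer `thinkingBudget` field of the Gemini API's thinking config, where -1 requests dynamic thinking and 0 disables it. It does not claim to show how Gemini CLI performs this translation internally. The helper `to_api_budget` is hypothetical, and the per-model limits are assumptions taken from the Gemini API thinking documentation at the time of writing; verify them against the current docs.

```python
from __future__ import annotations

# Hypothetical helper: translate "auto" / a token count / "off" into the
# integer thinkingBudget used by the Gemini API.
# Limits below are assumptions from the API docs; check before relying on them.

DYNAMIC = -1  # API convention: the model decides how much to think
OFF = 0       # API convention: no thinking (unsupported on 2.5 Pro)

# model name -> (min tokens, max tokens, thinking can be disabled)
BUDGET_LIMITS = {
    "gemini-2.5-pro": (128, 32768, False),
    "gemini-2.5-flash": (0, 24576, True),
}


def to_api_budget(model: str, setting: str | int) -> int:
    """Return an integer thinkingBudget for `model`, or raise ValueError."""
    if model not in BUDGET_LIMITS:
        raise ValueError(f"no budget limits recorded for {model}")
    low, high, can_disable = BUDGET_LIMITS[model]
    if setting == "auto":
        return DYNAMIC
    if setting == "off":
        if not can_disable:
            raise ValueError(f"{model} does not allow thinking to be turned off")
        return OFF
    if isinstance(setting, int) and not isinstance(setting, bool):
        # Clamp an explicit token count into the model's supported range.
        return max(low, min(high, setting))
    raise ValueError(f"unrecognized thinking budget: {setting!r}")
```

With these assumed limits, `to_api_budget("gemini-2.5-pro", 100_000)` clamps to 32768, while `to_api_budget("gemini-2.5-pro", "off")` raises because 2.5 Pro cannot disable thinking. Clamping explicit counts (rather than rejecting them) means an over-generous override degrades to the model's maximum instead of failing the request.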
Do This
- Use the "auto" thinking budget for most tasks; the model scales its reasoning to the request
- Switch to a faster model for bulk operations like documentation generation
- Override thinking budget upward for complex debugging and architecture work
Avoid This
- Use the most expensive model for every task regardless of complexity
- Set thinking budget to maximum permanently — it wastes tokens on simple tasks
- Ignore model selection entirely — default is good but not always optimal