GC-301c · Module 2
Configuring Thinking Depth
3 min read
The thinking budget controls how many tokens the model spends on internal reasoning before producing a response. "Auto" lets the model decide based on task complexity — simple questions get minimal thinking, complex problems get extended reasoning. Specific token counts (e.g., 1024, 4096, 16384) set hard limits. "Off" disables thinking entirely, producing faster but shallower responses. The thinking budget is the single most impactful runtime parameter for output quality.
Thinking tokens are invisible — they do not appear in the response, but they consume your token quota and add latency. A 16384-token thinking budget means the model may spend up to 16K tokens reasoning internally before writing the first visible character. On complex debugging tasks, this produces dramatically better analysis. On simple formatting tasks, it produces identical output at higher latency and cost. The "auto" setting handles this tradeoff well for roughly 80% of tasks, which is why it is the recommended default.
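The quota math above can be sketched in a few lines. This is an illustrative model only — the function name and the example token counts are assumptions for this module, not real Gemini billing figures:

```python
def billed_tokens(thinking_tokens: int, visible_tokens: int) -> int:
    """Total tokens charged for one response: hidden reasoning
    plus the visible output. Thinking tokens never appear in the
    response text, but they count against quota all the same."""
    return thinking_tokens + visible_tokens

# A short formatting task producing ~200 visible tokens:
assert billed_tokens(0, 200) == 200        # thinking off
assert billed_tokens(16384, 200) == 16584  # full 16K budget consumed first
```

The second case charges for 16K invisible tokens before a single visible character — wasted spend if the task never needed deep reasoning.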
For example, a profile configuration might look like this:
{
  "thinkingBudget": "auto",
  "profiles": {
    "quick": {
      "model": "gemini-2.5-flash",
      "thinkingBudget": "off"
    },
    "standard": {
      "model": "gemini-2.5-pro",
      "thinkingBudget": "auto"
    },
    "deep-analysis": {
      "model": "gemini-2.5-pro",
      "thinkingBudget": 16384
    }
  }
}
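Resolving the effective budget for a profile follows the usual override pattern: a profile's own "thinkingBudget" wins, otherwise the top-level value applies. A minimal sketch, assuming the config schema shown above (the `resolve_budget` helper is hypothetical, not part of any official API):

```python
import json

# The same configuration shown above, parsed from JSON.
CONFIG = json.loads("""
{
  "thinkingBudget": "auto",
  "profiles": {
    "quick":         {"model": "gemini-2.5-flash", "thinkingBudget": "off"},
    "standard":      {"model": "gemini-2.5-pro",   "thinkingBudget": "auto"},
    "deep-analysis": {"model": "gemini-2.5-pro",   "thinkingBudget": 16384}
  }
}
""")

def resolve_budget(config: dict, profile: str):
    """Return the effective thinking budget for a named profile:
    the profile-level setting if present, else the top-level default."""
    prof = config["profiles"].get(profile, {})
    return prof.get("thinkingBudget", config["thinkingBudget"])

assert resolve_budget(CONFIG, "quick") == "off"
assert resolve_budget(CONFIG, "deep-analysis") == 16384
assert resolve_budget(CONFIG, "unknown") == "auto"  # top-level fallback
```

This fallback design means you only pin explicit budgets where they matter ("quick", "deep-analysis") and everything else inherits "auto".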
Do This
- Default to "auto" and let the model calibrate thinking depth to task complexity
- Override to high budgets for debugging, architecture, and security analysis
- Override to "off" for bulk operations like documentation and formatting
Avoid This
- Setting the thinking budget to maximum permanently — you pay for reasoning you do not need
- Disabling thinking everywhere to save cost — quality on complex tasks degrades severely
- Ignoring the thinking budget as a tuning parameter — it is the highest-leverage runtime control