GC-301c · Module 2
Configuring Thinking Depth
3 min read
The thinking budget controls how many tokens the model spends on internal reasoning before producing a response. "Auto" lets the model decide based on task complexity — simple questions get minimal thinking, complex problems get extended reasoning. Specific token counts (e.g., 1024, 4096, 16384) set hard limits. "Off" disables thinking entirely, producing faster but shallower responses. The thinking budget is the single most impactful runtime parameter for output quality.
Thinking tokens are invisible — they do not appear in the response, but they consume your token quota and add latency. A 16384-token thinking budget means the model may spend up to 16K tokens reasoning internally before writing the first visible character. On complex debugging tasks, this produces dramatically better analysis. On simple formatting tasks, it produces identical output at higher latency and cost. The "auto" setting handles this tradeoff well for roughly 80% of tasks, which is why it is the recommended default.
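The quota math above can be sketched in a few lines. This is an illustrative model only — the function name and the example token counts are assumptions for this module, not real Gemini billing figures:

```python
def billed_tokens(thinking_tokens: int, visible_tokens: int) -> int:
    """Total tokens charged for one response: hidden reasoning
    plus the visible output. Thinking tokens never appear in the
    response text, but they count against quota all the same."""
    return thinking_tokens + visible_tokens

# A short formatting task producing ~200 visible tokens:
assert billed_tokens(0, 200) == 200        # thinking off
assert billed_tokens(16384, 200) == 16584  # full 16K budget consumed first
```

The second case charges for 16K invisible tokens before a single visible character — wasted spend if the task never needed deep reasoning.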
For example, a profile configuration might look like this:
{
  "thinkingBudget": "auto",
  "profiles": {
    "quick": {
      "model": "gemini-2.5-flash",
      "thinkingBudget": "off"
    },
    "standard": {
      "model": "gemini-2.5-pro",
      "thinkingBudget": "auto"
    },
    "deep-analysis": {
      "model": "gemini-2.5-pro",
      "thinkingBudget": 16384
    }
  }
}
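Resolving the effective budget for a profile follows the usual override pattern: a profile's own "thinkingBudget" wins, otherwise the top-level value applies. A minimal sketch, assuming the config schema shown above (the `resolve_budget` helper is hypothetical, not part of any official API):

```python
import json

# The same configuration shown above, parsed from JSON.
CONFIG = json.loads("""
{
  "thinkingBudget": "auto",
  "profiles": {
    "quick":         {"model": "gemini-2.5-flash", "thinkingBudget": "off"},
    "standard":      {"model": "gemini-2.5-pro",   "thinkingBudget": "auto"},
    "deep-analysis": {"model": "gemini-2.5-pro",   "thinkingBudget": 16384}
  }
}
""")

def resolve_budget(config: dict, profile: str):
    """Return the effective thinking budget for a named profile:
    the profile-level setting if present, else the top-level default."""
    prof = config["profiles"].get(profile, {})
    return prof.get("thinkingBudget", config["thinkingBudget"])

assert resolve_budget(CONFIG, "quick") == "off"
assert resolve_budget(CONFIG, "deep-analysis") == 16384
assert resolve_budget(CONFIG, "unknown") == "auto"  # top-level fallback
```

This fallback design means you only pin explicit budgets where they matter ("quick", "deep-analysis") and everything else inherits "auto".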
Do This
- Default to "auto" and let the model calibrate thinking depth to task complexity
- Override to high budgets for debugging, architecture, and security analysis
- Override to "off" for bulk operations like documentation and formatting
Avoid This
- Setting the thinking budget to maximum permanently — you pay for reasoning you do not need
- Disabling thinking everywhere to save cost — quality on complex tasks degrades severely
- Ignoring the thinking budget as a tuning parameter — it is the highest-leverage runtime control