
Google and Vertex Reasoning

Last updated March 7, 2026

The Gemini 2.5, 3, and 3.1 series models use an internal "thinking process" that improves their reasoning and multi-step planning abilities, making them effective for complex tasks like coding, advanced mathematics, and data analysis.

These models are available through both Google AI and Google Vertex AI providers. The thinking configuration is the same — the only difference is using providerOptions.vertex instead of providerOptions.google. To route through Vertex, configure Vertex AI credentials and set the provider order to prefer vertex.

  • Gemini 3 and 3.1: Use thinkingLevel to control the depth of reasoning
  • Gemini 2.5: Use thinkingBudget to set a token limit for thinking

The following models support thinking configuration:

  • google/gemini-3.1-pro-preview
  • google/gemini-3.1-flash-lite-preview
  • google/gemini-3-flash
  • google/gemini-2.5-pro
  • google/gemini-2.5-flash
  • google/gemini-2.5-flash-lite

The thinkingLevel parameter controls reasoning behavior. Not all levels are available on every model:

| Thinking level | Gemini 3.1 Pro | Gemini 3.1 Flash-Lite | Gemini 3 Flash | Description |
| --- | --- | --- | --- | --- |
| minimal | Not supported | Default | Supported | Matches "no thinking" for most queries; the model may still think minimally for complex coding tasks. Best for latency-sensitive workloads. |
| low | Supported | Supported | Supported | Minimizes latency and cost. Best for simple instruction following and chat. |
| medium | Supported | Supported | Supported | Balanced thinking for most tasks. |
| high | Default | Supported | Default | Maximizes reasoning depth. The model may take significantly longer to reach a first output token. |

The thinkingBudget parameter sets a specific number of thinking tokens. Set thinkingBudget to 0 to disable thinking, or -1 to enable dynamic thinking (the model adjusts based on request complexity).

Use thinkingLevel with Gemini 3 and 3.1 models. While thinkingBudget is accepted for backwards compatibility, using it with Gemini 3 models may result in unexpected performance.

| Model | Default | Range | Disable thinking | Dynamic thinking |
| --- | --- | --- | --- | --- |
| Gemini 2.5 Pro | Dynamic | 128–32,768 | Not supported | thinkingBudget: -1 (default) |
| Gemini 2.5 Flash | Dynamic | 0–24,576 | thinkingBudget: 0 | thinkingBudget: -1 (default) |
| Gemini 2.5 Flash Lite | Off | 512–24,576 | thinkingBudget: 0 | thinkingBudget: -1 |

Use the thinkingLevel parameter to control the depth of reasoning:

gemini-3-thinking.ts
import { generateText } from 'ai';
 
const result = await generateText({
  model: 'google/gemini-3.1-pro-preview',
  prompt: 'What is the sum of the first 10 prime numbers?',
  providerOptions: {
    vertex: { // use vertex or google
      thinkingConfig: {
        thinkingLevel: 'high',
        includeThoughts: true,
      },
    },
  },
});
 
console.log(result.text);
console.log(result.reasoningText);

Use the thinkingBudget parameter to control the number of thinking tokens:

gemini-25-thinking.ts
import { generateText } from 'ai';
 
const result = await generateText({
  model: 'google/gemini-2.5-flash',
  prompt: 'What is the sum of the first 10 prime numbers?',
  providerOptions: {
    vertex: { // use vertex or google
      thinkingConfig: {
        thinkingBudget: 8192,
        includeThoughts: true,
      },
    },
  },
});
 
console.log(result.text);
console.log(result.reasoningText);

When streaming, thinking tokens are emitted as reasoning-delta stream parts:

gemini-stream-thinking.ts
import { streamText } from 'ai';
 
const result = streamText({
  model: 'google/gemini-2.5-flash',
  prompt: 'Explain quantum computing in simple terms.',
  providerOptions: {
    vertex: { // use vertex or google
      thinkingConfig: {
        thinkingBudget: 2048,
        includeThoughts: true,
      },
    },
  },
});
 
for await (const part of result.fullStream) {
  if (part.type === 'reasoning-delta') {
    process.stdout.write(part.text);
  } else if (part.type === 'text-delta') {
    process.stdout.write(part.text);
  }
}

Gemini 3 and 3.1 thinking parameters:

| Parameter | Type | Description |
| --- | --- | --- |
| thinkingLevel | string | Depth of reasoning: 'minimal', 'low', 'medium', or 'high' |
| includeThoughts | boolean | Include thinking content in the response |

Gemini 2.5 thinking parameters:

| Parameter | Type | Description |
| --- | --- | --- |
| thinkingBudget | number | Maximum number of tokens to allocate for thinking |
| includeThoughts | boolean | Include thinking content in the response |

For more details, see the Google AI thinking docs and Vertex AI thinking docs.

