Advanced Configuration
Configure reasoning behavior for models that support extended thinking or chain-of-thought reasoning. The reasoning configuration lets you control how reasoning tokens are generated and returned.
The reasoning object supports the following parameters:
- `enabled` (boolean, optional): Enable reasoning output. When set to true, the model provides its reasoning process.
- `max_tokens` (number, optional): Maximum number of tokens to allocate for reasoning. This helps control costs and response times. Cannot be used with `effort`.
- `effort` (string, optional): Control reasoning effort level. Accepted values range from one that disables reasoning entirely to levels that allocate roughly 10%, 20%, 50%, 80%, or 95% of `max_tokens` to reasoning. Cannot be used with `max_tokens`.
- `exclude` (boolean, optional): When set to true, excludes reasoning content from the response but still generates it internally. Useful for reducing response payload size.
Mutually exclusive parameters: You cannot specify both `max_tokens` and `effort` in the same request. Choose one based on your use case.
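The exact request shape depends on how you call the gateway. As a rough sketch against an OpenAI-compatible chat completions endpoint (the endpoint URL, model id, credentials environment variable, and effort value below are assumptions for illustration):

```ts
// Sketch only: endpoint URL, model id, and effort value are assumptions.
const response = await fetch('https://ai-gateway.vercel.sh/v1/chat/completions', {
  method: 'POST',
  headers: {
    Authorization: `Bearer ${process.env.AI_GATEWAY_API_KEY}`,
    'Content-Type': 'application/json',
  },
  body: JSON.stringify({
    model: 'anthropic/claude-sonnet-4',
    messages: [{ role: 'user', content: 'How many prime numbers are below 100?' }],
    // Enable reasoning and bound its effort; use max_tokens instead of effort if you
    // prefer an explicit token budget (never both in one request).
    reasoning: { enabled: true, effort: 'medium' },
  }),
});
const data = await response.json();
```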
When reasoning is enabled, the response includes reasoning content:
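The shape below is a sketch; the field carrying the reasoning text is an assumption for illustration:

```ts
// Illustrative response shape (assumed field names).
const exampleResponse = {
  choices: [
    {
      message: {
        role: 'assistant',
        content: 'There are 25 prime numbers below 100.',
        reasoning: 'List the primes: 2, 3, 5, 7, ..., 97. Counting them gives 25.',
      },
    },
  ],
};
```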
Reasoning content is streamed incrementally in the response deltas:
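A sketch of how streamed chunks might interleave reasoning deltas with answer deltas (field names assumed):

```ts
// Illustrative chunk shapes, in the order they might arrive on the stream.
const chunks = [
  { choices: [{ delta: { reasoning: 'Counting the primes below 100: ' } }] },
  { choices: [{ delta: { reasoning: '2, 3, 5, 7, 11, ... 97.' } }] },
  { choices: [{ delta: { content: 'There are 25 prime numbers below 100.' } }] },
];
```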
The AI Gateway preserves reasoning details from models across interactions, normalizing the different formats used by OpenAI, Anthropic, and other providers into a consistent structure. This allows you to switch between models without rewriting your conversation management logic.
This is particularly useful during tool calling workflows where the model needs to resume its thought process after receiving tool results.
Controlling reasoning details
When `exclude` is false (or not set), responses include a reasoning details array alongside the standard text field. This structured field captures cryptographic signatures, encrypted content, and other verification data that providers include with their reasoning output.
Each detail object contains:
- `type`: The kind of reasoning detail. Depending on the provider and model, one or more of the following appear:
  - A text detail contains the actual reasoning content as plain text and may include a `signature` field (Anthropic models) for cryptographic verification.
  - An encrypted detail contains encrypted or redacted reasoning content. It is used by OpenAI models when reasoning is protected, or by Anthropic models when thinking is redacted, and preserves the encrypted payload for verification purposes.
  - A summary detail contains a condensed version of the reasoning process, used by OpenAI models to provide a readable summary alongside encrypted reasoning.
- `id` (optional): Unique identifier for the reasoning block, used for tracking and correlation
- `format`: Provider format identifier
- `index` (optional): Position in the reasoning sequence (for responses with multiple reasoning blocks)
Example response with reasoning details
For Anthropic models:
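The object below is a sketch; the type identifier, format value, and field names are assumptions rather than the gateway's exact output:

```ts
// Illustrative reasoning detail for an Anthropic model (assumed identifiers).
const anthropicDetail = {
  type: 'reasoning.text',   // plain-text reasoning
  text: 'The user asks for the number of primes below 100...',
  signature: 'EqMBCk...',    // cryptographic signature for verification
  format: 'anthropic',       // provider format identifier (assumed value)
  index: 0,
};
```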
For OpenAI models (returns both summary and encrypted):
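Again a sketch with assumed identifiers, showing the summary and encrypted pair described above:

```ts
// Illustrative reasoning details for an OpenAI model (assumed identifiers).
const openaiDetails = [
  {
    type: 'reasoning.summary',   // readable summary of the reasoning
    summary: 'Counted the primes below 100 and found 25.',
    format: 'openai',
    index: 0,
  },
  {
    type: 'reasoning.encrypted', // protected reasoning payload
    data: 'gAAAAAB...',          // opaque, preserved for verification
    format: 'openai',
    index: 1,
  },
];
```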
Streaming reasoning details
When streaming, reasoning details are delivered incrementally in each chunk's delta:
For Anthropic models:
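A sketch of how the deltas might arrive (field names and identifiers assumed):

```ts
// Illustrative streaming chunks for an Anthropic model.
const anthropicChunks = [
  { choices: [{ delta: { reasoning_details: [{ type: 'reasoning.text', text: 'Counting primes: ' }] } }] },
  { choices: [{ delta: { reasoning_details: [{ type: 'reasoning.text', text: '2, 3, 5, ... 97.', signature: 'EqMB...' }] } }] },
  { choices: [{ delta: { content: 'There are 25 primes below 100.' } }] },
];
```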
For OpenAI models (summary chunks during reasoning, then encrypted at end):
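A sketch with the same caveats; summary deltas stream first, and the encrypted block arrives at the end:

```ts
// Illustrative streaming chunks for an OpenAI model.
const openaiChunks = [
  { choices: [{ delta: { reasoning_details: [{ type: 'reasoning.summary', summary: 'Counting the primes ' }] } }] },
  { choices: [{ delta: { reasoning_details: [{ type: 'reasoning.summary', summary: 'below 100...' }] } }] },
  { choices: [{ delta: { reasoning_details: [{ type: 'reasoning.encrypted', data: 'gAAAAAB...' }] } }] },
  { choices: [{ delta: { content: 'There are 25 primes below 100.' } }] },
];
```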
The AI Gateway automatically maps reasoning parameters to each provider's native format:
- OpenAI: Maps to OpenAI's reasoning effort setting and controls summary detail
- Anthropic: Maps to thinking budget tokens
- Google: Maps to the thinking configuration, with budget and visibility settings
- Groq: Maps to the reasoning format setting (hidden/parsed)
- xAI: Maps to reasoning effort levels
- Other providers: A generic mapping is applied for compatibility
Automatic extraction: For models that don't natively support reasoning output, the gateway automatically extracts reasoning from `<think>` tags in the response.
The AI Gateway can route your requests across multiple AI providers for better reliability and performance. You can control which providers are used, and in what order, through a provider ordering option in your request.
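As a sketch using the AI SDK (the `order` option name, model id, and provider slugs are assumptions for illustration):

```ts
import { generateText } from 'ai';

// Sketch: try Vertex AI first, then Anthropic, for a Claude model.
const { text } = await generateText({
  model: 'anthropic/claude-sonnet-4', // example model id
  prompt: 'Summarize the attached incident report.',
  providerOptions: {
    gateway: {
      order: ['vertex', 'anthropic'], // assumed option name and slugs
    },
  },
});
```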
Provider routing: In this example, the gateway will first attempt to use Vertex AI to serve the Claude model. If Vertex AI is unavailable or fails, it will fall back to Anthropic. Other providers are still available but will only be used after the specified providers.
You can specify fallback models that will be tried in order if the primary model fails. There are two ways to do this:
Option 1: Direct field
The simplest way is to specify the fallback models directly at the top level of your request:
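A sketch of such a request body (the top-level fallback field, shown here as `models`, and the model ids are assumptions):

```ts
// Sketch: primary model plus fallbacks at the top level of the request body.
const body = {
  model: 'anthropic/claude-sonnet-4',                  // primary model
  models: ['openai/gpt-4o', 'google/gemini-2.5-pro'],  // assumed fallback field, tried in order
  messages: [{ role: 'user', content: 'Hello!' }],
};
```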
Option 2: Via provider options
Alternatively, you can specify model fallbacks through the gateway provider options:
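A sketch of the equivalent configuration through provider options (option names assumed):

```ts
import { generateText } from 'ai';

// Sketch: fallback models specified via gateway provider options.
const { text } = await generateText({
  model: 'anthropic/claude-sonnet-4', // primary model
  prompt: 'Hello!',
  providerOptions: {
    gateway: {
      models: ['openai/gpt-4o', 'google/gemini-2.5-pro'], // assumed option name
    },
  },
});
```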
Which approach to use: Both methods achieve the same result. Use the direct field (Option 1) for simplicity, or go through provider options (Option 2) if you're already using provider options for other configurations.
Both configurations will:
- Try the primary model first
- If it fails, try the first fallback model
- If that also fails, try the next fallback model
- Return the result from the first model that succeeds
Provider options work with streaming requests as well:
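For example, as a sketch reusing the assumed ordering option from above:

```ts
import { streamText } from 'ai';

// Sketch: provider routing applied to a streaming request.
const result = streamText({
  model: 'anthropic/claude-sonnet-4',
  prompt: 'Write a short status update.',
  providerOptions: {
    gateway: {
      order: ['vertex', 'anthropic'], // assumed option name, as in the earlier sketch
    },
  },
});

for await (const chunk of result.textStream) {
  process.stdout.write(chunk);
}
```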
For more details about available providers and advanced provider configuration, see the Provider Options documentation.
You can pass your own provider credentials on a per-request basis through the gateway provider options. This allows you to use your existing provider accounts and access private resources without configuring credentials in the gateway settings.
The credentials object is a record where keys are provider slugs and values are arrays of credential objects. Each provider can have multiple credentials that are tried in order.
Credential structure by provider:
- Anthropic: an API key
- OpenAI: an API key
- Google Vertex AI: project and location details plus service account credentials
- Amazon Bedrock: AWS region and access key credentials
For detailed credential parameters for each provider, see the AI SDK providers documentation.
Multiple credentials example:
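A sketch of passing multiple credentials for one provider (the `byok` option name and credential field names are assumptions):

```ts
import { generateText } from 'ai';

// Sketch: two Anthropic credentials, tried in order.
const { text } = await generateText({
  model: 'anthropic/claude-sonnet-4',
  prompt: 'Hello from my own Anthropic account.',
  providerOptions: {
    gateway: {
      byok: {
        anthropic: [
          { apiKey: process.env.ANTHROPIC_KEY_PRIMARY! },
          { apiKey: process.env.ANTHROPIC_KEY_BACKUP! }, // used if the first credential fails
        ],
      },
    },
  },
});
```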
Credential precedence: When request-scoped BYOK credentials are provided, any cached BYOK credentials configured in the gateway settings are not considered. Requests may still fall back to system credentials if the provided credentials fail. For persistent BYOK configuration, see the BYOK documentation.
Anthropic Claude models support prompt caching, which can significantly reduce costs and latency for repeated prompts. When you mark content with cache control breakpoints, the model caches that content and reuses it for subsequent requests with the same prefix.
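As a sketch using the AI SDK's Anthropic provider options (the `cacheControl` placement shown is an assumption when going through the gateway):

```ts
import { generateText } from 'ai';

const LONG_SYSTEM_PROMPT = '...many thousands of tokens of shared reference material...';

// Sketch: mark the large system prompt as cacheable so later requests reuse it.
const { text } = await generateText({
  model: 'anthropic/claude-sonnet-4',
  messages: [
    {
      role: 'system',
      content: LONG_SYSTEM_PROMPT,
      providerOptions: {
        anthropic: { cacheControl: { type: 'ephemeral' } },
      },
    },
    { role: 'user', content: 'Answer using the reference material above.' },
  ],
});
```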
Cache control types: The ephemeral cache type stores content for the duration of the session. This is useful for large system prompts, documents, or context that you want to reuse across multiple requests. Prompt caching works with Anthropic models across all supported providers (Anthropic, Vertex AI, and Bedrock). For more details, see Anthropic's prompt caching documentation.
Anthropic Claude models support an extended context window of up to 1 million tokens for processing very large documents or conversations. To enable this feature, pass the appropriate anthropic-beta header with your request.
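As a sketch (the beta header value shown here is an assumption; check Anthropic's documentation for the current value):

```ts
import { generateText } from 'ai';

const VERY_LONG_DOCUMENT = '...hundreds of thousands of tokens of source text...';

// Sketch: opt into the extended context window via a beta header.
const { text } = await generateText({
  model: 'anthropic/claude-sonnet-4',
  prompt: `${VERY_LONG_DOCUMENT}\n\nSummarize the document above.`,
  headers: {
    'anthropic-beta': 'context-1m-2025-08-07', // assumed header value
  },
});
```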
When to use extended context: The 1M context window is useful when working with very large documents, extensive codebases, or long conversation histories that exceed the standard 200K token limit. Note that longer contexts may increase latency and costs. For more details, see Anthropic's context window documentation.