OpenAI-Compatible API

Last updated December 1, 2025

AI Gateway provides OpenAI-compatible API endpoints, letting you use multiple AI providers through a familiar interface. You can use existing OpenAI client libraries, switch to the AI Gateway with a URL change, and keep your current tools and workflows without code rewrites.

The OpenAI-compatible API implements the same specification as the OpenAI API.

Base URL

The OpenAI-compatible API is available at the following base URL:

Authentication

The OpenAI-compatible API supports the same authentication methods as the main AI Gateway:

API key: Use your AI Gateway API key with the header
OIDC token: Use your Vercel OIDC token with the header

You only need to use one of these forms of authentication. If an API key is specified it will take precedence over any OIDC token, even if the API key is invalid.

Supported endpoints

The AI Gateway supports the following OpenAI-compatible endpoints:

- List available models
- Retrieve a specific model
- Create chat completions with support for streaming, attachments, tool calls, and image generation
- Generate vector embeddings

Integration with existing tools

You can use the AI Gateway's OpenAI-compatible API with existing tools and libraries like the OpenAI client libraries and AI SDK 4. Point your existing client to the AI Gateway's base URL and use your AI Gateway API key or OIDC token for authentication.

OpenAI client libraries

AI SDK 4

For compatibility with AI SDK v4 and AI Gateway, install the @ai-sdk/openai-compatible package.

Verify that you are using AI SDK 4 by using the following package versions: version (e.g., ) and version (e.g., ).

List models

Retrieve a list of all available models that can be used with the AI Gateway.

Endpoint

Example request

Response format

The response follows the OpenAI API format:

Retrieve model

Retrieve details about a specific model.

Endpoint

Parameters

(required): The model ID to retrieve (e.g., )

Example request

Response format

Chat completions

Create chat completions using various AI models available through the AI Gateway.

Endpoint

Basic chat completion

Create a non-streaming chat completion.

Example request

Response format

Streaming chat completion

Create a streaming chat completion that streams tokens as they are generated.

Example request

Streaming response format

Streaming responses are sent as Server-Sent Events (SSE), a web standard for real-time data streaming over HTTP. Each event contains a JSON object with the partial response data.

The response format follows the OpenAI streaming specification:

Key characteristics:

Each line starts with followed by JSON
Content is delivered incrementally in the field
The stream ends with
Empty lines separate events

SSE Parsing Libraries:

If you're building custom SSE parsing (instead of using the OpenAI SDK), these libraries can help:

JavaScript/TypeScript: - Robust SSE parsing with support for partial events
Python: - SSE support for HTTPX, or for requests

For more details about the SSE specification, see the W3C specification.

Image attachments

Send images as part of your chat completion request.

Example request

PDF attachments

Send PDF documents as part of your chat completion request.

Example request

Tool calls

The AI Gateway supports OpenAI-compatible function calling, allowing models to call tools and functions. This follows the same specification as the OpenAI Function Calling API.

Basic tool calls

Controlling tool selection: By default, is set to , allowing the model to decide when to use tools. You can also:

Set to to disable tool calls
Force a specific tool with:

Tool call response format

When the model makes tool calls, the response includes tool call information:

Structured outputs

Generate structured JSON responses that conform to a specific schema, ensuring predictable and reliable data formats for your applications.

JSON Schema format

Use the OpenAI standard response format for the most robust structured output experience. This follows the official OpenAI Structured Outputs specification.

Example request

Response format

The response contains structured JSON that conforms to your specified schema:

JSON Schema parameters

: Must be
: Object containing schema definition
- (required): Name of the response schema
- (optional): Human-readable description of the expected output
- (required): Valid JSON Schema object defining the structure

Legacy JSON format (alternative)

Legacy format: The following format is supported for backward compatibility. For new implementations, use the format above.

Streaming with structured outputs

Both and legacy formats work with streaming responses:

Streaming assembly: When using structured outputs with streaming, you'll need to collect all the content chunks and parse the complete JSON response once the stream is finished.

Reasoning configuration

Configure reasoning behavior for models that support extended thinking or chain-of-thought reasoning. The parameter allows you to control how reasoning tokens are generated and returned.

Example request

Reasoning parameters

The object supports the following parameters:

(boolean, optional): Enable reasoning output. When , the model will provide its reasoning process.
(number, optional): Maximum number of tokens to allocate for reasoning. This helps control costs and response times. Cannot be used with .
(string, optional): Control reasoning effort level. Accepts , , or . Cannot be used with .
(boolean, optional): When , excludes reasoning content from the response but still generates it internally. Useful for reducing response payload size.

Mutually exclusive parameters: You cannot specify both and in the same request. Choose one based on your use case.

Response format with reasoning

When reasoning is enabled, the response includes reasoning content:

Streaming with reasoning

Reasoning content is streamed incrementally in the field:

Preserving reasoning details across providers

The AI Gateway preserves reasoning details from models across interactions, normalizing the different formats used by OpenAI, Anthropic, and other providers into a consistent structure. This allows you to switch between models without rewriting your conversation management logic.

This is particularly useful during tool calling workflows where the model needs to resume its thought process after receiving tool results.

Controlling reasoning details

When is (or when is not set), responses include a array alongside the standard text field. This structured field captures cryptographic signatures, encrypted content, and other verification data that providers include with their reasoning output.

Each detail object contains:

: one or more of the below, depending on the provider and model
- : Contains the actual reasoning content as plain text in the field. May include a field (Anthropic models) for cryptographic verification.
- : Contains encrypted or redacted reasoning content in the field. Used by OpenAI models when reasoning is protected, or by Anthropic models when thinking is redacted. Preserves the encrypted payload for verification purposes.
- : Contains a condensed version of the reasoning process in the field. Used by OpenAI models to provide a readable summary alongside encrypted reasoning.
(optional): Unique identifier for the reasoning block, used for tracking and correlation
: Provider format identifier - , , or
(optional): Position in the reasoning sequence (for responses with multiple reasoning blocks)

Example response with reasoning details

For Anthropic models:

For OpenAI models (returns both summary and encrypted):

Streaming reasoning details

When streaming, reasoning details are delivered incrementally in :

For Anthropic models:

For OpenAI models (summary chunks during reasoning, then encrypted at end):

Provider-specific behavior

The AI Gateway automatically maps reasoning parameters to each provider's native format:

OpenAI: Maps to and controls summary detail
Anthropic: Maps to thinking budget tokens
Google: Maps to with budget and visibility settings
Groq: Maps to control reasoning format (hidden/parsed)
xAI: Maps to reasoning effort levels
Other providers: Generic mapping applied for compatibility

Automatic extraction: For models that don't natively support reasoning output, the gateway automatically extracts reasoning from tags in the response.

Provider options

The AI Gateway can route your requests across multiple AI providers for better reliability and performance. You can control which providers are used and in what order through the parameter.

Example request

Provider routing: In this example, the gateway will first attempt to use Vertex AI to serve the Claude model. If Vertex AI is unavailable or fails, it will fall back to Anthropic. Other providers are still available but will only be used after the specified providers.

Model fallbacks

You can specify fallback models that will be tried in order if the primary model fails. There are two ways to do this:

Option 1: Direct field

The simplest way is to use the field directly at the top level of your request:

Option 2: Via provider options

Alternatively, you can specify model fallbacks through the field:

Which approach to use: Both methods achieve the same result. Use the direct field (Option 1) for simplicity, or use (Option 2) if you're already using provider options for other configurations.

Both configurations will:

Try the primary model () first
If it fails, try
If that also fails, try
Return the result from the first model that succeeds

Streaming with provider options

Provider options work with streaming requests as well:

For more details about available providers and advanced provider configuration, see the Provider Options documentation.

Parameters

The chat completions endpoint supports the following parameters:

Required parameters

(string): The model to use for the completion (e.g., )
(array): Array of message objects with and fields

Optional parameters

(boolean): Whether to stream the response. Defaults to
(number): Controls randomness in the output. Range: 0-2
(integer): Maximum number of tokens to generate
(number): Nucleus sampling parameter. Range: 0-1
(number): Penalty for frequent tokens. Range: -2 to 2
(number): Penalty for present tokens. Range: -2 to 2
(string or array): Stop sequences for the generation
(array): Array of tool definitions for function calling
(string or object): Controls which tools are called (, , or specific function)
(object): Provider routing and configuration options
(object): Controls the format of the model's response
- For OpenAI standard format:
- For legacy format:
- For plain text:
- See Structured outputs for detailed examples

Message format

Messages support different content types:

Text messages

Multimodal messages

File messages

Image generation

Generate images using AI models that support multimodal output through the OpenAI-compatible API. This feature allows you to create images alongside text responses using models like Google's Gemini 2.5 Flash Image.

Endpoint

Parameters

To enable image generation, include the parameter in your request:

(array): Array of strings specifying the desired output modalities. Use for both text and image generation, or for image-only generation.

Example requests

Response format

When image generation is enabled, the response separates text content from generated images:

Response structure details

: Contains the text description as a string
: Array of generated images, each with:
- : Always
- : Base64-encoded data URI of the generated image

Streaming responses

For streaming requests, images are delivered in delta chunks:

Handling streaming image responses

When processing streaming responses, check for both text content and images in each delta:

Image generation support: Currently, image generation is supported by Google's Gemini 2.5 Flash Image model. The generated images are returned as base64-encoded data URIs in the response. For more detailed information about image generation capabilities, see the Image Generation documentation.

Embeddings

Generate vector embeddings from input text for semantic search, similarity matching, and retrieval-augmented generation (RAG).

Endpoint

Example request

Response format

Dimensions parameter

You can set the root-level field (from the OpenAI Embeddings API spec) and the gateway will auto-map it to each provider's expected field; still passes through as-is and isn't required for to work.

Error handling

The API returns standard HTTP status codes and error responses:

Common error codes

: Invalid request parameters
: Invalid or missing authentication
: Insufficient permissions
: Model or endpoint not found
: Rate limit exceeded
: Server error

Was this helpful?