Portkey provides a robust and secure gateway to integrate OpenAI’s APIs into your applications, including GPT-4o, o1, DALL·E, Whisper, and more. With Portkey, take advantage of features like fast AI gateway access, observability, prompt management, and more, while securely managing API keys through Model Catalog.

All Models

Full support for GPT-4o, o1, GPT-4, GPT-3.5, and all OpenAI models

All Endpoints

Chat, completions, embeddings, audio, images, and more fully supported

Multi-SDK Support

Use with OpenAI SDK, Portkey SDK, or popular frameworks like LangChain

Quick Start

Get OpenAI working in 3 steps:
from portkey_ai import Portkey

# 1. Install: pip install portkey-ai
# 2. Add @openai provider in model catalog
# 3. Use it:

portkey = Portkey(api_key="PORTKEY_API_KEY")

response = portkey.chat.completions.create(
    model="@openai/gpt-4o",
    messages=[{"role": "user", "content": "Say this is a test"}]
)

print(response.choices[0].message.content)
Tip: You can also set provider="@openai" in Portkey() and use just model="gpt-4o" in the request.

Legacy support: The virtual_key parameter still works for backwards compatibility.
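For example, a minimal sketch of the client-level provider pattern:
from portkey_ai import Portkey

# Set the provider once on the client so requests can use bare model names
portkey = Portkey(api_key="PORTKEY_API_KEY", provider="@openai")

response = portkey.chat.completions.create(
    model="gpt-4o",  # no @openai/ prefix needed here
    messages=[{"role": "user", "content": "Say this is a test"}]
)

print(response.choices[0].message.content)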

Add Provider in Model Catalog

  1. Go to Model Catalog → Add Provider
  2. Select OpenAI
  3. Choose existing credentials or create new by entering your OpenAI API key
  4. (Optional) Add your OpenAI Organization ID and Project ID for better cost tracking
  5. Name your provider (e.g., openai-prod)

Complete Setup Guide →

See all setup options, code examples, and detailed instructions

Basic Usage

Streaming

Stream responses for real-time output in your applications:
response = portkey.chat.completions.create(
    model="@openai/gpt-4o",
    messages=[{"role": "user", "content": "Tell me a story"}],
    stream=True
)

for chunk in response:
    if chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)

Advanced Features

Responses API

OpenAI’s Responses API combines the best of both Chat Completions and Assistants APIs. Portkey fully supports this API with both the Portkey SDK and OpenAI SDK.
from portkey_ai import Portkey

portkey = Portkey(api_key="PORTKEY_API_KEY")

response = portkey.responses.create(
    model="@openai/gpt-4.1",
    input="Tell me a three sentence bedtime story about a unicorn."
)

print(response)
The Responses API provides a more flexible foundation for building agentic applications with built-in tools that execute automatically.

Remote MCP support on Responses API

Portkey supports OpenAI’s Remote MCP tool on the Responses API. Learn More
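As a sketch, a remote MCP server is attached as a tool on the request, following OpenAI’s remote MCP tool format (the server label and URL below are illustrative):
response = portkey.responses.create(
    model="@openai/gpt-4.1",
    tools=[{
        "type": "mcp",
        "server_label": "deepwiki",                    # illustrative label
        "server_url": "https://mcp.deepwiki.com/mcp",  # illustrative remote MCP server
        "require_approval": "never"
    }],
    input="What transport protocols does the MCP spec support?"
)

print(response)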

Streaming with Responses API

response = portkey.responses.create(
    model="@openai/gpt-4.1",
    instructions="You are a helpful assistant.",
    input="Hello!",
    stream=True
)

for event in response:
    print(event)

Realtime API

Portkey integrates seamlessly with OpenAI’s Realtime API, so you can use Portkey’s logging, cost tracking, and guardrail features with real-time sessions.

Realtime API

Using Vision Models

Portkey’s multimodal Gateway fully supports OpenAI vision models as well. See the vision guide for more details.
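As a minimal sketch, an image-understanding request through Chat Completions uses OpenAI’s standard multimodal message format:
response = portkey.chat.completions.create(
    model="@openai/gpt-4o",
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "What is in this image?"},
            {
                "type": "image_url",
                "image_url": {"url": "https://upload.wikimedia.org/wikipedia/commons/thumb/d/dd/Gfp-wisconsin-madison-the-nature-boardwalk.jpg/2560px-Gfp-wisconsin-madison-the-nature-boardwalk.jpg"}
            }
        ]
    }]
)

print(response.choices[0].message.content)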

Vision with the Responses API

The Responses API also processes images alongside text:
response = portkey.responses.create(
    model="@openai/gpt-4.1",
    input=[
        {
            "role": "user",
            "content": [
                { "type": "input_text", "text": "What is in this image?" },
                {
                    "type": "input_image",
                    "image_url": "https://upload.wikimedia.org/wikipedia/commons/thumb/d/dd/Gfp-wisconsin-madison-the-nature-boardwalk.jpg/2560px-Gfp-wisconsin-madison-the-nature-boardwalk.jpg"
                }
            ]
        }
    ]
)

print(response)

Function Calling

Function calling works the same through Portkey as it does with OpenAI directly, whether you use the OpenAI SDK or the Portkey SDK. The resulting logs appear in Portkey, highlighting the functions that were called and their outputs. You can also define functions within saved prompts and invoke them via the portkey.prompts.completions.create method.
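A minimal sketch of a standard function-calling request through Chat Completions (the weather tool here is illustrative):
tools = [{
    "type": "function",
    "function": {
        "name": "get_current_weather",
        "description": "Get the current weather in a given location",
        "parameters": {
            "type": "object",
            "properties": {
                "location": {
                    "type": "string",
                    "description": "The city and state, e.g. San Francisco, CA"
                }
            },
            "required": ["location"]
        }
    }
}]

response = portkey.chat.completions.create(
    model="@openai/gpt-4o",
    messages=[{"role": "user", "content": "What is the weather like in Boston today?"}],
    tools=tools,
    tool_choice="auto"
)

# Any tool call the model makes appears on the first choice
print(response.choices[0].message.tool_calls)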

Function Calling with the Responses API

The Responses API also supports function calling with the same powerful capabilities:
tools = [
    {
        "type": "function",
        "name": "get_current_weather",
        "description": "Get the current weather in a given location",
        "parameters": {
            "type": "object",
            "properties": {
                "location": {
                    "type": "string",
                    "description": "The city and state, e.g. San Francisco, CA"
                },
                "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]}
            },
            "required": ["location", "unit"]
        }
    }
]

response = portkey.responses.create(
    model="@openai/gpt-4.1",
    tools=tools,
    input="What is the weather like in Boston today?",
    tool_choice="auto"
)

print(response)
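When the model opts to call the function, the call arrives as a function_call item in response.output. A small sketch of reading it, with attribute names following OpenAI’s documented output shape:
import json

for item in response.output:
    if item.type == "function_call":
        arguments = json.loads(item.arguments)  # arguments arrive as a JSON string
        print(item.name, arguments)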

Fine-Tuning

Please refer to our fine-tuning guides to take advantage of Portkey’s advanced continuous fine-tuning capabilities.

Image Generation

Portkey supports multiple modalities for OpenAI. Make image generation requests through Portkey’s AI Gateway the same way you make completion calls.
# Reuse the Portkey client defined in the examples above
image = portkey.images.generate(
    model="@openai/dall-e-3",
    prompt="Lucy in the sky with diamonds",
    size="1024x1024"
)
Portkey’s fast AI gateway captures information about the request on your Portkey dashboard. On the logs screen, you can see this request along with its full request and response bodies.
Log view for an image generation request on OpenAI

More information on image generation is available in the API Reference.

Video Generation with Sora

Portkey supports OpenAI’s Sora video generation models through the AI Gateway. Generate videos using the Portkey Python SDK:
from portkey_ai import Portkey

client = Portkey(
    api_key="PORTKEY_API_KEY"
)

video = client.videos.create(
    model="@openai/sora-2",
    prompt="A video of a cool cat on a motorcycle in the night",
)

print("Video generation started:", video)
Pricing for video generation requests will be visible on your Portkey dashboard, allowing you to track costs alongside your other API usage.

Audio - Transcription, Translation, and Text-to-Speech

Portkey’s multimodal Gateway also supports the audio methods on the OpenAI API. Check out the guides below for more info.
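For example, a minimal transcription sketch with the Portkey SDK (assumes a local speech.mp3 file and that whisper-1 is enabled on your @openai provider):
with open("speech.mp3", "rb") as audio_file:
    transcript = portkey.audio.transcriptions.create(
        model="@openai/whisper-1",
        file=audio_file
    )

print(transcript.text)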

Integrated Tools with Responses API

Web Search Tool

Web search delivers accurate and clearly cited answers from the web, using the same tool as search in ChatGPT:
response = portkey.responses.create(
    model="@openai/gpt-4.1",
    tools=[{
        "type": "web_search_preview",
        "search_context_size": "medium", # Options: "high", "medium" (default), or "low"
        "user_location": {  # Optional - for localized results
            "type": "approximate",
            "country": "US",
            "city": "San Francisco",
            "region": "California"
        }
    }],
    input="What was a positive news story from today?"
)

print(response)
Options for search_context_size:
  • high: Most comprehensive context, higher cost, slower response
  • medium: Balanced context, cost, and latency (default)
  • low: Minimal context, lowest cost, fastest response
Responses include citations for URLs found in search results, with clickable references.

File Search Tool

File search enables quick retrieval from your knowledge base across multiple file types:
response = portkey.responses.create(
    model="@openai/gpt-4.1",
    tools=[{
        "type": "file_search",
        "vector_store_ids": ["vs_1234567890"],
        "max_num_results": 20,
        "filters": {  # Optional - filter by metadata
            "type": "eq",
            "key": "document_type",
            "value": "report"
        }
    }],
    input="What are the attributes of an ancient brown dragon?"
)

print(response)
This tool requires you to first create a vector store and upload files to it. It supports a range of file formats, including PDF, DOCX, and TXT, and results include file citations in the response.
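A hedged sketch of that setup, assuming the vector store endpoints mirror the OpenAI SDK’s method names:
# Create a vector store and attach a file to it (file name is illustrative)
vector_store = portkey.vector_stores.create(name="knowledge-base")

with open("dragons.pdf", "rb") as f:
    portkey.vector_stores.files.upload_and_poll(
        vector_store_id=vector_store.id,
        file=f
    )

print(vector_store.id)  # pass this ID in the file_search tool's vector_store_ids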

Enhanced Reasoning

Control the depth of model reasoning for more comprehensive analysis:
response = portkey.responses.create(
    model="@openai/o3-mini",
    input="How much wood would a woodchuck chuck?",
    reasoning={
        "effort": "high"  # Options: "high", "medium", or "low"
    }
)

print(response)

Computer Use Assistant

Portkey also supports the Computer Use Assistant (CUA) tool, which helps agents control computers or virtual machines through screenshots and actions. This feature is available for select developers as a research preview on premium tiers.

Learn more about the Computer Use tool here

Managing OpenAI Projects & Organizations in Portkey

When integrating OpenAI with Portkey, you can specify your OpenAI organization and project IDs along with your API key. This is particularly useful if you belong to multiple organizations or are accessing projects through a legacy user API key. Specifying the organization and project IDs helps you maintain better control over your access rules, usage, and costs. You can add your org and project details by:
  1. Adding in Model Catalog (Recommended)
  2. Defining a Gateway Config
  3. Passing Details in a Request
Let’s explore each method in more detail.

Using Model Catalog

When adding OpenAI from the Model Catalog, Portkey automatically displays optional fields for the organization ID and project ID alongside the API key field. Get your OpenAI API key from here, then add it to Portkey along with your org/project details.
Portkey takes budget management a step further than OpenAI. While OpenAI allows setting budget limits per project, Portkey enables you to set budget limits for each provider you create. For more information on budget limits, refer to this documentation:

Using the Gateway Config

You can also specify the organization and project details in the gateway config, either at the root level or within a specific target.
{
	"provider": "@openai",
	"openai_organization": "org-xxxxxx",
	"openai_project": "proj_xxxxxxxx"
}
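A sketch of attaching that config directly to the Portkey client (you can also reference a saved config by its ID):
from portkey_ai import Portkey

portkey = Portkey(
    api_key="PORTKEY_API_KEY",
    config={
        "provider": "@openai",
        "openai_organization": "org-xxxxxx",
        "openai_project": "proj_xxxxxxxx"
    }
)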

While Making a Request

You can also pass your organization and project details directly when making a request using curl, the OpenAI SDK, or the Portkey SDK.
from openai import OpenAI
from portkey_ai import PORTKEY_GATEWAY_URL

client = OpenAI(
    api_key="PORTKEY_API_KEY",
    organization="org-xxxxxxxxxx",
    project="proj_xxxxxxxxx",
    base_url=PORTKEY_GATEWAY_URL
)

chat_complete = client.chat.completions.create(
    model="@openai/gpt-4o",
    messages=[{"role": "user", "content": "Say this is a test"}],
)

print(chat_complete.choices[0].message.content)

Limitations

Portkey does not currently support:
  • Streaming for audio endpoints

Vision Model Limitations

  • Medical images: Vision models are not suitable for interpreting specialized medical images like CT scans and shouldn’t be used for medical advice.
  • Non-English: The models may not perform optimally when handling images with text of non-Latin alphabets, such as Japanese or Korean.
  • Small text: Enlarge text within the image to improve readability, but avoid cropping important details.
  • Rotation: The models may misinterpret rotated / upside-down text or images.
  • Visual elements: The models may struggle to understand graphs or text where colors or styles like solid, dashed, or dotted lines vary.
  • Spatial reasoning: The models struggle with tasks requiring precise spatial localization, such as identifying chess positions.
  • Accuracy: The models may generate incorrect descriptions or captions in certain scenarios.
  • Image shape: The models struggle with panoramic and fisheye images.
  • Metadata and resizing: The models do not process original file names or metadata, and images are resized before analysis, affecting their original dimensions.
  • Counting: May give approximate counts for objects in images.
  • CAPTCHAs: For safety reasons, CAPTCHA submissions are blocked by OpenAI.

Image Generation Limitations

  • DALL·E 3 Restrictions:
    • Only supports image generation (no editing or variations)
    • Limited to one image per request
    • Fixed size options: 1024x1024, 1024x1792, or 1792x1024 pixels
    • Automatic prompt enhancement cannot be disabled
  • Image Requirements:
    • Must be PNG format
    • Maximum file size: 4MB
    • Must be square dimensions
    • For edits/variations: input images must meet same requirements
  • Content Restrictions:
    • All prompts and images are filtered based on OpenAI’s content policy
    • Violating content will return an error
    • Edited areas must be described in full context, not just the edited portion
  • Technical Limitations:
    • Image URLs expire after 1 hour
    • Image editing (inpainting) and variations only available in DALL·E 2
    • Response format limited to URL or Base64 data

Speech-to-Text Limitations

  • File Restrictions:
    • Maximum file size: 25 MB
    • Supported formats: mp3, mp4, mpeg, mpga, m4a, wav, webm
    • No streaming support
  • Language Limitations:
    • Translation output available only in English
    • Variable accuracy for non-listed languages
    • Limited control over generated audio compared to other language models
  • Technical Constraints:
    • Prompt limited to first 244 tokens
    • Restricted processing for longer audio files
    • No real-time transcription support

Text-to-Speech Limitations

  • Voice Restrictions:
    • Limited to 6 pre-built voices (alloy, echo, fable, onyx, nova, shimmer)
    • Voices optimized primarily for English
    • No custom voice creation support
    • No direct control over emotional range or tone
  • Audio Quality Trade-offs:
    • tts-1: Lower latency but potentially more static
    • tts-1-hd: Higher quality but increased latency
    • Quality differences may vary by listening device
  • Usage Requirements:
    • Must disclose AI-generated nature to end users
    • Cannot create custom voice clones
    • Performance varies for non-English languages

Frequently Asked Questions

General FAQs

  • You can sign up for OpenAI here and grab your scoped API key here.
  • The OpenAI API can be used by signing up on the OpenAI platform. You can find the pricing info here.
  • You can find your current rate limits imposed by OpenAI here. For more tips, check out this guide.

Vision FAQs

  • Vision fine-tuning is available for some OpenAI models.
  • No, you can use dall-e-3 to generate images and gpt-4o and other chat models to understand images.
  • OpenAI currently supports PNG (.png), JPEG (.jpeg and .jpg), WEBP (.webp), and non-animated GIF (.gif).
  • OpenAI currently restricts image uploads to 20MB per image.
  • OpenAI processes images at the token level, so each image that’s processed counts towards your tokens per minute (TPM) limit. See how OpenAI calculates costs here for details on the formula used to determine token count per image.
  • No, the models do not receive image metadata.

Embedding FAQs

  • This cookbook by OpenAI illustrates how to leverage their Tiktoken library to count tokens for various embedding requests (see the sketch after this list).
  • Using a specialized vector database helps here. Check out this cookbook by OpenAI for a deep dive.
  • The cutoff date for the V3 embedding models (text-embedding-3-large & text-embedding-3-small) is September 2021, so they do not know about the most recent events.
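A small token-counting sketch with tiktoken, assuming cl100k_base (the encoding used by the V3 embedding models):
import tiktoken

encoding = tiktoken.get_encoding("cl100k_base")
num_tokens = len(encoding.encode("The food was delicious and the waiter was friendly."))
print(num_tokens)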

Prompt Caching FAQs

  • OpenAI prompt caches are not shared between organizations. Only members of the same organization can access caches of identical prompts.
  • Prompt Caching does not influence the generation of output tokens or the final response provided by the API. Regardless of whether caching is used, the output generated will be identical, because only the prompt itself is cached; the actual response is computed anew each time based on the cached prompt.
  • Manual cache clearing is not currently available. Prompts that have not been encountered recently are automatically cleared from the cache. Typical cache evictions occur after 5-10 minutes of inactivity, though entries can sometimes persist for up to an hour during off-peak periods.
  • No. Caching happens automatically, with no explicit action needed or extra cost paid to use the caching feature.
  • Yes, as caching does not affect rate limits.
  • Discounting for Prompt Caching is not available on the Batch API but is available on Scale Tier. With Scale Tier, any tokens spilled over to the shared API are also eligible for caching.
  • Yes, Prompt Caching is compliant with existing Zero Data Retention policies.

Image Generation FAQs

  • DALL·E 3 offers higher-quality images and enhanced capabilities, but only supports image generation. DALL·E 2 supports all three capabilities: generation, editing, and variations.
  • Generated image URLs expire after one hour. Download or process the images before expiration.
  • Images must be square PNG files under 4MB. For editing features, both the image and mask must have identical dimensions.
  • While you can’t completely disable prompt enhancement, you can add “I NEED to test how the tool works with extremely simple prompts. DO NOT add any detail, just use it AS-IS:” to your prompt.
  • DALL·E 3 supports 1 image per request (use parallel requests for more), while DALL·E 2 supports up to 10 images per request.
  • The API requires PNG format for all image uploads and manipulations. Generated images can be returned as either a URL or Base64 data.
  • Available only in DALL·E 2, inpainting requires both an original image and a mask. The transparent areas of the mask indicate where the image should be edited, and your prompt should describe the complete new image, not just the edited area.

Speech-to-Text FAQs

  • The API supports mp3, mp4, mpeg, mpga, m4a, wav, and webm formats, with a maximum file size of 25 MB.
  • No, currently the translation API only supports output in English, regardless of the input language.
  • For files over 25 MB, you’ll need to either compress the audio or split it into smaller chunks. Tools like PyDub can help split audio files while avoiding mid-sentence breaks (see the sketch after this list).
  • While the model was trained on 98 languages, only languages with less than a 50% word error rate are officially supported. Other languages may work but with lower accuracy.
  • Yes, using the timestamp_granularities parameter, you can get timestamps at the segment level, word level, or both.
  • You can use the prompt parameter to provide context or correct spellings of specific terms, or use post-processing with GPT-4 for more extensive corrections.
  • Transcription provides output in the original language, while translation always converts the audio to English text.
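A sketch of the PyDub approach mentioned above, splitting a long recording into ten-minute chunks (file names are illustrative):
from pydub import AudioSegment

audio = AudioSegment.from_mp3("long_recording.mp3")
chunk_ms = 10 * 60 * 1000  # ten minutes per chunk; len(audio) is in milliseconds

for start in range(0, len(audio), chunk_ms):
    chunk = audio[start:start + chunk_ms]
    chunk.export(f"chunk_{start // chunk_ms}.mp3", format="mp3")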

Text-to-Speech FAQs

  • TTS-1 offers lower latency for real-time applications but may include more static. TTS-1-HD provides higher-quality audio but with increased generation time.
  • The API supports multiple formats: MP3 (default), Opus (for streaming), AAC (for mobile), FLAC (lossless), WAV (uncompressed), and PCM (raw 24 kHz samples).
  • No, the API only supports the six built-in voices (alloy, echo, fable, onyx, nova, and shimmer). Custom voice creation is not available.
  • While the voices are optimized for English, the API supports multiple languages with varying effectiveness. Performance quality may vary by language.
  • There’s no direct mechanism to control emotional output. While capitalization and grammar might influence the output, results are inconsistent.
  • Yes, the API supports real-time audio streaming using chunked transfer encoding, allowing audio playback before complete file generation.
  • Yes, OpenAI’s usage policies require clear disclosure to end users that they are hearing AI-generated voices, not human ones.

Next Steps

For complete SDK documentation:

SDK Reference

Complete Portkey SDK documentation