Available on all Portkey plans.
Distribute traffic across multiple LLMs to prevent any single provider from becoming a bottleneck.

Examples

{
  "strategy": { "mode": "loadbalance" },
  "targets": [
    { "provider": "@openai-prod", "weight": 0.7 },
    { "provider": "@azure-prod", "weight": 0.3 }
  ]
}
Pattern            | Use Case
Between Providers  | Route to different providers; the model comes from the request
Multiple API Keys  | Distribute load across rate limits from different accounts
Cost Optimization  | Send most traffic to cheaper models, reserve premium for a portion
Gradual Migration  | Test new models with a small percentage of traffic before full rollout
The @provider-slug/model-name format automatically routes requests to the correct provider. Set up providers in the Model Catalog.
Create and use configs in your requests.
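For example, the cost-optimization pattern above might look like the sketch below. It assumes a per-target override_params field for pinning each target to a model; the provider slug and model names are placeholders, so substitute your own from the Model Catalog.

{
  "strategy": { "mode": "loadbalance" },
  "targets": [
    {
      "provider": "@openai-prod",
      "weight": 0.8,
      "override_params": { "model": "gpt-4o-mini" }
    },
    {
      "provider": "@openai-prod",
      "weight": 0.2,
      "override_params": { "model": "gpt-4o" }
    }
  ]
}

With these weights, roughly 80% of requests go to the cheaper model and 20% to the premium one. Shifting the split toward the newer model over time gives you the gradual-migration pattern as well.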

How It Works

  1. Define targets & weights — Assign a weight to each target. Weights represent relative share of traffic.
  2. Weight normalization — Portkey normalizes weights to sum to 100%. Example: weights 5, 3, 1 become roughly 56%, 33%, and 11% (see the example config below).
  3. Request distribution — Each request routes to a target based on normalized probabilities.
  • Default weight: 1 (applied when a target's weight is unset)
  • Minimum weight: 0 (stops traffic to a target without removing it from the config)
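A minimal sketch of how normalization and zero weights work together (the provider slugs are placeholders):

{
  "strategy": { "mode": "loadbalance" },
  "targets": [
    { "provider": "@openai-prod", "weight": 5 },
    { "provider": "@azure-prod", "weight": 3 },
    { "provider": "@anthropic-prod", "weight": 1 },
    { "provider": "@bedrock-prod", "weight": 0 }
  ]
}

The first three targets receive roughly 56%, 33%, and 11% of traffic; the fourth stays in the config but receives none until its weight is raised above 0.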

Considerations

  • Ensure LLMs in your list are compatible with your use case
  • Monitor usage per LLM—weight distribution affects spend
  • Each LLM has different latency and pricing