Portkey provides a robust platform to observe, govern, and manage your locally or privately hosted custom models using vLLM.
For a list of all model architectures supported by vLLM, see the vLLM documentation.

Integration Steps

1. Expose your vLLM Server

Expose your vLLM server using a tunneling service like ngrok, or make it publicly accessible. Skip this step if you’re self-hosting the Portkey Gateway.
ngrok http 8000 --host-header="localhost:8000"
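
Before adding the server to Portkey, it can help to confirm the exposed URL actually reaches vLLM’s OpenAI-compatible API. The snippet below is a minimal sketch: the ngrok URL is a placeholder for your own, and it assumes the server requires no auth header.

# List the models the exposed vLLM server is serving (also confirms reachability).
import requests

VLLM_URL = "https://your-vllm-server.ngrok-free.app"  # replace with your tunnel URL

resp = requests.get(f"{VLLM_URL}/v1/models", timeout=10)
resp.raise_for_status()
print([m["id"] for m in resp.json()["data"]])  # model IDs to use in later requests
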
2. Add to Model Catalog

  1. Go to Model Catalog → Add Provider
  2. Enable “Local/Privately hosted provider” toggle
  3. Select OpenAI as the provider type (vLLM follows OpenAI API schema)
  4. Enter your vLLM server URL in Custom Host: https://your-vllm-server.ngrok-free.app
  5. Add authentication headers if needed
  6. Name your provider (e.g., my-vllm)

For all setup options, see the Complete Setup Guide.

3. Use in Your Application

from portkey_ai import Portkey

portkey = Portkey(
    api_key="PORTKEY_API_KEY",
    provider="@my-vllm"
)

response = portkey.chat.completions.create(
    model="your-model-name",
    messages=[{"role": "user", "content": "Hello!"}]
)

print(response.choices[0].message.content)
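
If you want token-by-token output, the same call can be streamed. This is a minimal sketch assuming the portkey client created above and the OpenAI-style streaming interface exposed by the Portkey SDK:

# Stream the completion instead of waiting for the full response.
stream = portkey.chat.completions.create(
    model="your-model-name",
    messages=[{"role": "user", "content": "Hello!"}],
    stream=True
)

for chunk in stream:
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
print()
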
Or use custom host directly:
from portkey_ai import Portkey

portkey = Portkey(
    api_key="PORTKEY_API_KEY",
    provider="openai",
    custom_host="https://your-vllm-server.ngrok-free.app",
    Authorization="AUTH_KEY"  # If needed
)
Important: vLLM follows the OpenAI API specification, so set the provider to openai when using a custom host directly. By default, vLLM serves its API at http://localhost:8000/v1.
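
Because vLLM exposes the OpenAI API, you can also sanity-check the server with the stock openai Python client before routing traffic through Portkey. A minimal sketch, assuming the default local address and no --api-key configured on the server (vLLM then accepts any placeholder key):

# Talk to vLLM directly with the openai client to verify the endpoint and model name.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8000/v1",  # vLLM's default OpenAI-compatible endpoint
    api_key="EMPTY"                       # placeholder; only checked if --api-key is set
)

print(client.models.list().data[0].id)  # the model name to use in Portkey requests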

Next Steps

For complete Portkey SDK documentation, see the SDK Reference.