Here’s a list of all model architectures supported on vLLM.
Integration Steps
1
Expose your vLLM Server
Expose your vLLM server using a tunneling service like ngrok or make it publicly accessible. Skip this if you’re self-hosting the Gateway.
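Before adding the endpoint to Portkey, it can help to confirm the exposed URL is actually forwarding to vLLM. A minimal sketch, assuming the `requests` package is installed and the placeholder ngrok URL stands in for your own:

```python
import requests

# Placeholder for your tunnel or public URL; vLLM serves an OpenAI-compatible
# API under /v1 by default.
VLLM_BASE_URL = "https://your-vllm-server.ngrok-free.app/v1"

# The OpenAI-compatible server lists the model(s) it is serving at GET /models.
resp = requests.get(f"{VLLM_BASE_URL}/models", timeout=10)
resp.raise_for_status()
print([m["id"] for m in resp.json()["data"]])
```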
2
Add to Model Catalog
- Go to Model Catalog → Add Provider
- Enable “Local/Privately hosted provider” toggle
- Select OpenAI as the provider type (vLLM follows OpenAI API schema)
- Enter your vLLM server URL in Custom Host: https://your-vllm-server.ngrok-free.app
- Add authentication headers if needed
- Name your provider (e.g., my-vllm); this slug is used in the sketch below
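Once the provider is saved, requests can reference it by its slug. A minimal sketch, assuming the Portkey Python SDK, a provider slug of my-vllm, the Model Catalog's @provider/model naming convention, and that your vLLM server is serving meta-llama/Llama-3.1-8B-Instruct:

```python
from portkey_ai import Portkey

client = Portkey(api_key="PORTKEY_API_KEY")  # your Portkey API key

# The model string combines the Model Catalog provider slug with the model
# name served by vLLM (both assumed here for illustration).
completion = client.chat.completions.create(
    model="@my-vllm/meta-llama/Llama-3.1-8B-Instruct",
    messages=[{"role": "user", "content": "Hello from vLLM via Portkey!"}],
)
print(completion.choices[0].message.content)
```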
Complete Setup Guide
See all setup options
3
Use in Your Application
Important: vLLM follows the OpenAI API specification, so set the provider as openai when using a custom host directly. By default, vLLM runs on http://localhost:8000/v1.
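As an example, here is a minimal sketch that points the gateway at a local vLLM server directly via a custom host, assuming the Portkey Python SDK accepts provider and custom_host client parameters and that vLLM is running on its default port:

```python
from portkey_ai import Portkey

# provider is "openai" because vLLM implements the OpenAI API schema;
# custom_host points the gateway at the vLLM server's base URL.
client = Portkey(
    api_key="PORTKEY_API_KEY",
    provider="openai",
    custom_host="http://localhost:8000/v1",
)

completion = client.chat.completions.create(
    model="meta-llama/Llama-3.1-8B-Instruct",  # whichever model vLLM is serving
    messages=[{"role": "user", "content": "Say hello"}],
)
print(completion.choices[0].message.content)
```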
Next Steps
Gateway Configs
Add retries, timeouts, and fallbacks
Observability
Monitor your vLLM deployments
Custom Host Guide
Learn more about custom host setup
BYOLLM Guide
Complete guide for private LLMs
SDK Reference
Complete Portkey SDK documentation

