FutureAGI provides automated evaluation, tracing, and quality assessment for LLM applications. Combined with Portkey, it gives you comprehensive observability covering both operational performance and response quality.
Portkey handles “what happened, how fast, and how much?” while FutureAGI answers “how good was the response?”

Quick Start

pip install portkey-ai fi-instrumentation traceai-portkey
from portkey_ai import Portkey
from traceai_portkey import PortkeyInstrumentor
from fi_instrumentation import register
from fi_instrumentation.fi_types import (
    ProjectType, EvalTag, EvalTagType,
    EvalSpanKind, EvalName, ModelChoices
)

# Setup FutureAGI tracing
tracer_provider = register(
    project_name="Model-Benchmarking",
    project_type=ProjectType.EXPERIMENT,
    project_version_name="gpt-4.1-test",
    eval_tags=[
        EvalTag(
            type=EvalTagType.OBSERVATION_SPAN,
            value=EvalSpanKind.LLM,
            eval_name=EvalName.IS_CONCISE,
            custom_eval_name="Is_Concise",
            mapping={"input": "llm.output_messages.0.message.content"},
            model=ModelChoices.TURING_LARGE
        ),
    ]
)
PortkeyInstrumentor().instrument(tracer_provider=tracer_provider)

# Use Portkey gateway with provider slug
client = Portkey(api_key="YOUR_PORTKEY_API_KEY")

response = client.chat.completions.create(
    model="@openai-prod/gpt-4.1",  # Provider slug from Model Catalog
    messages=[{"role": "user", "content": "Explain quantum computing in 3 sentences."}],
    max_tokens=1024
)

print(response.choices[0].message.content)

Setup

  1. Add your providers in Portkey's Model Catalog
  2. Get your Portkey API key
  3. Get your FutureAGI API key
  4. Use model="@provider-slug/model-name" in requests (see the sketch below)
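
Step 4 in practice, as a minimal sketch: switching providers only means switching the slug in model. The slugs below are examples and must match the names you gave the providers in your Model Catalog.

from portkey_ai import Portkey

client = Portkey(api_key="YOUR_PORTKEY_API_KEY")

# Example slugs; replace with the slugs from your own Model Catalog.
for model in ("@openai-prod/gpt-4.1", "@anthropic-prod/claude-sonnet-4"):
    reply = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": "Say hello in one word."}],
        max_tokens=16,
    )
    print(model, "->", reply.choices[0].message.content)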

Multi-Model Benchmarking

Compare models across providers. Each model run is registered under its own project_version_name so the results can be compared side by side in FutureAGI:
models = [
    {"name": "GPT-4.1", "model": "@openai-prod/gpt-4.1"},
    {"name": "Claude Sonnet", "model": "@anthropic-prod/claude-sonnet-4"},
    {"name": "Llama-3-70b", "model": "@groq-prod/llama3-70b-8192"},
]

scenarios = {
    "reasoning": "A farmer has 17 sheep. All but 9 die. How many are left?",
    "creative": "Write a 6-word story about a robot who discovers music.",
    "code": "Write a Python function to find the nth Fibonacci number.",
}

client = Portkey(api_key="YOUR_PORTKEY_API_KEY")

for test_name, prompt in scenarios.items():
    for model in models:
        tracer_provider = register(
            project_name="Model-Benchmarking",
            project_type=ProjectType.EXPERIMENT,
            project_version_name=model["name"]
        )
        PortkeyInstrumentor().instrument(tracer_provider=tracer_provider)
        
        response = client.chat.completions.create(
            model=model["model"],
            messages=[{"role": "user", "content": prompt}],
            max_tokens=1024
        )
        print(f"{model['name']}: {response.choices[0].message.content[:100]}...")
        
        PortkeyInstrumentor().uninstrument()

Evaluation Tags

Configure automatic quality assessment by passing eval_tags to register(), as in the Quick Start:
eval_tags=[
    # Response conciseness
    EvalTag(
        type=EvalTagType.OBSERVATION_SPAN,
        value=EvalSpanKind.LLM,
        eval_name=EvalName.IS_CONCISE,
        mapping={"input": "llm.output_messages.0.message.content"},
        model=ModelChoices.TURING_LARGE
    ),
    # Context adherence
    EvalTag(
        type=EvalTagType.OBSERVATION_SPAN,
        value=EvalSpanKind.LLM,
        eval_name=EvalName.CONTEXT_ADHERENCE,
        mapping={
            "context": "llm.input_messages.0.message.content",
            "output": "llm.output_messages.0.message.content",
        },
        model=ModelChoices.TURING_LARGE
    ),
    # Task completion
    EvalTag(
        type=EvalTagType.OBSERVATION_SPAN,
        value=EvalSpanKind.LLM,
        eval_name=EvalName.TASK_COMPLETION,
        mapping={
            "input": "llm.input_messages.0.message.content",
            "output": "llm.output_messages.0.message.content",
        },
        model=ModelChoices.TURING_LARGE
    ),
]

Advanced Use Cases

Complex Agentic Workflows

The integration supports tracing complex workflows with multiple LLM calls:
async def ecommerce_assistant_workflow(user_query):
    # classify_intent, search_products, and generate_response are your own
    # helpers; any Portkey calls they make are traced and evaluated automatically.
    intent = await classify_intent(user_query)
    products = await search_products(intent)
    response = await generate_response(products, user_query)
    return response
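
A minimal sketch of one such step, assuming an async Portkey client (AsyncPortkey from the Portkey SDK); the helper name and classification prompt are illustrative, not part of either SDK:

from portkey_ai import AsyncPortkey

async_client = AsyncPortkey(api_key="YOUR_PORTKEY_API_KEY")

async def classify_intent(user_query: str) -> str:
    # Each Portkey call inside a workflow step becomes its own LLM span,
    # so FutureAGI can evaluate every stage of the workflow separately.
    result = await async_client.chat.completions.create(
        model="@openai-prod/gpt-4.1",
        messages=[
            {"role": "system",
             "content": "Classify the user's intent as: search, support, or other."},
            {"role": "user", "content": user_query},
        ],
        max_tokens=16,
    )
    return result.choices[0].message.content.strip()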

CI/CD Integration

Use this integration in your CI/CD pipelines for:
  • Automated Model Testing: Run evaluation suites on new model versions
  • Quality Gates: Set thresholds for evaluation scores before deployment (see the sketch below)
  • Performance Monitoring: Track degradation in model quality over time
  • Cost Optimization: Monitor and alert on cost spikes
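
As a quality gate, a small check like the sketch below can fail a pipeline when evaluation scores drop below a threshold. fetch_eval_scores is a placeholder for however you retrieve evaluation results from FutureAGI (export or API); it is not part of fi_instrumentation, and the eval names and threshold values are examples.

import sys

# Example thresholds keyed by eval name; use whatever names you configured.
THRESHOLDS = {"Is_Concise": 0.8, "Task_Completion": 0.9}

def fetch_eval_scores(project_name: str) -> dict:
    """Placeholder: replace with your own FutureAGI export or API call."""
    raise NotImplementedError

def main() -> int:
    scores = fetch_eval_scores("Model-Benchmarking")
    failed = False
    for name, minimum in THRESHOLDS.items():
        score = scores.get(name, 0.0)
        if score < minimum:
            print(f"FAIL {name}: {score:.2f} < {minimum:.2f}")
            failed = True
    return 1 if failed else 0

if __name__ == "__main__":
    sys.exit(main())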

View Results

FutureAGI Dashboard

Navigate to the Prototype Tab → “Model-Benchmarking” project to see:
  • Automated evaluation scores
  • Quality metrics per response
  • Model comparison views

Portkey Dashboard

The Portkey dashboard provides:
  • Unified logs across providers
  • Cost tracking per request
  • Latency comparisons
  • Token usage analytics

Next Steps