LiteLLM

Setting provider: litellm routes inference through any OpenAI-compatible HTTP endpoint using the Strands SDK's LiteLLMModel. This covers self-hosted LiteLLM proxy servers, vLLM, Ollama (through its OpenAI-compatible endpoint), OpenRouter, Cloudflare AI Gateway, and any other OpenAI-compatible API.

Use this provider when you want a single routing layer that can switch between upstream models, apply rate limiting, load-balance across instances, enforce budgets, or add custom audit logging, all without changing the agent blueprint.

Installation

The litellm extra is required:

pip install "agent-core[litellm]"

This installs litellm>=1.83.0,<2. See the "CVE-2026-33634 safety pin" section below for why the lower bound matters.

Blueprint configuration

model:
  provider: litellm
  model_id: claude-sonnet-4-6              # Model name as the proxy expects it
  temperature: 0.3
  max_tokens: 4096
  base_url: https://your-litellm-proxy.example.com
  api_key_env: LITELLM_API_KEY             # Env var holding the API key

Fields

Field	Required	Notes
`model_id`	Yes	Model string passed through unchanged to the proxy. The proxy maps this to the upstream provider.
`temperature`	Yes	Forwarded to LiteLLM via the `params` dict (`params={"temperature": ..., "max_tokens": ...}`).
`max_tokens`	Yes	Forwarded to LiteLLM via the `params` dict.
`base_url`	Recommended	Base URL of the proxy endpoint. Required when routing through a custom proxy.
`api_key_env`	Recommended	Env var name holding the API key. Never put the key itself in the blueprint.
`extra_headers_env`	Optional	Map of HTTP header name → env var name. Resolved at load time. Useful for auth headers such as Cloudflare Access tokens.
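
As a rough illustration of the table above, the fields could be translated into model arguments along these lines. This is a minimal sketch, not the loader's actual code: the helper name build_model_kwargs and the placement of the API key under client_args are assumptions made for the example.

```python
import os
from typing import Any, Mapping


def build_model_kwargs(
    model_cfg: Mapping[str, Any],
    environ: Mapping[str, str] = os.environ,
) -> dict[str, Any]:
    """Hypothetical helper: turn a blueprint `model:` mapping into model arguments."""
    kwargs: dict[str, Any] = {
        # model_id is passed through unchanged.
        "model_id": model_cfg["model_id"],
        # temperature and max_tokens travel in the `params` dict.
        "params": {
            "temperature": model_cfg["temperature"],
            "max_tokens": model_cfg["max_tokens"],
        },
        "client_args": {},
    }
    # api_key_env holds the NAME of the env var, never the key itself.
    key_var = model_cfg.get("api_key_env")
    if key_var and environ.get(key_var):
        kwargs["client_args"]["api_key"] = environ[key_var]  # assumed key name
    return kwargs
```

The returned dict has the model_id / params / client_args shape described in this section, so under these assumptions it could be handed to the Strands LiteLLMModel constructor as keyword arguments.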

`extra_headers_env` — injecting custom headers

model:
  provider: litellm
  model_id: claude-sonnet-4-6
  temperature: 0.3
  max_tokens: 4096
  base_url: https://your-gateway.example.com
  api_key_env: GATEWAY_API_KEY
  extra_headers_env:
    CF-Access-Client-Id: CF_ACCESS_CLIENT_ID
    CF-Access-Client-Secret: CF_ACCESS_CLIENT_SECRET

Each entry maps an HTTP header name (sent with every request) to the name of an environment variable whose value is resolved at load time. Headers whose env var is absent or empty are omitted silently.
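
The resolution rule described above can be sketched as follows. The function name resolve_extra_headers is hypothetical and the real loader may differ in detail; the sketch only mirrors the documented behavior, in which each header takes its value from the named environment variable and is dropped when that variable is absent or empty.

```python
import os
from typing import Mapping


def resolve_extra_headers(
    extra_headers_env: Mapping[str, str],
    environ: Mapping[str, str] = os.environ,
) -> dict[str, str]:
    """Hypothetical helper: map header name -> value read from the named env var."""
    headers: dict[str, str] = {}
    for header_name, var_name in extra_headers_env.items():
        value = environ.get(var_name, "")
        if value:  # absent or empty variables are skipped silently
            headers[header_name] = value
    return headers
```

Passing the environment as a parameter keeps the function easy to test without mutating os.environ.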

The `custom_llm_provider` mechanism

When base_url is set, the loader automatically sets custom_llm_provider="openai" inside client_args. This is a critical detail:

Without this flag, the litellm library inspects model_id and chooses the upstream from the name prefix: claude-* goes directly to Anthropic's API, gemini-* goes to Google, deepseek-* goes to DeepSeek, and so on. base_url is ignored entirely, so your proxy is bypassed.

With this flag, litellm treats the endpoint as OpenAI-compatible and sends all requests to base_url, regardless of what the model name looks like. This is the correct behavior when your proxy handles all routing internally.

The model_id string is passed through unchanged — no prefix is added. Supply whatever model identifier your proxy expects:

# If your proxy expects the model as "claude-sonnet-4-6"
model_id: claude-sonnet-4-6

# If your proxy expects a fully-qualified Anthropic ID
model_id: anthropic/claude-sonnet-4-5-20250929

# If your proxy uses internal aliases
model_id: my-org/fast-claude

This mechanism applies only when base_url is set. If base_url is absent, custom_llm_provider is not injected and litellm resolves the provider from the model name as usual.
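
The conditional injection described in this section might look roughly like the sketch below. The function name inject_custom_provider and the api_base key are assumptions for illustration; only the custom_llm_provider="openai" value and the "only when base_url is set" rule come from the text above.

```python
from typing import Any, Optional


def inject_custom_provider(
    client_args: dict[str, Any],
    base_url: Optional[str],
) -> dict[str, Any]:
    """Hypothetical helper: force OpenAI-compatible routing when base_url is set."""
    args = dict(client_args)  # copy so the caller's dict is not mutated
    if base_url:
        args["api_base"] = base_url             # assumed key name for the endpoint
        args["custom_llm_provider"] = "openai"  # send everything to base_url
    # Without base_url nothing is injected, and litellm resolves the
    # provider from the model name as usual.
    return args
```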

Environment variables

Variable	Purpose
Named by `api_key_env`	API key sent to the proxy.
Named by each entry in `extra_headers_env`	Custom HTTP header values resolved at load time.

No region variable is needed — the endpoint is fully specified by base_url.

Runtime-switchable model IDs

The model_id field supports ${VAR:-default} expansion:

model:
  provider: litellm
  model_id: ${ACTIVE_MODEL:-claude-sonnet-4-6}
  temperature: 0.3
  max_tokens: 8192
  base_url: https://your-litellm-proxy.example.com
  api_key_env: LITELLM_API_KEY

Set ACTIVE_MODEL in the Runtime environment to switch models across deployments without changing the blueprint.
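
For intuition, the ${VAR:-default} form can be expanded with a few lines of Python. This is a sketch under assumptions, not the loader's implementation: the name expand_vars is hypothetical, and the choice to fall back to the default when the variable is set but empty follows shell semantics rather than anything confirmed here.

```python
import os
import re
from typing import Mapping

# Matches ${NAME:-default}; the default may be empty but cannot contain "}".
_VAR_PATTERN = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*):-([^}]*)\}")


def expand_vars(value: str, environ: Mapping[str, str] = os.environ) -> str:
    """Hypothetical helper: expand every ${VAR:-default} occurrence in `value`."""

    def _substitute(match: re.Match) -> str:
        name, default = match.group(1), match.group(2)
        current = environ.get(name, "")
        # Assumed shell-like rule: unset or empty falls back to the default.
        return current if current else default

    return _VAR_PATTERN.sub(_substitute, value)
```

Text that does not match the pattern is returned unchanged, so a literal model_id passes through as written.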

Example: agent behind a LiteLLM proxy

id: analysis-agent
name: Analysis Agent
version: "1.0.0"
prompt_ref: analysis-agent-system-v1

model:
  provider: litellm
  model_id: ${ACTIVE_MODEL:-claude-sonnet-4-6}
  temperature: 0.2
  max_tokens: 8192
  base_url: https://llm.your-company.example.com
  api_key_env: LITELLM_API_KEY

gateway:
  auth_type: aws_iam

runtime:
  type: agentcore
  max_iterations: 10
  idle_timeout_minutes: 20
  network_mode: PRIVATE
  protocol: HTTP

execution_modes:
  simulation: true
  staging: true
  production: true

Example: Cloudflare AI Gateway with Access tokens

model:
  provider: litellm
  model_id: claude-sonnet-4-6
  temperature: 0.3
  max_tokens: 4096
  base_url: https://gateway.ai.cloudflare.com/v1/YOUR_ACCOUNT_ID/YOUR_GATEWAY/openai
  api_key_env: CF_GATEWAY_API_KEY
  extra_headers_env:
    CF-Access-Client-Id: CF_ACCESS_CLIENT_ID
    CF-Access-Client-Secret: CF_ACCESS_CLIENT_SECRET

CVE-2026-33634 safety pin

The litellm dependency is pinned to >=1.83.0,<2. Versions 1.82.7 and 1.82.8 were compromised in a supply chain attack (CVE-2026-33634) that injected malicious code into the published PyPI packages.

If you pin or override litellm in your own pyproject.toml or requirements.txt, ensure the version is outside the 1.82.7–1.82.8 range. The agent-core[litellm] extra enforces >=1.83.0 automatically.
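
If you manage the pin yourself, a small startup or CI check can guard against the affected releases. The sketch below is illustrative only: the function names are hypothetical, and the parser reads just the leading major.minor.patch numbers of a version string.

```python
import re

# Releases named in the CVE note above.
COMPROMISED = {(1, 82, 7), (1, 82, 8)}


def parse_version(version: str) -> tuple[int, int, int]:
    """Extract the leading major.minor.patch numbers from a version string."""
    match = re.match(r"(\d+)\.(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    major, minor, patch = (int(part) for part in match.groups())
    return (major, minor, patch)


def is_compromised(version: str) -> bool:
    """True for the releases listed in COMPROMISED."""
    return parse_version(version) in COMPROMISED


def satisfies_pin(version: str) -> bool:
    """True when the version falls inside >=1.83.0,<2."""
    return (1, 83, 0) <= parse_version(version) < (2, 0, 0)
```

In practice the installed version can be read with importlib.metadata.version("litellm") and passed to either function.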

Structured output

output_schema works with the LiteLLM provider via the Strands native forced-tool path, as it does with all four providers. See Structured Output for details.

Data protection

Bedrock Guardrails are not available for the LiteLLM provider. For provider-agnostic, in-process PII filtering, set data_protection.provider to presidio under observability:

observability:
  data_protection:
    provider: presidio
    presidio_entities:
      - EMAIL_ADDRESS
      - PHONE_NUMBER
    presidio_language: en

Related: Overview (provider selection guide) and Structured Output (how output_schema works across all providers).