By the end of this guide, you will have a production-scaled AI Identity Gateway deployment with horizontally scaled Auth Provider and Gateway Orchestrators, load-balanced MCP traffic, optimized token exchange, and observability tuned for AI agent workloads. The AI Identity Gateway has a two-deployment architecture: an Auth Provider Orchestrator (OIDC Provider) that issues and exchanges tokens, and an AI Identity Gateway Orchestrator (MCP Provider) that handles agent connections and tool routing. Scaling for production means scaling both deployments independently, because they have different load profiles and different infrastructure requirements.
Both the Auth Provider and AI Identity Gateway are the same Orchestrator software with different configurations. You do not need to install or manage separate products — just deploy the same Orchestrator image with the appropriate configuration for each tier.
AI agent traffic patterns differ from human user traffic. Agents make rapid, bursty tool invocations — a single agent session can trigger dozens of token exchanges in seconds. This guide addresses the scaling challenges specific to these patterns.

Prerequisites

  • A working AI Identity Gateway deployment — You should have the Auth Provider Orchestrator and AI Identity Gateway Orchestrator configured and working in development. See the AI Identity overview for setup.
  • Familiarity with production deployment basics — This guide builds on the Deploy to Production and Scale for Production guides. Read those first if you have not deployed the Orchestrator to production before.
  • A load balancer with cookie and header-based session affinity — The Auth Provider requires cookie-based affinity (on the maverics_session cookie). The AI Identity Gateway requires header-based affinity (on the Mcp-Session-Id header). See Configure load balancing for MCP traffic for examples.
  • A Redis instance — Required for the Auth Provider Orchestrator (OIDC Provider) in multi-node deployments.
  • An observability stack — Prometheus, Grafana, or equivalent for metrics and dashboards. See the Monitor guide for setup.

Scale the AI Identity Gateway

1

Scale the Auth Provider Orchestrator

The Auth Provider Orchestrator is a standard Maverics Orchestrator configured with an oidcProvider block. It runs as an OIDC Provider and handles agent authentication, token issuance, and per-tool token exchange. In an AI Identity Gateway deployment, every tool invocation triggers a token exchange request to this service.

Scale the Auth Provider horizontally with Redis caching and sticky sessions, following the same pattern as any OIDC Provider scaling:
  • Redis cache is required for multi-node OIDC Provider deployments. Authorization codes, token state, and provider data must be accessible from any instance.
  • Sticky sessions ensure each agent’s OAuth flow completes on the same instance.
1

Open your Auth Provider deployment

Go to Deployments in the sidebar and select the deployment that runs your Auth Provider Orchestrator.
2

Add a Redis cache

Under Orchestrator Settings, scroll to Redis Caches and click Add. Fill in the Redis connection, TLS, and encryption settings, then click Add.
3

Assign the cache to the OIDC Provider

Under OIDC Provider, click Edit OIDC Provider. Select the Redis cache you created from the Redis Cache dropdown and click Save.
4

Configure session settings

Under Session and Cookie, click Edit Session and Cookie. Adjust the cache size, lifetime, and idle timeout for your expected agent volume, then click Save.
The Auth Provider is the token-minting bottleneck. If agents experience slow tool invocations, check Auth Provider latency first — every tool call requires a round-trip token exchange to this service.
2

Scale the AI Identity Gateway Orchestrator

The AI Identity Gateway Orchestrator is the same Maverics Orchestrator software, configured with an mcpProvider block instead of an oidcProvider block. It runs the MCP Provider with your MCP Bridge and MCP Proxy apps, handling inbound agent connections, tool discovery, OPA policy evaluation, and upstream forwarding. Unlike the Auth Provider, the Gateway Orchestrator does not use an external state store.
1

Open your Gateway deployment

Go to Deployments in the sidebar and select the deployment that runs your AI Identity Gateway Orchestrator.
2

Configure the MCP Provider

Under Orchestrator Settings, scroll to MCP Provider and click Edit MCP Provider. Enable the provider, configure the Stream Endpoint, Session settings (header name, timeout), and OAuth 2.0 Authorization Settings, then click Save.
3

Configure load balancing for MCP traffic

MCP traffic uses Streamable HTTP, which is HTTP-based but has specific load balancer requirements. You need separate load balancers (or separate listener rules) for the Auth Provider and the Gateway, since they serve different roles and may scale independently.

Auth Provider load balancer:
  • Cookie-based sticky sessions on the Orchestrator’s session cookie (default name: maverics_session), the same as any OIDC Provider — see Scale for Production
  • Health check on /status
AI Identity Gateway load balancer:
  • Session affinity on the Mcp-Session-Id header for Streamable HTTP connections
  • Extended timeouts for long-lived MCP sessions (at least 1 hour to match the default session.timeout)
  • Health check on /status
# NOTE: the commercial "sticky" directive requires NGINX Plus. The hash-based
# affinity shown here works in open-source NGINX as well.

# Auth Provider upstream
upstream auth_provider {
    # Consistent hash on the Orchestrator session cookie for affinity
    hash $cookie_maverics_session consistent;
    server 10.0.1.10:9443;
    server 10.0.1.11:9443;
}

# AI Identity Gateway upstream
upstream ai_gateway {
    # Consistent hash on the Mcp-Session-Id header for session affinity
    hash $http_mcp_session_id consistent;
    server 10.0.2.10:9443;
    server 10.0.2.11:9443;
    server 10.0.2.12:9443;
}

server {
    listen 443 ssl;
    server_name auth.example.com;

    location / {
        proxy_pass https://auth_provider;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
    }
}

server {
    listen 443 ssl;
    server_name gateway.example.com;

    # Extended timeouts for long-lived MCP sessions
    proxy_read_timeout 3600s;
    proxy_send_timeout 3600s;

    location / {
        proxy_pass https://ai_gateway;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
    }

    location /status {
        proxy_pass https://ai_gateway/status;
    }
}
4

Tune token exchange performance

Per-tool token exchange is one of the core security mechanisms of the AI Identity Gateway — every tool invocation triggers a token exchange round-trip to the Auth Provider. At scale, this becomes the primary latency and throughput bottleneck.

Optimize token exchange performance:
  1. Minimize network latency between Gateway and Auth Provider — Deploy them in the same region and availability zones. Each tool invocation adds one round-trip to the Auth Provider, so every millisecond of network latency is multiplied by the number of tool calls.
  2. Scale the Auth Provider ahead of demand — The Auth Provider handles cryptographic operations (JWT signing) for every token exchange. Monitor CPU utilization on Auth Provider instances and scale horizontally before saturation. When deploying on platforms like Kubernetes, you can configure autoscaling rules based on CPU utilization (see the Orchestrator Helm Chart for an example).
  3. Set appropriate token TTLs — Short TTLs (e.g., 5s) are the default and provide the strongest security. If your tool invocations involve slow upstream APIs, increase the TTL to avoid tokens expiring mid-request. Balance security requirements against your upstream response times.
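To make the round-trip cost concrete, here is a back-of-the-envelope sketch. All numbers are illustrative assumptions, not measured values from the product:

```shell
# Rough per-session token exchange overhead. Adjust rtt_ms and exchange_ms
# to your own measured values; these are illustrative assumptions.
rtt_ms=2           # Gateway -> Auth Provider network round-trip
exchange_ms=5      # Auth Provider processing (Redis lookup + JWT signing)
tool_calls=50      # tool invocations in a single agent session

overhead_ms=$(( (rtt_ms + exchange_ms) * tool_calls ))
echo "token exchange overhead for ${tool_calls} calls: ${overhead_ms} ms"
```

With these numbers the session pays 350 ms of exchange overhead; moving the two deployments cross-region (say, a 40 ms RTT) raises the same session's overhead above 2 seconds, which is why co-location matters.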
1

Open your MCP app

Go to Applications in the sidebar and select your MCP Bridge or MCP Proxy application.
2

Configure outbound token exchange

Under Application Identity, click Edit. In the Outbound Request Authorization section, set Authorization Type to Token Exchange, choose the Exchange Type and OIDC Identity Provider, and set the Audience.
3

Add per-tool configurations

Under Tool Configurations, click Add Tool Configuration. Enter the Tool Name (exact match or regex pattern), set a Token Lifetime (TTL), and add any required OAuth Scopes. Repeat for each tool or tool pattern.
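If your deployment uses standard OAuth 2.0 Token Exchange (RFC 8693), each per-tool exchange is a form-encoded POST from the Gateway to the Auth Provider's token endpoint. The sketch below builds (but does not send) an illustrative request body; the token, audience, and scope values are placeholders, not values from a real deployment:

```shell
# Illustrative RFC 8693 token exchange form body. All concrete values
# below are placeholders, not real deployment values.
body="grant_type=urn:ietf:params:oauth:grant-type:token-exchange"
body="${body}&subject_token=AGENT_ACCESS_TOKEN_PLACEHOLDER"
body="${body}&subject_token_type=urn:ietf:params:oauth:token-type:access_token"
body="${body}&audience=https://upstream-api.example.com"
body="${body}&scope=tool.read"
echo "$body"
```

The Audience and OAuth Scopes you set in the tool configuration above map onto the audience and scope parameters of this exchange.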
5

Configure observability for AI agent traffic

AI agent traffic generates different load patterns than human user traffic. Agents make rapid, bursty requests — a single agent session can trigger dozens of tool invocations in seconds. Monitor both the Auth Provider and Gateway deployments to detect bottlenecks early.

Key host-level metrics to monitor on both deployments:
  • CPU utilization — Token exchange (Auth Provider) and policy evaluation (Gateway) are CPU-intensive. Scale horizontally when utilization exceeds 70%.
  • Memory usage — High memory on the Auth Provider may indicate Redis connection pooling issues. High memory on the Gateway may indicate excessive concurrent MCP sessions.
  • Network throughput — Every tool invocation generates a round-trip between Gateway and Auth Provider. Monitor for saturation, especially if they are in different availability zones.
  • Request rate and error rate — Use your load balancer’s metrics to track inbound request volume and 5xx error rates for each deployment independently.
  • Load balancer health check failures — Indicates an Orchestrator instance is unhealthy or unreachable. Configure alerts so failed instances are investigated promptly.
  • Open file descriptors — The Gateway maintains concurrent connections: inbound from agents, outbound to the Auth Provider for token exchange, and outbound to upstream APIs. While connections are reused across requests, each concurrent agent session consumes file descriptors across all three. Ensure the OS ulimit -n is set high enough and monitor usage to avoid connection failures.
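As a quick spot check for the file descriptor point above, you can compare each host's soft limit against a target. The 65536 threshold here is an illustrative value, not a product requirement:

```shell
# Compare the soft fd limit on this host against an illustrative threshold.
soft_limit=$(ulimit -n)
threshold=65536
case "$soft_limit" in
  unlimited)
    echo "OK: fd limit is unlimited" ;;
  *)
    if [ "$soft_limit" -lt "$threshold" ]; then
      echo "WARN: fd soft limit ${soft_limit} is below ${threshold}"
    else
      echo "OK: fd soft limit ${soft_limit}"
    fi ;;
esac
```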
See the Monitor guide for setting up OpenTelemetry metrics and traces, and Telemetry Reference for all configuration options.
AI Identity Gateway-specific metrics — including token exchange latency, MCP session counts, and tool invocation rates — are planned for a future Orchestrator release. Today, use host-level and load balancer metrics to monitor your deployment.
6

Verify the scaled deployment

With both deployments scaled, load balancers configured, and observability in place, verify that the entire system works end-to-end under load.
# Verify Auth Provider instances are healthy
curl -s https://auth.example.com/status | jq .

# Verify AI Identity Gateway instances are healthy
curl -s https://gateway.example.com/status | jq .

# Verify OAuth discovery works through the load balancer
curl -s https://gateway.example.com/.well-known/oauth-protected-resource | jq .
Test the full agent flow:
  1. Connect an agent to the Gateway through the load balancer — verify OAuth discovery returns the Auth Provider’s metadata
  2. Authenticate the agent against the Auth Provider — verify token issuance succeeds
  3. Discover tools — verify the agent receives the full tool catalog
  4. Invoke a tool — verify the tool call succeeds end-to-end (agent → Gateway → token exchange → Auth Provider → upstream API)
  5. Check audit logs — verify the tool invocation is logged with both agent and user identity
  6. Simulate a Gateway instance failure — stop one Gateway instance and verify the agent’s next request is routed to a healthy instance
  7. Simulate an Auth Provider instance failure — stop one Auth Provider instance and verify token exchange still succeeds via remaining instances
Success! Your AI Identity Gateway is scaled for production. The Auth Provider handles token exchange with Redis-backed state, the Gateway handles MCP traffic seamlessly, load balancers route traffic with appropriate session affinity, and observability captures host-level and load balancer metrics for monitoring and alerting. Remember, both tiers are running the same Orchestrator software — upgrades and patches apply uniformly to both deployments.

Architecture Overview

The following diagram shows the scaled two-deployment architecture with load balancers, multiple instances, and the flow of MCP traffic:

Troubleshooting

High token exchange latency usually means the Auth Provider is the bottleneck:
  • Check Auth Provider CPU — Token exchange involves JWT signing, which is CPU-intensive. If CPU utilization is above 70%, add more Auth Provider instances.
  • Check Redis latency — The Auth Provider reads/writes token state in Redis. High Redis latency propagates to every token exchange. Monitor Redis response times and consider upgrading to a larger Redis instance or a Redis cluster.
  • Check network latency — If the Gateway and Auth Provider are in different availability zones or regions, every tool invocation pays the cross-zone round-trip cost. Co-locate them in the same zone.
Premature session disconnections are usually caused by load balancer timeouts:
  • Check load balancer idle timeout — MCP sessions can be idle between tool invocations. If the load balancer’s idle timeout is shorter than the MCP session timeout, the connection drops. Set the load balancer idle timeout to at least match mcpProvider.transports.stream.session.timeout (default 1 hour).
  • Check proxy timeouts — NGINX proxy_read_timeout and proxy_send_timeout must accommodate long-lived MCP sessions. Set both to at least 3600 seconds.
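For reference, the session timeout key mentioned above sits at this path in the Orchestrator configuration. This is an illustrative fragment only; the key path is taken from this guide, and a complete mcpProvider block has more settings than shown:

```yaml
# Illustrative fragment — shows only the session timeout key path.
mcpProvider:
  transports:
    stream:
      session:
        timeout: 1h   # default; load balancer idle timeout should be >= this
```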
If agents lose their tool catalog after a Gateway scaling event (new instances added or removed):
  • Check session affinity — If the load balancer is not maintaining MCP session affinity, agents may be routed to a new Gateway instance that does not have their session context. Verify session affinity is configured on the Mcp-Session-Id header or on a load balancer-managed cookie.
  • Check connection draining — When a Gateway instance is removed, in-progress MCP sessions should drain gracefully. Configure connection draining (deregistration delay) on your load balancer to allow active sessions to complete.
  • Agent reconnection — Well-behaved MCP agents should reconnect and re-discover tools when a session is lost. If your agents do not handle reconnection, consider increasing the connection draining timeout.
Slow OPA policy evaluation adds latency to every tool invocation:
  • Review policy complexity — Complex Rego policies with external data lookups or deep object traversals take longer to evaluate. Simplify policies where possible.
  • Check policy data size — Large policy data sets (loaded via data imports) increase evaluation time. Keep policy data minimal and load only what each policy needs.
  • Monitor Gateway CPU — OPA evaluation is CPU-bound. If Gateway CPU is high and most of the time is in policy evaluation, scale the Gateway horizontally.

AI Identity Overview

Set up the AI Identity Gateway from scratch with Auth Provider and Gateway configuration

AI Identity Gateway Reference

Full configuration reference for MCP Provider settings, transports, and OAuth authorization

Scale for Production

General Orchestrator scaling guide with load balancer examples for all modes

Monitor Your Deployment

Set up OpenTelemetry observability with metrics, traces, and structured logging