AI agent traffic patterns differ from human user traffic. Agents make rapid, bursty tool invocations — a single agent session can trigger dozens of token exchanges in seconds. This guide addresses the scaling challenges specific to these patterns.
Prerequisites
- A working AI Identity Gateway deployment — You should have the Auth Provider Orchestrator and AI Identity Gateway Orchestrator configured and working in development. See the AI Identity overview for setup.
- Familiarity with production deployment basics — This guide builds on the Deploy to Production and Scale for Production guides. Read those first if you have not deployed the Orchestrator to production before.
- A load balancer with cookie and header-based session affinity — The Auth Provider requires cookie-based affinity (on the `maverics_session` cookie). The AI Identity Gateway requires header-based affinity (on the `Mcp-Session-Id` header). See Configure load balancing for MCP traffic for examples.
- A Redis instance — Required for the Auth Provider Orchestrator (OIDC Provider) in multi-node deployments.
- An observability stack — Prometheus, Grafana, or equivalent for metrics and dashboards. See the Monitor guide for setup.
Scale the AI Identity Gateway
Scale the Auth Provider Orchestrator
The Auth Provider Orchestrator is a standard Maverics Orchestrator configured with an `oidcProvider` block. It runs as an OIDC Provider and handles agent authentication, token issuance, and per-tool token exchange. In an AI Identity Gateway deployment, every tool invocation triggers a token exchange request to this service.

Scale the Auth Provider horizontally with Redis caching and sticky sessions, following the same pattern as any OIDC Provider scaling:

- Redis cache is required for multi-node OIDC Provider deployments. Authorization codes, token state, and provider data must be accessible from any instance.
- Sticky sessions ensure each agent's OAuth flow completes on the same instance.
The following steps use the Console UI; the same settings can also be applied directly in the Orchestrator configuration.
Open your Auth Provider deployment
Go to Deployments in the sidebar and select the deployment that runs your Auth Provider Orchestrator.
Add a Redis cache
Under Orchestrator Settings, scroll to Redis Caches and click Add. Fill in the Redis connection, TLS, and encryption settings, then click Add.
Assign the cache to the OIDC Provider
Under OIDC Provider, click Edit OIDC Provider. Select the Redis cache you created from the Redis Cache dropdown and click Save.
Scale the AI Identity Gateway Orchestrator
The AI Identity Gateway Orchestrator is the same Maverics Orchestrator software, configured with an `mcpProvider` block instead of an `oidcProvider` block. It runs the MCP Provider with your MCP Bridge and MCP Proxy apps, handling inbound agent connections, tool discovery, OPA policy evaluation, and upstream forwarding. Unlike the Auth Provider, the Gateway Orchestrator does not use an external state store; scale it horizontally by adding instances behind a load balancer with session affinity (see Configure load balancing for MCP traffic).

The following steps use the Console UI; the same settings can also be applied directly in the Orchestrator configuration.
Open your Gateway deployment
Go to Deployments in the sidebar and select the deployment that runs your AI Identity Gateway Orchestrator.
Configure load balancing for MCP traffic
MCP traffic uses Streamable HTTP, which is HTTP-based but has specific load balancer requirements. You need separate load balancers (or separate listener rules) for the Auth Provider and the Gateway, since they serve different roles and may scale independently.

Auth Provider load balancer:

- Cookie-based sticky sessions on the Orchestrator's session cookie, which has a default name of `maverics_session` (same as any OIDC Provider — see Scale for Production)
- Health check on `/status`

Gateway load balancer:

- Session affinity on the `Mcp-Session-Id` header for Streamable HTTP connections
- Extended timeouts for long-lived MCP sessions (at least 1 hour to match the default `session.timeout`)
- Health check on `/status`
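As one possible sketch of the Gateway side, open-source NGINX can provide the header-based affinity and extended timeouts with a consistent hash on the `Mcp-Session-Id` header (instance addresses and the server name are placeholders; NGINX Plus and Envoy Gateway offer equivalent session-affinity mechanisms):

```nginx
# Upstream pool of Gateway Orchestrator instances (addresses are placeholders).
upstream mcp_gateway {
    # Pin each MCP session to one instance by hashing the Mcp-Session-Id
    # header, which NGINX exposes as $http_mcp_session_id.
    hash $http_mcp_session_id consistent;
    server 10.0.1.10:8080;
    server 10.0.1.11:8080;
}

server {
    # TLS termination omitted for brevity.
    listen 8080;
    server_name gateway.example.com;

    location / {
        proxy_pass http://mcp_gateway;
        # Long-lived Streamable HTTP sessions: match the default
        # mcpProvider session timeout of 1 hour.
        proxy_read_timeout 3600s;
        proxy_send_timeout 3600s;
    }
}
```

The health check on `/status` is then configured in whatever monitor your platform provides (for example, an `http` health check in NGINX Plus or a readiness probe in Kubernetes).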
Tune token exchange performance
Per-tool token exchange is one of the core security mechanisms of the AI Identity Gateway — every tool invocation triggers a token exchange round-trip to the Auth Provider. At scale, this becomes the primary latency and throughput bottleneck.

Optimize token exchange performance:
- Minimize network latency between Gateway and Auth Provider — Deploy them in the same region and availability zones. Each tool invocation adds one round-trip to the Auth Provider, so every millisecond of network latency is multiplied by the number of tool calls.
- Scale the Auth Provider ahead of demand — The Auth Provider handles cryptographic operations (JWT signing) for every token exchange. Monitor CPU utilization on Auth Provider instances and scale horizontally before saturation. When deploying on platforms like Kubernetes, autoscaling rules can be configured based on the CPU utilization (see the Orchestrator Helm Chart for an example).
- Set appropriate token TTLs — Short TTLs (e.g., `5s`) are the default and provide the strongest security. If your tool invocations involve slow upstream APIs, increase the TTL to avoid tokens expiring mid-request. Balance security requirements against your upstream response times.
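To see why co-location matters, here is a small back-of-the-envelope model of the per-session overhead; the round-trip and processing numbers are illustrative assumptions, not measured defaults:

```python
def session_overhead_ms(tool_calls: int, rtt_ms: float, exchange_ms: float) -> float:
    """Added latency per agent session from per-tool token exchange.

    Each tool invocation pays one Gateway -> Auth Provider network
    round-trip (rtt_ms) plus the Auth Provider's processing time
    (exchange_ms), so the cost scales linearly with tool calls.
    """
    return tool_calls * (rtt_ms + exchange_ms)

# Same-zone deployment: 40 tool calls at ~1 ms RTT + 5 ms processing
print(session_overhead_ms(40, 1.0, 5.0))   # 240.0
# Cross-region deployment: the same session at ~60 ms RTT
print(session_overhead_ms(40, 60.0, 5.0))  # 2600.0
```

A cross-region hop turns a barely noticeable overhead into multiple seconds per agent session, which is why the Gateway and Auth Provider should share a region and availability zone.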
The following steps use the Console UI; the same settings can also be applied directly in the Orchestrator configuration.
Open your MCP app
Go to Applications in the sidebar and select your MCP Bridge or MCP Proxy application.
Configure outbound token exchange
Under Application Identity, click Edit. In the Outbound Request Authorization section, set Authorization Type to Token Exchange, choose the Exchange Type and OIDC Identity Provider, and set the Audience.
Configure observability for AI agent traffic
AI agent traffic generates different load patterns than human user traffic. Agents make rapid, bursty requests — a single agent session can trigger dozens of tool invocations in seconds. Monitor both the Auth Provider and Gateway deployments to detect bottlenecks early. See the Monitor guide for setting up OpenTelemetry metrics and traces, and the Telemetry Reference for all configuration options.

Key host-level metrics to monitor on both deployments:
| Metric | Why It Matters |
|---|---|
| CPU utilization | Token exchange (Auth Provider) and policy evaluation (Gateway) are CPU-intensive. Scale horizontally when utilization exceeds 70%. |
| Memory usage | High memory on the Auth Provider may indicate Redis connection pooling issues. High memory on the Gateway may indicate excessive concurrent MCP sessions. |
| Network throughput | Every tool invocation generates a round-trip between Gateway and Auth Provider. Monitor for saturation, especially if they are in different availability zones. |
| Request rate and error rate | Use your load balancer’s metrics to track inbound request volume and 5xx error rates for each deployment independently. |
| Load balancer health check failures | Indicates an Orchestrator instance is unhealthy or unreachable. Configure alerts so failed instances are investigated promptly. |
| Open file descriptors | The Gateway maintains concurrent connections: inbound from agents, outbound to the Auth Provider for token exchange, and outbound to upstream APIs. While connections are reused across requests, each concurrent agent session consumes file descriptors across all three. Ensure the OS `ulimit -n` is set high enough and monitor usage to avoid connection failures. |
AI Identity Gateway-specific metrics — including token exchange latency, MCP session counts, and tool invocation rates — are planned for a future Orchestrator release. Today, use host-level and load balancer metrics to monitor your deployment.
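One way to check file-descriptor headroom on a Gateway host is Python's standard `resource` module (Unix only). The per-session multiplier of 3 below is an illustrative assumption based on the three connection types above, not a measured figure:

```python
import resource

# RLIMIT_NOFILE is the per-process open-file-descriptor limit,
# i.e. what `ulimit -n` reports (soft limit).
soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
print(f"soft limit: {soft}, hard limit: {hard}")

# Rough sizing: assume each concurrent agent session can hold one inbound
# descriptor, one to the Auth Provider, and one to an upstream API.
expected_sessions = 10_000
if soft < expected_sessions * 3:
    print("Consider raising the limit (e.g. `ulimit -n` or systemd LimitNOFILE)")
```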
Verify the scaled deployment
With both deployments scaled, load balancers configured, and observability in place, verify that the entire system works end-to-end under load.

Test the full agent flow:
- Connect an agent to the Gateway through the load balancer — verify OAuth discovery returns the Auth Provider’s metadata
- Authenticate the agent against the Auth Provider — verify token issuance succeeds
- Discover tools — verify the agent receives the full tool catalog
- Invoke a tool — verify the tool call succeeds end-to-end (agent → Gateway → token exchange → Auth Provider → upstream API)
- Check audit logs — verify the tool invocation is logged with both agent and user identity
- Simulate a Gateway instance failure — stop one Gateway instance and verify the agent’s next request is routed to a healthy instance
- Simulate an Auth Provider instance failure — stop one Auth Provider instance and verify token exchange still succeeds via remaining instances
Architecture Overview
The following diagram shows the scaled two-deployment architecture with load balancers, multiple instances, and the flow of MCP traffic.

Troubleshooting
Token exchange latency is high under load
High token exchange latency usually means the Auth Provider is the
bottleneck:
- Check Auth Provider CPU — Token exchange involves JWT signing, which is CPU-intensive. If CPU utilization is above 70%, add more Auth Provider instances.
- Check Redis latency — The Auth Provider reads/writes token state in Redis. High Redis latency propagates to every token exchange. Monitor Redis response times and consider upgrading to a larger Redis instance or a Redis cluster.
- Check network latency — If the Gateway and Auth Provider are in different availability zones or regions, every tool invocation pays the cross-zone round-trip cost. Co-locate them in the same zone.
MCP sessions disconnect unexpectedly
Premature session disconnections are usually caused by load balancer
timeouts:
- Check load balancer idle timeout — MCP sessions can be idle between tool invocations. If the load balancer's idle timeout is shorter than the MCP session timeout, the connection drops. Set the load balancer idle timeout to at least match `mcpProvider.transports.stream.session.timeout` (default 1 hour).
- Check proxy timeouts — NGINX `proxy_read_timeout` and `proxy_send_timeout` must accommodate long-lived MCP sessions. Set both to at least 3600 seconds.
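A minimal sketch of the corresponding NGINX directives, with the values matching the 1-hour default session timeout (the upstream name is a placeholder for your own configuration):

```nginx
location / {
    proxy_pass http://mcp_gateway;  # placeholder upstream name
    # Keep idle Streamable HTTP sessions open for the full MCP
    # session timeout (default 1 hour).
    proxy_read_timeout 3600s;
    proxy_send_timeout 3600s;
}
```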
Agents fail to discover tools after Gateway scaling event
If agents lose their tool catalog after a Gateway scaling event (new
instances added or removed):
- Check session affinity — If the load balancer is not maintaining MCP session affinity, agents may be routed to a new Gateway instance that does not have their session context. Verify session affinity is configured on the `Mcp-Session-Id` header or on a load balancer-managed cookie.
- Check connection draining — When a Gateway instance is removed, in-progress MCP sessions should drain gracefully. Configure connection draining (deregistration delay) on your load balancer to allow active sessions to complete.
- Agent reconnection — Well-behaved MCP agents should reconnect and re-discover tools when a session is lost. If your agents do not handle reconnection, consider increasing the connection draining timeout.
OPA policy evaluation is slow
Slow OPA policy evaluation adds latency to every tool invocation:
- Review policy complexity — Complex Rego policies with external data lookups or deep object traversals take longer to evaluate. Simplify policies where possible.
- Check policy data size — Large policy data sets (loaded via `data` imports) increase evaluation time. Keep policy data minimal and load only what each policy needs.
- Monitor Gateway CPU — OPA evaluation is CPU-bound. If Gateway CPU is high and most of that time is spent in policy evaluation, scale the Gateway horizontally.
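For intuition, a fast policy keeps its decision local to the input document. The sketch below is a hypothetical Rego policy; the package name, input schema, and tool names are illustrative assumptions, not the Gateway's documented policy contract:

```rego
package mcp.tools

import rego.v1

# Deny by default; allow only an explicit set of tools.
default allow := false

# A set membership check like this evaluates quickly, with no
# external data fetches or deep object traversal.
allowed_tools := {"search_docs", "summarize"}

allow if input.tool.name in allowed_tools
```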
Related Pages
AI Identity Overview
Set up the AI Identity Gateway from scratch with Auth Provider and Gateway configuration
AI Identity Gateway Reference
Full configuration reference for MCP Provider settings, transports, and OAuth authorization
Scale for Production
General Orchestrator scaling guide with load balancer examples for all modes
Monitor Your Deployment
Set up OpenTelemetry observability with metrics, traces, and structured logging