AI agent traffic patterns differ from human user traffic. Agents make rapid, bursty tool invocations — a single agent session can trigger dozens of token exchanges in seconds. This guide addresses the scaling challenges specific to these patterns.
Prerequisites
- A working AI Identity Gateway deployment — You should have the Auth Provider Orchestrator and AI Identity Gateway Orchestrator configured and working in development. See the AI Identity overview for setup.
- Familiarity with production deployment basics — This guide builds on the Deploy to Production and Scale for Production guides. Read those first if you have not deployed the Orchestrator to production before.
- A load balancer with cookie and header-based session affinity — The Auth Provider requires cookie-based affinity (on the `maverics_session` cookie). The AI Identity Gateway requires header-based affinity (on the `Mcp-Session-Id` header). See Configure load balancing for MCP traffic for examples.
- A Redis instance — Required for the Auth Provider Orchestrator (OIDC Provider) in multi-node deployments.
- An observability stack — Prometheus, Grafana, or equivalent for metrics and dashboards. See the Monitor guide for setup.
Scale the AI Identity Gateway
Scale the Auth Provider Orchestrator
The Auth Provider Orchestrator is a standard Maverics Orchestrator configured with an `oidcProvider` block. It runs as an OIDC Provider and handles agent authentication, token issuance, and per-tool token exchange. In an AI Identity Gateway deployment, every tool invocation triggers a token exchange request to this service.

Scale the Auth Provider horizontally with Redis caching and sticky sessions, following the same pattern as any OIDC Provider scaling:

- Redis cache is required for multi-node OIDC Provider deployments. Authorization codes, token state, and provider data must be accessible from any instance.
- Sticky sessions ensure each agent's OAuth flow completes on the same instance.
The following steps use the Console UI; the same settings can also be applied directly in the Orchestrator configuration.
Open your Auth Provider deployment
Go to Deployments in the sidebar and select the deployment that runs your Auth Provider Orchestrator.
Add a Redis cache
Under Orchestrator Settings, scroll to Redis Caches and click Add. Fill in the Redis connection, TLS, and encryption settings, then click Add.
Assign the cache to the OIDC Provider
Under OIDC Provider, click Edit OIDC Provider. Select the Redis cache you created from the Redis Cache dropdown and click Save.
Scale the AI Identity Gateway Orchestrator
The AI Identity Gateway Orchestrator is the same Maverics Orchestrator software, configured with an `mcpProvider` block instead of an `oidcProvider` block. It runs the MCP Provider with your MCP Bridge and MCP Proxy apps, handling inbound agent connections, tool discovery, OPA policy evaluation, and upstream forwarding. Unlike the Auth Provider, the Gateway Orchestrator does not use an external state store; scale it horizontally by adding instances behind a load balancer with session affinity (see Configure load balancing for MCP traffic).

The following steps use the Console UI; the same settings can also be applied directly in the Orchestrator configuration.
Open your Gateway deployment
Go to Deployments in the sidebar and select the deployment that runs your AI Identity Gateway Orchestrator.
Configure load balancing for MCP traffic
MCP traffic uses Streamable HTTP, which is HTTP-based but has specific load balancer requirements. You need separate load balancers (or separate listener rules) for the Auth Provider and the Gateway, since they serve different roles and may scale independently.

Auth Provider load balancer:

- Cookie-based sticky sessions on the Orchestrator's session cookie, which has a default name of `maverics_session` (same as any OIDC Provider — see Scale for Production)
- Health check on `/status`

Gateway load balancer:

- Session affinity on the `Mcp-Session-Id` header for Streamable HTTP connections
- Extended timeouts for long-lived MCP sessions (at least 1 hour to match the default `session.timeout`)
- Health check on `/status`
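As one possible sketch of the Gateway side, open-source NGINX can provide the header-based affinity and extended timeouts with a consistent hash on the `Mcp-Session-Id` header (instance addresses and the server name are placeholders; NGINX Plus and Envoy Gateway offer equivalent session-affinity mechanisms):

```nginx
# Upstream pool of Gateway Orchestrator instances (addresses are placeholders).
upstream mcp_gateway {
    # Pin each MCP session to one instance by hashing the Mcp-Session-Id
    # header, which NGINX exposes as $http_mcp_session_id.
    hash $http_mcp_session_id consistent;
    server 10.0.1.10:8080;
    server 10.0.1.11:8080;
}

server {
    # TLS termination omitted for brevity.
    listen 8080;
    server_name gateway.example.com;

    location / {
        proxy_pass http://mcp_gateway;
        # Long-lived Streamable HTTP sessions: match the default
        # mcpProvider session timeout of 1 hour.
        proxy_read_timeout 3600s;
        proxy_send_timeout 3600s;
    }
}
```

The health check on `/status` is then configured in whatever monitor your platform provides (for example, an `http` health check in NGINX Plus or a readiness probe in Kubernetes).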
Tune token exchange performance
Per-tool token exchange is one of the core security mechanisms of the AI Identity Gateway — every tool invocation triggers a token exchange round-trip to the Auth Provider. At scale, this becomes the primary latency and throughput bottleneck.

Optimize token exchange performance:
- Minimize network latency between Gateway and Auth Provider — Deploy them in the same region and availability zones. Each tool invocation adds one round-trip to the Auth Provider, so every millisecond of network latency is multiplied by the number of tool calls.
- Scale the Auth Provider ahead of demand — The Auth Provider handles cryptographic operations (JWT signing) for every token exchange. Monitor CPU utilization on Auth Provider instances and scale horizontally before saturation. When deploying on platforms like Kubernetes, autoscaling rules can be configured based on the CPU utilization (see the Orchestrator Helm Chart for an example).
- Set appropriate token TTLs — Short TTLs (e.g., `5s`) are the default and provide the strongest security. If your tool invocations involve slow upstream APIs, increase the TTL to avoid tokens expiring mid-request. Balance security requirements against your upstream response times.
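To see why co-location matters, here is a small back-of-the-envelope model of the per-session overhead; the round-trip and processing numbers are illustrative assumptions, not measured defaults:

```python
def session_overhead_ms(tool_calls: int, rtt_ms: float, exchange_ms: float) -> float:
    """Added latency per agent session from per-tool token exchange.

    Each tool invocation pays one Gateway -> Auth Provider network
    round-trip (rtt_ms) plus the Auth Provider's processing time
    (exchange_ms), so the cost scales linearly with tool calls.
    """
    return tool_calls * (rtt_ms + exchange_ms)

# Same-zone deployment: 40 tool calls at ~1 ms RTT + 5 ms processing
print(session_overhead_ms(40, 1.0, 5.0))   # 240.0
# Cross-region deployment: the same session at ~60 ms RTT
print(session_overhead_ms(40, 60.0, 5.0))  # 2600.0
```

A cross-region hop turns a barely noticeable overhead into multiple seconds per agent session, which is why the Gateway and Auth Provider should share a region and availability zone.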
The following steps use the Console UI; the same settings can also be applied directly in the Orchestrator configuration.
Open your MCP app
Go to Applications in the sidebar and select your MCP Bridge or MCP Proxy application.
Configure outbound token exchange
Under Application Identity, click Edit. In the Outbound Request Authorization section, set Authorization Type to Token Exchange, choose the Exchange Type and OIDC Identity Provider, and set the Audience.
Configure observability for AI agent traffic
AI agent traffic generates different load patterns than human user traffic. Agents make rapid, bursty requests — a single agent session can trigger dozens of tool invocations in seconds. Monitor both the Auth Provider and Gateway deployments to detect bottlenecks early. See the Monitor guide for setting up OpenTelemetry metrics and traces, and the Telemetry Reference for all configuration options.

Key host-level metrics to monitor on both deployments:
| Metric | Why It Matters |
|---|---|
| CPU utilization | Token exchange (Auth Provider) and policy evaluation (Gateway) are CPU-intensive. Scale horizontally when utilization exceeds 70%. |
| Memory usage | High memory on the Auth Provider may indicate Redis connection pooling issues. High memory on the Gateway may indicate excessive concurrent MCP sessions. |
| Network throughput | Every tool invocation generates a round-trip between Gateway and Auth Provider. Monitor for saturation, especially if they are in different availability zones. |
| Request rate and error rate | Use your load balancer’s metrics to track inbound request volume and 5xx error rates for each deployment independently. |
| Load balancer health check failures | Indicates an Orchestrator instance is unhealthy or unreachable. Configure alerts so failed instances are investigated promptly. |
| Open file descriptors | The Gateway maintains concurrent connections: inbound from agents, outbound to the Auth Provider for token exchange, and outbound to upstream APIs. While connections are reused across requests, each concurrent agent session consumes file descriptors across all three. Ensure the OS `ulimit -n` is set high enough and monitor usage to avoid connection failures. |
AI Identity Gateway-specific metrics — including token exchange latency, MCP session counts, and tool invocation rates — are planned for a future Orchestrator release. Today, use host-level and load balancer metrics to monitor your deployment.
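One way to check file-descriptor headroom on a Gateway host is Python's standard `resource` module (Unix only). The per-session multiplier of 3 below is an illustrative assumption based on the three connection types above, not a measured figure:

```python
import resource

# RLIMIT_NOFILE is the per-process open-file-descriptor limit,
# i.e. what `ulimit -n` reports (soft limit).
soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
print(f"soft limit: {soft}, hard limit: {hard}")

# Rough sizing: assume each concurrent agent session can hold one inbound
# descriptor, one to the Auth Provider, and one to an upstream API.
expected_sessions = 10_000
if soft < expected_sessions * 3:
    print("Consider raising the limit (e.g. `ulimit -n` or systemd LimitNOFILE)")
```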
Verify the scaled deployment
With both deployments scaled, load balancers configured, and observability in place, verify that the entire system works end-to-end under load.

Test the full agent flow:
- Connect an agent to the Gateway through the load balancer — verify OAuth discovery returns the Auth Provider’s metadata
- Authenticate the agent against the Auth Provider — verify token issuance succeeds
- Discover tools — verify the agent receives the full tool catalog
- Invoke a tool — verify the tool call succeeds end-to-end (agent → Gateway → token exchange → Auth Provider → upstream API)
- Check audit logs — verify the tool invocation is logged with both agent and user identity
- Simulate a Gateway instance failure — stop one Gateway instance and verify the agent’s next request is routed to a healthy instance
- Simulate an Auth Provider instance failure — stop one Auth Provider instance and verify token exchange still succeeds via remaining instances
Architecture Overview
The following diagram shows the scaled two-deployment architecture with load balancers, multiple instances, and the flow of MCP traffic.

Troubleshooting
Token exchange latency is high under load
High token exchange latency usually means the Auth Provider is the
bottleneck:
- Check Auth Provider CPU — Token exchange involves JWT signing, which is CPU-intensive. If CPU utilization is above 70%, add more Auth Provider instances.
- Check Redis latency — The Auth Provider reads/writes token state in Redis. High Redis latency propagates to every token exchange. Monitor Redis response times and consider upgrading to a larger Redis instance or a Redis cluster.
- Check network latency — If the Gateway and Auth Provider are in different availability zones or regions, every tool invocation pays the cross-zone round-trip cost. Co-locate them in the same zone.
MCP sessions disconnect unexpectedly
Premature session disconnections are usually caused by load balancer
timeouts:
- Check load balancer idle timeout — MCP sessions can be idle between tool invocations. If the load balancer's idle timeout is shorter than the MCP session timeout, the connection drops. Set the load balancer idle timeout to at least match `mcpProvider.transports.stream.session.timeout` (default 1 hour).
- Check proxy timeouts — NGINX `proxy_read_timeout` and `proxy_send_timeout` must accommodate long-lived MCP sessions. Set both to at least 3600 seconds.
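A minimal sketch of the corresponding NGINX directives, with the values matching the 1-hour default session timeout (the upstream name is a placeholder for your own configuration):

```nginx
location / {
    proxy_pass http://mcp_gateway;  # placeholder upstream name
    # Keep idle Streamable HTTP sessions open for the full MCP
    # session timeout (default 1 hour).
    proxy_read_timeout 3600s;
    proxy_send_timeout 3600s;
}
```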
Agents fail to discover tools after Gateway scaling event
If agents lose their tool catalog after a Gateway scaling event (new
instances added or removed):
- Check session affinity — If the load balancer is not maintaining MCP session affinity, agents may be routed to a new Gateway instance that does not have their session context. Verify session affinity is configured on the `Mcp-Session-Id` header or on a load balancer-managed cookie.
- Check connection draining — When a Gateway instance is removed, in-progress MCP sessions should drain gracefully. Configure connection draining (deregistration delay) on your load balancer to allow active sessions to complete.
- Agent reconnection — Well-behaved MCP agents should reconnect and re-discover tools when a session is lost. If your agents do not handle reconnection, consider increasing the connection draining timeout.
OPA policy evaluation is slow
Slow OPA policy evaluation adds latency to every tool invocation:
- Review policy complexity — Complex Rego policies with external data lookups or deep object traversals take longer to evaluate. Simplify policies where possible.
- Check policy data size — Large policy data sets (loaded via `data` imports) increase evaluation time. Keep policy data minimal and load only what each policy needs.
- Monitor Gateway CPU — OPA evaluation is CPU-bound. If Gateway CPU is high and most of that time is spent in policy evaluation, scale the Gateway horizontally.
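For intuition, a fast policy keeps its decision local to the input document. The sketch below is a hypothetical Rego policy; the package name, input schema, and tool names are illustrative assumptions, not the Gateway's documented policy contract:

```rego
package mcp.tools

import rego.v1

# Deny by default; allow only an explicit set of tools.
default allow := false

# A set membership check like this evaluates quickly, with no
# external data fetches or deep object traversal.
allowed_tools := {"search_docs", "summarize"}

allow if input.tool.name in allowed_tools
```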
Related Pages
AI Identity Overview
Set up the AI Identity Gateway from scratch with Auth Provider and Gateway configuration
AI Identity Gateway Reference
Full configuration reference for MCP Provider settings, transports, and OAuth authorization
Scale for Production
General Orchestrator scaling guide with load balancer examples for all modes
Monitor Your Deployment
Set up OpenTelemetry observability with metrics, traces, and structured logging