Understanding Failure Modes
Before choosing a tier, understand what can go wrong:

| Failure mode | Example | Impact |
|---|---|---|
| Single IdP outage | Unplanned incident, scheduled maintenance window | Users who authenticate through that IdP cannot log in |
| Vendor-wide outage | An entire IdP vendor’s service is unavailable | Every application relying on that vendor is affected |
| Regional network partition | Orchestrator in US-East cannot reach IdPs in EU-West | Users at affected sites cannot authenticate, even if the IdPs are healthy |
| Multi-provider outage | Two or more IdP vendors are unavailable simultaneously | Rare but catastrophic — all failover targets are unavailable |
Tier 1: Basic Continuity
Protects against: Single IdP outage

The simplest configuration: one Continuity connector with a primary IdP and one backup. If the primary goes down, the Orchestrator automatically routes authentication to the backup.

maverics.yaml
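The referenced configuration is not shown above. The sketch below illustrates the shape of a Tier 1 setup; connector types and field names are assumptions, not verbatim Maverics syntax. Consult the Identity Continuity reference for the exact schema.

```yaml
# Illustrative sketch only -- connector types and field names are assumptions.
connectors:
  - name: okta-primary            # primary IdP
    type: oidc
    healthCheck:                  # health checks live on each IdP connector
      interval: 30s
      unhealthyThreshold: 3
  - name: entra-backup            # single backup IdP
    type: oidc
    healthCheck:
      interval: 30s
      unhealthyThreshold: 3
  - name: idp-continuity
    type: continuity
    idps:                         # ordered: primary first, then backup
      - okta-primary
      - entra-backup
```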
Tier 2: Diversified Continuity
Protects against: Vendor-specific outage, regional outage

Tier 2 applies a single principle: diversity. Add three or more IdPs from different vendors, deployed in different regions, using different protocols. A vendor-wide outage or regional network partition only takes out one link in the chain.

maverics.yaml
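A minimal sketch of a diversified Tier 2 configuration, assuming illustrative connector names and field spellings (the `attributes` mapping syntax is intentionally elided; see the Identity Continuity reference):

```yaml
# Illustrative sketch only -- field names are assumptions.
connectors:
  - name: okta-us                 # vendor 1, US region, OIDC
    type: oidc
  - name: entra-eu                # vendor 2, EU region, SAML
    type: saml
  - name: ldap-onprem             # on-premises directory, LDAP
    type: ldap
  - name: idp-continuity
    type: continuity
    idps: [okta-us, entra-eu, ldap-onprem]
    attributes:                   # Schema Abstraction Layer: normalize claim
      email: ...                  # names so applications always see email,
      displayName: ...            # displayName, and role regardless of
      role: ...                   # which IdP responded
```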
- Different vendors — Okta and Entra ID are independent services. A vendor-specific incident only affects one.
- Different regions — Cloud IdPs in US and EU, plus an on-premises LDAP server. A regional network partition cannot take all three offline.
- Different protocols — OIDC, SAML, and LDAP. A protocol-specific bug or misconfiguration is contained to one connector.
- Schema Abstraction Layer — The `attributes` block normalizes claim names so your applications see `email`, `displayName`, and `role` regardless of which IdP responded. Without this, failover would break applications that depend on specific claim names.
Health monitoring is configured on each individual IdP connector, not on the Continuity connector. The Continuity connector reads health status from its member connectors to make routing decisions. See the Identity Continuity reference for the full health check configuration.
Tier 3: Cascading Continuity
Protects against: Regional isolation, near-total IdP infrastructure failure

Tier 3 introduces Orchestrator-to-Orchestrator federation. Instead of every Orchestrator instance connecting directly to every IdP, you create a two-layer architecture:

- Root Orchestrators in primary data centers connect directly to the actual IdPs via Continuity. They run as OIDC Providers or SAML Providers, acting as a stable identity endpoint.
- Edge Orchestrators in secondary regions or branch offices use Generic OIDC or Generic SAML connectors pointed at the root Orchestrators. They use Continuity to fail over between multiple root Orchestrators.
Root Orchestrator Configuration
Root Orchestrators connect directly to your IdPs via Continuity and expose an OIDC Provider endpoint that edge Orchestrators authenticate against.

root-orch.yaml
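A sketch of the root-side configuration. Hostnames, connector names, and field spellings are illustrative assumptions, not exact Maverics syntax:

```yaml
# Illustrative sketch only -- field names are assumptions.
connectors:
  - name: okta
    type: oidc
  - name: entra
    type: saml
  - name: idp-continuity
    type: continuity
    idps: [okta, entra]
oidcProvider:                     # stable identity endpoint that edge
  issuer: https://root-us.example.com   # Orchestrators authenticate against
```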
Edge Orchestrator Configuration
Edge Orchestrators treat root Orchestrators as IdPs using Generic OIDC connectors, with Continuity providing failover between roots.

edge-orch.yaml
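A sketch of the edge-side configuration under the same assumptions (illustrative hostnames and field names):

```yaml
# Illustrative sketch only -- field names are assumptions.
connectors:
  - name: root-us                 # Generic OIDC connector pointed at a root
    type: oidc
    issuerURL: https://root-us.example.com
  - name: root-eu
    type: oidc
    issuerURL: https://root-eu.example.com
  - name: root-continuity
    type: continuity
    idps: [root-us, root-eu]      # fail over between root Orchestrators
```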
Since root Orchestrators already normalize attributes via their Schema Abstraction Layer, edge Orchestrators receive consistent claims regardless of which root (or underlying IdP) handled authentication. You only need Schema Abstraction Layer mappings at the edge if the two root Orchestrators emit different claim names.
Tier 4: Tactical Edge Continuity
Protects against: Total connectivity loss (DDIL — Denied, Degraded, Intermittent, or Limited)

Tiers 1-3 assume that at least one upstream IdP is reachable at all times. Tactical and field deployments break that assumption. In DDIL environments — forward-deployed operations, remote industrial sites, shipboard networks, or disaster-response staging areas — connectivity to cloud identity providers can be intermittent or severed entirely.

Tier 4 addresses this with a co-located on-premises IdP (such as Keycloak) deployed alongside the tactical Orchestrator on the same local network segment. When cloud connectivity is available, the Orchestrator routes authentication to the cloud IdP normally. When connectivity is lost, the Orchestrator detects the failure and falls back to the local IdP autonomously — no operator intervention required.

tactical-orch.yaml
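A sketch of a tactical edge configuration; the aggressive health check values come from the notes below, while connector names and field spellings are illustrative assumptions:

```yaml
# Illustrative sketch only -- field names are assumptions.
connectors:
  - name: okta-cloud
    type: oidc
    healthCheck:
      interval: 10s               # aggressive: detect loss in ~20 seconds
      unhealthyThreshold: 2
  - name: keycloak-local          # co-located on the same network segment
    type: oidc
    healthCheck:
      interval: 10s
      unhealthyThreshold: 2
  - name: idp-continuity
    type: continuity
    idps: [okta-cloud, keycloak-local]
    attributes:                   # normalize email, displayName, role across
      email: ...                  # cloud and local IdPs (mapping syntax
      displayName: ...            # omitted; see the reference)
      role: ...
```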
- Aggressive health check intervals — The `interval: 10s` and `unhealthyThreshold: 2` configuration detects connectivity loss in as little as 20 seconds. Standard deployments use 30-second intervals and a threshold of 3 (90 seconds to detect). In DDIL environments, fast detection is critical because users cannot wait minutes for the system to recognize that the cloud is unreachable.
- Co-located on the same network segment — The local IdP or directory must be reachable without traversing any WAN link. If the local IdP depends on the same network path as the cloud, it will fail at the same time.
- Schema Abstraction Layer bridges the gap — Cloud IdPs and on-premises IdPs rarely use the same attribute names. The Schema Abstraction Layer ensures applications see `email`, `displayName`, and `role` regardless of whether Okta or local Keycloak responded.
- Automatic recovery — When cloud connectivity is restored, the Orchestrator’s health checks detect the cloud IdP as healthy again and automatically route traffic back. No manual switchover is required.
Choosing the Right Tier
| | Tier 1: Basic | Tier 2: Diversified | Tier 3: Cascading | Tier 4: Tactical Edge |
|---|---|---|---|---|
| Protects against | Single IdP outage | Vendor or regional outage | Regional isolation, near-total failure | Total connectivity loss (DDIL) |
| Number of IdPs | 2 | 3+ | 3+ (at root), 2+ roots (at edge) | 1+ cloud, 1+ local |
| Vendor diversity | Optional | Required | Required at root layer | Required (cloud + on-prem) |
| IdP credentials needed at | Every Orchestrator | Every Orchestrator | Root Orchestrators only | Tactical Orchestrator (both cloud and local) |
| Configuration complexity | Low | Medium | High | Medium-High |
| Best for | Most organizations, internal apps | Regulated workloads, SLA-bound services | Global deployments, zero-tolerance environments | Tactical/field deployments, DDIL environments |
Custom Health Checks and Manual Failover
The default health check polls each IdP automatically, but you are not limited to automatic detection. Custom health check endpoints let you point the Orchestrator at any URL and define what a healthy response looks like — specific status codes, expected response body values, or custom headers. This opens up scenarios beyond automatic IdP monitoring:

- Manual failover — Stand up a simple internal endpoint (e.g., `/idp-status`) that your operations team controls. Flipping that endpoint to return a non-healthy status forces the Orchestrator to fail over immediately, without waiting for the IdP to actually go down. Useful for planned maintenance windows or preemptive action.
- External monitoring signals — Wire the custom endpoint to your existing monitoring infrastructure (SIEM, network monitoring, threat detection). If your SOC detects that an IdP has been compromised or a network path is degraded, the monitoring system can flip the health endpoint and trigger failover before users are impacted.
- Composite health — Build a health endpoint that aggregates multiple signals — network latency, certificate expiry, IdP vendor status pages — into a single healthy/unhealthy decision. The Orchestrator doesn’t need to understand each signal; it just needs a status code.
maverics.yaml
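The referenced code block is not shown above. A sketch of a custom health check pointed at an internal monitoring endpoint; the `healthCheck` field names are illustrative assumptions, not exact Maverics syntax:

```yaml
# Illustrative sketch only -- field names are assumptions.
connectors:
  - name: okta
    type: oidc
    healthCheck:
      url: https://internal-monitoring.example.com/idp-status/okta
      expectedStatusCode: 200     # any other status marks the IdP unhealthy
      expectedBody: healthy       # response body must contain "healthy"
```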
This configuration polls `https://internal-monitoring.example.com/idp-status/okta` instead of the IdP’s well-known endpoint. If that endpoint returns anything other than a 200 with "healthy" in the body, failover begins.
Supporting Infrastructure
Identity Continuity handles the authentication layer, but a production HA deployment also depends on infrastructure outside the Orchestrator:

- Global load balancers or GeoDNS — Route users to the nearest healthy Orchestrator cluster. Required for Tier 3 to direct edge traffic to the closest root.
- Health check integration — Your load balancer should poll each Orchestrator’s `/status` endpoint and remove unhealthy instances from the pool. See the Scale for Production guide for load balancer configuration examples.
- Network path diversity — If all your IdP connections traverse the same network link, a single link failure defeats the purpose of multiple IdPs. Ensure root Orchestrators have diverse network paths to their upstream IdPs.
- Monitoring and alerting — Configure alerts on Continuity failover events so your team knows when a backup IdP is actively handling traffic. See the Monitor and Observe guide for telemetry configuration.
Related Pages
Identity Continuity Setup
Step-by-step guide to configuring your first Continuity connector with health monitoring
Identity Continuity Reference
Full configuration reference for the Continuity connector, health checks, and Schema Abstraction Layer
Scale for Production
Scale horizontally with sticky sessions, local session storage, and load balancer configuration
Deploy to Production
Deploy with CLI flags, Docker or systemd, secret provider configuration, and health probes