# Reducing Single Points of Failure

The most common question reliability architects ask about the Orchestrator is: **"Isn't it a single point of failure?"** The short answer is no -- the Orchestrator scales horizontally behind a load balancer (see the [Scale for Production guide](/guides/operations/scale)). If one instance goes down, the others continue serving traffic and users are transparently re-authenticated.

But scaling the Orchestrator itself only solves half the problem. Your **identity providers** can still be single points of failure. If your sole IdP goes down, it does not matter how many Orchestrator instances you have -- nobody can authenticate. [Identity Continuity](/reference/orchestrator/identity-fabric/continuity) is how you address that: the Orchestrator monitors IdP health and automatically fails over to a backup when an outage is detected.

This guide presents a tiered approach to Identity Continuity. Each tier adds resilience against progressively more severe failure scenarios. Choose the tier that matches your risk tolerance.

## Understanding Failure Modes

Before choosing a tier, understand what can go wrong:

| Failure mode                   | Example                                                | Impact                                                                    |
| ------------------------------ | ------------------------------------------------------ | ------------------------------------------------------------------------- |
| **Single IdP outage**          | Unplanned incident, scheduled maintenance window       | Users who authenticate through that IdP cannot log in                     |
| **Vendor-wide outage**         | An entire IdP vendor's service is unavailable          | Every application relying on that vendor is affected                      |
| **Regional network partition** | Orchestrator in US-East cannot reach IdPs in EU-West   | Users at affected sites cannot authenticate, even if the IdPs are healthy |
| **Multi-provider outage**      | Two or more IdP vendors are unavailable simultaneously | Rare but catastrophic -- all failover targets are unavailable             |

## Tier 1: Basic Continuity

**Protects against:** Single IdP outage

The simplest configuration: one [Continuity connector](/reference/orchestrator/identity-fabric/continuity) with a primary IdP and one backup. If the primary goes down, the Orchestrator automatically routes authentication to the backup.

```yaml maverics.yaml theme={null}
connectors:
  - name: primary-idp
    type: oidc
    oidcWellKnownURL: https://primary.example.com/.well-known/openid-configuration
    oauthClientID: orchestrator-client
    oauthClientSecret: <vault.primary-secret>
    oauthRedirectURL: https://orch.example.com/oidc/callback
    scopes: openid profile email
    healthCheck:
      enabled: true
      interval: 30s
      timeout: 10s
      unhealthyThreshold: 3
      healthyThreshold: 2

  - name: backup-idp
    type: oidc
    oidcWellKnownURL: https://backup.example.com/.well-known/openid-configuration
    oauthClientID: orchestrator-backup-client
    oauthClientSecret: <vault.backup-secret>
    oauthRedirectURL: https://orch.example.com/oidc/callback
    scopes: openid profile email
    healthCheck:
      enabled: true
      interval: 30s
      timeout: 10s
      unhealthyThreshold: 3
      healthyThreshold: 2

  - name: ha-failover
    type: continuity
    strategy: failover
    failover:
      idps:
        - primary-idp
        - backup-idp
```

**Limitation:** If both IdPs are from the same vendor or hosted in the same region, a vendor-wide or regional outage takes both down simultaneously. Tier 2 addresses this.

For a full walkthrough of setting up Continuity with health monitoring and the Schema Abstraction Layer, see the [Identity Continuity setup guide](/guides/identity-continuity/overview).
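
Conceptually, `strategy: failover` treats the `idps` list as an ordered preference. Each request goes to the first member that health checks currently report as healthy. The Python sketch below models that routing decision under that assumption. `Connector` and `select_idp` are hypothetical names used only for illustration. They are not part of the Orchestrator.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass
class Connector:
    """Hypothetical stand-in for an IdP connector and its current health."""

    name: str
    healthy: bool


def select_idp(failover_order: list[Connector]) -> Optional[str]:
    """Return the first healthy connector in failover order, or None.

    Models the assumed behavior of `strategy: failover`: list order is
    preference order, so the primary wins whenever it is healthy.
    """
    for connector in failover_order:
        if connector.healthy:
            return connector.name
    # Every failover target is down (the multi-provider outage case).
    return None


if __name__ == "__main__":
    idps = [Connector("primary-idp", healthy=False), Connector("backup-idp", healthy=True)]
    print(select_idp(idps))  # backup-idp
```

Under this model, adding IdPs to the list only lengthens the walk. What changes from tier to tier is how many independent targets must fail before `select_idp` returns `None`.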

## Tier 2: Diversified Continuity

**Protects against:** Vendor-specific outage, regional outage

Tier 2 applies a single principle: **diversity**. Add three or more IdPs from different vendors, deployed in different regions, using different protocols. A vendor-wide outage or regional network partition then takes out only one of the failover targets, and the remaining IdPs continue to handle authentication.

```yaml maverics.yaml theme={null}
connectors:
  # Cloud IdP - Vendor A (US region)
  - name: okta-idp
    type: oidc
    oidcWellKnownURL: https://corp.okta.com/.well-known/openid-configuration
    oauthClientID: orchestrator-client
    oauthClientSecret: <vault.okta-secret>
    oauthRedirectURL: https://orch.example.com/oidc/callback
    scopes: openid profile email
    healthCheck:
      enabled: true
      interval: 30s
      timeout: 10s
      unhealthyThreshold: 3
      healthyThreshold: 2

  # Cloud IdP - Vendor B (EU region)
  - name: entra-idp
    type: saml
    samlMetadataURL: https://login.microsoftonline.com/TENANT/federationmetadata/2007-06/federationmetadata.xml
    samlConsumerServiceURL: https://orch.example.com/saml/callback
    samlEntityID: https://orch.example.com
    healthCheck:
      enabled: true
      interval: 30s
      timeout: 10s
      unhealthyThreshold: 3
      healthyThreshold: 2

  # On-premises IdP - different network path entirely
  - name: onprem-ldap
    type: ldap
    url: ldaps://dc.corp.example.com:636
    baseDN: dc=corp,dc=example,dc=com
    bindDN: cn=svc-maverics,ou=service-accounts,dc=corp,dc=example,dc=com
    bindPassword: <vault.ldap-password>
    healthCheck:
      enabled: true
      interval: 30s
      timeout: 10s
      unhealthyThreshold: 3
      healthyThreshold: 2

  - name: diversified-failover
    type: continuity
    strategy: failover
    failover:
      idps:
        - okta-idp
        - entra-idp
        - onprem-ldap
    # The Schema Abstraction Layer normalizes attributes across
    # providers so downstream apps see consistent claim names
    # regardless of which IdP authenticated the user.
    attributes:
      - name: email
        mapping:
          okta-idp: email
          entra-idp: preferred_username
          onprem-ldap: mail
      - name: displayName
        mapping:
          okta-idp: name
          entra-idp: displayName
          onprem-ldap: cn
      - name: role
        mapping:
          okta-idp: role
          entra-idp: group
        default: "user"
```

The key decisions in this tier:

* **Different vendors** -- Okta and Entra ID are independent services. A vendor-specific incident only affects one.
* **Different regions** -- Cloud IdPs in US and EU, plus an on-premises LDAP server. A regional network partition cannot take all three offline.
* **Different protocols** -- OIDC, SAML, and LDAP. A protocol-specific bug or misconfiguration is contained to one connector.
* **Schema Abstraction Layer** -- The `attributes` block normalizes claim names so your applications see `email`, `displayName`, and `role` regardless of which IdP responded. Without this, failover would break applications that depend on specific claim names.
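
The Schema Abstraction Layer behaves much like a per-IdP rename table with optional defaults. The following rough model normalizes a user from any of the three Tier 2 IdPs. Its dictionary layout, the `normalize` function, and the fallback rule (use `default` when an IdP has no mapping or no value) are assumptions for illustration, not the Orchestrator's code.

```python
from __future__ import annotations

from typing import Any

# Hypothetical Python rendering of the Tier 2 `attributes` block above.
ATTRIBUTES: list[dict[str, Any]] = [
    {"name": "email",
     "mapping": {"okta-idp": "email", "entra-idp": "preferred_username", "onprem-ldap": "mail"}},
    {"name": "displayName",
     "mapping": {"okta-idp": "name", "entra-idp": "displayName", "onprem-ldap": "cn"}},
    {"name": "role",
     "mapping": {"okta-idp": "role", "entra-idp": "group"},
     "default": "user"},
]


def normalize(idp: str, raw: dict[str, Any]) -> dict[str, Any]:
    """Translate one IdP's raw attributes into the shared attribute names."""
    normalized: dict[str, Any] = {}
    for attr in ATTRIBUTES:
        source = attr["mapping"].get(idp)  # None if this IdP has no mapping
        value = raw.get(source) if source is not None else None
        if value is None:
            # Assumed: fall back to `default` when there is no mapping or no value.
            value = attr.get("default")
        if value is not None:
            normalized[attr["name"]] = value
    return normalized


if __name__ == "__main__":
    print(normalize("onprem-ldap", {"mail": "ana@corp.example.com", "cn": "Ana Diaz"}))
    # {'email': 'ana@corp.example.com', 'displayName': 'Ana Diaz', 'role': 'user'}
```

Because `onprem-ldap` has no `role` mapping, LDAP-authenticated users fall through to `"user"` in this model. This is why the `default` matters once an IdP without a role source joins the failover list.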

<Note>
  Health monitoring is configured on each **individual IdP connector**, not on the Continuity connector. The Continuity connector reads health status from its member connectors to make routing decisions. See the [Identity Continuity reference](/reference/orchestrator/identity-fabric/continuity) for the full health check configuration.
</Note>
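
The two thresholds act as hysteresis. A connector changes state only after a run of consecutive results that disagree with its current state, so a single slow response does not trigger failover. The Python model below is a sketch of that assumed behavior, in which a check that times out counts as a failure. `HealthTracker` is a hypothetical name, not an Orchestrator type.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class HealthTracker:
    """Hysteresis model of the `unhealthyThreshold` / `healthyThreshold` fields."""

    unhealthy_threshold: int = 3
    healthy_threshold: int = 2
    healthy: bool = True
    streak: int = 0  # consecutive results that disagree with the current state

    def record(self, check_passed: bool) -> bool:
        """Record one health check result and return the current state."""
        if check_passed == self.healthy:
            self.streak = 0  # this result confirms the current state
            return self.healthy
        self.streak += 1
        needed = self.healthy_threshold if check_passed else self.unhealthy_threshold
        if self.streak >= needed:
            self.healthy = check_passed  # threshold reached: flip state
            self.streak = 0
        return self.healthy


if __name__ == "__main__":
    tracker = HealthTracker()  # Tier 1/2 settings: 3 failures to trip, 2 successes to recover
    results = [False, False, True, False, False, False, True, True]
    print([tracker.record(r) for r in results])
    # [True, True, True, True, True, False, False, True]
```

The lone success in the middle resets the failure count, so the connector trips only on the later run of three failures. With a 30-second `interval`, that run takes about 90 seconds.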

## Tier 3: Cascading Continuity

**Protects against:** Regional isolation, near-total IdP infrastructure failure

Tier 3 introduces **Orchestrator-to-Orchestrator federation**. Instead of every Orchestrator instance connecting directly to every IdP, you create a two-layer architecture:

* **Root Orchestrators** in primary data centers connect directly to the actual IdPs via Continuity. They run as [OIDC Providers](/reference/modes/oidc-provider) or [SAML Providers](/reference/modes/saml-provider), acting as a stable identity endpoint.
* **Edge Orchestrators** in secondary regions or branch offices use [Generic OIDC](/reference/orchestrator/identity-fabric/custom-oidc) or [Generic SAML](/reference/orchestrator/identity-fabric/custom-saml) connectors pointed at the root Orchestrators. They use Continuity to fail over between multiple root Orchestrators.

```mermaid theme={null}
%%{init: {'theme': 'neutral', 'themeVariables': {'edgeWidth': 4}}}%%
flowchart TB
    subgraph IdPs["Identity Providers"]
        Okta["Okta (Vendor A)"]
        Entra["Entra ID (Vendor B)"]
    end

    subgraph Roots["Root Orchestrators — Continuity failover across IdPs"]
        RootDC1["Root Orch (DC1)<br/>OIDC Provider"]
        RootDC2["Root Orch (DC2)<br/>OIDC Provider"]
        RootN["Root Orch (N...)<br/>OIDC Provider"]
    end

    subgraph Edge["Edge Orchestrators — Continuity failover across roots"]
        EdgeA["Edge Orch (Branch A)"]
        EdgeB["Edge Orch (Branch B)"]
        EdgeC["Edge Orch (Branch C)"]
    end

    Okta --> RootDC1
    Okta --> RootDC2
    Okta --> RootN
    Entra --> RootDC1
    Entra --> RootDC2
    Entra --> RootN
    RootDC1 --> EdgeA
    RootDC1 --> EdgeB
    RootDC1 --> EdgeC
    RootDC2 --> EdgeA
    RootDC2 --> EdgeB
    RootDC2 --> EdgeC
    RootN --> EdgeA
    RootN --> EdgeB
    RootN --> EdgeC

    classDef orch stroke:#6A3EC8,stroke-width:6px
    class RootDC1,RootDC2,RootN,EdgeA,EdgeB,EdgeC orch
```

Only root Orchestrators need direct IdP credentials and configurations. Edge sites treat the root Orchestrators as their identity providers. If an edge site loses connectivity to one root, it fails over to another. If all roots are unreachable (full regional isolation), only that edge site is affected -- other edge sites continue operating through whichever root they can reach.
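
The blast-radius argument can be checked with a small model. Each edge walks its ordered list of roots and uses the first one it can reach, so only an edge with no reachable root is left without authentication. The topology and `route_edges` function below are hypothetical. In a real deployment, reachability would come from each edge's health checks.

```python
from __future__ import annotations

from typing import Optional

# Hypothetical topology: each edge lists root Orchestrators in failover
# order, mirroring the `edge-failover` connector in edge-orch.yaml.
EDGE_ROOTS: dict[str, list[str]] = {
    "branch-a": ["root-dc1", "root-dc2"],
    "branch-b": ["root-dc1", "root-dc2"],
    "branch-c": ["root-dc1", "root-dc2"],
}


def route_edges(reachable: dict[str, set[str]]) -> dict[str, Optional[str]]:
    """For each edge, pick the first root it can currently reach, or None."""
    routes: dict[str, Optional[str]] = {}
    for edge, roots in EDGE_ROOTS.items():
        visible = reachable.get(edge, set())
        routes[edge] = next((root for root in roots if root in visible), None)
    return routes


if __name__ == "__main__":
    # Branch A is fully isolated; branch B has lost its path to root-dc1.
    print(route_edges({
        "branch-a": set(),
        "branch-b": {"root-dc2"},
        "branch-c": {"root-dc1", "root-dc2"},
    }))
    # {'branch-a': None, 'branch-b': 'root-dc2', 'branch-c': 'root-dc1'}
```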

### Root Orchestrator Configuration

Root Orchestrators connect directly to your IdPs via Continuity and expose an OIDC Provider endpoint that edge Orchestrators authenticate against.

```yaml root-orch.yaml theme={null}
connectors:
  - name: okta-idp
    type: oidc
    oidcWellKnownURL: https://corp.okta.com/.well-known/openid-configuration
    oauthClientID: orchestrator-client
    oauthClientSecret: <vault.okta-secret>
    oauthRedirectURL: https://root-dc1.example.com/oidc/callback
    scopes: openid profile email
    healthCheck:
      enabled: true
      interval: 30s
      timeout: 10s
      unhealthyThreshold: 3
      healthyThreshold: 2

  - name: entra-idp
    type: saml
    samlMetadataURL: https://login.microsoftonline.com/TENANT/federationmetadata/2007-06/federationmetadata.xml
    samlConsumerServiceURL: https://root-dc1.example.com/saml/callback
    samlEntityID: https://root-dc1.example.com
    healthCheck:
      enabled: true
      interval: 30s
      timeout: 10s
      unhealthyThreshold: 3
      healthyThreshold: 2

  - name: root-failover
    type: continuity
    strategy: failover
    failover:
      idps:
        - okta-idp
        - entra-idp
    attributes:
      - name: email
        mapping:
          okta-idp: email
          entra-idp: preferred_username
      - name: displayName
        mapping:
          okta-idp: name
          entra-idp: displayName

# Expose this Orchestrator as an OIDC Provider so edge
# Orchestrators can authenticate against it.
oidcProvider:
  discovery:
    issuer: https://root-dc1.example.com
  jwks:
    - name: signing-key
      type: rsa
      file: /etc/maverics/keys/signing.key

apps:
  - name: edge-app
    type: oidc
    clientID: edge-orchestrator
    clientSecret: <vault.edge-client-secret>
    authentication:
      idps:
        - root-failover
    claimsMapping:
      email: root-failover.email
      name: root-failover.displayName
```

### Edge Orchestrator Configuration

Edge Orchestrators treat root Orchestrators as IdPs using Generic OIDC connectors, with Continuity providing failover between roots.

```yaml edge-orch.yaml theme={null}
connectors:
  # Root Orchestrator in DC1, treated as an OIDC IdP
  - name: root-dc1
    type: oidc
    oidcWellKnownURL: https://root-dc1.example.com/.well-known/openid-configuration
    oauthClientID: edge-orchestrator
    oauthClientSecret: <vault.edge-dc1-secret>
    oauthRedirectURL: https://edge-branch-a.example.com/oidc/callback
    scopes: openid profile email
    healthCheck:
      enabled: true
      interval: 30s
      timeout: 10s
      unhealthyThreshold: 3
      healthyThreshold: 2

  # Root Orchestrator in DC2, treated as an OIDC IdP
  - name: root-dc2
    type: oidc
    oidcWellKnownURL: https://root-dc2.example.com/.well-known/openid-configuration
    oauthClientID: edge-orchestrator
    oauthClientSecret: <vault.edge-dc2-secret>
    oauthRedirectURL: https://edge-branch-a.example.com/oidc/callback
    scopes: openid profile email
    healthCheck:
      enabled: true
      interval: 30s
      timeout: 10s
      unhealthyThreshold: 3
      healthyThreshold: 2

  - name: edge-failover
    type: continuity
    strategy: failover
    failover:
      idps:
        - root-dc1
        - root-dc2

# Edge Orchestrator serves local applications.
# Configure apps here using the edge-failover connector.
apps:
  - name: branch-app
    type: oidc
    clientID: branch-app
    authentication:
      idps:
        - edge-failover
    claimsMapping:
      email: edge-failover.email
      name: edge-failover.name
```

<Note>
  Because root Orchestrators already normalize attributes via their Schema Abstraction Layer, edge Orchestrators receive consistent claims regardless of which root (or underlying IdP) handled authentication. You only need Schema Abstraction Layer mappings at the edge if your root Orchestrators emit different claim names from one another.
</Note>
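
To see why the edge needs no mappings of its own, trace one user through both layers. In the sketch below, the dictionaries are hypothetical Python renderings of `root-orch.yaml` and `edge-orch.yaml`. A `claimsMapping` value such as `root-failover.email` is read as "the normalized `email` attribute of `root-failover`", which is an interpretation of the config rather than documented behavior.

```python
from __future__ import annotations

from typing import Any

# Hypothetical renderings of the two configs above.
ROOT_ATTRIBUTES = {  # root-failover `attributes`: name -> {idp: source claim}
    "email": {"okta-idp": "email", "entra-idp": "preferred_username"},
    "displayName": {"okta-idp": "name", "entra-idp": "displayName"},
}
ROOT_CLAIMS = {"email": "email", "name": "displayName"}  # edge-app `claimsMapping`
EDGE_CLAIMS = {"email": "email", "name": "name"}  # branch-app `claimsMapping`


def root_token_claims(idp: str, raw: dict[str, Any]) -> dict[str, Any]:
    """Claims a root Orchestrator would put in the token it issues to an edge."""
    normalized = {}
    for name, by_idp in ROOT_ATTRIBUTES.items():
        source = by_idp.get(idp)
        if source in raw:
            normalized[name] = raw[source]
    return {claim: normalized[attr] for claim, attr in ROOT_CLAIMS.items() if attr in normalized}


def edge_app_claims(root_claims: dict[str, Any]) -> dict[str, Any]:
    """Claims the edge passes to branch-app, read from the root's token."""
    return {claim: root_claims[src] for claim, src in EDGE_CLAIMS.items() if src in root_claims}


if __name__ == "__main__":
    via_okta = root_token_claims("okta-idp", {"email": "ana@example.com", "name": "Ana"})
    via_entra = root_token_claims("entra-idp", {"preferred_username": "ana@example.com", "displayName": "Ana"})
    print(edge_app_claims(via_okta) == edge_app_claims(via_entra))  # True
```

The edge mapping is a pass-through because the root has already absorbed the per-IdP differences.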

## Tier 4: Tactical Edge Continuity

**Protects against:** Total connectivity loss (DDIL -- Denied, Degraded, Intermittent, or Limited)

Tiers 1-3 assume that at least one upstream IdP is reachable at all times. Tactical and field deployments break that assumption. In DDIL environments -- forward-deployed operations, remote industrial sites, shipboard networks, or disaster-response staging areas -- connectivity to cloud identity providers can be intermittent or severed entirely.

Tier 4 addresses this with a **co-located on-premises IdP** (such as Keycloak) deployed alongside the tactical Orchestrator on the same local network segment. When cloud connectivity is available, the Orchestrator routes authentication to the cloud IdP normally. When connectivity is lost, the Orchestrator detects the failure and falls back to the local IdP autonomously -- no operator intervention required.

```mermaid theme={null}
%%{init: {'theme': 'neutral', 'themeVariables': {'edgeWidth': 4}}}%%
flowchart TB
    subgraph Cloud["Cloud Identity Providers"]
        Okta["Okta"]
        Entra["Entra ID"]
    end

    subgraph Tactical["Tactical Deployment (Local Network)"]
        TactOrch["Tactical Orchestrator"]
        LocalIdP["Keycloak"]
    end

    Okta -. "intermittent connectivity" .-> TactOrch
    Entra -. "intermittent connectivity" .-> TactOrch
    LocalIdP -- "always available" --> TactOrch

    classDef orch stroke:#6A3EC8,stroke-width:6px
    classDef local stroke:#2E7D32,stroke-width:4px
    class TactOrch orch
    class LocalIdP local
```

```yaml tactical-orch.yaml theme={null}
connectors:
  # Cloud IdP -- preferred when connectivity is available
  - name: cloud-idp
    type: oidc
    oidcWellKnownURL: https://corp.okta.com/.well-known/openid-configuration
    oauthClientID: tactical-client
    oauthClientSecret: <vault.cloud-secret>
    oauthRedirectURL: https://tactical-orch.local/oidc/callback
    scopes: openid profile email
    healthCheck:
      enabled: true
      interval: 10s
      timeout: 5s
      unhealthyThreshold: 2
      healthyThreshold: 3

  # Co-located Keycloak -- always reachable on the local network
  - name: local-keycloak
    type: oidc
    oidcWellKnownURL: https://keycloak.local/realms/tactical/.well-known/openid-configuration
    oauthClientID: tactical-client
    oauthClientSecret: <vault.keycloak-secret>
    oauthRedirectURL: https://tactical-orch.local/oidc/callback
    scopes: openid profile email
    healthCheck:
      enabled: true
      interval: 10s
      timeout: 5s
      unhealthyThreshold: 2
      healthyThreshold: 2

  - name: tactical-failover
    type: continuity
    strategy: failover
    failover:
      idps:
        - cloud-idp
        - local-keycloak
    # Schema Abstraction Layer ensures apps see consistent attributes whether the user
    # authenticated against the cloud IdP or local Keycloak.
    attributes:
      - name: email
        mapping:
          cloud-idp: email
          local-keycloak: email
      - name: displayName
        mapping:
          cloud-idp: name
          local-keycloak: preferred_username
      - name: role
        mapping:
          cloud-idp: role
          local-keycloak: realm_access.roles
        default: "user"
```
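
The `local-keycloak: realm_access.roles` mapping reaches into a nested claim, because Keycloak can emit realm roles as an object such as `{"realm_access": {"roles": [...]}}`. This sketch assumes the dot in the mapping denotes nesting; check the Continuity reference for the exact path syntax. Under that assumption, resolution amounts to walking the path one key at a time. `resolve_claim` is a hypothetical helper.

```python
from __future__ import annotations

from typing import Any


def resolve_claim(claims: dict[str, Any], path: str) -> Any:
    """Follow a dotted path such as 'realm_access.roles' into nested claims.

    Returns None when any segment is missing or is not an object.
    """
    current: Any = claims
    for key in path.split("."):
        if not isinstance(current, dict) or key not in current:
            return None
        current = current[key]
    return current


if __name__ == "__main__":
    keycloak_claims = {"email": "ana@corp.example.com", "realm_access": {"roles": ["operator", "user"]}}
    print(resolve_claim(keycloak_claims, "realm_access.roles"))  # ['operator', 'user']
    print(resolve_claim(keycloak_claims, "email"))  # ana@corp.example.com
```

This path yields a list of roles, while the cloud IdP's `role` claim may be a single string. Check how your applications consume `role` if both shapes can reach them.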

Key design decisions for tactical deployments:

* **Aggressive health check intervals** -- With `interval: 10s` and `unhealthyThreshold: 2`, the cloud IdP is marked unhealthy after two consecutive failed checks, roughly 20 seconds after connectivity drops. The other tiers use 30-second intervals and a threshold of 3, which takes roughly 90 seconds. In DDIL environments, fast detection is critical because users cannot wait minutes for the system to recognize that the cloud is unreachable.
* **Co-located on the same network segment** -- The local IdP or directory must be reachable without traversing any WAN link. If the local IdP depends on the same network path as the cloud, it will fail at the same time.
* **Schema Abstraction Layer bridges the gap** -- Cloud IdPs and on-premises IdPs rarely use the same attribute names. The Schema Abstraction Layer ensures applications see `email`, `displayName`, and `role` regardless of whether Okta or local Keycloak responded.
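* **Automatic recovery** -- When cloud connectivity is restored, the Orchestrator's health checks detect the cloud IdP as healthy again and automatically route traffic back. Because the cloud connector sets `healthyThreshold: 3`, this happens only after three consecutive successful checks (about 30 seconds), so a link that returns briefly and drops again does not bounce users between IdPs. No manual switchover is required.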
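
The "roughly 20 seconds" and "roughly 90 seconds" figures can be bracketed more precisely. The sketch below computes best- and worst-case detection times under a simple model: checks start on a fixed schedule, a check against an unreachable IdP fails only when its full `timeout` expires, and `timeout` is shorter than `interval`. The model and the `detection_window` function are assumptions, not a description of the Orchestrator's scheduler.

```python
from __future__ import annotations


def detection_window(interval_s: float, timeout_s: float, unhealthy_threshold: int) -> tuple[float, float]:
    """Approximate (best, worst) seconds from outage start to 'unhealthy'.

    Assumes checks start on a fixed `interval_s` schedule, that a check against
    an unreachable IdP fails only when its full `timeout_s` expires, and that
    timeout_s < interval_s. Fast failures (e.g. connection refused) finish sooner.
    """
    if timeout_s >= interval_s:
        raise ValueError("model assumes timeout_s < interval_s")
    # Best case: the outage starts just before a check, so that check is the
    # first failure and the last one ends (threshold - 1) intervals later.
    best = (unhealthy_threshold - 1) * interval_s + timeout_s
    # Worst case: the outage starts just after a check succeeded, so the first
    # failing check begins one full interval later.
    worst = unhealthy_threshold * interval_s + timeout_s
    return best, worst


if __name__ == "__main__":
    print(detection_window(10, 5, 2))   # tactical-orch.yaml: (15, 25)
    print(detection_window(30, 10, 3))  # Tiers 1-3: (70, 100)
```

Shortening `interval` narrows the window further, but it also sends more probe traffic over a link that may already be constrained.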

## Choosing the Right Tier

|                               | Tier 1: Basic                     | Tier 2: Diversified                     | Tier 3: Cascading                               | Tier 4: Tactical Edge                         |
| ----------------------------- | --------------------------------- | --------------------------------------- | ----------------------------------------------- | --------------------------------------------- |
| **Protects against**          | Single IdP outage                 | Vendor or regional outage               | Regional isolation, near-total failure          | Total connectivity loss (DDIL)                |
| **Number of IdPs**            | 2                                 | 3+                                      | 3+ (at root), 2+ roots (at edge)                | 1+ cloud, 1+ local                            |
| **Vendor diversity**          | Optional                          | Required                                | Required at root layer                          | Required (cloud + on-prem)                    |
| **IdP credentials needed at** | Every Orchestrator                | Every Orchestrator                      | Root Orchestrators only                         | Tactical Orchestrator (both cloud and local)  |
| **Configuration complexity**  | Low                               | Medium                                  | High                                            | Medium-High                                   |
| **Best for**                  | Most organizations, internal apps | Regulated workloads, SLA-bound services | Global deployments, zero-tolerance environments | Tactical/field deployments, DDIL environments |

**Start simple.** Most organizations begin at Tier 1 -- a primary IdP with a single backup handles the vast majority of outage scenarios. Move to Tier 2 when you need protection against vendor-wide incidents or have compliance requirements for vendor diversity. Tier 3 is for global deployments where regional isolation is a realistic threat and downtime tolerance is near zero. Tier 4 is purpose-built for tactical and field deployments where cloud connectivity cannot be guaranteed -- DDIL environments, disconnected operations, and forward-deployed sites.

## Custom Health Checks and Manual Failover

The default health check polls each IdP automatically, but you are not limited to automatic detection. [Custom health check endpoints](/reference/orchestrator/identity-fabric/continuity#custom-health-check-fields) let you point the Orchestrator at any URL and define what a healthy response looks like -- specific status codes, expected response body values, or custom headers.

This opens up scenarios beyond automatic IdP monitoring:

* **Manual failover** -- Stand up a simple internal endpoint (e.g., `/idp-status`) that your operations team controls. Flipping that endpoint to return a non-healthy status makes the Orchestrator fail over as soon as the connector's `unhealthyThreshold` of consecutive failed checks is reached, without waiting for the IdP to actually go down. Useful for planned maintenance windows or preemptive action.
* **External monitoring signals** -- Wire the custom endpoint to your existing monitoring infrastructure (SIEM, network monitoring, threat detection). If your SOC detects that an IdP has been compromised or a network path is degraded, the monitoring system can flip the health endpoint and trigger failover before users are impacted.
* **Composite health** -- Build a health endpoint that aggregates multiple signals -- network latency, certificate expiry, IdP vendor status pages -- into a single healthy/unhealthy decision. The Orchestrator doesn't need to understand each signal; it only needs a status code (and, optionally, a response body) that it can match.
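
The composite-health idea reduces to a pure function from signals to an HTTP status and body. Any small web service can then serve that result at the URL the Orchestrator polls. Everything in this sketch is hypothetical: the `Signals` fields, the thresholds, and the body strings. The only real contract is whatever your `responseMatcher` expects.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Signals:
    """Hypothetical inputs collected by your own monitoring tooling."""

    latency_ms: float
    cert_days_remaining: int
    vendor_status_ok: bool
    manual_override_down: bool = False  # ops/SOC switch for forced failover


def composite_status(s: Signals, max_latency_ms: float = 2000.0,
                     min_cert_days: int = 7) -> tuple[int, str]:
    """Collapse several signals into one HTTP status and body.

    Thresholds are illustrative defaults, not recommendations.
    """
    problems = []
    if s.manual_override_down:
        problems.append("manual-override")
    if s.latency_ms > max_latency_ms:
        problems.append("latency")
    if s.cert_days_remaining < min_cert_days:
        problems.append("cert-expiry")
    if not s.vendor_status_ok:
        problems.append("vendor-status")
    if problems:
        # Deliberately avoid the word "healthy" anywhere in the failure body.
        return 503, "down: " + ",".join(problems)
    return 200, "healthy"
```

Listing every failing signal in the body costs nothing for the Orchestrator, which only needs the status code to match. It also gives operators a quick answer to why a failover happened.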

```yaml maverics.yaml theme={null}
connectors:
  - name: primary-idp
    type: oidc
    oidcWellKnownURL: https://corp.okta.com/.well-known/openid-configuration
    oauthClientID: orchestrator-client
    oauthClientSecret: <vault.okta-secret>
    oauthRedirectURL: https://orch.example.com/oidc/callback
    scopes: openid profile email
    healthCheck:
      enabled: true
      interval: 10s
      timeout: 5s
      unhealthyThreshold: 2
      healthyThreshold: 3
      customEndpoint:
        type: http
        endpoint: https://internal-monitoring.example.com/idp-status/okta
        responseMatcher:
          expectedStatuses: [200]
          body:
            contains: "healthy"
```

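With this configuration, the Orchestrator checks `https://internal-monitoring.example.com/idp-status/okta` instead of the IdP's well-known endpoint. Any response other than a `200` with `"healthy"` in the body counts as a failed check. After two consecutive failures (`unhealthyThreshold: 2`), the connector is marked unhealthy and failover begins. If `contains` performs a substring match, as its name suggests, a body of `unhealthy` would still pass the body check. To avoid this, signal an outage with a non-`200` status code or with a body that does not contain the word `healthy`.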
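
For the manual failover case, the endpoint can be very small. The sketch below uses only Python's standard library to serve `/idp-status/okta`, the path in the example above. The `FORCE_FAILOVER` switch, the `make_server` helper, and serving plain HTTP are all assumptions. In practice you would put the service behind TLS, since the example URL is `https://`, and restrict who can flip the switch.

```python
from __future__ import annotations

import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

# Hypothetical operator switch. Set it (from an admin tool, chat-ops bot, or
# monitoring hook running in this process) to force the health check to fail.
FORCE_FAILOVER = threading.Event()
STATUS_PATH = "/idp-status/okta"  # matches the customEndpoint path above


class IdpStatusHandler(BaseHTTPRequestHandler):
    def do_GET(self) -> None:
        if self.path != STATUS_PATH:
            self.send_error(404)
            return
        if FORCE_FAILOVER.is_set():
            # Fail both matchers: non-200 status, and no "healthy" in the body.
            status, body = 503, b"down"
        else:
            status, body = 200, b"healthy"
        self.send_response(status)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format: str, *args) -> None:
        pass  # silence per-request logging for the sketch


def make_server(host: str = "127.0.0.1", port: int = 8080) -> ThreadingHTTPServer:
    """Build (but do not start) the status server."""
    return ThreadingHTTPServer((host, port), IdpStatusHandler)
```

Calling `make_server("0.0.0.0", 8080).serve_forever()` starts the service. Once `FORCE_FAILOVER.set()` runs, every check receives `503 down`, and the Orchestrator fails over after `unhealthyThreshold` consecutive failures. Clearing the event restores `200 healthy`.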

## Supporting Infrastructure

Identity Continuity handles the authentication layer, but a production HA deployment also depends on infrastructure outside the Orchestrator:

* **Global load balancers or GeoDNS** -- Route users to the nearest healthy Orchestrator cluster. Required for Tier 3 to direct edge traffic to the closest root.
* **Health check integration** -- Your load balancer should poll each Orchestrator's `/status` endpoint and remove unhealthy instances from the pool. See the [Scale for Production guide](/guides/operations/scale) for load balancer configuration examples.
* **Network path diversity** -- If all your IdP connections traverse the same network link, a single link failure defeats the purpose of multiple IdPs. Ensure root Orchestrators have diverse network paths to their upstream IdPs.
* **Monitoring and alerting** -- Configure alerts on Continuity failover events so your team knows when a backup IdP is actively handling traffic. See the [Monitor and Observe guide](/guides/operations/monitor) for telemetry configuration.
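
Before relying on the load balancer, you can spot-check what its probe should see. This sketch assumes each Orchestrator instance is reachable at a base URL and that a healthy instance answers `GET /status` with HTTP 200. The helper names, and any instance URLs you pass in, are hypothetical.

```python
from __future__ import annotations

import urllib.request
from typing import Callable, Iterable


def probe_status(base_url: str, timeout_s: float = 2.0) -> bool:
    """True if GET {base_url}/status returns HTTP 200 within the timeout."""
    try:
        with urllib.request.urlopen(f"{base_url}/status", timeout=timeout_s) as resp:
            return resp.status == 200
    except OSError:  # covers URLError, HTTPError, refused connections, timeouts
        return False


def healthy_pool(instances: Iterable[str],
                 probe: Callable[[str], bool] = probe_status) -> list[str]:
    """Instances a load balancer would keep in rotation."""
    return [instance for instance in instances if probe(instance)]
```

For example, `healthy_pool(["https://orch-1.example.com", "https://orch-2.example.com"])` returns only the instances whose `/status` currently answers 200. The injectable `probe` argument lets you substitute a stricter check, such as one that also enforces a latency budget.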

## Related Pages

<CardGroup cols={2}>
  <Card title="Identity Continuity Setup" icon="arrows-rotate" href="/guides/identity-continuity/overview">
    Step-by-step guide to configuring your first Continuity connector with health monitoring
  </Card>

  <Card title="Identity Continuity Reference" icon="file-code" href="/reference/orchestrator/identity-fabric/continuity">
    Full configuration reference for the Continuity connector, health checks, and Schema Abstraction Layer
  </Card>

  <Card title="Scale for Production" icon="arrows-up-down" href="/guides/operations/scale">
    Scale horizontally with sticky sessions, local session storage, and load balancer configuration
  </Card>

  <Card title="Deploy to Production" icon="rocket" href="/guides/operations/deploy">
    Deploy with CLI flags, Docker or systemd, secret provider configuration, and health probes
  </Card>
</CardGroup>
