Understanding Failure Modes
Before choosing a tier, understand what can go wrong:

| Failure mode | Example | Impact |
|---|---|---|
| Single IdP outage | Unplanned incident, scheduled maintenance window | Users who authenticate through that IdP cannot log in |
| Vendor-wide outage | An entire IdP vendor’s service is unavailable | Every application relying on that vendor is affected |
| Regional network partition | Orchestrator in US-East cannot reach IdPs in EU-West | Users at affected sites cannot authenticate, even if the IdPs are healthy |
| Multi-provider outage | Two or more IdP vendors are unavailable simultaneously | Rare but catastrophic — all failover targets are unavailable |
Tier 1: Basic Continuity
Protects against: Single IdP outage

The simplest configuration: one Continuity connector with a primary IdP and one backup. If the primary goes down, the Orchestrator automatically routes authentication to the backup.

maverics.yaml
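The referenced configuration is not shown above. The sketch below illustrates the shape of a Tier 1 setup; connector types and field names are assumptions, not verbatim Maverics syntax. Consult the Identity Continuity reference for the exact schema.

```yaml
# Illustrative sketch only -- connector types and field names are assumptions.
connectors:
  - name: okta-primary            # primary IdP
    type: oidc
    healthCheck:                  # health checks live on each IdP connector
      interval: 30s
      unhealthyThreshold: 3
  - name: entra-backup            # single backup IdP
    type: oidc
    healthCheck:
      interval: 30s
      unhealthyThreshold: 3
  - name: idp-continuity
    type: continuity
    idps:                         # ordered: primary first, then backup
      - okta-primary
      - entra-backup
```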
Tier 2: Diversified Continuity
Protects against: Vendor-specific outage, regional outage

Tier 2 applies a single principle: diversity. Add three or more IdPs from different vendors, deployed in different regions, using different protocols. A vendor-wide outage or regional network partition only takes out one link in the chain.

maverics.yaml
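A minimal sketch of a diversified Tier 2 configuration, assuming illustrative connector names and field spellings (the `attributes` mapping syntax is intentionally elided; see the Identity Continuity reference):

```yaml
# Illustrative sketch only -- field names are assumptions.
connectors:
  - name: okta-us                 # vendor 1, US region, OIDC
    type: oidc
  - name: entra-eu                # vendor 2, EU region, SAML
    type: saml
  - name: ldap-onprem             # on-premises directory, LDAP
    type: ldap
  - name: idp-continuity
    type: continuity
    idps: [okta-us, entra-eu, ldap-onprem]
    attributes:                   # Schema Abstraction Layer: normalize claim
      email: ...                  # names so applications always see email,
      displayName: ...            # displayName, and role regardless of
      role: ...                   # which IdP responded
```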
- Different vendors — Okta and Entra ID are independent services. A vendor-specific incident only affects one.
- Different regions — Cloud IdPs in US and EU, plus an on-premises LDAP server. A regional network partition cannot take all three offline.
- Different protocols — OIDC, SAML, and LDAP. A protocol-specific bug or misconfiguration is contained to one connector.
- Schema Abstraction Layer — The `attributes` block normalizes claim names so your applications see `email`, `displayName`, and `role` regardless of which IdP responded. Without this, failover would break applications that depend on specific claim names.
Health monitoring is configured on each individual IdP connector, not on the Continuity connector. The Continuity connector reads health status from its member connectors to make routing decisions. See the Identity Continuity reference for the full health check configuration.
Tier 3: Cascading Continuity
Protects against: Regional isolation, near-total IdP infrastructure failure

Tier 3 introduces Orchestrator-to-Orchestrator federation. Instead of every Orchestrator instance connecting directly to every IdP, you create a two-layer architecture:

- Root Orchestrators in primary data centers connect directly to the actual IdPs via Continuity. They run as OIDC Providers or SAML Providers, acting as a stable identity endpoint.
- Edge Orchestrators in secondary regions or branch offices use Generic OIDC or Generic SAML connectors pointed at the root Orchestrators. They use Continuity to fail over between multiple root Orchestrators.
Root Orchestrator Configuration
Root Orchestrators connect directly to your IdPs via Continuity and expose an OIDC Provider endpoint that edge Orchestrators authenticate against.

root-orch.yaml
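A sketch of the root-side configuration. Hostnames, connector names, and field spellings are illustrative assumptions, not exact Maverics syntax:

```yaml
# Illustrative sketch only -- field names are assumptions.
connectors:
  - name: okta
    type: oidc
  - name: entra
    type: saml
  - name: idp-continuity
    type: continuity
    idps: [okta, entra]
oidcProvider:                     # stable identity endpoint that edge
  issuer: https://root-us.example.com   # Orchestrators authenticate against
```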
Edge Orchestrator Configuration
Edge Orchestrators treat root Orchestrators as IdPs using Generic OIDC connectors, with Continuity providing failover between roots.

edge-orch.yaml
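A sketch of the edge-side configuration under the same assumptions (illustrative hostnames and field names):

```yaml
# Illustrative sketch only -- field names are assumptions.
connectors:
  - name: root-us                 # Generic OIDC connector pointed at a root
    type: oidc
    issuerURL: https://root-us.example.com
  - name: root-eu
    type: oidc
    issuerURL: https://root-eu.example.com
  - name: root-continuity
    type: continuity
    idps: [root-us, root-eu]      # fail over between root Orchestrators
```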
Since root Orchestrators already normalize attributes via their Schema Abstraction Layer, edge Orchestrators receive consistent claims regardless of which root (or underlying IdP) handled authentication. You only need Schema Abstraction Layer mappings at the edge if the two root Orchestrators emit different claim names.
Tier 4: Tactical Edge Continuity
Protects against: Total connectivity loss (DDIL — Denied, Degraded, Intermittent, or Limited)

Tiers 1-3 assume that at least one upstream IdP is reachable at all times. Tactical and field deployments break that assumption. In DDIL environments — forward-deployed operations, remote industrial sites, shipboard networks, or disaster-response staging areas — connectivity to cloud identity providers can be intermittent or severed entirely.

Tier 4 addresses this with a co-located on-premises IdP (such as Keycloak) deployed alongside the tactical Orchestrator on the same local network segment. When cloud connectivity is available, the Orchestrator routes authentication to the cloud IdP normally. When connectivity is lost, the Orchestrator detects the failure and falls back to the local IdP autonomously — no operator intervention required.

tactical-orch.yaml
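A sketch of a tactical edge configuration; the aggressive health check values come from the notes below, while connector names and field spellings are illustrative assumptions:

```yaml
# Illustrative sketch only -- field names are assumptions.
connectors:
  - name: okta-cloud
    type: oidc
    healthCheck:
      interval: 10s               # aggressive: detect loss in ~20 seconds
      unhealthyThreshold: 2
  - name: keycloak-local          # co-located on the same network segment
    type: oidc
    healthCheck:
      interval: 10s
      unhealthyThreshold: 2
  - name: idp-continuity
    type: continuity
    idps: [okta-cloud, keycloak-local]
    attributes:                   # normalize email, displayName, role across
      email: ...                  # cloud and local IdPs (mapping syntax
      displayName: ...            # omitted; see the reference)
      role: ...
```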
- Aggressive health check intervals — The `interval: 10s` and `unhealthyThreshold: 2` configuration detects connectivity loss in as little as 20 seconds. Standard deployments use 30-second intervals and a threshold of 3 (90 seconds to detect). In DDIL environments, fast detection is critical because users cannot wait minutes for the system to recognize that the cloud is unreachable.
- Co-located on the same network segment — The local IdP or directory must be reachable without traversing any WAN link. If the local IdP depends on the same network path as the cloud, it will fail at the same time.
- Schema Abstraction Layer bridges the gap — Cloud IdPs and on-premises IdPs rarely use the same attribute names. The Schema Abstraction Layer ensures applications see `email`, `displayName`, and `role` regardless of whether Okta or local Keycloak responded.
- Automatic recovery — When cloud connectivity is restored, the Orchestrator’s health checks detect the cloud IdP as healthy again and automatically route traffic back. No manual switchover is required.
Choosing the Right Tier
| | Tier 1: Basic | Tier 2: Diversified | Tier 3: Cascading | Tier 4: Tactical Edge |
|---|---|---|---|---|
| Protects against | Single IdP outage | Vendor or regional outage | Regional isolation, near-total failure | Total connectivity loss (DDIL) |
| Number of IdPs | 2 | 3+ | 3+ (at root), 2+ roots (at edge) | 1+ cloud, 1+ local |
| Vendor diversity | Optional | Required | Required at root layer | Required (cloud + on-prem) |
| IdP credentials needed at | Every Orchestrator | Every Orchestrator | Root Orchestrators only | Tactical Orchestrator (both cloud and local) |
| Configuration complexity | Low | Medium | High | Medium-High |
| Best for | Most organizations, internal apps | Regulated workloads, SLA-bound services | Global deployments, zero-tolerance environments | Tactical/field deployments, DDIL environments |
Custom Health Checks and Manual Failover
The default health check polls each IdP automatically, but you are not limited to automatic detection. Custom health check endpoints let you point the Orchestrator at any URL and define what a healthy response looks like — specific status codes, expected response body values, or custom headers. This opens up scenarios beyond automatic IdP monitoring:

- Manual failover — Stand up a simple internal endpoint (e.g., `/idp-status`) that your operations team controls. Flipping that endpoint to return a non-healthy status forces the Orchestrator to fail over immediately, without waiting for the IdP to actually go down. Useful for planned maintenance windows or preemptive action.
- External monitoring signals — Wire the custom endpoint to your existing monitoring infrastructure (SIEM, network monitoring, threat detection). If your SOC detects that an IdP has been compromised or a network path is degraded, the monitoring system can flip the health endpoint and trigger failover before users are impacted.
- Composite health — Build a health endpoint that aggregates multiple signals — network latency, certificate expiry, IdP vendor status pages — into a single healthy/unhealthy decision. The Orchestrator doesn’t need to understand each signal; it just needs a status code.
maverics.yaml
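The referenced code block is not shown above. A sketch of a custom health check pointed at an internal monitoring endpoint; the `healthCheck` field names are illustrative assumptions, not exact Maverics syntax:

```yaml
# Illustrative sketch only -- field names are assumptions.
connectors:
  - name: okta
    type: oidc
    healthCheck:
      url: https://internal-monitoring.example.com/idp-status/okta
      expectedStatusCode: 200     # any other status marks the IdP unhealthy
      expectedBody: healthy       # response body must contain "healthy"
```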
This configuration polls `https://internal-monitoring.example.com/idp-status/okta` instead of the IdP’s well-known endpoint. If that endpoint returns anything other than a 200 with "healthy" in the body, failover begins.
Supporting Infrastructure
Identity Continuity handles the authentication layer, but a production HA deployment also depends on infrastructure outside the Orchestrator:

- Global load balancers or GeoDNS — Route users to the nearest healthy Orchestrator cluster. Required for Tier 3 to direct edge traffic to the closest root.
- Health check integration — Your load balancer should poll each Orchestrator’s `/status` endpoint and remove unhealthy instances from the pool. See the Scale for Production guide for load balancer configuration examples.
- Network path diversity — If all your IdP connections traverse the same network link, a single link failure defeats the purpose of multiple IdPs. Ensure root Orchestrators have diverse network paths to their upstream IdPs.
- Monitoring and alerting — Configure alerts on Continuity failover events so your team knows when a backup IdP is actively handling traffic. See the Monitor and Observe guide for telemetry configuration.
Related Pages
Identity Continuity Setup
Step-by-step guide to configuring your first Continuity connector with health monitoring
Identity Continuity Reference
Full configuration reference for the Continuity connector, health checks, and Schema Abstraction Layer
Scale for Production
Scale horizontally with sticky sessions, local session storage, and load balancer configuration
Deploy to Production
Deploy with CLI flags, Docker or systemd, secret provider configuration, and health probes