# Monitor and Observe

By the end of this guide, you will have a fully instrumented Maverics Orchestrator with OpenTelemetry-based metrics and traces, structured log output, and health check monitoring for key operational events.

Good observability means you know what your Orchestrator is doing before users tell you something is wrong. The Orchestrator exports metrics and traces via OpenTelemetry (OTLP), emits structured logs, and provides health endpoints -- giving you the raw data you need to build dashboards, set up alerts, and debug issues when they arise.

**Console terminology:** In the Maverics Console, Orchestrator instances and configuration delivery are managed through **Deployments**. When working directly with YAML, configuration is managed as files delivered via the `-config` flag or `MAVERICS_CONFIG` environment variable.

## Prerequisites

* **A running Maverics Orchestrator** -- If you have not deployed yet, follow the [Deploy to Production guide](/guides/operations/deploy) first.
* **An OpenTelemetry collector** -- The Orchestrator exports telemetry via OTLP. You need an OpenTelemetry Collector (or compatible endpoint like Grafana Alloy, Datadog Agent, or New Relic) to receive metrics and traces.
* **A log aggregation system** (recommended) -- Elasticsearch, Loki, Splunk, or any system that can ingest structured JSON logs.

## Set Up Observability

The Orchestrator uses OpenTelemetry to export metrics and traces via the OTLP protocol. Metrics are collected through periodic readers that push to your OTLP endpoint at a configured interval. Traces are exported through simple processors.
The Orchestrator can export telemetry data including:

* **Request metrics** -- Total requests, response status codes, and request duration histograms
* **Authentication metrics** -- Authentication event data (availability varies by Orchestrator version)
* **Runtime metrics** -- Process-level metrics such as memory and concurrency data
* **Distributed traces** -- End-to-end request tracing through authentication and authorization flows

The specific metrics and trace data available depend on your Orchestrator version and configuration. Consult your Orchestrator's actual OTLP output to confirm which metrics are exported in your deployment.

**Console UI documentation is coming soon.** This section will walk you through configuring this component using the Maverics Console's visual interface, including step-by-step screenshots and field descriptions.
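
On the receiving side, a minimal OpenTelemetry Collector configuration might look like the sketch below. It is an illustration rather than a Strata-provided default: it assumes the Collector's OTLP HTTP receiver listens on port 4318 (matching the endpoints used in this guide), that traces are simply printed with the `debug` exporter, and that metrics are exposed for Prometheus on port 8889 through the `prometheus` exporter from the Collector's contrib distribution. The file name, ports, and exporter choices are all assumptions to adapt.

```yaml
# otelcol-config.yaml (illustrative sketch; adjust to your Collector distribution)
receivers:
  otlp:
    protocols:
      http:
        endpoint: "0.0.0.0:4318"   # OTLP/HTTP; serves /v1/metrics and /v1/traces

exporters:
  debug:
    verbosity: basic               # prints a summary of received telemetry to the Collector log
  prometheus:
    endpoint: "0.0.0.0:8889"       # assumed port; exposes metrics for Prometheus to scrape

service:
  pipelines:
    metrics:
      receivers: [otlp]
      exporters: [prometheus]
    traces:
      receivers: [otlp]
      exporters: [debug]
```

In a real deployment you would typically replace the `debug` exporter with your tracing backend's exporter once you have confirmed that spans are arriving.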

Configure OTLP exporters for both metrics and traces:

```yaml maverics.yaml theme={null}
telemetry:
  metrics:
    readers:
      - periodic:
          exporter:
            otlp:
              protocol: "http/protobuf"
              endpoint: "http://otelcol.example.com:4318/v1/metrics"
              insecure: true
              timeout: 5000
          interval: 5000
  traces:
    processors:
      - simple:
          exporter:
            otlp:
              protocol: "http/protobuf"
              endpoint: "http://otelcol.example.com:4318/v1/traces"
```

| Field                                               | Description                                        |
| --------------------------------------------------- | -------------------------------------------------- |
| `metrics.readers[].periodic.exporter.otlp.protocol` | OTLP transport -- use `"http/protobuf"`            |
| `metrics.readers[].periodic.exporter.otlp.endpoint` | Your OTLP collector's metrics endpoint             |
| `metrics.readers[].periodic.exporter.otlp.insecure` | Skip TLS verification for the collector connection |
| `metrics.readers[].periodic.exporter.otlp.timeout`  | Export timeout in milliseconds                     |
| `metrics.readers[].periodic.interval`               | Collection interval in milliseconds                |
| `traces.processors[].simple.exporter.otlp.protocol` | OTLP transport for traces                          |
| `traces.processors[].simple.exporter.otlp.endpoint` | Your OTLP collector's traces endpoint              |

The OTLP exporter supports additional production options including TLS certificates, gzip compression, custom headers, and aggregation temporality preferences. For traces, a batch processor is available for high-volume environments. See [Telemetry Reference](/reference/orchestrator/telemetry) for the complete field reference.

If you manage Orchestrator configuration through the Maverics Console, advanced telemetry settings like batch processing, TLS, compression, and custom headers require the [**config override**](/reference/console/config-publishing#override-config) feature.
Config override requires enablement for your organization -- contact your Strata account team or [Strata support](https://strataidentity.my.site.com/support/s/) to enable it.

The Orchestrator exports all telemetry via OTLP. If you use Prometheus, configure your OpenTelemetry Collector to receive OTLP and export to Prometheus, using either the `prometheus` exporter (which exposes an endpoint for Prometheus to scrape) or the `prometheusremotewrite` exporter (which pushes to a remote-write endpoint).

The Orchestrator can emit structured logs in JSON format -- making them easy to parse, search, and aggregate in any log management system. Structured logs include consistent fields like timestamp, log level, request ID, and component name, so you can filter and correlate events across your deployment.

Production logging best practices:

* **Log level** -- Use `info` for normal production operation. Switch to `debug` only when actively troubleshooting -- debug logging is verbose and can impact performance.
* **Output format** -- JSON format (`jsonOutput: true`) is recommended for production. It integrates cleanly with log aggregation systems like Elasticsearch, Loki, and Splunk.
* **Output destination** -- Stdout is the standard approach for containerized deployments (Docker and Kubernetes capture stdout automatically). For bare-metal deployments, you can configure file-based output with rotation.
* **Request ID correlation** -- Each request gets a unique ID that appears in every log entry for that request. Use this to trace a single user's authentication flow across log entries.

**Console UI documentation is coming soon.** This section will walk you through configuring this component using the Maverics Console's visual interface, including step-by-step screenshots and field descriptions.

Configure the logger for production:

```yaml maverics.yaml theme={null}
logger:
  level: "info"
  jsonOutput: true
  timeFormat: "RFC3339Nano"
  logSessionID: false
  fieldOrdering:
    enabled: true
```

| Field                          | Default         | Description                                             |
| ------------------------------ | --------------- | ------------------------------------------------------- |
| `logger.level`                 | `"info"`        | Log verbosity: `"debug"`, `"info"`, `"warn"`, `"error"` |
| `logger.jsonOutput`            | `false`         | Output logs in JSON format for structured logging       |
| `logger.timeFormat`            | `"RFC3339Nano"` | Time format string for log timestamps                   |
| `logger.logSessionID`          | `false`         | Include the session ID in log entries for correlation   |
| `logger.fieldOrdering.enabled` | `false`         | Order log fields consistently across entries            |

The `-verbose` CLI flag or `MAVERICS_DEBUG_MODE=true` environment variable overrides `logger.level` to `"debug"` at startup.

You can also configure HTTP access logging separately:

```yaml maverics.yaml theme={null}
http:
  accessLog:
    disabled: false
    level: "info"
```

The Orchestrator exposes a configurable health endpoint that load balancers and orchestration platforms use to verify operational status. The health endpoint returns a JSON response, and a periodic heartbeat logs system metrics.

**Console UI documentation is coming soon.** This section will walk you through configuring this component using the Maverics Console's visual interface, including step-by-step screenshots and field descriptions.
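
As one example of how an orchestration platform might consume the health endpoint described above, the Kubernetes probe fragment below is a minimal sketch. It assumes the default `/status` location and an Orchestrator serving TLS on port 9443 (the port used in this guide's `curl` examples). The container name and timing values are hypothetical and should be tuned for your environment.

```yaml
# Fragment of a Kubernetes Pod spec (illustrative; names and timings are assumptions)
containers:
  - name: maverics-orchestrator     # hypothetical container name
    readinessProbe:
      httpGet:
        path: /status               # must match health.location
        port: 9443                  # assumed Orchestrator listener port
        scheme: HTTPS               # kubelet HTTPS probes do not verify the certificate
      periodSeconds: 10
      failureThreshold: 2           # marked unready after 2 consecutive failures
    livenessProbe:
      httpGet:
        path: /status
        port: 9443
        scheme: HTTPS
      initialDelaySeconds: 15       # give the Orchestrator time to load its configuration
      periodSeconds: 20
```

Using a shorter period for readiness than for liveness is a common choice: traffic is removed quickly from an unhealthy instance, while restarts happen only after a longer failure.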

Configure the health endpoint and heartbeat:

```yaml maverics.yaml theme={null}
health:
  location: "/status"
  heartbeat:
    disabled: false
    logLevel: "info"
    interval: "60s"
```

| Field                       | Default     | Description                             |
| --------------------------- | ----------- | --------------------------------------- |
| `health.location`           | `"/status"` | HTTP path for the health check endpoint |
| `health.heartbeat.disabled` | `false`     | Disable periodic heartbeat logging      |
| `health.heartbeat.logLevel` | `"info"`    | Heartbeat log level                     |
| `health.heartbeat.interval` | `"60s"`     | Heartbeat interval (duration string)    |

Verify the health endpoint:

```bash theme={null}
curl -s https://localhost:9443/status | jq .
```

```json theme={null}
{
  "status": "up"
}
```

The periodic heartbeat log entry includes: `orchestrator_version`, `config_version`, `cpu_count`, `cpu_usage`, `total_memory`, `memory_usage`, and `active_goroutines`.

Every log entry (including heartbeat entries) includes a [`soid`](/introduction/glossary#soid-secure-orchestrator-id) field for identifying and correlating logs by Orchestrator instance. See [Logging — Deployment Correlation](/reference/orchestrator/telemetry/logging#deployment-correlation) for details.

Metrics and logs are useful for investigation, but alerts are what tell you something needs attention right now. Configure alerts for the conditions that indicate real problems -- not just noise.

Recommended alerts for a production Orchestrator deployment:

* **High error rate** -- Alert when the 5xx error rate exceeds a threshold (for example, more than 1% of requests returning 500-series errors over a 5-minute window). This catches upstream failures, misconfigurations, and application errors.
* **Authentication failure spike** -- Alert when authentication failures increase significantly above the baseline. A sudden spike could indicate an IdP outage, expired credentials, or a misconfigured connector.
* **Health check failure** -- Alert when the health endpoint reports an unhealthy status for more than 2 consecutive checks. This catches connector failures and startup issues.
* **High latency** -- Alert when the p95 request latency exceeds your SLA threshold. High latency often indicates network issues, slow upstream services, or resource contention.

**Console UI documentation is coming soon.** This section will walk you through configuring this component using the Maverics Console's visual interface, including step-by-step screenshots and field descriptions.
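
The high-latency alert above could be expressed as a Prometheus rule along the following lines. This is a sketch under stated assumptions: it presumes your Collector exposes a request-duration histogram to Prometheus, and the metric name `http_server_request_duration_seconds_bucket`, the 0.5-second threshold, and the 10-minute hold time are placeholders rather than documented Orchestrator values. Check the metric names your deployment actually exports before using it.

```yaml
# prometheus-latency-alert.yaml (illustrative; metric name and threshold are assumptions)
groups:
  - name: maverics-latency
    rules:
      - alert: MavericsHighLatency
        # p95 latency over the last 5 minutes, computed from histogram buckets
        expr: |
          histogram_quantile(
            0.95,
            sum by (le) (rate(http_server_request_duration_seconds_bucket[5m]))
          ) > 0.5
        for: 10m                    # require the condition to hold before firing
        labels:
          severity: warning
        annotations:
          summary: "Orchestrator p95 request latency above 500 ms"
```

Aggregating with `sum by (le)` before `histogram_quantile` yields one fleet-wide percentile. Add further labels to the `by` clause if you want a separate alert per instance.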

Alerting is configured in your monitoring platform (Prometheus, Grafana, Datadog, etc.), not in the Orchestrator YAML. The Orchestrator exports the metrics that your alerting rules evaluate.

**Example Prometheus alert rules** for an Orchestrator deployment:

```yaml theme={null}
# prometheus-alerts.yaml
groups:
  - name: maverics
    rules:
      - alert: MavericsHealthCheckFailed
        # `up` is 0 when Prometheus cannot scrape the target of the "maverics" job
        expr: up{job="maverics"} == 0
        for: 2m
        labels:
          severity: critical
        annotations:
          summary: "Orchestrator instance is down"
      - alert: MavericsHighErrorRate
        # Ratio of 5xx responses to all responses; confirm the metric and
        # label names against the metrics your deployment actually exports
        expr: |
          sum(rate(http_server_requests_total{status=~"5.."}[5m]))
            / sum(rate(http_server_requests_total[5m])) > 0.01
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Orchestrator error rate above 1%"
```

**Prometheus scrape configuration** for the Orchestrator's OTLP metrics (after your OpenTelemetry Collector translates to Prometheus format):

```yaml theme={null}
# prometheus.yaml scrape config
scrape_configs:
  - job_name: "maverics"
    scrape_interval: 15s
    static_configs:
      - targets: ["otelcol.example.com:8889"]
```

Start with a small number of high-signal alerts and add more as you learn your deployment's baseline behavior. Too many alerts lead to alert fatigue, where important signals get lost in the noise.

With telemetry, logging, and alerting configured, verify that data is flowing correctly through your entire observability pipeline.

```bash theme={null}
# Verify status endpoint
curl -s https://your-orchestrator-host:9443/status | jq .
```

Walk through a complete verification:

1. **Check your OTLP collector** -- Confirm the collector is receiving metrics and traces from the Orchestrator
2. **Check dashboards** -- If you have Grafana dashboards, verify that charts are rendering with real data
3. **Check logs** -- Trigger a few requests and confirm the structured log entries appear in your log aggregation system with the correct JSON format and fields
4. **Test an alert** -- If possible, trigger one of your alert conditions (for example, temporarily set a very low threshold) and confirm the alert fires and notifications are delivered

**Success!** Your Orchestrator deployment is fully instrumented. Metrics and traces are being exported via OTLP, structured logs are flowing to your aggregation system, and alerts are configured for key operational events.

## Troubleshooting

Verify that the OTLP endpoint URL in your `telemetry` configuration is correct and reachable from the Orchestrator. Check that the collector is running and listening on the expected port. If the collector uses TLS, set `insecure: false` and ensure the Orchestrator can verify the collector's certificate. Check firewall rules and network policies between the Orchestrator and collector.

If your log aggregation system is not parsing the Orchestrator's logs correctly, check that you are using a JSON parser (not a regex-based parser for plain text logs). The Orchestrator outputs JSON-formatted logs when `jsonOutput: true` is set. If you see raw JSON strings instead of parsed fields, your log shipper (Filebeat, Fluentd, Vector) may need a JSON parsing filter configured. Check the log shipper's documentation for JSON input configuration.

If alerts are not firing when expected, check these common causes:

* **Threshold too high** -- Your alert threshold may be higher than the actual metric values. Check the raw metric values in your monitoring platform to set realistic thresholds.
* **Wrong metric name** -- Verify the exact metric name matches what your alert rule references. Metric names are case-sensitive.
* **Evaluation interval** -- Alert rules only evaluate at configured intervals. If your evaluation interval is 5 minutes, the alert will not fire until the next evaluation window after the condition is met.
* **Notification channel** -- The alert rule may be firing but the notification is not reaching you.
  Check your alerting tool's notification configuration (email, Slack, PagerDuty).

## Related Pages

* Back to the Operations guides hub
* Complete configuration reference for logger, telemetry, access logs, and health check settings
* Set up your production deployment before configuring monitoring