By the end of this guide, you will have a fully instrumented Maverics Orchestrator with OpenTelemetry-based metrics and traces, structured log output, health check monitoring, and alerting for key operational events. Good observability means you know what your Orchestrator is doing before users tell you something is wrong. The Orchestrator exports metrics and traces via OpenTelemetry (OTLP), emits structured logs, and provides health endpoints — giving you the raw data you need to build dashboards, set up alerts, and debug issues when they arise.
Console terminology: In the Maverics Console, Orchestrator instances and configuration delivery are managed through Deployments. When working directly with YAML, configuration is managed as files delivered via the -config flag or MAVERICS_CONFIG environment variable.

Prerequisites

  • A running Maverics Orchestrator — If you have not deployed yet, follow the Deploy to Production guide first.
  • An OpenTelemetry collector — The Orchestrator exports telemetry via OTLP. You need an OpenTelemetry Collector (or compatible endpoint like Grafana Alloy, Datadog Agent, or New Relic) to receive metrics and traces.
  • A log aggregation system (recommended) — Elasticsearch, Loki, Splunk, or any system that can ingest structured JSON logs.

Set Up Observability

Step 1: Configure telemetry

The Orchestrator uses OpenTelemetry to export metrics and traces via the OTLP protocol. Metrics are collected through periodic readers that push to your OTLP endpoint at a configured interval, and traces are exported through simple span processors. The Orchestrator can export telemetry data including:
  • Request metrics — Total requests, response status codes, and request duration histograms
  • Authentication metrics — Authentication event data (availability varies by Orchestrator version)
  • Runtime metrics — Process-level metrics such as memory and concurrency data
  • Distributed traces — End-to-end request tracing through authentication and authorization flows
The specific metrics and trace data available depend on your Orchestrator version and configuration. Consult your Orchestrator’s actual OTLP output to confirm which metrics are exported in your deployment.
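To make the moving parts concrete, the sketch below shows the general shape of a telemetry configuration: an OTLP endpoint to push to, a TLS setting, and a push interval. Treat it as a rough sketch only: the block layout and key names other than insecure (which the Troubleshooting section references) are assumptions rather than the documented Maverics schema, so confirm the exact field names in your Orchestrator configuration reference or the Console form.
# Illustrative sketch only; key names here are assumptions, not the documented Maverics schema.
telemetry:
  otlp:
    endpoint: otel-collector.example.internal:4317  # assumed address of your collector's OTLP gRPC listener
    insecure: false                                 # keep false when the collector terminates TLS (see Troubleshooting)
    metricsInterval: 30s                            # hypothetical key for the periodic reader push interval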
Console UI documentation is coming soon. This section will walk you through configuring this component using the Maverics Console’s visual interface, including step-by-step screenshots and field descriptions.
[Screenshot: Telemetry endpoint configuration in Maverics Console showing OTLP endpoint and interval settings]
The Orchestrator exports all telemetry via OTLP. If you use Prometheus, configure your OpenTelemetry Collector to receive OTLP and export to Prometheus using the prometheusremotewrite exporter.
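For example, a minimal Collector pipeline that receives OTLP from the Orchestrator and forwards metrics to Prometheus could look like the sketch below. The remote-write URL is a placeholder for your own Prometheus instance, and the prometheusremotewrite exporter ships with the Collector's contrib distribution.
# OpenTelemetry Collector sketch: receive OTLP, export metrics via Prometheus remote write.
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317
processors:
  batch: {}
exporters:
  prometheusremotewrite:
    endpoint: https://prometheus.example.com/api/v1/write  # placeholder remote-write URL
service:
  pipelines:
    metrics:
      receivers: [otlp]
      processors: [batch]
      exporters: [prometheusremotewrite]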
Step 2: Configure structured logging

The Orchestrator emits structured logs in JSON format, making them easy to parse, search, and aggregate in any log management system. Structured logs include consistent fields like timestamp, log level, request ID, and component name, so you can filter and correlate events across your deployment. Production logging best practices (a configuration sketch follows the list):
  • Log level — Use info for normal production operation. Switch to debug only when actively troubleshooting — debug logging is verbose and can impact performance.
  • Output format — JSON format (jsonOutput: true) is recommended for production. It integrates cleanly with log aggregation systems like Elasticsearch, Loki, and Splunk.
  • Output destination — Stdout is the standard approach for containerized deployments (Docker and Kubernetes capture stdout automatically). For bare-metal deployments, you can configure file-based output with rotation.
  • Request ID correlation — Each request gets a unique ID that appears in every log entry for that request. Use this to trace a single user’s authentication flow across log entries.
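Putting those recommendations together, a logging configuration might look roughly like the sketch below. Only the jsonOutput key appears elsewhere in this guide; the surrounding block and the other key names are assumptions, so confirm them against your Orchestrator configuration reference.
# Illustrative sketch only; jsonOutput is referenced in this guide, the other key names are assumptions.
logger:
  level: info        # use debug only while actively troubleshooting
  jsonOutput: true   # emit structured JSON for log aggregation
  output: stdout     # hypothetical key; stdout suits containerized deployments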
Console UI documentation is coming soon. This section will walk you through configuring this component using the Maverics Console’s visual interface, including step-by-step screenshots and field descriptions.
[Screenshot: Structured logging settings in Maverics Console showing log level, format, and output options]
Step 3: Set up health checks

The Orchestrator exposes a configurable health endpoint that load balancers and orchestration platforms use to verify operational status. The health endpoint returns a JSON response, and a periodic heartbeat logs system metrics.
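If you run the Orchestrator on Kubernetes, you can point a readiness probe at the health endpoint. The sketch below assumes the endpoint is served at /status on port 9443 over HTTPS, matching the verification command later in this guide; adjust the path, port, and timings to your configuration.
# Kubernetes container spec fragment; probe values are illustrative.
readinessProbe:
  httpGet:
    path: /status   # assumed health endpoint path
    port: 9443
    scheme: HTTPS
  periodSeconds: 10
  failureThreshold: 2   # mirrors alerting on two consecutive failed checks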
Console UI documentation is coming soon. This section will walk you through configuring this component using the Maverics Console’s visual interface, including step-by-step screenshots and field descriptions.
[Screenshot: Health check settings in Maverics Console showing endpoint path and heartbeat interval]
Step 4: Set up alerting

Metrics and logs are useful for investigation, but alerts are what tell you something needs attention right now. Configure alerts for the conditions that indicate real problems, not just noise. Recommended alerts for a production Orchestrator deployment (a sample alerting rule follows the list):
  • High error rate — Alert when the 5xx error rate exceeds a threshold (for example, more than 1% of requests returning 500-series errors over a 5-minute window). This catches upstream failures, misconfigurations, and application errors.
  • Authentication failure spike — Alert when authentication failures increase significantly above the baseline. A sudden spike could indicate an IdP outage, expired credentials, or a misconfigured connector.
  • Health check failure — Alert when the health endpoint reports an unhealthy status for more than 2 consecutive checks. This catches connector failures and startup issues.
  • High latency — Alert when the p95 request latency exceeds your SLA threshold. High latency often indicates network issues, slow upstream services, or resource contention.
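As a concrete example of the first alert, a Prometheus rule might look like the sketch below. The metric and label names are placeholders based on OpenTelemetry HTTP semantic conventions; substitute whatever names your Orchestrator version actually exports (see step 1).
# Prometheus alerting rule sketch; metric and label names are placeholders, not guaranteed Orchestrator metric names.
groups:
  - name: orchestrator-alerts
    rules:
      - alert: OrchestratorHighErrorRate
        expr: |
          sum(rate(http_server_request_duration_seconds_count{http_response_status_code=~"5.."}[5m]))
            / sum(rate(http_server_request_duration_seconds_count[5m])) > 0.01
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: More than 1% of Orchestrator requests returned 5xx errors over the last 5 minutes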
Console UI documentation is coming soon. This section will walk you through configuring this component using the Maverics Console’s visual interface, including step-by-step screenshots and field descriptions.
[Screenshot: Alerting rules configuration in Maverics Console showing threshold settings and notification channels]
Start with a small number of high-signal alerts and add more as you learn your deployment's baseline behavior. Too many alerts lead to alert fatigue, where important signals get lost in the noise.
Step 5: Verify observability

With telemetry, logging, and alerting configured, verify that data is flowing correctly through your entire observability pipeline.
# Verify status endpoint
curl -s https://your-orchestrator-host:9443/status | jq .
Walk through a complete verification:
  1. Check your OTLP collector — Confirm the collector is receiving metrics and traces from the Orchestrator (a debug-exporter sketch follows this list)
  2. Check dashboards — If you have Grafana dashboards, verify that charts are rendering with real data
  3. Check logs — Trigger a few requests and confirm the structured log entries appear in your log aggregation system with the correct JSON format and fields
  4. Test an alert — If possible, trigger one of your alert conditions (for example, temporarily set a very low threshold) and confirm the alert fires and notifications are delivered
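For the collector check in item 1, a quick way to confirm data is arriving without involving a full metrics backend is to temporarily route the Collector's pipelines to its debug exporter and watch the Collector's own log output (older Collector releases call this exporter logging). This assumes the otlp receiver from the earlier sketch is already defined; remove the override once verification is done.
# Temporary Collector override for verification; assumes the otlp receiver is already configured.
exporters:
  debug:
    verbosity: detailed
service:
  pipelines:
    metrics:
      receivers: [otlp]
      exporters: [debug]
    traces:
      receivers: [otlp]
      exporters: [debug]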
Success! Your Orchestrator deployment is fully instrumented. Metrics and traces are being exported via OTLP, structured logs are flowing to your aggregation system, and alerts are configured for key operational events.

Troubleshooting

If telemetry is not reaching your collector, verify that the OTLP endpoint URL in your telemetry configuration is correct and reachable from the Orchestrator. Check that the collector is running and listening on the expected port. If the collector uses TLS, set insecure: false and ensure the Orchestrator can verify the collector's certificate. Check firewall rules and network policies between the Orchestrator and collector.
If your log aggregation system is not parsing the Orchestrator's logs correctly, check that you are using a JSON parser (not a regex-based parser for plain text logs). The Orchestrator outputs JSON-formatted logs when jsonOutput: true is set. If you see raw JSON strings instead of parsed fields, your log shipper (Filebeat, Fluentd, Vector) may need a JSON parsing filter configured. Check the log shipper's documentation for JSON input configuration.
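As one example, with Vector you can promote the JSON out of the raw message field with a remap transform. The transform and input names below are placeholders for your own pipeline:
# Vector transform sketch; the input name is a placeholder for the source that tails Orchestrator logs.
transforms:
  parse_orchestrator_json:
    type: remap
    inputs: ["orchestrator_logs"]
    source: |
      # Replace the event with the fields parsed from the raw JSON log line.
      . = parse_json!(string!(.message))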
If alerts are not firing when expected, check these common causes:
  • Threshold too high — Your alert threshold may be higher than the actual metric values. Check the raw metric values in your monitoring platform to set realistic thresholds.
  • Wrong metric name — Verify the exact metric name matches what your alert rule references. Metric names are case-sensitive.
  • Evaluation interval — Alert rules only evaluate at configured intervals. If your evaluation interval is 5 minutes, the alert will not fire until the next evaluation window after the condition is met.
  • Notification channel — The alert rule may be firing but the notification is not reaching you. Check your alerting tool’s notification configuration (email, Slack, PagerDuty).