By the end of this guide, you will have a fully instrumented Maverics Orchestrator with OpenTelemetry-based metrics and traces, structured log output, health check monitoring, and alerting for key operational events. Good observability means you know what your Orchestrator is doing before users tell you something is wrong. The Orchestrator exports metrics and traces via OpenTelemetry (OTLP), emits structured logs, and provides health endpoints — giving you the raw data you need to build dashboards, set up alerts, and debug issues when they arise.
Console terminology: In the Maverics Console, Orchestrator instances and configuration delivery are managed through Deployments. When working directly with YAML, configuration is managed as files delivered via the -config flag or MAVERICS_CONFIG environment variable.

Prerequisites

  • A running Maverics Orchestrator — If you have not deployed yet, follow the Deploy to Production guide first.
  • An OpenTelemetry collector — The Orchestrator exports telemetry via OTLP. You need an OpenTelemetry Collector (or compatible endpoint like Grafana Alloy, Datadog Agent, or New Relic) to receive metrics and traces.
  • A log aggregation system (recommended) — Elasticsearch, Loki, Splunk, or any system that can ingest structured JSON logs.

Set Up Observability

Step 1: Configure telemetry

The Orchestrator uses OpenTelemetry to export metrics and traces via the OTLP protocol. Metrics are collected through periodic readers that push to your OTLP endpoint at a configured interval, and traces are exported through simple span processors. The Orchestrator can export telemetry data including:
  • Request metrics — Total requests, response status codes, and request duration histograms
  • Authentication metrics — Authentication event data (availability varies by Orchestrator version)
  • Runtime metrics — Process-level metrics such as memory and concurrency data
  • Distributed traces — End-to-end request tracing through authentication and authorization flows
The specific metrics and trace data available depend on your Orchestrator version and configuration. Consult your Orchestrator’s actual OTLP output to confirm which metrics are exported in your deployment.
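To make the moving parts concrete, the sketch below shows the general shape of a telemetry configuration: an OTLP endpoint to push to, a TLS setting, and a push interval. Treat it as a rough sketch only: the block layout and key names other than insecure (which the Troubleshooting section references) are assumptions rather than the documented Maverics schema, so confirm the exact field names in your Orchestrator configuration reference or the Console form.
# Illustrative sketch only; key names here are assumptions, not the documented Maverics schema.
telemetry:
  otlp:
    endpoint: otel-collector.example.internal:4317  # assumed address of your collector's OTLP gRPC listener
    insecure: false                                 # keep false when the collector terminates TLS (see Troubleshooting)
    metricsInterval: 30s                            # hypothetical key for the periodic reader push interval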
Console UI documentation is coming soon. This section will walk you through configuring this component using the Maverics Console’s visual interface, including step-by-step screenshots and field descriptions.
[Screenshot: Telemetry endpoint configuration in Maverics Console showing OTLP endpoint and interval settings]
The Orchestrator exports all telemetry via OTLP. If you use Prometheus, configure your OpenTelemetry Collector to receive OTLP and export to Prometheus using the prometheusremotewrite exporter.
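For example, a minimal Collector pipeline that receives OTLP from the Orchestrator and forwards metrics to Prometheus could look like the sketch below. The remote-write URL is a placeholder for your own Prometheus instance, and the prometheusremotewrite exporter ships with the Collector's contrib distribution.
# OpenTelemetry Collector sketch: receive OTLP, export metrics via Prometheus remote write.
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317
processors:
  batch: {}
exporters:
  prometheusremotewrite:
    endpoint: https://prometheus.example.com/api/v1/write  # placeholder remote-write URL
service:
  pipelines:
    metrics:
      receivers: [otlp]
      processors: [batch]
      exporters: [prometheusremotewrite]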
Step 2: Configure structured logging

The Orchestrator emits structured logs in JSON format, making them easy to parse, search, and aggregate in any log management system. Structured logs include consistent fields like timestamp, log level, request ID, and component name, so you can filter and correlate events across your deployment. Production logging best practices (a configuration sketch follows the list):
  • Log level — Use info for normal production operation. Switch to debug only when actively troubleshooting — debug logging is verbose and can impact performance.
  • Output format — JSON format (jsonOutput: true) is recommended for production. It integrates cleanly with log aggregation systems like Elasticsearch, Loki, and Splunk.
  • Output destination — Stdout is the standard approach for containerized deployments (Docker and Kubernetes capture stdout automatically). For bare-metal deployments, you can configure file-based output with rotation.
  • Request ID correlation — Each request gets a unique ID that appears in every log entry for that request. Use this to trace a single user’s authentication flow across log entries.
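Putting those recommendations together, a logging configuration might look roughly like the sketch below. Only the jsonOutput key appears elsewhere in this guide; the surrounding block and the other key names are assumptions, so confirm them against your Orchestrator configuration reference.
# Illustrative sketch only; jsonOutput is referenced in this guide, the other key names are assumptions.
logger:
  level: info        # use debug only while actively troubleshooting
  jsonOutput: true   # emit structured JSON for log aggregation
  output: stdout     # hypothetical key; stdout suits containerized deployments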
Console UI documentation is coming soon. This section will walk you through configuring this component using the Maverics Console’s visual interface, including step-by-step screenshots and field descriptions.
[Screenshot: Structured logging settings in Maverics Console showing log level, format, and output options]
Step 3: Set up health checks

The Orchestrator exposes a configurable health endpoint that load balancers and orchestration platforms use to verify operational status. The health endpoint returns a JSON response, and a periodic heartbeat logs system metrics.
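If you run the Orchestrator on Kubernetes, you can point a readiness probe at the health endpoint. The sketch below assumes the endpoint is served at /status on port 9443 over HTTPS, matching the verification command later in this guide; adjust the path, port, and timings to your configuration.
# Kubernetes container spec fragment; probe values are illustrative.
readinessProbe:
  httpGet:
    path: /status   # assumed health endpoint path
    port: 9443
    scheme: HTTPS
  periodSeconds: 10
  failureThreshold: 2   # mirrors alerting on two consecutive failed checks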
Console UI documentation is coming soon. This section will walk you through configuring this component using the Maverics Console’s visual interface, including step-by-step screenshots and field descriptions.
[Screenshot: Health check settings in Maverics Console showing endpoint path and heartbeat interval]
Step 4: Set up alerting

Metrics and logs are useful for investigation, but alerts are what tell you something needs attention right now. Configure alerts for the conditions that indicate real problems, not just noise. Recommended alerts for a production Orchestrator deployment (a sample alerting rule follows the list):
  • High error rate — Alert when the 5xx error rate exceeds a threshold (for example, more than 1% of requests returning 500-series errors over a 5-minute window). This catches upstream failures, misconfigurations, and application errors.
  • Authentication failure spike — Alert when authentication failures increase significantly above the baseline. A sudden spike could indicate an IdP outage, expired credentials, or a misconfigured connector.
  • Health check failure — Alert when the health endpoint reports an unhealthy status for more than 2 consecutive checks. This catches connector failures and startup issues.
  • High latency — Alert when the p95 request latency exceeds your SLA threshold. High latency often indicates network issues, slow upstream services, or resource contention.
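As a concrete example of the first alert, a Prometheus rule might look like the sketch below. The metric and label names are placeholders based on OpenTelemetry HTTP semantic conventions; substitute whatever names your Orchestrator version actually exports (see step 1).
# Prometheus alerting rule sketch; metric and label names are placeholders, not guaranteed Orchestrator metric names.
groups:
  - name: orchestrator-alerts
    rules:
      - alert: OrchestratorHighErrorRate
        expr: |
          sum(rate(http_server_request_duration_seconds_count{http_response_status_code=~"5.."}[5m]))
            / sum(rate(http_server_request_duration_seconds_count[5m])) > 0.01
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: More than 1% of Orchestrator requests returned 5xx errors over the last 5 minutes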
Console UI documentation is coming soon. This section will walk you through configuring this component using the Maverics Console’s visual interface, including step-by-step screenshots and field descriptions.
[Screenshot: Alerting rules configuration in Maverics Console showing threshold settings and notification channels]
Start with a small number of high-signal alerts and add more as you learn your deployment's baseline behavior. Too many alerts lead to alert fatigue, where important signals get lost in the noise.
Step 5: Verify observability

With telemetry, logging, and alerting configured, verify that data is flowing correctly through your entire observability pipeline.
# Verify status endpoint
curl -s https://your-orchestrator-host:9443/status | jq .
Walk through a complete verification:
  1. Check your OTLP collector — Confirm the collector is receiving metrics and traces from the Orchestrator (a debug-exporter sketch follows this list)
  2. Check dashboards — If you have Grafana dashboards, verify that charts are rendering with real data
  3. Check logs — Trigger a few requests and confirm the structured log entries appear in your log aggregation system with the correct JSON format and fields
  4. Test an alert — If possible, trigger one of your alert conditions (for example, temporarily set a very low threshold) and confirm the alert fires and notifications are delivered
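For the collector check in item 1, a quick way to confirm data is arriving without involving a full metrics backend is to temporarily route the Collector's pipelines to its debug exporter and watch the Collector's own log output (older Collector releases call this exporter logging). This assumes the otlp receiver from the earlier sketch is already defined; remove the override once verification is done.
# Temporary Collector override for verification; assumes the otlp receiver is already configured.
exporters:
  debug:
    verbosity: detailed
service:
  pipelines:
    metrics:
      receivers: [otlp]
      exporters: [debug]
    traces:
      receivers: [otlp]
      exporters: [debug]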
Success! Your Orchestrator deployment is fully instrumented. Metrics and traces are being exported via OTLP, structured logs are flowing to your aggregation system, and alerts are configured for key operational events.

Troubleshooting

If telemetry is not reaching your collector, verify that the OTLP endpoint URL in your telemetry configuration is correct and reachable from the Orchestrator. Check that the collector is running and listening on the expected port. If the collector uses TLS, set insecure: false and ensure the Orchestrator can verify the collector's certificate. Check firewall rules and network policies between the Orchestrator and collector.
If your log aggregation system is not parsing the Orchestrator's logs correctly, check that you are using a JSON parser (not a regex-based parser for plain text logs). The Orchestrator outputs JSON-formatted logs when jsonOutput: true is set. If you see raw JSON strings instead of parsed fields, your log shipper (Filebeat, Fluentd, Vector) may need a JSON parsing filter configured. Check the log shipper's documentation for JSON input configuration.
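As one example, with Vector you can promote the JSON out of the raw message field with a remap transform. The transform and input names below are placeholders for your own pipeline:
# Vector transform sketch; the input name is a placeholder for the source that tails Orchestrator logs.
transforms:
  parse_orchestrator_json:
    type: remap
    inputs: ["orchestrator_logs"]
    source: |
      # Replace the event with the fields parsed from the raw JSON log line.
      . = parse_json!(string!(.message))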
If alerts are not firing when expected, check these common causes:
  • Threshold too high — Your alert threshold may be higher than the actual metric values. Check the raw metric values in your monitoring platform to set realistic thresholds.
  • Wrong metric name — Verify the exact metric name matches what your alert rule references. Metric names are case-sensitive.
  • Evaluation interval — Alert rules only evaluate at configured intervals. If your evaluation interval is 5 minutes, the alert will not fire until the next evaluation window after the condition is met.
  • Notification channel — The alert rule may be firing but the notification is not reaching you. Check your alerting tool’s notification configuration (email, Slack, PagerDuty).