
Metrics

Current metrics page behavior, supported providers, time windows, and machine-scoped views.

The metrics page is the dashboard surface for both provider-collected host metrics and TigerBeetle-native replica metrics.

Provider support

Metrics are currently available only when both of the following hold:

  • the database profile exists
  • the profile provider is AWS or GCP

AWS replica runtime telemetry depends on CloudWatch Agent being present on the replica host. Older AWS clusters may therefore show only partial metrics until their next reprovision or upgrade.

TigerBeetle-native metrics are currently supported only for aws and gcp:

  • aws: TigerBeetle emits StatsD locally and CloudWatch Agent ingests it on 127.0.0.1:8125.
  • gcp: TigerBeetle emits DogStatsD locally and Telegraf bridges it into Cloud Monitoring custom metrics.
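
To make the wire format concrete, here is a minimal sketch of parsing one StatsD or DogStatsD line of the kind that CloudWatch Agent or Telegraf would receive. The parser is illustrative only and is not taken from either agent; the metric and tag names in the usage example are hypothetical, and the sketch handles only the common `name:value|type|@rate|#tags` layout.

```typescript
// Parsed form of a single StatsD / DogStatsD line (simplified sketch).
interface StatsdSample {
  name: string;
  value: number;
  type: string; // e.g. "c" (counter), "g" (gauge), "ms" (timer)
  sampleRate?: number; // from an optional "|@0.5" section
  tags: Record<string, string>; // DogStatsD "|#k:v,k2:v2"; empty for plain StatsD
}

function parseStatsdLine(line: string): StatsdSample | null {
  const [head, type, ...sections] = line.trim().split("|");
  if (!head || !type) return null;

  // The metric name and value are separated by the last ":" in the head.
  const sep = head.lastIndexOf(":");
  if (sep <= 0) return null;
  const rawValue = head.slice(sep + 1);
  const value = Number(rawValue);
  if (rawValue === "" || !Number.isFinite(value)) return null;

  const sample: StatsdSample = { name: head.slice(0, sep), value, type, tags: {} };
  for (const section of sections) {
    if (section.startsWith("@")) {
      sample.sampleRate = Number(section.slice(1));
    } else if (section.startsWith("#")) {
      // DogStatsD extension: comma-separated "key:value" tags.
      for (const tag of section.slice(1).split(",")) {
        const i = tag.indexOf(":");
        if (i > 0) sample.tags[tag.slice(0, i)] = tag.slice(i + 1);
        else if (tag) sample.tags[tag] = "";
      }
    }
  }
  return sample;
}

// Usage with a hypothetical metric name and hypothetical tags:
const example = parseStatsdLine("tb.example_timer:12.5|ms|#replica:0,op:create_transfers");
console.log(example);
```

Plain StatsD lines simply omit the `#` section, so the same function covers both the AWS and GCP paths in this sketch.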

Other providers are treated as unsupported by the current route logic.
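
The availability rules above can be sketched as a small gate function. This is an assumption about how such a check might look, not the actual route logic; the `DatabaseProfile` shape, the function name, and the reason strings are all hypothetical.

```typescript
// Hypothetical profile shape; the real profile likely carries more fields.
interface DatabaseProfile {
  provider: string;
}

type MetricsAvailability =
  | { available: true; provider: "aws" | "gcp" }
  | { available: false; reason: string };

// Metrics are available only if the profile exists and its provider is AWS or GCP.
function resolveMetricsAvailability(
  profile: DatabaseProfile | null | undefined,
): MetricsAvailability {
  if (!profile) {
    return { available: false, reason: "profile not found" };
  }
  const provider = profile.provider.toLowerCase();
  if (provider === "aws" || provider === "gcp") {
    return { available: true, provider };
  }
  return { available: false, reason: `unsupported provider: ${profile.provider}` };
}

console.log(resolveMetricsAvailability({ provider: "aws" }));
console.log(resolveMetricsAvailability({ provider: "azure" }));
```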

Time windows

The metrics UI currently exposes these windows:

  • 15m
  • 1h
  • 12h
  • 24h
  • 7d

When metrics are available, the page can auto-refresh every 15 seconds in live mode.
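
As an illustration, the windows and the live refresh interval could be modeled as below. The constant and function names are hypothetical; only the window labels and the 15-second interval come from this page.

```typescript
// Window labels exposed by the UI, mapped to their length in seconds.
const WINDOW_SECONDS = {
  "15m": 15 * 60,
  "1h": 60 * 60,
  "12h": 12 * 60 * 60,
  "24h": 24 * 60 * 60,
  "7d": 7 * 24 * 60 * 60,
} as const;

type MetricsWindow = keyof typeof WINDOW_SECONDS;

// Auto-refresh interval in live mode.
const LIVE_REFRESH_MS = 15_000;

// Narrow an arbitrary string (for example a query parameter) to a known window.
function isMetricsWindow(value: string): value is MetricsWindow {
  return Object.prototype.hasOwnProperty.call(WINDOW_SECONDS, value);
}

// Compute the query range for a window ending at `now`.
function windowRange(window: MetricsWindow, now: Date = new Date()): { start: Date; end: Date } {
  const start = new Date(now.getTime() - WINDOW_SECONDS[window] * 1000);
  return { start, end: now };
}

const range = windowRange("1h");
console.log(range.start.toISOString(), range.end.toISOString(), LIVE_REFRESH_MS);
```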

Machine scopes

If a profile resolves to more than one machine scope, the UI lets operators switch between:

  • aggregate or default metrics
  • machine-specific or replica-specific scopes derived from provider metadata

This is how the app currently presents per-machine utilization rather than flattening every replica into a single chart.
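
A minimal sketch of how a scope selector might be derived, assuming a hypothetical `MachineMeta` shape for the provider metadata. The key format and labels are illustrative, not the app's actual values.

```typescript
// Hypothetical subset of provider metadata for one machine or replica.
interface MachineMeta {
  id: string;
  label?: string;
}

interface ScopeOption {
  key: string;
  label: string;
}

// The aggregate scope is always present; per-machine scopes are offered
// only when there is more than one machine to switch between.
function buildScopeOptions(machines: MachineMeta[]): ScopeOption[] {
  const aggregate: ScopeOption = { key: "aggregate", label: "All machines" };
  if (machines.length <= 1) {
    return [aggregate];
  }
  const perMachine = machines.map((m) => ({
    key: `machine:${m.id}`,
    label: m.label ?? m.id,
  }));
  return [aggregate, ...perMachine];
}

console.log(buildScopeOptions([{ id: "i-0abc", label: "replica 0" }, { id: "i-0def" }]));
```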

What the page shows

The metrics route is designed around:

  • utilization series such as CPU and memory
  • TigerBeetle replica health, sync, cache, and request timing series
  • provider notices when metrics are partial or unavailable
  • machine state context derived from provider metadata

The page now keeps the existing system charts and adds a TigerBeetle section with:

  • cluster-level summary cards for replica health, state sync, cache hit rate, and replica request p95
  • replica-scoped charts for status, sync stage, cache hits vs misses, request p95 by operation, and request count by operation

The same replica selector drives both the system charts and the TigerBeetle charts.

AWS troubleshooting

If an AWS metrics page loads but runtime charts stay empty, check the regional private gateway before assuming the collector code is missing support.

In particular:

  • The error "Unsupported AWS control-plane route: POST /cloudwatch/metric-data" can mean the regional AWS gateway is stale or failed during bootstrap, not only that the Worker code is missing the route.
  • If the AWS gateway never finished bootstrapping, the ALB target usually fails health checks and the gateway instance may not have a tb-gateway.service unit yet.
  • If the replica already has CloudWatch Agent metrics but the page still shows partial or failed collection, compare replica health with gateway health first.

Recommended operator checks:

  1. Confirm the replica is publishing AWS/EC2 and CWAgent metrics for the expected InstanceId.
  2. Confirm CloudWatch Agent is ingesting TigerBeetle StatsD metrics on 127.0.0.1:8125.
  3. Confirm the collector is writing into the TB_METRICS_DATASET Analytics Engine dataset.
  4. Open the metrics page and confirm the TigerBeetle cards populate within one collector interval.
  5. Confirm the regional AWS gateway target is healthy and serving /health.
  6. Inspect gateway bootstrap logs with SSM if tb-gateway.service is missing or the target stays unhealthy.
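
The checklist above can be sketched as an ordered triage helper that reports the first failing check. The field names and hint strings are hypothetical; the operator still gathers each result manually or with their own tooling.

```typescript
// Results of operator checks 1-5, in the order listed above.
interface OperatorChecks {
  replicaPublishesMetrics: boolean; // check 1: AWS/EC2 and CWAgent metrics for the InstanceId
  agentIngestsStatsd: boolean; // check 2: StatsD ingestion on 127.0.0.1:8125
  collectorWritesDataset: boolean; // check 3: writes into TB_METRICS_DATASET
  cardsPopulate: boolean; // check 4: TigerBeetle cards populate in one interval
  gatewayHealthy: boolean; // check 5: gateway target healthy and serving /health
}

const CHECKS: ReadonlyArray<{ key: keyof OperatorChecks; hint: string }> = [
  { key: "replicaPublishesMetrics", hint: "Verify AWS/EC2 and CWAgent metrics for the expected InstanceId." },
  { key: "agentIngestsStatsd", hint: "Verify CloudWatch Agent StatsD ingestion on 127.0.0.1:8125." },
  { key: "collectorWritesDataset", hint: "Verify the collector writes into TB_METRICS_DATASET." },
  { key: "cardsPopulate", hint: "Wait one collector interval, then reload the metrics page." },
  { key: "gatewayHealthy", hint: "Inspect gateway bootstrap logs with SSM (check 6)." },
];

// Return the first failing check in list order, or null if all pass.
function firstFailingCheck(results: OperatorChecks): { step: number; hint: string } | null {
  let step = 0;
  for (const check of CHECKS) {
    step += 1;
    if (!results[check.key]) {
      return { step, hint: check.hint };
    }
  }
  return null;
}

console.log(
  firstFailingCheck({
    replicaPublishesMetrics: true,
    agentIngestsStatsd: true,
    collectorWritesDataset: true,
    cardsPopulate: false,
    gatewayHealthy: false,
  }),
);
```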

Dataset versioning

TigerBeetle-native metrics require the v4 Analytics Engine schema because samples now include one optional normalized dimension pair.

  • Dataset naming is parix-metrics-v4-*.
  • TB_METRICS_DATASET must point at the v4 dataset before the page can query TigerBeetle request-by-operation metrics.
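
A deployment could guard against a stale dataset with a check along these lines. This is a sketch that assumes the dataset name is available as a plain string; how the name is actually exposed to the Worker is not specified here, and the function name is hypothetical.

```typescript
// Prefix taken from the naming rule above: parix-metrics-v4-*.
const V4_DATASET_PREFIX = "parix-metrics-v4-";

// True only for names that start with the v4 prefix and have a non-empty suffix.
function isV4MetricsDataset(name: string | undefined): boolean {
  return (
    typeof name === "string" &&
    name.startsWith(V4_DATASET_PREFIX) &&
    name.length > V4_DATASET_PREFIX.length
  );
}

// "example" is a placeholder suffix, not a real dataset name.
console.log(isV4MetricsDataset("parix-metrics-v4-example"));
console.log(isV4MetricsDataset("parix-metrics-v3-example"));
```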

For raw operational logs, continue with Logs.