This page covers the metrics that matter most for operational health of your validator or SV node. For the complete metrics catalog, see the Metrics Reference.

Validator Health Checks


Readiness and Liveness Probes

All Canton Network applications provide /readyz and /livez endpoints for readiness and liveness probes.

Kubernetes: Probes are pre-configured. You can also check readiness manually:
kubectl exec <pod-name> -n <namespace> -- curl -v https://localhost:5003/api/validator/readyz
Docker: Check liveness inside the container:
docker exec <container-name> -- curl -v https://localhost:5003/api/validator/livez
An HTTP 200 response from /readyz indicates the validator is ready; an HTTP 200 response from /livez indicates it is live.
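
The same checks can be scripted. The following is a minimal Python sketch, assuming the validator API is reachable from where the script runs at the base URL used in the commands above (for example, inside the container or through a port-forward). The helper names probe_status and summarize are hypothetical and not part of any Canton tooling.

```python
import urllib.error
import urllib.request

# Assumption: base URL copied from the commands above; adjust for your deployment.
BASE_URL = "https://localhost:5003/api/validator"


def probe_status(url, timeout=5.0):
    """Return the HTTP status code for url, or None if the endpoint is unreachable."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # The server answered, but with an error status (for example 503).
        return err.code
    except (urllib.error.URLError, OSError):
        # Connection refused, DNS failure, TLS error, or timeout.
        return None


def summarize(statuses):
    """Map each endpoint name to 'ok', 'failing (<code>)', or 'unreachable'."""
    summary = {}
    for name, code in statuses.items():
        if code is None:
            summary[name] = "unreachable"
        elif code == 200:
            summary[name] = "ok"
        else:
            summary[name] = f"failing ({code})"
    return summary
```

To check both endpoints, call probe_status(f"{BASE_URL}/readyz") and probe_status(f"{BASE_URL}/livez"), collect the results in a dictionary keyed by endpoint name, and pass it to summarize.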

Last Ingested Record Time

The splice_store_last_ingested_record_time_ms metric tracks the last ingested record time in each validator store. Use it to verify your node is keeping up with the network:
  • Value increasing over time — Your node is active and in sync. For validators collecting liveness rewards, this advances every round, so lag should never exceed 20 minutes.
  • Value static — Your node may be stalled. Investigate further.
The dedicated Grafana dashboard Splice Store Last Ingested Record Time visualizes this metric.
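
The two cases above can be expressed as a small classification over two successive readings of the metric. This is a sketch under stated assumptions: the metric value is taken to be a Unix epoch timestamp in milliseconds, the readings are taken far enough apart that a healthy node would have advanced, and the function name ingestion_status is hypothetical.

```python
# The 20-minute bound comes from the liveness-reward guidance above.
TWENTY_MINUTES_MS = 20 * 60 * 1000


def ingestion_status(previous_ms, current_ms, now_ms, max_lag_ms=TWENTY_MINUTES_MS):
    """Classify a store from two successive readings of the metric.

    previous_ms, current_ms: earlier and later readings (assumed epoch milliseconds).
    now_ms: wall-clock time of the later reading, in epoch milliseconds.
    """
    if current_ms <= previous_ms:
        # The value did not advance between readings.
        return "stalled"
    if now_ms - current_ms > max_lag_ms:
        # The value advances, but trails the clock by more than the allowed lag.
        return "lagging"
    return "in sync"
```

A "stalled" result from readings taken only a few seconds apart is not meaningful; space the readings by at least the interval at which you expect the store to ingest new records.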

Critical Metrics by Category

Synchronizer Connectivity

| Metric | What It Tells You |
| --- | --- |
| daml.sequencer-client.sequencer-connection-pool.validated-connections | Number of active, validated sequencer connections |
| daml.sequencer-client.sequencer-connection-pool.active-subscriptions | Number of active event subscriptions in the connection pool |
| daml.sequencer-client.handler.delay | Processing delay between sequencing time and local clock (milliseconds) |
A sustained drop in validated-connections to zero means your node can’t submit or confirm transactions.
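
To tell a sustained drop from a momentary blip, compare how long the value has stayed at zero against a minimum duration. The sketch below assumes you already collect (timestamp, value) samples of validated-connections from your metrics backend; the function name sustained_zero is hypothetical, and the 120-second window in the usage note is borrowed from the alerting recommendations on this page.

```python
def sustained_zero(samples, min_duration_s):
    """Return True if the value has been 0 continuously for at least min_duration_s.

    samples: list of (timestamp_seconds, value) tuples, sorted oldest first.
    The zero streak is measured backwards from the most recent sample.
    """
    if not samples:
        return False
    latest_ts = samples[-1][0]
    zero_since = None
    for ts, value in reversed(samples):
        if value != 0:
            break  # The streak ends at the first non-zero sample.
        zero_since = ts
    if zero_since is None:
        return False  # The most recent sample is non-zero.
    return latest_ts - zero_since >= min_duration_s
```

For example, sustained_zero(samples, 120) flags a connection pool that has reported zero validated connections for two minutes or longer.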

Transaction Processing

| Metric | What It Tells You |
| --- | --- |
| daml.participant.api.commands.submissions | End-to-end time to process a Daml command (validation, interpretation, synchronization) |
| daml.participant.api.commands.submissions_running | Number of commands currently being handled |
| daml.participant.api.commands.failed_command_interpretations | Commands that failed during interpretation |
A rising failed_command_interpretations rate may indicate package vetting issues, incorrect party hosting, or application bugs. Monitor submissions latency to detect synchronizer congestion or validator overload.
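
A "rising rate" can be made concrete by converting successive readings into a per-minute rate and checking whether recent rates keep increasing. This sketch assumes failed_command_interpretations behaves as a cumulative counter that may reset to zero on restart; that is an assumption, not something this page states. The function names are hypothetical.

```python
def rate_per_minute(previous_count, current_count, elapsed_seconds):
    """Per-minute increase between two readings of a cumulative counter."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    delta = current_count - previous_count
    if delta < 0:
        # Assumed counter reset (for example after a restart): count from zero.
        delta = current_count
    return delta * 60.0 / elapsed_seconds


def is_rising(rates, min_points=3):
    """Return True if the last min_points rates are strictly increasing."""
    recent = rates[-min_points:]
    if len(recent) < min_points:
        return False
    return all(a < b for a, b in zip(recent, recent[1:]))
```

Requiring several consecutive increases, rather than a single jump, keeps one burst of failed commands from being read as a trend.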

Database Health

| Metric | What It Tells You |
| --- | --- |
| daml.db-storage.general.executor.load | Ratio of running queries to available connections |
| daml.db-storage.general.executor.exectime | Database task execution time |
| daml.db-storage.general.executor.queued | Number of database tasks waiting in queue |
Database performance directly affects transaction processing speed. If executor.load is consistently above 0.8, increase the connection pool size or scale your database.
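
As a worked illustration of the ratio and the 0.8 guideline, the sketch below computes the load from its two components and treats "consistently above" as every sample in a window exceeding the threshold. That reading of "consistently" and both function names are assumptions made for this example.

```python
def executor_load(running_queries, available_connections):
    """Ratio of running queries to available connections, as described in the table."""
    if available_connections <= 0:
        raise ValueError("available_connections must be positive")
    return running_queries / available_connections


def needs_more_capacity(load_samples, threshold=0.8):
    """Return True if every sample in the window exceeds the threshold."""
    return bool(load_samples) and all(load > threshold for load in load_samples)
```

With 9 running queries and 10 available connections the load is 0.9. A window such as [0.85, 0.9, 0.82] would suggest adding capacity, while [0.85, 0.7, 0.9] would not, because one sample dips below 0.8.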

Traffic and Canton Coin

| Metric | What It Tells You |
| --- | --- |
| daml.sequencer-client.traffic-control.submitted-event-cost | Cost of events submitted from this node |
| daml.sequencer-client.traffic-control.event-delivered-cost | Cost of successfully delivered events |
| daml.sequencer-client.traffic-control.event-rejected | Number of events rejected (often due to insufficient traffic balance) |
If event-rejected is increasing, check your traffic balance. A zero balance prevents your node from submitting transactions.

Alerting Recommendations

Set up alerts for these conditions:
  • Node not ready: /readyz returns non-200 for more than 5 minutes
  • Ingestion stalled: splice_store_last_ingested_record_time_ms hasn’t advanced in 30 minutes
  • Sequencer disconnected: daml.sequencer-client.sequencer-connection-pool.validated-connections at 0 for more than 2 minutes
  • Database connection pool exhausted: daml.db-storage.general.executor.load above 0.95
  • Traffic balance low: Below the amount needed for 24 hours of normal operation
  • Disk usage high: Database storage above 80% capacity
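
The traffic balance condition depends on your own usage, so it helps to express it as hours of runway. The sketch below is one way to do that arithmetic. It assumes you can obtain the current balance and an estimate of average hourly traffic consumption in the same unit; how you obtain them is outside this example, and the function names are hypothetical.

```python
def hours_of_runway(balance, hourly_usage):
    """Hours until the balance is exhausted at the given average hourly usage."""
    if hourly_usage <= 0:
        # No measurable consumption: the balance is not being drawn down.
        return float("inf")
    return balance / hourly_usage


def traffic_balance_low(balance, hourly_usage, required_hours=24.0):
    """Return True if the balance covers less than required_hours of normal operation."""
    if balance <= 0:
        # A zero balance prevents submissions regardless of usage.
        return True
    return hours_of_runway(balance, hourly_usage) < required_hours
```

For instance, a balance of 1200 units at 100 units per hour gives 12 hours of runway, which falls under the 24-hour guideline and would trigger the alert.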
