This page covers the metrics that matter most for operational health of your validator or SV node. For the complete metrics catalog, see the Metrics Reference.
Validator Health Checks
Readiness and Liveness Probes
All Canton Network applications provide /readyz and /livez endpoints for readiness and liveness probes.
Kubernetes — Probes are pre-configured. You can also manually check readiness:
kubectl exec <pod-name> -n <namespace> -- curl -v https://localhost:5003/api/validator/readyz
Docker — Check liveness inside the container:
docker exec <container-name> curl -v https://localhost:5003/api/validator/livez
HTTP status 200 from /readyz indicates the validator is ready; HTTP status 200 from /livez indicates it is live.
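For scripted checks, it can be enough to test the HTTP status code alone. The following is a minimal sketch, assuming `curl` is available where the script runs. The function names `probe_status` and `is_healthy` are hypothetical helpers, not part of any Canton Network tooling; the URL in the usage comment is the one from the commands above.

```shell
# Print the HTTP status code returned by a probe URL.
# curl prints 000 when the request itself fails (for example, connection refused).
probe_status() {
  curl -s -o /dev/null -w '%{http_code}' --max-time 5 "$1"
}

# Succeed only when the given status code is 200.
is_healthy() {
  [ "$1" = "200" ]
}

# Example usage (run inside the pod or container):
#   status=$(probe_status https://localhost:5003/api/validator/readyz)
#   if is_healthy "$status"; then echo "ready"; else echo "not ready (HTTP $status)"; fi
```

Keeping the status comparison separate from the request makes the check easy to reuse for both /readyz and /livez.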
Last Ingested Record Time
The splice_store_last_ingested_record_time_ms metric tracks the last ingested record time in each validator store. Use it to verify your node is keeping up with the network:
- Value increasing over time — Your node is active and in sync. For validators collecting liveness rewards, this advances every round, so lag should never exceed 20 minutes.
- Value static — Your node may be stalled. Investigate further.
The dedicated Grafana dashboard Splice Store Last Ingested Record Time visualizes this metric.
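The lag check described above can be sketched as simple arithmetic. This example assumes the metric value is a Unix timestamp in milliseconds (suggested by the `_ms` suffix, but not confirmed here) and that you have already read the value from your metrics endpoint. The function names are hypothetical.

```shell
# Print the lag in whole minutes between "now" and the last ingested record time.
# Both arguments are assumed to be Unix timestamps in milliseconds.
ingestion_lag_minutes() {
  echo $(( ($1 - $2) / 60000 ))
}

# Succeed when the lag exceeds the threshold in minutes (third argument, default 20).
ingestion_is_lagging() {
  [ "$(ingestion_lag_minutes "$1" "$2")" -gt "${3:-20}" ]
}

# Example usage, with last_ms holding a value read from the metric:
#   now_ms=$(( $(date +%s) * 1000 ))
#   if ingestion_is_lagging "$now_ms" "$last_ms"; then echo "ingestion is lagging"; fi
```

The default threshold of 20 minutes mirrors the bound stated for validators collecting liveness rewards; pass a third argument to use a different one.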
Critical Metrics by Category
Synchronizer Connectivity
| Metric | What It Tells You |
|---|---|
| daml.sequencer-client.sequencer-connection-pool.validated-connections | Number of active, validated sequencer connections |
| daml.sequencer-client.sequencer-connection-pool.active-subscriptions | Number of active event subscriptions in the connection pool |
| daml.sequencer-client.handler.delay | Processing delay between sequencing time and local clock (milliseconds) |
A sustained drop in validated-connections to zero means your node can’t submit or confirm transactions.
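One way to distinguish a sustained drop from a momentary blip is to require that every reading in a window is zero. The sketch below assumes you collect consecutive integer samples of validated-connections yourself (for example, one per scrape interval); `all_samples_zero` is a hypothetical helper.

```shell
# Succeed when at least one sample is given and every sample is 0.
# Samples are assumed to be consecutive integer readings of validated-connections.
all_samples_zero() {
  [ "$#" -gt 0 ] || return 1
  for sample in "$@"; do
    [ "$sample" -eq 0 ] || return 1
  done
  return 0
}

# Example usage with four readings taken 30 seconds apart:
#   if all_samples_zero 0 0 0 0; then echo "sequencer disconnected"; fi
```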
Transaction Processing
| Metric | What It Tells You |
|---|---|
| daml.participant.api.commands.submissions | End-to-end time to process a Daml command (validation, interpretation, synchronization) |
| daml.participant.api.commands.submissions_running | Number of commands currently being handled |
| daml.participant.api.commands.failed_command_interpretations | Commands that failed during interpretation |
A rising failed_command_interpretations rate may indicate package vetting issues, incorrect party hosting, or application bugs. Monitor submissions latency to detect synchronizer congestion or validator overload.
Database Health
| Metric | What It Tells You |
|---|---|
| daml.db-storage.general.executor.load | Ratio of running queries to available connections |
| daml.db-storage.general.executor.exectime | Database task execution time |
| daml.db-storage.general.executor.queued | Number of database tasks waiting in queue |
Database performance directly affects transaction processing speed. If executor.load is consistently above 0.8, increase the connection pool size or scale your database.
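The load thresholds can be expressed as a small classifier. This is a sketch only: it assumes you already have the current executor.load value as a decimal number, and it uses the 0.8 threshold above together with the 0.95 threshold from the alerting recommendations on this page. `classify_executor_load` is a hypothetical name.

```shell
# Classify an executor load ratio:
#   "critical" above 0.95, "warning" above 0.8, "ok" otherwise.
# awk is used because POSIX shell arithmetic does not handle decimals.
classify_executor_load() {
  awk -v load="$1" 'BEGIN {
    if ((load + 0) > 0.95)     print "critical"
    else if ((load + 0) > 0.8) print "warning"
    else                       print "ok"
  }'
}

# Example usage:
#   classify_executor_load 0.85
```

A single reading above 0.8 is not necessarily a problem; the guidance above concerns load that stays there consistently, so apply the classifier to several readings over time.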
Traffic and Canton Coin
| Metric | What It Tells You |
|---|---|
| daml.sequencer-client.traffic-control.submitted-event-cost | Cost of events submitted from this node |
| daml.sequencer-client.traffic-control.event-delivered-cost | Cost of successfully delivered events |
| daml.sequencer-client.traffic-control.event-rejected | Number of events rejected (often due to insufficient traffic balance) |
If event-rejected is increasing, check your traffic balance. A zero balance prevents your node from submitting transactions.
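To tell whether event-rejected is increasing, compare two readings taken some time apart. The sketch below assumes the metric behaves as a monotonically increasing counter that restarts from zero when the node restarts; that behavior is an assumption, and `counter_increase` is a hypothetical helper.

```shell
# Print how much a counter grew between two integer readings.
# A current value smaller than the previous one is treated as a counter reset,
# in which case the current value itself is reported as the increase.
counter_increase() {
  if [ "$2" -lt "$1" ]; then
    echo "$2"
  else
    echo $(( $2 - $1 ))
  fi
}

# Example usage with readings taken five minutes apart:
#   if [ "$(counter_increase "$previous" "$current")" -gt 0 ]; then
#     echo "events are being rejected; check the traffic balance"
#   fi
```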
Alerting Recommendations
Set up alerts for these conditions:
- Node not ready: /readyz returns non-200 for more than 5 minutes
- Ingestion stalled: splice_store_last_ingested_record_time_ms hasn’t advanced in 30 minutes
- Sequencer disconnected: daml.sequencer-client.sequencer-connection-pool.validated-connections at 0 for more than 2 minutes
- Database connection pool exhausted: daml.db-storage.general.executor.load above 0.95
- Traffic balance low: below the amount needed for 24 hours of normal operation
- Disk usage high: database storage above 80% capacity
Next Steps