
Every Canton Network node exposes metrics on port 10013 at the /metrics path in Prometheus format. These metrics are built with OpenTelemetry and cover the participant, validator app, and (for super validators) the SV and scan apps.
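
For a quick spot check, you can fetch the endpoint directly from a machine or container that can reach the node. A minimal sketch, assuming plain HTTP and local access:

# Print the first few Prometheus-format metric lines
curl -s http://localhost:10013/metrics | head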

Enabling metrics

Kubernetes (Helm)

Set metrics.enable to true in your Helm values. This creates a ServiceMonitor custom resource, which requires the Prometheus Operator to be installed in your cluster. Alternatively, add Prometheus scrape annotations targeting port 10013.
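
A minimal values-file sketch. The exact key layout depends on your chart version, and podAnnotations is a hypothetical key here; the prometheus.io/* annotations are the common community convention, not a Prometheus requirement:

metrics:
  enable: true   # creates a ServiceMonitor; requires the Prometheus Operator

# Alternative without the operator: scrape annotations on the pods
# (podAnnotations is illustrative; check your chart's values schema)
podAnnotations:
  prometheus.io/scrape: "true"
  prometheus.io/port: "10013"
  prometheus.io/path: "/metrics"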

Docker Compose

Histogram format

Nodes publish Prometheus native histograms by default. To revert to regular (classic) histograms on a specific node, set the environment variable:
ADDITIONAL_CONFIG_DISABLE_NATIVE_HISTOGRAMS="canton.monitoring.metrics.histograms=[]"
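
In a Docker Compose deployment, this could look as follows. The service name is hypothetical; use the one from your compose file:

services:
  participant:   # hypothetical service name
    environment:
      # Reverts this node to classic (non-native) histograms
      - ADDITIONAL_CONFIG_DISABLE_NATIVE_HISTOGRAMS=canton.monitoring.metrics.histograms=[]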

Health metrics

Validator Health

You can check your validator’s health using the readiness endpoints. All CN applications provide the /readyz and /livez endpoints, which are used for readiness and liveness probes.
  • Checking readiness
    • In Kubernetes: readiness and liveness probes are already configured. You can also manually check validator readiness with the following command:
      kubectl exec <pod-name> -n <namespace> -- curl -v https://localhost:5003/api/validator/readyz
      
    • In Docker: run for example this command to check validator liveness inside a container:
      docker exec <container-name> curl -v https://localhost:5003/api/validator/livez
      
    In both cases, expect HTTP status code 200 if the validator is ready and live.
  • Using metrics: The splice_store_last_ingested_record_time_ms metric represents the last ingested record time in each validator store. It can be used to track the general activity of the node:
    • If this value continues to increase over time, your node is active and staying in sync with the network. Note that it only advances if your node actually ingests new transactions. A validator collecting validator liveness rewards ingests new transactions every round, so the lag should never exceed 20 minutes.
    • If it remains static, further investigation may be required.
    For more details, and to visualize this metric on its dedicated dashboard (Splice Store Last Ingested Record Time), refer to the Metrics documentation.

Splice app metrics

Topology metrics (optional)

This section will be expanded in a future update. Topology metrics track synchronizer membership changes and party-to-participant mappings. For monitoring guidance, see Performance Optimization.

Key participant metrics

These are the most operationally significant metrics from the participant node. For the full catalog of several hundred metrics, see the Canton 3.x metrics reference.

Sequencer client

The sequencer client metrics tell you whether your node is keeping up with the synchronizer’s message stream.
| Metric | Type | What to watch for |
| --- | --- | --- |
| daml.sequencer-client.handler.delay | gauge | Event processing delay in milliseconds. A large, growing value means the node is falling behind. Cross-reference with clock skew before assuming a processing bottleneck. |
| daml.sequencer-client.handler.sequencer-events | counter | Number of events received from the sequencer. Tracks overall event throughput. |
| daml.sequencer-client.handler.actual-in-flight-event-batches | counter | How many event batches are being processed in parallel. If this constantly sits near max-in-flight-event-batches, the node’s resources may be under-utilized (raise the limit). If you see OOM errors, lower it. |
| daml.sequencer-client.submissions.dropped | counter | Send requests that were not sequenced within the max-sequencing-time. An increasing count points to sequencer capacity issues or network problems. |
| daml.sequencer-client.submissions.overloaded | counter | Requests that received an overloaded response from the sequencer. |
| daml.sequencer-client.submissions.sequencing | timer | End-to-end time from submission to sequencing confirmation. |
| daml.sequencer-client.submissions.in-flight | counter | Submissions waiting for an outcome. High values indicate backpressure. |

Connection pool

These metrics track the health of connections between your node and the synchronizer’s sequencers.
| Metric | Type | What to watch for |
| --- | --- | --- |
| daml.sequencer-client.sequencer-connection-pool.active-subscriptions | gauge | Number of active sequencer subscriptions. Should stay at or above the subscription threshold. |
| daml.sequencer-client.sequencer-connection-pool.validated-connections | gauge | Connections that are up and validated. A drop signals connectivity problems. |
| daml.sequencer-client.sequencer-connection-pool.trust-threshold | gauge | The minimum number of consistent sequencer connections needed before the pool will initialize. |

ACS commitments

Commitment metrics reveal whether your participant is staying in sync with counter-participants during reconciliation.
| Metric | Type | What to watch for |
| --- | --- | --- |
| daml.participant.sync.commitments.compute | timer | Time spent computing bilateral commitments. If this approaches or exceeds the reconciliation interval, the participant will perpetually lag behind. |
| daml.participant.sync.commitments.sequencing-time | gauge | Time between the end of a commitment period and when the sequencer observes the commitment. An unexplained increase may indicate the participant is falling behind. |
| daml.participant.sync.commitments.catch-up-mode-triggered | meter | How often catch-up mode has been activated. A healthy value is 0. An increasing count signals intermittent performance degradation. |

Ledger API command processing

| Metric | Type | What to watch for |
| --- | --- | --- |
| daml.participant.api.commands.submissions | timer | Time to validate and interpret a command before it is sent for finalization. |
| daml.participant.api.commands.submissions_running | counter | Commands currently being processed. Indicates load on the Ledger API server. |
| daml.participant.api.commands.failed_command_interpretations | meter | Commands rejected by the Daml interpreter (for example, unauthorized actions). |
| daml.grpc.server.requests.rejections | counter | Requests rejected due to active request limits. Sustained rejections mean you need to raise limits or reduce load. |

Database

Database metrics share a common pattern across all node types. The general pool handles reads; the write pool handles writes.
| Metric | Type | What to watch for |
| --- | --- | --- |
| daml.db-storage.{general,write}.executor.load | gauge | Current queries running divided by available connections. A sustained value near 1.0 means the pool is saturated. |
| daml.db-storage.{general,write}.executor.queued | counter | Tasks waiting for a database connection. A growing queue indicates the database cannot keep up. |
| daml.db-storage.{general,write}.executor.waittime | timer | Time tasks spend waiting in the queue before execution. |
| daml.db-storage.{general,write}.executor.exectime | timer | Time tasks spend executing on the database. |

Pruning

| Metric | Type | What to watch for |
| --- | --- | --- |
| daml.pruning.max-event-age | gauge | Age of the oldest unpruned event in hours. A large or growing value means pruning is falling behind, which increases storage consumption. |
| daml_services_pruning_prune_started_total | counter | Number of pruning processes started. |
| daml_services_pruning_prune_completed_total | counter | Number of pruning processes completed. Compare with the started count to detect stalled pruning. |
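
One way to surface stalled pruning in PromQL, a minimal sketch assuming both counters come from the same scrape target:

# Pruning runs that started but have not (yet) completed;
# a persistently growing value suggests pruning is stalling.
daml_services_pruning_prune_started_total - daml_services_pruning_prune_completed_total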

Traffic control

These metrics are relevant if your node participates in traffic-based rate limiting.
| Metric | Type | What to watch for |
| --- | --- | --- |
| daml.sequencer-client.traffic-control.event-delivered | counter | Events that were sequenced and delivered. |
| daml.sequencer-client.traffic-control.event-rejected | counter | Events sequenced but not delivered (insufficient traffic credits). |
| daml.sequencer-client.traffic-control.submitted-event-cost | meter | Cost of events submitted. May not exactly match actual consumed traffic since some events may not be sequenced. |

JVM metrics

Grafana dashboards

The dashboards use queries specific to Prometheus native histograms, so make sure native histogram support is enabled in your Prometheus instance.
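
If you operate Prometheus yourself, native histograms are gated behind a feature flag in current Prometheus releases. A sketch of the launch command, assuming a standard standalone deployment:

# Enables native histograms (switches scraping to the protobuf format)
prometheus --config.file=prometheus.yml --enable-feature=native-histograms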

Alerting recommendations

The following thresholds are starting points. Adjust them based on your environment and workload patterns. A sample Prometheus rule file covering a few of them follows the list.
  • Node health: Alert when daml_health_status equals 0 for any component for more than 2 minutes.
  • Sequencer delay: Alert when daml.sequencer-client.handler.delay exceeds 30 seconds and is increasing.
  • Dropped submissions: Alert on any sustained increase in daml.sequencer-client.submissions.dropped.
  • Overloaded responses: Alert on any increase in daml.sequencer-client.submissions.overloaded.
  • Database pool saturation: Alert when daml.db-storage.*.executor.load exceeds 0.85 for more than 5 minutes.
  • Pruning backlog: Alert when daml.pruning.max-event-age exceeds your retention policy threshold.
  • Store ingestion lag: Alert when splice_store_last_ingested_record_time_ms stops advancing for more than 20 minutes.
  • JVM memory: Alert when runtime_jvm_memory_area with area=heap and type=used exceeds 85% of type=max over a sustained period.
  • Commitment catch-up: Alert when daml.participant.sync.commitments.catch-up-mode-triggered is non-zero.
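
A sketch of a Prometheus alerting rules file covering three of these thresholds. The alert names, severities, the 10-minute window, and the assumption that splice_store_last_ingested_record_time_ms is a gauge are illustrative; verify the exact names and labels your nodes export:

groups:
  - name: canton-node-alerts
    rules:
      # Node health: a component reports unhealthy for more than 2 minutes.
      - alert: CantonComponentUnhealthy
        expr: daml_health_status == 0
        for: 2m
        labels:
          severity: critical

      # Store ingestion lag: the last ingested record time has not advanced
      # in 20 minutes (assumes the metric is exported as a gauge).
      - alert: SpliceStoreIngestionStalled
        expr: delta(splice_store_last_ingested_record_time_ms[20m]) == 0
        labels:
          severity: warning

      # JVM heap: used heap above 85% of max for a sustained period.
      - alert: JvmHeapHighUsage
        expr: |
          runtime_jvm_memory_area{area="heap", type="used"}
            > ignoring(type) (0.85 * runtime_jvm_memory_area{area="heap", type="max"})
        for: 10m
        labels:
          severity: warning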