Metrics
metrics-crate counters, histograms, and gauges for your workflows.
Behind the metrics feature gate (features = ["metrics"]).
This page covers Cano's built-in metrics instrumentation and the MetricsObserver.
For the synchronous callback API (WorkflowObserver) that MetricsObserver builds on,
see Observers. For tracing-crate span instrumentation, the
sibling observability feature, see Tracing.
Cano provides optional metrics instrumentation through the metrics feature using the
metrics facade crate.
The metrics crate is recorder-agnostic: you install any compatible exporter
(Prometheus, StatsD, a debugging snapshotter, …) and Cano emits to it. All instrumentation is
behind conditional compilation — zero overhead when the feature is disabled.
For a callback-style API — get notified on workflow lifecycle and failure events without depending
on the metrics ecosystem — see Observers. The
metrics feature ships a ready-made MetricsObserver that bridges the two:
attach it with .with_observer(Arc::new(MetricsObserver::new())) to re-emit those
observer hooks as metrics-crate counters.
Setup
Enable the metrics feature flag in your Cargo.toml. You can also use
features = ["all"] to enable everything (scheduler + tracing +
recovery + metrics) at once.
[dependencies]
cano = { version = "0.13", features = ["metrics"] }
# The metrics crate is a facade — you also need a recorder/exporter:
metrics-exporter-prometheus = "0.16" # for production
# or: metrics-util = "0.18" # for testing / debugging snapshots
# Or enable everything (scheduler + tracing + recovery + metrics):
# cano = { version = "0.13", features = ["all"] }
Because metrics is a facade, Cano only depends on the shared interface. Your application
picks the concrete recorder (e.g. metrics_exporter_prometheus::PrometheusBuilder::new().install_recorder()
for a Prometheus scrape endpoint, or metrics_util::debugging::DebuggingRecorder in tests).
Call cano::metrics::describe() once, after installing your recorder, so exporters receive
help text and units.
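The facade arrangement described above can be pictured with a small standard-library sketch. It is not the metrics crate's implementation: the `Recorder` trait, `install`, `counter_inc`, and `SumRecorder` are hypothetical stand-ins, chosen only to show how a library can emit through one shared slot while the application decides what sits in it.

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::OnceLock;

/// Hypothetical stand-in for a facade's recorder interface.
trait Recorder: Send + Sync {
    fn increment_counter(&self, name: &'static str, value: u64);
}

/// The single process-wide recorder slot; empty until the application fills it.
static RECORDER: OnceLock<Box<dyn Recorder>> = OnceLock::new();

/// Running total kept by the toy recorder below.
static TOTAL: AtomicU64 = AtomicU64::new(0);

/// Application side: install a concrete recorder. Returns false if one is already set.
fn install(recorder: Box<dyn Recorder>) -> bool {
    RECORDER.set(recorder).is_ok()
}

/// Library side: emit through the facade. Does nothing when no recorder is installed.
fn counter_inc(name: &'static str, value: u64) {
    if let Some(recorder) = RECORDER.get() {
        recorder.increment_counter(name, value);
    }
}

/// A toy recorder that sums every increment it receives into TOTAL.
struct SumRecorder;

impl Recorder for SumRecorder {
    fn increment_counter(&self, _name: &'static str, value: u64) {
        TOTAL.fetch_add(value, Ordering::Relaxed);
    }
}

fn main() {
    // Emitted before any recorder exists: silently dropped.
    counter_inc("cano_workflow_runs_total", 1);
    let before = TOTAL.load(Ordering::Relaxed);

    // The application picks the backend; the emitting code does not change.
    install(Box::new(SumRecorder));
    counter_inc("cano_workflow_runs_total", 2);
    println!("recorded {}", TOTAL.load(Ordering::Relaxed) - before);
}
```

The emitting side never names a backend, which is why swapping a debugging recorder for a Prometheus exporter needs no change to instrumented code.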
Two Surfaces
The metrics feature exposes instrumentation through two complementary surfaces, mirroring how
Tracing pairs engine spans with the TracingObserver bridge.
MetricsObserver — opt-in lifecycle counters
MetricsObserver is a WorkflowObserver that re-emits the observer hooks as
metrics-crate counters. Wire it up in one line:
use cano::prelude::*;
use std::sync::Arc;

Workflow::bare()
    .register(/* ... */)
    .add_exit_state(/* ... */)
    .with_observer(Arc::new(MetricsObserver::new()))
It emits these counters (each incremented on the corresponding observer hook):
- cano_state_enters_total{state}: on the on_state_enter hook
- cano_observed_task_runs_total{task, outcome}: on the on_task_success / on_task_failure hooks (outcome ∈ completed|failed)
- cano_task_retries_total{task}: on the on_retry hook
- cano_circuit_open_events_total{task}: on the on_circuit_open hook
- cano_checkpoints_observed_total: on the on_checkpoint hook
- cano_resumes_total: on the on_resume hook
on_task_start is intentionally not counted — every dispatch already shows up in
cano_observed_task_runs_total{outcome}, so a separate "start" counter would just be the
sum of the completed and failed rows.
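As a worked illustration of that point, the dispatch count for a task can be recovered by summing its two outcome rows. The `Row` struct and the sample values below are made up for the example; only the metric and label names come from this page.

```rust
/// One exported row of cano_observed_task_runs_total{task, outcome}.
/// Hypothetical shape, used only for this illustration.
struct Row<'a> {
    task: &'a str,
    outcome: &'a str,
    value: u64,
}

/// Dispatches for one task = its completed row + its failed row.
fn dispatches(rows: &[Row<'_>], task: &str) -> u64 {
    rows.iter()
        .filter(|r| r.task == task && matches!(r.outcome, "completed" | "failed"))
        .map(|r| r.value)
        .sum()
}

fn main() {
    // Made-up counter values for two task types.
    let rows = [
        Row { task: "FetchTask", outcome: "completed", value: 7 },
        Row { task: "FetchTask", outcome: "failed", value: 2 },
        Row { task: "ProcessTask", outcome: "completed", value: 7 },
    ];
    println!("FetchTask dispatches = {}", dispatches(&rows, "FetchTask"));
    println!("ProcessTask dispatches = {}", dispatches(&rows, "ProcessTask"));
}
```

In a Prometheus-style backend the same sum would normally be written as a query over the `outcome` label rather than computed in application code.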
MetricsObserver is in the prelude behind the metrics feature — no extra import needed
when you use use cano::prelude::*.
Always-on direct instrumentation — engine internals
Compiled in whenever the metrics feature is on, regardless of whether a
MetricsObserver is attached. Covers engine internals the observer hooks do not reach:
workflow run duration, circuit-breaker state transitions, per-attempt retry-loop outcomes,
poll/batch/step iteration counts, scheduler flow telemetry, and checkpoint store operations.
See What Gets Measured for the full list.
What Gets Measured
Histograms record raw f64 seconds samples — bucketing and quantile computation are the
exporter's responsibility. Metric names follow metrics-crate underscore conventions.
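To make "bucketing and quantile computation are the exporter's responsibility" concrete, here is a rough sketch of what a backend might do with the raw seconds samples it receives. The bucket bounds, the nearest-rank quantile method, and both function names are assumptions for illustration; real exporters choose their own bounds and algorithms.

```rust
/// Cumulative bucket counts: entry i counts the samples <= bounds[i].
fn cumulative_buckets(samples: &[f64], bounds: &[f64]) -> Vec<u64> {
    bounds
        .iter()
        .map(|&le| samples.iter().filter(|&&s| s <= le).count() as u64)
        .collect()
}

/// Nearest-rank quantile over raw samples, q in 0.0..=1.0. None for empty input.
fn quantile(samples: &[f64], q: f64) -> Option<f64> {
    if samples.is_empty() {
        return None;
    }
    let mut sorted = samples.to_vec();
    sorted.sort_by(|a, b| a.total_cmp(b));
    // Rank is 1-based; clamp so q = 0.0 maps to the smallest sample.
    let rank = ((q * sorted.len() as f64).ceil() as usize).clamp(1, sorted.len());
    Some(sorted[rank - 1])
}

fn main() {
    // Made-up durations in seconds, as a duration histogram would receive them.
    let samples = [0.005, 0.003, 0.012, 0.008];
    let bounds = [0.005, 0.01, 0.05];
    println!("buckets = {:?}", cumulative_buckets(&samples, &bounds));
    println!("p50 = {:?}", quantile(&samples, 0.5));
    println!("max = {:?}", quantile(&samples, 1.0));
}
```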
Workflow
- cano_workflow_runs_total{outcome}: counter; outcome ∈ completed|failed|timeout
- cano_workflow_duration_seconds{outcome}: histogram (seconds)
- cano_workflow_active: gauge; workflows currently executing
Task Dispatch
- cano_task_duration_seconds{state, kind}: histogram (seconds); kind ∈ single|router|split|compensatable|stepped
- cano_task_attempts_total{outcome}: counter; per-attempt inside the retry loop; outcome ∈ completed|failed
- cano_circuit_rejections_total: counter; attempts short-circuited by an open breaker
Split / Join
- cano_split_branch_results_total{result}: counter; result ∈ success|failure|cancelled
Circuit Breaker
- cano_circuit_transitions_total{transition}: counter; transition ∈ closed_to_open|open_to_halfopen|halfopen_to_closed|halfopen_to_open
- cano_circuit_acquires_total{result}: counter; result ∈ acquired|rejected
- cano_circuit_outcomes_total{outcome}: counter; outcome ∈ success|failure
Processing Loops
- cano_poll_iterations_total{outcome}: counter; outcome ∈ ready|pending
- cano_batch_runs_total{outcome}: counter; outcome ∈ completed|failed
- cano_batch_items_total{result}: counter; result ∈ ok|err
- cano_step_iterations_total{outcome}: counter; outcome ∈ more|done
Recovery & Saga
- cano_checkpoint_appends_total{result}: counter; result ∈ ok|err
- cano_checkpoint_clears_total{result}: counter; result ∈ ok|err
- cano_compensations_run_total{result}: counter; result ∈ ok|err
- cano_compensation_drains_total{outcome}: counter; outcome ∈ clean|partial
Scheduler
Also requires the scheduler feature.
- cano_scheduler_flow_runs_total{flow, outcome}: counter; outcome ∈ completed|failed
- cano_scheduler_flow_duration_seconds{flow}: histogram (seconds)
- cano_scheduler_flow_backoff_total{flow}: counter
- cano_scheduler_flow_tripped_total{flow}: counter
- cano_scheduler_active_flows: gauge; flows currently executing
Registering Descriptions
cano::metrics::describe() registers a human-readable description and unit for every metric
Cano emits. Call it once, after installing your recorder, so exporters receive help text in their
output (e.g. Prometheus # HELP / # TYPE lines).
// Install your recorder first, then describe:
metrics_exporter_prometheus::PrometheusBuilder::new()
    .install()
    .expect("install prometheus exporter");
cano::metrics::describe();
If you skip describe(), metrics still flow — only the help text and units are missing
from the exporter output.
Cardinality
Labels are deliberately minimal to keep cardinality bounded:
- state: format!("{:?}") of your FSM state enum; bounded by registered states.
- task: Task::name(), which defaults to std::any::type_name; bounded by registered task types.
- flow: the scheduler flow id string; bounded by registered flows.
- All other label values (outcome, kind, result, transition) are fixed, bounded enum labels.
The deepest hot-path metrics — per-attempt retry-loop counters (cano_task_attempts_total),
circuit-breaker internals, and poll/batch/step iteration counters — carry no per-state label,
keeping their cardinality constant regardless of how many states your workflow defines.
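A quick way to reason about these bounds is to multiply the number of values each label can take. The sketch below computes that worst-case product; `max_series` is a hypothetical helper, and the state, task, and request counts are made-up inputs, not figures from Cano.

```rust
/// Worst-case number of time series for one metric: the product of how many
/// distinct values each of its labels can take. No labels means one series.
fn max_series(label_value_counts: &[usize]) -> usize {
    label_value_counts.iter().product()
}

fn main() {
    // Hypothetical workflow: 4 registered states, 3 task types.
    let states = 4;
    let tasks = 3;

    // cano_state_enters_total{state}
    println!("state enters  <= {}", max_series(&[states]));
    // cano_observed_task_runs_total{task, outcome}, outcome has 2 values
    println!("observed runs <= {}", max_series(&[tasks, 2]));
    // cano_task_attempts_total{outcome}: no per-state label, so constant
    println!("task attempts <= {}", max_series(&[2]));
    // Adding a hypothetical unbounded label (say 10_000 distinct request ids)
    // multiplies every series that carries it.
    println!("with request id <= {}", max_series(&[tasks, 2, 10_000]));
}
```

The product is an upper bound: a backend only creates a series once a given label combination has actually been emitted.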
Cost
Compiling the metrics feature in adds a small, bounded per-state-transition cost (Debug-formatting
the state into a string for the state label) even when no recorder is installed. If
you are building a latency-critical service that does not collect metrics, leave the feature off.
Otherwise, when a recorder is installed, the overhead is the same as any other metrics-crate
emission — a hash-map lookup plus atomic increment, comparable to a log line.
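The shape of that cost can be sketched with a toy counter store: format the label, look the key up in a hash map, and do one atomic add. `CounterRegistry` and its key format are assumptions made for this illustration and do not describe how any particular recorder stores its data.

```rust
use std::collections::HashMap;
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::RwLock;

/// Toy counter store keyed by "name{labels}". Hypothetical, for illustration only.
#[derive(Default)]
struct CounterRegistry {
    counters: RwLock<HashMap<String, AtomicU64>>,
}

impl CounterRegistry {
    /// Steady state: shared lock, one hash-map lookup, one atomic add.
    fn increment(&self, key: &str, by: u64) {
        {
            let map = self.counters.read().unwrap();
            if let Some(counter) = map.get(key) {
                counter.fetch_add(by, Ordering::Relaxed);
                return;
            }
        } // read guard is dropped here, before the write lock is taken

        // First emission for this key: insert a zeroed counter, then add.
        let mut map = self.counters.write().unwrap();
        map.entry(key.to_owned())
            .or_default()
            .fetch_add(by, Ordering::Relaxed);
    }

    /// Current value for a key, or 0 if it was never incremented.
    fn get(&self, key: &str) -> u64 {
        self.counters
            .read()
            .unwrap()
            .get(key)
            .map_or(0, |c| c.load(Ordering::Relaxed))
    }
}

#[derive(Debug)]
enum Step {
    Fetch,
    Process,
}

fn main() {
    let registry = CounterRegistry::default();
    for state in [Step::Fetch, Step::Process, Step::Fetch] {
        // The per-transition formatting step: Debug-format the state into the label.
        let key = format!("cano_state_enters_total{{state={:?}}}", state);
        registry.increment(&key, 1);
    }
    println!("Fetch = {}", registry.get("cano_state_enters_total{state=Fetch}"));
    println!("Process = {}", registry.get("cano_state_enters_total{state=Process}"));
}
```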
Correlating with Traces
When you also enable the tracing feature, metrics and
tracing interoperate through tracing spans: the
metrics-tracing-context
crate makes any metric emitted inside a span inherit that span's fields as labels. This is
wiring you do in your application — Cano never depends on metrics-tracing-context itself
(the same posture it takes toward tracing-subscriber).
[dependencies]
cano = { version = "0.13", features = ["metrics", "tracing"] }
metrics-tracing-context = "0.18"
tracing-subscriber = "0.3"
use metrics_tracing_context::{MetricsLayer, TracingContextLayer};
use metrics_util::layers::Layer;
use tracing_subscriber::{layer::SubscriberExt, util::SubscriberInitExt};

// 1. Wrap your metrics recorder so it reads the current span's fields.
let recorder = TracingContextLayer::all().layer(your_recorder);
metrics::set_global_recorder(recorder).expect("install metrics recorder");
cano::metrics::describe();

// 2. Add MetricsLayer to your tracing subscriber so spans expose their fields.
tracing_subscriber::registry()
    .with(MetricsLayer::new())
    .with(tracing_subscriber::fmt::layer())
    .init();
With both layers installed, a span you open around Workflow::orchestrate — e.g.
info_span!("api_request", request_id = …) — tags every cano_* metric recorded
during that run with request_id. Cano's own default workflow_orchestrate and
workflow_resume spans carry a workflow_id field whenever one is set via
with_workflow_id, so that becomes a metric label too. Span fields are merged with
the explicit labels Cano already attaches (state, task, flow,
outcome, …) — Cano deliberately does not name any span field after an existing metric
label, so there is no merge ambiguity.
TracingContextLayer::all() promotes every field of every entered span as a label —
including Cano-internal span fields such as max_attempts from the retry-loop spans. For
production, prefer TracingContextLayer::new(filter) with a LabelFilter that
allow-lists just the fields you author, to keep metric cardinality bounded.
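The allow-list decision itself is simple, and the sketch below shows only that logic using the standard library. `AllowList` is a hypothetical type: in a real application this check would live inside your `LabelFilter` implementation, whose exact trait method is not shown on this page and is therefore left out here.

```rust
use std::collections::HashSet;

/// Hypothetical allow-list: only span fields named here become metric labels.
struct AllowList {
    allowed: HashSet<&'static str>,
}

impl AllowList {
    fn new(keys: &[&'static str]) -> Self {
        Self { allowed: keys.iter().copied().collect() }
    }

    /// True if a span field with this key should be promoted to a label.
    fn keep(&self, label_key: &str) -> bool {
        self.allowed.contains(label_key)
    }

    /// Keep only the allowed (key, value) pairs, preserving their order.
    fn filter<'a>(&self, fields: &[(&'a str, &'a str)]) -> Vec<(&'a str, &'a str)> {
        fields.iter().copied().filter(|(k, _)| self.keep(k)).collect()
    }
}

fn main() {
    // Fields you author are allowed; engine-internal ones such as max_attempts are not.
    let allow = AllowList::new(&["request_id", "workflow_id"]);
    let span_fields = [
        ("request_id", "r-42"),
        ("workflow_id", "wf-1"),
        ("max_attempts", "3"),
    ];
    println!("{:?}", allow.filter(&span_fields));
}
```

Even with an allow-list, a per-request field like `request_id` is unbounded by nature, so whether to promote it is a judgment call about your backend's capacity.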
Runnable example: cargo run --example metrics_tracing_context --features "metrics tracing"
— wires the two layers, runs a workflow under a workflow_id and another inside a user
api_request span, and prints the captured metrics so you can see the workflow_id
and request_id labels propagated purely from span context.
Known Limitation
The #[task::poll] and #[task::stepped] macros have two usage forms.
The trait-impl form (impl PollTask<S> for T /
impl SteppedTask<S> for T) inlines the loop body into the synthesised
Task::run, so cano_poll_iterations_total and
cano_step_iterations_total are not emitted for that form.
The inherent-impl form (#[task::poll(state = S)] impl T { async fn poll ... }
/ #[task::stepped(state = S)] impl T { async fn step ... }, the recommended form) and
Workflow::register_stepped (engine-owned loop) both emit the iteration counters as
expected.
Full Example
Install a DebuggingRecorder (useful for tests and self-contained demos), call
cano::metrics::describe(), attach a MetricsObserver, run a workflow directly
and then under the scheduler, then dump the captured snapshot. This mirrors the
metrics_demo example shipped with the crate.
use cano::prelude::*;
use metrics_util::debugging::{DebugValue, DebuggingRecorder};
use std::sync::Arc;
use std::time::Duration;

#[derive(Debug, Clone, PartialEq, Eq, Hash)]
enum Step { Fetch, Process, Done }

struct FetchTask;

#[task]
impl Task<Step> for FetchTask {
    async fn run_bare(&self) -> Result<TaskResult<Step>, CanoError> {
        tokio::time::sleep(Duration::from_millis(5)).await;
        Ok(TaskResult::Single(Step::Process))
    }
}

struct ProcessTask;

#[task]
impl Task<Step> for ProcessTask {
    async fn run_bare(&self) -> Result<TaskResult<Step>, CanoError> {
        tokio::time::sleep(Duration::from_millis(3)).await;
        Ok(TaskResult::Single(Step::Done))
    }
}

fn workflow() -> Workflow<Step> {
    Workflow::bare()
        .with_observer(Arc::new(MetricsObserver::new()))
        .register(Step::Fetch, FetchTask)
        .register(Step::Process, ProcessTask)
        .add_exit_state(Step::Done)
}

#[tokio::main]
async fn main() {
    // 1. Install recorder and register descriptions.
    let recorder = DebuggingRecorder::new();
    let snapshotter = recorder.snapshotter();
    metrics::set_global_recorder(recorder).expect("install metrics recorder");
    cano::metrics::describe();
    // In production, use a real exporter instead:
    // metrics_exporter_prometheus::PrometheusBuilder::new()
    //     .install().expect("install prometheus exporter");

    // 2. Run the workflow a few times directly.
    for _ in 0..3 {
        workflow()
            .orchestrate(Step::Fetch)
            .await
            .expect("workflow run");
    }

    // 3. Run the same workflow under the scheduler for ~1.2s, firing every 1s.
    let mut scheduler = Scheduler::new();
    scheduler
        .every_seconds("demo_flow", workflow(), Step::Fetch, 1)
        .expect("register flow");
    let running = scheduler.start().await.expect("start scheduler");
    tokio::time::sleep(Duration::from_millis(1200)).await;
    running.stop().await.expect("stop scheduler");

    // 4. Dump every captured metric (sorted alphabetically).
    println!("\n=== Cano metrics ===");
    let mut rows = snapshotter.snapshot().into_vec();
    rows.sort_by(|a, b| a.0.key().name().cmp(b.0.key().name()));
    for (ck, _unit, _desc, value) in rows {
        let key = ck.key();
        let labels: Vec<String> = key
            .labels()
            .map(|l| format!("{}={}", l.key(), l.value()))
            .collect();
        let label_str = if labels.is_empty() {
            String::new()
        } else {
            format!("{{{}}}", labels.join(","))
        };
        match value {
            DebugValue::Counter(v) => println!(" {}{label_str} = {v}", key.name()),
            DebugValue::Gauge(v) => println!(" {}{label_str} = {}", key.name(), v.into_inner()),
            DebugValue::Histogram(s) => {
                let n = s.len();
                let sum: f64 = s.iter().map(|x| x.into_inner()).sum();
                println!(" {}{label_str} = {{count={n}, sum={sum:.6}s}}", key.name());
            }
        }
    }
}
Runnable example: cargo run --example metrics_demo --features "metrics scheduler" — installs a
DebuggingRecorder, attaches a MetricsObserver, runs a workflow and a scheduled
flow, and prints every emitted metric. For a real Prometheus exporter, swap in
metrics_exporter_prometheus::PrometheusBuilder::new().install() before
cano::metrics::describe().