Metrics

metrics-crate counters, histograms, and gauges for your workflows.

Behind the metrics feature gate (features = ["metrics"]).

See also

This page covers Cano's built-in metrics instrumentation and the MetricsObserver. For the synchronous callback API (WorkflowObserver) that MetricsObserver builds on, see Observers. For tracing-crate span instrumentation, the sibling observability feature, see Tracing.

Cano provides optional metrics instrumentation through the metrics feature using the metrics facade crate. The metrics crate is recorder-agnostic: you install any compatible exporter (Prometheus, StatsD, a debugging snapshotter, …) and Cano emits to it. All instrumentation is behind conditional compilation — zero overhead when the feature is disabled.

For a callback-style API — get notified on workflow lifecycle and failure events without depending on the metrics ecosystem — see Observers. The metrics feature ships a ready-made MetricsObserver that bridges the two: attach it with .with_observer(Arc::new(MetricsObserver::new())) to re-emit those observer hooks as metrics-crate counters.


Setup

Enable the metrics feature flag in your Cargo.toml. You can also use features = ["all"] to enable everything (scheduler + tracing + recovery + metrics) at once.

[dependencies]
cano = { version = "0.13", features = ["metrics"] }


# The metrics crate is a facade — you also need a recorder/exporter:
metrics-exporter-prometheus = "0.16"  # for production
# or: metrics-util = "0.18"           # for testing / debugging snapshots

# Or enable everything (scheduler + tracing + recovery + metrics):
# cano = { version = "0.13", features = ["all"] }


Because metrics is a facade, Cano only depends on the shared interface. Your application picks the concrete recorder (e.g. metrics_exporter_prometheus::PrometheusBuilder::new().install_recorder() for a Prometheus scrape endpoint, or metrics_util::debugging::DebuggingRecorder in tests). Call cano::metrics::describe() once, after installing your recorder, so exporters receive help text and units.


Two Surfaces

The metrics feature exposes instrumentation through two complementary surfaces, mirroring how Tracing pairs engine spans with the TracingObserver bridge.

MetricsObserver — opt-in lifecycle counters

MetricsObserver is a WorkflowObserver that re-emits the observer hooks as metrics-crate counters. Wire it up in one line:

use cano::prelude::*;
use std::sync::Arc;

Workflow::bare()
    .register(/* ... */)
    .add_exit_state(/* ... */)
    .with_observer(Arc::new(MetricsObserver::new()))

It emits these counters (each incremented on the corresponding observer hook):

  • cano_state_enters_total{state} — on on_state_enter
  • cano_observed_task_runs_total{task, outcome} — on on_task_success / on_task_failure (outcome = completed|failed)
  • cano_task_retries_total{task} — on on_retry
  • cano_circuit_open_events_total{task} — on on_circuit_open
  • cano_checkpoints_observed_total — on on_checkpoint
  • cano_resumes_total — on on_resume

on_task_start is intentionally not counted — every dispatch already shows up in cano_observed_task_runs_total{outcome}, so a separate "start" counter would just be the sum of the completed and failed rows.
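If you do want a "starts" figure, it can be derived from the outcome rows. The sketch below is illustrative only: it assumes you have already read the cano_observed_task_runs_total rows out of your exporter as (task, outcome, value) tuples, and the task label values ("fetch", "process") and the helper task_starts are hypothetical names, not part of Cano.

```rust
/// One exported row of cano_observed_task_runs_total: (task, outcome, value).
type Row<'a> = (&'a str, &'a str, u64);

/// Sum every outcome row for `task`; for finished dispatches this equals
/// the number of times the task was started.
fn task_starts(rows: &[Row<'_>], task: &str) -> u64 {
    rows.iter()
        .filter(|row| row.0 == task)
        .map(|row| row.2)
        .sum()
}

fn main() {
    // Hypothetical snapshot values, for illustration only.
    let rows = [
        ("fetch", "completed", 40),
        ("fetch", "failed", 2),
        ("process", "completed", 40),
    ];
    // 40 completed + 2 failed = 42 dispatches of the "fetch" task.
    println!("fetch starts = {}", task_starts(&rows, "fetch"));
}
```

A task that is still running has not reported an outcome yet, so the derived figure can lag the true start count by the number of in-flight dispatches.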

MetricsObserver is in the prelude behind the metrics feature, so no extra import is needed once cano::prelude::* is imported.

Always-on direct instrumentation — engine internals

Compiled in whenever the metrics feature is on, regardless of whether a MetricsObserver is attached. Covers engine internals the observer hooks do not reach: workflow run duration, circuit-breaker state transitions, per-attempt retry-loop outcomes, poll/batch/step iteration counts, scheduler flow telemetry, and checkpoint store operations. See What Gets Measured for the full list.


What Gets Measured

Histograms record raw f64 seconds samples — bucketing and quantile computation are the exporter's responsibility. Metric names follow metrics-crate underscore conventions.
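To make "the exporter's responsibility" concrete, here is a minimal sketch of the kind of post-processing an exporter might do with raw second samples: a nearest-rank quantile and cumulative bucket counts. This is not how any particular exporter is implemented; the function names, sample values, and bucket bounds are assumptions chosen for illustration.

```rust
/// Nearest-rank quantile over raw duration samples (seconds).
/// Returns None for an empty slice or a q outside [0, 1].
fn quantile(samples: &[f64], q: f64) -> Option<f64> {
    if samples.is_empty() || !(0.0..=1.0).contains(&q) {
        return None;
    }
    let mut sorted = samples.to_vec();
    sorted.sort_by(|a, b| a.total_cmp(b));
    // Nearest rank: the smallest value with at least q * n samples at or below it.
    let rank = (q * sorted.len() as f64).ceil() as usize;
    Some(sorted[rank.saturating_sub(1)])
}

/// Cumulative counts of samples at or below each upper bound,
/// in the style of Prometheus "le" buckets.
fn bucket_counts(samples: &[f64], bounds: &[f64]) -> Vec<usize> {
    bounds
        .iter()
        .map(|bound| samples.iter().filter(|s| **s <= *bound).count())
        .collect()
}

fn main() {
    // Hypothetical samples such as cano_task_duration_seconds might record.
    let samples = [0.005, 0.003, 0.012, 0.004, 0.250];
    println!("p50 = {:?}", quantile(&samples, 0.5));
    println!("p95 = {:?}", quantile(&samples, 0.95));
    println!("buckets = {:?}", bucket_counts(&samples, &[0.005, 0.01, 0.1, 1.0]));
}
```

Because Cano hands over unbucketed samples, the choice of bucket bounds or quantile algorithm stays entirely with the recorder you install.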

Workflow

  • cano_workflow_runs_total{outcome} — counter; outcome = completed|failed|timeout
  • cano_workflow_duration_seconds{outcome} — histogram (seconds)
  • cano_workflow_active — gauge; workflows currently executing

Task Dispatch

  • cano_task_duration_seconds{state, kind} — histogram (seconds); kind = single|router|split|compensatable|stepped
  • cano_task_attempts_total{outcome} — counter; per-attempt inside the retry loop; outcome = completed|failed
  • cano_circuit_rejections_total — counter; attempts short-circuited by an open breaker

Split / Join

  • cano_split_branch_results_total{result} — counter; result = success|failure|cancelled

Circuit Breaker

  • cano_circuit_transitions_total{transition} — counter; transition = closed_to_open|open_to_halfopen|halfopen_to_closed|halfopen_to_open
  • cano_circuit_acquires_total{result} — counter; result = acquired|rejected
  • cano_circuit_outcomes_total{outcome} — counter; outcome = success|failure

Processing Loops

  • cano_poll_iterations_total{outcome} — counter; outcome = ready|pending
  • cano_batch_runs_total{outcome} — counter; outcome = completed|failed
  • cano_batch_items_total{result} — counter; result = ok|err
  • cano_step_iterations_total{outcome} — counter; outcome = more|done

Recovery & Saga

  • cano_checkpoint_appends_total{result} — counter; result = ok|err
  • cano_checkpoint_clears_total{result} — counter; result = ok|err
  • cano_compensations_run_total{result} — counter; result = ok|err
  • cano_compensation_drains_total{outcome} — counter; outcome = clean|partial

Scheduler

Also requires the scheduler feature.

  • cano_scheduler_flow_runs_total{flow, outcome} — counter; outcome = completed|failed
  • cano_scheduler_flow_duration_seconds{flow} — histogram (seconds)
  • cano_scheduler_flow_backoff_total{flow} — counter
  • cano_scheduler_flow_tripped_total{flow} — counter
  • cano_scheduler_active_flows — gauge; flows currently executing

Registering Descriptions

cano::metrics::describe() registers a human-readable description and unit for every metric Cano emits. Call it once, after installing your recorder, so exporters receive help text in their output (e.g. Prometheus # HELP / # TYPE lines).

// Install your recorder first, then describe:
metrics_exporter_prometheus::PrometheusBuilder::new()
    .install()
    .expect("install prometheus exporter");

cano::metrics::describe();

If you skip describe(), metrics still flow — only the help text and units are missing from the exporter output.


Cardinality

Labels are deliberately minimal to keep cardinality bounded.

The deepest hot-path metrics — per-attempt retry-loop counters (cano_task_attempts_total), circuit-breaker internals, and poll/batch/step iteration counters — carry no per-state label, keeping their cardinality constant regardless of how many states your workflow defines.
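A quick way to reason about this is to multiply the number of distinct values each label can take: the product is an upper bound on the time series a metric can produce. The sketch below applies that arithmetic to two metrics from the lists above. The 12-state workflow is a hypothetical figure and series_upper_bound is an illustrative helper, not a Cano API.

```rust
/// Upper bound on the number of time series for one metric:
/// the product of the distinct-value counts of its labels.
/// A metric with no labels yields exactly one series (empty product = 1).
fn series_upper_bound(label_value_counts: &[u64]) -> u64 {
    label_value_counts.iter().product()
}

fn main() {
    // Hypothetical workflow with 12 states.
    let states = 12;

    // cano_task_duration_seconds{state, kind}: kind has 5 documented values,
    // so the bound grows with the number of states.
    let task_duration = series_upper_bound(&[states, 5]);

    // cano_task_attempts_total{outcome}: 2 outcome values and no state label,
    // so the bound is constant however many states exist.
    let task_attempts = series_upper_bound(&[2]);

    // cano_circuit_rejections_total: no labels at all.
    let circuit_rejections = series_upper_bound(&[]);

    println!("task_duration <= {task_duration} series");
    println!("task_attempts <= {task_attempts} series");
    println!("circuit_rejections <= {circuit_rejections} series");
}
```

These are upper bounds: a label combination only becomes a series once it is actually emitted.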


Cost

Compiling the metrics feature in adds a small, bounded per-state-transition cost (formatting the state as a string for the state label) even when no recorder is installed. If you are building a latency-critical service that does not collect metrics, leave the feature off. When a recorder is installed, the overhead is the same as any other metrics-crate emission: a hash-map lookup plus an atomic increment, comparable to a log line.


Correlating with Traces

When you also enable the tracing feature, metrics and tracing interoperate through tracing spans: the metrics-tracing-context crate makes any metric emitted inside a span inherit that span's fields as labels. This is wiring you do in your application — Cano never depends on metrics-tracing-context itself (the same posture it takes toward tracing-subscriber).

[dependencies]
cano = { version = "0.13", features = ["metrics", "tracing"] }

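metrics-tracing-context = "0.18"
tracing-subscriber = "0.3"
# The wiring code also calls the metrics and metrics-util crates directly;
# add both at versions compatible with your recorder.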

use metrics_tracing_context::{MetricsLayer, TracingContextLayer};
use metrics_util::layers::Layer;
use tracing_subscriber::{layer::SubscriberExt, util::SubscriberInitExt};

// 1. Wrap your metrics recorder so it reads the current span's fields.
let recorder = TracingContextLayer::all().layer(your_recorder);
metrics::set_global_recorder(recorder).expect("install metrics recorder");
cano::metrics::describe();

// 2. Add MetricsLayer to your tracing subscriber so spans expose their fields.
tracing_subscriber::registry()
    .with(MetricsLayer::new())
    .with(tracing_subscriber::fmt::layer())
    .init();

With both layers installed, a span you open around Workflow::orchestrate — e.g. info_span!("api_request", request_id = …) — tags every cano_* metric recorded during that run with request_id. Cano's own default workflow_orchestrate and workflow_resume spans carry a workflow_id field whenever one is set via with_workflow_id, so that becomes a metric label too. Span fields are merged with the explicit labels Cano already attaches (state, task, flow, outcome, …) — Cano deliberately does not name any span field after an existing metric label, so there is no merge ambiguity.

Cardinality

TracingContextLayer::all() promotes every field of every entered span as a label — including Cano-internal span fields such as max_attempts from the retry-loop spans. For production, prefer TracingContextLayer::new(filter) with a LabelFilter that allow-lists just the fields you author, to keep metric cardinality bounded.
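The effect of an allow-list can be modelled without any of the crates involved. The sketch below is a standalone illustration of the idea only: it does not implement the metrics-tracing-context LabelFilter trait, and promote_fields and the field values are hypothetical. It keeps the fields you author (request_id, workflow_id) and drops an internal one (max_attempts).

```rust
use std::collections::HashSet;

/// Keep only the span fields whose name is on the allow-list,
/// preserving their original order.
fn promote_fields<'a>(
    span_fields: &[(&'a str, &'a str)],
    allowed: &HashSet<&str>,
) -> Vec<(&'a str, &'a str)> {
    span_fields
        .iter()
        .copied()
        .filter(|(name, _)| allowed.contains(*name))
        .collect()
}

fn main() {
    // Fields visible on the entered spans (values are made up).
    let span_fields = [
        ("request_id", "abc-123"),
        ("workflow_id", "wf-7"),
        ("max_attempts", "3"), // internal retry-loop field
    ];
    let allowed = HashSet::from(["request_id", "workflow_id"]);

    // Only the allow-listed fields would become metric labels.
    for (name, value) in promote_fields(&span_fields, &allowed) {
        println!("{name}={value}");
    }
}
```

An allow-list fails closed: a span field added later, by you or by a dependency, stays out of your metric labels until you opt it in.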

Runnable example: cargo run --example metrics_tracing_context --features "metrics tracing" — wires the two layers, runs a workflow under a workflow_id and another inside a user api_request span, and prints the captured metrics so you can see the workflow_id and request_id labels propagated purely from span context.


Known Limitation

Note

The #[task::poll] and #[task::stepped] macros have two usage forms. The trait-impl form (impl PollTask<S> for T / impl SteppedTask<S> for T) inlines the loop body into the synthesised Task::run, so cano_poll_iterations_total and cano_step_iterations_total are not emitted for that form.

The inherent-impl form (#[task::poll(state = S)] impl T { async fn poll ... } / #[task::stepped(state = S)] impl T { async fn step ... }, the recommended form) and Workflow::register_stepped (engine-owned loop) both emit the iteration counters as expected.


Full Example

Install a DebuggingRecorder (useful for tests and self-contained demos), call cano::metrics::describe(), attach a MetricsObserver, run a workflow directly and then under the scheduler, then dump the captured snapshot. This mirrors the metrics_demo example shipped with the crate.

use cano::prelude::*;
use metrics_util::debugging::{DebugValue, DebuggingRecorder};
use std::sync::Arc;
use std::time::Duration;

#[derive(Debug, Clone, PartialEq, Eq, Hash)]
enum Step { Fetch, Process, Done }

struct FetchTask;

#[task]
impl Task<Step> for FetchTask {
    async fn run_bare(&self) -> Result<TaskResult<Step>, CanoError> {
        tokio::time::sleep(Duration::from_millis(5)).await;
        Ok(TaskResult::Single(Step::Process))
    }
}

struct ProcessTask;

#[task]
impl Task<Step> for ProcessTask {
    async fn run_bare(&self) -> Result<TaskResult<Step>, CanoError> {
        tokio::time::sleep(Duration::from_millis(3)).await;
        Ok(TaskResult::Single(Step::Done))
    }
}

fn workflow() -> Workflow<Step> {
    Workflow::bare()
        .with_observer(Arc::new(MetricsObserver::new()))
        .register(Step::Fetch, FetchTask)
        .register(Step::Process, ProcessTask)
        .add_exit_state(Step::Done)
}

#[tokio::main]
async fn main() {
    // 1. Install recorder and register descriptions.
    let recorder = DebuggingRecorder::new();
    let snapshotter = recorder.snapshotter();
    metrics::set_global_recorder(recorder).expect("install metrics recorder");
    cano::metrics::describe();

    // In production, use a real exporter instead:
    // metrics_exporter_prometheus::PrometheusBuilder::new()
    //     .install().expect("install prometheus exporter");

    // 2. Run the workflow a few times directly.
    for _ in 0..3 {
        workflow()
            .orchestrate(Step::Fetch)
            .await
            .expect("workflow run");
    }

    // 3. Run the same workflow under the scheduler for ~1.2s, firing every 1s.
    let mut scheduler = Scheduler::new();
    scheduler
        .every_seconds("demo_flow", workflow(), Step::Fetch, 1)
        .expect("register flow");
    let running = scheduler.start().await.expect("start scheduler");
    tokio::time::sleep(Duration::from_millis(1200)).await;
    running.stop().await.expect("stop scheduler");

    // 4. Dump every captured metric (sorted alphabetically).
    println!("\n=== Cano metrics ===");
    let mut rows = snapshotter.snapshot().into_vec();
    rows.sort_by(|a, b| a.0.key().name().cmp(b.0.key().name()));
    for (ck, _unit, _desc, value) in rows {
        let key = ck.key();
        let labels: Vec<String> = key
            .labels()
            .map(|l| format!("{}={}", l.key(), l.value()))
            .collect();
        let label_str = if labels.is_empty() {
            String::new()
        } else {
            format!("{{{}}}", labels.join(","))
        };
        match value {
            DebugValue::Counter(v) => println!("  {}{label_str} = {v}", key.name()),
            DebugValue::Gauge(v)   => println!("  {}{label_str} = {}", key.name(), v.into_inner()),
            DebugValue::Histogram(s) => {
                let n = s.len();
                let sum: f64 = s.iter().map(|x| x.into_inner()).sum();
                println!("  {}{label_str} = {{count={n}, sum={sum:.6}s}}", key.name());
            }
        }
    }
}

Runnable example: cargo run --example metrics_demo --features "metrics scheduler" — installs a DebuggingRecorder, attaches a MetricsObserver, runs a workflow and a scheduled flow, and prints every emitted metric. For a real Prometheus exporter, swap in metrics_exporter_prometheus::PrometheusBuilder::new().install() before cano::metrics::describe().