z4j vs. Grafana + Prometheus

The DIY metrics approach. Great for SRE, wrong tool for task ops.

Scope: Any engine, via custom exporters
Positioning: Metrics and alerting, not task-level control
License: AGPLv3 (Grafana) + Apache 2.0 (Prometheus)
what Grafana + Prometheus does well

Credit where it's due

Already in most production stacks, so there is no new tool to sell to your platform team
Excellent for aggregate trends, SLO dashboards, and alerting
Integrates with existing on-call / PagerDuty routing
Arbitrary dashboards for whatever you can export as a metric
capability matrix

z4j vs. Grafana + Prometheus, feature by feature

| Capability | Grafana + Prometheus | z4j |
|---|---|---|
| Aggregate throughput / latency charts | Excellent | Built-in, no exporter needed |
| Per-task drill-down (args, kwargs, stack trace) | Not possible; metrics discard per-task detail | Every task, every attempt, with redaction |
| Retry / cancel / bulk actions | None; dashboards are read-only | Universal across every engine |
| Schedule CRUD | None | Create / edit / delete / trigger now |
| Audit log | None | HMAC-chained, tamper-evident |
| Setup cost | Hours to days: exporters, dashboards, alert rules | A single docker compose up |
| Engine-specific signals | Whatever you export yourself | Built-in adapters for 6 engines |
| Multi-tenant project scoping | Grafana Orgs (heavyweight) | Per-project RBAC out of the box |
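An "HMAC-chained, tamper-evident" audit log usually refers to a well-known construction: each entry's MAC covers both the entry itself and the previous entry's MAC. Editing or deleting any entry then makes verification fail from that point on, unless the attacker also holds the key and recomputes every later MAC. The sketch below illustrates that general idea only; it is not z4j's implementation, and the key, record fields, and helpers (`entry_mac`, `append`, `verify`) are hypothetical.

```python
import hashlib
import hmac
import json

# Hypothetical key for illustration; a real deployment would load it from a secret store.
SECRET_KEY = b"example-secret-key"

# Placeholder "previous MAC" for the first entry in the chain (64 hex chars, like SHA-256).
GENESIS = "0" * 64


def entry_mac(prev_mac: str, record: dict) -> str:
    """HMAC-SHA256 over the previous entry's MAC plus this record's canonical JSON."""
    payload = prev_mac.encode() + json.dumps(record, sort_keys=True).encode()
    return hmac.new(SECRET_KEY, payload, hashlib.sha256).hexdigest()


def append(log: list, record: dict) -> None:
    """Append a record, chaining its MAC to the last entry (or GENESIS if empty)."""
    prev = log[-1]["mac"] if log else GENESIS
    log.append({"record": record, "mac": entry_mac(prev, record)})


def verify(log: list) -> bool:
    """Recompute the chain from the start; any edit, deletion, or reorder fails."""
    prev = GENESIS
    for entry in log:
        expected = entry_mac(prev, entry["record"])
        # Constant-time comparison avoids leaking how many characters matched.
        if not hmac.compare_digest(expected, entry["mac"]):
            return False
        prev = entry["mac"]
    return True
```

Usage: after `append(log, {"actor": "alice", "action": "retry", "task_id": "t-1"})` and a second append, `verify(log)` passes; changing `log[0]["record"]["actor"]` or dropping `log[0]` makes it fail.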
honest trade-offs

Where we don't win

If your platform team already runs Grafana, you already have aggregate metrics. z4j is a complement, not a replacement - it owns the task-level layer.

For site-wide SLO dashboards and alerting, Grafana is the right tool. z4j focuses on task operators: the person who has to figure out why a specific job is stuck.
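Aggregate metrics can't answer "why is this job stuck?" because Prometheus guidance is to avoid high-cardinality labels such as IDs or other unbounded values. An exporter therefore typically keeps counts per task name and state, not per task. The plain-Python sketch below contrasts the two views. The event shape and field names are hypothetical and do not reflect z4j's or any engine's actual schema.

```python
from collections import Counter

# Hypothetical task events, shaped loosely like what a task engine might emit.
events = [
    {"task_id": "a1", "name": "send_email", "state": "failed",
     "args": ["user@example.com"], "error": "SMTP timeout"},
    {"task_id": "a2", "name": "send_email", "state": "succeeded",
     "args": ["ops@example.com"], "error": None},
    {"task_id": "a3", "name": "send_email", "state": "failed",
     "args": ["bad-address"], "error": "invalid recipient"},
]

# Metrics view: one counter per low-cardinality label pair (task name, state).
# Task ids, args, and errors are dropped, as a metrics exporter would typically do.
metric = Counter((e["name"], e["state"]) for e in events)

# Task-level view: every event kept and indexed by task id for drill-down.
by_id = {e["task_id"]: e for e in events}

# The counter says two sends failed; only the per-task view says which ones and why.
print(metric[("send_email", "failed")])  # 2
print(by_id["a3"]["error"])              # invalid recipient
```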

We recommend running both. z4j exposes Prometheus metrics at /metrics, so Prometheus can scrape it and your existing Grafana dashboards can chart z4j alongside everything else.

other comparisons

Compared to other dashboards

Try z4j alongside Grafana + Prometheus: no migration required

Run both for a week. Compare. Decide.