z4j vs. Grafana + Prometheus
The DIY metrics approach. Great for SRE, wrong tool for task ops.
- Scope: Any engine, via custom exporters
- Positioning: Metrics and alerting, not task-level control
- License: AGPL-3.0 (Grafana) + Apache-2.0 (Prometheus)
what Grafana + Prometheus does well
Credit where it's due
- Already in most production stacks, so there is no new tool to sell to your platform team
- Excellent for aggregate trends, SLO dashboards, and alerting
- Integrates with existing on-call / PagerDuty routing
- Arbitrary dashboards for whatever you can export as a metric
capability matrix
z4j vs. Grafana + Prometheus, feature by feature
| Capability | Grafana + Prometheus | z4j |
|---|---|---|
| Aggregate throughput / latency charts | Excellent | Built-in, no exporter needed |
| Per-task drill-down (args, kwargs, stack) | Not practical - metrics aggregate per-task detail away | Every task, every attempt, with redaction |
| Retry / cancel / bulk actions | None - read-only dashboards | Universal across every engine |
| Schedule CRUD | None | Create / edit / delete / trigger-now |
| Audit log | None | HMAC-chained, tamper-evident |
| Setup cost | Hours to days - exporters, dashboards, alert rules | One docker compose up |
| Engine-specific signals | Whatever you export yourself | Built-in adapters for 6 engines |
| Multi-tenant project scoping | Grafana Orgs (heavy) | Per-project RBAC out of the box |
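To make the "HMAC-chained, tamper-evident" row concrete, here is a minimal sketch of the general technique, not z4j's actual implementation. Each entry's MAC covers the previous entry's MAC plus its own payload, so editing an earlier entry invalidates everything after it. The key, field names, and function names are all hypothetical.

```python
import hashlib
import hmac
import json

SECRET_KEY = b"demo-key-not-for-production"  # hypothetical key, illustration only


def sign_entry(prev_mac: str, payload: dict) -> str:
    """Return an HMAC over the previous entry's MAC plus this entry's payload."""
    body = json.dumps(payload, sort_keys=True).encode()
    return hmac.new(SECRET_KEY, prev_mac.encode() + body, hashlib.sha256).hexdigest()


def append_entry(log: list, payload: dict) -> None:
    """Append a payload, chaining it to the MAC of the last entry."""
    prev_mac = log[-1]["mac"] if log else ""
    log.append({"payload": payload, "mac": sign_entry(prev_mac, payload)})


def verify_chain(log: list) -> bool:
    """Recompute every MAC in order; an edited, reordered, or mid-chain-deleted entry fails."""
    prev_mac = ""
    for entry in log:
        expected = sign_entry(prev_mac, entry["payload"])
        if not hmac.compare_digest(expected, entry["mac"]):
            return False
        prev_mac = entry["mac"]
    return True


audit_log = []
append_entry(audit_log, {"actor": "alice", "action": "retry", "task_id": "t-1"})
append_entry(audit_log, {"actor": "bob", "action": "cancel", "task_id": "t-2"})
intact = verify_chain(audit_log)

audit_log[0]["payload"]["actor"] = "mallory"  # simulate tampering with history
tampered_ok = verify_chain(audit_log)
```

Here `intact` is `True` and `tampered_ok` is `False`. A chain like this does not by itself detect truncation of the newest entries; that usually needs the latest MAC to be stored somewhere else as well.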
honest trade-offs
Where we don't win
If your platform team already runs Grafana, you already have aggregate metrics. z4j is a complement, not a replacement - it owns the task-level layer.
For site-wide SLO dashboards and alerting, Grafana is the right tool. z4j focuses on task operators: the person who has to figure out why a specific job is stuck.
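The difference between the two layers can be sketched in a few lines. This is an illustration under assumed data, with hypothetical task records and field names: a metrics pipeline keeps one counter per label set, while a task-level view keeps each record and can answer questions about a specific job.

```python
from collections import Counter

# Hypothetical task records; field names are illustrative only.
tasks = [
    {"id": "t-1", "name": "send_email", "state": "failed", "args": ["user-42"]},
    {"id": "t-2", "name": "send_email", "state": "failed", "args": ["user-77"]},
    {"id": "t-3", "name": "send_email", "state": "succeeded", "args": ["user-42"]},
]

# What a metrics pipeline keeps: one counter per (name, state) label set.
# The ids and arguments are gone once the records are counted.
counters = Counter((t["name"], t["state"]) for t in tasks)

# What a task-level view keeps: every record, queryable by its arguments.
failed_for_user_42 = [
    t["id"] for t in tasks if t["state"] == "failed" and "user-42" in t["args"]
]
```

The counter can say that two `send_email` tasks failed, which is what a dashboard or alert needs. Only the record-level query can say that `t-1` is the failed task for `user-42`.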
We recommend running both. z4j exposes Prometheus metrics at /metrics so your Grafana dashboards can pull from it.
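As a rough sketch of what a scrape of that endpoint hands to Prometheus, the snippet below parses a small sample in the Prometheus text exposition format. The metric name, label names, and engine values are invented for illustration and are not z4j's real metric names; the parser handles only simple lines, and in practice Prometheus does this scraping for you.

```python
import re

# Hypothetical sample of a /metrics response body.
# Metric and label names are invented for illustration.
SAMPLE = """\
# HELP z4j_tasks_total Tasks seen, by engine and state.
# TYPE z4j_tasks_total counter
z4j_tasks_total{engine="celery",state="succeeded"} 1280
z4j_tasks_total{engine="celery",state="failed"} 17
z4j_tasks_total{engine="rq",state="failed"} 3
"""

LINE = re.compile(
    r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(?P<labels>.*)\})?\s+(?P<value>\S+)$'
)
LABEL = re.compile(r'(\w+)="([^"]*)"')


def parse_metrics(text):
    """Parse simple exposition-format lines into (name, labels, value) tuples."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):  # skip blanks, HELP and TYPE lines
            continue
        match = LINE.match(line)
        if match is None:
            continue
        labels = dict(LABEL.findall(match.group("labels") or ""))
        samples.append((match.group("name"), labels, float(match.group("value"))))
    return samples


samples = parse_metrics(SAMPLE)
failed_total = sum(
    value
    for name, labels, value in samples
    if name == "z4j_tasks_total" and labels.get("state") == "failed"
)
```

With the sample above, `failed_total` is `20.0`, the kind of aggregate a Grafana panel or alert rule would be built on.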
Try z4j alongside Grafana + Prometheus, no migration required
Run both for a week. Compare. Decide.