Skip to main content

Stuck-task reconciliation

Tasks started forever. Fixed, automatically.

A background worker sweeps tasks stuck in started/pending past a configurable threshold, asks the agent to probe the engine's result backend, and applies the authoritative state back with a task.reconciled audit row.

Preview
z4j dashboard — Stuck-task reconciliation
Reconciliation worker Sweeping stuck tasks every 60s LAST SWEEP 247 tasks probed 12 state corrected 3 marked lost Next sweep in 00:47 PER-ENGINE PROBE celery 162 rq 44 dramatiq 21 huey 20 Recent reconciliations task_id=01H6M...042 · reports.generate_weekly corrected: pending → succeeded 1m ago task_id=01H6M...142 · reports.generate_weekly corrected: pending → succeeded 4m ago task_id=01H6M...242 · reports.generate_weekly lost (engine unknown) 7m ago task_id=01H6M...342 · reports.generate_weekly corrected: pending → succeeded 10m ago

Mockup of the dashboard view for this feature. Live-reloading and themed to match your dashboard.

Ships with

  • Covers every engine (Celery AsyncResult, RQ Job, arq Job.status, Huey result store, taskiq result_backend, Dramatiq Results middleware)
  • Engine-matched routing: reconcile calls go to an agent that actually runs that engine
  • Source-anchored state updates (agent cannot spoof engine / task_id)
  • Rate-limited to protect the broker and audit log
Related

More capabilities