Retry and Failure

Default Retry Policy

| Parameter | Default |
| --- | --- |
| attempts | 5 |
| backoff.type | exponential |
| backoff.delay | 30 000 ms (30 s) |
| removeOnComplete | { age: 604800, count: 1000 } (7 days or 1 000 jobs) |
| removeOnFail | { age: 2592000 } (30 days) |

Exponential backoff doubles the delay on each retry, starting from backoff.delay. With the default 5 attempts there are at most 4 retries, so the delays are 30 s → 1 min → 2 min → 4 min.
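The schedule can be reproduced with a few lines of arithmetic. This is a minimal sketch, not the service's code: backoffDelays is a hypothetical helper, and it assumes that attempts counts the first run (so 5 attempts leave 4 retries) and that exponential backoff doubles the base delay on every retry.

```ts
type BackoffType = 'fixed' | 'exponential';

// Delay before each retry, in milliseconds.
// Assumption: `attempts` includes the first run, so there are at most
// `attempts - 1` retries.
function backoffDelays(attempts: number, type: BackoffType, delay: number): number[] {
  const delays: number[] = [];
  for (let retry = 1; retry < attempts; retry++) {
    // Exponential: delay, 2x delay, 4x delay, ... Fixed: always `delay`.
    delays.push(type === 'exponential' ? delay * 2 ** (retry - 1) : delay);
  }
  return delays;
}

// Default policy: 5 attempts, exponential, 30 000 ms base delay.
console.log(backoffDelays(5, 'exponential', 30_000)); // [30000, 60000, 120000, 240000]
```

Under these assumptions the total wait across all default retries is 7.5 minutes.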

Per-Job Override

Pass retry in the schedule spec to override the default policy for a specific job:

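```ts
// futureDate is a Date in the future, supplied by the caller.
await scheduleService.scheduleAt({
  topic: 'payments.charge-reminder',
  runAt: futureDate,
  timezone: 'UTC',
  payload: { invoiceId: '...' },
  // Overrides the default policy for this job only.
  retry: {
    attempts: 3,
    backoff: { type: 'fixed', delay: 10_000 }, // 10 s between attempts
  },
});
```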

Allowed bounds:

| Parameter | Min | Max |
| --- | --- | --- |
| attempts | 1 | 20 |
| backoff.delay | 1 000 ms | 3 600 000 ms (1 h) |

Values outside these bounds are rejected at creation time with SCHEDULE_RETRY_POLICY_INVALID.
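A bounds check of this kind might look like the sketch below. Only the limits and the SCHEDULE_RETRY_POLICY_INVALID code come from this page. The RetryPolicy shape, the RetryPolicyError class and the assertRetryPolicy function are hypothetical names used for illustration, and requiring attempts to be an integer is an assumption.

```ts
interface RetryPolicy {
  attempts?: number;
  backoff?: { type: 'fixed' | 'exponential'; delay: number };
}

// Hypothetical error class; only the code string is taken from the docs.
class RetryPolicyError extends Error {
  readonly code = 'SCHEDULE_RETRY_POLICY_INVALID';
}

function assertRetryPolicy(retry: RetryPolicy): void {
  const { attempts, backoff } = retry;
  // attempts: 1..20 (integer requirement is an assumption).
  if (attempts !== undefined && (!Number.isInteger(attempts) || attempts < 1 || attempts > 20)) {
    throw new RetryPolicyError(`attempts must be an integer in [1, 20], got ${attempts}`);
  }
  // backoff.delay: 1 000 ms .. 3 600 000 ms (1 h).
  if (
    backoff !== undefined &&
    (!Number.isFinite(backoff.delay) || backoff.delay < 1_000 || backoff.delay > 3_600_000)
  ) {
    throw new RetryPolicyError(`backoff.delay must be in [1000, 3600000] ms, got ${backoff.delay}`);
  }
}

// Passes: both values are inside the allowed bounds.
assertRetryPolicy({ attempts: 3, backoff: { type: 'fixed', delay: 10_000 } });
```

Running the same check on the client before calling scheduleAt gives earlier feedback, but the server-side rejection remains the source of truth.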

What Counts as a Failure

A job fails an attempt when the BullMQ worker's process callback throws or rejects. This happens when emitAsync throws — which means at least one @OnEvent listener threw or rejected.
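The rule can be sketched without BullMQ or an event emitter. In the sketch below, processAttempt is a hypothetical stand-in for the worker's process callback, and the listeners array plays the role of the @OnEvent handlers registered for one topic. It also covers the no-listener case described in the next paragraph, represented here by a plain flag rather than a real log line or metric.

```ts
type Listener = (payload: unknown) => Promise<void> | void;

interface AttemptOutcome {
  status: 'completed';
  warnedNoListener: boolean;
}

// Hypothetical stand-in for the worker's process callback.
async function processAttempt(listeners: Listener[], payload: unknown): Promise<AttemptOutcome> {
  if (listeners.length === 0) {
    // No listener is not a failure: the attempt completes with a warning.
    return { status: 'completed', warnedNoListener: true };
  }
  // Promise.all rejects as soon as any listener throws or rejects.
  // The rejection propagates out of this function, which is what makes
  // the attempt count as failed.
  await Promise.all(listeners.map(async (listener) => listener(payload)));
  return { status: 'completed', warnedNoListener: false };
}
```

Wrapping each listener call in an async function turns a synchronous throw into a rejection, so both failure styles take the same path.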

"No listener registered" is NOT a failure. If schedule.arrived fires and no listener is attached for the topic, the job transitions to completed. A WARN log is emitted and the app_schedule_job_no_listener_total metric is incremented.

Retry Flow

  1. process callback throws or rejects → BullMQ records the attempt as failed. This is temporary while attempts remain.
  2. Audit row attempts incremented; last_error updated.
  3. After the backoff delay, BullMQ re-queues the job. Row status remains active.
  4. attempt in the IScheduleArrivedEvent payload increments on each retry — listeners can use this to apply different logic on repeated attempts.
  5. If the listener succeeds on any attempt, including the final one (attempt ≤ maxAttempts) → job transitions to completed.
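The flow above can be simulated in memory. This is a sketch under stated assumptions: runJob and the AuditRow shape are hypothetical, the event carries only the attempt field named above, and backoff waits are skipped. The attempts counter is incremented on failure, as in step 2; whether a successful run also increments it is not stated here.

```ts
interface ArrivedEvent {
  attempt: number; // assumed subset of IScheduleArrivedEvent
}

interface AuditRow {
  status: 'active' | 'completed' | 'failed';
  attempts: number;
  lastError: string | null;
}

async function runJob(
  listener: (event: ArrivedEvent) => Promise<void>,
  maxAttempts: number,
): Promise<AuditRow> {
  const row: AuditRow = { status: 'active', attempts: 0, lastError: null };
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      await listener({ attempt });
      row.status = 'completed'; // success on any attempt, including the last
      return row;
    } catch (err) {
      row.attempts += 1;
      row.lastError = err instanceof Error ? err.message : String(err);
      // A real worker would wait for the backoff delay before the next attempt.
    }
  }
  row.status = 'failed'; // all attempts exhausted
  return row;
}

// A listener that uses `attempt` to behave differently on repeated attempts.
const flaky = async ({ attempt }: ArrivedEvent): Promise<void> => {
  if (attempt < 3) throw new Error(`upstream unavailable (attempt ${attempt})`);
};
```

With maxAttempts set to 3, runJob(flaky, 3) resolves with status 'completed' on the final attempt; with maxAttempts set to 2 it resolves with status 'failed'.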

Terminal Failure

After maxAttempts are exhausted:

  • BullMQ moves the job to the permanent failed list, where it is kept for removeOnFail.age (30 days by default).
  • Audit row: status = 'failed', last_error = serialized error, completed_at = null.
  • schedule.failed event is emitted (fire-and-forget) with payload { scheduledJobId, topic, lastError, attempts, metadata }.

There is no automatic dead-letter queue (DLQ) in v1. Subscribe to schedule.failed for alerting or dead-letter workflows.
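A dead-letter workflow can start as a handler that records each terminal failure. The sketch below is illustrative only: the field names follow the schedule.failed payload listed above, but the field types, the DeadLetterRecord shape, the in-memory deadLetters store and the onScheduleFailed function are all assumptions.

```ts
// Field names come from the schedule.failed payload; the types are assumptions.
interface ScheduleFailedPayload {
  scheduledJobId: string;
  topic: string;
  lastError: string;
  attempts: number;
  metadata?: Record<string, unknown>;
}

interface DeadLetterRecord extends ScheduleFailedPayload {
  receivedAt: string; // ISO 8601 timestamp
}

// Hypothetical in-memory store; a real one would be a table or a queue.
const deadLetters: DeadLetterRecord[] = [];

function onScheduleFailed(payload: ScheduleFailedPayload, now: Date = new Date()): DeadLetterRecord {
  const record: DeadLetterRecord = { ...payload, receivedAt: now.toISOString() };
  deadLetters.push(record);
  return record;
}
```

In an application this function would be the body of a listener subscribed to schedule.failed. Because the event is fire-and-forget, the handler should catch its own errors rather than rely on the emitter to report them.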

Cancellation During Retry

Cancelling a job that is in retry backoff (status = 'active') prevents future retries, and the job row ends as cancelled. If an in-flight retry resolves after the cancellation, cancelled takes priority in the race: the row stays cancelled rather than moving to completed. Already-delivered events are not rolled back.
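One way to make cancelled win the race is to treat it as a sticky status when applying row updates. This is a sketch of that idea, reading the paragraph above as "the row stays cancelled"; applyTransition and the RowStatus type are hypothetical names, not the service's implementation.

```ts
type RowStatus = 'active' | 'completed' | 'failed' | 'cancelled';

// Returns the status the row should have after a requested transition.
function applyTransition(current: RowStatus, next: RowStatus): RowStatus {
  // Once cancelled, later writes from an in-flight attempt are ignored.
  if (current === 'cancelled') return 'cancelled';
  return next;
}

// Race: the cancel lands first, then the in-flight retry reports success.
let status: RowStatus = 'active';
status = applyTransition(status, 'cancelled');
status = applyTransition(status, 'completed');
console.log(status); // "cancelled"
```

In a database the same guard is usually expressed as a conditional update that only touches rows still in the active status.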