Schedule Runbooks
Verify Dual-Write Consistency
Check that every pending or active row in Postgres has a corresponding BullMQ entry in Redis.
Postgres side: list the jobs that should have a BullMQ entry:

```sql
SELECT id, topic, kind, status, run_at, cron_pattern, interval_ms, attempts, max_attempts
FROM schedule.scheduled_jobs
WHERE status IN ('pending', 'active')
ORDER BY created_at DESC
LIMIT 50;
```

Redis side: inspect delayed (one-shot) and repeatable jobs:

```bash
# List all delayed BullMQ jobs for the schedule queue
redis-cli ZRANGE bull:schedule:delayed 0 -1 WITHSCORES
# List all repeatable job keys (--scan iterates with SCAN instead of the blocking KEYS command)
redis-cli --scan --pattern "bull:schedule:repeat:*"
# Inspect a specific job by id (replace <job-id>)
redis-cli HGETALL "bull:schedule:<job-id>"
```

If a row exists in Postgres but has no Redis entry, the boot reconciler re-enqueues it the next time the API process starts (see Recover from Redis Flush below). To trigger that recovery immediately, restart the API process.
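The two sides can be cross-checked with a small script. The sketch below is a hypothetical helper, not part of the codebase, and rests on three assumptions: a one-shot job's BullMQ id equals its `scheduled_jobs.id`; `redis-cli` output is captured with `--raw` (one member or score per line); and BullMQ encodes delayed-set scores as the fire timestamp in ms multiplied by 4096 (0x1000) plus a small counter. Repeatable jobs are left out because BullMQ generates their job ids per iteration rather than taking them from the row.

```typescript
// Hypothetical helpers: decodeDelayedScore, parseDelayedWithScores, findMissingInRedis.
// Assumption: a one-shot job's BullMQ job id equals its scheduled_jobs.id.

interface DelayedEntry {
  jobId: string;
  fireAt: Date; // decoded from the delayed-set score
}

// Assumed BullMQ encoding: score = fireTimestampMs * 0x1000 + counter,
// so integer division by 0x1000 recovers the fire timestamp.
function decodeDelayedScore(score: number): Date {
  return new Date(Math.floor(score / 0x1000));
}

// `redis-cli --raw ZRANGE ... WITHSCORES` prints alternating member / score lines.
function parseDelayedWithScores(output: string): DelayedEntry[] {
  const lines = output
    .split("\n")
    .map((line) => line.trim())
    .filter((line) => line.length > 0);
  const entries: DelayedEntry[] = [];
  for (let i = 0; i + 1 < lines.length; i += 2) {
    entries.push({ jobId: lines[i], fireAt: decodeDelayedScore(Number(lines[i + 1])) });
  }
  return entries;
}

// Ids that Postgres lists as pending/active but that no collected Redis entry references.
function findMissingInRedis(postgresIds: string[], redisIds: Iterable<string>): string[] {
  const present = new Set(redisIds);
  return postgresIds.filter((id) => !present.has(id));
}
```

For example, save the output of `redis-cli --raw ZRANGE bull:schedule:delayed 0 -1 WITHSCORES`, pass it to `parseDelayedWithScores`, and diff the resulting ids against the `id` column from the SQL query. The decoded `fireAt` can also be compared with `run_at` to spot jobs scheduled for the wrong time.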
Inspect Redis Keys

```bash
# All schedule queue keys (SCAN-based, so it does not block Redis the way KEYS does)
redis-cli --scan --pattern "bull:schedule:*"
# Jobs waiting in the delayed set (score = fire timestamp in ms * 4096 + a small counter; divide by 4096)
redis-cli ZRANGE bull:schedule:delayed 0 -1 WITHSCORES
# Active (currently processing) jobs
redis-cli LRANGE bull:schedule:active 0 -1
# Failed jobs (BullMQ keeps these in a sorted set, not a list)
redis-cli ZRANGE bull:schedule:failed 0 -1
# Count jobs by state
redis-cli LLEN bull:schedule:wait
redis-cli ZCARD bull:schedule:delayed
redis-cli LLEN bull:schedule:active
redis-cli ZCARD bull:schedule:failed
```

Run the Migration
Do not run the migration command yourself — provide it to the developer or DBA who runs migrations for this project.
The migration file is located at:
`apps/api/src/modules/schedule/infrastructure/migrations/<timestamp>-add_scheduled_jobs.ts`

The command to run it follows the project convention:

```bash
pnpm --filter api migration:schedule:run
```

The migration creates the `schedule` Postgres schema and the `schedule.scheduled_jobs` table, with all of its constraints and indexes.
Recover from Redis Flush
If Redis is flushed or restarted with data loss, the boot reconciler automatically recovers all pending and active jobs from Postgres when the API process restarts.
Steps:
- Confirm the API is stopped or restarting.
- Start (or restart) the API process.
- On `OnApplicationBootstrap`, `ScheduleReconcilerService` will re-enqueue every `pending`/`active` row. Jobs whose `runAt` is in the past fire immediately (delay = 0).
- Check the logs for `schedule.reconciler.summary` to confirm the `healed` count.
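The re-enqueue rule in the steps above can be sketched as a pure function. This is an illustration under assumptions, not `ScheduleReconcilerService` itself: the `ScheduledJobRow` shape and the `planReenqueue` name are hypothetical, and only rows with a `runAt` are planned here (repeatable cron/interval rows would need their own path).

```typescript
// Hypothetical row shape; the real entity may differ.
interface ScheduledJobRow {
  id: string;
  status: string; // e.g. 'pending', 'active', 'cancelled'
  runAt: Date | null; // treated as "no one-shot fire time" in this sketch
}

interface ReenqueuePlan {
  id: string;
  delayMs: number;
}

// Only pending/active rows are re-enqueued; cancelled (or any other status) rows are skipped.
const LIVE_STATUSES = new Set(["pending", "active"]);

function planReenqueue(rows: ScheduledJobRow[], now: Date): ReenqueuePlan[] {
  const plan: ReenqueuePlan[] = [];
  for (const row of rows) {
    if (!LIVE_STATUSES.has(row.status) || row.runAt === null) continue;
    // A runAt in the past clamps to 0, so the job fires immediately.
    plan.push({ id: row.id, delayMs: Math.max(0, row.runAt.getTime() - now.getTime()) });
  }
  return plan;
}
```

Clamping with `Math.max(0, ...)` matters because a negative delay has no meaning for a delayed queue; every overdue job collapses to "fire now", which matches the behavior described above.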
Note: Jobs with `status = 'active'` at the time of the flush were in flight. They are re-enqueued and may fire again, so listeners must handle this duplicate delivery idempotently.
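One common way to make a listener tolerate duplicate delivery is to record a key per firing and skip repeats. The sketch keeps keys in an in-memory `Set` purely for illustration; a real listener would need durable storage (for example, a unique constraint in Postgres) so the check survives restarts and is shared across API instances. `makeIdempotent` and the choice of key are hypothetical.

```typescript
type Handler<T> = (event: T) => Promise<void>;

// Wraps a handler so that events with the same key are processed at most once.
function makeIdempotent<T>(keyOf: (event: T) => string, handler: Handler<T>): Handler<T> {
  // Illustration only: lost on restart and not shared between processes.
  const seen = new Set<string>();
  return async (event: T) => {
    const key = keyOf(event);
    if (seen.has(key)) return; // duplicate delivery: skip
    seen.add(key); // mark before awaiting so a concurrent duplicate is also skipped
    try {
      await handler(event);
    } catch (err) {
      seen.delete(key); // let a later retry run the handler again
      throw err;
    }
  };
}
```

For a one-shot job the job id alone is a reasonable key; for repeatable jobs the key would also need the scheduled fire time, so that distinct iterations are not mistaken for duplicates.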
Manually Cancel a Stuck Job
If a job is stuck in active status (e.g., worker crashed after DB update but before BullMQ ack), update the row directly and optionally clean Redis:
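```sql
-- Mark the job cancelled in Postgres (the status guard leaves rows that are no longer live untouched)
UPDATE schedule.scheduled_jobs
SET status = 'cancelled', cancelled_at = now()
WHERE id = '<job-id>'
  AND status IN ('pending', 'active');
```

```bash
# Remove the job hash if still present (DEL, not HDEL: HDEL only deletes named fields)
redis-cli DEL "bull:schedule:<job-id>"
# Drop any leftover reference to the job from the active list
redis-cli LREM bull:schedule:active 0 "<job-id>"
```

After the next restart, the reconciler will not re-enqueue rows with `status = 'cancelled'`.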
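Before cancelling anything, it can help to list likely candidates. The hypothetical helper below flags rows that Postgres still marks `active`, that are absent from the BullMQ active list (for example, the output of `redis-cli --raw LRANGE bull:schedule:active 0 -1`), and that were due at least `staleAfterMs` ago. It assumes the BullMQ job id equals the row id; the threshold is an operator choice, not a project setting.

```typescript
// Hypothetical shape: an 'active' row from schedule.scheduled_jobs.
interface ActiveRow {
  id: string;
  runAt: Date;
}

// Flags rows Postgres calls active that BullMQ is not processing and that are overdue
// by at least staleAfterMs, so a job that just started is not mistaken for a stuck one.
function findStuckCandidates(
  activeRows: ActiveRow[],
  bullActiveIds: Iterable<string>,
  now: Date,
  staleAfterMs: number,
): string[] {
  const inFlight = new Set(bullActiveIds);
  return activeRows
    .filter((row) => !inFlight.has(row.id))
    .filter((row) => now.getTime() - row.runAt.getTime() >= staleAfterMs)
    .map((row) => row.id);
}
```

Each returned id is only a candidate; confirm it with `HGETALL "bull:schedule:<job-id>"` before running the `UPDATE` above.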
Check No-Listener Jobs
Jobs that fire but have no registered @OnEvent listener are logged as WARN and tracked by the app_schedule_job_no_listener_total metric. To find them in logs:
```bash
grep "schedule.no_listener" <log-file>
```

Or query in Grafana Loki:

```
{app="api"} |= "schedule.no_listener"
```

This is not an error, but it indicates a topic with no subscriber, most likely a misconfigured listener or a topic mismatch.
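The no-listener path can be pictured as a dispatcher that checks for subscribers before delivering. This is a framework-free sketch: the real code presumably relies on `@nestjs/event-emitter`, while here a plain `Map` stands in for the listener registry, a counter stands in for the `app_schedule_job_no_listener_total` metric, and the JSON log shape is an assumption. `ScheduleDispatcher` is a hypothetical name.

```typescript
type Listener = (payload: unknown) => void;

class ScheduleDispatcher {
  private readonly listeners = new Map<string, Listener[]>();
  // Stands in for the app_schedule_job_no_listener_total metric.
  noListenerTotal = 0;

  on(topic: string, listener: Listener): void {
    const list = this.listeners.get(topic) ?? [];
    list.push(listener);
    this.listeners.set(topic, list);
  }

  // Returns true if at least one listener received the payload.
  fire(topic: string, payload: unknown): boolean {
    const list = this.listeners.get(topic);
    if (!list || list.length === 0) {
      // Not an error: count it and log a WARN so the topic can be investigated.
      this.noListenerTotal += 1;
      console.warn(JSON.stringify({ msg: "schedule.no_listener", topic }));
      return false;
    }
    for (const listener of list) listener(payload);
    return true;
  }
}
```

A single-character typo between the scheduled topic and the listener's topic is enough to route every firing into the no-listener branch, which is why a rising counter usually points at a topic mismatch rather than a failing job.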