Heartbeat Monitoring: How to Know When Your Cron Jobs and Background Workers Die

Most uptime monitoring works by poking your service and waiting for a response. You set up a monitor, it hits your URL every minute, and if it doesn't get back a 200 OK, it alerts you.

That works great for web servers. It's completely blind to everything else.

Cron jobs don't have a URL to ping. Background workers don't expose an endpoint. A data import script that runs every night at 3 AM isn't serving HTTP traffic. When these processes fail, they don't return an error code you can monitor -- they just go silent. And silence is impossible to detect with a traditional HTTP monitor.

This is exactly what heartbeat monitoring solves.

What Is Heartbeat Monitoring?

Heartbeat monitoring flips the usual model. Instead of your monitoring tool calling your service, your service calls your monitoring tool.

You create a heartbeat monitor and receive a unique URL -- something like https://statusdude.com/heartbeat/abc123xyz. Your job or worker is configured to hit that URL every time it completes a run. As long as the pings keep coming in on schedule, everything is fine. If the pings stop, the heartbeat monitor triggers an alert.

It's the software equivalent of a "dead man's switch" -- the system stays quiet as long as you're alive and checking in. Stop checking in, and it raises the alarm.
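The check-in logic on the monitoring side can be sketched in a few lines. This is a minimal illustration, not StatusDude's implementation: the `HeartbeatMonitor` class and its method names are hypothetical, and grace periods are deliberately left out.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class HeartbeatMonitor:
    """Hypothetical server-side view of one heartbeat monitor."""

    interval: timedelta                   # how often the job is expected to check in
    last_ping: Optional[datetime] = None  # None until the first ping arrives

    def ping(self, now: datetime) -> None:
        # Called whenever the job hits its heartbeat URL.
        self.last_ping = now

    def is_overdue(self, now: datetime) -> bool:
        # Silence is the failure signal: no ping within the interval means trouble.
        if self.last_ping is None:
            return False  # assumption: a monitor that has never pinged is "pending"
        return now - self.last_ping > self.interval


monitor = HeartbeatMonitor(interval=timedelta(hours=1))
start = datetime(2024, 1, 1, 3, 0)
monitor.ping(start)
print(monitor.is_overdue(start + timedelta(minutes=30)))  # False: still within the hour
print(monitor.is_overdue(start + timedelta(hours=2)))     # True: the pings stopped
```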

What Should You Monitor With Heartbeats?

Almost any scheduled or background process is a candidate:

Cron jobs -- The most obvious use case. Database backups, cleanup scripts, report generation, email digests. These run on a schedule and fail silently all the time. A missed backup often isn't discovered until you need to restore something.

Queue workers -- Your ARQ, Celery, or Sidekiq workers process jobs in the background. If a worker crashes or gets stuck, the queue backs up. By the time a user complains that their export has been "processing" for an hour, the damage is done.

Data sync pipelines -- If your product pulls data from external APIs and the sync stops working, users see stale data without any obvious error. A heartbeat that fires after each successful sync catches this within one check-in interval plus the grace period.

Health check scripts -- Anything that validates your system state and should run on a regular cadence. Database integrity checks, disk space monitors, certificate renewal scripts.

Scheduled reports -- If a report isn't sent, nobody on the receiving end will tell you. The heartbeat will.

How Grace Periods Work

The most important configuration for a heartbeat monitor is the grace period -- how long after the expected check-in time to wait before alerting.

If your cron job runs every hour and the monitoring tool alerts the second a heartbeat is 1 second late, you'll get flooded with false positives. Cron jobs start slightly late. Workers take a moment to finish a run. Network calls add latency.

A reasonable grace period for an hourly job is 5 to 10 minutes; for a daily job, 30 to 60 minutes. As a rule of thumb, start at roughly 10-15% of the check-in interval for jobs that run hourly or more often, use a smaller fraction for long intervals (30 to 60 minutes is only 2-4% of a day), and make sure the result covers the worst-case legitimate delay for that specific job.
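As a rough worked example of these numbers, the helper below takes the larger of a fraction of the interval and the worst legitimate delay you have observed for the job. The function name, the `fraction` default, and the sample delays are assumptions for illustration, not StatusDude settings.

```python
from datetime import timedelta


def suggest_grace(interval: timedelta,
                  worst_case_delay: timedelta = timedelta(0),
                  fraction: float = 0.10) -> timedelta:
    """Return a starting grace period: a share of the interval, or the
    worst legitimate delay seen for this job, whichever is longer."""
    return max(interval * fraction, worst_case_delay)


# Hourly job at 10% of the interval: 6 minutes.
print(suggest_grace(timedelta(hours=1)))
# Hourly job that sometimes legitimately finishes 8 minutes late: 8 minutes.
print(suggest_grace(timedelta(hours=1), worst_case_delay=timedelta(minutes=8)))
# Daily job with a smaller fraction, landing in the 30 to 60 minute range.
print(suggest_grace(timedelta(days=1), fraction=0.03))
```

Treat the result as a starting point and adjust it after watching how late the job actually checks in.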

Don't set the grace period to zero, and don't set it so long that a dead process goes unnoticed for hours before anyone is paged. The grace period is a trade-off between false positives and detection speed -- tune it per monitor based on how critical the job is.
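The trade-off is easier to see with numbers. The sketch below assumes the monitor expects the next ping one interval after the last one and alerts once the grace period has also elapsed; that is a simplification, and scheduling details may differ between tools. The function name is hypothetical.

```python
from datetime import datetime, timedelta


def alert_time(last_ping: datetime, interval: timedelta, grace: timedelta) -> datetime:
    """Earliest moment an alert fires if no further ping arrives."""
    return last_ping + interval + grace


last_ping = datetime(2024, 1, 1, 3, 0)   # job last checked in at 03:00
interval = timedelta(hours=1)

# A ping that arrives one second after 04:00 is "late" only if there is no grace.
late_ping = datetime(2024, 1, 1, 4, 0, 1)
print(late_ping > alert_time(last_ping, interval, timedelta(0)))          # True: false alarm
print(late_ping > alert_time(last_ping, interval, timedelta(minutes=5)))  # False: tolerated

# With a 3-hour grace, a job that died right after 03:00 is not reported until 07:00.
print(alert_time(last_ping, interval, timedelta(hours=3)))                # 2024-01-01 07:00:00
```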

Setting Up a Heartbeat in Practice

In StatusDude, you create a heartbeat monitor the same way as any other monitor. Select the HEARTBEAT type, set your expected check-in interval, configure the grace period, and you'll get a unique ping URL.

Then you wire up your process to call that URL. Here's how that looks in practice:

Cron job (bash):

# /etc/cron.d/db-backup
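0 3 * * * root /usr/local/bin/backup.sh && curl -fsS -m 10 --retry 3 https://statusdude.com/heartbeat/YOUR_TOKEN > /dev/null

The && is critical here -- the heartbeat URL only gets called if the backup script exits successfully. If the script fails, no heartbeat is sent, and you'll get alerted. The -m 10 and --retry 3 flags cap each attempt at 10 seconds and retry transient failures, so a brief network problem neither hangs the job nor causes a false alert.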
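The same "ping only on exit code 0" rule can be expressed as a small Python wrapper, which is handy when the crontab line gets too long to read. This is a sketch: `run_and_ping` and `http_ping` are hypothetical helpers, the URL is the placeholder used above, and the ping function is injectable so the logic can be exercised without network access.

```python
import subprocess
import sys
import urllib.request
from typing import Callable, Sequence

HEARTBEAT_URL = "https://statusdude.com/heartbeat/YOUR_TOKEN"  # placeholder


def http_ping(url: str) -> None:
    # Plain GET with a short timeout so a slow monitor never hangs the job.
    with urllib.request.urlopen(url, timeout=10):
        pass


def run_and_ping(cmd: Sequence[str], url: str = HEARTBEAT_URL,
                 ping: Callable[[str], None] = http_ping) -> int:
    """Run cmd and ping the heartbeat URL only if it exits with status 0."""
    returncode = subprocess.run(cmd).returncode
    if returncode == 0:
        try:
            ping(url)
        except OSError as exc:
            # A failed ping should not turn a good backup into a failed job.
            print(f"Heartbeat ping failed: {exc}", file=sys.stderr)
    return returncode
```

A cron entry could then invoke a script that calls `run_and_ping(["/usr/local/bin/backup.sh"])` and passes the return value to `sys.exit`, so cron still sees the job's real exit status.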

Python:

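import sys

import httpx

HEARTBEAT_URL = "https://statusdude.com/heartbeat/YOUR_TOKEN"

def run_daily_sync():
    # ... your sync logic ...
    pass

if __name__ == "__main__":
    try:
        run_daily_sync()
    except Exception as e:
        # No heartbeat on failure: the missed check-in triggers the alert.
        print(f"Sync failed: {e}", file=sys.stderr)
        sys.exit(1)

    try:
        # Ping only after a successful run; treat non-2xx responses as errors.
        httpx.get(HEARTBEAT_URL, timeout=5).raise_for_status()
    except httpx.HTTPError as e:
        # Keep ping failures separate so they aren't reported as sync failures.
        print(f"Heartbeat ping failed: {e}", file=sys.stderr)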

Node.js:

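import fetch from "node-fetch";

const HEARTBEAT_URL = "https://statusdude.com/heartbeat/YOUR_TOKEN";

async function runExport() {
  // ... your export logic ...
}

try {
  await runExport();
} catch (err) {
  // No heartbeat on failure: the missed check-in triggers the alert.
  console.error("Export failed:", err);
  process.exit(1);
}

try {
  // Ping only after a successful run.
  await fetch(HEARTBEAT_URL);
} catch (err) {
  // Keep ping failures separate so they aren't reported as export failures.
  console.error("Heartbeat ping failed:", err);
}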

The pattern is always the same: call the heartbeat URL only on successful completion. On failure, send nothing and let the check-in deadline pass. The monitoring tool notices the missed check-in and alerts you.
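Since the pattern repeats for every job, it can be factored into a reusable decorator. The sketch below is one possible shape, using only the standard library; `heartbeat`, `http_ping`, and the `ping` parameter are names made up for this example.

```python
import functools
import sys
import urllib.request
from typing import Callable


def http_ping(url: str) -> None:
    # Plain GET with a short timeout.
    with urllib.request.urlopen(url, timeout=10):
        pass


def heartbeat(url: str, ping: Callable[[str], None] = http_ping):
    """Decorator: ping url after the wrapped function returns without raising."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            result = func(*args, **kwargs)   # an exception here skips the ping
            try:
                ping(url)
            except OSError as exc:
                # Report ping problems separately from job failures.
                print(f"Heartbeat ping failed: {exc}", file=sys.stderr)
            return result
        return wrapper
    return decorator


@heartbeat("https://statusdude.com/heartbeat/YOUR_TOKEN")
def run_daily_sync():
    ...  # your sync logic
```

Because the ping happens only after `func` returns, any exception raised by the job propagates unchanged and no heartbeat is sent.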

Heartbeats vs HTTP Monitors — When to Use Which

These two monitor types are complementary, not competing.

Use an HTTP monitor when:

You have a public endpoint that should always be reachable
You want to verify response codes, response times, or SSL certificate validity
You need multi-region verification (is it down for everyone, or just one location?)

Use a heartbeat monitor when:

The process doesn't expose a URL
The process runs on a schedule and you need to verify it's completing successfully
You want to detect that something stopped running, not that something is returning errors

For your background workers, you might want both. If the worker exposes a health endpoint, an HTTP monitor on it tells you the process is up. A heartbeat from the worker after each successful batch tells you the worker is actually processing jobs, not just sitting idle.
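For a long-running worker, the heartbeat belongs inside the processing loop, after each batch that succeeds. This is a minimal sketch, assuming a hypothetical `handle_batch` callable supplied by your own code and an injectable `ping` function. Pings are throttled so a busy worker does not hit the URL on every batch.

```python
import time
from typing import Callable, Iterable, Optional


def process_batches(batches: Iterable[list],
                    handle_batch: Callable[[list], None],
                    ping: Callable[[], None],
                    min_ping_gap: float = 60.0,
                    clock: Callable[[], float] = time.monotonic) -> int:
    """Process batches, pinging at most once per min_ping_gap seconds.

    Returns the number of pings sent. An exception from handle_batch
    propagates, so a crashing worker stops pinging.
    """
    last_ping: Optional[float] = None
    pings = 0
    for batch in batches:
        handle_batch(batch)              # raises on failure: no ping for this batch
        now = clock()
        if last_ping is None or now - last_ping >= min_ping_gap:
            ping()
            last_ping = now
            pings += 1
    return pings
```

With this shape, the monitor's expected interval should be longer than both `min_ping_gap` and the longest quiet period you consider normal for the queue, otherwise an empty queue will look like a dead worker.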

The Monitoring Gap Nobody Talks About

Here's the uncomfortable truth about most monitoring setups: they check that services are up, not that they're doing useful work.

Your API can be returning 200 OK while a background worker silently stopped processing emails two hours ago. Your dashboard loads fine while the cron job that refreshes user statistics hasn't run in a week. Your web server is healthy while the nightly backup has been failing for three days.

HTTP monitors tell you your front door is open. Heartbeat monitors tell you your house is actually occupied.

If you have any scheduled or background processes in your stack -- and nearly every production SaaS does -- set up heartbeat monitors for them. It takes a few minutes per job and it closes a whole class of silent failures that traditional monitoring completely misses.

Start with the most critical ones: database backups, payment reconciliation jobs, any process that users will complain about if it silently stops working. Then work your way down the list.

The jobs you're most likely to forget are the ones running quietly at 3 AM. Those are exactly the ones that need heartbeat monitors.