How to Monitor Cron Jobs and Scheduled Tasks

The silent failure problem

Cron jobs are the duct tape of software. They handle backups, send emails, clean up old data, sync APIs, generate reports. They run in the background and nobody thinks about them.

Until they stop running.

The problem with cron jobs is that they fail silently. A web server that crashes returns a 500 error. A cron job that stops running returns nothing. No error, no alert, no signal. It just... doesn't happen.

Your daily backup hasn't run in two weeks? Your email queue has been stuck since Tuesday? Your data sync quietly stopped processing? You won't know until someone notices the symptoms.

Why traditional monitoring doesn't work

Normal uptime monitoring pings a URL from the outside. Your monitoring service sends a request, your server responds, everyone's happy.

But cron jobs don't have URLs. You can't ping a process that runs for 30 seconds every hour and then goes away. There's nothing to ping between runs.

Some people try workarounds like monitoring the server the cron runs on. But the server can be perfectly healthy while the cron process is failing - wrong permissions, full disk, expired API token, broken dependency. The server is "up" but the job isn't running.

How heartbeat monitoring works

Heartbeat monitoring (sometimes called a dead man's switch or cron monitoring) flips the model:

Traditional monitoring: the monitor pings your service. Heartbeat monitoring: your service pings the monitor.

Here's the flow:

You set up a heartbeat monitor with an expected interval (e.g., every hour)
You add one line to the end of your cron job that pings the monitor's URL
If the monitor doesn't receive a ping within the expected window, it alerts you

It's a dead man's switch. As long as your cron is running successfully, it pings the monitor and everything's fine. The moment it stops pinging - for any reason - you get alerted.
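The check on the monitor's side can be sketched in a few lines. This is a hypothetical illustration of the idea, not Chirp's actual implementation. The function name `is_overdue` and the timestamps are made up for the example, and all values are in seconds.

```shell
#!/bin/bash
# Hypothetical monitor-side check (an illustration, not Chirp's real code).
# Times are Unix timestamps or durations in seconds.

# is_overdue LAST_PING INTERVAL GRACE NOW
# Succeeds (exit 0) when the deadline for the next ping has passed.
is_overdue() {
  local last_ping=$1
  local interval=$2
  local grace=$3
  local now=$4
  local deadline=$(( last_ping + interval + grace ))
  [ "$now" -gt "$deadline" ]
}

# Example: hourly job, 60-second grace period, last ping at t=1000.
# The deadline is 1000 + 3600 + 60 = 4660, and "now" is 4700.
if is_overdue 1000 3600 60 4700; then
  echo "ALERT: heartbeat missed"
else
  echo "OK: still inside the expected window"
fi
```

The monitor never needs to know why the job stopped. It only needs the time of the last ping and the deadline for the next one.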

Setting it up with Chirp

Step 1: Create a heartbeat monitor

In your Chirp dashboard, create a new monitor and select "Heartbeat" as the type. Set two values:

Expected interval - how often your cron runs (every 5 minutes, hourly, daily)
Grace period - buffer time for network delays or slow execution (default: 60 seconds)

Step 2: Add the ping to your cron job

Chirp gives you a unique URL and token. Add a curl to the end of your script:

#!/bin/bash
# Exit as soon as any command fails, so the ping below only runs on success
set -e

# Your cron job logic here
python process_queue.py

# Ping Chirp on success
curl -sf -X POST "https://getchirp.dev/api/monitors/YOUR_ID/heartbeat?token=YOUR_TOKEN"

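The -s flag silences curl's progress meter and error messages so the ping doesn't clutter your cron's output. The -f flag makes curl exit with a non-zero status on an HTTP error response instead of printing the error page.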
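One thing a bare curl does not guard against is a slow or unreachable network, which could leave the job hanging on the ping or cause a missed heartbeat after a one-off blip. A minimal sketch of a more defensive ping follows, using curl's standard --max-time and --retry options. The function name `send_heartbeat` and the specific timeout and retry values are assumptions for illustration, not Chirp recommendations.

```shell
#!/bin/bash
# Placeholder URL from the article; substitute your own monitor ID and token.
HEARTBEAT_URL="https://getchirp.dev/api/monitors/YOUR_ID/heartbeat?token=YOUR_TOKEN"

# send_heartbeat URL  (hypothetical helper)
#   -s           hide the progress meter and error messages
#   -f           exit non-zero on HTTP error responses
#   --max-time   limit each attempt to 10 seconds (illustrative value)
#   --retry      retry transient failures up to 3 times (illustrative value)
#   -o /dev/null discard the response body
send_heartbeat() {
  curl -sf -X POST --max-time 10 --retry 3 -o /dev/null "$1"
}

# Usage at the end of a job script:
#   python process_queue.py && send_heartbeat "$HEARTBEAT_URL"
```

The timeout keeps a stalled connection from holding the cron job open, and the retries reduce false alerts from a single dropped request.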

Step 3: That's it

Chirp will now expect a ping at regular intervals. If it doesn't receive one within your interval plus grace period, it:

Marks the monitor as down
Creates an incident on your status page (if configured)
Updates any linked component status
Sends you an email alert

When your cron recovers and sends the next ping, Chirp automatically resolves the incident and generates an AI summary of what happened.

What to monitor

Here are the cron jobs you should monitor first - the ones where silent failure causes the most damage:

Start with database backups. If your backup cron fails and you need to restore, you'll find out the hard way that your last good backup is from three weeks ago. This is the one that keeps people up at night.

Email queues are next. A stuck queue means your users aren't getting password resets, notifications, or receipts. They'll blame your product, not the cron job.

After that come data syncs between services (CRM, analytics, billing), where a stuck sync creates drift that compounds daily; billing processes like subscription renewals and payment retries, where silent failures directly cost you money; and cleanup jobs (temp files, log rotation, session pruning), which seem low-priority until your disk fills up and everything falls over.

Tips for reliable cron monitoring

Always ping at the end of your script, not the beginning. If you ping at the start, a crash halfway through still looks like a successful run. Even better, use && to chain the ping so it only fires on success:

python process_data.py && curl -sf -X POST "https://getchirp.dev/api/monitors/YOUR_ID/heartbeat?token=YOUR_TOKEN"

If process_data.py fails (non-zero exit), the curl never runs, and the missed heartbeat triggers an alert.
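If you have several jobs to cover, the same pattern can be wrapped in a small helper so each job doesn't repeat the curl line. The sketch below is one way to do it. The name `run_with_heartbeat` is hypothetical and not part of any Chirp tooling. It pings only when the command exits 0 and passes the command's own exit code back to the caller.

```shell
#!/bin/bash
# run_with_heartbeat URL COMMAND [ARGS...]  (hypothetical helper)
# Runs COMMAND, pings URL only if it exits 0, and returns COMMAND's exit code
# so cron or a calling script still sees the real result.
run_with_heartbeat() {
  local url=$1
  shift
  local rc=0
  "$@" || rc=$?
  if [ "$rc" -eq 0 ]; then
    # A failed ping should not mask a successful job, so its result is ignored.
    curl -sf -X POST "$url" >/dev/null || true
  fi
  return "$rc"
}

# Usage:
#   run_with_heartbeat "https://getchirp.dev/api/monitors/YOUR_ID/heartbeat?token=YOUR_TOKEN" python process_data.py
```

This behaves like the && chain, with one difference: a failed ping is swallowed rather than turning a successful job into a failed one. Whether that trade-off is right depends on how you treat cron exit codes.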

Set your grace period wisely. If your job usually takes 5 minutes but occasionally takes 15, the ping from a slow run arrives about 10 minutes later than usual, so the 60-second default would raise a false alert. Set the grace period above that gap, with some margin. And you don't need to monitor every cron job on day one - start with the ones where failure has real consequences (backups, billing, email) and expand from there.
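For the grace period, the arithmetic can be sketched as follows. Because the ping fires at the end of the job, the latest ping you should tolerate is roughly the difference between the slowest and the typical run, plus a margin. The function name `suggest_grace` and the 60-second default margin are assumptions for illustration.

```shell
#!/bin/bash
# suggest_grace TYPICAL_SECS WORST_SECS [MARGIN_SECS]  (hypothetical helper)
# Prints a grace period that covers the gap between a typical run and the
# slowest run you expect, plus a safety margin (default 60 s, an assumption).
suggest_grace() {
  local typical=$1
  local worst=$2
  local margin=${3:-60}
  echo $(( worst - typical + margin ))
}

# Job usually takes 5 minutes (300 s), occasionally 15 minutes (900 s).
suggest_grace 300 900   # prints 660
```

Here the suggestion is 660 seconds, or 11 minutes. Treat the result as a starting point and adjust it after watching a few weeks of real run times.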

Stop ignoring your cron jobs

Cron jobs fail silently and that makes them dangerous. The fix takes a few minutes: create a heartbeat monitor and add a curl to your script. Most monitoring tools support this, including Chirp's free tier.