Get notified when a periodic task doesn't run
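
The premise in the title works like a dead man's switch: each job checks in (for example with an HTTP hit) when it finishes, and an alert fires when a check-in is late. Below is a minimal Python sketch of the checking side. It assumes each task has an expected interval plus a grace period. `Task`, `check_in` and `overdue` are hypothetical names for illustration, not the service's actual API.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Optional


@dataclass
class Task:
    name: str
    interval: timedelta      # how often the job is expected to check in
    grace: timedelta         # extra slack before a late check-in counts as missed
    created: datetime        # baseline used until the first check-in arrives
    last_checkin: Optional[datetime] = None


def check_in(task: Task, when: datetime) -> None:
    """Record a check-in, e.g. the HTTP hit a job makes after it finishes."""
    task.last_checkin = when


def overdue(tasks: List[Task], now: datetime) -> List[str]:
    """Return the names of tasks whose check-in deadline has passed."""
    late = []
    for task in tasks:
        baseline = task.last_checkin if task.last_checkin is not None else task.created
        deadline = baseline + task.interval + task.grace
        if now > deadline:
            late.append(task.name)
    return late
```

The grace period keeps a job that occasionally runs a few minutes long from triggering an alert.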

  • To those complaining about the price: the low level of technicality (this is not much beyond MAILTO=user@domain.com in a crontab) does not by itself determine the price. Price != cost.

    The right price is the price your target consumers are willing to pay. And if, like another commenter, you are going to build it yourself instead of forking over $19, you obviously are not the consumer. (Neither am I; I'll stick to cron :-) !)

  • I really shouldn't have to create an account just to figure out what your pricing looks like.

  • Monit does this pretty easily: http://mmonit.com/monit/

    You can monitor web services, processes, file modification dates, directories and loads of other stuff, all with email alerts and a web front end too.

  • $19/month for more than one task? That's fine for companies, but I'm not going to pay that much as an individual who needs to track 3, maybe 5 tasks.

  • Here's a shot at giving constructive feedback rather than bitching about the cost, which is what most people here are doing. Point taken on the price; here's the other stuff I can think of:

    1) Disclose cost earlier. Telling people to sign up for free and then asking them to pay you is not cool. It starts off the relationship on a bad note and will prevent people from signing up.

    2) You have a great idea here. Startups need this. We do not have time to set up and configure Nagios or some other warning mechanism. You make a good start by attacking a problem no one else actually wants to take care of, and by making it really easy to deal with.

    3) Either forget the hobbyist or make another (less expensive) level of service for him. Remember that there is very little money to be made from hobbyists unless it is very easy for you to take them on. You do not want to take on people asking for the same service for $1 unless you can make serving them much cheaper.

    4) As some have noted: put in an SLA, and tell us why your service will not fail. Otherwise, if we are not warned, we will not know whether that is because the job ran fine or because your service is down.

    5) Put up a tutorial or more info on what you offer. Can we schedule planned maintenance (i.e., do not warn me about this for 4 hours)? Can we hit the snooze button? What methods do you use for notifications? How many people/emails/phones will be notified per failure? The unanswered questions go on and on.

    6) Do more stuff. What about other types of monitoring? Can we put servers into groups so we do not have to set up each one individually? Can we monitor stuff that IS running, where I send you a value periodically through the HTTP hit? I want to graph stuff, like HTTP hits per hour or HTTPD errors per minute. I want a warning to be sent to me when I get more than X HTTPD errors per minute, for example.

    EDIT: 7) Add a trial period; this just makes sense.

    For work I run Nagios. I would love to have Nagios set up for my side projects, because Nagios provides a world of benefits, but if I invested my time in setting it up I would not have time to do actual development.

    You are onto something good, you just need to shape it a bit.
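
Items 5 and 6 above ask for threshold alerts ("more than X HTTPD errors per minute") and a snooze or planned-maintenance window. The Python sketch below shows one way those two features could fit together. It assumes errors are reported one at a time, in chronological order, with timestamps. `ErrorRateAlert` and its methods are hypothetical names for illustration.

```python
from collections import deque
from datetime import datetime, timedelta
from typing import Deque, Optional


class ErrorRateAlert:
    """Alert when more than `threshold` errors land inside a sliding `window`,
    unless alerts are snoozed (e.g. for a planned maintenance window).
    Assumes record_error() is called with non-decreasing timestamps."""

    def __init__(self, threshold: int, window: timedelta = timedelta(minutes=1)):
        self.threshold = threshold
        self.window = window
        self.events: Deque[datetime] = deque()
        self.snoozed_until: Optional[datetime] = None

    def snooze(self, until: datetime) -> None:
        """Suppress alerts until `until`; errors are still counted meanwhile."""
        self.snoozed_until = until

    def record_error(self, when: datetime) -> bool:
        """Record one error at `when`; return True if an alert should go out."""
        self.events.append(when)
        # Forget errors that have slid out of the window.
        while self.events and self.events[0] <= when - self.window:
            self.events.popleft()
        if self.snoozed_until is not None and when < self.snoozed_until:
            return False
        return len(self.events) > self.threshold
```

This sketch re-alerts on every error above the threshold. A real service would probably also rate-limit repeat alerts.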

  • The problem is that this doesn't address what I have found to be the most dangerous problem with periodic tasks.

    That is when your cron job runs but part of the script hits an error (for example, it reads or writes a file in a folder whose permissions were changed after the script was written). That error might trigger a cascade of errors, so that other parts of your job either fail to run or run incorrectly.

    Now what happens here: do you get notified of the error, or does it just get silently eaten? It's also very possible that your system will eat the error, proceed to the next step (calling this API), and everything will appear fine.

    One thing I do is figure out what the expected output from the job should be, and then pipe the output from the cron job into a file. A second cron job periodically checks the contents of this file and generates an alert if it does not match what is expected.

    You should also try to find some way to test any generated data. For example, if you are doing a DB backup, add another table with a field containing data that is in some way based on the date. You can then have a task that restores old backups into another DB and checks this field against the expected value for the date of the backup.

    Of course, none of these techniques is a silver bullet and plenty of things can still go wrong, so it is certainly prudent to check things manually every once in a while.

    Perhaps this API could be modified to take as an input the output from scheduled tasks and check them?
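
The comment above describes two checks: comparing a job's captured output with what it should be, and verifying a date-derived marker in a restored backup. Here is a rough Python sketch of both. It assumes the cron job's output is redirected to a file and that the backup stores a marker string of the form `backup-YYYY-MM-DD`. That format and all function names are hypothetical.

```python
from datetime import date
from pathlib import Path


def output_matches(log_path: Path, expected: str) -> bool:
    """Run from a second cron job: does the first job's captured output
    match what it should have printed?"""
    try:
        actual = log_path.read_text()
    except OSError:
        # A missing or unreadable output file is itself worth alerting on.
        return False
    return actual.strip() == expected.strip()


def marker_for(backup_date: date) -> str:
    """Date-derived value written into a marker table when the backup is taken."""
    return f"backup-{backup_date.isoformat()}"


def restored_backup_ok(restored_marker: str, backup_date: date) -> bool:
    """After restoring an old backup into a scratch DB, compare the marker
    read back from it with the value expected for that backup's date."""
    return restored_marker == marker_for(backup_date)
```

The marker check catches a failure mode that the output comparison cannot: a backup that completed "successfully" but contains stale or wrong data.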

  • I run this as a combination of monit timestamp checking and stdout redirected to a file. Is there anything this does that I cannot do with monit?

    e.g.

    MAILTO=email@ops.com

    1 23 * * * /home/cron/backup.sh 1> /backups/backup

    and monit doing a

    check file backup with path /backups/backup

    if timestamp > 24 hours then alert

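    This way I am emailed when the backup script writes anything to stderr (cron mails that to MAILTO), and monit alerts me if the output file hasn't been updated in 24 hours; otherwise things carry on as they are.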
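
Without monit, the timestamp half of that setup can be approximated in a few lines of Python run from its own cron entry. This sketch checks the file's modification time. That is an approximation of monit's `timestamp` test, not a reimplementation of it. `is_stale` is a hypothetical helper.

```python
import os
import time
from typing import Optional


def is_stale(path: str, max_age_seconds: float, now: Optional[float] = None) -> bool:
    """Roughly monit's `if timestamp > 24 hours then alert` for one file,
    based on the file's modification time."""
    if now is None:
        now = time.time()
    try:
        mtime = os.path.getmtime(path)
    except OSError:
        return True  # never written (or deleted): treat as stale
    return now - mtime > max_age_seconds
```

Called as `is_stale("/backups/backup", 24 * 3600)` from a script that prints a message when it returns True, any output gets mailed to MAILTO by cron, just like errors from the backup script itself.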

  • This will be AWESOME for monitoring daily backups on the large number of sites/databases that should be getting backed up automatically every night. They work reliably for a while and then I forget about checking them. Would be nice to know when one of those starts to choke.

  • Those using cron hacks to check on previous cron jobs might want to take a look at Matt Dillon's cron (as opposed to Paul Vixie's cron), which is the crond that Slackware uses.

    It has named cron jobs, @noauto and AFTER keywords, which let you run jobs depending on whether previous jobs (identified by name) have run successfully.

    That, combined with things like && and || and mailx, gives you quite powerful ways of checking on previous cron jobs.
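
dcron's own `AFTER` handling lives in the crontab itself, so it is not reproduced here. As a rough Python analogue of the `&&`/`||` pattern, the sketch below defines a hypothetical `run_chain` helper. It runs named jobs in order, runs each one only if the previous one exited 0, and reports which job broke the chain.

```python
import subprocess
from typing import List, Optional, Sequence, Tuple


def run_chain(jobs: Sequence[Tuple[str, List[str]]]) -> Tuple[List[str], Optional[str]]:
    """Run (name, argv) jobs in order, stopping at the first non-zero exit,
    like `a && b && c`. Returns (names that succeeded, name that failed)."""
    succeeded: List[str] = []
    for name, argv in jobs:
        result = subprocess.run(argv)
        if result.returncode != 0:
            return succeeded, name
        succeeded.append(name)
    return succeeded, None
```

The `|| mailx` part would then be the caller sending a notification whenever the second element of the result is not None.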

  • Ouch. I've been dabbling at a project that does the exact same thing and your implementation is so perfect that I may give up and sign up :-)

  • I'm sure this will make tons of money and will be amazingly useful as a hip tool for those developers we all know. But honestly, the rest of us build this kind of stuff into our company dashboards without thinking twice. That's where it belongs: in the hands of an internal team who can share the monitoring of more than $20 worth of "snitches".

  • I definitely know quite a few small projects that this could be extremely useful for. However, I think some people are right that Munin works better at tracking these things long term.

  • And what service do I use to find out when this service doesn't run?