Ever had a process die and not noticed until you tried to use it? Last year dovecot kept dying, and running “service dovecot status” showed it was down. This script was born to catch that for many processes at once. It's a work in progress, so please share any ways to make it better. Cron it to run every 15 minutes.
Note: this was written for RHEL systems (it relies on /sbin/service and the init scripts' status output); other systems and daemons will need minor adjustments.
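A crontab entry for the 15-minute schedule might look like the following. The install path /usr/local/bin/servercheck is an assumption (the post does not say where the script lives); point it at wherever you saved the script and make the file executable. Since the script restarts services, it would normally go in root's crontab (crontab -e as root).

```
# m    h  dom mon dow  command   (script path is hypothetical)
*/15   *  *   *   *    /usr/local/bin/servercheck
```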
The script itself is posted in the first comment below. The config file it reads, /usr/local/servercheck.conf, looks like this:
#
email: user@example.com
pager: 4155551212@messaging.sprintpcs.com
# service: commented_out
service: httpd
service: dovecot
service: mysqld
service: postfix
service: sshd
service: MailScanner
service: proftpd
service: syslog
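The script pulls its settings out of this file with grep and sed. A slightly tighter way to do the same parsing is to anchor each pattern at the start of the line, which skips commented-out entries like "# service: commented_out" without a separate filter. This is a minimal sketch; list_services and get_field are hypothetical helper names, not part of servercheck.

```shell
#!/bin/sh
# Sketch (not the author's code): read a servercheck-style config with sed.
# Only lines that start with "service:" count, so "# service: ..." is skipped.

# Print one service name per line from the config file given as $1.
list_services() {
    sed -n 's/^service:[[:space:]]*//p' "$1"
}

# Print the value of the first "key: value" line; $1 = file, $2 = key.
get_field() {
    sed -n "s/^$2:[[:space:]]*//p" "$1" | head -n 1
}

# Usage with a throwaway copy of part of the sample config
conf=$(mktemp)
cat > "$conf" <<'EOF'
email: user@example.com
# service: commented_out
service: httpd
service: sshd
EOF
list_services "$conf"        # prints httpd, then sshd
get_field "$conf" email      # prints user@example.com
rm -f "$conf"
```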






1 gregg // May 7, 2008 at 7:06 pm
The script itself:
#!/bin/ksh
#
# Servercheck: that's all it does
# Script to check the server health for various processes
#
# Written from scratch by Gregg Lain 10/13/2007 gregg@mochabomb.com
#
###########################################################################################################
#
# Variables
#
email=`grep '^email:' /usr/local/servercheck.conf | sed 's/^email: *//'`
# pager is read from the config but not used anywhere below
pager=`grep '^pager:' /usr/local/servercheck.conf | sed 's/^pager: *//'`
timestamp=`date '+%m-%d-%y-%H:%M:%S'`
logdir="/var/log"
tmpfile="$logdir/$timestamp"
touch "$tmpfile"
Server=`hostname`
#
##########################################################################################################
#
# Initialize the incident flag. Since the script runs from cron, this is not in a loop.
# (incidentflag is set below, but nothing reads it yet.)
incidentflag=0
#
##########################################################################################################
#
# 1. Check each service listed in the config file
#
for procstatus in `grep '^service:' /usr/local/servercheck.conf | sed 's/^service: *//'`; do
    status=`/sbin/service "$procstatus" status 2>&1`
    # Treat the service as up if its status text says "running" or "OK";
    # drop "not running" lines first so they do not count as a match
    echo "$status" | grep -v 'not running' | egrep 'running|OK' > /dev/null
    if [ $? -ne 0 ]; then    # something is not running
        # incidentflag=$(($incidentflag+1))
        incidentflag=1
        echo "(servercheck) $Server: $procstatus not running: $timestamp" >> "$tmpfile"
        mail -s "$Server: $procstatus not running" "$email" < "$tmpfile"
        /sbin/service "$procstatus" restart
        restart=$?
        # First matching process ID, reported in the notification mail
        servicePID=`ps -ef | grep "$procstatus" | grep -v grep | head -1 | awk '{print $2}'`
        if [ $restart -ne 0 ]; then    # something cannot be started
            echo "Alert! $Server $procstatus cannot be started" > "$tmpfile"
            echo "Restart $procstatus via shell or webmin" >> "$tmpfile"
            mail -s "** Alert ** $Server: $procstatus cannot be started" "$email" < "$tmpfile"
        else                           # something was restarted
            echo "$Server $procstatus re-started" > "$tmpfile"
            echo "$procstatus running with PID of $servicePID" >> "$tmpfile"
            mail -s "$Server $procstatus restarted successfully" "$email" < "$tmpfile"
        fi
    fi
done
#
#
########################################################################################################
#
# 2. Garbage collection
#
# Append the temp file's contents to the permanent log, then remove it
cat "$tmpfile" >> /var/log/servercheck
rm "$tmpfile"
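The heart of each pass is deciding, from the init script's status text, whether a service is up, and then picking which mail to send after a restart. The sketch below pulls those two decisions into functions so they can be tried without touching real services; service_ok, restart_subject and the host name web1 are hypothetical names for illustration. One subtlety: a plain match on "running" also matches "is not running", so those lines are filtered out first.

```shell
#!/bin/sh
# Sketch of servercheck's per-service decisions, testable without /sbin/service.

# Return 0 if the status text in $1 reports the service as up.
# "running" or "OK" counts as up, but lines containing "not running" are
# discarded first, since they would otherwise match "running" too.
service_ok() {
    printf '%s\n' "$1" | grep -v 'not running' | grep -Eq 'running|OK'
}

# Build the mail subject after a restart attempt, mirroring the script's
# two messages; $1 = host, $2 = service, $3 = exit status of the restart.
restart_subject() {
    if [ "$3" -eq 0 ]; then
        echo "$1 $2 restarted successfully"
    else
        echo "** Alert ** $1: $2 cannot be started"
    fi
}

# Usage
service_ok "httpd (pid  2345) is running..." && echo up    # prints "up"
service_ok "MailScanner is not running" || echo down       # prints "down"
restart_subject web1 dovecot 1   # prints "** Alert ** web1: dovecot cannot be started"
```

Many RHEL init scripts also follow the LSB convention in which “service NAME status” exits 0 when the service is running and 3 when it is stopped, so checking the exit status directly is an alternative, assuming every script in your list honors that convention.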
2 gregg // May 7, 2008 at 7:08 pm