Script to monitor Linux server processes

May 6, 2008 by gregg

Ever had a process die and not known about it until you tried to use it? Last year dovecot kept dying, and only running “service dovecot status” showed it; this script was born to catch that for many processes. It's a work in progress, so please share any ways to make it better. Cron it to run every 15 minutes.

Note: this was written for RHEL systems – make minor adjustments for other systems/daemons
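
A crontab entry along these lines would run the check every 15 minutes. The install path /usr/local/bin/servercheck is an assumption; substitute wherever the script actually lives. Output is discarded so that cron does not send its own mail on top of the messages the script sends.

```
# m    h  dom mon dow  command
*/15   *  *   *   *    /usr/local/bin/servercheck > /dev/null 2>&1
```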

The script is available as a text file here, and the config file (read from /usr/local/servercheck.conf) looks like:

#
email: user@example.com
pager: 4155551212@messaging.sprintpcs.com
# service: commented_out
service: httpd
service: dovecot
service: mysqld
service: postfix
service: sshd
service: MailScanner
service: proftpd
service: syslog
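
As a rough sketch of how a script might read this file, the shell functions below pull the active service entries and a single setting out of a config in this format. The function names list_services and get_setting are made up for illustration and are not part of the original script. Anchoring the match at the start of the line is one way to skip commented-out entries.

```shell
#!/bin/sh
# Illustrative helpers (hypothetical names, not from the original script).

# Print one active service name per line from the config file given as $1.
# Lines such as "# service: commented_out" do not start with "service:",
# so the anchored pattern skips them.
list_services() {
    sed -n 's/^service:[[:space:]]*//p' "$1"
}

# Print the first value for key $1 (for example "email") in config file $2.
get_setting() {
    sed -n "s/^$1:[[:space:]]*//p" "$2" | head -1
}
```

For the config above, list_services /usr/local/servercheck.conf would print httpd, dovecot, mysqld and the rest, one per line, and get_setting email /usr/local/servercheck.conf would print the address on the email line.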


2 Responses

  1. gregg says:

    The Script itself

    #!/bin/ksh
    #
    # Servercheck – that's all it does -
    # Script to check the server health for various processes
    #
    # Written from scratch by Gregg Lain 10/13/2007 gregg@mochabomb.com
    #
    ###########################################################################################################
    #
    # Variables
    #
    email=`grep email /usr/local/servercheck.conf | sed 's/email: //'`
    pager=`grep pager /usr/local/servercheck.conf | sed 's/pager: //'`
    timestamp=`date '+%m-%d-%y-%H:%M:%S'`
    logdir="/var/log"
    tmpfile="$logdir/$timestamp"
    touch $tmpfile
    Server=`hostname`
    #
    ##########################################################################################################
    #
    # Initialize the incident flag – since we run with a cron, this is not in a loop…
    incidentflag=0
    #

    ##########################################################################################################
    #
    # 1. Check out a process function
    #
    for procstatus in `grep service /usr/local/servercheck.conf | egrep -v '^#' | sed 's/service: //'`; do
        status=`/sbin/service $procstatus status`
        echo $status | egrep 'running|OK' 1> /dev/null
        if [ $? -ne 0 ]; then # something is not running
            # incidentflag=$(($incidentflag+1))
            incidentflag=1
            echo "(servercheck) $Server: $procstatus not running: $timestamp " >> $tmpfile
            # options go before the address so mail does not read -s as a recipient
            mail -s "$Server: $procstatus not running" $email < $tmpfile
            /sbin/service $procstatus restart
            restart=$?
            # first matching process ID, used only in the success mail
            servicePID=`ps -ef | grep $procstatus | egrep -v grep | head -1 | awk '{print $2}'`
            if [ $restart -ne 0 ]; then # something cannot be started
                echo "Alert! $Server $procstatus cannot be started" > $tmpfile
                echo "Restart $procstatus via shell or webmin" >> $tmpfile
                mail -s "** Alert ** $Server: $procstatus cannot be started" $email < $tmpfile
            fi
            if [ $restart -eq 0 ]; then # something was restarted
                echo "$Server $procstatus re-started" > $tmpfile
                echo "$procstatus running with PID of $servicePID" >> $tmpfile
                mail -s "$Server $procstatus restarted successfully" $email < $tmpfile
            fi
        fi
    done
    #
    #
    ########################################################################################################
    #
    # 3. Garbage collection
    #
    cat $tmpfile >> /var/log/servercheck
    rm $tmpfile
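
One thing to watch when adapting the check to other systems: the pattern 'running|OK' also matches status text such as "is not running", which some init scripts may print. A minimal sketch of a stricter test is below. The function name looks_running and the list of "down" phrases are assumptions for illustration, not part of the script above, and would need adjusting to whatever each daemon actually prints.

```shell
#!/bin/sh
# Hypothetical helper: decide from status text ($1) whether a service is up.
# Returns 0 if the text looks like "running", 1 otherwise.
looks_running() {
    case "$1" in
        # Phrases that mean "down" are checked first, so that
        # "not running" is not mistaken for "running".
        *"not running"*|*stopped*|*dead*) return 1 ;;
        *running*|*OK*) return 0 ;;
        # Anything unrecognised is treated as not running.
        *) return 1 ;;
    esac
}
```

In the loop this could stand in for the egrep line, for example: if ! looks_running "$status"; then ... fi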
