Creating a watchdog in Linux
Intro
I've created a simple script that monitors whether certain processes are running and, if they aren't, restarts them.
My first use for it is to monitor the two Minecraft servers I run on my server, so a few details are specific to that case. However, they are easy to change to fit other purposes.
The code
#!/bin/bash
#
# watchdog - monitors a process
#

pidfileList[0]="/home/minecraft/pidfile"
pidfileList[1]="/home/minecraft2/pidfile"

startcmd[0]="/etc/init.d/minecraft start"
startcmd[1]="/etc/init.d/minecraft2 start"

logfile=/var/log/watchdog.log
tries=0

umask 022
PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin

# first we delete the log file
# (-f: no error if it doesn't exist yet)
rm -f "$logfile"

# to write a message to the log
function log() {
    now=$(date +"%Y-%m-%d %H:%M:%S")
    echo "$now $1" >> $logfile
} # log

# do a lazy start: the first time it will wait
# 20 minutes to let the system stabilize
log "Waiting 20 minutes to let the system stabilize..."
sleep 20m
log "Recovering..."

while [ true ] ; do
    for(( i = 0; i < 50; i++ )) ; do
        its_ok_to_launch=0
        if [ -n "${pidfileList[i]}" ] ; then
            # get the pidfile value
            pidfile="${pidfileList[i]}"
            log "Checking pidfile $pidfile..."

            # check the existence of this pidfile
            if [ -e "$pidfile" ] ; then
                # get the pid value
                pidvalue=$(cat $pidfile)
                log "The file exists and contains the value $pidvalue"

                # check existence of this pidvalue
                line=$(ps aux | grep $pidvalue | grep minecraft | grep -v grep)
                if [ -z "$line" ] ; then
                    # the process doesn't exist
                    # or the pid number doesn't correspond
                    # to a minecraft server
                    log "There is no process with id: $pidvalue"
                    its_ok_to_launch=1
                fi # -z "$line"
            else
                # if the pidfile doesn't exist,
                # it is correct to launch it
                log "The file doesn't exist"
                its_ok_to_launch=1
            fi # -e $pidfile
        fi # -n pidfileList[i]

        if [ $its_ok_to_launch -eq 1 ] ; then
            tries=$((tries+1))
            if [ $tries -le 6 ]; then
                # attempt to start the process
                # if the maximum number of attempts
                # hasn't been reached
                log "Attempting to start command ${startcmd[i]} (this is time $tries out of 6)"
                ${startcmd[i]}
            else
                # log the give-up message only once,
                # on the first attempt past the limit
                if [ $tries -eq 7 ] ; then
                    log "This is time number $tries, giving up"
                fi # tries -eq 7
            fi # tries -le 6
        fi # its_ok_to_launch -eq 1
    done # for

    log "Sleeping for 10 minutes...."
    sleep 10m
done # true
This is what the program does
First, it waits 20 minutes, to cover the case where this program is configured to run at boot and the monitored processes haven't been started yet:
# do a lazy start: the first time it will wait
# 20 minutes to let the system stabilize
log "Waiting 20 minutes to let the system stabilize..."
sleep 20m
log "Recovering..."
Next, it runs forever, waking up every ten minutes:
while [ true ] ; do
    ....
    log "Sleeping for 10 minutes...."
    sleep 10m
done # true
Inside it, a for loop traverses the array pidfileList:
for(( i = 0; i < 50; i++ )) ; do
    ....
done # for
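The fixed upper bound of 50 works because empty slots are skipped, but bash can also walk only the indexes that are actually set. Here is a minimal sketch of that alternative, reusing the array from the script; the helper name list_pidfiles is hypothetical and only there to show the loop:

```shell
#!/bin/bash
# Sketch: iterate only over the indexes that exist in pidfileList,
# instead of counting from 0 to 49 and skipping empty slots.
pidfileList[0]="/home/minecraft/pidfile"
pidfileList[1]="/home/minecraft2/pidfile"

# list_pidfiles is a hypothetical helper, not part of the original script
list_pidfiles() {
    local i
    # "${!pidfileList[@]}" expands to the list of defined indexes
    for i in "${!pidfileList[@]}" ; do
        echo "$i ${pidfileList[i]}"
    done
}

list_pidfiles
```

With this form there is no limit to remember when more processes are added to the array.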
For every element of the array that is not empty:
its_ok_to_launch=0
if [ -n "${pidfileList[i]}" ] ; then
    ....
fi # -n pidfileList[i]
Now comes the real work. Read the content of the pidfile into the pidvalue variable:
# get the pidfile value
pidfile="${pidfileList[i]}"
log "Checking pidfile $pidfile..."

# check the existence of this pidfile
if [ -e "$pidfile" ] ; then
    # get the pid value
    pidvalue=$(cat $pidfile)
    log "The file exists and contains the value $pidvalue"
    ....
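The cat call trusts whatever is in the pidfile. If the file is empty or holds something that is not a number, the later check gets confusing. A small sketch of a stricter read, assuming the pidfile holds the pid on its first line; read_pid is a hypothetical helper, not part of the script:

```shell
#!/bin/bash
# Sketch: read a pid from a pidfile and reject anything that is not a number.
# read_pid is a hypothetical helper, not part of the original script.
read_pid() {
    local pidfile=$1 pid=
    # the file must exist and be readable
    [ -r "$pidfile" ] || return 1
    # read only the first line; "|| true" tolerates a missing trailing newline
    read -r pid < "$pidfile" || true
    # accept only a plain sequence of digits
    [[ $pid =~ ^[0-9]+$ ]] || return 1
    echo "$pid"
}
```

In the loop, pidvalue=$(read_pid "$pidfile") could stand in for the cat call, and a failure could be handled the same way as a missing pidfile.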
Verify that this pidvalue corresponds to a real, existing process:
        # check existence of this pidvalue
        line=$(ps aux | grep $pidvalue | grep minecraft | grep -v grep)
        if [ -z "$line" ] ; then
            # the process doesn't exist
            # or the pid number doesn't correspond
            # to a minecraft server
            log "There is no process with id: $pidvalue"
            its_ok_to_launch=1
        fi # -z "$line"
    else
        # if the pidfile doesn't exist,
        # it is correct to launch it
        log "The file doesn't exist"
        its_ok_to_launch=1
    fi # -e $pidfile
fi # -n pidfileList[i]
And if the process doesn't exist for whatever reason (the pidfile doesn't exist, or the pid number doesn't correspond to a real process), try to restart the process, up to a limit of six times:
if [ $its_ok_to_launch -eq 1 ] ; then
    tries=$((tries+1))
    if [ $tries -le 6 ]; then
        # attempt to start the process
        # if the maximum number of attempts
        # hasn't been reached
        log "Attempting to start command ${startcmd[i]} (this is time $tries out of 6)"
        ${startcmd[i]}
    else
        # log the give-up message only once,
        # on the first attempt past the limit
        if [ $tries -eq 7 ] ; then
            log "This is time number $tries, giving up"
        fi # tries -eq 7
    fi # tries -le 6
fi # its_ok_to_launch -eq 1
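Note that the script keeps a single tries counter that is shared by all the monitored processes and is never reset, so six failures of one server also use up the attempts of the other. If that matters for you, one possible variation is a counter per process. This is only a sketch; max_tries, tries_for, may_retry and reset_tries are hypothetical names that don't exist in the script:

```shell
#!/bin/bash
# Sketch: one retry counter per monitored process.
# max_tries, tries_for and both helpers are hypothetical names.
max_tries=6
tries_for=()

# Returns 0 (and counts the attempt) while process number $1 still
# has attempts left; returns 1 once the limit has been used up.
may_retry() {
    local i=$1
    local count=${tries_for[i]:-0}
    if [ "$count" -ge "$max_tries" ] ; then
        return 1
    fi
    tries_for[i]=$((count + 1))
    return 0
}

# Call this when process number $1 is found running again.
reset_tries() {
    tries_for[$1]=0
}
```

Both functions have to be called directly, for example "if may_retry $i ; then ${startcmd[i]} ; fi". Inside $( ) they would run in a subshell and the counter update would be lost.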
Configuration
You have to configure the pid files to be monitored (here is my example with the Minecraft servers):
pidfileList[0]="/home/minecraft/pidfile"
pidfileList[1]="/home/minecraft2/pidfile"
The commands that are run to start each process in the event of a failure:
startcmd[0]="/etc/init.d/minecraft start"
startcmd[1]="/etc/init.d/minecraft2 start"
The location of the logfile:
logfile=/var/log/watchdog.log
And, in case you want to use it for other purposes, how each process is identified:
line=$(ps aux | grep $pidvalue | grep minecraft | grep -v grep)
I needed to add this grep minecraft to avoid errors: on some occasions the process had failed and another program had taken over its process number.
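One more thing to be aware of: grep $pidvalue matches the number anywhere in the ps output, so pid 123 also matches a minecraft process with pid 1234. A possible stricter check, sketched here under the assumption of a Linux system with /proc mounted, looks only at the command line of that exact pid. pid_matches is a hypothetical helper, not part of the script:

```shell
#!/bin/bash
# Sketch: check that a pid is alive AND that its command line contains
# an expected word, without matching other pids by accident.
# Assumes Linux with /proc mounted; pid_matches is a hypothetical helper.
pid_matches() {
    local pid=$1 pattern=$2 cmdline
    # only plain numbers are valid pids
    [[ $pid =~ ^[0-9]+$ ]] || return 1
    # /proc/<pid>/cmdline exists only while that exact pid is alive
    [ -r "/proc/$pid/cmdline" ] || return 1
    # the arguments are separated by NUL bytes; turn them into spaces
    cmdline=$(tr '\0' ' ' < "/proc/$pid/cmdline") || return 1
    # the process exists; now check that it is the one we expect
    case $cmdline in
        *"$pattern"*) return 0 ;;
        *) return 1 ;;
    esac
}
```

With it, the test in the loop could become "if ! pid_matches "$pidvalue" minecraft ; then its_ok_to_launch=1 ; fi", and the word minecraft is still the only thing to change for other programs.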
Installation
I start the script from /etc/rc.local and I've put it under /usr/local/sbin, but you can pick whatever directory best suits you.
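As a rough idea of what the entry in /etc/rc.local could look like, here is a sketch. The file name watchdog is an assumption (use whatever name you saved the script under), and the guard around it is just a precaution:

```shell
# Hypothetical excerpt from /etc/rc.local; the file name "watchdog" is assumed.
# The trailing "&" matters: the script loops forever, so without it
# rc.local would never finish.
if [ -x /usr/local/sbin/watchdog ] ; then
    /usr/local/sbin/watchdog &
fi
```

The script needs to be executable (chmod +x) for the -x test to pass.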