I've created a simple script that monitors whether certain processes are running and, if they aren't, restarts them.
My first use for it is to monitor two Minecraft servers I run on my server, so a few details are specific to that case. However, they are easily changed to fit other purposes.

    #!/bin/bash
    #
    # watchdog - monitors a process
    #
    #
    #

    pidfileList[0]="/home/minecraft/pidfile"
    pidfileList[1]="/home/minecraft2/pidfile"
    startcmd[0]="/etc/init.d/minecraft start"
    startcmd[1]="/etc/init.d/minecraft2 start"
    logfile=/var/log/watchdog.log
    tries=0

    umask 022
    PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin

    # first we delete the log file (-f: no error if it doesn't exist yet)
    rm -f "$logfile"

    # to write a message to the log
    function log() {
        now=$(date +"%Y-%m-%d %H:%M:%S")
        echo "$now $1" >> "$logfile"
    } # log

    # do a lazy start: the first time it will wait
    # 20 minutes to let the system stabilize
    log "Waiting 20 minutes to let the system stabilize..."
    sleep 20m
    log "Recovering..."

    while [ true ] ; do
        for (( i = 0; i < 50; i++ )) ; do
            its_ok_to_launch=0
            if [ -n "${pidfileList[i]}" ] ; then
                # get the pidfile value
                pidfile="${pidfileList[i]}"
                log "Checking pidfile $pidfile..."
                # check the existence of this pidfile
                if [ -e "$pidfile" ] ; then
                    # get the pid value
                    pidvalue=$(cat "$pidfile")
                    log "The file exists and contains the value $pidvalue"
                    # check existence of this pidvalue
                    line=$(ps aux | grep $pidvalue | grep minecraft | grep -v grep)
                    if [ -z "$line" ] ; then
                        # the process doesn't exist
                        # or the pid number doesn't correspond
                        # to a minecraft server
                        log "There is no process with id: $pidvalue"
                        its_ok_to_launch=1
                    fi # -z "$line"
                else
                    # if the pidfile doesn't exist,
                    # it is correct to launch it
                    log "The file doesn't exist"
                    its_ok_to_launch=1
                fi # -e $pidfile
            fi # -n pidfileList[i]
            if [ $its_ok_to_launch -eq 1 ] ; then
                tries=$((tries+1))
                if [ $tries -le 6 ]; then
                    # attempt to start the process
                    # if the maximum number of attempts
                    # hasn't been reached
                    log "Attempting to start command ${startcmd[i]} (this is time $tries out of 6)"
                    ${startcmd[i]}
                else
                    # log the give-up message only once, on the
                    # first attempt past the limit
                    if [ $tries -eq 7 ] ; then
                        log "This is time number $tries, giving up"
                    fi # tries -eq 7
                fi # tries -le 6
            fi # its_ok_to_launch -eq 1
        done # for
        log "Sleeping for 10 minutes...."
        sleep 10m
    done # true

First, it waits 20 minutes, to cover the case where this program is configured to run at boot and the monitored processes haven't been started yet:

    # do a lazy start: the first time it will wait
    # 20 minutes to let the system stabilize
    log "Waiting 20 minutes to let the system stabilize..."
    sleep 20m
    log "Recovering..."

Next, it runs forever, waking up every ten minutes:

    while [ true ] ; do
        ....
        log "Sleeping for 10 minutes...."
        sleep 10m
    done # true

Inside it, a for loop traverses the array pidfileList:

    for (( i = 0; i < 50; i++ )) ; do
        ....
    done # for

For every element in the array that is not empty...

    its_ok_to_launch=0
    if [ -n "${pidfileList[i]}" ] ; then
        ....
    fi # -n pidfileList[i]

Now comes the real part. Get the content of the pidfile and put it into the pidvalue variable:

    # get the pidfile value
    pidfile="${pidfileList[i]}"
    log "Checking pidfile $pidfile..."
    # check the existence of this pidfile
    if [ -e "$pidfile" ] ; then
        # get the pid value
        pidvalue=$(cat "$pidfile")
        log "The file exists and contains the value $pidvalue"
        ....

Verify that this pidvalue corresponds to a real, existing process:

            # check existence of this pidvalue
            line=$(ps aux | grep $pidvalue | grep minecraft | grep -v grep)
            if [ -z "$line" ] ; then
                # the process doesn't exist
                # or the pid number doesn't correspond
                # to a minecraft server
                log "There is no process with id: $pidvalue"
                its_ok_to_launch=1
            fi # -z "$line"
        else
            # if the pidfile doesn't exist,
            # it is correct to launch it
            log "The file doesn't exist"
            its_ok_to_launch=1
        fi # -e $pidfile
    fi # -n pidfileList[i]

And if the process doesn't exist for whatever reason (the file doesn't exist, or the pid number doesn't correspond to a real process), try to restart the process up to a limit of six times:

    if [ $its_ok_to_launch -eq 1 ] ; then
        tries=$((tries+1))
        if [ $tries -le 6 ]; then
            # attempt to start the process
            # if the maximum number of attempts
            # hasn't been reached
            log "Attempting to start command ${startcmd[i]} (this is time $tries out of 6)"
            ${startcmd[i]}
        else
            # log the give-up message only once, on the
            # first attempt past the limit
            if [ $tries -eq 7 ] ; then
                log "This is time number $tries, giving up"
            fi # tries -eq 7
        fi # tries -le 6
    fi # its_ok_to_launch -eq 1

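One thing to be aware of: tries is a single counter shared by every monitored process, and it is never reset, so six failures of one server also use up the attempts of the other. If that is not what you want, a possible variation is to keep one counter per entry and clear it when the process is seen running again. This is only a sketch; max_tries, register_failure and register_success are names I made up for the illustration, they are not part of the script above.

```bash
max_tries=6
tries=()   # one counter per monitored process, indexed like pidfileList

# Record a failed check for entry $1.
# Returns 0 while a restart attempt is still allowed, 1 once the limit is exceeded.
register_failure() {
    local i="$1"
    tries[i]=$(( ${tries[i]:-0} + 1 ))
    [ "${tries[i]}" -le "$max_tries" ]
}

# Reset the counter for entry $1 once its process is found running again.
register_success() {
    tries[$1]=0
}
```

In the main loop you would call register_failure "$i" before running ${startcmd[i]}, and register_success "$i" in the branch where the process was found.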
You have to configure the pid files to be monitored (here is my example with the minecraft servers):

    pidfileList[0]="/home/minecraft/pidfile"
    pidfileList[1]="/home/minecraft2/pidfile"

The commands that are run in the event of a failure:

    startcmd[0]="/etc/init.d/minecraft start"
    startcmd[1]="/etc/init.d/minecraft2 start"

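The two arrays are matched by index, and the for loop walks a fixed range of 0 to 49, skipping the empty slots. Bash can also iterate over only the indices that are actually set, which removes the hard-coded 50. A minimal sketch of that alternative follows; list_entries is a hypothetical helper that exists only to show the loop.

```bash
pidfileList=("/home/minecraft/pidfile" "/home/minecraft2/pidfile")
startcmd=("/etc/init.d/minecraft start" "/etc/init.d/minecraft2 start")

# Print every configured entry with its matching start command.
list_entries() {
    local i
    # "${!pidfileList[@]}" expands to the indices that are set (here 0 and 1)
    for i in "${!pidfileList[@]}"; do
        echo "$i: ${pidfileList[i]} -> ${startcmd[i]}"
    done
}

list_entries
```

The same for i in "${!pidfileList[@]}" line could replace the counting loop in the watchdog without changing anything else in its body.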
The location of the logfile:

    logfile=/var/log/watchdog.log

And, in case you want to use it for other purposes, how each process is identified:

    line=$(ps aux | grep $pidvalue | grep minecraft | grep -v grep)

I needed to add this grep minecraft to avoid errors: on some occasions, the process had died and another program had taken over its process number.
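Keep in mind that grep $pidvalue matches any line of the ps output that merely contains those digits (a pid of 123 also matches 1234, or a memory figure), so the check can still be fooled. A stricter variant could look at the command line of that exact pid instead. The following is only a sketch: it assumes Linux with /proc mounted, and is_expected_process is a name I made up for it.

```bash
# Return 0 if process $1 is alive and its command line contains the text $2.
is_expected_process() {
    local pid="$1" pattern="$2" cmdline
    # Reject empty or non-numeric values (for example a corrupted pidfile)
    case "$pid" in
        ''|*[!0-9]*) return 1 ;;
    esac
    # /proc/<pid>/cmdline is only there while the process exists
    [ -r "/proc/$pid/cmdline" ] || return 1
    # The arguments are separated by NUL bytes; turn them into spaces
    cmdline=$(tr '\0' ' ' < "/proc/$pid/cmdline")
    case "$cmdline" in
        *"$pattern"*) return 0 ;;
        *) return 1 ;;
    esac
}
```

In the watchdog it would be used as: if ! is_expected_process "$pidvalue" minecraft ; then its_ok_to_launch=1 ; fi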
I use the file /etc/rc.local to start this script, and I've put it under /usr/local/sbin, but you can pick whatever directory best suits you.
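Since the script never finishes, it has to be sent to the background in /etc/rc.local, otherwise rc.local would wait on it forever. The line could look like this, assuming the script was saved as /usr/local/sbin/watchdog and made executable (the file name is my assumption, taken from the comment at the top of the script):

```bash
# Excerpt from /etc/rc.local
# Start the watchdog in the background so the boot sequence can continue
/usr/local/sbin/watchdog &
```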