linux:watchdoginlinux
Last revision
linux:watchdoginlinux [2016/09/17 17:00] – created rlunaro
 +====== Creating a watchdog in Linux ======
 +
 +===== Intro =====
 +
 +I've written a simple script that checks whether certain processes are running and, if they are not, restarts them. 
 +
 +I wrote it to monitor two Minecraft servers running on my machine, so some details are specific to that case. They are easy to change to fit other purposes. 
 +
 +===== The code =====
 +
 +
 +<code>
 +#!/bin/bash
 +#
 +# watchdog - monitors a process
 +#
 +#
 +#
 +
 +pidfileList[0]="/home/minecraft/pidfile"
 +pidfileList[1]="/home/minecraft2/pidfile"
 +
 +startcmd[0]="/etc/init.d/minecraft start"
 +startcmd[1]="/etc/init.d/minecraft2 start"
 +
 +logfile=/var/log/watchdog.log
 +
 +tries=0
 +
 +umask 022
 +
 +PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
 +
 +
 +# first we delete the log file
 +# (-f: no error if it doesn't exist yet)
 +rm -f "$logfile"
 +
 +# to write a message to the log
 +function log()
 +{
 +    now=$(date +"%Y-%m-%d %H:%M:%S")
 +    echo "$now $1" >> "$logfile"
 +} # log
 +
 +
 +# do a lazy start: the first time it will wait
 +# 20 minutes to let the system stabilize
 +log "Waiting 20 minutes to let the system stabilize..."
 +sleep 20m
 +log "Recovering..."
 +
 +while [ true ] ; do
 +
 +
 +  for(( i = 0; i < 50; i++ )) ; do
 +
 +    its_ok_to_launch=0
 +    if [ -n "${pidfileList[i]}" ] ; then
 +
 +      # get the pidfile value
 +      pidfile="${pidfileList[i]}"
 +      log "Checking pidfile $pidfile..."
 +      # check the existence of this pidfile
 +      if [ -e "$pidfile" ] ; then
 +        # get the pid value
 +        pidvalue=$(cat "$pidfile")
 +        log "The file exists and contains the value $pidvalue"
 +        # check existence of this pidvalue
 +        # (-w matches whole words, so pid 123 doesn't match 1234)
 +        line=$(ps aux | grep -w "$pidvalue" | grep minecraft | grep -v grep)
 +        if [ -z "$line" ] ; then
 +          # the process doesn't exist
 +          # or the pid number doesn't correspond
 +          # to a minecraft server
 +          log "There is no process with id: $pidvalue"
 +          its_ok_to_launch=1
 +        fi # -z "$line"
 +      else
 +        # if the pidfile doesn't exist,
 +        # it is correct to launch it
 +        log "The file doesn't exist"
 +        its_ok_to_launch=1
 +      fi # -e $pidfile
 +
 +    fi # -n pidfileList[i]
 +
 +    if [ $its_ok_to_launch -eq 1 ] ; then
 +
 +      tries=$((tries+1))
 +      if [ $tries -le 6 ]; then
 +        # attempt to start the process
 +        # while the maximum number of
 +        # attempts hasn't been reached
 +        log "Attempting to start command ${startcmd[i]} (this is time $tries out of 6)"
 +        ${startcmd[i]}
 +      else
 +       # this branch is only reached when tries > 6,
 +       # so test for 7 to log the message exactly once
 +       if [ $tries -eq 7 ] ; then
 +         log "This is time number $tries, giving up"
 +       fi # tries -eq 7
 +      fi # tries -le 6
 +
 +    fi # its_ok_to_launch -eq 1
 +
 +  done # for
 +
 +  log "Sleeping for 10 minutes...."
 +  sleep 10m
 +done # true
 +
 +</code>
 +
 +
 +==== This is what the program does ====
 +
 +First, it waits 20 minutes. This covers the case where the script is launched at boot and the monitored processes have not been started yet: 
 +
 +<code>
 +# do a lazy start: the first time it will wait
 +# 20 minutes to let the system stabilize
 +log "Waiting 20 minutes to let the system stabilize..."
 +sleep 20m
 +log "Recovering..."
 +</code>
 +
 +Next, it runs forever, waking up every ten minutes:
 +
 +<code>
 +
 +while [ true ] ; do
 +
 +  ....
 +
 +  log "Sleeping for 10 minutes...."
 +  sleep 10m
 +done # true
 +</code>
 +
 +
 +On each pass, a for loop traverses the array pidfileList. It checks indices 0 to 49, so up to 50 entries are supported: 
 +
 +<code>
 +
 +  for(( i = 0; i < 50; i++ )) ; do
 +
 +  ....
 +
 +  done # for
 +
 +</code>
 +
 +For every element in the array that is not empty....
 +
 +<code>
 +
 +    its_ok_to_launch=0
 +    if [ -n "${pidfileList[i]}" ] ; then
 +
 +    ....
 +
 +    fi # -n pidfileList[i]
 +
 +</code>
 +
 +
 +Now comes the real work. Read the content of the pidfile into the pidvalue variable:
 +
 +<code>
 +
 +      # get the pidfile value
 +      pidfile="${pidfileList[i]}"
 +      log "Checking pidfile $pidfile..."
 +      # check the existence of this pidfile
 +      if [ -e "$pidfile" ] ; then
 +        # get the pid value
 +        pidvalue=$(cat "$pidfile")
 +        log "The file exists and contains the value $pidvalue"
 +        ....
 +
 +</code>
 +
 +
 +Verify that this pidvalue corresponds to a real, existing process: 
 +
 +<code>
 +        # check existence of this pidvalue
 +        # (-w matches whole words, so pid 123 doesn't match 1234)
 +        line=$(ps aux | grep -w "$pidvalue" | grep minecraft | grep -v grep)
 +        if [ -z "$line" ] ; then
 +          # the process doesn't exist
 +          # or the pid number doesn't correspond
 +          # to a minecraft server
 +          log "There is no process with id: $pidvalue"
 +          its_ok_to_launch=1
 +        fi # -z "$line"
 +      else
 +        # if the pidfile doesn't exist,
 +        # it is correct to launch it
 +        log "The file doesn't exist"
 +        its_ok_to_launch=1
 +      fi # -e $pidfile
 +
 +    fi # -n pidfileList[i]
 +    
 +</code>
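
The check above can also be written without ''ps'' and ''grep''. The following is only a sketch under some assumptions: it relies on the Linux ''/proc'' filesystem, and the function name ''pid_matches'' is made up for this example and is not part of the script. It returns success only if the pid is numeric, the process exists, and its command line contains the given text.

```shell
# Hypothetical helper, not part of the original script.
# Usage: pid_matches PID TEXT
# Returns 0 if a process with that PID exists and its command line
# contains TEXT; returns 1 otherwise.
pid_matches() {
    pid=$1
    pattern=$2

    # Reject empty or non-numeric values (for example a corrupted pidfile).
    case "$pid" in
        ''|*[!0-9]*) return 1 ;;
    esac

    # On Linux every running process has a /proc/PID directory.
    [ -r "/proc/$pid/cmdline" ] || return 1

    # Arguments in cmdline are separated by NUL bytes; turn them into spaces.
    cmdline=$(tr '\000' ' ' < "/proc/$pid/cmdline")

    case "$cmdline" in
        *"$pattern"*) return 0 ;;
    esac
    return 1
}
```

In the watchdog it could be used as ''if ! pid_matches "$pidvalue" minecraft ; then its_ok_to_launch=1 ; fi''. Because it looks at one exact pid, it cannot be fooled by a pid that merely contains the same digits.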
 +
 +If the process isn't running for whatever reason (the pidfile doesn't exist, or the pid number doesn't correspond to a real process), try to restart it, up to a limit of six attempts. Note that the counter is shared by all monitored processes and is never reset: 
 +
 +<code>
 +
 +    if [ $its_ok_to_launch -eq 1 ] ; then
 +
 +      tries=$((tries+1))
 +      if [ $tries -le 6 ]; then
 +        # attempt to start the process
 +        # while the maximum number of
 +        # attempts hasn't been reached
 +        log "Attempting to start command ${startcmd[i]} (this is time $tries out of 6)"
 +        ${startcmd[i]}
 +      else
 +       # this branch is only reached when tries > 6,
 +       # so test for 7 to log the message exactly once
 +       if [ $tries -eq 7 ] ; then
 +         log "This is time number $tries, giving up"
 +       fi # tries -eq 7
 +      fi # tries -le 6
 +
 +    fi # its_ok_to_launch -eq 1
 +</code>
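
The retry rule can be isolated in a small function to make the boundaries easy to see. This is a sketch, not the script's own code: the names ''max_tries'' and ''retry_action'' are made up for the example. It assumes the intended behaviour is six start attempts, one "giving up" message on the seventh failure, and silence afterwards.

```shell
# Hypothetical names, chosen for this example only.
max_tries=6

# Print what the watchdog should do for a given attempt number:
#   start    - attempt is within the limit, run the start command
#   give-up  - first attempt past the limit, log it once
#   skip     - already gave up, stay quiet
retry_action() {
    attempt=$1
    if [ "$attempt" -le "$max_tries" ]; then
        echo start
    elif [ "$attempt" -eq $((max_tries + 1)) ]; then
        echo give-up
    else
        echo skip
    fi
}

# Show the decision around the boundary.
for n in 1 6 7 8; do
    echo "$n: $(retry_action "$n")"
done
```

Keeping the limit in one variable means the number only has to be changed in one place.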
 +
 +
 +===== Configuration =====
 +
 +You have to configure the pid files to be monitored (here is my example with the Minecraft servers): 
 +
 +<code>
 +
 +pidfileList[0]="/home/minecraft/pidfile"
 +pidfileList[1]="/home/minecraft2/pidfile"
 +
 +</code>
 +
 +The command to run for each entry in the event of a failure:
 +
 +<code>
 +startcmd[0]="/etc/init.d/minecraft start"
 +startcmd[1]="/etc/init.d/minecraft2 start"
 +</code>
 +
 +The location of the logfile: 
 +
 +<code>
 +logfile=/var/log/watchdog.log
 +</code>
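
A possible variation of the logging part, shown only as a sketch: the log is emptied by truncating it, which works whether or not the file already exists. The ''WATCHDOG_LOG'' variable and the ''/tmp'' path are assumptions made for this example so it can be tried without root; the script itself uses ''/var/log/watchdog.log''.

```shell
# Hypothetical override for this example; the script uses /var/log/watchdog.log.
logfile=${WATCHDOG_LOG:-/tmp/watchdog-example.log}

# Start with an empty log. Truncating creates the file if it is missing
# and empties it if it is already there.
: > "$logfile"

# Append one timestamped message to the log.
log() {
    now=$(date +"%Y-%m-%d %H:%M:%S")
    printf '%s %s\n' "$now" "$1" >> "$logfile"
}

log "watchdog started"
```

The log file is still wiped every time the watchdog starts, as in the original, so it only holds messages from the current run.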
 +
 +And, in case you want to use it for other purposes, how each process is identified: 
 +
 +<code>
 +line=$(ps aux | grep -w "$pidvalue" | grep minecraft | grep -v grep)
 +</code>
 +
 +I had to add this ''grep minecraft'' to avoid errors: on some occasions the server failed and another program later took over its process number. 
 +
 +
 +
 +
 +===== Installation =====
 +
 +I start the script from ''/etc/rc.local'' and keep it under ''/usr/local/sbin'', but you can pick whatever directory suits you best. 
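
As an illustration, the ''/etc/rc.local'' entry could look like the excerpt below. The file name ''watchdog'' is an assumption (the page doesn't say what the script was saved as), and the script must be executable.

```shell
# /etc/rc.local (excerpt)
# Assumes the script was saved as /usr/local/sbin/watchdog.
# The trailing & matters: the watchdog loops forever, so without it
# rc.local would never finish.
/usr/local/sbin/watchdog &

exit 0
```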
 +
 +
  
linux/watchdoginlinux.txt · Last modified: 2022/12/02 22:02 by 127.0.0.1