linux:watchdoginlinux [2016/09/17 17:00] (current)
rlunaro created
====== Creating a watchdog in Linux ======

===== Intro =====

I've written a simple script that checks whether certain processes are running and, if they aren't, starts them again.

I originally wrote it to monitor two Minecraft servers running on my machine, so some details are specific to that case. However, they are easy to change for other purposes.

===== The code =====

<code bash>
#!/bin/bash
#
# watchdog - monitors a list of processes through their pidfiles
# and restarts the ones that are not running
#

pidfileList[0]="/home/minecraft/pidfile"
pidfileList[1]="/home/minecraft2/pidfile"

startcmd[0]="/etc/init.d/minecraft start"
startcmd[1]="/etc/init.d/minecraft2 start"

logfile=/var/log/watchdog.log

# restart attempts, counted separately for each process (index i)
declare -a tries

umask 022

PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin


# first we delete the log file (-f: no error if it doesn't exist yet)
rm -f "$logfile"

# to write a message to the log
function log()
{
    now=$(date +"%Y-%m-%d %H:%M:%S")
    echo "$now $1" >> "$logfile"
} # log


# do a lazy start: the first time it will wait
# 20 minutes to let the system stabilize
log "Waiting 20 minutes to let the system stabilize..."
sleep 20m
log "Recovering..."

while true ; do

  for(( i = 0; i < 50; i++ )) ; do

    its_ok_to_launch=0
    if [ -n "${pidfileList[i]}" ] ; then

      # get the pidfile value
      pidfile="${pidfileList[i]}"
      log "Checking pidfile $pidfile..."
      # check the existence of this pidfile
      if [ -e "$pidfile" ] ; then
        # get the pid value
        pidvalue=$(cat "$pidfile")
        log "The file exists and contains the value $pidvalue"
        # check that this exact pid is running a minecraft command
        # (ps -p matches the whole pid, not a substring of the ps output)
        line=$(ps -p "$pidvalue" -o args= 2>/dev/null | grep minecraft)
        if [ -z "$line" ] ; then
          # the process doesn't exist
          # or the pid number doesn't correspond
          # to a minecraft server
          log "There is no process with id: $pidvalue"
          its_ok_to_launch=1
        fi # -z "$line"
      else
        # if the pidfile doesn't exist,
        # it is correct to launch it
        log "The file doesn't exist"
        its_ok_to_launch=1
      fi # -e $pidfile

    fi # -n pidfileList[i]

    if [ $its_ok_to_launch -eq 1 ] ; then

      tries[i]=$(( tries[i] + 1 ))
      if [ "${tries[i]}" -le 6 ] ; then
        # attempt to start the process
        # if the maximum number of attempts
        # hasn't been reached yet
        log "Attempting to start command ${startcmd[i]} (this is time ${tries[i]} out of 6)"
        # left unquoted on purpose: the shell splits
        # "/etc/init.d/minecraft start" into the script and its argument
        ${startcmd[i]}
      elif [ "${tries[i]}" -eq 7 ] ; then
        # log the give-up message only once
        log "Giving up on ${startcmd[i]} after 6 attempts"
      fi # tries[i]

    fi # its_ok_to_launch -eq 1

  done # for

  log "Sleeping for 10 minutes...."
  sleep 10m
done # true
</code>

==== This is what the program does ====

First, it waits 20 minutes. This covers the case where the script is launched while the server is booting and the monitored processes have not been started yet:

<code bash>
# do a lazy start: the first time it will wait
# 20 minutes to let the system stabilize
log "Waiting 20 minutes to let the system stabilize..."
sleep 20m
log "Recovering..."
</code>

Next, it runs forever, waking up every ten minutes:

<code bash>
while true ; do

  ....

  log "Sleeping for 10 minutes...."
  sleep 10m
done # true
</code>


Inside that loop, a for loop walks through the first 50 positions of the array pidfileList:

<code bash>
  for(( i = 0; i < 50; i++ )) ; do

  ....

  done # for
</code>

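The fixed bound of 50 only has to be larger than the number of configured entries. Positions that were never assigned expand to an empty string and are skipped by the ''-n'' test.

As an alternative that the script above does not use, bash can list only the indices that are actually set with ''${!pidfileList[@]}''. A minimal sketch follows; ''configured_indices'' is a hypothetical helper name:

<code bash>
#!/bin/bash
# Sketch: iterate only over the indices that are set in pidfileList,
# instead of counting from 0 to 49.

pidfileList[0]="/home/minecraft/pidfile"
pidfileList[1]="/home/minecraft2/pidfile"

# hypothetical helper: print the set indices, in ascending order
configured_indices()
{
    local i
    for i in "${!pidfileList[@]}" ; do
        echo "$i"
    done
}

for i in $(configured_indices) ; do
    echo "checking index $i: ${pidfileList[i]}"
done
</code>

In the watchdog itself, the loop header could become ''for i in "${!pidfileList[@]}" ; do'', with the body left unchanged.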
For every element of the array that is not empty...

<code bash>
    its_ok_to_launch=0
    if [ -n "${pidfileList[i]}" ] ; then

    ....

    fi # -n pidfileList[i]
</code>


Then comes the real work. First, the content of the pidfile is read into the variable pidvalue:

<code bash>
      # get the pidfile value
      pidfile="${pidfileList[i]}"
      log "Checking pidfile $pidfile..."
      # check the existence of this pidfile
      if [ -e "$pidfile" ] ; then
        # get the pid value
        pidvalue=$(cat "$pidfile")
        log "The file exists and contains the value $pidvalue"
        ....
</code>


Then it verifies that this pidvalue belongs to a real, running Minecraft process:

<code bash>
        # check that this exact pid is running a minecraft command
        # (ps -p matches the whole pid, not a substring of the ps output)
        line=$(ps -p "$pidvalue" -o args= 2>/dev/null | grep minecraft)
        if [ -z "$line" ] ; then
          # the process doesn't exist
          # or the pid number doesn't correspond
          # to a minecraft server
          log "There is no process with id: $pidvalue"
          its_ok_to_launch=1
        fi # -z "$line"
      else
        # if the pidfile doesn't exist,
        # it is correct to launch it
        log "The file doesn't exist"
        its_ok_to_launch=1
      fi # -e $pidfile

    fi # -n pidfileList[i]
</code>
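The same decision can also be made by reading the process's command line directly from ''/proc'' instead of parsing ''ps'' output. This is an alternative sketch, not the method used above. It is Linux-specific, and ''pidfile_alive'' is a hypothetical helper name.

The helper succeeds only when all of these hold:
  * the pidfile exists;
  * it holds a numeric pid;
  * that pid is running a command containing the given pattern.

<code bash>
#!/bin/bash
# Sketch (Linux only): succeed if the pidfile exists, contains a numeric
# pid, and /proc shows that pid running a command that contains $2.
# pidfile_alive is a hypothetical helper, not part of the original script.

pidfile_alive()
{
    local pidfile="$1" pattern="$2" pid

    [ -r "$pidfile" ] || return 1              # no pidfile: treat as dead
    pid=$(< "$pidfile")
    case "$pid" in
        ''|*[!0-9]*) return 1 ;;               # empty or not a number
    esac
    [ -r "/proc/$pid/cmdline" ] || return 1    # no such process
    # cmdline separates the arguments with NUL bytes; turn them into spaces
    tr '\0' ' ' < "/proc/$pid/cmdline" | grep -q -- "$pattern"
}

# usage with the configuration of this page
if pidfile_alive /home/minecraft/pidfile minecraft ; then
    echo "minecraft is running"
else
    echo "minecraft needs to be restarted"
fi
</code>

Inside the loop, this would reduce the whole check to ''pidfile_alive "$pidfile" minecraft || its_ok_to_launch=1''. A zombie process has an empty ''cmdline'', so it is also reported as not running.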

If the process isn't running for whatever reason (the pidfile doesn't exist, or the pid number doesn't correspond to a running Minecraft process), the watchdog runs the start command, up to a limit of six attempts per process:

<code bash>
    if [ $its_ok_to_launch -eq 1 ] ; then

      # one attempt counter per monitored process
      tries[i]=$(( tries[i] + 1 ))
      if [ "${tries[i]}" -le 6 ] ; then
        # attempt to start the process
        # if the maximum number of attempts
        # hasn't been reached yet
        log "Attempting to start command ${startcmd[i]} (this is time ${tries[i]} out of 6)"
        ${startcmd[i]}
      elif [ "${tries[i]}" -eq 7 ] ; then
        # log the give-up message only once
        log "Giving up on ${startcmd[i]} after 6 attempts"
      fi # tries[i]

    fi # its_ok_to_launch -eq 1
</code>
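The attempt counter in the script is never reset. Once the limit has been used up, the watchdog leaves that process alone until the watchdog itself is restarted.

If you would rather have the counter start again once the process has been seen running, one possible variation keeps a counter per array index and clears it on every successful check. A sketch follows; ''note_running'', ''note_down'' and ''maxtries'' are hypothetical names:

<code bash>
#!/bin/bash
# Sketch: per-process restart counters that are reset once the
# process is seen running again. All names here are hypothetical.

maxtries=6
declare -a tries

# call when the process at index $1 was found running
note_running()
{
    tries[$1]=0
}

# call when the process at index $1 is down; succeeds if a restart
# should be attempted, fails once the limit has been used up
note_down()
{
    local i="$1"
    tries[i]=$(( ${tries[i]:-0} + 1 ))
    [ "${tries[i]}" -le "$maxtries" ]
}

# demo: index 0 is found down seven times in a row, then comes back
for n in 1 2 3 4 5 6 7 ; do
    if note_down 0 ; then
        echo "check $n: restart attempt ${tries[0]}"
    else
        echo "check $n: giving up"
    fi
done
note_running 0
echo "counter after recovery: ${tries[0]}"
</code>

In the watchdog, ''note_running $i'' would go where a process is found alive, and ''note_down $i'' would decide whether to run ''${startcmd[i]}''. There is a trade-off: a server that keeps crashing, but stays up long enough to be seen running between crashes, would be restarted indefinitely. A fixed limit prevents exactly that.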

===== Configuration =====

You have to configure the pidfiles to be monitored (here is my example with the Minecraft servers):

<code bash>
pidfileList[0]="/home/minecraft/pidfile"
pidfileList[1]="/home/minecraft2/pidfile"
</code>

The commands that are run in the event of a failure:

<code bash>
startcmd[0]="/etc/init.d/minecraft start"
startcmd[1]="/etc/init.d/minecraft2 start"
</code>
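The two arrays are paired by index. When the process listed in ''pidfileList[i]'' is not running, the watchdog runs ''startcmd[i]'', so each new process needs an entry with the same index in both arrays.

A small check, run once before the main loop, can catch an index that was filled in one array but forgotten in the other. This is a sketch; ''check_config'' is a hypothetical helper name:

<code bash>
#!/bin/bash
# Sketch: verify that every pidfile has a start command with the same index.
# check_config is a hypothetical helper, not part of the original script.

pidfileList[0]="/home/minecraft/pidfile"
pidfileList[1]="/home/minecraft2/pidfile"

startcmd[0]="/etc/init.d/minecraft start"
startcmd[1]="/etc/init.d/minecraft2 start"

check_config()
{
    local i status=0
    for i in "${!pidfileList[@]}" ; do
        if [ -z "${startcmd[i]}" ] ; then
            echo "pidfileList[$i] has no matching startcmd[$i]" >&2
            status=1
        fi
    done
    return $status
}

# refuse to start with an inconsistent configuration
check_config || exit 1
</code>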

The location of the logfile:

<code bash>
logfile=/var/log/watchdog.log
</code>

And, in case you want to use it for other purposes, this is how each process is identified:

<code bash>
line=$(ps -p "$pidvalue" -o args= 2>/dev/null | grep minecraft)
</code>

I had to add this ''grep minecraft'' to avoid errors. On some occasions the process died, and another program started later and reused the same process number. To monitor a different program, replace ''minecraft'' with a word that appears in that program's command line.

===== Installation =====

I start the script from ''/etc/rc.local'' and keep it under ''/usr/local/sbin'', but you can pick whatever directory suits you best.

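Because the script never returns, it has to be launched in the background from ''/etc/rc.local''. Otherwise the rest of that boot script would wait for it forever.

The sketch below makes two assumptions:
  * the script was saved as ''watchdog'' in the current directory;
  * your ''/etc/rc.local'' ends with ''exit 0'', as the Debian default does.

<code bash>
# as root: copy the script and make it executable
install -m 755 watchdog /usr/local/sbin/watchdog

# line to add to /etc/rc.local, before the final "exit 0";
# the trailing & runs the endless loop in the background
/usr/local/sbin/watchdog > /dev/null 2>&1 &
</code>

Calling it by its full path avoids accidentally starting a different program with the same name, such as the system ''watchdog'' daemon if that package is installed.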
  
linux/watchdoginlinux.txt · Last modified: 2016/09/17 17:00 by rlunaro