Wednesday, January 26, 2011

collectd-mod-exec Part 4

Part 1
Part 2
Part 3

There is only one thing left to explain: how to get notified when some values go out of range. As an example, I'm going to send notification emails when the CPU temperature exceeds 57 degrees Celsius. This could be implemented with a single additional if statement, and that would work perfectly if I had only one script that sends notifications. It would become cumbersome if I had to edit multiple scripts to change, say, the email address my notifications are sent to. In other words, the collecting and notification logic should be separated. To do this, one can use the 'PUTNOTIF' statement, which is written to the script's STDOUT exactly the same way 'PUTVAL' is. The 'PUTNOTIF' arguments are documented on the collectd-exec module documentation page. Additionally, some logic is needed to hold back repeated notifications for a timeout period; otherwise an email would be sent every $COLLECTD_INTERVAL, which is 30 seconds by default. The updated tmpcollect.sh script is shown below:
root@Alix:~# cat /mnt/sd/bin/tmpcollect.sh
#!/bin/sh

logger "*** tmpcollect script is started ***"

HOST=$COLLECTD_HOSTNAME
INTERVAL=$COLLECTD_INTERVAL
CRITEMP=57

send_notification() {
  local NOTIFTOUT=300 # 300 sec
  local timenow=$(date +%s)
  [ -z "$lastnotif" ] && lastnotif=0
  if [ $(( $timenow - $lastnotif  )) -gt $NOTIFTOUT ]; then
    echo "PUTNOTIF type=temperature host=$HOST plugin=tmpcollect severity=warning time=$timenow message=$1"
    lastnotif=$timenow
  fi
}

[ -z "$INTERVAL" ] && INTERVAL=5
INTERVAL=$(awk -v i=$INTERVAL 'BEGIN{print int(i)}')
while sleep $INTERVAL; do
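  # all temperatures reported by sensors, joined into one space-separated line
  values=$(echo `sensors | sed -r -n 's/temp[0-9][^0-9]*([^ ]*).*/\1/p'`)
  # drop the first value; what remains is the CPU temperature
  cputemp=${values#*\ }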
  echo "PUTVAL \"$HOST/exec-temperature_sensors/sensors\" interval=$INTERVAL N:${cputemp}:${values%\ *}"
  [ ${cputemp%.*} -gt $CRITEMP ] && {
    send_notification "CPU temperature exceeds critical value and is currently: $cputemp"
  }
done
 
The send_notification function is generic and can be stored in a separate file to be sourced by multiple scripts.
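
A minimal sketch of such a shared file follows. The file name notiflib.sh and the extra arguments (plugin, type, severity) are my assumptions for illustration; the 'PUTNOTIF' line itself has the same layout as in tmpcollect.sh above.

```shell
# notiflib.sh (hypothetical name): shared helper for collecting scripts.
# A script would load it with:  . /mnt/sd/bin/notiflib.sh

NOTIFTOUT=${NOTIFTOUT:-300}   # minimum number of seconds between two notifications
lastnotif=0                   # time of the last notification sent by the sourcing script

# send_notification PLUGIN TYPE SEVERITY MESSAGE
# Prints a PUTNOTIF line unless one was printed less than NOTIFTOUT seconds ago.
send_notification() {
  timenow=$(date +%s)
  if [ $(( timenow - lastnotif )) -gt "$NOTIFTOUT" ]; then
    echo "PUTNOTIF type=$2 host=${COLLECTD_HOSTNAME:-localhost} plugin=$1 severity=$3 time=$timenow message=$4"
    lastnotif=$timenow
  fi
}
```

With this in place, the call in tmpcollect.sh would become something like send_notification tmpcollect temperature warning "CPU temperature exceeds critical value and is currently: $cputemp". Each script keeps its own lastnotif, so the timeout is applied per script, not globally.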

Notifications collected by the collectd daemon are redirected to the STDIN of all scripts or applications registered to receive notifications. Such scripts are registered on the 'Statistics->Collectd->System Plugins->Exec' page:

[Image]

The script I'm registering is very simple - it reads from its STDIN and sends a notification to a specified email address:
root@Alix:~# cat /mnt/sd/bin/notifyemail.sh
#!/bin/sh

sendto="some.address@anydomain.com"
header="Subject: Collectd notification\nFrom:Alix\nTo:${sendto}\n\n"
echo -e "${header}`cat -`" | ssmtp "${sendto}" 2>>/tmp/ssmtp.error


The received email looks like this:
Severity: WARNING
Time: 1295477902
Host: Alix
Plugin: tmpcollect
Type: temperature

CPU temperature exceeds critical value and is currently: 58.2
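
Since the email body is simply what the script read from its STDIN, the notification script can be tried by hand without waiting for the CPU to heat up. The sketch below prints a notification in the layout shown above; the function name sample_notification is made up for this example, and I'm assuming collectd sends nothing besides these header lines, an empty line and the message.

```shell
# sample_notification [MESSAGE]
# Prints a notification in the layout shown in the email above:
# header lines, an empty line, then the message text.
sample_notification() {
  printf 'Severity: WARNING\n'
  printf 'Time: %s\n' "$(date +%s)"
  printf 'Host: %s\n' "${COLLECTD_HOSTNAME:-Alix}"
  printf 'Plugin: tmpcollect\n'
  printf 'Type: temperature\n'
  printf '\n'
  printf '%s\n' "${1:-CPU temperature exceeds critical value and is currently: 58.2}"
}

# Usage (this sends a real email through ssmtp):
#   sample_notification | /mnt/sd/bin/notifyemail.sh
```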

The 'Time' field is shown as seconds since the epoch because collectd-exec does not convert 'PUTNOTIF' parameters internally. To convert the Time field into a more readable form, the last line of the notifyemail.sh script could be changed to this:
eval echo -e "\"${header}$(cat - | sed -r 's/Time: ([0-9]+)/Time:\ `date -d \1 -D %s \"+%Y:%m:%d %T\"`/')\"" \
  | ssmtp "${sendto}" 2>>/tmp/ssmtp.error
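
The one-liner above relies on eval and on heavy quoting, which makes it hard to read and lets the shell interpret the notification text. A possible alternative is to rewrite the Time line in a small loop. This is only a sketch: the function name format_notification is made up, and it assumes that date accepts the "@SECONDS" form (GNU date does, and as far as I know recent BusyBox builds do too), not the -D option used above.

```shell
# format_notification: copy a notification from STDIN to STDOUT,
# replacing the epoch value in the "Time:" header with a readable date.
format_notification() {
  while IFS= read -r line || [ -n "$line" ]; do
    case $line in
      "Time: "[0-9]*)
        epoch=${line#Time: }
        # Assumption: date understands "@SECONDS"; if it fails, the line is left unchanged
        readable=$(date -d "@$epoch" "+%Y:%m:%d %T" 2>/dev/null) && line="Time: $readable"
        ;;
    esac
    printf '%s\n' "$line"
  done
}

# Usage as the last line of notifyemail.sh:
#   { printf '%b' "$header"; format_notification; } | ssmtp "$sendto" 2>>/tmp/ssmtp.error
```

The date is printed in the local time zone of the router, in the same format as in the original command.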

Note: the collectd-exec module will not start user-defined scripts with root credentials, so you have to make sure that the resources your scripts request are accessible to the user (or group) that collectd uses to start the script (the default is nobody).


In the next part I'll post a real-world example of collectd-exec usage: a weather info collecting script.
