October 2012

Identify the process activities while server load is high

Recently I had an issue with one of my Nagios hosts: it showed a huge load value, i.e. 167 (unbelievable), within 3 minutes early in the morning IST, and then went offline. The server comes back on its own once the load drops, within about an hour.

My investigation suggests that the system time had been automatically set back by two hours, which caused all the cron jobs to start running together, producing the huge server load and the platform downtime. Unfortunately I am not 100% sure of this assumption, so I want to record the process activity whenever the load gets high. That way we can investigate the root cause with more specific values.
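The clock-jump assumption can be checked separately. A minimal sketch, assuming a Linux host: the boot time derived as "now minus uptime" stays constant between runs unless the wall clock is stepped, because `/proc/uptime` does not follow the wall clock. The function names and the 60-second tolerance are hypothetical, chosen only for illustration.

```shell
# Sketch: detect wall-clock steps by comparing the derived boot time between runs.
# Function names and the default tolerance are assumptions, not from the original script.

boot_epoch() {
    # $1 = current epoch seconds, $2 = uptime in seconds (fraction allowed)
    local up_int=${2%%.*}          # drop the fractional part of the uptime
    echo $(( $1 - up_int ))
}

clock_jump() {
    # $1 = boot epoch saved by the previous run, $2 = boot epoch now,
    # $3 = tolerance in seconds (default 60, to ignore normal NTP drift)
    # Prints the signed difference; succeeds only if it exceeds the tolerance.
    local diff=$(( $2 - $1 ))
    local abs=${diff#-}            # absolute value: strip a leading minus sign
    echo "$diff"
    [ "$abs" -gt "${3:-60}" ]
}
```

Calling `boot_epoch "$(date +%s)" "$(cut -d' ' -f1 /proc/uptime)"` on every cron run, saving the result to a file, and passing the saved and current values to `clock_jump` would flag a two-hour step back as a difference of about -7200.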

Google led me to a script here:

http://jackal777.wordpress.com/2012/08/08/linux-bash-script-to-record-all-process-during-high-load/

This script logs the process activity when the system load gets high and creates a dated log file on the server, which helps to identify the issue in later analysis. I modified the script to also send the process details by email to the concerned persons.

#!/bin/bash
# Create the report directory if it does not exist yet
mkdir -p /tmp/pinfo

# Field 3 of /proc/loadavg is the 15-minute load average (field 1 is the 1-minute one).
# Trigger when it exceeds the number of CPUs and no other run holds the lock.
if [[ $(awk '{print int($3)}' /proc/loadavg) -gt $(grep -c '^processor' /proc/cpuinfo) && ! -f /tmp/pinfo/.lock ]]; then
   touch /tmp/pinfo/.lock
   # Record all processes
   filename="/tmp/pinfo/$(date +%F-%H-%M-%S)"
   echo -e "############## Process List ############\n\n" >> "${filename}"
   # Append (>>) so the header above is not overwritten
   top -b -n 1 >> "${filename}"
   echo -e "\n" >> "${filename}"
   echo -e "####### Memory Usage  #########" >> "${filename}"
   free -m >> "${filename}"
   echo -e "\n" >> "${filename}"
   echo -e "####### Disk Usage  #########" >> "${filename}"
   df -h >> "${filename}"
   rm -f /tmp/pinfo/.lock
   # Send alert email
   sendEmail -f admins@mydomain.net -t admins@mydomain.com greg@mydomain.com -cc 24x7@mydomain.com -u "Report : MyServer: process list - CPU Load is high $(date +%d-%m-%Y)" -l /var/log/sendEmail -o message-content-type=auto -o message-file="${filename}" -s smtp.mailserver.com:25 -xu user1 -xp d3WtBhY7Y3z
fi

This script can be downloaded from here
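The load check at the top of the script can be tried on its own. Here is a minimal sketch of the same comparison (integer load average against CPU count) wrapped in a function. The function name and its arguments are hypothetical; they are only there so the check can be pointed at sample files instead of the live /proc entries.

```shell
# Sketch: succeed (exit 0) when the chosen load average exceeds the CPU count.
# The function name and all three arguments are illustrative assumptions:
#   $1 = loadavg field: 1, 2 or 3 for the 1-, 5- and 15-minute average (default 3)
#   $2 = loadavg file (default /proc/loadavg)
#   $3 = cpuinfo file (default /proc/cpuinfo)
load_is_high() {
    local field="${1:-3}"
    local loadavg_file="${2:-/proc/loadavg}"
    local cpuinfo_file="${3:-/proc/cpuinfo}"
    local load cpus

    # Integer part of the selected load average
    load=$(awk -v f="$field" '{print int($f)}' "$loadavg_file")
    # Number of logical CPUs: one "processor" line per CPU
    cpus=$(grep -c '^processor' "$cpuinfo_file")

    [ "$load" -gt "$cpus" ]
}
```

With no arguments, `load_is_high && echo "load is high"` makes the same decision as the script. Passing `1` as the first argument would react to the 1-minute average instead, which rises much faster during a sudden spike.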

Please note that I use “sendEmail”, a third-party program that sends email through an SMTP account, so that we can ensure the delivery of such critical emails. You may download it from http://caspian.dotconf.net/menu/Software/SendEmail/. Just download it and copy it to the “/usr/bin” folder.
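Since the alert depends on that binary, it may be worth confirming it is on the PATH before relying on the cron job. A small sketch; the helper name `require_cmd` is hypothetical and not part of sendEmail or the original script.

```shell
# Sketch: report and fail if a required command cannot be found on PATH.
# "require_cmd" is an illustrative helper name.
require_cmd() {
    if ! command -v "$1" >/dev/null 2>&1; then
        echo "missing required command: $1" >&2
        return 1
    fi
}
```

For example, `require_cmd sendEmail || exit 1` near the top of the script would stop early with a clear message instead of failing silently under cron.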

Cron settings

 */5 * * * * /bin/bash /home/installation/scripts/check_load_process.sh > /dev/null 2>&1
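With a run every five minutes, a long period of high load leaves one report file per run in /tmp/pinfo. A minimal cleanup sketch, assuming a 7-day retention (only an example value) and a hypothetical function name:

```shell
# Sketch: delete report files that have not been modified for more than N days.
# The function name and the 7-day default are assumptions for illustration.
#   $1 = report directory (default /tmp/pinfo)
#   $2 = retention in days (default 7)
prune_reports() {
    local dir="${1:-/tmp/pinfo}"
    local days="${2:-7}"
    # -mtime +N matches files last modified more than N whole days ago
    find "$dir" -type f -mtime +"$days" -exec rm -f {} +
}
```

This could be called at the end of the script, or from a separate daily cron entry.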

Great! All done. Now you can say something specific whenever the server gets busy 🙂
