Disk Error Alerts

From Amahi Wiki
Revision as of 21:06, 23 March 2011 by Bigfoot65 (talk | contribs)

Finding out that a disk is bad only after it crashes can be disastrous, so an early warning of a failing disk is something we would all appreciate. This can be done with smartmontools, a free software package that can monitor S.M.A.R.T. attributes and run hard drive self-tests. S.M.A.R.T. may give you enough warning to safely back up all your data before the hard drive dies. Nothing replaces regular backups, but a warning is far better than knowing nothing!

Footnote: Special thanks to NeverSimple for documenting this process.

First, Sendmail is off by default on an Amahi install. You will need to enable it, so run the following as the root user:

bash code
service sendmail start
chkconfig sendmail on

If you prefer to have alerts sent to an email address outside your HDA, try one of the following tutorials:


smartmontools may already be installed on your system. If not, install it as the root user:

bash code
yum -y install smartmontools


smartmontools comes with two programs: smartctl, which is meant for interactive use, and smartd, which continuously monitors S.M.A.R.T. attributes in the background.

You can do a quick test to see if it recognizes your drives (replace /dev/sda with the drive(s) present on your system):

bash code
smartctl -i /dev/sda
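
If the server has several drives, the same check can be looped over them. The following is a minimal sketch and not part of the original instructions: the function name check_drives and the SMARTCTL override variable are made up for illustration, and the device names are examples only.

```bash
#!/usr/bin/env bash
# Hypothetical helper: run "smartctl -i" on every device passed as an argument.
# SMARTCTL can be overridden (for example with "echo") to do a dry run.
SMARTCTL="${SMARTCTL:-smartctl}"

check_drives() {
    local dev
    for dev in "$@"; do
        echo "== $dev =="
        # Keep going if one drive cannot be queried.
        "$SMARTCTL" -i "$dev" || echo "smartctl could not query $dev" >&2
    done
}

# Example (run as root, adjust the device names to your system):
# check_drives /dev/sda /dev/sdb
```

Setting SMARTCTL=echo before calling check_drives prints the commands' arguments without touching the drives.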


To set up smartd to monitor your system automatically, edit the file /etc/smartd.conf and look for a line that begins with DEVICESCAN. Comment it out by adding a '#' to the beginning of the line, something like this:

Text
#DEVICESCAN -H -m root -n standby,10,q
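
The same edit can be scripted with sed instead of a text editor. This is only a sketch: the helper name comment_out_devicescan is hypothetical, and it works as a filter so that the original file can be kept as a backup.

```bash
#!/usr/bin/env bash
# Hypothetical helper: read a smartd.conf on stdin and write it to stdout
# with any active DEVICESCAN line commented out. Lines that are already
# commented, and all other lines, pass through unchanged.
comment_out_devicescan() {
    sed 's/^\([[:space:]]*\)DEVICESCAN/\1#DEVICESCAN/'
}

# Example (run as root); keeps a backup copy of the original file:
# cp /etc/smartd.conf /etc/smartd.conf.bak
# comment_out_devicescan < /etc/smartd.conf.bak > /etc/smartd.conf
```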


Add the following line to /etc/smartd.conf:

Text
/dev/sda -n standby -a -I 194 -W 6,45,55 -R 5 -M daily -M test -m root

Here is what each part of this example line means:

'/dev/sda' is the drive you want to monitor.
'-n standby' skips the check when the drive is 'sleeping' or in 'standby', so polling does not wake it up.
'-a' turns on the most common monitoring options. You probably want this.
'-I 194' ignores changes in the normalized value of attribute 194 (temperature), but...
'-W 6,45,55' tracks temperature changes >= 6 Celsius, reports temperatures >= 45 Celsius, and sends mail when the temperature is >= 55 Celsius.
'-R 5' reports changes in the raw value of attribute 5 (Reallocated Sector Count).
'-M daily' repeats the warning email daily while a problem persists. (The default is to send only one warning email for each type of disk problem.)
'-M test' sends a single test email immediately upon smartd startup. This lets you verify that email is delivered correctly.
'-m root' sends warning emails to the address root (you can replace it with any email address, provided your HDA can send mail to it).
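
To make the two temperature limits of '-W 6,45,55' concrete, here is a small illustrative sketch. smartd does this check itself; the function classify_temp is hypothetical and only mimics the second and third numbers of the -W setting (the first number, the change that triggers a report, is not modelled).

```bash
#!/usr/bin/env bash
# Hypothetical illustration of the INFO and CRIT limits of -W.
# Defaults match the example line: INFO = 45, CRIT = 55 (degrees Celsius).
classify_temp() {
    local temp=$1 info=${2:-45} crit=${3:-55}
    if [ "$temp" -ge "$crit" ]; then
        echo "critical"   # logged as critical; warning email sent if -m is set
    elif [ "$temp" -ge "$info" ]; then
        echo "info"       # logged as an informational message only
    else
        echo "ok"         # below both limits
    fi
}

# Example:
# classify_temp 47
```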

You'll need a line like that for every drive in the server that you want to monitor. Check the man page for smartd to see all the available options; there are a lot of them.
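
For a server with several drives, the per-drive lines can be generated instead of typed by hand. This is a minimal sketch that reuses the options from the example line; the function name smartd_lines and the variable SMARTD_OPTS are made up for illustration, and the device names are examples only.

```bash
#!/usr/bin/env bash
# Options copied from the example line; adjust them to taste.
SMARTD_OPTS="-n standby -a -I 194 -W 6,45,55 -R 5 -M daily -M test -m root"

# Hypothetical helper: print one smartd.conf line per device argument.
smartd_lines() {
    local dev
    for dev in "$@"; do
        printf '%s %s\n' "$dev" "$SMARTD_OPTS"
    done
}

# Example: review the output first, then append it as root:
# smartd_lines /dev/sda /dev/sdb
# smartd_lines /dev/sda /dev/sdb >> /etc/smartd.conf
```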

Start the daemon with:

bash code
service smartd start


To have smartd start automatically after a reboot:

bash code
chkconfig smartd on


You can read local mail sent to root using Webmin.

NOTE: You will receive a test email each day or so, one for each drive you have set up for monitoring.