Disk Error Alerts
Finding out a disk is bad only after it crashes can prove disastrous, so being warned of a failing disk is something we all would appreciate. This can be done via smartmontools, a free software package that can monitor S.M.A.R.T. attributes and run hard drive self-tests. S.M.A.R.T. may give you enough of a warning that you can safely back up all your data before your hard drive dies. Nothing replaces regular backups, but an early warning is far better than knowing nothing!
Footnote: Special thanks to NeverSimple for documenting this process.
First, Sendmail is off by default when Amahi is installed. You will need to enable it, so run the following as the root user:
bash code |
---|
service sendmail start
chkconfig sendmail on
|
If you prefer to have alerts sent to an email address outside your HDA, try one of the following tutorials:
smartmontools may already be installed on your system. If not, run the following as the root user:
bash code |
---|
yum -y install smartmontools
|
smartmontools comes with two programs: smartctl, which is meant for interactive use, and smartd, which continuously monitors S.M.A.R.T. status in the background.
You can do a quick test to see if it recognizes your drives (replace /dev/sda with the drive(s) present on your system):
bash code |
---|
smartctl -i /dev/sda
|
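Beyond the identity check, smartctl can also print the drive's overall health verdict with `smartctl -H`, an option not used elsewhere on this page. The following is a minimal sketch, assuming the usual output lines (`SMART overall-health self-assessment test result: PASSED` for ATA drives, `SMART Health Status: OK` for SCSI drives). The function name `smart_health_ok` is hypothetical.

```shell
# Hypothetical helper: reads `smartctl -H` output on stdin and succeeds only
# if the overall health line reports PASSED (ATA) or OK (SCSI).
smart_health_ok() {
    grep -Eq 'self-assessment test result: PASSED|SMART Health Status: OK'
}

# Example (run as root; replace /dev/sda with your own drive):
#   smartctl -H /dev/sda | smart_health_ok && echo "sda looks healthy"
```

Reading from stdin keeps the check separate from the smartctl call, so the same function can be tried on saved output first.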
To set up smartd to monitor your system automatically, edit the file /etc/smartd.conf and check for a line that begins with DEVICESCAN. Comment it out by adding a '#' to the beginning of the line, so that it looks something like this:
Text |
---|
#DEVICESCAN -H -m root -n standby,10,q
|
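If you would rather not edit the file by hand, the same change can be scripted. This is only a sketch: the function name `comment_out_devicescan` is hypothetical, and it assumes the active line starts with DEVICESCAN (optionally indented). A copy of the original file is kept with a `.bak` suffix.

```shell
# Hypothetical helper: comment out any active DEVICESCAN line in the given
# smartd configuration file, keeping a .bak copy of the original.
comment_out_devicescan() {
    conf="$1"
    cp "$conf" "$conf.bak" || return 1
    # Prefix lines that start with DEVICESCAN (optionally indented) with '#'.
    # Lines that are already commented out do not match and stay unchanged.
    sed 's/^[[:space:]]*DEVICESCAN/#&/' "$conf.bak" > "$conf"
}

# Example (as root):
#   comment_out_devicescan /etc/smartd.conf
```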
Add the following line to /etc/smartd.conf:
Text |
---|
/dev/sda -n standby -a -I 194 -W 6,45,55 -R 5 -M daily -M test -m root
|
Here is what each part of that example line means:
'/dev/sda' is the drive you want to monitor.
'-n standby' will not wake up the drive to poll it for status if it is 'sleeping' or in 'standby'.
'-a' turns on the most common options. You probably want this.
'-I 194' don't report changes in the normalized temperature attribute (194); temperature is tracked by '-W' instead.
'-W 6,45,55' track temperature changes >= 6 Celsius, report temperatures >= 45 Celsius, and send mail when the temperature is >= 55 Celsius.
'-R 5' report changes in the raw value of attribute 5, the Reallocated Sector Count.
'-M daily' send reports daily. (The default is to send only one warning email for each type of disk problem.)
'-M test' send a single test email immediately upon smartd startup. This allows you to verify that email is delivered correctly.
'-m root' send a warning email to the email address root (you can replace that with any email address, provided you can send mail from your HDA).
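To make the last two numbers of `-W 6,45,55` concrete, here is a small illustration of how a temperature reading compares against the two limits. It is not smartd's own code, just a sketch of the thresholds as described above, and the function name `classify_temp` is hypothetical.

```shell
# Illustration only (not smartd code): classify a temperature reading against
# the second and third values of '-W', e.g. 45 and 55 from '-W 6,45,55'.
classify_temp() {
    temp="$1"; info="$2"; crit="$3"
    if [ "$temp" -ge "$crit" ]; then
        echo "critical"   # at or above the third value: mail is sent
    elif [ "$temp" -ge "$info" ]; then
        echo "info"       # at or above the second value: reported only
    else
        echo "ok"
    fi
}

classify_temp 50 45 55   # prints: info
```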
You'll need a line like that for every drive in the server you want to monitor. Check the man page for smartd to see all the available options. There are a lot of them.
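For a server with several drives, the per-drive lines can be generated rather than typed. A minimal sketch, assuming you want the same options as the example line for every drive; the function name `smartd_lines` and the drive names in the usage comment are hypothetical.

```shell
# Hypothetical helper: print one smartd.conf line per drive given as an
# argument, reusing the options from the example line on this page.
smartd_lines() {
    for drive in "$@"; do
        printf '%s -n standby -a -I 194 -W 6,45,55 -R 5 -M daily -M test -m root\n' "$drive"
    done
}

# Example (as root; adjust the drive list to your system):
#   smartd_lines /dev/sda /dev/sdb >> /etc/smartd.conf
```

Review the printed lines before appending them, since drives with different temperature ranges may need different `-W` values.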
Start the daemon with:
bash code |
---|
service smartd start
|
To have smartd start automatically after a reboot:
bash code |
---|
chkconfig smartd on
|
You can read local mail sent to root using Webmin.
NOTE: You will receive a test email each day or so, one for each drive you have configured for monitoring.