Disk Error Alerts
Someone in the forums posted this, and I thought it was worth including here for others. The package 'smartmontools' (http://smartmontools.sourceforge.net/) was already present and set up on my system but wasn't running. smartmontools is a free software package that can monitor S.M.A.R.T. attributes and run hard drive self-tests. Basically, S.M.A.R.T. may give you enough warning that you can safely back up all your data before your hard drive dies. Obviously, nothing replaces regular backups, but it's certainly better than knowing nothing!
If it isn't installed on your system, you can install it from a terminal as root:
bash code |
---|
yum -y install smartmontools
|
smartmontools comes with two programs: smartctl, which is meant for interactive use, and smartd, which continuously monitors S.M.A.R.T. attributes.
You can do a quick test to see if it recognizes your drives (replace /dev/sda with the drive(s) present on your system):
bash code |
---|
smartctl -i /dev/sda
|
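If you have several drives, the same check can be looped over all of them. The following is only a minimal sketch: the function name check_drives and the SMARTCTL override variable are made up for illustration, and the device names are placeholders for whatever is present on your system. It uses the -i (identity) and -H (overall health) options of smartctl.

```shell
#!/bin/sh
# SMARTCTL is a hypothetical override (not part of smartmontools) so the
# loop can be dry-run with a stand-in command; it defaults to smartctl.
SMARTCTL="${SMARTCTL:-smartctl}"

# check_drives DEVICE...
# Print identity info and the overall health verdict for each drive given.
check_drives() {
    for dev in "$@"; do
        echo "== $dev =="
        # -i prints model, serial number and whether S.M.A.R.T. is supported
        "$SMARTCTL" -i "$dev" || echo "smartctl could not query $dev"
        # -H prints the drive's overall health self-assessment
        "$SMARTCTL" -H "$dev"
    done
}

# Example (run as root, adjust the device names):
#   check_drives /dev/sda /dev/sdb
#
# A short self-test can be started and reviewed later with:
#   smartctl -t short /dev/sda
#   smartctl -l selftest /dev/sda
```

Nothing runs until you call check_drives yourself, so the file can be sourced safely before deciding which drives to pass in.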
To set up smartd to monitor your system automatically, edit the file /etc/smartd.conf and check for a line that begins with DEVICESCAN. Comment it out by adding a '#' to the beginning of the line, something like this:
Text |
---|
#DEVICESCAN -H -m root -n standby,10,q
|
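If you would rather not open an editor, the same change can probably be made with sed. This is a sketch under a few assumptions: the function name comment_out_devicescan is made up for illustration, the DEVICESCAN line is assumed to start in the first column, and the default path is the /etc/smartd.conf mentioned above. A copy of the original file is kept with a .bak suffix.

```shell
#!/bin/sh
# comment_out_devicescan [FILE]
# Prefix any line that starts with DEVICESCAN with '#'.
# FILE defaults to /etc/smartd.conf; the original is saved as FILE.bak.
comment_out_devicescan() {
    conf="${1:-/etc/smartd.conf}"
    # -i.bak edits in place and keeps a backup next to the file
    sed -i.bak 's/^DEVICESCAN/#DEVICESCAN/' "$conf"
}

# Example (run as root):
#   comment_out_devicescan /etc/smartd.conf
```

Lines that are already commented out do not match the pattern, so running it twice should not add a second '#'.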
Add the following line to /etc/smartd.conf:
Text |
---|
/dev/sda -n standby -a -I 194 -W 6,45,55 -R 5 -M daily -M test -m root
|
This is an example from the config file. Its parts mean the following:
'/dev/sda' is the drive you want to monitor.
'-n standby' will not wake up the drive to poll it for status if it is 'sleeping' or in 'standby'.
'-a' enables the most common options. You probably want this.
'-I 194' don't monitor changes in the normalized temperature attribute (194), but...
'-W 6,45,55' track temperature changes >= 6 Celsius, report temperatures >= 45 Celsius, and send mail when the temperature is >= 55 Celsius.
'-R 5' report changes in the raw value of attribute 5, the Reallocated Sector Count.
'-M daily' send reports daily. (The default is to send only one warning email for each type of disk problem.)
'-M test' send a single test email immediately upon smartd startup. This allows you to verify that email is delivered correctly.
'-m root' send a warning email to the email address root (you can replace that with any email address, provided you can send mail with your HDA).
You'll need a line like that for every drive in the server you want to monitor. I recommend checking the man page for smartd to see all the available options. There are a lot of them.
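Because every drive needs its own line, a small helper can print them for you to review and paste into /etc/smartd.conf. This is just a sketch: the function name smartd_lines is made up for illustration, the device names are placeholders, and the options are copied from the example line above, so adjust them to taste.

```shell
#!/bin/sh
# smartd_lines DEVICE...
# Print one smartd.conf line per drive, using the options from the
# example above. Output goes to stdout only; nothing is written to disk.
smartd_lines() {
    for dev in "$@"; do
        printf '%s -n standby -a -I 194 -W 6,45,55 -R 5 -M daily -M test -m root\n' "$dev"
    done
}

# Example: review the output first, then append it as root:
#   smartd_lines /dev/sda /dev/sdb
#   smartd_lines /dev/sda /dev/sdb >> /etc/smartd.conf
```

Printing to the terminal first makes it easy to check the lines before they end up in the config file.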
Start the daemon with:
bash code |
---|
service smartd start
|
To have the daemon start automatically after a reboot:
bash code |
---|
chkconfig smartd on
|