SUSE RAID monitor

4/15/2023

We run into this quite a bit at a Fortune 50 client: a large Dell environment, iDRACs for RCS connections, and lots of DMZs. Set up a Dell OME server, which runs on Windows, and install the appropriate OMSA software client on all machines. In our DMZs, security won't let us monitor the OMSA software directly, so we use the iDRAC interface with SNMP monitoring back to the Dell OME server.

Note that we ran into a recent issue you should be aware of. iDRAC 1.95 or above was required on all our R710s (11th-generation servers) for OME to work properly for us. This involved updating the BIOS to 6.3.0 first, then the Lifecycle Controller, and then the iDRAC to 1.95+ (1.96 being the latest), in that order. Later we found that both the network firmware and the PERC firmware were out of date and didn't correctly identify bad drives until it was too late (usually during a scheduled reboot). That PERC update resolved a lot of issues for us where drives were not being detected properly or were being marked bad when they weren't. So when we deployed the environment, we opted to apply every update that OME detected in order to get a clean report. Not an easy task in a large environment, but eventually it resolved our issue.

The key here for all the upgrades had to do with the Lifecycle Controller, which is much more robust in the 12th generation of servers (R620s, R720s, etc.). We could have saved ourselves a lot of headaches by just opening the firewall ports to the OMSA software on the 11th-gen servers, but the 12th-gen servers really didn't need any BIOS updates or OMSA software to report errors properly.

Note that in my environment I mostly manage RHEL, so having the OMSA software also opened the door to command-line monitoring (i.e. omreport chassis or omreport storage pdisk controller=0 to see the hardware status of the drives). We also have central syslog monitoring set up, and when a hardware issue is detected it logs the error to /var/log/messages. So even if we didn't have Dell OME installed, we would have a good idea if something was about to go bad, assuming our firmware levels were reasonably up to date.

Dell OME now becomes your central hardware monitoring server, and OMSA is just the client-side tools. Is it required to manage your environment? No, but it sure does make your life much easier, and it gives you central firmware version reports. You can also go so far as to have it set up automatic ticket creation with Dell Tech Support, but we've been unable to get that feature working yet.

What you essentially want is an email saying: this disk is dying and needs replacing, or a rebuild has finished, or anything other than "everything is fine." The PERC cards have an NVRAM log onboard, accessible with the megacli command. (I've tested this with an H700 and an H810; older cards may not work. I only discovered the history recently.) Manually, this is:

Get number of logs:

    server:~# MegaCli64 -AdpEventLog -GetEventlogInfo -aALL

    Adapter #0

    Newest Seqnum: 0x0000749e
    Oldest Seqnum: 0x00000001
    Clear Seqnum: 0x000003b4
    Shutdown Seqnum: 0x00006cfa
    Reboot Seqnum: 0x00006dc1

    Success in AdpEventLog

    Exit Code: 0x00
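As a rough sketch of how the sequence numbers in the MegaCli64 -AdpEventLog -GetEventlogInfo output quoted in this post could feed an "anything other than everything is fine" alert, the Python below parses that text and counts how many events have been logged since a previously seen sequence number. The function names parse_eventlog_info and events_since are made up for illustration, and the code assumes a single adapter and exactly the output layout shown in this post. Other MegaCli versions or cards may print something different.

```python
import re

# Sample text copied from the GetEventlogInfo output shown in this post.
SAMPLE = """\
Adapter #0

Newest Seqnum: 0x0000749e
Oldest Seqnum: 0x00000001
Clear Seqnum: 0x000003b4
Shutdown Seqnum: 0x00006cfa
Reboot Seqnum: 0x00006dc1

Success in AdpEventLog

Exit Code: 0x00
"""

# Matches lines such as "Newest Seqnum: 0x0000749e".
SEQNUM_RE = re.compile(
    r"^[ \t]*(\w+) Seqnum:[ \t]*(0x[0-9a-fA-F]+)[ \t]*$", re.MULTILINE
)


def parse_eventlog_info(text):
    """Return the Seqnum fields as integers, keyed by lowercased label.

    Hypothetical helper: assumes one adapter; with several adapters the
    later values would overwrite the earlier ones.
    """
    return {name.lower(): int(value, 16) for name, value in SEQNUM_RE.findall(text)}


def events_since(info, last_seen):
    """Number of events logged after the sequence number `last_seen`."""
    return info["newest"] - last_seen


info = parse_eventlog_info(SAMPLE)
print(info["newest"])                       # 29854
print(events_since(info, info["reboot"]))   # 1757 events since the last reboot
```

In a cron job, last_seen would be the newest sequence number stored by the previous run. A result greater than zero means there are new events worth fetching and mailing, and zero means nothing has changed since the last check.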
