Thursday, October 31, 2019

Equallogic dirty cache

I used to manage a large number of Equallogic 6510 arrays.  They were bought for production use because there was not enough budget for enterprise storage.  It is just an entry-level array with very limited redundancy.  The controllers are active / standby, and a failover takes 21 s even when the array is completely idle.  So, you can imagine how long it takes under heavy I/O.

Updating firmware is a pain because the controller failover takes so long, and Linux and Unix hosts do not tolerate that well.  If you use the array for production, it will be really hard to get downtime for it.
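To make the failover concern concrete, here is a minimal sketch of how one might check which Linux disks would time out during a controller failover. It assumes a Linux host where each SCSI disk exposes its command timeout, in seconds, at /sys/block/<dev>/device/timeout. The 21 s figure is the idle failover time mentioned above; the load factor of 3 is purely an illustrative guess, not a measured value, and the function names are hypothetical.

```python
from pathlib import Path

IDLE_FAILOVER_S = 21  # idle failover time quoted in this post


def read_scsi_timeouts(sys_block="/sys/block"):
    """Return {device: timeout_seconds} for disks that expose a SCSI timeout."""
    timeouts = {}
    for dev in sorted(Path(sys_block).glob("sd*")):
        timeout_file = dev / "device" / "timeout"
        try:
            timeouts[dev.name] = int(timeout_file.read_text().strip())
        except (OSError, ValueError):
            continue  # not a SCSI disk, or unreadable; skip it
    return timeouts


def devices_at_risk(timeouts, failover_s=IDLE_FAILOVER_S, load_factor=3.0):
    """List devices whose timeout is shorter than the assumed failover time.

    load_factor is an assumption: a guess at how much longer a failover
    takes under load than when the array is idle.
    """
    worst_case = failover_s * load_factor
    return sorted(dev for dev, t in timeouts.items() if t < worst_case)


if __name__ == "__main__":
    for dev in devices_at_risk(read_scsi_timeouts()):
        print(f"{dev}: timeout shorter than assumed worst-case failover")
```

On iSCSI hosts the real tolerance also depends on the initiator and multipath settings, which this sketch ignores, so treat the result as a first hint rather than a verdict.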

One time, a bug caused both controllers to panic.  After the array started back up, the management interface was not reachable.  I checked the serial connection and saw the following.  Even though the controllers had panicked, it still displayed the message "This is a POWER FAILURE RECOVERY".  It also showed a RAID LUN as not recoverable.  Looking further down, it appeared that we actually had dirty cache stuck in memory.  This normally happens after a power failure.


When I tried to log in from the serial port, it showed the array as not even configured.  Obviously, answer No when asked to configure the array.


I contacted support at that point, and it was indeed a dirty cache problem.  Support confirmed that when dirty cache is stuck, the user loses management interface access.  Also, only later models of EQL support port failover to the standby controller.  On the others, if both interfaces on the active controller die, they will not fail over to the standby controller's interfaces, and you will need to do a manual failover.  That's why I don't suggest them for production.

We cleared the cache and rebooted the array.  Everything was normal from that point.  The only thing you lose is the data in the stuck cache.  Luckily, there was no database running on those arrays, and the uncorrectable sectors were in empty space.

Earlier versions of the firmware, especially versions 5 and 6, were problematic and had lots of problems.  After the latest patch of version 7 was installed, the arrays were stable from that point.  By then, however, new enterprise arrays had been installed, and these units were used for backup / archiving.  Now they have all been retired.