Thursday, July 25, 2024

Isilon upgrade from 9.1.0.x to 9.4.0.14 to fix NFSv4 lock issue

We discovered an NFS lock issue in OneFS 8.x with MQ. Our MQ message queue stores reside on an Isilon NFS share. When an Isilon node reboots for maintenance, the NFS locks do not fail over correctly to the remaining Isilon nodes. This causes MQ to hang, and sometimes the server must be rebooted to recover. Isilon released multiple fixes for this issue, and it was finally resolved in 9.4.0.3. We upgraded to 9.4.0.14 and confirmed the issue is fixed. If MQ runs on Solaris, please confirm that Solaris is at least 11.4 SRU 69.
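Before and after an upgrade like this, it helps to check each system against the minimum versions named above (OneFS 9.4.0.3 for the lock fix, Solaris 11.4 SRU 69 on the MQ side). The sketch below is a minimal illustration, assuming you have already collected the OneFS release string and the Solaris release and SRU number by other means. The function and constant names are hypothetical and not part of any Isilon or Solaris tool.

```python
# Hypothetical helper: compares already-collected version strings against the
# minimum releases mentioned in this post. Gathering the strings is out of scope.

ONEFS_MIN = "9.4.0.3"   # OneFS release where the NFSv4 lock issue is fixed
SOLARIS_MIN = (11, 4)   # minimum Solaris release
SOLARIS_MIN_SRU = 69    # minimum SRU when running exactly Solaris 11.4


def parse_version(text: str) -> tuple[int, ...]:
    """Turn a dotted version such as '9.4.0.14' into (9, 4, 0, 14)."""
    return tuple(int(part) for part in text.split("."))


def onefs_has_lock_fix(release: str) -> bool:
    """True if the OneFS release is at or past the fix release."""
    return parse_version(release) >= parse_version(ONEFS_MIN)


def solaris_ok(release: str, sru: int) -> bool:
    """True if Solaris is newer than 11.4, or is 11.4 at SRU 69 or later."""
    rel = parse_version(release)
    if rel != SOLARIS_MIN:
        return rel > SOLARIS_MIN
    return sru >= SOLARIS_MIN_SRU


if __name__ == "__main__":
    print(onefs_has_lock_fix("9.4.0.14"))  # True
    print(onefs_has_lock_fix("9.1.0.5"))   # False
    print(solaris_ok("11.4", 69))          # True
```

Versions are compared as integer tuples rather than strings because a plain string comparison would sort "9.4.0.14" before "9.4.0.3" and wrongly report the newer release as older.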

After the upgrade to 9.4.0.14, some settings were reset:

1) The spillover pool was reset (set to Anywhere).

2) The SNMP node restriction was reset. We only allow SNMP alerts to be sent from the NDMP nodes (the A200 nodes in our environment), so I had to reconfigure it.

A new problem after the upgrade: random NDMP backup failures caused by a memory leak

The inline data feature, introduced in 9.3, is enabled by default. About a month after upgrading to 9.4, we started seeing random NDMP backup failures caused by a memory leak. With help from support, we eventually turned off inline data. As a workaround, we then rebooted all the nodes running NDMP backups, one at a time, to clear the dedupe cache. A permanent fix will be available in 9.7.1.x.