Saturday, April 19, 2025

DCNM login error "RequestSendFailed: EJBCLIENT000409" for the SAN Client

We plan to build a new NDFC deployment to manage our MDS switches.  It is more complex and will take some time to deploy; a few of the challenges are resource requirements, license transfer and other new requirements.  In the meantime, we continue to use the existing DCNM 11.5(4).

Last week, the Java SAN client suddenly started failing to log in with the error "RequestSendFailed: EJBCLIENT000409".  We restarted the services and rebooted the DCNM server, but it did not help.





After some research, the most likely cause is a certificate problem.  We do not use our own certificate, so the default one most likely expired.  This was confirmed by checking the certificate expiration date from the web browser.  We opened a ticket with support and asked them to renew the certificate for two more years.  After that, I can log in without issue.
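For future reference, the certificate expiry can also be checked from a command line instead of the browser.  This is a minimal sketch, assuming openssl is available on a client machine; the DCNM hostname below is a placeholder.

# print the validity dates of the certificate presented by the DCNM server (hostname is a placeholder)
echo | openssl s_client -connect dcnm.example.com:443 2>/dev/null | openssl x509 -noout -dates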

Thursday, April 17, 2025

Migration from VMAX 40k to PMAX 8000 and 8500

We completed the storage migration from VMAX to PMAX 8000 and 8500 just before Christmas.  NDM was not an option because it no longer supports Solaris.  Besides, there are a number of busy databases running in both test and production; before cutover to PMAX, the SRDF directors between VMAX and PMAX would be the bottleneck, and we do not have any spare directors left to configure for SRDF traffic.

Our plan is to use Storage vMotion for all VMDKs.  The RDMs are for SQL databases running on ESX; for those, we use SRDF/S to start replication to PMAX in the background.  Because SRDF is not supported from VMAX to PMAX 8500, all LUNs migrated with SRDF are replicated to the PMAX 8000.  The app owners then pick a downtime window to shut down the apps and servers.  We stop the sync after confirming there are no outstanding tracks, remove the VMAX LUNs from the initiator group and then add the PMAX LUNs.

For Solaris, if it is an Oracle DB, LUNs of the same size or larger are added to Oracle, then the DBAs complete the rebalancing and drop the old VMAX LUNs.  For the boot LUNs and LUNs from other apps, some of our Unix admins use SRDF/S to migrate them to PMAX 8000; others decide to do host-based migration, in which case we just provide a LUN of the same size or larger from the PMAX 8500.

Below are the general steps for the storage migration from VMAX to PMAX 8000 using SRDF/S; a rough symcli sketch follows the note after the list.  Please check your environment and test to see if additional steps are required.  If a volume manager is used, the migration should be completed with the volume manager rather than SRDF.

1) Change the source LUN attribute on the VMAX to dyn_rdf.
2) Set up the SRDF/S pairs from VMAX to PowerMax 8000 (put the target LUNs in a temporary target_SG).
3) During the downtime, shut down the apps and servers.
4) Confirm there are no outstanding tracks.
5) Perform the SRDF split.
6) Remove the source LUNs from the VMAX storage group.
7) Add the target LUNs to the SG on the PowerMax and remove them from the temporary target_SG.
8) Host team completes the LUN mapping.
9) Delete the SRDF pairs with the force option.
10) Unset the GCM bit if required (symdev -sid xxx -devs xxx unset -gcm).
11) Host team can perform a rescan if step 10 is required (they should see about 1 MB more space on the LUNs from step 10).
12) Power up the servers to validate.

That way, they can always go back to the VMAX LUNs if a backout is required.

Note: in the past, if the source/target LUNs were not mapped to FE ports, we sometimes saw strange results with some of the SRDF operations.  So, I create a temporary target_SG whose masking view uses an IG with no HBAs in it.
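For reference, below is a rough symcli sketch of the SRDF/S steps above.  The SID, RDF group number, device range and pair file are placeholders, and the exact syntax should be verified against your Solutions Enabler version.

# step 1: make the source devices dynamic RDF capable (device range is an example)
symconfigure -sid 1234 -cmd "set dev 00A1:00A4 attribute=dyn_rdf;" commit
# step 2: create and establish the SRDF/S pairs from a device pair file
symrdf -sid 1234 -rdfg 10 -file device_pairs.txt createpair -type R1 -establish
# step 4: check for outstanding (invalid) tracks before the split
symrdf -sid 1234 -rdfg 10 -file device_pairs.txt query
# step 5: split the pairs during the downtime window
symrdf -sid 1234 -rdfg 10 -file device_pairs.txt split
# step 9: delete the pairs once the cutover is confirmed
symrdf -sid 1234 -rdfg 10 -file device_pairs.txt deletepair -force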



Isilon NDMP backup failed after NetWorker upgraded to 19.11

We need to update the existing Isilon to OneFS 9.7.1.x in order to add new Isilon nodes, as recommended by support.  Before we do that, we need to confirm NDMP backup compatibility.  Because we back up the Isilon over NDMP with NetWorker, a NetWorker upgrade from 19.10.0.4 to 19.11 is required.  After the NetWorker upgrade in February, some of the NDMP backups failed with the error message "Hostname resolution failed".  Retrying multiple times works, but it is very annoying.

After working with support, it turns out there is a new change in NetWorker 19.11.  See the KB: NetWorker: server upgraded to 19.11, backup fails reporting "Hostname resolution failed" | Dell US

None of the workarounds in the KB above fix the issue on Isilon.  The forward DNS lookups for Isilon are actually delegated by the DNS server to the SmartConnect SIP.  The only option is to add reverse entries on the DNS server for all the Isilon nodes that handle NDMP backup.  Refer to: SmartConnect and Reverse DNS | Dell PowerScale: Network Design Considerations | Dell Technologies Info Hub.

So, pointer (PTR) records for all Isilon nodes handling NDMP backup were created on the DNS server for the NDMP zone name.  Once that was done, all backups were fine.  Keep in mind there is no change to the forward lookup, which is still handled by the SmartConnect SIP.  Not sure if there is a fix now.
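As a sanity check after the PTR records are added, name resolution can be verified from the NetWorker server.  This is a hedged example with placeholder names and IPs.

# forward lookup of the NDMP zone name, answered via the SmartConnect SIP delegation
dig +short isilon-ndmp.example.com
# reverse lookup of one backup-handling node; it should return the PTR record that was created
dig +short -x 10.10.10.11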

Tuesday, April 15, 2025

Update Isilon to 9.7.1.4 from 9.4.0.14 to add new A300, H700 and F710 nodes

The main reason for the upgrade in January is to replace the existing A200, H500 and F800 nodes with new A300, H700 and F710 nodes.  Support recommends updating the existing nodes to 9.7.1.4 before adding the new Isilon nodes to the pool.

Things went smoothly during the upgrade, and so far there are no issues.  Right after the upgrade, I had to reconfigure the following settings.

1) The SNMP node restriction got reset.  Manually select nodes 1-4 again (in our environment, we only allow SNMP traps from the A200 nodes; the A200 nodes handle only NDMP backup and the backup VLAN is the only VLAN on those nodes).  A hedged CLI example is shown after this list.






2) SNMP v3 is a new feature and is selected automatically in the SNMP alert channel.  I manually selected SNMP v2 for our environment.













3) The SMTP setting is reset back to manual (nothing is required since it is populated with the same SMTP info).
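For item 1, the node restriction can also be set from the CLI.  This is only a sketch under assumptions: the alert channel name (snmp_traps) is a placeholder, and the --allowed-nodes flag and its value format should be confirmed with "isi event channels modify --help" on your OneFS version.

# limit the SNMP alert channel to nodes 1-4 (channel name and flag format are assumptions)
isi event channels modify snmp_traps --allowed-nodes 1,2,3,4
isi event channels view snmp_traps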

New features

1) IPv6 support is new and enabled by default.

2) There is a new Transfer Limit feature, set at 90%, for the spillover pool.  I guess it will not fill a pool beyond 90% when another pool with utilization below 90% is available.

3) SupportAssist will be required for SCG in future OneFS releases.

4) Firewall



Wednesday, December 25, 2024

Create partition with Linux

Sometimes, for some odd reason, there is an issue deleting and re-creating the partition on a USB disk, and I have to use Linux to complete the task.  Again, this will erase everything on the USB disk.  Assume the device handle for the USB jump drive is /dev/sdb.  (A scripted alternative follows the steps below.)

1) run "sudo fdisk /dev/sdb"
2) type "o" to create new dos partition table then hit enter
3) type "n" to create a new partition then hit enter
4) hit enter if you are ok with default settings
5) type "t" then "7" to create a exFAT partition
6) type "w" to save the change then "q" to exit.  
7) run "sudo mkfs.exfat -n "label" /dev/sdb1




Thursday, July 25, 2024

Isilon upgrade from 9.1.0.x to 9.4.0.14 to fix NFSv4 lock issue

We discovered an NFS lock issue in OneFS 8.x with MQ.  We run MQ with the message queue stores on an Isilon NFS share.  When an Isilon node reboots for maintenance, the NFS lock does not fail over correctly to the remaining Isilon nodes, which leaves MQ hung; sometimes the server requires a reboot to fix it.  There were multiple Isilon fixes for this, and it is eventually fixed in 9.4.0.3.  We updated to 9.4.0.14 and confirmed the issue is fixed.  If MQ is running on Solaris, please confirm Solaris is running at least 11.4 SRU69.

After the upgrade to 9.4.0.14, some settings were reset.

1) The spillover pool setting is reset (set back to Anywhere).

2) The SNMP node restriction is reset (we only allow SNMP alerts from the NDMP nodes, which are the A200s in our environment).  I need to reconfigure it.

A new problem comes up after the upgrade: random NDMP backups fail due to a memory leak.

The inline data reduction feature was introduced in 9.3 and is enabled by default.  About a month after the upgrade to 9.4, random NDMP backups started failing due to a memory leak.  Eventually, we turned off inline data reduction with support's assistance, then rebooted all the nodes running NDMP backup one by one to clear the dedupe cache and work around the issue.  A permanent fix will be available in 9.7.1.x.

Tuesday, October 24, 2023

Setup VLAN with TP-LINK WR1043ND v1.6 (LibreCMC)

As mentioned before, I have flashed LibreCMC on a TP-LINK WR1043ND v1.6 with WiFi disabled; it is my backup router.  Because my ISP supports 2 IPs and the new cable modem comes with 2 Ethernet ports, I have one port connected to my DIR-882 (running Padavan firmware), serving the main floor and my bedroom on the 2nd floor with WiFi enabled.  The 2nd port is now connected to the TP-Link WR1043 in the study room.  Now, I want to create a separate VLAN for the miner so it is separated from my test equipment.  So, I update LibreCMC to 1.5.14.  The upgrade requires "Keep Settings" to be unchecked; please check the link below before proceeding.  Since it is a backup router, it does not matter to me if the settings are wiped.

Releases - Gogs (librecmc.org)

There are not many guides on the internet for LibreCMC.  Since it is based on OpenWrt 19.x, I checked the YouTube guides below.

How to Create a VLAN - A Beginner's Guide // OpenWrt Router (Up to 19.x) - YouTube

How to configure OpenWrt as Firewall for your home network and Guest Wifi and IPTables explained - YouTube

The first step is to go to Network > Switch.  You will see VLAN 1 and 2 populated already.  Now, add a new VLAN entry for VLAN 5.  I pick LAN port 3 for VLAN 5, which will be used by the miner.  VLAN 1 will be used for the test equipment (LAN ports 1, 2 and 4).




The second step is to go to Network > Interfaces.  Click "Add new interface" to add a new interface named VLAN_5.  Below are the rest of the settings.
Protocol: Static address
For my setup, "Create a bridge over multiple interfaces" is unchecked since WiFi is disabled.
Cover the following interface: select eth0.5 (automatically created when VLAN 5 was created in the previous step)



Then, go to Network > Interfaces again.  Click on the VLAN_5 interface and apply the settings below.
Protocol: Static address
Bring up on boot: checked
IPv4 address: the gateway address for the VLAN 5 subnet
IPv4 subnet mask: select the subnet mask

Then scroll down to the DHCP Server section.
Select the range for DHCP client IP addresses and the lease time.

Click on Advanced Settings under the DHCP section and make sure Dynamic DHCP is checked.

Now, click on the Physical Settings tab.  In my case, the bridge option is unchecked since WiFi is turned off, and the correct interface, eth0.5, is selected for VLAN 5.
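For reference, the same switch, interface and DHCP settings can be expressed in the UCI config files.  This is only a sketch: the switch device name and port numbering depend on the hardware (verify with "swconfig list" and "swconfig dev <name> show"), and the subnet and DHCP range are placeholders.

# /etc/config/network (sketch)
config switch_vlan
        option device 'rtl8366rb'      # switch name as reported by swconfig on the WR1043ND v1 (assumption)
        option vlan '5'
        option ports '2 5t'            # LAN port 3 untagged plus tagged CPU port; numbering is an example only

config interface 'VLAN_5'
        option proto 'static'
        option ifname 'eth0.5'
        option ipaddr '192.168.5.1'    # placeholder gateway address for the VLAN 5 subnet
        option netmask '255.255.255.0'

# /etc/config/dhcp (sketch)
config dhcp 'VLAN_5'
        option interface 'VLAN_5'
        option start '100'
        option limit '50'
        option leasetime '12h'
        option dynamicdhcp '1'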


Next, go to the Firewall Settings tab and create a new firewall zone called VLAN5 for VLAN 5.


The last step is to configure the firewall.  For VLAN_5, I do not want the miner to reach the router other than for DHCP and DNS resolution, but it will have access to the internet.  Also, it cannot reach VLAN 1.

So, set up the VLAN5 zone to allow forwarding from VLAN5 to WAN for internet access.  Input and Forward are set to reject and Output is set to accept.


To allow the client (the miner) in VLAN_5 to reach the router for DHCP and DNS, set up a rule to accept input from VLAN_5 to the router IP on ports 53, 67 and 68.

There are other predefined exception rules; you can disable them based on your requirements.
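The equivalent firewall zone, forwarding and DHCP/DNS rule look roughly like this in /etc/config/firewall.  Again, this is a sketch matching the LuCI steps above, not a verified config.

# /etc/config/firewall (sketch)
config zone
        option name 'VLAN5'
        option network 'VLAN_5'
        option input 'REJECT'
        option output 'ACCEPT'
        option forward 'REJECT'

config forwarding
        option src 'VLAN5'
        option dest 'wan'

config rule
        option name 'Allow-VLAN5-DHCP-DNS'
        option src 'VLAN5'
        option proto 'tcp udp'
        option dest_port '53 67 68'
        option target 'ACCEPT'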






Wednesday, September 20, 2023

Setup Tumbleweed as physical host for VirtualBox

I decided to try Tumbleweed on the same old FX-8320 host.  Installation is similar to Leap 15.4.  For the GUI option, I chose Generic Desktop instead of the Xfce Desktop.  Xfce runs quite smoothly on Leap 15.4, but I decided to try Generic Desktop this time since it is only a physical host.

Following the same link below, I tried to install driver version G05 for my old GT630 (which did not work with Leap 15.4).

SDB:NVIDIA drivers - openSUSE Wiki 

After rebooting the host, it suggests installing the previous version, G04.  Following the wiki link above, I installed G04.




 
Below are my software picks for the host.
File manager: PCManFM
Terminal: termit
Text editor: Leafpad
Graphics: Pinta (to take and edit basic screenshots)
Grsync: to sync my USB drive for backups

The USB wireless adapter (8812bu) is detected correctly; no driver installation is required.

The only issue is that the NetworkManager icon is not available in Generic Desktop.  First I installed wicked and other tools; it worked fine, but I decided to try to get NetworkManager working.  Then I removed wicked and, not sure why, there was no networking at all.  So, I rolled the system back to the snapshot taken before wicked was installed.

Once NetworkManager-connection-editor is installed, I have a simple GUI to manage my network connections.  If your connections do not change much, wicked is just fine.
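If you want to make the same switch from wicked to NetworkManager from the command line, a minimal sketch is below.  The connection-editor package name is an assumption; verify it with "zypper search" before installing.

# install NetworkManager and the simple connection editor GUI (package name may vary)
sudo zypper install NetworkManager NetworkManager-connection-editor
# hand networking over from wicked to NetworkManager
sudo systemctl disable --now wicked
sudo systemctl enable --now NetworkManager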

Then install VirtualBox.  The reference link is How to install Virtualbox on OpenSUSE Tumbleweed & Leap - Linux Shout (how2shout.com).


sudo zypper install virtualbox

I found that version 7.0.10 is installed.

Once that completes, add the user account to the vboxusers group:

sudo gpasswd -a userID vboxusers

Then, log off and log back in.

Open VirtualBox.  If you are asked to enable USB passthrough, just click Enable.  












Download the VirtualBox extension pack from the VirtualBox site.  Same as before, installing it through the GUI does not work, so I install it through the CLI (see the command below):

VBoxManage extpack install Oracle_VM_VirtualBox_Extension_Pack-7.0.10.vbox-extpack
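To confirm the extension pack was registered, list the installed packs:

VBoxManage list extpacks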

I decided to go with KVM/QEMU this time.  I will create a separate post for that.






Monday, August 28, 2023

Isilon multiscan moves data between tiers unexpectedly

After the Isilon was implemented 2.5 years ago, we decided to stop the SmartPools job and use a combination of the IndexUpdate and FilePolicy jobs to save resources on the Isilon.  We have 4 SSD nodes handling front-end traffic, 4 hybrid nodes handling replication plus some test traffic, and 4 SATA nodes handling the backups.  Initially, the SATA nodes were deployed with insufficient memory and caused a couple of outages during NDMP backup.  After maxing out the memory on the SATA nodes, there have been no more issues.

Because we want to keep as much data as possible on SSD, we do not run FilePolicy until SSD used space is over 70%.  After a firmware update (node reboots) about two years ago, we saw data move between tiers when MultiScan kicked in automatically.  We checked with support and they said it was OK to kill it, so we just killed the MultiScan job.  Recently, we finally had our first disk failure on the production Isilon and MultiScan kicked in automatically again.  Same as before, we saw data moving unexpectedly to the SATA tier.  I suspected it was related to FilePolicy; however, support told me it should not be.  After a long-running case with support, we finally got a suggestion from a higher level of support to run the SmartPools job periodically.  So, I played with the DR Isilon since it is not busy.  Using the IndexUpdate and FilePolicy jobs to move the data and comparing tier usage multiple times, each tier finally reached the utilization I want: 60-70% for SSD and below 60% for the SAS pool (the spillover pool).  Then I kicked off the MultiScan job.  Now, I no longer see data moving between tiers when MultiScan runs.







In the future, I will adjust the FilePolicy each month and then kick off the SmartPools job once a month, just in case MultiScan starts due to a failed HDD or a node replacement.




===========================================================================

Update Sep 25, 2023

For the production Isilon, after IndexUpdate completes, I run the FilePolicy then the MultiScan job.  I still see data move to the SATA pool.  So, I run IndexUpdate -> FilePolicy -> SmartPools.  Same thing.  It looks like there is a discrepancy between IndexUpdate + FilePolicy and the SmartPools job.

This time, I just run the SmartPools job and kill it once the SATA pool reaches 85%, then adjust the FilePolicy and rerun the SmartPools job.  After 3 tries, I finally see the results I want.  Why there is a discrepancy between FilePolicy and SmartPools, I have no idea.

From now on, I will adjust the FilePolicy once every 2 months and run the SmartPools job.  For the DR Isilon, since it has far less data, I will adjust the FilePolicy once every 3-4 months and then run the SmartPools job.  A hedged CLI sketch of the job sequence is below.
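Below is a sketch of the job sequence from the OneFS CLI, using the job engine names; the job ID for the cancel step comes from the list output and is a placeholder here.

# start the jobs in sequence, letting each one finish before starting the next
isi job jobs start IndexUpdate
isi job jobs start FilePolicy
isi job jobs start SmartPools
# watch progress and pool utilization between runs
isi job jobs list
# stop SmartPools once the SATA pool reaches the target utilization (job ID is a placeholder)
isi job jobs cancel 123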

Wednesday, August 16, 2023

Cable Modem TC4400 overheated?

I switched to the TC4400 cable modem a few months ago because DOCSIS 3.0 will no longer be supported by my cable company.  This is the only modem that does not use the Puma chipset and is supported by my cable company.  It worked fine until recently, when the internet connection started dropping a few times a day.  When I touch it, the modem is really hot.  The quick fix is to put a little 15mm by 15mm fan on top of the modem to draw the heat away.  Now, it is much better and has not had any more connection drops.