Monday, March 2, 2026

NDFC upgrade / deploy part 2 - sizing and compatibility

At the time we were planning, we decided to use ND 3.2.1i (which comes with NDFC 12.2.2).  That version works with all our existing linecards, switches, and firmware, so I did not need to worry about our existing 8.4(2e) firmware being unsupported.  Use the links below for the compatibility check.

For software/hardware compatibility, check the NDFC Software and Hardware Compatibility Matrix.

To confirm NDFC and ND compatibility, check the Cisco Nexus Dashboard and Services Compatibility Matrix.

Next is sizing.  For sizing, check Cisco Nexus Dashboard Capacity Planning.  We decided on a single App-node configuration since this is a small environment.  (Note: a single-node configuration does not support adding more nodes later.)  The App OVA requires 16 vCPU and 64 GB of memory.

Other requirements: go through the documents below.

Cisco Nexus Dashboard and Services Deployment and Upgrade Guide, Release 3.2.x - Prerequisites: Nexus Dashboard [Cisco Nexus Dashboard] - Cisco


Device Manager error on Cisco MDS 9710

Recently, when we tried to open Device Manager to make changes on the MDS 9710 switch, the error below popped up.  It only happens at one of the sites; the other site works fine.  We tried opening Device Manager through NDFC and got the same error.

[Screenshot: Device Manager error]

We did see a different error in the past that required a controller switchover: "Busy network, no route, or snmpd is unresponsive."

See EMC KB 000218149.  This time, however, Device Manager still failed to open.

We suspected a network issue but opened a ticket with Cisco anyway.  Support ran some traces, saw no issue on the switch side, and confirmed there is connectivity between the switch and the client running Device Manager.  Eventually, we worked around the issue by confirming the TCP preference is set to true in DeviceManager.bat:

set JVMARGS=%JVMARGS% -Dsnmp.preferTCP=true
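To confirm the client really switched from UDP to TCP, one check we could use (my own assumption, not something support suggested) is to look for a TCP session from the Windows client to the switch on the SNMP port:

:: Windows client check; 10.0.0.5 is a placeholder for the switch IP
netstat -ano | findstr "10.0.0.5"
:: With snmp.preferTCP=true, expect an ESTABLISHED TCP connection to
:: port 161 instead of stateless UDP traffic.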

We still had trouble after that, so we logged in to the switch at the other site (the one without the issue) and clicked Device > Preferences.

[Screenshot: Device Manager Preferences dialog]

Select TCP for "Use SNMP on next launch", then click Apply and OK.

Then we tried Device Manager against the problem switch again, and it worked fine.  The only catch with this workaround is that you need another switch that Device Manager can already connect to without issue.  I have not asked support whether there is any other way to force TCP instead of UDP.


NDFC bugs since deployment

NDFC was deployed last October.  A couple of bugs have been discovered since.

1) Every 4-6 weeks, we see the alert below:

Elasticsearch error - 'could not fetch component status'

The bug ID is CSCwm51621.

If you follow the bug ID above, there is a solution from the forum.  For me, when the error comes up, I just reboot the appliance with "acs reboot".  The error goes away and does not come back for another 4-6 weeks.  A normal reboot is sufficient; DO NOT add any other option after "acs reboot".
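For clarity, the entire workaround from the client side is just this (the node address is a placeholder):

# SSH to the Nexus Dashboard node as rescue-user, then reboot cleanly.
# Per the note above, no extra options after "acs reboot".
ssh rescue-user@nd-node1.example.com
acs reboot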

2) /logs/k8/pods 90% usage alert.  About 4 months after deployment, a /logs/k8/pods 90% usage alert showed up in the Admin Console of the Nexus Dashboard.

[Screenshot: /logs/k8/pods usage alert in the Admin Console]

Support was contacted to clear the old logs.  Currently there is no fix; a WebEx session is required for support to clear the old logs manually.  Based on my chat with support, a future release will change the log retention for that folder.
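Before the WebEx, about the only thing worth doing from the node is confirming the usage.  Assuming your login gives you a shell, something like the following (the path is my reading of the alert text, not an official procedure):

# Confirm which log partition is filling up
df -h /logs/k8/pods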

Tuesday, December 30, 2025

NDFC upgrade / deploy part 1 - License planning

With DCNM reaching EOL in April 2026, the only option left is to upgrade to NDFC.  Some sites may decide to run DCNM until their whole infrastructure is migrated to the cloud.  Because we are still refreshing old MDS 9700 switches, migrating to NDFC is a must for us.  The main issue with firmware upgrades on existing switches is smart licensing.  We manage no more than 10 MDS FC switches at a time, so the setup is fairly simple; still, it is more complicated than DCNM was.

It looks like 9.2(1a) is the last MDS 9000 firmware version that supports the legacy license / license file.  If you upgrade an existing switch running older firmware to 9.2(2), the switch license is changed to smart licensing automatically.  Check with support; the answer I got is that existing switches upgraded to newer firmware will, in the worst case, run in Donor mode, with no impact to functionality according to support.  Because some of the MDS switch licenses were purchased through a 3rd-party vendor, and we later changed to another 3rd-party vendor for support, I don't know what would happen if a licensing issue came up after a firmware upgrade on an existing switch.  So we decided to keep the existing older MDS 9700 switches on firmware 8.4(2e) until the hardware refresh completes in the next 2-3 years.  There are a few bugs impacting the 8.4 firmware; confirm you have all the workarounds before deciding to stay on 8.4.  If you decide to upgrade an existing MDS 9700 to smart licensing, check with support first.

All the new MDS 9700s ship with firmware 9.4.x, so smart licensing is enabled automatically.  We checked with support: there is no license file anymore on these new MDS 9700 switches; the license is installed at the factory.

Because it is a closed environment, we set up our NDFC smart licensing in offline mixed mode.  Mixed mode supports the existing license files on the older MDS 9700s running the older 8.4 firmware.  We contacted support to transfer the existing legacy DCNM server licenses to smart licenses before the upgrade.  At most every 365 days, or whenever a switch is added or decommissioned, I export the license info to the Cisco software support website and then import the acknowledgement back into the NDFC server licensing section.
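To see which licensing mode a given switch landed in after an upgrade, the standard NX-OS show commands are enough (a sketch; output varies by NX-OS release):

switch# show license status
switch# show license usage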

Below are some of the Cisco Smart licensing links / doc for your information.  

Brownfield_Conversion_QRG

Cisco MDS 9000 Series Licensing Guide, Release 9.x - Smart Licensing Using Policy [Cisco MDS 9000 NX-OS and SAN-OS Software] - Cisco

Cisco MDS Smart Licensing Using Policy Data Sheet


Saturday, April 19, 2025

DCNM login error "RequestSendFailed: EJBCLIENT000409" for the SAN Client

We have a plan to build a new NDFC to manage our MDS switches.  It is more complex and will take some time to deploy; a few of the challenges are the resource requirements, the license transfer, and other new requirements.  In the meantime, we continue to use the existing DCNM 11.5(4).

Last week, a login error suddenly appeared with the Java SAN client: "RequestSendFailed: EJBCLIENT000409".  We restarted the services and rebooted the DCNM server, but it did not help.

[Screenshot: SAN client login error]

Some research suggested it was most likely certificate-related.  We do not use our own certificate, so most likely the default one had expired.  This was confirmed by checking the certificate expiration date from the web browser.  So we opened a ticket with support and asked them to renew the certificate for 2 more years.  After that, I could log in without issue.
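If you would rather check from a shell than the browser, openssl can pull the expiry date (the DCNM hostname below is a placeholder):

# Print the expiration date of the certificate DCNM presents on port 443
echo | openssl s_client -connect dcnm.example.com:443 2>/dev/null | openssl x509 -noout -enddate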

Thursday, April 17, 2025

Migration from VMAX 40k to PMAX 8000 and 8500

We just completed the storage migration from VMAX to PMAX 8000 and 8500 before Christmas.  NDM was not an option: NDM no longer supports Solaris, and with a number of busy databases running in both test and production, the SRDF directors between the VMAX and PMAX would have been the bottleneck before cutover to the PMAX.  Also, we did not have any spare directors left to configure for SRDF traffic.

Our plan was to use Storage vMotion for all VMDKs.  The RDMs are for SQL databases running on ESX; for those, we used SRDF/S to start replication to the PMAX in the background.  Because SRDF is not supported from VMAX to PMAX 8500, all LUN migrations using SRDF were replicated to the PMAX 8000.  The app owner then picks a downtime window to shut down the app and the servers.  We stop the sync after confirming there are no outstanding tracks, remove the VMAX LUNs from the storage group, then add the PMAX LUNs.

For Solaris, if it is an Oracle DB, LUNs of the same size or larger are added to Oracle; the DBA then completes the rebalancing and drops the old VMAX LUNs.  For boot LUNs and LUNs from other apps, some of our Unix admins used SRDF/S to migrate them to the PMAX 8000, while others chose host-based migration, in which case we just provide a LUN of the same size or larger from the PMAX 8500.

Below is a summary of the general steps for the storage migration from VMAX to PMAX 8000 using SRDF/S (see the SYMCLI sketch after the note below).  Check your environment and test whether additional steps are required.  If a volume manager is in use, the migration should be completed with the volume manager rather than SRDF.

1) Change the source LUN attribute on the VMAX to dyn_rdf.
2) Set up the SRDF/S pair from VMAX to PowerMax 8000 (put the target LUN in a temp target_SG).
3) During the downtime, shut down the apps and servers.
4) Confirm there are no outstanding tracks.
5) Perform the SRDF split.
6) Remove the source LUN from the VMAX storage group.
7) Add the target LUNs to the SG on the PowerMax and remove them from the temp target_SG.
8) Host team completes the LUN mapping.
9) Delete the SRDF pair with the force option.
10) Unset the GCM bit if required (symdev -sid xxx -devs xxx unset -gcm).
11) Host team performs a rescan if step 10 is required (they should see about 1 MB more space on the LUNs from step 10).
12) Power up the servers to validate.

That way, they can always go back to the VMAX LUNs if a backout is required.

Note: in the past, if the source/target LUNs were not mapped to FE ports, we sometimes saw strange results on some SRDF operations.  So I create a temp target_SG whose masking view uses an IG with no HBAs in it.
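To make the list concrete, below is an illustrative SYMCLI sequence for the SRDF steps.  All SIDs, device IDs, RDF group numbers, and SG names are placeholders, and the syntax should be verified against your Solutions Enabler release before use.

# 1) make the source device dynamic-RDF capable
symconfigure -sid 1234 -cmd "set dev 00A1 attribute=dyn_rdf;" commit

# 2) create and establish the SRDF/S pairs (pairs.txt lists R1/R2 device pairs)
symrdf createpair -sid 1234 -rdfg 10 -f pairs.txt -type R1 -establish -rdf_mode sync

# 4) confirm the pairs are fully synchronized (no outstanding tracks)
symrdf -sid 1234 -rdfg 10 -f pairs.txt verify -synchronized

# 5) split during the outage window
symrdf -sid 1234 -rdfg 10 -f pairs.txt split

# 6/7) swap the masking: pull the source devs, present the PMAX devs
symaccess -sid 1234 -type storage -name prod_sg remove devs 00A1
symaccess -sid 5678 -type storage -name prod_sg add devs 00B2
symaccess -sid 5678 -type storage -name temp_target_sg remove devs 00B2

# 9) drop the RDF relationship once cutover is confirmed
symrdf -sid 1234 -rdfg 10 -f pairs.txt deletepair -force

# 10) clear the GCM bit if required (as in step 10 above)
symdev -sid 5678 -devs 00B2 unset -gcm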



Isilon NDMP backup failed after NetWorker upgraded to 19.11

We need to update the firmware on the existing Isilon to 9.7.1.x in order to add the new Isilon nodes, as recommended by support.  Before we do that, we need to sort out NDMP backup: because we back up the Isilon over NDMP with NetWorker, a NetWorker upgrade from 19.10.0.4 to 19.11 is required first.  After the NetWorker upgrade in February, some of the NDMP backups failed with the error message "Hostname resolution failed".  Multiple retries eventually work, but it is very annoying.

After working with support, it turns out there is a new change in NetWorker 19.11.  See the KB: NetWorker: server upgraded to 19.11, backup fails reporting "Hostname resolution failed" | Dell US.

None of the workarounds in the KB above fix the issue on Isilon.  The forward DNS lookups for the Isilon are actually delegated by the DNS server to the SmartConnect SIP.  The only option is to add reverse entries on the DNS server for all the Isilon nodes that handle NDMP backup.  Refer to: SmartConnect and Reverse DNS | Dell PowerScale: Network Design Considerations | Dell Technologies Info Hub.

So PTR records for all the Isilon nodes handling NDMP backup were created on the DNS server under the NDMP zone name.  Once that was done, all backups were fine.  Keep in mind there is no change to the forward lookup, which is still handled by the SmartConnect SIP.  I am not sure if there is a proper fix yet.
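Once the PTR records are in place, a quick way to verify the reverse lookups from any client (the address and name below are placeholders):

# Reverse lookup should now return the node's name in the NDMP zone;
# forward lookups are still answered by the SmartConnect SIP, unchanged.
dig -x 10.1.1.11 +short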

Tuesday, April 15, 2025

Update Isilon to 9.7.1.4 from 9.4.0.14 to add new A300, H700 and F710 nodes

The main reason for the upgrade in January is to replace the existing A200, H500, and F800 nodes with new A300, H700, and F710 nodes.  Support recommends updating the firmware on the existing nodes to 9.7.1.4 before adding the new Isilon nodes to the pool.

Things went smoothly during the upgrade, and so far there are no issues.  Right after the upgrade, though, I had to reconfigure the following settings.

1) The SNMP node restriction got reset; I had to manually select nodes 1-4 again.  (In our environment, we only allow SNMP traps from the A200 nodes; those nodes handle only NDMP backup and are on their own VLAN.)

[Screenshot: SNMP node selection]

2) SNMPv3 is a new feature and gets selected automatically in the SNMP alert channel.  I manually selected SNMPv2 for our environment.

[Screenshot: SNMP alert channel settings]

3) The SMTP setting was reset back to manual settings (nothing was required since it was populated with the same SMTP info).
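As a post-upgrade sanity check, the alert channel and SNMP settings can also be reviewed from the OneFS CLI.  Treat the exact commands below as assumptions to verify on your release:

# Confirm the SNMP alert channel and SNMP settings survived the upgrade
isi event channels list
isi snmp settings view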

New features

1) IPv6 is a new feature and is enabled by default.

2) There is a new Transfer Limit feature for the spillover pool, set at 90%.  My understanding is that a pool will not be filled past 90% as long as another pool has usage below 90%.

3) Support Assist will be required for SCG in future OneFS releases.

4) Firewall

Wednesday, December 25, 2024

Create partition with Linux

Sometimes, for some odd reason, there is an issue deleting and re-creating a partition on a USB disk, and I have to use Linux to complete the task.  Warning: this erases everything on the USB disk.  Assume the device handle for the USB jump drive is /dev/sdb.

1) Run "sudo fdisk /dev/sdb".
2) Type "o" to create a new DOS partition table, then hit Enter.
3) Type "n" to create a new partition, then hit Enter.
4) Hit Enter to accept the default settings.
5) Type "t" then "7" to set the partition type to exFAT (HPFS/NTFS/exFAT).
6) Type "w" to write the changes; fdisk exits on its own ("q" would quit without saving).
7) Run "sudo mkfs.exfat -n LABEL /dev/sdb1" (replace LABEL with your volume label).
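For a non-interactive equivalent, here is a minimal sketch with parted, assuming the same /dev/sdb.  Double-check the device name first; this is just as destructive:

# Recreate the MBR partition table with one full-disk partition, then format
# it exFAT. Note: parted may not set fdisk's exact 0x07 type ID; adjust if
# Windows compatibility matters.
sudo parted -s /dev/sdb mklabel msdos mkpart primary 1MiB 100%
sudo mkfs.exfat -n LABEL /dev/sdb1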




Thursday, July 25, 2024

Isilon upgrade from 9.1.0.x to 9.4.0.14 to fix NFSv4 lock issue

We discovered the NFS lock issue with MQ back in OneFS 8.x.  We run MQ with its message queue stores on an Isilon NFS share.  When an Isilon node reboots for maintenance, the NFS lock does not fail over correctly to the remaining Isilon nodes, which hangs MQ; sometimes the server requires a reboot to recover.  There were multiple Isilon fixes for this, and it is finally fixed in 9.4.0.3.  We updated to 9.4.0.14 and confirmed the issue is fixed.  If MQ is running on Solaris, confirm Solaris is running at least 11.4 SRU69.

After the upgrade to 9.4.0.14, some settings were reset.

1) Spillover pool reset (set to Anywhere)

2) The SNMP node limit was reset (we only allow SNMP alerts from the NDMP nodes, which are the A200s in our environment).  I had to reconfigure it.

A new problem after the upgrade: random NDMP backup failures with a memory leak

New feature data inline introduced in 9.3 and enabled by default.  After a month of upgrade to 9.4, there is random NDMP backup failure due to memory leak.  Eventually, turn off data inline with support assistance.  Then reboot all the nodes running NDMP backup one by one to workaround the issue to clear the dedupe cache.  Permanent fix will be available in 9.7.1.x.