Monday, April 13, 2026

Upgrade to SupportAssist in Isilon

After upgrading to OneFS 9.7.1.4, there is a feature to enable SupportAssist for dial-home and alerting to EMC.  It will be mandatory in a future release of OneFS.  

We have the latest version of SCG (Secure Connect Gateway) monitoring the Isilon, so we decided to enable SupportAssist via the gateway topology.  

The links below give you info about SupportAssist for Isilon and how to enable it.

https://infohub.delltechnologies.com/sv-se/p/onefs-supportassist-provisioning-part-1/

https://infohub.delltechnologies.com/sv-se/p/onefs-supportassist-provisioning-part-2/

For the CLI steps to migrate to SupportAssist, see EMC Article Number 000225888.  

https://www.dell.com/support/kbdoc/en-uk/000225888/powerscale-how-to-migrate-from-secure-remote-support-srs-to-supportassist

Since an existing SCG is already monitoring the Isilon, the only thing I had to add was a firewall rule allowing traffic from all Isilon nodes to both SCG gateways on port 9443. 
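Before waiting on provisioning, it may be worth confirming the new firewall rule actually passes traffic.  Below is a minimal Python sketch (the gateway hostnames are placeholders, not real SCG names) that checks whether TCP port 9443 is open on both gateways:

```python
import socket

def tcp_reachable(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # covers DNS failure, refusal, and timeout
        return False

if __name__ == "__main__":
    # Placeholder hostnames -- substitute your own SCG gateway pair.
    for gw in ("scg-gw1.example.com", "scg-gw2.example.com"):
        status = "open" if tcp_reachable(gw, 9443) else "blocked"
        print(f"{gw}:9443 {status}")
```

Run it from (or via) each node so the check exercises the same source IPs the firewall rule covers.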

It can take up to 24 hours for the provisioning to complete.  See EMC KB Article Number 000232371.

https://www.dell.com/support/kbdoc/en-us/000232371/powerscale-support-assist-provision-gets-stuck-in-incoming

======================================================================

Support Assist is not sending alerts back to EMC.  

Checking from SCG shows the connectivity as bad, but the CLI reports OK, and Dell can dial in without issue.  After going back and forth, it turns out to be a bug: the connection stays up for a few days to a few weeks, then stops dialing back to EMC.  See EMC KB Article Number 000348244.

PowerScale: SupportAssist Disconnects SCG and CIQ - APEX AIOps Due to Crispies Lock on Node

We decided not to upgrade and to stay on 9.7.1.4, so support had to follow the KB to apply the workaround.  

The other bug is a monthly email alert reminding you that the cluster is not connected back to EMC via ESRS, like the one below.  After migrating to SupportAssist, ESRS appears to be shut down on the Isilon.  We checked with support, and this alert can be ignored since the cluster is now monitored by SupportAssist.  I wonder if this will be fixed in a future release.  

"This cluster is not connected back to EMC via ESRS.

Configuring connectivity to EMC via ESRS is a secure method to improve the overall reliability of your cluster and reduce the time it takes to resolve any issues the cluster may develop."

NDFC upgrade / deploy part 3 - deploy new NDFC one node setup in ESX

As mentioned in the previous post, we chose a one-node setup with the App OVA (16 vCPU and 64 GB memory).  It will run in an ESX cluster.  

Please review the links below.

Deploying Nexus Dashboard Using VMware vCenter

Cisco Nexus Dashboard Fabric Controller 12 

For SAN, you still need the following ready before deployment.
1) 2 NICs: one for the management IP and one for the data IP; both can be in the same subnet.  
2) DNS and NTP are required.
3) The cluster name and VM name should be different, but since this is a one-node setup, it does not really matter.
4) Pool IPs: 2 are required in our setup (2 IPs for the data network).  Ask for an additional IP if you plan to use SAN Insights.  All pool IPs should be whitelisted for SMTP traffic, in addition to the management and data IPs.  
5) Internal subnet settings: the App subnet and the Service subnet.  These are used internally and never leave the cluster.  Just in case, confirm the subnets are not already in use; the subnet mask has to be 255.255.0.0.
6) You may want extra HDD space, depending on the number of switches to manage.
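The internal-subnet requirement in item 5 is easy to sanity-check before deployment with Python's ipaddress module.  A minimal sketch, with made-up subnet values (172.17.0.0/16, 172.18.0.0/16, and the in-use list are illustrations, not our actual environment):

```python
import ipaddress

# Illustrative values only -- replace with your chosen App/Service subnets
# and the subnets already in use on your network.
candidates = {
    "App subnet": ipaddress.ip_network("172.17.0.0/16"),
    "Service subnet": ipaddress.ip_network("172.18.0.0/16"),
}
in_use = [
    ipaddress.ip_network("10.0.0.0/8"),
    ipaddress.ip_network("192.168.1.0/24"),
]

for name, net in candidates.items():
    # the mask has to be 255.255.0.0, i.e. a /16
    assert net.prefixlen == 16, f"{name} must be a /16"
    # the internal subnets must not collide with anything already routed
    for used in in_use:
        assert not net.overlaps(used), f"{name} overlaps {used}"

# the two internal subnets must also not overlap each other
assert not candidates["App subnet"].overlaps(candidates["Service subnet"])
print("internal subnet check passed")
```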

The steps are very straightforward.  However, depending on the environment, do not specify a VLAN in step 18 for the data network; otherwise, the data network will not be able to communicate with the switches even when they are in the same subnet.  In our environment, the VLAN info is already present at the switch and ESX level.  

For the initial setup, you can follow the document below.

Feature Management will be set to SAN Controller, with Performance Monitoring and SAN Web Device Manager, for my environment.  

Server settings are self-explanatory.  For SMTP, make sure all pool IPs are added to the SMTP allowed list.  

If you want to confirm SMTP alerts, go to Event -> Event setup -> Forwarding -> Add Rule (same as DCNM).  Multiple recipients are allowed, and event emails are sent through a pool IP.  

The link below provides detailed steps on how to configure event emails.

How to Configure Event Forwarding using Cisco Nexus Dashboard Fabric Controller - Cisco

I include a link to a blog post below covering the bugs we have faced with the NDFC appliance.  See 

NDFC bugs since deployed

Monday, March 2, 2026

NDFC upgrade / deploy part 2 - sizing and compatibility

At the time we were planning, we decided to use ND 3.2.1i (which comes with NDFC 12.2.2).  That would work with all our existing line cards, switches, and firmware, so I did not need to worry about our existing firmware, 8.4(2e), being unsupported.  Please use the links below for compatibility checks.  

For software/hardware compatibility, please check the NDFC Software and Hardware Compatibility Matrix.

To confirm NDFC and ND compatibility, please check the Cisco Nexus Dashboard and Services Compatibility Matrix.

Next is sizing.  For sizing, please check Cisco Nexus Dashboard Capacity Planning.  We decided to use a single App-node config since it is a small environment.  (Note: once a one-node config is chosen, it will not support adding more nodes in the future.)  The App OVA requires 16 vCPU and 64 GB of memory.  

Other requirements: please go through the document below.

Cisco Nexus Dashboard and Services Deployment and Upgrade Guide, Release 3.2.x - Prerequisites: Nexus Dashboard [Cisco Nexus Dashboard] - Cisco


Device Manager error in Cisco

Recently, when we tried to open Device Manager to make changes on an MDS 9710 switch, the error below popped up.  It only happens at one of the sites; the other site works fine.  We also tried opening Device Manager through NDFC and got the same error.  

We had seen a different error in the past, "Busy network, no route, or snmpd is unresponsive," which required switching over the controller; see EMC KB 000218149.  This time, however, Device Manager still failed to open.  

We suspected something on the network side but still opened a ticket with Cisco.  Support ran some traces, saw no issue on the switch side, and confirmed there is connectivity between the switch and the client running Device Manager.  Eventually, we worked around the issue by confirming TCP is set to true in DeviceManager.bat:

set JVMARGS=%JVMARGS% -Dsnmp.preferTCP=true

We still had trouble after that, so we logged in to the switch at the other site (the one without the issue), then clicked Device > Preferences.  

Select TCP for "Use SNMP on next launch."  Click Apply, then OK.  

Then we tried Device Manager on the problem switch, and it now works fine.  The only catch with this workaround is that it requires a switch that can already connect to Device Manager without issue.  I have not asked support whether there is any other way to force it to use TCP instead of UDP.  


NDFC bugs since deployed

NDFC was deployed last October, and a couple of bugs have been discovered since.  

1) About every 4 to 6 weeks, we see the alert 

Elasticsearch error - 'could not fetch component status'

The bug ID is CSCwm51621.

If you follow the bug ID above, there is a solution in the forum thread.  For me, when the error comes up, I just reboot the appliance with "acs reboot".  The error goes away and does not come back for another 4 to 6 weeks.  A normal reboot is sufficient; DO NOT add any other option after "acs reboot".  

2) /logs/k8/pods 90% usage alert.  About 4 months after deployment, a "/logs/k8/pods 90% usage" alert showed up in the Admin Console of the Nexus Dashboard.  

Support was contacted to clear the old logs.  Currently there is no fix, and a WebEx session is required for support to clear the old logs manually.  Based on a chat with support, the log retention for that folder will be changed in a future release.