Thursday, October 31, 2019

Equallogic dirty cache

I used to manage a large number of EqualLogic 6510 arrays.  They were bought for production use because the budget was insufficient for enterprise storage.  They are just entry-level arrays with very limited redundancy.  The controllers are active/standby, and failover takes 21 s when the array is completely idle, so you can imagine how long it takes under heavy IOPS.

Firmware updates are a pain since the controller failover takes time, and Linux and Unix hosts do not tolerate it well.  If the array is used for production, it is really hard to get a downtime window.

One time, a bug caused both controllers to panic.  After the array came back up, the management interface was not reachable.  I checked the serial connection and saw the following: even though the controllers had panicked, it still displayed the message "This is a POWER FAILURE RECOVERY".  It also showed the RAID LUN as not recoverable.  Reading further down, it appeared we actually had a dirty cache stuck in memory, which normally happens after a power failure.


When I tried to log in from the serial port, it showed the array was not even configured.  Obviously, answer No when asked to configure the array.


I contacted support from that point, and it was indeed a dirty cache problem.  Support confirmed that when a dirty cache is stuck, the user loses access to the management interface.  Also, only later EqualLogic models support port failover to the standby controller: if both interfaces on the active controller die, the management interface will not fail over to the standby controller's interface, and you will need to do a manual failover.  That's why I don't suggest them for production.

We cleared the cache and rebooted the array, and everything was normal from that point.  The only thing lost was the data stuck in the cache.  Luckily, no databases were running on those arrays, and the uncorrectable sectors were empty space.

Earlier firmware versions, especially 5 and 6, were problematic, with lots of issues.  After the latest patch of version 7 was installed, the arrays were stable.  By then, new enterprise arrays had been installed, and these units were repurposed for backup and archiving.  Now they are all retired.





Wednesday, September 18, 2019

Install IE and flash in Windows server 2016

See the link below on how to reinstall IE in Windows Server 2012.

To install only IE in Windows Server 2016, just run the command below and reboot.  

dism /online /enable-feature:"Internet-Explorer-Optional-amd64"
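
If you want to verify that the feature actually got enabled after the reboot, dism can query it too (same feature name as above):

dism /online /get-featureinfo /featurename:"Internet-Explorer-Optional-amd64"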

If you need to enable flash for IE, you can run the command below.  

dism /online /add-package /packagepath:"C:\Windows\servicing\Packages\Adobe-Flash-For-Windows-Package~31bf3856ad364e35~amd64~~10.0.14393.0.mum"

Monday, September 9, 2019

iDRAC version 8 virtual console only works with IE

I have no trouble accessing the iDRAC v8 GUI using IE 11, Chrome, or Firefox.  It is an R630 server.  However, I found that I can only access the Virtual Console using IE 11.  Chrome v76 and Firefox v67 won't work for the Virtual Console.

So if you cannot access the Virtual Console, try another browser, IE 11 in particular.


Sunday, September 8, 2019

AppSync with VMAX 40k

AppSync 3.5 was installed a couple of years ago to protect our SQL application.  Basically, the AppSync agent was installed in the SQL cluster.  The SQL clusters were running as VMs, and the databases resided on RDMs.

We kept getting a VSS error because snapshot creation took more than 10 s, which is the limit Microsoft supports for VSS.  If it took more than 10 s, the VSS creation step would fail.

We installed the latest Solutions Enabler 8.4.x.x available at the time, with no change.  Since the array was a VMAX 40K, DNS was not a factor.  Eventually, version 3.5.0.1_URM00111091_PRELIM_R2 fixed the 10 s VSS delay bug with the VMAX 40K.

A couple of things I don't like about AppSync:
1)  Mounting and dismounting SQL VMFS takes a long time and does not work well.
2)  You need to reserve an extra copy of the LUNs in the pool.
3)  If the info is out of sync, AppSync does not know what to do.  Manually dismounting a copy from the mount host causes problems because AppSync does not know the LUNs were dismounted.  Eventually, you have to contact support to clean up.

------------------------------------------------------------------------------------------------------------

Now, three years later, AppSync 3.9 was set up as a proof of concept a few months ago.  This time, we had SQL running on VMFS.  We encountered the same issue as before: mounting the copy to the mount host was very slow and timed out.  Eventually, we kept our original design with SQL on RDMs.  Things are working smoothly with RDMs as expected.

We are still using AppSync 3.5 until we migrate to AppSync 3.9 next year.  Recently, the SQL team changed the DB structure, so instead of a few larger databases, we have a lot of smaller databases to protect.  The AppSync host plug-in service has a memory leak problem, so we had to set up a process to bounce the service once every three days.  Hopefully, this won't happen in version 3.9.
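
As a sketch, that kind of periodic bounce can be done with the built-in Task Scheduler.  The task name is arbitrary, and the service name "EMC AppSync Host Plug-in Service" is an assumption; check the real name with sc query first:

schtasks /create /tn "BounceAppSyncPlugin" /sc daily /mo 3 /st 02:00 /ru SYSTEM /tr "cmd /c net stop \"EMC AppSync Host Plug-in Service\" & net start \"EMC AppSync Host Plug-in Service\""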

Thursday, August 29, 2019

DCNM version 11.1(1) and bug ID CSCvf99665

Recently I built a new DCNM 11.1(1) box on Windows 2016 to replace the existing 7.2(3), because the old one was running Windows 2008 R2.  The major difference is HTML5: the web client is a lot faster, and most of the work, even port channels, can be completed in the GUI (I have not tried that yet).  If you don't like using the new web client to complete your zoning, you can still use the old Fabric Manager.

Besides, I have been using Oracle Express for the DCNM database since version 7.2.  The performance is better than PostgreSQL.

You can follow the Oracle link below to have some basic knowledge of Oracle Express.

If you use SolarWinds as your TFTP server, .NET Framework 3.5 is required.  See the link below on how to enable it in Windows Server 2016.
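
For reference, on Windows Server 2016 the .NET Framework 3.5 payload is not on disk by default, so dism usually needs the sources\sxs folder from the installation media.  The D: drive here is an assumption for wherever the ISO is mounted:

dism /online /enable-feature /featurename:NetFx3 /all /source:D:\sources\sxs /limitaccess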

After the new DCNM server was in production, I planned the firmware update on all the FC switches in the fabric.  However, I found out both of the MDS 9710s are affected by Cisco bug ID CSCvf99665.

It shows an invalid IPv6 address on the mgmt0 interface with a zero-length subnet mask.

Example:
::148.237.143.255/0
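
To check whether a switch is hit, you can look at the mgmt0 interface from the NX-OS CLI; the bogus address shows up with something like:

show interface mgmt0
show running-config interface mgmt0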

Suggestions from support:
(1)  Open a case and have Cisco TAC send you a DPLUG file to download and run on the switch.  Run the DPLUG, do a 'copy r s' to save the configuration, then do a 'system switchover' and run 'copy r s' again.

(2)  Upgrade to 8.1(1a).  After the upgrade completes, do a 'copy r s', do a 'system switchover', and after both supervisors are back up, do another 'copy r s'.
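
For reference, option 2 boils down to the NX-OS sequence below ('copy r s' is shorthand for copy running-config startup-config).  The image filenames are placeholders for whichever 8.1(1a) kickstart and system images you stage on bootflash:

install all kickstart bootflash:m9700-sf3ek9-kickstart-mz.8.1.1a.bin system bootflash:m9700-sf3ek9-mz.8.1.1a.bin
copy running-config startup-config
system switchover
copy running-config startup-config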

To be on the safe side, we chose the first option and had support apply the DPLUG file.  Everything went fine.  Then we updated the firmware to 8.1(1a).