Sunday, December 2, 2012

VSS snapshot is not cleaned up by NetWorker NMM

I had an issue on one of our Windows 2008 clients running SharePoint 2010.  I had to manually stop the backup from NMC, but nsrsnap_vss_save was still running on the client, so I manually stopped all NetWorker services on the client.  Even then, the VSS snapshots were still there.  I tried vssadmin to delete the shadows, but it returned an error and the snapshots were not deleted.  If you run vssadmin list shadows, you can still see the snapshots (output borrowed from esg110691):

vssadmin 1.1 - Volume Shadow Copy Service administrative command-line tool
(C) Copyright 2001 Microsoft Corp.

Contents of shadow copy set ID: {5bce2dd1-b18c-49d1-bb1a-51f656d07794}
Contained 5 shadow copies at creation time: 12/20/2009 11:01:01 PM
Shadow Copy ID: {a8019c4f-7dc1-4717-ac1f-affe759a1ea4}
Original Volume: (C:)\\?\Volume{264f5213-8311-11dc-82da-806e6f6e6963}\
Shadow Copy Volume: \\?\GLOBALROOT\Device\HarddiskVolumeShadowCopy1
Originating Machine: w2k3.acme.com
Service Machine: w2k3.acme.com
Provider: 'Microsoft Software Shadow Copy provider 1.0'
Type: ApplicationRollback
Attributes: Persistent, No auto release, Differential, Exposed locally

Shadow Copy ID: {4d5c74fe-7683-4ad8-8cba-cbeaff1adb92}
Original Volume: (D:)\\?\Volume{4efae4e1-834a-11dc-99fe-001a64242e28}\
Shadow Copy Volume: \\?\GLOBALROOT\Device\HarddiskVolumeShadowCopy17
Originating Machine: w2k3.acme.com
Service Machine: w2k3.acme.com
Provider: 'Microsoft Software Shadow Copy provider 1.0'
Type: ApplicationRollback
Attributes: Persistent, No auto release, Differential, Exposed locally


So I contacted EMC support, and they recommended trying the following three methods.

Method 1:
a. If nsrsnap_vss_save.exe is still running on the client while no scheduled backups are running from the NetWorker server, kill that process; it should kill all the subsequent processes.
b. After that, start another NMM scheduled backup; it is supposed to clean up the mess left behind last time. (In a number of cases this does not happen successfully.)
If method 1 does not work, try method 2.

Method 2:
You can use the following two commands to find the shadows and the mount points before deleting them.
vssadmin list shadows
mountvol
These two commands should help you find out which shadow copies are left behind on the system and how they are mounted. Try to unmount them first using mountvol /D.
/D Removes the volume mount point from the specified directory.
And then use:
vssadmin delete shadows /all
That should clean up the rest of the mess manually.
If method 2 does not work, try method 3.

Method 3:
Download the VSS SDK from Microsoft:
http://www.microsoft.com/en-us/download/details.aspx?id=23490
and use that instead to get rid of the existing snapshots.

Obviously, I had already tried the first one.  For the second method, I carefully examined the vssadmin list shadows output and made sure I chose the correct mount point to unmount.  If you pick a data volume by mistake, you can unmount a volume that is in use and make its data unavailable.  However, the delete still failed.
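Since picking the wrong volume is the risky part, it helps to reduce the vssadmin list shadows output to just the shadow copy IDs and their original volumes before touching anything.  Below is a minimal Python sketch that parses text in the format shown above.  The function name parse_shadows and the trimmed SAMPLE text are my own for illustration; it only reads text and does not call vssadmin or delete anything.

```python
import re

# Trimmed sample in the same format as the vssadmin output quoted above.
SAMPLE = r"""
Contents of shadow copy set ID: {5bce2dd1-b18c-49d1-bb1a-51f656d07794}
Contained 5 shadow copies at creation time: 12/20/2009 11:01:01 PM
Shadow Copy ID: {a8019c4f-7dc1-4717-ac1f-affe759a1ea4}
Original Volume: (C:)\\?\Volume{264f5213-8311-11dc-82da-806e6f6e6963}\
Shadow Copy Volume: \\?\GLOBALROOT\Device\HarddiskVolumeShadowCopy1
Attributes: Persistent, No auto release, Differential, Exposed locally

Shadow Copy ID: {4d5c74fe-7683-4ad8-8cba-cbeaff1adb92}
Original Volume: (D:)\\?\Volume{4efae4e1-834a-11dc-99fe-001a64242e28}\
Shadow Copy Volume: \\?\GLOBALROOT\Device\HarddiskVolumeShadowCopy17
Attributes: Persistent, No auto release, Differential, Exposed locally
"""

ID_LINE = re.compile(r"Shadow Copy ID: (\{[0-9a-fA-F-]+\})")


def parse_shadows(text):
    """Return one dict per shadow copy found in 'vssadmin list shadows' text."""
    shadows = []
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("Contents of shadow copy set ID"):
            current = None          # a new set starts; no shadow copy open yet
            continue
        match = ID_LINE.match(line)
        if match:
            current = {"id": match.group(1)}
            shadows.append(current)
            continue
        if current is None or ":" not in line:
            continue
        # Split on the first colon only; volume paths contain colons too.
        key, _, value = line.partition(":")
        current[key.strip()] = value.strip()
    return shadows


if __name__ == "__main__":
    for shadow in parse_shadows(SAMPLE):
        print(shadow["id"], shadow.get("Original Volume"))
```

On a real client you would paste the saved output of vssadmin list shadows in place of SAMPLE and review the list by eye before running mountvol /D.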

I didn't go any further since I needed approval from the server owner to install the VSS SDK mentioned in method 3.  I decided to search Google and found out that the command to remove snapshots in Windows 2008 is different (my mistake for not telling support the OS version).  I found the instructions on Doug's blog.

I just ran diskshadow and then "delete shadows all" at its prompt.  The stuck snapshots were removed.  After that, I restarted the backup from NMC and went back to sleep.

Saturday, November 17, 2012

Manually uninstall NetWorker in Windows

I tried to upgrade NetWorker on one of the Windows clients and found that NetWorker was no longer listed in Add/Remove Programs.  When I tried to upgrade, the installer complained that an existing version was already installed.  After a bit of Google searching, I found the following article in the EMC community on how to uninstall NetWorker manually in Windows.

Remove the Upgrade Code from the following registry keys:
HKLM\SOFTWARE\Microsoft\Windows\CurrentVersion\Installer\UpgradeCodes

HKCR\Installer\UpgradeCodes

Windows Installer uses a GUID (Globally Unique Identifier) to uniquely identify products being installed. It calls this the Upgrade Code. It is used to detect that NetWorker is installed on this machine.

NetWorker has three upgrade codes (the first for NW 6.0, the second for all other NW versions (except X64), the third for the X64 package):

3EB2C626C9BC4D118811000972CCA7DF
2827A1B508153D114831000A9C877BD1
1D09C7743451200439D99949BD5A9F1E
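The three values above are 32 hex digits with no braces or dashes, which is how Windows Installer stores GUIDs under these registry keys (a "packed" form with the digits reordered).  If you want to match them against a GUID in the usual {xxxxxxxx-xxxx-...} notation, the reordering can be undone.  The sketch below assumes the values are in that packed form; the function name unpack_installer_guid is my own.

```python
def unpack_installer_guid(packed: str) -> str:
    """Convert a 32-digit packed Windows Installer GUID to {8-4-4-4-12} form.

    Assumed layout: the first three groups (8, 4 and 4 digits) are stored
    reversed, and the remaining 16 digits are stored with each pair swapped.
    """
    if len(packed) != 32:
        raise ValueError("expected 32 hex digits, got %d" % len(packed))
    int(packed, 16)  # raises ValueError if the text is not hexadecimal
    p = packed.upper()
    part1 = p[0:8][::-1]
    part2 = p[8:12][::-1]
    part3 = p[12:16][::-1]
    rest = "".join(p[i + 1] + p[i] for i in range(16, 32, 2))
    return "{%s-%s-%s-%s-%s}" % (part1, part2, part3, rest[:4], rest[4:])


if __name__ == "__main__":
    for code in (
        "3EB2C626C9BC4D118811000972CCA7DF",
        "2827A1B508153D114831000A9C877BD1",
        "1D09C7743451200439D99949BD5A9F1E",
    ):
        print(code, "->", unpack_installer_guid(code))
```

This is only a lookup aid; the registry keys themselves still use the packed values exactly as listed.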

It may also be necessary to remove the NetWorker services from the registry. They are located in:

-HKLM\SYSTEM\CurrentControlSet\Services
-Delete the following keys: nsrd, nsrexecd, nsrpm, hagentd, lgtolmd
-For NMC, the key to look for is: gstd
-After removing the services, reboot the machine.

Also run the commands below at the command prompt before you reboot the machine:

sc delete nsrexecd
sc delete nsrd

All installed products have an entry in the following registry key:
-HKLM\SOFTWARE\Microsoft\Windows\CurrentVersion\Uninstall

-Control Panel Add/Remove programs gets its information from this key.
-If an entry under this key is deleted, it will not be displayed in Add/Remove Programs.


The above steps worked for me.  The version of NetWorker on my box was 7.6.1.  You may also need to delete the following key:


HKLM\Software\Legato\NetWorker

After that, you can manually remove the old binaries.  Be very careful if the machine is a NetWorker server.
If it is a NetWorker server running on Windows, NEVER delete the res and mm folders (that's where the NetWorker configuration and media database are located).  I normally keep the logs folder too.  I believe the dedupe folder contains the cache info for Avamar.

Again, back up your registry keys before making changes to the registry.

Wednesday, November 7, 2012

HP Advisory on 10G NC522SFP with connectivity issue

Recently, one of our servers with a 10G NIC started to show the following error message.

DEVICE: HP NC522SFP Dual Port 10GbE Server Adapter #2
PROBLEM: Tx path is hung. The device is being reset.
ACTION: Adapter recovers from this error automatically.

I searched the web and found this HP advisory.  When I attempted to upgrade the firmware, I got the following error.

"Dependencies Failed -- Driver and firmware set are not allowed for update. Please install driver version 4.3.11.407 prior to performing firmware upgrade on devices with firmware version 4.0.550 or earlier installed. See nxflash.log for further details"
Upgrade to the latest driver and update the firmware.  Failed!

Upgrade to driver version 4.3.11.407 and then update the firmware.  Failed again!

I called support and they had me redo the same thing again.  We tested different firmware versions and got nowhere.  After 3 hours of troubleshooting, HP told me it was a bad NIC.  I refused a replacement NIC since the NIC was functioning fine.  So I arranged another downtime and swapped in a spare 10G NIC.  I still had the same issue.  Eventually, we put the driver back to 4.3.11.407.  After that, they escalated to the next level.  Their suggestion was to use a local admin account, but that alone still got us nowhere (it is true that a domain admin account returns the same error; you need to log in as local admin to complete the firmware upgrade).

I checked back on the HP site and saw that firmware version 1.3.9.7 was available.  I ran the installation and it passed the dependencies check.  So I knew it must be a dependency issue on the firmware, not the driver.

The solution is to stay at driver 4.3.11.407 and update the firmware to 1.3.9.0.  Reboot.  Then update the firmware to 1.3.9.6.  Reboot.  Run the driver update, and the issue is resolved.  If your NC522SFP firmware is too old, upgrade to firmware 1.3.9.0 first.

Sunday, October 28, 2012

Bodhi Linux 2.1 on an old AMD Sempron

I recently read about lightweight Linux distributions and found out about Bodhi Linux 2.1, so I decided to install it on an old machine.  The machine has 512 MB of memory and an AMD Sempron 2500+ (Socket A) CPU.  The install is very straightforward.  The following software was installed from the Add Software option.

Firefox 16.0
Java JDK 7 u5
LibreOffice 3.x
Adobe Reader
Printing for my Brother HL-3070CW

This machine is for basic office applications and internet banking.  I am very satisfied with the performance.  At least boot time and response time are much better than with CentOS 3.  Forget about XP; it barely runs on this box.

Here are some of the issues I encountered.

I could not read Chinese in Firefox.  After some research, I found that it requires the following font packages (see reference).  Use Synaptic Package Manager to install them.

  • ttf-arphic-bkai00mp
  • ttf-arphic-bsmi00lp
  • ttf-arphic-gbsn00lp
  • ttf-arphic-gkai00mp
  • ttf-arphic-ukai
  • ttf-arphic-uming

For Flash, I saw people complaining about viewing YouTube (see reference).  I followed the instructions to download and install the mint-flashplugin-11 package.  It works flawlessly.

After installing the printing package (basically CUPS), I selected the printer from the list.  For some reason, I didn't see an option to select Mono or Color printing.  So I downloaded the squeeze package from the Brother website and followed the instructions to install the lpr package and then the cupswrapper driver.

I went back to http://localhost:631/printers to modify the printer and select HL-3070CW again.  This time, I do see the Color/Mono option under Printer Options in the printer properties.

I tried to download VMware WS 7 and test it on this box.  However, it won't compile.  Someone pointed out a similar issue and resolved it by installing WS 8.  However, WS 8 requires a 64-bit processor even for the 32-bit version (see this).  This affects my plan to test Bodhi Linux with VMware WS 7.1.6 on my old Sempron, since I plan to replace CentOS on my Lenovo T400 laptop with Bodhi Linux 2.1.

Sunday, October 14, 2012

SCSI reservation conflict error in ESX

Basically, if you encounter such an error in your ESX environment,

WARNING: SCSI: 5532: Failing I/O due to too many reservation conflicts

WARNING: SCSI: 5628: status SCSI reservation conflict, rstatus 0xc0de01 for vmhba1:0:7. residual R 919, CR 0, ER 3
WARNING: J3: 1970: Error committing txn to slot 0: SCSI reservation conflict

you should follow VMware KB 1005009.

On the storage side, make sure you follow the best practice guide, for example the array fan-in limits.  For Symmetrix, make sure the front-end port is dedicated to ESX connections only and is not shared with hosts running other operating systems.

Thursday, October 11, 2012

Error Codes list for Microsoft technologies

When we troubleshoot backup issues, especially system state backups, the backup program normally returns a Microsoft error code.  For example, in NetWorker, you may get an error like

"ASR System Files - AsrCreateStateFile failed - 0x80080005"

(By the way, if you get this error for system state backup, most likely a reboot will fix it)

To find out what it means, there is a Symantec kb "Error Codes list for Microsoft technologies" listing the MS error codes.

Over the weekend, I got an alert about a System State backup on Windows 2000.  The error was 0x00000070, which means there is not enough space.  I confirmed it by RDP to the box: only 3 MB was available on the system drive.  It is an old Windows 2000 box waiting to be retired.
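Before looking a code up in such a list, it can help to split it into its parts.  The sketch below assumes the standard HRESULT layout (top bit is the failure flag, an 11-bit facility starting at bit 16, and a 16-bit code) and treats small values such as 0x00000070 as plain Win32 error codes, where 0x70 is decimal 112 (ERROR_DISK_FULL).  The function name decode_error is my own.

```python
def decode_error(value: int) -> dict:
    """Split a 32-bit Windows error value into failure flag, facility and code."""
    value &= 0xFFFFFFFF
    return {
        "failure": bool(value >> 31),        # severity bit: 1 means failure
        "facility": (value >> 16) & 0x7FF,   # which subsystem raised the error
        "code": value & 0xFFFF,              # error number within that facility
    }


if __name__ == "__main__":
    for raw in (0x80080005, 0x00000070):
        print(hex(raw), decode_error(raw))
```

For 0x80080005 this gives the failure flag set, facility 8 and code 5; for 0x00000070 it gives facility 0 and code 112, which is the number to search for in a Win32 error list.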


Tuesday, August 14, 2012

Open Solaris and Debian on NetWorker

Debian, Ubuntu and Open Solaris are not in the NetWorker support matrix.  So don't expect to get any support from EMC.

For Debian, so far I have not encountered any issue following the steps on the uWaterloo website to install the NetWorker client software.  I did have an issue with IPv6 with NetWorker on Debian; backup is fine with IPv6 turned off.

I obtained the binaries from PowerLink.
  1. Install alien to convert rpm packages:
       apt-get install alien
  2. Convert to debian packages and install:
       alien --to-deb -i lgtoman-7.6.4-1.x86_64.rpm
       alien --to-deb -i --scripts lgtoclnt-7.6.4-1.x86_64.rpm (ignore scripts warning)
  3. cd /etc/init.d
  4. Get script networker: Legato Networker Startup Script
  5. chmod 755 networker
  6. update-rc.d networker defaults
  7. Update /nsr/res/servers adding backup.cs.uwaterloo.ca (/nsr directory isn't created until after networker is started the first time, even then the servers file isn't created)
  8. ./networker start

The uWaterloo website also has a reminder for Ubuntu:
Ubuntu 10.10 now has a recover command that can undelete files and conflicts with the Legato install.

I found out from a blog that NetWorker will not work on Open Solaris.  I have never tested it, but the blog says 7.4.5 seems to work.

========================================================================
NetWorker 8 supports Debian and Ubuntu.  Open Solaris is still not supported.  Check the EMC Software Compatibility Guide for the latest information.

Sunday, July 29, 2012

Hotadd limitation

Other than the 1 TB vmdk limit with early versions of VDDK (fixed in VDDK 1.1.1), the following are the limits and suggestions for HotAdd (similar info can be found in vendor support documentation).

1) Hot-Add only works on SCSI disks, not IDE disks
2) Disable automount on the proxy host
3) For the block size mismatch issue, the VMFS containing the VADP proxy must use the same or a larger block size than the VMFS containing the target VM. For example, if you back up a virtual disk on a datastore with 4 MB blocks, the proxy (vRanger) must also be on a datastore with 4 MB blocks or above. It is strongly recommended to install the proxy on a VM residing on a datastore with 8 MB blocks.

Other considerations that apply not only to hotadd but also to other transport modes:
1) Change the workingDir to a datastore with a large enough block size if a VM has more than one vmdk and they are on different datastores with different block sizes.  For example, if the C drive is on a datastore with a 1 MB block size and the D drive is on a datastore with a 2 MB block size, workingDir should be on the datastore with the 2 MB block size.  See VM KB: 1012384

P.S. Do not use hotadd if there is a dynamic disk in the VM.  It is just a headache.  See VM KB: 2006279
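The two block size rules above can be written down as a quick check.  This is a minimal Python sketch, assuming you already know the block size (in MB) of each datastore; the datastore names and the function names are made up for illustration, and nothing here talks to vCenter.

```python
def pick_working_dir(disk_datastores: dict) -> str:
    """Return the datastore with the largest block size among a VM's disks.

    disk_datastores maps datastore name -> VMFS block size in MB.
    """
    if not disk_datastores:
        raise ValueError("the VM has no disks listed")
    return max(disk_datastores.items(), key=lambda item: item[1])[0]


def proxy_block_size_ok(proxy_block_mb: int, target_block_mb: int) -> bool:
    """HotAdd rule from above: proxy datastore block size >= target's."""
    return proxy_block_mb >= target_block_mb


if __name__ == "__main__":
    # Hypothetical VM: C drive on a 1 MB datastore, D drive on a 2 MB one.
    vm_disks = {"datastore_c": 1, "datastore_d": 2}
    print("workingDir should live on:", pick_working_dir(vm_disks))
    print("proxy on 8 MB, target on 4 MB ok?", proxy_block_size_ok(8, 4))
    print("proxy on 1 MB, target on 4 MB ok?", proxy_block_size_ok(1, 4))
```

Putting the proxy on an 8 MB datastore makes the second check pass for any target, which is why that is the recommended placement.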

VADP application quiescing

Recently, I was helping a friend with a VADP deployment.  He had a question about VADP quiescing, and I pointed him to an article on Backup Central.  To make sure application quiescing is supported for Windows 2008 R2, make sure ESX 4.1 U1 is deployed (documented in VMware KB: 1031298).  However, if a Windows 2008 or Windows 2008 R2 VM is deployed under ESX 4.0, make sure KB: 1028881 is followed.

For Exchange and SQL, I still suggest using the Exchange or SQL agent from the backup software to back them up.  When a restore is required, for example for Exchange, you can restore a folder in a mailbox easily.

Wednesday, July 11, 2012

nsrwatch on NetWorker 7.6.2 in Windows platform

I just upgraded to NetWorker 7.6.2.  In the past, nsrwatch was available only for Unix and Linux.  Now it has been added to the Windows platform.  Since the Management Console GUI is slow, this really saves a lot of hassle.

Thursday, May 31, 2012

10 GbE performance troubleshooting 1

On a Windows 2003 server with a 1 GbE NIC and DSN writing to a DataDomain device, I can see about 140 MB/s (dedup happens on the DSN and NIC utilization is approx 15%).

Now, with 2 Windows 2008 R2 servers set up with 10GbE, I copy a file from one Windows 2008 R2 server to the other.  It utilizes at most 12% of the 10GbE link.  If I write another file at the same time, utilization goes up to 20%.  I followed some of the suggestions by Cisco to tweak the OS (the only thing I have not done is jumbo frames).  However, I don't see much improvement.

After doing more research, it looks like it is an OS limitation.  See the KB article from the HP site.

"There was still perceived TCP performance issue, but it turned out to be a matter of limitations in performance per thread in Windows Server 2003. For instance, if copying only one file from one server hosting a NC522SFP to another using a NC522SFP, only a small fraction of the theoretical 10-Gigabit performance was achieved. However, if multiple sessions were run simultaneously, similar performance gains were seen as with UDP. In other words, the bottleneck was not the NIC."

Hopefully, I will have more time to run tests and determine the limitation in the summer.  Not sure if Linux / Unix will do a better job.
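To put the utilization figures above into MB/s, here is the arithmetic as a small Python sketch.  It assumes decimal units (1 Gbit/s = 1000 Mbit/s, 8 bits per byte) and ignores protocol overhead; the function name is my own.

```python
def link_throughput_mb_s(link_gbit: float, utilization_pct: float) -> float:
    """Approximate payload rate in MB/s for a link at a given utilization."""
    link_mb_s = link_gbit * 1000 / 8      # full line rate in MB/s
    return link_mb_s * utilization_pct / 100


if __name__ == "__main__":
    print("10 GbE at 12%:", link_throughput_mb_s(10, 12), "MB/s")
    print("10 GbE at 20%:", link_throughput_mb_s(10, 20), "MB/s")
    print(" 1 GbE at 15%:", link_throughput_mb_s(1, 15), "MB/s")
```

So a single copy stream at 12% is roughly 150 MB/s and two streams at 20% roughly 250 MB/s, far below the 1250 MB/s line rate.  The 1 GbE case works out to under 20 MB/s on the wire, which is consistent with the 140 MB/s backup rate only because dedup happens on the DSN before the data is sent.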

Sunday, May 27, 2012

DataDomain support with NetWorker 7.6.2 and 7.6.3

With NW 7.6.2 boost devices, the maximum number of sessions supported per device is increased.  The default max sessions per device is 10 instead of 4 in NW 7.6.1.  Dedup ratio will not be affected since dedup happens on the SN side for boost devices.  As I remember, if AFTD is used, set max sessions to 1 per device in 7.6.

With NW 7.6.3, multiplexing is supported for VTL in DataDomain. 

Keep in mind, as mentioned in the older article, that NW 7.6.3 does not support DDOS 4.9.  If you plan to upgrade from NetWorker 7.6.1 to 7.6.2, make sure you go through the DataDomain Integration guide for NetWorker 7.6.2.  My plan is to stage the savesets off the old DD devices to the new ones created in NW 7.6.2 after the migration.  This seems to be the simplest option.

Once the migration is completed, a case will be opened with DataDomain to remove the old unused LSU.  You cannot delete an old LSU from the NetWorker / DataDomain GUI.  Do not delete an LSU that contains data.

Sunday, April 29, 2012

Never mix NDMP and non-NDMP backups on the same media in NetWorker

The following are the limitations of NDMP backup with NetWorker.  You can find these in the admin guide, but most people tend to skip them.  It will be a nightmare when you find out data cannot be recovered.  Some of these limitations possibly apply to other backup solutions as well.

For pure NDMP backup (backup to NDMP devices, that is, an FC tape drive presented to the NAS), make sure non-NDMP backups, including index and bootstrap, are sent to a different pool.  The same applies to cloning.  Media written by NDMP devices should not be accessed by non-NDMP devices.  It is not possible to save data from a NAS filer to a tape device using standard NDMP and then write non-NDMP data to the same tape volume.  You will have issues during recovery.

NDMP and non-NDMP savesets can be saved on the same volume only if the NDMP backups to that volume were written using NDMP-DSA (NAS data backup to device presented to Storage Node / NetWorker server).
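The rule above can be expressed as a small check, for example when reviewing which groups write to which pool.  This Python sketch reflects my reading of the limitation, not NetWorker's own logic: the three saveset kinds ("ndmp" for pure NDMP, "ndmp-dsa", "non-ndmp") and the function name are labels I made up.

```python
KINDS = {"ndmp", "ndmp-dsa", "non-ndmp"}


def can_share_volume(kinds_on_volume) -> bool:
    """Return True if this mix of saveset kinds may live on one volume.

    Assumed rule: a volume written by a pure NDMP device may hold only pure
    NDMP savesets; NDMP-DSA and non-NDMP savesets may be mixed freely.
    """
    kinds = set(kinds_on_volume)
    unknown = kinds - KINDS
    if unknown:
        raise ValueError("unknown saveset kind(s): %s" % sorted(unknown))
    return "ndmp" not in kinds or kinds == {"ndmp"}


if __name__ == "__main__":
    print(can_share_volume(["ndmp-dsa", "non-ndmp"]))   # DSA plus filesystem
    print(can_share_volume(["ndmp", "non-ndmp"]))       # pure NDMP plus index
    print(can_share_volume(["ndmp"]))                   # dedicated NDMP pool
```

In practice the check is enforced by pool design: give pure NDMP backups (and their clones) their own pool so the mix can never happen.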

There is an old KB article from Powerlink that explains it.
"When non-NDMP backups are written to tape, the backup is written directly to the tape by NetWorker in it's own proprietary format. When NDMP and non-NDMP data is written to the same tape, the file marks get out of order and as a result the restore cannot position the tape correctly to find the data image.
Fix: Ensure that NDMP data be backed up to it's own separate pool to prevent NDMP recovery and scanner problems."

Keep this in mind when migrating your NAS.  If you plan to migrate to a different vendor, for example NetApp to BlueArc, make sure you no longer need the backups of the old NAS (in this example, NetApp), or keep the old NAS somewhere in case a recovery is required.  NetApp and BlueArc run different operating systems.  That's why you cannot recover a NetApp NDMP backup to BlueArc.

Also, make sure you have a copy of the index saveset.  scanner -i does not work for NDMP backup.

For full details of limitation and other requirements of NetWorker NDMP backup, please consult EMC support or refer to EMC documentation.

NetWorker NDMP DSA performance

We tried to back up a Celerra VG2 using NetWorker NDMP.  Given the existing hardware, we decided to use NDMP DSA since we don't have a license for VTL on DataDomain.  We backed up 4 streams together and the total throughput was 30 - 40 MB/s.

(To find out backup performance on the Celerra, log on to the control station and run the following command, assuming you are backing up a filesystem on Data Mover server_2)
server_pax server_2 -s -v

We contacted support and they suggested using the -P option in the backup command, where sn is the storage node you want the backup data sent to.

nsrndmp_save -T dump -M -P sn

We also listed only the sn in the storage node field of the Celerra's client properties in NetWorker.

After making the changes, we saw performance improve to 100 MB/s.

Keep in mind, if you are running pure NDMP backup (that is, having a FC tape drive connected to Celerra Data Mover), the backup command will be

nsrndmp_save -T dump

For most NAS, dump is the only backup type supported.  Please refer to NetWorker and Celerra documentation for more info.
The -M option is for DSA backup.
For pure NDMP, do not add -M to the backup command.
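The difference between the two backup commands can be summarized in a few lines of Python.  This sketch only assembles the command string you would enter as the client's backup command, using the flags mentioned in this post (-T, -M, -P); it does not run anything.  The function name is my own, and rejecting -P without DSA is an assumption of the sketch, since the post only uses -P together with -M.

```python
from typing import Optional


def ndmp_backup_command(dsa: bool, storage_node: Optional[str] = None) -> str:
    """Build the nsrndmp_save backup command for pure NDMP or NDMP-DSA."""
    parts = ["nsrndmp_save", "-T", "dump"]
    if dsa:
        parts.append("-M")                 # -M marks a DSA backup
        if storage_node:
            parts += ["-P", storage_node]  # send the data to this storage node
    elif storage_node:
        # Assumption of this sketch: -P is only meaningful together with -M.
        raise ValueError("storage_node is only used for DSA backups here")
    return " ".join(parts)


if __name__ == "__main__":
    print(ndmp_backup_command(dsa=True, storage_node="sn"))
    print(ndmp_backup_command(dsa=False))
```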

Thursday, March 8, 2012

Data Domain ifgroup part III

We tried to configure ifgroup with NetWorker 7.6.1, and it did not work as expected.  For example, if I set up ifgroup with 2 dual-port 10GbE NICs and unplug a cable, sometimes the backup session does not fail over to the other 3 working ports.  If I set up failover on the NICs of the DataDomain instead, I don't have any issue, as mentioned in my earlier post.  I contacted support, and their suggestion was to upgrade NetWorker to 7.6.2 and DDOS to 5.0.  I am happy with NetWorker 7.6.1 and DDOS 4.9.x for now and will decide when to upgrade in the future.  Currently NIC failover on DataDomain is good enough!

Wednesday, February 15, 2012

VADP with hotadd configuration

A helper VM is not required for VADP with ESX 4.0 or above if hotadd mode is selected.  Make sure you do have the necessary license for hotadd.  If VCB is still used in the environment, you can follow Commvault's page "Create a Helper Virtual Machine" on how to configure a VCB helper VM.

The following are the properties of the proxy VM before the snapshot is mounted.


After the snapshot is created, it is mounted on the proxy host (which is a virtual machine).  See screenshot below.

I backed up 2 VMs with one disk each at the same time.  That's why you see two extra virtual disks on the proxy host.  Now you probably understand why you need the Advanced edition for ESX 4.1, which includes the hotadd feature.  Otherwise, you can only use the nbd or san transport mode to back up VMs.

Of course, there are limitations on hotadd, such as a max vmdk size of 1 TB.

Actually, the 1 TB limitation is no longer valid for VDDK 1.1.1 (http://www.vmware.com/support/developer/vddk/VDDK-1.1.1-Relnotes.html) 

If NBD is used, there is a 1 TB limitation.  

Monday, February 6, 2012

DataDomain link aggregation 10GbE part II

After reviewing the DDOS 4.9 Initial Config Guide, I found that link aggregation on 10GbE is not supported.  DDOS 4.9 only supports link aggregation on 1 GbE ports.  This was confirmed by support as well.

"The 10 Gb-to-10 Gb interface does not support link aggregation. Only 1 GbE ports are supported."

Only failover will be supported on 10GbE for DDOS 4.9 and 4.8.  However, do consider the following before configuring failover.

• 10 GbE copper-to-10 GbE copper ports across Intel NICs can be
combined for failover (but not for link aggregation).
• 10 GbE optical-to-10 GbE optical ports can be combined for
failover (but not for link aggregation).
• 10 GbE failover across Intel NICs is supported.

Thursday, January 26, 2012

VADP / VCB backup failed with hotadd mode

I am testing the VADP backup and keep getting the following error.

"Unable to open a disk of the virtual machine"

I monitored the VC and did see the snapshot created.  I ran the same backup with nbd and it worked fine.  I double checked and found out the ESX license is Standard Edition with version 4.1.

I remembered this is a license issue and confirmed it from the VDDK 5.0 release notes on the VMware site.

"Licensing. In vSphere 5.0, the SCSI HotAdd feature is enabled only for vSphere editions Enterprise and higher, which have Hot Add licensing enabled. No separate Hot Add license is available for purchase as an add-on. In vSphere 4.1, Hot Add capability was also allowed in Advanced edition. Therefore, customers with vSphere Essentials or Standard edition who use backup products (including VMware Data Recovery) are not able to perform proxy-based backup, which relies on SCSI HotAdd. Those customers must use alternate transport modes."

To me, the most important benefit of hotadd mode is that it avoids presenting a VMFS LUN to a Windows proxy server.

P.S. HotAdd is not supported with VDDK 1.1.1 when the proxy server is on ESX 4.1, even if the correct license is purchased.  See reference link VDDK-1.2.1 release notes
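The licensing rule in the release note quoted above can be turned into a quick lookup.  This Python sketch encodes only what that quote says: Advanced and higher in vSphere 4.1, Enterprise and higher in vSphere 5.0.  The edition ordering list and the function name are my own assumptions, so check the actual license before relying on it.

```python
# Editions from lowest to highest; the ordering is an assumption of this sketch.
EDITIONS = ["Essentials", "Essentials Plus", "Standard",
            "Advanced", "Enterprise", "Enterprise Plus"]

# Lowest edition that includes SCSI HotAdd, per the VDDK 5.0 release note.
MIN_HOTADD_EDITION = {"4.1": "Advanced", "5.0": "Enterprise"}


def hotadd_licensed(vsphere_version: str, edition: str) -> bool:
    """Return True if the edition includes HotAdd for that vSphere version."""
    if vsphere_version not in MIN_HOTADD_EDITION:
        raise ValueError("no rule recorded for vSphere %s" % vsphere_version)
    if edition not in EDITIONS:
        raise ValueError("unknown edition: %s" % edition)
    minimum = MIN_HOTADD_EDITION[vsphere_version]
    return EDITIONS.index(edition) >= EDITIONS.index(minimum)


if __name__ == "__main__":
    print("4.1 Standard:", hotadd_licensed("4.1", "Standard"))
    print("4.1 Advanced:", hotadd_licensed("4.1", "Advanced"))
    print("5.0 Enterprise:", hotadd_licensed("5.0", "Enterprise"))
```

The first case is the one I hit: ESX 4.1 on Standard Edition, so hotadd fails and nbd works.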

Tuesday, January 17, 2012

Error labeling new boost device on DataDomain

NetWorker 7.6.3 is out and I try to install the new code on my test VADP proxy.  When I attempt to label a new media for my new boost device,  I get the following error message.

nsrd rd=networkerSN:datadomain_DD0304 mount operation failed: Connecting to 'datadomain' failed ([5028] rpc connection failure).

From experience, this is usually a DNS or connectivity issue.  I checked the DNS records and even added all the entries to the hosts file.  It still returned the same error.  Finally, I uninstalled NetWorker 7.6.3 and installed NetWorker 7.6.2 back on the test box, and everything worked fine.

I checked the release notes and found out that NetWorker 7.6.3 requires DDOS 5.0 or higher if you are using DDBoost.

Only DDOS versions 5.0.x and 5.1.x are supported with DD Boost 2.4
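A check like this before the upgrade would have saved me the troubleshooting.  The Python sketch below encodes just the one line from the release notes quoted above (DD Boost 2.4 supports DDOS 5.0.x and 5.1.x); the function name is my own, and the supported list should be re-checked against the release notes of whatever version you install.

```python
# DDOS major.minor releases supported with DD Boost 2.4, per the release notes.
SUPPORTED_DDOS = {("5", "0"), ("5", "1")}


def ddos_supported_by_boost_24(ddos_version: str) -> bool:
    """Return True if a DDOS version string such as '4.9.2.x' is supported."""
    parts = ddos_version.strip().split(".")
    if len(parts) < 2:
        raise ValueError("expected at least major.minor, got %r" % ddos_version)
    return (parts[0], parts[1]) in SUPPORTED_DDOS


if __name__ == "__main__":
    for version in ("4.9.2.x", "5.0.1", "5.1.0"):
        print(version, ddos_supported_by_boost_24(version))
```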

Currently, DDOS is running at 4.9.2.x and I don't plan to upgrade to DDOS 5.0 any time soon.  The only other option is to use AFTD.

Expanding boot partition of Windows 2008 server

Originally, I had two partitions on a single disk.  Finally, I got my new disk.  So I used Ghost to copy the second partition (D drive) to the new disk.  To free up space, I removed the old D drive partition on the first disk in Disk Management.  Then I right-clicked on the C drive and chose Extend Volume.  After following the wizard, the C drive was expanded to occupy all the space on the first disk without a reboot.

There is a Shrink Volume option.  Not sure how reliable it is though.