Saturday, November 8, 2014

NetWorker's new "save session distribution" option for load balancing

One of the new features I discovered after upgrading to NW 8.1.1.7 is "Save session distribution".  In our environment there is one NetWorker server and 3 storage nodes handling the backups.  The NW server only handles the index, the bootstrap, and a few Solaris clients; the majority of the backups are sent to the storage nodes.  When a pool is created, we make sure all 3 storage nodes have a device set up in the pool for load balancing.  In the past, you would send certain clients through sn1 and the others through the other storage nodes by setting the storage node affinity of each client.  Here are a few points I picked up from the NetWorker training.

Max sessions

  • Save sessions are distributed based on each storage node device's max sessions attribute (default).
  • More likely to concentrate the backup load on a few storage nodes.

Target sessions

  • Save sessions are distributed based on each storage node device's target sessions attribute.
  • More likely to spread backups across multiple storage nodes.

P.S. Save session distribution is not available for clone or recover operations.
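If you want to check the two attributes involved from the command line, here is a rough sketch of an nsradmin session on the NetWorker server (the device name below is a made-up placeholder; use your own):

```
nsradmin> . type: NSR device; name: rd=sn1:/dd880/dev01
nsradmin> show target sessions; max sessions
nsradmin> print
```

The `.` line sets the current query, `show` limits which attributes are displayed, and `print` prints the matching resource.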

You can set this up from the server's properties.

This configuration can be overridden on a per-client basis in the client properties.

Lastly, make sure you enter all available and qualified storage nodes in the storage node affinity field of the client.  NetWorker will try to load balance the backups among the storage nodes defined in that field.
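Under the hood, the affinity list is just the ordered `storage nodes` attribute of the NSR client resource. A sketch of what that fragment might look like with all three storage nodes listed (host names are made up for illustration):

```
type: NSR client;
name: client01.example.com;
storage nodes: sn1, sn2, sn3;
```

Traditionally NetWorker tries these in order, so the preferred storage node goes first; with save session distribution it balances the load among them.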


Migration from NetWorker 7.6 to 8.1.1.7 and dynamic nsrmmd

We finally completed the NW 7.6.5.7 to 8.1.1.7 migration and troubleshooting on Labour Day.  The backup devices are DD880s running DDOS 5.2.x.  I am only getting a chance to write the post now.  DFA works perfectly fine for the clients, and the backup window shortened quite a lot.  A new NetWorker 8.1.1.7 server was built and the same client instances were created on it, so there was no in-place upgrade of the existing 7.6.5.7 backup server and no mmrecov on the new box.  Clients were moved in groups of 50 - 100 each week.  The old box was virtualized and will be kept until all save sets on the 7.6.5.7 box expire next year, since we don't set a long retention.

No issues came up until the last group of clients was moved a few days before Labour Day.  After rebooting the server, all NW services started up fine.  I ran a bootstrap backup, and it kept waiting for media for a long time.  So I just recycled the NetWorker service, and the backup was fine.  A few days later, the same thing happened again.  Because of the tight backup window, I just restarted the service and it was fine again.  I opened a case with support to look for known bugs; however, there was nothing similar to my situation.  When it happened again just before Labour Day, while following routine troubleshooting steps I discovered that the nsrmmd count was not correct during the busiest backup period.  All nsrmmd processes should be in use during the busy backup window, but in fact not all of them were.  I turned off dynamic nsrmmd as a workaround, and it never happened again.  Once you turn off dynamic nsrmmd, you will see all nsrmmd processes for the device started.  It looks like NW does not spawn more nsrmmd processes when they are needed.  To turn off dynamic nsrmmd (a new feature in 8.x), go to NMC > Devices > Storage Nodes, open the properties of each storage node, and de-select "dynamic nsrmmd".  I have not checked patches 8 and 9; it may be fixed by now.


When this issue occurs, there are not enough nsrmmd processes to serve all the backups, so a lot of jobs pile up.  If you run netstat, you will see a lot of connections in TIME_WAIT as a result.
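A quick way to see the symptom is to count the TIME_WAIT connections.  On a live server you would run `netstat -an | grep -c TIME_WAIT`; the sketch below runs the same filter over a captured sample (made-up addresses) so the counting itself is reproducible:

```shell
#!/bin/sh
# On a live server:  netstat -an | grep -c TIME_WAIT
# Here we filter a captured sample so the count is reproducible.
sample='tcp 0 0 10.0.0.5:7937 10.0.0.21:40112 TIME_WAIT
tcp 0 0 10.0.0.5:7937 10.0.0.22:40113 TIME_WAIT
tcp 0 0 10.0.0.5:7937 10.0.0.23:40114 ESTABLISHED'
printf '%s\n' "$sample" | grep -c TIME_WAIT   # prints 2
```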

To count the number of nsrmmd processes, open the properties of the device.


In this example, the device has target sessions set to 1 and max sessions set to 20.  Keep in mind that 1 nsrmmd is reserved as the RO (read-only) nsrmmd for restore sessions.  The remaining 5 nsrmmds will handle a maximum of 20 backup sessions, that is, 1 nsrmmd handling at most 4 sessions.  I am a bit conservative and still use the ratio suggested in 7.6.1.  I know different numbers are suggested now; however, I only need to run one restore every 2-3 days and only need to DR a box once or twice a year, so 1 RO stream is enough for me.
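The arithmetic above, as a tiny script (the numbers are the ones from this example, not NetWorker defaults):

```shell
#!/bin/sh
# Sessions-per-nsrmmd math from the example above.
max_sessions=20    # device "max sessions" attribute
nsrmmd_count=6     # total nsrmmds for the device
ro_reserved=1      # one nsrmmd reserved as the RO/restore stream
rw_nsrmmd=$((nsrmmd_count - ro_reserved))   # 5 backup nsrmmds
per_nsrmmd=$((max_sessions / rw_nsrmmd))    # 4 backup sessions each
echo "$rw_nsrmmd backup nsrmmds, $per_nsrmmd sessions each"
```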

If you want to achieve the best dedupe ratio for database backups, you can set your device to a 1:1 sessions-to-nsrmmd ratio.  If you decide that the device should support a maximum of 8 sessions, then the max nsrmmd count should be 9 in this case (8 for backup plus 1 RO for restore).
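The same sizing, going the other way (from the number of sessions you want up to the nsrmmd count to configure); again, example numbers only:

```shell
#!/bin/sh
# 1:1 sessions-to-nsrmmd sizing for database backups.
max_sessions=8   # sessions you want the device to support
ro_reserved=1    # plus one RO nsrmmd for restores
max_nsrmmd=$((max_sessions + ro_reserved))   # 9
echo "max nsrmmd count: $max_nsrmmd"
```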