We have a couple of shares that need to be replicated to a target Isilon cluster. Each share contains hundreds of millions of files, and one of them has a deep folder structure. We had only set limits on resource usage for replication, and this turned out not to be enough. CPU is limited to 25%. We confirmed that replication can consume only 10% of the trunk bandwidth. Workers are capped at 33% at all times, as the vendor recommended during initial setup.
Our first mistake was at the very beginning: we did not read the replication documentation thoroughly.
If the source share contains a large number of files and/or a deep directory structure, the Domain Mark job can consume a lot of resources on the first sync. The recommendation is to set up replication first and run the initial sync with the "Prepare Policy for Accelerated Failback Performance" box checked, before migrating any data to the source share.
However, we did the complete opposite. We migrated the share from the Veritas cluster to the source Isilon share first, then set up replication and enabled sync for the first time. Once the data copy completed and the Domain Mark job started, it began taking resources away from the share. The application using the share is very sensitive to performance and complains when latency exceeds 25 ms. Share latency sometimes jumped to 40 ms, and we received complaints from the application owner.
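To see how often the share crosses the 25 ms mark while the Domain Mark job runs, something like the following may help. It is only a sketch: it assumes you already have latency samples in milliseconds, one value per line, from whatever monitoring you use. The function name count_slow_samples is hypothetical and not part of any Isilon tooling.

```shell
#!/bin/sh
# Hypothetical helper: reads latency samples (ms, one per line) from stdin
# and reports how many exceed the limit (default 25 ms, the point at which
# our application starts to complain).
count_slow_samples() {
    awk -v limit="${1:-25}" '
        $1 + 0 > limit { slow++ }
        END { printf "%d of %d samples above %s ms\n", slow, NR, limit }
    '
}
```

For example, `printf '12\n40\n25\n31\n' | count_slow_samples 25` counts the 40 and 31 samples as slow; a sample of exactly 25 is not counted.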
The workaround suggested by support is to set vfs.vnlru_reuse_freevnodes to 1. The command can be run from any node. We saw a significant decrease in resource usage afterward. (Please check with support to confirm the cause before running any command.)
# isi_sysctl_cluster vfs.vnlru_reuse_freevnodes=1
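If you want to record the value before and after the change, a small dry-run wrapper along these lines may be useful. This is a sketch under assumptions, not something support gave us: the names run, set_reuse_freevnodes and APPLY are hypothetical, and it assumes the standard sysctl command is available on the node to read the local value. By default it only prints the commands; nothing is executed unless APPLY=1 is set.

```shell
#!/bin/sh
# Hypothetical wrapper around the command above.
# Prints the commands by default; set APPLY=1 to actually run them.
SYSCTL_NAME="vfs.vnlru_reuse_freevnodes"

run() {
    if [ "${APPLY:-0}" = "1" ]; then
        "$@"
    else
        echo "DRY-RUN: $*"
    fi
}

set_reuse_freevnodes() {
    # Show the current value on this node so the change can be reverted
    # (assumes plain sysctl can read it; confirm with support).
    run sysctl "$SYSCTL_NAME"
    # Apply the new value cluster-wide, as in the command from support.
    run isi_sysctl_cluster "$SYSCTL_NAME=$1"
    # Read the value back to confirm the change took effect.
    run sysctl "$SYSCTL_NAME"
}
```

Calling `set_reuse_freevnodes 1` without APPLY set just lists the three commands, which is a safe way to review them with support first.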
Once the first sync and the first Domain Mark job completed successfully, we had no further issues.