2013-04-19 06:09 AM
I have a customer with serious performance issues when provisioning new qtrees in a dataset.
The DFM server runs on a 4-vCPU VM with 8 GB of memory. There is no contention on the ESX server.
The DB size is about 100 MB.
I tried growing dbCache to 1 GB, with no change in behaviour. I also disabled PerfAdvisor, again with no change.
Web GUI and NMC GUI are very responsive and CPU is OK most of the time.
The DFM server monitors only 4 controllers. At the moment there are only 2 vfilers, and only one of them is active. The active vfiler contains 16 data volumes (25 qtrees each) managed in a single dataset.
The dataset follows a DR Mirror then Backup Protection Policy without a single issue.
All snapmirrors are done once a day in less than 2 minutes.
All snapvaults are done once a day in less than 15 minutes.
The only problem we have is when trying to provision a new qtree (or sometimes when changing the quota of an existing one).
When using the NMC GUI, it times out after some time and does nothing.
When using a dfpm CLI command, it works but it takes ages.
What I see in the logs: A Conformance Dry Run occurs immediately but nothing happens for 7-10 minutes, then the provisioning job is launched and done in a few seconds (vol creation, qtree creation, share creation, then creation of protection volumes when necessary). Then we have to wait 5-10 more minutes before the CLI is available again.
We don't see that on another dataset.
I think we will try adding a new dataset, but that adds complexity when no extra dataset is needed just to manage the snapmirrors and snapvaults, since they always complete in a few minutes.
- Is this behaviour normal? It is the only performance issue we have with OnCommand, and the CPU is nowhere near fully loaded.
- Is there a way to at least minimize the time taken to edit existing qtrees?
The dfm diag is attached.
Here is an extract from the conformance log showing what happens when provisioning a new qtree (15 minutes):
Apr 19 12:06:02 [dfmserver: INFO]: [2576:0x10d8]: Dry run: action: Selected volume xxxx:/xxxx_Data16 from dataset node to process the provisioning request.
Apr 19 12:07:00 [dfmserver: DEBUG]: [2576:0x1104]: Conformance checker started scanning at Fri Apr 19 11:57:38 2013
Apr 19 12:07:00 [dfmserver: DEBUG]: [2576:0x1104]: Finished scanning dataset xxxx_Data with protection policy: DR Mirror, then back up, unresolvable tasks: 0, resolvable tasks: 2, resolvable-need confirm tasks: 0
Apr 19 12:14:21 [dfmserver: INFO]: [2576:0x1604]: Successfully exported path /vol/xxxx_Data16/PG-xxxx over CIFS.
Apr 19 12:14:24 [dfmserver: INFO]: [2576:0x1604]: Modified CIFS share PG-xxxx with permission Change for user Everyone.
Apr 19 12:14:24 [dfmserver: INFO]: [2576:0x1604]: Successfully modified share PG-xxxx with path /vol/xxxx_Data16/PG-xxxx.
Apr 19 12:14:27 [dfmserver: WARN]: [2576:0x5c0]: Starting a re-run of provision checker task for dataset: 1046
Apr 19 12:14:27 [dfmserver: INFO]: [2576:0x1604]: Running provision request on dataset 1046
Apr 19 12:14:27 [dfmserver: INFO]: [2576:0x1604]: Running conformance checker after executing provisioning requests
Apr 19 12:14:27 [dfmserver: INFO]: [2576:0x1604]: Dataset 'xxxx_Data' (1046) is busy in check and reserve run as conformance tasks are in progress. Deferred conformance run scheduled as username 'NT AUTHORITY\SYSTEM' with user confirmation set to 1.
Apr 19 12:21:19 [dfmserver: DEBUG]: [2576:0x1718]: Conformance checker started scanning at Fri Apr 19 12:14:13 2013
Apr 19 12:21:19 [dfmserver: DEBUG]: [2576:0x1718]: Finished scanning dataset xxxx_Data with protection policy: DR Mirror, then back up, unresolvable tasks: 0, resolvable tasks: 1, resolvable-need confirm tasks: 0
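The pauses in an extract like this can be pulled out mechanically rather than by eye. The following is a minimal Python sketch, not a NetApp tool: `find_gaps`, `STAMP`, `SAMPLE`, and the 60-second threshold are hypothetical names and values chosen for illustration. It assumes every entry starts with a `Mon DD HH:MM:SS` stamp as in the extract; the log line carries no year, so one is passed in.

```python
import re
from datetime import datetime

# Matches the leading "Apr 19 12:06:02" stamp of a log entry (assumed format).
STAMP = re.compile(r"^([A-Z][a-z]{2} +\d{1,2} \d{2}:\d{2}:\d{2}) ")


def find_gaps(lines, year=2013, threshold_s=60):
    """Return (gap_seconds, previous_line, next_line) for each pause
    between consecutive stamped entries that exceeds threshold_s."""
    gaps = []
    prev_time = prev_line = None
    for line in lines:
        match = STAMP.match(line)
        if not match:
            continue  # skip lines without a leading timestamp
        # The entry has no year, so prepend the one supplied by the caller.
        stamp = datetime.strptime(f"{year} {match.group(1)}", "%Y %b %d %H:%M:%S")
        if prev_time is not None:
            delta = (stamp - prev_time).total_seconds()
            if delta > threshold_s:
                gaps.append((delta, prev_line, line))
        prev_time, prev_line = stamp, line
    return gaps


# Timestamps as in the extract; messages truncated for brevity.
SAMPLE = [
    "Apr 19 12:06:02 [dfmserver: INFO]: Dry run: action: Selected volume",
    "Apr 19 12:07:00 [dfmserver: DEBUG]: Finished scanning dataset xxxx_Data",
    "Apr 19 12:14:21 [dfmserver: INFO]: Successfully exported path",
    "Apr 19 12:14:24 [dfmserver: INFO]: Successfully modified share PG-xxxx",
    "Apr 19 12:14:27 [dfmserver: INFO]: Running provision request on dataset 1046",
    "Apr 19 12:21:19 [dfmserver: DEBUG]: Finished scanning dataset xxxx_Data",
]

for seconds, before, after in find_gaps(SAMPLE):
    print(f"{seconds:.0f}s pause between:\n  {before}\n  {after}")
```

On these entries it reports two pauses of roughly seven minutes each, one before the provisioning job starts and one after it, both ending on a "Finished scanning dataset" entry. To run it against the full conformance log, pass the open file object instead of `SAMPLE`.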
2013-04-19 07:03 AM
I haven't heard of this before. I suggest you open a support case so that they can enable some debug logging and see why things are taking so long or timing out.