Subscribe

Suggestions for improving workflow performance?

I do WFA development and testing in a number of different lab and customer environments. I noticed in one customer's environment that a 5-step workflow that takes in 15-20 seconds to run in my laptop VM environment takes 2-3 minutes to run on a customer's lab system. I compared log files and found:

   - Each WFA command step startup took 22 seconds between the first and second lines of the log (assuming this is when poSH is launching)

  - It regularly takes 15 to 20 seconds between the lines saying "attempting to connect  to controller" and the "now connected" line

Both those steps take 1 to 2 seconds in my laptop environment. These delays are what add up to the 2-3 minutes ... the rest of the workflow flys by. I've confirmed the customer's environment is running a valid configuration (a VMware VM with 4GB of memory, 4-cores of ~ 2.2Ghz).

This looks like a case where PoSH startup itself (initial 22 seconds for each step) and possibly the first dataontap PoSH toolkit cmdlet execution are what takes too long. I've been advised these slow downs could be contributed to things like:

   - http://blogs.msdn.com/b/powershell/archive/2008/09/02/speeding-up-powershell-startup-updating-update-gac-ps1.aspx

   - http://velemental.com/2012/01/01/optimizing-the-execution-of-powershell-and-powercli/

... but not being a Windows or PoSH expert I'd like to get NetApp's advise on how to properly diagnose this and the steps to correct.

BELOW ARE LOGS OF A WORKFLOW RUNNING IN CUSTOEMR WFA LAB AND MY LAPTOP FOR COMPARISON

==== from CUSTOMER WFA LAB ENVIRONMENT (WFA/DFM/'Download Logs' button after executing above =====

11:51:56.083 INFO  [null] ***** Workflow Execution Started *****

11:51:56.193 INFO  [No-Op Storage] ### Command 'No-Op Storage' ###

11:52:17.549 INFO  [No-Op Storage] Executing command: ./No-Op_Storage6306799498641922380.ps1 -ArrayName 'pdclflr001.acme.com' -VolumeName 'lab_sample'

11:52:17.659 INFO  [No-Op Storage] This Command does nothing

11:52:17.737 INFO  [No-Op Storage] Command completed, took 21529 milliseconds

11:52:17.846 INFO  [ACME Grow Volume] ### Command 'ACME Grow Volume' ###

11:52:39.156 INFO  [ACME Grow Volume] Executing command: ./ACME_Grow_Volume3797965097087694000.ps1 -AggrAllocThresh '90' -AggrCommitThresh '150' -AggrUsedThresh '80' -Array 'pdclflr001.acme.com' -VolNearFull '85' -VolNearFullFree '15' -VolumeName 'lab_sample'

11:52:39.312 INFO  [ACME Grow Volume] WFA JobId =  100

11:52:39.374 INFO  [ACME Grow Volume] WFA Log File =  C:\Temp\WFA\GrowVolume.100.txt

11:52:39.453 INFO  [ACME Grow Volume] WFA Warning File = C:\Temp\WFA\GrowVolume.Warning.100.txt

11:52:39.593 INFO  [ACME Grow Volume] Get-WfaCredentials -Host pdclflr001.acme.com

11:52:39.718 INFO  [ACME Grow Volume] Credentials successfully provided for 'pdclflr001.acme.com'

11:52:39.921 INFO  [ACME Grow Volume] Connect-NaController (with credentials) -Name pdclflr001.acme.com -Timeout 60000 -ErrorAction Stop

11:52:57.013 INFO  [ACME Grow Volume] Connected to controller

11:52:59.231 INFO  [ACME Grow Volume] Growing volume lab_sample on array: pdclflr001.acme.com

11:52:59.341 INFO  [ACME Grow Volume] Volume Size (without Snapshot reserver): 1073741824

11:52:59.419 INFO  [ACME Grow Volume] Volume Used: 204800

11:52:59.481 INFO  [ACME Grow Volume] Percentage Used: 0

11:52:59.606 INFO  [ACME Grow Volume] Reserve: 0

11:52:59.684 INFO  [ACME Grow Volume] Reserve Used: 0

11:52:59.747 INFO  [ACME Grow Volume] Containing Aggregate: aggr0

11:52:59.825 INFO  [ACME Grow Volume] Volume Percentage Used: 0 is equal to or below the near full margin threshold of 70

11:53:00.106 INFO  [ACME Grow Volume] Command completed, took 42244 milliseconds

11:53:00.200 INFO  [ACME Grow VSM] ### Command 'ACME Grow VSM' ###

11:53:21.509 INFO  [ACME Grow VSM] Executing command: ./ACME_Grow_VSM6313615382240709417.ps1 -AggrAllocThresh '90' -AggrCommitThresh '150' -AggrUsedThresh '80' -Array 'pdclflr001.acme.com' -VolumeName 'lab_sample'

11:53:21.697 INFO  [ACME Grow VSM] WFA JobId =  100

11:53:21.791 INFO  [ACME Grow VSM] WFA Log File =  C:\Temp\WFA\GrowVolume.100.txt

11:53:21.869 INFO  [ACME Grow VSM] WFA Warning File = C:\Temp\WFA\GrowVolume.Warning.100.txt

11:53:22.197 INFO  [ACME Grow VSM] Using cached controller connection

11:53:40.069 INFO  [ACME Grow VSM] Determining VSM destinations for volume lab_sample on array: pdclflr001.acme.com

11:53:40.835 INFO  [ACME Grow VSM] Volume SnapMirror Source: pdclflr001.acme.com:lab_sample

11:53:41.585 INFO  [ACME Grow VSM] Volume: lab_sample does not have any VSM destinations!

11:53:41.694 INFO  [ACME Grow VSM] Command completed, took 41463 milliseconds

11:53:41.788 INFO  [ACME Throw Warning] ### Command 'ACME Throw Warning' ###

11:54:03.285 INFO  [ACME Throw Warning] Executing command: ./ACME_Throw_Warning4739232679497433350.ps1

11:54:03.426 INFO  [ACME Throw Warning] Warning File Contents:

11:54:03.504 INFO  [ACME Throw Warning] No Warning, Have a nice DaySmiley Happy

11:54:03.582 INFO  [ACME Throw Warning] Command completed, took 21778 milliseconds

11:54:03.644 INFO  [null] ***** Workflow Execution Completed *****

==== from a Laptop environment of WFA-VM/DFM-VM/DOT-VSIM  'Download Logs' button after executing above =====

11:55:46.745 INFO  [null] ***** Workflow Execution Started *****

11:55:46.792 INFO  [No-Op Storage] ### Command 'No-Op Storage' ###

11:55:47.915 INFO  [No-Op Storage] Executing command: ./No-Op_Storage8806261441669625881.ps1 -ArrayName 'hq-stor-7m-1.acme.com' -VolumeName 'alpha'

11:55:47.977 INFO  [No-Op Storage] This Command does nothing

11:55:48.009 INFO  [No-Op Storage] Command completed, took 1202 milliseconds

11:55:48.055 INFO  [ACME Grow Qtree] ### Command 'ACME Grow Qtree' ###

11:55:49.132 INFO  [ACME Grow Qtree] Executing command: ./ACME_Grow_Qtree6601099106301598062.ps1 -Array '192.168.126.185' -FreeMarginPct '30' -QtreeName 'pecan' -RefreshNow $true -VolumeName 'alpha'

11:55:49.225 INFO  [ACME Grow Qtree] Get-WfaCredentials -Host 192.168.126.185

11:55:49.272 INFO  [ACME Grow Qtree] Credentials successfully provided for '192.168.126.185'

11:55:49.366 INFO  [ACME Grow Qtree] Connect-NaController (with credentials) -Name 192.168.126.185 -Timeout 60000 -ErrorAction Stop

11:55:50.723 INFO  [ACME Grow Qtree] Connected to controller

11:55:52.330 INFO  [ACME Grow Qtree] Quota Report shows: /vol/alpha/pecan has DiskUsed =  0

11:55:52.376 INFO  [ACME Grow Qtree] Quota Report shows: /vol/alpha/pecan has DiskLimit =  821227520

11:55:52.423 INFO  [ACME Grow Qtree] A new DiskLimit that would be 30 percent above DiskUsed would be: 0

11:55:52.486 INFO  [ACME Grow Qtree] Current DiskLimit is already 30 percent or more above DiskUsed so no need to increase. Exiting without change.

11:55:52.564 INFO  [ACME Grow Qtree] Command completed, took 4493 milliseconds

11:55:52.610 INFO  [ACME Grow Volume] ### Command 'ACME Grow Volume' ###

11:55:53.656 INFO  [ACME Grow Volume] Executing command: ./ACME_Grow_Volume4457885072316574081.ps1 -AggrAllocThresh '90' -AggrCommitThresh '150' -AggrUsedThresh '80' -Array 'hq-stor-7m-1.acme.com' -VolNearFull '85' -VolNearFullFree '15' -VolumeName 'alpha'

11:55:53.749 INFO  [ACME Grow Volume] WFA JobId =  326619

11:55:53.780 INFO  [ACME Grow Volume] WFA Log File =  C:\Temp\WFA\GrowVolume.326619.txt

11:55:53.827 INFO  [ACME Grow Volume] WFA Warning File = C:\Temp\WFA\GrowVolume.Warning.326619.txt

11:55:53.952 INFO  [ACME Grow Volume] Get-WfaCredentials -Host hq-stor-7m-1.acme.com

11:55:53.999 INFO  [ACME Grow Volume] Credentials successfully provided for 'hq-stor-7m-1.acme.com'

11:55:54.108 INFO  [ACME Grow Volume] Connect-NaController (with credentials) -Name hq-stor-7m-1.acme.com -Timeout 60000 -ErrorAction Stop

11:55:55.200 INFO  [ACME Grow Volume] Connected to controller

11:55:56.167 INFO  [ACME Grow Volume] Growing volume alpha on array: hq-stor-7m-1.acme.com

11:55:56.214 INFO  [ACME Grow Volume] Volume Size (without Snapshot reserver): 885239808

11:55:56.261 INFO  [ACME Grow Volume] Volume Used: 398053376

11:55:56.308 INFO  [ACME Grow Volume] Percentage Used: 45

11:55:56.339 INFO  [ACME Grow Volume] Reserve: 0

11:55:56.386 INFO  [ACME Grow Volume] Reserve Used: 0

11:55:56.432 INFO  [ACME Grow Volume] Containing Aggregate: hq_fcal01

11:55:56.479 INFO  [ACME Grow Volume] Volume Percentage Used: 45 is equal to or below the near full margin threshold of 70

11:55:56.510 INFO  [ACME Grow Volume] Command completed, took 3900 milliseconds

11:55:56.557 INFO  [ACME Grow VSM] ### Command 'ACME Grow VSM' ###

11:55:57.571 INFO  [ACME Grow VSM] Executing command: ./ACME_Grow_VSM3341927828708692163.ps1 -AggrAllocThresh '90' -AggrCommitThresh '150' -AggrUsedThresh '80' -Array 'hq-stor-7m-1.acme.com' -VolumeName 'alpha'

11:55:57.649 INFO  [ACME Grow VSM] WFA JobId =  326619

11:55:57.696 INFO  [ACME Grow VSM] WFA Log File =  C:\Temp\WFA\GrowVolume.326619.txt

11:55:57.743 INFO  [ACME Grow VSM] WFA Warning File = C:\Temp\WFA\GrowVolume.Warning.326619.txt

11:55:57.899 INFO  [ACME Grow VSM] Using cached controller connection

11:55:59.271 INFO  [ACME Grow VSM] Determining VSM destinations for volume alpha on array: hq-stor-7m-1.acme.com

11:55:59.599 INFO  [ACME Grow VSM] Volume SnapMirror Source: hq-stor-7m-1.acme.com:alpha

11:55:59.942 INFO  [ACME Grow VSM] Volume: alpha has one VSM destination!

11:55:59.989 INFO  [ACME Grow VSM] Single SnapMirror case

11:56:00.036 INFO  [ACME Grow VSM] SnapMirror Destination : hq-stor-7m-1:sm_alpha

11:56:00.129 INFO  [ACME Grow VSM] Get-WfaCredentials -Host hq-stor-7m-1

11:56:00.192 INFO  [ACME Grow VSM] Credentials successfully provided for 'hq-stor-7m-1'

11:56:00.285 INFO  [ACME Grow VSM] Connect-NaController (with credentials) -Name hq-stor-7m-1 -Timeout 60000 -ErrorAction Stop

11:56:00.972 INFO  [ACME Grow VSM] Connected to controller

11:56:01.565 INFO  [ACME Grow VSM] Destination volume : hq-stor-7m-1:sm_alpha is greater than or equal in size to the source volume

11:56:01.611 INFO  [ACME Grow VSM] Command completed, took 5038 milliseconds

11:56:01.643 INFO  [ACME Throw Warning] ### Command 'ACME Throw Warning' ###

11:56:02.672 INFO  [ACME Throw Warning] Executing command: ./ACME_Throw_Warning7882991818375124909.ps1

11:56:02.750 INFO  [ACME Throw Warning] Warning File Contents:

11:56:02.781 INFO  [ACME Throw Warning] No Warning, Have a nice DaySmiley Happy

11:56:02.828 INFO  [ACME Throw Warning] Command completed, took 1170 milliseconds

11:56:02.859 INFO  [null] ***** Workflow Execution Completed *****

Re: Suggestions for improving workflow performance?

Hi David,

     I suggest you take a look at this thread.

https://communities.netapp.com/thread/34113

Esp try the following 2 things.

1. There is an option in Internet Explorer called "Check for publisher's certificate revocation". Removing Check for publisher's certificate revocation fixed the problem as well.

2. Set a proxy as below

In my case WFA service was running with an active directory account, because I use some workflows commandlet that needs credentials, as get-service for exemple.

So I had to login interactively  with this service account and set proxy setting or set via regedit the correct HKEY_USERS Keys

[S-1-5-21..........\Software\Microsoft\Windows\CurrentVersion\Internet Settings]

"MigrateProxy"=dword:00000001

"ProxyEnable"=dword:00000001

"ProxyHttp1.1"=dword:00000000

"ProxyServer"="http://ProxyServername:80"

Regards

adai



Re: Suggestions for improving workflow performance?

In order to change the Internet Properties such that the WFA services do not check certificate revocation, I created a ‘labad\svcwfa’ service account in the domain with local Administrative permissions.  I then shut-down the WFA database and server services, modified the services to run as user ‘labad\svcwfa’, from the "Control Panel", I updated the Internet Options for ‘labad\svcwfa’ as described above, and then re-started the WFA services.

We re-ran the the workflow and did get improvement in the run-time.  It took 83 seconds ( 1m 23s) to run.  Previously it took 171s (2m 51s) to run the same workflow.

We have basically cut the run time in half which is good progress; however that still seems like a long time for the workflow.  I will look at some of the other performance issues.

Re: Suggestions for improving workflow performance?

All,

DavidSpano and I are working on the same environmental issue. We eliminated the time delays that occurred at the start of every PowerShell Command, but there are still delays on the the order of 20 seconds for each network connection to arrays/controllers. Some details from the environment:

====

11:36:59.598 INFO  [Create volume] ### Command 'Create volume' ###

11:37:02.004 INFO  [Create volume] Executing command: ./Create_volume5239931216029618798.ps1 -AggregateName 'aggr0' -Array '10.35.8.35' -Language 'en_US' -Size '20' -SnapshotScheduleOptions '0 0 0' -State 'online' -VolumeGuarantee 'volume' -VolumeName 'pdc_oravote'

====

This shows just 3 seconds between 1st and 2nd steps (which used to be over 20 seconds) which indicates the above fix (turning 'Check for revocation' OFF) resolved that problem.

HOWEVER:

==== THIS ====

11:26:56.378 INFO  [Create volume] Connect-NaController (with credentials) -Name 10.35.8.35 -Timeout 60000 -ErrorAction Stop

11:27:13.595 INFO  [Create volume] Connected to controller

====

==== AND THIS ==== (later in the same workflow)

11:27:46.372 INFO  [SWA Create export] Using cached controller connection

11:28:04.729 INFO  [SWA Create export] Removing NFS export : /vol/pdc_oracle

====

INDICATES there is a 17 to 18 seconds delay in connecting to a controller ... even when it says "Using cached controller connection" since it still takes the about 18 seconds before the first command executes.

Are there specific ways to improve this or suggestions for how to isolate what is causing this?

Dave,

Environment:

   - WFA 2.2 GA

   - 7-Mode controllers running 7.3.4 and 8.1.3P1

Re: Suggestions for improving workflow performance?

Hi Dave,

Not that I know of. But let me check with the Engg team if there are any hacks.

Regards

adai

Message was edited by: Adaikkappan Arumugam

Re: Suggestions for improving workflow performance?

Hi Dave,

Could you kindly open a CASE with NGS and open a bug as well ?

Regards

adai

Re: Suggestions for improving workflow performance?

Hi Adai, Dave,

This is typically due to windows certificate revocation checks. I have been battling with this on a customer site (up to 100s delays starting up commands) and finally found an approach that works consistently (the suggestions that worked on other sites failed in this instance - I think it's due to their network configuration... connections to the internet are dropped rather than rejected, so they take time to timeout).

Anyway, the solution is to disable certificate revocation checking for the SYSTEM user (which WFA will typically be running as). There are a number of ways this can be achieved - see http://blog.muhimbi.com/2009/04/new-approach-to-solve-sharepoints.html for the most useful post I have found on this. The easiest is to modify the registry. Change:

[HKEY_USERS\S-1-5-18\Software\Microsoft\Windows\CurrentVersion\WinTrust\Trust Providers\Software Publishing]

"State"=dword:00023e00

[The default value is 23c00]

Or you can just import the attached registry file with an appropriately privileged user.

Regards,

Tim

Re: Suggestions for improving workflow performance?

Now that I've read some more of the context of the latest issues, let me suggest checking DNS settings on the filer for slow command startup - it's possible that it's trying to do a reverse lookup on your address for logging purposes.