
Oracle Database 11.2 slow performance on NetApp FAS3240

AXISDATANETWORK
 
Hi all,
  
I have opened this discussion to ask for your feedback on a performance issue we have observed in our production environment. I will try to give you as much information as I can; sorry if the question is too long.
 
We have bought a new NetApp FAS3240 with two controllers to replace our old EMC AX4-5. On controller 1 we have created an aggregate with 2 RAID groups of 12 disks each (2.5" 10k rpm), using RAID-DP. On controller 2 we have created an aggregate with 2 RAID groups of 20 disks each (2.5" 10k rpm). Both controllers are running NetApp Release 8.0.2P6 7-Mode. The servers access the filer through the FC protocol.
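To put the two aggregates side by side, here is a rough spindle calculation. It relies on the fact that RAID-DP reserves two parity disks per RAID group; the per-disk IOPS figure is an assumed rule of thumb for 10k rpm drives, not a measured or NetApp-published value, so treat the result as an order-of-magnitude sketch only.

```python
# Rough spindle arithmetic for the two aggregates described in the post.
# RAID-DP reserves two parity disks (parity + double parity) per RAID group.
RAID_DP_PARITY_DISKS = 2

# ASSUMPTION: illustrative small random I/O rate for one 10k rpm disk.
IOPS_PER_DISK = 140


def data_disks(raid_groups: int, disks_per_group: int) -> int:
    """Data (non-parity) disks in an aggregate built from identical RAID-DP groups."""
    return raid_groups * (disks_per_group - RAID_DP_PARITY_DISKS)


def rough_random_iops(raid_groups: int, disks_per_group: int) -> int:
    """Very rough random-read ceiling: data disks x assumed per-disk IOPS."""
    return data_disks(raid_groups, disks_per_group) * IOPS_PER_DISK


aggregates = {
    "controller 1 (Oracle)": (2, 12),
    "controller 2 (SQL Server)": (2, 20),
}

for name, (groups, per_group) in aggregates.items():
    print(f"{name}: {data_disks(groups, per_group)} data disks, "
          f"~{rough_random_iops(groups, per_group)} random IOPS (assumed per-disk rate)")
```

Under these assumptions the Oracle aggregate has 20 data disks against 36 on the SQL Server aggregate, so it has noticeably fewer spindles to absorb bursts of I/O.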
  
On aggregate 1 of controller 1 we have created several volumes, and inside each volume several LUNs, following the NetApp best practices. This aggregate is currently dedicated to an Oracle RAC 11.2.0.2 database with two nodes, one instance per node. The Oracle datafiles are on a 1.1 TB volume with 10 LUNs of 100 GB each. The volume was created without snapshots and thin provisioning.
  
After the migration to the new storage, we observed an increase in the average I/O wait time at the database level. The following wait event statistics show one week of our load on each storage, comparing the week before and the week after the migration:
  
EMC AX4-5 (week before the migration):
                                                             Avg
                                        %Time Total Wait    wait    Waits   % DB
Event                             Waits -outs   Time (s)    (ms)     /txn   time
-------------------------- ------------ ----- ---------- ------- -------- ------
db file sequential read      55,997,027     0    460,973       8      4.9   18.5
db file scattered read       10,712,448     0    107,856      10      0.9    4.3
 
NetApp FAS3240 (week after the migration):
                                                             Avg
                                        %Time Total Wait    wait    Waits   % DB
Event                             Waits -outs   Time (s)    (ms)     /txn   time
-------------------------- ------------ ----- ---------- ------- -------- ------
db file sequential read      54,807,814     0    662,336      12      4.5   17.0
db file scattered read       11,615,111     0    218,657      19      1.0    5.6
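As a sanity check, the "Avg wait (ms)" column can be recomputed from the Total Wait Time and Waits columns of the two tables, which also makes the relative increase explicit. The figures below are copied from the tables; nothing else is assumed.

```python
# Recompute "Avg wait (ms)" = Total Wait Time (s) / Waits, in milliseconds,
# from the figures quoted in the two wait event tables.
awr = {
    "EMC": {
        "db file sequential read": (55_997_027, 460_973),  # (waits, total wait s)
        "db file scattered read": (10_712_448, 107_856),
    },
    "NetApp": {
        "db file sequential read": (54_807_814, 662_336),
        "db file scattered read": (11_615_111, 218_657),
    },
}


def avg_wait_ms(waits: int, total_wait_s: float) -> float:
    """Average time per wait in milliseconds."""
    return total_wait_s / waits * 1000.0


for event in ("db file sequential read", "db file scattered read"):
    emc = avg_wait_ms(*awr["EMC"][event])
    netapp = avg_wait_ms(*awr["NetApp"][event])
    increase_pct = (netapp / emc - 1) * 100
    print(f"{event}: EMC {emc:.2f} ms, NetApp {netapp:.2f} ms, +{increase_pct:.0f}%")
```

With these figures the single-block read wait (sequential read) rises by roughly 47% and the multiblock read wait (scattered read) by roughly 87%, so the regression is clearly larger for multiblock I/O.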
 
Our first attempt to improve performance was to enable Read Reallocate on the volume used by the Oracle datafiles, because within two weeks the reallocation measurement had reached a value of 4. Since this change, the value stays below 3.
 
Using the data gathered with NetApp Management Console, we have identified periods with latencies above 100 ms, as you can see in the attached graph.
The Oracle Data Pump export utility appears to be the root cause of this behavior when it exports our biggest tables (30 GB) in direct path mode with two parallel processes. The export processes these tables between 1:30 and 2:00; if we remove this job, the latencies improve, as you can see:
We have seen the same performance issue when rebuilding indexes on our biggest tables with a parallel degree of 4, or when moving these segments from one tablespace to another.
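A back-of-the-envelope figure for the export window helps when comparing it with the throughput graphs. This sketch assumes that about 30 GB of table data is read during the 1:30 to 2:00 window and that a dump of roughly the same size is written to the ACFS volume, which sits on the same aggregate; both are assumptions, not measurements from the thread.

```python
from datetime import datetime


def window_seconds(start: str, end: str) -> float:
    """Length of a same-day HH:MM window in seconds."""
    fmt = "%H:%M"
    return (datetime.strptime(end, fmt) - datetime.strptime(start, fmt)).total_seconds()


def mb_per_s(gigabytes: float, seconds: float) -> float:
    """Average data rate in MB/s, using binary units (1 GB = 1024 MB)."""
    return gigabytes * 1024 / seconds


# ASSUMPTION: ~30 GB read from the datafile LUNs during the window, and a dump
# of similar size written to ACFS on the same aggregate.
seconds = window_seconds("01:30", "02:00")
read_rate = mb_per_s(30, seconds)
print(f"average read ~{read_rate:.0f} MB/s, "
      f"read + write on aggregate 1 ~{2 * read_rate:.0f} MB/s")
```

If the graphs show peaks well above this average, the export is running in bursts rather than at a steady rate, which is worth keeping in mind when reading the latency spikes.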
 
On aggregate 1 of controller 1 we have also created a 1.1 TB volume with 10 LUNs of 100 GB each to host an Oracle ASM Cluster File System (ACFS). The expdp dump files are written to this volume. This volume also shows slow performance, as you can see in the graphs:
Latency:

Throughput:

On the aggregate of controller 2 we have created several volumes to host a SQL Server 2012 Enterprise Edition instance used by our BI team:

Latency:

Throughput:

My feeling is that the storage performs poorly when Oracle executes multiblock I/O operations. Oracle may issue larger multiblock I/O requests than SQL Server, which could explain why the issue only happens with Oracle.
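The upper bound on the size of one Oracle multiblock read is the product of two initialization parameters, db_block_size and db_file_multiblock_read_count. The values below (8 KB blocks, a read count of 128) are illustrative assumptions, not the settings of this database; the real ones can be checked with SHOW PARAMETER in SQL*Plus.

```python
def max_multiblock_read_bytes(db_block_size: int, mbrc: int) -> int:
    """Largest single multiblock read request: block size x multiblock read count."""
    return db_block_size * mbrc


# ASSUMPTION: illustrative values, replace with the database's own parameters.
DB_BLOCK_SIZE = 8192
DB_FILE_MULTIBLOCK_READ_COUNT = 128

size = max_multiblock_read_bytes(DB_BLOCK_SIZE, DB_FILE_MULTIBLOCK_READ_COUNT)
print(f"{size // 1024} KB per multiblock read")  # prints: 1024 KB per multiblock read
```

Comparing this figure with the I/O sizes the filer reports during the export would show whether the large-request hypothesis holds.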

 

Has anyone running Oracle in a similar environment seen a similar issue, or could anyone give me feedback on how to solve it?

 

Thank you in advance, regards


thomas_glodde

Additionally, you can run sis status -l and check the start time and duration of any deduplication operations.

AXISDATANETWORK

Hi,

There are no deduplication operations enabled, because deduplication did not report any significant savings for the Oracle data.

The output of the command is:

STOMTSZRH020> sis status -l

No status entry found.

Regards

AXISDATANETWORK

Hi Henry,

Backup processes do run around midnight: the database backup runs from 00:00 to 06:00 with a parallelism of 2. However, the issue only happens when I export the biggest tables, and the throughput is more or less the same. We have a test case and can reproduce this issue at any moment.

Keep in mind that this is an ideal environment: an aggregate connected to a single machine dedicated to a single database, which is why we have been able to isolate the issue.

Regards

HENRYPAN2

Hi Axisdata Network,

You may wish to try Balance to pinpoint the bully process, or escalate this case to the NetApp performance team for in-depth analysis.

Good luck & Good w/e

Henry

lgreg

Hey,

See if you can upload the AWR reports from before and after the change, as well as a perfstat from before and after.

Use the following:

https://upload.netapp.com/hq/Userfile

For the user name, use lgreg@netapp.com.

As others pointed out, the environment was quite busy, and the fact that adding just 2 disks to the environment helped may indicate that you were on the edge. I would be curious to see the wait events in Oracle as well as the type of I/O (i.e. random or sequential). But let's see if you can upload the AWR and perfstat.

Questions:

Is there a significant difference between the EMC and NetApp in terms of memory/cache?

Is there a difference in the SGA size when using the different storage?

Is there a difference in the number of LUNs? Are the hosts properly configured for the SAN/iSAN stack for the host OS?

Be careful with what you read in Metalink note 1500460.1: it only applies to some specific OS versions, and it concerned an "upgrade" from one version of ONTAP to another. Based on the thread, I sense that you are in a "new deployment".

Regardless, see if you can upload the AWR reports from the EMC and from where you are now. It would also be good to see a perfstat from before and after the change with the additional disks.

Regards,

AXISDATANETWORK

Hi lgreg,

Sorry, but this week I have other issues that are more critical than this one.

Next week I will upload the requested files.

About your questions:

Is there a significant difference between the EMC and NetApp in terms of memory/cache?

The EMC AX4 has 1 GB of cache; it is an entry-level storage array.

Is there a difference in the SGA size when using the different storage?

No, the SGA size has been the same.

Is there a difference in the number of LUNs?

We have increased the LUN size from 50 GB on the EMC to 100 GB on the NetApp because of our current database size.

Are the hosts properly configured for the SAN/iSAN stack for the host OS?

I hope so. I have reviewed the NetApp best practices for it, and the system is set up as suggested.

Be careful with what you read in Metalink note 1500460.1: it only applies to some specific OS versions, and it concerned an "upgrade" from one version of ONTAP to another. Based on the thread, I sense that you are in a "new deployment".

This note is the reason we are running 8.0.2 instead of 8.1. This week Oracle published a fix for it, and I will deploy it in the coming weeks.
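The LUN size change matters because it also changes the LUN count, and on many hosts each LUN has its own queue depth. This sketch assumes the same 1,000 GB of datafile LUN capacity on both arrays (10 x 100 GB, as on the NetApp) and an illustrative per-LUN host queue depth of 32; the real value depends on the HBA driver and host settings, and the HBA may impose its own overall limit.

```python
import math


def lun_count(total_gb: int, lun_gb: int) -> int:
    """LUNs needed to provide total_gb of capacity with LUNs of lun_gb each."""
    return math.ceil(total_gb / lun_gb)


def max_outstanding_ios(luns: int, queue_depth_per_lun: int) -> int:
    """Upper bound on concurrent I/Os if each LUN is limited by its own queue depth."""
    return luns * queue_depth_per_lun


TOTAL_GB = 1000          # 10 x 100 GB datafile LUNs, as described in the thread
QUEUE_DEPTH_PER_LUN = 32  # ASSUMPTION: illustrative host setting

for array, lun_gb in (("EMC", 50), ("NetApp", 100)):
    luns = lun_count(TOTAL_GB, lun_gb)
    print(f"{array}: {luns} LUNs, up to "
          f"{max_outstanding_ios(luns, QUEUE_DEPTH_PER_LUN)} outstanding I/Os")
```

Under these assumptions, halving the LUN count also halves the number of I/Os the host can keep in flight, so it is worth checking the actual queue depth settings on the RAC nodes.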

Regards
