Hi Igreg, Sorry, but this week I have other issues more critical than this one; next week I will upload the requested files. About your questions:

Is there a significant difference between the EMC and the NetApp in terms of memory/cache?
The EMC AX4 has 1 Gbyte of cache; it is an entry-level storage array.

Is there a difference in the SGA size when using the different storage?
No, the SGA size has been the same.

Is there a difference in the number of LUNs?
We have increased the LUN size from 50 Gbytes on the EMC to 100 Gbytes on the NetApp due to our current database size.

Are the hosts properly configured for the SAN stack on the host OS?
I hope so; I have reviewed the NetApp best practices about it and the system is set up as suggested.

"Be careful in regards to what you read with Metalink note 1500460.1 - there are some unique OS versions that are impacted. As well, it was in regards to an 'upgrade' from one version of ONTAP to another. I sense that you are in a 'new deployment' based on the thread."
That note is the reason we are working on 8.0.2 instead of 8.1. This week Oracle has published an update to solve it; I will deploy it in the coming weeks. Regards
Hi, On 17/04 we added two new disks (same size and performance, 600 Gbytes SAS 10k rpm) to aggregate 1 (10+2/10+2, one new disk per raid group) to increase the number of mechanical disks. Once the disks were added, we forced a reallocate of the database data volume to spread the information across all the disks. After three business days we are gathering the same performance figures, so I think these values are more or less stable:

Before the change:
After the change:

My conclusions are:
- We have reduced the latency peak from 70 ms to 40 ms, and the overall amount of time with latencies bigger than 20 ms.
- We have increased the throughput peak from 17.5 Mbps to 20 Mbps.
- We have increased the IOPS peak from 400 to 500.

Now I am wondering: if I add more disks, will I improve the performance, or will I return to my previous scenario? Any suggestions? Thank you in advance, regards
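For reference, the forced reallocation was done with something like the following (volume name as shown elsewhere in the thread; a sketch of the session, from memory, not a verbatim capture):

STOMTSZRH020> reallocate start -f /vol/vol_racprod_data
STOMTSZRH020> reallocate status -v /vol/vol_racprod_data

The -f flag forces a full reallocation pass over the volume instead of waiting for the measured threshold.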
Hi, There are no deduplication operations enabled, because for Oracle data deduplication has not reported any significant improvement. The output of the command is:

STOMTSZRH020> sis status -l
No status entry found.

Regards
Hi Henry, Around midnight backup processes are running, but the database backup runs from 00:00-06:00 with a parallelism of 2, and the issue only happens when I export the biggest tables; the throughput is more or less the same. We have a test case and we can reproduce this issue at any moment. Take into account that this is an ideal environment, an aggregate connected to a single machine dedicated to running a single database; because of that, we have isolated the issue. Regards
Hi Erick, Thank you for your feedback. I think so, because it is an ideal environment: an aggregate connected to a single machine dedicated to running a single database. We had already opened a case with NetApp Global Support, and it has been closed by them with the following conclusions:

- To improve the performance of Controller 1, increase the number of disks to support the load (with 24 disks (12+12) I cannot reach the performance of my previous EMC storage with 8 disks).
- To improve the performance of the controller, review whether I can migrate to Data ONTAP 8.1.2P3.

Both operations cannot be rolled back (in an easy way), and I do not know what improvement I could expect. We have added two new disks to Controller 1 and the improvement has been more than 50%; please review the following post. Because of that, my feeling is that I do not have a mechanical issue (26 disks could not give a 50% improvement over 24 disks); it is either a Data ONTAP bug or the expected behaviour. I have reviewed some forums in which some people report unstable performance behaviour when you have raid groups smaller than 16 disks, but I cannot find any graph or detailed test. Any suggestions are welcome. Thank you in advance, regards
Hi Thomas, Thanks a lot for your help. I will review the documents you suggested and I will be back as soon as I have completed some tests on my environments. Regards
Hi Thomas, Thank you very much for your help; your replies are very useful to help me understand my problem. Please, could you suggest any document, book, or whatever about the NetApp architecture? I could not find any reference to the "filesystem transaction log". My second controller has 40 (18+2/18+2) 10k SAS disks, and they appear to be not enough to support a database export with Data Pump. I have SSDs, but I cannot create a Flash Pool with the SSDs because Oracle ASMLib does not work properly with the NetApp 8.1 release (see Metalink alert 1500460.1), so we are using them as a normal pool. Yesterday we added two new disks to my first controller (10+2/10+2) (one per raid group) and executed a forced reallocate of the volume, and we have reduced latencies from 50 ms to 30 ms. NetApp recommends raid groups of 16 disks and I do not know why; please, could you suggest any document to help me understand the improvement gathered with this change? Thank you in advance, regards
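For anyone following along, the raid group layout and the raidsize option can be checked with something like this (the aggregate name here is illustrative, not necessarily ours):

STOMTSZRH020> aggr status -r aggr1
STOMTSZRH020> aggr options aggr1

The first command lists each raid group and its member disks; the second shows the current options, including raidsize.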
Hi Thomas, We have deployed 8.0.2P6 because it was the recommended configuration for our environment according to the partner who installed the storage. I will review the compatibility matrix to check whether my environment is compatible with this release. Yes, I only have issues if the data is large sequential and does not fit into NVRAM. Please, could you explain to me what difference there is between the "Memory size" reported by sysconfig (8 Gbytes for a FAS3240) and the "NVMEM III size" reported by nv (1 Gbyte)? Because up to now I was thinking that I have 8 Gbytes of memory cache. Yes, we have already opened a performance case with NetApp Global Support; it has been open since January, and the only solution offered up to now is to increase the number of disks, because as you can see the disk utilization is 100%, but I cannot get any explanation about the root causes of this behavior happening only in this scenario. Thank you for your feedback, regards
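My current understanding (my own assumption; please correct me if it is wrong) is that the 8 Gbytes of "Memory size" is main memory used mostly as read cache (WAFL buffer cache), while the 1 Gbyte of NVMEM only journals incoming writes. On an HA pair half of it mirrors the partner, and the local half is split again in two so that consistency points can alternate, so the effective write journal would be roughly:

  1 Gbyte / 2 (partner mirror) / 2 (CP double buffering) = ~256 Mbytes per consistency point

That would explain why large sequential writes that overflow this journal behave so differently from the rest of the load.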
Hi Igreg, Yes, I can share with you the AWR and the perfstats; please let me know how I can do it, sorry, I am new to these communities. To the last question: yes, the controllers send AutoSupport data to NetApp. Thank you for your feedback. Regards
Hi Thomas, Thank you for your feedback. Yes, we have checked for partition misalignment; all the LUNs show "Alignment: aligned" for the volumes created to store the database data. The output for one of the LUNs is:

LUNs from one controller:

/vol/vol_racprod_data/LUN_racprod_A_data0  100g (107374182400)  (r/w, online, mapped)
    Serial#: GoeKc4nt27zN
    Share: none
    Space Reservation: enabled
    Multiprotocol Type: linux
    Maps: 36=10 38=10
    Occupied Size: 97.7g (104879898624)
    Creation Time: Wed Nov 28 17:19:30 CET 2012
    Alignment: aligned
    Cluster Shared Volume Information: 0x0

LUNs from the other controller:

/vol/vol_racprod_B_filesystem/LUN_racprod_B_filesystem0  100g (107374182400)  (r/w, online, mapped)
    Serial#: GoeKcZpe6WXH
    Share: none
    Space Reservation: enabled
    Multiprotocol Type: linux
    Maps: 38=8 36=9
    Occupied Size: 99.9g (107282169856)
    Creation Time: Tue Feb 26 15:39:55 CET 2013
    Alignment: aligned
    Cluster Shared Volume Information: 0x0

We have also checked the reallocate status for each volume and it is under the threshold (4). Any other suggestions are welcome. Regards
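For completeness, the host side (Linux, per the Multiprotocol Type) can be cross-checked too; a sketch with an illustrative device name:

$ fdisk -lu /dev/sdb    # partition start sectors should be divisible by 8 (4 KiB boundary)

If the partition start sector is not a multiple of 8, the LUN can report "aligned" on the controller while the filesystem on top is still misaligned.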
Hi all,
I have opened this discussion to request your feedback about a performance issue that we have observed in our production environment. I will try to give you as much information as I can; sorry if the question is too long.
We have bought two new NetApp FAS3240 controllers to replace our old EMC AX4-5. On one of the controllers (1) we have created an aggregate with 2 RAID-DP raid groups of 12 disks (2.5", 10k rpm) each. On the other controller (2) we have created an aggregate with 2 raid groups of 20 disks (2.5", 10k rpm). Both controllers are running NetApp Release 8.0.2P6 7-Mode. Servers access the filer through the FC protocol.
On aggregate 1 of "Controller 1" we have created several volumes, and inside each volume we have created several LUNs, following the NetApp best practices. Nowadays this aggregate is dedicated to running an Oracle RAC 11.2.0.2 database with two nodes, one instance per node. Oracle datafiles are on a 1.1 Tbytes volume with 10 LUNs of 100 Gbytes each. The volume was created without snapshots and without thin provisioning.
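The LUNs are presented to Oracle ASM (we use ASMLib, as mentioned later in the thread); the diskgroup creation was roughly like the following, with illustrative diskgroup and disk path names, not our exact ones:

SQL> CREATE DISKGROUP DATA EXTERNAL REDUNDANCY
       DISK '/dev/oracleasm/disks/DATA_LUN0',
            '/dev/oracleasm/disks/DATA_LUN1';
-- run on the ASM instance as SYSASM; one ASM disk per LUN, 10 in total (only two shown)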
After the migration to the new storage, at the database level we have observed an increase in the average latency response time. The following example corresponds to our weekly load on both storage systems, comparing the week before and the week after the migration:
EMC:

Event                      Waits        %Time-outs  Total Wait Time (s)  Avg wait (ms)  Waits/txn  % DB time
-------------------------  -----------  ----------  -------------------  -------------  ---------  ---------
db file sequential read    55,997,027   0           460,973              8              4.9        18.5
db file scattered read     10,712,448   0           107,856              10             0.9        4.3

NetApp:

Event                      Waits        %Time-outs  Total Wait Time (s)  Avg wait (ms)  Waits/txn  % DB time
-------------------------  -----------  ----------  -------------------  -------------  ---------  ---------
db file sequential read    54,807,814   0           662,336              12             4.5        17.0
db file scattered read     11,615,111   0           218,657              19             1.0        5.6
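Sanity-checking the average latencies against the totals (simple division, confirming the AWR columns are consistent):

  EMC:    460,973 s / 55,997,027 waits = ~8.2 ms;   107,856 s / 10,712,448 waits = ~10.1 ms
  NetApp: 662,336 s / 54,807,814 waits = ~12.1 ms;  218,657 s / 11,615,111 waits = ~18.8 ms

So single-block reads are about 47% slower and multiblock reads about 87% slower on the new storage.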
The first approach to try to improve the performance was to enable the Read Reallocate function for the volume used by the Oracle datafiles, because in two weeks the reallocate threshold reached the value 4. With this change we keep the threshold lower than 3.
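For reference, that was enabled with the volume option (volume name as shown later in the thread; a sketch from memory, not a verbatim capture):

STOMTSZRH020> vol options vol_racprod_data read_realloc on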
With the information gathered with NetApp Management Console, we have identified latencies bigger than 100 ms in some periods of time, as you can see in the attached graph.
The Oracle Data Pump Export utility appears to be the root cause of this behavior when it exports our biggest tables (30 Gbytes) using the direct path method with two parallel processes. This process exports the biggest tables between 1:30-2:00; if we remove this process we get an improvement in the latencies, as you can see:
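The export is essentially the following (directory, dumpfile, and table names here are illustrative, not our real ones):

$ expdp system DIRECTORY=DATA_PUMP_DIR DUMPFILE=big_%U.dmp LOGFILE=big.log \
        TABLES=APP.BIG_TABLE PARALLEL=2

Data Pump selects the access method itself; as noted above, for these tables it uses direct path.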
We have detected the same performance issue when executing an index rebuild on our biggest tables with a degree of 4, or when moving these segments from one tablespace to another.
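The equivalent operations, roughly (object and tablespace names are illustrative):

SQL> ALTER INDEX app.big_table_ix REBUILD PARALLEL 4;
SQL> ALTER TABLE app.big_table MOVE TABLESPACE data2;

Both generate the same kind of large sequential multiblock I/O as the export.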
On aggregate 1 of "Controller 1" we have also created a 1.1 Tbytes volume with 10 LUNs of 100 Gbytes each to create an Oracle ASM Cluster File System (ACFS). We write the expdp output files to this volume. We also have slow performance on this volume, as you can see in the graphs:
Latency:
Throughput:
On "controller 2" on the same a aggregate we have created several volume for deploy a SQL Server 2012 Enterprise Edition used by our BI team
Latency:
Throughput:
My feeling is that the storage performs poorly when Oracle executes multiblock I/O operations; Oracle can issue bigger (in size) multiblock I/O operations than SQL Server, and because of that the issue only happens with Oracle.
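To illustrate the sizes involved (the parameter values are assumptions, check your own instance): with an 8 KB block size and db_file_multiblock_read_count = 128, Oracle can issue single reads of 8 KB x 128 = 1 Mbyte. The current values can be checked with:

SQL> SHOW PARAMETER db_block_size
SQL> SHOW PARAMETER db_file_multiblock_read_count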
Please, has anyone running Oracle in a similar environment detected a similar issue, or could you give me any feedback about how I can solve it?
Thank you in advance, regards