
Optimal settings for vxfs/vxvm on Solaris 10 and Oracle 11gR2 connected to NetApp Storage via fibre channel

PYTHONMEISTER

Hello,

 

Because of a bug in Oracle 11.2.0.3 we had to disable Veritas ODM, and since then we have been suffering from poor IO performance.

We switched to Quick IO, which did not remedy the problem.

 

Now I am asking myself how and where to tune the system for optimal performance (we cannot use ODM in the near future).

 

First, I think we should stick with Quick IO - according to the documentation, the penalty compared to ODM should be in the range of 1-3%.

 

Now, our mount options are these:

 

read/write/setuid/devices/delaylog/largefiles/qio/ioerror=mwdisable

 

I was asking myself whether we should add mincache=direct/convosync=direct/nodatainlog, as suggested in many documents on the web.
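Just to illustrate what I have in mind (untested - device and mount point taken from the sapdata11 filesystem shown below, and the database would of course have to be stopped first), the remount would look roughly like this:

# umount /oracle/KRW/sapdata11
# mount -F vxfs -o delaylog,largefiles,qio,ioerror=mwdisable,mincache=direct,convosync=direct,nodatainlog /dev/vx/dsk/krwdata11dg/krwdata11vol /oracle/KRW/sapdata11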

 

The vxtunefs values for our volumes are all these:

 

root@NB25010 # vxtunefs /dev/vx/dsk/krwdata11dg/krwdata11vol

Filesystem i/o parameters for /oracle/KRW/sapdata11

read_pref_io = 65536

read_nstream = 1

read_unit_io = 65536

write_pref_io = 65536

write_nstream = 1

write_unit_io = 65536

pref_strength = 10

buf_breakup_size = 8388608

discovered_direct_iosz = 262144

max_direct_iosz = 1048576

default_indir_size = 8192

qio_cache_enable = 0

odm_cache_enable = 0

write_throttle = 0

max_diskq = 1048576

initial_extent_size = 8

max_seqio_extent_size = 2048

max_buf_data_size = 8192

hsm_write_prealloc = 0

read_ahead = 1

inode_aging_size = 0

inode_aging_count = 0

fcl_maxalloc = 39031525376

fcl_keeptime = 0

fcl_winterval = 3600

fcl_ointerval = 600

oltp_load = 0

 

So I was asking myself:

read_unit_io = 65536 => should be 4096, because the NetApp stripe size in WAFL is 4K

write_unit_io = 65536 => should be 4096, because the NetApp stripe size in WAFL is 4K

read_nstream = 1 => should be 14, because there are 14 data disks in one RAID-DP set

write_nstream = 1 => should be 14, because there are 14 data disks in one RAID-DP set

max_direct_iosz = 1048576 => should be 4194304, because of maxphys in /etc/system
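If we do decide to change them, my understanding is that vxtunefs can set the values online on the mount point, for example (only a sketch with the values I am questioning above; to survive a remount they would also have to go into /etc/vx/tunefstab):

# vxtunefs -o read_nstream=14 /oracle/KRW/sapdata11
# vxtunefs -o write_nstream=14 /oracle/KRW/sapdata11
# vxtunefs -o max_direct_iosz=4194304 /oracle/KRW/sapdata11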

 

/etc/system is this:

set maxphys=8388608

set vxio:vol_maxio=16384

set vxio:vol_maxioctl=131072

set vxio:vol_maxspecialio=16384

set vxio:vol_default_iodelay=10

set vxio:voliomem_chunk_size=131072

set vxio:voliomem_maxpool_sz=134217728

set ssd:ssd_max_throttle=8

 

For completeness:

 

root@NB25010 # fstyp -v /dev/vx/dsk/krwdata11dg/krwdata11vol

vxfs

magic a501fcf5  version 7  ctime Tue Sep 18 15:18:35 2012

logstart 0  logend 0

bsize  1024 size  1257851904 dsize  0  ninode 1257851904  nau 0

defiextsize 0  ilbsize 0  immedlen 96  ndaddr 10

aufirst 0  emap 0  imap 0  iextop 0  istart 0

bstart 0  femap 0  fimap 0  fiextop 0  fistart 0  fbstart 0

nindir 2048  aulen 32768  auimlen 0  auemlen 8

auilen 0  aupad 0  aublocks 32768  maxtier 15

inopb 4  inopau 0  ndiripau 0  iaddrlen 8 bshift 10

inoshift 2  bmask fffffc00  boffmask 3ff  checksum f568c441

oltext1 32  oltext2 1282  oltsize 1  checksum2 0

free 258380170  ifree 0

efree  0 1 0 35 33 35 34 33 33 32 34 35 33 33 7 3 7 8 5 12 3 9 2 6 2 0 0 1 0 0 0 0

 

I guess bsize should be 8192, because the Oracle block size is 8192; for the redo logs I think we should stick with this.
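As far as I know bsize can only be set when the filesystem is created, so changing it would mean recreating the filesystem and restoring the data - roughly like this (syntax from memory, please check the mkfs_vxfs man page before using it):

# mkfs -F vxfs -o bsize=8192,largefiles /dev/vx/rdsk/krwdata11dg/krwdata11vol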

 

And what about setting vxfs:vx_vmodsort=1 in /etc/system?

 

Any hint is highly appreciated!


nielgreef

If you are not using ODM, we would suggest using mincache=direct (i.e. direct mounting - no filesystem cache).

You could also mount with noatime to speed things up.

Oracle does its own caching, and does this in the SGA.

The problem is that the Oracle cache and any filesystem cache are not the same thing at all.

Filesystems have different block sizes and also have to update metadata (inode information and allocation information specific to the filesystem).

If you mount the Oracle database files with caching enabled, you get a "clash" of the two caches and will actually see a slowdown.

The mincache=direct mount option will mount the VxFS filesystem without filesystem cache. The catch is that all other files (on the same mount point and filesystem) still need some cache to perform well (Oracle only does the caching for the database tables).

The other option to mount with is noatime. This prevents VxFS from updating the access time of the DBF (I assume .dbf) files, which saves a little time because the access time no longer has to change every time a file is accessed. (I would not do the same with the modification time, mtime, for backup purposes.)
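As a rough sketch (device and mount point reused from your post, adjust as needed), the options field of the vfstab entry would then look something like:

delaylog,largefiles,qio,ioerror=mwdisable,mincache=direct,noatime

You can check what is actually in effect afterwards with mount -v on the mount point.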

Now, vmodsort...

Every filesystem caches metadata, regardless. So when the filesystem reads an inode from disk, it keeps it in memory for later access (disk being about 100 times slower than memory).

The big problem is: how many are you going to keep in memory?

And when you release some of them, which inodes (or the memory associated with those inodes) will you release?

The solution is obviously to find the in-memory inodes that have been least recently accessed.

This means you have to go through the inodes you have in memory, sort them by the time they were last accessed, and then release the "oldest" ones.

Any sorting (it does not matter whether you use bubble sort or mod sort or anything else) takes time.

Solaris has actually put all of this sorting in the kernel, so the filesystem does not have to do it itself. Well, at least certain versions have...

All you have to do is tell the filesystem to use it.

This is why you can enable vmodsort for VxFS. Just make sure that you have the correct versions for this (Solaris and VxFS).
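On a version combination that supports it, enabling it is just the /etc/system entry you already mentioned plus a reboot; afterwards you can read the live value back from the kernel - something like this (the mdb check is from memory, treat it as a sketch):

set vxfs:vx_vmodsort=1

# echo 'vx_vmodsort/D' | mdb -k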

Last thing ...

The qio mount option does not actually enable Quick IO. All it does is check whether you have a license to use it.

To use Quick IO you actually have to create Quick IO files (please see the VxFS / SFRAC documentation on how to do this, or one of the quick how-tos on the web, like this one: http://www.dba-resources.com/oracle/using-quick-io-files-with-oracle/).
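For example (path reused from your post, the file name and size are made up), the usual pattern is to pre-allocate the file with qiomkfile and then point Oracle at the resulting name:

# /opt/VRTS/bin/qiomkfile -s 2g /oracle/KRW/sapdata11/krw_data01.dbf

qiomkfile creates a hidden preallocated file plus a symlink with the ::cdev:vxfs: extension, and Oracle then does its IO through the Quick IO interface when it opens that name.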

Obviously, if you can upgrade  (to get around the known issue), ODM is a lot more efficient.

Just a bit of background on Quick IO and ODM

---------------------------------------------------------------------

A long time ago, Veritas looked at the IO going to raw volumes and to filesystems to see what that IO looks like (size, order, ...).

Veritas then took what they learned and built Quick IO, to let VxFS do IO in a way that is very close to raw performance.

ODM is just a formal specification (from Oracle) for this. So now any filesystem vendor (ZFS or UFS or ...) can write ODM routines (which Oracle will call to do its IO), put them into a library, and link that into Oracle.

Veritas ODM is exactly that: a library of functions that does IO the way Oracle expects it to be done (I will not go into the specification here, but it is extensive and public).

If you do not link the Veritas ODM library, Oracle falls back to the "old" way of doing IO (it still makes ODM function calls, but these map back to standard IO routines and system calls).

Sorry for the very long answer, but I hope that helps.
