SQL database corrupting frequently on NetApp 7-Mode storage LUN

Naveenpusuluru · ‎2016-10-26

Hi Team,

We have a LUN hosting a SQL database, and the SQL team frequently complains that the database is corrupted. I have pasted the properties of the LUN, qtree, and volume below. Can someone please help me resolve this issue?

/vol/vol_EDCSQLCL01_DATA_SOLW/qtree_EDCSQLCL01_DATA_SOLW/lun_EDCSQLCL01_DATA_SOLW 700.1g (751724789760) (r/w, online, mapped)
                Serial#: 7Sbkp+DIii8i
                Share: none
                Space Reservation: enabled
                Multiprotocol Type: windows_2008
                Maps: igroup_EDCSQLCL01=17
                Occupied Size: 655.7g (704005595136)
                Creation Time: Thu Jun 30 07:50:10 EDT 2016
                Alignment: aligned
                Cluster Shared Volume Information: 0x1
                Space_alloc: disabled
                report-physical-size: enabled

Volume                   Tree                       Style Oplocks  Status    Owning vfiler
------------------------ -------------------------- ----- -------- --------- -------------
vol_EDCSQLCL01_DATA_SOLW                            unix  enabled  normal    vfiler0
vol_EDCSQLCL01_DATA_SOLW qtree_EDCSQLCL01_DATA_SOLW unix  enabled  normal    vfiler0

HRM3240FAS01*> vol status vol_EDCSQLCL01_DATA_SOLW
         Volume State           Status            Options
vol_EDCSQLCL01_DATA_SOLW online          raid_dp, flex     fractional_reserve=0
                                64-bit
                         Volume UUID: e6a28b1a-3eb3-11e6-9244-123478563412
                Containing aggregate: 'aggr0'
HRM3240FAS01*> vol options vol_EDCSQLCL01_DATA_SOLW
nosnap=off, nosnapdir=off, minra=off, no_atime_update=off, nvfail=off,
ignore_inconsistent=off, snapmirrored=off, create_ucode=off,
convert_ucode=off, maxdirsize=73400, schedsnapname=ordinal,
fs_size_fixed=off, guarantee=volume, svo_enable=off, svo_checksum=off,
svo_allow_rman=off, svo_reject_errors=off, no_i2p=off,
fractional_reserve=0, extent=off, try_first=volume_grow,
read_realloc=off, snapshot_clone_dependency=off, dlog_hole_reserve=off,
nbu_archival_snap=off

Please help me to resolve this issue.

Naveenpusuluru · ‎2016-10-27

Hi

Can anyone please help me with this issue?

DaemonFF · ‎2016-10-31

Hi!

Which protocol do you use to present the LUN to the MSSQL server hardware: iSCSI or FC?

Which version of ONTAP do you have?

What kind of issue is your SQL team seeing?

Naveenpusuluru · ‎2016-10-31

Hi @DaemonFF

We are using the FC protocol to present this LUN to the server.

ONTAP version is 8.1.3.

Issue: The database gets corrupted frequently. When they moved it to a physical hard disk on the server it worked, but when they moved it back to the storage LUN it got corrupted again.

Application: We use this database for SolarWinds (a private server monitoring tool for our company).

GidonMarcus · ‎2016-10-31

Hi.

Just saw this post by chance and a light bulb went on: we have had a few unexplained corruptions on our SolarWinds database as well!

I'm going to try migrating it to our other vendor's all-flash storage and see if it stops...

Our environment (if it helps to isolate your case): 8.1.4P2 7-Mode, iSCSI, NetApp MPIO Host Util 6.02, SolarWinds Orion Platform 2016.2.100, SQL Cluster 2012 SP3 CU4, on Windows 2012

Gidi

Gidi Marcus (Linkedin) - Storage and Microsoft technologies consultant - Hydro IT LTD - UK

Naveenpusuluru · ‎2016-10-31

Hi @GidonMarcus

We have to find the RCA and a resolution for why this is happening on NetApp. I don't see any issues with other LUNs. I am facing this issue with this particular LUN only.
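Since only one LUN is affected, one way to look for a storage-side difference is to diff the `vol options` output of the affected volume against the output from a volume that hosts a healthy LUN. Below is a minimal Python sketch of that comparison. The affected volume's options are copied from the first post; the "healthy" values are made-up placeholders (not output from this filer), and the function names are hypothetical.

```python
# Options of the affected volume, as pasted in the first post.
AFFECTED = """nosnap=off, nosnapdir=off, minra=off, no_atime_update=off, nvfail=off,
ignore_inconsistent=off, snapmirrored=off, create_ucode=off,
convert_ucode=off, maxdirsize=73400, schedsnapname=ordinal,
fs_size_fixed=off, guarantee=volume, svo_enable=off, svo_checksum=off,
svo_allow_rman=off, svo_reject_errors=off, no_i2p=off,
fractional_reserve=0, extent=off, try_first=volume_grow,
read_realloc=off, snapshot_clone_dependency=off, dlog_hole_reserve=off,
nbu_archival_snap=off"""


def parse_vol_options(text):
    """Turn 'key=value, key=value, ...' output into a dict of strings."""
    options = {}
    for item in text.replace("\n", " ").split(","):
        item = item.strip()
        if item:
            key, _, value = item.partition("=")
            options[key.strip()] = value.strip()
    return options


def diff_options(a, b):
    """Return {option: (value_in_a, value_in_b)} for options that differ."""
    keys = sorted(set(a) | set(b))
    return {k: (a.get(k), b.get(k)) for k in keys if a.get(k) != b.get(k)}


affected = parse_vol_options(AFFECTED)

# ASSUMPTION: placeholder values standing in for a healthy volume's output.
# Replace this with parse_vol_options(<real output from another volume>).
healthy = dict(affected, nvfail="on", fractional_reserve="100")

for name, (bad, good) in diff_options(affected, healthy).items():
    print(f"{name}: affected={bad} healthy={good}")
```

Any option that shows up in the diff is only a candidate to investigate, not a confirmed cause.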

GidonMarcus · ‎2016-10-31

Same here. This filer served dozens of SQL instances in this configuration without any corruption issues before. Gladly, we are now migrating off it, so I will just bring the date forward for that cluster. As you still need to use it, I would open a case with MSFT to find out the exact place and cause of the corruption. If it's legit, the storage should have coped with it. If it's not legit, it should have been stopped in SQL itself.

I would also leave the storage for last, as you're on end-of-engineering support from NetApp anyway.

I've asked our DBA to try to find the table name for you so you have a more exact match. Hopefully I will have it shortly.

Gidi


Naveenpusuluru · ‎2016-10-31

Hi @GidonMarcus

Thank you so much, I will be waiting for your reply.

GidonMarcus · ‎2016-10-31

[edit] Googling "consistency errors in table 'MapStudioFiles'" shows a few people hitting it recently...

Message
Executed as user:contoso\SuperDuperUser.
Date and time: 2016-10-30 17:00:01 Server: SuperDuperCluster\IT_PROD Version: 11.0.6540.0 Edition: Enterprise Edition (64-bit) Procedure: [DBA].[dbo].[DatabaseIntegrityCheck] Parameters: @Databases = 'Solarwinds', @CheckCommands = 'CHECKDB', @PhysicalOnly = 'N', @NoIndex = 'N', @ExtendedLogicalChecks = 'N', @TabLock = 'N', @FileGroups = NULL, @Objects = NULL, @LockTimeout = NULL, @LogToTable = 'Y', @Execute = 'Y' Source: http://ola.hallengren.com [SQLSTATE 01000] (Message 50000)
Date and time: 2016-10-30 17:00:01 Database: [Solarwinds] Status: ONLINE Standby: No Updateability: READ_WRITE User access: MULTI_USER Is accessible: Yes Recovery model: FULL Availability group: SuperDuperDAG Availability group role: PRIMARY [SQLSTATE 01000] (Message 50000)
Date and time: 2016-10-30 17:00:01 Command: DBCC CHECKDB ([Solarwinds]) WITH NO_INFOMSGS, ALL_ERRORMSGS, DATA_PURITY [SQLSTATE 01000] (Message 50000)
Table error: Object ID 299148111, index ID 1, partition ID 72057594722779136, alloc unit ID 72057594050510848 (type LOB data). The off-row data node at page (1:235176), slot 0, text ID 383725325320192 does not match its reference from page (1:1484658), slot 2. [SQLSTATE 42000] (Error 8961)
Object ID 299148111, index ID 1, partition ID 72057594722779136, alloc unit ID 72057594741784576 (type In-row data): Errors found in off-row data with ID 102250348609536 owned by data record identified by RID = (1:1484658:2) [SQLSTATE 42000] (Error 8929)
CHECKDB found 0 allocation errors and 2 consistency errors in table 'MapStudioFiles' (object ID 299148111). [SQLSTATE 01000] (Error 8990)
CHECKDB found 0 allocation errors and 2 consistency errors in database 'Solarwinds'. [SQLSTATE 01000] (Error 8989)
repair_allow_data_loss is the minimum repair level for the errors found by DBCC CHECKDB (Solarwinds). [SQLSTATE 01000] (Error 8958)
Outcome: Failed Duration: 00:09:07 Date and time: 2016-10-30 17:09:08 [SQLSTATE 01000] (Message 50000)
Date and time: 2016-10-30 17:09:08 [SQLSTATE 01000] (Message 50000). The step failed.

Message
Executed as user:contoso\SuperDuperUser.
Date and time: 2016-10-27 17:00:00 Server: SuperDuperCluster\IT_PROD Version: 11.0.6020.0 Edition: Enterprise Edition (64-bit) Procedure: [DBA].[dbo].[DatabaseIntegrityCheck] Parameters: @Databases = 'Solarwinds', @CheckCommands = 'CHECKDB', @PhysicalOnly = 'N', @NoIndex = 'N', @ExtendedLogicalChecks = 'N', @TabLock = 'N', @FileGroups = NULL, @Objects = NULL, @LockTimeout = NULL, @LogToTable = 'Y', @Execute = 'Y' Source: http://ola.hallengren.com [SQLSTATE 01000] (Message 50000)
Date and time: 2016-10-27 17:00:03 Database: [Solarwinds] Status: ONLINE Standby: No Updateability: READ_WRITE User access: MULTI_USER Is accessible: Yes Recovery model: FULL Availability group: SuperDuperDAG Availability group role: PRIMARY [SQLSTATE 01000] (Message 50000)
Date and time: 2016-10-27 17:00:03 Command: DBCC CHECKDB ([Solarwinds]) WITH NO_INFOMSGS, ALL_ERRORMSGS, DATA_PURITY [SQLSTATE 01000] (Message 50000)
Table error: Object ID 299148111, index ID 1, partition ID 72057594722779136, alloc unit ID 72057594050510848 (type LOB data). The off-row data node at page (1:235176), slot 0, text ID 383725325320192 does not match its reference from page (1:1484658), slot 2. [SQLSTATE 42000] (Error 8961)
Object ID 299148111, index ID 1, partition ID 72057594722779136, alloc unit ID 72057594741784576 (type In-row data): Errors found in off-row data with ID 102250348609536 owned by data record identified by RID = (1:1484658:2) [SQLSTATE 42000] (Error 8929)
CHECKDB found 0 allocation errors and 2 consistency errors in table 'MapStudioFiles' (object ID 299148111). [SQLSTATE 01000] (Error 8990)
CHECKDB found 0 allocation errors and 2 consistency errors in database 'Solarwinds'. [SQLSTATE 01000] (Error 8989)
repair_allow_data_loss is the minimum repair level for the errors found by DBCC CHECKDB (Solarwinds). [SQLSTATE 01000] (Error 8958)
Outcome: Failed Duration: 00:10:26 Date and time: 2016-10-27 17:10:29 [SQLSTATE 01000] (Message 50000)
Date and time: 2016-10-27 17:10:30 [SQLSTATE 01000] (Message 50000). The step failed.
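Both runs above report the same two pages, (1:235176) and (1:1484658). If it helps when opening a case, here is a rough Python sketch that pulls the `(file:page)` references out of a CHECKDB message and converts each page number to a byte offset inside that data file, using the 8 KB SQL Server page size. The function name is hypothetical, and the offsets are relative to the start of the database file; mapping them to blocks on the LUN would depend on the NTFS layout and is not attempted here.

```python
import re

PAGE_SIZE = 8192  # SQL Server data pages are 8 KB

# Error 8961 line copied from the job output in this post.
sample = (
    "The off-row data node at page (1:235176), slot 0, "
    "text ID 383725325320192 does not match its reference "
    "from page (1:1484658), slot 2. [SQLSTATE 42000] (Error 8961)"
)


def page_offsets(text):
    """Return sorted, de-duplicated (file_id, page_id, byte_offset) tuples
    for every 'page (file:page)' reference found in the text."""
    found = set()
    for file_id, page_id in re.findall(r"page \((\d+):(\d+)\)", text):
        found.add((int(file_id), int(page_id)))
    return [(f, p, p * PAGE_SIZE) for f, p in sorted(found)]


for file_id, page_id, offset in page_offsets(sample):
    print(f"file {file_id} page {page_id} -> byte offset {offset}")
```

Running the same function over every night's job output would show whether new pages start appearing or whether it is always the same pair.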


Naveenpusuluru · ‎2016-10-31

I think there is a problem with the application itself.

DaemonFF · ‎2016-10-31

Hello!

It seems to me the issue is in the SQL software layer, not NetApp. The first error message indicates an inconsistency between the clustered (HA) nodes of the MSSQL server. When your SQL team moves the data to a local HDD, the DB runs in a non-HA environment and has no errors.

Try running the DB on FC-attached disks but without SQL HA.

DaemonFF · ‎2016-10-31

You need to upgrade your ONTAP software to the latest version; 8.2.4P6 is the current one now. (Don't forget about the license changes!)