
SQL database frequently corrupted on NetApp 7-Mode storage LUN.

Naveenpusuluru

Hi Team,

We have a LUN hosting a SQL database, and we frequently get complaints from the SQL team that the database is corrupted. I am pasting the properties of the LUN and its volume below. Can someone please help me resolve this issue?

 /vol/vol_EDCSQLCL01_DATA_SOLW/qtree_EDCSQLCL01_DATA_SOLW/lun_EDCSQLCL01_DATA_SOLW  700.1g (751724789760)  (r/w, online, mapped)
                Serial#: 7Sbkp+DIii8i
                Share: none
                Space Reservation: enabled
                Multiprotocol Type: windows_2008
                Maps: igroup_EDCSQLCL01=17
                Occupied Size:  655.7g (704005595136)
                Creation Time: Thu Jun 30 07:50:10 EDT 2016
                Alignment: aligned
                Cluster Shared Volume Information: 0x1
                Space_alloc: disabled
                report-physical-size: enabled

 

 

Volume                   Tree                       Style Oplocks  Status    Owning vfiler
------------------------ -------------------------- ----- -------- --------- -------------
vol_EDCSQLCL01_DATA_SOLW                            unix  enabled  normal    vfiler0
vol_EDCSQLCL01_DATA_SOLW qtree_EDCSQLCL01_DATA_SOLW unix  enabled  normal    vfiler0

 

HRM3240FAS01*> vol status vol_EDCSQLCL01_DATA_SOLW
         Volume State           Status            Options
vol_EDCSQLCL01_DATA_SOLW online          raid_dp, flex     fractional_reserve=0
                                64-bit
                         Volume UUID: e6a28b1a-3eb3-11e6-9244-123478563412
                Containing aggregate: 'aggr0'
HRM3240FAS01*>
HRM3240FAS01*> vol options vol_EDCSQLCL01_DATA_SOLW
nosnap=off, nosnapdir=off, minra=off, no_atime_update=off, nvfail=off,
ignore_inconsistent=off, snapmirrored=off, create_ucode=off,
convert_ucode=off, maxdirsize=73400, schedsnapname=ordinal,
fs_size_fixed=off, guarantee=volume, svo_enable=off, svo_checksum=off,
svo_allow_rman=off, svo_reject_errors=off, no_i2p=off,
fractional_reserve=0, extent=off, try_first=volume_grow,
read_realloc=off, snapshot_clone_dependency=off, dlog_hole_reserve=off,
nbu_archival_snap=off

 

 

Please help me to resolve this issue.


Naveenpusuluru

Hi,

Can anyone please help me with this issue?

DaemonFF

Hi!

Which protocol do you use to present the LUN to the MSSQL server hardware: iSCSI or FC?

Which version of ONTAP are you running?

What exactly is the issue your SQL team is seeing?

Naveenpusuluru

Hi @DaemonFF

We are using the FC protocol to present this LUN to the server.

The ONTAP version is 8.1.3.

Issue: The database gets corrupted frequently. When they moved it to a physical hard disk in the server it worked, but when they moved it back to the storage LUN it got corrupted again.

Application: We are using this database for SolarWinds (our company's private server monitoring tool).

GidonMarcus

Hi.

I just came across this post by chance and a light bulb went on: we have had a few unexplained corruptions on our SolarWinds database as well!

I'm going to try migrating it to our other vendor's all-flash storage and see if it stops...

Our environment (if it helps to isolate your case): 8.1.4P2 7-Mode, iSCSI, NetApp MPIO Host Util 6.02, SolarWinds Orion Platform 2016.2.100, SQL Cluster 2012 SP3 CU4, on Windows 2012.

Gidi

Gidi Marcus (Linkedin) - Storage and Microsoft technologies consultant - Hydro IT LTD - UK

Naveenpusuluru

Hi @GidonMarcus

We need to find the root cause (RCA) and a resolution for why this is happening on NetApp. I don't see any issues with other LUNs; I am facing this issue with this particular LUN only.

GidonMarcus

Same here. This filer has served dozens of SQL instances in this configuration without any corruption issues before. Fortunately we are now migrating off it, so I will just bring the date forward for that cluster. As you still need to use it, I would open a case with Microsoft to find out the exact location and cause of the corruption. If the write is legitimate, the storage should have coped with it; if it is not, it should have been stopped in SQL Server itself.

I would also leave the storage for last, as you are past NetApp's end of engineering support anyway.

I've asked our DBA to find the table name so that you have a more exact match. Hopefully I will have it shortly.

Gidi


Naveenpusuluru

Hi @GidonMarcus

Thank you so much. I will be waiting for your reply.

GidonMarcus

[edit] Googling "consistency errors in table 'MapStudioFiles'" shows a few people hitting it recently...

Message
Executed as user:contoso\SuperDuperUser. Date and time: 2016-10-30 17:00:01  Server: SuperDuperCluster\IT_PROD  Version: 11.0.6540.0  Edition: Enterprise Edition (64-bit)  Procedure: [DBA].[dbo].[DatabaseIntegrityCheck]  Parameters: @Databases = 'Solarwinds', @CheckCommands = 'CHECKDB', @PhysicalOnly = 'N', @NoIndex = 'N', @ExtendedLogicalChecks = 'N', @TabLock = 'N', @FileGroups = NULL, @Objects = NULL, @LockTimeout = NULL, @LogToTable = 'Y', @Execute = 'Y'  Source: http://ola.hallengren.com [SQLSTATE 01000] (Message 50000)
Date and time: 2016-10-30 17:00:01  Database: [Solarwinds]  Status: ONLINE  Standby: No  Updateability: READ_WRITE  User access: MULTI_USER  Is accessible: Yes  Recovery model: FULL  Availability group: SuperDuperDAG  Availability group role: PRIMARY [SQLSTATE 01000] (Message 50000)
Date and time: 2016-10-30 17:00:01  Command: DBCC CHECKDB ([Solarwinds]) WITH NO_INFOMSGS, ALL_ERRORMSGS, DATA_PURITY [SQLSTATE 01000] (Message 50000)
Table error: Object ID 299148111, index ID 1, partition ID 72057594722779136, alloc unit ID 72057594050510848 (type LOB data). The off-row data node at page (1:235176), slot 0, text ID 383725325320192 does not match its reference from page (1:1484658), slot 2. [SQLSTATE 42000] (Error 8961)
Object ID 299148111, index ID 1, partition ID 72057594722779136, alloc unit ID 72057594741784576 (type In-row data): Errors found in off-row data with ID 102250348609536 owned by data record identified by RID = (1:1484658:2) [SQLSTATE 42000] (Error 8929)
CHECKDB found 0 allocation errors and 2 consistency errors in table 'MapStudioFiles' (object ID 299148111). [SQLSTATE 01000] (Error 8990)
CHECKDB found 0 allocation errors and 2 consistency errors in database 'Solarwinds'. [SQLSTATE 01000] (Error 8989)
repair_allow_data_loss is the minimum repair level for the errors found by DBCC CHECKDB (Solarwinds). [SQLSTATE 01000] (Error 8958)
Outcome: Failed  Duration: 00:09:07  Date and time: 2016-10-30 17:09:08 [SQLSTATE 01000] (Message 50000)
Date and time: 2016-10-30 17:09:08 [SQLSTATE 01000] (Message 50000).  The step failed.


Message
Executed as user:contoso\SuperDuperUser. Date and time: 2016-10-27 17:00:00  Server: SuperDuperCluster\IT_PROD  Version: 11.0.6020.0  Edition: Enterprise Edition (64-bit)  Procedure: [DBA].[dbo].[DatabaseIntegrityCheck]  Parameters: @Databases = 'Solarwinds', @CheckCommands = 'CHECKDB', @PhysicalOnly = 'N', @NoIndex = 'N', @ExtendedLogicalChecks = 'N', @TabLock = 'N', @FileGroups = NULL, @Objects = NULL, @LockTimeout = NULL, @LogToTable = 'Y', @Execute = 'Y'  Source: http://ola.hallengren.com [SQLSTATE 01000] (Message 50000)
Date and time: 2016-10-27 17:00:03  Database: [Solarwinds]  Status: ONLINE  Standby: No  Updateability: READ_WRITE  User access: MULTI_USER  Is accessible: Yes  Recovery model: FULL  Availability group: SuperDuperDAG  Availability group role: PRIMARY [SQLSTATE 01000] (Message 50000)
Date and time: 2016-10-27 17:00:03  Command: DBCC CHECKDB ([Solarwinds]) WITH NO_INFOMSGS, ALL_ERRORMSGS, DATA_PURITY [SQLSTATE 01000] (Message 50000)
Table error: Object ID 299148111, index ID 1, partition ID 72057594722779136, alloc unit ID 72057594050510848 (type LOB data). The off-row data node at page (1:235176), slot 0, text ID 383725325320192 does not match its reference from page (1:1484658), slot 2. [SQLSTATE 42000] (Error 8961)
Object ID 299148111, index ID 1, partition ID 72057594722779136, alloc unit ID 72057594741784576 (type In-row data): Errors found in off-row data with ID 102250348609536 owned by data record identified by RID = (1:1484658:2) [SQLSTATE 42000] (Error 8929)
CHECKDB found 0 allocation errors and 2 consistency errors in table 'MapStudioFiles' (object ID 299148111). [SQLSTATE 01000] (Error 8990)
CHECKDB found 0 allocation errors and 2 consistency errors in database 'Solarwinds'. [SQLSTATE 01000] (Error 8989)
repair_allow_data_loss is the minimum repair level for the errors found by DBCC CHECKDB (Solarwinds). [SQLSTATE 01000] (Error 8958)
Outcome: Failed  Duration: 00:10:26  Date and time: 2016-10-27 17:10:29 [SQLSTATE 01000] (Message 50000)
Date and time: 2016-10-27 17:10:30 [SQLSTATE 01000] (Message 50000).  The step failed.
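
If it helps to compare runs like the two above, here is a minimal sketch in Python that pulls the SQL Server build, the DBCC error numbers, and the page IDs out of one job message. The function name summarize_checkdb and the regular expressions are illustrative assumptions, not part of SQL Server or of the Ola Hallengren scripts, and the sample string is abridged from the first message.

```python
import re

# Illustrative patterns (assumptions based on the job output format shown above).
ERROR_RE = re.compile(r"\(Error (\d+)\)")
PAGE_RE = re.compile(r"page \((\d+):(\d+)\)")
VERSION_RE = re.compile(r"Version: (\d+(?:\.\d+)+)")


def summarize_checkdb(log: str) -> dict:
    """Collect the build number, DBCC error numbers, and (file, page) IDs from one message."""
    version = VERSION_RE.search(log)
    return {
        "version": version.group(1) if version else None,
        "errors": sorted({int(n) for n in ERROR_RE.findall(log)}),
        "pages": sorted({(int(f), int(p)) for f, p in PAGE_RE.findall(log)}),
    }


# Abridged from the 2016-10-30 message above.
sample = (
    "Server: SuperDuperCluster\\IT_PROD  Version: 11.0.6540.0  Edition: Enterprise Edition (64-bit) "
    "The off-row data node at page (1:235176), slot 0, text ID 383725325320192 does not match "
    "its reference from page (1:1484658), slot 2. [SQLSTATE 42000] (Error 8961) "
    "CHECKDB found 0 allocation errors and 2 consistency errors in table 'MapStudioFiles' "
    "(object ID 299148111). [SQLSTATE 01000] (Error 8990)"
)

summary = summarize_checkdb(sample)
print(summary)
```

Running it over both messages would make it easy to see that the same two pages are reported on both dates while the reported build differs (11.0.6020.0 and 11.0.6540.0).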


Naveenpusuluru

I think the problem is with the application itself.

DaemonFF

Hello!

It seems to me the issue is in the SQL software layer, not NetApp. The first error message indicates an inconsistency between the clustered (HA) nodes of the MSSQL server. When your SQL team moves the data to a local HDD, the database runs in a non-HA environment and has no errors.

Try running the database on FC-attached disks but without SQL HA.

DaemonFF

You need to upgrade your ONTAP software to the latest version, which is currently 8.2.4P6. (Don't forget about the license changes!)
