VSeries Panic

JMALMAGRO · 2014-08-19

Hello

I have a V3210 running Data ONTAP 8.1.4P1.

Yesterday the AC failed, the HP EVA went offline, and one of the controllers went offline as well.

When I tried to start the controller, I got these errors:

Aug 18 18:50:08 [localhost:raid.cksum.wc.blkErr:EMERGENCY]: Checksum error due to wafl context mismatch on volume vm_win09_eva, Disk /aggr_fc/plex0/rg0/SWFC-A1:3.126L11 Shelf - Bay - [HP HSV300 0953] S/N [600508B40008ED290000D00002E0000

PANIC : raidtop_map: aggr aggr_fc (max vbn 1112666240): vbn 262000935028, no matching range

version: 8.1.4P1: Tue Feb 11 23:23:31 PST 2014

conf : x86_64

cpuid = 0

Uptime: 48s

PANIC: raidtop_map: aggr aggr_fc (max vbn 1112666240): vbn 262000935028, no matching range in SK process wafl_cppri on release 8.1.4P1 on Mon Aug 18 18:50:08 GMT 2014
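For anyone reading the panic string: the two numbers in it can be pulled out and compared. The sketch below is only an illustration, assuming the message means the requested volume block number (vbn) does not fall in any range the aggregate maps. The regular expression and the `parse_panic` helper are hypothetical names written for this one message layout, not NetApp tooling.

```python
import re

# The panic string copied from the console output above.
PANIC = (
    "PANIC: raidtop_map: aggr aggr_fc (max vbn 1112666240): "
    "vbn 262000935028, no matching range in SK process wafl_cppri "
    "on release 8.1.4P1 on Mon Aug 18 18:50:08 GMT 2014"
)

# Hypothetical pattern, written only for this message layout.
PATTERN = re.compile(
    r"raidtop_map: aggr (?P<aggr>\S+) \(max vbn (?P<max_vbn>\d+)\): "
    r"vbn (?P<vbn>\d+), no matching range"
)


def parse_panic(line):
    """Return (aggregate name, max vbn, requested vbn), or None if no match."""
    m = PATTERN.search(line)
    if m is None:
        return None
    return m.group("aggr"), int(m.group("max_vbn")), int(m.group("vbn"))


aggr, max_vbn, vbn = parse_panic(PANIC)
print(f"{aggr}: requested vbn {vbn} vs max vbn {max_vbn}")
print("out of range" if vbn > max_vbn else "within range")
```

The requested vbn is more than 200 times larger than the aggregate's max vbn, which would fit a corrupted block pointer rather than a valid address, though that reading is an assumption and not something the message itself states.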

I could only start the controller in maintenance mode, where I offlined the aggregate so that the controller would boot.

Each time I online that aggregate, the controller panics and halts.

NetApp support says that the problem is on the EVA side, but in Command View I can only see green indicators and status OK.

So, can I run wafl_check or wafliron on the aggregate?

I can't find any information about this message, so I'm a little lost.

Can somebody help me?

Best regards and thanks in advance

José Miguel

billshaffer · 2014-08-19

Did NetApp offer anything else besides "the problem is on the EVA side"? I can buy that - maybe data got corrupted when the array lost power - but NetApp should still be able to offer some recovery suggestions - like telling you whether or not you can run wafl_check. With that error, they should be able to give you a pretty good idea of what, exactly, the problem on the EVA side is.

Bill

JMALMAGRO · 2014-08-19

Hello Bill

Thanks for the answer.

I finally started in maintenance mode, offlined the aggregate that has the problem, and ran wafliron on that aggregate.

It took about one hour to reach 99% and about two more hours to reach 100%.

The process finished after deleting some "bad blocks", but I could online the aggregate and have started to register VMs.

So the next step should be to migrate all the data to other aggregates and recreate this one with new vDisks from the EVA.

Thanks for all

José Miguel

billshaffer · 2014-08-20

Good to hear, Jose. Why migrate the data and recreate on new vdisks? Sounds like the problem was corruption due to the power drop, not a problem with the EVA or the current vdisks, right? The wafliron should have sorted everything out.

Bill