not equal number of spares

lukasz_borek · ‎2011-06-24

Hi,

What can cause a situation like this? FAS 3160 (A/A + SyncMirror)

filerA> vol status -s

Pool1 spare disks

RAID Disk Device HA SHELF BAY CHAN Pool Type RPM Used (MB/blks) Phys (MB/blks)

--------- ------ ------------- ---- ---- ---- ------------------- --------------

Spare disks for block or zoned checksum traditional volumes or aggregates

spare 2b.45 2b 2 13 FC:A 1 FCAL 15000 418000/856064000 420156/860480768

Pool0 spare disks

RAID Disk Device HA SHELF BAY CHAN Pool Type RPM Used (MB/blks) Phys (MB/blks)

--------- ------ ------------- ---- ---- ---- ------------------- --------------

Spare disks for block or zoned checksum traditional volumes or aggregates

spare 2a.61 2a 3 13 FC:A 0 FCAL 15000 418000/856064000 420156/860480768

filerB> vol status -s

Pool1 spare disks

RAID Disk Device HA SHELF BAY CHAN Pool Type RPM Used (MB/blks) Phys (MB/blks)

--------- ------ ------------- ---- ---- ---- ------------------- --------------

Spare disks for block or zoned checksum traditional volumes or aggregates

spare 1c.29 1c 1 13 FC:A 1 FCAL 15000 418000/856064000 420156/860480768

spare 1c.45 1c 2 13 FC:A 1 FCAL 15000 418000/856064000 420156/860480768

spare 1c.58 1c 3 10 FC:A 1 FCAL 15000 418000/856064000 420156/860480768

spare 2c.28 2c 1 12 FC:B 1 FCAL 15000 418000/856064000 420156/860480768

spare 2c.37 2c 2 5 FC:B 1 FCAL 15000 418000/856064000 420156/860480768

Pool0 spare disks

RAID Disk Device HA SHELF BAY CHAN Pool Type RPM Used (MB/blks) Phys (MB/blks)

--------- ------ ------------- ---- ---- ---- ------------------- --------------

Spare disks for block or zoned checksum traditional volumes or aggregates

spare 1d.29 1d 1 13 FC:A 0 FCAL 15000 418000/856064000 420156/860480768

spare 1d.45 1d 2 13 FC:A 0 FCAL 15000 418000/856064000 420156/860480768

spare 1d.61 1d 3 13 FC:A 0 FCAL 15000 418000/856064000 420156/860480768

spare 1d.77 1d 4 13 FC:A 0 FCAL 15000 418000/856064000 420156/860480768

spare 2d.76 2d 4 12 FC:B 0 FCAL 15000 418000/856064000 420156/860480768

filerA> vol status -f

Broken disks (empty)

filerB> vol status -f

Broken disks (empty)

filerA> disk show -v | grep -i fail

filerB> disk show -v | grep -i fail

1c.58 filerB(151707439) FAILED 3QQ11KNX00009940GUDH [wtf?]

But:

filerA> disk show -v 1c.58

DISK OWNER POOL SERIAL NUMBER

------------------------- ----- -------------

1c.58 filerB (151707439) FAILED 3QQ11KNX00009940GUDH

filerB> disk show -v 1c.58

DISK OWNER POOL SERIAL NUMBER

------------------------- ----- -------------

1c.58 filerB(151707439) Pool1 3QQ11KNX00009940GUDH

My understanding was that both nodes should see the same spares. Why does one filer show disk 1c.58 as failed while the other shows it as healthy?

Darkstar · ‎2011-07-12

I can only answer part of your question:

Each filer has its own spare disks. So it's perfectly normal to see different spare counts for each filer.
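To make the per-node counts easy to compare, here is a minimal Python sketch that tallies spares per pool from pasted `vol status -s` text. It is not a NetApp tool; the function name `count_spares` and the assumption that data rows start with the word `spare` are mine, based only on the output shown in this thread.

```python
import re
from collections import Counter

def count_spares(vol_status_output: str) -> Counter:
    """Tally spare disks per pool from pasted `vol status -s` text.

    Assumes the layout seen in this thread: a "PoolN spare disks" heading
    followed by data rows that begin with the lowercase word "spare".
    """
    counts = Counter()
    pool = None
    for line in vol_status_output.splitlines():
        line = line.strip()
        # Accept both "Pool1 spare disks" and the run-together "Pool1 sparedisks".
        header = re.match(r"(Pool\d+)\s+spare\s*disks", line)
        if header:
            pool = header.group(1)
        elif line.startswith("spare ") and pool is not None:
            counts[pool] += 1
    return counts

# Sample taken from filerA's output above.
sample = """\
Pool1 spare disks
Spare disks for block or zoned checksum traditional volumes or aggregates
spare 2b.45 2b 2 13 FC:A 1 FCAL 15000 418000/856064000 420156/860480768
Pool0 spare disks
Spare disks for block or zoned checksum traditional volumes or aggregates
spare 2a.61 2a 3 13 FC:A 0 FCAL 15000 418000/856064000 420156/860480768
"""
print(count_spares(sample))
```

Running it once per node against each filer's output gives one count per pool per filer, which can then be compared side by side.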

However, why the partner filer still thinks that 1c.58 is FAILED although the owning filer (filerB) sees the disk as working is beyond me. Maybe a simple "disk fail" followed by "disk unfail" (in diag mode) on filerB can resolve the issue. Otherwise I would open a case with NetApp if it bothers you too much. But as long as the owning filer sees the disk as usable (and not the other way round), it's normally not a problem.

-Michael
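For anyone who wants to check for this kind of mismatch across more than one disk, the following is a rough Python sketch that compares the two nodes' `disk show -v` views. It assumes the four-column layout pasted in this thread (disk, owner, pool or state, serial number); real output may carry more columns, and the helper names `parse_disk_show` and `disagreements` are hypothetical.

```python
def parse_disk_show(output: str) -> dict:
    """Map disk name -> pool/state column from pasted `disk show -v` rows.

    Assumes rows of the form seen in this thread:
    <disk> <owner(id)> <pool-or-state> <serial>
    """
    views = {}
    for line in output.splitlines():
        # Normalise "filerB (151707439)" and "filerB(151707439)" to one token.
        fields = line.replace(" (", "(").split()
        # Skip headers, separator lines and anything not shaped like a data row.
        if len(fields) == 4 and "." in fields[0] and "(" in fields[1]:
            views[fields[0]] = fields[2]
    return views

def disagreements(view_a: dict, view_b: dict) -> dict:
    """Return disks that both nodes list but report with a different pool/state."""
    return {
        disk: (view_a[disk], view_b[disk])
        for disk in view_a.keys() & view_b.keys()
        if view_a[disk] != view_b[disk]
    }

# Rows copied from the outputs above.
filer_a = "1c.58 filerB (151707439) FAILED 3QQ11KNX00009940GUDH"
filer_b = "1c.58 filerB(151707439) Pool1 3QQ11KNX00009940GUDH"
print(disagreements(parse_disk_show(filer_a), parse_disk_show(filer_b)))
```

With the two rows from this thread, the only disk reported is 1c.58, with `FAILED` on filerA and `Pool1` on filerB.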