Re: Howto remove Broken disk "label version" after upgrade...

neves_jacques · ‎2011-05-13

I need some help here for this situation:

FAS2020HA running Ontap 7.3.5.1 with 2x FC shelves (DS14mk2).

New DS4243 (24x 2TB) sold with a new FAS2040HA (Ontap 8.x).

I did a hardware upgrade from FAS2020HA to the FAS2040HA.

FAS2040HA is now connected to the FC Shelves (running Ontap 7.3.5.1)

I later connected the DS4243 to the FAS2040HA (running Ontap 7.3.5.1).

When I run "sysconfig -", it reports 6 broken disks:

Broken disks

RAID Disk    Device      HA SHELF BAY CHAN Pool Type RPM Used (MB/blks)    Phys (MB/blks)
---------    ------      ------------- ---- ---- ---- ----- --------------    --------------
label version    0d.01.0     0d    1   0   SA:B   - BSAS 7200 1695466/3472315904 1695759/3472914816
label version    0d.01.2     0d    1   2   SA:B   - BSAS 7200 1695466/3472315904 1695759/3472914816
label version    0d.01.3     0d    1   3   SA:B   - BSAS 7200 1695466/3472315904 1695759/3472914816

.....

Any ideas on how to remove the broken disks marked "label version"?

Thanks.

JNV
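
For readers who want to list the affected devices from a saved copy of that output, here is a minimal Python sketch. It assumes the row layout is exactly as quoted above (the reason "label version" first, then device, HA, shelf and bay); the function name `label_version_disks` and the `SAMPLE` constant are made up for illustration and are not part of any NetApp tool.

```python
import re

# Sample rows copied from the "Broken disks" section quoted in the post above.
SAMPLE = """\
label version    0d.01.0     0d    1   0   SA:B   - BSAS 7200 1695466/3472315904 1695759/3472914816
label version    0d.01.2     0d    1   2   SA:B   - BSAS 7200 1695466/3472315904 1695759/3472914816
label version    0d.01.3     0d    1   3   SA:B   - BSAS 7200 1695466/3472315904 1695759/3472914816
"""

# Assumed layout: reason "label version", then device, HA, shelf and bay columns.
ROW = re.compile(
    r"^label version\s+(?P<device>\S+)\s+(?P<ha>\S+)\s+(?P<shelf>\d+)\s+(?P<bay>\d+)\s"
)


def label_version_disks(text):
    """Return (device, shelf, bay) for every row flagged 'label version'."""
    found = []
    for line in text.splitlines():
        match = ROW.match(line.strip())
        if match:
            found.append(
                (match.group("device"), int(match.group("shelf")), int(match.group("bay")))
            )
    return found


if __name__ == "__main__":
    for device, shelf, bay in label_version_disks(SAMPLE):
        print(f"{device}: shelf {shelf}, bay {bay}")
```

This only reads text; it does not talk to the filer or change anything on the disks.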

aborzenkov · ‎2011-05-13

The disks were initialized under Data ONTAP 8.x; disk labels are not downward compatible, so 7.x does not recognize them. You have two options.

1. Easy one – just upgrade to 8.x. That will recognize your disks again.

2. If you absolutely need 7G rather than 8.x, downgrade the labels to 7.x. For this you will need to reassemble your 2040 as it was at delivery (disconnect the DS14 shelves and connect the DS4243), boot into 8.x and follow the procedure in https://kb.netapp.com/support/index?page=content&id=1012944.

neves_jacques · ‎2011-05-13

Thanks aborzenkov,

Unfortunately everything is back in production and some DS14mk2 shelves have been moved to a second computer room.

Data have been migrated between DS14mk2 and DS4243.

So I can't boot back into Ontap 8.x, and the customer is also running SnapLock...

I'm surprised that there is no way to clean the label from Ontap 7G.

I suppose I will have to ask for a disk swap under maintenance & warranty...

Thanks again for your support.

Regards,

JNV

shaunjurr · ‎2011-05-14

Have you tried to "unfail" the disks?

I know we had this problem when we went from 6.x to 7.x as well, and there was a procedure for fixing labels. Since then I have had occasions (moving disks from VTLs back to filer heads, perhaps) where this showed up, and I believe I could simply unfail the disks and they got re-initialized.

neves_jacques · ‎2011-05-14

Thanks shaunjurr,

Not yet; I will try soon to "disk unfail -s"...

I will keep you informed.

Regards,

JNV

neves_jacques · ‎2011-05-16

Hello,

I have just tried "disk unfail", without success ;-(

Here is the output:

filer1*> disk unfail -s 0d.01.4
disk unfail -s: Disk 0d.01.4 has a bad raid label version. It will not be overwritten or formatted

Regards,

JNV

neves_jacques · ‎2011-05-16

Hello,

I cannot try the procedure at https://kb.netapp.com/support/index?page=content&id=1012944 because I no longer have a FAS2040HA running Ontap 8.0.1.

I've tried to boot on the disks in Ontap 8.x but it failed because the compact flash is on 7G.

I've tried to initialize and reinstall the FAS2040HA with Ontap 8 but it failed too.

Do you know how to upgrade the compact flash from 7.3.5.1 to 8.0.1 using netboot?

Again Thanks.

JNV

aborzenkov · ‎2011-05-16

What happens if you netboot 8.x? Could you

- netboot

- Enter maintenance mode (^C – 5)

- Execute disk show -v

And give complete console output from the start?

neves_jacques · ‎2011-05-16

The system is back in production.

I have to ask for a maintenance slot in order to provide the output of "disk show -v".

I can already provide the output of "disk show"; here it is:

---------------------------------------------------------------------------------------------------------------------

LOADER-B> netboot http://172.30.1.10/~jacquesn/kernel
Loading 172.30.1.10/~jacquesn/kernel:0x200000/3321400 0x52be40/3127448 0x8276d8/434732 0x892000/663552 Entry at 0x0023b540
Found 172.30.1.10/~jacquesn/platform.rc
Loading 172.30.1.10/~jacquesn/platform.ko:0x934000/405216 0x997000/30524 0x99e73c/111624
Loading 172.30.1.10/~jacquesn/rootfs.img.uzip:...0x9ba000/119703552
Loading 172.30.1.10/~jacquesn/platfs.img.uzip:0x7be3000/12177920
Closing network.
Starting program at 0x0023b540
NetApp Data ONTAP 8.0.1 7-Mode
md1: Preloaded image <172.30.1.10/~jacquesn/rootfs.img.uzip> 119703552 bytes at 0x809ba000
md2: Preloaded image <172.30.1.10/~jacquesn/platfs.img.uzip> 12177920 bytes at 0x87be3000
md2.uzip: 1408 x 16384 blocks
md1.uzip: 19200 x 16384 blocks
Copyright (C) 1992-2010 NetApp.
All rights reserved.
Skipped mounting the CFCARD.
*******************************
*                             *
* Press Ctrl-C for Boot Menu. *
*                             *
*******************************
mkdir: /cfcard/cores: Read-only file system

[May 16 11:10:52]: 0x8055000: ERR: RpcConnectionCache: RpcConnectionCache failed to create client handle to host localhost prog 536873474 vers 1 netclass tcp port 0 timeout 2:0 start 1305544252:497034 finish 1305544252:500621

[May 16 11:10:52]: 0x8055000: ERR: spmctl: main:spmctl.cc:541 Failed to connect to spm. RPC: Program not registered
The Boot Menu will be presented because an alternate boot method was specified.

Please choose one of the following:

(1) Normal Boot.
(2) Boot without /etc/rc.
(3) Change password.
(4) Clean configuration and initialize all disks.
(5) Maintenance mode boot.
(6) Update flash from backup config.
(7) Install new software first.
(8) Reboot node.
Selection (1-8)? 5
init_northbridge: WARNING freebsd did not set CMD_PARITY_CHECK_ENABLE
Mon May 16 11:11:38 GMT [fci.initialization.failed:error]: Initialization failed on Fibre Channel adapter 0b.
nvmem2_dblade_initialization: nv_base=0x9e200000, nv_size=0MB
Mon May 16 11:11:38 GMT [kern.version.change:notice]: Data ONTAP kernel version was changed from NetApp Release 7.3.5.1 to NetApp Release 8.0.1.
Mon May 16 11:11:40 GMT [netif.linkUp:info]: Ethernet e0P: Link up.
Mon May 16 11:11:41 GMT [diskown.isEnabled:info]: software ownership has been enabled for this system
arp_rtrequest: bad gateway 127.0.20.1 (!AF_LINK)
add host 127.0.10.1: gateway 127.0.20.1
Mon May 16 11:11:42 GMT [netif.linkUp:info]: Ethernet e0a: Link up.
5

    You have selected the maintenance boot option:
    the system has booted in maintenance mode allowing the
    following operations to be performed:

    ?                     disk
    key_manager           fcadmin
    fcstat                sasadmin
    sasstat               acpadmin
    halt                  help
    ifconfig              raid_config
    storage               sesdiag
    sysconfig             vmservices
    version               vol
    aggr                  sldiag
    dumpblock             environment
    systemshell           vol_db
Mon May 16 11:11:43 GMT [netif.linkUp:info]: Ethernet e0b: Link up.
    led_on                led_off
    sata                  acorn
    scsi                  nv8
    disk_list             fctest
    disktest              diskcopy
    vsa                   xortest
    faltest               disk_mung

Type "help <command>" for more details.

    In a High Availablity configuration, you MUST ensure that the
    partner node is (and remains) down, or that takeover is manually
    disabled on the partner node, because High Availability
    software is not started or fully enabled in Maintenance mode.

FAILURE TO DO SO CAN RESULT IN YOUR FILESYSTEMS BEING DESTROYED

NOTE: It is okay to use 'show/status' sub-commands such as
'disk show or aggr status' in Maintenance mode while the partner is up
Continue with boot? y
Mon May 16 11:11:45 GMT [netif.linkDown:info]: Ethernet e0c: Link down, check cable.
Mon May 16 11:11:45 GMT [netif.linkDown:info]: Ethernet e0d: Link down, check cable.

Mon May 16 11:11:46 GMT [ses.status.psWarning:warning]: DS12-ESAS shelf 0 on channel 0c power warning for Power supply 1: non-critical status; DC undervoltage fault. This module is on the rear side of the shelf, at the at the left.
Mon May 16 11:11:46 GMT [ses.status.psWarning:warning]: DS12-ESAS shelf 0 on channel 0c power warning for Power supply 2: non-critical status; DC undervoltage fault. This module is on the rear side of the shelf, at the at the right.
y

Mon May 16 11:11:46 GMT [mgr.boot.reason_ok:notice]: System rebooted after a halt command.
Mon May 16 11:11:46 GMT [callhome.reboot.halt:info]: Call home for REBOOT (halt command)
==>shelf name 0c shelf_id 0
==>sensor count 18
==>0: type(2) id(1) reading(1) state (1)
==>1: type(2) id(2) reading(1) state (1)
==>2: type(3) id(1) reading(5640) state (1)
==>3: type(3) id(2) reading(5530) state (1)
==>4: type(4) id(1) reading(-20) state (1)
==>5: type(4) id(2) reading(-20) state (1)
==>6: type(4) id(3) reading(-20) state (1)
==>7: type(4) id(4) reading(-20) state (1)
==>8: type(18) id(1) reading(12170) state (1)
==>9: type(18) id(2) reading(5140) state (1)
==>10: type(18) id(3) reading(3580) state (1)
==>11: type(18) id(4) reading(12120) state (1)
==>12: type(18) id(5) reading(5140) state (1)
==>13: type(18) id(6) reading(3550) state (1)
==>14: type(19) id(1) reading(3070) state (1)
==>15: type(19) id(2) reading(180) state (1)
==>16: type(19) id(3) reading(3010) state (1)
==>17: type(19) id(4) reading(220) state (1)
ses_dblade_init success
Mon May 16 11:11:51 GMT [monitor.chassisTemperature.ok:notice]: Chassis temperature is ok
Mon May 16 11:11:57 GMT [monitor.chassisTemperature.ok:notice]: Chassis temperature is ok
filter sync'd

Mon May 16 11:11:59 UTC 2011
Mon May 16 11:11:59 GMT [console_login_mgr:info]: root logged in from console
Mon May 16 11:11:59 GMT [rlmauth_login_mgr:info]: root logged in from SP NONE
*> disk show
Local System ID: 142228665

DISK       OWNER                      POOL   SERIAL NUMBER         HOME
------------ -------------              ----- -------------         -------------
0d.01.5      drp-cffiler1(142228621)    Pool0 WD-WMAY01069561       drp-cffiler1(142228621)
0d.01.4      drp-cffiler1(142228621)    Pool0 WD-WMAY01237611       drp-cffiler1(142228621)
0d.01.0                (142228665)    Pool0 WD-WMAY01237509                 (142228665)
0d.01.2                (142228665)    Pool0 WD-WMAY01067648                 (142228665)
0d.01.3                (142228665)    Pool0 WD-WMAY01245303                 (142228665)

---------------------------------------------------------------------------------------------------------------------

Regards,

JNV
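
As a reading aid for the "disk show" table above, the following Python sketch groups the listed disks by owner system ID. It assumes the column layout shown in this post, where the owner name may be missing and only the ID in parentheses remains; `disks_by_owner` and `SAMPLE` are illustrative names, not NetApp tooling.

```python
import re
from collections import defaultdict

# Rows copied from the "disk show" output in the post above.
SAMPLE = """\
Local System ID: 142228665

DISK       OWNER                      POOL   SERIAL NUMBER         HOME
------------ -------------              ----- -------------         -------------
0d.01.5      drp-cffiler1(142228621)    Pool0 WD-WMAY01069561       drp-cffiler1(142228621)
0d.01.4      drp-cffiler1(142228621)    Pool0 WD-WMAY01237611       drp-cffiler1(142228621)
0d.01.0                (142228665)    Pool0 WD-WMAY01237509                 (142228665)
0d.01.2                (142228665)    Pool0 WD-WMAY01067648                 (142228665)
0d.01.3                (142228665)    Pool0 WD-WMAY01245303                 (142228665)
"""

# Assumed layout: disk name, optional owner name, owner system ID in parentheses,
# then pool and serial number. Header and separator lines do not match.
ROW = re.compile(
    r"^(?P<disk>\S+)\s+(?P<owner>[^\s(]*)\((?P<sysid>\d+)\)\s+(?P<pool>\S+)\s+(?P<serial>\S+)"
)


def disks_by_owner(text):
    """Map each owner system ID to the list of disks it owns, in listing order."""
    owners = defaultdict(list)
    for line in text.splitlines():
        match = ROW.match(line.strip())
        if match:
            owners[int(match.group("sysid"))].append(match.group("disk"))
    return dict(owners)


if __name__ == "__main__":
    for sysid, disks in disks_by_owner(SAMPLE).items():
        print(sysid, ", ".join(disks))
```

On the sample, two disks belong to drp-cffiler1 (142228621) and three to the local system ID 142228665, which matches what the table shows.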

aborzenkov · ‎2011-05-16

There is no need to do "disk show -v" additionally.

Is it the same DS4243? In this case disks are apparently correctly recognized under DOT 8.0.1. So what exactly is your question now? What have you done and what did not work?

If you do not need the data on this shelf, I'd now go for option 4 to reinstall DOT 8.0.1 and do a revert right after that.

neves_jacques · ‎2011-05-16

Yes, it's the same DS4243.

When I boot the FAS2040HA normally on Ontap 7G, I see broken disks with "label version", as shown below:

----------------------------------------------------

Broken disks

RAID Disk    Device      HA SHELF BAY CHAN Pool Type RPM Used (MB/blks)    Phys (MB/blks)
---------    ------      ------------- ---- ---- ---- ----- --------------    --------------
label version    0d.01.4     0d    1   4   SA:A   - BSAS 7200 1695466/3472315904 1695759/3472914816

....

----------------------------------------------------

I wish to use these disks with FAS2040HA/Ontap 7G.

My question is: How to remove "Broken disks label version"?

I have already tried to reinstall DOT 8.0.1 with option 4 and to revert right after, but it failed. Here is the output:

-------------------------------------------

LOADER-A> netboot http://172.30.1.10/~jacquesn/kernel
Loading 172.30.1.10/~jacquesn/kernel:0x200000/3321400 0x52be40/3127448 0x8276d8/434732 0x892000/663552 Entry at 0x0023b540
Found 172.30.1.10/~jacquesn/platform.rc
Loading 172.30.1.10/~jacquesn/platform.ko:0x934000/405216 0x997000/30524 0x99e73c/111624
Loading 172.30.1.10/~jacquesn/rootfs.img.uzip:...0x9ba000/119703552
Loading 172.30.1.10/~jacquesn/platfs.img.uzip:0x7be3000/12177920
Closing network.
Starting program at 0x0023b540
NetApp Data ONTAP 8.0.1 7-Mode
md1: Preloaded image <172.30.1.10/~jacquesn/rootfs.img.uzip> 119703552 bytes at 0x809ba000
md2: Preloaded image <172.30.1.10/~jacquesn/platfs.img.uzip> 12177920 bytes at 0x87be3000
md2.uzip: 1408 x 16384 blocks
md1.uzip: 19200 x 16384 blocks
Copyright (C) 1992-2010 NetApp.
All rights reserved.
Skipped mounting the CFCARD.
*******************************
*                             *
* Press Ctrl-C for Boot Menu. *
*                             *
*******************************
^CThe Boot Menu will be presented because an alternate boot method was specified.
mkdir: /cfcard/cores: Read-only file system

Please choose one of the following:

(1) Normal Boot.
(2) Boot without /etc/rc.
(3) Change password.
(4) Clean configuration and initialize all disks.
(5) Maintenance mode boot.
(6) Update flash from backup config.
(7) Install new software first.
(8) Reboot node.
Selection (1-8)? 4

Zero disks, reset config and install a new file system?: y

This will erase all the data on the disks, are you sure?: y

Rebooting to finish wipeconfig request.
Skipped backing up /var file system to CF.
Uptime: 1m12s
System rebooting...

AMI BIOS8 Modular BIOS
Copyright (C) 1985-2009, American Megatrends, Inc. All Rights Reserved
Portions Copyright (C) 2009 NetApp, Inc. All Rights Reserved
BIOS Version 6.1
+++++++++++++++++++++++++++++++++++

4096MB RAM installed
CPU Type: Intel(R) Xeon(R) CPU @ 1.66GHz

Starting AUTOBOOT press Ctrl-C to abort...
Loading X86_ELF/kernel/primary.krn:.......0x200000/44314868 0x2c430f4/14474492 0x3a10df0/1414090 0x3b6a1ba/6 Entry at 0x00200000
Starting program at 0x00200000
Press CTRL-C for special boot menu
The platform doesn't support service processor

NetApp Release 7.3.5.1: Sat Jan 29 12:45:56 PST 2011
Copyright (c) 1992-2010 NetApp.
Starting boot on Mon May 16 12:24:26 GMT 2011
Mon May 16 12:24:41 GMT [fci.initialization.failed:error]: Initialization failed on Fibre Channel adapter 0b.
Mon May 16 12:24:41 GMT [nvram.battery.turned.on:info]: The NVRAM battery is turned ON. It is turned OFF during system shutdown.
Mon May 16 12:24:45 GMT [diskown.isEnabled:info]: software ownership has been enabled for this system
Mon May 16 12:24:47 GMT [config.noPartnerDisks:CRITICAL]: No disks were detected for the partner; this node will be unable to takeover correctly
Mon May 16 12:24:48 GMT [fmmb.instStat.change:info]: no mailbox instance on local side.
Mon May 16 12:24:48 GMT [fmmb.instStat.change:info]: no mailbox instance on partner side.
Mon May 16 12:24:48 GMT [cf.fm.noMBDisksOrIc:warning]: Could not find the local mailbox disks. Could not determine the firmware state of the partner through the cluster interconnect.
Mon May 16 12:24:48 GMT [raid.assim.disk.badlabelversion:error]: Disk 0d.01.1 Shelf ? Bay ? [NETAPP   X306_WMANT02TSSM NA04] S/N [WD-WMAY01067162] has raid label with version (10), which is not within the currently supported range (5 - 9). Please contact NetApp Global Services.
Mon May 16 12:24:48 GMT [raid.assim.disk.badlabelversion:error]: Disk 0d.01.4 Shelf ? Bay ? [NETAPP   X306_WMANT02TSSM NA04] S/N [WD-WMAY01237611] has raid label with version (10), which is not within the currently supported range (5 - 9). Please contact NetApp Global Services.
Mon May 16 12:24:48 GMT [raid.assim.disk.badlabelversion:error]: Disk 0d.01.5 Shelf ? Bay ? [NETAPP   X306_WMANT02TSSM NA04] S/N [WD-WMAY01069561] has raid label with version (10), which is not within the currently supported range (5 - 9). Please contact NetApp Global Services.
Mon May 16 12:24:49 GMT [coredump.spare.none:info]: No sparecore disk was found.
Mon May 16 12:24:49 GMT [raid.assim.tree.noRootVol:error]: No usable root volume was found!
disk:L#: rtID/pl
PANIC: raid: there are no data or parity disks in process rc on release NetApp Release 7.3.5.1 on Mon May 16 12:24:49 GMT 2011

version: NetApp Release 7.3.5.1: Sat Jan 29 12:45:56 PST 2011
cc flags: 8O

DUMPCORE: START

DUMPCORE: END -- coredump *NOT* written.
halt after panic during system initialization

--------------------------------------------------------------------------------------------------------------------

Regards,

JNV
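
The raid.assim.disk.badlabelversion messages in the boot log above state the mismatch directly: the disks carry label version 10, while 7.3.5.1 supports versions 5 to 9. A minimal Python sketch for pulling those fields out of a saved console log follows; it assumes the message wording is exactly as quoted in this post, and `bad_label_disks` and `SAMPLE` are hypothetical names used only for this example.

```python
import re

# Log lines copied from the 7.3.5.1 boot output in the post above.
SAMPLE = """\
Mon May 16 12:24:48 GMT [raid.assim.disk.badlabelversion:error]: Disk 0d.01.1 Shelf ? Bay ? [NETAPP   X306_WMANT02TSSM NA04] S/N [WD-WMAY01067162] has raid label with version (10), which is not within the currently supported range (5 - 9). Please contact NetApp Global Services.
Mon May 16 12:24:48 GMT [raid.assim.disk.badlabelversion:error]: Disk 0d.01.4 Shelf ? Bay ? [NETAPP   X306_WMANT02TSSM NA04] S/N [WD-WMAY01237611] has raid label with version (10), which is not within the currently supported range (5 - 9). Please contact NetApp Global Services.
Mon May 16 12:24:49 GMT [coredump.spare.none:info]: No sparecore disk was found.
"""

# Assumed message format, taken from the lines quoted in this thread.
PATTERN = re.compile(
    r"\[raid\.assim\.disk\.badlabelversion:error\]: Disk (?P<disk>\S+) .*?"
    r"S/N \[(?P<serial>[^\]]+)\] has raid label with version \((?P<version>\d+)\), "
    r"which is not within the currently supported range \((?P<low>\d+) - (?P<high>\d+)\)"
)


def bad_label_disks(log_text):
    """Return (disk, serial, label_version, min_supported, max_supported) tuples."""
    results = []
    for line in log_text.splitlines():
        match = PATTERN.search(line)
        if match:
            results.append(
                (
                    match.group("disk"),
                    match.group("serial"),
                    int(match.group("version")),
                    int(match.group("low")),
                    int(match.group("high")),
                )
            )
    return results


if __name__ == "__main__":
    for disk, serial, version, low, high in bad_label_disks(SAMPLE):
        print(f"{disk} ({serial}): label v{version}, supported {low}-{high}")
```

The script only reports what the log already says; it cannot change the labels.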

aborzenkov · ‎2011-05-16

Have you tried to abort AUTOBOOT (at the message "Starting AUTOBOOT press Ctrl-C to abort...") a second time and netboot once more, but just let it boot normally this time? There is some chance that it will continue with disk initialization. After all, it should not matter from which media NetApp was booted … hopefully ☺

aborzenkov · ‎2013-10-22

1. Put disks in system with 8.x

2. Make them spares (delete any leftover aggregates)

3. Zero spares

4. Remove disk ownership

After this you should be able to add them to any version.

allison · ‎2015-06-10

Content has been removed due to the potential of irreversible system damage.

Please refer to NetApp KB Article "How to repurpose physical storage" for supported methods to resolve this issue. If you still need assistance, please open a NetApp Support case.