Ontap upgrade to v6.3

johnchang

Hi,

I know this may be a tough question, since the F740 filer and ONTAP v6.1 are quite old, but we still use them in our test environment. I'm having problems that I can't get help with through NetApp because our support contract has run out.

 

My problem is that the ONTAP OS no longer boots, so I was trying to upgrade to v6.3. I downloaded v6.3 onto floppy disks and tried to install it, but it does not work. Can anyone tell me how I would go about installing the new version?

wstroh

Hi John!

Do you have the logs from the console that you're seeing when you try to boot?

kusek

Hello John,

You are in quite a predicament!

But putting the 'supportability' facts to the side, let's try to address what might be going on.

Please tell us about any errors you're getting and the conditions that led up to them.

Also, what version were you running, and what version are you attempting to upgrade to?

(Are you limited to 6.3, or would an attempt at 6.4.5 work? Theoretically, given the circumstances, of course.)

And when you say that the install does not work, what exactly does not work?

We could be dealing with a functional issue, a configuration problem, or perhaps something addressed in the release notes for going from one version to another.

So, let us know what you can about your current state, and let's see if we can get you out of this situation!

Christopher

johnchang

Hi, thank you all for your replies, much appreciated.

I'll try to explain what is going on here.

I have two F740 filers that were running ONTAP 6.1R1, and they keep rebooting, complaining of various errors. So I shut one down and tried to use the good disks on one filer. When I did that, it looks like I somehow messed up the ONTAP OS.

So I downloaded v6.3 onto diskettes and tried to install it. The process looked good, but when the NetApp goes into maintenance mode, it defaults to v6.1. I think it's doing this because I can't get a prompt to issue the "download" command.

I was able to get lots of logs yesterday. I'll try to insert some of it.

Here are some of the logs in the order they showed up:

Alpha Open Firmware by FirmWorks
Copyright 1995-2000 FirmWorks, Network Appliance. All Rights Reserved.
Firmware release 2.8_a2

Memory size is 512 MB
Testing SIO
Testing LCD
Probing devices
Testing 512MB
16MB chunks

1 to 16MB

...

496 to 512MB

Complete
Finding image...
Loading 3 2 disk@1

100%
Starting Press CTRL-C for special boot menu
................................................................................................................
Thu Jan 1 00:00:00 GMT [isp_timeout:warning]: 0b.10(0xfffffc00012deee0, 0x12): command timeout, quiescing bus to allow outstanding I/O to complete.
Thu Jan 1 00:00:00 GMT [isp_timeout:warning]: 0b.11(0xfffffc00012df140, 0x12): command timeout, quiescing bus to allow outstanding I/O to complete.
Thu Jan 1 00:00:00 GMT [isp_timeout:warning]: 0b.12(0xfffffc00012df3a0, 0x12): command timeout, quiescing bus to allow outstanding I/O to complete.
Thu Jan 1 00:00:00 GMT [isp_timeout:warning]: 0b.13(0xfffffc00012df600, 0x12): command timeout, quiescing bus to allow outstanding I/O to complete.
Thu Jan 1 00:00:00 GMT [isp_timeout:warning]: 0b.14(0xfffffc00012df860, 0x12): command timeout, quiescing bus to allow outstanding I/O to complete.
Thu Jan 1 00:00:00 GMT [isp_timeout:warning]: 0b.15(0xfffffc00012dfac0, 0x12): command timeout, quiescing bus to allow outstanding I/O to complete.

ispfc: Fibre Channel adapter 7 appears to be unattached/disconnected. If
adapter is in use, check cabling and seating of LRC cards in disk shelves.
Thu Jan 1 00:00:00 GMT [ispfc_init_2:error]: Initialization failed on Fibre Channel adapter 7.

NetApp Release 6.1R1: Fri Apr 20 03:08:22 PDT 2001
Copyright (c) 1992-2001 Network Appliance, Inc.
Starting boot on Fri Aug 1 20:22:12 GMT 2008
Fri Aug 1 20:22:42 GMT [net_e0:info]: Ethernet e0: Link up.
raid_checklabels(0): disk 6.2 has time/generation 1217069933/172 whereas disk 6.13 has time/generation 1217604717/176
raid_checklabels(0): disk 6.2 has time/generation 1217069933/172 whereas disk 6.13 has time/generation 1217604717/176
Fri Aug 1 20:22:52 GMT [rc:EMERGENCY]: Root volume vol0, RAID group 0 is missing 3 disks and is unusable.
Fri Aug 1 20:22:52 GMT [rc:CRITICAL]: Volume vol0 is missing 3 disks from RAID group 0 and is being taken offline for this boot.
Fri Aug 1 20:22:52 GMT [rc:ALERT]: Cluster monitor: Root and Mailbox disks are uninitialized on local system.
Fri Aug 1 20:22:52 GMT [rc:ALERT]: Cluster monitor: Root and Mailbox disks are uninitialized on partner system.
Fri Aug 1 20:22:52 GMT [rc:ALERT]: Cluster monitor: Root and Mailbox disks are uninitialized on partner system.
Disk 6.5 is reserved for "hot spare"

1 disk is reserved for "hot spare".
Fri Aug 1 20:22:52 GMT [cf.disk.invalidMailbox:error]: Cluster monitor: partner mailbox disk data invalid
Fri Aug 1 20:22:52 GMT [raid.assim.outofdateNdisks:ALERT]: More than one disk in RAID group 0 of volume vol0 has a valid label not
consistent with all the others. These disks are assumed to be out of date;
their contents cannot be restored.
Fri Aug 1 20:22:53 GMT [rc:error]: There is an out-of-date disk 6.2 originally belonging to volume vol0, RAID group 0
Fri Aug 1 20:22:53 GMT [mgr.boot.noBloop:CRITICAL]: No disks were detected on Fibre Channel port B;
this node will be unable to takeover correctly

PANIC: No root volume found. in process rc on release NetApp Release 6.1R1 on Fri Aug 1 20:22:54 2008

version: NetApp Release 6.1R1: Fri Apr 20 03:08:22 PDT 2001
cc flags: 1
::::::::::halt after panic during system initialization

Then I tried to install the v6.3

Finding image...
Loading isa floppy

Insert Disk #2 of 6 ........ Insert Disk #6 of 6

100%
Starting Press CTRL-C for floppy boot menu
........................................................................................................................................................................................................................................................................................................................................

ispfc: Fibre Channel adapter 7 appears to be unattached/disconnected. If
adapter is in use, check cabling and seating of LRC cards in disk shelves.
Fri Aug 1 20:35:11 GMT [ispfc_init_2:error]: Initialization failed on Fibre Channel adapter 7.

NetApp Release 6.3: Wed Aug 7 23:04:57 PDT 2002
Copyright (c) 1992-2002 Network Appliance, Inc.
Starting boot on Fri Aug 1 20:34:42 GMT 2008
Fri Aug 1 20:35:16 GMT [net_e0:info]: Ethernet e0: Link up.
Swarm Replay Stats Daemon Started
Fri Aug 1 20:35:16 GMT [config.noBloop:CRITICAL]: No disks were detected on Fibre Channel port B; this node will be unable to takeover correctly
This boot is of OS version: NetApp Release 6.3.
The last time this filer booted, it used OS version: NetApp Release 6.1R1.
This boot is of an OS which has RAID version 5 and WAFL version 28;
The last time this filer booted, it used an OS version that contained
RAID version 4 and WAFL version 13.
If you do not want your file system upgraded, choose
Maintenance mode or reboot with the correct
OS version. If you upgrade, and then want to return to the
previous OS, you will need to revert your filesystem.


(1) Normal boot.
(2) Boot without /etc/rc.
(3) Change password.
(4) Initialize all disks.
(5) Maintenance mode boot.

Selection (1-5)? Fri Aug 1 20:35:30 GMT [ses_admin:error]:
XL500 shelf 1 on channel 6 reports a critical condition
Power Supply
Element 2: critical status
power supply failed

Fri Aug 1 20:35:31 GMT [ses_admin:error]:
XL500 shelf 0 on channel 6 reports a critical condition
Power Supply
Element 2: critical status
power supply failed

There are more errors, but maybe we can start from here? I've also attached all the captured data from the console.

Thanks guys......

kusek

Wow John,

It looks like your system is going through a number of challenges.

Some of those look like hardware failures, possibly several at the same time.

If I'm reading this correctly (and you'd be the best to know), you had loops go offline, power supplies showing as failed in shelves, and more.

Is this correct?

I know you're in a tough situation right now, unable to load a new version and with the system apparently unbootable.

What other circumstances led up to this? Did you have any hardware failures over time?

Can you detail which items might currently be failed, and can you verify the cables used for the loops, shelves, power, etc.?

Let me know if these are truly failed items and we can see how to work through this if possible. This message is of particular concern in itself:

*** Booting in maintenance mode because of loop inconsistency ...

ispfc: 10 loop-initialization events seen in 60 seconds on Fibre Channel adapter 7: indicates loop stability problem.

Thanks John, and look forward to some details - Hope we can help with this!

Christopher

wstroh

John -

Let's start with the basics. Please power down the system and reseat the VEM modules in the back of your FC shelves on both the 6 and 7 loops. Additionally, please reseat the cables there as well. Finally, please pull and reseat disk 6.4. We need to NOT floppy boot to 6.3 right now as your filesystem is 6.1. Floppy booting to 6.3 will upgrade raid but not WAFL, and leave us in a much more difficult position.

After you have reseated, please boot into maintenance mode and issue 'vol status -r'. My concern is this line in your console logs:

raid_checklabels(0): disk 6.2 has time/generation 1217069933/172 whereas disk 6.13 has time/generation 1217604717/176

This means that disk 6.2 is out of date in the raid group.

Additionally, it looks like disk 6.4 is being chatty -

6.3 0 -156 16 0 18446744073709501589 18446744073709501589

6.4 0 0 6 0 18446744073709551551 18446744073709551551

6.5 0 0 3000 108 2 2

We have underruns on 6.3 and CRC errors on 6.5. This is a classic example of a disk being the cause of loop instability.
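To make the raid_checklabels comparison above concrete, here is a minimal Python sketch that parses that console line and reports which disk carries the older label. The function name `stale_disk` and the regular expression are hypothetical, and the rule "lower generation (then lower time) means out of date" is an assumption drawn only from the reading given in this post, not from ONTAP documentation.

```python
import re

# Matches the raid_checklabels console line quoted in this thread.
LABEL_RE = re.compile(
    r"disk (?P<a>\S+) has time/generation (?P<a_time>\d+)/(?P<a_gen>\d+) "
    r"whereas disk (?P<b>\S+) has time/generation (?P<b_time>\d+)/(?P<b_gen>\d+)"
)


def stale_disk(line):
    """Return the disk whose label looks older, or None if undecidable.

    Assumption: a lower generation (ties broken by lower time) is older.
    """
    m = LABEL_RE.search(line)
    if m is None:
        return None
    a = (int(m["a_gen"]), int(m["a_time"]))
    b = (int(m["b_gen"]), int(m["b_time"]))
    if a == b:
        return None
    return m["a"] if a < b else m["b"]


line = ("raid_checklabels(0): disk 6.2 has time/generation 1217069933/172 "
        "whereas disk 6.13 has time/generation 1217604717/176")
print(stale_disk(line))
```

Run against the line from the console log, this names disk 6.2, which matches the reading above.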

johnchang

Hi,

This may sound stupid but what is a VEM module?

And thanks so much to all of you who responded.

wstroh

The VEM module is the module in the back of the shelf that the FCAL cable plugs into. In newer shelves it's a LRC or an ESH, but in your rev, it's called a VEM.

johnchang

"...the FCAL cable plugs into."

So that would be the fiber cable ports on the back of the filer?

wstroh

On the other end

The cables come out of the filer and plug into the shelf. Where they plug into the shelf - that's the VEM module.

johnchang

Thanks,

I'll try your suggestion later, when I can take my laptop and capture some output to post.

johnchang

Ok,

So, I took out all the VEMs, put them back in, and started the filer. I see the following output being repeated:

Alpha Open Firmware by FirmWorks
Copyright 1995-2000 FirmWorks, Network Appliance. All Rights Reserved.
Firmware release 2.8_a2

Memory size is 512 MB
Testing SIO
Testing LCD
Probing devices
Testing 512MB
16MB chunks

1 to 16MB

16 to 32MB

32 to 48MB

48 to 64MB

64 to 80MB

80 to 96MB

96 to 112MB

112 to 128MB

128 to 144MB

144 to 160MB

160 to 176MB

176 to 192MB

192 to 208MB

208 to 224MB

224 to 240MB

240 to 256MB

256 to 272MB

272 to 288MB

288 to 304MB

304 to 320MB

320 to 336MB

336 to 352MB

352 to 368MB

368 to 384MB

384 to 400MB

400 to 416MB

416 to 432MB

432 to 448MB

448 to 464MB

464 to 480MB

480 to 496MB

496 to 512MB

Complete
Finding image...
Loading 3 3 disk@0

Can't open boot device


Startup failed


Alpha Open Firmware by FirmWorks
Copyright 1995-2000 FirmWorks, Network Appliance. All Rights Reserved.
Firmware release 2.8_a2

Memory size is 512 MB
Testing SIO
Testing LCD
Probing devices
Testing 512MB
16MB chunks

1 to 16MB

16 to 32MB

32 to 48MB

48 to 64MB

64 to 80MB

80 to 96MB

96 to 112MB

112 to 128MB

128 to 144MB

144 to 160MB

160 to 176MB

176 to 192MB

192 to 208MB

208 to 224MB

224 to 240MB

240 to 256MB

256 to 272MB

272 to 288MB

288 to 304MB

304 to 320MB

320 to 336MB

336 to 352MB

352 to 368MB

368 to 384MB

384 to 400MB

400 to 416MB

416 to 432MB

432 to 448MB

448 to 464MB

464 to 480MB

480 to 496MB

496 to 512MB

Complete
Finding image...
Loading 3 3 disk@1

Can't open boot device


Startup failed

I don't see all the other errors that I was seeing before.

wstroh

Alrighty. That's looking good. Well, better at least 🙂

Do you have the 6.1 boot floppies kicking about? What I'd like to do is boot off of those, go into maintenance mode (option 5 on the 1-5 menu), and issue a 'vol status -r' and see what shape the volumes are in.

johnchang

I don't have the 6.1 floppies, but I think I can download them. After the download, I'll see if I can get into maintenance mode and issue the command you suggested.

Tks.

johnchang

OK, I was able to make the install floppies for v6.1R1, and after the fourth floppy loaded it complained of the following:

100%
Starting Press CTRL-C for special boot menu
................................................................................................................
Thu Jan 1 00:00:00 GMT [isp_timeout:warning]: 0b.10(0xfffffc00012deee0, 0x12): command timeout, quiescing bus to allow outstanding I/O to complete.
Thu Jan 1 00:00:00 GMT [isp_timeout:warning]: 0b.11(0xfffffc00012df140, 0x12): command timeout, quiescing bus to allow outstanding I/O to complete.
Thu Jan 1 00:00:00 GMT [isp_timeout:warning]: 0b.12(0xfffffc00012df3a0, 0x12): command timeout, quiescing bus to allow outstanding I/O to complete.
Thu Jan 1 00:00:00 GMT [isp_timeout:warning]: 0b.13(0xfffffc00012df600, 0x12): command timeout, quiescing bus to allow outstanding I/O to complete.
Thu Jan 1 00:00:00 GMT [isp_timeout:warning]: 0b.14(0xfffffc00012df860, 0x12): command timeout, quiescing bus to allow outstanding I/O to complete.
Thu Jan 1 00:00:00 GMT [isp_timeout:warning]: 0b.15(0xfffffc00012dfac0, 0x12): command timeout, quiescing bus to allow outstanding I/O to complete.

NetApp Release 6.1R1: Fri Apr 20 03:08:22 PDT 2001
Copyright (c) 1992-2001 Network Appliance, Inc.
Starting boot on Mon Aug 18 20:24:54 GMT 2008
Mon Aug 18 20:25:09 GMT [net_e0:info]: Ethernet e0: Link up.
Mon Aug 18 20:25:36 GMT [scsi.cmd.pastTimeToLive:error]: Device 7.4: request failed after retry #0: cdb 0x1b.
ispfc: 15 loop-initialization events seen in 60 seconds on Fibre Channel adapter 7: indicates loop stability problem.
Mon Aug 18 20:26:29 GMT [ses_admin:error]: Disk 7.4 not detected by Enclosure Services.
ispfc: 14 loop-initialization events seen in 60 seconds on Fibre Channel adapter 7: indicates loop stability problem.
Mon Aug 18 20:27:10 GMT [ispfc_init_2:warning]: Resetting Fibre Channel adapter 7.
Mon Aug 18 20:27:22 GMT [scsi.cmd.pastTimeToLive:error]: Device 7.4: request failed after retry #0: cdb 0x1b.
Mon Aug 18 20:27:59 GMT [ses_admin:error]: Disk 7.4 not detected by Enclosure Services.
Mon Aug 18 20:27:59 GMT [ses_admin:error]: Possible actual addresses:
7.9 (shelf 1 bay 1)
Drive loop addresses failed consistency check.
Additional information is being syslogged.

ok

wstroh

Okay, it looks like we still have loop instability. Do you have a sysconfig -r from a previous autosupport, by chance? Or, for that matter, do you know which disks made up the root volume?

We have something causing trouble on those loops. What I'd like to do is tear out whatever is not absolutely necessary to boot this system. For example, if the root volume is kept on disks 7.1, 7.2, and 7.3, pull off all shelves and loops with the exception of the first shelf on the 7 loop.

Make sense?
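For anyone working from a captured console log, a small Python sketch like the one below can tally the "loop-initialization events" messages per adapter, which helps show whether the instability follows one loop. The function name `loop_event_summary` is hypothetical, and the pattern is based only on the ispfc lines posted in this thread.

```python
import re
from collections import defaultdict

# Pattern taken from the ispfc messages quoted in this thread.
LIP_RE = re.compile(
    r"ispfc: (\d+) loop-initialization events seen in (\d+) seconds "
    r"on Fibre Channel adapter (\d+)"
)


def loop_event_summary(console_text):
    """Map adapter number -> list of event counts reported for it."""
    summary = defaultdict(list)
    for count, _seconds, adapter in LIP_RE.findall(console_text):
        summary[int(adapter)].append(int(count))
    return dict(summary)


sample = """\
ispfc: 15 loop-initialization events seen in 60 seconds on Fibre Channel adapter 7: indicates loop stability problem.
Mon Aug 18 20:26:29 GMT [ses_admin:error]: Disk 7.4 not detected by Enclosure Services.
ispfc: 14 loop-initialization events seen in 60 seconds on Fibre Channel adapter 7: indicates loop stability problem.
"""
print(loop_event_summary(sample))
```

Re-running the tally after each shelf is removed or added gives a quick before/after comparison.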

johnchang

"we have something causing trouble on those loops. what i'd like to do is to tear out whatever is not absolutely necessary to boot this system. i.e. if the root volume is kept on disks 7.1, 7.2, and 7.3, pull off all shelves and loops with the exception of the first shelf on the 7 loop."

Well, here is the problem: I do not know which disks hold the OS, as the disks have been moved around on the filer. That is why I was just trying to install a new OS, so I wouldn't have to know which disks contained it.

Can't I just install a new OS?

wstroh

Ah, I thought we were trying to save the filesystem. You can absolutely just blow the existing OS away and reinstall. Given the loop instability however, I'd recommend the following course of action:

- Power down the system and the shelves

- Uncable all but one shelf

- Floppy boot to the 1-5 menu and choose option '4' (you can use 6.3 for this if we don't care about the existing OS)

- This will ask you to confirm that you wish to zero all disks and install the base 6.3 files

- After zeroing is complete, the filer will run you through a setup program. Once the setup is complete, license your protocols and install the setup.exe or setup.tar package for 6.3

Once you have a stable install, then I would begin adding shelves one at a time. This will allow you to determine where the instability is coming from.

johnchang

Excellent!!! I have a question before I start your recommended procedure:

This filer was in a two-filer, four-shelf configuration where one shelf's FC cable went into the shelf of another system.

But right now, I would just like to have one filer and two shelves running.

If this is what I want, how do I need to reconfigure the FC cables?

wstroh

It sounds like you had a cluster set up previously. To change this to a single headed configuration you'll simply need to disconnect any of the cables going to the other node.

For instance, you'll have the 740 head with a FCAL adapter in slot 7. From this adaptor you'll have a cable going into the in-port on the first shelf. You'll then have a DB9/DB9 cable from the out-port on this shelf going to the in-port on the second shelf. Then, you'll place a DB9 terminator on the out-port of the second shelf. The FC9's were auto-terminating, however without knowing what type of shelves you have, we'll just cover all bases. There is no harm in placing a terminator on the FC9 out-port either.

If you have NOW site access, the FC9 installation flyer has a good example of this on page 1

http://now.netapp.com/NOW/knowledge/docs/hardware/filer/fcshf9fy.pdf

The first shelf in the loop should have a disk shelf ID of 0, while the second disk shelf should have a shelf ID of 1. The ID is set via a push-wheel located in the middle of the back of the shelf.

If you are mixing shelves, you'll want the "latest" type of shelf to be the last shelf in the loop. For example, if you are mixing FC7's and FC9's, you'll want the FC7 to be shelf0 and the FC9 to be shelf1.

One final point I would add. The onboard FCAL port on the 7xx series would sometimes cause headaches. I would recommend staying off it if at all possible, just to keep things simple.

That should about cover it. The important thing to keep in mind is the DB9 terminators.
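As a way to double-check a planned layout on paper before recabling, the rules above (shelf IDs counting up from 0 along the loop, each shelf's out-port cabled to the next shelf, and a terminator on the last out-port) can be written as a small Python sketch. The `Shelf` class, the `check_loop` function, and the `out_port` labels are all hypothetical names for illustration; nothing here talks to real hardware.

```python
from dataclasses import dataclass


@dataclass
class Shelf:
    shelf_id: int   # value set on the push-wheel at the back of the shelf
    out_port: str   # "shelf" (cabled onward), "terminator", or "open"


def check_loop(shelves):
    """Return a list of problems for one loop, ordered from the filer outward."""
    problems = []
    for position, shelf in enumerate(shelves):
        if shelf.shelf_id != position:
            problems.append(
                f"shelf at position {position} has ID {shelf.shelf_id}, "
                f"expected {position}"
            )
        is_last = position == len(shelves) - 1
        if is_last and shelf.out_port != "terminator":
            problems.append(f"last shelf (position {position}) needs a terminator")
        if not is_last and shelf.out_port != "shelf":
            problems.append(
                f"shelf at position {position} should be cabled to the next shelf"
            )
    return problems


# One filer, two shelves, as described above.
planned = [Shelf(shelf_id=0, out_port="shelf"), Shelf(shelf_id=1, out_port="terminator")]
print(check_loop(planned))
```

An empty list means the plan follows the rules described in this post; it says nothing about whether the cables or shelves themselves are healthy.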

johnchang

That is correct, the original setup was a cluster of two F740 filers.
