ONTAP Discussions

Solaris write errors during failover

mrnelgintm

Hi all,

I have a number of Solaris 10 boxes that are connected to the NetApp filers.

There are two heads, cross-connected, and we have two paths to the NetApp, so we see a total of four paths: two primary and two secondary.

During a new shelf installation, we dropped one of the filer heads. At some point, apparently during the giveback, we lost some file systems on three of the systems. Here's an example:

Jul 28 09:52:03 daa30304orc002 md_stripe: [ID 641072 kern.warning] WARNING: md: d119: write error on /dev/dsk/c6t60A98000572D44345A34566E33733855d0s6

Jul 28 09:52:03 daa30304orc002 ufs: [ID 702911 kern.warning] WARNING: Error writing master during ufs log roll

Jul 28 09:52:03 daa30304orc002 ufs: [ID 127457 kern.warning] WARNING: ufs log for /oradata/DBUTLPRD/redo01 changed state to Error

Jul 28 09:52:03 daa30304orc002 ufs: [ID 616219 kern.warning] WARNING: Please umount(1M) /oradata/DBUTLPRD/redo01 and run fsck(1M)

Jul 28 09:52:05 daa30304orc002 md_stripe: [ID 641072 kern.warning] WARNING: md: d103: write error on /dev/dsk/c6t60A98000572D44345A344F654F73344Fd0s6

Jul 28 09:52:05 daa30304orc002 ufs: [ID 702911 kern.warning] WARNING: Error writing master during ufs log roll

Jul 28 09:52:05 daa30304orc002 ufs: [ID 127457 kern.warning] WARNING: ufs log for /u06 changed state to Error

Jul 28 09:52:05 daa30304orc002 ufs: [ID 616219 kern.warning] WARNING: Please umount(1M) /u06 and run fsck(1M)

The second head was then dropped, and on giveback we had the same thing with different file systems:

Jul 28 10:21:46 daa30304orc002 ufs: [ID 702911 kern.warning] WARNING: Error writing master during ufs log roll

Jul 28 10:21:46 daa30304orc002 ufs: [ID 127457 kern.warning] WARNING: ufs log for /u04 changed state to Error

Jul 28 10:21:46 daa30304orc002 ufs: [ID 616219 kern.warning] WARNING: Please umount(1M) /u04 and run fsck(1M)

Jul 28 10:21:46 daa30304orc002 ufs: [ID 702911 kern.warning] WARNING: Error writing master during ufs log roll

Jul 28 10:21:46 daa30304orc002 ufs: [ID 127457 kern.warning] WARNING: ufs log for /u33 changed state to Error

Jul 28 10:21:46 daa30304orc002 ufs: [ID 616219 kern.warning] WARNING: Please umount(1M) /u33 and run fsck(1M)

Considering that the second head was not dropped until the first head was back up and I had checked that all four paths were available, there should have been no disconnect.

I've been asked to provide an RCA, and I'm not really a Solaris person, so I'm not quite sure where to start looking. I'd be happy to provide any configuration information if you let me know what you need. The filers are running ONTAP 7.3.5.1 P1 and the Solaris boxes are using Host Utilities 5.0.
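For an RCA, one place to start is a timeline of which metadevices, LUN slices, and mount points failed at each event, which can then be compared with the takeover and giveback times on the filers. The following is a minimal Python 3.8+ sketch, assuming only the message formats shown in the excerpts above and a copy of the host's /var/adm/messages (Solaris 10 may not ship Python 3, so it may be easier to copy the file elsewhere). The function name summarize and the script usage are illustrative, not part of any NetApp or Solaris tool.

```python
import re
import sys
from collections import defaultdict

# Formats taken from the log excerpts in this post (Solaris syslog lines).
STAMP = re.compile(r"^(?P<ts>\w{3}\s+\d+\s+\d{2}:\d{2}:\d{2})")
WRITE_ERR = re.compile(r"md: (?P<meta>d\d+): write error on (?P<dev>\S+)")
LOG_ERR = re.compile(r"ufs log for (?P<mnt>\S+) changed state to Error")


def summarize(lines):
    """Group Solaris Volume Manager write errors and UFS log errors by timestamp."""
    events = defaultdict(lambda: {"devices": [], "mounts": []})
    for line in lines:
        stamp = STAMP.match(line)
        if not stamp:
            continue  # not a syslog-formatted line
        ts = stamp.group("ts")
        if m := WRITE_ERR.search(line):
            # metadevice (e.g. d119) and the underlying LUN slice
            events[ts]["devices"].append((m.group("meta"), m.group("dev")))
        elif m := LOG_ERR.search(line):
            # mount point whose UFS log went into the Error state
            events[ts]["mounts"].append(m.group("mnt"))
    return dict(events)  # insertion order follows the order of the log


if __name__ == "__main__" and len(sys.argv) > 1:
    # hypothetical usage: python3 summarize.py /path/to/copy/of/messages
    with open(sys.argv[1], errors="replace") as f:
        for ts, ev in summarize(f).items():
            print(ts, ev["devices"], ev["mounts"])
```

Lines without a matching warning are ignored, so the output lists only the timestamps where a write error or UFS log error was logged.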

Thanks,

Nigel

Accepted solution

Darkstar

Make sure that you installed the NetApp Host Utilities for Solaris on all hosts. This looks like a timeout issue, i.e. your host loses access to the disk(s) during takeover/giveback for a short period of time.

You can also increase the LUN disconnection timeout yourself (if you know how to do it in Solaris; I don't), or simply install the Host Utilities.

-Michael

mrnelgintm

As I said, the Host Utilities are installed; at least, the software is there. I inherited these boxes. Is there a surefire way to tell whether the installation scripts were actually run to configure the system?

Thanks,

Nigel    

mrnelgintm

It turns out that the Host Utilities were installed but had never been run on the three systems that experienced file system write errors. The other system, which didn't suffer any problems, had the NetApp-recommended configuration in /kernel/drv/ssd.conf.
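Following from this finding, a rough way to compare hosts is to list the active NetApp-specific lines in each host's ssd.conf and diff a healthy host against the affected ones. The following is a minimal Python sketch, assuming only that such entries mention the NETAPP vendor string or a property name containing "netapp", and that the file uses '#' comments as driver.conf files do. It does not check whether the values are correct; those should come from the Host Utilities documentation for your version. The function name netapp_ssd_entries is hypothetical.

```python
import sys


def netapp_ssd_entries(conf_text):
    """Return the active (uncommented) lines that mention NetApp.

    Heuristic only: assumes Host Utilities-style entries contain the
    "NETAPP" vendor string or a property name with "netapp" in it.
    """
    entries = []
    for raw in conf_text.splitlines():
        line = raw.strip()
        # driver.conf files such as ssd.conf use '#' for comments
        if not line or line.startswith("#"):
            continue
        if "NETAPP" in line.upper():
            entries.append(line)
    return entries


if __name__ == "__main__" and len(sys.argv) > 1:
    # hypothetical usage: python3 check_ssd.py /kernel/drv/ssd.conf
    with open(sys.argv[1]) as f:
        found = netapp_ssd_entries(f.read())
    if found:
        print("NetApp-specific entries found:")
        for line in found:
            print("  " + line)
    else:
        print("No NetApp-specific entries; Host Utilities config may not be applied")
```

An empty result on an affected host, next to a non-empty result on the healthy one, would be consistent with what was found here.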

Thanks for the help.
