ONTAP Discussions

SnapDrive process failure in Linux

tribadmin
15,079 Views

Hi all,

I'm deploying a new filer and am having some troubles with SnapDrive 4.0 for Linux - specifically CentOS 5.1 x86_64 (fully patched).

snapdrived starts up ok and I can interact with it to the extent of setting the root password for the filer. When I try to perform a filer operation, however, things don't go so well. To start,

[root@db2 log]# snapdrive storage list -all

Status call to SDU daemon failed

[root@db2 log]# ps -ef | grep snapdri
root 7587 1 0 Jul24 ? 00:00:00 snapdrived start
root 11283 7587 0 13:40 ? 00:00:00 [snapdrived] <defunct>

Each invocation of a snapdrive storage command spawns a new defunct process. Commands such as "snapdrive config show" run fine.
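
To be concrete, the failure reproduces with nothing more than:

snapdrive config show            # works
snapdrive storage list -all      # "Status call to SDU daemon failed"
ps -ef | grep snapdri            # one more defunct snapdrived child every time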

And in sd-trace.log:

13:43:06 07/25/08 [f7f7cb90]?,2,2,Job tag: bEogRP90xw
13:43:06 07/25/08 [f7f7cb90]?,2,2,snapdrive storage list -all
13:43:06 07/25/08 [f7f7cb90]v,2,6,FileSpecOperation::FileSpecOperation: 12
13:43:06 07/25/08 [f7f7cb90]v,2,6,StorageOperation::StorageOperation: 12
13:43:06 07/25/08 [f7f7cb90]i,2,2,Job tag bEogRP90xw
13:43:06 07/25/08 [f7f7cb90]i,2,6,Operation::setUserCred user id from soap context: root
13:43:06 07/25/08 [f7f7cb90]i,2,6,Operation::setUserCred uid:0 gid:0 userName:root
13:43:06 07/25/08 [f7f7cb90]F,0,0,Fatal error: Assertion detected in production code: ../sbl/StorageOperation.cpp:182: Test 'osAssistants.size() == 1' failed

When I strace the snapdrive process I see things conclude with:

connect(3, {sa_family=AF_INET, sin_port=htons(4094), sin_addr=inet_addr("127.0.0.1")}, 16) = 0
send(3, "POST / HTTP/1.1\r\nHost: localhost"..., 1555, 0) = 1555
recv(3, "HTTP/1.1 200 OK\r\nServer: gSOAP/2"..., 65536, 0) = 1722
shutdown(3, 2 /* send and receive */) = -1 ENOTCONN (Transport endpoint is not connected)
close(3) = 0
write(2, "Status call to SDU daemon failed"..., 33) = 33
munmap(0xf7f7d000, 135168) = 0
exit_group(104) = ?

This matches what I see on the packet capture side, where the snapdrived port sends RSTs (no doubt after the child process has gone defunct) after a very limited exchange:

POST / HTTP/1.1
Host: localhoHTTP/1.1 200 OK
Server: gSOAP

Any input appreciated.

Thanks in advance.

44 REPLIES

jesseyoung
7,483 Views

I agree completely. I was strung along by support for a few weeks with a common setup question. Once it came down to the nitty gritty they just referred back to the support matrix. They didn't even have the courtesy of checking this forum posting, even when I put it in the ticket.

I happen to have a RHEL system available; my VAR obtained a copy for me to use to prove that SnapDrive does in fact work. Funny thing is, it still doesn't work. It gets hung up in a different area, but it doesn't work nonetheless. I will get the exact same RPM versions installed tonight, and as close a kernel as possible, and get the strace for the app. I'll post the results here in a few hours.

nikhilm
7,483 Views

Please provide the NetApp support Ticket Number.

Thanks

~Nikhil

jesseyoung
7,483 Views

Case 2000172643

nikhilm
7,483 Views

Thanks

jesseyoung
7,483 Views

Here is the strace from a successful snap using RHEL 5.2:

connect(3, {sa_family=AF_INET, sin_port=htons(4094), sin_addr=inet_addr("127.0.0.1")}, 16) = 0
send(3, "POST / HTTP/1.1\r\nHost: localhost"..., 1555, 0) = 1555
recv(3, "HTTP/1.1 200 OK\r\nServer: gSOAP/2"..., 65536, 0) = 1688
shutdown(3, 2 /* send and receive */) = -1 ENOTCONN (Transport endpoint is not connected)
close(3) = 0
rt_sigprocmask(SIG_BLOCK, [CHLD], [], 8) = 0
rt_sigaction(SIGCHLD, NULL, , 8) = 0
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
nanosleep({1, 0}, {1, 0}) = 0
socket(PF_INET, SOCK_STREAM, IPPROTO_IP) = 3
setsockopt(3, SOL_SOCKET, SO_SNDBUF, [65536], 4) = 0
setsockopt(3, SOL_SOCKET, SO_RCVBUF, [65536], 4) = 0
setsockopt(3, SOL_TCP, TCP_NODELAY, [1], 4) = 0
open("/etc/hosts", O_RDONLY) = 4

I have also attached the full strace as a text file.

nicholas4704
7,482 Views

Hello...

I have similar problems.

I need to install SnapDrive for Unix (SDU) on Oracle Enterprise Linux 5.2 (effectively RedHat 5.2) with an iSCSI HBA (QLogic 4050C) and an FCP HBA (HP FC2243, an Emulex LP11002 rebrand).

I am using the NetApp FCP (and iSCSI) Host Utilities for Linux 3.0.

The most common error on all systems is:

admin Error: HBA assistant not found

sanlun fcp show adapter says

sanlun fcp show
WARNING: libHBAAPI.so not found in /usr/lib
Unable to load HBA control library

Similar case is described here for Qlogic FC HBA:

https://now.netapp.com/Knowledgebase/solutionarea.asp?id=kb45308

But there is no libHBAAPI.so in the QLogic qlaiscsi package for the 4050C. I installed the QLogic drivers and the SANsurfer CLI, but neither provides that library.

Even for Emulex HBA:

https://now.netapp.com/Knowledgebase/solutionarea.asp?id=kb40902

I installed the tools and host utilities described there, and sanlun fcp show adapter no longer asks for /usr/lib/libHBAAPI.so, but it now says "No supported adapters present".

In both cases SnapDrive refuses to work and fails with the HBA assistant error.
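
In case it helps anyone debugging the same thing, this is roughly how I have been checking whether an HBA API library is registered at all (the paths and the /etc/hba.conf location are the usual ones for the SNIA common HBA API; they may differ on your system):

ls -l /usr/lib/libHBAAPI.so /usr/lib64/libHBAAPI.so   # is the common HBA API library present at all?
cat /etc/hba.conf                                     # vendor HBA libraries should be registered here
rpm -qa | grep -iE 'hbaapi|lpfc|qla'                  # which vendor driver/library packages are installed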

SnapDrive config checker says the following:

Detected Intel/AMD x64 Architecture
Detected Linux OS
Detected Host OS Oracle Enterprise Linux 5
Detected Host OS Oracle Enterprise Linux 5
Detected NFS FileSystem on Linux
Detected FCP on Linux
Detected Ext3 File System
Detected Linux Native LVM1
Detected Linux Native LVM2
Detected Linux Native MPIO

Did not find any supported cluster solutions.
Detected Software iSCSI Linux Initiator Support Kit 3.0

Supported Configurations on this host by SDU Version 4.1
-------------------------------------------------------------
Linux NFS Configuration

Interestingly, the checker does not support Host Utilities 4.1 and 4.2.

Has anybody had similar problems with SDU and SAN HBAs? The only config where SDU works for me is software iSCSI.

Thanks.

senthilk
7,482 Views

This is due to the iSCSI hardware initiator. SDU currently does not support the iSCSI hardware initiator on Linux (please refer to the NetApp Interoperability Matrix); it is supported only on the Solaris platform.

Regards,

Senthil



nicholas4704
7,482 Views

Thank you.

Probably I'm not very good yet with the new NetApp Compatibility Matrix tool.

And what about FCP? I have the same problems with FCP even with the HBA library installed.

Nick

senthilk
7,401 Views

It should work with FCP. It could be that some settings are not configured properly. Please make sure you have followed the SnapDrive for UNIX Installation and Administration Guide, and contact NetApp support if you still have issues.

michaelproact
7,552 Views

I poked around in the snapdrived binary. Apparently it does things differently depending on whether the host is a Red Hat 4 or a Red Hat 5 distribution, and it determines which it is by reading /etc/redhat-release.

Changing the contents of that file to:

Red Hat Enterprise Linux Server release 4 (Tikanga)

Makes everything magically work. This is with SDU 4.1 using NFS on CentOS 5.2 with recent updates. SnapManager for Oracle with Oracle 10gR2 works like a charm too!
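
Roughly what I did, as a sketch (the backup filename is just what I picked, and I restarted the daemon afterwards to be safe):

cp /etc/redhat-release /etc/redhat-release.orig       # keep the original contents around
echo "Red Hat Enterprise Linux Server release 4 (Tikanga)" > /etc/redhat-release
snapdrived restart                                    # so the daemon re-reads the release file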

This is of course an unsupported lab environment.

markleeuw
7,471 Views

Thanks Michael,

your post made my day. Now I have my proof of concept running, using CentOS 5.1 as a Xen host with VMs (also CentOS 5.1) running a simulator and two Oracle 11g RAC nodes, all controlled by SnapManager for Oracle. I tested all combinations (NFS and dNFS, both directly using the simulator's NFS shares as well as ASM on top of NFS shares).

Works great.

Mark

teddgardiner
6,971 Views

This post is directed primarily to the NetApp engineers that check these threads.

I was having the same problems as stated above.  I have installed SnapDrive for Unix (Linux) 4.1 on a RHEL 5.2 server.  The install works fine and the SDU daemon appears to be working properly.  However, when running the majority of the snapdrive commands, I receive the following error:

Status call to SDU daemon failed

I tried troubleshooting this problem for a few days, but didn't have much luck.  Just today, though, I saw the post from Michael Mattsson stating that changing the /etc/redhat-release file to read "Red Hat Enterprise Linux Server release 4 (Tikanga)" makes everything magically work.  I tried the fix myself, and sure enough, he is correct.

I have been messing around with SnapDrive in our lab, but we are getting ready to deploy it in our production environments.  I don't want to have to make this change across thousands of servers, especially since it seems to be a rather "dirty" fix.

Is there any supported fix that is being developed or a patch in a later release?  This seems to be a large bug in the SnapDrive code since it claims to be compatible up to RHEL 5.3.

Please let me know.

Thanks.

cmm
NetApp Alumni
6,971 Views

SnapDrive for UNIX v4.1 works with Redhat v5.2.

What was the original value in the /etc/redhat-release file? What were the commands that were failing?

I do not suspect this to be an issue with the /etc/redhat-release file. We need to analyze the trace logs and system configuration to understand the problem.

Kindly file a ticket with NetApp and provide them with the output of the "snapdrive.dc" and linux_info diagnostic scripts.

chriselectricmail
6,971 Views

Good luck with any type of solid support from NetApp regarding SnapDrive for Unix. I've had a ticket open with them for over 2 weeks with absolutely no hint of resolution.

Funny thing is, the snapdrive utility actually creates the iSCSI LUN on the filer as well as the iGroup but then fails to 'discover' the new LUNs after they have been created:

  mapping new lun(s) ... done
  discovering new lun(s) ... *failed*

Not so fun times with NetApp.......

karana
6,971 Views

Please provide the ticket number.



brian_bartlett
6,610 Views

Did you ever find a solution to this problem?  I am having the exact same problem.

jesseyoung
6,970 Views

It's great to see so many other users in the last 6 months that have also wanted this.  I wonder if NetApp will choose to support its users any time soon.

schaeferm
6,971 Views

We have RHEL 5.3 with an Emulex HBA, Host Utilities 5.0, and SnapDrive 4.1, and we were running into the same issue as above.  NetApp has released a fix to make this work, but it does not work.  The change described above did fix the problem, but we need a supportable fix and this is a hack.  I have heard NetApp is working on officially releasing the "broken" fix soon.  We are trying to figure out what is wrong with our environment that the fix does not work, as they have not released the docs yet.  We are supposed to be on the line with the product managers tonight.

Keep you updated.

jesseyoung
6,971 Views

Mark,

Good luck with the fix.  Luckily RHEL is a supported distribution so you're getting much better support with SnapDrive than those of us using CentOS.  I look forward to hearing how the resolution turns out and more about the 'hack' they suggested you use and their final fix for you.  Perhaps it can shed some light for CentOS users.  Thanks for updating the community, it's interesting to hear this problem is now coming up in a supported distribution.

Jesse

bmerjil01
6,970 Views

OK I think I may have found the solution.

After digging through the snapdrive and snapdrived binaries I noticed that the checks it does for RHEL4 and RHEL5 are different.

For RHEL4 it does a 'cat /etc/redhat-release' to get the OS version information.

For RHEL5 it does a 'cat /etc/issue' for the OS information.

As to why this changed I have no idea. Just modify '/etc/issue' to contain:

Red Hat Enterprise Linux Server release 5.3 (Tikanga)

Kernel \r on an \m

Don't forget to set '/etc/redhat-release' back to the original for your system. For RHEL5 it is:

Red Hat Enterprise Linux Server release 5.3 (Tikanga)

Restart snapdrived.

I tested with 'snapdrive storage list -all' and it gave me the correct output.
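
Putting those steps together as a sketch (the release strings below are the 5.3 ones from above; adjust them to match your system):

# put the release string snapdrived looks for on RHEL5-type hosts into /etc/issue
cat > /etc/issue << 'EOF'
Red Hat Enterprise Linux Server release 5.3 (Tikanga)

Kernel \r on an \m
EOF

# restore the stock contents of /etc/redhat-release
echo "Red Hat Enterprise Linux Server release 5.3 (Tikanga)" > /etc/redhat-release

snapdrived restart
snapdrive storage list -all      # should now return the correct output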

Hope this helps.

onlinedpt
6,609 Views

My solution may be a year late, but I had the exact same problem: "Status call to SDU daemon failed" with the latest version of SDU installed. None of the solutions in this post were able to resolve it, until I found the answer on page 239 of the SDU manual:

http://www.scribd.com/doc/28571140/SnapDrive-4-1-1-Linux-Installation-and-Administration-Guide#page239

Below are the commands:

# export LVM_SUPPRESS_FD_WARNINGS=1

# snapdrived restart

I have also added this variable setting to the system startup script /etc/init.d/snapdrived.
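
For reference, the snippet I added near the top of /etc/init.d/snapdrived (the exact placement is up to you, as long as it runs before the daemon is started):

# per page 239 of the SDU admin guide: suppress LVM fd warnings so the status call succeeds
LVM_SUPPRESS_FD_WARNINGS=1
export LVM_SUPPRESS_FD_WARNINGS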

Cheers,

Keith
