SnapDrive Unix 4.1.1 on HP/UX not working with LUNs

jont
Environment: 
- HP-UX 11.31 IA64 March 2009 release
- FAS3170 Fabric MetroCluster
- ONTAP 7.3.1.1L1P2
- FCP LUNs attached via various means (see below)
- SnapDrive for UNIX 4.1.1
- HUK 4.3

I’ve got SMSAP up and running on an HP-UX 11.31 system with the database running on NFS. As part of the PoC, we moved the database over to LUNs (provisioned using the CLI, not SnapDrive). SAP runs just fine, but SnapDrive refuses to talk to the LUNs, and I’m not sure why. Note that the volumes mounted over NFS don't have any of the issues described below.

I’ve tried various means of attaching to the LUNs, all with the same results:
 
- Create LUN with CLI, mount directly, without PVLinks and LVM: LUN is mountable, SnapDrive reports errors as below
- Create LUN with CLI and configure within PVLinks and LVM (using LV Version 1.0): LUN is mountable, SnapDrive reports errors as below
- Create LUN with SnapDrive: LUN is created but fails to connect
- Create LUN with SnapDrive and -nolvm: LUN is created but fails to connect
 
In the cases where the LUN is created externally, SnapDrive logs the following error in sd-trace.log when “snapdrive snap create -fs /test -snapname snap55” is issued:

17:54:42 09/22/09 [1]F,0,0,Fatal error: Assertion detected in production code: ../sbl/SnapshotTOC.cpp:195: Test '!filespecSoftPtrList.empty() || !vdiskSoftPtrList.empty()' failed
17:54:42 09/22/09 [1]v,9,0,ASSISTANT EXECUTION (at 1290.026227): logger -p 3 -t snapdrived[25621] Fatal error: Assertion detected in production code: ../sbl/SnapshotTOC.cpp:195: Test '!filespecSoftPtrList.empty() || !vdiskSoftPtrList.empty()' failed
17:54:42 09/22/09 [1]v,9,0,ASSISTANT EXECUTION (at 1290.043443): Output:
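
With trace-level=7 the trace log gets large, so it can help to pull out only the fatal entries. The following is a minimal Python sketch, not a SnapDrive tool: the function name `fatal_entries` is made up for illustration, and both the line layout and the reading of `F` as the fatal level are assumptions inferred only from the three lines quoted above.

```python
import re

# Assumed line shape, inferred only from the excerpt above:
#   HH:MM:SS MM/DD/YY [thread]LEVEL,n,n,message
TRACE_LINE = re.compile(
    r"^(?P<time>\d{2}:\d{2}:\d{2}) (?P<date>\d{2}/\d{2}/\d{2}) "
    r"\[(?P<thread>\d+)\](?P<level>\w),\d+,\d+,(?P<message>.*)$"
)


def fatal_entries(lines):
    """Return (time, date, message) for every line whose level letter is 'F'.

    Treating 'F' as "fatal" is an assumption based on the quoted excerpt.
    """
    hits = []
    for line in lines:
        match = TRACE_LINE.match(line.strip())
        if match and match.group("level") == "F":
            hits.append(
                (match.group("time"), match.group("date"), match.group("message"))
            )
    return hits


# Two of the lines from the excerpt above, used as sample input.
sample = [
    "17:54:42 09/22/09 [1]F,0,0,Fatal error: Assertion detected in production "
    "code: ../sbl/SnapshotTOC.cpp:195: Test '!filespecSoftPtrList.empty() || "
    "!vdiskSoftPtrList.empty()' failed",
    "17:54:42 09/22/09 [1]v,9,0,ASSISTANT EXECUTION (at 1290.043443): Output:",
]

for time, date, message in fatal_entries(sample):
    print(date, time, message)
```

To run it against a real log, pass an open file instead of `sample`, for example `fatal_entries(open("/var/snapdrive/sd-trace.log"))`, using the default trace log path from snapdrive.conf.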

Further, “snapdrive storage list -all” reports:
 
LUNs not found:
  10.1.197.12:/vol/BAQ_FCP/lun0
  des00630:/vol/BAQData/lun1
  des00630:/vol/SAPBin/lun2
  des00630:/vol/BAQLogs2/lun3
  des00630:/vol/BAQData/lun2
  des00630:/vol/BAQData/lun3

which is a complete list of all the FCP LUNs in use on the system (it has no problem listing the NFS-mounted volumes, however).
 
When trying to create LUNs using SDU, I run into the following problem:
 
snapdrive storage create -fs /test2 -nolvm -lun 10.0.52.29:/vol/BAQData/lun3 -lunsize 1g
or
snapdrive storage create -lun 10.0.52.29:/vol/BAQData/testlun -lunsize 1g -lvol vgTEST/lvol1
 
Looking in the sd-trace.log file, the LUN appears in HP-UX as:

/dev/dsk/c27t0d4   /dev/rdsk/c27t0d4 disk    26  0/5/0/0/0/1.2.1.0.0.0.5  sdisk   NO_HW       DEVICE       NETAPP  LUN
/dev/dsk/c29t0d4   /dev/rdsk/c29t0d4 disk    27  0/5/0/0/0/1.2.2.0.0.0.5  sdisk   NO_HW       DEVICE       NETAPP  LUN

(Note the NO_HW state). After SnapDrive is finished, these stale entries remain in the ioscan output.
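
To list every device stuck in that state, the ioscan-style lines can be filtered on the software-state column. This is only a sketch: the function name `no_hw_devices` is hypothetical, the column order is assumed from the two lines quoted above (block device, raw device, class, instance, H/W path, driver, S/W state), and the CLAIMED line in the sample is invented to show that healthy entries are skipped.

```python
def no_hw_devices(lines):
    """Return (device_file, hw_path) for entries whose S/W state is NO_HW."""
    stale = []
    for line in lines:
        fields = line.split()
        # Assumed column order, taken from the two lines quoted above:
        # 0 block dev, 1 raw dev, 2 class, 3 instance, 4 H/W path,
        # 5 driver, 6 S/W state, then H/W type and description.
        if len(fields) >= 7 and fields[6] == "NO_HW":
            stale.append((fields[0], fields[4]))
    return stale


sample = [
    "/dev/dsk/c27t0d4   /dev/rdsk/c27t0d4 disk    26  0/5/0/0/0/1.2.1.0.0.0.5  sdisk   NO_HW       DEVICE       NETAPP  LUN",
    "/dev/dsk/c29t0d4   /dev/rdsk/c29t0d4 disk    27  0/5/0/0/0/1.2.2.0.0.0.5  sdisk   NO_HW       DEVICE       NETAPP  LUN",
    # Hypothetical healthy entry, not from the original trace log.
    "/dev/dsk/c27t0d3   /dev/rdsk/c27t0d3 disk    25  0/5/0/0/0/1.2.1.0.0.0.4  sdisk   CLAIMED     DEVICE       NETAPP  LUN",
]

for device, hw_path in no_hw_devices(sample):
    print(device, hw_path)
```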
 
Does any of this look familiar? I have SnapDrive DC output and logs of the operations; contact me directly if you'd like to see them.

Cheers,

--Jon

nagendrk

Have you configured snapdrive.conf file to reflect your environment?

jont

Hi,

Yes, we configured snapdrive.conf (see below). Only default-transport needed to be changed, from iSCSI to FCP. All the other (default) values are correct.

#
# Snapdrive Configuration
#    file: /opt/NetApp/snapdrive/snapdrive.conf
#    Version 4.1.1    (Change 942392 Built Fri Jul 17 05:35:08 PDT 2009)
#
#
# Default values are shown by lines which are commented-out in this file.
# If there is no un-commented-out line in this file relating to a particular value, then
# the default value represented in the commented-out line is what SnapDrive will use.
#
# To change a value:
#
#     -- copy the line that is commented out to another line
#     -- Leave the commented-out line
#     -- Modify the new line to remove the '#' and to set the new value.
#     -- Save the file and exit
#
#audit-log-file="/var/snapdrive/sd-audit.log"  # audit log file
#trace-log-file="/var/snapdrive/sd-trace.log"  # trace log file
#recovery-log-file="/var/snapdrive/sd-recovery.log"  # recovery log file
#client-trace-log-file="/var/snapdrive/sd-client-trace.log"  # client trace log file
#daemon-trace-log-file="/var/snapdrive/sd-daemon-trace.log"  # daemon trace log file
#autosupport-enabled=off  # Enable autosupport (requires autosupport-filer be set)
#autosupport-filer=""  # Filer to use for autosupport (filer must be configured for autosupport)
#audit-log-max-size=20480  # Maximum size (in bytes) of audit log file
#audit-log-save=2  # Number of old copies of audit log file to save
#available-lun-reserve=8  # Number of LUNs for which to reserve host resources
#cluster-operation-timeout-secs=600  # Cluster Operation timeout in seconds
#contact-http-port=80  # HTTP port to contact to access the filer
#contact-http-dfm-port=8088  # HTTP server port to contact to access the DFM
#contact-http-port-sdu-daemon=4094  # HTTP port on which sdu daemon will bind
#contact-https-port-sdu-daemon=4095  # HTTPS port on which sdu daemon will bind
#contact-ssl-port=443  # SSL port to contact to access the filer
#contact-ssl-dfm-port=8488  # SSL server port to contact to access the DFM
#device-retries=3  # Number of retries on Ontap filer LUN device inquiry
#sfsr-polling-frequency=10  # Sleep for the given amount of seconds before attempting SFSR
#device-retry-sleep-secs=1  # Number of seconds between Ontap filer LUN device inquiry retries
#enable-implicit-host-preparation=on  # Enable implicit host preparation for LUN creation
#filer-restore-retries=1440  # Number of retries while doing lun restore
#filer-restore-retry-sleep-secs=15  # Number of secs between retries while restoring lun
#filesystem-freeze-timeout-secs=300  # File system freeze timeout in seconds
#default-noprompt=off  # A default value for -noprompt option in the command line
#mgmt-retries=2  # Number of retries on ManageONTAP control channel
#lun-onlining-in-progress-sleep-secs=3  # Number of secs between retries when lun onlining in progress after VBSR
#lun-onlining-in-progress-retries=40  # Number of retries when lun onlining in progress after VBSR
#mgmt-retry-sleep-secs=2  # Number of seconds between retries on ManageONTAP control channel
#mgmt-retry-sleep-long-secs=90  # Number of seconds between retries on ManageONTAP control channel (failover error)
#prepare-lun-count=16  # Number of LUNs for which to request host preparation
#PATH="/sbin:/usr/sbin:/bin:/usr/bin:/opt/NTAP/SANToolkit/bin:/opt/NetApp/santools/bin:/opt/NetApp/iscsitools/bin:/opt/VRTS/bin"  # toolset search path
#password-file=/opt/NetApp/snapdrive/.pwfile  # location of password file
#sdu-password-file=/opt/NetApp/snapdrive/.sdupw  # location of SDU Daemon and DFM password file
#prefix-filer-lun=""  # Prefix for all filer LUN names internally generated by storage create
#sdu-daemon-certificate-path=/opt/NetApp/snapdrive/snapdrive.pem  # location of https server certificate
#recovery-log-save=20  # Number of old copies of recovery log file to save
#snapcreate-consistency-retries=3  # Number of retries on best-effort snapshot consistency check failure
#snapcreate-consistency-retry-sleep=1  # Number of seconds between best-effort snapshot consistency retries
#snapcreate-must-make-snapinfo-on-qtree=off  # snap create must be able to create snapinfo on qtree
#snapcreate-cg-timeout="relaxed"  # Timeout type used in snapshot creation with Consitency Groups. Possible values are \"urgent\", \"medium\" or \"relaxed\".
#snapcreate-check-nonpersistent-nfs=on  # Check that entries exist in /etc/fstab for specified nfs fs.
#rbac-cache=off  # Use RBAC cache when all DFM servers are down. Active only when rbac-method is dfm.
#enable-split-clone="off"  # Enable split clone volume or lun during connnect/disconnect
#enable-parallel-operations=on  # Enable support for parallel operations
#snapconnect-nfs-removedirectories=off  # NFS snap connect cleaup unwanted dirs;
#snapdelete-delete-rollback-with-snap=off  # Delete all rollback snapshots related to specified snapshot
#snaprestore-snapmirror-check=on  # Enable snapmirror destination volume check in snap restore
#snaprestore-delete-rollback-after-restore=on  # Delete rollback snapshot after a successfull restore
#snaprestore-make-rollback=on  # Create snap rollback before restore
#snaprestore-must-make-rollback=on  # Do not continue 'snap restore' if rollback creation fails
#space-reservations-enabled=on  # Enable space reservations when creating new luns
#flexclone-writereserve-enabled=off  # Enable space reservations during FlexClone creation
#space-reservations-volume-enabled="snapshot"  # Enable space reservation over volume, possible values snapshot, volume, none
#vol-restore="off"  # Method of restoring a volume. Possible values execute, preview and off
#snapmirror-dest-multiple-filervolumes-enabled=off  # Enable snap restore and snap connect commands to deal with snapshots moved to another filer volume (e.g. via SnapMirror) where snapshot spans multiple filers or volumes
default-transport="fcp"  # Transport type to use for storage provisioning, when a decision is needed
#multipathing-type="PVLinks"  # Multipathing software to use when more than one multipathing solution is available. Possible values are 'DMP', 'PVLinks' or 'none'
#fstype="vxfs"  # File system to use when more than one file system is available
#vmtype="lvm"  # Volume manager to use when more than one volume manager is available
#trace-enabled=on  # Enable trace
#secure-communication-among-cluster-nodes=off  # Enable Secure Communication
#trace-level=7  # Trace levels: 1=FatalError; 2=AdminError; 3=CommandError; 4=warning, 5=info, 6=verbose, 7=full
trace-level=7
#trace-log-max-size=10485760  # Maximum size of trace log file in bytes; 0 means one trace log file per command
#trace-log-save=100  # Number of old copies of trace log file to save
#all-access-if-rbac-unspecified=on  # Allow all access if the RBAC permissions file is missing
#san-clone-method="lunclone"  # Clone methods for snap connect: unrestricted, optimal or lunclone
#prefix-clone-name=""  # Prefix string for naming FlexClone
#rbac-method="native"  # Role Based Access Control(RBAC) methods: native or dfm
#dfm-rbac-retries=12  # Number of rbac access retries upon a DFM refresh
#dfm-rbac-retry-sleep-secs=15  # Number of seconds between DFM rbac access retries upon a DFM refresh
#use-https-to-filer=off  # Communication with filer done via HTTPS instead of HTTP
#use-https-to-dfm=off  # Communication with DFM done via HTTPS instead of HTTP
#use-https-to-sdu-daemon=off  # Communication with daemon done via HTTPS instead of HTTP
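
Since the file is mostly commented-out defaults, a quick way to see what is actually overridden is to keep only the uncommented lines. A rough Python sketch under stated assumptions: `active_settings` is a made-up helper, not part of SnapDrive, and it assumes that values never contain a `#` character. The sample reuses a few lines from the file above, with the comments shortened.

```python
def active_settings(conf_text):
    """Return {name: value} for every line that is not commented out."""
    settings = {}
    for raw in conf_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # blank line or commented-out default
        # Drop a trailing "# comment" (assumes no '#' inside the value),
        # then split the remainder into name and value.
        assignment = line.split("#", 1)[0].strip()
        name, sep, value = assignment.partition("=")
        if sep:
            settings[name.strip()] = value.strip().strip('"')
    return settings


sample = '''
#multipathing-type="PVLinks"  # Multipathing software to use
default-transport="fcp"  # Transport type to use for storage provisioning
#trace-level=7  # Trace levels
trace-level=7
'''

for name, value in active_settings(sample).items():
    print(f"{name} = {value}")
```

Applied to the file as posted, only default-transport and trace-level are uncommented, so every other setting, including multipathing-type, stays at its commented default.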

nagendrk

Is multipathing configured? If not, can you configure it in snapdrive.conf and restart the snapdrived daemon?

Also, please provide the trace log for the LUN create activity.

soffa

I ran into this myself this week, and I believe I have your answer. The big clue is the "NO_HW" state of the LUN.

To resolve the issue:

1. Run "fcp config" on the controller and get the target_nport id, which is a six-digit number, for example 010000.

2. On the HP-UX host, run `fcmsutil /dev/fcdx replace_dsk 0x010000` for QLogic HBAs, or use /dev/tdx for Tachyon HBAs. Replace fcdx with the appropriate adapter number (e.g., fcd0, fcd1). Do this for each of the "NO_HW" disks.

3. Run "ioscan -fnC disk" on the host after executing the respective command; the NetApp LUNs that were in NO_HW will show as CLAIMED. After this, SnapDrive and sanlun should work as expected.
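
If there are several adapters, the command lines for step 2 can be generated rather than typed by hand. This is only a sketch: `replace_dsk_commands` is a hypothetical helper that builds strings in the form shown in step 2 and prints them for review; it does not execute anything. The adapter device files and the nport id in the example are placeholders, so substitute the values from your own "fcp config" output.

```python
def replace_dsk_commands(adapters, nport_id):
    """Build one 'fcmsutil <dev> replace_dsk 0x<nport>' line per HBA device file.

    The command form is copied from step 2 above; nothing is executed here.
    """
    nport = nport_id.lower()
    if nport.startswith("0x"):
        nport = nport[2:]
    # Step 1 describes the target nport id as a six-digit number.
    if len(nport) != 6 or any(c not in "0123456789abcdef" for c in nport):
        raise ValueError(f"expected a six-digit hex nport id, got {nport_id!r}")
    return [f"fcmsutil {dev} replace_dsk 0x{nport}" for dev in adapters]


# Placeholder adapters and nport id; use /dev/tdN instead for Tachyon HBAs.
for command in replace_dsk_commands(["/dev/fcd0", "/dev/fcd1"], "010000"):
    print(command)
```

Printing the commands first leaves room to check them against the ioscan output before running any of them on the host.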

As I understand it, this can happen when HP-UX reuses a disk device name, i.e., the NetApp LUN is c20txdx and at one time another c20txdx existed on the system.

Thanks to Kiran Raje Urs for the answer,

Sam
