ONTAP Hardware
ONTAP Hardware
Hi,
I am attempting to perform a local ndmpcopy (i.e. within the same filer) of a subdirectory within a qtree from one volume to another, but receiving a timeout error. Any help or assistance would be hugely appreciated!
First, some background info:
NetApp FAS3240-AE
ONTAP 8.1.4P1 7-mode
64-bit aggregates & volumes
3TB SATA disks. Volumes are 46TB, aggregates 62TB. Data is served via CIFS, a low amount of network IO (under 1Gbps)
TOASTER2> ndmpcopy -sa root:netapp01 -da root:netapp01 172.28.12.157:/vol/vol_02_02/qt1/1099_resources 172.28.12.157:/vol/vol_02_10/qt1/1099_resources
Ndmpcopy: Starting copy [ 6 ] ...
Ndmpcopy: 172.28.12.157: Notify: Connection established
Ndmpcopy: 172.28.12.157: Notify: Connection established
Ndmpcopy: 172.28.12.157: Connect: Authentication successful
Ndmpcopy: 172.28.12.157: Connect: Authentication successful
Ndmpcopy: 172.28.12.157: Log: DUMP: creating "/vol/vol_02_02/../snapshot_for_backup.6" snapshot.
Ndmpcopy: 172.28.12.157: Log: DUMP: Using subtree dump
Ndmpcopy: 172.28.12.157: Log: DUMP: Date of this level 0 dump: Thu Sep 4 10:23:27 2014.
Ndmpcopy: 172.28.12.157: Log: DUMP: Date of last level 0 dump: the epoch.
Ndmpcopy: 172.28.12.157: Log: DUMP: Dumping /vol/vol_02_02/qt1/1099_resources to NDMP connection
Ndmpcopy: 172.28.12.157: Log: DUMP: mapping (Pass I)[regular files]
The directory being moved is about 8TB in size and has roughly 10 million files.
We have performed this sort of transfer successfully twice in this environment already on similar directory migrations.
After about 14 hours (without progressing any further) it returns the following error:
Not heard from other end. Timing out
Ndmpcopy: Notify: Transfer failed
Ndmpcopy: Done
TOASTER2>
The first time I ran the transfer (and on previous occasions), the system has defaulted to e0M (which I believe shouldn't matter as it is a local transfer)
The network interface I specified for the most recent attempt is a VLAN on a 2 x 10g LACP interface group, just to try and eliminate e0M as a bottleneck.
Below are the log files (note, the Error in data_listen_reply is very early on in the transfer)
TOASTER2>rdfile /etc/log/ndmpcopy.20140904
Thu Sep 4 10:20:36 EST 2014 [ndmpc:5]: Starting ndmpcopy operation
Thu Sep 4 10:20:38 EST 2014 [ndmpc:5]: Source version: 4
Thu Sep 4 10:20:38 EST 2014 [ndmpc:5]: Destination version: 4
Thu Sep 4 10:20:38 EST 2014 [ndmpc:5]: Ndmpcopy: Auto negotiated to version 4
Thu Sep 4 10:20:38 EST 2014 [ndmpc:5]: Error in data_listen_reply: Error: unknown error
Thu Sep 4 10:23:24 EST 2014 [ndmpc:6]: Starting ndmpcopy operation
Thu Sep 4 10:23:26 EST 2014 [ndmpc:6]: Source version: 4
Thu Sep 4 10:23:26 EST 2014 [ndmpc:6]: Destination version: 4
Thu Sep 4 10:23:26 EST 2014 [ndmpc:6]: Ndmpcopy: Auto negotiated to version 4
TOASTER2> rdfile /etc/log/ndmpcopy.20140903
Wed Sep 3 13:30:13 EST 2014 [ndmpc:4]: Starting ndmpcopy operation
Wed Sep 3 13:30:15 EST 2014 [ndmpc:4]: Source version: 4
Wed Sep 3 13:30:15 EST 2014 [ndmpc:4]: Destination version: 4
Wed Sep 3 13:30:15 EST 2014 [ndmpc:4]: Ndmpcopy: Auto negotiated to version 4
Wed Sep 3 13:30:15 EST 2014 [ndmpc:4]: Error in data_listen_reply: Error: unknown error
TOASTER2> rdfile /etc/log/ndmpcopy.20140905
Fri Sep 5 07:32:40 EST 2014 [ndmpc:7]: Starting ndmpcopy operation
Fri Sep 5 07:32:42 EST 2014 [ndmpc:7]: Source version: 4
Fri Sep 5 07:32:42 EST 2014 [ndmpc:7]: Destination version: 4
Fri Sep 5 07:32:42 EST 2014 [ndmpc:7]: Ndmpcopy: Auto negotiated to version 4
The next command was run about 10 minutes after the ndmpcopy was commenced. The data counter didn't seem to increase when I ran the same command a few hours later.
TOASTER2> ndmpd probe
ndmpd ON.
Session: 96166
isActive: TRUE
protocol version: 4
effHost: Local
authorized: TRUE
client addr: 172.28.12.157:35360
spt.device_id: none
spt.ha: -1
spt.scsi_id: -1
spt.scsi_lun: -1
tape.device: not open
tape.mode: Read only
mover state: Idle
mover.mode: Data
mover.pauseReason N/A
mover.haltReason N/A
mover.recordSize: 0
mover.recordNum: 0
mover.bytesMoved: 0
mover.seekPosition: 0
mover.bytesLeftToRead: 0
mover.windowOffset: 0
mover.windowLength: 0
mover.position: 0
mover.setRecordSizeFlag: false
mover.setWindowFlag: false
mover.connect.addr_type: LOCAL
data.operation: Backup
data.state: Active
data.haltReason: N/A
data.connect.addr_type: TCP
data.connect.addr: 172.28.12.157
data.connect.port: 26198
data.bytesProcessed: 1024
Session: 96167
isActive: TRUE
protocol version: 4
effHost: Local
authorized: TRUE
client addr: 172.28.12.157:35361
spt.device_id: none
spt.ha: -1
spt.scsi_id: -1
spt.scsi_lun: -1
tape.device: not open
tape.mode: Read only
mover state: Idle
mover.mode: Data
mover.pauseReason N/A
mover.haltReason N/A
mover.recordSize: 0
mover.recordNum: 0
mover.bytesMoved: 0
mover.seekPosition: 0
mover.bytesLeftToRead: 0
mover.windowOffset: 0
mover.windowLength: 0
mover.position: 0
mover.setRecordSizeFlag: false
mover.setWindowFlag: false
mover.connect.addr_type: LOCAL
data.operation: Restore
data.state: Active
data.haltReason: N/A
data.connect.addr_type: TCP
data.connect.addr: 172.28.12.157
data.connect.port: 26199
data.bytesProcessed: 0
TOASTER2> ndmpd status
ndmpd ON.
Session: 96166
Active
version: 4
Operating on behalf of primary host.
tape.device: Not open
mover state: Idle
data state: Active
data operation: Backup
Session: 96167
Active
version: 4
Operating on behalf of primary host.
tape.device: Not open
mover state: Idle
data state: Active
data operation: Restore
TOASTER2> snap list -V vol_02_02
Volume vol_02_02
working...
%/used %/total date name
---------- ---------- ------------ --------
0% ( 0%) 0% ( 0%) Sep 04 10:23 snapshot_for_backup.6 (busy,backup[0],dump)
TOASTER2> options ndmp
ndmpd.access all
ndmpd.authtype challenge
ndmpd.connectlog.enabled off
ndmpd.data_port_range all
ndmpd.enable on
ndmpd.fh_node_retry_interval 250
ndmpd.ignore_ctime.enabled off
ndmpd.maxversion 4
ndmpd.offset_map.enable on
ndmpd.password_length 16
ndmpd.preferred_interface disable (value might be overwritten in takeover)
ndmpd.tcpnodelay.enable off
ndmpd.tcpwinsize 32768
TOASTER2> df -Vg vol_02_02
Filesystem total used avail capacity Mounted on
/vol/vol_02_02/ 46079GB 33751GB 12328GB 73% /vol/vol_02_02/
snap reserve 0GB 34GB 0GB ---% /vol/vol_02_02/..
TOASTER2> df -Vg vol_02_10
Filesystem total used avail capacity Mounted on
/vol/vol_02_10/ 12500GB 0GB 12499GB 0% /vol/vol_02_10/
/vol/vol_02_10/.snapshot 0GB 0GB 0GB ---% /vol/vol_02_10/.snapshot
Parity Error -
Try using the localhost, 127.0.0.1, instead ?
I hope this response has been helpful to you.
At your service,
Eugene E. Kashpureff, Sr.
Independent NetApp Consultant, K&H Research http://www.linkedin.com/in/eugenekashpureff
Senior NetApp Instructor, IT Learning Solutions http://sg.itls.asia/netapp
(P.S. I appreciate points for helpful or correct answers.)
Hi Eugene,
I retried the transfer (using the same command as the previous attempts) when I posted this question and it looks like it has been successful (data is still being transferred).
I have a few more of these to go so I'll try using localhost next time and see if that makes any difference to the outcome.
Interestingly, when this was successful, the first pass (mapping regular files) only took 7 hours (well below the "14 hours before timeout previously observed). So perhaps there was some sort of early termination of one side fo the NDMP copy thus resulting in the timeout.
The unsuccessful ndmpcopy attempts had logged the following messages "Error in data_listen_reply: Error: unknown error"
in the ndmpcopy log files, very early on in the transfer. Perhaps this is a good indication that the transfer is broken and will likely time out after 14 hours. The successful copy did not include that error.
Anyway, thats heaps for the suggestion, I'll try it next time I run an ndmpcopy