You could just do “snapmirror resync”. It would pick up the latest common snapshot (basically, at the point where you stopped snapmirror) and continue from there.
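For example (the filer and volume names below are placeholders for your actual relationship), run on the destination:

snapmirror resync -S srcfiler:vol_src dstfiler:vol_dst

It will warn you about any destination snapshots newer than the common base snapshot before reverting to it.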
"performing a snapmirror status on source does not show the qtree relationship": this simply cannot be the result of editing snapmirror.conf, so you obviously did something else. Could you paste the exact output of the "snapmirror status", "snapmirror quiesce", and "snapmirror break" commands on the destination?
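To be clear, what I am asking for is roughly this, run on the destination filer (the qtree path below is just a placeholder for your actual destination path):

snapmirror status
snapmirror quiesce /vol/vol_dst/qtree_dst
snapmirror break /vol/vol_dst/qtree_dst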
Why can't you just add them back to snapmirror.conf? Removing entries from snapmirror.conf does not cause the mirror relationship to disappear, unless you did something else.
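For reference, a qtree entry in /etc/snapmirror.conf looks roughly like this (filer and path names are placeholders; the last four fields are minute, hour, day-of-month and day-of-week of the update schedule):

srcfiler:/vol/vol1/qtree1 dstfiler:/vol/vol1/qtree1 - 0 * * *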
It cannot be explained by round trip delays alone. Here is a tcpdump trace:

665 9.601275 192.168.2.130 192.168.2.57 TCP ndmp > 50031 [PSH, ACK] Seq=3213 Ack=1 Win=8760 Len=4 TSV=824507566 TSER=2117681160
666 9.601290 192.168.2.57 192.168.2.130 TCP 50031 > ndmp [ACK] Seq=1 Ack=3217 Win=2224 Len=0 TSV=2118197846 TSER=824507566
667 9.601410 192.168.2.130 192.168.2.57 TCP ndmp > 50031 [PSH, ACK] Seq=3217 Ack=1 Win=8760 Len=40 TSV=824507566 TSER=2117681160
668 9.601415 192.168.2.57 192.168.2.130 TCP 50031 > ndmp [ACK] Seq=1 Ack=3257 Win=2224 Len=0 TSV=2118197846 TSER=824507566
669 9.601444 192.168.2.57 192.168.2.131 NDMP MOVER_READ Request
670 9.601685 192.168.2.131 192.168.2.57 TCP [TCP segment of a reassembled PDU]
671 9.601691 192.168.2.57 192.168.2.131 TCP 52555 > ndmp [ACK] Seq=3285 Ack=2457 Win=1460 Len=0 TSV=2118197846 TSER=846710219
672 9.601785 192.168.2.131 192.168.2.57 NDMP MOVER_READ Reply
673 9.601787 192.168.2.57 192.168.2.131 TCP 52555 > ndmp [ACK] Seq=3285 Ack=2485 Win=1460 Len=0 TSV=2118197846 TSER=846710219
674 9.701244 192.168.2.130 192.168.2.57 TCP ndmp > 50031 [PSH, ACK] Seq=3257 Ack=1 Win=8760 Len=4 TSV=824507576 TSER=2117681160

Packets 665 and 667 are the NDMP_NOTIFY_DATA_READ (for some reason NetApp splits the request across two packets). As can be seen, NetWorker immediately issues the data mover request, which completes promptly (packets 669 - 673). After this there is a 100ms pause before NetApp issues the next NDMP_NOTIFY_DATA_READ (674) ... Unfortunately the NDMP logs on NetApp do not offer sub-second granularity, making it impossible to dig further into this. I have also seen a description of exactly the same problem using CommVault with 3-way backup (using NRS, which is the analog of NetWorker DSA): extremely slow restore without any obvious bottleneck anywhere. This does look like a NetApp bug ...
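For what it's worth, a trace like the one above can be captured on the NetWorker storage node with plain tcpdump, something along these lines (interface name and addresses are site-specific; 10000 is the default NDMP control port):

tcpdump -i eth0 -nn -s0 -w ndmp.pcap host 192.168.2.130 or host 192.168.2.131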
After I had slept on it a bit I realized that all documented cases of slow restore were using 3-way backup (with either another NetApp or NetWorker DSA as tape server). So I checked what was going on in a 1TB restore session running right now. It appears there is indeed quite a bit of round trip overhead ... caused by NetApp.

From the NDMP server log:

Jul 16 12:32:26 GMT+04:00 [ndmpd:109]: Message NDMP_NOTIFY_DATA_READ sent
Jul 16 12:32:26 GMT+04:00 [ndmpd:109]: Message Header:
Jul 16 12:32:26 GMT+04:00 [ndmpd:109]: Sequence 466373
Jul 16 12:32:26 GMT+04:00 [ndmpd:109]: Timestamp 1310805146
Jul 16 12:32:26 GMT+04:00 [ndmpd:109]: Msgtype 0
Jul 16 12:32:26 GMT+04:00 [ndmpd:109]: Method 1285
Jul 16 12:32:26 GMT+04:00 [ndmpd:109]: ReplySequence 0
Jul 16 12:32:26 GMT+04:00 [ndmpd:109]: Error NDMP_NO_ERR
Jul 16 12:32:26 GMT+04:00 [ndmpd:109]: Offset: 610934457344
Jul 16 12:32:26 GMT+04:00 [ndmpd:109]: Length: 1310720

From the NDMP mover log:

Jul 16 12:43:10 GMT+04:00 [ndmpd:37]: NDMP message type: NDMP_MOVER_READ
Jul 16 12:43:10 GMT+04:00 [ndmpd:37]: NDMP message replysequence: 470892
Jul 16 12:43:10 GMT+04:00 [ndmpd:37]: Message Header:
Jul 16 12:43:10 GMT+04:00 [ndmpd:37]: Sequence 0
Jul 16 12:43:10 GMT+04:00 [ndmpd:37]: Timestamp 0
Jul 16 12:43:10 GMT+04:00 [ndmpd:37]: Msgtype 1
Jul 16 12:43:10 GMT+04:00 [ndmpd:37]: Method 2566
Jul 16 12:43:10 GMT+04:00 [ndmpd:37]: ReplySequence 470892
Jul 16 12:43:10 GMT+04:00 [ndmpd:37]: Error NDMP_NO_ERR
Jul 16 12:43:10 GMT+04:00 [ndmpd:37]: Error code: NDMP_NO_ERR
Jul 16 12:43:10 GMT+04:00 [ndmpd:37]: Offset: 616921826304
Jul 16 12:43:10 GMT+04:00 [ndmpd:37]: Length: 1310720

So it appears NetApp is issuing 10*128KB synchronous reads. Each read results in an NDMP_NOTIFY_DATA_READ request from the NDMP server to the DMA (NetWorker) and an NDMP_MOVER_READ request from the DMA (NetWorker) to the NDMP mover. So it is indeed quite a bit of round trip overhead ... there are approximately 10 such requests per second, which accounts for the ~10MB/s restore performance. Is it possible to change the NetApp read size? Personally I'd rather issue a single NDMP_NOTIFY_DATA_READ request for the whole dump size ... or at least one per file. I have to test what NetApp does for a direct tape restore.
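A quick back-of-the-envelope check of that estimate (the 1,310,720-byte length comes from the log above, the ~100ms gap from the trace in my earlier post): one request per ~0.1s caps the rate at about 10 requests/s, and 10 x 1,310,720 bytes is roughly 13MB/s, so the observed ~10MB/s restore rate is consistent with the per-request round trip, not the data path, being the bottleneck.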
It is not busy. As mentioned, this is not limited to a single installation. While I agree that recovery is slower, I would not expect it to be an order of magnitude slower.
I have observed it on multiple installations, and now I have a chance to capture it. NetWorker 7.5 and 7.6, Data ONTAP 7.2 and 7.3, directly connected tape or 3-way backup via DSA, tape drive or backup to disk - in all cases I see exactly the same problem. Backup speed is quite good and corresponds to the backend (i.e. over 100MB/s for a tape drive, or full 1Gb/s network interface speed for 3-way backup). But recovery is miserable - approximately 8MB/s!

In all cases recovery was done via file-based recover (nwrecover - browsing files, selecting them, starting recover - or a direct call to recover). As mentioned, this happens across very different combinations of versions and backup technology, so it appears to be an inherent limitation of either NetApp or NetWorker.

I attach a screenshot of the recover session (sorry, I am connected to the customer via a read-only VNC session, so that is all I can provide). Observe the throughput. This is not a high file count environment: it is an NFS database with a relatively low number of files, total size slightly above 1TB. And this is really reading data, after the directory phase was already done. Any hints, pointers and ideas appreciated. I probably have to (recommend to) try save set recovery next time. But it is frustrating ...
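For completeness, the file-based recover in these tests was the stock interactive flow, roughly like this (server, client and path names below are placeholders, not the actual environment):

recover -s backup_server -c nas_filer
recover> add /vol/vol1/dbfiles
recover> recover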
I had several installations with NetWorker and backup speed was OK. Restore was a different matter though ☹ Incidentally, all installations were using 3-way backup (via NetWorker DSA) and not direct backup to tape. I consistently observed full wire speed (120MB/s for a 1Gb/s interface). But these were all database installations with a relatively small number of large files, 10-20GB each.
Your switch configuration must of course match your device configuration. So if the NetApp is configured to use tagged VLANs, you have to configure the same VLANs as tagged on the switch port.
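As a rough illustration only (interface names, VLAN IDs and addresses here are made up), a tagged setup on a 7-mode filer and a matching Cisco-style trunk port would look something like this.

On the filer:

vlan create e0a 10 20
ifconfig e0a-10 192.168.10.5 netmask 255.255.255.0

On the switch:

interface GigabitEthernet1/0/1
 switchport mode trunk
 switchport trunk allowed vlan 10,20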
Indeed, you can't run tagged and untagged traffic on the same port in Data ONTAP 7.x. There is no problem doing it from the switch side (at least on those switches I am aware of ☺).
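On the switch side this is usually done with a native VLAN on the trunk port, e.g. something like this in Cisco-style syntax (VLAN numbers made up): frames on VLAN 30 go untagged while 10 and 20 stay tagged.

switchport mode trunk
switchport trunk allowed vlan 10,20,30
switchport trunk native vlan 30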
The first thing dump does is build the full file list; with a large number of files this can be quite time consuming. You can test the theoretical dump speed by calling it directly and dumping to null: https://kb.netapp.com/support/index?page=content&id=1011894 Watch how much time each of the steps takes. You should also be able to see dump logs in /etc/log/backup.
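If it helps, the dump-to-null test from that article boils down to something like this on the filer console (level 0 dump of a placeholder path, output discarded):

dump 0f null /vol/vol1/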
You just create the SnapVault relationship as usual. SnapVault between 32-bit and 64-bit aggregates is fully supported. I do not see any pros and cons specifically related to large aggregates. Of course you can enable compression on the SV secondary if you have the corresponding license.
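For reference, the relationship is created on the secondary in the usual way, roughly like this (filer, volume and qtree names are placeholders):

snapvault start -S primary_filer:/vol/vol_src/qtree1 /vol/sv_vol/qtree1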
Single file snap restore performance is miserable. Recently I had to test it on a FAS3140 with 2 x 112 disks (SyncMirror). You get at most around 1GB/min (not measured exactly, just judging by the restored size so far). It is really frustrating.
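For context, the test was a plain single file snap restore, i.e. something along these lines (snapshot name and file path are placeholders):

snap restore -t file -s nightly.0 /vol/vol1/path/to/large_file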
You have a single mapped LUN, which is mapped to igroup WB_ENG_ESXi_01, and that igroup contains the single WWPN 50:01:43:80:09:ac:d4:80. According to the output you provided, this WWPN does not appear in any defined zone.
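To double-check on the filer side, these are the usual commands (output will obviously differ in your environment):

lun show -m
igroup show WB_ENG_ESXi_01

Then compare that WWPN against the active zone configuration on your fabric switches.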