I convert the stored encrypted password to a secure string first:
$password = ConvertTo-SecureString -String $encrypted
Then, just before connecting to each system, build the credential and connect:
$cred = New-Object -TypeName System.Management.Automation.PSCredential -ArgumentList "$userid",$password
Connect-NcController -Name $clname -Credential $cred -HTTPS -ErrorAction Stop
(This is for cDOT; for 7-Mode change it to Connect-NaController.)
We tested this: within a few minutes of losing both cluster switches, the cluster went down. We tested on 8.2.0, so I can't be sure what would happen in a newer version. The real question is what you think the chances are of losing both switches; the MTBF on switches is very high. You can put the switches in two different cabinets to lower the chances of losing both.
Did you look at rsync? You can select files by access date or change date and have it delete them from the source once the transfer is complete (see the sketch below). We use it a lot for various reasons, and it's easy to script from Linux. I'm not sure what NDMP can do when it comes to limiting the transfer to files of certain dates.
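As a rough illustration (the paths and the 30-day cutoff are placeholders, not from the original question), the pattern is: select files by modification age with find, hand the list to rsync, and let rsync delete each source file once it has transferred. Use -atime instead of -mtime if you want to select on access date.
# Hedged sketch: archive files untouched for 30+ days and remove them from the source.
SRC=/mnt/source_export        # placeholder source NFS mount
DST=/mnt/archive_export       # placeholder destination mount
cd "$SRC" || exit 1
find . -type f -mtime +30 -print0 | \
  rsync -av --from0 --files-from=- --remove-source-files ./ "$DST"/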
Is this 7-Mode or cDOT? What is your MTU setting on both sides, including the network switches? Are you using encryption on any of the network equipment? Did you check your SnapMirror window size? What is the bandwidth and latency between the filers?
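If you want to rule out an MTU mismatch, one quick check from a Linux host on either side is a do-not-fragment ping at the jumbo-frame payload size (a sketch only; the destination IP is a placeholder and 8972 assumes a 9000-byte MTU):
# Hedged sketch: verify the path passes jumbo frames end to end.
# 8972 = 9000 - 20 (IP header) - 8 (ICMP header); use 1472 for a 1500 MTU.
ping -M do -s 8972 -c 5 192.168.1.50
If this fails while smaller sizes work, something in the path (switch port, filer port, or host NIC) is not set for jumbo frames.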
We are encountering SSH "connection refused" errors on our Red Hat servers, and nothing about it is consistent. It happens at different days and times, on multiple Red Hat servers (about 70 of them), against two different cDOT systems, and on different commands (vol show, vol snap show). We tried staggering the number of concurrent runs of this script, with no help. We are running Red Hat 6.4 with OpenSSH 5.3 (SSH v2) and haven't had a chance to try a newer version of SSH. The clusters are running clustered Data ONTAP 8.2.1.

We have never encountered this problem when we run SSH commands against the cluster management LIF; the problem only occurs on the vserver (SVM) management LIF. We moved the vserver management LIF to a different node from the cluster management LIF. I found that if I retry the command multiple times, sleeping between attempts, it eventually works; a couple of times it took up to 10 retries. I don't believe we are hitting 64 concurrent SSH sessions or 10 connections per second, but I can't prove it. We are working with NetApp support, and we also found a few restrictions. From NetApp:

The Data ONTAP 8.2 release family supports OpenSSH client version 5.4p1 and OpenSSH server version 5.4p1. Only the SSH v2 protocol is supported; SSH v1 is not supported. Data ONTAP supports a maximum of 64 concurrent SSH sessions per node. If the cluster management LIF resides on the node, it shares this limit with the node management LIF. If the rate of incoming connections is higher than 10 per second, the service is temporarily disabled for 60 seconds.
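For anyone hitting the same thing, our retry workaround looks roughly like this (a sketch only; the LIF name, user, command, retry count, and sleep interval are placeholders, not our production values):
# Hedged sketch: retry an ONTAP command over SSH, pausing between attempts.
LIF=svm-mgmt.example.com      # placeholder vserver management LIF
MAX=10
for i in $(seq 1 "$MAX"); do
    if out=$(ssh admin@"$LIF" "vol show" 2>/dev/null); then
        echo "$out"
        break
    fi
    echo "attempt $i failed, sleeping before retry" >&2
    sleep 5
done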
If you are experiencing NFS disconnects or a performance problem and you are running ESXi 5.x, this may help. VMware made a change between 4.1 and 5.x: the default NFS.MaxQueueDepth went from 64 to 4294967295. We ran our own benchmarks, and NetApp told us they ran some as well; changing NFS.MaxQueueDepth back to 64 increased performance. Others are seeing NFS disconnects. NetApp told us to run netstat -sp tcp on the controller and grep for zero window (a sketch of both checks is below). “The NFS queue depth was introduced in 5.0 so that SIOC would work with NFS. VMware (TAM and GSS) provided clear information regarding this during the post-mortem session with VMware and NetApp support engineers. Changing the NFS.MaxQueueDepth to 64 is, in fact, a workaround. Vaughn (NetApp) mentions this as well in his article above, stating that “A fix has been released by NetApp engineering and for those unable to upgrade their storage controllers, VMware engineering has published a pair of workarounds” (SIOC or NFS.MaxQueueDepth to 64).” http://cormachogan.com/2013/02/08/heads-up-netapp-nfs-disconnects/
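A rough sketch of both checks (host and filer names are placeholders; verify the esxcli option path against your ESXi build, and the queue-depth change may need a host reboot to take effect):
# Hedged sketch: check and lower the NFS queue depth on an ESXi 5.x host.
esxcli system settings advanced list -o /NFS/MaxQueueDepth
esxcli system settings advanced set -o /NFS/MaxQueueDepth -i 64
# Hedged sketch: look for TCP zero-window counts on a 7-Mode controller.
ssh root@filer01 "netstat -sp tcp" | grep -i "zero window"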
Recently we had this explained to us by NetApp. When a write request comes in, it is cached in system memory; a copy of the write request is logged to NVRAM and that log is mirrored to the partner's NVRAM, and only then is the system that requested the write told it completed. At some later point (the next consistency point) the data is written to disk from main memory. If a failover takes place, the partner can complete the outstanding write requests from its mirrored NVRAM copy. I hope this helps.
Flow control appears to be a little more confusing for cDOT: one KB/best practice tells us to turn it off for the cluster ports only, not the ports the data protocols (NFS/CIFS/iSCSI) go over. The only way I found to see whether the change will have an impact is to look at the ifstat -a output; in 7-Mode the field is pause frames, in cDOT it's Xoff. In our case we see no receive pause frames, and we do transmit pause frames, but only a few: about 1 for every 141M frames on the cluster ports and about 1 for every 5.5M frames on the data ports (NFS/CIFS/iSCSI).
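If you just want those counters, something along these lines pulls them (a sketch; the node, cluster, and interface names are placeholders, and the exact field names in ifstat output differ between releases):
# Hedged sketch: grab pause/Xoff counters for one interface.
# 7-Mode:
ssh root@filer01 "ifstat e0a" | egrep -i "pause|xoff"
# cDOT (via the node shell):
ssh admin@cluster01 'node run -node node01 -command "ifstat e0a"' | egrep -i "pause|xoff"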
You might be able to cut the number of dropped packets, and you will find different opinions about what an acceptable number is.
How can you tell if you are experiencing dropped packets? Take a packet trace on the client side (limit the amount of data you capture to keep it readable; see the capture sketch after the filter list below), then use Wireshark to read the trace. NetApp does not record packet loss except for NICs with the Chelsio chipset. While you are in Wireshark, open the expert information (Analyze > Expert Info). Some filters to put in the filter box:
tcp.analysis.ack_lost_segment
tcp.analysis.retransmission
rpc.time > x (x is in seconds, so .5 would be half a second, which is a long time)
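For the client-side capture itself, something like this keeps the trace small enough for Wireshark (a sketch only; the interface, filer IP, and file sizes are placeholders):
# Hedged sketch: capture NFS traffic to/from the filer, truncating each packet
# and rotating capture files so the trace stays manageable.
tcpdump -i eth0 -s 256 -C 100 -W 5 -w /tmp/nfs_trace.pcap host 10.0.0.10 and port 2049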
If you have a controller with a NIC that has the Chelsio chipset:
ifinfo -a | egrep "(bad headers|interface|Driver)" | grep -B2 "bad headers"
We ran this on about 120 NetApp controllers that have the Chelsio chipset, and the numbers varied a lot. The worst case we looked at had a total of 31B packets with a loss of one out of every 1,500 packets. The best case was one dropped packet out of 3B.
Another way to tell whether you are getting a fair number of dropped packets is to look at the number of Oracle log writer errors. When we replaced the NIC on the NetApp controller with one that had a much bigger buffer, the number of Oracle log writer errors dropped drastically.
One option to help cut the impact of dropped packets is to set net.ipv4.tcp_sack=1 (selective acknowledgement) on the Linux machines. This tells whatever system your machine is talking to to retransmit only the lost packet instead of the whole chain. This helped cut the number of Oracle log writer errors.
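For reference, turning it on looks like this on a Linux client (on most recent distributions SACK is already enabled by default, so check first):
# Hedged sketch: enable TCP selective acknowledgement on a Linux NFS client.
sysctl net.ipv4.tcp_sack          # check the current value
sysctl -w net.ipv4.tcp_sack=1     # enable it now
echo "net.ipv4.tcp_sack = 1" >> /etc/sysctl.conf   # persist across reboots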
The X1107 (Chelsio) NIC has a 160K buffer, the X1117 (Intel) has a 512K buffer, and the X1139/40 has a 64K buffer. I believe the X1117 is the latest card. Before you consider replacing a card, make sure you are running the version of Data ONTAP it requires (see the table below).
Part # | Description | FAS Platform | Data ONTAP | FCoE | Bus | Supplier | Transmit Buffer | Notes
X1005A-R5 | NIC 1-Port Optical 10GbE PCI-X | FAS3050, FAS60xx | 7.2.3, 7.3.x, 8.x | No | PCI-X | Chelsio | - | -
X1008A-R5 | NIC 2-Port Optical 10GbE PCIe | FAS3040, FAS3070, FAS31xx, FAS32xx, FAS60xx, SA300/600, V-Series | 7.2.3, 7.3.x, 8.x | No | PCIe Gen1, 8 lanes | Chelsio | - | -
X1106A-R6 | NIC 1-Port Optical 10GbE PCIe | FAS2050 only | 7.3.2 | No | PCIe Gen1, 8 lanes | Chelsio | - | -
X1107A-R6 | NIC 2-Port Bare Cage SFP+ 10GbE PCIe | FAS3040, FAS3070, FAS31xx, FAS32xx, FAS60xx, SA300/600, V-Series | 7.3.2, 8.x | No | PCIe Gen1, 8 lanes | Chelsio | 160K | Same throughput as X1139 (Gen 1 8 lanes = Gen 2 4 lanes)
X1117-R6 | NIC II 2-Port Bare Cage SFP+ 10GbE PCIe | FAS32xx, FAS62xx | 8.0.4+ | No | PCIe Gen2, 8 lanes | Intel | 256K/512K | Cannot use on AP1/AP2
X1139A/40-R6 | ADPT 2-Port Unified Target 10GbE SFP+ PCIe | FAS3040, FAS3070, FAS31xx, FAS32xx, FAS60xx, SA300/600, V-Series | 7.3.2, 8.x | Yes* | PCIe Gen2, 4 lanes | QLogic | 16K/64K | Gen 1 CNA card
N/A | On-board 10GbE ports | FAS2240, FAS32xx, FAS62xx | 8.x | No | PCIe Gen2, 8 lanes | Intel | - | -
We struggled with this for a while; hopefully it will help someone who is seeing the same problem.
It appears this happens any time there is an odd number of shelves in a stack. If we have 2 or 4 shelves the paths are correct; if we have 3 or 7, the middle shelf will have every other disk on a different primary path.
On four different cDOT systems we noticed that the primary path for all drives is on one connection except for the 3rd disk shelf in the stack (checked with node run local -command storage show disk -p). For example: shelf 10 primary path 11a.10.x port B, shelf 11 primary path 11a.11.x port B, shelf 12 primary path 1a.12.0 port B, 11a.12.12.1 port B, 1a.12.2 port A. All the drives in the stack are assigned to ONE controller. We have different stacks and different types of shelves (SAS vs. BSAS) with the same problem. Config Advisor is clean. We are running 8.2P5.
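To see the distribution at a glance, something like this counts primary-path ports per shelf (a sketch only; the cluster and node names are placeholders, and the awk column positions assume the usual PRIMARY / PORT / SECONDARY / PORT / SHELF / BAY layout of storage show disk -p, which can vary by release):
# Hedged sketch: count how many disks sit on each primary port, per shelf.
ssh admin@cluster01 'node run -node node01 -command "storage show disk -p"' | \
  awk 'NF>=6 && $1 ~ /\./ {count[$5 " port " $2]++}
       END {for (k in count) print "shelf " k ": " count[k] " disks"}'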
What NIC card do you have in your system(s)? Use sysconfig -ac to get the model number and the card's chipset. What version of Data ONTAP are you running? You should also check which slot the NIC is in to make sure you are getting the maximum out of the PCIe slot; some controllers have PCIe x2 and x4 slots, for example. Are you running jumbo frames end to end? The chipset is very important: I believe NetApp only reports dropped packets for Chelsio, nothing for Intel or QLogic, so which NIC you have matters a lot.

We chased dropped packets recently and it took a while to get the information. We have over 100 7-Mode controllers (NFS only) and a fair number of them had dropped packets. Keep in mind that dropping packets on Ethernet connections is common; what matters is the amount compared to total packets. If you are dropping 1 out of 50 million packets, most people would consider that reasonable.

If you do a packet capture on the client side (limit the amount of data), open it in Wireshark and look at the expert information (Analyze > Expert Info). If an NFS client is sending or receiving a fair amount of data and a packet is dropped, the filer may retransmit the whole chain of packets; if you set net.ipv4.tcp_sack=1 (selective acknowledgement) on the NFS client, only the dropped packet will be retransmitted. This made a big difference for us with Oracle log writer (logwr) messages. I would also look at the NFS mount options on the client machine; you can cut down the number of setattr and getattr calls (an example is below). I plan to write up how we gathered our information on dropped packets and the differences among the 10G NIC cards; once I do I will post it on this community page. I hope this is useful.
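On the mount-option point, this is the sort of thing I mean (a sketch only; the filer name, export, and mount point are placeholders). A longer attribute-cache timeout (actimeo) cuts repeated getattr calls and nocto relaxes close-to-open checks, but both trade metadata freshness for fewer calls, so test against your application's needs; Oracle has its own recommended NFS mount options.
# Hedged sketch of Linux NFS mount options that reduce attribute traffic.
mount -t nfs -o rw,hard,tcp,vers=3,rsize=65536,wsize=65536,actimeo=120,nocto \
  filer01:/vol/data /mnt/data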
I haven't done a benchmark, but I think QSM will be faster: with rsync the data has to be read into the memory of an intermediate machine and then written out to the other mount point, while I believe QSM is basically a direct filer-to-filer file copy.
I don't believe there is a way to do a global throttle at any level in cDOT; the only way I know of is per volume (per SnapMirror relationship). In 7-Mode you could set it on each controller. Examples of both are below.
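For reference, the two approaches look roughly like this (the destination path and the 10240 KB/s values are examples to illustrate the idea; confirm the parameters and option names against your release):
# Hedged sketch, cDOT: throttle one SnapMirror relationship (KB/s).
snapmirror modify -destination-path svm1:vol_dr -throttle 10240
# Hedged sketch, 7-Mode: controller-wide replication throttle (KB/s).
options replication.throttle.enable on
options replication.throttle.outgoing.max_kbs 10240
options replication.throttle.incoming.max_kbs 10240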
What information do you want in the report? More detail may require using an API to get it. Including the header of the report you are generating now might help. Lag time? A report when a transfer hasn't completed within 24 hours (or whatever threshold you want)? How much time it took? How much data was transferred?
There are some PowerShell scripts available, or you should be able to use OnCommand/DFM. If you have a Linux machine with rsh/ssh access to the filers, it's very easy to write a simple bash script that grabs this information and sends out an email every day when it finds an exception such as an excessive lag time (see below).
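A rough sketch of that kind of check (the filer name, threshold, and email address are placeholders, and the awk patterns assume the Source:/Lag: lines of 7-Mode snapmirror status -l output, so adjust to what your release prints):
# Hedged sketch: mail a list of SnapMirror relationships lagging over 24 hours.
FILER=filer01                 # placeholder filer
THRESHOLD=86400               # 24 hours in seconds
ALERTS=$(ssh root@"$FILER" "snapmirror status -l" | \
  awk -v thr="$THRESHOLD" '
    /^Source:/ {src=$2}
    /^Lag:/    {split($2, t, ":"); secs = t[1]*3600 + t[2]*60 + t[3];
                if (secs > thr) print src " lag " $2}')
[ -n "$ALERTS" ] && echo "$ALERTS" | mail -s "SnapMirror lag alert: $FILER" storage-team@example.com
Run it from cron once a day and it only sends mail when something is over the threshold.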
I don't know the exact command, but you should be able to use a DFM CLI command to pull this data for a particular time frame. You can also use statit to look at disk busy percentage and latency.
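If you go the statit route, the basic pattern is to start a sample, wait, and stop it (a sketch only; the filer name, the 5-minute window, and the grep range are placeholders, and statit requires advanced privilege on 7-Mode):
# Hedged sketch: take a 5-minute statit sample and pull the per-disk section.
ssh root@filer01 "priv set -q advanced; statit -b"
sleep 300
ssh root@filer01 "priv set -q advanced; statit -e" > /tmp/statit_filer01.txt
grep -A40 "Disk Statistics" /tmp/statit_filer01.txt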
What OS are you using? We are seeing log writer errors on controllers with just SCSI drives. We are working with NetApp now and may have a solution if you are running Red Hat.
I believe if you have 3Gb and 6Gb shelves on one connection (stack), they will all slow down to 3Gb. I also believe the NetApp best practice is not to mix SATA and SCSI drives on the same controller, since it increases the likelihood of back-to-back CPs.