I convert the stored encrypted password to a secure string first:
$password = ConvertTo-SecureString -String $encrypted
Then, just before connecting to each system, build the credential and connect:
$cred = New-Object -TypeName System.Management.Automation.PSCredential -ArgumentList "$userid",$password
Connect-NcController -Name $clname -Credential $cred -HTTPS -ErrorAction Stop
(This is for cDOT; for 7-Mode change it to Connect-NaController.)
We tested this: within a few minutes of losing both cluster switches, the cluster went down. We tested on 8.2.0, so I can't be sure what would happen in a newer version. The real question is what you think the chances are of losing both switches; the MTBF on switches is very high. You can put the switches in two different cabinets to lower the chances of losing both.
Did you look at rsync? You can select files by access date or change date and have it delete them from the source once the transfer is complete (see the sketch below). We use it a lot for various reasons, and it's easy to script from Linux. I'm not sure what NDMP can do when it comes to limiting the transfer to files of certain dates.
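As a rough illustration (the paths and the 30-day cutoff are placeholders, not from the original question), the pattern is: select files by modification age with find, hand the list to rsync, and let rsync delete each source file once it has transferred. Use -atime instead of -mtime if you want to select on access date.
# Hedged sketch: archive files untouched for 30+ days and remove them from the source.
SRC=/mnt/source_export        # placeholder source NFS mount
DST=/mnt/archive_export       # placeholder destination mount
cd "$SRC" || exit 1
find . -type f -mtime +30 -print0 | \
  rsync -av --from0 --files-from=- --remove-source-files ./ "$DST"/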
Is this 7-Mode or cDOT? What is your MTU setting on both sides, including the network switches? Are you using encryption on any of the network equipment? Did you check your SnapMirror window size? What is the bandwidth and latency between the filers?
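If you want to rule out an MTU mismatch, one quick check from a Linux host on either side is a do-not-fragment ping at the jumbo-frame payload size (a sketch only; the destination IP is a placeholder and 8972 assumes a 9000-byte MTU):
# Hedged sketch: verify the path passes jumbo frames end to end.
# 8972 = 9000 - 20 (IP header) - 8 (ICMP header); use 1472 for a 1500 MTU.
ping -M do -s 8972 -c 5 192.168.1.50
If this fails while smaller sizes work, something in the path (switch port, filer port, or host NIC) is not set for jumbo frames.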
We are encountering SSH "connection refused" errors on our Red Hat servers, and nothing about it is consistent. It happens at different days and times, on multiple Red Hat servers (about 70 of them), against two different cDOT systems, and on different commands (vol show, vol snap show). We tried staggering the number of concurrent runs of this script, with no help. We are running Red Hat 6.4 with OpenSSH 5.3 (SSH v2) and haven't had a chance to try a newer version of SSH. The clusters are running clustered Data ONTAP 8.2.1.

We have never encountered this problem when we run SSH commands against the cluster management LIF; the problem only occurs on the vserver (SVM) management LIF. We moved the vserver management LIF to a different node from the cluster management LIF. I found that if I retry the command multiple times, sleeping between attempts, it eventually works; a couple of times it took up to 10 retries. I don't believe we are hitting 64 concurrent SSH sessions or 10 connections per second, but I can't prove it. We are working with NetApp support, and we also found a few restrictions. From NetApp:

The Data ONTAP 8.2 release family supports OpenSSH client version 5.4p1 and OpenSSH server version 5.4p1. Only the SSH v2 protocol is supported; SSH v1 is not supported. Data ONTAP supports a maximum of 64 concurrent SSH sessions per node. If the cluster management LIF resides on the node, it shares this limit with the node management LIF. If the rate of incoming connections is higher than 10 per second, the service is temporarily disabled for 60 seconds.
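For anyone hitting the same thing, our retry workaround looks roughly like this (a sketch only; the LIF name, user, command, retry count, and sleep interval are placeholders, not our production values):
# Hedged sketch: retry an ONTAP command over SSH, pausing between attempts.
LIF=svm-mgmt.example.com      # placeholder vserver management LIF
MAX=10
for i in $(seq 1 "$MAX"); do
    if out=$(ssh admin@"$LIF" "vol show" 2>/dev/null); then
        echo "$out"
        break
    fi
    echo "attempt $i failed, sleeping before retry" >&2
    sleep 5
done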
If you are experiencing NFS disconnects or a performance problem and you are running ESXi 5.x, this may help. VMware made a change between 4.1 and 5.x: the default NFS.MaxQueueDepth went from 64 to 4294967295. We ran our own benchmarks, and NetApp told us they ran some as well; changing NFS.MaxQueueDepth back to 64 increased performance. Others are seeing NFS disconnects. NetApp told us to run netstat -sp tcp on the controller and grep for zero window (a sketch of both checks is below). “The NFS queue depth was introduced in 5.0 so that SIOC would work with NFS. VMware (TAM and GSS) provided clear information regarding this during the post-mortem session with VMware and NetApp support engineers. Changing the NFS.MaxQueueDepth to 64 is, in fact, a workaround. Vaughn (NetApp) mentions this as well in his article above, stating that “A fix has been released by NetApp engineering and for those unable to upgrade their storage controllers, VMware engineering has published a pair of workarounds” (SIOC or NFS.MaxQueueDepth to 64).” http://cormachogan.com/2013/02/08/heads-up-netapp-nfs-disconnects/
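A rough sketch of both checks (host and filer names are placeholders; verify the esxcli option path against your ESXi build, and the queue-depth change may need a host reboot to take effect):
# Hedged sketch: check and lower the NFS queue depth on an ESXi 5.x host.
esxcli system settings advanced list -o /NFS/MaxQueueDepth
esxcli system settings advanced set -o /NFS/MaxQueueDepth -i 64
# Hedged sketch: look for TCP zero-window counts on a 7-Mode controller.
ssh root@filer01 "netstat -sp tcp" | grep -i "zero window"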
Recently we had this explained to us by NetApp. When a write request comes in, it is cached in system memory; a copy of the write request is logged to NVRAM and that log is mirrored to the partner's NVRAM, and only then is the system that requested the write told it completed. At some later point (the next consistency point) the data is written to disk from main memory. If a failover takes place, the partner can complete the outstanding write requests from its mirrored NVRAM copy. I hope this helps.
Flow control appears to be a little more confusing for cDOT: one KB/best practice tells us to turn it off for the cluster ports only, not the ports the data protocols (NFS/CIFS/iSCSI) go over. The only way I found to see whether the change will have an impact is to look at the ifstat -a output; in 7-Mode the field is pause frames, in cDOT it's Xoff. In our case we see no receive pause frames, and we do transmit pause frames, but only a few: about 1 for every 141M frames on the cluster ports and about 1 for every 5.5M frames on the data ports (NFS/CIFS/iSCSI).
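If you just want those counters, something along these lines pulls them (a sketch; the node, cluster, and interface names are placeholders, and the exact field names in ifstat output differ between releases):
# Hedged sketch: grab pause/Xoff counters for one interface.
# 7-Mode:
ssh root@filer01 "ifstat e0a" | egrep -i "pause|xoff"
# cDOT (via the node shell):
ssh admin@cluster01 'node run -node node01 -command "ifstat e0a"' | egrep -i "pause|xoff"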
You might be able to cut the number of dropped packets, and you will find different opinions about what an acceptable number is.
How can you tell if you are experiencing dropped packets? Take a packet trace on the client side (limit the amount of data you capture to keep it readable; see the capture sketch after the filter list below), then use Wireshark to read the trace. NetApp does not record packet loss except for NICs with the Chelsio chipset. While you are in Wireshark, open the expert information (Analyze > Expert Info). Some filters to put in the filter box:
tcp.analysis.ack_lost_segment
tcp.analysis.retransmission
rpc.time > x (x is in seconds, so .5 would be half a second, which is a long time)
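For the client-side capture itself, something like this keeps the trace small enough for Wireshark (a sketch only; the interface, filer IP, and file sizes are placeholders):
# Hedged sketch: capture NFS traffic to/from the filer, truncating each packet
# and rotating capture files so the trace stays manageable.
tcpdump -i eth0 -s 256 -C 100 -W 5 -w /tmp/nfs_trace.pcap host 10.0.0.10 and port 2049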
If you have a controller with a NIC that has the Chelsio chipset:
ifinfo -a | egrep "(bad headers|interface|Driver)" | grep -B2 "bad headers"
We ran this on about 120 NetApp controllers that have the Chelsio chipset, and the numbers varied a lot. The worst case we looked at had a total of 31B packets with a loss of one out of every 1,500 packets. The best case was one dropped packet out of 3B.
Another way to tell whether you are getting a fair number of dropped packets is to look at the number of Oracle log writer errors. When we replaced the NIC on the NetApp controller with one that had a much bigger buffer, the number of Oracle log writer errors dropped drastically.
One option to help cut the impact of dropped packets is to set net.ipv4.tcp_sack=1 (selective acknowledgement) on the Linux machines. This tells whatever system your machine is talking to to retransmit only the lost packet instead of the whole chain. This helped cut the number of Oracle log writer errors.
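For reference, turning it on looks like this on a Linux client (on most recent distributions SACK is already enabled by default, so check first):
# Hedged sketch: enable TCP selective acknowledgement on a Linux NFS client.
sysctl net.ipv4.tcp_sack          # check the current value
sysctl -w net.ipv4.tcp_sack=1     # enable it now
echo "net.ipv4.tcp_sack = 1" >> /etc/sysctl.conf   # persist across reboots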
The X1107 (Chelsio) NIC has a 160K buffer, the X1117 (Intel) has a 512K buffer, and the X1139/40 has a 64K buffer. I believe the X1117 is the latest card. Before you consider replacing a card, make sure you are running the version of Data ONTAP it requires (see the table below).
Part # | Description | FAS Platform | Data ONTAP | FCoE | Bus | Supplier | Transmit Buffer | Notes
X1005A-R5 | NIC 1-Port Optical 10GbE PCI-X | FAS3050, FAS60xx | 7.2.3, 7.3.x, 8.x | No | PCI-X | Chelsio | - | -
X1008A-R5 | NIC 2-Port Optical 10GbE PCIe | FAS3040, FAS3070, FAS31xx, FAS32xx, FAS60xx, SA300/600, V-Series | 7.2.3, 7.3.x, 8.x | No | PCIe Gen1, 8 lanes | Chelsio | - | -
X1106A-R6 | NIC 1-Port Optical 10GbE PCIe | FAS2050 only | 7.3.2 | No | PCIe Gen1, 8 lanes | Chelsio | - | -
X1107A-R6 | NIC 2-Port Bare Cage SFP+ 10GbE PCIe | FAS3040, FAS3070, FAS31xx, FAS32xx, FAS60xx, SA300/600, V-Series | 7.3.2, 8.x | No | PCIe Gen1, 8 lanes | Chelsio | 160K | Same throughput as X1139 (Gen 1 8 lanes = Gen 2 4 lanes)
X1117-R6 | NIC II 2-Port Bare Cage SFP+ 10GbE PCIe | FAS32xx, FAS62xx | 8.0.4+ | No | PCIe Gen2, 8 lanes | Intel | 256K/512K | Cannot use on AP1/AP2
X1139A/40-R6 | ADPT 2-Port Unified Target 10GbE SFP+ PCIe | FAS3040, FAS3070, FAS31xx, FAS32xx, FAS60xx, SA300/600, V-Series | 7.3.2, 8.x | Yes* | PCIe Gen2, 4 lanes | QLogic | 16K/64K | Gen 1 CNA card
N/A | On-board 10GbE ports | FAS2240, FAS32xx, FAS62xx | 8.x | No | PCIe Gen2, 8 lanes | Intel | - | -
We struggled with this for a while; hopefully it will help someone who is seeing the same problem.
It appears this happens any time there is an odd number of shelves in a stack. If we have 2 or 4 shelves the paths are correct; if we have 3 or 7, the middle shelf will have every other disk on a different primary path.
On four different cDOT systems we noticed that the primary path for all drives is on one connection except for the 3rd disk shelf in the stack (checked with node run local -command storage show disk -p). For example: shelf 10 primary path 11a.10.x port B, shelf 11 primary path 11a.11.x port B, shelf 12 primary path 1a.12.0 port B, 11a.12.12.1 port B, 1a.12.2 port A. All the drives in the stack are assigned to ONE controller. We have different stacks and different types of shelves (SAS vs. BSAS) with the same problem. Config Advisor is clean. We are running 8.2P5.
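To see the distribution at a glance, something like this counts primary-path ports per shelf (a sketch only; the cluster and node names are placeholders, and the awk column positions assume the usual PRIMARY / PORT / SECONDARY / PORT / SHELF / BAY layout of storage show disk -p, which can vary by release):
# Hedged sketch: count how many disks sit on each primary port, per shelf.
ssh admin@cluster01 'node run -node node01 -command "storage show disk -p"' | \
  awk 'NF>=6 && $1 ~ /\./ {count[$5 " port " $2]++}
       END {for (k in count) print "shelf " k ": " count[k] " disks"}'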
What NIC card do you have in your system(s)? Use sysconfig -ac to get the model number and the card's chipset. What version of Data ONTAP are you running? You should also check which slot the NIC is in to make sure you are getting the maximum out of the PCIe slot; some controllers have PCIe x2 and x4 slots, for example. Are you running jumbo frames end to end? The chipset is very important: I believe NetApp only reports dropped packets for Chelsio, nothing for Intel or QLogic, so which NIC you have matters a lot.

We chased dropped packets recently and it took a while to get the information. We have over 100 7-Mode controllers (NFS only) and a fair number of them had dropped packets. Keep in mind that dropping packets on Ethernet connections is common; what matters is the amount compared to total packets. If you are dropping 1 out of 50 million packets, most people would consider that reasonable.

If you do a packet capture on the client side (limit the amount of data), open it in Wireshark and look at the expert information (Analyze > Expert Info). If an NFS client is sending or receiving a fair amount of data and a packet is dropped, the filer may retransmit the whole chain of packets; if you set net.ipv4.tcp_sack=1 (selective acknowledgement) on the NFS client, only the dropped packet will be retransmitted. This made a big difference for us with Oracle log writer (logwr) messages. I would also look at the NFS mount options on the client machine; you can cut down the number of setattr and getattr calls (an example is below). I plan to write up how we gathered our information on dropped packets and the differences among the 10G NIC cards; once I do I will post it on this community page. I hope this is useful.
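On the mount-option point, this is the sort of thing I mean (a sketch only; the filer name, export, and mount point are placeholders). A longer attribute-cache timeout (actimeo) cuts repeated getattr calls and nocto relaxes close-to-open checks, but both trade metadata freshness for fewer calls, so test against your application's needs; Oracle has its own recommended NFS mount options.
# Hedged sketch of Linux NFS mount options that reduce attribute traffic.
mount -t nfs -o rw,hard,tcp,vers=3,rsize=65536,wsize=65536,actimeo=120,nocto \
  filer01:/vol/data /mnt/data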
I haven't done a benchmark, but I think QSM will be faster: with rsync the data has to be read into the memory of an intermediate machine and then written out to the other mount point, while I believe QSM is basically a direct filer-to-filer file copy.
I don't believe there is a way to do a global throttle at any level in cDOT; the only way I know of is per volume (per SnapMirror relationship). In 7-Mode you could set it on each controller. Examples of both are below.
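For reference, the two approaches look roughly like this (the destination path and the 10240 KB/s values are examples to illustrate the idea; confirm the parameters and option names against your release):
# Hedged sketch, cDOT: throttle one SnapMirror relationship (KB/s).
snapmirror modify -destination-path svm1:vol_dr -throttle 10240
# Hedged sketch, 7-Mode: controller-wide replication throttle (KB/s).
options replication.throttle.enable on
options replication.throttle.outgoing.max_kbs 10240
options replication.throttle.incoming.max_kbs 10240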
What information do you want in the report? More detail may require using an API to get it. Including the header of the report you are generating now might help. Lag time? A report when a transfer hasn't completed within 24 hours (or whatever threshold you want)? How much time it took? How much data was transferred?
There are some PowerShell scripts available, or you should be able to use OnCommand/DFM. If you have a Linux machine with rsh/ssh access to the filers, it's very easy to write a simple bash script that grabs this information and sends out an email every day when it finds an exception such as an excessive lag time (see below).
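A rough sketch of that kind of check (the filer name, threshold, and email address are placeholders, and the awk patterns assume the Source:/Lag: lines of 7-Mode snapmirror status -l output, so adjust to what your release prints):
# Hedged sketch: mail a list of SnapMirror relationships lagging over 24 hours.
FILER=filer01                 # placeholder filer
THRESHOLD=86400               # 24 hours in seconds
ALERTS=$(ssh root@"$FILER" "snapmirror status -l" | \
  awk -v thr="$THRESHOLD" '
    /^Source:/ {src=$2}
    /^Lag:/    {split($2, t, ":"); secs = t[1]*3600 + t[2]*60 + t[3];
                if (secs > thr) print src " lag " $2}')
[ -n "$ALERTS" ] && echo "$ALERTS" | mail -s "SnapMirror lag alert: $FILER" storage-team@example.com
Run it from cron once a day and it only sends mail when something is over the threshold.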
I don't know the exact command, but you should be able to use a DFM CLI command to pull this data for a particular time frame. You can also use statit to look at disk busy percentage and latency.
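If you go the statit route, the basic pattern is to start a sample, wait, and stop it (a sketch only; the filer name, the 5-minute window, and the grep range are placeholders, and statit requires advanced privilege on 7-Mode):
# Hedged sketch: take a 5-minute statit sample and pull the per-disk section.
ssh root@filer01 "priv set -q advanced; statit -b"
sleep 300
ssh root@filer01 "priv set -q advanced; statit -e" > /tmp/statit_filer01.txt
grep -A40 "Disk Statistics" /tmp/statit_filer01.txt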
What OS are you using? We are seeing log writer errors on controllers with just SCSI drives. We are working with NetApp now and may have a solution if you are running Red Hat.
I believe if you have 3Gb and 6Gb shelves on one connection (stack), they will all slow down to 3Gb. I also believe the NetApp best practice is not to mix SATA and SCSI drives on the same controller, since it increases the likelihood of back-to-back CPs.