Hi, If you "thin-provision" then you basically just have one place to watch: the aggregate filling. Basically, the rule has been to either keep all of the volumes in an aggregate under 90% or the aggregate itself under 90% full. NetApp will often recommend 80%, but at 90% you start to see problems.
As already mentioned, KISS ... The savings are minimal and the potential for "foot-shooting" is probably much greater than any advantage. Losing the root volume is just going to ruin your day.
Hi, From personal experience and benchmarking in this area in the past, I would not recommend trying to run aggregates over 90% full where you need optimal performance. I run a lot of CIFS shares on filesystems that are a little too full, and you will see a significant performance degradation once you cross the 90% barrier. For things like video streaming, that kind of hit could be very disruptive. If the older data is not very active and premium performance isn't a requirement, you might get some savings out of deduplication, but I'm not sure how much, if anything, can be achieved with video formats... Hope this helps...
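If you want to test what dedupe actually buys you on that kind of data, a minimal sketch would be (the volume name is just an example):

filer> sis on /vol/video_vol
filer> sis start -s /vol/video_vol
filer> df -s /vol/video_vol

sis start -s scans the existing data rather than only new writes, and df -s reports the space saved, so you can see whether the video formats dedupe at all before committing to it.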
Hi, I guess if you have deep pockets, you can get yourself DFM/Operations Manager. The Performance Advisor can do at least some minor diagnosis, and you can get graphs of physical interfaces as well. There is also the possibility of creating custom thresholds there. The biggest problem is the brokenness of the Java "NetApp Management Console" and the relative unreliability of data collection/display. You could always collect a good deal of the statistics yourself via SNMP and make some simple graphs with MRTG or the like. I don't remember 100%, but I think there are OIDs for each target interface as well.
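As a rough sketch of the SNMP route (the hostname and community string are placeholders, and you'd want a read-only community):

filer> options snmp.enable on
filer> snmp community add ro public

Then, from an admin host:

snmpwalk -v 2c -c public filer1 IF-MIB::ifDescr
snmpwalk -v 2c -c public filer1 IF-MIB::ifHCInOctets

The standard IF-MIB counters cover the physical interfaces; point MRTG (or Cacti, etc.) at the ifHCInOctets/ifHCOutOctets counters and you get simple traffic graphs without DFM.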
ok... Now, if you take this out of context, then yeah, most of what you say is correct. The point is, if you have the total, sudden loss of one site (controller, disks, interconnect), the situation is, for the surviving controller, indiscernible from a split-brain situation. The surviving controller does not have, nor can it get, any information about whether its partner is still running. Whether the partner is running or dead is not relevant here, because the surviving controller makes the same decisions as it would if the partner were still running and all interconnects were severed. Since the goal of the UPS setup was to transition to a working failover, the advice was to try to get one element, either the controller or the disks, to fail first; otherwise the surviving controller follows its "split-brain" (software) procedure and no failover occurs. You have to have a lot of spare time to dig into a 17-day-old post to "nit-pick". "Never" is, by the way, a very long time... 😜
Hi, The requirements before software disk ownership were even weirder. The documentation for fabric-attached MetroClusters has always been very incomplete. I made a big stink once and got 3548 cleaned up a lot because the information there conflicted with other setup guides. The confusion isn't nice when you are stuck in the middle of an installation, either. But, I digress... Even with normal and stretch MetroClusters, you have a sort of "backwards" hookup of your normal 0a and 0c ports to the B modules. I'm no MetroCluster expert, but we have a half dozen of them in varying configurations. The inflexibility (and frankly immaturity) of the disk fabric is still one of the elements that irritates me the most... besides the wild experiment with MPO interconnects for stretch clusters... Anyway, I hope most of the unclear parts of MetroCluster setups are now clear in 3548, but if not, feel free to ask or open a case, or both... 😉
I think you had better benchmark a single LAMP server with NFS storage for your web pages against your VMWare idea. I get the feeling that you would have been better off saving the VMWare licenses and getting a bigger filer. Why would you want to run Linux Apache and MySQL servers on VMWare?
Hi, Yes, NDMP is not going to be your best bet on a file system like that. NDMP, even if it has been optimized over the last few years, is really still a simple filesystem dump. Fishing first through all 17M files and directories (which is how an NDMP backup starts) takes a small eternity. After that, the transfer speed is not that bad, but you are probably digging through a lot more data than you need. You have a few options, but nothing is free:

1) Split the volume into smaller volumes and change how things are mounted on your servers so that everything looks the same. Then you can back up more quickly with multiple backup streams at the same time... up to a point, at least. (A rough sketch of this follows below.)

2) Use an NFS/CIFS client-based backup. This may be no improvement at all, however.

3) Use a snapshot-"aware" backup like TSM or NetBackup. TSM can actually do CIFS (and NFS) based backup from just a list of file system changes that it gets from the filer... sort of an abbreviated NDMP backup.

4) An archiving solution which removes older, unused data and archives it on separate systems.

Good Luck.
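For option 1, a minimal sketch (volume names, aggregate and size are just placeholders) is to carve one of the big subtrees off into its own volume and copy it over with ndmpcopy while the share/export paths stay the same from the clients' point of view:

filer> vol create projects_vol aggr1 2t
filer> ndmpcopy /vol/bigvol/projects /vol/projects_vol

After the copy you re-point the export or CIFS share (and remove the old subtree), and each of the smaller volumes can then be dumped by its own NDMP stream in parallel.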
Perhaps you could include some information as to the filer model and ONTAP version you are using. For performance problems you should probably include some sort of sysstat or perfstat output, or it is going to be hard to help.
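Even a short capture like the following, taken while the problem is happening, tells a lot (run it for a minute or so and paste a representative chunk):

filer> sysstat -x 1

That prints one line per second with CPU, per-protocol ops, network and disk throughput, cache age and disk utilization, which is usually enough to see whether you are CPU-, disk-, or network-bound.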
Hi, FlexShare isn't difficult to establish. Basically, you set it up once and then add a priority to new volumes after creation. There will always be a balance between time and money. Here again, you just need to prioritize the potential improvements: what gets you the most improvement for the least work? Setting up 'priority' is going to be easier than redesigning your datastore concept, for example. The savings of the latter (through de-dupe, SnapMirror replication, backup) might only be significant on a larger scale anyway. The success of 'priority' will, of course, depend on how well you can prioritize certain volumes over others; if you have VMWare datastores that contain basically every sort of I/O need, then you won't be able to solve some of your problems with FlexShare. No need to despair. Part of storage maintenance is trying to compensate for limitations in technology and budgets. Even VMWare has come to the conclusion that they need more fine-grained resource allocations/limitations to be able to prioritize resources according to economic and technological realities. Those realities will still be there no matter how big your next NetApp is.
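For reference, the basic setup is roughly this (the volume names are placeholders; see the FlexShare TR and the 'priority' man page for the full set of options):

filer> priority on
filer> priority set volume sql_datastore level=high
filer> priority set volume test_datastore level=low
filer> priority show volume -v sql_datastore

Once 'priority on' is set, new volumes just get a 'priority set volume' line after creation, which is the one-time-setup-plus-per-volume step mentioned above.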
Hi, There is a "TR" for this but it is incredibly hard to find, if it is public at all. One can setup a least privileged user for SD (the SD user), but it is a real PITA on the CLI because of command line length limits. DFM lets you do it a bit nicer. Perhaps one of the NetApp guys could get the TR published for us mere mortals, hehe... I just have a paper copy... and it's on my desk at work... RBAC is a bit of a pain on NetApp still... Good Luck
Hi, My point on 1) was to say that it might just be easier to run multiple SQL instances on one host. That is a type of virtualization as well. Using a simple MSCS cluster (no extra cost on the Windows Enterprise Server images, afaik) would also give you some flexibility as far as failures and manual load balancing go. Using mountpoint disks, you could run up to 21 instances per cluster (26 drive letters minus A, B, C and a quorum disk), depending on a few things like where you install binaries and MSDTC. Most SQL isn't terribly CPU intensive, so it is mostly a matter of getting sufficient memory into the systems.

2) Don't worry about fractional reserve. Just turn off space reservation and volume guarantees, monitor aggregate free space... and volume free space for snapshot growth... and use autosize, which is recommended (a sketch follows below). This way you can overprovision and not have to worry about increasing file system sizes so often. Just don't go crazy, or one day everything might fill up before you can buy more disk.

3) Because you can prioritize I/O per volume on NetApp, you should probably try to create a datastore structure that differentiates between systems that require high I/O (i.e. are paying for it) and those which can muck things up if they use too much I/O. You can do this with any volume, so you can down-prioritize test volumes, etc. See the TR on FlexShare and read the 'priority' man page.

4) One of the great pains of the VMWare world is backup. With what I wrote in 3), you don't want backup-exec runs from some less important VM ruining the I/O for your SQL instances, so don't put them on the same datastore and don't give those datastores (read: FlexVols) the same priority. Things like backup and server virus-scanning can really kill your I/O in a virtualized environment; segregating I/O priorities limits the damage done by these lower-priority jobs. You might want to have just a few datastores for the C: drives, for example, because they could de-dupe with huge savings while not hurting application I/O, which can live on different "drives" in different datastores, depending on SLA, etc. Make sure your pagefiles and VMWare swap files are segregated so you don't kill yourself with block deltas for SnapMirror/snap reserve... etc, etc, etc.

It makes things a bit more complicated, but it will save resources and money in the long run. I hope most of this is clear. You can make your 2020 do a better job with a little more detailed design of how its resources are used.
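For point 2), the thin-provisioning side on a volume would look roughly like this (volume name, maximum size and increment are placeholders):

filer> vol options sql_datastore guarantee none
filer> vol autosize sql_datastore -m 800g -i 20g on
filer> snap autodelete sql_datastore on

'guarantee none' is the volume-level thin provisioning, autosize grows the volume in 20g steps up to 800g as it fills, and snapshot autodelete gives you a second safety valve; after that you mostly just watch aggregate free space.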
How about letting us see the error message? IIRC, there's already an SD 6.3.1... perhaps even a 6.3.2... a quick look at the fix lists might get you your answer quicker.
Hi, We probably need to split the problem into a few more manageable chunks.

1. SQL instances per server: VMWare is one way to better utilize server hardware, but running multiple SQL instances per server is another. If you use mountpoint disks and perhaps even cluster your setup, then you have a better chance of getting good server utilization without having to press everything into VMWare. This way you also have SMSQL already there, so your backup problems are already solved. Disk access times will probably also be easier to tune.

2. Space usage should basically not be a problem. If you are using thin provisioning for your MSSQL LUNs, then you aren't using more space than you need anyway (see the sketch below). If you are not, then I can understand how you got the idea that VMWare could save you storage. Thin provisioning is really the simplest way to do things.

3. If you do choose to use VMWare, you should look at separating data with significantly different performance requirements onto its own datastores. There you can tune I/O on the NetApp to reflect your needs. This basically applies to backup policies too.

4. Depending on how you already do backup on VMWare, you are probably just going to add to your response-time problems by putting SQL into the mix. It isn't that NFS is incredibly slow, but if your VMWare storage philosophy was just to dump everything into datastores randomly, then you are pretty much hopelessly a victim of a backup system that might suck the life out of your storage system many hours a day and cause you endless pain in SQL customer complaints.

Now, SQL clusters can work if you have enough customers to populate them and can get the network segregation to work right. VMWare could work, but you have all of the problems of the rest of the environment and no clear backup savings over SMSQL. VMWare filesystems aren't known for their speed, and you will have problems if you run de-dupe, etc. You can still use iSCSI on VMWare hosts. I'm not saying that this all won't work fine on VMWare, but it really depends on how ready one is to make it work. If the current VMWare operational model is too inflexible to make things work at a satisfactory performance level, then it's a bad idea. You basically need to take into account what the limitations of each element are. You don't have a lot of horsepower in a FAS2020.
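To make point 2 concrete, a thin-provisioned SQL LUN is created with something like the following (the path and size are placeholders, and the ostype should match your Windows version):

filer> lun create -s 500g -t windows_2008 -o noreserve /vol/sqlvol/sqldata.lun

The '-o noreserve' is the thin-provisioning bit: the LUN presents 500g to Windows but only consumes what is actually written, so the comparison with VMWare isn't really about space savings at all, it's about performance and operational fit.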
Hi, It looks like you tried to attach disks from a higher release (probably 8.x) to a 7.x system without first removing them in an orderly fashion from the 8.x system. You might get things to work by removing software-based disk ownership on the spares with bad labels:

1. At the storage system prompt, list the disks and their RAID status, and note the names of the disks from which software-based disk ownership is to be removed:
filer> aggr status -r

2. Enter advanced privilege mode:
filer> priv set advanced

3. Execute the following commands on each disk:
filer*> disk remove_ownership disk_name
filer*> disk remove disk_name

4. Exit advanced privilege mode:
filer*> priv set

5. Verify the disk(s) show as 'Not Owned':
filer> disk show -v

Any disk that is labeled 'Not Owned' is ready to be moved to another storage system. Normally, this should be done before you remove the disks from the first system. If you can get this done on your current system, then re-assigning ownership should get you where you want to be. I'd suggest an upgrade to something a bit newer than 7.2.4L1 as well... like 7.3.5.1
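For the re-assignment afterwards on the target system, the usual sequence is roughly this (the disk name is an example):

filer> disk show -n
filer> disk assign 0a.16

or simply:

filer> disk assign all

disk show -n lists the unowned disks, and disk assign takes either individual disk names or 'all' if every unowned disk should belong to that controller.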
Hi, Most of this is really fundamental setup stuff. Basically, you either need to pay someone to come in and set it up for you, or you need to read the documentation on NOW: the File Access and Protocols Management Guide. That covers basically everything you need to know. A few hints:

1. Don't use mixed mode if you can avoid it. It is really very rarely necessary. You need to read the docs and understand how qtree security, file permissions and user authentication work together.

2. The easiest way to get your users in sync (if the Linux user names haven't gone off the radar) is to use LDAP on Linux (which is again a can of worms) and point your Linux servers towards your AD controllers as LDAP clients. There is the alternative of running a separate LDAP server, but then you get to try to keep the two in sync. Most Linux admins will cringe at the thought of using AD for LDAP, but the real world demands a few sacrifices. Adding automount info to Windows AD could be a challenge too, though. 2008 can also do NIS, but I would avoid NIS because of its age and brokenness. Users are linked by username unless you want to manually map each Windows user to a UNIX user... which is more work and prone to errors over time.

There are a number of approaches to solve your problems, but basically this is consultant work if you don't work it out yourself. You seem to be at the very beginning of your setup and learning.
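Two commands that help a lot while sorting out point 1 and the user mapping (the path and username are just examples):

filer> qtree security /vol/projects/data unix
filer> wcc -s DOMAIN\jsmith

qtree security sets (or, with just a path, shows) the security style per qtree, and wcc shows how a given Windows user maps to a UNIX user, which is usually the first thing to check when permissions look wrong in a multiprotocol setup.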
Hi, Getting gFilers/V-Series to just "work" is not so straightforward. Assuming you have the appropriate licenses for HP EVA storage, you might want to check the supported configurations here: http://now.netapp.com/NOW/knowledge/docs/V-Series/supportmatrix/V-Series_SupportMatrix.pdf You might get other configurations to work, but these are the known working configurations and probably the only supported ones.
Hi, Basically, it's pretty simple. The caddies are constructed so that you can't put a drive into a non-compatible shelf. The numeric label is just to help you recognize the size. Sometimes they will have different colors (silver vs. black) to distinguish speed differences between drives of the same size, when those differences exist. The "dongles" are just there to deal with missing hot-swap connectors on certain drive types, i.e. (P)ATA/SATA.
I think you are going to have to include a bit more information. What does your setup look like? How many VMs per datastore? Load on the ESX servers and the NetApp? Log information? ONTAP version? What have you discovered so far? What have you tried? (How much money are you willing to pay to have it fixed? hehe) No one is going to beg you to supply enough information. If you don't do it, you won't get help.
Hi, The reallocate vs. dedupe (sis) debate certainly is an ongoing one, but I think too many people are looking for black-and-white answers when there are probably a good number of "gray" cases. For the moment, this isn't really an issue in your setup anyway. Again, you can avoid larger snapshot deltas by using the "-p" switch, by running reallocate a bit more often, or both (see the sketch below). Depending on the churn in your databases, you might even want to run it every couple of days. You'll unfortunately have to check the results in the messages log manually, because NetApp hasn't yet delivered a method to track the need/results here.

The performance hit from reallocation isn't really an issue, but you probably want to stagger your runs. 'reallocate' is a low-priority background process, so it shouldn't affect things too much (nothing is perfect, but it works pretty well). You can, again, tilt things in your favor with the 'priority' command. I find that using priority really evens out I/O performance on the whole. It seems like most operators haven't used it much, even though it also allows better use of your PAM cards.

Use of SnapManager is really a tangential issue. The performance should be basically the same as with normal snapshots. Again, with 'priority' you can increase the priority of snapshot operations on volumes where performance is important. The performance degradation from using snapshots on NetApp is negligible, unlike on EMC and HDS; I think I've seen benchmarks where the hit was 2-3%, iirc.

Basically, what I wrote before should work well for you. I have used this with a good deal of success on a number of systems. There's not a lot to fear here. Get your DB LUNs reallocated, take a look at some TRs on FlexShare and PAM usage, and set up 'priority'. 🙂
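A minimal sketch of the reallocate side (the volume path is a placeholder):

filer> reallocate start -p /vol/dbvol
filer> reallocate status -v /vol/dbvol

'-p' does a physical reallocation, which is the switch that keeps the snapshot/SnapMirror deltas from blowing up, and 'reallocate status' plus the messages log is where you check how it went. There is also a 'reallocate schedule' sub-command if you want it to recur automatically instead of kicking it off by hand.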
Hi, I'm not sure I understand the problem. You can easily leave the share permissions at "Everyone / Full Control". The permissions on the individual directories and files will prevent others from changing files that don't belong to them. The cifs_homedir functionality just requires that you tell the filer where the home directories live, in cifs_homedir.cfg. If you need admin rights to see the directories, there's an option for that. You can also set up an administrative share pointing to the volumes and qtrees where the user directories are.
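For reference, the homedir part is roughly this: add the parent path of the user directories to /etc/cifs_homedir.cfg on the filer (one path per line; /vol/users/homes is just an example) and reload it:

filer> wrfile -a /etc/cifs_homedir.cfg /vol/users/homes
filer> rdfile /etc/cifs_homedir.cfg
filer> cifs homedir load -f

After that, each user sees a home share named after their username (subject to options cifs.home_dir_namestyle) that maps to their own directory under that path.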