Hi, The automatic downgrading is, I would think, the result of not being able to map who root is. Anyway, that would break just about everything. NFSv4 is just hard to implement in mixed environments if one wants to share anything over both CIFS and NFS, since one needs to use the AD controller for Kerberos because no unix variants exist that support CIFS... but I digress... The only other apparent possibility is a bug in the Linux implementation of NFSv4. A quick Google search points to recent reports of situations like yours.
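If you want a quick place to look, one hedged sanity check (assuming a 7-mode filer and a fairly standard Linux client; the file paths below are the usual defaults, not something taken from your setup) is to compare the NFSv4 ID mapping domain on both ends:

    On the filer:
    options nfs.v4.id.domain             # prints the NFSv4 idmap domain the filer uses

    On the Linux client:
    grep -i domain /etc/idmapd.conf      # should match the filer's idmap domain
    ps ax | grep rpc.idmapd              # make sure the idmap daemon is actually running

If those domains don't match, user and root mappings tend to collapse to "nobody", which would fit the symptoms.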
Hi, There are only a few settings that should be affecting this. I'm going to go through the CLI options, because it just works faster than 100 screenshots and is often more accurate. You can familiarize yourself with the CLI commands in the manpage documentation on the filer or on NOW; 'options' values are also documented there.

1. 'options cifs.ms_snapshot_mode xp': This should stay at "xp" unless you have Win2000 clients or some other brokenness. Default is "xp".
2. 'options cifs.show_snapshot off': This is the option that is confusing your indexing service and making a big mess of things; it is presumably set to "on" right now. The default is off, and because of the problems enabling it creates (much like you have now), leaving it off is just a good thing. Default is "off".
3. 'vol options <vol_name> nosnapdir off': This works at the volume level; setting it to "on" hides the snapshot directory from clients and blocks access to it. Default is "off".

Now things should work fine with "Previous Versions" if you just use the defaults. It will also relieve you of your current headaches with luser (local user... not a typo) behaviour. Anything that has already been indexed may cause a few headaches if users try to access files that are hidden from them now, but even that should still work with the defaults; the paths are just not visible. Try checking these values through the CLI. Good Luck.
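If it helps, here is a hedged way to check what the filer currently has set before changing anything (the volume name "profiles_vol" is just a placeholder):

    options cifs.ms_snapshot_mode     # entering an option without a value prints the current setting
    options cifs.show_snapshot
    vol options profiles_vol          # lists all options for the volume, including nosnapdir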
Hi, An alternative is also just thin provisioning your LUNs. If you don't space-reserve your LUNs and have no guarantees on your volumes, then you don't really use any extra space by making the volumes 10-20% larger than your LUNs. In the end, it can really make life easier because you really only have to watch the aggregate fill levels. The added value is, of course, that you can use the blocks that would have been ordered and "reserved" with full provisioning for other things. Just a thought. It made my life a lot easier.
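As a rough sketch of what that looks like on a 7-mode CLI (the volume, aggregate and LUN names as well as the sizes are only placeholders):

    vol options db_vol guarantee none               # remove the volume space guarantee
    lun set reservation /vol/db_vol/lun0 disable    # stop space-reserving the LUN
    vol autosize db_vol -m 600g on                  # optional: let the volume grow towards a ceiling
    df -A -g                                        # then just keep an eye on the aggregate fill levels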
Hi, Your users can also use the "Previous Versions" tab from the right-click --> Properties on files (and directories?) to access versions of files in snapshots. Good luck.
Hi, I'm guessing that if you don't want to go through the work of pulling and graphing a large number of SNMP variables (or using the developer toolkit and grabbing things via XML), then you are going to be pushed towards the Performance Advisor part of DFM/OM, where you can get a bit of an overview of such things. The good old sysstat output is a good place to start. Sending a perfstat to your local NetApp guy might get you a few hints as well. Why do you think you have a bottleneck?
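For a first look without any extra tooling, something as simple as the following on the filer console gives you CPU, protocol ops, throughput and disk utilization per interval (the 1-second sample interval is just an example):

    sysstat -x 1     # extended view: CPU, NFS/CIFS/FCP ops, net and disk KB/s, cache age, disk util
    sysstat -u 1     # a more compact view if -x is too wide for your terminal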
Hi, You definitely want to run reallocate regularly on your LUNs. If you have split log and database LUNs, then basically you only need to reallocate the database LUNs; the even log writing and rotation causes little fragmentation. This will clear up some of the response time problems. Be advised: using incorrect options will cause larger snapshot deltas. You will also want to look at the FlexShare (a.k.a. the priority command) scheduling subsystem. There you will be able to tune more finely which volumes get higher I/O priority and which get priority in your PAM modules. If you get both of these right, your problems should basically go away quickly. I normally split log and database LUNs (using a volume mountpoint structure on the Windows server) between the controllers so that I get full use of both controllers for each database, but that will be more of a task to achieve than the suggestions above. Good luck.
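A hedged sketch of the commands involved (the volume and LUN names are placeholders; -p is the flag that keeps the snapshot delta problem mentioned above under control, so check the reallocate manpage for your release before running it):

    reallocate start -p /vol/db_vol/lun0      # physical reallocation, avoids inflating snapshots
    reallocate status                         # check progress of running jobs
    priority on                               # enable FlexShare
    priority set volume db_vol level=High cache=keep
    priority set volume log_vol level=Medium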
Hi, I had this before too. Basically, if you have the switch ports configured correctly (see the TR... 3548, iirc), then it is probably an ESH and/or disk firmware issue. When you upgrade, have somebody close to the system. Shelf firmware upgrades on fabric-attached MetroClusters have never worked correctly for me, so you might need someone to re-seat the ESH modules to get them to boot correctly. Then, of course, you will have aggregates to sync up, etc. It could be a long, rather involved process. I think your Brocade Fabric OS needs to be at 6.1.1 or so for 7.3.5.1, but you can easily check the matrix for that. The path selection algorithm for disks on MetroClusters is unfortunately not as refined as one would hope. FWIW, I haven't had the problem for a long time, but it was a real PITA with support when it happened. Good luck.
Hi, I'm actually wondering how you got any operation in your (probably bash) shell to complete on a file with hyphens ( - ) without getting an error. Try escaping special characters in file names (if you really have to use them) with a backslash or single quotes. You should review the Linux manpage (or whatever other help system your distro uses), as well as check what the filer sees as export/mount rights by using 'exportfs -c .... ' ... see the NetApp manpage for exportfs.
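For example (hedged, with made-up file, host and path names): on the client, stop the shell or the command from treating a leading hyphen as an option, and on the filer, check what the export actually grants that client:

    ls -l ./-odd-file-name            # prefix with ./ so the hyphen isn't parsed as an option
    rm -- '-odd-file-name'            # or end option parsing explicitly with --

    exportfs -c 10.0.0.5 /vol/vol1 rw # on the filer: check whether that client gets rw access (see the na_exportfs manpage)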
Hi, As much as one would like to view your constant architectural questions with positive awe at your curiosity, I think you are going to get neither in-depth technical descriptions (these are technical trade secrets, to some extent) nor long, involved "NetApp for Dummies"-type explanations here. If your interest is genuine, then you will need to do a bit more of your own searching and researching. The architectures of the different series of NetApps throughout the years have simultaneously used both ECC and NVRAM for different purposes.
Hi, Just one thought at the moment: are you using vscan functionality? If so, check your messages logs on the filer and the logs on the vscan server.
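If you do have vscan in play, a couple of quick checks on the filer console (these are standard 7-mode commands, nothing specific to your setup):

    vscan             # shows whether virus scanning is enabled and basic scan statistics
    vscan scanners    # lists the registered vscan servers and their connection status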
I've tried to read the documentation, and at some point I sort of drowned in the marketing bla-bla. Basically, this is a re-branding of mostly existing functionality plus some things that most of us as storage (and backup) administrators have probably wished for from DFM/OM for many years. (I don't blame anyone... companies like EMC make tons just changing the names of things and adding some fireworks.)

Why the split? Why is NetApp making more disparate products instead of a real "single pane" management system? We've been wishing for end-to-end data management of NetApp NAS/SAN for years, from NFS/CIFS/LUN to secondary (tertiary) mirrors and on to tape, essentially SLA monitoring. What I have seen and tried to use from DFM/OM has been a source of great frustration, and a full implementation was so riddled with technical hurdles that it was simply left to decay.

Where is the "single pane" for storage administration? Integration of SAN administration/monitoring? Integration with backup systems? Will any of this integrate with DFM/OM? What we have here is a server backup solution that involves as much NetApp equipment and software as possible. I don't want to rain on anybody's parade here, but I'd be more excited if this looked like an extension of current management tools (that could possibly be sold separately) working towards a "single pane" (not that I like that marketing term either...) solution.
A couple of things come immediately to mind (a quick way to check 2 and 3 is sketched below):
1) Firewalls, either local on your server or somewhere in between
2) Multiple IP aliases in the same subnet as the one you are polling on the filer
3) 'options snmp.access' does not include your DFM server
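A hedged example of what to look at for points 2 and 3 (the hostname is a placeholder, and you should double-check the exact snmp.access syntax in the options manpage for your release):

    ifconfig -a                            # look for multiple aliases in the subnet you poll on
    options snmp.access                    # show the current SNMP access restriction
    options snmp.access host=dfm-server    # allow the DFM/OM host explicitly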
Hi, I think the problem needs to be dissected a little here. You started out with the assumption that the caching settings for the CIFS share were the cause of the problem. Perhaps this is not the case at all; caching settings don't affect filesystem permissions. Basically, these are just files, and they should behave like any other files given the correct configuration of the filer. There should be no specific reason to have to move them to a Windows server unless you are using some sort of MS Windows-specific functionality like DFS-R for profiles. You probably need to divide the problem up into smaller pieces and try things step-by-step. It will make solving the problem a little more manageable, at least. Make a simple CIFS share (after you have created a volume and qtree just for this purpose), add a single profile with robocopy, and make the changes in AD necessary for it to be used. Test that. Enable the "options" necessary to give you more verbose logging about file access events. I think you either have used some incorrect flags (options) with robocopy, or you have set some sort of strange CIFS share access rights, or you have changed rights somewhere in the directory structure above the profiles that have been propagated downwards into your profiles. Either way, you don't seem to have a clear enough error from the filer to be able to proceed, so I assume you don't have some of the additional logging enabled or don't know where to look for it. Good luck.
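To make the "smaller pieces" concrete, a rough sketch of such a test setup (all names, the aggregate and the size are placeholders, and the robocopy flags are only an example, not a recommendation):

    vol create profiles_vol aggr0 50g
    qtree create /vol/profiles_vol/profiles
    qtree security /vol/profiles_vol/profiles ntfs
    cifs shares -add profiles$ /vol/profiles_vol/profiles -comment "profile test"
    options cifs.audit.enable on                        # more verbose file access event logging
    options cifs.audit.file_access_events.enable on

    robocopy \\oldserver\profiles\testuser \\filer\profiles$\testuser /E /COPYALL /R:1 /W:1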
"P" levels are available by using the download selection at the very bottom of the software download page. Just select ONTAP and then enter the release number manally, like 7.3.5.1P2 You might want to familiarize yourself with the fixed bugs pages and read the fixed bugs before downloading a "P" release. There is probably no real "safe havens" as far as software version choices.
You might have some advantages in keeping profiles in a separate qtree, but since this caching behaviour is essentially a property of the CIFS protocol and can be set on a per-share basis, the share could basically point to any directory. It doesn't really have any other effects on the WAFL file system. Things like whether or not you want to qtree snapmirror it, or perhaps not take backups of it, are probably more deciding factors than a CIFS setting eliminating file caching. 🙂
Some of the migration aspects are coming, but some things even NetApp can't change. I'd like to see "data motion" in ONTap 8.x be able to move CIFS/NFS shares as transparently as it is supposed to be able to move LUNs, at least within the same controller. I guess that would essentially make such moves possible with entire aggregates as well, but the complexity and duration of such an operation would have prohibitive risks in today's world, I would think... Upgrading disk sizes is a phenomenon that has only recently gained speed. Since the availability of consumer-grade ATA/SATA disks that were forced to compete on a purely per-GB price market, sizes have expanded rapidly. It hasn't really been a necessity before; you could buy 144GB disks for many years, for example. The pace was slower. Perhaps this wasn't one of the highest priorities when WAFL was made; the file system would have to be modified to understand the changes in physical storage that were occurring somehow. I think calling WAFL a mantra is pushing things a bit far. Given enough time, money, and human resources, it probably can be done, but NetApp is a company that needs to make a profit and not an academic institution developing solutions for historic corner cases. It may become necessary, but it hasn't been so far, I would think. I don't believe there are any competitors that offer this functionality either.
Hi, Well, I'm glad that most of the news here is positive. It seems that you have a "boatload" of resources and need new ways to beat up on it. I guess there are a number of different Windows benchmarks, but you probably need to find some way to run them in parallel with larger data sets to actually beat up on the cache enough. The "big boys" use SPECsfs benchmarks for NFS/CIFS, and I guess the top SAN benchmark is from SPC. You can view results from a 3270 benchmark run (these go for long runs...) here: http://www.storageperformance.org/benchmark_results_files/SPC-1E/NetApp/AE00004_NetApp_FAS3270A/ae00004_NetApp_FAS3270A_SPC1E_executive-summary.pdf As far as obtaining a copy, I think it is still members-only and then $2500 to get the software.

The main point here, even if the SPC benchmark was basically just done to stick it to EMC as a mid-range player, is that you might be able to get some idea of how to extrapolate the results onto a 6280 as far as I/O expectations go. I'm not sure what workloads you are trying to benchmark for, or if this is largely just academic/hacker interest, but you seem to be safely in the range of "big iron" in capacity, and as long as you have enough disk capacity, you probably have a good deal of expansion room for more PAM cards if things ever get tight.

All that aside, whatever your planned implementation is, it will probably be things like the wrong LUN types, unexpected growth, poorly designed applications (or SQL queries), client-side bugs, ONTap bugs, and stone-age backup methods that cause you more problems than raw I/O response times. Benchmarking the implementation, SnapManager software, SnapVault or SnapMirror backups, Operations Manager, "data motion" in ONTap 8.x: if you're not already proficient in these areas, they are probably equally important in getting a handle on a successful implementation. Sorry about rambling on, but other than parallelizing your currently available benchmark software or spending a load of cash on SPC software, it might just be an idea to move on to other aspects of storage administration that can be equally challenging. Good luck. 🙂
Hi, I think basically what you are trying to do is going to be a very painful experience, if you ever get it to work at all. Sharing a single NetApp among different authentication domains is normally done by using (and buying) the MultiStore license and splitting your storage unit up into multiple virtual filers (vfilers). Otherwise you are going to have to do some reading on setting up local users and mapping them, and somehow getting authentication to work on a filer that is already attached to a Windows AD domain. I'm not even sure it is possible at all.
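For reference, the MultiStore route looks roughly like this (the IP address, volume paths and vfiler name are placeholders; the second domain's CIFS setup is then run inside that vfiler's context):

    vfiler create vfiler_domB -i 10.0.1.50 /vol/vfilerB_root /vol/vfilerB_data
    vfiler context vfiler_domB     # switch into the new vfiler
    cifs setup                     # join the vfiler to the second AD domain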
Well, 2TB or larger (when they get here)... I almost hope that I am wrong, but based on my previous experiences, the size of that disk in that raid group can't be changed after it is added. You can always give it a try, but the reconstruction time could be pretty extensive (especially with -i). Basically, it would want to do a preventive copy of all of the blocks over to a new disk (without -i), so it has to be of the same size. Even the parity rebuild will want to produce the same number of blocks on the replacement. I believe you are in the same boat if you use disk replace as well. There's probably a KB article that makes this information more official on the NOW site. Sorry to be the bearer of bad news.
Is there any chance you could give us a few more clues about what the customer is trying to accomplish with these immensely huge filesystems? Just if you are open to a little more of a "brainstorming" solution. Sometimes one is too close to the trees to see the forest... For my part, I'm still trying to get my head around situations where such huge filesystems would be necessary. The application is either hugely specific or terribly designed... at least from my initial gut feeling... 🙂
Do you have a firewall between the filer and the vscan server on the "not directly connected" vscan-filer combination? It might just be the dynamic opening in the firewall timing out after 10 minutes (if it happens pretty consistently after 10 minutes). If you do have a firewall between the vscan server and your filer(s), this will probably be a continued source of headaches in the future, and you probably really want to avoid it. Good luck.
Hi, I see no one has jumped at this one yet, and I can understand that to some extent, because permission problems can be a real pain to diagnose via "blogs". Your basic "bible" for network file sharing protocols for your release can be found on the NOW site: the File Access and Protocols Management Guide.

Basically, you can use a few options settings and the system messages file to get more information. Since I rarely use the http interface (it is just not exact enough in all situations), you might want to try 'options cifs.trace_login on' (just use "off" to turn it off afterwards) on both filers to see what is different. Entering 'options cifs' will show you all options in the cifs "tree" of options. Having said that, a lot depends on a number of other factors too: multi-protocol filer (NFS and CIFS)? qtree security? CIFS guest access options, options for admin access to shares, share browsing options, access-based enumeration, authentication with AD... etc.

As far as share permissions go, as long as you are running ntfs security style, you really don't need them. The underlying file system permissions are basically enough to work with. You might want to have an "admin" share pointing to the NetApp vol above your CIFS share (it is really a good idea to point CIFS shares at qtrees) so you can set filesystem permissions on the qtree (it functions like a Windows directory/folder for this). Local users can be problematic if you don't have an overview of all the roles and capabilities (terribly documented, unfortunately). If your organization is relatively small or your admin groups are pretty specific, you can probably get away with simply adding your AD admin group to the filer's Administrators group, like this: useradmin domainuser add AD\my_AD_admins_with_clue -g Administrators on the command line.

For the rest of the problems, I guess you are going to need to provide some more specific information: ONTap version, AD version, Windows version, SMB version, qtree security for the vol/qtree with CIFS shares, output from the relevant 'options cifs', output from 'cifs shares' for the problematic shares, etc... Good luck
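Pulling the commands mentioned above together into one hedged sequence you could run on both filers and compare (the share name is a placeholder):

    options cifs.trace_login on      # log details of each CIFS logon attempt to the messages file
    options cifs                     # dump all cifs.* options for comparison between the filers
    cifs shares problem_share        # show the share definition and its share-level permissions
    qtree status -v                  # check the security style behind the shares
    rdfile /etc/messages             # read the trace_login output
    options cifs.trace_login off     # turn the extra logging back off when done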
Hi, You can also do this on the CLI: cifs shares -change <share_name> -no_caching You can get an overview of the manpages (manual pages) for the CLI commands here: http://now.netapp.com/NOW/knowledge/docs/ontap/rel7351/html/ontap/cmdref/index.htm I really hope you aren't using ONTap 7.3.5. That release was pulled from availability; update to 7.3.5.1 (or some P level) if that is the case.
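For completeness, a small hedged example with a made-up share name, including a quick check afterwards:

    cifs shares -change profiles$ -no_caching    # disable client-side (offline files) caching on the share
    cifs shares profiles$                        # display the share again to confirm the change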
Hi, Basically the C$ share is just there to make things sort of look like a real filer (Windows server) and to give access to a few files for administration. You already know that you can point it pretty much anywhere you want. There are only a few log files written there by the system, and it doesn't use the CIFS share for this. The exports, nsswitch.conf, and resolv.conf files are things that you probably want to keep access to, so creating some sort of <admin>$ share is probably a good idea. I guess the ETC$ share is still going to cover most of that. If you use any of the SnapManager products, they _might_ want to write a file there (perhaps for thin provisioning), but otherwise I know of no reason that the C$ share has to point exactly where it does, though I guess I could be mistaken. You could just watch your logs for problems and move it back if you see anything. It won't affect the rest of your CIFS shares. I personally have moved them before with no consequences.
I guess it wasn't entirely apparent to me that you were addressing your comments to the "Diskeeper" software per se. Since these threads will also be read, perhaps in haste, as reference information in the future, I just tried to clarify that what the original poster wanted wasn't defragging per se. So, Diskeeper may be a "no-no", but "vacuuming" routines from applications are required, and a subsequent reallocate will give or reclaim performance advantages.