0% savings seems a bit low for any kind of volume. Did you run SIS on the volume at all ("sis start -s /vol/<volumename>")? As soon as deduplication frees up space, you can increase the volume size again. The number of physically allocated blocks cannot exceed 2 TB, so if you save, say, 50 GB through deduplication, you can grow the volume by 50 GB afterwards (see the command sketch below). -Michael
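A minimal command sketch, assuming 7-Mode syntax and a placeholder volume name (the 50g figure is just the example from above; check the exact output of "df -s" before growing the volume):

   sis on /vol/myvol
   sis start -s /vol/myvol
   df -s myvol                 (shows how much space dedupe has saved)
   vol size myvol +50g         (grow the volume by roughly the amount saved)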
Not quite right. SnapInfo stores copies of your SQL transaction logs for up-to-the-minute restorability. For this to work, you have to keep the truncated logs somewhere; otherwise you can only restore from the latest full backup. If you don't want or don't need up-to-the-minute restorability, you can make your SnapInfo LUN quite small (a gig or five). However, you can then only perform point-in-time restores (or up-to-the-minute restores from your most recent full backup, if the logs haven't been truncated yet). A correct SMSQL sizing should have taken the additional space for the SnapInfo LUN into account IMHO (at least, our sizings always do). -Michael
Can't you set the ASUP to use SMTP instead of HTTP/HTTPS? It's often easier to configure the mail server to allow/whitelist one specific recipient. -Michael
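From memory (please double-check the option names in the options man page for your release), switching AutoSupport to SMTP transport looks roughly like this; the mail host is a placeholder:

   options autosupport.support.transport smtp
   options autosupport.mailhost mailserver.example.com
   options autosupport.doit "test"              (sends a test AutoSupport)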
This is described in detail in the admin guide, and there are also a few TRs on SMSQL best practices. Sizing basically means taking the size of the logs per day (for example, 100 MB of logs per day) multiplied by the number of days you want to keep your snapshots (for example, 21 days). In that case the SnapInfo LUN would have to be at least 2.1 GB, although I'd rather make it at least 50% bigger (you can increase its size on demand later, though), since there have been some cases of SMSQL not correctly deleting old SnapInfo entries when it deletes the corresponding snapshots, so if you don't watch out it could fill up your disk pretty fast (see the quick calculation below). Oh, and you can of course share the SnapInfo LUN between multiple databases on the same server (even in different instances), which is great because it saves you from creating 20 new LUNs just for SnapInfo. -Michael
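To make the arithmetic concrete (these are just the example figures from above, not a recommendation):

   log churn per day:          100 MB
   snapshot retention:         21 days
   minimum SnapInfo LUN size:  100 MB x 21 = 2.1 GB
   with ~50% headroom:         2.1 GB x 1.5 ≈ 3.2 GB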
Hmm... it probably works as designed (the restore from SMVI always overwrites the original VM, IIRC). But if you already did a FlexClone of the affected VM, there is actually no need to do a restore: just power off your original VM, start the cloned VM directly on the clone datastore, and then do a Storage VMotion back to the "correct" datastore (i.e. where the original VM was running). When that has finished, unmap/unmount and destroy the cloned volume (see below) and you're done. -Michael
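If the clone is a FlexClone volume, the cleanup on the filer might look roughly like this once the datastore has been unmounted from all ESX hosts (the volume name is a placeholder):

   vol offline vol_clone_restore
   vol destroy vol_clone_restore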
True, that was some time ago. I don't know which OnTap version it was, though. But I also had problems on newer versions, like some commands at the end of the clipboard got cut off or were lost completely, so I'd double-check that all commands you pasted were actually executed and nothing got lost. -Michael
All answers given here are valid options. BUT: whatever you do, please DON'T try to be smart and copy/paste all these commands into a PuTTY session (or any other SSH session, for that matter). I did that once and it crashed the filer. I guess the NetApp has only a limited input buffer for SSH commands, so you should not send more than, say, 10 commands or so via copy/paste. Wait for the prompt to come back and then send the next 10, and so on. Or use any other way outlined in the above posts. -Michael
Group quotas won't work in a Windows environment -- they are UNIX/NFS only. Why? Because on UNIX, each file belongs to exactly one group (identified by its gid), so it can be counted towards exactly one quota rule (the one for that specific group). On Windows, each file has an owner (which counts towards the user quota), but a file does not have an "owning group" -- there is no such concept in Windows. You can have multiple groups with access to the file, but no "primary group". If you have a 100 MB file where the groups AD\Group1 and AD\Group2 have the same ACL rights, towards whose quota should this file count? Group1? Group2? Both? How about nested groups? How should the filer find out when you nest one group inside another, i.e. put AD\Group3 inside AD\Group2 -- should the file now also count towards Group3's quota? How often should the filer scan all AD groups for such changes? And so on. If you really need something like that, you need additional software (there's something called QFS from NTP Software that does things like this). A UNIX-side group quota rule is shown below for comparison. -Michael
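For reference, a UNIX group quota rule in /etc/quotas looks roughly like this (group name, volume and limits are made up; "-" means no limit):

   ## target      type                 disk   files
   engineering    group@/vol/vol_data  100G   -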
I wonder why everyone is so keen on trunking 4, 8 or even more ethernet ports into one VIF. Having more than 2 links active at the same time makes no sense in >90% of all use cases: you won't benefit from the increased (theoretical) throughput, because almost everything you do is I/O bound, and I/O scales with the number of disks, not the network, so the disks are the bottleneck in almost all cases. Anyway, as the others already explained, you can only build VIFs from ports on one controller, and you should use a single-mode VIF if you have multiple switches connected (i.e. one single-mode VIF on top of 2 multi-mode VIFs with 2 ports each). -Michael
You can still export the LUN via iSCSI even if it's already mapped to an FCP initiator. It's not an "either/or" relationship; both iSCSI and FCP can be active on the same LUN at the same time. Just create an iSCSI igroup and map the LUN to it (a quick sketch follows below). Make sure you only mount it read-only, though! -Michael
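A minimal sketch, assuming 7-Mode syntax; the igroup name, OS type, initiator IQN and LUN path are placeholders:

   igroup create -i -t windows ig_iscsi_ro iqn.1991-05.com.microsoft:host1.example.com
   lun map /vol/vol_data/lun0 ig_iscsi_ro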
Known bug (142292): http://now.netapp.com/NOW/cgi-bin/bol?Type=Detail&Display=142292 Either upgrade to a newer OnTap (at least 7.3.3 in your case, latest P-release recommended) or try running "wafl scan ownblocks_calc" on the affected volume, which might fix it. Anyway, it's only a cosmetic issue, no data is lost. "aggr show_space -h" also shows you the volume usage. -Michael
Try using ldd to find out if you're missing any (32-bit) libraries for the binaries:

[root@esx1 ~]# ldd /opt/netapp/santools/mbrscan
        libnsl.so.1 => /lib/libnsl.so.1 (0x00699000)
        libdl.so.2 => /lib/libdl.so.2 (0x0047a000)
        libm.so.6 => /lib/libm.so.6 (0x004ae000)
        libcrypt.so.1 => /lib/libcrypt.so.1 (0xf7ef6000)
        libutil.so.1 => /lib/libutil.so.1 (0xf7ef2000)
        libpthread.so.0 => /lib/libpthread.so.0 (0x00480000)
        libc.so.6 => /lib/libc.so.6 (0x00334000)
        /lib/ld-linux.so.2 (0x00316000)
[root@esx1 ~]# ldd /opt/netapp/santools/mbralign
        libnsl.so.1 => /lib/libnsl.so.1 (0x00699000)
        libdl.so.2 => /lib/libdl.so.2 (0x0047a000)
        libm.so.6 => /lib/libm.so.6 (0x004ae000)
        libcrypt.so.1 => /lib/libcrypt.so.1 (0xf7f8b000)
        libutil.so.1 => /lib/libutil.so.1 (0xf7f87000)
        libpthread.so.0 => /lib/libpthread.so.0 (0x00480000)
        libc.so.6 => /lib/libc.so.6 (0x00334000)
        /lib/ld-linux.so.2 (0x00316000)
[root@esx1 ~]#

If any of these libs comes up as "not found", you need to install (compatibility) libraries for your OS. -Michael
SnapMirror is actually the best option IMHO. It copies all ACLs and works online. Get a SnapMirror demo license (they're good for 90 days), mirror the data incrementally during normal working hours, and when you're ready, deactivate CIFS on the old filer, do a final "snapmirror update" and activate your CIFS shares on the new filer (rough command sketch below). You can use NetBIOS aliases to give your new filer the same name as the old one, if needed. -Michael
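A rough sketch of the volume-SnapMirror steps, assuming 7-Mode syntax and made-up filer/volume names (the destination volume must already exist, be at least as large as the source, and be restricted):

   vol restrict vol_data                               (on newfiler)
   snapmirror initialize -S oldfiler:vol_data newfiler:vol_data
   snapmirror update newfiler:vol_data                 (repeat incrementally until cutover)
   snapmirror break newfiler:vol_data                  (after the final update; makes the copy writable)

The NetBIOS alias is set via the cifs.netbios_aliases option (or /etc/cifs_nbalias.cfg); check the File Access guide for the exact syntax on your release.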
If you have two separate switches (as opposed to one "huge" redundant switch) you need to set up a single-mode VIF, because multi-mode requires a trunk/etherchannel on the switch ports, which you can't do across switches. If you're worried about performance, you can create two multi-mode VIFs, one to each switch, and set up a single-mode VIF on top of them. You need 4 network ports for that (or 3 if you're OK with an asymmetric setup -- 2 primary and 1 backup path); see the sketch below. Then you configure the filers to take over each other's VIF during takeover. Finally, you only need to make sure that your Xen host has redundant network connectivity. On ESX you would configure the vSwitch where the VMkernel port is connected so that it has at least 2 physical uplinks; I guess there's similar functionality for Xen. -Michael
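A minimal sketch of the layered VIF, assuming 7-Mode syntax; interface and VIF names, IP and netmask are placeholders, and the same lines would also need to go into /etc/rc to survive a reboot:

   vif create multi mvif-a -b ip e0a e0b        (2 ports to switch A)
   vif create multi mvif-b -b ip e0c e0d        (2 ports to switch B)
   vif create single svif1 mvif-a mvif-b        (single-mode VIF on top; one leg active at a time)
   ifconfig svif1 192.168.1.10 netmask 255.255.255.0 partner svif1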
The most technical info you'll find is the paper "FlexVol: Flexible, Efficient File Volume Virtualization in WAFL" by Edwards et al., published at USENIX '08. Available for download here: http://www.usenix.org/event/usenix08/tech/full_papers/edwards/ It still leaves a lot of questions open, but it answers many as well. -Michael
If you want different snapshot retention times for different directories, you need to separate them into different volumes -- e.g. user home directories with a retention time of 4 weeks on one volume and group directories with 8 weeks on another (see the schedule sketch below). -Michael
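A minimal schedule sketch, assuming 7-Mode "snap sched <vol> <weekly> <nightly> <hourly>" syntax and placeholder volume names (the exact counts depend on your needs):

   snap sched vol_home   4 6 0        (keep 4 weekly and 6 nightly snapshots)
   snap sched vol_groups 8 6 0        (keep 8 weekly and 6 nightly snapshots)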
Hmm... I get the following (with your figures), assuming the snapshot you listed is the last (and thus largest) on the volume: %/used = 4148 / 8088 = 51% (snapshot used is 4148 GB, used volume space is 8088 GB). I guess the "difference" between 51% and 53% is that some blocks are locked in more than one snapshot and thus are counted differently. You could try "snap delta" to get more detailed info on the changed blocks between 2 snapshots. I, too, don't get the 69%, but I guess it *might* have something to do with the fact that your snapshots are "spilling" over into the live volume. On the other hand, the output of "snap list" is *not* authoritative, as there are several bugs in OnTAP which might lead to wrong values there (e.g. BUG 226848 or 347779). -Michael
Sorry, I can't really help with the error message, but removing shelves from a running system is never supported (MetroCluster or not, spare disks or not, it doesn't matter), so while it *might* work, you're better off shutting down the cluster before moving the shelves. We had some success with hot-removing shelves (involving pulling single loops, disabling ports and multiple takeover/giveback transitions), but I wouldn't generally recommend it (especially on an FMC). -Michael
This is most probably a network/routing issue. Check whether the server ports and switch ports are on the correct VLAN. Try connecting a "regular" NIC to the switch port where your HBA is connected and giving it the same IP as the HBA; then you can do ping tests and traceroutes, for example. -Michael
I have not found anything official on this, but my guess is that it's some kind of "worst case" fragmentation. I.e. over the whole volume the (averaged) fragmentation is 2, but the worst-case fragmentation encountered (as seen over some kind of suitable window) is 31. This is just my interpretation; I'd love to hear something official on this topic. -Michael
This is to be expected. If you take the snapshot including the VM's memory, ESX doesn't need to quiesce the VMDK, because all the not-yet-written data is still in the snapshotted memory, so no data is lost. Usually these errors come from the sync driver that gets installed with the VMware Tools. Try disabling and uninstalling it from the device manager (the name is "sync driver" IIRC...). Also, check whether a VSS component is available in the VMware Tools for your OS and install that (if VSS is installed, it will be used instead of the sync driver). This resolves random backup problems like this most of the time. Be aware, however, that in rare cases you might end up with an inconsistent snapshot. But IMHO an inconsistent snapshot is almost always better than no snapshot at all (or a VM that crashes because of a bug in the sync driver). -Michael
A qtree on the NetApp is (for all CIFS and client access purposes) 100% equal to a directory. I.e. if you create a qtree, you essentially create a directory in the filesystem. By default this directory inherits the security of the volume on which it is created, unless you remove the "inherit" flag from the ACL on the volume's root directory. This is the same as on Windows: if you create a directory called "D:\TEST", it inherits its ACLs from the ACLs on D:\

Usually you add an administrative CIFS share on the volume root (e.g. "cifs shares -add USERHOMES$ /vol/vol_UserHomes" if your volume is called vol_UserHomes). You then connect to it and set the ACLs on all qtrees and directories that you find inside, and you can choose whether or not to inherit the ACLs. Also, if you map the USERHOMES$ share to a drive letter, you can change the ACLs of the volume root directory itself. It really works the same way as on a Windows server.

There's, sadly, no way to set/change NTFS ACLs on a file/directory via the NetApp command line. You can only set share-level security, or import a so-called "fsecurity" batch file, which uses an MS SDDL definition to set ACLs on whole volumes/directories.

The mode bits you set when creating the qtree only apply to a UNIX-style volume/qtree, which you should really avoid if you're sharing via CIFS (i.e. make sure that your volume AND all your qtrees are set to NTFS security style, not MIXED and not UNIX). Access-based enumeration is enabled on a per-share basis and has no noticeable impact on performance, since WAFL stores Windows ACLs natively and thus doesn't have to do any sort of mapping to determine whether a given user should see a directory or not. A short command sketch follows below.

I really encourage you to read the "File Access and Protocol Management Guide", which describes these things in great detail. You'll find it on your filer (if you have installed the documentation) via http://<your-filer-ip>/na_admin (click on "Documentation"). The Storage Management Guide also describes how volumes and qtrees work. -Michael
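A short sketch of the related filer-side commands, assuming 7-Mode syntax and placeholder names (double-check the exact access-based enumeration flag in the cifs shares man page):

   qtree create /vol/vol_UserHomes/qt_user1
   qtree security /vol/vol_UserHomes/qt_user1 ntfs
   cifs shares -add USERHOMES$ /vol/vol_UserHomes
   cifs shares -change USERHOMES$ -accessbasedenum

The actual NTFS ACLs are then set from a Windows client via the Security tab (or cacls/xcacls), exactly as on a Windows server.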
wrfile is dangerous. You use it like this, but best test it on a simulator before:
* Open an SSH session to the filer console (with PuTTY, for example).
* Type "rdfile /etc/rc" (or whatever file you want to edit). It will print the current contents of the file.
* Copy this content into Notepad or another text editor.
* Edit/change anything you want in the text.
* When you're done, type "wrfile /etc/rc".
* Then QUICKLY copy/paste your modified text into the SSH console.
* Press CTRL-C to save the file and you're done. Try "rdfile" again to check that your changes were saved correctly.
The problem is that wrfile overwrites (empties) the file as soon as you enter the command, so you must be prepared to paste in the new content. And if you forget to press CTRL-C at the end, you will remain in "wrfile" mode and everything you type will end up in the file you tried to edit. Even if you log out of SSH and later log in again, you will still be writing to the file, which can be quite annoying to say the least. Please test on a simulator before you go productive. -Michael
Regarding your point 2: ACLs on NetApp work exactly like they do on a Windows server. It is only confusing if you're expecting something else to happen (as on Samba, for example). On Windows (and thus on NetApp, at least if the qtree security style is set to "ntfs"), ACLs are inherited by subdirectories if you choose so. The security on the filesystem has nothing to do with the "share-level" security on the CIFS share. The recommendation (also from Microsoft) at this point is: use "Everyone / Full Control" as share-level security and manage permissions on the filesystem. You can also enable "access-based enumeration" on the shares so that users don't see directories they have no access to. You can script any security/ACL changes with "cacls" or "xcacls" from the Windows resource kit if required (a small example follows below). If you could explain a bit more where your confusion is, I could try to clarify. Regarding limitations: yes, there are some: no EFS (Encrypting File System), no NTFRS (native replication), and a maximum Kerberos ticket size of 64k (I heard some Windows implementations allow larger tickets?). Everything else is there, even DFS links and hidden shares without the trailing "$". -Michael
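A small example of scripting an ACL change from a Windows client, with a made-up UNC path and group (cacls syntax shown; xcacls/icacls flags differ slightly between Windows versions):

   cacls \\filer01\projects$\team1 /T /E /G "DOMAIN\Team1":C        (grant Change recursively, keeping existing ACL entries)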