Hi Craig, Thanks for the info, I'll give that a try and report back. I'd still like to know why the limitation exists at all. I'm sure there's a good reason for it, but for the life of me I can't think of any reason to stop SDW from snapshotting a cloned vol. C
Hi, We are using SM SAP 3.3 and SDW 6.4.2 on Windows 2008 Server (VM). We've successfully created a clone of a database and mounted it on another server - this works very well. However, we would like to be able to use SMSAP to take snapshots of the cloned database. It appears that SnapDrive (and therefore SMSAP) doesn't support taking snapshots of LUNs backed by a clone.

We used the volume clone method, so the cloned LUNs are (effectively) within their own volume, so I don't understand why SnapDrive has this limitation. We can snapshot the cloned volume on the storage console, so I don't see why SnapDrive cannot do the same.

Does anyone have a view on this, or know of a workaround, or whether this is a limitation that might disappear in a future release? Looking at the docs for SDW 5.0, it appears to still be a limitation in that release also.

Thanks for any help, Craig
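PS - just to be clear what I mean by snapshotting the cloned volume on the storage console: it's nothing more than a standard manual snapshot, e.g. (the volume and snapshot names below are made up for illustration):

    filer> snap create sap_clone_datavol test_snap

...which works fine, so the cloned volume itself clearly supports snapshots; it only seems to be blocked when going through SnapDrive/SMSAP.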
Thanks for the response Thomas, I agree totally, and I'm not worried, just curious. Well... that, and people keep asking why DFM is alerting for high CPU, and I'm having trouble explaining why it's nothing to worry about! It would be good to have an understanding of what is happening - whether it's an error in the cpu_busy counter, or a domain that isn't included in the sysstat/statit output?
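PS - for reference, the per-domain CPU breakdown I'm comparing against comes from sysstat; something like the following (the -M option needs advanced privilege, I believe):

    filer> priv set advanced
    filer*> sysstat -M 1

...which splits CPU time across the various domains (network, storage, raid, kahuna, etc.), whereas cpu_busy (and hence DFM) reports a single overall figure.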
Having the exact same problem here. Did anyone ever find a solution? First login attempt via LDAP/AD fails every time. Second attempt is OK. Thanks, Craig
Hi Tony, Hope you are well.

Can you check whether the filer(s) are set to send SNMP traps to the DFM server? Check the output of the 'snmp' command on the filer(s): the traphost list should include the hostname or IP of the DFM server, and 'init' should be set to '1'. If not, the following should do the trick:

    snmp traphost add <hostname | IP address>
    snmp init 1

Regards, Craig
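PS - as a concrete example (the DFM server name below is made up; substitute your own hostname or IP):

    filer> snmp traphost add dfm01.example.com
    filer> snmp init 1
    filer> snmp

Running 'snmp' on its own afterwards just redisplays the SNMP settings, so you can confirm the traphost and init values took effect.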
It's a limitation of a given host, ESX or otherwise. The 256 LUN limit applies to each ESX host, but since an RDM needs to be presented to all ESX hosts, the limit is effectively per cluster. It's also a concern here, but so far I've drawn a blank as far as finding a solution (other than creating a new cluster, that is!!). If anyone has any solutions, I'd love to hear them - I know we're not alone!! Craig
Yes, that's correct. As long as a block is locked by other snapshot(s) and/or the AFS, deleting the snapshot will not free it. You may already know this, but you can also use the 'snap reclaimable' command to find out how much space would be reclaimed by deleting a specified snapshot or snapshots.
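For example, to see how much space would be freed by deleting one or more snapshots together (the volume and snapshot names below are just made up for illustration):

    filer> snap reclaimable ora_data_vol nightly.1 nightly.2

Bear in mind it can take a little while to calculate the answer on a large volume.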
Comments inline below (mine marked with >>>):

matt090385 wrote:

Hi, Could someone help me get my head around snapshots and how, in some instances, deleting one snapshot doesn't save me space? My understanding of snapshot technology is also a little sketchy and this doesn't help me with my question above. Here's a step-by-step guide of what I think I know.

AFS contains blocks A, B, C.
T0 snapshot occurs - size of the snapshot is zero (although I'm assuming this is not strictly true, as I believe what happens is that the pointers to the active blocks are copied into the snapshot, so I'll say zero even though it might be a couple of KB). Blocks A, B, C are locked in place.
Block D is added; this goes into the active file system. The snapshot doesn't change size at all, as new data has no effect.
T1 snapshot occurs. Size is zero - blocks locked by this snapshot are A, B, C, D... or is it just block D, as A, B, C are locked by the previous snapshot?

>>> A snapshot 'locks' every block in the AFS at the time, regardless of whether they are already locked by any other snapshot.

T0 snapshot is still zero.
Block D is deleted. T1 snapshot grows by the size of block D. T0 snapshot is still zero.

>>> Correct, but see below...

Block E is created - this has no effect on any snapshots.
Block A is deleted. T1 snapshot grows by the size of block A (so it is now A+D big). T0 snapshot stays at zero.

>>> Here's probably where you are getting confused. I find it's better not to think of individual snapshots having sizes, but rather of the size of all snapshots in a given volume. The reason is that the .snapshot usage at this point will be the sum of blocks locked by snapshots that are not currently part of the active filesystem. So, at this point the snapshots will consume A and D. Both T0 and T1 have locks on block A; deleting one snapshot or the other will not release block A - only deleting both snapshots will.

T2 snapshot is created and is zero - blocks being locked are B, C, E.
Block E is deleted. T2 snapshot goes up by the value of E. T1 is still A+D. T0 is still zero.

>>> Think of it in terms of blocks, and what has a lock on what. So (and I think I've followed you correctly so far):
>>> A is locked by T0, T1
>>> B is locked by T0, T1, T2, AFS
>>> C is locked by T0, T1, T2, AFS
>>> D is locked by T1
>>> E is locked by T2

Is this all correct so far? Right, so if I were to delete T0 I would gain back nothing - sure, I get that. If I were to delete T1, which currently holds A+D, the space reclaimable would be zero, as the two blocks would roll over to T0 since they are still referenced in this snapshot... correct?

>>> Not quite; T1 only has an exclusive lock on block D, so only block D would be freed by deleting snapshot T1.

Does anyone have a way of summing this up, therefore making it easier to understand?

>>> Hope that helps.
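>>> PS - to sum it up using your own example: the space you get back by deleting a single snapshot is just the blocks that snapshot locks exclusively (i.e. blocks not locked by any other snapshot or by the AFS):

    Snapshot   Blocks locked    Exclusively locked   Freed if deleted on its own
    T0         A, B, C          (none)               nothing
    T1         A, B, C, D       D                    D
    T2         B, C, E          E                    E

>>> Deleting T0 and T1 together would additionally free block A, since nothing else would reference it any more.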
Ah, OK, I see: so if either of those options is selected, it changes the criteria from "do this if not found" to "do this if found". A subtle difference, but it works as you say. Thank you very much for the help, Craig
Thanks Sivaprasad K, I appreciate your help with this!

Here's how I had it set up originally (by specifying the dataset name implicitly):

Here's the new setup (searching for the dataset by name):
Hi Sivaprasad K, Thanks for the reply. That works, but it introduces another problem for me: I'm also using the Advanced tab in the command to determine whether to run the command based on the value of the user variable $ProtectionLevel. So, if $ProtectionLevel is Gold, Silver or Bronze it will execute the command; otherwise it will not.

When I set the Dataset name option to "by searching for an existing Dataset" as you suggest, it changes the 'Advanced' tab behaviour for the command - it now only executes the command if the Dataset was not found AND $ProtectionLevel is Gold, Silver or Bronze. What I want is to run the command if the $ProtectionLevel variable is correct and the Dataset was found.

Thanks in advance, Craig
Hi All, Scratching my head here - wondering if someone can help?

I have 4 datasets which do the same thing, but on staggered schedules. This is because I can't update all our Oracle mirrors on the same schedule, as it overloads the source filers. So I have datasets called 'ORA 1', 'ORA 2', 'ORA 3' & 'ORA 4'. When we provision storage for a new database, it gets added to the dataset with the fewest resources (thus keeping them more or less evenly balanced).

I want to do the same thing within a workflow using the 'Add volume to dataset' command. I've created a filter which finds datasets by name prefix and returns the dataset names in order of the number of resources, using the following SQL:

SELECT
    dataset.name,
    dataset.qtree_id,
    dataset.id,
    dataset.dfm_name,
    dataset.volume_id,
    dataset.uuid,
    count(*)
FROM
    storage.dataset
WHERE
    dataset.name like '${name}%'
GROUP BY
    dataset.name
ORDER BY
    count(*) asc

So, if $name is 'ORA %' then it returns my 4 datasets in the correct order. The question is, how can I use this filter/finder in the 'Add Volume to Dataset' command? On the Dataset tab, I only get the option of using an 'Incremental Naming Wizard' for the dataset name field, and I'm not sure this is what I'm looking for. Any thoughts?

WFA 2.0, Windows. Many thanks, Craig
We're having similar concerns with the number of LUNs per ESX cluster. To answer the question about the number of LUNs, one case in point is SAP (on Windows). In order to meet best practice (SAP's and NetApp's for SM SAP) we need 7 RDM LUNs per SAP instance (sapdata1-4, origlogA-B, oraarch). We've moved as much as we can onto VMDKs, but 7 seems to be the fewest we can have in our environment. As a result we are limited to around 37 SAP instances per cluster, which is not many once you account for dev/test and the clones used for verifications, etc. Craig
Hi, I've configured a Protection Manager dataset with a provisioning policy to create secondary (SnapMirror) storage when a primary storage volume is added to the dataset. The dataset protection policy is just a single Mirror. The provisioning policy defines a resource pool containing 4 aggregates at DR, and has the default options (i.e. it just requires RAID-DP).

The process is: add a volume to the dataset; the dataset defines a SnapMirror relationship using a 'Secondary' type provisioning policy for the Mirror, so it will create a secondary volume using a resource pool called 'DR - SATA'. Resource pool 'DR - SATA' contains 4 aggregates, all using 1TB SATA and all the same size. The utilization on these aggregates is as follows:

drfiler1:aggr00_sata = 69%
drfiler1:aggr01_sata = 74%
drfiler2:aggr02_sata = 33%
drfiler2:aggr03_sata = 40%

The question is about how the provisioning policy selects the aggregate to provision the SnapMirror destination volumes. I've tested this but, strangely, it is selecting aggr00_sata for the mirror destination volumes. Based on usage, I would expect it to choose the one with the most free space (drfiler2:aggr02_sata). Generally, the disk I/O and filer CPU are significantly lighter on drfiler2, so I don't think it can be selecting drfiler1 based on performance.

Does anyone know if there are any logs, etc. which can be used to determine what the decision-making process was? Thanks, Craig
Hey Jeremy, OK, now things are making more sense. The db_vol.array.ip did actually work, but that's because I'd previously tried setting the db_vol.array field to 'db_vol.aggregate.array.ip' - after unchecking 'Show only attributes used by Create Volume'. If I remove this from db_vol.array, you are correct, it doesn't work. With it removed, I changed the filter to use db_vol.aggregate.array.ip as you suggest, and it works again. So I guess you could use either, but I think I'll go with your suggestion. Thanks, Craig
Hi All, What I'm trying to do is a little hard to explain, but I'll give it a go... I'm using WFA 2.0 / DFM 5.0.2. I'm creating an Oracle provisioning workflow based on our specific requirements. I'm mostly there, but got stuck on one part.

In an earlier step (db_vol), I create a volume, allowing WFA to select the aggregate by available space from a resource pool. This works just fine, but later I want to create a qtree for redo logs. Say we have 2 filers: one has a volume called ORA_REDO_01, the other a vol called ORA_REDO_02, for example. These exist prior to running the workflow. I want to create the qtree in either ORA_REDO_01 or ORA_REDO_02, depending on which filer the 'db_vol' was created on.

Here's how the workflow looks so far: I'm trying to use a filter to identify the volume for the 'redo_qtree' command (it should be ORA_REDO_01 or ORA_REDO_02 depending on the filer). I use the filter 'volume in array by name pattern', then specify 'db_vol.array' for the 'Array IP or Name' field, and 'ORA_REDO' as the pattern to search for. This fails with the error:

"Failed to evaluate resource selector. Found variable - expected literal At command 'Create QTree', tab 'Qtree', variable 'redo_qtree', property 'volume'"

So it's not expecting a variable here. If I replace the db_vol.array variable with a string value (one of the filer names in quotes) to test, it works. If I put the db_vol.array variable in quotes in the filter and run a preview, I get the error:

"Workflow aborted. No results were found. The following filters have returned empty results: volume in array by name pattern At command 'Create QTree', tab 'Qtree', variable 'redo_qtree', property 'volume'"

Can anyone give me any pointers to resolve this? Thanks in advance, Craig
Hi, "-Initiate a backup of datastore A in site A with our templates on the primary site using VSC, which will cause snapmirror replication to take place from site A to site B." >> Correct. "-Site A must refrain from writes to datastore A until snapmirror replication completes, but Site A can continue reads" >> No, Site A can continue reads and writes, but it would make sense to have the templates in a static state (ie not being changed) when the SnapMirror update is initiated (ie the VSC job is run). The filer will create a snapshot of datastore A which will be used for the mirror update to site B. You can still write to Datastore A, but only data that was present when the snapshot was created will be replicated. "-Site B can continue reads from datastore B during the replication. (It can never write to datastore B)" >> Technically, Site B can continue reads from Datastore B, but it is possible data will change as the mirror update completes. It would be good practice not to deploy from any templates at site B until the mirror update is completed. The mirror destination volume cannot be written to from Site B unless the mirror is broken. "-After the replication, Site A can resume writes." >> As above. I'm assuming you have a dedicated datastore for Templates? If not, I would recommend this. Hope that helps, Craig
Thanks Bill and Adai,

Yesterday I enabled a tree quota on that qtree and, sure enough, it generates alerts/alarms in DFM. All good. This makes sense.

I'm now trying to figure out why the qtree was showing full from the client, however, and this I'm struggling with. So, to recap: the volume has 197GB free and only one qtree. The 'effective' used, accounting for dedupe savings, meant the qtree was effectively full. That made sense in a way, until I decided to look at some other volumes which also have /etc/quotas entries that apply no limits (they are just there so I get capacity stats in DFM). Here's an example that appears to break the previous theory:

df -h xxxxxx_fsdep2
Filesystem                       total     used    avail  capacity  Mounted on
/vol/xxxxxx_fsdep2/             3250GB   2931GB    318GB       90%  /vol/xxxxxx_fsdep2/
/vol/xxxxxx_fsdep2/.snapshot     812GB     74GB    737GB        9%  /vol/xxxxxx_fsdep2/.snapshot

df -hs xxxxxx_fsdep2
Filesystem                 used    saved  %saved
/vol/xxxxxx_fsdep2/      2931GB   1112GB     28%

The effective used is 2931GB + 1112GB = 4043GB, which is more than the total size of the volume, right? This volume has 24 qtrees, which is the only significant difference I can see. The /etc/quotas entry looks like the following, and I've confirmed quotas are on for this vol:

* tree@/vol/xxxxxx_fsdep2 - - - - -

From a Windows CIFS client, I can mount one of the qtrees in this volume and Windows reports 318GB free of 3.17TB, which matches the df -h output, not the effective used accounting for dedupe. Am I missing something?

Thanks, Craig
Hi, We've come across a strange situation today; I was wondering if you could advise how best to deal with it.

We have a qtree which is very nearly full. It is the only qtree in the containing volume and there is no other data outside the qtree in that volume. The containing volume has 197GB free:

Filesystem                        total    used    avail  capacity  Mounted on
/vol/xxxxxx_fsdata1/              820GB   622GB   197GB       76%   /vol/xxxxxx_fsdata1/
/vol/xxxxxx_fsdata1/.snapshot     502GB   355GB   147GB       71%   /vol/xxxxxx_fsdata1/.snapshot

There is only one qtree in that volume (called Profiles). According to quota report there is 806GB used in this qtree:

vfilerxx@filerxx> quota report
                                        K-Bytes                   Files
Type  ID  Volume          Tree             Used  Limit       Used  Limit  Quota Specifier
----- --- --------------- --------  ----------- ------  ---------  -----  ---------------
tree  *   xxxxxx_fsdata1  -                   0      -          0      -  *
tree  1   xxxxxx_fsdata1  Profiles    845595168      -   12409822      -  /vol/xxxxxx_fsdata1/Profiles

Note that there are no usage limits set on this qtree - it should just be using free space in the volume. The /etc/quotas entry looks like this:

* tree@/vol/xxxxxx_fsdata1 - - - - -

The Profiles qtree is shared via CIFS. When connecting via a Windows client, it shows as nearly full (e.g. 818GB used of 820GB). I believe this difference is due to the dedupe savings. If I look at the volume usage including dedupe savings, it looks like this:

vfilerxx@filerxx> df -hs xxxxxx_fsdata1
Filesystem                used   saved  %saved
/vol/xxxxxx_fsdata1/     622GB   208GB     25%

...so I'm assuming the qtree usage is accounting for the effective used, rather than the actual used values.

The main area of concern here is that Operations Manager (version 5.0.1) has not generated any alerts for this condition. Looking at the event history for the qtree, I see no alerts at all. The quota nearly full and quota full global alert thresholds are set to the defaults (i.e. 80% and 90%) and the quota object in Operations Manager does not have any custom alerts set. If I look at the quota summary in Operations Manager, it shows the Capacity Used as 99%, so why no alert?

Any help to understand this would be appreciated. Craig
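PS - the numbers do roughly line up with that theory: df shows 622GB used plus 208GB of dedupe savings, i.e. around 830GB of logical data, which is in the same ballpark as the ~806GB the quota report shows for the qtree (845595168 KB / 1024 / 1024 ≈ 806GB) and the ~818GB used that the Windows client reports.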