Ah, that explains a lot. The IOPS limits on the underlying disks for CVO are based on bandwidth, not actual IO operations. It's fairly easy to size CVO for the day-to-day needs of an Oracle database, but when it's time for backups, you can easily overwhelm those disks. It's not a CVO limitation; the same thing happens with databases running directly on Azure or AWS drives. 5000 IOPS at an 8K block size might be plenty for normal database operations, but an RMAN operation can consume all 5K IOPS and starve the database. The best option is really to stop moving backup data around and rely on snapshots for in-place backup and restore instead. If that's not an option, you might be stuck increasing the IOPS capabilities of the backend disks to handle the streaming backup workloads.
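To put rough numbers on that (illustrative figures, not from the original post): a disk rated for 5,000 IOPS at an 8K block size is effectively capped at around 40 MB/s, and an RMAN channel doing large sequential reads hits that ceiling with only a handful of operations per second, leaving almost nothing for the database's small random IO.

```bash
# Back-of-envelope sizing sketch; the numbers are assumptions for illustration only.
iops=5000; block_kb=8
echo "$(( iops * block_kb / 1024 )) MB/s"   # ~39 MB/s behind a "5000 IOPS @ 8K" rating
# An RMAN channel issuing 1MB sequential reads needs only ~39 such IOs per second
# to saturate that same limit.
```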
Most DBAs just use ASM's native rebalancing ability. They create the new LUNs at an equivalent size, add them to the current diskgroup, and then start dropping the old LUNs. Once the rebalance is complete, disconnect the old LUNs. If any of those LUNs manage grid services, you may need to run an additional command or two.
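As a sketch of what that looks like (the disk paths, disk names, and rebalance power are placeholders, not from the original post), the add and drop can go into a single rebalance pass:

```bash
# Run as the grid owner; ASM administration requires the SYSASM privilege.
sqlplus -s / as sysasm <<'EOF'
-- add the new LUNs and queue the old disks for removal in one rebalance
ALTER DISKGROUP data
  ADD  DISK '/dev/mapper/new_lun1', '/dev/mapper/new_lun2'
  DROP DISK DATA_0000, DATA_0001
  REBALANCE POWER 8;

-- the old disks are only safe to unmap once no rebalance operation is running
SELECT group_number, operation, state, est_minutes FROM v$asm_operation;
EOF
```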
I think this needs some clarification. SnapCenter is primarily designed as an online backup/recovery/cloning tool. It's not designed to address disaster recovery where the primary site has been lost entirely. There are some cases where the snapshots created by SnapCenter can be used as part of a DR procedure, but it's unlikely SC itself would be used for site recovery. There could be some situations where it works in a fully virtualized environment, but you'd want to be very careful about the design and test everything comprehensively. Most customers that need DR build a solution independently of SC. Some snapshots and SnapMirror relationships are part of the DR architecture, others are part of the day-to-day backup/recovery architecture. The best way to start is to nail down exactly what the requirements are in terms of RPO, RTO, the amount of data under management, and the total number of databases that need protection.
I've never seen a good reason to use a 1:1 mapping of volumes to LUNs. I know a lot of customers like that idea, but it's not something I've ever done. I bought my first NetApp in 1995, so I've been doing this for a while. My usual practice is to group related LUNs in a single volume. I'll put ASM diskgroup #1 in a single volume, and ASM diskgroup #2 in another volume. I can now protect/restore/clone/QoS/etc. all of the LUNs in the volume with a single operation.

It does still help to have multiple LUNs, mostly because of the SCSI protocol itself. A LUN still maps to all the drives in the aggregate, but the SCSI protocol won't let you put an unlimited amount of IO on a single LUN device. 8 LUNs will get you close to the maximum possible, although 16 does offer some measurable improvements. (Side note: yes, there are situations where you want to bring multiple volumes into play for an Oracle database, but unless you need 250K+ IOPS it's not usually necessary.)

The same principles apply outside ASM. We had a customer recently with ext4 filesystems and we spent a lot of time experimenting with LVM striping. Same situation: stripe across 8 LUNs. In addition, tune the striping. If you have 8 LUNs, use a 128K stripe size. That way, when Oracle does those big 1MB IOs during full table scans or RMAN backups, it can read 1MB at a time, hitting all 8 LUNs in parallel. The performance boost was huge over a single LUN.

In theory, NVMe namespaces would eliminate a lot of this complexity because they remove SCSI from the equation. In practice, nobody is likely to offer a storage array where a single NVMe namespace can consume all the potential performance of the whole array. We'll probably need to keep making ASM diskgroups out of perhaps 4 to 8 NVMe namespaces. I'm not aware of any formal testing of this yet.
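For the LVM case, a minimal sketch of that 8-LUN striped layout (device names, volume group names, and sizes are placeholders, not from the original post):

```bash
# 8 stripes x 128K stripe size = 1MB, so a 1MB Oracle read hits all 8 LUNs in parallel.
pvcreate /dev/mapper/oralun{1..8}
vgcreate oradata_vg /dev/mapper/oralun{1..8}
lvcreate --stripes 8 --stripesize 128k -l 100%FREE -n oradata_lv oradata_vg
mkfs.ext4 /dev/oradata_vg/oradata_lv
```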
ASM is definitely still useful. Most of the reason is related to the OS. Even if a single LUN could support 1M IOPS, the OS won't be able to move that much data through one device. We find the sweet spot is usually around 8 to 16 LUNs total. AIX tends to need even more because of limits on the number of in-flight SCSI operations. A second benefit is growing the ASM diskgroup. You can resize ASM LUNs, but most customers prefer to grow in increments of an individual LUN size. The more LUNs you have, the more granular the growth. Pushing beyond 20 or so LUNs is usually a waste of time and effort. There are exceptions, of course, but unless you need 250K IOPS it's probably not helpful. You can also use LVM striping and get similar benefits. Most customers seem to prefer ASM, but we've seen a definite increase in xfs on striped LVM. I will be updating TR-3633 with additional details on this in the next few weeks.
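If you do go the resize route rather than adding LUNs, the ASM side is a single statement. A sketch, assuming the diskgroup is named DATA and every LUN has already been grown at the storage layer and rescanned by the OS:

```bash
# Run as the grid owner after the resized LUNs are visible to the host.
sqlplus -s / as sysasm <<'EOF'
ALTER DISKGROUP data RESIZE ALL;                      -- pick up the new OS-reported disk sizes
SELECT name, total_mb, free_mb FROM v$asm_diskgroup;  -- confirm the extra capacity
EOF
```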
That's ridiculous. If they're archiving to disk, you just need capacity. The vendor shouldn't be relevant. I was a Networker customer at one time, and I never heard of a limitation like this. It's possible the EMC rep was deliberately misunderstanding the question and assuming you want to use some kind of unusual capability that requires a specific vendor. There are features like Boost and such that allow some advanced device-to-device communication that speeds things up. Maybe that's what they're getting at. Are we talking about a Networker AFTD volume? If so, ask EMC where it's stated that only specific vendors are supported. Their own documentation appears to be silent on that, and they even have examples with generic NFS shares.
Looks like we have a problem with the web site right now. If you need a direct link, log into your mysupport account and go here: http://mysupport.netapp.com/NOW/download/tools/ntap_storage_plugin/ We're investigating the state of the posts.
As communicated, NetApp has announced the end of availability for three Oracle integration products: the OVM plugin, the 12c cloning plugin, and the RMAN plugin. This should not be seen as a reduction in commitment to Oracle.

The primary reason for discontinuing development of these products was the limited benefit to customers due to the support model used. They were only supported under the "community" support model, which was not sufficient for the majority of enterprise customers. As a result, the adoption rate of these products was low, and without a formal support process there was uncertainty whether we could provide assistance at the level expected.

Products such as SnapManager for Oracle, SnapCreator, and SnapCenter's Oracle capabilities continue to be developed. We continue to work with Oracle engineering on new initiatives, and we continue to explore ways to deliver the functionality of the EOA products in a way that offers complete support and committed engineering resources.

Product management recognizes that some customers depend on the now end-of-availability products, and we have created a PVR program for customers requiring continuing support. For more information, please contact product management with specifics on the customer and their requirements.

Gautam Jain, Senior Product Manager
Jeffrey Steiner, Principal Architect
I can't think of any reason why SC agents would have problems with su commands. Almost all of the plugins do an "su" regularly. There must be something blocking it from exiting. Can you pipe stdout to a temporary file within the su scripts? Maybe that will indicate what stalled the exit.
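Something as simple as redirecting the su'd command inside the script may show where it hangs. A sketch, where the script path and log location are placeholders:

```bash
# Capture everything the su'd command prints, plus its exit status.
su - oracle -c "/path/to/plugin_script.sh" >> /tmp/sc_su_debug.log 2>&1
echo "su exit status: $?" >> /tmp/sc_su_debug.log
```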
The timeouts with SC suggest that one of the scripts is stalling on mounting an ASM diskgroup, running a recovery, or something like that. Depending on what happened, the SC debug logs might contain the SQL output that explains the failure, or you might have to modify some of your shell and SQL scripts to log to a local file. The only log we have is this:

Failed to execute command: /opt/NetApp/OSAP/sc_oradb_recovery.sh recover Reason: null]

which I believe means SC got no response from the script. There's probably an error message in the output of that script, but it was still sitting in the buffer when SC timed the operation out. You'll need to get logs from the various "su - oracle" operations, plus add some lines like "echo recovery.controlfile >> /tmp/logfile" to sc_oradb_recovery.sh so you can figure out precisely where it's stalling.
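A sketch of that kind of instrumentation inside sc_oradb_recovery.sh (the trace file path and the SQL script name are placeholders, not from the original post):

```bash
echo "$(date) recovery.start"       >> /tmp/sc_recovery_trace.log
su - oracle -c "sqlplus -s / as sysdba @/home/oracle/recover_db.sql" \
    >> /tmp/sc_recovery_trace.log 2>&1
echo "$(date) recovery.done rc=$?"  >> /tmp/sc_recovery_trace.log
```

With markers like these before and after each step, the trace file shows the last step reached before SC times the operation out, along with any SQL errors that never made it back to the agent.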
That makes sense. The Solaris_EFI LUN type causes that hidden offset to be present, and it misaligns everything. The regular "Solaris" LUN type doesn't have the hidden offset.
A few of us talked about this internally, and we don't see a good option. Once you start the job, it's running. A nonzero return code from a script will fail the job. You could customize the plugin to get around this, but that would require some coding. One option would be to write a wrapper that checks whether the database is running as a standby, and if it isn't, launches the SC job via the CLI.
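A minimal sketch of that wrapper idea, assuming the check runs as the oracle OS user; the actual SnapCreator CLI invocation is left as a placeholder since it depends on your installation:

```bash
#!/bin/sh
# Query the database role and only trigger the backup when it's not a standby.
role=$(sqlplus -s / as sysdba <<'EOF'
set heading off feedback off pagesize 0
select database_role from v$database;
exit;
EOF
)

case "$role" in
  *STANDBY*) echo "standby database - skipping backup" ;;
  *)  # launch the SnapCreator backup job via the CLI here
      ;;
esac
```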
No worries with ZFS on ONTAP. Check out TR-3633 for details; I'm the author. That's a database-specific TR, but the ZFS information applies to anything with ZFS on ONTAP. ZFS was always stable, but there were some performance glitches up until about three years ago, when NetApp and Oracle engineering got together and figured out what to do with 4K block management. As long as you follow the procedures in TR-3633, it should perform fine.
One question: when you say T-Series boxes, are you using the LDOM functionality, or are they running in full system mode? That's important, because if you're using LDOMs you can't really use things like SDU/SMO. The LUNs are re-virtualized by the IO domain, so all the child domains just see Sun LUNs; the source is hidden. Assuming you're not using LDOMs, I think the question depends on how dynamic the environment is. If you just want basic backup/restore ability and it doesn't happen that often, then ZFS with SnapCreator would be fine. If you do have a more dynamic environment, then UFS would make things easier because SMO/SDU would work. The problem with ZFS and cloning is the need to relabel the LUNs when cloning to the same system. You can do this with other LVMs because you can first clone the LUNs and then relabel the metadata before bringing it online. I don't know a good way to do that with ZFS. You can clone to a different server easily enough, but cloning back to the original server is hard.
Good questions. I don't want to get off-topic, but are you sure you'll be using PDBs at this point? I work in engineering on a variety of architecture projects and I still don't know what to think of PDBs. The main issue is that there's still just one set of redo logs, controlfiles, and archive logs, which complicates managing them as truly independent databases.

I know one customer who hosts multi-tenant environments where they're using PDBs for isolation. They have something like 10 major end-customers running a very complicated and CPU-intensive product. Each end customer has their own app environment pointing at a PDB. They don't really do anything special beyond that; they just wanted the isolation. Then there's the license cost of the multitenant option to consider. It's not horrible, but it isn't zero either. I'm still looking for a good use case where that's proven to save money, not just something that looks good on paper.

The good news is that the PDB architecture has almost no impact on how snapshots are used. I tested the SnapCreator plugins with PDBs and I didn't have to make any changes at all. That makes sense, as the actual commands like "alter database begin backup" are unchanged. I wouldn't be surprised if Oracle eventually delivered a PDB technology with full data separation, meaning a PDB with dedicated logs, but for the moment a PDB is really more like a set of datafiles with better security. It's not quite a "database".

My best bet on how PDBs would be deployed would be separate flexvols for each PDB so they can be backed up and restored on an individual basis. The app layer, I think, will still end up being on a different server, probably running under a VM. I don't see a reason not to just treat backups the same as always: put the whole DB in hot backup mode and take a snapshot. If you want to restore an individual PDB, you just restore those datafiles. I don't see a reason to back up a PDB on an individual basis, though. Why not just back up the whole thing with a single command and assume you've now captured the hot backup snapshots for all of them? What would the value be in doing them individually? The efficiencies of a snapshot make that of minimal value.

Got further thoughts? This is all new territory right now because there isn't significant adoption of the PDB option. I'm looking for someone to provide a different point of view.
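The backup mechanics really are the same as before PDBs existed. A minimal bash sketch; the snapshot step is a placeholder for whatever tool triggers it in your environment:

```bash
# Put the whole container database in hot backup mode around the snapshot.
sqlplus -s / as sysdba <<< "ALTER DATABASE BEGIN BACKUP;"
# ...trigger the storage snapshot of the datafile volume(s) here...
sqlplus -s / as sysdba <<< "ALTER DATABASE END BACKUP;"
```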
I don't see a good solution here. If this were an NFS environment, you could have a post-job script that registered the RMAN backup, but with Windows you would need to involve SDW. It would get very complicated.
I've been talking to the OpenStack team about this for a few months now, and it seems to me like OpenStack isn't quite ready for this, but it's going to need to be resolved somehow. It wouldn't be too difficult to add a tab to OpenStack for backups and then figure out a way to invoke a script to do a backup, but that would mean a customized approach just for one customer. What really needs to happen is for OpenStack to have a backup API where you can tell a Cinder, Nova, or Manila storage resource to "back yourself up" and then it calls an outside program. That might be SnapCreator, SnapManager for Oracle, or even something like CommVault. I'm sure it will happen eventually, but I'd guess it's a year out before standards are really established. In the interim, the best plan is probably to configure something like SnapCreator to work alongside OpenStack. They wouldn't integrate, but they wouldn't interfere with one another either.
It's still being discussed, but overall there are no plans to add this directly to SnapCreator in its current form. I was looking back at the thread, and it looks like you were thinking of backups in the context of the boot LUNs. Is that correct, or are you thinking of backups of other data types in an OpenStack cloud?
Could you explain your configuration further? I'm not sure what you mean by agentless. Without an agent, I don't know how it would be possible to run any scripts at all. Do you mean you're not using a plugin and just using config file scripts?
ASM devices are opened with O_DIRECT, so there is no risk of having anything buffered when the snapshots are created. One other thing to mention: if you have any ASM diskgroup spanning volumes, you need to ensure the following options are in the config file:

NTAP_CONSISTENCY_GROUP_TIMEOUT=MEDIUM
NTAP_CONSISTENCY_GROUP_SNAPSHOT=Y

If you do NOT span volumes with a diskgroup, then set it to just:

NTAP_CONSISTENCY_GROUP_SNAPSHOT=N
1) I prefer the PVM approach. That always yields better performance. The only time I would use HVM is if there were no way to get PVM drivers installed on the system.

2) Your LUN/volume layout looks good. Unless you expect a huge amount of IO, there is no reason to separate LUNs among volumes. I usually put all 8 of the datafile LUNs in one volume and the 4 archive/control/redo LUNs in a second volume. That means two diskgroups in all, one volume each. Also, make sure the spfile for ASM is not in the datafile diskgroup.

3) Use external redundancy.

4) Whether you use ASMlib or udev rules is up to you. If you're using OL6, you might as well go ahead with ASMlib. It's included and it's overall easier to use. The plugin is mostly useful for cloning hosts, not management of databases. (See the sketch after this list.)

5) dm-multipath exists on OVM, and I'd recommend you use it. That's the only way you'll have resilience.

6) For backups, you're correct. The datafile diskgroup is in VOLUMES and the control/arch/redo diskgroup is in META_DATA_VOLUME.

7) Your restore procedure is correct. Odds are a recovery would only require the datafile volumes to be restored, but if someone destroys the archive/control/redo diskgroup you can recover that too.

8) Cloning is a pain. You'll have to manually make the clones of the volumes, discover them on the OVM server, and then map them to a different guest. You can't bring them back to the original host because there would be duplicate diskgroup names.
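For point 4, if you go the ASMlib route, labeling the multipath devices is just a few commands. This is a hypothetical sketch; the disk labels and device names are placeholders, and a udev rule setting grid:asmadmin ownership on the same dm devices would accomplish the equivalent for the udev approach.

```bash
# Label the multipath devices so ASM can discover them (run as root).
oracleasm createdisk DATA01 /dev/mapper/ora_data_lun1
oracleasm createdisk DATA02 /dev/mapper/ora_data_lun2
oracleasm scandisks     # rescan for newly labeled disks
oracleasm listdisks     # confirm the labels are visible
```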
The RC1 is out. Is there a link for the 8.3 API documentation yet? Also, please make sure there's an easy-to-find link for the download.