2012-12-19 07:47 AM
Experiencing nothing but problems since we started trying to use OnCommand host for our VMware volumes.
Currently, we are having two severe issues. First, DFM seems to lose connectivity to the host agent, and all backups fail from that point on. Restarting the host service hangs, so I have to reboot the server to restore functionality.
Second, the local backup schedule isn't being followed consistently. I have it set to keep hourlies for a day, dailies for a week, weeklies for a month, and monthlies for 3 months. Right now, however, I have the last hourly, an hourly from 9 days ago, and a daily from 2 weeks ago (anything older is a snapshot from the previous backup setup). The behavior also isn't consistent across the datasets: one of the other datasets has a few more dailies kept, even though all of them are on the same local policy.
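To make the expected retention concrete, here is a minimal Python sketch that checks a list of backups against the schedule described above. It is only an illustration, not how DFM or OnCommand Host actually evaluates retention: the `RETENTION` table, the `expired_backups` function, and the 30-day approximation of a month are all assumptions made for this example.

```python
from datetime import datetime, timedelta

# Retention windows taken from the post. Treating a month as 30 days
# is an assumption of this sketch, not something the product defines.
RETENTION = {
    "hourly": timedelta(days=1),
    "daily": timedelta(weeks=1),
    "weekly": timedelta(days=30),
    "monthly": timedelta(days=90),
}


def expired_backups(backups, now):
    """Return the (retention_class, created_at) pairs that are older
    than the retention window for their class."""
    return [
        (cls, created)
        for cls, created in backups
        if now - created > RETENTION[cls]
    ]


# The backups currently present, as described in the post.
now = datetime(2012, 12, 19, 7, 47)
observed = [
    ("hourly", now - timedelta(hours=1)),  # the last hourly
    ("hourly", now - timedelta(days=9)),   # hourly from 9 days ago
    ("daily", now - timedelta(weeks=2)),   # daily from 2 weeks ago
]
stale = expired_backups(observed, now)
```

Under these assumptions, `stale` holds the 9-day-old hourly and the 2-week-old daily, both of which the policy should already have expired. The sketch does not cover the other half of the symptom, the recent hourlies and dailies that are missing.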
I have a case open with support about this, but I've been getting very little traction (no contact in the last 2 days, despite attempts on my part to contact them) and am looking for any help I can get.
Not sure what info will be needed to troubleshoot, but here are some salient details: OC core 5.1, OC host 1.2, FAS3210s. Both software packages are installed on Win 2k8 R2-64, all living on the same subnet. OC Host is installed on the VMware vCenter server, and VMware is all on 4.1.
Thanks for any help you may be able to provide.
2012-12-19 08:06 AM
Your problem description reads like a server resource problem.
You stated that OCHP is installed on the Win2k8 vCenter server.
1) Do you have MS hotfix 2577795 installed on that server?
2) Are the resources reserved for the vCenter/OCHP VM (assuming it is one)?
3) What CPU/RAM resources does the server have reserved?
2012-12-19 08:50 AM
Thanks for your reply.
Downloading the hotfix now.
As for your other questions: we don't really have resources reserved for any of our VMs, since we generally run at 25 to 50 percent of our VMware cluster capacity. To be clear, OC core is installed on a VM as well. Utilization on that server tends to run very low too. Taking our cluster utilization into consideration, would you still recommend that I reserve resources?
2012-12-19 09:03 AM
Always reserve the recommended resources for OCUM and OCHP servers to prevent resource issues.
That hotfix should be applicable for any Win2k8 server, so perhaps you already had it installed?
C:\>systeminfo |findstr 2577795
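If the same check needs to be repeated on more than one server, it can be scripted. The following is a rough Python sketch, assuming Python 3.7 or later is available on the Windows server. The function names are made up for this example, and the code assumes hotfixes appear in the `systeminfo` output as `KB` followed by the number.

```python
import re
import subprocess


def hotfix_ids(systeminfo_text):
    """Extract KB numbers (digits only) from systeminfo-style text."""
    return set(re.findall(r"\bKB(\d+)\b", systeminfo_text, flags=re.IGNORECASE))


def has_hotfix(systeminfo_text, kb_number):
    """Return True if the given KB number appears in the text."""
    return str(kb_number) in hotfix_ids(systeminfo_text)


def local_systeminfo():
    """Run systeminfo on the local Windows machine and return its output."""
    result = subprocess.run(
        ["systeminfo"], capture_output=True, text=True, check=True
    )
    return result.stdout
```

On the server itself, `has_hotfix(local_systeminfo(), 2577795)` would return `True` only if the hotfix is listed. Unlike the plain `findstr` search, this matches the whole KB number rather than any line containing those digits.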
2012-12-19 09:18 AM
I get no result from the systeminfo command, nor did I find the hotfix listed in "control panel>programs & features>view installed updates" (based on a search for the number).
Is there a whitepaper on the recommended resources for these VMs? If it helps, we're only running 2 filers with 3 total controllers.
2012-12-19 10:08 AM
Apologies, the hotfix is applicable to any Windows Server 2008 R2 server.
There is no published documentation that I am aware of for the reserved resources, although a request has been made to add it to the Installation/Setup guides.
Reserving server resources has resolved "out of memory" and missed scheduled job conditions in the past, particularly on the OCHP server, which kicks off the backup jobs and then registers them to UM once completed (similar to SnapManager product integration scenarios).
2012-12-19 11:25 AM
I reserved the entire memory allocated to each server, and 10000 MHz CPU.
The vCenter/OCHP server is Server 2008 R2 standard.
This is the error I'm currently seeing in the failed jobs (somewhat of a new symptom, in that a reboot hasn't cleared it up): OnCommandHSVMware: hsBackup8 1ddbaefad80a96414abc3b00bf865b18: Failed to connect to vCenter Server <servername>.
2012-12-20 02:04 PM
Just to update, I fixed the problem that was causing backups to outright fail, so I'll monitor it to see if the resource reservations make a difference.
I did, however, notice that the one dataset that mirroring isn't currently functional on (it wouldn't initiate the mirror due to space constraints) seems to have had no problems retaining backups to schedule. Is it possible that the problem is a conflict between the local backup policy and the storage service being used for mirroring?
edit: I somewhat lied above: upon looking into the dataset in more detail, it decided to create the mirror some time after the first try failed.
So to summarize, the current situation is that one dataset is working right and 5 others aren't.