ONTAP Discussions

VSC / SMVI backup - last run status

gdefevere
13,280 Views

One of my four VSC backup jobs is still running. No snapshot has been taken, and the job has been running for 14 hours, so it is effectively hung. Right-clicking the job doesn't let me do anything.

Restarting the services will probably fix the problem, but I want to know what caused it. As you can see in the attachment, it was the first job that hung; the jobs that ran later completed fine. How can we act on this one job only (without restarting the services)?

paulstringfellow

I have seen this before. In my experience there are two things to look at.

The first is trying to take VMware-consistent snapshots as part of a VSC job.

The second is the option in VMware Tools, inside the VM, that removes its ability to take VMware snapshots.

I've seen this happen on a number of occasions, and it is normally VMware's rather patchy snapshotting that breaks the NetApp jobs.

gdefevere

I restarted the SMVI and VSC services, but that didn't help!

Later on, I found this in another discussion: ... seems that the file that stores the backup configuration stores the status of the job. I hacked the xml file and changed "running" to "success" then restarted the service and everything seems ok. Brilliant... it lives here: \Program Files\NetApp\Virtual Storage Console\smvi\server\repository\backups.xml

I didn't adjust the backup config file, since I was wondering how this job would react to the next scheduled backup. As I expected, it executed, but it failed. I found this in the log: "2012-12-19 19:16:50,615 ERROR - Failed to rename snapshot ..." This is probably due to the hung job from yesterday.

I still don't know what the cause was (there is no log file from the hung backup job).
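For anyone who wants to scan the SMVI log for similar failures, here is a minimal Python sketch. The pattern is inferred only from the single log line quoted above, so the real log layout may differ, and the function name `find_errors` is made up for illustration.

```python
import re
from datetime import datetime

# Layout assumed from the one quoted line:
# "<date> <time>,<millis> <LEVEL> - <message>"
LOG_LINE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3})\s+"
    r"(?P<level>[A-Z]+)\s+-\s+(?P<msg>.*)$"
)


def find_errors(lines):
    """Return (timestamp, message) pairs for every line logged at ERROR level."""
    hits = []
    for line in lines:
        match = LOG_LINE.match(line.strip())
        if match and match.group("level") == "ERROR":
            # %f accepts the three millisecond digits after the comma.
            ts = datetime.strptime(match.group("ts"), "%Y-%m-%d %H:%M:%S,%f")
            hits.append((ts, match.group("msg")))
    return hits


if __name__ == "__main__":
    sample = ["2012-12-19 19:16:50,615 ERROR - Failed to rename snapshot ..."]
    for ts, msg in find_errors(sample):
        print(ts.isoformat(), msg)
```

Lines that do not match the assumed layout (stack traces, wrapped messages) are simply skipped.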

peixin

Hi gdefevere,

How many ESX hosts are in your vSphere environment?

If the number of ESX hosts exceeds 20, please refer to the following:

Description

This article describes the procedure to be followed to change the amount of RAM for JAVA through Virtual Storage Console (VSC) and SnapManager for Virtual Infrastructure (SMVI). The default amount of JAVA HEAP is designed for vSphere environments with 10-20 ESX hosts. If you find that your VSC is responding slowly, listing backups is taking a long time, or the VSC plugin appears to be unresponsive, the procedure below will improve the overall performance of the plugin by modifying the amount of RAM on the Virtual Machine (VM) or physical server, and increasing the amount of JAVA heap for VSC and SMVI.

This KB article will guide you through increasing the values for both the NetApp vSphere Plugin Framework service (VSC NVPF Service), and the Backup and Recovery (SMVI) service. The values listed below are guidelines only, and the values can be increased or decreased depending on your environment and configuration. Modifying the amount of JAVA HEAP allocated without increasing the 'physical memory' of the VM or host on which VSC is installed is neither recommended nor useful; without the increase in memory to the OS and other applications, increasing the JAVA HEAP will only cause further performance issues.

Procedure

It is necessary to modify the virtual machine where VSC is running, to ensure that it has enough 'physical memory'. The following changes assume that you are running VSC on the same VM as vCenter. You might need to change these values depending on what is installed on the VSC server, and how much memory pressure is on the VM.

  1. Modify the VM so it has more RAM:
     
    • Go into vCenter and check the host on which the vCenter VM is running
    • Close the vSphere client connection to the vCenter
    • Open a new vSphere client connection directly to the host (remember to log in with 'root', and not admin)
    • Power off the vCenter VM
    • Edit settings on the VM and change the RAM to 16GB or higher
      Note: These values are intended for a vSphere environment of 500-1000 VMs. If your environment is larger, you might need to increase the values.
    • Power the VM back on, and check the properties to make sure it 'sees' the higher amount of RAM
  2. Modify the NetApp vSphere Plugin Framework so that it has more JAVA HEAP:
     
    • Stop the 'NetApp vSphere Plugin Framework Service' and the 'SnapManager for Virtual Infrastructure Service'
    • Open the installation directory for VSC in Windows Explorer (C:\Program Files\NetApp\Virtual Storage Console)
    • Open the wrapper directory
    • Copy the wrapper.conf file to wrapper.original
    • Edit the wrapper.conf file using WordPad
    • Locate the following section:
      Initial Java Heap Size (in MB) wrapper.java.initmemory=64
      Maximum Java Heap Size (in MB) wrapper.java.maxmemory=1024
    • Change the initmemory to '4096' and the maxmemory to '8192'
      Note: These values are intended for a vSphere environment of 500-1000 VMs. If your environment is larger, you might need to increase the values.
    • Save the wrapper.conf file and close it
  3. Modify the NetApp SnapManager for Virtual Infrastructure so that it has more JAVA HEAP:
     
    • Open the installation directory for VSC in Windows Explorer (C:\Program Files\NetApp\Virtual Storage Console)
    • Go to smvi > server > etc (The full path should be C:\Program Files\NetApp\Virtual Storage Console\smvi\server\etc)
    • Copy the wrapper.conf file to wrapper.original
    • Locate the following section:
      Initial Java Heap Size (in MB) wrapper.java.initmemory=64
      Maximum Java Heap Size (in MB) wrapper.java.maxmemory=512
    • Change the initmemory to '4096' and the maxmemory to '8192'
      Note: These values are intended for a vSphere environment of 500-1000 VMs. If your environment is larger, you might need to increase the values.
    • Save the wrapper.conf file and close it
  4. Restart the 'NetApp vSphere Plugin Framework Service' and the 'SnapManager for Virtual Infrastructure Service'.
  5. Reopen the vSphere client to the vCenter server. If the plugin icon isn't appearing yet, refresh the vSphere client, or go into 'Plugin Manager' and re-enable it.
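The wrapper.conf edits in steps 2 and 3 can also be scripted. The following is a rough Python sketch, not a NetApp-provided tool: the function names `set_heap` and `update_file` are made up for illustration, and it assumes the two properties appear as plain `key=value` lines as shown in the procedure. Stop both services before running anything like this.

```python
import shutil
from pathlib import Path


def set_heap(conf_text, init_mb, max_mb):
    """Return conf_text with the two Java heap properties set to new values."""
    wanted = {
        "wrapper.java.initmemory": init_mb,
        "wrapper.java.maxmemory": max_mb,
    }
    out = []
    for line in conf_text.splitlines():
        key = line.split("=", 1)[0].strip()
        if key in wanted:
            # Commented-out lines start with '#', so they never match a key.
            out.append(f"{key}={wanted[key]}")
        else:
            out.append(line)
    return "\n".join(out) + "\n"


def update_file(conf_path, init_mb=4096, max_mb=8192):
    """Copy wrapper.conf to wrapper.original (once), then rewrite the heap values."""
    conf = Path(conf_path)
    backup = conf.with_name("wrapper.original")
    if not backup.exists():
        shutil.copyfile(conf, backup)
    conf.write_text(set_heap(conf.read_text(), init_mb, max_mb))
```

The defaults of 4096 and 8192 are the values from the procedure; pass other numbers to `update_file` if your environment calls for them. Keeping the text transformation in `set_heap` separate from the file handling makes it easy to check the result on a copy before touching the live file.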

gdefevere

Hi,

At the moment we have 16 ESX hosts, and according to our VMware administrator we are using the default settings. Since we don't have more than 20, I think I can leave it this way?

This is the first time I have seen this "RUNNING" job, but when I opened the backups.xml file and searched for the word "RUNNING", there were more entries. Backups are now failing because of snapshot rename issues. We have had these snapshot rename issues and backup failures before. Maybe they are related to each other.
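The search I did by hand can be repeated with a small read-only Python sketch. It treats the file as plain text because I don't know the exact XML layout, and the function name `find_status_lines` is just for illustration.

```python
def find_status_lines(path, word="RUNNING"):
    """Return (line_number, text) for each line containing word, ignoring case."""
    hits = []
    with open(path, encoding="utf-8", errors="replace") as fh:
        for number, line in enumerate(fh, start=1):
            if word.lower() in line.lower():
                hits.append((number, line.strip()))
    return hits
```

Point it at the backups.xml path quoted earlier in the thread. It only reads the file, so it is safe to run while deciding whether to edit anything.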

peixin

Hi

You can leave it.

From your description, the VSC backups are failing because of the snapshot rename failure.

This is occurring because a previous SMVI job left an existing snapshot with the same name on the filer, which SMVI is not aware of. When viewing existing snapshots for the volume (in this case /vol/vmware), you will see an existing snapshot whose name ends in _recent:

Volume vmware
working...
date name
------------ --------
Nov 24 03:00 smvi__vmname_recent
Nov 23 03:00 smvi__vmname_20091123030000
Nov 22 03:00 smvi__vmname_20091122030000
Nov 21 03:00 smvi__vmname_20091121030000
Nov 20 03:00 smvi__vmname_20091120030000
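Spotting the leftover snapshot in a listing like the one above can be sketched in Python. This assumes the four-column layout shown (month, day, time, name); the function name `stale_recent_snapshots` is hypothetical, and the sketch only reads text that you have already captured from the filer.

```python
def stale_recent_snapshots(listing):
    """Return snapshot names ending in '_recent' from a snapshot listing."""
    names = []
    for line in listing.splitlines():
        parts = line.split()
        # Data rows look like: "Nov 24 03:00 smvi__vmname_recent".
        # Header and separator rows have fewer than four fields.
        if len(parts) == 4 and parts[3].endswith("_recent"):
            names.append(parts[3])
    return names


if __name__ == "__main__":
    listing = """Volume vmware
working...
date name
------------ --------
Nov 24 03:00 smvi__vmname_recent
Nov 23 03:00 smvi__vmname_20091123030000
"""
    print(stale_recent_snapshots(listing))
```

Any name it returns is a candidate for the rename described in the solution that follows.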

Solution

"Quick" resolution:

  • Open FilerView and rename the _recent snapshot belonging to the failed backup job to an alternate name.
  • In SMVI, retest the backup and ensure that this allows SMVI to properly create a backup named _recent.

Recommended course of action for permanent resolution:

  • In SMVI, create a new backup job with a different name than the original, but the same properties.
  • In SMVI, turn off the existing backup so that it no longer runs on a schedule.
  • In SMVI, turn on the schedule for the new backup job.
  • Over time, as your backups would normally reach their retention period, manually delete the oldest SMVI backup and snapshot.

gdefevere

I have had these a couple of times before and already knew what to do. The earlier occurrences were probably also caused by this kind of hung job without our knowing it, because when a job hangs you get no (success/warning/failed) mail, since it is still running. What we really need to figure out is why it gets stuck somewhere without anything being logged (as far as I can see). Now it's running and I'm on holiday for more than a week. I hope it keeps running.

Thanks for the feedback.
