Data Protection

KVM plugin failure

pierrek
9,780 Views

Dear All,

I can't make the SC KVM plugin work, it always fails in the discovery phase with the following error about 10 minutes after the discover or backup operation is started:

Before the failure is returned, the KVM host is almost frozen due to the high amount on IOs observed on local disks. A couple of java processes (the SC agent) are apparently responsible for the freeze.

The exact error message is provided here:

[2014-03-03 15:05:34,345] DEBUG: Workflow : discover started with workflow id : 7

[2014-03-03 15:05:34,345] DEBUG: Version: Snap Creator Framework 4.1.0

[2014-03-03 15:05:34,345] DEBUG: Profile: init

[2014-03-03 15:05:34,345] DEBUG: Config: kvm

[2014-03-03 15:05:34,345] DEBUG: Action: discover

[2014-03-03 15:05:34,346] DEBUG: Plugin: kvm

[2014-03-03 15:05:34,346] DEBUG: Policy: null

[2014-03-03 15:05:34,346] DEBUG: Volume Name: detect

[2014-03-03 15:05:34,346] DEBUG: Snapshot Name: kvm_20140303150534

########## Agent validation ##########

[2014-03-03 15:05:34,559] INFO: Agent validation completed successfully for agent localhost:9090

########## Plugin validation ##########

[2014-03-03 15:05:34,584] INFO: Plugin validation completed successfully for plugin kvm

########## Application Discovery ##########

[2014-03-03 15:05:34,757] INFO: Performing discovery on : kvm

[2014-03-03 15:09:09,013] ERROR: SCF-00028: Auto Discovery for plug-in [kvm] failed with error [java.net.ConnectException: ConnectException invoking https://localhost:9090/SnapCreator/operations/status/059225cf-e871-4ee4-af7b-14b5e37776e4/7: Connection refused] and exit code [-1], Exiting!

[2014-03-03 15:09:09,802] DEBUG: Workflow : discover_OnFailure started with workflow id : 8

[2014-03-03 15:09:09,968] DEBUG: Version: Snap Creator Framework 4.1.0

[2014-03-03 15:09:09,969] DEBUG: Profile: init

[2014-03-03 15:09:09,969] DEBUG: Config: kvm

[2014-03-03 15:09:09,969] DEBUG: Action: discover_OnFailure

[2014-03-03 15:09:09,998] DEBUG: Plugin: kvm

[2014-03-03 15:09:09,998] DEBUG: Policy: null

[2014-03-03 15:09:10,034] DEBUG: Volume Name: detect

[2014-03-03 15:09:10,034] DEBUG: Snapshot Name: kvm_20140303150534

########## Agent Workflow Finalization ##########

[2014-03-03 15:10:42,366] INFO: Agent Workflow Finalization started

[2014-03-03 15:11:41,584] WARN: SCF-00080: Agent finalization for workflow failed with warning [java.net.SocketException: SocketException invoking http://localhost:9090/SnapCreator/Daemon/Agent: Unexpected end of file from server]. Continuing.

########## Snap Creator Framework 4.1.0 failed ##########

[2014-03-03 15:11:49,679] INFO: Pre Exit commands are not defined. Skipping !

I have no idea where to look at for debugging this as no log file (except an unknown.log) is created under the logs/plugins directory of the scAgent installation path.

Any help would be highly appreciated.

Kind regards

Pierre Kestremond

11 REPLIES 11

pierrek
9,712 Views

The only strange message I found in the agent.log for the above job is shown underlined below:

[2014-03-03T15:05:41,684+0100] DEBUG [pool-2-thread-3] com.netapp.snapcreator.agent.nextgen.plugin.legacyservice.ExecutorImpl - stdoutLogger running=true

[2014-03-03T15:05:41,685+0100] DEBUG [pool-2-thread-3] com.netapp.snapcreator.agent.nextgen.plugin.legacyservice.ExecutorImpl - stderrLogger running=true

[2014-03-03T15:05:43,645+0100] DEBUG [Thread-15] com.netapp.snapcreator.agent.nextgen.plugin.legacyservice.StreamLogger - run() - type=stdout, wid=7, p=kvm - 2014-03-03 15:05:43 (pid=17342) - Deleting request plugins/wrapper/in/7+kvm+1393855539683+discover

[2014-03-03T15:05:43,646+0100] DEBUG [Thread-15] com.netapp.snapcreator.agent.nextgen.plugin.legacyservice.StreamLogger - run() - type=stdout, wid=7, p=kvm - 2014-03-03 15:05:43 (pid=17342) - Will execute discover now (plugin=kvm, workflowId=7)

[2014-03-03T15:05:43,646+0100] DEBUG [Thread-15] com.netapp.snapcreator.agent.nextgen.plugin.legacyservice.StreamLogger - run() - type=stdout, wid=7, p=kvm - 2014-03-03 15:05:43 (pid=17342) - Config Parameters: ACTION, ALLOW_IP_ADDR, APP_AUTO_DISCOVERY, APP_CONF_PERSISTENCE, APP_DEFINED_BACKUP, APP_DEFINED_CLONE, APP_DEFINED_MOUNT, APP_DEFINED_RESTORE, APP_DEFINED_UMOUNT, APP_IGNORE_ERROR, APP_NAME, APP_VM, ARCHIVE_LOG_RECURSIVE_SEARCH, CMODE_SET, CMODE_SNAPSHOT_FORCE_DELETE, CONFDIR, CONFIG_NAME, CONFIG_TYPE, CONFNAME, KVM_CMD_RETRY_COUNT, KVM_VM_CONFIG_DIR, KVM_VM_MAPPING, LOG_NUM, LOG_TRACE_ENABLE, NTAP_ASUP_ERROR_ENABLE, NTAP_CIFS_EXPORT_ENABLE, NTAP_CLONE_FOR_BACKUP, NTAP_CLONE_SECONDARY, NTAP_CONSISTENCY_GROUP_SNAPSHOT, NTAP_CONSISTENCY_GROUP_TIMEOUT, NTAP_CONSISTENCY_GROUP_WAFL_SYNC, NTAP_EXTERNAL_SNAPSHOT_REGEX, NTAP_LUN_CLONE_RESERVATION, NTAP_NFS_EXPORT_ACCESS, NTAP_NFS_EXPORT_PERSISTENT, NTAP_NUM_VOL_CLONES, NTAP_OSSV_ENABLE, NTAP_PM_RUN_BACKUP, NTAP_PM_UPDATE, NTAP_PWD_PROTECTION, NTAP_SNAPMIRROR_CASCADING_UPDATE, NTAP_SNAPMIRROR_UPDATE, NTAP_SNAPMIRROR_USE_SNAPSHOT, NTAP_SNAPMIRROR_WAIT, NTAP_SNAPSHOT_CLEANUP, NTAP_SNAPSHOT_DELETE_BY_AGE_ONLY, NTAP_SNAPSHOT_DEPENDENCY_IGNORE, NTAP_SNAPSHOT_DISABLE, NTAP_SNAPSHOT_NODELETE, NTAP_SNAPSHOT_RESTORE_AUTO_DETECT, NTAP_SNAPSHOT_RETENTIONS, NTAP_SNAPSHOT_RETENTION_AGE, NTAP_SNAPVAULT_NODELETE, NTAP_SNAPVAULT_RESTORE_WAIT, NTAP_SNAPVAULT_SNAPSHOT, NTAP_SNAPVAULT_UPDATE, NTAP_SNAPVAULT_WAIT, NTAP_TIMEOUT, NTAP_USERS, NTAP_USE_EXTERNAL_SNAPSHOT, NTAP_VOL_CLONE_RESERVE, OM_EVENT_GENERATE, OPERATION_ID, PORT, PROFILE_NAME, RECORD_ID, SC_AGENT, SC_AGENT_LOG_ENABLE, SC_AGENT_TIMEOUT, SC_AGENT_UNQUIESCE_TIMEOUT, SC_AGENT_WATCHDOG_ENABLE, SC_ITERATION_ID, SC_SERVER_PORT, SC_USER_NAME, SNAME, SNAME_COMBINED, SNAPDRIVE, SNAPDRIVE_DISCOVERY, SNAP_TIME, SNAP_TIMESTAMP_ONLY, SUCCESS_MSG, SVNAME_COMBINED, TRANSPORT, USE_GLOBAL_CONFIG, USE_PROXY, VOLUMES, obj,

[2014-03-03T15:05:43,647+0100] DEBUG [Thread-15] com.netapp.snapcreator.agent.nextgen.plugin.legacyservice.StreamLogger - run() - type=stdout, wid=7, p=kvm - Bin dir: /opt/NetApp/scAgent4.1.0/plugins/wrapper

[2014-03-03T15:05:43,647+0100] DEBUG [Thread-15] com.netapp.snapcreator.agent.nextgen.plugin.legacyservice.StreamLogger - run() - type=stdout, wid=7, p=kvm - Plugin dir: /opt/NetApp/scAgent4.1.0/plugins/

[2014-03-03T15:05:43,647+0100] DEBUG [Thread-15] com.netapp.snapcreator.agent.nextgen.plugin.legacyservice.StreamLogger - run() - type=stdout, wid=7, p=kvm - Perl Plugin dir: /opt/NetApp/scAgent4.1.0/plugins/perl

[2014-03-03T15:05:43,647+0100] DEBUG [Thread-15] com.netapp.snapcreator.agent.nextgen.plugin.legacyservice.StreamLogger - run() - type=stdout, wid=7, p=kvm - Getting instance for kvm

[2014-03-03T15:05:43,685+0100] DEBUG [pool-2-thread-3] com.netapp.snapcreator.agent.nextgen.plugin.legacyservice.WrapperHelperImpl - execute() - Result file doesn't exist

[2014-03-03T15:05:43,685+0100] DEBUG [pool-2-thread-3] com.netapp.snapcreator.agent.nextgen.plugin.legacyservice.ExecutorImpl - stdoutLogger running=true

[2014-03-03T15:05:43,685+0100] DEBUG [pool-2-thread-3] com.netapp.snapcreator.agent.nextgen.plugin.legacyservice.ExecutorImpl - stderrLogger running=true

[2014-03-03T15:05:43,915+0100] DEBUG [Thread-15] com.netapp.snapcreator.agent.nextgen.plugin.legacyservice.StreamLogger - run() - type=stdout, wid=7, p=kvm - Executing discover

[2014-03-03T15:05:43,935+0100] DEBUG [Thread-15] com.netapp.snapcreator.agent.nextgen.plugin.legacyservice.StreamLogger - run() - type=stdout, wid=7, p=kvm - OS - execute() - trying to run ls /etc/libvirt/qemu

[2014-03-03T15:05:43,944+0100] DEBUG [Thread-16] com.netapp.snapcreator.agent.nextgen.plugin.legacyservice.StreamLogger - run() - type=stderr, wid=7, p=kvm - could not find ParserDetails.ini in /tmp/pdk-root-17342/XML/SAX

[2014-03-03T15:05:44,443+0100] DEBUG [Thread-15] com.netapp.snapcreator.agent.nextgen.plugin.legacyservice.StreamLogger - run() - type=stdout, wid=7, p=kvm - OS - execute() - trying to run mount | grep nfs

[2014-03-03T15:05:44,474+0100] DEBUG [Thread-16] com.netapp.snapcreator.agent.nextgen.plugin.legacyservice.StreamLogger - run() - type=stderr, wid=7, p=kvm - Deep recursion on subroutine "Net::SSLeay::AUTOLOAD" at

[2014-03-03T15:05:44,474+0100] DEBUG [Thread-16] com.netapp.snapcreator.agent.nextgen.plugin.legacyservice.StreamLogger - run() - type=stderr, wid=7, p=kvm -   /</opt/NetApp/scAgent4.1.0/plugins/wrapper/wrapper>Net/SSLeay.pm line 349 (#1)

[2014-03-03T15:05:44,474+0100] DEBUG [Thread-16] com.netapp.snapcreator.agent.nextgen.plugin.legacyservice.StreamLogger - run() - type=stderr, wid=7, p=kvm -     (W recursion) This subroutine has called itself (directly or indirectly)

[2014-03-03T15:05:44,474+0100] DEBUG [Thread-16] com.netapp.snapcreator.agent.nextgen.plugin.legacyservice.StreamLogger - run() - type=stderr, wid=7, p=kvm -     100 times more than it has returned.  This probably indicates an

[2014-03-03T15:05:44,475+0100] DEBUG [Thread-16] com.netapp.snapcreator.agent.nextgen.plugin.legacyservice.StreamLogger - run() - type=stderr, wid=7, p=kvm -     infinite recursion, unless you're writing strange benchmark programs, in

[2014-03-03T15:05:44,475+0100] DEBUG [Thread-16] com.netapp.snapcreator.agent.nextgen.plugin.legacyservice.StreamLogger - run() - type=stderr, wid=7, p=kvm -     which case it indicates something else.

[2014-03-03T15:05:44,475+0100] DEBUG [Thread-16] com.netapp.snapcreator.agent.nextgen.plugin.legacyservice.StreamLogger - run() - type=stderr, wid=7, p=kvm -

[2014-03-03T15:05:44,475+0100] DEBUG [Thread-16] com.netapp.snapcreator.agent.nextgen.plugin.legacyservice.StreamLogger - run() - type=stderr, wid=7, p=kvm -     This threshold can be changed from 100, by recompiling the perl binary,

[2014-03-03T15:05:44,475+0100] DEBUG [Thread-16] com.netapp.snapcreator.agent.nextgen.plugin.legacyservice.StreamLogger - run() - type=stderr, wid=7, p=kvm -     setting the C pre-processor macro PERL_SUB_DEPTH_WARN to the desired value.

[2014-03-03T15:05:44,475+0100] DEBUG [Thread-16] com.netapp.snapcreator.agent.nextgen.plugin.legacyservice.StreamLogger - run() - type=stderr, wid=7, p=kvm -

[2014-03-03T15:05:44,956+0100] DEBUG [qtp1279199758-20] com.netapp.snapcreator.agent.nextgen.security.AuthorizationValidatorImpl - isAuthorized() - Incoming request from user scServer2agent (127.0.0.1, 127.0.0.1)

sivar
9,711 Views

Hello Pierre,

1) Please provide us with the Operating System detail.

2) Also, please provide us with the KVM configuration options from your snapcreator config file.

3) Please refer this thread https://communities.netapp.com/message/122252

     Does this file exist? ls /etc/libvirt/qemu

4) Are you able to run virsh commands?

    

Please let me know the details.

I shall forward the information to our KVM expert and get a feedback.

Thanks,
Siva Ramanathan

pierrek
9,711 Views

Hi Siva,

Here is the info you asked for:

1) Linux version

[root@treefrog ~]# cat /etc/redhat-release                                                                                                                                                               

Fedora release 19 (Schrödinger’s Cat)

[root@treefrog ~]# uname -a

Linux treefrog.netapp.com 3.12.11-201.fc19.x86_64 #1 SMP Fri Feb 14 19:08:33 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux

[root@treefrog ~]#

2) the SC config file is included here:

[root@treefrog init]# grep -v "^#" kvm.conf

CONFIG_TYPE=STANDARD

VOLUMES=auto:detect

META_DATA_VOLUME=

LOG_NUM=30

SNAPDRIVE=N

SNAPDRIVE_DISCOVERY=N

LOG_TRACE_ENABLE=N

NTAP_PWD_PROTECTION=Y

USE_PROXY=N

NTAP_TIMEOUT=60

USE_GLOBAL_CONFIG=N

FEDERATED_APPLICATIONS=

ALLOW_IP_ADDR=N

SC_TMP_DIR=

NTAP_USERS=vs_kvm_pierrek_admin:vsadmin/793563786f416f4a6661735434714e3479484e3872413d3d0a

MANAGEMENT_INTERFACES=

PORT=443

TRANSPORT=HTTPS

SECONDARY_INTERFACES=

CMODE_CLUSTER_NAME=

CMODE_CLUSTER_USERS=

CMODE_SNAPSHOT_FORCE_DELETE=Y

CMODE_SET=Y

SNAME=kvm

SNAP_TIMESTAMP_ONLY=Y

NTAP_SNAPSHOT_POLICIES=

NTAP_USE_EXTERNAL_SNAPSHOT=N

NTAP_SNAPSHOT_NODELETE=N

NTAP_CONSISTENCY_GROUP_TIMEOUT=MEDIUM

NTAP_SNAPSHOT_RETENTIONS=DAILY:7

NTAP_SNAPSHOT_CREATE_CMD01=

NTAP_CONSISTENCY_GROUP_WAFL_SYNC=N

NTAP_CONSISTENCY_GROUP_SNAPSHOT=N

NTAP_SNAPSHOT_RETENTION_AGE=7

NTAP_METADATA_SNAPSHOT_CREATE_CMD=

NTAP_SNAPSHOT_DELETE_BY_AGE_ONLY=N

NTAP_SNAPSHOT_DEPENDENCY_IGNORE=N

NTAP_SNAPSHOT_RESTORE_AUTO_DETECT=N

NTAP_SNAPSHOT_DISABLE=N

NTAP_SNAPSHOT_LABEL=

NTAP_EXTERNAL_SNAPSHOT_REGEX=.*

NTAP_SNAPSHOT_CLEANUP=N

NTAP_SNAPVAULT_UPDATE=N

SNAPVAULT_QTREE_INCLUDE=

NTAP_SNAPVAULT_NODELETE=N

NTAP_SNAPVAULT_RESTORE_WAIT=N

NTAP_SNAPVAULT_RETENTION_AGE=

NTAP_SNAPVAULT_SNAPSHOT=N

NTAP_SNAPVAULT_MAX_TRANSFER=

NTAP_SNAPVAULT_WAIT=0

NTAP_SNAPVAULT_RETENTIONS=

SNAPVAULT_VOLUMES=

NTAP_SNAPMIRROR_WAIT=60

NTAP_SNAPMIRROR_USE_SNAPSHOT=N

NTAP_SNAPMIRROR_MAX_TRANSFER=

SNAPMIRROR_VOLUMES=

SNAPMIRROR_CASCADING_VOLUMES=

NTAP_SNAPMIRROR_CASCADING_UPDATE=N

NTAP_SNAPMIRROR_UPDATE=N

NTAP_CLONE_IGROUP_MAP=

NTAP_NFS_EXPORT_ACCESS=read-only

NTAP_LUN_CLONE_RESERVATION=N

NTAP_CLONE_SECONDARY_VOLUMES=

NTAP_NUM_VOL_CLONES=1

NTAP_NFS_EXPORT_PERSISTENT=N

NTAP_CLONE_FOR_BACKUP=N

NTAP_CIFS_EXPORT_ENABLE=N

NTAP_CLONE_SECONDARY=N

NTAP_NFS_EXPORT_HOST=

NTAP_VOL_CLONE_RESERVE=NONE

NTAP_DFM_DATA_SET=

NTAP_PM_RUN_BACKUP=N

NTAP_PM_UPDATE=N

NTAP_DFM_SNAPSHOT_FORMAT=

OM_TRANSPORT=

OM_HOST=

OM_EVENT_GENERATE=N

OM_USER=

OM_PWD=

OM_PORT=

NTAP_OSSV_FS_SNAPSHOT=

NTAP_OSSV_FS_SNAPSHOT_CREATE_CMD01=

NTAP_OSSV_HOMEDIR=

NTAP_OSSV_ENABLE=N

UMOUNT_CMD01=

ARCHIVE_CMD01=

APP_UNQUIESCE_CMD01=

MOUNT_CMD01=

APP_QUIESCE_CMD01=

PRE_CLONE_CREATE_CMD01=

PRE_RESTORE_CMD01=

PRE_NTAP_CMD01=

PRE_APP_UNQUIESCE_CMD01=

PRE_EXIT_CMD01=

PRE_APP_QUIESCE_CMD01=

PRE_NTAP_CLONE_DELETE_CMD01=

POST_NTAP_DATA_TRANSFER_CMD01=

POST_RESTORE_CMD01=

POST_APP_QUIESCE_CMD01=

POST_NTAP_CMD01=

POST_CLONE_CREATE_CMD01=

POST_APP_UNQUIESCE_CMD01=

SUCCESS_MSG=INFO: NetApp Snap Creator Framework finished successfully "(Action: %ACTION)"

NTAP_ASUP_ERROR_ENABLE=N

SENDTRAP=

SUCCESS_TRAP=

SC_AGENT_UNQUIESCE_TIMEOUT=305

SC_AGENT_WATCHDOG_ENABLE=N

SC_AGENT=localhost:9090

SC_AGENT_LOG_ENABLE=N

SC_AGENT_TIMEOUT=600

SC_CLONE_TARGET=

APP_NAME=kvm

APP_CONF_PERSISTENCE=Y

APP_IGNORE_ERROR=N

APP_DEFINED_RESTORE=Y

PYTHON_OPTS=

APP_DEFINED_CLONE=N

JAVA_HOME=

PYTHON_HOME=

FS_NAME=

JVM_ARGS=

PERL_OPTS=

PERL_HOME=

PLUGINS_FORWARD=

VALIDATE_VOLUMES=

APP_AUTO_DISCOVERY=Y

JAVA_CLASSPATH=

PLUGINS_RESTORE=

APP_DEFINED_BACKUP=N

APP_DEFINED_UMOUNT=N

PLUGINS_REVERSE=

APP_DEFINED_MOUNT=N

ARCHIVE_LOG_RECURSIVE_SEARCH=N

ARCHIVE_LOG_DIR=

ARCHIVE_LOG_EXT=

ARCHIVE_LOG_RETENTION=

ARCHIVE_LOG_ENABLE=

KVM_CMD_RETRY_COUNT=3

KVM_VM_MAPPING=auto:detect:detect

KVM_VM_CONFIG_DIR=/etc/libvirt/qemu

[root@treefrog init]#

3) here is the content of the /etc/libvirt/qemu directory:

[root@treefrog init]# ls -l /etc/libvirt/qemu

total 52

-rw-------. 1 root root 2868 Dec 24 11:23 arch-linux-01.xml

-rw-------. 1 root root 2802 Dec 20 16:32 arch-linux-02.xml

-rw-------. 1 root root 2852 Jan 26 08:12 fedora18-01.xml

-rw-------. 1 root root 2956 Jan 31 13:40 netapp7.xml

drwx------. 3 root root 4096 Jan 17 03:12 networks

-rw-------. 1 root root 5719 Jan 19 13:23 oel6-01.xml

-rw-------. 1 root root 5120 Dec 31 14:41 oel6-01.xml.save

-rw-------. 1 root root 2313 Jan  8 12:57 pierrek-lw7.xml

-rw-------. 1 root root 2915 Mar  3 11:28 tiny-centos64-template.xml  --> This one is located on NFS storage exported from my SVM

-rw-------. 1 root root 2877 Mar  3 09:15 tiny-centos64.xml

-rw-------. 1 root root 2546 Dec 21 12:57 win-2k8r2-02.xml

[root@treefrog init]#

I can run virsh commands as you can see here:

[root@treefrog init]# virsh list --all

Id    Name                           State

----------------------------------------------------

3     tiny-centos64-template         running

-     arch-linux-01                  shut off

-     arch-linux-02                  shut off

-     fedora18-01                    shut off

-     netapp7                        shut off

-     oel6-01                        shut off

-     pierrek-lw7                    shut off

-     tiny-centos64                  shut off

-     win-2k8r2-02                   shut off

[root@treefrog init]#

Thanks already for your help.

Pierre

sivar
9,711 Views

We have not qualified or tested Fedora with Snap Creator. (

Meanwhile, could you please check by disbaling selinux?

http://docs.fedoraproject.org/en-US/Fedora/13/html/Security-Enhanced_Linux/sect-Security-Enhanced_Linux-Working_with_SELinux-Enabling_and_Disabling_SE...

I see that the files have ACL information

[root@treefrog init]# ls -l /etc/libvirt/qemu

total 52

-rw-------. 1 root root 2868 Dec 24 11:23 arch-linux-01.xml

Where is your snapcreator server running?

Also, please try with SC_AGENT=hostname:9090 (instead of localhost).

Please ensure /etc/hosts on the agent includes the storage controller IPs. (or in your DNS)

Thanks,
Siva Ramanathan

pierrek
9,711 Views

Hi Siva,

I changed the SC_AGENT parameter in the config file to the FQDN of the KVM host (in this test setup I am running the agent and server on the same machine, the KVM host).

Now I am getting the below error:

It is different from the one I had before. I will now disable SELINUX (I will do it only for troubleshooting but selinux should be in enforcing mode on my host). After a reboot I will let you know the result.

Kind regards

Pierre

pierrek
9,711 Views

Sorry the screenshot was not added. Here is the debug output of the job:

[2014-03-03 17:06:18,597] DEBUG: Workflow : discover started with workflow id : 9

[2014-03-03 17:06:18,600] DEBUG: Version: Snap Creator Framework 4.1.0

[2014-03-03 17:06:18,600] DEBUG: Profile: init

[2014-03-03 17:06:18,600] DEBUG: Config: kvm

[2014-03-03 17:06:18,600] DEBUG: Action: discover

[2014-03-03 17:06:18,600] DEBUG: Plugin: kvm

[2014-03-03 17:06:18,600] DEBUG: Policy: null

[2014-03-03 17:06:18,600] DEBUG: Volume Name: detect

[2014-03-03 17:06:18,600] DEBUG: Snapshot Name: kvm_20140303170618

########## Agent validation ##########

[2014-03-03 17:06:19,154] INFO: Agent validation completed successfully for agent treefrog.netapp.com:9090

########## Plugin validation ##########

[2014-03-03 17:06:19,268] INFO: Plugin validation completed successfully for plugin kvm

########## Application Discovery ##########

[2014-03-03 17:06:19,560] INFO: Performing discovery on : kvm

[2014-03-03 17:18:09,207] ERROR: [treefrog.netapp.com:9090 (4.1.0.1)] Operation failed. Reason: Executor timed out. Make sure plugins/wrapper/wrapper binary has execute permission set. Increasing WRAPPER_TIMEOUT_IN_MSEC in agent.properties may help, too.

[2014-03-03 17:18:09,549] ERROR: [treefrog.netapp.com:9090(4.1.0.1)] SCF-00028: Auto Discovery for plug-in [kvm] failed with error [validation failed] and exit code [102], Exiting!

[2014-03-03 17:18:10,285] DEBUG: Workflow : discover_OnFailure started with workflow id : 10

[2014-03-03 17:18:10,310] DEBUG: Version: Snap Creator Framework 4.1.0

[2014-03-03 17:18:10,310] DEBUG: Profile: init

[2014-03-03 17:18:10,310] DEBUG: Config: kvm

[2014-03-03 17:18:10,310] DEBUG: Action: discover_OnFailure

[2014-03-03 17:18:10,356] DEBUG: Plugin: kvm

[2014-03-03 17:18:10,356] DEBUG: Policy: null

[2014-03-03 17:18:10,397] DEBUG: Volume Name: detect

[2014-03-03 17:18:10,397] DEBUG: Snapshot Name: kvm_20140303170618

########## Agent Workflow Finalization ##########

[2014-03-03 17:18:10,715] INFO: Agent Workflow Finalization started

[2014-03-03 17:18:28,203] INFO: [treefrog.netapp.com:9090 (4.1.0.1)] Finalized workflow with id 9

[2014-03-03 17:18:28,203] INFO: Agent Workflow Finalization finished successfully

########## Snap Creator Framework 4.1.0 failed ##########

[2014-03-03 17:18:29,516] INFO: Pre Exit commands are not defined. Skipping !

sivar
9,711 Views

From the treefrog host can you perform,

fping vs_kvm_pierrek_admin

pierrek
9,711 Views

Siva,

I have disabled selinux in /etc/selinux/config then the host has been rebooted.

[root@treefrog ~]# getenforce

Disabled

[root@treefrog ~]#

I have restarted the discovery job and it has failed the same way as with selinux in enforcing mode:

[2014-03-03 19:59:24,016] DEBUG: Workflow : discover started with workflow id : 11

[2014-03-03 19:59:24,018] DEBUG: Version: Snap Creator Framework 4.1.0

[2014-03-03 19:59:24,018] DEBUG: Profile: init

[2014-03-03 19:59:24,018] DEBUG: Config: kvm

[2014-03-03 19:59:24,018] DEBUG: Action: discover

[2014-03-03 19:59:24,018] DEBUG: Plugin: kvm

[2014-03-03 19:59:24,018] DEBUG: Policy: null

[2014-03-03 19:59:24,019] DEBUG: Volume Name: detect

[2014-03-03 19:59:24,019] DEBUG: Snapshot Name: kvm_20140303195923

########## Agent validation ##########

[2014-03-03 19:59:24,533] INFO: Agent validation completed successfully for agent treefrog.netapp.com:9090

########## Plugin validation ##########

[2014-03-03 19:59:24,629] INFO: Plugin validation completed successfully for plugin kvm

########## Application Discovery ##########

[2014-03-03 19:59:24,859] INFO: Performing discovery on : kvm

[2014-03-03 20:09:20,155] ERROR: SCF-00028: Auto Discovery for plug-in [kvm] failed with error [java.net.ConnectException: ConnectException invoking https://treefrog.netapp.com:9090/SnapCreator/operations/status/c19a2d6e-71d7-4195-ad28-30f78374053a/11: Connection refused] and exit code [-1], Exiting!

[2014-03-03 20:09:25,956] DEBUG: Workflow : discover_OnFailure started with workflow id : 12

[2014-03-03 20:09:25,956] DEBUG: Version: Snap Creator Framework 4.1.0

[2014-03-03 20:09:25,956] DEBUG: Profile: init

[2014-03-03 20:09:25,956] DEBUG: Config: kvm

[2014-03-03 20:09:25,956] DEBUG: Action: discover_OnFailure

[2014-03-03 20:09:25,956] DEBUG: Plugin: kvm

[2014-03-03 20:09:25,956] DEBUG: Policy: null

[2014-03-03 20:09:25,956] DEBUG: Volume Name: detect

[2014-03-03 20:09:25,956] DEBUG: Snapshot Name: kvm_20140303195923

########## Agent Workflow Finalization ##########

[2014-03-03 20:09:26,105] INFO: Agent Workflow Finalization started

[2014-03-03 20:09:46,525] WARN: SCF-00080: Agent finalization for workflow failed with warning [com.ctc.wstx.exc.WstxIOException: Connection refused]. Continuing.

########## Snap Creator Framework 4.1.0 failed ##########

[2014-03-03 20:09:47,977] INFO: Pre Exit commands are not defined. Skipping !

fping result:

[root@treefrog adm]# yum install fping

Loaded plugins: langpacks, refresh-packagekit, verify

Resolving Dependencies

--> Running transaction check

---> Package fping.x86_64 0:3.5-3.fc19 will be installed

--> Finished Dependency Resolution

Dependencies Resolved

=============================================================================================================================================================

Package                            Arch                                Version                                   Repository                            Size

=============================================================================================================================================================

Installing:

fping                              x86_64                              3.5-3.fc19                                updates                               45 k

Transaction Summary

=============================================================================================================================================================

Install  1 Package

Total download size: 45 k

Installed size: 102 k

Is this ok [y/d/N]: y

Downloading packages:

fping-3.5-3.fc19.x86_64.rpm                                                                                                           |  45 kB  00:00:00    

Running transaction check

Running transaction test

Transaction test succeeded

Running transaction

  Installing : fping-3.5-3.fc19.x86_64                                                                                                                   1/1

  Verifying  : fping-3.5-3.fc19.x86_64                                                                                                                   1/1

Installed:

  fping.x86_64 0:3.5-3.fc19                                                                                                                                 

Complete!

[root@treefrog adm]# df -h

Filesystem                        Size  Used Avail Use% Mounted on

/dev/mapper/fedora_treefrog-root   99G   19G   75G  20% /

devtmpfs                          2.9G     0  2.9G   0% /dev

tmpfs                             2.9G   84K  2.9G   1% /dev/shm

tmpfs                             2.9G  992K  2.9G   1% /run

tmpfs                             2.9G     0  2.9G   0% /sys/fs/cgroup

tmpfs                             2.9G  6.3M  2.9G   1% /tmp

/dev/sda1                         976M  149M  760M  17% /boot

/dev/mapper/fedora_treefrog-home  583G  321G  233G  59% /home

vs_kvm_pierrek:/kvm2               95G  1.3G   94G   2% /var/lib/libvirt/images/vs_kvm_pierrek

[root@treefrog adm]# fping vs_kvm_pierrek_admin

vs_kvm_pierrek_admin is alive

[root@treefrog adm]#

sivar
9,712 Views

Thank you.

This sure looks like a network issue. The process waited for 10 minutes (600 seconds) and came back with a networking related error.

[2014-03-03 19:59:24,859] INFO: Performing discovery on : kvm

[2014-03-03 20:09:20,155] ERROR: SCF-00028: Auto Discovery for plug-in [kvm] failed with error [java.net.ConnectException: ConnectException invoking https://treefrog.netapp.com:9090/SnapCreator/operations/status/c19a2d6e-71d7-4195-ad28-30f78374053a/11: Connection refused] and exit code [-1], Exiting!

What is your java version? -- please try with a latest java 7 JRE (lesser than u51 update)

Please provide us with the below output.

netstat -tulpn

Can you try disable firewall and try a backup? (I am not sure if firewall is running)

/etc/init.d/iptables stop

Thanks,
Siva Ramanathan

pierrek
8,699 Views

Siva,

Same behaviour/result with firewalld stopped/disabled:

[root@treefrog ~]# service firewalld status

Redirecting to /bin/systemctl status  firewalld.service

firewalld.service - firewalld - dynamic firewall daemon

   Loaded: loaded (/usr/lib/systemd/system/firewalld.service; disabled)

   Active: inactive (dead)

Mar 03 17:32:28 treefrog.netapp.com systemd[1]: Unit firewalld.service entered failed state.

Mar 03 17:34:01 treefrog.netapp.com systemd[1]: Started firewalld - dynamic firewall daemon.

Mar 03 17:36:10 treefrog.netapp.com firewalld[520]: 2014-03-03 17:36:10 ERROR: UNKNOWN_INTERFACE: wlp16s0

Mar 03 17:43:37 treefrog.netapp.com systemd[1]: Started firewalld - dynamic firewall daemon.

Mar 03 17:44:24 treefrog.netapp.com systemd[1]: firewalld.service: main process exited, code=exited, status=1/FAILURE

Mar 03 17:44:24 treefrog.netapp.com systemd[1]: Stopped firewalld - dynamic firewall daemon.

Mar 03 17:44:24 treefrog.netapp.com systemd[1]: Unit firewalld.service entered failed state.

Mar 03 19:25:44 treefrog.netapp.com systemd[1]: Started firewalld - dynamic firewall daemon.

Mar 03 21:55:09 treefrog.netapp.com systemd[1]: Stopping firewalld - dynamic firewall daemon...

Mar 03 21:55:14 treefrog.netapp.com systemd[1]: Stopped firewalld - dynamic firewall daemon.

[root@treefrog ~]# netstat -tulpn

Active Internet connections (only servers)

Proto Recv-Q Send-Q Local Address           Foreign Address         State       PID/Program name   

tcp        0      0 0.0.0.0:3306            0.0.0.0:*               LISTEN      1926/mysqld        

tcp        0      0 0.0.0.0:55979           0.0.0.0:*               LISTEN      -                  

tcp        0      0 0.0.0.0:139             0.0.0.0:*               LISTEN      1560/smbd          

tcp        0      0 127.0.0.1:5900          0.0.0.0:*               LISTEN      3330/qemu-system-x8

tcp        0      0 0.0.0.0:111             0.0.0.0:*               LISTEN      552/rpcbind        

tcp        0      0 192.168.122.1:53        0.0.0.0:*               LISTEN      1929/dnsmasq       

tcp        0      0 0.0.0.0:22              0.0.0.0:*               LISTEN      683/sshd           

tcp        0      0 127.0.0.1:631           0.0.0.0:*               LISTEN      534/cupsd          

tcp        0      0 127.0.0.1:25            0.0.0.0:*               LISTEN      5111/sendmail: acce

tcp        0      0 0.0.0.0:445             0.0.0.0:*               LISTEN      1560/smbd          

tcp        0      0 0.0.0.0:45439           0.0.0.0:*               LISTEN      2842/rpc.statd     

tcp6       0      0 :::9090                 :::*                    LISTEN      4841/java          

tcp6       0      0 :::53578                :::*                    LISTEN      2842/rpc.statd     

tcp6       0      0 :::139                  :::*                    LISTEN      1560/smbd          

tcp6       0      0 :::111                  :::*                    LISTEN      552/rpcbind        

tcp6       0      0 :::55700                :::*                    LISTEN      -                  

tcp6       0      0 :::22                   :::*                    LISTEN      683/sshd           

tcp6       0      0 ::1:631                 :::*                    LISTEN      534/cupsd          

tcp6       0      0 :::8443                 :::*                    LISTEN      3394/java          

tcp6       0      0 :::445                  :::*                    LISTEN      1560/smbd          

udp        0      0 192.168.122.1:53        0.0.0.0:*                           1929/dnsmasq       

udp        0      0 0.0.0.0:67              0.0.0.0:*                           1929/dnsmasq       

udp        0      0 0.0.0.0:68              0.0.0.0:*                           5071/dhclient      

udp        0      0 0.0.0.0:68              0.0.0.0:*                           640/dhclient       

udp        0      0 0.0.0.0:111             0.0.0.0:*                           552/rpcbind        

udp        0      0 0.0.0.0:123             0.0.0.0:*                           532/chronyd        

udp        0      0 0.0.0.0:323             0.0.0.0:*                           532/chronyd        

udp        0      0 0.0.0.0:712             0.0.0.0:*                           552/rpcbind        

udp        0      0 0.0.0.0:45935           0.0.0.0:*                           2842/rpc.statd     

udp        0      0 127.0.0.1:898           0.0.0.0:*                           2842/rpc.statd     

udp        0      0 0.0.0.0:5353            0.0.0.0:*                           539/avahi-daemon: r

udp        0      0 0.0.0.0:42455           0.0.0.0:*                           539/avahi-daemon: r

udp        0      0 0.0.0.0:38723           0.0.0.0:*                           -                  

udp        0      0 0.0.0.0:2909            0.0.0.0:*                           640/dhclient       

udp        0      0 0.0.0.0:7307            0.0.0.0:*                           5071/dhclient      

udp6       0      0 :::111                  :::*                                552/rpcbind        

udp6       0      0 :::123                  :::*                                532/chronyd        

udp6       0      0 :::323                  :::*                                532/chronyd        

udp6       0      0 :::712                  :::*                                552/rpcbind        

udp6       0      0 :::54054                :::*                                2842/rpc.statd     

udp6       0      0 :::58796                :::*                                -                  

udp6       0      0 :::43631                :::*                                640/dhclient       

udp6       0      0 :::48284                :::*                                5071/dhclient      

udp6       0      0 :::40425                :::*                                658/java           

[root@treefrog ~]#

java version:

bash-4.2$ java -version

java version "1.7.0_51"

OpenJDK Runtime Environment (fedora-2.4.5.0.fc19-x86_64 u51-b31)

OpenJDK 64-Bit Server VM (build 24.51-b03, mixed mode)

bash-4.2$

To me it looks like a bug in the agent code because as I told you, the system is becoming unusable just after the discovery starts. All virtual memory gets quickly consumed and the system starts swapping like crazy:

vmstat 1

procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----

r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa

1 16 5885116  97788    816 155304  688    0  4260     0 1609 3076  1  2  1 96

0 17 5885016  95184    816 157124  912  372  3552   372 2338 4424  1  2  5 92

0 12 5885472  94432    708 155868  220  816  2824   916 1922 3623  1  3 20 76

0 12 5885436  97528    700 151276 1084  444  2196   488 1982 3767  1  3 16 80

0 13 5885056  96800    700 151212 1480  144  1528   144 2501 4987  1  2  9 88

0 14 5884572  95560    700 151512 1300    0  1616     0 2470 4921  1  2 16 82

0 18 5884588  97332    572 150976  428  376  3528   388 1963 3933  1  2  8 88

0 17 5884940  94300    572 152672  144  456  2028   576 1780 3268  1  3  0 96

0 18 5886632  97332    588 149276   32 1732  1880  1740 1851 3106  1  2 14 83

0 20 5889996  97008    728 151020    0 3372  1944  3420 1616 2853  1  3  0 96

1 13 5891784  99528    716 151004  500 1980  3696  2016 2278 4215  1  3  5 90

1 13 5891400  95532    716 154188 1256   20  4540    20 2653 5194  1  2 18 79

1 11 5892608  98432    652 150888  460 1504  1644  1504 2289 4343  2  3 13 82

0 12 5895716  98268    652 153152  160 3152  2972  3164 1606 2954  1  3  9 87

1 14 5895644  95044    652 156172  712  216  3800   216 2455 4821  1  2  0 96

1 16 5903904  99504    652 155116  164 8320   688  8364 1992 3657  1  3  8 88

1 15 5911956  95988    652 155368  324 8152  1640  8160 1924 3447  1  4 11 84

0 10 5911552  95868    656 156152  524    0  1628    44 2446 4794  1  2  2 95

0  8 5913536 105524    364 145104 1428 2460  2396  2460 2575 4696  1  4 19 76

0 11 5913248  95832    364 147160  876  144  2656   156 2225 4199  2  2  8 88

0 10 5914512  98428    364 147172  264 1636  1384  1668 2116 3886  1  3 10 86

0 15 5914608  95940    300 119144 2368 1104 44088  1244 4401 5693  4  5 11 81

0 26 5914620  95004    264 117232 1052  280 167400   412 20258 27834  1  3  2 95

0 32 1350696 4966284    416 131200 1208 59488 36152 59488 22826 43560  0  2  1 97 --> look at the swapped memory usage decrease when the discovery job fails

0 25 1350436 4956364    820 139952  400    0  5644     0 4159 8275  1  1  0 98

0 28 1350184 4950720    824 142624  980    0  3560     0 2355 4709  1  2  0 97

0 26 1350100 4946008   1364 146100  368    0  4332     0 2299 4534  1  2  0 98

0 27 1349980 4942040   1368 149416  720    0  4216    12 2404 4730  1  2  0 97

0 25 1349788 4939684   1520 150668  776    0  2124     4 2362 4748  1  1  0 99

1 25 1349684 4937328   1796 152060  580    0  2196     0 2353 4667  1  1  0 98

0 25 1349452 4935716   1796 152812  916    0  1708     0 2360 4664  1  1  0 98

0 25 1349352 4933980   1928 153780  656    0  1700     0 2441 4710  1  2  0 97

The memory exhaustion observed could be the consequence of this recursion/loop error found in the agent.log shown above, don't you think?

Pierre

sivar
8,699 Views

Pierre,

Thank you for the details.

I shall ask the developers about the memory issue here.

1) Fedora is not tested/supported by our QA.

2) Also, we don't recommend installing the Snap Creator server directly on the KVM hypervisor host.

     We prefer Snap Creator server to be installed on a separate host.

3) I see that the ports 8443 and 9090 are listening on tcp6

   You might want to try disabling ipv6, as we have no support for ipv6 at this time.

http://lifeofageekadmin.com/how-to-disable-ipv6-fedora-1718/

Thanks,
Siva Ramanathan

Public