Data Protection
Data Protection
Dear All,
I can't make the SC KVM plugin work, it always fails in the discovery phase with the following error about 10 minutes after the discover or backup operation is started:
Before the failure is returned, the KVM host is almost frozen due to the high amount on IOs observed on local disks. A couple of java processes (the SC agent) are apparently responsible for the freeze.
The exact error message is provided here:
[2014-03-03 15:05:34,345] DEBUG: Workflow : discover started with workflow id : 7
[2014-03-03 15:05:34,345] DEBUG: Version: Snap Creator Framework 4.1.0
[2014-03-03 15:05:34,345] DEBUG: Profile: init
[2014-03-03 15:05:34,345] DEBUG: Config: kvm
[2014-03-03 15:05:34,345] DEBUG: Action: discover
[2014-03-03 15:05:34,346] DEBUG: Plugin: kvm
[2014-03-03 15:05:34,346] DEBUG: Policy: null
[2014-03-03 15:05:34,346] DEBUG: Volume Name: detect
[2014-03-03 15:05:34,346] DEBUG: Snapshot Name: kvm_20140303150534
########## Agent validation ##########
[2014-03-03 15:05:34,559] INFO: Agent validation completed successfully for agent localhost:9090
########## Plugin validation ##########
[2014-03-03 15:05:34,584] INFO: Plugin validation completed successfully for plugin kvm
########## Application Discovery ##########
[2014-03-03 15:05:34,757] INFO: Performing discovery on : kvm
[2014-03-03 15:09:09,013] ERROR: SCF-00028: Auto Discovery for plug-in [kvm] failed with error [java.net.ConnectException: ConnectException invoking https://localhost:9090/SnapCreator/operations/status/059225cf-e871-4ee4-af7b-14b5e37776e4/7: Connection refused] and exit code [-1], Exiting!
[2014-03-03 15:09:09,802] DEBUG: Workflow : discover_OnFailure started with workflow id : 8
[2014-03-03 15:09:09,968] DEBUG: Version: Snap Creator Framework 4.1.0
[2014-03-03 15:09:09,969] DEBUG: Profile: init
[2014-03-03 15:09:09,969] DEBUG: Config: kvm
[2014-03-03 15:09:09,969] DEBUG: Action: discover_OnFailure
[2014-03-03 15:09:09,998] DEBUG: Plugin: kvm
[2014-03-03 15:09:09,998] DEBUG: Policy: null
[2014-03-03 15:09:10,034] DEBUG: Volume Name: detect
[2014-03-03 15:09:10,034] DEBUG: Snapshot Name: kvm_20140303150534
########## Agent Workflow Finalization ##########
[2014-03-03 15:10:42,366] INFO: Agent Workflow Finalization started
[2014-03-03 15:11:41,584] WARN: SCF-00080: Agent finalization for workflow failed with warning [java.net.SocketException: SocketException invoking http://localhost:9090/SnapCreator/Daemon/Agent: Unexpected end of file from server]. Continuing.
########## Snap Creator Framework 4.1.0 failed ##########
[2014-03-03 15:11:49,679] INFO: Pre Exit commands are not defined. Skipping !
I have no idea where to look at for debugging this as no log file (except an unknown.log) is created under the logs/plugins directory of the scAgent installation path.
Any help would be highly appreciated.
Kind regards
Pierre Kestremond
The only strange message I found in the agent.log for the above job is shown underlined below:
[2014-03-03T15:05:41,684+0100] DEBUG [pool-2-thread-3] com.netapp.snapcreator.agent.nextgen.plugin.legacyservice.ExecutorImpl - stdoutLogger running=true
[2014-03-03T15:05:41,685+0100] DEBUG [pool-2-thread-3] com.netapp.snapcreator.agent.nextgen.plugin.legacyservice.ExecutorImpl - stderrLogger running=true
[2014-03-03T15:05:43,645+0100] DEBUG [Thread-15] com.netapp.snapcreator.agent.nextgen.plugin.legacyservice.StreamLogger - run() - type=stdout, wid=7, p=kvm - 2014-03-03 15:05:43 (pid=17342) - Deleting request plugins/wrapper/in/7+kvm+1393855539683+discover
[2014-03-03T15:05:43,646+0100] DEBUG [Thread-15] com.netapp.snapcreator.agent.nextgen.plugin.legacyservice.StreamLogger - run() - type=stdout, wid=7, p=kvm - 2014-03-03 15:05:43 (pid=17342) - Will execute discover now (plugin=kvm, workflowId=7)
[2014-03-03T15:05:43,646+0100] DEBUG [Thread-15] com.netapp.snapcreator.agent.nextgen.plugin.legacyservice.StreamLogger - run() - type=stdout, wid=7, p=kvm - 2014-03-03 15:05:43 (pid=17342) - Config Parameters: ACTION, ALLOW_IP_ADDR, APP_AUTO_DISCOVERY, APP_CONF_PERSISTENCE, APP_DEFINED_BACKUP, APP_DEFINED_CLONE, APP_DEFINED_MOUNT, APP_DEFINED_RESTORE, APP_DEFINED_UMOUNT, APP_IGNORE_ERROR, APP_NAME, APP_VM, ARCHIVE_LOG_RECURSIVE_SEARCH, CMODE_SET, CMODE_SNAPSHOT_FORCE_DELETE, CONFDIR, CONFIG_NAME, CONFIG_TYPE, CONFNAME, KVM_CMD_RETRY_COUNT, KVM_VM_CONFIG_DIR, KVM_VM_MAPPING, LOG_NUM, LOG_TRACE_ENABLE, NTAP_ASUP_ERROR_ENABLE, NTAP_CIFS_EXPORT_ENABLE, NTAP_CLONE_FOR_BACKUP, NTAP_CLONE_SECONDARY, NTAP_CONSISTENCY_GROUP_SNAPSHOT, NTAP_CONSISTENCY_GROUP_TIMEOUT, NTAP_CONSISTENCY_GROUP_WAFL_SYNC, NTAP_EXTERNAL_SNAPSHOT_REGEX, NTAP_LUN_CLONE_RESERVATION, NTAP_NFS_EXPORT_ACCESS, NTAP_NFS_EXPORT_PERSISTENT, NTAP_NUM_VOL_CLONES, NTAP_OSSV_ENABLE, NTAP_PM_RUN_BACKUP, NTAP_PM_UPDATE, NTAP_PWD_PROTECTION, NTAP_SNAPMIRROR_CASCADING_UPDATE, NTAP_SNAPMIRROR_UPDATE, NTAP_SNAPMIRROR_USE_SNAPSHOT, NTAP_SNAPMIRROR_WAIT, NTAP_SNAPSHOT_CLEANUP, NTAP_SNAPSHOT_DELETE_BY_AGE_ONLY, NTAP_SNAPSHOT_DEPENDENCY_IGNORE, NTAP_SNAPSHOT_DISABLE, NTAP_SNAPSHOT_NODELETE, NTAP_SNAPSHOT_RESTORE_AUTO_DETECT, NTAP_SNAPSHOT_RETENTIONS, NTAP_SNAPSHOT_RETENTION_AGE, NTAP_SNAPVAULT_NODELETE, NTAP_SNAPVAULT_RESTORE_WAIT, NTAP_SNAPVAULT_SNAPSHOT, NTAP_SNAPVAULT_UPDATE, NTAP_SNAPVAULT_WAIT, NTAP_TIMEOUT, NTAP_USERS, NTAP_USE_EXTERNAL_SNAPSHOT, NTAP_VOL_CLONE_RESERVE, OM_EVENT_GENERATE, OPERATION_ID, PORT, PROFILE_NAME, RECORD_ID, SC_AGENT, SC_AGENT_LOG_ENABLE, SC_AGENT_TIMEOUT, SC_AGENT_UNQUIESCE_TIMEOUT, SC_AGENT_WATCHDOG_ENABLE, SC_ITERATION_ID, SC_SERVER_PORT, SC_USER_NAME, SNAME, SNAME_COMBINED, SNAPDRIVE, SNAPDRIVE_DISCOVERY, SNAP_TIME, SNAP_TIMESTAMP_ONLY, SUCCESS_MSG, SVNAME_COMBINED, TRANSPORT, USE_GLOBAL_CONFIG, USE_PROXY, VOLUMES, obj,
[2014-03-03T15:05:43,647+0100] DEBUG [Thread-15] com.netapp.snapcreator.agent.nextgen.plugin.legacyservice.StreamLogger - run() - type=stdout, wid=7, p=kvm - Bin dir: /opt/NetApp/scAgent4.1.0/plugins/wrapper
[2014-03-03T15:05:43,647+0100] DEBUG [Thread-15] com.netapp.snapcreator.agent.nextgen.plugin.legacyservice.StreamLogger - run() - type=stdout, wid=7, p=kvm - Plugin dir: /opt/NetApp/scAgent4.1.0/plugins/
[2014-03-03T15:05:43,647+0100] DEBUG [Thread-15] com.netapp.snapcreator.agent.nextgen.plugin.legacyservice.StreamLogger - run() - type=stdout, wid=7, p=kvm - Perl Plugin dir: /opt/NetApp/scAgent4.1.0/plugins/perl
[2014-03-03T15:05:43,647+0100] DEBUG [Thread-15] com.netapp.snapcreator.agent.nextgen.plugin.legacyservice.StreamLogger - run() - type=stdout, wid=7, p=kvm - Getting instance for kvm
[2014-03-03T15:05:43,685+0100] DEBUG [pool-2-thread-3] com.netapp.snapcreator.agent.nextgen.plugin.legacyservice.WrapperHelperImpl - execute() - Result file doesn't exist
[2014-03-03T15:05:43,685+0100] DEBUG [pool-2-thread-3] com.netapp.snapcreator.agent.nextgen.plugin.legacyservice.ExecutorImpl - stdoutLogger running=true
[2014-03-03T15:05:43,685+0100] DEBUG [pool-2-thread-3] com.netapp.snapcreator.agent.nextgen.plugin.legacyservice.ExecutorImpl - stderrLogger running=true
[2014-03-03T15:05:43,915+0100] DEBUG [Thread-15] com.netapp.snapcreator.agent.nextgen.plugin.legacyservice.StreamLogger - run() - type=stdout, wid=7, p=kvm - Executing discover
[2014-03-03T15:05:43,935+0100] DEBUG [Thread-15] com.netapp.snapcreator.agent.nextgen.plugin.legacyservice.StreamLogger - run() - type=stdout, wid=7, p=kvm - OS - execute() - trying to run ls /etc/libvirt/qemu
[2014-03-03T15:05:43,944+0100] DEBUG [Thread-16] com.netapp.snapcreator.agent.nextgen.plugin.legacyservice.StreamLogger - run() - type=stderr, wid=7, p=kvm - could not find ParserDetails.ini in /tmp/pdk-root-17342/XML/SAX
[2014-03-03T15:05:44,443+0100] DEBUG [Thread-15] com.netapp.snapcreator.agent.nextgen.plugin.legacyservice.StreamLogger - run() - type=stdout, wid=7, p=kvm - OS - execute() - trying to run mount | grep nfs
[2014-03-03T15:05:44,474+0100] DEBUG [Thread-16] com.netapp.snapcreator.agent.nextgen.plugin.legacyservice.StreamLogger - run() - type=stderr, wid=7, p=kvm - Deep recursion on subroutine "Net::SSLeay::AUTOLOAD" at
[2014-03-03T15:05:44,474+0100] DEBUG [Thread-16] com.netapp.snapcreator.agent.nextgen.plugin.legacyservice.StreamLogger - run() - type=stderr, wid=7, p=kvm - /</opt/NetApp/scAgent4.1.0/plugins/wrapper/wrapper>Net/SSLeay.pm line 349 (#1)
[2014-03-03T15:05:44,474+0100] DEBUG [Thread-16] com.netapp.snapcreator.agent.nextgen.plugin.legacyservice.StreamLogger - run() - type=stderr, wid=7, p=kvm - (W recursion) This subroutine has called itself (directly or indirectly)
[2014-03-03T15:05:44,474+0100] DEBUG [Thread-16] com.netapp.snapcreator.agent.nextgen.plugin.legacyservice.StreamLogger - run() - type=stderr, wid=7, p=kvm - 100 times more than it has returned. This probably indicates an
[2014-03-03T15:05:44,475+0100] DEBUG [Thread-16] com.netapp.snapcreator.agent.nextgen.plugin.legacyservice.StreamLogger - run() - type=stderr, wid=7, p=kvm - infinite recursion, unless you're writing strange benchmark programs, in
[2014-03-03T15:05:44,475+0100] DEBUG [Thread-16] com.netapp.snapcreator.agent.nextgen.plugin.legacyservice.StreamLogger - run() - type=stderr, wid=7, p=kvm - which case it indicates something else.
[2014-03-03T15:05:44,475+0100] DEBUG [Thread-16] com.netapp.snapcreator.agent.nextgen.plugin.legacyservice.StreamLogger - run() - type=stderr, wid=7, p=kvm -
[2014-03-03T15:05:44,475+0100] DEBUG [Thread-16] com.netapp.snapcreator.agent.nextgen.plugin.legacyservice.StreamLogger - run() - type=stderr, wid=7, p=kvm - This threshold can be changed from 100, by recompiling the perl binary,
[2014-03-03T15:05:44,475+0100] DEBUG [Thread-16] com.netapp.snapcreator.agent.nextgen.plugin.legacyservice.StreamLogger - run() - type=stderr, wid=7, p=kvm - setting the C pre-processor macro PERL_SUB_DEPTH_WARN to the desired value.
[2014-03-03T15:05:44,475+0100] DEBUG [Thread-16] com.netapp.snapcreator.agent.nextgen.plugin.legacyservice.StreamLogger - run() - type=stderr, wid=7, p=kvm -
[2014-03-03T15:05:44,956+0100] DEBUG [qtp1279199758-20] com.netapp.snapcreator.agent.nextgen.security.AuthorizationValidatorImpl - isAuthorized() - Incoming request from user scServer2agent (127.0.0.1, 127.0.0.1)
Hello Pierre,
1) Please provide us with the Operating System detail.
2) Also, please provide us with the KVM configuration options from your snapcreator config file.
3) Please refer this thread https://communities.netapp.com/message/122252
Does this file exist? ls /etc/libvirt/qemu
4) Are you able to run virsh commands?
Please let me know the details.
I shall forward the information to our KVM expert and get a feedback.
Thanks,
Siva Ramanathan
Hi Siva,
Here is the info you asked for:
1) Linux version
[root@treefrog ~]# cat /etc/redhat-release |
Fedora release 19 (Schrödinger’s Cat)
[root@treefrog ~]# uname -a
Linux treefrog.netapp.com 3.12.11-201.fc19.x86_64 #1 SMP Fri Feb 14 19:08:33 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
[root@treefrog ~]#
2) the SC config file is included here:
[root@treefrog init]# grep -v "^#" kvm.conf
CONFIG_TYPE=STANDARD
VOLUMES=auto:detect
META_DATA_VOLUME=
LOG_NUM=30
SNAPDRIVE=N
SNAPDRIVE_DISCOVERY=N
LOG_TRACE_ENABLE=N
NTAP_PWD_PROTECTION=Y
USE_PROXY=N
NTAP_TIMEOUT=60
USE_GLOBAL_CONFIG=N
FEDERATED_APPLICATIONS=
ALLOW_IP_ADDR=N
SC_TMP_DIR=
NTAP_USERS=vs_kvm_pierrek_admin:vsadmin/793563786f416f4a6661735434714e3479484e3872413d3d0a
MANAGEMENT_INTERFACES=
PORT=443
TRANSPORT=HTTPS
SECONDARY_INTERFACES=
CMODE_CLUSTER_NAME=
CMODE_CLUSTER_USERS=
CMODE_SNAPSHOT_FORCE_DELETE=Y
CMODE_SET=Y
SNAME=kvm
SNAP_TIMESTAMP_ONLY=Y
NTAP_SNAPSHOT_POLICIES=
NTAP_USE_EXTERNAL_SNAPSHOT=N
NTAP_SNAPSHOT_NODELETE=N
NTAP_CONSISTENCY_GROUP_TIMEOUT=MEDIUM
NTAP_SNAPSHOT_RETENTIONS=DAILY:7
NTAP_SNAPSHOT_CREATE_CMD01=
NTAP_CONSISTENCY_GROUP_WAFL_SYNC=N
NTAP_CONSISTENCY_GROUP_SNAPSHOT=N
NTAP_SNAPSHOT_RETENTION_AGE=7
NTAP_METADATA_SNAPSHOT_CREATE_CMD=
NTAP_SNAPSHOT_DELETE_BY_AGE_ONLY=N
NTAP_SNAPSHOT_DEPENDENCY_IGNORE=N
NTAP_SNAPSHOT_RESTORE_AUTO_DETECT=N
NTAP_SNAPSHOT_DISABLE=N
NTAP_SNAPSHOT_LABEL=
NTAP_EXTERNAL_SNAPSHOT_REGEX=.*
NTAP_SNAPSHOT_CLEANUP=N
NTAP_SNAPVAULT_UPDATE=N
SNAPVAULT_QTREE_INCLUDE=
NTAP_SNAPVAULT_NODELETE=N
NTAP_SNAPVAULT_RESTORE_WAIT=N
NTAP_SNAPVAULT_RETENTION_AGE=
NTAP_SNAPVAULT_SNAPSHOT=N
NTAP_SNAPVAULT_MAX_TRANSFER=
NTAP_SNAPVAULT_WAIT=0
NTAP_SNAPVAULT_RETENTIONS=
SNAPVAULT_VOLUMES=
NTAP_SNAPMIRROR_WAIT=60
NTAP_SNAPMIRROR_USE_SNAPSHOT=N
NTAP_SNAPMIRROR_MAX_TRANSFER=
SNAPMIRROR_VOLUMES=
SNAPMIRROR_CASCADING_VOLUMES=
NTAP_SNAPMIRROR_CASCADING_UPDATE=N
NTAP_SNAPMIRROR_UPDATE=N
NTAP_CLONE_IGROUP_MAP=
NTAP_NFS_EXPORT_ACCESS=read-only
NTAP_LUN_CLONE_RESERVATION=N
NTAP_CLONE_SECONDARY_VOLUMES=
NTAP_NUM_VOL_CLONES=1
NTAP_NFS_EXPORT_PERSISTENT=N
NTAP_CLONE_FOR_BACKUP=N
NTAP_CIFS_EXPORT_ENABLE=N
NTAP_CLONE_SECONDARY=N
NTAP_NFS_EXPORT_HOST=
NTAP_VOL_CLONE_RESERVE=NONE
NTAP_DFM_DATA_SET=
NTAP_PM_RUN_BACKUP=N
NTAP_PM_UPDATE=N
NTAP_DFM_SNAPSHOT_FORMAT=
OM_TRANSPORT=
OM_HOST=
OM_EVENT_GENERATE=N
OM_USER=
OM_PWD=
OM_PORT=
NTAP_OSSV_FS_SNAPSHOT=
NTAP_OSSV_FS_SNAPSHOT_CREATE_CMD01=
NTAP_OSSV_HOMEDIR=
NTAP_OSSV_ENABLE=N
UMOUNT_CMD01=
ARCHIVE_CMD01=
APP_UNQUIESCE_CMD01=
MOUNT_CMD01=
APP_QUIESCE_CMD01=
PRE_CLONE_CREATE_CMD01=
PRE_RESTORE_CMD01=
PRE_NTAP_CMD01=
PRE_APP_UNQUIESCE_CMD01=
PRE_EXIT_CMD01=
PRE_APP_QUIESCE_CMD01=
PRE_NTAP_CLONE_DELETE_CMD01=
POST_NTAP_DATA_TRANSFER_CMD01=
POST_RESTORE_CMD01=
POST_APP_QUIESCE_CMD01=
POST_NTAP_CMD01=
POST_CLONE_CREATE_CMD01=
POST_APP_UNQUIESCE_CMD01=
SUCCESS_MSG=INFO: NetApp Snap Creator Framework finished successfully "(Action: %ACTION)"
NTAP_ASUP_ERROR_ENABLE=N
SENDTRAP=
SUCCESS_TRAP=
SC_AGENT_UNQUIESCE_TIMEOUT=305
SC_AGENT_WATCHDOG_ENABLE=N
SC_AGENT=localhost:9090
SC_AGENT_LOG_ENABLE=N
SC_AGENT_TIMEOUT=600
SC_CLONE_TARGET=
APP_NAME=kvm
APP_CONF_PERSISTENCE=Y
APP_IGNORE_ERROR=N
APP_DEFINED_RESTORE=Y
PYTHON_OPTS=
APP_DEFINED_CLONE=N
JAVA_HOME=
PYTHON_HOME=
FS_NAME=
JVM_ARGS=
PERL_OPTS=
PERL_HOME=
PLUGINS_FORWARD=
VALIDATE_VOLUMES=
APP_AUTO_DISCOVERY=Y
JAVA_CLASSPATH=
PLUGINS_RESTORE=
APP_DEFINED_BACKUP=N
APP_DEFINED_UMOUNT=N
PLUGINS_REVERSE=
APP_DEFINED_MOUNT=N
ARCHIVE_LOG_RECURSIVE_SEARCH=N
ARCHIVE_LOG_DIR=
ARCHIVE_LOG_EXT=
ARCHIVE_LOG_RETENTION=
ARCHIVE_LOG_ENABLE=
KVM_CMD_RETRY_COUNT=3
KVM_VM_MAPPING=auto:detect:detect
KVM_VM_CONFIG_DIR=/etc/libvirt/qemu
[root@treefrog init]#
3) here is the content of the /etc/libvirt/qemu directory:
[root@treefrog init]# ls -l /etc/libvirt/qemu
total 52
-rw-------. 1 root root 2868 Dec 24 11:23 arch-linux-01.xml
-rw-------. 1 root root 2802 Dec 20 16:32 arch-linux-02.xml
-rw-------. 1 root root 2852 Jan 26 08:12 fedora18-01.xml
-rw-------. 1 root root 2956 Jan 31 13:40 netapp7.xml
drwx------. 3 root root 4096 Jan 17 03:12 networks
-rw-------. 1 root root 5719 Jan 19 13:23 oel6-01.xml
-rw-------. 1 root root 5120 Dec 31 14:41 oel6-01.xml.save
-rw-------. 1 root root 2313 Jan 8 12:57 pierrek-lw7.xml
-rw-------. 1 root root 2915 Mar 3 11:28 tiny-centos64-template.xml --> This one is located on NFS storage exported from my SVM
-rw-------. 1 root root 2877 Mar 3 09:15 tiny-centos64.xml
-rw-------. 1 root root 2546 Dec 21 12:57 win-2k8r2-02.xml
[root@treefrog init]#
I can run virsh commands as you can see here:
[root@treefrog init]# virsh list --all
Id Name State
----------------------------------------------------
3 tiny-centos64-template running
- arch-linux-01 shut off
- arch-linux-02 shut off
- fedora18-01 shut off
- netapp7 shut off
- oel6-01 shut off
- pierrek-lw7 shut off
- tiny-centos64 shut off
- win-2k8r2-02 shut off
[root@treefrog init]#
Thanks already for your help.
Pierre
We have not qualified or tested Fedora with Snap Creator. (
Meanwhile, could you please check by disbaling selinux?
I see that the files have ACL information
[root@treefrog init]# ls -l /etc/libvirt/qemu
total 52
-rw-------. 1 root root 2868 Dec 24 11:23 arch-linux-01.xml
Where is your snapcreator server running?
Also, please try with SC_AGENT=hostname:9090 (instead of localhost).
Please ensure /etc/hosts on the agent includes the storage controller IPs. (or in your DNS)
Thanks,
Siva Ramanathan
Hi Siva,
I changed the SC_AGENT parameter in the config file to the FQDN of the KVM host (in this test setup I am running the agent and server on the same machine, the KVM host).
Now I am getting the below error:
It is different from the one I had before. I will now disable SELINUX (I will do it only for troubleshooting but selinux should be in enforcing mode on my host). After a reboot I will let you know the result.
Kind regards
Pierre
Sorry the screenshot was not added. Here is the debug output of the job:
[2014-03-03 17:06:18,597] DEBUG: Workflow : discover started with workflow id : 9
[2014-03-03 17:06:18,600] DEBUG: Version: Snap Creator Framework 4.1.0
[2014-03-03 17:06:18,600] DEBUG: Profile: init
[2014-03-03 17:06:18,600] DEBUG: Config: kvm
[2014-03-03 17:06:18,600] DEBUG: Action: discover
[2014-03-03 17:06:18,600] DEBUG: Plugin: kvm
[2014-03-03 17:06:18,600] DEBUG: Policy: null
[2014-03-03 17:06:18,600] DEBUG: Volume Name: detect
[2014-03-03 17:06:18,600] DEBUG: Snapshot Name: kvm_20140303170618
########## Agent validation ##########
[2014-03-03 17:06:19,154] INFO: Agent validation completed successfully for agent treefrog.netapp.com:9090
########## Plugin validation ##########
[2014-03-03 17:06:19,268] INFO: Plugin validation completed successfully for plugin kvm
########## Application Discovery ##########
[2014-03-03 17:06:19,560] INFO: Performing discovery on : kvm
[2014-03-03 17:18:09,207] ERROR: [treefrog.netapp.com:9090 (4.1.0.1)] Operation failed. Reason: Executor timed out. Make sure plugins/wrapper/wrapper binary has execute permission set. Increasing WRAPPER_TIMEOUT_IN_MSEC in agent.properties may help, too.
[2014-03-03 17:18:09,549] ERROR: [treefrog.netapp.com:9090(4.1.0.1)] SCF-00028: Auto Discovery for plug-in [kvm] failed with error [validation failed] and exit code [102], Exiting!
[2014-03-03 17:18:10,285] DEBUG: Workflow : discover_OnFailure started with workflow id : 10
[2014-03-03 17:18:10,310] DEBUG: Version: Snap Creator Framework 4.1.0
[2014-03-03 17:18:10,310] DEBUG: Profile: init
[2014-03-03 17:18:10,310] DEBUG: Config: kvm
[2014-03-03 17:18:10,310] DEBUG: Action: discover_OnFailure
[2014-03-03 17:18:10,356] DEBUG: Plugin: kvm
[2014-03-03 17:18:10,356] DEBUG: Policy: null
[2014-03-03 17:18:10,397] DEBUG: Volume Name: detect
[2014-03-03 17:18:10,397] DEBUG: Snapshot Name: kvm_20140303170618
########## Agent Workflow Finalization ##########
[2014-03-03 17:18:10,715] INFO: Agent Workflow Finalization started
[2014-03-03 17:18:28,203] INFO: [treefrog.netapp.com:9090 (4.1.0.1)] Finalized workflow with id 9
[2014-03-03 17:18:28,203] INFO: Agent Workflow Finalization finished successfully
########## Snap Creator Framework 4.1.0 failed ##########
[2014-03-03 17:18:29,516] INFO: Pre Exit commands are not defined. Skipping !
From the treefrog host can you perform,
fping vs_kvm_pierrek_admin
Siva,
I have disabled selinux in /etc/selinux/config then the host has been rebooted.
[root@treefrog ~]# getenforce
Disabled
[root@treefrog ~]#
I have restarted the discovery job and it has failed the same way as with selinux in enforcing mode:
[2014-03-03 19:59:24,016] DEBUG: Workflow : discover started with workflow id : 11
[2014-03-03 19:59:24,018] DEBUG: Version: Snap Creator Framework 4.1.0
[2014-03-03 19:59:24,018] DEBUG: Profile: init
[2014-03-03 19:59:24,018] DEBUG: Config: kvm
[2014-03-03 19:59:24,018] DEBUG: Action: discover
[2014-03-03 19:59:24,018] DEBUG: Plugin: kvm
[2014-03-03 19:59:24,018] DEBUG: Policy: null
[2014-03-03 19:59:24,019] DEBUG: Volume Name: detect
[2014-03-03 19:59:24,019] DEBUG: Snapshot Name: kvm_20140303195923
########## Agent validation ##########
[2014-03-03 19:59:24,533] INFO: Agent validation completed successfully for agent treefrog.netapp.com:9090
########## Plugin validation ##########
[2014-03-03 19:59:24,629] INFO: Plugin validation completed successfully for plugin kvm
########## Application Discovery ##########
[2014-03-03 19:59:24,859] INFO: Performing discovery on : kvm
[2014-03-03 20:09:20,155] ERROR: SCF-00028: Auto Discovery for plug-in [kvm] failed with error [java.net.ConnectException: ConnectException invoking https://treefrog.netapp.com:9090/SnapCreator/operations/status/c19a2d6e-71d7-4195-ad28-30f78374053a/11: Connection refused] and exit code [-1], Exiting!
[2014-03-03 20:09:25,956] DEBUG: Workflow : discover_OnFailure started with workflow id : 12
[2014-03-03 20:09:25,956] DEBUG: Version: Snap Creator Framework 4.1.0
[2014-03-03 20:09:25,956] DEBUG: Profile: init
[2014-03-03 20:09:25,956] DEBUG: Config: kvm
[2014-03-03 20:09:25,956] DEBUG: Action: discover_OnFailure
[2014-03-03 20:09:25,956] DEBUG: Plugin: kvm
[2014-03-03 20:09:25,956] DEBUG: Policy: null
[2014-03-03 20:09:25,956] DEBUG: Volume Name: detect
[2014-03-03 20:09:25,956] DEBUG: Snapshot Name: kvm_20140303195923
########## Agent Workflow Finalization ##########
[2014-03-03 20:09:26,105] INFO: Agent Workflow Finalization started
[2014-03-03 20:09:46,525] WARN: SCF-00080: Agent finalization for workflow failed with warning [com.ctc.wstx.exc.WstxIOException: Connection refused]. Continuing.
########## Snap Creator Framework 4.1.0 failed ##########
[2014-03-03 20:09:47,977] INFO: Pre Exit commands are not defined. Skipping !
fping result:
[root@treefrog adm]# yum install fping
Loaded plugins: langpacks, refresh-packagekit, verify
Resolving Dependencies
--> Running transaction check
---> Package fping.x86_64 0:3.5-3.fc19 will be installed
--> Finished Dependency Resolution
Dependencies Resolved
=============================================================================================================================================================
Package Arch Version Repository Size
=============================================================================================================================================================
Installing:
fping x86_64 3.5-3.fc19 updates 45 k
Transaction Summary
=============================================================================================================================================================
Install 1 Package
Total download size: 45 k
Installed size: 102 k
Is this ok [y/d/N]: y
Downloading packages:
fping-3.5-3.fc19.x86_64.rpm | 45 kB 00:00:00
Running transaction check
Running transaction test
Transaction test succeeded
Running transaction
Installing : fping-3.5-3.fc19.x86_64 1/1
Verifying : fping-3.5-3.fc19.x86_64 1/1
Installed:
fping.x86_64 0:3.5-3.fc19
Complete!
[root@treefrog adm]# df -h
Filesystem Size Used Avail Use% Mounted on
/dev/mapper/fedora_treefrog-root 99G 19G 75G 20% /
devtmpfs 2.9G 0 2.9G 0% /dev
tmpfs 2.9G 84K 2.9G 1% /dev/shm
tmpfs 2.9G 992K 2.9G 1% /run
tmpfs 2.9G 0 2.9G 0% /sys/fs/cgroup
tmpfs 2.9G 6.3M 2.9G 1% /tmp
/dev/sda1 976M 149M 760M 17% /boot
/dev/mapper/fedora_treefrog-home 583G 321G 233G 59% /home
vs_kvm_pierrek:/kvm2 95G 1.3G 94G 2% /var/lib/libvirt/images/vs_kvm_pierrek
[root@treefrog adm]# fping vs_kvm_pierrek_admin
vs_kvm_pierrek_admin is alive
[root@treefrog adm]#
Thank you.
This sure looks like a network issue. The process waited for 10 minutes (600 seconds) and came back with a networking related error.
[2014-03-03 19:59:24,859] INFO: Performing discovery on : kvm
[2014-03-03 20:09:20,155] ERROR: SCF-00028: Auto Discovery for plug-in [kvm] failed with error [java.net.ConnectException: ConnectException invoking https://treefrog.netapp.com:9090/SnapCreator/operations/status/c19a2d6e-71d7-4195-ad28-30f78374053a/11: Connection refused] and exit code [-1], Exiting!
What is your java version? -- please try with a latest java 7 JRE (lesser than u51 update)
Please provide us with the below output.
netstat -tulpn
Can you try disable firewall and try a backup? (I am not sure if firewall is running)
/etc/init.d/iptables stop
Thanks,
Siva Ramanathan
Siva,
Same behaviour/result with firewalld stopped/disabled:
[root@treefrog ~]# service firewalld status
Redirecting to /bin/systemctl status firewalld.service
firewalld.service - firewalld - dynamic firewall daemon
Loaded: loaded (/usr/lib/systemd/system/firewalld.service; disabled)
Active: inactive (dead)
Mar 03 17:32:28 treefrog.netapp.com systemd[1]: Unit firewalld.service entered failed state.
Mar 03 17:34:01 treefrog.netapp.com systemd[1]: Started firewalld - dynamic firewall daemon.
Mar 03 17:36:10 treefrog.netapp.com firewalld[520]: 2014-03-03 17:36:10 ERROR: UNKNOWN_INTERFACE: wlp16s0
Mar 03 17:43:37 treefrog.netapp.com systemd[1]: Started firewalld - dynamic firewall daemon.
Mar 03 17:44:24 treefrog.netapp.com systemd[1]: firewalld.service: main process exited, code=exited, status=1/FAILURE
Mar 03 17:44:24 treefrog.netapp.com systemd[1]: Stopped firewalld - dynamic firewall daemon.
Mar 03 17:44:24 treefrog.netapp.com systemd[1]: Unit firewalld.service entered failed state.
Mar 03 19:25:44 treefrog.netapp.com systemd[1]: Started firewalld - dynamic firewall daemon.
Mar 03 21:55:09 treefrog.netapp.com systemd[1]: Stopping firewalld - dynamic firewall daemon...
Mar 03 21:55:14 treefrog.netapp.com systemd[1]: Stopped firewalld - dynamic firewall daemon.
[root@treefrog ~]# netstat -tulpn
Active Internet connections (only servers)
Proto Recv-Q Send-Q Local Address Foreign Address State PID/Program name
tcp 0 0 0.0.0.0:3306 0.0.0.0:* LISTEN 1926/mysqld
tcp 0 0 0.0.0.0:55979 0.0.0.0:* LISTEN -
tcp 0 0 0.0.0.0:139 0.0.0.0:* LISTEN 1560/smbd
tcp 0 0 127.0.0.1:5900 0.0.0.0:* LISTEN 3330/qemu-system-x8
tcp 0 0 0.0.0.0:111 0.0.0.0:* LISTEN 552/rpcbind
tcp 0 0 192.168.122.1:53 0.0.0.0:* LISTEN 1929/dnsmasq
tcp 0 0 0.0.0.0:22 0.0.0.0:* LISTEN 683/sshd
tcp 0 0 127.0.0.1:631 0.0.0.0:* LISTEN 534/cupsd
tcp 0 0 127.0.0.1:25 0.0.0.0:* LISTEN 5111/sendmail: acce
tcp 0 0 0.0.0.0:445 0.0.0.0:* LISTEN 1560/smbd
tcp 0 0 0.0.0.0:45439 0.0.0.0:* LISTEN 2842/rpc.statd
tcp6 0 0 :::9090 :::* LISTEN 4841/java
tcp6 0 0 :::53578 :::* LISTEN 2842/rpc.statd
tcp6 0 0 :::139 :::* LISTEN 1560/smbd
tcp6 0 0 :::111 :::* LISTEN 552/rpcbind
tcp6 0 0 :::55700 :::* LISTEN -
tcp6 0 0 :::22 :::* LISTEN 683/sshd
tcp6 0 0 ::1:631 :::* LISTEN 534/cupsd
tcp6 0 0 :::8443 :::* LISTEN 3394/java
tcp6 0 0 :::445 :::* LISTEN 1560/smbd
udp 0 0 192.168.122.1:53 0.0.0.0:* 1929/dnsmasq
udp 0 0 0.0.0.0:67 0.0.0.0:* 1929/dnsmasq
udp 0 0 0.0.0.0:68 0.0.0.0:* 5071/dhclient
udp 0 0 0.0.0.0:68 0.0.0.0:* 640/dhclient
udp 0 0 0.0.0.0:111 0.0.0.0:* 552/rpcbind
udp 0 0 0.0.0.0:123 0.0.0.0:* 532/chronyd
udp 0 0 0.0.0.0:323 0.0.0.0:* 532/chronyd
udp 0 0 0.0.0.0:712 0.0.0.0:* 552/rpcbind
udp 0 0 0.0.0.0:45935 0.0.0.0:* 2842/rpc.statd
udp 0 0 127.0.0.1:898 0.0.0.0:* 2842/rpc.statd
udp 0 0 0.0.0.0:5353 0.0.0.0:* 539/avahi-daemon: r
udp 0 0 0.0.0.0:42455 0.0.0.0:* 539/avahi-daemon: r
udp 0 0 0.0.0.0:38723 0.0.0.0:* -
udp 0 0 0.0.0.0:2909 0.0.0.0:* 640/dhclient
udp 0 0 0.0.0.0:7307 0.0.0.0:* 5071/dhclient
udp6 0 0 :::111 :::* 552/rpcbind
udp6 0 0 :::123 :::* 532/chronyd
udp6 0 0 :::323 :::* 532/chronyd
udp6 0 0 :::712 :::* 552/rpcbind
udp6 0 0 :::54054 :::* 2842/rpc.statd
udp6 0 0 :::58796 :::* -
udp6 0 0 :::43631 :::* 640/dhclient
udp6 0 0 :::48284 :::* 5071/dhclient
udp6 0 0 :::40425 :::* 658/java
[root@treefrog ~]#
java version:
bash-4.2$ java -version
java version "1.7.0_51"
OpenJDK Runtime Environment (fedora-2.4.5.0.fc19-x86_64 u51-b31)
OpenJDK 64-Bit Server VM (build 24.51-b03, mixed mode)
bash-4.2$
To me it looks like a bug in the agent code because as I told you, the system is becoming unusable just after the discovery starts. All virtual memory gets quickly consumed and the system starts swapping like crazy:
vmstat 1
procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
r b swpd free buff cache si so bi bo in cs us sy id wa
1 16 5885116 97788 816 155304 688 0 4260 0 1609 3076 1 2 1 96
0 17 5885016 95184 816 157124 912 372 3552 372 2338 4424 1 2 5 92
0 12 5885472 94432 708 155868 220 816 2824 916 1922 3623 1 3 20 76
0 12 5885436 97528 700 151276 1084 444 2196 488 1982 3767 1 3 16 80
0 13 5885056 96800 700 151212 1480 144 1528 144 2501 4987 1 2 9 88
0 14 5884572 95560 700 151512 1300 0 1616 0 2470 4921 1 2 16 82
0 18 5884588 97332 572 150976 428 376 3528 388 1963 3933 1 2 8 88
0 17 5884940 94300 572 152672 144 456 2028 576 1780 3268 1 3 0 96
0 18 5886632 97332 588 149276 32 1732 1880 1740 1851 3106 1 2 14 83
0 20 5889996 97008 728 151020 0 3372 1944 3420 1616 2853 1 3 0 96
1 13 5891784 99528 716 151004 500 1980 3696 2016 2278 4215 1 3 5 90
1 13 5891400 95532 716 154188 1256 20 4540 20 2653 5194 1 2 18 79
1 11 5892608 98432 652 150888 460 1504 1644 1504 2289 4343 2 3 13 82
0 12 5895716 98268 652 153152 160 3152 2972 3164 1606 2954 1 3 9 87
1 14 5895644 95044 652 156172 712 216 3800 216 2455 4821 1 2 0 96
1 16 5903904 99504 652 155116 164 8320 688 8364 1992 3657 1 3 8 88
1 15 5911956 95988 652 155368 324 8152 1640 8160 1924 3447 1 4 11 84
0 10 5911552 95868 656 156152 524 0 1628 44 2446 4794 1 2 2 95
0 8 5913536 105524 364 145104 1428 2460 2396 2460 2575 4696 1 4 19 76
0 11 5913248 95832 364 147160 876 144 2656 156 2225 4199 2 2 8 88
0 10 5914512 98428 364 147172 264 1636 1384 1668 2116 3886 1 3 10 86
0 15 5914608 95940 300 119144 2368 1104 44088 1244 4401 5693 4 5 11 81
0 26 5914620 95004 264 117232 1052 280 167400 412 20258 27834 1 3 2 95
0 32 1350696 4966284 416 131200 1208 59488 36152 59488 22826 43560 0 2 1 97 --> look at the swapped memory usage decrease when the discovery job fails
0 25 1350436 4956364 820 139952 400 0 5644 0 4159 8275 1 1 0 98
0 28 1350184 4950720 824 142624 980 0 3560 0 2355 4709 1 2 0 97
0 26 1350100 4946008 1364 146100 368 0 4332 0 2299 4534 1 2 0 98
0 27 1349980 4942040 1368 149416 720 0 4216 12 2404 4730 1 2 0 97
0 25 1349788 4939684 1520 150668 776 0 2124 4 2362 4748 1 1 0 99
1 25 1349684 4937328 1796 152060 580 0 2196 0 2353 4667 1 1 0 98
0 25 1349452 4935716 1796 152812 916 0 1708 0 2360 4664 1 1 0 98
0 25 1349352 4933980 1928 153780 656 0 1700 0 2441 4710 1 2 0 97
The memory exhaustion observed could be the consequence of this recursion/loop error found in the agent.log shown above, don't you think?
Pierre
Pierre,
Thank you for the details.
I shall ask the developers about the memory issue here.
1) Fedora is not tested/supported by our QA.
2) Also, we don't recommend installing the Snap Creator server directly on the KVM hypervisor host.
We prefer Snap Creator server to be installed on a separate host.
3) I see that the ports 8443 and 9090 are listening on tcp6
You might want to try disabling ipv6, as we have no support for ipv6 at this time.
http://lifeofageekadmin.com/how-to-disable-ipv6-fedora-1718/
Thanks,
Siva Ramanathan