ONTAP Discussions
Hello,
I'm in the process of setting up a new FAS2040. Everything is working fine; the only thing that doesn't work is AutoSupport delivery to NetApp. No matter what I do, I get the errors shown below. Has anyone had the same problem (and solved it)? I'm using NetApp Release 8.0.2 7-Mode: Mon Jun 13 14:14:26 PDT 2011
RNLUITST001-B> Tue Dec 6 00:45:50 CET [RNLUITST001-B: asup.smtp.fail:warning]: AutoSupport mail to 1XX.XXX.XXX.XX1 failed (Failed to transmit smtp asup) for message: HA Group Notification from RNLUITST001-B (SYSTEM CONFIGURATION WARNING) WARNING
Tue Dec 6 00:45:50 CET [RNLUITST001-B: asup.smtp.retry:info]: AutoSupport mail (HA Group Notification from RNLUITST001-B (SYSTEM CONFIGURATION WARNING) WARNING) was not sent for host (0). The system will retry later to send the message
Tue Dec 6 00:45:56 CET [RNLUITST001-B: callhome.invoke.all:info]: User triggered complete call home for USER_TRIGGERED (COMPLETE:AutoSupport)
Tue Dec 6 00:46:16 CET [RNLUITST001-B: asup.general.file.create.failed:error]: AutoSupport file (/mroot/etc/log/.cm_stats_hourly_done) was unable to be written to the file spool subdirectory.
Tue Dec 6 00:46:58 CET [RNLUITST001-B: asup.smtp.fail:warning]: AutoSupport mail to 1XX.XXX.XXX.XX1 failed (Failed to transmit smtp asup) for message: HA Group Notification from RNLUITST001-B (SYSTEM CONFIGURATION WARNING) WARNING
Tue Dec 6 00:46:58 CET [RNLUITST001-B: asup.smtp.retry:info]: AutoSupport mail (HA Group Notification from RNLUITST001-B (SYSTEM CONFIGURATION WARNING) WARNING) was not sent for host (0). The system will retry later to send the message
Here is my AutoSupport configuration (IP and mail addresses have been changed):
RNLUITST001-B> options autosupport
autosupport.cifs.verbose off
autosupport.content complete (value might be overwritten in takeover)
autosupport.doit DONT
autosupport.enable on (value might be overwritten in takeover)
autosupport.from Netapp.Fas2040B@yourcompany.com (value might be overwritten in takeover)
autosupport.local.nht_data.enable off (value might be overwritten in takeover)
autosupport.local.performance_data.enable off (value might be overwritten in takeover)
autosupport.mailhost 1XX.XXX.XXX.XX1 (value might be overwritten in takeover)
autosupport.minimal.subject.id hostname (value might be overwritten in takeover)
autosupport.nht_data.enable on (value might be overwritten in takeover)
autosupport.noteto (value might be overwritten in takeover)
autosupport.partner.to (value might be overwritten in takeover)
autosupport.performance_data.doit DONT
autosupport.performance_data.enable on (value might be overwritten in takeover)
autosupport.periodic.tx_window 1h (value might be overwritten in takeover)
autosupport.retry.count 15 (value might be overwritten in takeover)
autosupport.retry.interval 4m (value might be overwritten in takeover)
autosupport.support.enable on (value might be overwritten in takeover)
autosupport.support.proxy (value might be overwritten in takeover)
autosupport.support.to autosupport@netapp.com (value might be overwritten in takeover)
autosupport.support.transport smtp (value might be overwritten in takeover)
autosupport.support.url support.netapp.com/asupprod/post/1.0/postAsup (value might be overwritten in takeover)
autosupport.throttle on (value might be overwritten in takeover)
autosupport.to Help@mycompany.com,Me@external.company.com,itmanager@headoffice.com (value might be overwritten in takeover)
Could you use Netapp.Fas2040B@yourcompany.com to send out the email?
Also do the other basic checks, such as pinging your mailhost and telnetting to it.
thanks a lot
The edited messages seem to imply that the AutoSupport is being collected, properly retried and so on but not delivered.
My focus would be on networking to the SMTP (Email) server (as defined by 'options autosupport.mailhost') and SMTP delivery.
Using "ping" against the mailhost, and checking port 25 from other machines on the same network (with something like "telnet MAILHOST 25"), are good first steps for verifying the network and the server. The first tests connectivity at the IP level. The second tests connectivity at the TCP level and also verifies the SMTP service. The telnet response should print one line that looks like "220 SMTP Service here"; the 220 reply code (more generally, any 2xx code) is the signal that the SMTP service is ready.
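The same TCP-level check can be scripted from another host on the filer's network. This is a minimal Python sketch (standard library only) of the "telnet MAILHOST 25" test; `probe_mailhost` and `smtp_banner_ok` are illustrative names, and the sketch only reads the server's greeting, it never sends mail.

```python
import socket


def smtp_banner_ok(banner: str) -> bool:
    """Return True if an SMTP greeting line carries a 2xx reply code.

    Greetings normally start with "220"; in SMTP any 2yz code is a
    positive completion reply.
    """
    code = banner.strip()[:3]
    return len(code) == 3 and code.isdigit() and code[0] == "2"


def probe_mailhost(host: str, port: int = 25, timeout: float = 10.0) -> str:
    """Connect to the mailhost and return the first line of its greeting.

    A successful connect proves TCP reachability (like telnet); the
    greeting shows whether the SMTP service is ready. The server speaks
    first in SMTP, so we only need to read.
    """
    with socket.create_connection((host, port), timeout=timeout) as sock:
        data = sock.recv(1024)
    if not data:
        return ""
    return data.decode("ascii", errors="replace").splitlines()[0]
```

For example, `smtp_banner_ok(probe_mailhost("mailhost.example.com"))` with your own mailhost in place of the placeholder name. A connection error or timeout points at the network or TCP level, while a non-2xx greeting points at the SMTP service itself.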
As for SMTP, there are two sides to this: Data ONTAP sending the AutoSupport, and the SMTP server receiving it. The logs on either side provide clues such as whether a connection succeeded, whether the sender/recipient addresses were accepted, whether the message length was acceptable, and so on. I can't say what to look at on the SMTP (email) server side, since I don't know what kind of server it is. For Data ONTAP 8.0 and later releases, unsuccessful SMTP deliveries are recorded in <root volume>/etc/log/mlog/notifyd.log.
Searching notifyd.log for strings like "MAIL FROM" and "RCPT TO" (assuming the connection was established and the SMTP dialogue started) may turn up a subsequent log line explaining why Data ONTAP believes the mail wasn't delivered. If the connection was never established, or was rejected before the "MAIL FROM" command was issued, scan the log file for the IPv4 address and/or hostname set in "options autosupport.mailhost". The log lines just before and just after that location must be examined to determine the type of failure or misunderstanding between Data ONTAP and the SMTP server.
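That scan can be done with a small helper on a copy of notifyd.log taken off the root volume (how you copy it, for example over an NFS or CIFS share of the root volume, depends on your setup). `find_with_context` is a hypothetical helper: it returns each matching line number together with a few surrounding lines, so the lines just before and after each hit can be read together.

```python
from typing import Iterable, List, Tuple


def find_with_context(lines: Iterable[str], needles: Iterable[str],
                      before: int = 3, after: int = 5) -> List[Tuple[int, List[str]]]:
    """Return (line_number, context_lines) for every line containing a needle.

    line_number is 1-based; the context covers `before` lines above and
    `after` lines below the match, clipped at the start and end of the file.
    """
    all_lines = list(lines)
    needles = list(needles)
    hits = []
    for i, line in enumerate(all_lines):
        if any(needle in line for needle in needles):
            start = max(0, i - before)
            end = min(len(all_lines), i + after + 1)
            hits.append((i + 1, all_lines[start:end]))
    return hits
```

Search first for `["MAIL FROM", "RCPT TO"]`; if nothing matches, search again for the value of `options autosupport.mailhost`.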
I have seen similar situations twice before, outside of network connectivity problems to the mail server.
1. Restricted mail server: ensure the mail server allows relaying from the filer. We restrict our mail hosts so that unexpected hosts cannot turn our mail domain into a spam bot.
2. Restricted message sizes: ensure the mail server does not limit attachments to a size below the ASUP size. This one is tricky because the size can grow while an issue of some sort is occurring. We have moved to using HTTPS to send ASUPs for this very reason.
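Both restrictions can be probed without actually sending a message. This is a hedged Python sketch using the standard `smtplib` module: it issues EHLO, MAIL FROM and RCPT TO, reads the ESMTP SIZE limit (RFC 1870) if the server advertises one, then resets the transaction. Relay rules usually depend on the client's IP address, so running this from a workstation only approximates what the filer sees. `probe_relay` and `advertised_size_limit` are illustrative names.

```python
import smtplib
from typing import Optional


def advertised_size_limit(features: dict) -> Optional[int]:
    """Parse the ESMTP SIZE limit from smtplib's esmtp_features dict.

    Returns None if SIZE is not advertised, has no value, or is 0
    (RFC 1870: a value of 0 means no fixed maximum).
    """
    value = features.get("size", "").strip()
    if not value.isdigit():
        return None
    limit = int(value)
    return limit if limit > 0 else None


def probe_relay(host: str, sender: str, recipient: str, port: int = 25) -> dict:
    """Check whether the mailhost accepts the sender and recipient (no DATA)."""
    with smtplib.SMTP(host, port, timeout=30) as smtp:
        smtp.ehlo()
        limit = advertised_size_limit(smtp.esmtp_features)
        mail_code, _ = smtp.mail(sender)
        rcpt_code, rcpt_msg = smtp.rcpt(recipient)
        smtp.rset()  # abandon the transaction; no message is sent
    return {"size_limit": limit, "mail_from": mail_code,
            "rcpt_to": rcpt_code, "rcpt_msg": rcpt_msg}
```

For example, `probe_relay("mailhost.example.com", "Netapp.Fas2040B@yourcompany.com", "autosupport@netapp.com")`, where the hostname is a placeholder. A 5xx code in `rcpt_to` usually means the server refused to relay, and a `size_limit` smaller than a recent AutoSupport message would explain size-related rejections.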
- Scott
The following was written before I found the solution of deleting and re-adding the default route:
I'm seeing this too, for both SMTP and HTTPS. Judging by the timestamps on /etc/rc and when the last ASUP message got through to MyAutoSupport, I think it started when I joined the interfaces into an ifgrp. Here's what I see on the console after issuing options autosupport.doit now for HTTPS:
Thu Dec 15 10:38:27 WST [algol-a: callhome.invoke.all:info]: User triggered complete call home for USER_TRIGGERED (COMPLETE:now)
Thu Dec 15 10:38:57 WST [algol-a: asup.general.file.create.failed:error]: AutoSupport file (/mroot/etc/log/.cm_stats_hourly_done) was unable to be written to the file spool subdirectory.
Thu Dec 15 10:39:03 WST [algol-a: asup.smtp.host:info]: AutoSupport cannot connect to host mail.ccgs.wa.edu.au (Network is unreachable) for message: HA Group Notification from algol-a (USER_TRIGGERED (COMPLETE:now)) INFO
Thu Dec 15 10:39:03 WST [algol-a: asup.smtp.retry:info]: AutoSupport mail (HA Group Notification from algol-a (USER_TRIGGERED (COMPLETE:now)) INFO) was not sent for host (0). The system will retry later to send the message
Thu Dec 15 10:39:03 WST [algol-a: asup.post.host:info]: AutoSupport (HA Group Notification from algol-a (USER_TRIGGERED (COMPLETE:now)) INFO) cannot connect to url https://202.3.113.8/asupprod/post/1.0/postAsup (Failed to connect to 202.3.113.8: Network is unreachable)
Thu Dec 15 10:39:03 WST [algol-a: asup.post.retry:info]: AutoSupport message (HA Group Notification from algol-a (USER_TRIGGERED (COMPLETE:now)) INFO) was not posted to NetApp for host (0). The system will retry later to post the message
and for SMTP:
Wed Dec 14 17:16:06 WST [algol-a: callhome.invoke.all:info]: User triggered complete call home for USER_TRIGGERED (COMPLETE:now)
algol-a> Wed Dec 14 17:16:36 WST [algol-a: asup.general.file.create.failed:error]: AutoSupport file (/mroot/etc/log/.cm_stats_hourly_done) was unable to be written to the file spool subdirectory.
Wed Dec 14 17:16:42 WST [algol-a: asup.smtp.host:info]: AutoSupport cannot connect to host mail.ccgs.wa.edu.au (Network is unreachable) for message: HA Group Notification from algol-a (USER_TRIGGERED (COMPLETE:now)) INFO
Wed Dec 14 17:16:42 WST [algol-a: asup.smtp.retry:info]: AutoSupport mail (HA Group Notification from algol-a (USER_TRIGGERED (COMPLETE:now)) INFO) was not sent for host (0). The system will retry later to send the message
In notifyd.log there's this:
00000024.00007fe4 0188f18b Thu Dec 15 2011 10:43:06 +08:00 [kern_notifyd:info:1703] OPTOUT:Function asup_in_optout1 called
00000024.00007fe5 0188f18b Thu Dec 15 2011 10:43:06 +08:00 [kern_notifyd:info:1703] OPTOUT:Function asup_in_optout2 called
00000024.00007fe6 0188f18b Thu Dec 15 2011 10:43:06 +08:00 [kern_notifyd:info:1703] OPTOUT:Asup in optout false
00000024.00007fe7 0188f18b Thu Dec 15 2011 10:43:06 +08:00 [kern_notifyd:info:1703] deliver_http_asup: We're supposed to send /mroot/etc/log/autosupport/201112150956.0 to support.netapp.com/asupprod/post/1.0/postAsup via https protocol
00000024.00007fe8 0188f18b Thu Dec 15 2011 10:43:06 +08:00 [kern_notifyd:info:1703] gethostby7mode here for 'support.netapp.com'
00000024.00007fe9 0188f18b Thu Dec 15 2011 10:43:06 +08:00 [kern_notifyd:info:1703] gethostby7mode zapi for 'support.netapp.com' worked
00000024.00007fea 0188f18b Thu Dec 15 2011 10:43:06 +08:00 [kern_notifyd:info:1703] ZAPI net_resolve returned '202.3.113.8' with 1
00000024.00007feb 0188f18b Thu Dec 15 2011 10:43:06 +08:00 [kern_notifyd:info:1703] gethostby7mode zapi for 'support.netapp.com' returns
00000024.00007fec 0188f18b Thu Dec 15 2011 10:43:06 +08:00 [kern_notifyd:info:1703] asup_job deliver_http h 0x699d20
00000024.00007fed 0188f18b Thu Dec 15 2011 10:43:06 +08:00 [kern_notifyd:info:1703] asup_job deliver http h->h_addr 0x699dd0
00000024.00007fee 0188f18b Thu Dec 15 2011 10:43:06 +08:00 [kern_notifyd:info:1703] deliver_http_asup: curl_easy_perform() uri 'https://support.netapp.com/asupprod/post/1.0/postAsup' replaced with uri 'https://202.3.113.8/asupprod/post/1.0/postAsup'
00000024.00007fef 0188f18b Thu Dec 15 2011 10:43:06 +08:00 [kern_notifyd:info:1703] asup_job deliver using agent: 'filer/NetApp/8.0.2'
00000024.00007ff0 0188f18b Thu Dec 15 2011 10:43:06 +08:00 [kern_notifyd:info:1703] * About to connect() to 202.3.113.8 port 443
00000024.00007ff1 0188f18b Thu Dec 15 2011 10:43:06 +08:00 [kern_notifyd:info:1703] * Trying 202.3.113.8... * Failed to connect to 202.3.113.8: Network is unreachable
00000024.00007ff2 0188f18b Thu Dec 15 2011 10:43:06 +08:00 [kern_notifyd:info:1703] * Unknown error: 0
00000024.00007ff3 0188f18b Thu Dec 15 2011 10:43:06 +08:00 [kern_notifyd:info:1703] * couldn't connect to host
00000024.00007ff4 0188f18b Thu Dec 15 2011 10:43:06 +08:00 [kern_notifyd:info:1703] * Closing connection #0
00000024.00007ff5 0188f18b Thu Dec 15 2011 10:43:06 +08:00 [kern_notifyd:info:1703] deliver_http_asup: curl_easy_perform() failed Result=7 Reason=Failed to connect to 202.3.113.8: Network is unreachable
00000024.00007ff7 0188f18b Thu Dec 15 2011 10:43:06 +08:00 [kern_notifyd:info:1703] asup_job HTTP POST failed '/mroot/etc/log/autosupport/201112150956.0'
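The "Network is unreachable" text in this log is the standard error string for ENETUNREACH, which most often means the sending host's routing lookup found no usable route to the destination. That fits seeing no traffic from the filer on the router. This hedged Python sketch makes the same distinction for a connect attempt from any host; the function names are illustrative.

```python
import errno
import socket


def classify_connect_error(exc: OSError) -> str:
    """Map a failed TCP connect to the layer that most likely failed."""
    if isinstance(exc, socket.gaierror):
        return "dns: the hostname could not be resolved"
    if isinstance(exc, socket.timeout):
        return "timeout: packets probably dropped (firewall or broken path)"
    if exc.errno in (errno.ENETUNREACH, errno.EHOSTUNREACH):
        return "no route: check the routing table and default gateway"
    if exc.errno == errno.ECONNREFUSED:
        return "refused: host reachable, but nothing listening on that port"
    return f"other error: {exc}"


def try_connect(host: str, port: int, timeout: float = 5.0) -> str:
    """Attempt a TCP connection and describe the outcome."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "connected"
    except OSError as exc:
        return classify_connect_error(exc)
```

Running `try_connect` against port 25 of the mailhost and port 443 of support.netapp.com from a machine on the same subnet helps separate a filer-side routing problem from a wider network problem.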
So it's some sort of network problem I think, perhaps caused by my ifgrps. This is my /etc/rc:
hostname algol-a
ifgrp create single vif_gig e0b e0a
ifgrp create single vif_10gig e2b e2a
ifgrp create single vif_main vif_10gig vif_gig
ifgrp favor e2a
vlan create vif_main 20 85 500
ifconfig e0M `hostname`-e0M netmask 255.255.255.0 partner e0M mtusize 9000 trusted -wins mediatype auto flowcontrol full up
ifconfig vif_main `hostname`-vif-main netmask 255.255.0.0 broadcast 10.80.255.255 mtusize 9000 trusted -wins partner vif_main
ifconfig vif_main-20 `hostname`-vif-main-20 netmask 255.255.0.0 broadcast 10.20.255.255 mtusize 9000 trusted partner vif_main-20
ifconfig vif_main-85 `hostname`-vif-main-85 netmask 255.255.0.0 broadcast 10.85.255.255 mtusize 9000 trusted -wins partner vif_main-85
ifconfig vif_main-500 `hostname`-vif-main-500 netmask 255.255.255.0 broadcast 10.50.0.255 mtusize 9000 untrusted -wins partner vif_main-500
route add default 10.80.0.1 1
routed on
options dns.enable on
options nis.enable off
savecore
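One quick sanity check on this rc file is whether the default gateway (10.80.0.1) sits on a directly connected subnet. Because the interface addresses come from hostnames, this Python sketch derives each subnet from the broadcast/netmask pairs in the file using the standard `ipaddress` module; e0M is left out because its line gives no broadcast address. The helper names are illustrative.

```python
import ipaddress
from typing import Dict, List, Tuple

# Broadcast/netmask pairs copied from the /etc/rc above.
RC_ENTRIES: Dict[str, Tuple[str, str]] = {
    "vif_main": ("10.80.255.255", "255.255.0.0"),
    "vif_main-20": ("10.20.255.255", "255.255.0.0"),
    "vif_main-85": ("10.85.255.255", "255.255.0.0"),
    "vif_main-500": ("10.50.0.255", "255.255.255.0"),
}


def subnets_from_rc(entries: Dict[str, Tuple[str, str]]) -> Dict[str, ipaddress.IPv4Network]:
    """Derive each interface's subnet; strict=False masks off the host bits."""
    return {
        ifname: ipaddress.IPv4Network(f"{bcast}/{mask}", strict=False)
        for ifname, (bcast, mask) in entries.items()
    }


def interfaces_for_gateway(gateway: str,
                           subnets: Dict[str, ipaddress.IPv4Network]) -> List[str]:
    """Return the interfaces whose subnet contains the gateway address."""
    gw = ipaddress.IPv4Address(gateway)
    return [name for name, net in subnets.items() if gw in net]
```

`interfaces_for_gateway("10.80.0.1", subnets_from_rc(RC_ENTRIES))` returns `["vif_main"]`, so the rc text itself places the gateway on vif_main's 10.80.0.0/16 subnet. That is consistent with the eventual fix being to delete and re-add the route rather than to change the configuration.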
Traceroute to our mail server and to support.netapp.com works, but when I trigger AutoSupport I don't see any traffic from the filer on our router at all, for either HTTPS or SMTP. Further googling found http://communities.netapp.com/thread/13291, which definitely suggests a network issue. Aha, deleting and re-adding the default route worked!
It is clear that this is a networking problem, not an AutoSupport issue.
I am not sure what guidance I can give exactly to resolve this.
I will tell you that "ping" and "traceroute" at the 7-Mode CLI do not test all the parts of the system that need network connectivity and routing.
The only idea I have is to simplify the networking: for example, use a 1500-byte MTU, which avoids interoperability issues, and reduce the ifgrps to what I would guess is the primary connection, 10.80.x.x on vif_main(?). After simplifying, check whether AutoSupport starts working; then you can add things back slowly to find out which part of the complexity is causing the problem.
If this is unappealing, I would open a support case.
As mentioned, removing and readding the default gateway fixed it for me.
Thanks. I missed that little nugget of information at the end. I am glad it is working now for you!
Hi,
Since 9 December, support.netapp.com has changed as part of the move from IPv4 to IPv6.
If you have firewall rules you need to update them.
Host name: support.netapp.com.
OLD IP address: 216.240.18.16
New IP address: 202.3.113.8
For RSA users, the IP address of remotesupport.netapp.com (216.240.18.81) will change in a few days.
regards
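If outbound firewall rules are written against IP addresses, it is safer to re-resolve the hostname than to rely on a list of addresses, since these can change again. A minimal Python sketch; the allowlist argument is a placeholder for whatever your firewall actually permits.

```python
import socket
from typing import Iterable, Set


def resolve_ipv4(hostname: str) -> Set[str]:
    """Return the IPv4 addresses a hostname currently resolves to."""
    infos = socket.getaddrinfo(hostname, 443, socket.AF_INET, socket.SOCK_STREAM)
    return {info[4][0] for info in infos}


def missing_from_allowlist(resolved: Iterable[str], allowlist: Iterable[str]) -> Set[str]:
    """Addresses the host now uses that the firewall allowlist does not permit."""
    return set(resolved) - set(allowlist)
```

For example, `missing_from_allowlist(resolve_ipv4("support.netapp.com"), {"216.240.18.16"})` reports any address the name currently resolves to that a rule for the old address does not cover.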
You need to know whether the storage controller can successfully communicate with the mailhost on port 25.
Alternatively, can this option be changed to http or https for testing?
autosupport.support.transport smtp
I had a very similar issue. I found that the route to the mailhost was not in the routing table.
Run route -s to see the routing table, and look for your mailhost.
My mailhost was not listed at all. If yours is missing, add a route to it:
route add {mailhost IP} {gateway IP} {# of hops to mailhost}
For example: route add 192.168.1.10 192.168.1.1 2
If you don't know how many hops away the mailhost is, run a traceroute to it.
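Why a host route helps: IP routing picks the most specific matching entry (longest-prefix match), so a /32 route to the mailhost takes precedence over the default route, and if no entry covers the address at all there is simply no route. A small Python illustration using the standard `ipaddress` module; the table entries are made-up examples, not parsed `route -s` output.

```python
import ipaddress
from typing import List, Optional, Tuple

# (destination, gateway) pairs; "default" stands for 0.0.0.0/0.
Route = Tuple[str, str]


def lookup_route(dest: str, routes: List[Route]) -> Optional[Route]:
    """Return the most specific route covering dest, or None if none does."""
    addr = ipaddress.IPv4Address(dest)
    best: Optional[Route] = None
    best_len = -1
    for network, gateway in routes:
        cidr = "0.0.0.0/0" if network == "default" else network
        net = ipaddress.IPv4Network(cidr, strict=False)
        if addr in net and net.prefixlen > best_len:
            best, best_len = (network, gateway), net.prefixlen
    return best
```

With a table of `[("default", "10.80.0.1"), ("192.168.1.10/32", "192.168.1.1")]`, a lookup for 192.168.1.10 selects the host route, while any other address falls through to the default route.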
Now test with options autosupport.doit now, wait to see whether you receive the email, and watch your SSH session for any errors.
If this works, you'll need to add the same route add command to your /etc/rc file so that it survives a reboot:
1. Run rdfile /etc/rc
2. Copy the contents to WordPad.
3. Add the route command just above the savecore line.
4. Copy the contents of WordPad.
5. Run wrfile /etc/rc
6. Right-click to paste the contents.
7. Hit Enter twice.
8. Press Ctrl + C to exit the wrfile command.
9. Run rdfile /etc/rc again to verify the contents.
10. If you're happy with the contents, run source /etc/rc
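Adding the route line just above savecore can also be done on the local copy with a script before pasting it back. A hedged Python sketch; `add_line_before` is a hypothetical helper, and since wrfile replaces the file with whatever you paste, the result should still be checked with rdfile afterwards.

```python
def add_line_before(rc_text: str, new_line: str, marker: str = "savecore") -> str:
    """Insert new_line just above the first line equal to marker.

    Appends at the end if the marker is missing, and leaves the text
    unchanged if the line is already present, so repeated runs do not
    duplicate the route.
    """
    lines = rc_text.splitlines()
    if new_line in lines:
        return rc_text
    idx = next((i for i, line in enumerate(lines) if line.strip() == marker), len(lines))
    lines.insert(idx, new_line)
    return "\n".join(lines) + "\n"
```

For example, `add_line_before(rc_copy, "route add 192.168.1.10 192.168.1.1 2")`, where `rc_copy` holds the text copied from rdfile.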