AutoSupport bug in 8.1RC2: when will it be fixed?

gmilazzoitag

As most of you already know, there are several bugs that prevent 8.1RC2 from sending AutoSupport messages through an Exchange relay (2007 and 2010, and also 2003, as I have tested myself).

And, as you also know, the FAS2240 ships only with this Data ONTAP (DOT) release!

This causes a lot of embarrassment for us as an ASP company: we have to tell the customer that neither we nor they will be able to receive their AutoSupport messages unless they install... a Sendmail server (or some other Linux machine acting as an SMTP relay instead of Exchange). This is unacceptable and, let me say it, ridiculous in an environment where Exchange reigns.

Another source of embarrassment is that sending an AutoSupport message has always been very easy and has now become a nightmare because of this silly bug: DOT does not end the message with CRLF.CRLF but with LF (where's the programmer who forgot the CR?!?).
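
For readers unfamiliar with the terminator in question: SMTP (RFC 5321) ends the DATA phase with the five bytes CR LF . CR LF, and a relay that never sees that sequence keeps waiting until it times out. The Python below is only a minimal sketch of that framing rule. It is not code from Data ONTAP or Exchange, and the function names (normalize_crlf, frame_data, has_valid_terminator) are made up for illustration.

```python
# Sketch of SMTP DATA framing as described in RFC 5321.
# All names here are hypothetical; nothing is taken from Data ONTAP.

END_OF_DATA = b"\r\n.\r\n"  # CR LF . CR LF


def normalize_crlf(body: bytes) -> bytes:
    """Convert bare LF or bare CR line endings to CRLF."""
    # Collapse existing CRLF first so it is not doubled, then expand.
    unified = body.replace(b"\r\n", b"\n").replace(b"\r", b"\n")
    return unified.replace(b"\n", b"\r\n")


def frame_data(body: bytes) -> bytes:
    """Return the bytes a client would send after the DATA command."""
    lines = normalize_crlf(body).split(b"\r\n")
    # Dot-stuffing: a line that starts with "." gets an extra leading ".".
    stuffed = [b"." + ln if ln.startswith(b".") else ln for ln in lines]
    payload = b"\r\n".join(stuffed)
    if not payload.endswith(b"\r\n"):
        payload += b"\r\n"
    return payload + b".\r\n"


def has_valid_terminator(wire: bytes) -> bool:
    """True if the byte stream ends with the SMTP end-of-data marker."""
    return wire.endswith(END_OF_DATA)


if __name__ == "__main__":
    good = frame_data(b"Subject: test\n\nhello\n")
    bad = b"Subject: test\n\nhello\n.\n"  # LF-only ending, as in the bug report
    print(has_valid_terminator(good))
    print(has_valid_terminator(bad))
```

A message that ends in LF . LF fails the check, which matches the relay-side timeout described here.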

To avoid any misunderstanding: every fix and action has been applied on the SMTP relays as instructed. Looking at the relay logs, the cause of the bug appears clear: a transmission timeout caused by format errors.

So, briefly: RC2 has been supported and on the market for months. How much longer do we have to wait for a fix?

Thanks

https://communities.netapp.com/message/71152

https://now.netapp.com/NOW/cgi-bin/bol?Type=Detail&Display=549239

https://kb.netapp.com/support/index?page=content&id=1011888

klemen_bregar

Hi!

Can you send me the "options autosupport" output, if possible, please?

Thanks

Klemen

RODRIGO_SERVIX

Klemen, this is the autosupport options output:

autosupport.cifs.verbose     off
autosupport.content          complete   (value might be overwritten in takeover)
autosupport.doit             USER_TRIGGERED (test5)
autosupport.enable           on         (value might be overwritten in takeover)
autosupport.from             netapp-dc1-a@xxx.xxx.br (value might be overwritten in takeover)
autosupport.local_collection on         (value might be overwritten in takeover)
autosupport.mailhost         smtpapp.xxx.xxx.br (value might be overwritten in takeover)
autosupport.max_http_size    10485760   (value might be overwritten in takeover)
autosupport.max_smtp_size    5242880    (value might be overwritten in takeover)
autosupport.minimal.subject.id systemid   (value might be overwritten in takeover)
autosupport.nht_data.enable  on         (value might be overwritten in takeover)
autosupport.noteto                      (value might be overwritten in takeover)
autosupport.partner.to                  (value might be overwritten in takeover)
autosupport.payload_format   7z         (value might be overwritten in takeover)
autosupport.performance_data.doit DONT
autosupport.performance_data.enable on         (value might be overwritten in takeover)
autosupport.periodic.tx_window 1h         (value might be overwritten in takeover)
autosupport.retry.count      15         (value might be overwritten in takeover)
autosupport.retry.interval   4m         (value might be overwritten in takeover)
autosupport.support.enable   on         (value might be overwritten in takeover)
autosupport.support.proxy    0          (value might be overwritten in takeover)
autosupport.support.put_url  support.netapp.com/put/AsupPut (value might be overwritten in takeover)
autosupport.support.to       autosupport@netapp.com (value might be overwritten in takeover)
autosupport.support.transport smtp       (value might be overwritten in takeover)
autosupport.support.url      support.netapp.com/asupprod/post/1.0/postAsup (value might be overwritten in takeover)
autosupport.throttle         on         (value might be overwritten in takeover)
autosupport.to               filer@xxx.com (value might be overwritten in takeover)
autosupport.validate_digital_certificate on         (value might be overwritten in takeover)

I just replaced my customer's information with "xxx".

We had a problem sending SMTP e-mail to an address outside their domain, but it was an Exchange issue, nothing on the NetApp side. You can check in "/etc/log/mlog/notifyd.log" whether the Exchange relay is authorizing you to send that e-mail.

Regards,

freeb

With Data ONTAP 8.1RC3 the AutoSupport bug should be fixed.

Bug Id:

  • 549239

Now we receive a daily AutoSupport message with the subject management_log.

Which option limits this to only a weekly_log message?

Thanks for your reply!

Regards,

Florian


andris

Florian,

AutoSupport in 8.1 has been significantly overhauled. From the release notes:

https://now.netapp.com/NOW/knowledge/docs/ontap/rel81rc3/html/ontap/rnote/GUID-7089A00E-38D4-40D7-A397-AE70D253A5F3.html

Excellent and updated information on AutoSupport is available in the System Administration Guide for 8.1.

https://now.netapp.com/NOW/knowledge/docs/ontap/rel81rc3/pdfs/ontap/sysadmin.pdf

One of the improvements is related to smoothing out the WEEKLY_LOG AutoSupport traffic at both customer sites and back at NetApp, as well as solving problems with very large AutoSupport messages. To help with these issues:

1. Log files are separated from the WEEKLY_LOG and sent daily, i.e. 7 chunks instead of 1 big weekly chunk

2. Performance data is now sent daily (PERFORMANCE DATA), i.e. 7 chunks instead of 1 big weekly chunk

3. Event-based AutoSupport messages send context-sensitive information

I hope that helps clarify matters.

madden

I wasn't aware of the autosupport commands introduced in 8.1.  The man page has full details but here are a couple that seem handy.

1) Check history of ASUPs:

salt> autosupport history show -fields seq-num,status,subject,uri,error,last-update

seq-num destination last-update         status          subject                         uri                      error
------- ----------- ------------------- --------------- ------------------------------- ------------------------ -----
1088    smtp        "3/2/2012 11:09:43" sent-successful "USER_TRIGGERED (COMPLETE:now)" mailto:nobody@netapp.com -    
1088    http        "3/2/2012 11:09:43" ignore          "USER_TRIGGERED (COMPLETE:now)" -                        -    
1088    noteto      "3/2/2012 11:09:38" ignore          "USER_TRIGGERED (COMPLETE:now)" mailto:                  -    
1087    smtp        "3/2/2012 11:09:12" sent-successful "USER_TRIGGERED (test)"         mailto:nobody@netapp.com -    
1087    http        "3/2/2012 11:09:12" ignore          "USER_TRIGGERED (test)"         -                        -    
1087    noteto      "3/2/2012 11:09:07" ignore          "USER_TRIGGERED (test)"         mailto:                  -    
1087    retransmit  "3/2/2012 11:24:59" sent-successful "USER_TRIGGERED (test)"         mailto:nobody@netapp.com -    
1086    smtp        "3/2/2012 10:58:13" sent-successful "USER_TRIGGERED (test)"         mailto:nobody@netapp.com -    
1086    http        "3/2/2012 10:54:03" ignore          "USER_TRIGGERED (test)"         -                        -    
1086    noteto      "3/2/2012 10:54:03" ignore          "USER_TRIGGERED (test)"         mailto:                  -    
1085    smtp        "3/2/2012 10:58:09" sent-successful "USER_TRIGGERED (COMPLETE:now)" mailto:nobody@netapp.com "Domain not found"
1085    http        "3/2/2012 10:44:59" ignore          "USER_TRIGGERED (COMPLETE:now)" -                        -    
1085    noteto      "3/2/2012 10:44:57" ignore          "USER_TRIGGERED (COMPLETE:now)" mailto:                  -    

2) Get full details about a specific autosupport:

salt> autosupport history show -instance -seq-num 1088


     AutoSupport Sequence Number: 1088
Destination for this AutoSupport: smtp
                   Trigger Event: callhome.invoke.all
             Time of Last Update: 3/2/2012 11:09:43
              Status of Delivery: sent-successful
               Delivery Attempts: 1
             AutoSupport Subject: USER_TRIGGERED (COMPLETE:now)
                    Delivery URI: mailto:nobody@netapp.com
                      Last Error: -

     AutoSupport Sequence Number: 1088
Destination for this AutoSupport: http
                   Trigger Event: callhome.invoke.all
             Time of Last Update: 3/2/2012 11:09:43
              Status of Delivery: ignore
               Delivery Attempts: 1
             AutoSupport Subject: USER_TRIGGERED (COMPLETE:now)
                    Delivery URI: -
                      Last Error: -

     AutoSupport Sequence Number: 1088
Destination for this AutoSupport: noteto
                   Trigger Event: callhome.invoke.all
             Time of Last Update: 3/2/2012 11:09:38
              Status of Delivery: ignore
               Delivery Attempts: 1
             AutoSupport Subject: USER_TRIGGERED (COMPLETE:now)
                    Delivery URI: mailto:
                      Last Error: -
3 entries were displayed.

3) Retransmit a specific autosupport to an email address (this can also be a completely different email address):

salt> autosupport history retransmit  -seq-num 1087 -uri mailto:nobody2@netapp.com

RODRIGO_SERVIX

I saw it in the "?" output, but it is much better to now know exactly how to use it.

Thank you for your information. It will be very useful.

Regards,

peter_lehmann

I have a FAS2240-4 with 8.1RC3 and Exchange SMTP which works just fine!

However, there seems to be another "issue"...

I have a single FAS2240, but the ASUPs it sends out have the subject "HA Group Notification", which seems wrong to me. Is anybody else experiencing this too?

Peter

ern
NetApp

Yep, 8.0 shipped with AutoSupport always sending "HA Group Notification" even for single nodes. BURT 371076

AutoSupport can't really use the test "hey partner are you there?" to report HA mode since the partner may be down, rebooting, improperly cabled, etc.

peter_lehmann

OK, fine for me then.

aborzenkov

Well … an ASUP is sent from a single node, isn't it? It does not collect the ASUP from the partner and send it, does it? So it can speak only for itself.

The simple fact that it is causing confusion is an indication that it probably is … confusing ☺

peter_lehmann

Congrats... Now I'm confused ...

ern
NetApp

Supportability features such as AutoSupport are doing more and more for High Availability diagnosis. For example, AutoSupport does things such as collect information about why the partner rebooted and what the local node sees of the other node. In Cluster-mode, AutoSupport reports even more information about other nodes including its storage failover partner.

As an engineer, I wish I didn't have to deal with the complexity of thinking about single-node issues and multiple-node issues, but it is my job to work on hiding the complexity and making it simple for the customer and support.

aborzenkov

ok, thank you!

fletch2007

Hi Rudy - I have a case open on this.

One node of our 3270 cluster is logging

"transmission-failed MANAGEMENT_LOG support.netapp.com/put/AsupPut "couldn't connect to host" "

(see history below)

It is able to email us the autosupport, but the HTTPS connection to support.netapp.com is failing for this node.

With support we've verified routing is the same, autosupport options are the same, traceroute looks the same for both nodes, etc.

The engineer wants me to add a static route for support.netapp.com, but the default route is working for the partner and we have any egress firewall rules...

Any ideas what else to check?

thanks!

na04*> autosupport history show -fields seq-num,status,subject,uri,error,last-update

seq-num destination last-update          status subject        uri error
------- ----------- -------------------- ------ -------------- --- -----
347     smtp        "6/14/2012 00:21:07" ignore MANAGEMENT_LOG -   -
347     http        "6/14/2012 02:48:26" transmission-failed MANAGEMENT_LOG support.netapp.com/put/AsupPut "couldn't connect to host"
347     noteto      "6/14/2012 00:21:07" ignore MANAGEMENT_LOG -   -
346     smtp        "6/14/2012 00:14:34" ignore "PERFORMANCE DATA" - -
346     http        "6/14/2012 01:29:31" transmission-failed "PERFORMANCE DATA" support.netapp.com/put/AsupPut "couldn't connect to host"
346     noteto      "6/14/2012 00:14:34" ignore "PERFORMANCE DATA" - -
345     smtp        "6/13/2012 01:18:38" ignore MANAGEMENT_LOG -   -
345     http        "6/13/2012 03:14:23" transmission-failed MANAGEMENT_LOG support.netapp.com/put/AsupPut "couldn't connect to host"
345     noteto      "6/13/2012 01:18:38" ignore MANAGEMENT_LOG -   -
344     smtp        "6/13/2012 00:40:33" ignore "PERFORMANCE DATA" - -
344     http        "6/13/2012 01:55:28" transmission-failed "PERFORMANCE DATA" support.netapp.com/put/AsupPut "couldn't connect to host"
344     noteto      "6/13/2012 00:40:33" ignore "PERFORMANCE DATA" - -
343     smtp        "6/12/2012 00:46:44" ignore MANAGEMENT_LOG -   -
343     http        "6/12/2012 03:20:36" transmission-failed MANAGEMENT_LOG support.netapp.com/put/AsupPut "couldn't connect to host"
343     noteto      "6/12/2012 00:46:44" ignore MANAGEMENT_LOG -   -
342     smtp        "6/12/2012 00:57:16" ignore "PERFORMANCE DATA" - -
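
A listing like the one above can also be filtered offline once it is saved as text. The Python below is a minimal sketch, assuming the seven columns always appear in the order shown (seq-num, destination, last-update, status, subject, uri, error) and that multi-word values are double-quoted as in this output. parse_history and failed are hypothetical helper names, not NetApp tools.

```python
import shlex

# Column order assumed from the listing in this thread.
COLUMNS = ["seq-num", "destination", "last-update", "status",
           "subject", "uri", "error"]


def parse_history(text):
    """Parse saved 'autosupport history show' text into a list of dicts."""
    rows = []
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines, the header row, and the dashed separator row.
        if not line or not line[0].isdigit():
            continue
        fields = shlex.split(line)  # keeps "quoted values" together
        if len(fields) != len(COLUMNS):
            continue  # unexpected shape; skip rather than guess
        rows.append(dict(zip(COLUMNS, fields)))
    return rows


def failed(rows):
    """Return only the rows whose delivery status is transmission-failed."""
    return [r for r in rows if r["status"] == "transmission-failed"]


SAMPLE = '''\
seq-num destination last-update          status subject        uri error
------- ----------- -------------------- ------ -------------- --- -----
347     smtp        "6/14/2012 00:21:07" ignore MANAGEMENT_LOG -   -
347     http        "6/14/2012 02:48:26" transmission-failed MANAGEMENT_LOG support.netapp.com/put/AsupPut "couldn't connect to host"
346     http        "6/14/2012 01:29:31" transmission-failed "PERFORMANCE DATA" support.netapp.com/put/AsupPut "couldn't connect to host"
'''

if __name__ == "__main__":
    for row in failed(parse_history(SAMPLE)):
        print(row["seq-num"], row["destination"], row["subject"], row["error"])
```

Grouping the failures by destination this way makes it easy to see that only the http destination is affected.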

andris

7-mode? Try looking at the /etc/log/mlog/notifyd.log files to see more detailed information about the AutoSupport process.

If you ask the support engineer to open a consult request in our internal AutoSupport knowledge exchange, folks like Rudy might be able to help out further.

fletch2007

Reviewing the firewall logs, I see the good head is logging the outgoing ASUP HTTPS connection to support.netapp.com on its vif associated with the default route, as expected.

The bad head is logging the connection on a vif NOT associated with the default route, and the firewall is logging AGE OUT for this (since support.netapp.com is not responding before the timeout).

The routing tables for both show the same default route

Why is the bad head "preferring" the wrong vif for sending autosupports?

thanks

aborzenkov

Because it has a routing table telling it to do so. Double-check it.
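
As general background on how a routing table can send traffic out of an unexpected interface: IP forwarding picks the most specific (longest-prefix) matching route, so any entry narrower than the default route wins for the addresses it covers. The Python sketch below illustrates only that rule. The prefixes, addresses, and interface names are invented documentation values, not taken from the filers in this thread, and it is not a diagnosis of this case.

```python
import ipaddress


def pick_route(routes, destination):
    """Return (prefix, interface) of the longest-prefix match, or None."""
    dest = ipaddress.ip_address(destination)
    best = None
    for prefix, iface in routes:
        net = ipaddress.ip_network(prefix)
        # Keep the matching route with the longest prefix seen so far.
        if dest in net and (best is None or net.prefixlen > best[0].prefixlen):
            best = (net, iface)
    return None if best is None else (str(best[0]), best[1])


# Hypothetical table: a default route plus one narrower entry.
ROUTES = [
    ("0.0.0.0/0", "vif-default"),
    ("198.51.100.0/24", "vif-other"),
]

if __name__ == "__main__":
    print(pick_route(ROUTES, "198.51.100.25"))  # covered by the /24 entry
    print(pick_route(ROUTES, "203.0.113.9"))    # only the default route matches
```

Comparing the full routing tables of both heads entry by entry, rather than only the default route, is the kind of check this rule suggests.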

fletch2007

I did - the only routing table entry that _should_ apply is the default route (which the partner head _does_ use successfully)

I added a dumb static route to force it to use the proper vif and autosupports now succeed connecting to support.netapp.com:443

This should not be necessary - feels like a bug.

andris

Please open a technical case with NetApp Support to investigate this further.

Thanks.

fletch2007

I had 2003210902 open the whole time - when I requested "consult request in our internal AutoSupport knowledge exchange" another engineer took over the case.

Then NetApp support recommended the static route. It worked, but I wanted them to look for the root cause of why it was necessary in the first place on just this node.

thanks