ONTAP Discussions
We had a time-sync problem between our filer and the old NTP server, so we switched to another NTP server, but it doesn't seem to be working: the filer's time is 3 minutes off from the NTP server's time.
Some forum posts suggested running ntpq -p or looking in /etc/log/mlog/messages.log for NTP errors, but I can't find the ntpq command or the /etc/log/mlog/messages.log file.
Our filer is running ONTAP 7.3.4.
How can I verify that the filer is actually communicating with the NTP server? The /etc/messages file only says that the time daemon started, with no other NTP-related messages. Does any other log have that information?
Thank you
JACKIEXIE19 wrote:
some forum suggested to run ntpq -p, or look at /etc/log/mlog/messages.log to ntp error, but I can't find ntpq command, nor /etc/log/mlog/messages.log file.
This is only valid when you are running ONTAP 8. Also, ntpq is a host-based command; it does not exist on the filer.
When logging is enabled (options timed.log on), time updates are normally logged to /etc/messages. For example:
Mon Feb 27 06:00:00 EST [xxxxxxxxxx: kern.timed.adjust:info]: server 'yyyyyyyyyyyy' reports the appliance date is slow by 0.008 seconds. Adjusting date.
What are your timed options? Can you ping the time server from the filer?
Jackie,
The process suggested above is for ONTAP 8, not 7. The ntpq command will not work on 7. Can you ping the new NTP server from the storage device? Forgive me, but I have to ask: is the timed.proto option set to ntp? You can turn on the timed.log option and grep for "timed" in the /etc/messages file to find entries such as these:
rsh FILER rdfile /etc/messages | grep timed
Mon Feb 27 07:10:35 EST [FILER: kern.timed.adjust:info]: server 'NTP_HOST' reports the appliance date is slow by 0.001 seconds. Adjusting date.
Mon Feb 27 07:12:49 EST [FILER: rc:ALERT]: timed: time daemon started
Once all the timed options are set, stopping and restarting the timed daemon should take care of your issue. You may need to allow some time for the clocks to sync.
Regards,
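To check which source the filer is actually adjusting against, the kern.timed.adjust entries can be pulled out of a saved copy of /etc/messages and turned into structured records. The following is a minimal Python sketch, not a NetApp tool: the pattern is modelled only on the log lines quoted in this thread, and the names ADJUST_RE and parse_adjustments are made up for illustration. Other ONTAP releases may word the message differently.

```python
import re

# Pattern modelled on the kern.timed.adjust lines quoted in this thread;
# this is an assumption, other releases may format the message differently.
ADJUST_RE = re.compile(
    r"\[(?P<filer>[^:\]]+): kern\.timed\.adjust:info\]: "
    r"server '(?P<server>[^']+)' reports the appliance date is "
    r"(?P<direction>slow|fast) by (?P<seconds>\d+(?:\.\d+)?) seconds"
)


def parse_adjustments(lines):
    """Return (filer, server, signed_offset) tuples, one per adjust entry.

    The offset is negative when the filer clock was slow and positive
    when it was fast.
    """
    results = []
    for line in lines:
        match = ADJUST_RE.search(line)
        if match is None:
            continue  # skips e.g. "time daemon started" lines
        offset = float(match.group("seconds"))
        if match.group("direction") == "slow":
            offset = -offset
        results.append((match.group("filer"), match.group("server"), offset))
    return results


sample = [
    "Mon Feb 27 07:10:35 EST [FILER: kern.timed.adjust:info]: server 'NTP_HOST' "
    "reports the appliance date is slow by 0.001 seconds. Adjusting date.",
    "Mon Feb 27 07:12:49 EST [FILER: rc:ALERT]: timed: time daemon started",
]

for entry in parse_adjustments(sample):
    print(entry)
```

The server field is the part worth watching: it names whatever the filer adjusted against, which is not necessarily the host listed in timed.servers.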
I did see similar entries in the messages file, but I wasn't sure whether they actually refer to the NTP server, because the filer's time has been 3+ minutes behind the NTP device's time ever since I reconfigured it to use the new NTP server more than 10 hours ago. I have also restarted timed several times just to be sure.
I don't know what the 'cluster' server is. I use an IP address for the NTP server:
filer1*> options time
timed.enable on (same value in local+partner recommended)
timed.log on (same value in local+partner recommended)
timed.max_skew 5m (same value in local+partner recommended)
timed.min_skew 0 (same value in local+partner recommended)
timed.proto ntp (same value in local+partner recommended)
timed.sched hourly (same value in local+partner recommended)
timed.servers 192.168.0.9 (same value in local+partner recommended)
timed.window 0s (same value in local+partner recommended)
Mon Feb 27 18:47:29 CST [filer4: kern.timed.adjust:info]: server 'cluster' reports the appliance date is fast by 0.065 seconds. Adjusting date.
Mon Feb 27 19:00:00 CST [filer4: kern.timed.adjust:info]: server 'cluster' reports the appliance date is slow by 0.059 seconds. Adjusting date.
..
Mon Feb 27 20:00:00 CST [filer04: kern.timed.adjust:info]: server 'cluster' reports the appliance date is slow by 0.001 seconds. Adjusting date.
Mon Feb 27 20:15:14 CST [filer04: ems.engine.inputSuppress:notice]: Event 'replication.status.timeSkewed' suppressed 1 times since Mon Feb 27 17:15:21 CST 2012.
..
Mon Feb 27 21:00:00 CST [filer04: kern.timed.adjust:info]: server 'cluster' reports the appliance date is fast by 0.204 seconds. Adjusting date.
Mon Feb 27 21:16:02 CST [filer04: rc:ALERT]: timed: time daemon started
Those entries refer to activity with the NTP server listed in timed.servers. With your configuration, the filer syncs its time with the NTP server hourly, with a max skew of 5 minutes (meaning that as long as the filer's time and the NTP server's time differ by less than 5 minutes, they will keep syncing; try incrementally reducing max skew and verify that sync takes place). The "cluster" server is your resolved NTP server (do an nslookup).
I would also ensure that the timed options on filer1 and filer4 match.
Regards,
bertaut wrote:
The "cluster" server is your resolved NTP server (do a nslookup).
This is not correct. Cluster refers to the NetApp cluster partner of filer04.
In a NetApp HA pair there is a cluster time daemon, where one filer is the master and the other is the slave. In this case, filer04 is the slave. The slave communicates directly with the time server only when the cluster interconnect is down or clustering has been disabled.
Check the timed options on the cluster partner and verify that it updates its time correctly.
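As a rough illustration of the point above, the server name in each adjust entry can be read as a label for where the time came from. The Python sketch below lists the distinct sources found in a log and attaches the meaning given in this thread. The function names and the regular expression are illustrative assumptions, and the 'rtc' reading is inferred from the timed.proto rtc configuration rather than taken from NetApp documentation.

```python
import re

# Matches only the quoted server name in a kern.timed.adjust entry.
SERVER_RE = re.compile(r"kern\.timed\.adjust:info\]: server '([^']+)'")


def describe_source(server):
    """Map a logged server name to its meaning as described in this thread."""
    if server == "cluster":
        return "HA partner (this filer is the time slave)"
    if server == "rtc":
        # Assumption: seen when timed.proto is set to rtc.
        return "local real-time clock (timed.proto rtc)"
    return "external time server from timed.servers"


def sources_seen(lines):
    """Return distinct server names from adjust entries, in first-seen order."""
    seen = []
    for line in lines:
        match = SERVER_RE.search(line)
        if match and match.group(1) not in seen:
            seen.append(match.group(1))
    return seen


log = [
    "Mon Feb 27 20:00:00 CST [filer04: kern.timed.adjust:info]: server 'cluster' "
    "reports the appliance date is slow by 0.001 seconds. Adjusting date.",
    "Mon Feb 27 21:16:02 CST [filer04: rc:ALERT]: timed: time daemon started",
]

for name in sources_seen(log):
    print(name, "->", describe_source(name))
```

If the only source a filer ever reports is 'cluster', the place to look for the actual NTP exchange is the partner's /etc/messages, not this one.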
Thanks for all the replies; problem solved. The clustered peer filer was not syncing to the NTP server. After I changed its timed.proto from rtc to ntp, it started syncing.
Here are the cluster peer filer's timed options and syslog before the fix:
filer0*> options time
timed.enable on (same value in local+partner recommended)
timed.log on (same value in local+partner recommended)
timed.max_skew 5m (same value in local+partner recommended)
timed.min_skew 0 (same value in local+partner recommended)
timed.proto rtc (same value in local+partner recommended)
timed.sched 1h (same value in local+partner recommended)
timed.servers 192.168.0.9 (same value in local+partner recommended)
timed.window 0s (same value in local+partner recommended)
Tue Feb 28 17:33:19 CST [filer0: rc:ALERT]: timed: time daemon started
Tue Feb 28 17:33:19 CST [filer0: kern.timed.adjust:info]: server 'rtc' reports the appliance date is slow by 0.003 seconds. Adjusting date.
Tue Feb 28 17:33:21 CST [filer0: kern.timed.adjust:info]: server 'rtc' reports the appliance date is fast by 0.009 seconds. Adjusting date.
I stand corrected by Pascal's answer regarding the "cluster" server. Jackie, glad to hear you fixed the issue. In a cluster pair, you want consistent options; inconsistent options could cause issues in the event of a takeover.
Regards,
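One way to catch this kind of mismatch early is to save the `options timed` output from both heads and compare them. Below is a minimal Python sketch under the assumption that the output has the "name value (note)" layout shown in this thread; parse_options and diff_options are hypothetical helper names, and the comparison is purely textual.

```python
def parse_options(text):
    """Parse saved `options timed` output into a {name: value} dict.

    Assumes lines shaped like
    "timed.proto ntp (same value in local+partner recommended)".
    """
    options = {}
    for line in text.splitlines():
        parts = line.split()
        if not parts or not parts[0].startswith("timed."):
            continue
        # An unset option has no value before the parenthesised note.
        has_value = len(parts) > 1 and not parts[1].startswith("(")
        options[parts[0]] = parts[1] if has_value else ""
    return options


def diff_options(local, partner):
    """Return {name: (local_value, partner_value)} for options that differ."""
    names = sorted(set(local) | set(partner))
    return {
        name: (local.get(name), partner.get(name))
        for name in names
        if local.get(name) != partner.get(name)
    }


NOTE = " (same value in local+partner recommended)"
LOCAL = "\n".join(line + NOTE for line in [
    "timed.enable on", "timed.log on", "timed.max_skew 5m", "timed.min_skew 0",
    "timed.proto ntp", "timed.sched hourly", "timed.servers 192.168.0.9",
    "timed.window 0s",
])
PARTNER = "\n".join(line + NOTE for line in [
    "timed.enable on", "timed.log on", "timed.max_skew 5m", "timed.min_skew 0",
    "timed.proto rtc", "timed.sched 1h", "timed.servers 192.168.0.9",
    "timed.window 0s",
])

for name, (mine, theirs) in diff_options(parse_options(LOCAL), parse_options(PARTNER)).items():
    print(f"{name}: local={mine} partner={theirs}")
```

With the two option listings from this thread, the sketch flags timed.proto (ntp vs rtc) and timed.sched (hourly vs 1h). Because the comparison is textual, it reports the schedule difference even if both spellings mean the same interval.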