Currently NTP is rejecting its upstream, and the clock is drifting quite badly (15 seconds of offset so far and growing). Checking the reason with ntpq, the flash code shows the peer being rejected as distant. According to the NTP documentation, a peer is marked as distant if the roundtrip takes longer than 1.5 seconds. However, using tcpdump I can see the request leave and the reply return within milliseconds:
09:06:36.304204 IP 10.127.255.230.ntp > 10.127.255.213.ntp: NTPv4, Client, length 68
09:06:36.304371 IP 10.127.255.213.ntp > 10.127.255.230.ntp: NTPv4, Server, length 68
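For context on why a millisecond roundtrip might still fail the check: as I understand RFC 5905, the "synchronization distance" ntpd compares against maxdist is not the raw roundtrip delay but roughly half the total (root plus peer) delay plus the accumulated dispersion and jitter terms. A minimal sketch of that arithmetic (variable names are mine, not ntpd internals, and I've omitted the small per-second dispersion growth term):

```python
# Sketch of the NTP root ("synchronization") distance, per RFC 5905.
# All values are in seconds; names are illustrative, not ntpd internals.
def root_distance(rootdelay, rootdisp, peer_delay, peer_disp, jitter):
    """Half the total roundtrip delay plus all dispersion/jitter terms."""
    return (rootdelay + peer_delay) / 2 + rootdisp + peer_disp + jitter

# A ~1 ms roundtrip still fails the 1.5 s check if dispersion is large:
print(root_distance(rootdelay=0.001, rootdisp=2.0,
                    peer_delay=0.0002, peer_disp=0.05, jitter=0.001))
# well above 1.5 s, entirely due to dispersion, not delay
```

If this is right, then a large rootdisp or peer dispersion (not network latency) could be what pushes the distance past the 1.5 s threshold.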
The general architecture here is one NTP server in this subnet (which gets its time from an upstream outside the cluster) serving time to the nodes in the subnet. The server itself is in sync and serving time as normal; however, all the nodes in the subnet reject it as described above.
Simply restarting ntpd has no effect; the peer is still rejected. However, after setting tos maxdist 5000 in ntp.conf, the peer does sync.
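For reference, the workaround looks like this in ntp.conf (maxdist is in seconds, so 5000 is absurdly far above the 1.5 s default; the server address is the one from the tcpdump above):

```
# /etc/ntp.conf -- raise the selection-distance threshold (workaround, not a fix)
tos maxdist 5000
server 10.127.255.213 iburst
```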
Why would ntpd think the distance is greater than 1.5 s when I can see (using ntpq/tcpdump) that requests complete in milliseconds? Is there some internal NTP parameter other than maxdist that would make sense to tweak here? Is there any further debugging I can do to diagnose this?
This is just one example of a cluster where this is happening, but I see the same symptoms elsewhere.
For reference, here is the (snarky) NTP documentation for maxdist:

maxdist maxdistance
    Specify the synchronization distance threshold used by the clock selection
    algorithm. The default is 1.5 s. This determines both the minimum number of
    packets to set the system clock and the maximum roundtrip delay. It can be
    decreased to improve reliability or increased to synchronize clocks on the
    Moon or planets.