dslreports logo
koitsu
Premium,MVM
join:2002-07-16
Mountain View, CA
kudos:23

reply to Bill_MI

Re: ntpd servers vs stability?

Your description is too abstract for me to provide extensive detail, so I will have to write somewhat abstractly as well. I'll talk about ppm at the very end.

Can you provide a copy of your /etc/ntp.conf when you were using us.pool.ntp.org?

The present-day design philosophy behind ntpd is that you sync to multiple servers (say 4 or 5, and preferably no fewer). The idea is that if one of those servers has its clock go awry, ntpd makes intelligent decisions about which servers to prefer and performs internal calculations to decide what to do. I can't explain this simply, and I doubt anyone can: the logic is very extensive and complex. It is not "server X sucks, switch to server Y"; it is much more involved than that.
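To give a rough feel for one piece of that logic, here's a minimal Python sketch of Marzullo's interval-intersection algorithm, which the falseticker detection in NTP's selection step is derived from. This is not ntpd's code: each server is reduced to a made-up offset and error bound in milliseconds, and a server counts as agreeing only if its interval covers the interval that the most servers overlap on. The real selection step builds its intervals from root distance, requires a majority of sources to agree, and then still runs clustering and combining algorithms.

```python
from typing import NamedTuple


class Source(NamedTuple):
    name: str
    offset_ms: float  # measured clock offset (hypothetical sample data)
    error_ms: float   # error bound around that offset (hypothetical)


def marzullo(sources: list[Source]) -> tuple[int, float, float]:
    """Return (count, lo, hi): the interval that the most sources overlap on."""
    # Each source contributes a start edge (-1) and an end edge (+1).
    # Sorting by (value, edge) puts starts before ends at equal values,
    # so intervals that merely touch still count as overlapping.
    edges: list[tuple[float, int]] = []
    for s in sources:
        edges.append((s.offset_ms - s.error_ms, -1))
        edges.append((s.offset_ms + s.error_ms, +1))
    edges.sort()

    best = count = 0
    lo = hi = 0.0
    for i, (value, edge) in enumerate(edges):
        count -= edge  # a start edge adds one overlap, an end edge removes one
        if count > best:  # only happens on a start edge, so edges[i + 1] exists
            best, lo, hi = count, value, edges[i + 1][0]
    return best, lo, hi


def split_sources(sources: list[Source]) -> tuple[list[str], list[str]]:
    """Split sources into those covering the best interval and the rest."""
    _, lo, hi = marzullo(sources)
    agree: list[str] = []
    disagree: list[str] = []
    for s in sources:
        covers = s.offset_ms - s.error_ms <= lo and s.offset_ms + s.error_ms >= hi
        (agree if covers else disagree).append(s.name)
    return agree, disagree


SERVERS = [
    Source("a", 0.5, 2.0),
    Source("b", -0.4, 2.5),
    Source("c", 0.9, 1.5),
    Source("d", 40.0, 1.0),  # a server whose clock has gone awry
]

if __name__ == "__main__":
    print(marzullo(SERVERS))
    print(split_sources(SERVERS))
```

With these made-up numbers, a, b and c overlap between roughly -0.6 and 2.1 ms, so d, sitting 40 ms away from everyone else, ends up on the disagree side. That is loosely the situation in which ntpd labels a peer a falseticker instead of simply switching to it or away from it.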

The servers' stratum matters as well, but you shouldn't use stratum 1 servers exclusively. That's rude, and in some cases it can actually get you in trouble.

The reason I don't use pool.ntp.org is that historically I've found problems with the participants of that pool. Their servers go offline or crap out in some way, and I end up with a multitude of garbage in my log files. More importantly, pool.ntp.org is just that: a pool. You're not going to get the same servers every time you do a DNS lookup, which makes troubleshooting a serious PITA if one of the servers craps out. You then have to go through logs and "hope" you can figure out what happened. With fixed, non-pool servers in /etc/ntp.conf you get a 1:1 correlation, and it greatly minimises debugging effort.

Anyway, here's an example I've used for years. I use one stratum 1 server and the rest are stratum 2:

$ ntpq -c peers
     remote           refid      st t when poll reach   delay   offset  jitter
==============================================================================
*clock.isc.org   .GPS.            1 u  877 1024  377   12.590    0.482   0.009
-ntp-1.gw.illino 128.174.38.133   2 u  795 1024  377   99.423   -0.358   0.016
+otc2.psu.edu    128.118.2.33     2 u  807 1024  377   94.220    0.915   0.813
+tick.jrc.us     172.23.7.201     2 u  815 1024  377   92.206    1.016   0.843
 

The characters in front of the hostnames matter (this is what I meant about ntpd's decision logic being complex). From the ntpq man page:

     space   (reject) The peer is discarded as unreachable, synchronized to
             this server (synch loop) or outrageous synchronization distance.
 
     x       (falsetick) The peer is discarded by the intersection algorithm
             as a falseticker.
 
     .       (excess) The peer is discarded as not among the first ten peers
             sorted by synchronization distance and so is probably a poor
             candidate for further consideration.
 
     -       (outlyer) The peer is discarded by the clustering algorithm as an
             outlyer.
 
     +       (candidat) The peer is a survivor and a candidate for the
             combining algorithm.
 
     #       (selected) The peer is a survivor, but not among the first six
             peers sorted by synchronization distance.  If the association is
             ephemeral, it may be demobilized to conserve resources.
 
     *       (sys.peer) The peer has been declared the system peer and lends
             its variables to the system variables.
 
     o       (pps.peer) The peer has been declared the system peer and lends
             its variables to the system variables.  However, the actual
             system synchronization is derived from a pulse-per-second (PPS)
             signal, either indirectly via the PPS reference clock driver or
             directly via kernel interface.
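If you want to check those leading characters from a script (for monitoring, say), here's a small Python sketch that parses `ntpq -c peers` output into dictionaries and maps the first character of each peer line to the names in the list above. It assumes the default fixed-width layout shown here, with every peer on a single line; wrapped or unusual lines are simply skipped. The sample text is the peers output from earlier in this post.

```python
# Leading-character meanings, as listed in the ntpq man page excerpt above.
TALLY = {
    " ": "reject",
    "x": "falsetick",
    ".": "excess",
    "-": "outlyer",
    "+": "candidat",
    "#": "selected",
    "*": "sys.peer",
    "o": "pps.peer",
}

FIELDS = ("remote", "refid", "st", "t", "when", "poll", "reach",
          "delay", "offset", "jitter")


def parse_peers(text: str) -> list[dict]:
    """Parse `ntpq -c peers` output into one dict per peer line."""
    peers = []
    for line in text.splitlines():
        # Skip blank lines, the column header and the ===== separator.
        if not line.strip() or line.lstrip().startswith(("remote", "=")):
            continue
        code, columns = line[0], line[1:].split()
        if len(columns) != len(FIELDS):
            continue  # wrapped or unusual lines are ignored in this sketch
        peer: dict = dict(zip(FIELDS, columns))
        # delay, offset and jitter are printed in milliseconds.
        for key in ("delay", "offset", "jitter"):
            peer[key] = float(peer[key])
        peer["tally"] = TALLY.get(code, "unknown")
        peers.append(peer)
    return peers


SAMPLE = """\
     remote           refid      st t when poll reach   delay   offset  jitter
==============================================================================
*clock.isc.org   .GPS.            1 u  877 1024  377   12.590    0.482   0.009
-ntp-1.gw.illino 128.174.38.133   2 u  795 1024  377   99.423   -0.358   0.016
+otc2.psu.edu    128.118.2.33     2 u  807 1024  377   94.220    0.915   0.813
+tick.jrc.us     172.23.7.201     2 u  815 1024  377   92.206    1.016   0.843
"""

if __name__ == "__main__":
    for p in parse_peers(SAMPLE):
        print(f"{p['tally']:10} {p['remote']:16} offset {p['offset']:+.3f} ms")
```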
 
 

And here's my ntp.conf. I've commented it heavily to cover all the nuances, and you'll see my annoyances about pool.ntp.org at the top as well. :-)

# Originally we used north-america.pool.ntp.org, but the list
# of servers returned from that pool varied, and would regularly
# include stratum 1 servers.  Therefore, we prefer a series of
# stratum 2 servers, with a single stratum 1 as a stable base
# comparison
#
# http://support.ntp.org/bin/view/Servers/StratumOneTimeServers
# http://support.ntp.org/bin/view/Servers/StratumTwoTimeServers
#
# clock.isc.org          strat 1, California
# ntp-1.cso.uiuc.edu     strat 2, Illinois
# clock.psu.edu          strat 2, Pennsylvania
# tick.jrc.us            strat 2, New Jersey
#
server clock.isc.org          iburst
server ntp-1.cso.uiuc.edu     iburst
server clock.psu.edu          iburst
server tick.jrc.us            iburst
 
# Default: ignore all ntp queries from all other hosts.  Packets to/from
# server lines are still respected.
restrict default noquery nomodify nopeer
 
# Allow queries to/from localhost, used for ntpdc and other utils
# Allow queries to/from the local network (read-only)
restrict 127.0.0.0 mask 255.0.0.0
restrict 192.168.1.0 mask 255.255.255.0 nomodify nopeer notrap
 

You'll see that I've chosen servers spread across different geographic locations within the US. Sadly, there aren't many good servers out here on the West Coast; most seem to be in the Midwest or on the East Coast these days.

The other reason I picked these servers is how they're configured and which peers they use. That is, these are servers I've found to have a very good track record, run by admins who seem to understand NTP, as opposed to "hey d00dz i likez ntpd and just turn it onz and lol it workz!!1!".

$ ntpq -c peers ntp-1.cso.uiuc.edu
     remote           refid      st t when poll reach   delay   offset  jitter
==============================================================================
*truechimer.cite .PPS.            1 u  343 1024  377    0.585   -0.296   0.056
 ntp-0.gw.illino 128.174.38.133   2 u  258  256  376    0.660   -0.330   0.034
 ntp-2.gw.illino 128.174.38.133   2 u   18 1024  377    0.743    3.171   0.113
-hydramail.physi 128.255.32.123   2 u   81  512  366    7.815   -0.109   0.205
+navobs1.wustl.e .GPS.            1 u  176 1024  377   22.277   -0.304   0.041
+bigben.cac.wash .GPS.            1 u  928 1024  377   49.310    0.648   1.325
 LOCAL(0)        .LOCL.          13 l  30d   64    0    0.000    0.000   0.000
 
$ ntpq -c peers clock.psu.edu
     remote           refid      st t when poll reach   delay   offset  jitter
==============================================================================
-gps1.tns.its.ps .GPS.            1 u   38 1024  377    0.187   -0.263   0.069
*gps1.aset.psu.e .GPS.            1 u  882 1024  377    0.706   -0.093   0.082
#otc1.psu.edu    .TRUE.           1 u  359 1024  377    0.609   11.921   0.175
#bonehed.lcs.mit .PPS.            1 u  838 1024  377   18.635   -2.250   0.083
#2001:4f8:2:d::1 .GPS.            1 u  461 1024  377   83.365   -4.510   1.536
+navobs1.gatech. .GPS.            1 u  996 1024  377   25.936    0.111   0.125
-ntp1.symmetrico .GPS.            1 u   50 1024  377   82.109   -0.822   1.508
+ntp2.usno.navy. .IRIG.           1 u  803 1024  377   15.711   -0.014   0.221
#rackety.udel.ed .PPS.            1 u   59 1024  377   12.459   -5.068   0.074
#darkcity.cerias .GPS.            1 u  232 1024  353   56.689    3.139   6.929
-time.keneli.org .PPS.            1 u  971 1024  377   29.809   -0.815   2.111
-2610:20:6f15:15 .ACTS.           1 u  572 1024  377   14.263   -1.123   0.831
-time.symmetrico 69.25.96.11      2 u  646 1024  377   82.833   -0.381   1.896
#time-a.nist.gov .ACTS.           1 u 1976 1024   52   14.373    0.519   1.584
#time-b.nist.gov .ACTS.           1 u  856 1024  375   26.212    6.375   5.325
#timekeeper.isi. .GPS.            1 u  169 1024  377   98.752    8.295   4.182
 2001:5c0:0:2::2 .STEP.          16 u    - 1024    0    0.000    0.000   0.000
#IPv6.remco.org  .PPS.            1 u  206 1024  377  110.028    2.946   1.518
+tt155.ripe.Net  .GPS.            1 u  388 1024  377  105.197    0.120   1.611
#ntp1.bit.nl     172.2.53.81      2 u    9 1024  377   98.826   -4.101   1.455
#ntp2.bit.nl     32.246.249.54    2 u  130 1024  377   98.820   -5.441   1.285
-sodium.tns.its. 192.5.41.209     2 u  520 1024  377    1.429   -0.408   0.261
#polka.cac.psu.e 128.118.25.12    2 u 1074 1024  377    0.459    0.263   0.234
#2610:8:7800:24: 147.84.59.145    2 u  117 1024  377    1.427    0.409   0.060
#leibniz.math.ps 134.121.64.62    2 u  966 1024  377    0.560    0.433   0.125
 
$ ntpq -c peers tick.jrc.us
     remote           refid      st t when poll reach   delay   offset  jitter
==============================================================================
+172.22.0.74     172.23.7.201     2 u  383 1024  377    4.010   -0.505   6.810
-172.21.0.13     .PPS.            1 u  833 1024  377    8.300   -2.028   0.700
.ntps11.jensenre 172.16.2.28      2 -  437 1024  377    3.950   -0.151   0.110
*ntps12.jensenre .PPS.            1 u  109 1024  377    3.750   -0.114   0.050
.ntps21.jensenre 172.16.2.29      2 -  740 1024  377    4.460    0.245   0.110
.ntps22.jensenre 172.16.2.29      2 -  382 1024  377    4.380    0.445   0.110
-172.23.7.202    18.26.4.105      2 -  105 1024  377    3.690    0.373   0.170
+172.23.7.203    172.16.2.28      2 -  155 1024  377    5.890   -0.369   0.670
+172.23.7.201    .PPS.            1 -  230 1024  377    3.630   -0.032   0.030
+172.23.7.200    172.16.2.28      2 -  695 1024  377    3.750   -0.114   0.060
 172.23.1.18     172.16.2.5       3 -   84 1024  377    4.460    0.393   2.410
 172.23.1.33     208.90.144.52    3 -  213 1024  377    3.920   -0.069   0.050
.rrcs-208-125-61 96.237.191.14    2 -  508 1024  376  -17.120    3.265  23.910
-time.keneli.org .PPS.            1 u  262 1024  377   24.380   -2.315   2.670
-bonehed.lcs.mit .PPS.            1 u   87 1024  377   18.020    1.824   1.390
+clock.nyc.he.ne .CDMA.           1 u  669 1024  377    6.100    0.042   0.030
 

clock.isc.org doesn't permit peer listing. It's intentionally disabled/filtered in their configuration, probably for security, since a peer list could disclose the internal private network IPs they sync with. But you can see from the .GPS. refid in my peers output that they sync directly to a GPS-based reference clock (stratum 0) rather than to another server. I trust the ISC guys, so I don't particularly care that I can't see their peer list.

Finally, a bit about ppm:

Do you know the cause of this? Is it local clock drift, or is it truly caused by some wonky remote NTP servers you're using? The contents of /var/db/ntpd.drift (or possibly /etc/ntp.drift; it varies per OS/distro) would be useful. The ntp FAQ goes over this:

»www.ntp.org/ntpfaq/NTP-s-sw-cloc···lity.htm
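As a quick way to pull that number out, here's a minimal Python sketch that reads the drift file and converts the stored value into seconds per day. It assumes the file's first whitespace-separated token is the frequency in ppm, and it only tries the two paths mentioned above unless you pass one on the command line; the driftfile line in your ntp.conf says where ntpd actually writes it.

```python
import sys
from pathlib import Path

# The two locations mentioned above; your OS/distro may use another path
# (the "driftfile" line in ntp.conf says where ntpd actually writes it).
CANDIDATES = [Path("/var/db/ntpd.drift"), Path("/etc/ntp.drift")]


def read_drift_ppm(path: Path) -> float:
    """Return the frequency value stored in an ntpd drift file, in ppm."""
    # Assumption: the first whitespace-separated token is the ppm value.
    return float(path.read_text().split()[0])


def ppm_to_seconds_per_day(ppm: float) -> float:
    """Seconds per day an uncorrected clock would drift at this ppm error."""
    return ppm * 1e-6 * 86400


if __name__ == "__main__":
    paths = [Path(arg) for arg in sys.argv[1:]] or CANDIDATES
    for path in paths:
        if path.is_file():
            ppm = read_drift_ppm(path)
            print(f"{path}: {ppm:+.3f} ppm, about {ppm_to_seconds_per_day(ppm):+.2f} s/day")
            break
    else:
        print("no drift file found; pass its path as an argument")
```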

If you truly have clock drift that won't go away, the root cause is almost certainly a bad quartz crystal/oscillator on your motherboard. I've seen this happen far too often in the field, even on brand-new servers: some wonky oscillator makes the clock suddenly jump by 30-40 seconds, and ntpd goes "?!?!?!" and tries to deal with it. They're easy to replace (basic soldering), but you have to get one with exactly the same frequency as the original. The frequency is usually printed on the casing, unless you've got one of those tiny cylindrical crystals that look like super tiny silver capacitors, in which case, ha, good luck reading those things! Damn cheap vendors...

Reminder: if you ever replace your hardware (the motherboard specifically), remove your ntp.drift file afterwards. It'll take 24-48 hours for the file to be regenerated (because of how ntpd's logic works; again, a bunch of mathematics), so be patient.

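For comparison, my precision on the above system is -20 (see the end of the third line below). Precision is a power-of-two exponent in seconds, so -20 means 2^-20 s, or roughly 1 µs. The ppm figure itself is the frequency field on the seventh line: -11.003 ppm on this box.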

$ ntpq -c rl
assID=0 status=0644 leap_none, sync_ntp, 4 events, event_peer/strat_chg,
version="ntpd 4.2.4p5-a (1)", processor="amd64",
system="FreeBSD/9.1-PRERELEASE", leap=00, stratum=2, precision=-20,
rootdelay=12.416, rootdispersion=37.975, peer=7168, refid=149.20.64.28,
reftime=d45cd14a.e0000c69  Sun, Nov 25 2012  9:18:02.875, poll=10,
clock=d45cd6df.000cc3eb  Sun, Nov 25 2012  9:41:51.000, state=4,
offset=0.308, frequency=-11.003, jitter=1.055, noise=0.597,
stability=0.044, tai=0
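To pull these fields out in a script, here's a small Python sketch. It parses the key=value pairs from `ntpq -c rl` text (a trimmed copy of the output above) and turns precision, a base-2 exponent of seconds, into microseconds. The regex is an assumption about the output format based on what's shown here, not a documented grammar.

```python
import re

# Trimmed copy of the `ntpq -c rl` output above (two of its eight lines).
RL_SAMPLE = """\
system="FreeBSD/9.1-PRERELEASE", leap=00, stratum=2, precision=-20,
offset=0.308, frequency=-11.003, jitter=1.055, noise=0.597,
"""

# Assumed format: key=value pairs where the value is either a double-quoted
# string or a run of characters up to the next comma or whitespace.
PAIR_RE = re.compile(r'(\w+)=("[^"]*"|[^,\s]+)')


def parse_rl(text: str) -> dict[str, str]:
    """Collect key=value pairs from ntpq readlist output (values as text)."""
    return {key: value.strip('"') for key, value in PAIR_RE.findall(text)}


def precision_seconds(fields: dict[str, str]) -> float:
    """ntpd reports precision as a base-2 exponent of seconds."""
    return 2.0 ** int(fields["precision"])


if __name__ == "__main__":
    fields = parse_rl(RL_SAMPLE)
    print(f"precision {fields['precision']} = {precision_seconds(fields) * 1e6:.3f} us")
    print(f"frequency {float(fields['frequency']):+.3f} ppm")
```

Each step of 1 in the precision value halves or doubles the figure, so a box reporting -18 reads its clock about four times more coarsely than one reporting -20.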
 

--
Making life hard for others since 1977.
I speak for myself and not my employer/affiliates of my employer.


Bill_MI
Bill In Michigan
Premium,MVM
join:2001-01-03
Royal Oak, MI
kudos:2

Hi koitsu and thanks much for the time.

This old box has a slow crystal: it drifts over 1 second per hour. Long ago I crudely calculated its error at around -300 ppm, so I figured ntpd's reported -324 ppm was probably more accurate and of the same order.

I'm an old-school engineer who tracks his 25-year company watch: it averages -0.95 ppm over a one-year battery cycle (averaged because it drifts with battery age and seasonal temperature). I have recent years plotted, too. :-)
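Those numbers hang together. As a rough check, here's a small Python sketch of the arithmetic: a rate error in ppm is just the observed drift divided by the elapsed time, times 10^6. It works with magnitudes only, so it makes no assumption about the sign convention ntpd uses for its frequency value.

```python
HOUR = 3600
YEAR = 365 * 86400


def drift_ppm(drift_seconds: float, elapsed_seconds: float) -> float:
    """Rate error in ppm from an observed drift over an elapsed interval."""
    return drift_seconds / elapsed_seconds * 1e6


def drift_seconds(ppm: float, elapsed_seconds: float) -> float:
    """Drift accumulated over an interval by a clock that is off by `ppm`."""
    return ppm * 1e-6 * elapsed_seconds


if __name__ == "__main__":
    # Drifting 1 s per hour is about 278 ppm, so "around 300 ppm" fits.
    print(f"1 s/hour -> {drift_ppm(1.0, HOUR):.1f} ppm")
    # ntpd's 324 ppm works out to roughly 1.17 s per hour.
    print(f"324 ppm  -> {drift_seconds(324, HOUR):.2f} s/hour")
    # The watch's 0.95 ppm comes to about 30 s over a year.
    print(f"0.95 ppm -> {drift_seconds(0.95, YEAR):.1f} s/year")
```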

My ntp.conf was the same as this one except for the server lines, which are now commented out. It used just us.pool.ntp.org; ntp.ubuntu.com was already commented out.

# You do need to talk to an NTP server or two (or three).
#server us.pool.ntp.org
#server ntp.ubuntu.com
#   The following servers occurred multiple times in us.pool.ntp.org 11/22/12
server 64.73.32.135
server 69.167.160.102
server 4.53.160.75
server 138.236.128.112
server 199.102.46.72
server 199.241.31.96
server 204.9.54.119
server 206.57.44.17
server 207.32.191.59
server 208.68.36.196
server 216.129.110.30
server 67.18.187.111
server 4.53.160.74
 
# Access control configuration; see /usr/share/doc/ntp-doc/html/accopt.html for
# details.  The web page <http://support.ntp.org/bin/view/Support/AccessRestrictions>
# might also be helpful.
#
# Note that "restrict" applies to both servers and clients, so a configuration
# that might be intended to block requests from certain clients could also end
# up blocking replies from your own upstream servers.
 
# By default, exchange time with everybody, but don't allow configuration.
restrict -4 default kod notrap nomodify nopeer noquery
restrict -6 default kod notrap nomodify nopeer noquery
 
# Local users may interrogate the ntp server more closely.
restrict 127.0.0.1
restrict ::1
 
# Clients from this (example!) subnet have unlimited access, but only if
# cryptographically authenticated.
#restrict 192.168.123.0 mask 255.255.255.0 notrust
 
# If you want to provide time to your local subnet, change the next line.
# (Again, the address is an example only.)
#broadcast 192.168.123.255
broadcast 172.23.34.60
 
# If you want to listen to time broadcasts on your local subnet, de-comment the
# next lines.  Please do this only if you trust everybody on the network!
#disable auth
#broadcastclient
 
I was reading up on ntpd when it occurred to me to change servers and see what happened.

For months, I'd been running ntpdate hourly against us.pool.ntp.org. I know... bad practice, and someone's post made me get off the can and look at ntpd again. It might have been yours?

I had been logging the ntpdate output, including the servers it used. I sorted that log, picked the servers that occurred most often, and put them BY IP into the ntp.conf above.
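That tallying step is easy to script. Below is a minimal Python sketch that counts server addresses in a text log of ntpdate runs and emits the most frequent ones as ntp.conf server lines. The log format is an assumption: it expects ntpdate's usual "adjust time server <ip> offset ..." or "step time server <ip> offset ..." messages, and the sample lines are invented for illustration.

```python
import re
from collections import Counter

# Assumed log format: ntpdate's usual one-line result, e.g.
#   25 Nov 13:00:01 ntpdate[1234]: adjust time server 64.73.32.135 offset -0.004590 sec
# Tweak the regex if your log lines look different.
SERVER_RE = re.compile(r"(?:adjust|step) time server (\S+) offset")


def most_common_servers(lines: list[str], n: int) -> list[tuple[str, int]]:
    """Return the n server addresses that ntpdate reported most often."""
    counts = Counter()
    for line in lines:
        match = SERVER_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts.most_common(n)


def to_ntp_conf(servers: list[tuple[str, int]]) -> str:
    """Render the chosen addresses as ntp.conf "server" lines."""
    return "\n".join(f"server {address}" for address, _count in servers)


# Invented sample lines, for illustration only.
SAMPLE_LOG = [
    "25 Nov 11:00:01 ntpdate[101]: adjust time server 64.73.32.135 offset -0.004590 sec",
    "25 Nov 12:00:01 ntpdate[102]: adjust time server 4.53.160.75 offset 0.004651 sec",
    "25 Nov 13:00:01 ntpdate[103]: step time server 64.73.32.135 offset 1.166400 sec",
    "25 Nov 14:00:01 ntpdate[104]: no server suitable for synchronization found",
]

if __name__ == "__main__":
    print(to_ntp_conf(most_common_servers(SAMPLE_LOG, 5)))
```

Lines that don't match the pattern, such as the failed run in the sample, are skipped rather than counted.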

And bingo... ntpd seems stable as heck NOW! Before, it would lose it within a day or so.

I'm wondering whether the issues you saw with the pool were my original problem, made worse by having a slow crystal?


Bill_MI

And here's the way I've been looking at these. Hey! You have some better commands than I've been using! :)

$ ntpq -n
ntpq> pe
     remote           refid      st t when poll reach   delay   offset  jitter
==============================================================================
-64.73.32.135    192.5.41.40      2 u  223 1024  377   26.930   -4.590   1.076
-69.167.160.102  50.77.217.185    2 u  195 1024  377   21.190   -3.651   0.788
-4.53.160.75     220.183.68.66    2 u  176 1024  377   21.199    4.651   3.337
-138.236.128.112 127.67.113.92    2 u  248 1024  377   39.042   -8.179   1.143
+199.102.46.72   .GPS.            1 u 1176 1024  376   27.881   -1.412   0.227
-199.241.31.96   164.244.221.197  2 u  235 1024  367   51.382  -19.791   3.729
*204.9.54.119    .CDMA.           1 u  282 1024  377   16.748   -1.461   0.963
+206.57.44.17    204.123.2.72     2 u  232 1024  377   31.127   -2.167   0.589
 207.32.191.59   204.9.54.119     2 u  193 1024  377   17.090    4.205   0.185
-208.68.36.196   209.51.161.238   2 u  219 1024  377   26.724    0.776   5.370
-216.129.110.30  69.36.224.15     2 u  244 1024  377   33.114   -3.102   0.194
-67.18.187.111   129.7.1.66       2 u  247 1024  377   56.846   10.060   1.619
-4.53.160.74     209.81.9.7       2 u  258 1024  377   22.154    2.279   0.292
 172.23.34.60    .BCST.          16 u    -   64    0    0.000    0.000   0.004
ntpq> rv
assID=0 status=06d4 leap_none, sync_ntp, 13 events, event_peer/strat_chg,
version="ntpd 4.2.4p8@1.1612-o Tue Apr 19 07:08:29 UTC 2011 (1)",
processor="i586", system="Linux/2.6.32-45-generic", leap=00, stratum=2,
precision=-18, rootdelay=16.748, rootdispersion=21.970, peer=48531,
refid=204.9.54.119,
reftime=d45cdff8.0fc2c753  Sun, Nov 25 2012 13:20:40.061, poll=10,
clock=d45ce119.52b14dff  Sun, Nov 25 2012 13:25:29.323, state=4,
offset=-1.504, frequency=-323.743, jitter=0.987, noise=0.915,
stability=0.099, tai=0
ntpq>