See KnownOsIssuesDev for discussion of this topic.
9.2. Known Operating System Issues
9.2.1. Lost Interrupts
There are several ways a system can miss timer interrupts. All of them cause trouble for timekeeping.
9.2.1.1. Scheduler HZ too high
See below for more discussion of several cases.
9.2.1.2. Disk drivers using non-DMA
Some early Linux distributions shipped with DMA for IDE disks disabled by default. Lots of disk activity would provoke lost interrupts.
See man hdparm for info on how to change this setting.
http://www.megapathdsl.net/~hmurray/hacks/read.c is a program that will cause lots of disk activity to test this case.
9.2.2. Xen, VMware, and Other Virtual Machine Implementations
NTP was not designed to run inside of a virtual machine. It requires a high-resolution system clock and clock interrupts that are serviced with a high level of accuracy. No known virtual machine is capable of meeting these requirements.
Run NTP on the base OS of the machine, and then have your various guest OSes take advantage of the good clock that is created on the system. Even that may not be enough, as there may be additional tools or kernel options that you need to enable so that virtual machine clients can adequately synchronize their virtual clocks to the physical system clock.
9.2.2.1. VMware
In the specific case of VMware, you also need to install VMware Tools (or at least the headers from the VMware Tools). The additional kernel options you need are probably going to be something like "clock=pit nosmp noapic nolapic", although you may need to experiment to see which particular options work best for you. Once you've got these changes in place, you will need to set "tools.syncTime" to "true" in the vmx file. See also the VMware knowledgebase article "Clock in a Linux Guest Runs More Slowly or Quickly Than Real Time".
When running VMware with RedHat Linux, there are some additional things that need to be done. I quote:
With VMware 2.5.x and RHEL4 you need to go into the MUI under Advanced
Options and set "Misc.TimerHardPeriod" to 333. The default value is
1000. You always have to set the host rate faster than the guest.
That's pretty much the culprit behind all the clock problems is that
single setting. By default the host isn't able to keep up with the
guests requests, thus the guests lose time.
Also See: http://www.vmware.com/pdf/vmware_timekeeping.pdf
9.2.2.2. Xen
It appears that Xen just passes time-related system calls to the underlying master domain, and does not require any additional changes to support time sync into the guest domains.
9.2.2.3. Final notes
If your management or your client insists on running an ntpd instance inside of a VM client, even in the face of all the above information, there is a solution. Simply add a "noselect" keyword to each of the "server" definitions, and your VM client ntpd will monitor the defined upstream servers, but it won't actually try to sync with any of them, leaving that job to the copy of ntpd running on the base hardware, as defined above. This will allow your client applications to confirm that they have good time sync, through the use of the ntpq program and by looking at the offsets reported, while avoiding the problems of actually trying to set the system time on the VM client.
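As a rough illustration of the "noselect" change, the following Python sketch appends the keyword to every "server" line of an ntp.conf text that does not already carry it. The function name add_noselect is hypothetical, and the sketch assumes that "#" starts a comment and that the file uses one directive per line.

```python
def add_noselect(conf_text: str) -> str:
    """Append 'noselect' to every 'server' line that lacks it."""
    out = []
    for line in conf_text.splitlines():
        # Split off any trailing comment so the keyword lands before it.
        body, sep, comment = line.partition("#")
        tokens = body.split()
        if tokens and tokens[0] == "server" and "noselect" not in tokens:
            body = body.rstrip() + " noselect"
            line = body + (" " + sep + comment if sep else "")
        out.append(line)
    return "\n".join(out)
```

Lines that are not "server" directives, and lines that are entirely comments, pass through unchanged. After editing the file this way, the offsets that ntpq reports for the VM client still show how far its clock is from the upstream servers.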
However, do keep in mind that the kinds of additional/alternative kernel options you need to enable good time sync within your virtualization system may interfere with the proper operation of certain other types of programs. In that case, you need to make a decision: do you run those applications under virtualization without good clock sync, or do you run them on a separate non-virtual machine that does have good clock sync?
Our thanks to Seph and Doug Hanks.
-- BradKnowles - 22 Feb 2007
9.2.3. Windows and Sun's Java Virtual Machine
Sun's Java Virtual Machine needs to be started with the -XX:+ForceTimeHighResolution parameter to avoid losing interrupts.
See http://www.macromedia.com/support/coldfusion/ts/documents/createuuid_clock_speed.htm for more information.
9.2.4. Linux
9.2.4.1. Kernel 2.4 (and Earlier)
9.2.4.1.1. Using a Local Refclock
- First, you need to make sure that the PPSKit mods have been applied to your kernel. See PPSKit Implementation Status to see which is the right version of the kit for your system.
- Second, if you're still having problems, make sure that the HZ= setting in your kernel configuration is set to 100. Some newer systems have come with this value set to 1000 instead, which has caused problems for some people by losing too many interrupts.
9.2.4.1.2. Without a Local Refclock
- The PPSKit mods may not be necessary if you do not have a locally attached refclock (presumably over a serial line).
- The issue seems to be primarily one of losing interrupts over a serial line that is very sensitive to delays.
- You may still find that the PPSKit mods will make your ntpd server considerably more accurate and precise, even without a local refclock, due to the decrease in lost interrupts.
- If you have a poorly performing ntpd which is not keeping good time on your system, you should seriously consider applying the PPSKit mods, or confirming that they are already applied, before you start assuming more serious hardware problems.
9.2.4.2. Kernel 2.6
9.2.4.2.1. Using a Local Refclock
- The PPSKit mods have not been ported to kernel 2.6. There is currently no clear indication that the functionality provided by these mods has been subsumed into kernel 2.6.
- You still have the same HZ= issue as shown above for kernel 2.4.
- This is a bigger issue with kernel 2.6, since many distributions based on 2.6 are shipping with HZ= defaulting to a value of 1000.
- Kernel 2.6 is still having problems working correctly with APIC and ACPI on many machines. You may need to disable APIC and/or ACPI at boot time before loading the OS, in order to get anything remotely resembling decent timekeeping.
- See also the Dev Issues topic LinuxImplementationLinuxPPS
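One way to check the HZ= value mentioned in the list above is to look at the kernel build configuration. The Python sketch below assumes that the kernel records its tick rate as a CONFIG_HZ=<n> line in a plain-text config file (for example a /boot/config-<release> file, whose location varies by distribution). Older kernels may not have this symbol at all, in which case the function returns None. The function name configured_hz is hypothetical.

```python
import re

def configured_hz(config_text: str):
    """Return the CONFIG_HZ value from kernel config text, or None if absent."""
    # Match the exact symbol, not variants such as CONFIG_HZ_1000=y.
    match = re.search(r"^CONFIG_HZ=(\d+)\s*$", config_text, re.MULTILINE)
    return int(match.group(1)) if match else None
```

A result of 1000 on a machine with poor timekeeping suggests rebuilding with a lower value, as described above.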
9.2.4.2.2. Without a Local Refclock
- If you do not have a local refclock, you may find that kernel 2.6 works adequately for you, once any HZ= and APIC/ACPI issues are dealt with.
- Otherwise, stick with kernel 2.4 until these issues have been resolved.
9.2.4.2.3. Lost ticks causing clock instability
From http://gossamer-threads.com/lists/linux/kernel/494604:
In 2.6, some code has been added to watch for "lost ticks" and increment the jiffies counter to compensate for them. A "lost tick" is when timer interrupts are masked for so long that ticks pile up and the kernel doesn't see each one individually, so it loses count.
Lost ticks are a real problem, especially in 2.6 with the base interrupt rate having been increased to 1000 Hz, and it's good that the kernel tries to correct for them. However, detecting when a tick has truly been lost is tricky. The code that has been added (both in timer_tsc.c's mark_offset_tsc and timer_pm.c's mark_offset_pmtmr) is overly simplistic and can get false positives. Each time this happens, a spurious extra tick gets added in, causing the kernel's clock to go faster than real time.
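The effect of such false positives can be shown with simple arithmetic. The Python sketch below is an illustrative model only, not the kernel's code: it assumes one tick per timer interrupt, plus one extra tick for every false "lost tick" detection. The function name and the example figures are made up for illustration.

```python
def counted_seconds(real_seconds: float, false_positives: int, hz: int = 1000) -> float:
    """Time the kernel believes has elapsed, given spurious extra ticks."""
    real_ticks = round(real_seconds * hz)          # one tick per timer interrupt
    counted_ticks = real_ticks + false_positives   # each false positive adds a tick
    return counted_ticks / hz

# One hour of real time with 360 false detections at 1000 Hz:
# the kernel clock ends up roughly 0.36 s ahead of real time.
gain = counted_seconds(3600, 360) - 3600
```

Because every spurious tick is worth 1/HZ seconds, the error only ever accumulates in one direction: the clock runs fast.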
9.2.4.2.4. A problem with the Reiser file system
The addition of the Reiser file system to the kernel caused a problem with ntpd. It was unable to stay synchronized, losing more than 10 minutes per day if allowed to run freely. The stock 2.6.18 kernel from CentOS 5 had no problem. When the kernel interrupt rate (HZ) was reduced from 1000 to 250, the problem was solved. Apparently the Reiser FS produces enough interrupts to break the kernel clock at 1000 Hz. This occurred on a machine with a 2.4 GHz Intel Core Duo CPU.
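To put "10 minutes per day" in perspective, the Python sketch below converts a daily drift into parts per million (ppm) and compares it with the 500 ppm frequency error that ntpd can compensate for. The 500 ppm limit is not stated in the text above; it is ntpd's documented tolerance. The function names are hypothetical.

```python
SECONDS_PER_DAY = 86_400
NTPD_MAX_PPM = 500.0  # ntpd's frequency correction limit (not from the text above)

def drift_ppm(seconds_per_day: float) -> float:
    """Convert a drift in seconds per day into parts per million."""
    return seconds_per_day / SECONDS_PER_DAY * 1e6

def within_ntpd_range(seconds_per_day: float) -> bool:
    """True if ntpd could in principle steer out a drift of this size."""
    return abs(drift_ppm(seconds_per_day)) <= NTPD_MAX_PPM
```

A loss of 600 seconds per day is about 6944 ppm, more than ten times what ntpd can correct (500 ppm corresponds to roughly 43 seconds per day). This is consistent with ntpd being unable to stay synchronized in this case.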
9.2.4.2.5. Running ntpd without root privileges
The Linux Capabilities mechanism allows ntpd to drop all root privileges, except for the one it actually needs (the privilege to set the system clock).
How to use this feature:
- You need the Default Linux Capabilities in your kernel, either as a module (modprobe capability), or statically (under Security Options in the kernel configuration menu)
- You need a working(!) version of libcap.so (http://www.kernel.org/pub/linux/libs/security/linux-privs). If you get "cap_set_proc(): failed to drop root privileges" errors after a kernel upgrade, you may need to recompile this library!
- ntpd must be configured with --enable-linuxcaps
- ntpd must be started as root, but with a -u argument to give it a non-root user id to switch to
- Optionally, you can use the -i argument to additionally chroot ntpd (in fact, -i without -u should also work: ntpd will then run chrooted, with user id 0 but without root privileges, but this is not recommended)
You can verify your setup by looking at /proc/<PID>/status.
For ntpd running without privileges, it should contain the lines
CapInh: 0000000002000000
CapPrm: 0000000002000000
CapEff: 0000000002000000
while for a root shell, you should see
CapInh: 0000000000000000
CapPrm: 00000000fffffeff
CapEff: 00000000fffffeff
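The masks above are hexadecimal bit sets. The value 0000000002000000 has only bit 25 set, which is CAP_SYS_TIME in the Linux capability numbering. The following Python sketch parses the three lines from the text of a /proc/<PID>/status file and checks for the restricted case shown above. The function names are hypothetical, and an ntpd build that retains further capabilities would show a different mask.

```python
CAP_SYS_TIME = 25  # bit number of CAP_SYS_TIME in the Linux capability list

def parse_caps(status_text: str) -> dict:
    """Extract the CapInh/CapPrm/CapEff masks from /proc/<PID>/status text."""
    caps = {}
    for line in status_text.splitlines():
        name, _, value = line.partition(":")
        if name in ("CapInh", "CapPrm", "CapEff"):
            caps[name] = int(value.strip(), 16)
    return caps

def only_sys_time(status_text: str) -> bool:
    """True if the effective set holds CAP_SYS_TIME and nothing else."""
    return parse_caps(status_text).get("CapEff") == 1 << CAP_SYS_TIME
```

For the root shell example, the effective mask 00000000fffffeff has many bits set, so only_sys_time returns False.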
9.2.4.2.5.1. A problem with IPv6 interfaces after chroot
The ifiter_ioctl interface iterator reads IPv6 interface names from /proc/net/if_inet6.
If no proc filesystem is mounted in the chroot jail, ntpd drops all IPv6 interfaces after startup.
The easy choices are
- don't use chroot
- mount proc in the chroot directory
- disable interface updates with -U 0. ntpd will not notice any new or dropped interfaces anymore.
It might also work to
- change ifiter_ioctl to enumerate IPv6 interfaces by another method. IPv4 interfaces are enumerated through ioctl on a socket.
- install libinet6 to enable getifaddrs()
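To see what ntpd would find in /proc/net/if_inet6, the following Python sketch lists the interface names from the text of that file. It assumes the usual layout of six whitespace-separated fields per line (address, interface index, prefix length, scope, flags, device name). The function name inet6_interfaces is hypothetical.

```python
def inet6_interfaces(if_inet6_text: str) -> list:
    """Return the distinct interface names listed in /proc/net/if_inet6 text."""
    names = []
    for line in if_inet6_text.splitlines():
        fields = line.split()
        # Each entry: address, ifindex, prefix length, scope, flags, name.
        if len(fields) == 6 and fields[5] not in names:
            names.append(fields[5])
    return names
```

Inside a chroot jail without a mounted proc filesystem the file does not exist at all, so checking for the path relative to the jail directory is a quick way to confirm whether this problem applies.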
9.2.4.2.6. SELinux
Using SELinux with ntpd is known to cause problems. You will need to figure out how to configure SELinux to allow ntpd to access the system calls that it needs in order to set the system time, or you will need to figure out how to turn off all SELinux features with regards to ntpd.
As we get more information on how to do these kinds of things, we will add detail to this section.
9.2.4.2.7. Using udev
Most Linux distributions use udev to manage /dev. To set up a symlink to a refclock device you need a udev rule like this one:
KERNEL=="ttyS0" SYMLINK+="refclock-0"
Very old versions of udev will need this instead:
NAME=="ttyS0" SYMLINK+="refclock-0"
The rule has to be defined after the rule for the device being linked to. If your distribution supports udev rules in multiple files, you should put the refclock rules in a file of their own to ease maintenance.
9.2.4.2.8. Kernel 2.6 Mis-Detecting CPU TSC Frequency
Starting with Linux kernel 2.6.18, the CPU's Time Stamp Counter (TSC) is used to keep time, and when booting, the kernel sometimes mis-detects the frequency of this counter. This may result in severe clock drift which is impossible for ntpd to correct.
One solution to this problem is to change back to the old "acpi_pm" clock, which is what was used in earlier kernels. For example, in your grub.conf file, you can add this to the kernel boot options:
clocksource=acpi_pm
And then reboot. A similar procedure is apparently possible with earlier versions of kernel 2.6, which use a "clock=" designation instead of "clocksource=".
Our thanks to Jordan Russell for locating and resolving this issue.
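After rebooting with clocksource=acpi_pm, it can be useful to confirm which clock source the kernel actually selected. The following Python sketch reads the sysfs files that kernels with the generic clocksource framework expose under /sys/devices/system/clocksource/clocksource0. The function name is hypothetical, and kernels that do not provide these files simply yield (None, []).

```python
from pathlib import Path

CLOCKSOURCE_DIR = Path("/sys/devices/system/clocksource/clocksource0")

def read_clocksources(base: Path = CLOCKSOURCE_DIR):
    """Return (current, available) clocksources, or (None, []) if not exposed."""
    try:
        current = (base / "current_clocksource").read_text().strip()
        available = (base / "available_clocksource").read_text().split()
    except OSError:
        # Files missing or unreadable: this kernel does not expose them.
        return None, []
    return current, available
```

If the first element is still "tsc" after the change, the boot option was probably not picked up.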
9.2.4.3. AppArmor causing "permission denied" errors
AppArmor is a security tool which has been developed by Novell and has made its way into the SuSE Linux/openSUSE distribution, and maybe also other distributions.
See: http://en.opensuse.org/AppArmor_Detail
AppArmor uses profiles to control which system devices and resources may be accessed by an application, allowing finer control than the standard Unix rights management. If an application tries to access a resource for which it has not been granted sufficient rights, access is prevented and a "permission denied" error occurs.
AppArmor is shipped with default profiles which work with the standard installation, but if an application is changed to use a non-standard configuration then the AppArmor profile has to be modified accordingly. This affects any application, not only ntpd.
The AppArmor profile for ntpd may require modification if refclock devices have been configured manually, or even if log files or statistics files are to be generated by ntpd.
To check whether a "permission denied" problem is related to AppArmor, you can temporarily stop AppArmor and see whether the problem persists.
If AppArmor is to be used, it must be configured to allow access to the refclock device used by ntpd. Under SuSE Linux/openSUSE this may be done using the configuration tool, yast2. To add an entry for a refclock /dev/refclock-0 which points to /dev/ttyS0:
Yast2 -> Novell AppArmor -> Edit Profile
Select profile /usr/sbin/ntpd
Add entry: /dev/ttyS0
Mark allow for: Read, Write, Link
This generates a new entry in the AppArmor profile file:
/dev/ttyS0 rwl
Similar changes may be required to allow log or statistics files to be generated by ntpd under AppArmor control.
Please note that the symbolic links (e.g. /dev/refclock-0) are also created anew after every reboot. If this doesn't appear to happen, you must create a udev rule for this. See also 9.2.4.2.7. Using udev.
9.2.5. Mac OS X
The Mac OS X method of enabling ntpd is to go to the Apple menu option System Preferences..., then into the Date & Time sheet, then go to the Date & Time sub-panel, and click on the radio button labeled Set Date & Time automatically, which allows you to select a time server to use from a drop-down, or you can fill in the name of your own preferred time server.
Note that every time you exit this preference sheet, the system will re-write your /etc/ntp.conf based on the information you have provided.
- Even if you have provided your own /etc/ntp.conf, there is no way to prevent the system from re-writing it based on the content of this field.
- Even if you don't make any changes to this preference, just by going into this sub-panel and exiting back out, Mac OS X will re-write your /etc/ntp.conf.
Unfortunately, when Mac OS X creates the /etc/ntp.conf file, it will do nasty things like appending "minpoll 12 maxpoll 17" to every single line, including those lines which do not have a "server" directive.
- Worse, any line that had more than two arguments will get the rest truncated and replaced by "minpoll 12 maxpoll 17".
- All lines will get the directive "server" prepended to them, even if they weren't originally server directives.
Here's a sample input /etc/ntp.conf file:
tos minclock 4 minsane 4
server time.euro.apple.com iburst
server de.pool.ntp.org iburst
server fr.pool.ntp.org iburst
server nl.pool.ntp.org iburst
server uk.pool.ntp.org iburst
server 0.europe.pool.ntp.org iburst
server 1.europe.pool.ntp.org iburst
server 2.europe.pool.ntp.org iburst
server 127.127.1.0 # Local clock
fudge 127.127.1.0 stratum 14 # Undisciplined
statsdir /var/ntp/ntpstats
filegen peerstats file peerstats type day enable
filegen loopstats file loopstats type day enable
filegen clockstats file clockstats type day enable
Here's what Mac OS X will munge this into:
server tos minpoll 12 maxpoll 17
server time.euro.apple.com minpoll 12 maxpoll 17
server de.pool.ntp.org minpoll 12 maxpoll 17
server fr.pool.ntp.org minpoll 12 maxpoll 17
server nl.pool.ntp.org minpoll 12 maxpoll 17
server uk.pool.ntp.org minpoll 12 maxpoll 17
server 0.europe.pool.ntp.org minpoll 12 maxpoll 17
server 1.europe.pool.ntp.org minpoll 12 maxpoll 17
server 2.europe.pool.ntp.org minpoll 12 maxpoll 17
server 127.127.1.0 # minpoll 12 maxpoll 17
server fudge minpoll 12 maxpoll 17
server statsdir minpoll 12 maxpoll 17
server filegen minpoll 12 maxpoll 17
server filegen minpoll 12 maxpoll 17
server filegen minpoll 12 maxpoll 17
The result is totally bogus, won't parse, and will prevent ntpd from starting up.
If you're going to maintain your own /etc/ntp.conf file, you need to make sure you save a copy to something like /etc/ntp.conf.save.europe (or whatever you prefer), so that you can restore a good working copy after Mac OS X munges it beyond recognition.
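Restoring the saved copy can be scripted. The following Python sketch copies the saved file over the live one only when their contents differ. The function name restore_if_changed is hypothetical, and writing to /etc/ntp.conf would normally require root privileges and a restart of ntpd afterwards.

```python
import filecmp
import os
import shutil

def restore_if_changed(saved: str, live: str) -> bool:
    """Copy the saved config over the live one if their contents differ.

    Returns True when a restore was performed.
    """
    # shallow=False forces a byte-by-byte comparison of the two files.
    if os.path.exists(live) and filecmp.cmp(saved, live, shallow=False):
        return False
    shutil.copyfile(saved, live)
    return True
```

With the file names from the paragraph above, the call would be restore_if_changed("/etc/ntp.conf.save.europe", "/etc/ntp.conf").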
You will probably also want to change the code in /System/Library/StartupItems/NetworkTime/NetworkTime so as to remove the call to ntpdate and change the invocation of ntpd to include a "-g" option on the command line.
Otherwise, you will probably want to start and stop ntpd manually, outside of the control of Mac OS X.
9.2.6. Sun
9.2.6.1. Sun Device Drivers
9.2.6.1.1. su Driver
An issue with the Sun su driver has surfaced with respect to PPS support. As of August 2005 (200508), the su driver does not support PPS correctly in some configurations. Sun is working on a patch for that issue. For more information please refer to Bug #361.