Monday, July 6, 2009

Some problems with GPS and NTPD

I found some problems when trying to set up the ntpd server connected to a GPS.

First, I had to solve how to create the devices required by ntpd: /dev/gps0 and /dev/gpspps0. This is easy: just create a udev rules file, for example /etc/udev/rules.d/90-gps.rules:

SUBSYSTEM=="pps", MODE="0660", GROUP="uucp", SYMLINK+="gps%k"
KERNEL=="ttyS0", SYMLINK="gps0"

The above creates the following links:
gps0 -> ttyS0
gpspps0 -> pps0

and the target devices are accessible to members of the uucp group. Therefore, the 'ntp' user must belong to the uucp group:

# usermod -G ntp,uucp ntp
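A quick way to confirm that the udev rule took effect is to look at both links before starting ntpd. The following is only a sketch: the function name check_gps_links is made up for this example, and it assumes the readlink utility is installed.

```shell
# check_gps_links [DEVDIR]: verify that gps0 and gpspps0 exist as symlinks
# in DEVDIR (default /dev) and print where each one points.
# Returns non-zero if any of the two links is missing.
check_gps_links() {
    dir=${1:-/dev}
    status=0
    for name in gps0 gpspps0; do
        if [ -L "$dir/$name" ]; then
            printf '%s -> %s\n' "$name" "$(readlink "$dir/$name")"
        else
            printf '%s: missing\n' "$name" >&2
            status=1
        fi
    done
    return "$status"
}
```

Calling check_gps_links with no argument inspects /dev. The group membership can be checked separately with 'id -nG ntp', which should list uucp.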

Then ntpd showed serious jitter in the system clock, and I found out that the ntpd driver was not using the PPS signal, only the NMEA output. Running ntpd in debug mode (-d flag) showed a 'permission denied' error when opening /dev/gpspps0. The file permissions were OK; the problem was in the SELinux layer, which I had inadvertently left enabled. Disabling it fixed the problem (set SELINUX=disabled in /etc/selinux/config).

Also, the serial port had to be carefully set up before ntpd was started; otherwise the tty driver would touch the port settings and the DCD interrupt detection would be lost, as explained in the previous posting. The minimum settings of the serial port at start-up are:

# stty -F /dev/gps0 raw ispeed 4800 ospeed 4800 -hupcl
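The start-up order can be written down as a small script. This is a sketch under assumptions: the function name start_gps_clock and the DRY_RUN variable are invented for the example, and running stty before ldattach is my own choice of order; the only requirement stated above is that the port is configured before ntpd starts.

```shell
# start_gps_clock [TTY]: configure the serial port, attach the PPS line
# discipline and start ntpd, in that order. TTY defaults to /dev/ttyS0.
# With DRY_RUN=1 the commands are printed instead of executed.
start_gps_clock() {
    tty=${1:-/dev/ttyS0}
    run=
    if [ "${DRY_RUN:-0}" = 1 ]; then
        run=echo
    fi
    # 1. Serial settings first (same options as the stty line above).
    $run stty -F "$tty" raw ispeed 4800 ospeed 4800 -hupcl || return
    # 2. Attach the PPS line discipline; ldattach stays in the background.
    $run ldattach pps "$tty" || return
    # 3. Only now start ntpd.
    $run service ntpd start
}
```

Running it with DRY_RUN=1 first shows exactly what would be executed, which is handy when the sequence is moved into an init script.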

Tracking down the SELinux problem was particularly confusing. If ntpd is started from the command line as root, the "permission denied" error does not show up. However, if it is started using the 'service ntpd start' command, the error appears.
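Since the symptom depends on how ntpd is launched, it helps to check the SELinux setting explicitly. The helper below is a minimal sketch (the name selinux_config_mode is hypothetical) that prints the mode configured in /etc/selinux/config.

```shell
# selinux_config_mode [FILE]: print the value of the SELINUX= line
# (enforcing, permissive or disabled) from the SELinux config file.
# Comment lines and the SELINUXTYPE= line are ignored.
selinux_config_mode() {
    sed -n 's/^SELINUX=[[:space:]]*\([a-z]*\).*/\1/p' "${1:-/etc/selinux/config}"
}
```

The file only describes the mode used at the next boot. On a running system, 'getenforce' prints the mode currently in force, and 'setenforce 0' switches to permissive mode until the next reboot, which is a less drastic way to confirm that SELinux is what blocks access to /dev/gpspps0.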

Wednesday, July 1, 2009

Synchronizing an NTP server to a GPS/PPS

The goal now is to have a Garmin GPS 18LVC driving a PPS (pulse-per-second) signal to the ntpd server for a highly accurate time reference server.

1. The GPS device

There are at least a couple of ways to propagate the PPS signal to the ntpd server, plus some variants in each case. However, the GPS device must be seen as a device that sources two different types of data: the absolute date and time, and the 1-Hz clock signal (PPS).

The first one provides complete information about the current instant, including a full time and date, but with poor accuracy because this information is sent over the data lines of the serial port, encoded in some protocol such as NMEA. PPS provides a very accurate clock (about 1 µs in the GPS 18LVC device) but without any reference to the absolute time. This clock is wired to the DCD (Data Carrier Detect) pin of the serial port. In other words, PPS tells us with good precision when each second begins, but it doesn't tell us which second it is. This timing information must be combined with the protocol messages sent by the GPS to have both precision and a complete timestamp at the same time.

The Garmin 18LVC model speaks the NMEA protocol: every second it sends out the NMEA messages that have been previously selected with the PGRMO command. Only the GPRMC message is necessary to get the current date and time, and it is important to keep the amount of information sent over the serial port every second to a minimum. Once the PGRMO command has selected the message types, the GPS retains this configuration in its internal memory.

The output from the GPS can be seen using a terminal program, such as minicom, if properly configured at 4800 baud, no parity, 1 stop bit.
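To illustrate what the GPRMC message carries, here is a small sketch that pulls the UTC time and date out of one sentence. The function name gprmc_time is made up, the checksum is not verified, and the sentence used in the usage note is a commonly cited sample, not output captured from my receiver.

```shell
# gprmc_time SENTENCE: print the UTC time and date of a GPRMC sentence
# as "hh:mm:ss dd/mm/yy". Field 2 is the time (hhmmss) and field 10 is
# the date (ddmmyy). Returns non-zero if the sentence is not GPRMC.
gprmc_time() {
    printf '%s\n' "$1" | awk -F, '
        $1 == "$GPRMC" {
            printf "%s:%s:%s %s/%s/%s\n",
                substr($2, 1, 2), substr($2, 3, 2), substr($2, 5, 2),
                substr($10, 1, 2), substr($10, 3, 2), substr($10, 5, 2)
            found = 1
        }
        END { exit !found }'
}
```

For example, gprmc_time '$GPRMC,123519,A,4807.038,N,01131.000,E,022.4,084.4,230394,003.1,W*6A' prints "12:35:19 23/03/94". Note that the timestamp has a resolution of one second at best and arrives some time after the second has started, which is why the PPS signal is needed.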

2. NTPD reference clocks


The ntpd server supports several types of drivers. Here, the term 'driver' has nothing to do with a 'kernel driver'. ntpd drivers are low-level callback functions that are registered within the ntpd core and implement the access to several types of local clocks, such as GPS receivers, radio clocks, etc. Each driver is identified by a pseudo-IP address. The identifiers involved here are:

127.127.20.x : NMEA Reference Clock driver
127.127.28.x : SHM (shared memory) driver

2.1 NMEA reference clock

The NMEA clock driver assumes that a GPS device sending out NMEA messages is connected to the system via a serial port named /dev/gpsX, and that its PPS signal, wired through the DCD pin, is accessible from a /dev/gpsppsX device.

/dev/gpsX is actually a link to some /dev/ttySX serial device, and /dev/gpsppsX is a link to a /dev/ppsX device which, in turn, is provided by the kernel PPS API. This API exchanges precise kernel clock information with userland programs, and supports some predefined client drivers, such as the DCD pin connected to an 8250 UART. The DCD pin is sensed using a new serial line discipline, named PPS, which is an extension of the TTY line discipline. The sensing takes place at interrupt time, so it provides very precise timestamping of the DCD events.

This PPS API, also known as LinuxPPS, is not yet available in the mainline kernel, so a patch must be applied to the kernel source tree.

Note that the /dev/gpsppsX device is optional: if ntpd cannot open this device at startup, it silently falls back to NMEA-only operation. This means that it will use the arrival time of the NMEA messages to discipline the system clock, which results in very poor precision, usually worse than using a remote NTP server over the internet. Running ntpd in debug mode will, however, log this condition.

The /etc/ntp.conf must contain these lines:


server 127.127.20.0 mode 1 minpoll 4 prefer
fudge 127.127.20.0 flag3 1 flag2 0 time1 0.0

Note that here a single reference clock provides both the timestamp and the PPS timing. The meaning of these settings is as follows:

mode 1 means that only the GPRMC messages of the NMEA protocol will be analyzed. flag3 1 tells ntpd to use the PPS line discipline of the kernel, and flag2 0 tells the driver to use the rising edge of the DCD signal to mark the start of each second.

In order to activate the PPS line discipline on the serial port connected to the GPS, it is necessary to run the 'ldattach' utility, which actually will stay in the background to keep the serial port open and the discipline active:

# ldattach pps /dev/ttyS0


2.2 SHM reference clock

The SHM driver accepts delayed timing information from a System-V IPC shared memory segment (with key "NTPx"). The timing information is written there by some external process, which reads it from the GPS and writes it to the shared memory so that ntpd can process it. There are some user-space utilities that can do this job, for example, gpsd and shmpps. Gpsd is a general-purpose daemon that has been designed to talk to most GPS models using a wide variety of protocols and, in addition, is capable of processing the PPS signals and sending timing information to ntpd via a shared memory device. Or, at least, this is what it claims to do... because I couldn't achieve this.

2.2.1 gpsd

Actually gpsd feeds two devices to ntpd: one with the absolute timestamp parsed from the NMEA messages (or any other protocol supported by gpsd), and another with the PPS timing information. Ntpd sees them as two different SHM devices, so the ntp.conf file must look like this:

server 127.127.28.0 minpoll 4
fudge 127.127.28.0 refid GPS
server 127.127.28.1 minpoll 4 prefer
fudge 127.127.28.1 refid PPS

2.2.2. shmpps

shmpps is a much simpler daemon that does exactly one job: detect the PPS changes, get a timestamp for each change, and send them to ntpd via a single shared memory segment. No absolute time is passed to ntpd, so ntpd should still use an NMEA device (on the same serial port) to get the absolute time reference.

So, /etc/ntp.conf must look like this:

server 127.127.20.0 mode 1 minpoll 4 prefer
fudge 127.127.20.0 flag3 0 flag2 0 refid NMEA
server 127.127.28.0 minpoll 4
fudge 127.127.28.0 refid PPS

Both gpsd and shmpps share a common problem: they detect DCD changes from user space, that is, with a program that blocks on the TIOCMIWAIT ioctl until a change is detected and then gets a timestamp. In this case, the latency is larger than in the PPS kernel API, where the timestamp is taken at interrupt time.

In conclusion, the LinuxPPS approach offers the best precision and simplicity as it requires no external daemons, but the kernel must be patched. On the other hand, the SHM devices offer worse precision but are easier to set up.

3. How to enable support for PPS API

Here I explain how to patch and build the software modules involved in having GPS/PPS connected to ntpd. I also rebuilt the RPMs for these modules so that installation is easier. The process involved in rebuilding the RPMs is also described below.

3.1. Building the Linux kernel

There are two ways to get LinuxPPS: via git or via patches. The latest version of LinuxPPS is available only via the git repository, which contains an entire Linux kernel tree, but only for one kernel version (2.6.28-rc6 at the time of writing). The patches, on the other hand, are available against a wider range of kernel versions, but the LinuxPPS implementations they contain are older.

I'd rather not use the kernel available from the git repository because I prefer to stick to the same kernel version that is already installed in the target system, in my case, a Fedora 10 (2.6.27.5-117) but, at the same time, I want to have the latest LinuxPPS implementation.

So, I decided to patch the 2.6.27 kernel manually. This shouldn't take much time, as the patches look quite straightforward to apply.

First, I downloaded the entire git repository and diff'ed it against a stock 2.6.28-rc6 kernel. The resulting patch is the LinuxPPS implementation.

To get the original kernel that comes with my Fedora 10, I had to download the source RPM for the installed kernel version (2.6.27.5-117) and install it. This must be done on the target PC.

Then, I prepared a source tree for the patch and build:

# cd ~/rpmbuild
# rpmbuild --target i686 -bp SPECS/kernel.spec

Note that the spec file already creates the kernel config file, so there is no need to configure the kernel to match our current settings.

Just to test, I tried to apply the LinuxPPS patch directly to my 2.6.27.5 kernel but, of course, it didn't work as there were too many differences.

Then, I split the patch into two parts: one containing the new files specific to the pps implementation, and another for the existing kernel files. The first sub-patch applied cleanly, and the second one had to be applied manually, but it wasn't too painful. At the end, I diff'ed the result against the original 2.6.27.5 kernel and saved it to a patch file (patch-2.6.27.5-ppsapi).
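The split itself can be done mechanically. The following is a sketch, not the exact procedure I followed: the function name split_patch is invented, and it assumes a patch produced by 'diff -urN' (or git), where every per-file section starts with a line beginning with "diff ".

```shell
# split_patch PATCH REGEX NEWOUT RESTOUT
# Copy every per-file section of PATCH whose "diff ..." header matches
# REGEX to NEWOUT, and every other section to RESTOUT. Anything before
# the first "diff " line is dropped.
split_patch() {
    awk -v pat="$2" -v a="$3" -v b="$4" '
        /^diff / { out = ($0 ~ pat) ? a : b }
        out != "" { print > out }
    ' "$1"
}
```

A call could look like: split_patch linuxpps.diff 'drivers/pps/|Documentation/pps/' pps-new.diff pps-existing.diff. The file names and the directory list in this call are assumptions; adjust the regular expression to wherever the new pps files live in the tree.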

Then, I configured the kernel to enable PPS support:

make menuconfig
Device drivers -> PPS support
* Enable high-resolution timestamps
* Enable 'ktimer' and 'line discipline' as modules

General setup -> append to kernel version: "pps" to have this tag
on the kernel version string.

Then I compiled and installed the kernel and the modules:

# make
# make modules_install
# make install

Check that /etc/grub.conf has the correct entries (it should have) so that when the system boots again, the new kernel is used.

Now, the kernel header files must be prepared for compilation of the user-space tools:

# cd /usr/include
# mv linux linux.orig
# mv asm asm.orig
# mv asm-generic asm-generic.orig
# ln -s /lib/modules/2.6.27.5pps/build/include/linux
# ln -s /lib/modules/2.6.27.5pps/build/include/asm
# ln -s /lib/modules/2.6.27.5pps/build/include/asm-generic
# cp /lib/modules/2.6.27.5pps/build/Documentation/pps/timepps.h .

Note that the timepps.h header is required for ntpd to detect the presence of the PPSAPI in the Linux kernel. Without it, ntpd builds without complaining and silently reverts to using the NMEA protocol only.

3.2. Building the 'ntpd' server


Now, it is time to patch and compile the NTP server (ntpd). I used the latest available release, 4.2.4p7 at the time of writing.

Initially, I used the nmea.patch from the LinuxPPS homepage but the resulting ntpd failed to detect the PPS signal from the GPS. After some detailed debugging using gdb, I found out that the DCD interrupts from the UART were disabled by ntpd during initialization, preventing the PPS signal from reaching the processor.

More detailed debugging showed that the call to tcsetattr() in the refclock_setup function indirectly caused the DCD interrupts to be disabled, though the bit mask of c_iflag should not cause this problem. I believe that changing some of the c_iflag bits causes the UART to be incorrectly reprogrammed; perhaps this is why the ldattach utility requires patching. So, I decided to patch ldattach so that the c_iflag settings are the same as the ones set by ntpd. This is a dirty hack, but it seems to be how the LinuxPPS patches work here.

If you are experiencing problems and you suspect that PPS is not being read by ntpd, try reading the IER register (Interrupt Enable Register) of your 8250 UART (or compatible) and check that bit 3 is set. The IER register is at offset 0x01 of the UART (i.e. address 0x3F9).
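One way to read the register without writing any C is through /dev/port. This is a sketch under assumptions: it needs root, an x86 system that exposes /dev/port, and a UART at the standard COM1 base address 0x3F8; the function names read_ier and ier_msi_enabled are made up for the example.

```shell
# read_ier [BASE]: print the IER (offset 1 from the UART I/O base) as a
# decimal number. BASE defaults to 0x3F8. Requires root and /dev/port.
read_ier() {
    base=${1:-0x3F8}
    dd if=/dev/port bs=1 skip=$(( $base + 1 )) count=1 2>/dev/null |
        od -An -tu1 | tr -d ' '
}

# ier_msi_enabled VALUE: succeed if bit 3 (modem status interrupt enable,
# mask 0x08) is set in VALUE, given in decimal or as 0x-prefixed hex.
ier_msi_enabled() {
    [ $(( $1 & 0x08 )) -ne 0 ]
}
```

With these, 'ier_msi_enabled "$(read_ier)" && echo "DCD interrupts enabled"' gives a quick yes/no answer while ntpd is running.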

To compile ntpd, download ntp-4.2.4p7 and apply the patch-ntp-4.2.4p7-ppsapi, then configure it to enable the NMEA driver (ID 127.127.20.u):

# ./configure --disable-all-clocks --disable-parse-clocks --enable-NMEA --enable-linuxcaps

Additionally, you may want to enable support for SHM drivers, in case you want to experiment with user-space drivers, which don't require kernel patching but are more likely to be affected by latencies and be less precise. If so, add the "--enable-SHM" argument to the configure command.

Now, run 'make' and use the resulting 'ntpd' binary and utilities.

Remember that the Linux PPSAPI must be enabled in the kernel, and that the correct kernel include files must be visible under /usr/include, as explained in the previous section.

3.3. Building the ldattach utility

The ldattach utility is used to set the line discipline associated with a serial port. LinuxPPS uses a new line discipline, named PPS, that detects changes in the DCD line of the serial port and feeds those changes into the kernel PPS API.

A small patch must be applied to ldattach, but it must be slightly different from the one proposed on the LinuxPPS homepage. I believe the spirit behind this patch is to have both ldattach and ntpd set the same terminal configuration, so that nothing changes and the UART registers preserve the DCD detection.

ldattach is provided by the util-linux package in the Fedora 10 distribution.

The easiest way is to install the source RPM, add the new patch in the spec file and rebuild the rpm.

3.4 Rebuilding the RPMs

The affected RPMs are: kernel, kernel-headers, ntp, ntpdate, util-linux-ng and their debug and devel variants.

Rebuilding an RPM so that more patches are applied is quite straightforward. Usually, an RPM that comes from a distribution package already has several patches that get applied when the RPM is built. All we have to do is add the corresponding pps patches to each spec file and rebuild the RPM.

In general, the process is as follows: installing the source RPM places a spec file in the SPECS directory, and a source tarball and the patch files in the SOURCES directory. The SPECS and SOURCES directories can be found under the top directory of the rpmbuild utility, usually ~/rpmbuild. Then, the additional pps patch must be copied to the SOURCES directory, and the spec file must be edited to add the new patch file using a %patch clause. Number the patch clause so that our patch is applied after the others.
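Picking a patch number that comes after all the existing ones can be automated. The helper below is a minimal sketch (the name next_patch_number is hypothetical); it only looks at "PatchN:" tags and ignores everything else in the spec file.

```shell
# next_patch_number SPECFILE: print one more than the highest N found in
# the "PatchN:" tags of SPECFILE (prints 1 if there are none).
next_patch_number() {
    awk -F'[: \t]+' '
        tolower($1) ~ /^patch[0-9]+$/ {
            n = substr($1, 6) + 0
            if (n > max) max = n
        }
        END { print max + 1 }
    ' "$1"
}
```

If 'next_patch_number SPECS/ntp.spec' prints N, the pps patch is declared with a "PatchN:" tag next to the existing ones and applied with a matching %patchN line at the end of the %prep section.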

Also, it is a good idea to add the 'pps' suffix in the release string of the RPM, otherwise it would be impossible to tell a pps-capable rpm from a non-capable one.

The kernel RPM is the most complex of these RPMs, and the spec file had to be changed more:

* The 'buildid' macro was defined to be ".pps"
* Patches are applied using the "ApplyPatch" macro
* The timepps.h header must be copied to /usr/include when building
the kernel-headers rpm.

As described in the above paragraphs, the links to the new kernel include files must be created before building the util-linux and ntp packages. Alternatively, the pps kernel rpm can be built and installed first, and the rest of the packages rebuilt afterwards.

Monday, June 15, 2009

Installing Fedora from a USB pendrive

I had to install Fedora 10 on a PC with neither a floppy drive nor a DVD/CD-ROM drive. The only choice was to boot from USB-HD, which is supported by the BIOS. Here is the process, step by step:

1. Install the livecd tools on a Linux PC:

# yum install livecd-tools

2. Download the DVD iso image of the Fedora distribution that must be installed on the target system. Pay attention to choose the correct architecture (i686 or x86_64).

3. Loop-mount the ISO image to a temporary directory:

# mkdir /mnt/tmp
# mount -o loop Fedora-10-i386.iso /mnt/tmp

4. Insert the USB drive and make sure it is not mounted. Then, run the script that makes the USB drive bootable:

# livecd-iso-to-disk --reset-mbr /mnt/tmp/images/boot.iso /dev/sdb1

5. Mount the USB drive and copy the installer and the ISO image to the USB drive:

# mkdir /media/usbdisk/images
# cp /mnt/tmp/images/install.img /media/usbdisk/images
# cp Fedora-10-i386.iso /media/usbdisk/

6. Unmount the USB drive, and insert it into the target system. The Anaconda installer should show up shortly after power-up. Select Install Media from "Hard drive" and proceed as in a normal DVD installation.

More problems with OWAMP behind NAT

Well, not all NAT problems were solved with the fixes detailed in the previous post. Indeed, the server does not check that the OWAMP client is sending from the address specified in the "receiver IP address" field of the Request-session message, but there are some outstanding problems.

First, the server will create a new socket for the test session, and will attempt to bind that socket to the address specified for the sender or the receiver in the request-session message. This address is, of course, the public address of the server, as the message was created and sent by the client. Obviously, the server will fail to bind that socket to a public address that is not assigned to any of its interfaces.

Second, the client must specify its actual public address in the sender/receiver IP address field of the request-session message, otherwise, the other end will not be able to send test packets to that address.

Therefore, some modification is required in the OWAMP code. On the client side, the public address must be specified in the messages. Also, on the server side, the server should bind the test socket to the private address, regardless of the address specified in the session-request message.

The client needs to know its actual public address on the public side of the NAT, which is specified with the new '-x' option of the owping program. The owampd daemon only needs to know whether it is running behind a NAT or not; it does not need to know the public address, which makes the server setup simpler. The -N flag is used in this case.

So, the extra requirement is for the client to know its public address, which can be obtained by a STUN client or a similar service.

Now, it definitely works fine, with both the server and the client running behind their respective NATs.

Friday, June 5, 2009

OWAMP problems behind NAT

I've been doing some tests with the OWAMP protocol (One-way Active Measurement Protocol) and it turns out that it is not possible to ow-ping a server when the owamp client (owping) is behind a NAT. The client returns a 'server denied access' error.

OWAMP is one of those non-NAT-traversable protocols, such as SIP, as it passes endpoint IP addresses in the protocol messages. If the client is sitting behind a NAT, the source address passed is not the same as the actual source IP address as seen by the server. During the test session request stage, the owamp server checks that both addresses are the same, in order to prevent attacks, so one would think that the OWAMP protocol is unusable if the client is behind a NAT.

But it is not. It happens that the owamp server only checks the addresses in open mode. So, if we enable the authenticated mode, for example, the check is omitted and everything works.

To work in authenticated mode, all you need to do is set up a common passphrase on both sides so that the client gets authenticated. The passphrases are kept in the owampd.pfs file and are generated by the 'pfstore' utility:

# pfstore -f /usr/local/etc/owampd.pfs testuser

Then, run owampd so that it loads the pfs file (using the -c option).
Repeat the same pfstore action on the client machine, and then ping the server:

# owping -A A -u testuser -k /usr/local/etc/owampd.pfs SERVER

and this works.

Monday, May 25, 2009

Time loop in VMware guest

To make the guest VM always boot with the same date/time on the RTC, two steps are needed:

1. Edit the .vmx file of the guest VM, and add a line like this:

rtc.starttime = "1238765436"

where the number is a 1970-based epoch, in seconds.
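Converting between the epoch value and a readable date is easy with the date command. This sketch assumes GNU date (for the -d option); the function names are made up for the example.

```shell
# epoch_to_utc SECONDS: print the UTC date and time for an epoch value.
epoch_to_utc() {
    date -u -d "@$1" '+%Y-%m-%d %H:%M:%S'
}

# utc_to_epoch "YYYY-MM-DD hh:mm:ss": print the epoch value for a UTC
# date and time, ready to be pasted into rtc.starttime.
utc_to_epoch() {
    date -u -d "$1" '+%s'
}
```

For the value used above, 'epoch_to_utc 1238765436' prints "2009-04-03 13:30:36", and 'utc_to_epoch "2009-04-03 13:30:36"' gives back 1238765436.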

2. Boot the guest VM, and run VMware Tools. Disable the host-to-guest time synchronization, and reboot the guest VM.

Then, the guest VM will always boot with the same date and time in the calendar.

Wednesday, May 20, 2009

Setup a Samba server


Edit /etc/samba/smb.conf and set the basic settings in the 'global' section. The only important item is the "security=user" option.

The important thing is to enable the samba users with:

# smbpasswd -a USERNAME

which adds (-a) a new user to the samba user database, and sets a new password for it.
Not doing this causes a 'permission denied' error in the client when attempting to mount the share.

The syntax to mount a samba share is:

# mount -t cifs -o user=USERNAME,pass=PASSWORD //server/resource MOUNTPOINT

In the /etc/samba/smbusers file, it is possible to map samba user names to UNIX user names, but it is not required to do so. By default, samba does a one-to-one mapping.

If we want to announce the samba shares on the network, it is necessary to start the 'nmb' service. Use the normal chkconfig interface to start smb and nmb on startup:

# chkconfig --add nmb
# chkconfig --level 3 nmb on
# chkconfig --level 4 nmb on
# chkconfig --level 5 nmb on




Tuesday, May 12, 2009

Set up VMware server on a x86_64 system

Setting up VMware Server 1.0.8 on a 64-bit PC creates some problems because VMware is distributed in binary form only (rpm packages) for the 32-bit i386 architecture.

In my case, the target distribution is a Fedora 10 x86_64, so the first step was installing the 32-bit base libraries, that is, glibc, X11, etc. This can be easily done with the 'yum' installer:

# yum install glibc.i686 libgcc.i386 libstdc++.i386 zlib.i386
# yum install libX11.i386 libXtst.i386 libXrender.i386 libXt.i386

Also, some Perl modules are required for the VMware installer to work properly:

# yum install perl-ExtUtils-Embed

which is not 32-bit specific, of course.

Now, we are in a position where the VMware-server package may be installed:

# rpm -ivh VMware-server-1.0.8.i386.rpm

If we hadn't installed the i386 libraries, the vmware-server rpm could be installed anyway but problems would appear when running the vmware-config.pl script, which runs the vmware-ping utility, which is a 32-bit one.

Now, some patches to the vmware distribution must be installed. In my case, the vmware-any-any-117d could not be installed, as it returned an error when compiling the kernel modules (something about an already defined symbol). The update that worked fine was 'vmware-update-2.6.27-5.5-2.tar.gz' but this depends on the particular kernel version that runs in my target system.

Installing this update is quite straightforward. However, the tarball contains a compiled 'update' executable which is intended for 32-bit installations. I decided to delete it and recompile it manually before running the update:

# tar xfz vmware-update-2.6.27-5.5-2.tar.gz
# cd vmware-update-2.6.27-5.5-2
# rm update
# gcc update.c -o update
# ./runme.pl

The 'runme.pl' script automatically launches the 'vmware-config.pl' script. I accepted all the proposed default answers, except for the NAT networking, which I answered 'no', the host-only networking (also 'no') and the TCP port to listen to, which is 904 by default, but I changed it to 902 to match the default port the clients connect to (why this discrepancy in the default values?).

If I had accepted port 904 on the server side, the client wouldn't connect to the server and would return a 'connection refused' error. Fortunately, the client connection dialog accepts an optional port number after the server IP, e.g. myserver:904.

After this, the client still returned an error when attempting to connect to the server: 'login incorrect (username/password)'. After some investigation, I found out that the problem was in the PAM libraries. Again, only the 64-bit pam libraries were installed, but vmware expects the 32-bit libraries, so the user authentication failed. Unfortunately, the pam i386 and x86_64 packages cannot coexist, because there are some common files provided by both packages (e.g. the man pages). So, the workaround is to download the pam package and its dependencies, and do a forced install:

# yumdownloader --resolve pam.i386
# rpm -ivh --force *.rpm

Also, we must configure the proper path to the pam libraries.
Edit /etc/pam.d/vmware-authd :

#%PAM-1.0
auth sufficient /lib/security/pam_unix2.so shadow nullok
auth required /lib/security/pam_unix_auth.so shadow nullok
account sufficient /lib/security/pam_unix2.so
account required /lib/security/pam_unix_acct.so

After all this, vmware server runs correctly and clients may connect.

To install the vmware client (a.k.a. vmware-server-console), all I had to do was install the client rpm:

# rpm -ivh VMware-server-console-1.0.8-126538.i386.rpm
# vmware-config-server-console.pl

and that's all.