Before I detail how I tackled the problem and present my results, I will quickly explain some of the concepts relating to the problem. Skip ahead if you just want answers.
What is it?
Imagine you have two perfect timekeeping devices, but you start them at different times. If they are truly perfect, the difference between them will remain constant. If you work out that constant offset, you can apply it to either device to ‘correct’ it. You can synchronise them. However, in reality, such devices are pretty much impossible to find, and anything that comes close is prohibitively expensive, large or power hungry.
Let’s re-imagine the above scenario, but substitute our mythical timekeeping devices with Arduinos (or your microcontroller of choice). More specifically I’ll use Arduino Unos, which use 16 MHz oscillators. The Arduino environment provides timekeeping functions such as millis() or micros() to keep track of elapsed time. If we start two Unos at different times, we find that the difference in elapsed time between them changes, often surprisingly quickly. If we were to calculate an offset (ie the difference in times) and apply it at time X, by time X+1 the two Unos would be out of sync again.
This is because we are relying on the accuracy of the clock source, which for the oscillator used by the Uno is probably +/- 100 ppm (Parts Per Million). In the time required for 1 million pulses, the oscillator may oscillate from 999900 to 1000100 times. For the 16 MHz (16 million times a second) oscillator this would give a range of 15998400 to 16001600 oscillations per second. For the Uno, a second could actually be from 0.9999 s to 1.0001 s, potentially an error of about 8.64 seconds per day.
A really handy link is Maxim’s RTC calculator, which allows you to calculate the worst-case error for your oscillator, depending on its ppm rating.
The calculations above are obviously a worst case; the oscillator would need to be “off” by 100 ppm all the time to accumulate the full 8.64 seconds of error. In reality it is likely to be less, especially if the ppm error oscillates between a positive and a negative value.
If you were able to calculate the rate at which you were diverging from “true time”, you could inject/remove milliseconds as needed to correct it. As long as you repeat this process fairly regularly, you can lower the error dramatically. The reason it needs to be done fairly regularly is that the ppm error changes with a number of factors, such as temperature.
Why does it matter?
Well for a lot of applications, it doesn’t really. If you regularly calculate and apply an offset using the ‘actual’ time, then for general scheduling purposes (ie do X once every 5 minutes) it’s really no big deal. In fact, depending on what you are trying to do, you could ignore the problem altogether.
In my case I am trying to keep wireless sensor nodes synchronised. All the nodes in the network need to wake up at the same time, or as close to the same time as possible. This allows the nodes to communicate with each other at scheduled intervals, where the awake time is relatively short (1-2 seconds). As a result the average power consumption will drop as the radio transceivers can be turned off in between these events.
I have already implemented a time-synchronisation method that allows a node to calculate the offset between itself and another node. However, even running this synchronisation once a minute results in a fairly consistent error of about 60 to 120 milliseconds depending on the device. This isn’t a huge amount but it would be good to be able to reduce it.
One potential solution was to use a better oscillator such as this one from Sparkfun. That has an accuracy of +/- 2 ppm, which equates to 0.17 seconds a day, or just over a minute a year. I didn’t go for that option for a number of reasons, ranging from added cost and complexity to the fact that my PhD supervisor said no! In any case, using such a device would only improve on the methods I developed.
Calculating an Offset
The principle of the technique I used is, essentially, the same one underlying many of the time-synchronisation protocols for WSNs, such as RBS, TPSN or FTSP. Here is how my technique works:
- Node A records its time, let’s call that t1, and transmits an empty message to Node B.
- Node B receives the message from A and records the time, let’s call that t2, before transmitting the recorded time (t2) back to Node A.
- Node A receives the message from B and records the time, let’s call that t3.
- Node A can calculate the offset using: t2 – ((t3 + t1)/2).
This relies on the assumption that it takes the same amount of time to transmit a message from Node A to Node B as it does from Node B to Node A. If that is true, then t2 should be exactly half way between t1 and t3. (t3+t1)/2 gives the middle of that range, and subtracting it from t2 gives the difference, ie the offset. In my implementation I’m working with milliseconds, so the resultant offset is also in milliseconds. I apply the offset as described in a previous post.
Now the problem with this technique is the assumption that the time taken to transmit a message from A to B is equal to that from B to A. In a perfect world this might be true, however in reality messages aren’t received, retries occur and many other things get in the way. In some time-synchronisation methods these non-deterministic delays can be eliminated. The best way of doing this is to take timestamps as the messages are being sent/received. The only uncertainty left is the transmission time, which is usually negligible. When done correctly this can result in an average pairwise error of less than 10 microseconds in a multihop network.
I am using XBee transceivers and as such I don’t have low-enough-level (MAC-level) access to the transceiver hardware to take timestamps as messages are being sent/received. The assumption of symmetrical transmission time between two nodes doesn’t hold all of the time. The most common cause is that a message is retransmitted, resulting in an erroneous offset being calculated. Since I run the synchronisation process so often (once a minute), the nodes quickly recover. I have found that my error is in the order of a few milliseconds rather than microseconds. Good enough for me, but not good enough for acoustic location services or other such time-sensitive processes.
Dealing with clock drift
As mentioned at the start of this post, calculating and applying an offset once is not enough. Since the oscillators are very likely to be oscillating at different rates, clock skew needs to be taken into account. There are, I’m sure, some very clever ways to do this. My method was simply to take an average of the last 4 offsets and divide that value by 60,000 (60 seconds in milliseconds). That gives an error per millisecond, which I accumulate in the interrupt routine that updates the Arduino millis() counter. When more than 1 millisecond of error has accumulated, the millisecond counter is adjusted accordingly.
Below is a graph showing the three nodes using the described techniques, the average required offset (calculated every minute) is around -4 to +4 milliseconds. The large spikes are caused by bad offsets being calculated due to the non-determinism inherent in the offset calculation method used.
There are several possibilities worth investigating in terms of anomaly rejection and better skew calculation. If I get a chance to improve this I will, but it seems to be adequate for my needs.