Score:25

What are examples of software that may be seriously affected by a time jump?

mx flag

The chrony documentation warns

BE WARNED: Certain software will be seriously affected by such jumps in the system time. (That is the reason why chronyd uses slewing normally.) Documentation

But the documentation gives no examples. What are examples of software that will be seriously affected? Is the OS or any background processes at risk?

John Gordon avatar
to flag
The downside of providing examples is that some folk will think you've given them a complete list...
br flag
Anything that calculates an average over a fixed amount of time and records it.
Quora Feans avatar
cz flag
Are you considering mistakes caused by people using the software? For example, in some places, when switching to or from daylight saving time, trains are scheduled to hold and not depart during the one hour that could be ambiguous.
cl flag
Pretty much anything that uses certificates.
Cecilia avatar
mx flag
@QuoraFeans No, I'm just looking for software issues.
Cecilia avatar
mx flag
@JohnGordon I understand that a list will not be comprehensive. I just want enough understanding to weigh the risks of using a time jump to synchronize the clocks.
stark avatar
mu flag
I'm always surprised when my screen locks and goes dark.
Score:32
in flag

This is a bit of an open question, but let me give some examples:

  • databases - most of them rely heavily on precise time for storing records, indexes, etc.
  • security - precise time is essential for mapping actions to times; gaps or duplicated timestamps are not acceptable
  • digital signing - the timestamp is usually part of the signed document, so a wrong time may invalidate the signature
  • scheduling software - may skip jobs or run them twice, depending on the direction of the time jump
  • clustering software - most clusters need their nodes to be in sync, and a jump on one or more nodes may have unpredictable results
in flag
Task scheduling software like cron and [xcron](https://github.com/cubiclesoft/xcron) can be affected by wild swings in system time. Regular cron is especially susceptible, since it can skip scripts or run them multiple times depending on the direction in which time moves, whereas several cron replacements keep track of which schedules have run and which have been missed.
Romeo Ninov avatar
in flag
@CubicleSoft, correct, thank you. Will add it in my answer :)
Charles Duffy avatar
cn flag
Also, anything that's doing cluster coordination. zookeeper, consul, etcd... this makes _clustered databases_ particularly sensitive; as an extreme example, Google's Spanner was designed assuming atomic clocks (or GPS equivalent) at every location.
Romeo Ninov avatar
in flag
@CharlesDuffy, correct, thank you. Will add it too :)
rexkogitans avatar
jp flag
Monitoring software - may detect that a service did not respond after a timeout, although there was no timeout in reality.
Romeo Ninov avatar
in flag
@rexkogitans, you are right. But IMHO the end result would be one more ticket created (eventually) by the monitoring software, not a serious service impact.
rexkogitans avatar
jp flag
@RomeoNinov I am thinking of software like Monit which may be configured to restart non-responding services.
Romeo Ninov avatar
in flag
@rexkogitans, such services are questionable... I have not seen big corporations use them. If they need continuous service, they use clusters or a bunch of servers behind load balancers.
Hauleth avatar
mx flag
I think that a lot of clustering software uses a system-independent logical clock, like a Lamport clock, to synchronise its inner workings instead of relying on the untrusted OS time source.
Cpt.Whale avatar
cn flag
Authentication (separate from security/logging) is regularly limited by time too: auth tokens, certificates, sessions, etc. A common one is Kerberos, which will fail for >5 minute gaps by default, and RADIUS can default even lower.
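To make the tolerance idea concrete, here is a minimal sketch of a clock-skew check. The helper name `within_skew` is hypothetical, and the 300-second limit in the usage note simply mirrors the 5-minute default mentioned above; real Kerberos or RADIUS implementations do considerably more than this.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: true if two Unix timestamps (in seconds) differ by
 * no more than max_skew_s. 64-bit arithmetic avoids overflow for any
 * realistic timestamp. */
bool within_skew(int64_t client_s, int64_t server_s, int64_t max_skew_s)
{
    int64_t diff = client_s - server_s;
    if (diff < 0) {
        diff = -diff;
    }
    return diff <= max_skew_s;
}
```

With `max_skew_s` set to 300, a client whose clock was stepped by more than five minutes in either direction would be rejected until the clocks agree again.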
Romeo Ninov avatar
in flag
@Cpt.Whale, see second point :)
Score:13
bz flag

I recently got bit by a bug that dates back to 1999 and affects both the JVM and Android Runtime: https://bugs.java.com/bugdatabase/view_bug.do?bug_id=4290274

... two extra executions are fired (unexpectedly) when the system clock is set ahead one minute after the task is scheduled using scheduleAtFixedRate().

I work on a device that starts with the 1970 epoch as the current time, then receives the correct network time a little later. Occasionally a 3rd party library would initialize before the time was set, causing it to experience a 50 year time jump.

The result was scheduleAtFixedRate attempting to catch up on ~50 years worth of invocations... which was about 27 million back-to-back invocations with no delay between them.

That would cause the GC to go haywire and generally bog down the system until it was restarted.
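The catch-up arithmetic behind this kind of failure can be sketched in a few lines. This is an illustration in C, not the JVM's actual implementation; the helper `runs_due` and its `max_catch_up` cap are hypothetical. Assuming a one-minute period, a jump from the 1970 epoch to the start of 2020 makes roughly 26 million runs due at once, which is the sort of backlog a cap would limit.

```c
#include <stdint.h>

/* Hypothetical helper: number of fixed-rate runs that became "due"
 * between last_run_ms and now_ms, capped at max_catch_up.
 * Returns 0 if the period is invalid or the clock moved backwards. */
int64_t runs_due(int64_t last_run_ms, int64_t now_ms,
                 int64_t period_ms, int64_t max_catch_up)
{
    if (period_ms <= 0 || now_ms <= last_run_ms) {
        return 0;
    }
    int64_t due = (now_ms - last_run_ms) / period_ms;
    return due > max_catch_up ? max_catch_up : due;
}
```

Whether to cap the backlog, drop it entirely, or replay it depends on the job; the point is only that the decision should be explicit rather than left to a clock jump.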

Score:9
in flag

All software that interacts with real-life hardware. If you have a toaster that toasts bread for 20 seconds, and its software is stupid enough to check against the wall clock, you'll get either white or burned bread if you correct the clock while waiting for your toast.

Practically all applications that control any kind of industrial device need precise timings, like, for example, "open a valve for 5.3 seconds to get the correct amount of fluid". Being off by more than a few milliseconds ruins your product.

Applications that position anything using motors will use either stepper motors (which are slow) or end switches to determine when to stop. But often, you don't have a switch at every important position, so you'll do some "x m/s for A milliseconds, then y m/s for B milliseconds" logic. Now imagine your NTP daemon adjusts the time by even a single millisecond while this logic is running ...
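For durations like these, a common approach is to time against a clock that is not stepped. The sketch below assumes a POSIX system with `CLOCK_MONOTONIC`; the helper names `monotonic_ms` and `duration_elapsed` are hypothetical. On Linux this clock is not stepped by time jumps, although its rate can still be slewed.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdbool.h>
#include <stdint.h>
#include <time.h>

/* Milliseconds read from CLOCK_MONOTONIC (POSIX). The value is only
 * meaningful as a difference between two readings. */
int64_t monotonic_ms(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (int64_t)ts.tv_sec * 1000 + ts.tv_nsec / 1000000;
}

/* Hypothetical helper: has duration_ms elapsed since start_ms? */
bool duration_elapsed(int64_t start_ms, int64_t now_ms, int64_t duration_ms)
{
    return now_ms - start_ms >= duration_ms;
}
```

A "valve open for 5.3 seconds" step would record `monotonic_ms()` when the valve opens and close it once `duration_elapsed(start, monotonic_ms(), 5300)` becomes true, independent of any wall-clock correction in between.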

user71659 avatar
in flag
If stuff is millisecond-precise then you want a time jump. Slewing is going to make the clock inaccurate over a long period of time. A time jump will only screw you up once.
sa flag
All such software *should* be programmed to use the monotonic clock which does not jump. That doesn't mean it is.
Ben Voigt avatar
pl flag
@user253751: A monotonic clock is allowed to jump (but only forward); what you want is the steady clock.
sa flag
@BenVoigt operating systems I'm familiar with call the clock which does not jump at all "monotonic"
Score:4
in flag

We had an issue with an on-vehicle embedded system where the clock would lose a significant amount of time (due to an electrical problem). But the wireless connections were intermittent, so the time was only occasionally corrected. The upshot was that when the vehicles finally received wireless, and then an NTP update, the clock would jump forward significantly.

Various systems were checking the "last valid" time of certain things like GPS readings, etc. Suddenly all of these were "old", despite being updated only 0.5 seconds before.

Obviously a reconfiguration could fix the issue, but it was an issue.
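One way to make such a "last valid" check immune to clock steps is to stamp readings with a monotonic counter instead of wall-clock time. This is a sketch only, not what the system described above did; the `reading` struct and `reading_is_fresh` helper are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical reading record: stamped with a monotonic (uptime-style)
 * millisecond counter rather than wall-clock time. */
struct reading {
    double  value;
    int64_t stamp_mono_ms;
};

/* A reading is fresh if it is no older than max_age_ms on the
 * monotonic timeline. */
bool reading_is_fresh(const struct reading *r, int64_t now_mono_ms,
                      int64_t max_age_ms)
{
    int64_t age = now_mono_ms - r->stamp_mono_ms;
    return age >= 0 && age <= max_age_ms;
}
```

Because both the stamp and "now" come from the same monotonic source, an NTP step of the wall clock does not change the computed age.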

Romeo Ninov avatar
in flag
My personal view is that you had a poorly designed system, because GPS satellites, for example, send precise time that can be used for time sync. This time source is widely used for systems without network connectivity.
in flag
@RomeoNinov - Well, the core tenet of engineering is that it's a bunch of trade-offs. We traded using precise GPS time for having time when there was no GPS, like when the vehicle has just started, is in a workshop, or is underground. Your system can't just say "hang on a minute" when the customer drives off without a GPS fix. When there's a good GPS signal, we can (and do) use it to sync the clock. But it's simply not always available. (And before someone else mentions it, GPS repeaters are not accurate enough for our hardware).
ph flag
Systems should use monotonic clock instead of wall clock time for operations that do not require wall clock time. Also it is advisable to have your own NTP server running that collects time from multiple sources - GPS and external NTPs. But I can see why an embedded system may choose to keep things simpler. edit: now I see that in alfgaar's answer.
Score:2
tr flag

Dovecot IMAP server is affected: older versions deliberately kill themselves if they detect that the system time has jumped backwards. In v2.0, it at least tries to remedy the situation.

See https://wiki.dovecot.org/TimeMovedBackwards

Score:2
in flag

Plenty of examples...

Score:1
cn flag

Most game engines use an update loop that takes the time delta between the previous and current update. Sometimes a time change or a program suspension/resume will cause this delta to be huge. Typically you just filter out large deltas as outliers.
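A minimal sketch of such a filter follows. It clamps the delta rather than discarding it, which is one variant of the idea; the helper name `clamp_delta` and the choice of limit are assumptions.

```c
/* Hypothetical helper: clamp a frame delta (in seconds) to the range
 * [0, max_dt], so a clock jump or a long suspend cannot produce a
 * negative or enormous simulation step. */
double clamp_delta(double dt, double max_dt)
{
    if (dt < 0.0) {
        return 0.0;
    }
    if (dt > max_dt) {
        return max_dt;
    }
    return dt;
}
```

For example, with `max_dt` set to 0.25, a one-hour jump would advance the simulation by a quarter of a second instead of an hour.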

Score:1
ng flag

It's already in a comment, but I thought I'd post it as an answer too:

Applications that should have used the steady monotonic clock but don't are also affected. For example, if software checks client keep-alives using the current time, a jump in time may kick out all clients.

I regularly see software that uses the wrong clock.

tomlogic avatar
gd flag
This is my preferred technique when writing embedded software. We keep track of elapsed time from power on/reset, and use that in calculations that measure elapsed time. For "wall clock" or current time/date, you maintain a "skew" value that you can add to your elapsed time value. That skew will adjust up/down by a few seconds when you sync with GPS or ntp, but you're typically only using it in your UI or to generate headers.
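The skew technique described in the comment above can be sketched as follows. The struct and function names are hypothetical, and the uptime value is assumed to come from a counter that starts at power-on and never jumps.

```c
#include <stdint.h>

/* Hypothetical clock state: wall time = uptime + skew. */
struct wall_clock {
    int64_t skew_s;   /* seconds to add to uptime to get Unix time */
};

/* Called when GPS or NTP supplies a trusted Unix time. Only the skew
 * changes; uptime-based interval measurements are unaffected. */
void wall_clock_sync(struct wall_clock *c, int64_t uptime_s,
                     int64_t true_unix_s)
{
    c->skew_s = true_unix_s - uptime_s;
}

/* Wall-clock time, intended for display or headers only. */
int64_t wall_clock_now(const struct wall_clock *c, int64_t uptime_s)
{
    return uptime_s + c->skew_s;
}
```

Elapsed-time logic would keep using the raw uptime value, so a sync that moves the skew by seconds (or years) never shows up in any duration calculation.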
Score:0
us flag

Everyday normal web browsing

Really.

Anything to do with encryption deals in certificates. The certificates must be validated before they are accepted. Part of the validation process is checking that the certificate is not expired, which obviously involves your computer clock. If you let your computer clock get too far out of sync with reality, certificate validation on the computer will fail.

This matters, because pretty much every web page you access these days is transmitted via HTTPS, which uses TLS encryption (and certificate validation) to ensure the integrity of the page contents.

In other words, if you let your clock get off, you might not be able to even browse the web normally.
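The time-dependent part of that validation boils down to a window check against the local clock. The sketch below is a deliberate simplification with a hypothetical `cert_validity` struct; real TLS libraries parse these dates from the certificate and check much more than the validity period.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified certificate: only the validity window. */
struct cert_validity {
    int64_t not_before_s;  /* Unix seconds */
    int64_t not_after_s;   /* Unix seconds */
};

/* The certificate passes the time check only if the local clock falls
 * inside the window. */
bool cert_time_valid(const struct cert_validity *c, int64_t now_s)
{
    return now_s >= c->not_before_s && now_s <= c->not_after_s;
}
```

A clock that has been reset to the 1970 epoch fails the `not_before_s` comparison for any current certificate, and a clock set far into the future fails the `not_after_s` comparison.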

Now playing with an NTP daemon — where the whole point is keeping your system clock more or less accurate — is unlikely to create a shift large enough to matter. But point it at the wrong time source, and you could easily create this effect.

Additionally, a number of things that deal in authentication rely on the clocks between the user's computer and the server being relatively in sync, with tolerances limited to sometimes no more than a few minutes difference.

Lambda Fairy avatar
vn flag
Chromium has a "sane time" system to detect this problem: https://www.chromium.org/developers/design-documents/sane-time/
Score:0
in flag

Timing Things

It may seem obvious, but according to Falsehoods Programmers Believe About Time, developers often use system time to measure how long a process takes, because they do not know about, or lack support for, a monotonic time source. If the system clock changes between the two measurements, the result can be incorrect, for example a value which is:

  • slightly bigger than the correct value;
  • negative (which can easily crash a system relying on a positive value);
  • hugely bigger than the correct value, because a negative number with its sign bit set is incorrectly converted to an unsigned type (a signed 32-bit -1 has the same memory representation as the maximum unsigned integer value 0xFFFFFFFF)
// this C code prints 4294967295
printf("%u", (unsigned int) -1);

// this comparison is true
if (0xFFFFFFFF == (unsigned int) -1) {
}
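One defensive pattern, sketched here under the assumption that timestamps are held as signed 64-bit milliseconds, is to compute the difference in a signed type and clamp it before it reaches any unsigned code. The helper name `elapsed_ms_clamped` is hypothetical.

```c
#include <stdint.h>

/* Hypothetical helper: elapsed milliseconds between two wall-clock
 * readings, computed in signed 64-bit and clamped to 0 if the clock
 * moved backwards, so a negative value never gets reinterpreted as a
 * huge unsigned one. */
uint64_t elapsed_ms_clamped(int64_t start_ms, int64_t end_ms)
{
    int64_t diff = end_ms - start_ms;
    return diff < 0 ? 0u : (uint64_t)diff;
}
```

Clamping hides the backwards jump rather than fixing it, so this is a safety net; measuring with a monotonic clock in the first place avoids the problem.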

Cache Invalidation

As a wise man once said:

There are only two hard things in Computer Science: cache invalidation and naming things. Phil Karlton

Distributed systems often use a very short time for cache invalidation, ranging from a conservative value like 60 s or higher (which most dynamic DNS uses) down to a few milliseconds.

Using NTP, you can keep all computers on a local network synced to within less than a millisecond of correct UTC time, or to within a few milliseconds over the internet.

With that in mind, even a sub-millisecond call to a cache server on the local network (like Redis) can itself be cached in local memory for nanosecond response times.

However, there is a thing called a leap second, which makes this kind of aggressive millisecond caching very hard, as the reference clock can jump one second ahead of or behind the current clock.

A difference of 1 s or -1 s could mean that the value we think is current is not the most recent one, or that a value that is still correct is treated as if it were already too old, causing the system to query the source of truth too often, slowing it down or even crashing it.
