Score:3

Windows Server Time-Service jumps into the future and (partially) back


This looks like the same issue as Windows Server 2022 Time Service Jumping into the future. I've also added a support ticket at Microsoft (Feedback Hub) for the issue: https://aka.ms/AAkwnpl

As the system clock is essential for correctly working software and probably the most central shared mutable state, this issue is wreaking havoc on both our systems and everyone we communicate with, causing ripples all the way to critical infrastructure.

We first noticed this in August 2022 on a 2019 server. The clock was set to January 2023, but corrected itself. Unfortunately, we discovered this some time after the logs had been purged, so we were unable to debug it further.

But last month, we experienced it again, this time on a 2016 server. The clock was set 55 days into the future.

15 seconds later, Time-Service noticed that the clock differed from our domain controller's, and that it would have to change the clock by -4454176 seconds. It backed off, as it thinks this is larger than 4294967295.

15 minutes after the first change, the clock was set again, this time backwards, to 12h26m43s in the future.

15 seconds after the second change, Time-Service noticed the clock was off, and this time corrected it, as the offset was within a reasonable window.

And then the same thing happened again three weeks later on the same server, differing only in details. In the meantime, the server had been both rebooted and patched with a new monthly update.

We're using VMware, configured with two physical hardware clocks. Our two domain controllers are configured to use pool.ntp.org; they should probably be moved to our own stratum 0 hardware, although that's probably not related to our issues.

With help from a few external experts, we have pretty much excluded erroneous configuration, manual intervention (by mistake, security breach or disloyal employee) and hardware issues, and we're left with "strange Windows bug".

Unfortunately, 2016 doesn't include many details related to these events, so it's difficult to debug further. 2019+ includes more information.

@chris1out in Windows Server 2022 Time Service Jumping into the future had the same issue for servers not enrolled in a domain, so we can probably rule out the domain controller. It was also using the standard time server and not pool.ntp.org. This means we can probably rule out those two too. This pretty much leaves a bug in Time-Service as the probable cause. This serverfault question is the only documented event of this we're able to find.

djdomi:
check the bios battery and the timezones and verify that the time source is reliable
simendsjo:
This is not an OS running on bare metal, so issues like that should hit multiple VMs. Each physical host runs a lot of VMs. VMware is configured to use our two stratum 0 hardware clocks. pool.ntp.org has been used for our domain controllers, though, but they serve hundreds of VMs and hundreds of user computers.
djdomi:
it could also happen the rare timeahift bug, where the vm adjust to high or low the time. I had this in a similar situation and fixed it by installing a third party tool; sometimes it was also possible to use w32tm to fix it natively within Windows
simendsjo:
What is "the rare timeahift bug", and what is "high or low the time"? Which "third party tool", and which "w32tm" related changes fix "it"? I need some details in order to look anything up.
Score:3

UPDATE: Ars Technica has written an article on this problematic feature: https://arstechnica.com/security/2023/08/windows-feature-that-resets-system-clocks-based-on-random-data-is-wreaking-havoc/


TL;DR: We have found the most likely root cause: W32time Secure Time Seeding. It looks at the legacy "time" value in SSL handshake headers, which is random in newer SSL implementations, interprets it as the correct time, and sets the clock accordingly.

It can be turned off by setting the UtilizeSslTimeData registry value to 0: reg add HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\w32time\Config /v UtilizeSslTimeData /t REG_DWORD /d 0 /f

And w32tm can be instructed to reread its configuration: W32tm.exe /config /update
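If the change has to be rolled out to many machines, the two commands above can be wrapped in a small script. This is only a sketch: the function names are made up for illustration, it must run from an elevated prompt on the affected Windows server, and it defaults to a dry run that just prints what it would execute.

```python
import subprocess

# Registry key and value name taken from the commands above.
W32TIME_CONFIG_KEY = r"HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\w32time\Config"


def build_disable_commands() -> list[list[str]]:
    """Return the two commands from this answer as argument lists."""
    return [
        # Set UtilizeSslTimeData to 0 to turn Secure Time Seeding off.
        ["reg", "add", W32TIME_CONFIG_KEY, "/v", "UtilizeSslTimeData",
         "/t", "REG_DWORD", "/d", "0", "/f"],
        # Ask the time service to reread its configuration.
        ["w32tm.exe", "/config", "/update"],
    ]


def disable_secure_time_seeding(dry_run: bool = True) -> None:
    """Print each command and, unless dry_run is set, execute it."""
    for command in build_disable_commands():
        print(" ".join(command))
        if not dry_run:
            # check=True raises CalledProcessError if a command fails.
            subprocess.run(command, check=True)


# Dry run by default; pass dry_run=False on the actual server.
disable_secure_time_seeding(dry_run=True)
```

Keeping the commands as argument lists avoids quoting problems with the backslashes in the registry path.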


The longer answer...

Microsoft shipped a feature called Secure Time Seeding in November 2015, which is included in Windows Server 2016+ and turned on by default.

It's an attempt to mitigate the problem where a system has no power to drive the system clock (e.g. a power failure and a bad CMOS battery) and thus has a completely wrong time on boot, leaving it unable to securely communicate with other sources to reliably get the correct time.

During outgoing SSL handshakes, it looks at "ServerUnixTime" (probably gmt_unix_time in the specification).

The TLS 1.2 specification says the following (emphasis mine):

The current time and date in standard UNIX 32-bit format (seconds since the midnight starting Jan 1, 1970, UTC, ignoring leap seconds) according to the sender's internal clock. Clocks are not required to be set correctly by the basic TLS protocol; higher-level or application protocols may define additional requirements. Note that, for historical reasons, the data element is named using GMT, the predecessor of the current worldwide time base, UTC.
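To make the field concrete: in TLS 1.2 the 32-byte Random of a hello message starts with the 4-byte gmt_unix_time, followed by 28 random bytes. The sketch below reads those first 4 bytes as a big-endian 32-bit integer and turns them into a date. The function name and the sample timestamp are made up for illustration; this is not how W32time itself is implemented.

```python
import os
import struct
from datetime import datetime, timezone


def gmt_unix_time_from_random(random_field: bytes) -> datetime:
    """Interpret the first 4 bytes of a 32-byte TLS 1.2 Random as a UTC time."""
    if len(random_field) != 32:
        raise ValueError("a TLS 1.2 Random is exactly 32 bytes")
    # "!I" = network byte order (big-endian), unsigned 32-bit integer.
    (seconds,) = struct.unpack("!I", random_field[:4])
    return datetime.fromtimestamp(seconds, tz=timezone.utc)


# A server that still puts its clock in the field (sample value: 2023-01-01 UTC).
honest = struct.pack("!I", 1672531200) + os.urandom(28)
print(gmt_unix_time_from_random(honest))  # 2023-01-01 00:00:00+00:00

# A server that sends 32 random bytes: the "time" is an arbitrary
# date somewhere between 1970 and 2106.
print(gmt_unix_time_from_random(os.urandom(32)))
```

Nothing in the bytes themselves tells a client which of the two cases it is looking at.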

Microsoft's initial blog post says the following (emphasis mine):

The ServerUnixTime is supposed to be the current system time on the server, but it can also be set to a random value by some SSL implementations. We have observed that most servers provide a fairly accurate value in this field and the rest provide random values. We use this data field assuming it is somewhat accurate but can also be incorrect.

It shows they have misinterpreted the specification and their implementation is designed and operates under wrong assumptions. Their implementation might have worked fine when the world was still using older and more insecure implementations, but as more and more servers are updating, the premise is completely wrong.

At least they don't trust a single source, but as fewer sources provide the time and more sources use random values, it's probably only a matter of time before the 4 bytes are similar enough to confuse their algorithm.

We took the approach to not trust the data from a lone server, irrespective of who the server identifies itself as. We rely on corroborating information from multiple servers to arrive at a common truth about the current time.

They further describe that they are using statistical methods to see when they can interpret the random bytes as a correct time (emphasis mine):

The information from ServerUnixTime and OCSP validity periods are merged to produce the smallest possible reliable time range value along with a confidence score. When the confidence score is sufficiently high, this data becomes information.
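To get a rough feel for the odds, one can ask how likely it is that 4 uniformly random bytes land within some tolerance of a given time. The sketch below is plain arithmetic under assumptions of my own: the tolerance windows are invented, and Microsoft's actual algorithm, thresholds and confidence scoring are not public, so this says nothing about how often their corroboration step is actually fooled.

```python
UINT32_RANGE = 2**32  # number of possible values for the 4-byte field


def chance_random_value_in_window(window_seconds: int) -> float:
    """Probability that a uniform random uint32 lies within +/- window_seconds
    of one specific timestamp (assumes the window does not hit the range edges)."""
    accepted_values = 2 * window_seconds + 1
    return min(accepted_values, UINT32_RANGE) / UINT32_RANGE


# Hypothetical tolerance windows, chosen only for illustration.
for label, window in [("1 hour", 3600), ("1 day", 86_400), ("1 year", 365 * 86_400)]:
    print(f"+/- {label}: {chance_random_value_in_window(window):.6%}")
```

Each individual chance is small, but a busy server makes a very large number of outgoing handshakes, which is why unlikely coincidences between random values eventually show up.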

The gmt_unix_time field was discussed on the TLS mailing list in September 2013, two years before Microsoft shipped this feature. The change was implemented in OpenSSL in October 2013 and shipped in January 2014.

commit 2016265dfbab162ec30718b5e7480add42598158
Author: Nick Mathewson <[email protected]>
Date:   Sun Oct 20 15:03:24 2013 -0700

    Do not include a timestamp in the Client/ServerHello Random field.

    Instead, send random bytes, unless SSL_SEND_{CLIENT,SERVER}RANDOM_MODE
    is set.

    This is a forward-port of commits:
      4af793036f6ef4f0a1078e5d7155426a98d50e37
      f4c93b46edb51da71f09eda99e83eaf193a33c08
      3da721dac9382c48812c8eba455528fd59af2eef
      2583270191a8b27eed303c03ece1da97b9b69fd3

    While the gmt_unix_time record was added in an ostensible attempt to
    mitigate the dangers of a bad RNG, its presence leaks the host's view
    of the current time in the clear.  This minor leak can help
    fingerprint TLS instances across networks and protocols... and what's
    worse, it's doubtful thet the gmt_unix_time record does any good at
    all for its intended purpose, since:

        * It's quite possible to open two TLS connections in one second.

        * If the PRNG output is prone to repeat itself, ephemeral
          handshakes (and who knows what else besides) are broken.

commit 2927791d77ddaef687e92b1779e0bff89bdc279f
Author: Nick Mathewson <[email protected]>
Date:   Sun Oct 20 15:08:58 2013 -0700

    Fix another gmt_unix_time case in server_random

### Major changes between OpenSSL 1.0.1e and OpenSSL 1.0.1f [6 Jan 2014]

  * Don't include gmt_unix_time in TLS server and client random values
  * (.. unrelated changes ..)

u/zanatwo found this issue in March 2017 and reported it on r/sysadmin, but there's no indication Microsoft knows about it, as the feature is still enabled.

It was rediscovered by u/Thranx in January 2022 and also reported on r/sysadmin.

And again at the beginning of this year by @chris1out on ServerFault.

As we and others are experiencing, the issue is increasing in frequency, probably because fewer servers report the actual time and more send just random values. It's probable that this will continue to increase in frequency and hit more users.

The system clock is the most important shared mutable state on the system, and bugs which change the time to a wildly different value wreak havoc on all systems and have repercussions far beyond the single server they happen on. This is without a doubt the most serious bug/misfeature I've ever encountered, and Microsoft needs to disable this ASAP.

Getting in touch with Microsoft is difficult, so please report this issue to Microsoft if you're experiencing the same issue.

Thanks a lot to @test-is-prod for sharing their findings and pointing me to the Reddit post by /u/zanatwo!

