Score:4

Windows Server 2022 Time Service Jumping into the future

hm flag

A handful of times over the last few months, one of our Windows Server 2022 boxes has logged a warning in the Event Log similar to this:

The time service has set the time with offset 19630688 seconds.

The clock then jumps forward by that many seconds. A few minutes later it seems to sync with the time server again, and we get the following error:

The time service has detected that the system time needs to be changed by -19630688 seconds. The time service will not change the system time by more than 54000 seconds. Verify that your time and time zone are correct, and that the time source time.nist.gov (ntp.m|0x0|0.0.0.0:123->128.138.140.44:123) is working properly.

After a few more similarly phrased errors, the clock does correct itself, usually within about 10 minutes. The box isn't on a domain, and it's using the default time server in Windows. Any idea what could be going wrong here?
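For scale, the two numbers in these log messages can be converted into days and hours. This is only a minimal Python sketch; the variable names are illustrative and the values are copied from the events above.

```python
from datetime import timedelta

# Values copied from the two event log messages above.
jump_seconds = 19630688   # offset the time service applied
max_correction = 54000    # largest correction the service is willing to make

jump = timedelta(seconds=jump_seconds)
limit = timedelta(seconds=max_correction)

print(jump)          # 227 days, 4:58:08
print(limit)         # 15:00:00
# The backward correction is far beyond the allowed limit,
# which is consistent with the service refusing to apply it.
print(jump > limit)  # True
```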

cn flag
If something is setting the time 227 days off, I would check defective hardware or another offending process.
chris1out avatar
hm flag
Hardware could be the problem, but I’ve moved to another machine from my IaaS provider and it happened again.
cn flag
Ah, if that is an IaaS provider then you are off topic here; there is nothing we can do. THEY have a support hotline and they have access to the physical hardware to do debugging. We do not. And you likely cannot even answer our questions at that level. Open a support ticket.
chris1out avatar
hm flag
Okay, jackass, my point was it's probably not a hardware problem since it's happened on multiple machines. I was wondering if anyone else had seen windows doing this when not on a domain. Clearly I'm not the only one that has seen this due to the below "answer".
simendsjo avatar
kr flag
@chris1out: I got in touch with test-is-prod, and we're even located in the same small city (different datacenters, I presume). Do you want to get in touch so we can join forces "against" VMware and Microsoft to get this issue prioritized? You can reach me at [email protected]
Score:4
kr flag

UPDATE: Ars Technica has written an article on this problematic feature: https://arstechnica.com/security/2023/08/windows-feature-that-resets-system-clocks-based-on-random-data-is-wreaking-havoc/


TL;DR: We have found the most likely root cause: W32time Secure Time Seeding which looks at the legacy "time" value in SSL handshake headers, which is random in newer SSL implementations, interprets it as the correct time, and sets the clock accordingly.

It can be turned off by setting the UtilizeSslTimeData registry value to 0:

reg add HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\w32time\Config /v UtilizeSslTimeData /t REG_DWORD /d 0 /f

The w32time service can then be instructed to reread its configuration:

W32tm.exe /config /update
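To confirm the value afterwards, something like the following Python sketch could read it back from the registry. It uses the standard `winreg` module (Windows only); the helper names `read_utilize_ssl_time_data` and `describe` are made up for this example, and the "enabled" interpretation of any non-zero value is an assumption based on the feature being on by default.

```python
import sys

KEY_PATH = r"SYSTEM\CurrentControlSet\Services\w32time\Config"
VALUE_NAME = "UtilizeSslTimeData"

def read_utilize_ssl_time_data():
    """Return the DWORD value, or None if it is absent or this is not Windows."""
    if sys.platform != "win32":
        return None
    import winreg  # Windows-only standard library module
    try:
        with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, KEY_PATH) as key:
            value, _value_type = winreg.QueryValueEx(key, VALUE_NAME)
            return value
    except FileNotFoundError:
        return None

def describe(value):
    """Turn the raw value into a short human-readable status."""
    if value is None:
        return "not set or not readable (the default applies)"
    # Assumption: 0 disables Secure Time Seeding, anything else leaves it on.
    return "disabled" if value == 0 else "enabled"

print("Secure Time Seeding:", describe(read_utilize_ssl_time_data()))
```

Run it after `W32tm.exe /config /update`; it only reads the registry and changes nothing.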


The longer answer...

Microsoft shipped a feature called Secure Time Seeding in November 2015, which is included in Windows Server 2016+ and turned on by default.

It's an attempt to mitigate the problem where a system has no power to drive the system clock (e.g. a power failure and a bad CMOS battery) and thus has a completely wrong time on boot, leaving it unable to communicate securely with other sources to reliably get the correct time.

During outgoing SSL handshakes, it looks at "ServerUnixTime" (probably gmt_unix_time in the specification).

The TLS 1.2 specification says the following (emphasis mine):

The current time and date in standard UNIX 32-bit format (seconds since the midnight starting Jan 1, 1970, UTC, ignoring leap seconds) according to the sender's internal clock. Clocks are not required to be set correctly by the basic TLS protocol; higher-level or application protocols may define additional requirements. Note that, for historical reasons, the data element is named using GMT, the predecessor of the current worldwide time base, UTC.
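To make the field concrete: in TLS 1.2 the Hello Random is 32 bytes, with gmt_unix_time as the first 4 bytes in network byte order, followed by 28 random bytes. The Python sketch below only illustrates that layout; the function names are invented for the example, and it says nothing about how W32time actually reads the value.

```python
import os
import struct
from datetime import datetime, timezone

def split_hello_random(random_bytes: bytes):
    """Split a 32-byte TLS 1.2 Random into (gmt_unix_time, remaining 28 bytes)."""
    if len(random_bytes) != 32:
        raise ValueError("a TLS 1.2 Random is exactly 32 bytes")
    (gmt_unix_time,) = struct.unpack("!I", random_bytes[:4])  # big-endian uint32
    return gmt_unix_time, random_bytes[4:]

def as_utc(gmt_unix_time: int) -> datetime:
    """Interpret the 32-bit value as seconds since 1970-01-01 UTC."""
    return datetime.fromtimestamp(gmt_unix_time, tz=timezone.utc)

# A sender that puts its real clock in the first 4 bytes.
honest = struct.pack("!I", 1700000000) + os.urandom(28)
# A sender that fills all 32 bytes with random data.
randomized = os.urandom(32)

for label, rnd in (("honest", honest), ("randomized", randomized)):
    stamp, _rest = split_hello_random(rnd)
    print(label, stamp, as_utc(stamp).isoformat())
```

The "randomized" line prints a different, arbitrary date between 1970 and 2106 on every run, which is what a reader of this field sees when the sender no longer includes a timestamp.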

Microsoft's initial blog post says the following (emphasis mine):

The ServerUnixTime is supposed to be the current system time on the server, but it can also be set to a random value by some SSL implementations. We have observed that most servers provide a fairly accurate value in this field and the rest provide random values. We use this data field assuming it is somewhat accurate but can also be incorrect.

It shows they have misinterpreted the specification and their implementation is designed and operates under wrong assumptions. Their implementation might have worked fine when the world was still using older and more insecure implementations, but as more and more servers are updating, the premise is completely wrong.

At least they don't trust a single source, but as fewer sources provide the time and more sources use random values, it's probably only a matter of time before the 4 bytes are similar enough to confuse their algorithm.

We took the approach to not trust the data from a lone server, irrespective of who the server identifies itself as. We rely on corroborating information from multiple servers to arrive at a common truth about the current time.

They further describe that they are using statistical methods to see when they can interpret the random bytes as a correct time (emphasis mine):

The information from ServerUnixTime and OCSP validity periods are merged to produce the smallest possible reliable time range value along with a confidence score. When the confidence score is sufficiently high, this data becomes information.
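Microsoft does not describe the confidence calculation in detail, so the following is only a toy model and not their algorithm. Assuming every server sends a uniformly random 32-bit value, it estimates (with a Poisson approximation) how likely it is that at least two of n values happen to land within a given window of each other. The function name and the example window are assumptions made for illustration.

```python
import math

def p_some_pair_agrees(n_servers: int, window_seconds: int) -> float:
    """Approximate chance that at least two of n uniformly random 32-bit
    values lie within window_seconds of each other."""
    space = 2 ** 32
    # n(n-1)/2 pairs, each "close" with probability of about 2*window/space.
    expected_close_pairs = n_servers * (n_servers - 1) * window_seconds / space
    return 1.0 - math.exp(-expected_close_pairs)

# Chance of a coincidental agreement within one hour, per batch of handshakes.
for n in (10, 100, 1000):
    print(n, p_some_pair_agrees(n, 3600))
```

Even when the chance per batch is small, a busy server performs many batches of outgoing handshakes, so a rare coincidence can still show up a handful of times over a few months.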

The gmt_unix_time field was discussed on the TLS mailing list in September 2013, two years before Microsoft shipped this feature. The change was implemented in OpenSSL in October 2013 and shipped in January 2014.

commit 2016265dfbab162ec30718b5e7480add42598158
Author: Nick Mathewson <[email protected]>
Date:   Sun Oct 20 15:03:24 2013 -0700

    Do not include a timestamp in the Client/ServerHello Random field.

    Instead, send random bytes, unless SSL_SEND_{CLIENT,SERVER}RANDOM_MODE
    is set.

    This is a forward-port of commits:
      4af793036f6ef4f0a1078e5d7155426a98d50e37
      f4c93b46edb51da71f09eda99e83eaf193a33c08
      3da721dac9382c48812c8eba455528fd59af2eef
      2583270191a8b27eed303c03ece1da97b9b69fd3

    While the gmt_unix_time record was added in an ostensible attempt to
    mitigate the dangers of a bad RNG, its presence leaks the host's view
    of the current time in the clear.  This minor leak can help
    fingerprint TLS instances across networks and protocols... and what's
    worse, it's doubtful that the gmt_unix_time record does any good at
    all for its intended purpose, since:

        * It's quite possible to open two TLS connections in one second.

        * If the PRNG output is prone to repeat itself, ephemeral
          handshakes (and who knows what else besides) are broken.

commit 2927791d77ddaef687e92b1779e0bff89bdc279f
Author: Nick Mathewson <[email protected]>
Date:   Sun Oct 20 15:08:58 2013 -0700

    Fix another gmt_unix_time case in server_random

### Major changes between OpenSSL 1.0.1e and OpenSSL 1.0.1f [6 Jan 2014]

  * Don't include gmt_unix_time in TLS server and client random values
  * (.. unrelated changes ..)

u/zanatwo found this issue in March 2017 and reported it on r/sysadmin, but there's no indication that Microsoft knows about it, as the feature is still enabled.

It was rediscovered by u/Thranx in January 2022 and also reported on r/sysadmin.

And again at the beginning of this year by @chris1out on ServerFault.

As we and others are experiencing, the issue is increasing in frequency, probably because fewer servers report the actual time and more send random values. It's probable that this will continue to increase in frequency and hit more users.

The system clock is the most important shared mutable state on the system, and bugs which change the time to a wildly different value wreak havoc on all systems and have repercussions far beyond the single server they happen on. This is without a doubt the most serious bug/misfeature I've ever encountered, and Microsoft needs to disable this ASAP.

Getting in touch with Microsoft is difficult, so please report this issue to Microsoft if you're experiencing the same issue.

Thanks a lot to @test-is-prod for sharing their findings and pointing me to the Reddit post by /u/zanatwo!

chris1out avatar
hm flag
Excellent information. Not going to lie, most of this was over my head. I'm not at all a Windows or server admin... just kind of got stuck with the job, so I appreciate all of the detail here.
Score:1
mu flag

We're having the same problem with a handful of our servers: a mix of 2019 and 2016, both physical and VMs. I've looked at logs for days but can't find anything they have in common. I opened a ticket with MS, but so far nothing of value has come out of it regarding exactly what caused this.

You can try running this and see if it helps:

net stop w32time
w32tm /unregister
w32tm /register
net start w32time
w32tm /resync /nowait
w32tm /resync /rediscover
simendsjo avatar
kr flag
Could you add a link to the ticket at MS?
simendsjo avatar
kr flag
I added a support ticket at Microsoft (Feedback Hub) for this issue: https://aka.ms/AAkwnpl
test-is-prod avatar
mu flag
I opened a Premier ticket in our organization. MS are mostly interested in the DC logs, but I would think more servers would be affected if that were the cause. I'm not sure it's the hypervisors either. I see some NTP errors on our ESXi hosts, but none that match the times the servers jump.
simendsjo avatar
kr flag
If you're in contact with MS, maybe you could give a hint that we're also experiencing this issue? It's by far the worst bug I've ever encountered, so I really hope MS takes it seriously. We're open to helping them debug the issue as it's terrible when it happens, and could be devastating if hitting the right servers.
test-is-prod avatar
mu flag
Yes, it's really bad. Some SQL servers were affected by this and it really screwed them up. Could you give some brief information about your environment?
simendsjo avatar
kr flag
I was forced to create a new question: https://serverfault.com/questions/1131670/windows-server-time-service-jumps-into-the-future-and-partially-back?noredirect=1&lq=1 Look at my profile for my homepage and email if you want to discuss this further.