Identifying source of memory drain on Windows Server 2019

Every Saturday night around midnight the server in question experiences a sudden loss of available memory. Over the course of about 35-40 minutes the available memory drops until the OS is unable to function and locks up. The OS is unresponsive at that point until the server is rebooted.

There are many alerts in the Windows Event logs during the resource depletion from various processes complaining that they’re out of resources. For example, you can track the progression of SQL Server as more and more of its process memory is paged out. Eventually the log entries stop as the OS locks up, and the next entry is after the server is rebooted.

I checked Task Scheduler and didn’t see anything obvious running at midnight on Saturdays that could be the cause.

Last weekend I ran Windows Performance Monitor on a scheduled task to track and log the following parameters during the crash:
  • Available memory (total)
  • Page file bytes (by process)
  • Page file bytes (total)
  • Private bytes (by process)
  • Private bytes (total)
  • Virtual bytes (by process)
  • Virtual bytes (total)
  • Working set (by process)
  • Working set (total)
  • Working set - private (by process)
  • Working set - private (total)
  • Processor usage (total)
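
For anyone who wants to reproduce this collection, here is a minimal Python sketch that writes a counter file covering the list above. The counter paths are my best mapping of those descriptions onto the standard Memory, Process and Processor objects, and that mapping is an assumption (for example, "available memory" is taken to be Available Bytes). The function names and the file name are hypothetical.

```python
# Counter paths assumed to correspond to the list above. The (*) wildcard
# covers every process instance, which also includes the _Total instance,
# so the per-process and total figures come from the same path.
COUNTER_PATHS = [
    r"\Memory\Available Bytes",
    r"\Process(*)\Page File Bytes",
    r"\Process(*)\Private Bytes",
    r"\Process(*)\Virtual Bytes",
    r"\Process(*)\Working Set",
    r"\Process(*)\Working Set - Private",
    r"\Processor(_Total)\% Processor Time",
]


def counter_file_text(paths=COUNTER_PATHS):
    """Return counter-file contents: one counter path per line."""
    return "\n".join(paths) + "\n"


def write_counter_file(filename, paths=COUNTER_PATHS):
    """Write the counter paths to a plain-text file."""
    with open(filename, "w", encoding="ascii") as handle:
        handle.write(counter_file_text(paths))
```

Calling write_counter_file("counters.txt") produces a file that can be passed to a collector, for example `typeperf -cf counters.txt -si 15 -f CSV -o memlog.csv`. The 15-second interval and the output name are only illustrative.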

The total available memory column in the log shows a clear drop starting around midnight and reaching zero at around 12:40 am. Over the same period, the working set of each individual process shrinks. However, there is no obvious culprit: I can see no record of any specific process increasing its memory usage while everything else drops.
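
One way to double-check that no counter is growing is to rank every column of the log by how much it changed between its first and last sample. The following is a minimal Python sketch, assuming the log has been exported to CSV (for example with `relog`) in the usual layout of one timestamp column followed by one column per counter. The sample data, server name and process name are synthetic and purely illustrative.

```python
import csv
import io

# Synthetic sample in the assumed Perfmon CSV layout (not real measurements).
SAMPLE = r'''"(PDH-CSV 4.0) (GMT Standard Time)(0)","\\SRV\Memory\Available Bytes","\\SRV\Process(sqlservr)\Working Set","\\SRV\Process(hypothetical)\Working Set"
"01/01/2024 00:00:00.000","8000000000","4000000000","100000000"
"01/01/2024 00:20:00.000","3000000000","2500000000","100000000"
"01/01/2024 00:40:00.000","0","1000000000"," "
'''


def rank_counter_growth(csv_text):
    """Return (counter, first, last, delta) tuples, largest growth first."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    first = {}
    last = {}
    for row in reader:
        # Skip the timestamp column; pair each counter name with its value.
        for name, cell in zip(header[1:], row[1:]):
            try:
                value = float(cell)
            except ValueError:
                continue  # blank or non-numeric cell: sample missing
            first.setdefault(name, value)
            last[name] = value
    ranked = [(name, first[name], last[name], last[name] - first[name])
              for name in first]
    ranked.sort(key=lambda item: item[3], reverse=True)
    return ranked


if __name__ == "__main__":
    for name, start, end, delta in rank_counter_growth(SAMPLE):
        print(f"{delta:+16.0f}  {name}")
```

If every per-process counter ends up with a delta at or below zero while available memory also falls, that supports the observation that the growth is not visible in these per-process counters.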

I forced a Non-Maskable Interrupt (NMI) this time so I could look at the memory dump, but I could only find a minidump, which had very little useful information. I’m not sure whether that was due to the size of the page file at the time, a side effect of the resource depletion, or Windows deleting the dump automatically. As far as I can tell, the settings are the defaults for Windows Server 2019. The page file is now the same size as the RAM (16 GB), so it’s possible Windows resized it (https://learn.microsoft.com/en-us/windows-hardware/drivers/debugger/automatic-memory-dump), in which case a further NMI should result in a correctly saved memory dump. I’ll try it again next Monday.

I am unsure how best to proceed from here and would appreciate any ideas. Are there any parameters I missed from my Performance Monitor log that I should have included, for example?

Thanks.

16 GB RAM SQL Server.

Questions:

  • Is SQL Server the only installed service/application?
  • Is there an active antivirus?
  • Is the machine a VM?
  • Is this the only SQL Server instance, or are there other SQL Server instances on this machine?
  • Have you limited the memory used by SQL Server to 12 GB?
It's a DPM server, so it has DPM as well as SQL. Other than that, there's not much else on there. Windows Defender is the only antivirus, and it does its daily scan at 5am. It's the only SQL instance. Memory is not limited, but as I said above, I see no evidence of SQL taking up the memory. SQL complains in the Event logs that it's losing memory, and my Performance Monitor log shows memory usage for SQL clearly dropping, not rising. If SQL were the cause of the memory loss, wouldn't I see it grabbing more and more memory in the Performance Monitor log?