Score:1

Unexpected server shutdown (BSOD) with message “WHEA_UNCORRECTABLE_ERROR”


When we checked the system event log, we found that the following warning had been logged repeatedly.

Event 17
A corrected hardware error has occurred.
Component: PCI Express Root Port
Error Source: Advanced Error Reporting (PCI Express)
Bus:Device:Function: 0x0:0x2:0x0
Vendor ID:Device ID: 0x8086:0x6F04
Class Code: 0x30400

When the system shut down unexpectedly (BSOD), the following error was logged.

Event 16
A fatal hardware error has occurred.
Component: PCI Express Root Port
Error Source: Advanced Error Reporting (PCI Express)
Bus:Device:Function: 0x0:0x2:0x0
Vendor ID:Device ID: 0x8086:0x6F04
Class Code: 0x30400

The system has shut down unexpectedly only once (20-07-2021) with the above error (Event 16), even though the warning (Event 17) has been logged daily since the server was set up (27-03-2021).
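Both records name the same device, so it can help to pull the address and IDs out of the event text and compare them against a device listing. Below is a minimal sketch, assuming the event description has been copied as plain text in the layout quoted above. The function names `parse_whea_record` and `format_bdf` are hypothetical helpers for illustration, not part of any Windows tooling.

```python
import re

# Text copied from the Event 16 / Event 17 descriptions quoted above.
RECORD = """\
Component: PCI Express Root Port
Error Source: Advanced Error Reporting (PCI Express)
Bus:Device:Function: 0x0:0x2:0x0
Vendor ID:Device ID: 0x8086:0x6F04
Class Code: 0x30400
"""

HEX = r"(0x[0-9A-Fa-f]+)"


def parse_whea_record(text):
    """Extract the PCI address and IDs from a WHEA event description."""
    bdf = re.search(r"Bus:Device:Function:\s*" + HEX + ":" + HEX + ":" + HEX, text)
    ids = re.search(r"Vendor ID:Device ID:\s*" + HEX + ":" + HEX, text)
    if bdf is None or ids is None:
        raise ValueError("text does not look like a WHEA PCI Express record")
    bus, dev, fn = (int(x, 16) for x in bdf.groups())
    vendor_id, device_id = (int(x, 16) for x in ids.groups())
    return {
        "bus": bus,
        "dev": dev,
        "fn": fn,
        "vendor_id": vendor_id,  # 0x8086 is the PCI vendor ID assigned to Intel
        "device_id": device_id,
    }


def format_bdf(info):
    """Render the address in the common bus:device.function notation."""
    return f"{info['bus']:02x}:{info['dev']:02x}.{info['fn']:x}"


info = parse_whea_record(RECORD)
print(format_bdf(info), f"{info['vendor_id']:04x}:{info['device_id']:04x}")
```

The printed address identifies the root port itself. The device plugged in behind that port (for example an add-in card) is a separate candidate for the fault and has to be identified from the board's slot layout.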

Crash dump analysis of the BSOD:

Crash dump file: D:\MEMORY.DMP
This was probably caused by the following module: pci.sys (pci+0x1364B)
Bug check code: 0x124 (0x4, 0xFFFFE000C7D1E038, 0x0, 0x0)
Error: WHEA_UNCORRECTABLE_ERROR
File path: C:\Windows\system32\drivers\pci.sys
Product: Microsoft® Windows® Operating System
Company: Microsoft Corporation
Description: NT Plug and Play PCI Enumerator
Bug check description: This bug check indicates that a fatal hardware error has occurred. This bug check uses the error data that is provided by the Windows Hardware Error Architecture (WHEA).
This is likely to be caused by a hardware problem.
The crash took place in a Microsoft module. Your system configuration may be incorrect. Possibly this problem is caused by another driver on your system that cannot be identified at this time.
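The bug check parameters in the report carry more detail than the generic description. For bug check 0x124, the first parameter identifies the type of error source and the second is the address of the WHEA error record. The sketch below decodes the line from the report. It is illustrative only: the lookup table deliberately lists just two parameter values, every other value is reported as unknown, and the names `parse_bugcheck` and `describe` are hypothetical.

```python
import re

# Line copied from the crash dump report above.
LINE = "Bug check code: 0x124 (0x4, 0xFFFFE000C7D1E038, 0x0, 0x0)"

# Partial table of Parameter 1 values for bug check 0x124 (assumption:
# only the two entries needed here are listed; it is not exhaustive).
WHEA_ERROR_SOURCES = {
    0x0: "machine check exception",
    0x4: "uncorrectable PCI Express error",
}


def parse_bugcheck(line):
    """Return (code, [param1..param4]) from a 'code (p1, p2, p3, p4)' line."""
    match = re.search(r"(0x[0-9A-Fa-f]+)\s*\(([^)]*)\)", line)
    if match is None:
        raise ValueError("no bug check code found in line")
    code = int(match.group(1), 16)
    params = [int(p.strip(), 16) for p in match.group(2).split(",")]
    return code, params


def describe(line):
    """Give a one-line summary for a 0x124 bug check."""
    code, params = parse_bugcheck(line)
    if code != 0x124:
        return f"bug check 0x{code:X} is not WHEA_UNCORRECTABLE_ERROR"
    source = WHEA_ERROR_SOURCES.get(params[0], "unknown error source")
    return f"WHEA_UNCORRECTABLE_ERROR: {source}, error record at 0x{params[1]:016X}"


print(describe(LINE))
```

Here the first parameter is 0x4, which matches the PCI Express root port named in Event 16. With the dump open in WinDbg, the `!errrec` extension can be given the address from the second parameter to display the full error record.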

What we have tried so far:

Updated Windows Server 2012 R2 to the latest version (v6.3.9600 Build 9600).

Updated all relevant drivers to the latest versions.

Updated pci.sys to the latest version (v6.3.9600.18939).

Server Details:

Motherboard: ASRock Rack Server Board EP2C612D16NM-2T8R
RAID: Dell (LSI OEM) 9341-8i MegaRAID (latest firmware)
Processor: Intel(R) Xeon(R) CPU E5-2683 v4 @ 2.10 GHz, 2100 MHz
OS: Microsoft Windows Server 2012 R2 Standard
OS Version: 6.3.9600 Build 9600
Score:0

If you have already updated the operating system and drivers to the latest versions, consider updating the server firmware to the latest version as well. The error message also points to faulty hardware; judging by the error text, a PCI-related component. Another possible cause is that the server is overheating.

Several other troubleshooting options for this problem are described in this and this document.

I hope this helps.
