Score:0

DFS Replication slow to a single server

This is a proper head-scratcher...

A customer has a domain with eight DCs. Two (including the PDC) are Hyper-V VMs in our datacentre, four are vSphere VMs at national site offices (not RODCs) and the other two are also vSphere VMs in a 3rd-party datacentre.

SYSVOL replication is all but instantaneous between the PDC (DC1001) and the site DCs, yet between DC1001 and the other Hyper-V DC (DC1002), replication is taking a couple of hours.

I've checked AD Sites and Services to ensure that all the links are in place and I can see direct connections between the two Hyper-V boxes.

We've put the two Hyper-V boxes on the same host to see if there was a Hyper-V networking problem, but slow replication is persisting.

I've run a DFS Replication health check report and the results are confusing, to say the least. Of all eight DCs, none has any backlogged receiving transactions, yet all of them except DC1002 have over 300 backlogged sending transactions. DC1002 has no backlog at all, and this is the "slow" node in the web. How can this be?
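One way to untangle the aggregate report is to query the backlog per sending/receiving pair with `dfsrdiag backlog`. Below is a minimal Python sketch that only builds the command lines for both directions between DC1001 and DC1002. The replication group name is taken from the event log entry in this post; the replicated folder name `SYSVOL Share` is assumed to be the default, and the helper names are hypothetical.

```python
from typing import List


def backlog_command(sending: str, receiving: str,
                    group: str = "Domain System Volume",
                    folder: str = "SYSVOL Share") -> List[str]:
    """Build a dfsrdiag backlog command for one sending/receiving pair.

    The folder name "SYSVOL Share" is an assumption (the usual default).
    """
    return [
        "dfsrdiag", "backlog",
        f"/rgname:{group}",      # replication group
        f"/rfname:{folder}",     # replicated folder
        f"/smem:{sending}",      # sending member
        f"/rmem:{receiving}",    # receiving member
    ]


def both_directions(a: str, b: str) -> List[List[str]]:
    """Return the backlog commands for a -> b and b -> a."""
    return [backlog_command(a, b), backlog_command(b, a)]


if __name__ == "__main__":
    # Printing only; pass each list to subprocess.run() on a DC to execute it.
    for cmd in both_directions("DC1001", "DC1002"):
        print(cmd)
```

Keeping each command as an argument list avoids quoting problems with the spaces in the group and folder names when it is handed to `subprocess.run`.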

The DFSRDIAG commands (the backlog aside) report everything to be hunky-dory, yet there must be an explanation for why changes take hours to show up on DC1002.

I'm no Hyper-V expert, so there may well be something I've missed there.

There appear to be no other replicated folders in play; DFSR is only replicating SYSVOL.

There are also associated entries in Event Viewer > Applications and Services Logs > DFS Replication:

Log Name: DFS Replication
Source: DFSR
Date: 20/01/2022 11:42:06
Event ID: 5014
Task Category: None
Level: Warning
Keywords: Classic
User: N/A
Computer: DC1002.domain.local
Description:
The DFS Replication service is stopping communication with partner Site1-001 for replication group Domain System Volume due to an error. The service will retry the connection periodically.

Additional Information:
Error: 1726 (The remote procedure call failed.)

I've looked up this error in relation to DFS replication, but there's a myriad of solutions for all kinds of problems so this seems like an unassailable minefield.

The RPC service is running on all hosts (Auto startup), there are no firewall rules blocking this, and general network connectivity is fine throughout.

Any advice or experience with this problem would be greatly appreciated!!!

EDIT: As a note, the two Hyper-V DCs reside in the same site so inter-site replication does not take place for these two I would guess. Also, all inter-site intervals are set to 15 minutes.

joeqwerty:
The default replication interval for intersite replication is 180 minutes, so I'd suggest making sure that Active Directory Sites and Services is configured appropriately for your sites and the domain controllers at each site.
Rich M:
Fair comment @joeqwerty, thanks, I'll check this. What do you think would be causing changes to replicate to all other DCs almost instantaneously?
GregAskew:
`I've looked up this error in relation to DFS replication, but there's a myriad of solutions for all kinds of problems so this seems like an unassailable minefield.` How is that? I'm inclined to think a packet capture between the two DCs would provide at least some information.
Rich M:
@GregAskew - Thanks for your response. I've asked our network team for the capture and am awaiting a reply. A number of the search results relate to DISM errors, which I don't think this is, and the proposed causes are: Stopped RPC Service; Name Resolution Issues; Traffic Blocked at Firewall; Network Connectivity Issues. I know that it's none of these, as replication is completing, albeit slowly. I'm also finding a lot of results for replication not working at all or replication being slow overall, but nothing for a single node.
GregAskew:
You should be able to start a capture and initiate a sync. Ports of interest would include tcp/135, tcp/5722, and the high ports (49152+). Also, Hyper-V network adapters have a history of causing network issues, so that may be worth checking.
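As a quick sanity check before the capture, the fixed ports mentioned above can be probed with a plain TCP connect. This is only a sketch: the function names are made up, the dynamic high ports are handed out by the endpoint mapper and cannot be covered by a fixed-port probe, and a successful connect does not rule out an RPC call failing mid-session (which is what error 1726 describes).

```python
import socket
from typing import Dict, Iterable


def tcp_port_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name resolution failures.
        return False


def check_ports(host: str, ports: Iterable[int] = (135, 5722)) -> Dict[int, bool]:
    """Probe each port on host and map port -> reachable."""
    return {port: tcp_port_open(host, port) for port in ports}


# Example (run from DC1001 against its partner):
#   print(check_ports("DC1002.domain.local"))
```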
Score:0

Turns out performing a non-authoritative synchronization of the DFSR-replicated SYSVOL on DC1002 was the way forward.

  1. In the ADSIEDIT.MSC tool, modify the following distinguished name (DN) value and attribute on each of the domain controllers (DCs) that you want to make non-authoritative:

CN=SYSVOL Subscription,CN=Domain System Volume,CN=DFSR-LocalSettings,CN=<the server name>,OU=Domain Controllers,DC=<domain>
msDFSR-Enabled=FALSE

  2. Force Active Directory replication throughout the domain.
  3. Run the following command from an elevated command prompt on the same servers that you set as non-authoritative:

DFSRDIAG POLLAD

  4. You'll see Event ID 4114 in the DFSR event log indicating sysvol replication is no longer being replicated.
  5. On the same DN from Step 1, set msDFSR-Enabled=TRUE.
  6. Force Active Directory replication throughout the domain.
  7. Run the following command from an elevated command prompt on the same servers that you set as non-authoritative:

DFSRDIAG POLLAD

  8. You'll see Event ID 4614 and 4604 in the DFSR event log indicating sysvol replication has been initialized. That domain controller has now done a D2 of sysvol replication.

https://docs.microsoft.com/en-US/troubleshoot/windows-server/group-policy/force-authoritative-non-authoritative-synchronization
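For reference, the DN from the first step can be filled in mechanically. The Python sketch below only builds the string (it does not touch Active Directory), and it assumes the DC object sits in the default Domain Controllers OU and that the domain DN maps one-to-one from the DNS name. The function name is hypothetical.

```python
def sysvol_subscription_dn(server: str, domain_fqdn: str) -> str:
    """Build the DN of a DC's SYSVOL Subscription object.

    Assumes the default "Domain Controllers" OU and a domain DN derived
    directly from the DNS name (domain.local -> DC=domain,DC=local).
    """
    domain_dn = ",".join(f"DC={label}" for label in domain_fqdn.split("."))
    return (
        "CN=SYSVOL Subscription,CN=Domain System Volume,"
        f"CN=DFSR-LocalSettings,CN={server},"
        f"OU=Domain Controllers,{domain_dn}"
    )


if __name__ == "__main__":
    # The object on which msDFSR-Enabled is toggled FALSE, then TRUE.
    print(sysvol_subscription_dn("DC1002", "domain.local"))
```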
