This is a proper head-scratcher...
A customer has a domain with eight DCs. Two (including the PDC) are Hyper-V VMs in our datacentre, four are vSphere VMs at national site offices (not RODCs) and the other two are also vSphere VMs in a 3rd-party datacentre.
SYSVOL replication is all but instantaneous between the PDC (DC1001) and the site DCs, yet between DC1001 and the other Hyper-V DC (DC1002), replication is taking a couple of hours.
I've checked AD Sites and Services to ensure that all the links are in place and I can see direct connections between the two Hyper-V boxes.
We've put the two Hyper-V boxes on the same host to see if there was a Hyper-V networking problem, but slow replication is persisting.
I've run a DFS Replication health check report and the results are a bit confusing to say the least...
Of all eight DCs, none of them have any backlogged receiving transactions yet all of them except DC1002 have over 300 backlogged sending transactions. DC1002 has no backlog at all and this is the "slow" node in the web. How can this be?
The dfsrdiag commands (with the exception of the backlog) report everything to be hunky-dory, yet there must be an explanation as to why replication is taking hours to show on DC1002.
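For anyone wanting to compare backlog per direction rather than relying on the health report totals, here is a minimal sketch that builds one `dfsrdiag backlog` command for every sending/receiving pair. It assumes the default SYSVOL names ("Domain System Volume" for the replication group and "SYSVOL Share" for the replicated folder), and the member list is only a sample taken from the names mentioned in this post. It just prints the commands; it doesn't run them.

```python
from itertools import permutations
import subprocess


def backlog_commands(members, rg="Domain System Volume", rf="SYSVOL Share"):
    """Return one 'dfsrdiag backlog' argv list per ordered (sender, receiver) pair."""
    return [
        [
            "dfsrdiag", "backlog",
            f"/rgname:{rg}",   # replication group (assumed default SYSVOL name)
            f"/rfname:{rf}",   # replicated folder (assumed default SYSVOL name)
            f"/smem:{sender}",    # sending member
            f"/rmem:{receiver}",  # receiving member
        ]
        for sender, receiver in permutations(members, 2)
    ]


if __name__ == "__main__":
    # Sample member list only; substitute the real DC names.
    sample_members = ["DC1001", "DC1002", "Site1-001"]
    for argv in backlog_commands(sample_members):
        # list2cmdline quotes arguments containing spaces for a Windows prompt.
        print(subprocess.list2cmdline(argv))
```

Running the printed commands with DC1002 as the receiving member in each pair should show whether its partners really hold a backlog destined for it.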
I'm no Hyper-V expert, so there may well be something I've missed there.
There appears to be no other file replication taking place via DFSR, just SYSVOL.
There are also associated entries in Event Viewer > Applications and Services Logs > DFS Replication:
Log Name: DFS Replication
Source: DFSR
Date: 20/01/2022 11:42:06
Event ID: 5014
Task Category: None
Level: Warning
Keywords: Classic
User: N/A
Computer: DC1002.domain.local
Description:
The DFS Replication service is stopping communication with partner Site1-001 for replication group Domain System Volume due to an error. The service will retry the connection periodically.
Additional Information:
Error: 1726 (The remote procedure call failed.)
I've looked up this error in relation to DFS replication, but there's a myriad of suggested fixes for all kinds of different problems, so it's hard to know where to start.
The RPC service is running on all hosts (Auto startup), there are no firewall rules blocking it and general network connectivity is fine throughout.
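As a rough way to back up the connectivity claim from DC1002 itself, here is a minimal sketch that tests whether the RPC endpoint mapper port (TCP 135) accepts a connection on each partner. The partner hostnames are assumptions based on the names in this post. A successful connect only shows the endpoint mapper is reachable; DFSR then moves to a dynamically assigned RPC port by default, which this check does not cover.

```python
import socket


def tcp_port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name resolution failures.
        return False


if __name__ == "__main__":
    # Hypothetical partner FQDNs; replace with the real replication partners.
    partners = ["DC1001.domain.local", "Site1-001.domain.local"]
    for partner in partners:
        state = "open" if tcp_port_open(partner, 135) else "unreachable"
        print(f"{partner}:135 {state}")
```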
Any advice or experience with this problem would be greatly appreciated!!!
EDIT:
As a note, the two Hyper-V DCs reside in the same AD site, so I would guess that inter-site replication does not apply between these two. Also, all inter-site intervals are set to 15 minutes.