I have a lab that contains the ACME domain, which has ACME-DC2 and ACME-DC3. It has a child domain called LAB with LAB-DC1. I made some infrastructure changes to my environment that broke Active Directory DNS for a while. Replication from my ACME domain to the LAB domain appears to be broken:
Starting test: Replications
[Replications Check,LAB-DC1] A recent replication attempt failed:
From ACME-DC2 to LAB-DC1
Naming Context: DC=ForestDnsZones,DC=ACME,DC=local
The replication generated an error (1256):
The remote system is not available. For information about network troubleshooting, see Windows Help.
The failure occurred at 2022-01-14 13:54:35.
The last success occurred at 2019-05-03 19:45:51.
20747 failures have occurred since the last success.
REPLICATION LATENCY WARNING
ERROR: Expected notification link is missing.
Source ACME-DC2
Replication of new changes along this path will be delayed.
This problem should self-correct on the next periodic sync.
[Replications Check,LAB-DC1] A recent replication attempt failed:
From ACME-DC2 to LAB-DC1
Naming Context: CN=Schema,CN=Configuration,DC=ACME,DC=local
The replication generated an error (5):
Access is denied.
The failure occurred at 2022-01-14 13:54:35.
The last success occurred at 2019-05-03 19:45:51.
20738 failures have occurred since the last success.
[Replications Check,LAB-DC1] A recent replication attempt failed:
From ACME-DC2 to LAB-DC1
Naming Context: CN=Configuration,DC=ACME,DC=local
The replication generated an error (5):
Access is denied.
The failure occurred at 2022-01-14 13:54:34.
The last success occurred at 2019-05-03 19:45:51.
20771 failures have occurred since the last success.
REPLICATION LATENCY WARNING
ERROR: Expected notification link is missing.
Source ACME-DC2
Replication of new changes along this path will be delayed.
This problem should self-correct on the next periodic sync.
[Replications Check,LAB-DC1] A recent replication attempt failed:
From ACME-DC2 to LAB-DC1
Naming Context: DC=ACME,DC=local
The replication generated an error (1256):
The remote system is not available. For information about network troubleshooting, see Windows Help.
The failure occurred at 2022-01-14 13:54:35.
The last success occurred at 2019-05-03 20:16:26.
39498 failures have occurred since the last success.
REPLICATION LATENCY WARNING
ERROR: Expected notification link is missing.
Source ACME-DC2
Replication of new changes along this path will be delayed.
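To make the pattern in the dcdiag output above easier to see, here is a minimal Python sketch that groups the failing naming contexts by error code. The function name `group_failures_by_error` and the `SAMPLE` excerpt are illustrative only; the sample is copied from the output above with the timestamp and counter lines left out.

```python
import re
from collections import defaultdict

# Patterns for the two dcdiag lines of interest in each failure entry.
NC_RE = re.compile(r"^\s*Naming Context:\s*(.+?)\s*$")
ERR_RE = re.compile(r"generated an error \((\d+)\)")


def group_failures_by_error(dcdiag_text):
    """Return {error_code: [naming contexts]} from dcdiag Replications output."""
    failures = defaultdict(list)
    current_nc = None
    for line in dcdiag_text.splitlines():
        nc_match = NC_RE.match(line)
        if nc_match:
            # Remember the naming context until its error line shows up.
            current_nc = nc_match.group(1)
            continue
        err_match = ERR_RE.search(line)
        if err_match and current_nc is not None:
            failures[int(err_match.group(1))].append(current_nc)
            current_nc = None
    return dict(failures)


# Excerpt of the output quoted above (hypothetical variable name).
SAMPLE = """\
[Replications Check,LAB-DC1] A recent replication attempt failed:
From ACME-DC2 to LAB-DC1
Naming Context: DC=ForestDnsZones,DC=ACME,DC=local
The replication generated an error (1256):
[Replications Check,LAB-DC1] A recent replication attempt failed:
From ACME-DC2 to LAB-DC1
Naming Context: CN=Schema,CN=Configuration,DC=ACME,DC=local
The replication generated an error (5):
[Replications Check,LAB-DC1] A recent replication attempt failed:
From ACME-DC2 to LAB-DC1
Naming Context: CN=Configuration,DC=ACME,DC=local
The replication generated an error (5):
[Replications Check,LAB-DC1] A recent replication attempt failed:
From ACME-DC2 to LAB-DC1
Naming Context: DC=ACME,DC=local
The replication generated an error (1256):
"""

for code, contexts in sorted(group_failures_by_error(SAMPLE).items()):
    print(code, contexts)
```

Grouped this way, error 5 (Access is denied) is reported for the Schema and Configuration partitions, and error 1256 for ForestDnsZones and the ACME domain partition.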
C:\Users\administrator.ACME> repadmin /replicate LAB-DC1 ACME-DC2 "DC=ForestDnsZones,DC=ACME,DC=local"
DsReplicaSync() failed with status 5 (0x5):
Access is denied.
dcdiag reports:
......................... ACME-DC2 passed test Services
Starting test: SystemLog
A warning event occurred. EventID: 0x8000001C
Time Generated: 01/22/2022 18:09:57
Event String:
When generating a cross realm referral from domain LAB.ACME.L
verify the ticket. The ticket key version in the request was 15 and the avai
this error is a delay in replicating the keys. In order to remove this probl
of keys to occur.
......................... ACME-DC2 passed test SystemLog
Starting test: VerifyReferences
......................... ACME-DC2 passed test VerifyReferences
I've tried various things to fix it. The break was definitely longer than 180 days, so I believe LAB-DC1 has been tombstoned, or vice versa. I had seen some errors about this before. In an attempt to fix this, I set "Strict Replication Consistency" to 0 in the registry and re-ran the replication. This appears to have fixed some errors, but I can't get around the "Access is denied" error.
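As a quick sanity check on the "longer than 180 days" estimate, the gap can be computed from the timestamps in the dcdiag output. This is only a sketch: the two timestamps are the last success and latest failure reported for the ForestDnsZones partition, and `TOMBSTONE_LIFETIME_DAYS` is an assumption taken from the 180-day figure above. The value actually in effect is stored in the forest's `tombstoneLifetime` attribute and may differ.

```python
from datetime import datetime

# Assumption: 180 days, as quoted in the post. The real value lives in the
# forest's tombstoneLifetime attribute and should be checked there.
TOMBSTONE_LIFETIME_DAYS = 180

# Timestamps copied from the dcdiag output for DC=ForestDnsZones,DC=ACME,DC=local.
last_success = datetime(2019, 5, 3, 19, 45, 51)
latest_failure = datetime(2022, 1, 14, 13, 54, 35)

outage = latest_failure - last_success
outage_days = outage.days  # whole days only; the remainder is dropped
exceeds_lifetime = outage_days > TOMBSTONE_LIFETIME_DAYS

print(f"Replication gap: {outage_days} days")
print(f"Exceeds assumed tombstone lifetime: {exceeds_lifetime}")
```

With these dates the gap is well over two years, so the outage exceeds the assumed tombstone lifetime several times over.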
I also had various references to ACME-DC1 (a dead DC from before) in the DNS data on LAB-DC1. I went through and changed all of those to ACME-DC2.
Any idea how to fix this Access Denied error?
There is a second issue: the LAB domain has the old ACME-DC1 as the schema master and domain naming master. That DC still existed in Active Directory Sites and Services. I tried to delete it there but couldn't, so I had to use ADSIEdit to delete it from the CN=Configuration partition.
C:\Users\administrator.ACME>netdom query fsmo
Schema master *** Warning: role owner is a deleted DC: CN=NTDS Settings\0ADEL:a1047a26-7404-43b1-8a6e-f260c2a73d14,CN=ACME-DC1\0ADEL:e307b4b3-3a49-4c03-a93f-9c0c8eb45c10,CN=Servers,CN=Default-First-Site-Name,CN=Sites,CN=Configuration,DC=ACME,DC=local
Domain naming master *** Warning: role owner is a deleted DC: CN=NTDS Settings\0ADEL:a1047a26-7404-43b1-8a6e-f260c2a73d14,CN=ACME-DC1\0ADEL:e307b4b3-3a49-4c03-a93f-9c0c8eb45c10,CN=Servers,CN=Default-First-Site-Name,CN=Sites,CN=Configuration,DC=ACME,DC=local
PDC lab-dc1.lab.ACME.local
RID pool manager lab-dc1.lab.ACME.local
Infrastructure master lab-dc1.lab.ACME.local
I tried to correct this by using ADSIEdit to alter the fSMORoleOwner attribute and got WILL_NOT_PERFORM. I'm a little hesitant to try to seize a role.
I also tried to add a new domain controller to the LAB domain, but that errored out as well because of the invalid reference to ACME-DC1.
It's not the end of the world if I have to trash the LAB domain, but I'm hoping that I don't.
I also tried fixfsmo.ps1. It identified the errors and said it fixed them. When I run netdom query fsmo, though, it still shows the old entries.
Edit
I was able to fix this. It turned out I had several more problems.
The SYSVOL share on ACME-DC3 was missing. I raised MaxOfflineTimeInDays past the number of days shown in the error in the DFS Replication event log and restarted the DFS Replication service. SYSVOL and NETLOGON then reappeared on ACME-DC3. I reset MaxOfflineTimeInDays back to 60 afterwards.
I also found some more anomalous entries in my DNS from before I re-IPed the DCs. I fixed all of them.
I manually added an NTDS connection between LAB-DC1 and ACME-DC3. I'm not sure if this helped, but dcdiag was complaining that it was missing.
The secure channel between ACME-DC2 and ACME-DC3 was dead. I tried using netdom to repair it but kept getting errors, and earlier attempts with PowerShell and nltest had also failed. What finally worked was Active Directory Domains and Trusts: I ran it from each of the three DCs, in both directions (child to parent and parent to child), and chose to validate the connection and repair it. Eventually I got it to work. This was a big step in fixing things.