(I know Exchange 2013 will soon be out of support and will be replaced, but in the interim I'm experiencing an issue I would like to fix.)
I will use telnet to port 25 as my example, but normal mail traffic behaves the same way.
When I connect on port 25 to specific nodes in my Exchange cluster, it takes between 11 and 14 seconds before the 220 banner is displayed. Since this affects email too, certain services that rely on this communication time out.
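For anyone who wants to reproduce the measurement without telnet, here is a minimal sketch, assuming Python 3 is available on a client machine. The names `measure_banner` and `report` and the node names in the usage note are hypothetical, chosen only for illustration. It times the TCP handshake separately from the wait for the greeting, so a slow banner can be told apart from a slow connection.

```python
import socket
import time


def measure_banner(host, port=25, timeout=30.0):
    """Return (connect_seconds, banner_seconds, banner) for one SMTP connection.

    connect_seconds: time for the TCP handshake to complete.
    banner_seconds:  time from handshake completion until the first bytes
                     of the server greeting (normally "220 ...") arrive.
    """
    start = time.monotonic()
    # create_connection also applies `timeout` to later reads on the socket.
    with socket.create_connection((host, port), timeout=timeout) as sock:
        connected = time.monotonic()
        data = sock.recv(1024)  # blocks until the server sends its greeting
        greeted = time.monotonic()
    banner = data.decode("ascii", errors="replace").strip()
    return connected - start, greeted - connected, banner


def report(nodes, port=25):
    """Print one timing line per node (node names are supplied by the caller)."""
    for node in nodes:
        connect_s, banner_s, banner = measure_banner(node, port)
        print(f"{node}: connect {connect_s:.3f}s, banner {banner_s:.3f}s, {banner!r}")
```

Calling something like `report(["exch-node-01", "exch-node-02"])` with the real server names in place of these placeholders should show a near-zero connect time on every node and a banner time of 11 to 14 seconds only on the affected ones, if the behaviour matches what telnet shows.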
Things I know:
- About 70% of the nodes experience this issue fairly consistently, and the other 30% seemingly never exhibit it. They were all set up around the same time, but someone before my time might have changed settings on some nodes and not others. However, they "look" identical to me.
- Tracing packets, I can see the SMTP traffic arrive immediately on the target server. (No issues with network/latency, firewall, DNS, etc.; everything flows smoothly.)
- I see an immediate TCP Accept on MSExchangeFrontendTransport.exe
- After this packet has entered Windows and I've seen the TCP Accept, it takes the full 11-14 seconds before it shows up in any Exchange logs (SMTP Protocol Connector logs)
- It's unknown when the issue was introduced, but it was likely a very long time ago; there just wasn't anything impacted by the delay before.
So it seems like something is holding the traffic captive, but having turned down or off everything I can find, and having dug into every imaginable log, process monitor result, etc., I can't see anything impacting it.
Could this be internal to Exchange, or internal to a native Windows system process? If there is no direct solution to suggest, where would you go from here to investigate? I'm not sure what the best next step would be.
Any help will be greatly appreciated!