I recently upgraded our Tomcat server from 7.0.85 to 9.0.70. Apache 2.4 sits in front of it and talks to Tomcat over AJP.
My Java application runs in a cluster, and it is expected that if the master node fails during a command, the secondary node will take the master role and finish the action.
I have a test that starts an action, performs a failover, and ensures that the secondary node completes the action.
The client sends the request and makes up to 8 attempts to get a response from the server.
Before the upgrade, the client got a read timeout on the first 3 or 4 tries; then the secondary finished the action and sent a 200 response, and the test passed. The Apache access log shows that the server was trying to send a 500 (Internal Server Error) response for those first tries, but I assume it took too long and the client hit its read timeout first.
After the upgrade, the client gets a read timeout on the first try, but on the second try it receives the 500 response and stops retrying. On that second try Apache responds much faster than on the first try, and much faster than on tries 2, 3, and 4 before the upgrade.
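To make the client behaviour concrete, here is a minimal sketch of such a retry loop in Java. It is not the actual test code: the `RetryClient` class, the `Attempt` interface, and the simulated responses are hypothetical. The only behaviour taken from the description above is that a read timeout triggers another try, while any HTTP response, including a 500, ends the loop.

```java
import java.net.SocketTimeoutException;

public class RetryClient {

    /** One attempt at the request; returns the HTTP status code (hypothetical interface). */
    @FunctionalInterface
    public interface Attempt {
        int send() throws SocketTimeoutException;
    }

    /**
     * Retries only on a read timeout. Any HTTP response ends the loop,
     * so a fast 500 stops the client before the secondary node can finish.
     * Returns the status code received, or -1 if every attempt timed out.
     */
    public static int sendWithRetries(Attempt attempt, int maxTries) {
        for (int i = 1; i <= maxTries; i++) {
            try {
                return attempt.send();
            } catch (SocketTimeoutException e) {
                // Read timeout: no response arrived in time, so try again.
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        // Simulated post-upgrade sequence: one read timeout, then an immediate 500.
        int[] calls = {0};
        int status = sendWithRetries(() -> {
            if (calls[0]++ == 0) {
                throw new SocketTimeoutException("Read timed out");
            }
            return 500;
        }, 8);
        System.out.println("status=" + status + " after " + calls[0] + " tries");
    }
}
```

With a loop like this, the test only passes if the early tries end in a read timeout rather than in an error response, which is why the faster 500 after the upgrade makes it fail.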
In the tcpdump I can see that on the first try (both before and after the upgrade) the connection between Apache and Tomcat reaches its timeout. On the following tries, Tomcat resets the connection to Apache. The difference is that after the upgrade Tomcat sends the reset immediately after the request, whereas before the upgrade it took a few seconds to send it.
My socket timeout is 20 seconds and the AJP timeout is 10 seconds (the same as before the upgrade). I am using the same configuration files as before the upgrade, except for some refactoring I had to do because of changes in Tomcat. I tried raising the AJP timeout to 20 seconds, but it didn't help.
Is this a configuration issue? Is there a way to “undo” this change?