I'm wondering what happens when, say, we use an HAProxy load balancer with persistence (sticky sessions) and a server goes down (or we want to scale down).
According to the HAProxy docs:

> When doing persistence, if a server goes down, then HAProxy will redispatch the user to another server.
My problem is with the definition of "down" here. If we're running this on Kubernetes, does "down" mean that the pod is unresponsive, merely unhealthy (i.e., failing its liveness probe), or actually not running anymore, with a new pod created in its place?
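For reference, this is the kind of backend I have in mind (server names, addresses, and check intervals are just placeholders). As far as I can tell, HAProxy decides "down" from its own `check` health probes (after `fall` consecutive failures), which is a mechanism separate from the kubelet's liveness probe:

```
backend app
    balance roundrobin
    cookie SRV insert indirect nocache   # sticky sessions via an inserted cookie
    option redispatch                    # re-route a stuck client when its server is "down"
    default-server check inter 2s fall 3 rise 2
    server s1 10.0.0.1:8080 cookie s1
    server s2 10.0.0.2:8080 cookie s2
```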
I'm asking because I'm trying to figure out an efficient way to maintain in-order delivery, so consider this example:
- An HTTP client with no session hits the load balancer with a request (R1)
- The LB picks the best server (S1) with its balancing algorithm, forwards R1 to it, and inserts the persistence cookie when S1's response comes back
- Subsequent requests carry the cookie, so the LB already knows which server should receive them
- Now the client sends another request (R2), but S1 doesn't respond in time (the pod is still running, though, and is still trying to write that request to Kafka as a message)
- Since the client got no response in time, it retries R2; the LB has seen S1 fail to respond, so this time it picks a new server (S2)
- The retried R2 succeeds because S2 is fine and dandy
- The client sends R3, which also succeeds since S2 is still fine
- In the meantime S1 recovers and, before terminating, manages to write its R2 to Kafka, breaking message order: the topic now holds R1, R2, R3, R2 (a sketch of this write path follows the list)
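For concreteness, here is a minimal sketch of the server-side write path I'm picturing (the topic name, broker address, and the confluent-kafka client are my assumptions, not a given). Keying by session pins a session to one partition, and the partition records only the order in which `produce()` calls reach the broker, which is exactly how S1's late R2 ends up after R3:

```python
# Hypothetical write path: each request is produced to Kafka keyed by the
# client's session id, so one session -> one partition.
from confluent_kafka import Producer

producer = Producer({"bootstrap.servers": "kafka:9092"})  # assumed broker address

def handle_request(session_id: str, body: bytes) -> None:
    # Same key => same partition. The partition preserves only the order in
    # which produce() calls reach the broker, so a recovered S1 flushing its
    # stale R2 appends it *after* S2's R2 and R3.
    producer.produce("requests", key=session_id.encode(), value=body)
    producer.flush()  # block until the broker acknowledges the write
```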
Hence my questions above. Is there anything, even outside HAProxy, that can achieve this? Perhaps a Kubernetes load balancer? Or am I going to need to write a custom load balancer that keeps track of pods and can tell whether a pod is merely unresponsive or completely gone?
FYI, I'm totally fine sacrificing availability for the sake of consistency here (CAP theorem) and rejecting a request. I'm also trying to avoid a fixed number of servers, since ideally they would scale up and down with traffic. The main requirement is to NOT route a request to a new server while the server that previously served that client might still be dangling somewhere, to avoid corrupting the order of the messages.
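For what it's worth, the closest building block I've found so far is Kafka's producer fencing: if each client session gets its own `transactional.id`, the pod that currently owns the session fences any older producer using the same id, so a dangling S1 fails its late write instead of corrupting the order. This is only a sketch of the idea (per-session producers are expensive, and all the config values are assumptions on my part):

```python
# Sketch: fence a dangling pod's late write with Kafka transactions.
from confluent_kafka import KafkaException, Producer

def producer_for_session(session_id: str) -> Producer:
    p = Producer({
        "bootstrap.servers": "kafka:9092",             # assumed broker address
        "transactional.id": f"session-{session_id}",   # same id on every pod
        "enable.idempotence": True,
    })
    # init_transactions() bumps the epoch for this transactional.id on the
    # broker; any older producer using the same id is fenced from then on.
    p.init_transactions()
    return p

def write_in_order(p: Producer, session_id: str, body: bytes) -> None:
    try:
        p.begin_transaction()
        p.produce("requests", key=session_id.encode(), value=body)
        p.commit_transaction()
    except KafkaException:
        # A fenced (stale) producer ends up here: reject the request rather
        # than write out of order -- availability traded for consistency.
        raise
```

But that solves it at the producer rather than at the load balancer, which is why I'm still asking about the LB side.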
Any ideas would be welcome.