Here is what I consider a fairly generic situation for which I'm not sure what the best solution is.
Consider a k8s workload where pods need 10-30 seconds to become Ready.
At some point you get a load spike that starts to crash your pods for some reason (OOMKills, a thread pool overload making the probe unresponsive, whatever).
Even though you have an HPA configured, the traffic might only go up because of client retries, and eventually all your pods crash as soon as they become Ready, because the Service sends a large portion, if not all, of the requests to a single pod while all the others are still restarting.
EDIT: at this point, I assume the pods have correctly defined liveness and readiness probes. But if the ingress traffic requires at least N pods and the number of Ready pods is always < N, because each pod crashes as soon as it receives traffic that is too much for it, what do you do?
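For reference, the probes look roughly like this (a minimal sketch; the paths, port, and timings are just illustrative placeholders):

```yaml
containers:
  - name: my-app
    image: my-app:latest
    readinessProbe:
      httpGet:
        path: /healthz/ready
        port: 8080
      initialDelaySeconds: 10   # pods need 10-30s before they can serve
      periodSeconds: 5
      failureThreshold: 3
    livenessProbe:
      httpGet:
        path: /healthz/live
        port: 8080
      initialDelaySeconds: 30
      periodSeconds: 10
      failureThreshold: 3
```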
Besides asking all clients to set up a circuit breaker / exponential backoff on their side, is there a way to ask Kubernetes to stop sending traffic to your deployment until "enough" pods are Ready? ("Enough" being either a static number or a dynamic one depending on the ingress traffic.)
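To make the question concrete, what I'm imagining is something like a pod readiness gate whose extra condition an external controller would only flip to True once enough replicas pass their app-level check, so the Service wouldn't route to any pod before that. The condition name and that controller are hypothetical; I don't know if anything like this exists out of the box:

```yaml
# Sketch only: a readiness gate with a custom condition that a hypothetical
# external controller would set to True once N pods are actually ready.
apiVersion: v1
kind: Pod
metadata:
  name: my-app
spec:
  readinessGates:
    - conditionType: "example.com/enough-replicas"
  containers:
    - name: my-app
      image: my-app:latest
```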
Today our solution is to have a circuit breaker on our side: we manually stop the traffic until the workload is healthy enough, then manually turn the traffic back on. But I'm wondering if there's a better way to respond automatically to that situation when you can't prevent it from happening.
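For the record, the kind of circuit breaking I mean could also be expressed declaratively in a service mesh, e.g. with an Istio DestinationRule roughly like the one below (purely illustrative, not necessarily what we run; the host name and the limits are placeholders):

```yaml
apiVersion: networking.istio.io/v1beta1
kind: DestinationRule
metadata:
  name: my-app
spec:
  host: my-app.my-namespace.svc.cluster.local
  trafficPolicy:
    connectionPool:
      http:
        http1MaxPendingRequests: 100   # reject requests beyond this queue depth
        maxRequestsPerConnection: 10
    outlierDetection:
      consecutive5xxErrors: 5          # eject a pod after 5 consecutive errors
      interval: 10s
      baseEjectionTime: 30s
      maxEjectionPercent: 50           # never eject more than half of the pods
```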
Thanks