I am not sure I understand the scenario.
Do you mean what happens when you have a huge load of requests hitting your inference server and they start to queue up?
If you can afford to work in an async pattern (i.e. you consume requests from a queue and return results to a queue), then the queue itself is your back-pressure mechanism: the inference consumer pulls work from the queue at its own pace.
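As a minimal sketch of that idea (standard-library Python only, with `run_inference` as a placeholder for your actual model call): a bounded queue makes producers block or fail fast once the consumer falls behind, which is exactly the back-pressure you want.

```python
import queue
import threading

# Bounded queue: once it's full, producers block (or fail fast),
# which propagates back-pressure upstream instead of piling up work.
requests_q = queue.Queue(maxsize=100)
results_q = queue.Queue()

def run_inference(payload):
    # Placeholder for the real model call.
    return {"input": payload, "score": 0.5}

def consumer():
    while True:
        item = requests_q.get()            # blocks until work is available
        results_q.put(run_inference(item))
        requests_q.task_done()

threading.Thread(target=consumer, daemon=True).start()

# Producer side: wait up to 1s for room, then give up instead of queueing forever.
try:
    requests_q.put({"text": "hello"}, timeout=1.0)
except queue.Full:
    print("queue full -- apply back-pressure upstream")
```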
This setup also lets you batch requests, which gives much higher inference throughput from neural networks on both CPU and GPU.
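To illustrate the batching point, here's a rough sketch (not any particular framework's API) of a consumer that drains up to `MAX_BATCH` items, or waits a short timeout for the batch to fill, then runs a single batched forward pass; `model_batch_predict` is a stand-in for your network.

```python
import queue
import time

MAX_BATCH = 32
BATCH_TIMEOUT = 0.01  # seconds to wait for the batch to fill up

requests_q = queue.Queue(maxsize=1000)

def model_batch_predict(batch):
    # Stand-in for a real batched forward pass
    # (e.g. one tensor of shape [len(batch), ...] through the model).
    return [len(str(item)) for item in batch]

def batching_consumer():
    while True:
        batch = [requests_q.get()]               # block for the first item
        deadline = time.monotonic() + BATCH_TIMEOUT
        while len(batch) < MAX_BATCH:
            remaining = deadline - time.monotonic()
            if remaining <= 0:
                break
            try:
                batch.append(requests_q.get(timeout=remaining))
            except queue.Empty:
                break
        predictions = model_batch_predict(batch)  # one pass instead of len(batch) passes
        # ...route each prediction back to its caller / results queue here
```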
If your integration needs to be synchronous (request/response), then most servers can build up a backlog of requests (some handle it better than others), but all of them end up dropping requests if they are truly overwhelmed by incoming traffic.
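In that case the usual mitigation is to cap the in-flight work explicitly and shed load early rather than let requests time out silently. A minimal sketch (assuming a generic request handler, not any specific server), reusing the `run_inference` placeholder from above:

```python
import threading

MAX_IN_FLIGHT = 64
in_flight = threading.BoundedSemaphore(MAX_IN_FLIGHT)

def handle_request(payload):
    # Fail fast with 503 when saturated, so clients can back off and retry
    # instead of piling onto an unbounded backlog.
    if not in_flight.acquire(blocking=False):
        return 503, {"error": "server overloaded, retry later"}
    try:
        return 200, run_inference(payload)   # run_inference as sketched above
    finally:
        in_flight.release()
```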
Hope this helps.