My app is consuming a third party api. One of the requirements of this api is that my app cannot send more than 20 requests per second. Because of the nature of this app, and because my user base is growing, we are now hitting that limit very frequently.
(One thing to note about my app: it consists of 4 separate background processes, running independently, and each of those 4 processes hit the third party api at different times, based on a number of conditions)
I've come up with 2 possible solutions to work around the rate limit, but both solutions seem flawed:
Implement a global cache (possibly Redis) that tracks all outgoing requests to the third-party API. Every time one of my processes attempts a request, it first checks the cache. If there have been fewer than 20 requests in the past second, it proceeds; if not, it waits a specified time and checks again.
Implement a global cache, a queue, and a fifth process dedicated to handling web requests to this third-party API. Each of my 4 processes places requests in the queue (instead of submitting them directly). The fifth process checks the queue, checks the condition (fewer than 20 requests in the past second), makes the web request, and puts the result back in the queue, handling requests ONE at a time. Meanwhile, the process that placed the original request polls the queue for the response. Once the response is there, it grabs it (removing the item from the queue) and goes on its merry way.
My problem with #1 is accuracy. It's conceivable that all 4 processes check the cache simultaneously, and the current count is 19. All 4 processes get the green light and submit their requests simultaneously, bringing the count up to 23, and then my app gets locked out for going over the limit.
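For what it's worth, the check in #1 amounts to a sliding-window limiter. Here's a minimal in-memory sketch (the class and `try_acquire` are names I made up, standing in for the shared Redis cache). Note that in this single-process version the check and the record happen together, but against a shared cache they would be separate round trips, which is exactly the race window I'm describing:

```python
import time
from collections import deque

class SlidingWindowLimiter:
    """Allows a request only if fewer than `limit` requests
    were recorded in the past `window` seconds."""

    def __init__(self, limit=20, window=1.0):
        self.limit = limit
        self.window = window
        self.timestamps = deque()

    def try_acquire(self, now=None):
        now = time.monotonic() if now is None else now
        # Drop timestamps that have aged out of the window.
        while self.timestamps and now - self.timestamps[0] >= self.window:
            self.timestamps.popleft()
        if len(self.timestamps) < self.limit:
            self.timestamps.append(now)
            return True
        return False
```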
My problem with #2 is complexity. I feel that the accuracy would hold up, since that fifth process ensures that all requests are handled one at a time, so there is no chance of breaking the limit because of race conditions. However, it just seems fragile, and possibly overkill. I'm introducing a lot of moving parts, which means (in my experience) that a lot could go wrong, and it could be difficult to track down errors.
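To make #2 concrete, here's a rough sketch of what I mean by the fifth process, using threads and in-memory queues as stand-ins for separate processes and a shared queue (and a per-request reply queue instead of polling one shared queue; `fake_fetch` and the other names are made up):

```python
import queue
import threading
import time

def fake_fetch(url):
    # Stand-in for the real call to the third-party API.
    return f"response for {url}"

def rate_limited_worker(request_q, limit=20, window=1.0):
    """Dedicated 'fifth process': pulls requests off the queue one at
    a time, so the rate check can never race with another sender."""
    sent = []  # timestamps of recent outgoing requests
    while True:
        item = request_q.get()
        if item is None:  # sentinel: shut down
            break
        url, reply_q = item
        now = time.monotonic()
        sent = [t for t in sent if now - t < window]
        if len(sent) >= limit:
            # Sleep until the oldest request ages out of the window.
            time.sleep(window - (now - sent[0]))
        sent.append(time.monotonic())
        reply_q.put(fake_fetch(url))

# One of the 4 original processes would do something like:
def submit(request_q, url):
    reply_q = queue.Queue()
    request_q.put((url, reply_q))
    return reply_q.get(timeout=5)  # block until the worker responds
```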
Are there any other solutions to this problem? Am I overthinking it? Would #1 or #2 work just fine?
If the service rejects only the requests that are over the limit, rather than locking out your whole account, then you could implement an algorithm similar to TCP's congestion control.
Simply put, each client starts sending requests as fast as it can. When its requests start getting rejected, it slows down just enough that rejections become rare. Then it tries to increase the rate again at random intervals, probing whether a higher rate is now possible.
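A minimal sketch of that additive-increase/multiplicative-decrease idea (the function names and the specific constants below are mine, not anything standard):

```python
import random
import time

def adjust_rate(rate, accepted, probe=False, min_rate=0.5, step=0.5):
    """One AIMD update: additively increase the rate when a probe is
    accepted, multiplicatively decrease when a request is rejected."""
    if accepted:
        return rate + step if probe else rate
    return max(min_rate, rate / 2.0)

def run_client(send_request, duration=10.0):
    """Pacing loop: start slow, back off on rejections, and probe for
    more headroom at random intervals. `send_request` is a hypothetical
    callable returning True on success and False when the third-party
    API rejects the call for exceeding the limit."""
    rate = 1.0  # requests per second
    deadline = time.monotonic() + duration
    while time.monotonic() < deadline:
        probe = random.random() < 0.1  # occasionally try a higher rate
        rate = adjust_rate(rate, send_request(), probe=probe)
        time.sleep(1.0 / rate)
```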
Of course this won't let you make more requests overall, but it does let each client get its fair share of the pie.
Also, if you expect different clients to make similar requests, then some form of global caching of responses will definitely improve the situation.