On and off, requests take very long on my system

June 30, 2020

I have an issue in my AWS system. Roughly one request in every 5 to 25 takes almost exactly 1 minute and 30 seconds to answer. Normally, if I cancel the slow request and send it again, it answers fast. I also noticed this happens with ANY request, not only specific ones. The servers and back end do not look overloaded. The system is as follows:

ALB with sticky sessions | 2 Web servers | DB on RDS

When I use curl, the system responds fine most of the time, but when a request is slow, this is the timing output:

    time_namelookup:  0.004136
       time_connect:  130.117558
    time_appconnect:  130.125254
   time_pretransfer:  130.125340
      time_redirect:  0.000000
 time_starttransfer:  130.172553
         time_total:  130.172615

Aside from time_connect, the request is fine in the sense that the page loads right after the connection is made. The normal response time of the system is under 0.5 seconds.
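To make it easier to see where the time goes, here is a small sketch that turns the curl figures above into per-phase durations. It assumes the values are cumulative times since the start of the request (which is how curl reports them, except time_redirect), and the phase names such as `tcp_connect` and `server_wait` are just labels I made up for illustration.

```python
# Cumulative timings copied from the slow curl run above (seconds).
CURL_TIMINGS = """
time_namelookup:  0.004136
time_connect:  130.117558
time_appconnect:  130.125254
time_pretransfer:  130.125340
time_redirect:  0.000000
time_starttransfer:  130.172553
time_total:  130.172615
"""


def parse_timings(text):
    """Parse 'name: value' lines into a dict of floats."""
    timings = {}
    for line in text.strip().splitlines():
        name, _, value = line.partition(":")
        timings[name.strip()] = float(value)
    return timings


def phase_durations(t):
    """Subtract neighbouring cumulative values to get the time spent in each phase.

    time_redirect is left out because it is a duration, not a cumulative mark.
    The phase names are illustrative labels, not curl terminology.
    """
    return {
        "dns_lookup": t["time_namelookup"],
        "tcp_connect": t["time_connect"] - t["time_namelookup"],
        "tls_handshake": t["time_appconnect"] - t["time_connect"],
        "pre_transfer": t["time_pretransfer"] - t["time_appconnect"],
        "server_wait": t["time_starttransfer"] - t["time_pretransfer"],
        "transfer": t["time_total"] - t["time_starttransfer"],
    }


phases = phase_durations(parse_timings(CURL_TIMINGS))
for name, seconds in phases.items():
    print(f"{name:>14}: {seconds:10.6f} s")
```

Broken down this way, the TCP connect phase accounts for about 130.11 of the 130.17 seconds, and everything else combined (DNS, TLS, server processing, transfer) takes roughly 0.06 seconds.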

I was reading about this, and the docs describe time_connect as follows:

"time_connect is the TCP three-way handshake from the client’s perspective. It ends just after the client sends the ACK - it doesn't include the time taken for that ACK to reach the server. It should be close to the round-trip time (RTT) to the server. In this example, RTT looks to be about 200 ms."

This was taken from here.
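Since time_connect only covers the TCP handshake, one thing that could be tried is timing bare TCP connections without any HTTP involved. The following is a minimal sketch, not a definitive diagnostic: it assumes the load balancer's DNS name may resolve to more than one IPv4 address and that the listener is on port 443 (a guess based on time_appconnect being non-zero). The function names and the host name in the comment are hypothetical.

```python
import socket
import time


def resolve_ipv4(host, port):
    """Return the distinct IPv4 addresses that DNS currently gives for host."""
    infos = socket.getaddrinfo(host, port, socket.AF_INET, socket.SOCK_STREAM)
    return sorted({info[4][0] for info in infos})


def tcp_connect_seconds(ip, port, timeout=150.0):
    """Time a bare TCP handshake to ip:port.

    Returns the elapsed seconds, or None if the connection fails or times out.
    The default timeout is longer than the slow requests seen so far, so a
    slow connect is measured rather than cut short.
    """
    start = time.perf_counter()
    try:
        with socket.create_connection((ip, port), timeout=timeout):
            return time.perf_counter() - start
    except OSError:
        return None


def probe(host, port=443, rounds=3, timeout=150.0):
    """Connect to every resolved address several times and collect the timings."""
    results = {}
    for ip in resolve_ipv4(host, port):
        results[ip] = [tcp_connect_seconds(ip, port, timeout) for _ in range(rounds)]
    return results


# Example call (hypothetical host name, replace with the real load balancer name):
#   for ip, samples in probe("my-alb.example.com").items():
#       print(ip, samples)
```

If the slow connects always show up against the same address, that would at least narrow down where to look next. If they are spread evenly, the cause is probably somewhere else.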

I cannot find anything meaningful in AWS CloudWatch, the app logs, or the DB monitoring. Any ideas about what I should look into or how to troubleshoot this issue?

