One calm April day, our internal ops application started returning 504s. An HTTP 504 is a server-side gateway timeout, and it can be hard to diagnose. In practice, though, it usually comes down to the load balancers sitting in front of your application and how they communicate with the servers behind them.
Ultimately, the fix came in the form of ensuring that the application's keep-alive timeout is the same as or greater than the idle timeout of the load balancers sitting in front. If your application closes an idle connection that the load balancer still considers open, the balancer can end up returning a 504 to the client. In our case, this took modifying an option in the Hapi module in Node.
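To make the idea concrete, here is a minimal sketch using Node's built-in `http` server rather than our actual Hapi configuration. The 60 second load balancer idle timeout (the Classic ELB default), the 5 second margin, and the helper name `outliveLoadBalancer` are assumptions for illustration, so substitute your own values. Node's default `keepAliveTimeout` is 5 seconds, which is well below a typical load balancer idle timeout.

```javascript
const http = require('http');

// Assumption: the load balancer's idle timeout. 60 s is the Classic ELB
// default; check what yours is actually set to.
const LB_IDLE_TIMEOUT_MS = 60 * 1000;

// Hypothetical helper (not part of Node or Hapi): make the server hold idle
// keep-alive connections open longer than the load balancer does.
function outliveLoadBalancer(server, lbIdleTimeoutMs, marginMs = 5000) {
  // How long Node keeps an idle keep-alive socket open before closing it.
  server.keepAliveTimeout = lbIdleTimeoutMs + marginMs;
  // Kept slightly above keepAliveTimeout as a precaution, so the headers
  // timer is never the shorter of the two.
  server.headersTimeout = server.keepAliveTimeout + 1000;
  return server;
}

const server = outliveLoadBalancer(
  http.createServer((req, res) => res.end('ok')),
  LB_IDLE_TIMEOUT_MS
);

// server.listen(3000); // left commented out so the sketch exits immediately
```

With Hapi, the underlying Node server is available as `server.listener`, so the same helper could be applied to that object. Treat this as one possible approach, not necessarily the exact option we changed.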
Some resources that help explain this:
- https://docs.aws.amazon.com/elasticloadbalancing/latest/classic/ts-elb-error-message.html#ts-elb-errorcodes-http504
- https://aws.amazon.com/premiumsupport/knowledge-center/504-error-classic/
- https://support.cloudflare.com/hc/en-us/articles/218378978-What-should-I-do-after-seeing-a-502-or-504-gateway-error-on-my-site-