After a PHP script times out, all subsequent requests to the subdomain fail with a 504 and the log shows 404?
Hi, it took a few months to pinpoint what was going on and I'm not even sure I fully understand what is happening but it's really annoying and that issue is new-ish... It didn't do that a couple updates back, like 3-4 months ago. I will try to explain it in my own words. It's probably not exactly right, sorry in advance.
Imagine one day, on a given minute there's a high load on my server. During that time, on subdomain bbb.aaa.com, if I request a very hard to compute PHP script that unfortunately times out before the first byte of DATA is sent back to the web browser, then that web browser will fail with a 504 Gateway Timeout response... and this is expected. But (and this is the part that I need help with) my WHM server will seemingly then rupture some kind of bridge between the subdomain bbb.aaa.com worker (or PHP pool? or something else?) and the Internet.
From this point on, all the web browsers that request anything on the bbb.aaa.com subdomain will just time out with a 504 even if the server load went down and it should not time out at all... And no matter how many times I refresh my web browser tab on bbb.aaa.com or even request other PHP scripts from bbb.aaa.com, it will just (wrongly) time out with a 504 Gateway Timeout... even if I disable cache in developer tools, SHIFT-CTRL-R, and so on. I tried accessing bbb.aaa.com's index page (a simple HTML page with no computation) from my cell phone over LTE as well and it only 504'ed on me.
No, I am not blacklisted in cPHulk or in the firewall. At this point, I can still request anything over other subdomains from the same computer, same web browser and I can even load anything on aaa.com (the timing out subdomain's parent domain) and it loads just fine... But requesting anything on bbb.aaa.com only fails after 15 seconds with a 504 error.
During the 15 seconds before it slaps a 504 error in my developer tools console, if I quickly go and list Apache status in WHM, I can see my browser on a line with an Apache worker in "W" state for the requested script. Immediately when my browser tab times out with a 504, the Apache status switches the worker to a "K" state and a few seconds later, it closes the connection and switches it to "_" (waiting for connection). So it's not like it doesn't connect to Apache. It's not even that it doesn't process anything. As the "W" state shows, it's trying to push DATA to the web browser, but something in the middle is totally blocking it and so both parties just time out and the connection is closed without the DATA seemingly having reached the web browser.
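For anyone who wants to capture those worker states without racing the 15-second window by hand, here is a minimal sketch. It assumes the machine-readable mod_status output (the `?auto` variant, which includes a `Scoreboard:` line) has been saved as text; how that text is fetched on a given WHM server is left out, and the `sample` string is made up. One caveat, as far as I understand mod_status: "W" means "Sending Reply" and covers the whole time the request is being handled, including waiting on the PHP backend, so a worker in "W" is not proof that bytes are already going out to the browser.

```python
from collections import Counter

# Scoreboard key as documented by Apache mod_status.
SCOREBOARD_KEY = {
    "_": "waiting for connection",
    "S": "starting up",
    "R": "reading request",
    "W": "sending reply",
    "K": "keepalive (read)",
    "D": "DNS lookup",
    "C": "closing connection",
    "L": "logging",
    "G": "gracefully finishing",
    "I": "idle cleanup of worker",
    ".": "open slot with no current process",
}


def count_worker_states(status_text):
    """Count worker states from the 'Scoreboard:' line of a ?auto status page."""
    for line in status_text.splitlines():
        if line.startswith("Scoreboard:"):
            board = line.split(":", 1)[1].strip()
            return Counter(SCOREBOARD_KEY.get(ch, "unknown") for ch in board)
    # No scoreboard line found: return an empty count rather than failing.
    return Counter()


# Made-up sample text, only to show the expected shape of the input.
sample = "BusyWorkers: 3\nIdleWorkers: 2\nScoreboard: W_K_W...\n"
counts = count_worker_states(sample)
for state, n in sorted(counts.items()):
    print(f"{state}: {n}")
```

Saving a snapshot like this every few seconds during an incident would show whether workers for bbb.aaa.com pile up in "W" or cycle through "K" and "_" as described above.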
Two things fix the issue: 1) Wait it out (not exactly sure but maybe like 15 minutes) or 2) Restart Apache. Restarting Apache instantly fixes everything.
Apache domlogs weirdness: When I take a look at the logs after the fact, I can see many visitor IP addresses (including mine and the one from my cellphone's LTE) requesting all kinds of good locations on bbb.aaa.com during the time it was (wrongly) timing out and all the log entries show that it returned a 404 error (NOT FOUND) which is completely wrong. After restarting Apache, I can request any of those 404 locations and they work.
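To pin down exactly when the wrong 404s start and stop, something like the following could be run over the domlog. It is a rough sketch that assumes the log uses Apache's common/combined format; the sample lines, IP addresses and paths are invented for illustration and are not taken from my server.

```python
import re
from collections import Counter
from datetime import datetime

# Matches the start of an Apache common/combined log line:
# host ident user [time] "request" status ...
LOG_RE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) '
)


def count_status_per_minute(lines, status="404"):
    """Count log entries with the given status, grouped by minute."""
    per_minute = Counter()
    for line in lines:
        m = LOG_RE.match(line)
        if not m or m.group("status") != status:
            continue  # skip unparsable lines and other status codes
        ts = datetime.strptime(m.group("time"), "%d/%b/%Y:%H:%M:%S %z")
        per_minute[ts.strftime("%Y-%m-%d %H:%M")] += 1
    return per_minute


# Invented sample lines (documentation IP range), not real log data.
sample_lines = [
    '203.0.113.5 - - [10/Oct/2023:13:55:36 +0000] "GET /index.html HTTP/1.1" 404 315 "-" "Mozilla/5.0"',
    '203.0.113.9 - - [10/Oct/2023:13:55:50 +0000] "GET /page.php HTTP/1.1" 404 315 "-" "Mozilla/5.0"',
    '203.0.113.5 - - [10/Oct/2023:13:57:02 +0000] "GET /index.html HTTP/1.1" 200 1024 "-" "Mozilla/5.0"',
]
result = count_status_per_minute(sample_lines)
for minute, n in sorted(result.items()):
    print(minute, n)
```

A sudden block of minutes with many 404s on paths that normally return 200 would mark the start and end of each incident, which can then be compared against other events on the server.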
Edit: Come to think of it, the script initially might not even have timed out from server load. It wasn't even that high to begin with. I think that I just happened to request from a subdomain that just had this weird rupture issue occurring already.
Have you ever seen something like this where after 1 timed out script, Apache then marks all the requests made to that subdomain's files as 404 not found and leaves all the web browsers timing out with a 504?
I also want to add another detail that may or may not be related, but I think it is: Sometimes (not always) when this happens, the logs show that for some of the visiting IP addresses, the (wrongly) 404 requests are made to no domain at all. They appear in the main Apache access_log and all their requests are serving them the default cPanel page with its resources such as /img-sys/powered_by_cpanel.svg and /img-sys/server_misconfigured.png. It usually does not last very long, like perhaps a couple minutes and then those log entries stop and operations seemingly resume to normal. I've talked to some of those users and when these log entries occur, they complain that for a couple minutes, their website was inaccessible or would time out.
Hey hey! I don't have a good explanation for this one, as I can't say I've seen anything like that before. If you have a way to reproduce this, it would be a good candidate for a ticket!
I can't reproduce it on demand. It's occurring randomly. I am wondering if it could be related to restarting the firewall which I do multiple times a day after adding IP addresses to block... or maybe when the PHP-FPM worker reaches X amount of requests served (as per the setting in MultiPHP Manager) then during the time it spawns a new worker, this would happen? I'm 99% sure it has nothing to do with any of those things, but I really don't have any way to reproduce this issue on demand. All I know is that it happens almost every day.
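One way to test the firewall-restart idea without being able to reproduce the issue on demand would be to note when each incident starts and when each restart happens, then check whether they line up. Below is a minimal sketch of that comparison; the function name, the 120-second window and all timestamps are hypothetical and only there to illustrate the idea.

```python
from datetime import datetime, timedelta


def failures_near_events(failures, events, window_seconds=120):
    """Return (failure, event) pairs where the failure began within
    window_seconds after the event (for example, a firewall restart)."""
    window = timedelta(seconds=window_seconds)
    pairs = []
    for f in failures:
        for e in events:
            if timedelta(0) <= f - e <= window:
                pairs.append((f, e))
    return pairs


# Hypothetical timestamps, not real data from the server.
restarts = [datetime(2024, 1, 1, 10, 0, 0), datetime(2024, 1, 1, 14, 30, 0)]
failures = [datetime(2024, 1, 1, 10, 0, 45), datetime(2024, 1, 1, 12, 15, 0)]

matches = failures_near_events(failures, restarts)
for failure, restart in matches:
    print(f"504s began at {failure} after restart at {restart}")
```

If almost no incidents fall inside the window after several days of notes, the firewall restart can probably be ruled out; the same comparison would work with PHP-FPM worker respawn times instead of restart times.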