All cron jobs randomly firing a bunch of times in parallel?

Benjamin D.

March 22, 2024 20:35
Edited

Over the past month or two, I've noticed that at random (roughly once every week or two) ALL cron jobs on the server fire 8 to 16 (!?) times in parallel during the exact same second. This often brings the server to an unresponsive, DoS-like state because it uses all the resources, and I have to manually kill all those running cron job tasks, otherwise hosted websites no longer respond. The cron jobs all write a log of when they start and end. They always start once a minute, last 1 second and then end, very precisely, except that once in the last 10 days all the cron jobs on the server fired 10 times in parallel during the same second! This is crazy.

I don't understand why it's randomly doing this. Those are the same cron jobs we had before on the previous server and we never experienced that sort of thing.

Is there anything new under WHM 116 that could enable that weird and dangerous bug? Can somebody take a look at what could make the crontab entries run multiple times in parallel? Is this some sort of race condition bug in the OS and/or in WHM? I'm under AlmaLinux v8.9.0 STANDARD
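
Since each job already logs its start and end times, a burst like this can be picked out of the log automatically instead of by eye. Below is a minimal Python sketch of that idea. The log format (`YYYY-MM-DD HH:MM:SS <job> START|END`), the job name `sync_orders`, and the function name `find_bursts` are all assumptions made for illustration; the actual log layout on this server is not shown in the thread and the parsing would need adapting to it.

```python
from collections import Counter
from datetime import datetime


def find_bursts(lines, threshold=2):
    """Return {(timestamp, job): count} for jobs that logged START
    `threshold` or more times within the same second."""
    starts = Counter()
    for line in lines:
        parts = line.split()
        # Assumed (hypothetical) format: "YYYY-MM-DD HH:MM:SS <job> START|END"
        if len(parts) != 4 or parts[3] != "START":
            continue
        try:
            stamp = datetime.strptime(f"{parts[0]} {parts[1]}", "%Y-%m-%d %H:%M:%S")
        except ValueError:
            continue  # skip lines whose timestamp does not parse
        starts[(stamp, parts[2])] += 1
    return {key: count for key, count in starts.items() if count >= threshold}


if __name__ == "__main__":
    # Made-up sample data: one normal run, then three starts in one second.
    sample = [
        "2024-04-04 15:20:00 sync_orders START",
        "2024-04-04 15:20:01 sync_orders END",
        "2024-04-04 15:21:00 sync_orders START",
        "2024-04-04 15:21:00 sync_orders START",
        "2024-04-04 15:21:00 sync_orders START",
    ]
    for (stamp, job), count in sorted(find_bursts(sample).items()):
        print(f"{stamp} {job} started {count} times")
```

Run against the real log, this would give an exact list of the dates and times at which a job started more than once in the same second, which can then be compared with other events on the server.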

Comments

16 comments

Benjamin D.

March 22, 2024 20:28
0
cPRex Jurassic Moderator

March 22, 2024 21:49
Hey there! I can't say I've heard of such a thing, and I also don't have any other reports of similar behavior that I could find.
0
Benjamin D.

March 22, 2024 21:56

Edited
It almost feels like some sort of race condition triggers all CPU cores to fire the cron jobs in parallel at the exact same time or something, seemingly at random. I can list all the dates/times where this occurred, and I've seen it happen live, so I know this is not just something written in the logs. It truly brings the server to its knees and I very much have to manually kill all the processes when this happens.
0
Benjamin D.

March 23, 2024 13:01

Edited
OH BOY. WTH IS THIS? Is this normal? And how did this happen?! EDIT: I read online that crond forks itself once for each scheduled task it has to run. Is this still the case in AlmaLinux 8.9? Am I panicking for nothing here or is this abnormal? I ran the command again after some time and now there are only 2 crond processes, so I think it's normal that the number of crond processes varies throughout the day, depending on scheduled cron jobs. https://support.cpanel.net/hc/en-us/community/posts/19634192633879-Multiple-crond-instances
0
cPRex Jurassic Moderator

March 25, 2024 15:53
I would expect the number of processes to vary throughout the day - I wouldn't expect them to be firing the crons multiple times.
0
Benjamin D.

March 25, 2024 16:22

Edited
Yeah, me neither, and I never saw that happen under CentOS 7 last year, before I upgraded to a new server under AlmaLinux 8 and WHM 116.
0
Benjamin D.

April 04, 2024 15:23

Edited
It just happened again. 133 cron jobs (the same 8 or 9 crons multiplied by a bunch of parallel instances) of all kinds across the entire server all fired up at the same time, using most of Apache's slots, since they're basically just calling an HTTP request each, denying normal web browser traffic for a few minutes until I managed to kill the processes. I almost couldn't sign in to WHM to do so. I'm now wondering whether this could be a bug triggered when WHM auto-updates itself.

For future reference, this happened under WHM 118.0.4, let's see if the next time it happens is under a newer version or not.
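
A common general-purpose mitigation for overlapping cron runs, not something tested in this thread, is to have each job take a non-blocking exclusive lock on a file and exit immediately if another instance already holds it. The Python sketch below illustrates that with `fcntl.flock`. The lock path, `try_lock`, and `run_job` are hypothetical names; `run_job` stands in for whatever the real cron entry does (here, the HTTP request). It would not explain why the duplicates are launched, only keep them from piling up on Apache.

```python
import fcntl
import os
import sys
import tempfile

# Hypothetical lock location; any path writable by the cron user would do.
LOCK_PATH = os.path.join(tempfile.gettempdir(), "example_cron_job.lock")


def try_lock(path):
    """Return an open file holding an exclusive lock on `path`,
    or None if another open handle already holds the lock."""
    handle = open(path, "a")
    try:
        # LOCK_NB makes flock fail immediately instead of waiting.
        fcntl.flock(handle, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except BlockingIOError:
        handle.close()
        return None
    return handle


def run_job():
    """Placeholder for the real work of the cron job."""
    print("job running")


if __name__ == "__main__":
    lock = try_lock(LOCK_PATH)
    if lock is None:
        # Another instance is already running: skip this run quietly.
        sys.exit(0)
    try:
        run_job()
    finally:
        lock.close()  # closing the file releases the lock
```

With a guard like this, 10 simultaneous launches of the same job would result in one run and nine immediate exits. The lock is released automatically when the process ends, so a killed job does not leave a stale lock behind.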
0
Benjamin D.

April 22, 2024 16:49
Will you look at that? It just happened again and now WHM is under 118.0.6.
0
Benjamin D.

April 29, 2024 14:01

Edited
It happened again and now WHM is under 118.0.8. It's definitely tied to WHM self-updating, but I'm not quite sure how it's even related to it. All I know is that it seems quite clear that it happens once after the WHM version changes. Could it be tied to date/time sync or adjustments done after WHM updates? I'm really desperate to find a fix for this as it drives customers away from my server because it's "unstable". If I'm not sitting in front of WHM to catch it when it happens, then the server basically does a DoS on itself.
0
cPRex Jurassic Moderator

April 29, 2024 13:57
And it will likely keep happening as this isn't related to cPanel.
0
Benjamin D.

April 30, 2024 12:10

Edited
I'm trying to understand why it's suddenly been doing that starting immediately after getting this new server earlier this winter. Those crons have been doing fine for YEARS on the old server, before we were forced to move to AlmaLinux and if you are 100% sure there's nothing in WHM that could cause this, then perhaps it comes from AlmaLinux itself? It's just so weird that it always happens once after WHM self updates and then it's fine for a week or two until WHM updates again. The cron runs every hour (or minute, depending on the cron) and there are a lot of hours/minutes in between WHM updates where things could go wrong and they don't. It's so weird and almost predictable (but not fully predictable as it does not seem to happen *immediately* after a WHM update but it will happen once some time after a WHM update).

Here are the only differences I can think of between YEARS of it working perfectly fine (zero issue) and now:
- AlmaLinux 8 instead of CentOS 7
- PHP-FPM running 8.2 instead of suphp 8.2
- Newer WHM version
So, if it doesn't come from WHM, then it leaves only 2 possibilities that I see that differ from the old server:

1) The AlmaLinux OS, which I highly doubt the issue comes from since cron is a very basic thing in an OS and there would have been thousands of AlmaLinux users complaining and the bug would have been fixed a long time ago...

or 2) PHP-FPM... Now, I know PHP-FPM does something weird whenever you start or restart it. It seemingly pre-executes the website index file in order to (allegedly, not sure) compile/cache the HTML page that comes out of it. I don't recall any other PHP handler doing this, but to PHP-FPM's credit, it's faster than any other PHP handler, so it has got to cache responses or at least parts of its PHP code results in some way. I've seen PHP-FPM "reload" all the websites whenever I tweak ANYTHING under MultiPHP settings in WHM (e.g. the "Max Requests" value). Most of them are almost instantaneous to "reload", but some of them take 3-4 seconds. Anyway, I'm now wondering if PHP-FPM could cause all the crons to fire up at the same time to "reload" (cache) the PHP scripts that are tied to them. But if it were the case, the thing is that specifically for today, nobody changed anything in MultiPHP settings at all, so unless WHM reloads the PHP-FPM pool or service, then I don't see how this could occur automatically and why it seemingly only does it once after any WHM self update.

For future reference: I disabled PHP-FPM on all accounts that had cron jobs and we're now waiting for the next WHM self-update to see if it resolves the issue or not.
0
Benjamin D.

May 14, 2024 13:01
Alright, I just caught WHM updating to 120.0.5 and so far, it's stable and crons have not fired up in parallel. This makes me believe the whole thing was caused by PHP-FPM "reloading" all the crons after each WHM update. There was still a good 10 seconds of unresponsiveness across the whole server immediately following the WHM update though. This is why I hate WHM updates in the middle of a weekday. Seriously, why not update at 3:00 AM on a Sunday or something like that instead of during work hours on a weekday? Is there a way to force it to never update during working hours?
0
cPRex Jurassic Moderator

May 15, 2024 03:40
There sure is - https://support.cpanel.net/hc/en-us/articles/360053314013-How-can-I-change-when-cPanel-updates-run
0
Benjamin D.

May 15, 2024 14:49

Edited
Ah, yes. I made it so it only updates on a weekend day! Thanks cPRex. I'm also pleased to report that nothing unusual occurred over the last 30-ish hours, so I'm quite confident that this whole thing was caused by PHP-FPM re-launching duplicates of cron jobs that were tied to PHP scripts after every WHM update. This is a serious bug that should be addressed by PHP-FPM developers, but for the time being, like I mentioned above, I disabled PHP-FPM on all the subdomains that run cron jobs and this seemingly fixed the issue!
0
cPRex Jurassic Moderator

May 15, 2024 14:48
Sure thing!
0
Benjamin D.

June 06, 2024 12:07
I'm pleased to report that WHM updated once again, to 120.0.9, and there was no surge in cron jobs and no DoS-like state since PHP-FPM was disabled on all accounts that have cron jobs.

We can close this issue.
0
