tailwatchd down across multiple servers
so i upgraded to cpanel 11.60 on a bunch of servers and i keep getting messages that tailwatchd is down
the process seems to be running though
any ideas what's up?
[root@msrv04 ~]# cat /usr/local/cpanel/version
11.60.0.15
[root@msrv04 ~]# tail -f /usr/local/cpanel/logs/tailwatchd_log
[5905] [2016-11-04 02:39:42 +0100] [Cpanel::TailWatch::Eximstats] Loading email sending limits from 1478221200 - 1478224800
[5905] [2016-11-04 03:00:00 +0100] [Cpanel::TailWatch::Eximstats] Resetting email limits to new starttime of 1478224800
[5905] [2016-11-04 04:00:00 +0100] [Cpanel::TailWatch::Eximstats] Resetting email limits to new starttime of 1478228400
[5905] [2016-11-04 05:00:00 +0100] [Cpanel::TailWatch::Eximstats] Resetting email limits to new starttime of 1478232000
[5905] [2016-11-04 06:00:00 +0100] [Cpanel::TailWatch::Eximstats] Resetting email limits to new starttime of 1478235600
[5905] [2016-11-04 07:00:00 +0100] [Cpanel::TailWatch::Eximstats] Resetting email limits to new starttime of 1478239200
[5905] [2016-11-04 08:00:00 +0100] [Cpanel::TailWatch::Eximstats] Resetting email limits to new starttime of 1478242800
[5905] [2016-11-04 09:00:00 +0100] [Cpanel::TailWatch::Eximstats] Resetting email limits to new starttime of 1478246400
[5905] [2016-11-04 10:00:00 +0100] [Cpanel::TailWatch::Eximstats] Resetting email limits to new starttime of 1478250000
[5905] [2016-11-04 11:00:00 +0100] [Cpanel::TailWatch::Eximstats] Resetting email limits to new starttime of 1478253600
[root@msrv04 ~]# /scripts/restartsrv_tailwatchd --status
(XID pvhjzz) The "tailwatchd" service is down.
the process seems to be running though
[root@msrv04 ~]# ps aux |grep tail
root 5905 0.0 0.4 106172 15868 ? S Nov01 0:06 tailwatchd
[root@msrv04 ~]# systemctl status tailwatchd.service
● tailwatchd.service - tailwatchd
Loaded: loaded (/etc/systemd/system/tailwatchd.service; disabled; vendor preset: disabled)
Active: failed (Result: exit-code) since Fri 2016-11-04 11:05:31 CET; 10s ago
Process: 32236 ExecStart=/scripts/restartsrv_tailwatchd --no-verbose (code=exited, status=1/FAILURE)
Main PID: 16200 (code=exited, status=0/SUCCESS)
Nov 04 11:05:31 msrv04.example.nl systemd[1]: Starting tailwatchd...
Nov 04 11:05:31 msrv04.example.nl restartsrv_tailwatchd[32236]: tailwatchd is already running (tailwatchd) with PID 5905 by root
Nov 04 11:05:31 msrv04.example.nl systemd[1]: tailwatchd.service: control process exited, code=exited status=1
Nov 04 11:05:31 msrv04.example.nl systemd[1]: Failed to start tailwatchd.
Nov 04 11:05:31 msrv04.example.nl systemd[1]: Unit tailwatchd.service entered failed state.
Nov 04 11:05:31 msrv04.example.nl systemd[1]: tailwatchd.service failed.
[root@msrv04 ~]# /scripts/restartsrv_chkservd
Waiting for "tailwatchd" to start ""Job for tailwatchd.service failed because the control process exited with error code. See "systemctl status tailwatchd.service" and "journalctl -xe" for details.
"failed.
Service Error
(XID ttdda4) The "tailwatchd" service failed to start.
Startup Log
Nov 04 11:05:31 msrv04.storetech.nl systemd[1]: Starting tailwatchd...
Nov 04 11:05:31 msrv04.storetech.nl restartsrv_tailwatchd[32236]: tailwatchd is already running (tailwatchd) with PID 5905 by root
Nov 04 11:05:31 msrv04.storetech.nl systemd[1]: tailwatchd.service: control process exited, code=exited status=1
Nov 04 11:05:31 msrv04.storetech.nl systemd[1]: Failed to start tailwatchd.
Nov 04 11:05:31 msrv04.storetech.nl systemd[1]: Unit tailwatchd.service entered failed state.
Nov 04 11:05:31 msrv04.storetech.nl systemd[1]: tailwatchd.service failed.
tailwatchd has failed. Contact your system administrator if the service does not automagically recover.
The docs may be of some use: TailWatch - cPanel Knowledge Base - cPanel Documentation
See this post as well: Resetting email limits to new starttime
err. i fail to see how this helps. my servers are sending e-mail alerts that tailwatchd is down, but the service clearly isn't. the service tries to get restarted, it errors out, and i get an e-mail about it. i am not getting e-mails about the exim thing, since the thread you linked shows that's just informational. i am, however, getting e-mails about a service that isn't down. am i understanding this wrong? i don't think so. i think it's more related to these than to what you linked:
tailwatchd_log Problem with recent_authed_mail_ips
Implemented case CPANEL-7723: Prevent multiple copies of tailwatchd from being started.
Fixed case CPANEL-6436: Prevent tailwatchd from starting multiple processes.
i just manually killed the process and ran /scripts/restartsrv_chkservd. everything seems ok after doing this. pretty weird to have this happening though. i just looked closely and on another server i had 2 instances of tailwatchd running:
root@jaws [~]# ps aux |grep tail
root 411 0.0 0.8 402604 263800 ? S Oct12 11:20 tailwatchd
root 5242 0.0 0.0 112652 972 pts/0 S+ 06:38 0:00 grep --color=auto tail
root 24796 0.0 0.1 120564 39436 ? S Nov03 0:53 tailwatchd
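For anyone checking a batch of servers for the same symptom, here is a minimal sketch that counts tailwatchd processes from `ps` output. The function names are made up for illustration, and it assumes the daemon's command line is exactly `tailwatchd`, as it appears in the listings above; adjust the match if your servers show something different.

```shell
# Hypothetical helpers, not part of cPanel.
# Reads "PID ARGS" lines on stdin and prints the PIDs whose whole
# command line is exactly "tailwatchd" (assumption based on the ps
# listings in this thread).
tailwatchd_pids() {
    awk 'NF == 2 && $2 == "tailwatchd" { print $1 }'
}

# Reads the same input and reports whether more than one copy exists.
# Exits non-zero when duplicates are found.
check_duplicates() (
    pids=$(tailwatchd_pids | tr '\n' ' ')
    pids=${pids% }
    set -- $pids
    if [ "$#" -gt 1 ]; then
        echo "duplicate tailwatchd processes: $pids"
        exit 1
    fi
    echo "tailwatchd process count: $#"
)
```

On a live server this could be fed with `ps -e -o pid= -o args= | check_duplicates`. Matching the full command line avoids counting the `grep` itself or `restartsrv_tailwatchd`.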
Hello,
The following cases were recently included with cPanel version 60 to help prevent these types of issues with tailwatchd:
Fixed case CPANEL-9515: Tail-check: ensure tailwatchd is restarted using systemd.
Fixed case CPANEL-9392: Harden tailwatchd's dupe process check.
It looks like the process list you reported shows the duplicate tailwatchd process was started on October 12th, before the server was updated to include the resolutions in cPanel version 60. Could you let us know if you notice this issue on a server and the duplicate process is dated at a time after the system is updated to cPanel 60?
Thank you.
i did tell you in my post that it's after 11.60 that i experienced this issue :/
Hello,
The resolutions are aimed at preventing the startup of duplicate tailwatchd processes, but they won't automatically kill any existing processes that were started before the system was updated to include those resolutions. Could you let us know if any systems where this is happening show a duplicate/hanging tailwatchd process that was started after the update to version 60?
Thank you.
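One way to answer that question is to compare a process's start time with the time of the upgrade. The sketch below is only an illustration: `classify_start` is a made-up helper, it assumes GNU `date`, and using the modification time of `/usr/local/cpanel/version` as the upgrade time is an assumption rather than anything cPanel documents.

```shell
# Hypothetical helper, not part of cPanel. Requires GNU date.
# Usage: classify_start PROCESS_START_TIME UPGRADE_TIME
# Prints "pre-upgrade" if the process started before the upgrade,
# otherwise "post-upgrade". Exits 2 if a timestamp cannot be parsed.
classify_start() (
    started=$(date -d "$1" +%s) || exit 2
    upgraded=$(date -d "$2" +%s) || exit 2
    if [ "$started" -lt "$upgraded" ]; then
        echo "pre-upgrade"
    else
        echo "post-upgrade"
    fi
)

# Example on a live server (not run here; PID 5905 is taken from the
# first post, and the version file's mtime is only an assumed proxy
# for the upgrade time):
#   classify_start "$(ps -o lstart= -p 5905)" \
#                  "$(date -r /usr/local/cpanel/version)"
```

A "pre-upgrade" result would match the explanation above, that the leftover process predates the fixes in version 60.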
well, 2 of the boxes didn't have duplicate tailwatchd processes, only 1 of them did. they all stopped failing after i killed the process. i don't think it's a massive issue though, because i have about 25 boxes and the others aren't alerting. now i know what to do anyway. thanks.
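For reference, the workaround described in this thread (kill the stale process, then run /scripts/restartsrv_chkservd) could be wrapped roughly like this. `recover_tailwatchd` and the `RUN` variable are made up for illustration; by default the function only prints what it would do.

```shell
# Hypothetical helper, not part of cPanel.
# Usage: recover_tailwatchd PID
# Prints the two recovery steps from this thread. Set RUN=1 to
# actually execute them (as root, on the affected server).
recover_tailwatchd() (
    pid=$1
    case $pid in
        ''|*[!0-9]*)
            echo "usage: recover_tailwatchd PID" >&2
            exit 2
            ;;
    esac
    for cmd in "kill $pid" "/scripts/restartsrv_chkservd"; do
        if [ "${RUN:-0}" = 1 ]; then
            # Intentionally unquoted so the command and its argument split.
            $cmd || exit 1
        else
            echo "would run: $cmd"
        fi
    done
)
```

Running `recover_tailwatchd 5905` first shows the steps without touching anything, which is a reasonable precaution before killing a process on a production box.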
Could you open a support ticket using the link in my signature if you notice this happening again? We're happy to take a closer look to determine what went wrong.
Thank you.