Chkservd notifications

Elizabeta

August 16, 2018 09:04

Hello, I have restarted my DNS-only server and now I've got some messages:

Service Name: dnsadmin
Service Status: failed
Notification: The service "dnsadmin" appears to be down.
Service Check Method: The system's command to check or to restart this service failed.
Number of Restart Attempts: 1
Service Check Raw Output: (XID pteh4m) The "dnsadmin" service is down. The subprocess "/usr/local/cpanel/scripts/restartsrv_dnsadmin" reported error number 255 when it ended.
Startup Log:
Aug 16 13:53:39 dns2.example.com systemd[1]: Starting cPanel DNS admin service...
Aug 16 13:53:39 dns2.example.com restartsrv_dnsadmin[26916]: Starting PID 26920: dnsadmin-dormant
Aug 16 13:53:39 dns2.example.com systemd[1]: Started cPanel DNS admin service.
Aug 16 14:40:43 dns2.example.com systemd[1]: Stopping cPanel DNS admin service...
Aug 16 14:40:44 dns2.example.com systemd[1]: Stopped cPanel DNS admin service.

Next message:

Service Name: dnsadmin
Service Status: failed
Notification: The service "dnsadmin" appears to be down.
Service Check Method: The system's command to check or to restart this service failed.
Number of Restart Attempts: 1
Service Check Raw Output: (XID pteh4m) The "dnsadmin" service is down. The subprocess "/usr/local/cpanel/scripts/restartsrv_dnsadmin" reported error number 255 when it ended.
Startup Log:
Aug 16 13:53:39 dns2.example.com systemd[1]: Starting cPanel DNS admin service...
Aug 16 13:53:39 dns2.example.com restartsrv_dnsadmin[26916]: Starting PID 26920: dnsadmin-dormant
Aug 16 13:53:39 dns2.example.com systemd[1]: Started cPanel DNS admin service.
Aug 16 14:40:43 dns2.example.com systemd[1]: Stopping cPanel DNS admin service...
Aug 16 14:40:44 dns2.example.com systemd[1]: Stopped cPanel DNS admin service.

There are similar messages for the services crond, rsyslogd, sshd, nameserver, lmtp, mysql... But in ps aux I see that the processes are up. What is dnsadmin dormant mode? Best regards, Elizabeta

Comments

13 comments

Elizabeta

August 16, 2018 16:43
Hello, OK, I understand what dormant means. But I don't understand why I don't receive messages from cPanel Monitoring that the services are operational. Best regards, Elizabeta
Elizabeta

August 16, 2018 16:55
Hello, One more piece of information. On the same machine I manually stopped exim, and I got a message from cPanel Monitoring that the service was down. After two minutes the exim service recovered, and I got a message that the service is operational. It seems OK now. Or does it? What happened before? Thank you! Best regards, Elizabeta
Elizabeta

August 16, 2018 18:26
Hello, Now I have restarted cPanel (after a kernel upgrade), and about one hour after restarting the machine I got many messages from cPanel that the services lmtp, mysql, exim, crond and ftpd were down. But I see in the output of ps aux that all services are up. Also, I have not received messages from cPanel Monitoring that the services are now operational. Why? Best regards, Elizabeta
Elizabeta

August 17, 2018 12:41
Hello, Sorry for so many messages. I have looked in detail at the logs after the machine restart (18:57):

2018-08-16 18:57:06 cwd=/var/spool/exim 3 args: /usr/sbin/exim -Mc 1fqLa6-0000Lq-Er
2018-08-16 18:57:06 1fqLa6-0000Lq-Er == net@example.com R=lookuphost defer (-36): host lookup for mail.net.ba did not complete (DNS timeout?)
2018-08-16 18:58:32 SMTP connection from [127.0.0.1]:35638 (TCP/IP connection count = 1)
2018-08-16 18:58:32 SMTP connection from (localhost) [127.0.0.1]:35638 closed by QUIT
2018-08-16 19:03:26 SMTP connection from [127.0.0.1]:35672 (TCP/IP connection count = 1)
2018-08-16 19:03:26 SMTP connection from (localhost) [127.0.0.1]:35672 closed by QUIT
2018-08-16 19:08:28 SMTP connection from [127.0.0.1]:35700 (TCP/IP connection count = 1)
2018-08-16 19:08:28 SMTP connection from (localhost) [127.0.0.1]:35700 closed by QUIT
2018-08-16 19:13:29 SMTP connection from [127.0.0.1]:35744 (TCP/IP connection count = 1)
2018-08-16 19:13:29 SMTP connection from (localhost) [127.0.0.1]:35744 closed by QUIT
2018-08-16 19:18:30 SMTP connection from [127.0.0.1]:35774 (TCP/IP connection count = 1)
2018-08-16 19:18:30 SMTP connection from (localhost) [127.0.0.1]:35774 closed by QUIT
2018-08-16 19:23:32 SMTP connection from [127.0.0.1]:35802 (TCP/IP connection count = 1)
2018-08-16 19:23:32 SMTP connection from (localhost) [127.0.0.1]:35802 closed by QUIT
2018-08-16 19:28:32 SMTP connection from [127.0.0.1]:35834 (TCP/IP connection count = 1)
2018-08-16 19:28:32 SMTP connection from (localhost) [127.0.0.1]:35834 closed by QUIT
2018-08-16 19:33:33 SMTP connection from [127.0.0.1]:35876 (TCP/IP connection count = 1)
2018-08-16 19:33:33 SMTP connection from (localhost) [127.0.0.1]:35876 closed by QUIT
2018-08-16 19:38:34 SMTP connection from [127.0.0.1]:35914 (TCP/IP connection count = 1)
2018-08-16 19:38:34 SMTP connection from (localhost) [127.0.0.1]:35914 closed by QUIT
2018-08-16 19:42:18 SMTP connection from [139.162.109.245]:41050 (TCP/IP connection count = 1)
2018-08-16 19:42:22 SMTP connection from scan-7.security.ipip.net [139.162.109.245]:41050 lost D=3s
2018-08-16 19:43:35 SMTP connection from [127.0.0.1]:35942 (TCP/IP connection count = 1)
2018-08-16 19:43:35 SMTP connection from (localhost) [127.0.0.1]:35942 closed by QUIT
2018-08-16 19:48:36 SMTP connection from [127.0.0.1]:35974 (TCP/IP connection count = 1)
2018-08-16 19:48:36 SMTP connection from (localhost) [127.0.0.1]:35974 closed by QUIT
2018-08-16 19:53:38 SMTP connection from [127.0.0.1]:36012 (TCP/IP connection count = 1)
2018-08-16 19:53:38 SMTP connection from (localhost) [127.0.0.1]:36012 closed by QUIT
2018-08-16 19:56:59 cwd=/var/spool/exim 2 args: /usr/sbin/exim -qG
2018-08-16 19:56:59 Start queue run: pid=17224
2018-08-16 19:56:59 1fqLa6-0000Lg-4W => net@example.com R=lookuphost T=remote_smtp H=mail.net.ba [212.20.31.50] C="250 ok: Message 2642517 accepted"

OK, I understand: cPanel Monitoring attempted to send mail at 18:57, the attempt failed, and one hour later it successfully sent the mail reporting that the services had failed earlier. I would like to know where this option is set in Exim (the retry time for failed messages in the queue). But the main question is: why did cPanel Monitoring not send messages that the services are operational? Thank you! Best regards, Elizabeta
cPanelMichael

August 17, 2018 17:00
Hello @Elizabeta, You can read more information about why this happens on the following thread:
But the main question is: why did cPanel Monitoring not send messages that the services are operational?

Can you verify if the following option is enabled under the System tab in WHM >> Tweak Settings?
The option to enable or disable ChkServd recovery notifications
Per its description: Disabling this option will suppress notification of service recovery from ChkServd.
Thank you.
Elizabeta

August 18, 2018 14:16
Hello Michael, I have read everything at this link (System tab in WHM >> Tweak Settings: ChkServd TCP check failure threshold). After a graceful restart of cPanel, cPanel Monitoring did not send messages that services were down. The last time cPanel was restarted, cPanel Monitoring sent many messages. At first it could not send them, and it retried only after one hour. Where in Exim can I configure messages in the queue that cannot be delivered immediately to be retried sooner than after an hour? Thank you! Best regards, Elizabeta
Elizabeta

August 20, 2018 08:40
Hello, My exim.conf has this retry configuration: * * F,2h,15m; G,16h,1h,1.5; F,4d,8h. How can I change this setting via WHM? I would like a message that cannot be accepted to be retried several times (e.g. every 15 minutes) in the first 2 hours, then less often. Currently, when a message cannot be accepted, it is retried after an hour. Thanks in advance, Elizabeta
Elizabeta

August 20, 2018 12:15
Hello, In the logs:

2018-08-16 18:57:04 1fqLa4-0000Km-Ik == someusr@example.com R=lookuphost defer (-36): host lookup for mail.example.com did not complete (DNS timeout?)
2018-08-16 18:57:04 1fqLa4-0000Kr-Tk <= root@cpanel.example.com U=root P=local S=48472 id=1534438624.F2bGTFog5wbqNEKQ@cpanel.example.com T="[cpanel.example.com] FAILED \342\233\224: lmtp (212.20.31.57)" for someusr@example.com
2018-08-16 18:57:05 cwd=/var/spool/exim 3 args: /usr/sbin/exim -Mc 1fqLa4-0000Kr-Tk
2018-08-16 18:57:05 1fqLa4-0000Kr-Tk == someusr@example.com R=lookuphost defer (-36): host lookup for mail.example.com did not complete (DNS timeout?)
2018-08-16 18:57:05 cwd=/ 3 args: /usr/sbin/sendmail -odb -ti
2018-08-16 18:57:05 1fqLa5-0000Kw-2p <= root@cpanel.example.com U=root P=local S=48396 id=1534438625.dL_KTGJSs52k1Hvl@cpanel.example.com T="[cpanel.example.com] FAILED \342\233\224: imap (212.20.31.57)" for someusr@example.com
2018-08-16 18:57:05 cwd=/var/spool/exim 3 args: /usr/sbin/exim -Mc 1fqLa5-0000Kw-2p
2018-08-16 18:57:05 1fqLa5-0000Kw-2p == someusr@example.com R=lookuphost defer (-36): host lookup for mail.example.com did not complete (DNS timeout?)
2018-08-16 18:57:05 cwd=/ 3 args: /usr/sbin/sendmail -odb -ti
2018-08-16 18:57:05 1fqLa5-0000LB-9X <= root@cpanel.example.com U=root P=local S=49044 id=1534438625.N8sr6EoFwo1rg_Pq@cpanel.example.com T="[cpanel.example.com] FAILED \342\233\224: ftpd (212.20.31.57)" for someusr@example.com
2018-08-16 18:57:05 cwd=/var/spool/exim 3 args: /usr/sbin/exim -Mc 1fqLa5-0000LB-9X
2018-08-16 18:57:05 1fqLa5-0000LB-9X == someusr@example.com R=lookuphost defer (-36): host lookup for mail.example.com did not complete (DNS timeout?)
2018-08-16 18:57:05 cwd=/ 3 args: /usr/sbin/sendmail -odb -ti
2018-08-16 18:57:05 1fqLa5-0000LQ-GH <= root@cpanel.example.com U=root P=local S=50792 id=1534438625.z_pd7WRo3qCJgxzS@cpanel.example.com T="[cpanel.example.com] FAILED \342\233\224: exim (212.20.31.57)" for someusr@example.com
2018-08-16 18:57:05 cwd=/var/spool/exim 3 args: /usr/sbin/exim -Mc 1fqLa5-0000LQ-GH
2018-08-16 18:57:05 1fqLa5-0000LQ-GH == someusr@example.com R=lookuphost defer (-36): host lookup for mail.example.com did not complete (DNS timeout?)
2018-08-16 18:57:05 cwd=/ 3 args: /usr/sbin/sendmail -odb -ti
2018-08-16 18:57:05 1fqLa5-0000LW-Kw <= root@cpanel.example.com U=root P=local S=49408 id=1534438625.8_Ky88Ejlbuiooog@cpanel.example.com T="[cpanel.example.com] FAILED \342\233\224: dnsadmin (212.20.31.57)" for someusr@example.com
2018-08-16 18:57:05 cwd=/var/spool/exim 3 args: /usr/sbin/exim -Mc 1fqLa5-0000LW-Kw
2018-08-16 18:57:05 1fqLa5-0000LW-Kw == someusr@example.com R=lookuphost defer (-36): host lookup for mail.example.com did not complete (DNS timeout?)
2018-08-16 18:57:05 cwd=/ 3 args: /usr/sbin/sendmail -odb -ti
2018-08-16 18:57:05 1fqLa5-0000Lb-Sm <= root@cpanel.example.com U=root P=local S=51082 id=1534438625.V6Bp7MrybxhaPhwM@cpanel.wh.net.ba T="[cpanel.example.com] FAILED \342\233\224: crond (212.20.31.57)" for someusr@example.com
2018-08-16 18:57:05 cwd=/var/spool/exim 3 args: /usr/sbin/exim -Mc 1fqLa5-0000Lb-Sm
2018-08-16 18:57:06 1fqLa5-0000Lb-Sm == someusr@example.com R=lookuphost defer (-36): host lookup for mail.example.com did not complete (DNS timeout?)
2018-08-16 18:57:06 cwd=/ 3 args: /usr/sbin/sendmail -odb -ti
2018-08-16 18:57:06 1fqLa6-0000Lg-4W <= root@cpanel.example.com U=root P=local S=48053 id=1534438626.FGSsHo_KixQnzFNV@cpanel.example.com T="[cpanel.example.com] FAILED \342\233\224: cpanellogd (212.20.31.57)" for someusr@example.com
2018-08-16 18:57:06 cwd=/var/spool/exim 3 args: /usr/sbin/exim -Mc 1fqLa6-0000Lg-4W
2018-08-16 18:57:06 cwd=/ 3 args: /usr/sbin/sendmail -odb -ti
2018-08-16 18:57:06 1fqLa6-0000Lg-4W == someusr@example.com R=lookuphost defer (-36): host lookup for mail.example.com did not complete (DNS timeout?)
2018-08-16 18:57:06 1fqLa6-0000Ll-AA <= root@cpanel.example.com U=root P=local S=50225 id=1534438626.hATbzqYZlpvDy9t6@cpanel.example.com T="[cpanel.example.com] FAILED \342\233\224: cpanel-dovecot-solr (212.20.31.57)" for someusr@example.com
2018-08-16 18:57:06 cwd=/var/spool/exim 3 args: /usr/sbin/exim -Mc 1fqLa6-0000Ll-AA
2018-08-16 18:57:06 cwd=/ 3 args: /usr/sbin/sendmail -odb -ti
2018-08-16 18:57:06 1fqLa6-0000Ll-AA == someusr@example.com R=lookuphost defer (-36): host lookup for mail.example.com did not complete (DNS timeout?)
2018-08-16 18:57:06 1fqLa6-0000Lq-Er <= root@cpanel.example.com U=root P=local S=47826 id=1534438626.FNwHQA46KpOhXwh5@cpanel.example.com T="[cpanel.example.com] FAILED \342\233\224: clamd (212.20.31.57)" for someusr@example.com
2018-08-16 18:57:06 cwd=/var/spool/exim 3 args: /usr/sbin/exim -Mc 1fqLa6-0000Lq-Er
2018-08-16 18:57:06 1fqLa6-0000Lq-Er == someusr@example.com R=lookuphost defer (-36): host lookup for mail.example.com did not complete (DNS timeout?)
2018-08-16 19:56:59 1fqLa6-0000Lg-4W => someusr@example.com R=lookuphost T=remote_smtp H=mail.example.com [212.20.31.50] C="250 ok: Message 2642517 accepted"
2018-08-16 19:56:59 1fqLa6-0000Lg-4W Completed

18:56 is the time the machine was restarted. Exim tried to send mail nine times at 18:57 and successfully sent the mail an hour later. But my exim.conf has this retry configuration: * * F,2h,15m; G,16h,1h,1.5; F,4d,8h. It specifies retries every 15 minutes for 2 hours. Why is it not so? Best regards, Elizabeta
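For reference, the retry rule quoted above can be expanded into the earliest times at which a deferred address becomes eligible for another attempt. The Python sketch below is a simplified reading of the rule syntax (F = fixed interval, G = geometrically growing interval, each with a cutoff measured from the first failure), not Exim's own implementation. The function names are hypothetical, the leading "* *" pattern is left out, and it assumes every time value uses a single unit.

```python
from datetime import timedelta

# Seconds per unit for time values such as "15m" or "2h".
# Assumption: each value uses a single unit, as in the rule quoted above.
UNITS = {"s": 1, "m": 60, "h": 3600, "d": 86400, "w": 604800}


def parse_time(text):
    """Convert a value such as '15m' into a timedelta."""
    return timedelta(seconds=int(text[:-1]) * UNITS[text[-1]])


def parse_rule(rule):
    """Split 'F,2h,15m; G,16h,1h,1.5' into (cutoff, interval, factor) steps."""
    steps = []
    for part in rule.split(";"):
        fields = [field.strip() for field in part.split(",")]
        letter = fields[0]
        cutoff = parse_time(fields[1])
        interval = parse_time(fields[2])
        # Only G steps carry a multiplier; F steps keep a constant interval.
        factor = float(fields[3]) if letter == "G" else 1.0
        steps.append((cutoff, interval, factor))
    return steps


def earliest_retry_offsets(rule, limit=10):
    """Return the first `limit` retry offsets, measured from the first failure."""
    offsets = []
    elapsed = timedelta(0)
    for cutoff, interval, factor in parse_rule(rule):
        while elapsed < cutoff and len(offsets) < limit:
            elapsed += interval
            offsets.append(elapsed)
            interval *= factor
    return offsets


for offset in earliest_retry_offsets("F,2h,15m; G,16h,1h,1.5; F,4d,8h"):
    print(offset)
```

Under these assumptions the first eight offsets are 15 minutes apart, ending at 2:00:00, followed by 3:00:00 and 4:30:00. These are only the times after which a retry is permitted; they say nothing about when an attempt is actually made.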
cPanelMichael

August 21, 2018 13:48
Hello @Elizabeta,
Where in Exim can I configure messages in the queue that cannot be delivered immediately to be retried sooner than after an hour?

Exim tried to send mail nine times at 18:57 and successfully sent the mail an hour later. But my exim.conf has this retry configuration: * * F,2h,15m; G,16h,1h,1.5; F,4d,8h. It specifies retries every 15 minutes for 2 hours.

Can you run the following command and let us know the output, so we can confirm when the first delivery attempt for message ID 1fqLa6-0000Lg-4W was initiated?
exigrep 1fqLa6-0000Lg-4W /var/log/exim_mainlog
Please make sure to post the output in CODE tags and remove any identifying information. Thank you.
Elizabeta

August 22, 2018 07:02
Hello Michael, Thank you for your answer. The output of the command is:

[root@cpanel log]# exigrep 1fqLa6-0000Lg-4W /var/log/exim_mainlog
2018-08-16 18:57:06 cwd=/var/spool/exim 3 args: /usr/sbin/exim -Mc 1fqLa6-0000Lg-4W
2018-08-16 18:57:06 1fqLa6-0000Lg-4W <= root@cpanel.example.com U=root P=local S=48053 id=1534438626.FGSsHo_KixQnzFNV@cpanel.example.com T="[cpanel.example.com] FAILED \342\233\224: cpanellogd (x.x.x.x)" for net@example.com
2018-08-16 18:57:06 1fqLa6-0000Lg-4W == net@example.com R=lookuphost defer (-36): host lookup for mail.example.com did not complete (DNS timeout?)
2018-08-16 19:56:59 1fqLa6-0000Lg-4W => net@example.com R=lookuphost T=remote_smtp H=mail.net.ba [x.x.x.x] C="250 ok: Message 2642517 accepted"
2018-08-16 19:56:59 1fqLa6-0000Lg-4W Completed
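The gap between the deferral and the eventual delivery can be read directly from these lines. The Python sketch below is one way to compute it, assuming the "date time message-id flag" layout seen in this output. The sample text is abridged from the lines above, and the helper name event_times is hypothetical.

```python
import re
from datetime import datetime

# Abridged copy of the exigrep output above (trailing fields shortened).
LOG = """\
2018-08-16 18:57:06 1fqLa6-0000Lg-4W <= root@cpanel.example.com U=root P=local S=48053
2018-08-16 18:57:06 1fqLa6-0000Lg-4W == net@example.com R=lookuphost defer (-36)
2018-08-16 19:56:59 1fqLa6-0000Lg-4W => net@example.com R=lookuphost T=remote_smtp
2018-08-16 19:56:59 1fqLa6-0000Lg-4W Completed
"""

# timestamp, message ID, then one of the flags that appear in this log.
LINE = re.compile(r"^(\d{4}-\d\d-\d\d \d\d:\d\d:\d\d) (\S+) (<=|==|=>|Completed)")


def event_times(log_text, message_id):
    """Map each flag to the time it first appears for `message_id`."""
    events = {}
    for line in log_text.splitlines():
        match = LINE.match(line)
        if match and match.group(2) == message_id:
            when = datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S")
            events.setdefault(match.group(3), when)
    return events


events = event_times(LOG, "1fqLa6-0000Lg-4W")
delay = events["=>"] - events["=="]
print(delay)  # 0:59:53
```

So the message waited just under one hour between the first deferral (==) and the successful delivery (=>).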
Best regards, Elizabeta
cPanelMichael

August 22, 2018 14:10
Hello @Elizabeta, Can you open a
Elizabeta

August 23, 2018 08:01
Hello Michael, We have opened a support ticket. The ticket number is 10168025. Could you take a closer look at it? Thank you very much! Best regards, Elizabeta
cPanelMichael

August 27, 2018 14:47
Hello, To update, here's part of the response in the support ticket regarding this issue:

[QUOTE]
After researching even further, the cause of this behavior is Exim's handling of retry attempts and 'queue runs'. When a deferred message is added to Exim's retry database, it is provided a timestamp for when it is safe to re-attempt delivery. However, the next delivery attempt is handled when Exim performs a routine queue run, which is when Exim attempts to process any undelivered messages still sitting in its queue. By default, this queue run is set to process once per hour via the '-q1h' flag:

=====
# ps aux | grep [e]xim
mailnull 5035 0.0 0.0 77508 3976 ? Ss Aug24 0:00 /usr/sbin/exim -bd -q1h -oP /var/spool/exim/exim-daemon.pid
=====

As such, while a message may be marked as safe to re-attempt delivery after 15 or 30 minutes, the message will not be processed until Exim's next queue run is processed.

For additional insight, you can also read Exim's official documentation on its handling of retry rules here: 32. Retry configuration

Here is a quote from this page that summarizes the functionality: "If such a delivery suffers a temporary failure, the retry data is updated as normal, and subsequent delivery attempts from queue runs occur only when the retry time for the local address is reached."

This behavior resulted in some inaccuracies during my testing, as Exim seems to perform a new queue run immediately upon starting. When observing this behavior, and observing the hour-long period of no activity in the server's Exim log during the time of this issue, I theorized that Exim was 'online' but 'unresponsive' due to a server issue that was occurring outside of the Exim service. However, with this new understanding in mind, we can now re-observe this hour delay being part of the default Exim behavior.

In short, the earlier messages were added to the retry database with a retry time of 15 minutes at 18:57, but the messages were not processed until 19:56:59 due to the scheduling of Exim's queue runs at this time.
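The interaction between the retry time and the queue-run interval can be illustrated with a small Python model. This is only a sketch of the reasoning above, not of Exim's internals. The function next_attempt is hypothetical, and the first queue run at 18:56:59 is an assumption inferred from the 19:56:59 queue run in the log and the '-q1h' interval.

```python
from datetime import datetime, timedelta


def next_attempt(retry_time, first_queue_run, queue_interval):
    """Return the first periodic queue run at or after `retry_time`."""
    run = first_queue_run
    while run < retry_time:
        run += queue_interval
    return run


# Times taken from the log in this thread.
deferred_at = datetime(2018, 8, 16, 18, 57, 6)
# Earliest permitted retry under the F,2h,15m step of the retry rule.
retry_time = deferred_at + timedelta(minutes=15)
# Assumption: a queue run happened when the daemon started, one hour
# before the 19:56:59 queue run recorded in the log.
first_run = datetime(2018, 8, 16, 18, 56, 59)

print(next_attempt(retry_time, first_run, timedelta(hours=1)))     # 2018-08-16 19:56:59
print(next_attempt(retry_time, first_run, timedelta(minutes=15)))  # 2018-08-16 19:26:59
```

With an hourly queue run, the model reproduces the 19:56:59 delivery seen in the log even though the retry time was reached at 19:12:06. With a hypothetical 15-minute interval, the same message would be picked up at 19:26:59, because the 19:11:59 run falls a few seconds before the retry time.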
Thank you.
