Half of a shop's PCs could not resolve their email domain name until PowerDNS restart?
Alright, so something began to happen like 5 days ago, but given how most shops were closed for a few days this week, I only became aware of this issue now (and managed to fix it in half an hour after much trial and error). I wanted your opinion on this:
Approximately half of the shop's PCs (a mixture of Windows 10 and 11, all using Google's public DNS 8.8.8.8) could connect to their email domain (they could ping it) while the other half could not ping it... as if the domain name did not exist (it does and it works).
Things I have tried during the first half hour:
- Reboot a PC that did not resolve the domain name.
- Reboot the shop's Internet router.
- Set another DNS resolver in the shop's Internet router.
- Manually do ipconfig/flushdns on a PC that did not resolve the domain name.
- After all this, I had no other ideas, so I restarted PowerDNS on the web server and boom, everything worked on all the shop's PCs without even having to touch anything!
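A check that is not in the list above but can narrow this kind of problem down: send the same question to the public resolver and to the authoritative server directly, then compare the response codes. The sketch below is a minimal, standard-library-only Python example. The function names (`build_query`, `parse_rcode`, `ask`) are made up for illustration, and it only handles a plain A-record query over UDP port 53.

```python
import socket
import struct

# Only the most common response codes; anything else is shown as RCODE<n>.
RCODES = {0: "NOERROR", 2: "SERVFAIL", 3: "NXDOMAIN", 5: "REFUSED"}


def build_query(name: str, query_id: int = 0x1234) -> bytes:
    """Build a minimal DNS query packet asking for the A record of `name`."""
    # Header: id, flags (recursion desired), 1 question, 0 other records.
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    # QNAME: each label prefixed by its length, terminated by a zero byte.
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.rstrip(".").split(".")
    ) + b"\x00"
    # QTYPE=1 (A), QCLASS=1 (IN).
    return header + qname + struct.pack("!HH", 1, 1)


def parse_rcode(response: bytes) -> str:
    """Return the response code name stored in the low 4 bits of the flags."""
    flags = struct.unpack("!H", response[2:4])[0]
    rcode = flags & 0x000F
    return RCODES.get(rcode, f"RCODE{rcode}")


def ask(server: str, name: str, timeout: float = 3.0) -> str:
    """Send one UDP query to `server` and report the rcode, TIMEOUT or ERROR."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        try:
            sock.sendto(build_query(name), (server, 53))
            data, _ = sock.recvfrom(512)
        except socket.timeout:
            return "TIMEOUT"
        except OSError as exc:
            return f"ERROR: {exc}"
    return parse_rcode(data)
```

For example, call `ask("8.8.8.8", "example.com")` and then `ask` with the WHM server's own IP address (`example.com` is a placeholder for the affected domain). `SERVFAIL` or `TIMEOUT` from the resolver while the authoritative server answers `NOERROR` would suggest that the resolver cannot reach the server, rather than that the zone itself is broken.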
Why did it suddenly stop accepting new DNS queries from half of their shop's PCs? (I can also guarantee you that new customers still came and used their website, so I'm sure PowerDNS was still working for some of the new visitors, otherwise there would be no new visitors on their website)
Why would I ever have to restart PowerDNS manually like this? I've never seen that in 15 years using WHM.
EDIT: Oh, I almost forgot about this, but when I restart PowerDNS (I tried it a second time just to be sure that it wasn't just a one-time error) it complains about a few domain names that were terminated via the terminate account function in WHM earlier this year. Although none of them are listed under "Accounts List" and their home dirs don't exist anymore, for some reason PowerDNS complains it cannot find them... Why is PowerDNS still looking for them when it's restarted? This is what I see (a couple of them, one for each terminated account):
Startup Log
Dec 27 08:59:14 whatever.com pdns_server[1319432]: [bindbackend] error at 2024-12-27 08:59:14 -0500 parsing 'whatever.org' from file '/var/named/whatever.org.db': Unable to open file '/var/named/whatever.org.db': No such file or directory: No such file or directory
-
Hey hey! I don't have a good explanation for the first part of the issue, as I would expect DNS issues to be more widespread than just for a certain office if there was an issue with the configuration.
For the second half of the issue you can try rebuilding the configuration with this command:
/scripts/rebuilddnsconfig
If that doesn't work, feel free to remove those lines directly from /etc/named.conf and then restart the DNS service to clear those older entries.
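Before editing the file by hand, it can help to list which zone entries point at files that no longer exist. The following is a rough Python sketch, not a cPanel tool: `find_stale_zones` is a hypothetical helper, and the regular expression assumes the usual `zone "name" { ... file "path"; };` layout, so it is not a full named.conf parser.

```python
import os
import re

# Matches: zone "name" [class] { ... file "path"
# [^}]*? keeps the match inside a single zone block.
ZONE_RE = re.compile(
    r'zone\s+"([^"]+)"[^{]*\{[^}]*?file\s+"([^"]+)"',
    re.IGNORECASE,
)


def find_stale_zones(conf_text, exists=os.path.exists):
    """Return (zone, path) pairs whose zone file is missing.

    `exists` is injectable so the function can be tried without
    touching the real filesystem.
    """
    stale = []
    seen = set()
    for name, path in ZONE_RE.findall(conf_text):
        # The same zone can appear once per view; report it only once.
        if (name, path) in seen:
            continue
        seen.add((name, path))
        if not exists(path):
            stale.append((name, path))
    return stale


def report(conf_path="/etc/named.conf"):
    """Print every zone in `conf_path` whose zone file cannot be found."""
    with open(conf_path, encoding="utf-8", errors="replace") as handle:
        for name, path in find_stale_zones(handle.read()):
            print(f"{name}: missing {path}")
```

Running `report()` as root on the server should print one line per leftover zone, which can then be compared against the errors PowerDNS logs at startup. Take a backup copy of /etc/named.conf before removing anything.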
0 -
You mean this command?
/usr/local/cpanel/scripts/cleandns
0 -
Nope
0 -
/scripts/rebuilddnsconfig outputs almost the same thing as when I restart PowerDNS through the WHM interface, except written in different wording and it asks me to run /usr/local/cpanel/scripts/cleandns ...
!! /var/named/whatever.org.db does not exist, unable to locate.
!! Run /usr/local/cpanel/scripts/cleandns to remove zone without corresponding files.
!! Or locate the proper zone file and place in /var/named and rerun
0 -
If that didn't clear the old zones from the configuration then it's fine to manually edit /etc/named.conf to remove them. I can't say *why* they weren't cleared properly when the domains were removed though.
0 -
While you were replying, I ran the command that your command told me to run and it fails to parse /etc/named.conf at line 424, error = /etc/named.conf:424: unknown option 'view'
Here's the relevant part of our /etc/named.conf file, it crashes at: view "external" {
[a ton of "zone" declarations and then...]
view "external" {
/* This view will contain zones you want to serve only to "external" clients
* that have addresses that are not on your directly attached LAN interface subnets:
*/
recursion no;
additional-from-cache no;
// you'd probably want to deny recursion to external clients, so you don't
// end up providing free DNS service to all takers
// all views must contain the root hints zone:
zone "." IN {
type hint;
file "/var/named/named.ca";
};
// These are your "authoritative" external zones, and would probably
// contain entries for just your web and mail servers:
// BEGIN external zone entries
[back to a ton of other "zone" declarations...]
0 -
That's odd - that's one of the default lines in all systems, so I'm not sure why it would error out there. It might be best to make a ticket on this one.
0 -
Didn't you guys make some changes with how PowerDNS is packaged with WHM a few updates back or something like that? Might be related to these weird PowerDNS issues we're getting lately?
0 -
Not that I am aware of. PowerDNS has been the default DNS service on new installs since version 84.
0 -
No yes, I know that, but I'm referring to RE-746. Weird issues with PowerDNS started recently and I've upgraded the server like 2 weeks ago. It began to happen shortly after (that's the time I personally began to be aware of it, but it might have begun the day we upgraded)
0 -
I don't think RE-746 would be related to the issues you're seeing. I still think it would be best to create that ticket so the system can be examined.
0 -
It happened again 2 hours ago and I restarted the PowerDNS service and it immediately went back to normal operation. This is so weird. It never did that before this week. It's completely new to me and I've been using the same server for years.
I manually went in /etc/named.conf and cleaned it up by removing all the zones causing PowerDNS startup/restart errors (the old, deprecated zones causing "No such file or directory" errors... All of them were cPanel accounts that I terminated earlier this year).
I hope that this prevents it from... crashing? I don't know, it doesn't seem like it's crashed when this issue occurs, but it just like stops resolving/serving computers randomly. I guess we'll see over the course of the next 3 days as this is the second time this happens out of the blue during the same week and absolutely no WHM update or account change was done this week at all, so I'm sure if the issue is not resolved, it will occur again like in 2-3 days max.
EDIT: I just checked and it also coincides with the server's main domain and name server's AutoSSL 4 months SSL certificate renewal time, or at least it's very, very close to it. Could it be related?
0 -
Alright!
So after more hours invested in this, I realized that everything written above in this thread is not the issue.
The true issue is that my software firewall has begun to block many of Google's public DNS IP addresses because they were trying to resolve domain names on my WHM server through DNS over TLS, which uses port TCP/853, and that port is not open on my server. Don't ask me why but I always used default options and port numbers for everything except SSH, so DNS resolution has always been done via port 53 (UDP/TCP) and not TCP/853.
After hours spent grepping logs and then realizing this and then removing all the IP addresses from Google's DNS resolver from the firewall, everything seems to work normally.
Now the issue is that even though I've unblocked all those IP addresses, I think that they will come back in the firewall's deny list at some point, because every second or two, one of Google's DNS resolver IP addresses continues to try to connect to TCP/853, which is not open on my WHM server.
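To get a sense of how many of those blocked attempts there are and where they come from, the firewall log can be tallied by source address. This is a small Python sketch under some assumptions: the log lines are iptables-style with `SRC=` and `DPT=` fields, and `count_blocked_dot` is a made-up name for illustration.

```python
import re
from collections import Counter

SRC_RE = re.compile(r"\bSRC=(\S+)")
# \b on both sides so that DPT=8530 or SPT=853 do not match.
DOT_PORT_RE = re.compile(r"\bDPT=853\b")


def count_blocked_dot(lines):
    """Count log lines per source IP whose destination port is 853."""
    counts = Counter()
    for line in lines:
        if not DOT_PORT_RE.search(line):
            continue
        match = SRC_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


def top_sources(log_path, limit=20):
    """Return the `limit` most frequent sources of port 853 attempts."""
    with open(log_path, encoding="utf-8", errors="replace") as handle:
        return count_blocked_dot(handle).most_common(limit)
```

Point `top_sources` at whichever file your firewall logs to. If the top entries belong to public resolvers, that supports treating the port 853 traffic as resolver probing and not as an attack.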
1) Is DNS over TLS (port TCP/853) a thing in WHM and if so, why is it not enabled by default and how can I enable this?
2) I don't have a question #2.
Thanks.
0 -
I'm glad you found something even though it ended up being odd!
cPanel currently doesn't have support for DNS over TLS so I don't have a way to get that working. Would you like me to submit a feature request for this so our team can discuss adding that?
0 -
If it's going to be implemented in 2030, then don't bother. So as sysadmins, what are we supposed to do about those hundreds of incoming DNS queries on port TCP/853 from public DNS resolvers such as Google and CloudFlare every minute? Just ignore them?
0 -
Is that port opened on your system? I'd just block the port as that isn't something that's normally open on cPanel systems.
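A quick way to answer that from another machine is to attempt a TCP connection to the port. The following Python sketch uses only the standard library; `tcp_port_open` is a hypothetical helper name, and a `False` result covers both a refused connection and a silently dropped one (timeout).

```python
import socket


def tcp_port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False
```

Checking port 853 and port 53 on the server's public IP from outside the network shows what a public resolver would see: 53 should connect, while 853 should not on a system that does not offer DNS over TLS.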
0 -
As I wrote: It's not. But it fills the logs with Firewall: *TCP_IN Blocked* DPT=853 and now unless I tweak my software firewall's configuration, it will eventually hate those IP addresses and re-list them in the deny.csf block list. It's not a hard problem to deal with now that I'm aware of it, but it's kind of annoying that we cannot just serve those hundreds of queries a minute. I'm pretty sure this slows down everything too, both on the visitor's end and on our server's end.
I guess Google and CloudFlare battle each other to be the most secure by resolving DNS over TLS now for all those tin foil hat users who think the government will go after them if they don't use a VPN to casually browse the web or something.
0 -
It might be best to just totally block those IPs then, if that's possible?
0 -
You want me to block Google's public DNS? Why would I do that?
0 -
If you want to get rid of the firewall spam you'll have to do something, but this isn't related to cPanel in any way.
0 -
Well, honestly, I was expecting that "something" to be instructions on how to enable DNS over TLS but it looks like WHM isn't there yet. So I guess I'm stuck and I'll just make a rule to ignore port TCP/853 connections.