Single Account DNS Zone stops Responding
Hey all,
Needing the collective genius of this forum to put me out of my misery.
The Config/Context
3 x cPanel 110.0.15 Servers (CloudLinux 7.9) in a DNS Cluster (PowerDNS).
The customer's account in question was transferred from another cPanel Server a couple of years ago. It's a larger account with various sub-domains.
The Issue
Over the last ~9 months, there has been a periodic issue where the site stops responding. Pinging the domain or any of its sub-domains fails because the name does not resolve.
The issue ONLY affects this account and no others on the same Server or other Servers in the DNS Cluster.
It took me a minute to figure out that if I make any arbitrary change in the DNS Zone, the issue is immediately resolved, the domain name resolves correctly, and the site is back online.
The issue then reoccurs anywhere from an hour to several hours later.
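To pin down exactly when each outage starts (and to line that time up with log entries later), a small probe could record every resolution failure with a timestamp. This is only a minimal sketch, assuming Python 3 on a machine outside the cluster; the function names are hypothetical, and because it uses the system resolver, local caching may delay when a failure shows up.

```python
import socket
import time
from datetime import datetime, timezone


def resolves(hostname, resolver=socket.getaddrinfo):
    """Return True if the hostname resolves, False on a DNS lookup failure."""
    try:
        resolver(hostname, None)
        return True
    except socket.gaierror:
        return False


def probe(hostnames, resolver=socket.getaddrinfo, now=None):
    """Check each hostname once; return (timestamp, hostname, ok) tuples."""
    stamp = (now or datetime.now(timezone.utc)).isoformat()
    return [(stamp, name, resolves(name, resolver)) for name in hostnames]


def watch(hostnames, interval_seconds=60):
    """Probe in a loop and print only the failures, one line per hostname."""
    while True:
        for stamp, name, ok in probe(hostnames):
            if not ok:
                print(f"{stamp} FAILED to resolve {name}")
        time.sleep(interval_seconds)
```

Calling `watch(["example.com", "www.example.com"])` with the real domain and a couple of its sub-domains would leave a timestamped trail of when resolution breaks and when it recovers.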
Troubleshooting/Changes thus far
The actual Hosting Provider/Server Admins have always been great and have made some recommendations, including correcting the NS records in the Zone, among other things.
In the previous 2 instances of this issue, their recommended changes magically 'fixed' it and I didn't have a repeat for another few months. This 3rd instance, however, is stumping us. The Server Admins are suggesting more top-level Glue/NS changes but, again, there are no other affected accounts, just this one customer.
If it's relevant, I've been noting the Zone Serial# each time I make the arbitrary change. Between making the change and the issue reoccurring, the serial# goes up by 1-3 increments. I'm unsure if this is normal or if it suggests that something else in the cluster is editing the zone.
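One way to see whether the cluster members agree on the zone is to ask each nameserver directly for the SOA record and compare serials. The following is a rough sketch, assuming `dig` is installed and that each cluster member answers queries for the zone; the function names are hypothetical and the nameserver hostnames would need to be filled in.

```python
import subprocess


def parse_soa_serial(dig_short_output):
    """Extract the serial (third field) from `dig +short SOA` output."""
    fields = dig_short_output.split()
    if len(fields) < 3:
        # An empty answer is itself a symptom: the server did not return the zone.
        raise ValueError(f"unexpected SOA answer: {dig_short_output!r}")
    return int(fields[2])


def query_soa_serial(domain, nameserver):
    """Ask one nameserver for the zone's SOA record and return its serial."""
    result = subprocess.run(
        ["dig", "+short", "SOA", domain, f"@{nameserver}"],
        capture_output=True, text=True, check=True, timeout=10,
    )
    return parse_soa_serial(result.stdout)


def out_of_sync(serials):
    """Given {nameserver: serial}, list the servers not on the highest serial."""
    newest = max(serials.values())
    return sorted(ns for ns, serial in serials.items() if serial != newest)
```

Building a dictionary with `query_soa_serial` for each of the three servers and passing it to `out_of_sync` would show which member, if any, is serving a stale copy of the zone at the moment the outage occurs.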
If anyone can help me resolve this...love you long time!
-
Hey there! I'd only expect the serial number to update one time. Is it possible the zone isn't syncing to the cluster properly immediately after you make the change?
In WHM >> Tweak Settings, under the Logging tab, I'd recommend enabling the "Enable verbose logging of DNS zone syncing" option if you haven't already. This will write additional log data to /usr/local/cpanel/logs/dnsadmin_log with every sync of the zone, so you'll be able to get information on why that isn't syncing properly.
You can also try running this command to ensure the integrity of the zone file on the local system after a change:
named-checkzone domain.com /var/named/domain.com.db
Just update both entries of "domain.com" to be the actual domain you're working with.
Between those two things, I'd expect you to find *something* relevant that exposes the issue.
-
Thanks cPRex! So far, the logging hasn't revealed anything. However, in addition to enabling the Verbose Logging via Tweak Settings, I saw a reference to enabling similar Debug Logging via the DNS Cluster Sync config. Enabling this on the Server where the account is hosted had no effect and showed nothing useful in the log, but since enabling it on the Cluster Master, I've not had a reoccurrence of the issue.
I'm unsure whether the process of Editing/Saving the Sync config resolved the issue for now or whether it's just a coincidence. Before I edited the Cluster Sync config to enable Debug Logging, everything was green tick & syncing successfully, so I'm really not sure.
If it's resolved it for now, sweet. I shall see if it lasts or reappears in the coming days/months. Thanks again for the assistance! At least I have something to review if it does reoccur. If it does and I manage to find the reason, I'll be sure to share it on here to hopefully help others. Cheers!
-
I'm glad to hear things are working, even if we don't have a good explanation. But yes, hopefully the additional logging will show something if this happens again in the future.
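If the issue does come back, the two checks suggested earlier in the thread (the zone-file integrity check and the verbose sync log) could be scripted so they run the moment the outage is noticed. This is a minimal sketch and not a cPanel tool: the function names are hypothetical, the log format is not assumed (lines are matched by plain substring only), and the paths are the ones quoted in the first reply.

```python
import subprocess

# Log path as given in the first reply of this thread.
DNSADMIN_LOG = "/usr/local/cpanel/logs/dnsadmin_log"


def checkzone_command(domain, zone_dir="/var/named"):
    """Build the named-checkzone command for the domain's local zone file."""
    return ["named-checkzone", domain, f"{zone_dir}/{domain}.db"]


def zone_is_valid(domain):
    """Run named-checkzone; a zero exit status means the zone loaded cleanly."""
    result = subprocess.run(
        checkzone_command(domain), capture_output=True, text=True
    )
    return result.returncode == 0, result.stdout + result.stderr


def lines_mentioning(lines, domain):
    """Keep only the log lines that mention the domain (case-insensitive)."""
    needle = domain.lower()
    return [line.rstrip("\n") for line in lines if needle in line.lower()]
```

For example, opening `DNSADMIN_LOG` and passing the file object to `lines_mentioning(fh, "example.com")` (with the real domain) would narrow the verbose log down to the sync activity for the one affected zone.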