DNS Cluster "Could not communicate with remote API server"
Hi All
I have a DNS Cluster of 3 DNS Only servers with 6 Web servers using Write-Only connections to each of the DNS Only servers.
I have had an ongoing issue when manually running 'Synchronize DNS Records' against the cluster: errors show for one or more of the DNS servers. This seems to occur with any of the web servers, without any discernible pattern. Any of the web servers may show any of the name servers as 'unknown', but never all of them; there is always at least one name server showing OK.
If I select another page in the WHM console and return, the error message is gone and all 3 of the DNS servers are listed as connected and OK.
I am not clear whether this should be expected when selecting the manual Synchronize DNS Records from the WHM menu, or whether I actually have an API or token issue.
Thoughts?
Thanks.
-
Hi, I should add that most of the web servers run CloudLinux 7.8 with v88.0.13 or a more recent release, and the name servers were recently updated. 0 -
Hmm... sounds odd. Are you running NAT, or are your servers on different VLANs? 0 -
Hi, no NAT. Public IPs. Geographical separation. All Xen VMs, although one of the WHM servers is on the same physical box as one of the name servers. 0 -
Hm - I don't have a good explanation for your issue. It's a bit odd that it's random; otherwise I would say that it's network or firewall related. I suggest you open a support ticket (if you haven't already done so). 0 -
As far as I know, the APIs use the cPanel default ports to communicate (2083, 2086, etc.). Are you able to telnet to these ports from one server to another? Do you have cPHulk or host access restrictions in place? These could explain the connection issues you see. 0 -
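For anyone who wants to script the telnet-style port check suggested above, here is a minimal sketch in Python. It only tests whether a TCP connection can be opened; it says nothing about API tokens or trust. The hostname in the usage comment is a placeholder, and the port list is an assumption based on the ports named in this thread (2087 is added as the usual WHM SSL port).

```python
import socket


def check_ports(host, ports, timeout=5.0):
    """Return {port: bool}, True when a TCP connection to host:port opens."""
    results = {}
    for port in ports:
        try:
            # create_connection resolves the name and performs the TCP handshake.
            with socket.create_connection((host, port), timeout=timeout):
                results[port] = True
        except OSError:
            # Covers refused connections, timeouts, and DNS resolution failures.
            results[port] = False
    return results


# Example usage (placeholder hostname, assumed port list):
#   for port, ok in check_ports("ns1.example.com", [2083, 2086, 2087]).items():
#       print(port, "open" if ok else "unreachable")
```

Run it from each web server against each name server, and the other way round, since a firewall rule may only block one direction.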
Can you please open a ticket using the link in my signature? Once open please reply with the Ticket ID here so that we can update this thread with the resolution once the ticket is resolved. Thanks! 0 -
Hi @cPanelLauren Ticket 93514404 0 -
Hi Andrew, thanks for the input. I'd expect that if it were a port or firewall issue, the port would be either open or closed, not intermittent. I can trigger the sync, get the error, check the DNS cluster page to see an outage, screenshot it, refresh it, and have all servers connected. It looks more like a timing issue. 0 -
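One way to test the "timing issue" theory is to sample the connection repeatedly rather than once, since a single successful telnet proves little about an intermittent problem. The sketch below is only an illustration: the function names, the one-second "slow" threshold, and the sampling interval are all assumptions, not anything cPanel provides.

```python
import socket
import time


def sample_connect_times(host, port, attempts=20, pause=1.0, timeout=10.0):
    """Time repeated TCP connects; a failed attempt is recorded as None."""
    samples = []
    for i in range(attempts):
        start = time.monotonic()
        try:
            with socket.create_connection((host, port), timeout=timeout):
                samples.append(time.monotonic() - start)
        except OSError:
            samples.append(None)
        if i < attempts - 1:
            time.sleep(pause)
    return samples


def summarize(samples, slow_after=1.0):
    """Count failed and slow attempts and report the worst successful time."""
    ok = [s for s in samples if s is not None]
    return {
        "attempts": len(samples),
        "failed": len(samples) - len(ok),
        "slow": sum(1 for s in ok if s > slow_after),
        "worst": max(ok) if ok else None,
    }


# Example usage (placeholder hostname):
#   print(summarize(sample_connect_times("ns1.example.com", 2087)))
```

If a run shows occasional failures or multi-second connects while most attempts are fast, that points at latency or packet loss rather than a closed port.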
I'm having this issue as well. I followed the instructions for setting up a separate DNS Only server and added it to the cluster. Everything worked fine, and I got a green light for the first server, just like the OP did in the above screenshot. I used Terraform to set up my first server, then used the same Terraform plan to spin up an exact replica and configured its hostname and IP address. When I added the second server, I got an intermediate page saying that everything was working: synchronization was good and reverse trust was good. But when I got back to the clusters page, I got the same error for the second server. I rebuilt the server manually in case my Terraform plan did something wonky. No go; no matter what I do, the error is always displayed. I can telnet to and from each server, and I can reach all the recommended ports per the documentation. Just like the OP, the only difference between these two servers is that they are in geographically separate datacenters. I have not tested whether that is the cause, but putting them in the same data center would defeat the purpose of having reliable DNS servers. I see that a support ticket was submitted above. Was this issue resolved? If so, can you post the results or how to resolve it? Thanks! 0 -
I'm seeing the same problem on two nameservers that recently got upgraded to cPanel v94 DNS ONLY. Whenever I "edit" one server I get: "DNS Cluster Management. The Trust Relationship has been established. The remote server, ns1.domain.com, is running WHM version: 10.0.0. The new role for IP ADDRESS is sync. Return to Cluster Status"
But when I go back to cluster status I see: "Could not communicate with remote API server." Anyone else having the same issue? 0 -
@blue928 - in the ticket that was opened, we discovered intermittent network issues, although that customer did not write back to say what the official resolution was. It sounds like you may be experiencing the following interface error, which you can ignore: cPanel. Can you check that and see if that is the case? The same would apply for @andrewmoras. 0 -
According to that article, it's fixed in 96.0.0 and later. So I'm confused about why I'm seeing that error in 96.0.9. Edit: My error is slightly different; it's not intermittent. 0 -
Hello DoghouseAgency! That's certainly odd, especially if the issue is not intermittent. The originally reported issue should have been resolved; however, as you're still experiencing it on a build that already has the fix applied, it would be best to open a support ticket so that our analysts can review it more thoroughly and determine exactly what is occurring. You can submit a support request using the "Submit a ticket" link in my signature below. Please be sure to link this thread when opening the ticket and provide the ticket number here so we can track the issue properly. If our analysts help you resolve the issue, please post the resolution here, as it may help other community members with similar issues. I hope that this helps. If you have any other questions or concerns, please let us know! 0 -
I'm running into a similar issue here. It's happening with one particular nameserver running cPanel DNSOnly on old Dell hardware with XenServer/XCP. We even reinstalled DNSOnly on AlmaLinux 8, but no luck; we get the same "Could not communicate with remote API server" error. Reverse trust was established. Public IPs, different geo zones. No VLANs. The DNS zones sync without issues. Telnet is fine between both. One important note: it's the only VM sitting on the server. It was very slow when it was on CentOS 7, and it's very slow on AlmaLinux 8. No hardware issues reported by OMSA. We'll be replacing the server sometime next week and importing the VM. My gut says it's the server itself. I'll post my findings then. 0 -
Hi All, for what it's worth, I am still experiencing this same issue a year later, with no resolution from my upstream service provider, who have supposedly engaged with cPanel support. I am now in the habit of running the manual Synchronize DNS Records process, getting the error message, and refreshing the cluster links once, twice, or three times, depending on the mood of the server(s). When I finally get a clear indication that the sync will work, I sync the DNS servers. 24 hours later, rinse and repeat. This never happened with the old DNS server daemon; it has only happened since using PowerDNS. So it could be a WHM/cPanel issue introduced around the same time as PowerDNS was added as the preferred tool, or it could be coincidence, but Rule 39: There is no such thing as a coincidence. At the time of my original post, I could not find anyone with the same issue. Obviously, I am no longer Robinson Crusoe. There is an underlying issue that has not been resolved, and it is highly unlikely that, across multiple ISPs, multiple IaaS providers, and multiple geographical locations, we all have a network issue. Over the year since reporting this, my upstream support provided these gems in the ticket thread: [QUOTE] "We are still getting the API related errors on nameservers for your server, to resolve and fix the root cause of intermittent issues related to DNS synchronization. We will keep you posted about the same."
[QUOTE] "This behavior occurs due to the request for information from cluster members timing out; The timeout is 7 seconds but often it takes longer to read packages on a remote server. The screenshot provided in the document (initial reply) is currently running cPanel version 90. This issue was recently resolved in cPanel version 96. You can mostly ignore this error as the DNS cluster does continue to function even though the error appear or upgrade the cPanel version to 96 for DNS servers. "
At which point I upgraded to the non-stable .96 release and 24 hours later, I responded to the thread with: [QUOTE]"Our 3 servers were already upgraded to Cpanel 96 and it is not 'fixed'. If you check two of our servers now you will see the error again on the Sync page. The other server is apparently not affected today but that is consistent in the random manner that this issue arises. This is an ongoing issue that I cannot "mostly ignore" as it means that our DNS cluster is NOT WORKING from the perspective that if it is not sync'ing it is not working. The 3 DNS Only servers will continue to function but with out-of-date records. If it is a time-out issue, it needs to be fixed. If it is a comms issue, it needs to be fixed. Whatever it is it needs to be fixed and it is not fixed, and I cannot ignore it. "
That was back in May 2021. I have had nothing further from either my provider or cPanel. YMMV. Cheers. 0 -
Update: We swapped out the server today and all is working now. We exported the VM from the old server, and imported exactly as it was on the new server. Identical VM, new hardware. Problem solved. It must have been hardware or network related for us because the old server was running very slowly. SSH was slow to respond and it was the only VM on the server. Perhaps there is a new network timeout value for the dns cluster in a new cPanel update? The issue existed with both BIND and PowerDNS. Hardware swap fixed it all. 0 -
@thowden - have you submitted a ticket directly to our support team so we can check this out? While I don't have any reports of that exact issue that I'm aware of, we'd be happy to take a look at things. 0 -
It turns out this is caused by a 7-second timeout. If your DNS server does not respond to the API call within 7 seconds, it will time out. For some reason my cPanel DNS Only servers respond much more slowly than our full cPanel servers, even though they have less CPU use and memory pressure. Support created this KB, which is being updated, regarding this issue. cPanel is investigating it in CPANEL-38426, AFAIK. 0 -
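To see whether a name server is near the 7-second limit described above, you can time an authenticated WHM API call from the web server. This is a rough sketch under several assumptions: the thread does not say which API call the cluster status page makes, so the WHM API 1 `version` function is used here purely as a stand-in; the hostname, user, and token are placeholders; and a server with a self-signed certificate would need extra TLS handling that is not shown.

```python
import json
import time
import urllib.request

# The timeout figure quoted in this thread, in seconds.
CLUSTER_TIMEOUT = 7.0


def time_whm_call(host, user, token, timeout=30.0):
    """Time one WHM API 1 'version' call; return (elapsed_seconds, parsed_json)."""
    url = f"https://{host}:2087/json-api/version?api.version=1"
    request = urllib.request.Request(
        url, headers={"Authorization": f"whm {user}:{token}"}
    )
    start = time.monotonic()
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.load(response)
    return time.monotonic() - start, body


def would_time_out(elapsed, limit=CLUSTER_TIMEOUT):
    """True when a measured response time exceeds the cluster limit."""
    return elapsed > limit


# Example usage (placeholder values):
#   elapsed, _ = time_whm_call("ns1.example.com", "root", "API_TOKEN")
#   print(f"{elapsed:.2f}s", "TOO SLOW" if would_time_out(elapsed) else "ok")
```

The client-side timeout is deliberately longer than 7 seconds so that a slow response is measured rather than cut off. Running it a few times against each name server should show whether the slow ones are the same ones that appear as unreachable in the cluster page.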
For me the problem went away by updating cPanel/WHM initially. I'm on v98 currently and everything is working properly on both the cluster and the shared machine. I believe you've already upgraded cPanel/WHM to see if it makes a difference? Thanks, Andrew 0 -
@Host1no - thanks for posting that! 0 -
Hi @cPRex, over the last 3 months I have been working on migrating to a new infrastructure, with the expectation that a 'new clean shiny' environment would be working... Over the last few weeks I have been configuring 4 new DNS Only servers to provide my cluster, replacing the current cluster, which is getting old.
I logged a ticket today, reference #94365981, if you want to review the gory details. @Host1no Thanks for the API timing test! For other interested parties, the short version is: the old hosting was not working, with the error/warning "Could not communicate with remote API Server" in DNS Cluster Management. I was planning a major migration, so I ignored it and created my new shiny toys. I started with a single CentOS server, added cPanel DNS Only, and configured to taste. I cloned server #1 three more times as ns1, ns2, and ns3, keeping the source server as ns4. You might expect that any issue would affect all the servers; they are clones, after all. But no. Only one server gives the API error, and that is the original source server! While I am assured that the cluster is working and the zone copies/transfers are all OK, I hate seeing red ink on a system that is meant to be at the very core of the services we provide. So now I will wait for a resolution. 0 -
Update - it looks to be an issue with the resolvers configured on the system, as things started working well after that configuration was updated. 0 -
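Since the fix above came down to the resolvers configured on the system, a quick sanity check is to list what is in `/etc/resolv.conf` and time a lookup through the system resolver. The sketch below is an illustration only; the function names are made up for this example, and it assumes a standard resolv.conf layout.

```python
import socket
import time


def parse_nameservers(resolv_conf_text):
    """Return the addresses from 'nameserver' lines, skipping comment lines."""
    servers = []
    for line in resolv_conf_text.splitlines():
        line = line.strip()
        if not line or line[0] in "#;":
            continue
        parts = line.split()
        if len(parts) >= 2 and parts[0] == "nameserver":
            servers.append(parts[1])
    return servers


def time_lookup(hostname):
    """Time one lookup through the system resolver; None if it fails."""
    start = time.monotonic()
    try:
        socket.getaddrinfo(hostname, None)
    except socket.gaierror:
        return None
    return time.monotonic() - start


# Example usage (placeholder hostname):
#   with open("/etc/resolv.conf") as fh:
#       print(parse_nameservers(fh.read()))
#   print(time_lookup("ns1.example.com"))
```

A dead or slow first resolver can add several seconds to every outbound connection, which would be enough on its own to push an API call past a short timeout.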
We're still seeing this issue too. Already created a ticket for it but was told to ignore it and that DNS still synchronized correctly. 0 -
Hello! This sounds accurate; however, I can take a look if you provide the ticket ID. 0 -
Hi All, just to confirm that after sorting out the resolver error on one of the servers, everything has been running fine. DNS is write-only from my hosting servers out to the name servers, and they all respond in a timely fashion when forcing a DNS sync. So I will say my issue was simply that the older servers were just that: old. They had been in place for over 3 years, and with all the potential patching, updates, fiddle factor, etc., we just needed to rebuild from scratch, and that was easier on shiny new toys. Thanks for all the input. 0