Main Site Broke
Without warning, our main site, domain.com, stopped working. I quickly figured out that the file /var/named/domain.com.db was missing from both of our DNS servers, and so was the reference to it in /etc/named.conf.
I'm not concerned right now with why this happened. I just want to get it working again.
I recreated the /var/named/domain.com.db file as best as I could from memory:
; Zone file for domain.com
$TTL 14400
domain.com. 86400 IN SOA ns1.domain.com. ashoat.gmail.com. (
2013011109 ;Serial Number
43200 ;refresh
7200 ;retry
1209600 ;expire
86400 ;minimum
)
domain.com. 86400 IN NS ns1.domain.com.
domain.com. 86400 IN NS ns2.domain.com.
domain.com. 14400 IN A 64.62.211.132
localhost 14400 IN A 127.0.0.1
ns1 14400 IN A 65.19.143.3
ns2 14400 IN A 64.62.211.133
domain.com. 14400 IN MX 0 domain.com.
www 14400 IN CNAME domain.com.
and added this to the /etc/named.conf:
zone "domain.com" {
    type master;
    file "/var/named/domain.com.db";
};
When I dig from the server that is running ns1.domain.com I get this
# dig @ns1.domain.com domain.com
; (1 server found)
;; global options: printcmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 62596
;; flags: qr aa rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 2, ADDITIONAL: 2
;; QUESTION SECTION:
;domain.com. IN A
;; ANSWER SECTION:
domain.com. 14400 IN A 64.62.211.132
;; AUTHORITY SECTION:
domain.com. 86400 IN NS ns2.domain.com.
domain.com. 86400 IN NS ns1.domain.com.
;; ADDITIONAL SECTION:
ns1.domain.com. 14400 IN A 65.19.143.3
ns2.domain.com. 14400 IN A 64.62.211.133
;; Query time: 17 msec
;; SERVER: 65.19.143.3#53(65.19.143.3)
;; WHEN: Sun Jul 12 12:52:19 2015
;; MSG SIZE rcvd: 115
Which looks right to me, but if I dig from somewhere else I get a different response
# dig @ns1.domain.com domain.com
; (1 server found)
;; global options: printcmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 30096
;; flags: qr rd; QUERY: 1, ANSWER: 0, AUTHORITY: 6, ADDITIONAL: 0
;; QUESTION SECTION:
;domain.com. IN A
;; AUTHORITY SECTION:
org. 80213 IN NS d0.org.afilias-nst.org.
org. 80213 IN NS a0.org.afilias-nst.info.
org. 80213 IN NS a2.org.afilias-nst.info.
org. 80213 IN NS b0.org.afilias-nst.org.
org. 80213 IN NS b2.org.afilias-nst.org.
org. 80213 IN NS c0.org.afilias-nst.info.
;; Query time: 1 msec
;; SERVER: 65.19.143.3#53(65.19.143.3)
;; WHEN: Sun Jul 12 12:55:33 2015
;; MSG SIZE rcvd: 169
or it can't find ns1.domain.com because my DNS server isn't working right, but returns the above when queried by IP
# dig @65.19.143.3 domain.com
Any help? What should I try next?
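One difference between the two responses above is easy to miss: the local answer carries the `aa` (authoritative answer) flag, while the remote one has only `qr rd` and returns a referral. As a rough sketch, a pair of small shell helpers can pull the flags out of saved dig output so the two vantage points can be compared side by side. The function names `dig_flags` and `is_authoritative` are made up for this example and are not part of dig or BIND.

```shell
#!/bin/sh
# Hypothetical helpers (names invented for illustration).

# Print the flag list from the ";; flags:" header line of dig output on stdin.
dig_flags() {
    sed -n 's/^;; flags: \([^;]*\);.*/\1/p'
}

# Exit 0 if the dig output on stdin has the "aa" (authoritative answer) flag.
is_authoritative() {
    dig_flags | tr ' ' '\n' | grep -qx 'aa'
}

# Example use (needs network access, so it is left commented out):
#   dig +norecurse @65.19.143.3 domain.com | is_authoritative \
#       && echo "authoritative" || echo "NOT authoritative"
```

Running the same check from the server itself and from an outside host shows whether named considers itself authoritative for the zone in both cases, which is a narrower question than whether the server is reachable.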
intoDNS (intodns.com/domain.org) says that 65.19.143.3 doesn't have GLUE. Could that be the cause?
ns1 is lame and ns2 is not responding. You have got bigger issues than messed-up zone files.
Thanks for the response! If you check dig @65.143.19.3 krydos.domain.org or dig @64.62.211.133 krydos.domain.org, they are both working correctly, and every other domain is working too. The only thing I've found that isn't working is domain.org and the two nameservers in that same zone. Is there any reason that ns2 wouldn't respond for certain domains? What exactly do you mean by lame? Any suggestions on what I should try next?
Not for me:
dig @65.143.19.3
; <<>> DiG 9.8.2rc1-RedHat-9.8.2-0.30.rc1.el6_6.2 <<>> @65.143.19.3
; (1 server found)
;; global options: +cmd
;; connection timed out; no servers could be reached
A lame DNS server is one that does not provide authoritative data for a domain that designates that server as authoritative for it. So since 65.143.19.3 is dead, the 2 DNS servers are not syncing. If the other domains are still working, they are running solely on the 1 working name server.
That's really odd, because every IP address I dig from, and every web dig interface I can find, shows it working:
; <<>> DiG 9.8.2rc1-RedHat-9.8.2-0.30.rc1.el6_6.3 <<>> @65.19.143.3 ANY
; (1 server found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 7876
;; flags: qr rd; QUERY: 1, ANSWER: 13, AUTHORITY: 0, ADDITIONAL: 13
;; WARNING: recursion requested but not available
;; QUESTION SECTION:
;. IN NS
;; ANSWER SECTION:
. 461967 IN NS e.root-servers.net.
. 461967 IN NS f.root-servers.net.
. 461967 IN NS g.root-servers.net.
. 461967 IN NS h.root-servers.net.
. 461967 IN NS i.root-servers.net.
. 461967 IN NS j.root-servers.net.
. 461967 IN NS k.root-servers.net.
. 461967 IN NS l.root-servers.net.
. 461967 IN NS m.root-servers.net.
. 461967 IN NS a.root-servers.net.
. 461967 IN NS b.root-servers.net.
. 461967 IN NS c.root-servers.net.
. 461967 IN NS d.root-servers.net.
;; ADDITIONAL SECTION:
a.root-servers.net. 461967 IN A 198.41.0.4
a.root-servers.net. 461967 IN AAAA 2001:503:ba3e::2:30
b.root-servers.net. 461967 IN A 192.228.79.201
b.root-servers.net. 461967 IN AAAA 2001:500:84::b
c.root-servers.net. 461967 IN A 192.33.4.12
c.root-servers.net. 461967 IN AAAA 2001:500:2::c
d.root-servers.net. 461967 IN A 199.7.91.13
d.root-servers.net. 461967 IN AAAA 2001:500:2d::d
e.root-servers.net. 461967 IN A 192.203.230.10
f.root-servers.net. 461967 IN A 192.5.5.241
f.root-servers.net. 461967 IN AAAA 2001:500:2f::f
g.root-servers.net. 461967 IN A 192.112.36.4
h.root-servers.net. 461967 IN A 128.63.2.53
;; Query time: 52 msec
;; SERVER: 65.19.143.3#53(65.19.143.3)
;; WHEN: Tue Jul 14 05:57:53 2015
;; MSG SIZE rcvd: 496
I don't see why my nameserver would refuse to answer just you.
Not just me. Both of our data centers (separate networks), my ISP, intoDNS, and DNS stuff are all reporting that your 65.143.19.3 is down.
So, why would a nameserver give one (correct) response when queried locally and a different response when queried remotely?
Because your network is dead:
Ping 65.143.19.3
Timed out
Timed out
Timed out
Timed out
I really do appreciate your trying to help me, but 65.19.143.3 responds to my pings.
.19.143.3 responds to ping from my end too. Check with the DC if there is a routing issue.
Yeah, it's odd that the server responds to most people's pings, but not all. Back to my original question: does anyone have any ideas why it would respond correctly locally?
# dig @127.0.0.1 domain.org
; (1 server found)
;; global options: printcmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 39242
;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 2, ADDITIONAL: 0
;; QUESTION SECTION:
;domain.org. IN A
;; ANSWER SECTION:
domain.org. 14400 IN A 64.62.211.132
;; AUTHORITY SECTION:
domain.org. 86400 IN NS ns1.domain.org.
domain.org. 86400 IN NS ns2.domain.org.
;; Query time: 17 msec
;; SERVER: 127.0.0.1#53(127.0.0.1)
;; WHEN: Wed Jul 15 07:31:52 2015
;; MSG SIZE rcvd: 83
and respond with missing data remotely?
# dig @65.19.143.3 domain.org
; (1 server found)
;; global options: printcmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 12735
;; flags: qr rd; QUERY: 1, ANSWER: 0, AUTHORITY: 2, ADDITIONAL: 1
;; QUESTION SECTION:
;domain.org. IN A
;; AUTHORITY SECTION:
domain.org. 13851 IN NS ns1.domain.org.
domain.org. 13851 IN NS ns2.domain.org.
;; ADDITIONAL SECTION:
ns2.domain.org. 6557 IN A 64.62.211.133
;; Query time: 40 msec
;; SERVER: 65.19.143.3#53(65.19.143.3)
;; WHEN: Wed Jul 15 07:34:45 2015
;; MSG SIZE rcvd: 83
Everything is set up exactly the same on ns2, and it responds with the same data locally and remotely. ns1 and ns2 both correctly return all of the thousands of other zones they are authoritative for. The only one that doesn't work is this particular one. Any wild ideas of things to try are welcome. Thanks!
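One possibility that nobody in this thread confirmed: BIND's `view` blocks let named serve different zone data depending on the client's address, so a zone declared in only one view could be answered for local queries and not for remote ones. Assuming the config uses views at all (the thread never shows the full named.conf, so this is a guess), a rough check is to count the view blocks and compare that with how many times the zone is declared. The function names below are invented for this sketch.

```shell
#!/bin/sh
# Hypothetical helpers (names invented for illustration).

# Count `view "..."` blocks in a named.conf-style file.
# $1 = path to the config file
count_views() {
    grep -cE '^[[:space:]]*view[[:space:]]+"' "$1" || true
}

# Count how many times a zone is declared in the file.
# $1 = path to the config file, $2 = zone name (matched literally)
count_zone_decls() {
    grep -E '^[[:space:]]*zone[[:space:]]+"' "$1" | grep -cF "\"$2\"" || true
}

# Example use on the server (left commented out):
#   echo "views: $(count_views /etc/named.conf)"
#   echo "declarations: $(count_zone_decls /etc/named.conf domain.org)"
```

If the file has several views but the zone shows up fewer times than that, it would be worth looking at which view the manually added stanza landed in. If there are no views at all, this idea can be ruled out.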
Hello :)
Try rebuilding the DNS configuration via:
mv /etc/named.conf /etc/named.conf.old
/scripts/rebuilddnsconfig
Also, you may need to delete and re-create the zone through WHM instead of the command line.
Thank you.
Thanks for the suggestion. Yeah, I had tried that command to rebuild the named.conf several times before I even posted on these forums. All of the conf checkers reported no errors too.
I'm happy to announce that I got it fixed though! However, I still have no idea what exactly was broken in the first place. What I ended up doing was just going nuts and deleting EVERYTHING that had any reference to domain.org. I completely flushed all of the cache from both servers. I restarted named on both servers several times and tried everything I could think of to make sure there were no traces left anywhere. Then, once I was reasonably sure everything was cleaned out completely, I started over and rebuilt everything from the ground up.
Oddly enough, the working /var/named/domain.org.db has the exact same permissions, the exact same ownership, and the exact same contents, but now it works. It makes me think maybe something wonky was in there somewhere conflicting with the correct information I had. Who knows. I wish I could be more help to anyone who has a similar problem in the future and finds this thread, but that's all I've got. Thanks for everyone's help.
Hello :)
It's possible the zone serial needed to be incremented if the additional name servers did not detect the change in the zone. Regardless, I am happy to see the issue is now resolved. Thank you for updating us with the outcome.
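For anyone landing here with a hand-recreated zone file, here is a minimal sketch of bumping the serial. It assumes the common YYYYMMDDNN convention, which the serial in the original post (2013011109) appears to follow; BIND itself only needs the number to increase. The function name `next_serial` is made up for this example.

```shell
#!/bin/sh
# Hypothetical helper (name invented for illustration).
# Print the next serial, assuming the YYYYMMDDNN convention.
# $1 = current serial, $2 = today's date as YYYYMMDD (defaults to the system date)
next_serial() {
    current=$1
    today=${2:-$(date +%Y%m%d)}
    candidate="${today}00"
    if [ "$current" -lt "$candidate" ]; then
        # Serial is from an earlier day: start today's sequence at 00.
        echo "$candidate"
    else
        # Already edited today (or serial is ahead): just add one.
        echo $((current + 1))
    fi
}

# Example: next_serial 2013011109 20150712
```

After editing the serial in the zone file, the zone still has to be reloaded (for example with `rndc reload`) before secondaries can notice the change. Note that more than 99 bumps in one day would spill over into the date digits, which is harmless to BIND but breaks the convention.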