
Very slow smtp response from one service provider

Comments

17 comments

  • Lyttek
    btw, rfc1413_query_timeout = 0
    0
  • cPanelMichael
    Hello :) Have the users reported this issue to their ISP to see if it's an issue with their connection? You may want to enable Exim on an additional port and have the users try that port for sending. You can configure Exim to run on an additional port via: "WHM Home » Service Configuration » Service Manager" Thank you.
    0
  • SamTheMan
    I am having this problem with five different offices full of PCs (so far) in Kansas City, using five different cPanel servers. All of them are on Time Warner "residential" connections -- this doesn't seem to be an issue for "business class" customers. The problem started yesterday around midday, and none of the customers changed anything on their end. Of course TW hotly denies they did anything, they're not responsible, etc. The Tier 3 tech I talked to yesterday (for more than an hour) told me there was nothing wrong and nothing he could do, and said I should tell my customer to call Comcast!
    What I'm seeing is this: any connection to an SMTP port takes a little more than a minute for the greeting banner to appear. The port number doesn't matter -- 25, 587, 2525, even 465 (SSL) all behave the same. For customers using Outlook, the default timeout is 60 seconds, so no messages can be delivered. Increasing the timeout to 3-4 minutes works around this issue (messages are delivered, but each one takes a minute to go).
    I've tested this repeatedly from the command prompt; it's not an Outlook issue. I've also disabled firewalls, antivirus, etc. on the PCs with no effect. Bypassing the router (CAT5 straight into the TW modem) changes nothing. However, changing a laptop from its TW connection to a Verizon wifi fixes it immediately.
    ALSO very important -- this only seems to be a problem with cPanel servers. When I test connections from TW to Gmail, Yahoo or other 3rd party servers, there are no problems at all. But all 5 different cPanel servers I've tested are having this problem.
    I think it's some kind of timeout, but I can't figure out what. My servers all have "rfc1413_query_timeout" set to 0, and I've even configured the iptables firewalls to block IDENT requests with an immediate "connection refused"; no change. I've tried disabling all the RBLs in exim; no change. This doesn't feel like a DNS timeout to me. I'd welcome any suggestions at all.
    My customers are SCREAMING for a fix to this, so I'm really under the gun.
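    For anyone who wants to reproduce the banner-delay measurement described above without a mail client, here is a minimal Python sketch. It is not what was actually run in this thread; the function name time_smtp_banner and the host mail.example.com are placeholders, and it only covers plain-text ports (465 wraps the session in SSL, so it would need a TLS handshake first).

```python
import socket
import time


def time_smtp_banner(host, port=25, timeout=180.0):
    """Connect to host:port and return (seconds_waited, first_banner_line).

    The timer covers the TCP connect plus the wait for the server's
    greeting, which is where the delay described above shows up.
    """
    start = time.monotonic()
    with socket.create_connection((host, port), timeout=timeout) as sock:
        data = b""
        # Read until one complete line arrives; an SMTP greeting starts with "220".
        while b"\n" not in data:
            chunk = sock.recv(1024)
            if not chunk:  # server closed the connection without a full line
                break
            data += chunk
    elapsed = time.monotonic() - start
    first_line = data.split(b"\n", 1)[0]
    return elapsed, first_line.decode("ascii", errors="replace").strip()


# Example use (placeholder host; substitute the affected server):
#   for port in (25, 587, 2525):
#       secs, banner = time_smtp_banner("mail.example.com", port)
#       print(f"port {port}: {secs:.1f}s  {banner}")
```

    Running it once from the affected connection and once from another ISP should show whether the wait is tied to the client's network rather than to the mail program.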
    0
  • SamTheMan
    New clue -- after some debugging with exim, it looks like the Big Delay is in DNS. When exim looks up reverse DNS for the client's IP address (w.x.y.z), it comes back with something like cpe-W-X-Y-Z.kc.res.rr.com. Then exim tries to do a forward lookup on that name to find its IP address, and the query times out several times (for a total of over one minute delay). Looks like TW changed their internal network so the nameserver for kc.res.rr.com is unreachable from the outside world. That server is named device-dns1.rr.com AND device-dns2.rr.com with IP 65.24.6.70. I'm attempting to configure named to hijack the kc.res.rr.com zone (for my server only) so it'll return fast results for those queries. I'll report back if I get it working.
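    The two lookups described above (reverse on the client IP, then forward on the name that comes back) can be timed separately with a short Python sketch. This goes through the system resolver rather than Exim's own code, so the timings are only indicative; check_reverse_then_forward and timed are made-up helper names, and the resolver functions are passed in as parameters so they can be swapped out.

```python
import socket
import time


def timed(fn, *args):
    """Call fn(*args); return (result_or_exception, seconds_taken)."""
    start = time.monotonic()
    try:
        result = fn(*args)
    except OSError as exc:  # socket.herror and socket.gaierror are OSError subclasses
        result = exc
    return result, time.monotonic() - start


def check_reverse_then_forward(ip, reverse=socket.gethostbyaddr,
                               forward=socket.gethostbyname_ex):
    """Do a reverse lookup on ip, then a forward lookup on the returned name."""
    report = {"ip": ip}
    rev, report["reverse_secs"] = timed(reverse, ip)
    if isinstance(rev, Exception):
        report["error"] = f"reverse lookup failed: {rev}"
        return report
    name = rev[0]  # primary hostname returned for the address
    report["name"] = name
    fwd, report["forward_secs"] = timed(forward, name)
    if isinstance(fwd, Exception):
        report["error"] = f"forward lookup failed: {fwd}"
        return report
    # The name is "confirmed" if it resolves back to the original address.
    report["confirmed"] = ip in fwd[2]
    return report


# Example use, run on the mail server with a client's address:
#   print(check_reverse_then_forward("w.x.y.z"))   # replace with a real IP
```

    A large forward_secs next to a small reverse_secs would match the pattern described in this post.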
    0
  • SamTheMan
    FIXED! Hijacking the DNS zone fixes the problem. In my case, the zone is kc.res.rr.com, but I'm sure there are others...
    To fix it, log in as root through SSH. If you're not comfortable editing config files through SSH, STOP and DO NOT continue.
    Create a file named /var/named/kc.res.rr.com.db (or whatever zone is giving you problems). Put this in the file (again, change kc.res.rr.com as needed; the record lines must start in the first column of the file):
        $TTL 300
        kc.res.rr.com. 86400 IN SOA ns.example.com. samtheman.example.com. (
                2014101401 ;Serial Number
                300 ;refresh
                300 ;retry
                300 ;expire
                300 ;minimum
                )
        kc.res.rr.com. 300 IN NS ns.example.com.
        kc.res.rr.com. 300 IN NS ns2.example.com.
        *.kc.res.rr.com. 300 IN A 127.0.0.1
    Change ownership on that file:
        chown named:named /var/named/kc.res.rr.com.db
    Then edit your /etc/named.conf file. Find the section that begins with:
        view "localhost_resolver"
    Within that section, just below the "recursion" command, insert these lines:
        zone "kc.res.rr.com" {
            type master;
            file "/var/named/kc.res.rr.com.db";
        };
    Also insert those same lines in the section that begins:
        view "internal"
    Restart named (signaling it doesn't seem to be enough):
        /etc/rc.d/init.d/named restart
    That should do it. In your new-found free time, call TW and tell them how they're all a bunch of lying sacks of crap.
    Forgot to mention -- this only works if your cPanel server is using itself as the only nameserver. Check your /etc/resolv.conf!
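    The /etc/resolv.conf condition mentioned at the end of this post can be checked with a few lines of Python. This is only a sketch under stated assumptions: parse_nameservers and uses_only_itself are made-up names, "itself" is taken to mean a loopback address or one of the addresses you pass in as own_addresses, and a file with no nameserver lines is treated as local because the resolver then defaults to the local machine.

```python
import ipaddress


def parse_nameservers(resolv_conf_text):
    """Return the addresses on 'nameserver' lines, in file order."""
    servers = []
    for line in resolv_conf_text.splitlines():
        # Drop comments, which start with '#' or ';'.
        line = line.split("#", 1)[0].split(";", 1)[0].strip()
        parts = line.split()
        if len(parts) >= 2 and parts[0] == "nameserver":
            servers.append(parts[1])
    return servers


def uses_only_itself(resolv_conf_text, own_addresses=()):
    """True if every listed nameserver is loopback or one of own_addresses."""
    servers = parse_nameservers(resolv_conf_text)
    if not servers:
        # With no nameserver entries the resolver falls back to the local machine.
        return True
    own = {ipaddress.ip_address(a) for a in own_addresses}
    for server in servers:
        try:
            addr = ipaddress.ip_address(server)
        except ValueError:
            return False  # unparseable entry: do not assume it is local
        if not (addr.is_loopback or addr in own):
            return False
    return True


# Example use on the server itself:
#   with open("/etc/resolv.conf") as fh:
#       print(uses_only_itself(fh.read(), own_addresses=["203.0.113.5"]))
```

    If this prints False, queries for the hijacked zone may still go to an outside resolver, and the workaround above would not take effect.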
    0
  • jhitesma
    We're experiencing this in AZ as well the past several days. A DNS workaround isn't an option for us since we're not running our own nameservers.
    0
  • mo-jord
    I am having the exact same issue in Kansas City.
    I should also add that this is through TWC and cPanel websites with any SMTP port - when I switch to AT&T the issue goes away. I was able to change the SMTP timeout and this fixed the problem, although it takes forever to send emails. I can't change the nameservers, so I am stuck too.
    0
  • jhitesma
    I opened a ticket with cPanel regarding this, and sadly the response has been far less helpful than I've come to expect from their support staff :( Their only suggestion has been to enable bind on the servers and use the hack above to hijack the DNS requests.
    Quite frankly I really don't want to enable additional services we don't need and don't want which will take additional server resources, provide additional attack vectors for hackers, and run the risk of causing further problems down the road since we use external nameservers and have absolutely no reason to be running bind on our servers. Not to mention the history of security issues associated with bind even when it's configured correctly.
    In the past I've always gotten prompt, professional and truly helpful support from cPanel, and this is so out of character it shocks me. Given that other hosts aren't having the same issue, it truly seems to be something related to cPanel's configuration of exim, and the response I'm getting basically sounds like "Sorry, we don't want to deal with that."
    I tried whitelisting affected IPs in both "Sender verification bypass IP addresses" and "Trusted SMTP IP addresses", but that had no effect either. Very disappointed in cPanel's response to this issue.
    0
  • Lyttek
    Thank you SamTheMan for that info!! Will be trying that out shortly!
    0
  • Lyttek
    Worked for me, so thanks very much!
    0
  • jhitesma
    We're still trying to find a solution to this, since we don't run nameservers on our servers and can't do a workaround as posted above. cPanel has been very good about documenting the problem with Time Warner's DNS configuration, but has been extremely unhelpful on finding a way to reconfigure exim to deal with it.
    Usually we get outstanding tech support from cPanel, but I've had a ticket open for over a week and cPanel has been dropping the ball repeatedly on dealing with it: from techs who didn't even bother to read the description of the problem, to techs who, despite having access to our server, ask what settings are set to in our exim config, then suggest trying things I already tried earlier in the ticket, and finally suggest configuration changes that "may help but may make things worse for people not having problems" (which didn't work when I did try them).
    I get that the root problem is TW's screwed up DNS. But we've got a ticket open with TW that has supposedly been escalated repeatedly to the national level and still isn't getting any attention, so it's unlikely TW is going to fix their DNS anytime soon, and in the meantime their tech support is telling our clients it's a misconfiguration on our server that's causing the problem. Since people with TW (we have a TW line here in our office as a backup, so "people with TW" includes us to some extent) aren't able to reliably connect to our server but are able to reliably connect to just about any other server they try (google, yahoo, bing, exchange based servers...), they have no problem believing TW that it's our fault and not actually TW's.
    Very very disappointed in cPanel's lack of response to this, which is now actively costing us clients. Yes, it's TW's fault - but other servers are able to deal with TW's messed up DNS without preventing SMTP connections. That cPanel doesn't even seem to consider this a problem and isn't interested in trying to fix the configuration of exim that's causing it to refuse connections other servers are accepting is deeply troubling.
    0
  • cPanelMichael
    Could you post the ticket number here so we can take a closer look? Thank you.
    0
  • jhitesma
    Ticket #5574307. It's been reported to management twice, which were the only points at which anyone seemed to give it any real attention. Though the response is still "tough luck, install bind on your servers and try to figure out how to get the workaround above to work without using itself as the only nameserver (oh, and you'll be on your own trying that)" and the ever helpful "Contact Time Warner and ask for an escalation", even though we've been in contact with TW for over a week, have a regional manager working with us and have escalated to a national level... but still are getting replies from TW like:
    "Give you an update. From what I have been told the issue is correlated with the one in Washington. It is still being looked specifically by the DNS team at this point. They have tested other networks and is coming in forward and reverse through those name servers however they do see an issue still with the DNS servers. They have identified issue with a PVU DNS server but they are working on repair and getting more tickets gathered on this. I do not have an eta on repair quite yet. I will inform you when I do."
    Which, believe it or not, is not from an outsourced overseas tech but from one here in the US who still apparently can't handle simple English or really understand how DNS works.
    0
  • cPanelMichael
    Hello, I see your support ticket with us is still open so you can expect further correspondence from our staff. However, I did want to address one of your previous comments:
    "Quite frankly I really don't want to enable additional services we don't need and don't want which will take additional server resources, provide additional attack vectors for hackers, and run the risk of causing further problems down the road since we use external nameservers and have absolutely no reason to be running bind on our servers. Not to mention the history of security issues associated with bind even when it's configured correctly."
    Is there a specific vulnerability with Bind you are concerned about? Thank you.
    0
  • jhitesma
    "Is there a specific vulnerability with Bind you are concerned about? Thank you."

    I honestly have no idea currently, as we have no interest in running bind on our servers now or in the future, so I haven't been following it in years. Even with no currently known vulnerabilities, adding another service is just adding more potential attack vectors and using more system resources, and we're not currently prepared to accept that additional risk on our servers.
    We also have multiple clients who require notification of new services being added on our servers, due to their own policies and agreements with 3rd parties not wanting nameservers running on the same physical server as their website. (In some cases this is simply that they don't want a single point of failure - and I understand that adding bind just to hijack TW's domain wouldn't add an extra point of failure... but try explaining that to a client who no longer understands the why of their requirement, only the way it's written.) There are many reasons we don't have and don't want bind running on our servers.
    I would fully agree that it's entirely TW's problem if it weren't that other SMTP servers are dealing with the broken TW DNS with no issue. They're facing the same broken DNS system but they aren't experiencing connectivity issues.
    I've reached out to our nameserver administrators, but they are completely unwilling to hijack the res.rr.com domain even if it's broken. A response which also reinforces that installing bind just to hijack a domain is hardly a suitable response to this issue.
    0
  • jhitesma
    Good news. TW seems to have fixed their DNS, as I'm now able to get resolution on those domains:
        root@host5 [~]# dig +trace cpe-76-178-74-167.natsow.res.rr.com ANY

        ; <<>> DiG 9.8.2rc1-RedHat-9.8.2-0.23.rc1.el6_5.1 <<>> +trace cpe-76-178-74-167.natsow.res.rr.com ANY
        ;; global options: +cmd
        . 65051 IN NS e.root-servers.net.
        . 65051 IN NS b.root-servers.net.
        . 65051 IN NS f.root-servers.net.
        . 65051 IN NS k.root-servers.net.
        . 65051 IN NS h.root-servers.net.
        . 65051 IN NS m.root-servers.net.
        . 65051 IN NS g.root-servers.net.
        . 65051 IN NS i.root-servers.net.
        . 65051 IN NS j.root-servers.net.
        . 65051 IN NS l.root-servers.net.
        . 65051 IN NS d.root-servers.net.
        . 65051 IN NS c.root-servers.net.
        . 65051 IN NS a.root-servers.net.
        ;; Received 228 bytes from 10.0.80.11#53(10.0.80.11) in 318 ms

        com. 172800 IN NS m.gtld-servers.net.
        com. 172800 IN NS l.gtld-servers.net.
        com. 172800 IN NS k.gtld-servers.net.
        com. 172800 IN NS j.gtld-servers.net.
        com. 172800 IN NS i.gtld-servers.net.
        com. 172800 IN NS h.gtld-servers.net.
        com. 172800 IN NS g.gtld-servers.net.
        com. 172800 IN NS f.gtld-servers.net.
        com. 172800 IN NS e.gtld-servers.net.
        com. 172800 IN NS d.gtld-servers.net.
        com. 172800 IN NS c.gtld-servers.net.
        com. 172800 IN NS b.gtld-servers.net.
        com. 172800 IN NS a.gtld-servers.net.
        ;; Received 497 bytes from 198.41.0.4#53(198.41.0.4) in 210 ms

        rr.com. 172800 IN NS dns1.rr.com.
        rr.com. 172800 IN NS dns2.rr.com.
        rr.com. 172800 IN NS dns3.rr.com.
        rr.com. 172800 IN NS dns6.rr.com.
        rr.com. 172800 IN NS dns5.rr.com.
        ;; Received 228 bytes from 192.26.92.30#53(192.26.92.30) in 100 ms

        natsow.res.rr.com. 7200 IN NS dns-sec-01.peakview.rr.com.
        natsow.res.rr.com. 7200 IN NS dns-pri-01.peakview.rr.com.
        ;; Received 144 bytes from 65.24.0.171#53(65.24.0.171) in 151 ms

        cpe-76-178-74-167.natsow.res.rr.com. 3600 IN A 76.178.74.167
        natsow.res.rr.com. 3600 IN NS dns-pri-01.peakview.rr.com.
        natsow.res.rr.com. 3600 IN NS dns-sec-01.peakview.rr.com.
        ;; Received 160 bytes from 76.85.232.130#53(76.85.232.130) in 29 ms
    Bad news - I'm still getting abnormally slow connections coming from TW. They take more than twice as long as connections coming from our CenturyLink connection. I don't know yet if that's long enough to keep causing problems, but it still seems abnormally long even though that DNS response doesn't seem slow.
    0
  • jhitesma
    Looks like it was some bad caching along the way somewhere. After a few more hours things have cleared out fully and connections from TW are going through as normal again.
    0
