The Curious Incident of the cPanel Services CA Change
TL;DR:
For cPanel – you do not change the CA that the checkallsslcerts script uses to produce a new certificate for the cPanel services SSL certificate without proactively letting customers know about it, preferably in advance. This is a change of the CA of the cPanel services SSL certificates, and it has consequences…
For Let's Encrypt – I don't know what your reason is for constantly and frequently changing the IPs that are resolved for r3.o.lencr.org, but if you can – please avoid it, or at least spread the IP changes over longer time intervals, to avoid making this FQDN a lightspeed-moving target – it breaks things (as you can read below).
The long version, my journey:
I have a very simple installation – cPanel on one Ubuntu backend server, behind a pfSense firewall that handles the connection to the Internet.
Recently I began receiving emails with a subject line like:
"
[backend-server-fqdn] The SSL (Secure Sockets Layer) certificate for “cpanel” on “backend-server-fqdn” will expire in less than 30 days.
"
Multiple emails like this hinted that the renewal of the cPanel services SSL certificate was not succeeding.
These services include the web server for the cPanel admin web interfaces and the email server that handles sending and receiving email.
So, I SSHed to the server and ran the checkallsslcerts script to learn what was wrong with the services certificate renewal process (the script is explained here - https://docs.cpanel.net/whm/scripts/the-checkallsslcerts-script/).
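(For reference, the script can be run manually from a root shell; this is a minimal invocation, assuming the standard cPanel script path and that the --verbose flag is available in your version:)
# run the services-certificate check/renewal manually and print detailed output
/usr/local/cpanel/scripts/checkallsslcerts --verbose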
The first meaningful error was:
"
warn [checkallsslcerts] Failed to fetch CA bundle information from certificate’s “authorityInfoAccess” extension:cpanel::Exception::HTTP::Network/(XID xxxxx) The system failed to send an HTTP (Hypertext Transfer Protocol) “GET” request to http://r3.i.lencr.org/ because of an error: Could not connect to 'r3.i.lencr.org:80': Connection timed out
"
It was a bit strange to me – because I do use Let's Encrypt as the certificate authority (CA) for the web sites hosted by my cPanel installation, but specifically in this case, the renewal of cPanel's own services certificate, the CA had so far been cPanel's own. But I am OK with Let's Encrypt, so I went along with this change for now, noting to myself to look into it later, once I had solved this issue.
Because my firewall also tries to secure outgoing sessions, I have a specific rule that allows outgoing sessions towards port 80 (HTTP) on the Internet. So I added the FQDN r3.i.lencr.org – which is used to fetch the CA certificate of Let's Encrypt, the organization operating the CA that checkallsslcerts now uses – to the rule's destination group object (an "Alias", in pfSense terms).
(pfSense also accepts non-numeric, FQDN-based values as firewall objects. It resolves them at an admin-defined interval and saves the resulting numeric IPs in "Table" objects (a kind of cache), which feed the alias objects used in the firewall rules. So, when a new connection arrives at the firewall, it can do a numeric-IP match even for FQDN-based objects.)
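(For reference, the resolved contents of such a table can be inspected from the pfSense shell with pfctl; the alias name below is only an example – use the name of your own alias:)
# list the IPs currently cached in the table behind an FQDN-based alias
pfctl -t Allowed_HTTP_Destinations -T show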
This change made the above error message go away, but a new error message appeared:
"
warn [checkallsslcerts] Retrying after network failure: (XID xxxxxx) The system failed to send an HTTP (Hypertext Transfer Protocol) “POST” request to http://r3.o.lencr.org because of an error: Could not connect to 'r3.o.lencr.org:80': Network is unreachable
"
Smarter now, I rushed to add this new value, r3.o.lencr.org (Let's Encrypt's OCSP server address; OCSP queries go to TCP port 80, hence HTTP), to the same group of objects in the firewall that is allowed to access the Internet on destination port 80 TCP.
This solved it: the new cPanel services SSL certificate was installed successfully, and I was happy!
But something else now came to hit me…
I have a monitoring system that directly probes my backend server over HTTPS, using its core/raw name – the one used by the services certificate above – with a simple https://fqdn request (the text "fqdn" is of course replaced in the request with the actual server name).
The health check runs every minute, its timeout for the probed server to respond is five seconds, and if two consecutive checks fail (hence no server reply within a maximum of 10 seconds from each probe's start) it sends me an email saying the probe failed and the system may be down.
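(A roughly equivalent probe can be reproduced from any shell for testing; a minimal sketch, where backend-server-fqdn is a placeholder for the real server name:)
# single probe: 5-second timeout, print the HTTP status code (000 on timeout/failure)
curl -sS -o /dev/null --max-time 5 -w '%{http_code}\n' https://backend-server-fqdn/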
I began to get such emails steadily, roughly one per hour, but not in a very tidy pattern. So most of the checks passed and all was OK, but from time to time the check failed.
I SSHed to the server again.
This time I looked in the web server's error log, at /var/log/apache2/error_log, and found the following suspicious error messages:
"
[Sun Mar 31 20:28:07.069878 2024] [ssl:error] [pid 13026] (101)Network is unreachable: [client Numeric-IP-Address:Source-Port-of-the-monitoring-system] AH01974: could not connect to OCSP responder 'r3.o.lencr.org'
[Sun Mar 31 20:28:07.069916 2024] [ssl:error] [pid 13026] AH01941: stapling_renew_response: responder error
"
But, hey, wait a minute – I just allowed r3.o.lencr.org in the firewall, so why is it blocked again?
So, I went back to the firewall, raised the logging level to show both allowed and blocked requests from the backend server towards port 80 on any IP on the Internet, and zoomed into the log output, looking for events that happened right around the times of the monitoring alerts.
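(Besides the pfSense GUI log viewer, the live firewall log can also be filtered from the pfSense shell via the pflog0 interface; the backend server's address below is a placeholder:)
# watch pass/block decisions for outgoing port-80 traffic from the backend server
tcpdump -n -e -ttt -i pflog0 'tcp and port 80 and host 192.0.2.10'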
Strangely enough, some requests were allowed, exactly by the firewall rule that permits this access, and some were blocked by a rule I made to block outgoing port-80 attempts towards Internet targets I had not approved in the "Allow" rule. Huh?
OK, I got it. There is possibly a mismatch between what the backend server "knows" the IP addresses of r3.o.lencr.org to be and what the firewall "knows" they are.
And this is despite both systems using the same DNS servers as resolvers.
First, I wanted to see how much IP diversity this DNS name produces around the Internet, so I used this nice website that performs a DNS query across many DNS servers around the world:
https://www.whatsmydns.net/#A/r3.o.lencr.org
It turned out that indeed, Let's Encrypt puts a lot of effort into distributing this server across many unique IPs around the world.
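(You can see the same effect from the command line by asking different public resolvers, which often return different answer sets; a quick check, assuming dig is installed:)
# compare the A records returned by two different public resolvers
dig +short r3.o.lencr.org @8.8.8.8
dig +short r3.o.lencr.org @1.1.1.1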
Next I ran the following commands, in several recurring loops, at the backend server's terminal, to see how fixed or varied the IPs returned for this FQDN are (a simple loop sketch follows the commands):
- To run the DNS lookup
nslookup r3.o.lencr.org
- To clear the DNS cache
systemd-resolve --flush-caches
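Put together, the loop I ran looked roughly like this (the iteration count and sleep interval are arbitrary; compare the answers between iterations):
# resolve, flush the local cache, wait, repeat
for i in 1 2 3 4 5; do
    nslookup r3.o.lencr.org
    systemd-resolve --flush-caches
    sleep 60
done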
And it turned out, as suspected and as the DNS propagation website above had already shown – very diverse.
Hence, when the cPanel services web server on the backend wants to reply to a client's HTTPS request to the service web server name, it needs to do OCSP stapling (see https://en.wikipedia.org/wiki/OCSP_stapling to understand the communication flow). That means it must query r3.o.lencr.org over http:// (TCP port 80), but first it must learn which IPs serve this FQDN, and each time (once the local DNS cache expires) it probably gets a completely new set of IPs in the reply. So, for the firewall to match them, the FQDN is a moving target!!! Both systems constantly hold possibly different-from-each-other IPs for this FQDN, and only if there is a match between them will access to the OCSP server be allowed.
This is probably the cause of the firewall blocking the requests triggered by the monitoring system: there is a mismatch between what the backend server knows as the IPs for r3.o.lencr.org and what the firewall knows as the IPs for this FQDN. Each system does its DNS querying and caching at its own intervals and may get different answers, even though they use the same DNS resolver.
And this mismatch makes the firewall block the OCSP stapling request, so the web server never gets an OCSP reply and never answers the monitoring system, which reaches the monitoring request timeout and fails the probe.
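(The stapling query itself can be reproduced manually with openssl's OCSP client, which makes the dependency on r3.o.lencr.org very visible; the file paths below are placeholders for the service certificate and the Let's Encrypt R3 intermediate:)
# ask Let's Encrypt's OCSP responder about the service certificate (paths are examples)
openssl ocsp -issuer /path/to/lets-encrypt-r3.pem -cert /path/to/service-cert.pem \
    -url http://r3.o.lencr.org -noverify -text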
I need to get both systems, the backend server and the firewall, onto the same page as much as possible, so that both hold the same list of IPs for the FQDN r3.o.lencr.org.
The first, dumb, ancient security instinct was to use fixed IP objects, which led me to try to find all the numeric IP values linked to this FQDN and add them to the relevant object in the allow rule on the firewall.
Quite quickly I learned this is not efficient – there are too many of these IPs – and the concept is not future-proof, as these IPs can, and probably will, be changed, removed, or added in the future.
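(A quick way to see how many addresses you would have to pin, and how quickly the set drifts, assuming dig is installed; the counts and intervals are arbitrary:)
# collect the unique IPs seen over repeated queries; the list keeps growing over time
for i in $(seq 1 20); do dig +short r3.o.lencr.org; sleep 30; done | sort -u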
So I moved on to looking into the firewall's DNS lookup interval, the one that continuously loops DNS queries for FQDN alias objects.
I found that all FQDN-based alias objects in pfSense are resolved by default every 300 seconds, i.e. every 5 minutes. The setting is in the admin web GUI under System > Advanced > Firewall & NAT, in the field "Aliases Hostnames Resolve Interval", which is empty by default (effectively the default of 300 seconds):
https://docs.netgate.com/pfsense/en/latest/firewall/aliases.html#using-hostnames-in-aliases
HAAA! Eureka! I shouted… and, in my rage, set it to a value of 1 second, of course, and applied the change.
Yes, that was it. It solved the problem. Not entirely – I still get those monitoring-failure emails here and there – but at the big-picture level it is solved.
Yes, I pay for it with a few more percentage points of CPU utilization at the firewall, and possibly a few more megabytes of memory, but hey – I am now up to date with Internet DNS accuracy, up to the second!!
I guess pfSense keeps the results of these frequent DNS lookups in its alias table objects, which are cached for longer than the very frequent lookups, thereby building a larger list of possible IPs representing the reference FQDN. That enlarges the chance that the IP the server resolves when accessing r3.o.lencr.org will also be in the firewall's IP list for this FQDN, so the traffic is allowed and the monitoring check completes successfully. (A quick way to watch the table grow is sketched below.)
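(From the pfSense shell you can watch the alias table grow as the 1-second resolve interval keeps feeding it; the alias name is again only an example, and the loop runs until you stop it with Ctrl-C:)
# print a timestamped count of IPs in the alias table once a minute
while true; do date; pfctl -t Allowed_HTTP_Destinations -T show | wc -l; sleep 60; done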
Now, to cPanel, the firm…
To verify, I went to https://crt.sh/, a web site that is like a search engine for the historical issuance of public certificates, based on "Certificate Transparency (CT)" (https://en.wikipedia.org/wiki/Certificate_Transparency).
I searched it for my cPanel server's FQDN, and indeed, the most recent certificate was the first to use Let's Encrypt as the CA. Most of the previous ones were issued by cPanel's own CA.
So, a change was made here.
I believe I am quite on top of being informed about meaningful changes at cPanel; I do get their emails about prominent changes.
So I assumed I had missed this one, and went on to search for any announcement by cPanel about it.
I tried the following relevant cPanel documentation articles.
The most directly relevant article does not even mention "Let's Encrypt":
The checkallsslcerts Script - https://docs.cpanel.net/whm/scripts/the-checkallsslcerts-script/
The following cPanel support article gave me a generic direction towards a solution, but it does not mention "Let's Encrypt" at all:
OCSP responder errors - https://support.cpanel.net/hc/en-us/articles/360036533894-OCSP-responder-errors
The following article does mention "Let's Encrypt" as the source of the services certificate, but does not say since when or for which cPanel version(s):
Manage Service SSL Certificates - https://docs.cpanel.net/whm/service-configuration/manage-service-ssl-certificates/
The most explicit mentions of this change were found in the "Change Log" page for my cPanel version branch, 118:
https://docs.cpanel.net/changelogs/118-change-log/
The changes are listed under version 117.9999.78, which I guess is a pre-118 version, dated 2024-01-18:
"
Fixed case EK-24: Convert checkallsslcerts to use Let's Encrypt for hostname certificates.
Fixed case EK-45: Set the AutoSSL provider to Let's Encrypt on updates to 118.
Fixed case EK-46: Add a deprecation warning to the AutoSSL UI for the Sectigo provider.
Fixed case EK-47: Add a feature showcase for the Let's Encrypt changes.
Fixed case EK-58: Update the current provider headings on the AutoSSL UI.
Fixed case EK-70: Install the Let's Encrypt plugin before running checkallsslcerts during initial setup.
"
But I don't follow change logs closely, as they mostly contain fixes, not new features or changes in behavior.
In the main, friendlier "Release Notes" page for the 118 version branch, "Let's Encrypt" is mentioned, but not in reference to the CA provider for cPanel's own application services:
https://docs.cpanel.net/release-notes/118-release-notes/
All in all, from where I stand, cPanel failed here, and failed me as a customer.
You do not make such a prominent system change (even if it happens at the lower levels of the system, with no noticeable GUI change) without alerting your customers first, so they know about the change in advance, can prepare their environment for it, and hopefully avoid issues.
Thank you.
-
Point 1
You do not change the CA that the checkallsslcerts script uses to produce a new certificate for the cPanel services SSL certificate without proactively letting customers know about it, preferably in advance.
According to the changelogs, on the 18th of January 2024 (for version 117.9999.78 onwards):
Fixed case EK-46: Add a deprecation warning to the AutoSSL UI for the Sectigo provider.
(with the default Sectigo provider being removed on the 2nd of April, under the 119.9999.69 update).
This move was also detailed on the features list (specifically "Let's Encrypt instead of Sectigo for AutoSSL and Hostnames").
Point 2
For Let's Encrypt – I don't know what your reason is for constantly and frequently changing the IPs that are resolved for r3.o.lencr.org,
I can currently see 2 different IPv4 addresses for r3.o.lencr.org (and 2 different ones for IPv6) for that hostname; however, Let's Encrypt have previously said:
Let's Encrypt CA does not want to announce particular IP addresses that are used in validation because of a desire to change them periodically (partly in order to make it harder for attackers to be able to cause misissuance). While you could figure out what addresses are currently used, they may change at any time and will not be documented. If you can't allow inbound connections from the general public to the service that you're trying to validate, you can use the DNS challenge type (which just requires letting the Let's Encrypt CA look up your DNS records associated with that name).
(source https://community.letsencrypt.org/t/ip-addresses-le-is-validating-from-to-build-firewall-rule/5410/17 ) and also from https://letsencrypt.org/docs/faq/#what-ip-addresses-does-let-s-encrypt-use-to-validate-my-web-server :
What IP addresses does Let’s Encrypt use to validate my web server?
We don’t publish a list of IP addresses we use to validate, and these IP addresses may change at any time. Note that we now validate from multiple IP addresses.
-
Could I get a short version of what problem this caused? I haven't heard any complaints about the switch to Let's Encrypt related to the CA up until this post.
-
The change of the cPanel services CA caused a change in the IPs that the cPanel server needs to reach on the Internet. The new IPs were blocked by the firewall, which allows outgoing traffic only to specific IPs on the Internet, and that prevented the renewal of the cPanel services certificate.
I guess not many people filter outgoing requests from backend servers to the Internet, which is possibly why not many complained about this change.
-
Thanks for the clarification! As the post that rbairwell shared explains, we don't have any control over the Let's Encrypt SSL IPs, in contrast to Sectigo, which does have a set list we cover here:
I'm sorry this caused an interruption for you!
-
Or just buy a commercial cert (lots of places sell cheap certs), install it, and then only worry about it once a year?
-
Let's Encrypt does have some advice (from Aug 2016) about this:
- For all challenge types: allow outgoing traffic to acme-v01.api.letsencrypt.org on port 443 (HTTPS).
- For HTTP-01 (for example via certbot's webroot plugin): allow incoming traffic on port 80 (HTTP) from anywhere.
However, if you are unable to do this but are running the popular CSF firewall suite, you may be able to use the dyndns settings to allow Let's Encrypt (see this 2020 post for an example of inbound Let's Encrypt rules that used to work). If the problem is outbound, then:
tcp|out|d=r3.o.lencr.org
might work – although, according to Let's Encrypt's documentation, the o.lencr.org subdomain is only used for OCSP confirmation and should only be used by web browsers (unless you have scripts using things such as wget/curl to fetch things from remote HTTPS sites – such as eCommerce payment confirmations – which might then trigger an OCSP check from the server: but this probably wouldn't happen during certificate renewal).
If you are using a different firewall suite, it might be a bit trickier to add DNS entries to an allow list.
-
cPRex, I am sure you understand that the issue here is not CP's perfectly understandable lack of control over LE's IPs, but the fact that CP makes such fundamental changes and doesn't properly inform its customers about them.
I am very sad to say that issues like this, and others seen recently (like the version upgrade that forces a migration process when CP is on Ubuntu), suggest that CP is degrading, and will rather soon push its customers to look into migrating to other, alternative options.
-
I think I have to disagree about there being no communication about this, as this was a major change with announcements - most recently, our March newsletter included the change about Let's Encrypt. If you aren't getting the cPanel newsletters you can sign up for those at https://cpanel.net/mailing-list/
Our January newsletter also talks about how Let's Encrypt will be required in version 118.
-
I see – is there a web archive for this newsletter?
-
Unfortunately we don't have them published online yet. Our future plan is to have the email just be a link to an online version, but it hasn't happened quite yet!