Server wide Google can't access server and I can't see why
We have recently found that Google Analytics and Search Console can't access website pages across our WHM metal server, because Google can't fetch the robots.txt file on any of the websites.
The robots.txt file exists and is reachable from the browser.
My conclusion is that the WHM firewall or something similar is blocking Google's access to www.website.com/robots.txt, but I can't see how this is happening. Google gives no useful specific information, just that the request is met by a (5xx) error. Yet the same request loads perfectly in the browser.
I have cleared our extensive list of blocked IPs on the firewall (CSF) and have checked that the port flooding firewall options are turned off (they are off). I have also checked the Apache virtual host includes in httpd.conf for anything that might cause an issue, and nothing there seems relevant.
I'm not certain what I'm looking for, other than something that causes Google (specifically and only Google) to be denied by the server.
What am I missing? Where can I look? I'm out of ideas. I think that something automated is denying Google's bots access to the server, but I can't make out what it is. Maybe some sort of rule denying access to non-HTML files, although they work in the browser.
It looks like a single thing somewhere is blocking Google server wide (although I'm not certain of this).
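One place to look is the Apache access log, to see what status code the server itself recorded for Googlebot's robots.txt requests. The following is only a rough sketch: it assumes the log uses the combined log format (request path in field 7, status in field 9), the function name is made up for illustration, and the log path is a placeholder because the domlogs location varies between cPanel versions.

```shell
# Summarise the status codes Apache returned to Googlebot for /robots.txt.
# Assumption: combined log format, so field 7 is the path and field 9 the status.
googlebot_robots_status() {
    logfile="$1"
    grep -i 'Googlebot' "$logfile" \
        | awk '$7 == "/robots.txt" { count[$9]++ }
               END { for (s in count) print s, count[s] }' \
        | sort
}

# Hypothetical usage; substitute the real per-domain access log:
# googlebot_robots_status /path/to/domlogs/site.net
```

Each output line is a status code followed by how many times it was returned. If no 5xx lines show up here, the requests are probably being rejected before Apache logs them (for example at the firewall); if they do show up, the Apache error log for the same timestamps is the next thing to read.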
To summarise:
-- DNS works fine
[root@server~]# dig +nocmd www.site.net cname +noall +answer
www.site.net. 12991 IN CNAME site.net.
[root@server ~]# dig +nocmd www.site.net a +noall +answer
www.site.net. 12968 IN CNAME site.net.
site.net. 5323 IN A 213.171.205.23
-- .htaccess works fine (and is site specific).
-- httpd.conf looks OK (from manually viewing it) and has been unchanged since forever.
-- CSF has been cleared, and I have double-checked that the port flooding protections are turned off.
What else can I check to see what's denying Google specifically? Would Google be denied by any server-wide HTTP headers? There are a bunch of headers always set in the Apache includes:
Apache Always Set headers:
=========================
Header always set Permissions-Policy "ambient-light-sensor=(), autoplay=(), battery=(), camera=(), cross-origin-isolated=(), display-capture=(), encrypted-media=(), execution-while-not-rendered=(), execution-while-out-of-viewport=(), fullscreen=(self), geolocation=(), gyroscope=(), keyboard-map=(self), magnetometer=(), microphone=(), midi=(), navigation-override=(), payment=(), picture-in-picture=(), publickey-credentials-get=(), screen-wake-lock=(), sync-xhr=(), usb=(), web-share=(self), xr-spatial-tracking=(), clipboard-read=(), clipboard-write=(), gamepad=(), speaker-selection=(), conversion-measurement=(), focus-without-user-activation=(), hid=(), idle-detection=(), interest-cohort=(), serial=(), sync-script=(), trust-token-redemption=(), window-placement=(), vertical-scroll=()"
Header always set Cache-Control no-cache,must-revalidate
Header always set X-Clacks-Overhead "GNU Terry Pratchett"
Header always set X-XSS-Protection 1;mode=block
Header always set X-Content-Type-Options nosniff
Header always set X-Frame-Options SAMEORIGIN
Header always set Content-Language en
Header always set X-Powered-By Gnomes
Header always set Content-Security-Policy upgrade-insecure-requests
Header always set Strict-Transport-Security "max-age=31536000;" "expr=%{HTTPS} == 'on'"
=========================
Would one of these be an issue? What else can I check on the server?
SOLVED:
While I was unable to find any information telling me exactly what the cause was, by a process of deduction I found the issue: Googlebot is unable to operate when certain types of "Permissions-Policy" HTTP headers are in place. Specifically:
Permissions-Policy: execution-while-not-rendered=*, execution-while-out-of-viewport=*, geolocation=*, sync-script=*,
These should all be default/enabled (*) in the HTTP header supplied to Google's bots. (I'm unsure if geolocation is required to make it work, but the others definitely are.)
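To confirm which values the server is actually sending for these directives, something like the sketch below may help. It reads raw response headers on stdin and prints only the four directives named above. The function name is made up for illustration, and it assumes the whole policy is sent in a single Permissions-Policy header line.

```shell
# Print the values the served Permissions-Policy header gives the four
# directives named in this post. Reads raw HTTP response headers on stdin.
policy_directives() {
    tr -d '\r' \
        | grep -i '^permissions-policy:' \
        | sed 's/^[^:]*:[[:space:]]*//' \
        | tr ',' '\n' \
        | sed 's/^[[:space:]]*//; s/[[:space:]]*$//' \
        | grep -E '^(execution-while-not-rendered|execution-while-out-of-viewport|geolocation|sync-script)='
}

# Example usage against a live site (needs network access):
# curl -sI https://www.site.net/robots.txt | policy_directives
```

A directive shown with `=()` is disabled for everything, while `=*` allows it everywhere. Running this before and after changing the Apache include shows whether the new header is really the one being served.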
I'm glad you found the solution!