Introduction
The awk utility provides a method for searching through files that is similar to grep.
One advantage of awk over grep is that, in some situations, it makes it easier to show only the information you want from a search, such as a single column of each matching line.
Please keep in mind that this guide is offered as a courtesy. cPanel support is not able to provide support for the use of third-party utilities such as grep and awk. If you require assistance with these utilities, please reach out to a systems administrator with the skills, training, and expertise required to help you.
Procedure
To help demonstrate, we will use the following fictional Apache access log file, named apachelog.txt:
66.249.65.38 - - [18/Dec/2020:01:02:26 +0000] "GET /055/cgi-bin/ HTTP/1.1" 302 - "-" "Mozilla/5.0 (Linux; Android 6.0.1; Nexus 5X Build/MMB29P) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/87.0.4280.90 Mobile Safari/537.36 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
67.205.167.31 - - [18/Dec/2020:05:11:05 +0000] "GET / HTTP/1.1" 302 242 "-" "Screaming Frog SEO Spider/10.4"
161.35.99.67 - - [18/Dec/2020:05:21:24 +0000] "GET / HTTP/1.1" 302 242 "-" "Screaming Frog SEO Spider/10.4"
66.249.65.38 - - [18/Dec/2020:06:14:14 +0000] "GET /robots.txt HTTP/1.1" 302 - "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
66.249.65.34 - - [18/Dec/2020:06:14:14 +0000] "GET / HTTP/1.1" 302 242 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
To show only the requests from Googlebot, you can search the file with this basic syntax:
# awk '/Googlebot/' apachelog.txt
66.249.65.38 - - [18/Dec/2020:01:02:26 +0000] "GET /055/cgi-bin/ HTTP/1.1" 302 - "-" "Mozilla/5.0 (Linux; Android 6.0.1; Nexus 5X Build/MMB29P) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/87.0.4280.90 Mobile Safari/537.36 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
66.249.65.38 - - [18/Dec/2020:06:14:14 +0000] "GET /robots.txt HTTP/1.1" 302 - "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
66.249.65.34 - - [18/Dec/2020:06:14:14 +0000] "GET / HTTP/1.1" 302 242 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
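The same pattern syntax can be varied in a couple of ways. The following is a minimal sketch rather than a definitive recipe: it assumes you are working in a scratch directory, and it recreates the fictional apachelog.txt first so that it can be run on its own. The patterns robots\.txt and cgi-bin are illustrative choices and are not part of the original guide.

```shell
# Recreate the fictional sample log from this guide so the example is self-contained.
cat > apachelog.txt <<'EOF'
66.249.65.38 - - [18/Dec/2020:01:02:26 +0000] "GET /055/cgi-bin/ HTTP/1.1" 302 - "-" "Mozilla/5.0 (Linux; Android 6.0.1; Nexus 5X Build/MMB29P) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/87.0.4280.90 Mobile Safari/537.36 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
67.205.167.31 - - [18/Dec/2020:05:11:05 +0000] "GET / HTTP/1.1" 302 242 "-" "Screaming Frog SEO Spider/10.4"
161.35.99.67 - - [18/Dec/2020:05:21:24 +0000] "GET / HTTP/1.1" 302 242 "-" "Screaming Frog SEO Spider/10.4"
66.249.65.38 - - [18/Dec/2020:06:14:14 +0000] "GET /robots.txt HTTP/1.1" 302 - "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
66.249.65.34 - - [18/Dec/2020:06:14:14 +0000] "GET / HTTP/1.1" 302 242 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
EOF

# Negate the pattern with ! to print only the lines that do NOT contain "Googlebot".
awk '!/Googlebot/' apachelog.txt

# Use | inside the pattern to print lines that match either of two strings.
awk '/robots\.txt|cgi-bin/' apachelog.txt
```

With the sample log, the first command prints the two Screaming Frog requests, and the second prints the requests for /055/cgi-bin/ and /robots.txt.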
To show only the IPs that Googlebot is making requests from, you can use awk's print action to print only the first column. By default, awk splits each line into columns on whitespace, and $1 refers to the first column:
# awk '/Googlebot/{print $1}' apachelog.txt
66.249.65.38
66.249.65.38
66.249.65.34
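The column approach extends to more than one field. The sketch below assumes the default whitespace splitting, under which the requested path is column 7 and the status code is column 9 in this particular log format; other log formats may number their columns differently. The array name count and the variable ip are illustrative names, and the fictional apachelog.txt is recreated first so the example can be run on its own in a scratch directory.

```shell
# Recreate the fictional sample log from this guide so the example is self-contained.
cat > apachelog.txt <<'EOF'
66.249.65.38 - - [18/Dec/2020:01:02:26 +0000] "GET /055/cgi-bin/ HTTP/1.1" 302 - "-" "Mozilla/5.0 (Linux; Android 6.0.1; Nexus 5X Build/MMB29P) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/87.0.4280.90 Mobile Safari/537.36 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
67.205.167.31 - - [18/Dec/2020:05:11:05 +0000] "GET / HTTP/1.1" 302 242 "-" "Screaming Frog SEO Spider/10.4"
161.35.99.67 - - [18/Dec/2020:05:21:24 +0000] "GET / HTTP/1.1" 302 242 "-" "Screaming Frog SEO Spider/10.4"
66.249.65.38 - - [18/Dec/2020:06:14:14 +0000] "GET /robots.txt HTTP/1.1" 302 - "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
66.249.65.34 - - [18/Dec/2020:06:14:14 +0000] "GET / HTTP/1.1" 302 242 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
EOF

# Print several columns at once: client IP ($1), requested path ($7), status code ($9).
awk '/Googlebot/{print $1, $7, $9}' apachelog.txt

# Count Googlebot requests per IP in an array, print the totals once the file
# has been read (END), and sort numerically so the busiest IP comes first.
awk '/Googlebot/{count[$1]++} END{for (ip in count) print count[ip], ip}' apachelog.txt | sort -rn
```

With the sample log, the first command prints:

66.249.65.38 /055/cgi-bin/ 302
66.249.65.38 /robots.txt 302
66.249.65.34 / 302

The second command prints:

2 66.249.65.38
1 66.249.65.34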
For more in-depth information about the use of this utility, please refer to the awk manual page, which you can view by running man awk.