Golang : How to determine if request or crawl is from Google robots
In this tutorial, we will learn how to detect a visit by Google robots or web crawlers, and how to tell genuine Google robots apart from bad actors that spoof Google's user agent.
Google's official guide lists the robots that Google uses to crawl websites. All of the user agents in that list contain the string "google" when compared case-insensitively.
The simplest way to detect Google robots or crawlers looks like this :
package main

import (
	"fmt"
	"log"
	"net/http"
	"strings"
)

func getUserAgent(w http.ResponseWriter, r *http.Request) {
	ua := r.Header.Get("User-Agent")
	fmt.Printf("user agent is: %s \n", ua)
	w.Write([]byte("user agent is " + ua))

	// compare in lower case so that "Google" and "google" both match
	ualow := strings.ToLower(ua)
	if strings.Contains(ualow, "google") {
		fmt.Println("Visited by Google bot")
	} else {
		fmt.Println("Visited by something else")
	}
}

func main() {
	http.HandleFunc("/", getUserAgent)
	// report the error if the server cannot start, e.g. the port is in use
	log.Fatal(http.ListenAndServe(":8080", nil))
}
Sample output :
These results were produced by requests from Google PageSpeed Insights :
user agent is: Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko; Google Page Speed Insights) Chrome/27.0.1453 Safari/537.36
Visited by Google bot
user agent is: Mozilla/5.0 (iPhone; CPU iPhone OS 8_3 like Mac OS X) AppleWebKit/537.36 (KHTML, like Gecko; Google Page Speed Insights) Version/8.0 Mobile/12F70 Safari/600.1.4
Visited by Google bot
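However, the above code is primitive and easily spoofed, because any client can send a User-Agent header that contains "google". We need to verify that the Google robot is genuine by following Google's guideline on how to verify Google robots: run a reverse DNS lookup on the IP address of the request and check that the returned host name belongs to Google.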
package main

import (
	"fmt"
	"log"
	"net"
	"net/http"
	"os/exec"
	"strings"
)

func getUserAgent(w http.ResponseWriter, r *http.Request) {
	ua := r.Header.Get("User-Agent")
	fmt.Printf("user agent is: %s \n", ua)
	w.Write([]byte("user agent is " + ua))

	ualow := strings.ToLower(ua)
	if strings.Contains(ualow, "google") {
		fmt.Println("Visited by Google bot")
	} else {
		fmt.Println("Visited by something else")
	}

	// get the IP address of the client
	ip, _, err := net.SplitHostPort(r.RemoteAddr)
	if err != nil {
		fmt.Printf("Cannot parse remote address %s \n", r.RemoteAddr)
		return
	}

	// capture the output of the host command to verify Google robots
	// based on https://support.google.com/webmasters/answer/80553
	cmd := exec.Command("host", ip)
	result, err := cmd.Output() // capture the exec output to variable result
	if err != nil {
		// host can exit with a non-zero status, for example when the IP has
		// no reverse DNS record; return instead of stopping the whole server
		fmt.Printf("Host %s command execution failed \n", ip)
		return
	}
	fmt.Println("Host reply : ", string(result))

	// if the result contains the word google, treat the user agent as genuine,
	// else fake
	if strings.Contains(strings.ToLower(string(result)), "google") {
		fmt.Println(" and the user agent is real. ")
	} else {
		fmt.Println(" and the user agent is determined to be FAKE after verifying with host command. ")
	}
}

func main() {
	http.HandleFunc("/", getUserAgent)
	log.Fatal(http.ListenAndServe(":8080", nil))
}
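The same verification can also be sketched without the external host command, using the resolver in Go's standard net package. The version below is a minimal sketch, not a definitive implementation: it does a reverse lookup, checks the domain suffix, then does a forward lookup to confirm that the name maps back to the same IP. The function names isGoogleHost and verifyGoogleIP are hypothetical, and the accepted domains (googlebot.com and google.com) are an assumption that should be checked against Google's current guideline.

```go
package main

import (
	"fmt"
	"net"
	"os"
	"strings"
)

// isGoogleHost reports whether a reverse DNS name ends in one of the
// assumed Google crawler domains. Matching on the suffix (with the leading
// dot) rejects names such as "google.com.evil.example".
func isGoogleHost(host string) bool {
	host = strings.ToLower(strings.TrimSuffix(host, "."))
	return strings.HasSuffix(host, ".googlebot.com") ||
		strings.HasSuffix(host, ".google.com")
}

// verifyGoogleIP does a reverse lookup on ip, keeps only names in the
// assumed Google domains, and confirms with a forward lookup that one of
// those names resolves back to the same IP.
func verifyGoogleIP(ip string) bool {
	want := net.ParseIP(ip)
	if want == nil {
		return false // not a valid IP address
	}
	names, err := net.LookupAddr(ip)
	if err != nil {
		return false
	}
	for _, name := range names {
		if !isGoogleHost(name) {
			continue
		}
		addrs, err := net.LookupHost(name)
		if err != nil {
			continue
		}
		for _, a := range addrs {
			if net.ParseIP(a).Equal(want) {
				return true
			}
		}
	}
	return false
}

func main() {
	fmt.Println(isGoogleHost("crawl-66-249-66-1.googlebot.com.")) // true
	fmt.Println(isGoogleHost("google.com.evil.example."))         // false

	// Optional: pass an IP address on the command line to run the
	// full check, which needs working DNS.
	if len(os.Args) > 1 {
		fmt.Println(verifyGoogleIP(os.Args[1]))
	}
}
```

Inside the handler, verifyGoogleIP(ip) could take the place of the host command and the substring test on its output. The forward lookup matters because whoever controls an IP address also controls its reverse DNS record and can make it return any name.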
References :
https://support.google.com/webmasters/answer/1061943
https://www.socketloop.com/tutorials/golang-how-to-check-if-a-string-contains-another-sub-string
By Adam Ng