Golang : How to extract links from a web page?
Go's third-party package goquery provides jQuery-style tools for building HTML parsers and crawlers. In this tutorial, we will learn how to use goquery to extract links from a web page. The code below scans the part of the page that sits inside <div class="nav-collapse ..."> and <ul class="nav ...">, and extracts the links it finds there.
package main

import (
	"fmt"
	"net/http"
	"strings"

	"github.com/PuerkitoBio/goquery"
)

// ScrapeLinks prints the title and URL of every link in the page's navigation menu.
func ScrapeLinks(url string) {
	// goquery.NewDocument(url) is deprecated, so fetch the page ourselves
	// and hand the response body to goquery.
	res, err := http.Get(url)
	if err != nil {
		panic(err)
	}
	defer res.Body.Close()

	doc, err := goquery.NewDocumentFromReader(res.Body)
	if err != nil {
		panic(err)
	}

	// process this part :
	// <div class="nav-collapse collapse navbar-responsive-collapse">
	//   <ul class="nav nav-pills">
	//     <li>
	//       <a href="/references">References</a>
	//     </li>
	//     <li>
	//       <a href="/tutorials">Tutorials</a>
	//     </li>
	//   </ul>
	// </div>

	// Select every <a> inside the nav list and read its text and href together.
	// Splitting the combined <li> text with strings.Fields would mis-pair
	// titles and links as soon as a title contains a space.
	doc.Find(".nav-collapse .nav a").Each(func(i int, s *goquery.Selection) {
		title := strings.TrimSpace(s.Text()) // https://www.socketloop.com/tutorials/trim-white-spaces-string-golang
		link, ok := s.Attr("href")
		if !ok {
			return // skip anchors that have no href attribute
		}
		// The hrefs here are root-relative, so prefixing the site URL is enough.
		fmt.Printf("Title is [%s] and link is [%s]\n", title, url+link)
	})
}

func main() {
	ScrapeLinks("https://socketloop.com")
}
Output :
Title is [References] and link is [https://socketloop.com/references]
Title is [Tutorials] and link is [https://socketloop.com/tutorials]
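The program above builds each link by joining the site URL and the href, which fits root-relative paths such as /references. If a page also contains absolute links or other relative forms, one possible approach is to resolve each href with the standard library's net/url package. The sketch below assumes that approach; resolveLink is a hypothetical helper name and is not part of goquery.

```go
package main

import (
	"fmt"
	"net/url"
)

// resolveLink is a hypothetical helper (not part of goquery). It resolves an
// href found on a page against the URL of that page.
func resolveLink(pageURL, href string) (string, error) {
	base, err := url.Parse(pageURL)
	if err != nil {
		return "", err
	}
	ref, err := url.Parse(href)
	if err != nil {
		return "", err
	}
	// ResolveReference leaves absolute hrefs untouched and resolves
	// relative ones against the base URL.
	return base.ResolveReference(ref).String(), nil
}

func main() {
	hrefs := []string{"/references", "/tutorials", "https://example.com/about"}
	for _, href := range hrefs {
		link, err := resolveLink("https://socketloop.com", href)
		if err != nil {
			fmt.Println("skipping", href, ":", err)
			continue
		}
		fmt.Println(link)
	}
}
```

Inside the Each callback, the value returned by s.Attr("href") could be passed to such a helper instead of being concatenated with the site URL.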
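Bear in mind that a class selector passed to goquery's doc.Find only has to match one of an element's classes. To select <div class="nav-collapse collapse navbar-responsive-collapse">, the first class alone is enough, i.e.
doc.Find(".nav-collapse") // will do.
In this tutorial we want to go one level deeper, to the <ul class="nav ..."> inside that div, so we use a descendant selector instead:
doc.Find(".nav-collapse .nav")
Appending a to the selector, as in doc.Find(".nav-collapse .nav a"), selects the links inside that list.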
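To see why a single class name is enough to match an element that carries several classes, it helps to recall that a class attribute is a whitespace-separated list of names and that a class selector is compared against each name in turn. The sketch below illustrates that rule with the standard library only; hasClass is a hypothetical helper written for this illustration and is not how goquery is implemented internally.

```go
package main

import (
	"fmt"
	"strings"
)

// hasClass is a hypothetical helper for illustration only. It reports whether
// name is one of the whitespace-separated names in a class attribute value.
func hasClass(classAttr, name string) bool {
	for _, c := range strings.Fields(classAttr) {
		if c == name {
			return true
		}
	}
	return false
}

func main() {
	div := "nav-collapse collapse navbar-responsive-collapse"
	ul := "nav nav-pills"

	fmt.Println(hasClass(div, "nav-collapse")) // true: one of the div's classes
	fmt.Println(hasClass(div, "nav"))          // false: "nav" is only a prefix of "nav-collapse"
	fmt.Println(hasClass(ul, "nav"))           // true: one of the ul's classes
}
```

This is also why .nav in the selector picks out the <ul> and not the <div>: a class selector matches whole class names, not prefixes.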
By Adam Ng