Golang : Format strings to SEO friendly URL example
Problem:
You want to take a string and format it into an SEO friendly URL. SEO friendly here means no uppercase letters, no underscores and no characters deemed unsuitable for search engine crawlers. How to do that?
NOTES: This example can also be used to sanitize and clean up input strings.
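One way to pin down that definition is a small validator. The sketch below assumes "SEO friendly" means lowercase ASCII letters and digits joined by single dashes, with no leading or trailing dash. That exact rule and the name IsSEOFriendly are my own assumptions for illustration, not part of the original code.

```go
package main

import (
	"fmt"
	"regexp"
)

// seoSlug matches one or more groups of lowercase letters or digits joined
// by single dashes, e.g. "elnino-coming". Uppercase letters, underscores,
// punctuation, repeated dashes and leading/trailing dashes do not match.
// (Assumed definition of "SEO friendly", not taken from the tutorial.)
var seoSlug = regexp.MustCompile(`^[a-z0-9]+(?:-[a-z0-9]+)*$`)

// IsSEOFriendly is a hypothetical helper that reports whether s already
// looks like an SEO friendly URL segment.
func IsSEOFriendly(s string) bool {
	return seoSlug.MatchString(s)
}

func main() {
	for _, s := range []string{"elnino-coming", "El_Nino", "a--b", "-trailing-"} {
		fmt.Printf("%q -> %v\n", s, IsSEOFriendly(s))
	}
}
```

Only the first sample string passes; the others fail because of the uppercase letters and underscore, the double dash, and the leading/trailing dashes respectively.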
Solution:
I ported this PHP function to Golang
function SEOfriendlyURL($string)
{
    $string = strtolower($string);
    $string = str_replace(" ", "-", $string);
    $string = str_replace("/", "-", $string);
    $string = preg_replace('/\s+/', '-', $string);
    $string = preg_replace("`\[.*\]`U", "", $string);
    $string = preg_replace('`&(amp;)?#?[a-z0-9]+;`i', '-', $string);
    $string = preg_replace("/[^\x9\xA\xD\x20-\x7F]/", "", $string);
    $string = htmlentities($string, ENT_COMPAT, 'utf-8');
    $string = preg_replace("`&([a-z])(acute|uml|circ|grave|ring|cedil|slash|tilde|caron|lig|quot|rsquo);`i", "\\1", $string);
    $string = preg_replace(array("`[^a-z0-9]`i", "`[-]+`"), "-", $string);
    return strtolower(trim($string, '-'));
}
and added some improvements to normalize Unicode strings and remove all diacritical/accent marks.
Here you go!
package main

import (
	"fmt"
	"regexp"
	"strings"
	"unicode"

	"golang.org/x/text/transform"
	"golang.org/x/text/unicode/norm"
)

// isMn reports whether r is a nonspacing mark (Unicode category Mn),
// such as the combining tilde that NFD splits off from "ñ".
func isMn(r rune) bool {
	return unicode.Is(unicode.Mn, r) // Mn: nonspacing marks
}

// SEOURL converts s into a lowercase, dash-separated string for use in a URL.
func SEOURL(s string) string {
	// normalize unicode strings and remove all diacritical/accent marks.
	// This must run first, otherwise the a-z0-9 filter below would delete
	// accented letters such as "ñ" instead of turning them into "n".
	// (Newer x/text releases mark transform.RemoveFunc as deprecated in
	// favor of runes.Remove(runes.In(unicode.Mn)); it still works.)
	// see https://www.socketloop.com/tutorials/golang-normalize-unicode-strings-for-comparison-purpose
	t := transform.Chain(norm.NFD, transform.RemoveFunc(isMn), norm.NFC)
	seoStr, _, _ := transform.String(t, s)

	seoStr = strings.ToLower(seoStr)

	// Go regexps take the bare pattern: no PHP delimiter characters, and the
	// case-insensitive modifier is written (?i) instead of a trailing i.

	// turn HTML accent entities such as &eacute; back into their base letter;
	// the replacement refers to the first group as ${1}, not \1
	regE := regexp.MustCompile(`(?i)&([a-z])(acute|uml|circ|grave|ring|cedil|slash|tilde|caron|lig|quot|rsquo);`)
	seoStrByte := regE.ReplaceAll([]byte(seoStr), []byte("${1}"))
	seoStr = string(seoStrByte) // convert []byte to string

	// replace any remaining HTML entity such as &amp; or &#39; with a dash
	regE = regexp.MustCompile(`(?i)&(amp;)?#?[a-z0-9]+;`)
	seoStrByte = regE.ReplaceAll([]byte(seoStr), []byte("-"))
	seoStr = string(seoStrByte) // convert []byte to string

	// replace every run of characters other than a-z and 0-9 (spaces, tabs,
	// punctuation, underscores, leftover non-ASCII) with a single dash
	regE = regexp.MustCompile(`[^a-z0-9]+`)
	seoStrByte = regE.ReplaceAll([]byte(seoStr), []byte("-"))
	seoStr = string(seoStrByte) // convert []byte to string

	// strip leading and trailing dashes, like PHP's trim($string, '-')
	return strings.Trim(seoStr, "-")
}

func main() {
	NonSEOString := "@<ElNi\u00f1o coming? > #% sooner this year!"
	fmt.Println("BEFORE :", NonSEOString)
	SEOedString := SEOURL(NonSEOString)
	fmt.Println("AFTER :", SEOedString)
}
Output:
BEFORE : @<ElNiño coming? > #% sooner this year!
AFTER : elnino-coming-sooner-this-year
NOTES: This example is not perfect, as regular expressions are not exactly my forte. Also, the back-and-forth conversions between []byte and string can be optimized away. I will leave that as an exercise for you. ;-)
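For that exercise, here is one possible sketch. It replaces the []byte round trips with Regexp.ReplaceAllString, which accepts and returns strings, and compiles each pattern once at package level instead of on every call. To keep it runnable with the standard library alone, it assumes the golang.org/x/text normalization step can be left out, so accented letters are treated as separators rather than converted; the name slugASCII is hypothetical.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// Compiled once at package initialization instead of on every call.
var (
	entityRe  = regexp.MustCompile(`&(amp;)?#?[a-z0-9]+;`) // HTML entities such as &amp; or &#39;
	nonSlugRe = regexp.MustCompile(`[^a-z0-9]+`)           // any run of characters outside a-z and 0-9
)

// slugASCII is a hypothetical, standard-library-only variant of SEOURL.
// It does not remove diacritics, so non-ASCII letters simply act as
// separators.
func slugASCII(s string) string {
	s = strings.ToLower(s)
	s = entityRe.ReplaceAllString(s, "-")
	s = nonSlugRe.ReplaceAllString(s, "-")
	return strings.Trim(s, "-")
}

func main() {
	fmt.Println(slugASCII("@<ElNino coming? > #% sooner this year!")) // elnino-coming-sooner-this-year
	fmt.Println(slugASCII("Tom &amp; Jerry_Episode 2"))             // tom-jerry-episode-2
}
```

Because strings go in and come out of every call, there is no conversion step left to optimize; the trade-off is that, without the normalization chain, an input like "ElNiño" would come out as "elni-o".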
References:
https://golang.org/pkg/regexp/#Regexp.ReplaceAll
https://golang.org/pkg/regexp/syntax/#pkg-overview
https://www.socketloop.com/tutorials/golang-normalize-unicode-strings-for-comparison-purpose
https://www.socketloop.com/tutorials/trim-white-spaces-string-golang
See also : Golang : Normalize unicode strings for comparison purpose
By Adam Ng