Strings, Runes & UTF-8 Beginner¶

Introduction¶

Go strings are read-only slices of bytes, not arrays of characters. Every Go source file is UTF-8, and the language has first-class support for Unicode through the rune type. Understanding the byte-vs-rune distinction is essential -- it's a top interview question and a source of subtle production bugs when handling multi-byte characters.

Strings Are Immutable Byte Slices¶

A string in Go is a read-only, immutable sequence of bytes. It has no built-in notion of "characters."

s := "Hello, 世界"
fmt.Println(len(s))    // 13 (bytes, NOT characters)
fmt.Println(s[0])      // 72 (byte value of 'H')
// s[0] = 'h'          // compile error: strings are immutable

Critical Interview Point

len(s) returns the number of bytes, not the number of characters (runes). "世界" is 6 bytes (3 per character) but only 2 runes.
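
As a minimal sketch of that distinction, the program below prints both counts side by side. The helper name `byteAndRuneCount` is made up for illustration and is not part of the standard library.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// byteAndRuneCount (hypothetical helper) returns the UTF-8 byte length
// of s and the number of code points it contains.
func byteAndRuneCount(s string) (nBytes, nRunes int) {
	return len(s), utf8.RuneCountInString(s)
}

func main() {
	for _, s := range []string{"Hello", "世界", "Hello, 世界"} {
		b, r := byteAndRuneCount(s)
		fmt.Printf("%q: %d bytes, %d runes\n", s, b, r)
	}
}
```

For pure ASCII input the two numbers match, which is why this class of bug often goes unnoticed until non-ASCII data arrives.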

Runes: Unicode Code Points¶

A rune is an alias for int32 and represents a single Unicode code point.

var r rune = '世'
fmt.Printf("Type: %T, Value: %d, Char: %c\n", r, r, r)
// Type: int32, Value: 19990, Char: 世
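
To connect a code point to its bytes, the sketch below encodes a few runes with `utf8.EncodeRune` and prints the resulting UTF-8 sequence. `encodeRune` is a hypothetical wrapper added only for this example.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// encodeRune (hypothetical helper) returns the UTF-8 encoding of r.
func encodeRune(r rune) []byte {
	buf := make([]byte, utf8.UTFMax) // UTFMax is 4, the longest encoding
	n := utf8.EncodeRune(buf, r)
	return buf[:n]
}

func main() {
	for _, r := range []rune{'A', 'é', '世'} {
		// "% x" prints each byte in hex, separated by spaces.
		fmt.Printf("%c U+%04X -> % x\n", r, r, encodeRune(r))
	}
}
```

The three runes take 1, 2, and 3 bytes respectively, which is the variable width that makes `len(s)` and the rune count diverge.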

Byte vs Rune Iteration¶

s := "café"

// Byte iteration -- iterates raw bytes
for i := 0; i < len(s); i++ {
    fmt.Printf("byte[%d] = %x\n", i, s[i])
}
// byte[0]=63 byte[1]=61 byte[2]=66 byte[3]=c3 byte[4]=a9
// 'é' splits into 2 bytes (c3, a9)

// Rune iteration -- range decodes UTF-8 automatically
for i, r := range s {
    fmt.Printf("rune[%d] = %c (U+%04X)\n", i, r, r)
}
// rune[0] = c (U+0063)
// rune[1] = a (U+0061)
// rune[2] = f (U+0066)
// rune[3] = é (U+00E9)  -- 3 is é's starting byte offset; a rune after it would start at 5, not 4

Interview Tip

range over a string iterates by rune, decoding UTF-8 on each step. The index jumps by the byte width of each rune, not by 1. A standard for i := 0; i < len(s); i++ iterates by byte.
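
The index jump is easier to see with a string that mixes 1-, 2-, and 3-byte runes. The sketch below collects the indices that range reports; `runeOffsets` is a hypothetical helper name.

```go
package main

import "fmt"

// runeOffsets (hypothetical helper) returns the byte offset at which
// each rune of s starts, exactly as reported by range.
func runeOffsets(s string) []int {
	var offsets []int
	for i := range s {
		offsets = append(offsets, i)
	}
	return offsets
}

func main() {
	// 'a' is 1 byte, 'é' is 2 bytes, '世' is 3 bytes, 'b' is 1 byte.
	fmt.Println(runeOffsets("aé世b")) // [0 1 3 6]
}
```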

String Conversions¶

s := "Hello, 世界"

// String ↔ byte slice
b := []byte(s)           // copies bytes out
s2 := string(b)          // copies bytes back

// String ↔ rune slice
r := []rune(s)           // decodes UTF-8 into code points
fmt.Println(len(r))      // 9 (rune count, the "character" count)
s3 := string(r)          // re-encodes to UTF-8

// Single rune/byte to string
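fmt.Println(string(rune(65))) // "A" -- converting an integer yields that code point, not "65" (use strconv.Itoa for digits)
fmt.Println(string(r[7]))     // "世" -- r[7] is the 8th rune; r[8] is "界"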

Conversion Cost

[]byte(s) and []rune(s) both allocate and copy. In hot paths, avoid repeated conversions -- work with bytes directly or convert once.

Raw Strings (Backtick Literals)¶

Raw strings preserve everything literally -- no escape sequences, can span multiple lines.

path := `C:\Users\docs\file.txt`   // no need to escape backslashes
query := `SELECT *
FROM users
WHERE active = true`               // multi-line without \n
regex := `\d{3}-\d{4}`             // regex without double-escaping
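
A small sketch of the difference between the two literal forms, using variable names chosen only for this example: the same characters between the delimiters produce different strings.

```go
package main

import "fmt"

var (
	interpreted = "a\tb" // \t is processed into a single tab byte
	raw         = `a\tb` // backslash and 't' are kept as two separate bytes
)

func main() {
	fmt.Println(len(interpreted), len(raw)) // 3 4
	fmt.Println(raw == "a\\tb")             // true
}
```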

String Concatenation¶

Method	Use Case	Allocations
`+` operator	Small, known concatenations (2-3 strings)	New string each time
`fmt.Sprintf`	Formatted output with mixed types	Moderate overhead
`strings.Builder`	Building strings in loops	Amortized O(1) appends
`strings.Join`	Joining a slice of strings	Single allocation

// BAD: O(n²) in a loop -- each + allocates a new string
var s string
for _, word := range words {
    s += word + " "  // quadratic allocation
}

// GOOD: strings.Builder -- the standard approach for loops
var b strings.Builder
b.Grow(estimatedSize) // optional pre-allocation
for _, word := range words {
    b.WriteString(word)
    b.WriteByte(' ')
}
result := b.String()

// GOOD: strings.Join for slices (adds no trailing separator)
joined := strings.Join(words, " ")
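
The Builder fragment above can be turned into a complete program along these lines. This is a sketch, not a benchmark: `joinWithSpaces` is a hypothetical helper that computes the exact final size so that `Grow` reserves the buffer once.

```go
package main

import (
	"fmt"
	"strings"
)

// joinWithSpaces (hypothetical helper) concatenates words separated by
// single spaces, sizing the builder up front.
func joinWithSpaces(words []string) string {
	if len(words) == 0 {
		return ""
	}
	size := len(words) - 1 // one separator between each pair of words
	for _, w := range words {
		size += len(w)
	}

	var b strings.Builder
	b.Grow(size)
	for i, w := range words {
		if i > 0 {
			b.WriteByte(' ')
		}
		b.WriteString(w)
	}
	return b.String()
}

func main() {
	words := []string{"go", "strings", "are", "bytes"}
	fmt.Println(joinWithSpaces(words))
	fmt.Println(joinWithSpaces(words) == strings.Join(words, " ")) // true
}
```

In practice `strings.Join` already does this for a plain separator; a hand-written Builder loop pays off when each piece needs filtering or formatting on the way in.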

Key `strings` Package Functions¶

import "strings"

strings.Contains("seafood", "foo")       // true
strings.HasPrefix("Hello", "He")         // true
strings.HasSuffix("Hello", "lo")         // true
strings.Index("chicken", "ken")          // 4
strings.Count("cheese", "e")             // 3

strings.ToUpper("hello")                 // "HELLO"
strings.ToLower("HELLO")                 // "hello"
strings.TrimSpace("  hi  ")             // "hi"
strings.Trim("***hi***", "*")           // "hi"

strings.Split("a,b,c", ",")             // ["a", "b", "c"]
strings.Join([]string{"a","b","c"}, ",") // "a,b,c"

strings.Replace("oink oink", "oink", "moo", 1)  // "moo oink"
strings.ReplaceAll("oink oink", "oink", "moo")   // "moo moo"

strings.NewReader("hello")               // io.Reader from string

Key `unicode/utf8` Package Functions¶

import "unicode/utf8"

s := "Hello, 世界"
utf8.RuneCountInString(s)          // 9 -- number of runes (code points), not bytes
utf8.ValidString(s)                // true
utf8.RuneLen('世')                  // 3 -- bytes needed for this rune

b := []byte("世")
r, size := utf8.DecodeRune(b)     // r='世', size=3
utf8.Valid(b)                      // true
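
These functions combine naturally into a manual decode loop. The sketch below walks a string with `utf8.DecodeRuneInString` and counts bytes that are not valid UTF-8; `countInvalid` is a hypothetical helper written for this example.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// countInvalid (hypothetical helper) walks s one rune at a time and
// counts bytes that are not part of a valid UTF-8 encoding.
func countInvalid(s string) int {
	invalid := 0
	for len(s) > 0 {
		r, size := utf8.DecodeRuneInString(s)
		// RuneError with size 1 signals an invalid byte. A literal
		// U+FFFD in the input decodes with size 3 and is not counted.
		if r == utf8.RuneError && size == 1 {
			invalid++
		}
		s = s[size:]
	}
	return invalid
}

func main() {
	fmt.Println(countInvalid("世界"))         // 0
	fmt.Println(countInvalid("caf\xc3"))    // 1
	fmt.Println(countInvalid("a\xff\xfeb")) // 2
}
```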

Quick Reference¶

Concept	Type	Notes
`string`	Immutable `[]byte`	UTF-8 encoded, zero value is `""`
`byte`	`uint8`	Single byte, used for ASCII or raw data
`rune`	`int32`	Unicode code point
`len(s)`	Byte count	Not character count
`utf8.RuneCountInString(s)`	Rune count	Number of code points, not bytes
`s[i]`	`byte`	Index access returns a byte
`range s`	`(index, rune)`	Decodes UTF-8, index = byte offset
`` `raw` ``	Raw string literal	Backtick-delimited; no escapes processed
`[]byte(s)`	Conversion	Allocates + copies
`[]rune(s)`	Conversion	Allocates + copies, decodes UTF-8

Best Practices¶

Use strings.Builder for any string construction in loops
Use range when you need to process runes (characters), never manual byte indexing for Unicode text
Use utf8.RuneCountInString when you need the number of runes (code points) rather than bytes
Use raw string literals for regex patterns, file paths, and SQL queries
Prefer strings.EqualFold over ToLower/ToUpper for case-insensitive comparison -- it avoids allocation
Pre-allocate Builder with Grow() when you know the approximate final size
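
To illustrate the case-insensitive comparison practice above, here is a minimal sketch; `sameFold` is a hypothetical wrapper that exists only to give the example a name.

```go
package main

import (
	"fmt"
	"strings"
)

// sameFold (hypothetical helper) reports whether a and b are equal
// under Unicode case folding, without building lowered copies.
func sameFold(a, b string) bool {
	return strings.EqualFold(a, b)
}

func main() {
	fmt.Println(sameFold("Go", "GO"))       // true
	fmt.Println(sameFold("Ärger", "ärger")) // true
	fmt.Println(sameFold("Go", "Goo"))      // false

	// Equivalent result for this input, but ToLower builds new strings.
	fmt.Println(strings.ToLower("Go") == strings.ToLower("GO")) // true
}
```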

Common Pitfalls¶

Slicing multi-byte strings

Slicing a string by byte index can cut a multi-byte rune in half, producing invalid UTF-8:

s := "café"
fmt.Println(s[:4])  // "caf\xc3" -- broken! 'é' is 2 bytes
fmt.Println(s[:5])  // "café" -- correct byte boundary

Convert to []rune first if you need character-based slicing.
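
As an alternative to the []rune conversion, the sketch below uses range to find a safe byte boundary and slices there, which avoids copying the whole string. `firstRunes` is a hypothetical helper, not a standard library function.

```go
package main

import "fmt"

// firstRunes (hypothetical helper) returns the first n runes of s
// without splitting a multi-byte rune. It uses range to find the byte
// offset where rune number n starts and slices the string there.
func firstRunes(s string, n int) string {
	if n <= 0 {
		return ""
	}
	count := 0
	for i := range s {
		if count == n {
			return s[:i]
		}
		count++
	}
	return s // s has n runes or fewer
}

func main() {
	fmt.Println(firstRunes("café au lait", 4)) // café
	fmt.Println(firstRunes("世界", 1))           // 世
	fmt.Println(firstRunes("go", 10))          // go
}
```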

Comparing strings with special characters

Some Unicode characters have multiple representations. Use golang.org/x/text/unicode/norm for normalization when comparing user input.
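
The problem itself can be shown with the standard library alone. This sketch only demonstrates the mismatch; it does not perform normalization, which would need the external norm package mentioned above. The variable names are chosen for this example.

```go
package main

import "fmt"

var (
	precomposed = "caf\u00e9"  // é as the single code point U+00E9
	decomposed  = "cafe\u0301" // e followed by combining acute accent U+0301
)

func main() {
	// Both typically render as "café", yet they differ byte for byte.
	fmt.Println(precomposed == decomposed)                         // false
	fmt.Println(len(precomposed), len(decomposed))                 // 5 6
	fmt.Println(len([]rune(precomposed)), len([]rune(decomposed))) // 4 5
}
```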

Modifying strings through byte slice

Converting to []byte, modifying, and converting back works but creates copies at each step. The original string is never modified.
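
A minimal sketch of that round trip, using a hypothetical `capitalizeASCII` helper that only handles ASCII letters:

```go
package main

import "fmt"

// capitalizeASCII (hypothetical helper) returns a copy of s with its
// first byte upper-cased when that byte is an ASCII lowercase letter.
func capitalizeASCII(s string) string {
	if s == "" {
		return s
	}
	b := []byte(s) // first copy: b does not alias the string's memory
	if b[0] >= 'a' && b[0] <= 'z' {
		b[0] -= 'a' - 'A'
	}
	return string(b) // second copy
}

func main() {
	original := "hello"
	changed := capitalizeASCII(original)
	fmt.Println(original, changed) // hello Hello
}
```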

Performance Considerations¶

Operation	Cost	Notes
`len(s)`	O(1)	Stored in string header
`s[i]`	O(1)	Direct byte access
`s + t`	O(n+m)	Allocates new string
`[]byte(s)`	O(n)	Copy + allocation
`[]rune(s)`	O(n)	Decode + allocation
`strings.Builder`	Amortized O(1) per write	Buffer grows geometrically via append
`utf8.RuneCountInString(s)`	O(n)	Must scan entire string
`range s` (rune)	O(n) total	Decodes on the fly, no allocation

Compiler Optimizations

Go's compiler optimizes certain patterns: string(b) conversions used directly in map lookups and comparisons often avoid allocation. For example, m[string(b)], where b is a []byte, looks up the key without copying the bytes.
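
The map-lookup pattern looks like this in practice. It is a sketch of the code shape only: whether the copy is actually elided depends on the toolchain, and `lookup` is a hypothetical helper name.

```go
package main

import "fmt"

// lookup (hypothetical helper) finds key in m. The conversion sits
// directly inside the index expression; assigning string(key) to a
// variable first would be an ordinary conversion that copies.
func lookup(m map[string]int, key []byte) (int, bool) {
	v, ok := m[string(key)]
	return v, ok
}

func main() {
	counts := map[string]int{"go": 3, "rust": 1}
	key := []byte("go")

	v, ok := lookup(counts, key)
	fmt.Println(v, ok) // 3 true
}
```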

Interview Tips¶

Interview Tip

When asked "What is the output of len("日本語")?", the answer is 9 (3 runes × 3 bytes each), not 3. Always clarify whether a question is asking about bytes or runes.

Interview Tip

Know why string is immutable: it allows safe sharing across goroutines without locks, enables string interning optimizations, and makes strings usable as map keys.

Interview Tip

If asked to reverse a Unicode string, convert to []rune first:

func reverseString(s string) string {
    runes := []rune(s)
    for i, j := 0, len(runes)-1; i < j; i, j = i+1, j-1 {
        runes[i], runes[j] = runes[j], runes[i]
    }
    return string(runes)
}

Reversing bytes directly would corrupt multi-byte characters.

Key Takeaways¶

Strings are immutable byte slices, not character arrays
rune is int32 -- a Unicode code point; byte is uint8
len() counts bytes; use utf8.RuneCountInString() for runes
range over a string yields (byte index, rune) pairs
Use strings.Builder for efficient concatenation in loops
Raw strings (backticks) are ideal for regex, paths, and multi-line text
String conversions ([]byte, []rune) generally allocate and copy; the compiler elides the copy only in a few specific patterns
