Strings, Runes & UTF-8 (Beginner)
Introduction
Go strings are read-only slices of bytes, not arrays of characters. Every Go source file is UTF-8, and the language has first-class support for Unicode through the rune type. Understanding the byte-vs-rune distinction is essential -- it's a top interview question and a source of subtle production bugs when handling multi-byte characters.
Strings Are Immutable Byte Slices
A string in Go is a read-only, immutable sequence of bytes. It has no built-in notion of "characters."
```go
s := "Hello, 世界"
fmt.Println(len(s)) // 13 (bytes, NOT characters)
fmt.Println(s[0])   // 72 (byte value of 'H')
// s[0] = 'h'       // compile error: strings are immutable
```
Critical Interview Point
len(s) returns the number of bytes, not the number of characters (runes). "世界" is 6 bytes (3 per character) but only 2 runes.
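To see where the 6 bytes come from, the sketch below prints the raw bytes of "世界" next to both counts. The helper name `byteAndRuneCounts` is made up for illustration.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// byteAndRuneCounts is a hypothetical helper that returns the byte length
// and the rune (code point) count of s.
func byteAndRuneCounts(s string) (int, int) {
	return len(s), utf8.RuneCountInString(s)
}

func main() {
	s := "世界"
	bytes, runes := byteAndRuneCounts(s)
	fmt.Println(bytes, runes) // 6 2
	fmt.Printf("% x\n", s)    // e4 b8 96 e7 95 8c: three UTF-8 bytes per character
}
```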
Runes: Unicode Code Points
A rune is an alias for int32 and represents a single Unicode code point.
```go
var r rune = '世'
fmt.Printf("Type: %T, Value: %d, Char: %c\n", r, r, r)
// Type: int32, Value: 19990, Char: 世
```
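Because rune is an alias rather than a distinct type, a rune can be assigned to an int32 without conversion, and ordinary integer arithmetic works on it. A small sketch follows; `nextRune` is a hypothetical helper.

```go
package main

import "fmt"

// nextRune is a hypothetical helper: rune arithmetic is plain int32 arithmetic.
func nextRune(r rune) rune { return r + 1 }

func main() {
	var r rune = '世'
	var i int32 = r             // no conversion needed: rune and int32 are the same type
	fmt.Println(i, r == 0x4E16) // 19990 true ('世' is U+4E16)

	fmt.Println(string(nextRune('a'))) // b
}
```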
Byte vs Rune Iteration
```go
s := "café"

// Byte iteration -- iterates raw bytes
for i := 0; i < len(s); i++ {
	fmt.Printf("byte[%d] = %x\n", i, s[i])
}
// byte[0] = 63, byte[1] = 61, byte[2] = 66, byte[3] = c3, byte[4] = a9
// 'é' splits into 2 bytes (c3, a9)

// Rune iteration -- range decodes UTF-8 automatically
for i, r := range s {
	fmt.Printf("rune[%d] = %c (U+%04X)\n", i, r, r)
}
// rune[0] = c (U+0063)
// rune[1] = a (U+0061)
// rune[2] = f (U+0066)
// rune[3] = é (U+00E9) -- index 3, NOT 4
```
Interview Tip
range over a string iterates by rune, decoding UTF-8 on each step. The index jumps by the byte width of each rune, not by 1. A standard for i := 0; i < len(s); i++ iterates by byte.
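One way to see what range does on each step is to write the decoding by hand with utf8.DecodeRuneInString. This is only an illustrative equivalent, not how the compiler implements range. The helper `decodeAll` is hypothetical.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// decodeAll is a hypothetical helper that walks s the way `for i, r := range s`
// does, recording each rune's starting byte offset and its decoded value.
func decodeAll(s string) (offsets []int, runes []rune) {
	for i := 0; i < len(s); {
		r, size := utf8.DecodeRuneInString(s[i:]) // invalid bytes yield (RuneError, 1), like range
		offsets = append(offsets, i)
		runes = append(runes, r)
		i += size // advance by the rune's byte width, not by 1
	}
	return offsets, runes
}

func main() {
	offsets, runes := decodeAll("café")
	fmt.Println(offsets)       // [0 1 2 3]
	fmt.Println(string(runes)) // café
}
```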
String Conversions
```go
s := "Hello, 世界"

// String ↔ byte slice
b := []byte(s)  // copies bytes out
s2 := string(b) // copies bytes back

// String ↔ rune slice
r := []rune(s)      // decodes UTF-8 into code points
fmt.Println(len(r)) // 9 (rune count, the "character" count)
s3 := string(r)     // re-encodes to UTF-8
fmt.Println(s2 == s, s3 == s) // true true

// Single rune/byte to string
fmt.Println(string(rune(65))) // "A" -- converting a rune yields its UTF-8 encoding (go vet flags string(65) on a plain int)
fmt.Println(string(r[7]))     // "世" (r[8] is "界")
```
Conversion Cost
[]byte(s) and []rune(s) both allocate and copy. In hot paths, avoid repeated conversions -- work with bytes directly or convert once.
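As an example of working on the string directly, the sketch below truncates to the first n runes using the byte offsets that range yields, instead of building a []rune. The helper `truncateRunes` is hypothetical.

```go
package main

import "fmt"

// truncateRunes is a hypothetical helper returning at most the first n runes of s.
// It uses range's byte offsets, so it never splits a multi-byte character and
// returns a substring of s rather than a newly allocated copy.
func truncateRunes(s string, n int) string {
	count := 0
	for i := range s {
		if count == n {
			return s[:i] // i is the byte offset where rune number n starts
		}
		count++
	}
	return s // s has n or fewer runes
}

func main() {
	fmt.Println(truncateRunes("Hello, 世界", 8)) // Hello, 世
}
```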
Raw Strings (Backtick Literals)
Raw strings preserve everything literally -- no escape sequences, can span multiple lines.
```go
path := `C:\Users\docs\file.txt` // no need to escape backslashes
query := `SELECT *
FROM users
WHERE active = true` // multi-line without \n
regex := `\d{3}-\d{4}` // regex without double-escaping
```
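The raw and interpreted forms of the regex produce the same string; the raw form simply avoids doubling each backslash. A short sketch using the standard regexp package follows. The variable name `phonePattern` is illustrative.

```go
package main

import (
	"fmt"
	"regexp"
)

// phonePattern is an illustrative name for the pattern from the text.
var phonePattern = regexp.MustCompile(`\d{3}-\d{4}`)

func main() {
	raw := `\d{3}-\d{4}`
	interpreted := "\\d{3}-\\d{4}" // same pattern with every backslash escaped
	fmt.Println(raw == interpreted) // true

	fmt.Println(phonePattern.MatchString("555-1234")) // true
}
```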
String Concatenation

| Method | Use Case | Allocations |
|---|---|---|
| + operator | Small, known concatenations (2-3 strings) | One new string per + expression |
| fmt.Sprintf | Formatted output with mixed types | Moderate overhead |
| strings.Builder | Building strings in loops | Amortized O(1) per byte appended |
| strings.Join | Joining a slice of strings | Single allocation |
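```go
words := []string{"Go", "strings", "are", "immutable"}

// BAD: O(n²) in a loop -- each += copies everything built so far
var s string
for _, word := range words {
	s += word + " " // quadratic copying overall
}

// GOOD: strings.Builder -- the standard approach for loops
estimatedSize := 32 // rough guess of the final length in bytes
var b strings.Builder
b.Grow(estimatedSize) // optional pre-allocation
for _, word := range words {
	b.WriteString(word)
	b.WriteByte(' ')
}
result := b.String() // ends with a space, like the += version

// GOOD: strings.Join for slices -- separators only between elements
joined := strings.Join(words, " ")
```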
Key strings Package Functions
```go
import (
	"fmt"
	"io"
	"strings"
)

fmt.Println(strings.Contains("seafood", "foo"))             // true
fmt.Println(strings.HasPrefix("Hello", "He"))               // true
fmt.Println(strings.HasSuffix("Hello", "lo"))               // true
fmt.Println(strings.Index("chicken", "ken"))                // 4
fmt.Println(strings.Count("cheese", "e"))                   // 3
fmt.Println(strings.ToUpper("hello"))                       // "HELLO"
fmt.Println(strings.ToLower("HELLO"))                       // "hello"
fmt.Println(strings.TrimSpace(" hi "))                      // "hi"
fmt.Println(strings.Trim("***hi***", "*"))                  // "hi"
fmt.Println(strings.Split("a,b,c", ","))                    // ["a", "b", "c"]
fmt.Println(strings.Join([]string{"a", "b", "c"}, ","))     // "a,b,c"
fmt.Println(strings.Replace("oink oink", "oink", "moo", 1)) // "moo oink"
fmt.Println(strings.ReplaceAll("oink oink", "oink", "moo")) // "moo moo"
var rd io.Reader = strings.NewReader("hello")               // io.Reader from string
```
Key unicode/utf8 Package Functions
```go
import "unicode/utf8"

s := "Hello, 世界"
fmt.Println(utf8.RuneCountInString(s)) // 9 -- rune (code point) count
fmt.Println(utf8.ValidString(s))       // true
fmt.Println(utf8.RuneLen('世'))         // 3 -- bytes needed for this rune

b := []byte("世")
r, size := utf8.DecodeRune(b) // r = '世', size = 3
fmt.Println(string(r), size)
fmt.Println(utf8.Valid(b)) // true
```
Quick Reference

| Concept | Type / Meaning | Notes |
|---|---|---|
| string | Immutable byte sequence | Conventionally UTF-8 (not enforced); zero value is "" |
| byte | uint8 | Single byte, used for ASCII or raw data |
| rune | int32 | Unicode code point |
| len(s) | Byte count | Not character count |
| utf8.RuneCountInString(s) | Rune count | Code point ("character") count |
| s[i] | byte | Index access returns a byte |
| range s | (index, rune) | Decodes UTF-8; index is the byte offset |
| `raw` | Raw string literal | No escapes processed |
| []byte(s) | Conversion | Allocates + copies |
| []rune(s) | Conversion | Allocates + copies, decodes UTF-8 |
Best Practices
- Use strings.Builder for any string construction in loops
- Use range when you need to process runes (characters); never index bytes manually for Unicode text
- Use utf8.RuneCountInString when you need the true character count
- Use raw string literals for regex patterns, file paths, and SQL queries
- Prefer strings.EqualFold over ToLower/ToUpper for case-insensitive comparison -- it avoids allocation
- Pre-allocate a Builder with Grow() when you know the approximate final size
Common Pitfalls
Slicing multi-byte strings
Slicing a string by byte index can cut a multi-byte rune in half, producing invalid UTF-8:
```go
s := "café"
fmt.Println(s[:4]) // "caf\xc3" -- broken! 'é' is 2 bytes
fmt.Println(s[:5]) // "café" -- correct byte boundary
```
Convert to []rune first if you need character-based slicing.
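A minimal sketch of character-based slicing through []rune follows. The helper `runeSlice` is hypothetical; like ordinary slicing, it panics on out-of-range indices, and it allocates because of the []rune conversion.

```go
package main

import "fmt"

// runeSlice is a hypothetical helper returning runes [start, end) of s,
// with indices counted in runes rather than bytes.
func runeSlice(s string, start, end int) string {
	return string([]rune(s)[start:end])
}

func main() {
	fmt.Println(runeSlice("café", 0, 4))        // café
	fmt.Println(runeSlice("café", 3, 4))        // é
	fmt.Println(runeSlice("Hello, 世界", 7, 9)) // 世界
}
```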
Comparing strings with special characters
Some Unicode characters have multiple representations. Use golang.org/x/text/unicode/norm for normalization when comparing user input.
Modifying strings through byte slice
Converting to []byte, modifying, and converting back works but creates copies at each step. The original string is never modified.
Performance Considerations

| Operation | Cost | Notes |
|---|---|---|
| len(s) | O(1) | Stored in string header |
| s[i] | O(1) | Direct byte access |
| s + t | O(n+m) | Allocates new string |
| []byte(s) | O(n) | Copy + allocation |
| []rune(s) | O(n) | Decode + allocation |
| strings.Builder | Amortized O(1) per byte written | Buffer grows geometrically (roughly doubling) |
| utf8.RuneCountInString(s) | O(n) | Must scan entire string |
| range s (rune) | O(n) total | Decodes on the fly, no allocation |
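Compiler Optimizations
Go's compiler optimizes certain patterns: a string(b) conversion (where b is a []byte) often avoids allocation when the result is used only temporarily, as in map lookups like m[string(b)] or comparisons like string(b) == "key". In those cases the bytes are not copied.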
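The sketch below shows the two patterns with a []byte key. The map contents and helper names are illustrative. The code is correct with or without the optimization; whether a particular toolchain skips the allocation can be checked with testing.AllocsPerRun rather than assumed.

```go
package main

import "fmt"

// prices is sample data for the illustration.
var prices = map[string]int{"apple": 3, "pear": 5}

// lookup (hypothetical helper) indexes the map with a []byte key. Because
// string(key) appears directly in the index expression, the gc compiler can
// use the bytes without building a temporary string.
func lookup(key []byte) (int, bool) {
	v, ok := prices[string(key)]
	return v, ok
}

// isApple (hypothetical helper) compares bytes against a string the same way.
func isApple(key []byte) bool {
	return string(key) == "apple"
}

func main() {
	v, ok := lookup([]byte("pear"))
	fmt.Println(v, ok)                    // 5 true
	fmt.Println(isApple([]byte("apple"))) // true
}
```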
Interview Tips
Interview Tip
When asked "What is the output of len("日本語")?", the answer is 9 (3 runes × 3 bytes each), not 3. Always clarify whether a question is asking about bytes or runes.
Interview Tip
Know why string is immutable: it allows safe sharing across goroutines without locks, enables string interning optimizations, and makes strings usable as map keys.
Interview Tip
If asked to reverse a Unicode string, convert to []rune first:
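A common sketch of that answer: convert to []rune, swap from both ends, and convert back. Reversing bytes instead would scramble multi-byte characters.

```go
package main

import "fmt"

// reverse reverses s rune by rune, so multi-byte characters stay intact.
func reverse(s string) string {
	r := []rune(s)
	for i, j := 0, len(r)-1; i < j; i, j = i+1, j-1 {
		r[i], r[j] = r[j], r[i]
	}
	return string(r)
}

func main() {
	fmt.Println(reverse("Hello, 世界")) // 界世 ,olleH
}
```

This works per code point, not per visible character: a decomposed "é" (e followed by U+0301) would come out with the accent ahead of the "e". Handling that requires grapheme-cluster segmentation, which the standard library does not provide.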
Key Takeaways
- Strings are immutable byte slices, not character arrays
- rune is int32 (a Unicode code point); byte is uint8
- len() counts bytes; use utf8.RuneCountInString() for runes
- range over a string yields (byte index, rune) pairs
- Use strings.Builder for efficient concatenation in loops
- Raw strings (backticks) are ideal for regex, paths, and multi-line text
- String conversions ([]byte, []rune) generally allocate and copy; the compiler elides this only in a few specific patterns