Go Memory Model and Happens-Before (Advanced)

Introduction

The Go memory model specifies the conditions under which reads of a variable in one goroutine are guaranteed to observe writes to that same variable in another goroutine. Without understanding it, you will write concurrent code that works in testing but fails unpredictably in production — possibly only on certain CPU architectures or under high load.

The core problem: modern CPUs and compilers reorder memory operations for performance. Within a single goroutine this is invisible (the reordering preserves single-threaded semantics). Across goroutines, without explicit synchronization, you have no guarantee about the order in which one goroutine's writes become visible to another.

Syntax & Usage

The Happens-Before Relation

The memory model defines a partial order called happens-before. If event A happens-before event B, then the effects of A are guaranteed to be visible to B.

Within a single goroutine, happens-before follows program order — statement 1 happens-before statement 2 if it appears first in the code.

Across goroutines, happens-before is established only through synchronization events.

graph TD
    subgraph "Goroutine 1"
    A1[x = 1] --> A2["ch <- struct{}{}"]
    end
    subgraph "Goroutine 2"
    B1[<-ch] --> B2[print x]
    end
    A2 -->|"happens-before<br/>(channel send → receive)"| B1
    style A2 fill:#4caf50,color:white
    style B1 fill:#4caf50,color:white

Synchronization Events (Happens-Before Guarantees)

1. Package Initialization

// init() in package A completes BEFORE any code in packages that import A.
// All init() functions complete BEFORE main() starts.
package db

var conn *Connection

func init() {
    conn = openConnection() // Guaranteed visible to all importers
}

2. Goroutine Creation

The go statement happens-before the new goroutine begins executing.

var x int

x = 42
go func() {
    fmt.Println(x) // Guaranteed to see x = 42
}()
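
The edge runs one way only: nothing synchronizes a goroutine's exit back to its creator, so to observe the goroutine's writes you still need an explicit signal. A minimal sketch, assuming a done channel (the names here are illustrative):

```go
package main

import "fmt"

var msg string

// setAndSignal writes msg, then signals completion. The channel send
// creates the happens-before edge that publishes the write.
func setAndSignal(done chan<- struct{}) {
	msg = "hello"
	done <- struct{}{}
}

func main() {
	done := make(chan struct{})
	go setAndSignal(done)
	<-done           // receive happens-after the send
	fmt.Println(msg) // guaranteed "hello"
}
```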

3. Channel Operations

Unbuffered channel: send happens-before the corresponding receive completes.

var data string

func main() {
    ch := make(chan struct{})

    go func() {
        data = "hello"  // Write
        ch <- struct{}{}  // Send — happens-before receive
    }()

    <-ch                  // Receive — happens-after send
    fmt.Println(data)     // Guaranteed to print "hello"
}

Buffered channel: the receive of the kth element happens-before the send of the (k+C)th element completes, where C is the buffer capacity.

// With a buffer of size 1:
ch := make(chan struct{}, 1)

go func() {
    data = "hello"
    ch <- struct{}{} // Succeeds immediately (buffer has space)
}()

<-ch
fmt.Println(data) // Guaranteed to print "hello"
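
This k/k+C rule is what makes a buffered channel usable as a counting semaphore: a send cannot complete until enough receives have freed buffer slots. A sketch that bounds concurrency and reports the observed peak (runLimited and its counters are invented for illustration):

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// runLimited runs tasks with at most limit in flight and returns the
// peak number of concurrently running tasks it observed.
func runLimited(tasks, limit int) int64 {
	sem := make(chan struct{}, limit)
	var inFlight, peak atomic.Int64
	var wg sync.WaitGroup
	for i := 0; i < tasks; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			sem <- struct{}{} // acquire: blocks once sends outpace receives by limit
			cur := inFlight.Add(1)
			for { // record the high-water mark with a CAS loop
				p := peak.Load()
				if cur <= p || peak.CompareAndSwap(p, cur) {
					break
				}
			}
			inFlight.Add(-1)
			<-sem // release: lets one blocked sender proceed
		}()
	}
	wg.Wait()
	return peak.Load()
}

func main() {
	fmt.Println(runLimited(100, 3) <= 3) // true: never more than 3 at once
}
```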

Channel close: closing a channel happens-before a receive that returns a zero value because the channel is closed.

var data string

func main() {
    ch := make(chan struct{})

    go func() {
        data = "hello"
        close(ch) // Close happens-before receive of zero value
    }()

    <-ch // Returns zero value — but data write is guaranteed visible
    fmt.Println(data) // "hello"
}

4. Mutex Lock/Unlock

For any sync.Mutex or sync.RWMutex, and for n < m, the nth call to Unlock() happens-before the mth call to Lock() returns.

var (
    mu   sync.Mutex
    data string
)

func writer() {
    mu.Lock()
    data = "hello" // Protected write
    mu.Unlock()    // Unlock happens-before next Lock
}

func reader() {
    mu.Lock()       // This Lock happens-after the preceding Unlock
    fmt.Println(data) // Guaranteed to see "hello" if called after writer
    mu.Unlock()
}

5. sync.Once

The completion of f() in once.Do(f) happens-before any call to once.Do returns.

var (
    instance *Service
    once     sync.Once
)

func GetService() *Service {
    once.Do(func() {
        instance = &Service{} // Initialization
        instance.Setup()
    })
    return instance // All callers see the fully initialized Service
}

6. sync.WaitGroup

wg.Done() (which is wg.Add(-1)) happens-before wg.Wait() returns.

var results []int
var wg sync.WaitGroup

for i := 0; i < 10; i++ {
    wg.Add(1)
    go func(n int) {
        defer wg.Done()
        results = append(results, compute(n)) // ⚠️ Still a race on append!
    }(i)
}
wg.Wait()
// wg.Wait() return guarantees all Done() calls have completed,
// but the shared append above is still a race — need a mutex too.
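
One race-free alternative that needs no lock at all: preallocate the slice and give each goroutine its own index, so writes never overlap, and let Wait() publish them. A sketch with a stand-in compute function:

```go
package main

import (
	"fmt"
	"sync"
)

func compute(n int) int { return n * n } // stand-in for real work

// gather runs one goroutine per slot; disjoint indices mean no data race.
func gather(n int) []int {
	results := make([]int, n) // preallocated: each goroutine owns one slot
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func(k int) {
			defer wg.Done()
			results[k] = compute(k) // writes to distinct elements don't conflict
		}(i)
	}
	wg.Wait() // every Done() happens-before this return, publishing the slice
	return results
}

func main() {
	fmt.Println(gather(10)) // [0 1 4 9 16 25 36 49 64 81]
}
```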

What Goes Wrong Without Synchronization

// ❌ DATA RACE — no happens-before relationship
var x int
var ready bool

go func() {
    x = 42
    ready = true // Compiler/CPU may reorder: ready=true BEFORE x=42
}()

if ready {
    fmt.Println(x) // Might print 0! No guarantee x=42 is visible
}

Why this fails:

  1. Compiler reordering: The compiler may move ready = true before x = 42 since they are independent in single-goroutine semantics.
  2. CPU store buffer: Even without compiler reordering, CPU write buffers may make ready visible to another core before x.
  3. No happens-before: Without a synchronization event, the reading goroutine has no guarantee about what values it sees.

The "Busy-Wait" Anti-Pattern

// ❌ BROKEN — no synchronization, compiler may optimize to infinite loop
var flag bool

go func() {
    doWork()
    flag = true
}()

for !flag {
    // The compiler is allowed to cache 'flag' in a register
    // and NEVER re-read it from memory, making this an infinite loop
}
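
If a flag really is what you want, sync/atomic fixes both failure modes: the Store publishes the preceding writes and the Load cannot be hoisted into a register. A sketch (doWork is elided; runtime.Gosched just yields instead of burning a core):

```go
package main

import (
	"fmt"
	"runtime"
	"sync/atomic"
)

var flag atomic.Bool

// waitForFlag spins until the flag is set. Unlike a plain bool read,
// the atomic Load must re-read memory on every iteration.
func waitForFlag() {
	for !flag.Load() {
		runtime.Gosched() // yield the processor while waiting
	}
}

func main() {
	go func() {
		// doWork() would go here
		flag.Store(true) // atomic Store: creates the happens-before edge
	}()
	waitForFlag()
	fmt.Println("done")
}
```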

Atomic Operations (sync/atomic)

Atomic operations provide sequentially consistent ordering — they establish happens-before relationships without locks.

import "sync/atomic"

var counter atomic.Int64

// Goroutine 1
counter.Store(42) // Atomic write

// Goroutine 2
val := counter.Load() // Atomic read — if this observes 42,
                       // it also observes all writes that happened
                       // before the Store(42) in goroutine 1

// Atomic types (Go 1.19+)
var flag atomic.Bool
var ptr  atomic.Pointer[Config]
var val  atomic.Value // Stores any type, but not type-safe

// Common operations
counter.Add(1)           // Atomic increment
old := counter.Swap(100) // Atomic swap, returns previous value
swapped := counter.CompareAndSwap(100, 200) // CAS

Atomic Pointer for Lock-Free Configuration

type Config struct {
    MaxConns int
    Timeout  time.Duration
    Debug    bool
}

var currentConfig atomic.Pointer[Config]

func init() {
    currentConfig.Store(&Config{
        MaxConns: 100,
        Timeout:  30 * time.Second,
    })
}

// Safe concurrent read — no lock needed
func getConfig() *Config {
    return currentConfig.Load()
}

// Update config atomically — readers never see a partial update
func updateConfig(new *Config) {
    currentConfig.Store(new)
}

Channels and Memory Visibility

Channels provide the strongest and most idiomatic synchronization in Go. Here's why:

type Result struct {
    Data    []byte
    Headers map[string]string
    Status  int
}

func fetchAsync(url string) <-chan Result {
    ch := make(chan Result)
    go func() {
        // All these writes happen-before the channel send
        result := Result{
            Data:    downloadBody(url),
            Headers: parseHeaders(url),
            Status:  200,
        }
        ch <- result // Send synchronizes ALL preceding writes
    }()
    return ch
}

func main() {
    ch := fetchAsync("https://example.com")
    result := <-ch // Receive synchronizes — ALL fields are guaranteed visible
    // result.Data, result.Headers, result.Status are all fully initialized
}

Common Memory Model Violations

Double-Checked Locking (Broken)

// ❌ BROKEN — classic double-checked locking doesn't work without atomics
var (
    mu       sync.Mutex
    instance *Singleton
)

func GetInstance() *Singleton {
    if instance == nil {     // Unsynchronized read
        mu.Lock()
        defer mu.Unlock()
        if instance == nil {
            instance = &Singleton{} // Write may be partially visible
        }
    }
    return instance // May return a pointer to an incompletely initialized struct
}

// ✅ Use sync.Once (the idiomatic, race-free lazy initialization in Go)
var (
    instance *Singleton
    once     sync.Once
)

func GetInstance() *Singleton {
    once.Do(func() {
        instance = &Singleton{}
    })
    return instance
}

Signaling with a Regular Variable

// ❌ BROKEN — no synchronization on 'done'
var done bool
var result string

go func() {
    result = compute()
    done = true
}()

for !done {} // May never terminate or may see result = ""
use(result)

// ✅ FIX: use a channel
ch := make(chan string)
go func() {
    ch <- compute()
}()
result := <-ch
use(result)

Quick Reference

| Synchronization Event | Happens-Before Guarantee |
| --- | --- |
| go f() | go statement → start of f() |
| ch <- v (unbuffered) | Send → corresponding receive completes |
| <-ch (unbuffered) | Receive → corresponding send completes |
| close(ch) | Close → receive returning zero value |
| mu.Unlock() | Unlock of call N → Lock of call N+1 |
| once.Do(f) | Completion of f() → return of any Do() call |
| wg.Done() | Done() → Wait() returns |
| atomic.Store() | Store → Load that observes the stored value |
| Package init() | All init() complete → main() begins |

Best Practices

  1. Use channels for communication, mutexes for state — this is the Go proverb and the memory model's sweet spot.

  2. Use sync.Once for lazy initialization — it is simple and correct by construction. Never hand-roll double-checked locking.

  3. Prefer high-level primitives — channels > atomics > mutexes > unsafe. Higher-level constructs are easier to reason about and harder to get wrong.

  4. Run the race detector in CI: go test -race ./... detects most data races at runtime. It should be part of every CI pipeline.

  5. Don't assume visibility without synchronization — if two goroutines access the same variable and at least one writes, you need a happens-before relationship.

  6. Use atomic types (Go 1.19+) over the function-based API — atomic.Int64 is clearer and less error-prone than atomic.AddInt64(&x, 1).

Common Pitfalls

Data Races Are Undefined Behavior in Go

Unlike some languages where data races produce "some value," Go treats data races as undefined behavior. The race detector (-race) can find them, but a racy program may crash, corrupt data, or behave arbitrarily — even on x86 where the hardware memory model is relatively strong.

// ❌ This is undefined behavior — not just "you might read a stale value"
var x int
go func() { x = 1 }()
go func() { x = 2 }()
// The program is INCORRECT regardless of what value x holds

The uintptr Gap with Atomic Pointers

// ❌ NEVER do this: the GC cannot see through a uintptr, so the object
// may be reclaimed (or moved) between the Load and the use
addr := atomic.LoadUintptr(&ptrAsUintptr)
obj := (*MyStruct)(unsafe.Pointer(addr)) // May be dangling by the time it's used!

// ✅ Use atomic.Pointer[T] (Go 1.19+)
var p atomic.Pointer[MyStruct]
obj := p.Load() // Safe — GC-aware

Happens-Before Is Not Happens-After

"A happens-before B" means B is guaranteed to see A's effects. It does not mean A completes before B starts in wall-clock time — only that the memory effects are ordered.

WaitGroup Does Not Synchronize Shared Data

wg.Wait() guarantees all wg.Done() calls have happened, but does NOT protect shared data written by goroutines. You still need a mutex or channel for the data itself.

// ❌ Race on shared slice — WaitGroup alone is not enough
var results []int
var wg sync.WaitGroup
for i := 0; i < 10; i++ {
    wg.Add(1)
    go func(n int) {
        defer wg.Done()
        results = append(results, n) // RACE — concurrent append
    }(i)
}
wg.Wait() // All Done() calls happened, but append was racy

// ✅ Use a mutex or channel to collect results
var mu sync.Mutex
for i := 0; i < 10; i++ {
    wg.Add(1)
    go func(n int) {
        defer wg.Done()
        mu.Lock()
        results = append(results, n)
        mu.Unlock()
    }(i)
}
wg.Wait()

Architecture-Dependent Bugs

Code with data races may appear to work on x86 (strong memory model) but fail on ARM (weaker memory model). The race detector catches this regardless of architecture — always use it.

Performance Considerations

  • Atomics vs Mutexes: Atomic operations are ~5-10x faster than mutex lock/unlock for simple counters and flags. Use atomic.Int64.Add(1) instead of mu.Lock(); counter++; mu.Unlock() for hot-path counters.

  • False sharing: Atomic variables on the same cache line (64 bytes on most architectures) cause cache-line bouncing between CPUs. Pad hot atomics if profiling shows contention.

    type PaddedCounter struct {
        value atomic.Int64
        _pad  [56]byte // Pad to fill a 64-byte cache line
    }
    
  • Read-heavy workloads: Use sync.RWMutex (multiple concurrent readers, exclusive writers) or atomic.Pointer for read-heavy, write-rare data like configuration.

  • Contention profiling: Use the mutex profile in go tool pprof to identify lock contention bottlenecks.

    // In your Go program, enable mutex profiling:
    runtime.SetMutexProfileFraction(1)

    # Then, from a shell, inspect the profile (assumes net/http/pprof on :6060):
    go tool pprof http://localhost:6060/debug/pprof/mutex
    
  • sync.Map: Optimized for two patterns: (1) write-once, read-many (like caches) and (2) disjoint key sets per goroutine. For everything else, a regular map + sync.RWMutex is faster.

Interview Tips

Interview Tip

When asked "What is the Go memory model?", start with: "It defines when writes in one goroutine are guaranteed to be visible to reads in another goroutine. Without explicit synchronization, there are no guarantees because both the compiler and CPU can reorder operations." Then list the synchronization events.

Interview Tip

A classic interview question is "Is this code correct?" followed by some variation of a flag-based synchronization pattern. The answer is almost always no — explain that you need a channel, mutex, or atomic to establish a happens-before relationship. Mention that the race detector (go test -race) would catch this.

Interview Tip

If asked about sync.Once, explain that it solves the lazy initialization problem correctly. Under the hood, it uses atomic.Load for the fast path (already initialized) and falls back to a mutex for the slow path (first call). This makes it essentially zero-cost after initialization.

Interview Tip

Knowing about false sharing and cache-line padding shows deep systems knowledge. Mention it when discussing high-performance atomic counters, but note that it's a micro-optimization that only matters at extreme scale — always benchmark first.

Key Takeaways

  • The Go memory model defines happens-before — the only guarantee for cross-goroutine memory visibility.
  • Without synchronization, one goroutine's writes may never become visible to another goroutine, or may become visible in a different order.
  • Channels are the primary synchronization mechanism: send happens-before receive.
  • Data races are undefined behavior — not merely "stale reads." Use go test -race.
  • sync.Once is the go-to correct lazy initialization. sync.Mutex provides mutual exclusion. sync/atomic provides lock-free ordering.
  • The memory model is architecture-independent — code must be correct under the Go model, not under x86's stronger guarantees.