Race Condition Detection and Prevention (Advanced)¶
Introduction¶
A data race occurs when two or more goroutines access the same memory location concurrently, and at least one of them writes. Data races are undefined behavior in Go — the program can crash, produce corrupted data, or appear to work correctly while silently corrupting state.
A race condition is a broader concept: a bug where program correctness depends on the uncontrolled timing of events. All data races are bugs; not all race conditions involve data races (e.g., a TOCTOU check-then-act bug using proper locks still has a logical race).
Go ships with a built-in race detector (-race flag) that instruments memory accesses at compile time and detects data races at runtime. It's one of Go's most powerful tools — use it in development and CI without exception.
Syntax & Usage¶
The Race Detector¶
# Run tests with race detection
go test -race ./...
# Build with race detection
go build -race -o myapp .
# Run with race detection
go run -race main.go
The race detector uses ThreadSanitizer (TSan) under the hood. It instruments every memory read and write, tracking which goroutine accessed what and when. When it detects two unsynchronized accesses to the same address (with at least one write), it prints a detailed report; by default the program keeps running and exits with a non-zero status, or you can set GORACE="halt_on_error=1" to stop at the first race.
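As an illustration, a program of roughly this shape triggers the detector. This is a minimal sketch; the function names mirror the sample report below, but the exact line numbers in a real report depend on your source file.
package main

import (
	"fmt"
	"time"
)

var count int

func increment() { count++ }         // unsynchronized write
func getCount() int { return count } // unsynchronized read

func main() {
	go func() { fmt.Println(getCount()) }()
	go increment()
	time.Sleep(100 * time.Millisecond)
}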
Race Detector Output¶
WARNING: DATA RACE
Write at 0x00c0000b4010 by goroutine 7:
main.increment()
/app/main.go:15 +0x4a
Previous read at 0x00c0000b4010 by goroutine 6:
main.getCount()
/app/main.go:20 +0x3e
Goroutine 7 (running) created at:
main.main()
/app/main.go:28 +0x96
Goroutine 6 (running) created at:
main.main()
/app/main.go:27 +0x7a
The report shows: the racing accesses (read/write), the exact source lines, and where the goroutines were created.
Common Data Race Patterns and Fixes¶
Pattern 1: Shared Counter¶
// RACE: unsynchronized counter
var count int
func increment() {
for range 1000 {
count++ // read-modify-write — not atomic
}
}
func main() {
go increment()
go increment()
time.Sleep(time.Second)
fmt.Println(count) // may be less than 2000
}
Fix with sync.Mutex:
var (
mu sync.Mutex
count int
)
func increment() {
for range 1000 {
mu.Lock()
count++
mu.Unlock()
}
}
Fix with sync/atomic:
var count atomic.Int64
func increment() {
for range 1000 {
count.Add(1)
}
}
func main() {
go increment()
go increment()
time.Sleep(time.Second)
fmt.Println(count.Load()) // always 2000
}
Pattern 2: Concurrent Map Access¶
// RACE: maps are not safe for concurrent use
m := make(map[string]int)
go func() { m["a"] = 1 }()
go func() { m["b"] = 2 }()
// fatal error: concurrent map writes
Fix with sync.RWMutex:
type SafeMap struct {
mu sync.RWMutex
m map[string]int
}
func (s *SafeMap) Set(key string, val int) {
s.mu.Lock()
defer s.mu.Unlock()
s.m[key] = val
}
func (s *SafeMap) Get(key string) (int, bool) {
s.mu.RLock()
defer s.mu.RUnlock()
v, ok := s.m[key]
return v, ok
}
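The zero value of SafeMap holds a nil inner map, so Set would panic; a small constructor (the name NewSafeMap is just illustrative) initializes it:
func NewSafeMap() *SafeMap {
	return &SafeMap{m: make(map[string]int)}
}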
Fix with sync.Map (for write-once/read-many patterns):
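A minimal sketch using the standard library's sync.Map; note that Load returns values as any, so a type assertion is needed:
var m sync.Map

m.Store("a", 1) // safe for concurrent use without extra locking
if v, ok := m.Load("a"); ok {
	fmt.Println(v.(int))
}

// LoadOrStore suits write-once/read-many initialization
actual, loaded := m.LoadOrStore("b", 2)
fmt.Println(actual, loaded) // 2 false on the first call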
Pattern 3: Slice Append¶
// RACE: append may reallocate the underlying array
var results []int
var wg sync.WaitGroup
for i := range 10 {
wg.Add(1)
go func(i int) {
defer wg.Done()
results = append(results, i) // RACE: concurrent slice append
}(i)
}
wg.Wait()
Fix with index assignment (no append):
results := make([]int, 10)
var wg sync.WaitGroup
for i := range 10 {
wg.Add(1)
go func(i int) {
defer wg.Done()
results[i] = i * 2 // safe: each goroutine writes to a unique index
}(i)
}
wg.Wait()
Fix with mutex (when append is needed):
var (
mu sync.Mutex
results []int
)
var wg sync.WaitGroup
for i := range 10 {
wg.Add(1)
go func(i int) {
defer wg.Done()
mu.Lock()
results = append(results, i*2)
mu.Unlock()
}(i)
}
wg.Wait()
Fix with channel:
ch := make(chan int, 10)
for i := range 10 {
go func(i int) {
ch <- i * 2
}(i)
}
results := make([]int, 0, 10)
for range 10 {
results = append(results, <-ch)
}
Pattern 4: Struct Field Races¶
// RACE: concurrent access to struct fields
type Config struct {
Debug bool
Timeout time.Duration
}
var cfg Config
go func() { cfg.Debug = true }()
go func() { fmt.Println(cfg.Debug) }() // RACE
Fix with atomic.Value:
var cfg atomic.Value
cfg.Store(Config{Debug: false, Timeout: 5 * time.Second})
go func() {
cfg.Store(Config{Debug: true, Timeout: 5 * time.Second})
}()
go func() {
c := cfg.Load().(Config)
fmt.Println(c.Debug)
}()
Pattern 5: Loop Variable Capture (Pre-Go 1.22)¶
// RACE in Go < 1.22: all goroutines share the same loop variable
for _, url := range urls {
go func() {
fetch(url) // captures variable, not value — all see the last url
}()
}
Fix (pre-Go 1.22):
for _, url := range urls {
url := url // shadow with a per-iteration copy
go func() {
fetch(url)
}()
}
Go 1.22+ changed loop variable semantics — each iteration creates a new variable. But the old pattern is worth knowing for interview context and legacy codebases.
The sync/atomic Package¶
Atomic operations provide lock-free synchronization for simple values:
// Go 1.19+ typed atomics (preferred)
var counter atomic.Int64
var flag atomic.Bool
var config atomic.Value
var ptr atomic.Pointer[Config]
counter.Add(1)
counter.Store(0)
val := counter.Load()
flag.Store(true)
if flag.Load() { /* ... */ }
config.Store(Config{Debug: true})
cfg := config.Load().(Config)
Atomic operations reference:¶
| Type | Operations |
|---|---|
| atomic.Int32 / atomic.Int64 | Load, Store, Add, Swap, CompareAndSwap |
| atomic.Uint32 / atomic.Uint64 | Load, Store, Add, Swap, CompareAndSwap |
| atomic.Bool | Load, Store, Swap, CompareAndSwap |
| atomic.Value | Load, Store, Swap, CompareAndSwap (any type, but must be consistent) |
| atomic.Pointer[T] | Load, Store, Swap, CompareAndSwap |
CompareAndSwap (CAS)¶
The foundation of lock-free algorithms:
var state atomic.Int32
func tryTransition(from, to int32) bool {
return state.CompareAndSwap(from, to)
// Atomically: if state == from, set state = to and return true
// else return false
}
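As a usage sketch, here is the classic load-then-CAS retry loop, applied to a lock-free "record the maximum value seen" update (the recordMax name is illustrative):
var maxSeen atomic.Int64

func recordMax(v int64) {
	for {
		cur := maxSeen.Load()
		if v <= cur {
			return // current maximum already covers v
		}
		if maxSeen.CompareAndSwap(cur, v) {
			return // we installed the new maximum
		}
		// another goroutine updated maxSeen between Load and CAS; retry
	}
}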
Prevention Strategies Summary¶
| Strategy | When to Use | Overhead |
|---|---|---|
| Mutex (sync.Mutex) | Protecting shared state, critical sections | Low (lock/unlock) |
| RWMutex (sync.RWMutex) | Read-heavy shared state (10:1+ read:write ratio) | Low–medium |
| Channels | Transferring ownership of data between goroutines | Medium |
| Atomics (sync/atomic) | Counters, flags, single-value updates | Lowest |
| Confinement | Each goroutine owns its data exclusively | Zero |
| Immutability | Data never changes after creation | Zero |
| Copy-on-write | Config reload, shared state with rare updates (see the sketch after this table) | Medium |
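For the copy-on-write row, a sketch of the idea using atomic.Pointer[T] (Go 1.19+): readers load a pointer to an immutable snapshot, and the writer copies the snapshot, mutates the copy, then swaps the pointer. The Config type and function names here are illustrative.
type Config struct {
	Debug   bool
	Timeout time.Duration
}

var current atomic.Pointer[Config] // must be given an initial Store at startup

// getConfig is a cheap, lock-free read of the current snapshot.
func getConfig() *Config { return current.Load() }

// setDebug copies the old snapshot, changes the copy, and swaps the pointer.
// If multiple writers can race each other, serialize them with a mutex.
func setDebug(on bool) {
	next := *current.Load() // shallow copy; fine because Config holds no shared references
	next.Debug = on
	current.Store(&next)
}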
Confinement — The Best Strategy¶
Confine data to a single goroutine so no synchronization is needed:
func processItems(items []Item) []Result {
results := make([]Result, len(items))
var wg sync.WaitGroup
for i, item := range items {
wg.Add(1)
go func(i int, item Item) {
defer wg.Done()
// item is a copy — confined to this goroutine
// results[i] is a unique index — no overlap
results[i] = process(item)
}(i, item)
}
wg.Wait()
return results
}
Quick Reference¶
| Concept | Command / Syntax |
|---|---|
| Enable race detector | go test -race ./... or go build -race |
| Atomic counter | var c atomic.Int64; c.Add(1) |
| Atomic flag | var f atomic.Bool; f.Store(true) |
| Atomic config swap | var v atomic.Value; v.Store(cfg) |
| CAS operation | c.CompareAndSwap(old, new) |
| Safe map (mutex) | sync.RWMutex + map[K]V |
| Safe map (built-in) | sync.Map |
| Race report format | Goroutine ID, access type, source location, creation stack |
Best Practices¶
- Run -race in CI on every commit — data races are undefined behavior. A single race can corrupt memory silently. Make race detection a blocking CI step.
- Default to confinement — design goroutines to own their data. Pass copies, use channels to transfer ownership, and assign unique slice indices.
- Use typed atomics (Go 1.19+) — atomic.Int64, atomic.Bool, and atomic.Pointer[T] are safer and more readable than the old atomic.AddInt64(&val, 1) style.
- Prefer sync.Mutex over sync.RWMutex unless profiling shows contention — RWMutex has higher per-operation overhead. It only wins with high read:write ratios.
- Never share mutable state between goroutines without synchronization — even "safe-looking" operations like count++ are data races (read-modify-write is three operations).
- Use atomic.Value for read-heavy config — store an immutable config struct; readers pay near-zero cost. Writers replace the whole struct atomically.
Common Pitfalls¶
Assuming count++ is atomic
var count int
go func() { count++ }()
go func() { count++ }()
// RACE: count++ is read + increment + write — three operations
Fix: use an atomic.Int64 and its Add(1) method, or protect the counter with a mutex. No compound operation on a plain variable is atomic in Go.
Concurrent map access without synchronization
m := make(map[string]int)
go func() { m["key"] = 1 }()
go func() { _ = m["key"] }()
// fatal error: concurrent map read and map write
This is not just something the -race flag catches: the runtime aborts with an unrecoverable fatal error even in normal builds. Protect maps with sync.RWMutex or use sync.Map.
Race detector is not exhaustive
The race detector only finds races that actually execute during the test run. If a racy code path isn't triggered, it won't be detected. Maximize test coverage and run with realistic concurrent workloads to increase detection probability.
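One way to improve the odds is a test that deliberately hammers the shared code path from many goroutines and is run repeatedly under -race. A sketch, where the Counter type with Inc and Value methods is a hypothetical stand-in for whatever you are testing:
func TestCounterConcurrent(t *testing.T) {
	var c Counter // hypothetical type under test
	var wg sync.WaitGroup
	for range 100 {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for range 1000 {
				c.Inc()
			}
		}()
	}
	wg.Wait()
	if got := c.Value(); got != 100_000 {
		t.Fatalf("got %d, want 100000", got)
	}
}
Running it with go test -race -count=10 ./... gives the scheduler many chances to interleave the accesses.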
Race detector performance overhead
# Race detection adds ~5-10x CPU overhead and 5-10x memory overhead
go test -race ./... # significantly slower than regular tests
Run -race in CI and during development, but not in production. The overhead is too high for production services. Some teams run a canary instance with -race against production traffic.
Benign races don't exist
The Go memory model makes no guarantees about unsynchronized access. Even "read-only" access to a concurrently modified variable is undefined behavior. There are no benign races in Go.
Performance Considerations¶
- Atomics vs Mutexes: Atomic operations are ~2–5x faster than mutex lock/unlock for simple counters and flags. Use atomics when the operation is a single read or write.
- Mutex contention: High contention on a single mutex serializes goroutines and kills parallelism. If profiling shows mutex contention (via the mutex pprof profile), consider sharding the data or using lock-free structures.
- RWMutex overhead: RWMutex has higher per-operation cost than Mutex due to internal atomic operations. It only wins when reads vastly outnumber writes (typically 10:1+) and the critical section is non-trivial.
- Race detector cost: ~5–10x CPU, ~5–10x memory. Never deploy with -race in production. Run it in CI and on staging/canary environments.
- False sharing: When atomic variables sit on the same cache line, updating one invalidates the cache for the other (ping-pong effect). For high-throughput counters, pad atomics to cache-line boundaries, as shown in the sketch below.
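A sketch, assuming the common 64-byte cache line; the paddedCounter type is illustrative, not a standard library type:
// paddedCounter occupies a full 64-byte cache line so that adjacent
// counters updated by different goroutines don't share a line.
type paddedCounter struct {
	n atomic.Int64
	_ [56]byte // 8-byte counter + 56 bytes of padding = 64 bytes
}

var perWorker [8]paddedCounter // one counter per worker goroutine

func recordHit(worker int) {
	perWorker[worker].n.Add(1)
}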
Interview Tips¶
Interview Tip
"What's the difference between a data race and a race condition?" A data race is concurrent unsynchronized access to the same memory where at least one is a write — it's undefined behavior and Go's -race detector catches it. A race condition is a broader logic bug where correctness depends on timing — like check-then-act without holding a lock. You can have race conditions even with proper synchronization (no data race), and the race detector won't catch those.
Interview Tip
"How does Go's race detector work?" It uses ThreadSanitizer (TSan), compiling the program with instrumentation on every memory access. At runtime, it tracks which goroutine accessed which memory address and checks for unsynchronized concurrent accesses. It's dynamic — it only detects races that actually execute. It adds ~5–10x CPU and memory overhead, so it's used in CI and development, not production.
Interview Tip
"How would you fix a data race on a shared counter?" Three options, depending on context: (1) sync/atomic — simplest and fastest for counters (atomic.Int64.Add(1)). (2) sync.Mutex — when the counter is part of a larger critical section. (3) Channel — when the counter is owned by a single goroutine that receives increment messages. Always pick the simplest approach that fits.
Interview Tip
"What's your strategy for preventing races in a large codebase?" (1) Design for confinement — each goroutine owns its data. (2) Use channels to transfer ownership between goroutines. (3) When shared state is unavoidable, use mutexes or atomics. (4) Run -race in CI on every commit as a blocking step. (5) Use go vet which catches some concurrency bugs statically. (6) Write concurrent tests with -count=100 to increase timing variation and detection probability.
Key Takeaways¶
- Data races are undefined behavior — not just bugs, but UB that can corrupt memory, crash, or produce silently wrong results.
- Always run go test -race ./... in CI — it catches data races dynamically with ~5–10x overhead.
- The race detector only finds races that execute — maximize test coverage and use realistic concurrent workloads.
- Prefer confinement: goroutines that own their data don't need synchronization.
- Use sync/atomic for counters and flags, sync.Mutex for critical sections, and channels for ownership transfer.
- There are no benign races in Go — the memory model provides zero guarantees for unsynchronized access.
- Go maps are not safe for concurrent use — the runtime will crash on concurrent read/write rather than silently corrupt.