Race Condition Detection and Prevention (Advanced)¶
Introduction¶
A data race occurs when two or more goroutines access the same memory location concurrently, and at least one of them writes. Data races are undefined behavior in Go — the program can crash, produce corrupted data, or appear to work correctly while silently corrupting state.
A race condition is a broader concept: a bug where program correctness depends on the uncontrolled timing of events. All data races are bugs; not all race conditions involve data races (e.g., a TOCTOU check-then-act bug using proper locks still has a logical race).
Go ships with a built-in race detector (-race flag) that instruments memory accesses at compile time and detects data races at runtime. It's one of Go's most powerful tools — use it in development and CI without exception.
Syntax & Usage¶
The Race Detector¶
# Run tests with race detection
go test -race ./...
# Build with race detection
go build -race -o myapp .
# Run with race detection
go run -race main.go
The race detector uses ThreadSanitizer (TSan) under the hood. It instruments every memory read and write, tracking which goroutine accessed what and when. When it detects two unsynchronized accesses to the same address (with at least one write), it prints a detailed report; by default the program keeps running and exits with a non-zero status, or you can set GORACE="halt_on_error=1" to stop at the first race.
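As an illustration, a program of roughly this shape triggers the detector. This is a minimal sketch; the function names mirror the sample report below, but the exact line numbers in a real report depend on your source file.
package main

import (
	"fmt"
	"time"
)

var count int

func increment() { count++ }         // unsynchronized write
func getCount() int { return count } // unsynchronized read

func main() {
	go func() { fmt.Println(getCount()) }()
	go increment()
	time.Sleep(100 * time.Millisecond)
}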
Race Detector Output¶
WARNING: DATA RACE
Write at 0x00c0000b4010 by goroutine 7:
main.increment()
/app/main.go:15 +0x4a
Previous read at 0x00c0000b4010 by goroutine 6:
main.getCount()
/app/main.go:20 +0x3e
Goroutine 7 (running) created at:
main.main()
/app/main.go:28 +0x96
Goroutine 6 (running) created at:
main.main()
/app/main.go:27 +0x7a
The report shows: the racing accesses (read/write), the exact source lines, and where the goroutines were created.
Common Data Race Patterns and Fixes¶
Pattern 1: Shared Counter¶
// RACE: unsynchronized counter
var count int
func increment() {
for range 1000 {
count++ // read-modify-write — not atomic
}
}
func main() {
go increment()
go increment()
time.Sleep(time.Second)
fmt.Println(count) // may be less than 2000
}
Fix with sync.Mutex:
var (
mu sync.Mutex
count int
)
func increment() {
for range 1000 {
mu.Lock()
count++
mu.Unlock()
}
}
Fix with sync/atomic:
var count atomic.Int64
func increment() {
for range 1000 {
count.Add(1)
}
}
func main() {
go increment()
go increment()
time.Sleep(time.Second)
fmt.Println(count.Load()) // always 2000
}
Pattern 2: Concurrent Map Access¶
// RACE: maps are not safe for concurrent use
m := make(map[string]int)
go func() { m["a"] = 1 }()
go func() { m["b"] = 2 }()
// fatal error: concurrent map writes
Fix with sync.RWMutex:
type SafeMap struct {
mu sync.RWMutex
m map[string]int
}
func (s *SafeMap) Set(key string, val int) {
s.mu.Lock()
defer s.mu.Unlock()
s.m[key] = val
}
func (s *SafeMap) Get(key string) (int, bool) {
s.mu.RLock()
defer s.mu.RUnlock()
v, ok := s.m[key]
return v, ok
}
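The zero value of SafeMap holds a nil inner map, so Set would panic; a small constructor (the name NewSafeMap is just illustrative) initializes it:
func NewSafeMap() *SafeMap {
	return &SafeMap{m: make(map[string]int)}
}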
Fix with sync.Map (for write-once/read-many patterns):
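A minimal sketch using the standard library's sync.Map; note that Load returns values as any, so a type assertion is needed:
var m sync.Map

m.Store("a", 1) // safe for concurrent use without extra locking
if v, ok := m.Load("a"); ok {
	fmt.Println(v.(int))
}

// LoadOrStore suits write-once/read-many initialization
actual, loaded := m.LoadOrStore("b", 2)
fmt.Println(actual, loaded) // 2 false on the first call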
Pattern 3: Slice Append¶
// RACE: append may reallocate the underlying array
var results []int
var wg sync.WaitGroup
for i := range 10 {
wg.Add(1)
go func(i int) {
defer wg.Done()
results = append(results, i) // RACE: concurrent slice append
}(i)
}
wg.Wait()
Fix with index assignment (no append):
results := make([]int, 10)
var wg sync.WaitGroup
for i := range 10 {
wg.Add(1)
go func(i int) {
defer wg.Done()
results[i] = i * 2 // safe: each goroutine writes to a unique index
}(i)
}
wg.Wait()
Fix with mutex (when append is needed):
var (
mu sync.Mutex
results []int
)
var wg sync.WaitGroup
for i := range 10 {
wg.Add(1)
go func(i int) {
defer wg.Done()
mu.Lock()
results = append(results, i*2)
mu.Unlock()
}(i)
}
wg.Wait()
Fix with channel:
ch := make(chan int, 10)
for i := range 10 {
go func(i int) {
ch <- i * 2
}(i)
}
results := make([]int, 0, 10)
for range 10 {
results = append(results, <-ch)
}
Pattern 4: Struct Field Races¶
// RACE: concurrent access to struct fields
type Config struct {
Debug bool
Timeout time.Duration
}
var cfg Config
go func() { cfg.Debug = true }()
go func() { fmt.Println(cfg.Debug) }() // RACE
Fix with atomic.Value:
var cfg atomic.Value
cfg.Store(Config{Debug: false, Timeout: 5 * time.Second})
go func() {
cfg.Store(Config{Debug: true, Timeout: 5 * time.Second})
}()
go func() {
c := cfg.Load().(Config)
fmt.Println(c.Debug)
}()
Pattern 5: Loop Variable Capture (Pre-Go 1.22)¶
// RACE in Go < 1.22: all goroutines share the same loop variable
for _, url := range urls {
go func() {
fetch(url) // captures variable, not value — all see the last url
}()
}
Fix (pre-Go 1.22):
for _, url := range urls {
url := url // shadow with a per-iteration copy
go func() {
fetch(url)
}()
}
Go 1.22+ changed loop variable semantics — each iteration creates a new variable. But the old pattern is worth knowing for interview context and legacy codebases.
The sync/atomic Package¶
Atomic operations provide lock-free synchronization for simple values:
// Go 1.19+ typed atomics (preferred)
var counter atomic.Int64
var flag atomic.Bool
var config atomic.Value
var ptr atomic.Pointer[Config]
counter.Add(1)
counter.Store(0)
val := counter.Load()
flag.Store(true)
if flag.Load() { /* ... */ }
config.Store(Config{Debug: true})
cfg := config.Load().(Config)
Atomic operations reference:¶
| Type | Operations |
|---|---|
| atomic.Int32 / atomic.Int64 | Load, Store, Add, Swap, CompareAndSwap |
| atomic.Uint32 / atomic.Uint64 | Load, Store, Add, Swap, CompareAndSwap |
| atomic.Bool | Load, Store, Swap, CompareAndSwap |
| atomic.Value | Load, Store, Swap, CompareAndSwap (any type, but must be consistent) |
| atomic.Pointer[T] | Load, Store, Swap, CompareAndSwap |
CompareAndSwap (CAS)¶
The foundation of lock-free algorithms:
var state atomic.Int32
func tryTransition(from, to int32) bool {
return state.CompareAndSwap(from, to)
// Atomically: if state == from, set state = to and return true
// else return false
}
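As a usage sketch, here is the classic load-then-CAS retry loop, applied to a lock-free "record the maximum value seen" update (the recordMax name is illustrative):
var maxSeen atomic.Int64

func recordMax(v int64) {
	for {
		cur := maxSeen.Load()
		if v <= cur {
			return // current maximum already covers v
		}
		if maxSeen.CompareAndSwap(cur, v) {
			return // we installed the new maximum
		}
		// another goroutine updated maxSeen between Load and CAS; retry
	}
}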
Prevention Strategies Summary¶
| Strategy | When to Use | Overhead |
|---|---|---|
| Mutex (sync.Mutex) | Protecting shared state, critical sections | Low (lock/unlock) |
| RWMutex (sync.RWMutex) | Read-heavy shared state (10:1+ read:write ratio) | Low–medium |
| Channels | Transferring ownership of data between goroutines | Medium |
| Atomics (sync/atomic) | Counters, flags, single-value updates | Lowest |
| Confinement | Each goroutine owns its data exclusively | Zero |
| Immutability | Data never changes after creation | Zero |
| Copy-on-write | Config reload, shared state with rare updates (see the sketch after this table) | Medium |
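For the copy-on-write row, a sketch of the idea using atomic.Pointer[T] (Go 1.19+): readers load a pointer to an immutable snapshot, and the writer copies the snapshot, mutates the copy, then swaps the pointer. The Config type and function names here are illustrative.
type Config struct {
	Debug   bool
	Timeout time.Duration
}

var current atomic.Pointer[Config] // must be given an initial Store at startup

// getConfig is a cheap, lock-free read of the current snapshot.
func getConfig() *Config { return current.Load() }

// setDebug copies the old snapshot, changes the copy, and swaps the pointer.
// If multiple writers can race each other, serialize them with a mutex.
func setDebug(on bool) {
	next := *current.Load() // shallow copy; fine because Config holds no shared references
	next.Debug = on
	current.Store(&next)
}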
Confinement — The Best Strategy¶
Confine data to a single goroutine so no synchronization is needed:
func processItems(items []Item) []Result {
results := make([]Result, len(items))
var wg sync.WaitGroup
for i, item := range items {
wg.Add(1)
go func(i int, item Item) {
defer wg.Done()
// item is a copy — confined to this goroutine
// results[i] is a unique index — no overlap
results[i] = process(item)
}(i, item)
}
wg.Wait()
return results
}
Quick Reference¶
| Concept | Command / Syntax |
|---|---|
| Enable race detector | go test -race ./... or go build -race |
| Atomic counter | var c atomic.Int64; c.Add(1) |
| Atomic flag | var f atomic.Bool; f.Store(true) |
| Atomic config swap | var v atomic.Value; v.Store(cfg) |
| CAS operation | c.CompareAndSwap(old, new) |
| Safe map (mutex) | sync.RWMutex + map[K]V |
| Safe map (built-in) | sync.Map |
| Race report format | Goroutine ID, access type, source location, creation stack |
Best Practices¶
- Run -race in CI on every commit — data races are undefined behavior. A single race can corrupt memory silently. Make race detection a blocking CI step.
- Default to confinement — design goroutines to own their data. Pass copies, use channels to transfer ownership, and assign unique slice indices.
- Use typed atomics (Go 1.19+) — atomic.Int64, atomic.Bool, and atomic.Pointer[T] are safer and more readable than the old atomic.AddInt64(&val, 1) style.
- Prefer sync.Mutex over sync.RWMutex unless profiling shows contention — RWMutex has higher per-operation overhead. It only wins with high read:write ratios.
- Never share mutable state between goroutines without synchronization — even "safe-looking" operations like count++ are data races (read-modify-write is three operations).
- Use atomic.Value for read-heavy config — store an immutable config struct; readers pay near-zero cost. Writers replace the whole struct atomically.
Common Pitfalls¶
Assuming count++ is atomic
var count int
go func() { count++ }()
go func() { count++ }()
// RACE: count++ is read + increment + write — three operations
Fix: use an atomic.Int64 and its Add(1) method, or protect the counter with a mutex. No compound operation on a plain variable is atomic in Go.
Concurrent map access without synchronization
m := make(map[string]int)
go func() { m["key"] = 1 }()
go func() { _ = m["key"] }()
// fatal error: concurrent map read and map write
This is not just something the -race flag catches: the runtime aborts with an unrecoverable fatal error even in normal builds. Protect maps with sync.RWMutex or use sync.Map.
Race detector is not exhaustive
The race detector only finds races that actually execute during the test run. If a racy code path isn't triggered, it won't be detected. Maximize test coverage and run with realistic concurrent workloads to increase detection probability.
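One way to improve the odds is a test that deliberately hammers the shared code path from many goroutines and is run repeatedly under -race. A sketch, where the Counter type with Inc and Value methods is a hypothetical stand-in for whatever you are testing:
func TestCounterConcurrent(t *testing.T) {
	var c Counter // hypothetical type under test
	var wg sync.WaitGroup
	for range 100 {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for range 1000 {
				c.Inc()
			}
		}()
	}
	wg.Wait()
	if got := c.Value(); got != 100_000 {
		t.Fatalf("got %d, want 100000", got)
	}
}
Running it with go test -race -count=10 ./... gives the scheduler many chances to interleave the accesses.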
Race detector performance overhead
# Race detection adds ~5-10x CPU overhead and 5-10x memory overhead
go test -race ./... # significantly slower than regular tests
Run -race in CI and during development, but not in production. The overhead is too high for production services. Some teams run a canary instance with -race against production traffic.
Benign races don't exist
The Go memory model makes no guarantees about unsynchronized access. Even "read-only" access to a concurrently modified variable is undefined behavior. There are no benign races in Go.
Performance Considerations¶
- Atomics vs Mutexes: Atomic operations are ~2–5x faster than mutex lock/unlock for simple counters and flags. Use atomics when the operation is a single read or write.
- Mutex contention: High contention on a single mutex serializes goroutines and kills parallelism. If profiling shows mutex contention (via the mutex pprof profile), consider sharding the data or using lock-free structures.
- RWMutex overhead: RWMutex has higher per-operation cost than Mutex due to internal atomic operations. It only wins when reads vastly outnumber writes (typically 10:1+) and the critical section is non-trivial.
- Race detector cost: ~5–10x CPU, ~5–10x memory. Never deploy with -race in production. Run it in CI and on staging/canary environments.
- False sharing: When atomic variables sit on the same cache line, updating one invalidates the cache for the other (ping-pong effect). For high-throughput counters, pad atomics to cache-line boundaries, as shown in the sketch below.
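A sketch, assuming the common 64-byte cache line; the paddedCounter type is illustrative, not a standard library type:
// paddedCounter occupies a full 64-byte cache line so that adjacent
// counters updated by different goroutines don't share a line.
type paddedCounter struct {
	n atomic.Int64
	_ [56]byte // 8-byte counter + 56 bytes of padding = 64 bytes
}

var perWorker [8]paddedCounter // one counter per worker goroutine

func recordHit(worker int) {
	perWorker[worker].n.Add(1)
}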
Interview Tips¶
Interview Tip
"What's the difference between a data race and a race condition?" A data race is concurrent unsynchronized access to the same memory where at least one is a write — it's undefined behavior and Go's -race detector catches it. A race condition is a broader logic bug where correctness depends on timing — like check-then-act without holding a lock. You can have race conditions even with proper synchronization (no data race), and the race detector won't catch those.
Interview Tip
"How does Go's race detector work?" It uses ThreadSanitizer (TSan), compiling the program with instrumentation on every memory access. At runtime, it tracks which goroutine accessed which memory address and checks for unsynchronized concurrent accesses. It's dynamic — it only detects races that actually execute. It adds ~5–10x CPU and memory overhead, so it's used in CI and development, not production.
Interview Tip
"How would you fix a data race on a shared counter?" Three options, depending on context: (1) sync/atomic — simplest and fastest for counters (atomic.Int64.Add(1)). (2) sync.Mutex — when the counter is part of a larger critical section. (3) Channel — when the counter is owned by a single goroutine that receives increment messages. Always pick the simplest approach that fits.
Interview Tip
"What's your strategy for preventing races in a large codebase?" (1) Design for confinement — each goroutine owns its data. (2) Use channels to transfer ownership between goroutines. (3) When shared state is unavoidable, use mutexes or atomics. (4) Run -race in CI on every commit as a blocking step. (5) Use go vet which catches some concurrency bugs statically. (6) Write concurrent tests with -count=100 to increase timing variation and detection probability.
Key Takeaways¶
- Data races are undefined behavior — not just bugs, but UB that can corrupt memory, crash, or produce silently wrong results.
- Always run go test -race ./... in CI — it catches data races dynamically with ~5–10x overhead.
- The race detector only finds races that execute — maximize test coverage and use realistic concurrent workloads.
- Prefer confinement: goroutines that own their data don't need synchronization.
- Use sync/atomic for counters and flags, sync.Mutex for critical sections, and channels for ownership transfer.
- There are no benign races in Go — the memory model provides zero guarantees for unsynchronized access.
- Go maps are not safe for concurrent use — the runtime will crash on concurrent read/write rather than silently corrupt.