Observability and Resilience SDK

Repository: go-observability-sdk/ (sibling directory)

What This Project Demonstrates

Production systems need more than business logic. This SDK provides the building blocks that make services observable, resilient, and production-ready -- exactly the kind of platform engineering work a Tech Lead drives.

| Skill Area | How It's Demonstrated |
| --- | --- |
| Library / API design | Clean public APIs with functional options pattern |
| Interface design | Small, composable interfaces throughout |
| Resilience patterns | Circuit breaker, retry with backoff, rate limiting |
| Observability | Structured logging, Prometheus metrics, health checks |
| Testing discipline | Comprehensive tests for every component |
| Go idioms | Functional options, interface adapters, generics |

Components

graph TD
    App["Your Application"] --> CB["Circuit Breaker\ncircuitbreaker/"]
    App --> RL["Rate Limiter\nratelimiter/"]
    App --> RT["Retry\nretry/"]
    App --> HC["Health Checks\nhealthcheck/"]
    App --> LG["Logger\nlogger/"]
    App --> MT["Metrics\nmetrics/"]
    CB -.->|"state changes"| LG
    RL -.->|"rate exceeded"| MT
    HC -.->|"HTTP handler"| HealthEndpoint["/health"]
    MT -.->|"HTTP handler"| MetricsEndpoint["/metrics"]

Component Overview

| Component | What It Does | Key Pattern |
| --- | --- | --- |
| Circuit Breaker | Prevents cascading failures by failing fast when a dependency is down | State machine (Closed → Open → Half-Open) |
| Rate Limiter | Controls request throughput using the token bucket algorithm | Allow() non-blocking + Wait(ctx) blocking |
| Retry | Retries failed operations with exponential backoff and jitter | Configurable via functional options |
| Health Check | Aggregates dependency health into a single endpoint | Registry pattern with concurrent checks |
| Logger | Structured logging with context propagation | log/slog wrapper with middleware |
| Metrics | Prometheus metrics helpers for HTTP services | Counter + histogram + gauge middleware |

Project Structure

go-observability-sdk/
├── circuitbreaker/
│   ├── circuitbreaker.go         # Three-state circuit breaker
│   └── circuitbreaker_test.go    # State transition tests
├── ratelimiter/
│   ├── ratelimiter.go            # Token bucket rate limiter
│   └── ratelimiter_test.go       # Rate enforcement tests
├── retry/
│   ├── retry.go                  # Retry with exponential backoff
│   └── retry_test.go             # Backoff timing tests
├── healthcheck/
│   ├── healthcheck.go            # Health check registry + HTTP handler
│   └── healthcheck_test.go       # Status aggregation tests
├── logger/
│   ├── logger.go                 # slog wrapper + HTTP middleware
│   └── logger_test.go            # Format and context tests
├── metrics/
│   ├── metrics.go                # Prometheus HTTP metrics helpers
│   └── metrics_test.go           # Middleware tests
├── examples/
│   └── httpserver/
│       └── main.go               # Complete example using all components
├── go.mod
├── Makefile
└── README.md

Key Design Decisions

Why Functional Options Pattern?

Every component uses functional options for configuration:

cb := circuitbreaker.New(
    circuitbreaker.WithFailureThreshold(5),
    circuitbreaker.WithTimeout(30 * time.Second),
    circuitbreaker.WithOnStateChange(func(from, to circuitbreaker.State) {
        log.Info("circuit breaker state change", "from", from, "to", to)
    }),
)

This pattern provides:

- Sensible defaults (zero-config works)
- Backward-compatible API evolution (add options without breaking existing callers)
- Self-documenting configuration
- Compile-time type safety

Why Separate Packages?

Each component is an independent package:

- Users import only what they need (go-observability-sdk/retry)
- No coupling between components
- Each package has its own tests and documentation
- Follows the Go proverb: "a little copying is better than a little dependency"

Why Token Bucket for Rate Limiting?

Token bucket is the standard algorithm for rate limiting in distributed systems:

- Allows bursts up to a configurable limit
- Smooth refill rate prevents thundering herd
- Simple to reason about and configure
- Used by AWS, Google Cloud, and most API gateways

Why Injectable Clock in Circuit Breaker?

The circuit breaker accepts an injectable time source via a nowFunc option:

cb := circuitbreaker.New(
    circuitbreaker.WithNowFunc(fakeClock.Now),
)

This makes tests deterministic -- no time.Sleep needed, no flaky timing tests.


Component Deep Dives

Circuit Breaker State Machine

stateDiagram-v2
    Closed --> Open: Failures >= Threshold
    Open --> HalfOpen: After Timeout
    HalfOpen --> Closed: Success >= SuccessThreshold
    HalfOpen --> Open: Any Failure
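
The transitions above can be sketched as a pure function (the counter and threshold names are illustrative, not the package's exact internals):

```go
package main

import "fmt"

type State int

const (
	Closed State = iota
	Open
	HalfOpen
)

// next encodes the diagram's transitions. failures and successes are the
// consecutive counts the breaker tracks; timedOut reports whether the
// open-state timeout has elapsed.
func next(s State, failures, successes, failureThreshold, successThreshold int, timedOut bool) State {
	switch s {
	case Closed:
		if failures >= failureThreshold {
			return Open
		}
	case Open:
		if timedOut {
			return HalfOpen
		}
	case HalfOpen:
		if failures > 0 {
			return Open // any failure while probing reopens immediately
		}
		if successes >= successThreshold {
			return Closed
		}
	}
	return s
}

func main() {
	fmt.Println(next(Closed, 5, 0, 5, 2, false)) // threshold reached: trips to Open
}
```

Keeping the transition logic pure like this makes the state machine easy to table-test without any real timers.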

Retry Backoff Strategy

Attempt 1: immediate
Attempt 2: 100ms + jitter (0-50ms)
Attempt 3: 200ms + jitter (0-100ms)
Attempt 4: 400ms + jitter (0-200ms)
Attempt 5: 800ms + jitter (0-400ms)
... capped at MaxDelay

Jitter prevents thundering herd when multiple clients retry simultaneously.

Health Check Aggregation

| Individual Results | Overall Status | HTTP Code |
| --- | --- | --- |
| All pass | healthy | 200 |
| Non-critical fail | degraded | 200 |
| Any critical fail | unhealthy | 503 |

Go Concepts Showcased

| Concept | Where It's Used |
| --- | --- |
| Functional options | Every component's configuration |
| Interfaces | Check interface, http.Handler adapter pattern |
| sync.Mutex | Thread-safe state in circuit breaker and rate limiter |
| context.Context | Cancellation in retry and rate limiter Wait |
| log/slog | Structured logging wrapper with context propagation |
| HTTP middleware | Logger, metrics, and rate limiter as func(http.Handler) http.Handler |
| time.Ticker / time.Timer | Rate limiting token refill, circuit breaker timeout |
| Custom error types | ErrCircuitOpen, RetryableError |
| Test helpers | Injectable clocks, assertion helpers |
| http.HandlerFunc adapter | Health check CheckFunc mirrors this pattern |

How to Talk About This in an Interview

  1. Start with why: "Production services need resilience primitives -- circuit breakers, retries, rate limiting. Instead of ad-hoc implementations scattered across services, I built a shared SDK with consistent API design."

  2. API design philosophy: "Every component uses functional options for zero-config defaults with extensibility. The API is backward-compatible -- adding new options never breaks existing callers."

  3. Testing approach: "Injectable clocks make tests deterministic. No time.Sleep calls, no flaky CI. Each component has comprehensive tests including concurrent access."

  4. How components compose: "The example server shows all components working together -- rate limiter as middleware, circuit breaker wrapping upstream calls, health checks aggregating dependency status, Prometheus metrics recording everything."

  5. Library vs framework: "This is a library, not a framework. Each package is independently importable. Services adopt components incrementally."


Running the Project

cd go-observability-sdk

# Install dependencies
go mod tidy

# Run all tests
make test

# Run with race detector
go test -race ./...

# Run the example server
make example
# Then visit:
#   http://localhost:8080/         (Hello endpoint with circuit breaker)
#   http://localhost:8080/health   (Health check status)
#   http://localhost:8080/metrics  (Prometheus metrics)

Example: All Components Together

The examples/httpserver/main.go example demonstrates a production-like HTTP server:

func main() {
    // Logger
    log := logger.New(logger.WithFormat("json"), logger.WithLevel(slog.LevelInfo))

    // Circuit breaker for upstream service
    cb := circuitbreaker.New(
        circuitbreaker.WithFailureThreshold(3),
        circuitbreaker.WithTimeout(10 * time.Second),
    )

    // Rate limiter: 100 requests/second, burst of 20
    rl := ratelimiter.New(100, 20)

    // Health checks
    health := healthcheck.NewRegistry()
    health.Register("upstream", true, healthcheck.CheckFunc(func(ctx context.Context) error {
        if cb.State() == circuitbreaker.StateOpen {
            return errors.New("circuit breaker open")
        }
        return nil
    }))

    // Prometheus metrics
    httpMetrics := metrics.NewHTTPMetrics("myservice")

    // Wire up the handler chain
    mux := http.NewServeMux()
    mux.HandleFunc("/", handleRequest(cb))
    mux.Handle("/health", health.Handler())
    mux.Handle("/metrics", promhttp.Handler())

    // Middleware stack: logging → metrics → rate limiting → handler
    handler := log.HTTPMiddleware(
        httpMetrics.Middleware(
            rateLimitMiddleware(rl, mux)))

    // Start with graceful shutdown...
}