Production Go Advanced¶
Introduction¶
Writing Go that compiles is easy. Writing Go that runs reliably in production — with structured logging, graceful shutdown, health checks, proper configuration, and observability — is what makes a senior engineer. This topic ties together everything from the other sections into a cohesive production service.
Why This Matters
This is the "glue everything together" topic. Interviewers want to know: can you build and operate a service end-to-end? Do you know how to handle signals, configure a connection pool, emit metrics, ship logs, and deploy to Kubernetes? This is where theory meets practice.
Structured Logging (log/slog)¶
Go 1.21 introduced log/slog — the standard library's structured logging package.
Basic Usage¶
import "log/slog"
func main() {
// JSON handler for production
logger := slog.New(slog.NewJSONHandler(os.Stdout, &slog.HandlerOptions{
Level: slog.LevelInfo,
}))
slog.SetDefault(logger)
slog.Info("server starting",
"addr", ":8080",
"version", version,
)
// Output: {"time":"2025-01-15T10:30:00Z","level":"INFO","msg":"server starting","addr":":8080","version":"v1.2.3"}
slog.Error("request failed",
"method", "GET",
"path", "/api/users",
"status", 500,
"error", err,
"duration", time.Since(start),
)
}
Logger with Context (Request Scoped)¶
func RequestLogger(logger *slog.Logger) func(http.Handler) http.Handler {
return func(next http.Handler) http.Handler {
return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
requestID := r.Header.Get("X-Request-ID")
if requestID == "" {
requestID = uuid.NewString()
}
reqLogger := logger.With(
"request_id", requestID,
"method", r.Method,
"path", r.URL.Path,
)
ctx := ContextWithLogger(r.Context(), reqLogger)
next.ServeHTTP(w, r.WithContext(ctx))
})
}
}
type ctxKey struct{}
func ContextWithLogger(ctx context.Context, logger *slog.Logger) context.Context {
return context.WithValue(ctx, ctxKey{}, logger)
}
func LoggerFromContext(ctx context.Context) *slog.Logger {
if logger, ok := ctx.Value(ctxKey{}).(*slog.Logger); ok {
return logger
}
return slog.Default()
}
// Usage in handlers
func handleGetUser(w http.ResponseWriter, r *http.Request) {
logger := LoggerFromContext(r.Context())
logger.Info("fetching user", "user_id", chi.URLParam(r, "id"))
}
Custom Log Levels and Dynamic Level¶
var logLevel = new(slog.LevelVar) // default: Info
logger := slog.New(slog.NewJSONHandler(os.Stdout, &slog.HandlerOptions{
Level: logLevel,
}))
// Change level at runtime (useful for debugging in production;
// restrict access to this endpoint — auth or internal-only)
mux.HandleFunc("PUT /debug/log-level", func(w http.ResponseWriter, r *http.Request) {
level := r.URL.Query().Get("level")
switch level {
case "debug":
logLevel.Set(slog.LevelDebug)
case "info":
logLevel.Set(slog.LevelInfo)
case "warn":
logLevel.Set(slog.LevelWarn)
case "error":
logLevel.Set(slog.LevelError)
default:
http.Error(w, "invalid level", http.StatusBadRequest)
return
}
slog.Info("log level changed", "new_level", level)
})
Graceful Shutdown¶
HTTP Server¶
func main() {
logger := slog.New(slog.NewJSONHandler(os.Stdout, nil))
srv := &http.Server{
Addr: ":8080",
Handler: setupRoutes(),
ReadTimeout: 5 * time.Second,
WriteTimeout: 10 * time.Second,
IdleTimeout: 60 * time.Second,
}
// Start server in goroutine
go func() {
logger.Info("server starting", "addr", srv.Addr)
if err := srv.ListenAndServe(); err != http.ErrServerClosed {
logger.Error("server error", "error", err)
os.Exit(1)
}
}()
// Wait for shutdown signal
quit := make(chan os.Signal, 1)
signal.Notify(quit, syscall.SIGINT, syscall.SIGTERM)
sig := <-quit
logger.Info("shutdown signal received", "signal", sig)
// Graceful shutdown with timeout
ctx, cancel := context.WithTimeout(context.Background(), 30*time.Second)
defer cancel()
if err := srv.Shutdown(ctx); err != nil {
logger.Error("forced shutdown", "error", err)
}
logger.Info("server stopped")
}
Full Production Shutdown (Server + Workers + Connections)¶
func main() {
ctx, stop := signal.NotifyContext(context.Background(), syscall.SIGINT, syscall.SIGTERM)
defer stop()
// Initialize dependencies
db, err := sql.Open("postgres", os.Getenv("DATABASE_URL"))
if err != nil {
slog.Error("failed to open database", "error", err)
os.Exit(1)
}
defer db.Close()
redisClient := redis.NewClient(&redis.Options{Addr: os.Getenv("REDIS_URL")})
defer redisClient.Close()
// Start background workers
var wg sync.WaitGroup
workerCtx, workerCancel := context.WithCancel(context.Background())
wg.Add(1)
go func() {
defer wg.Done()
runWorker(workerCtx, db)
}()
// Start HTTP server
srv := &http.Server{
Addr: ":8080",
Handler: setupRoutes(db, redisClient),
}
go func() {
if err := srv.ListenAndServe(); err != http.ErrServerClosed {
slog.Error("server error", "error", err)
}
}()
slog.Info("server started", "addr", ":8080")
// Wait for shutdown signal
<-ctx.Done()
slog.Info("shutting down...")
// 1. Stop accepting new HTTP requests
shutdownCtx, cancel := context.WithTimeout(context.Background(), 30*time.Second)
defer cancel()
if err := srv.Shutdown(shutdownCtx); err != nil {
slog.Error("forced shutdown", "error", err)
}
// 2. Stop background workers
workerCancel()
wg.Wait()
// 3. Close connections (deferred above)
slog.Info("shutdown complete")
}
sequenceDiagram
participant OS as OS Signal
participant SRV as HTTP Server
participant WRK as Workers
participant DB as Database
OS->>SRV: SIGTERM
Note over SRV: Stop accepting new requests
SRV->>SRV: Finish in-flight requests (30s timeout)
SRV->>WRK: Cancel worker context
WRK->>WRK: Finish current tasks
WRK->>DB: Close connections
Note over SRV: Shutdown complete
Health Checks for Kubernetes¶
type HealthHandler struct {
ready atomic.Bool
db *sql.DB
redis *redis.Client
}
func NewHealthHandler(db *sql.DB, redis *redis.Client) *HealthHandler {
return &HealthHandler{db: db, redis: redis}
}
func (h *HealthHandler) SetReady(ready bool) {
h.ready.Store(ready)
}
// Liveness: is the process alive? (restart if not)
func (h *HealthHandler) LivenessHandler(w http.ResponseWriter, r *http.Request) {
w.WriteHeader(http.StatusOK)
w.Write([]byte("ok"))
}
// Readiness: can the process serve traffic? (remove from LB if not)
func (h *HealthHandler) ReadinessHandler(w http.ResponseWriter, r *http.Request) {
if !h.ready.Load() {
http.Error(w, "not ready", http.StatusServiceUnavailable)
return
}
ctx, cancel := context.WithTimeout(r.Context(), 2*time.Second)
defer cancel()
checks := map[string]error{
"database": h.db.PingContext(ctx),
"redis": h.redis.Ping(ctx).Err(),
}
result := map[string]string{}
healthy := true
for name, err := range checks {
if err != nil {
result[name] = err.Error()
healthy = false
} else {
result[name] = "ok"
}
}
if !healthy {
w.WriteHeader(http.StatusServiceUnavailable)
}
json.NewEncoder(w).Encode(result)
}
// Startup: has the process finished initializing? (don't restart yet)
func (h *HealthHandler) StartupHandler(w http.ResponseWriter, r *http.Request) {
if !h.ready.Load() {
http.Error(w, "starting", http.StatusServiceUnavailable)
return
}
w.WriteHeader(http.StatusOK)
}
// Usage
health := NewHealthHandler(db, redisClient)
mux.HandleFunc("GET /healthz", health.LivenessHandler)
mux.HandleFunc("GET /readyz", health.ReadinessHandler)
mux.HandleFunc("GET /startupz", health.StartupHandler)
// Mark ready after initialization
health.SetReady(true)
Kubernetes Manifest¶
livenessProbe:
  httpGet:
    path: /healthz
    port: 8080
  initialDelaySeconds: 5
  periodSeconds: 10
readinessProbe:
  httpGet:
    path: /readyz
    port: 8080
  initialDelaySeconds: 5
  periodSeconds: 5
startupProbe:
  httpGet:
    path: /startupz
    port: 8080
  failureThreshold: 30
  periodSeconds: 2
Configuration Management¶
Environment Variables + Flags (12-Factor)¶
type Config struct {
Addr string `json:"addr"`
DatabaseURL string `json:"-"` // never log secrets
RedisURL string `json:"-"`
LogLevel string `json:"log_level"`
ShutdownTimeout time.Duration `json:"shutdown_timeout"`
MaxConnections int `json:"max_connections"`
}
func LoadConfig() (*Config, error) {
cfg := &Config{
Addr: envOrDefault("ADDR", ":8080"),
DatabaseURL: requireEnv("DATABASE_URL"),
RedisURL: envOrDefault("REDIS_URL", "localhost:6379"),
LogLevel: envOrDefault("LOG_LEVEL", "info"),
ShutdownTimeout: envDuration("SHUTDOWN_TIMEOUT", 30*time.Second),
MaxConnections: envInt("MAX_CONNECTIONS", 25),
}
// Flags override env vars (useful for local dev)
flag.StringVar(&cfg.Addr, "addr", cfg.Addr, "listen address")
flag.StringVar(&cfg.LogLevel, "log-level", cfg.LogLevel, "log level")
flag.Parse()
return cfg, cfg.validate()
}
func (c *Config) validate() error {
if c.DatabaseURL == "" {
return fmt.Errorf("DATABASE_URL is required")
}
if c.MaxConnections < 1 {
return fmt.Errorf("MAX_CONNECTIONS must be >= 1")
}
return nil
}
func envOrDefault(key, fallback string) string {
if v := os.Getenv(key); v != "" {
return v
}
return fallback
}
func requireEnv(key string) string {
v := os.Getenv(key)
if v == "" {
slog.Error("required environment variable not set", "key", key)
os.Exit(1)
}
return v
}
func envDuration(key string, fallback time.Duration) time.Duration {
v := os.Getenv(key)
if v == "" {
return fallback
}
d, err := time.ParseDuration(v)
if err != nil {
slog.Error("invalid duration", "key", key, "value", v, "error", err)
os.Exit(1)
}
return d
}
func envInt(key string, fallback int) int {
v := os.Getenv(key)
if v == "" {
return fallback
}
n, err := strconv.Atoi(v)
if err != nil {
slog.Error("invalid int", "key", key, "value", v, "error", err)
os.Exit(1)
}
return n
}
Dockerfile Best Practices¶
Multi-Stage Build¶
# Stage 1: Build
FROM golang:1.23-alpine AS builder
RUN apk add --no-cache ca-certificates git
WORKDIR /app
# Cache dependencies
COPY go.mod go.sum ./
RUN go mod download
# Build
COPY . .
RUN CGO_ENABLED=0 GOOS=linux GOARCH=amd64 \
go build -ldflags="-s -w -X main.version=$(git describe --tags --always)" \
-o /app/server ./cmd/server
# Stage 2: Runtime
FROM scratch
COPY --from=builder /etc/ssl/certs/ca-certificates.crt /etc/ssl/certs/
COPY --from=builder /app/server /server
# Non-root user (numeric for scratch)
USER 65534:65534
EXPOSE 8080
ENTRYPOINT ["/server"]
Alternative with Distroless (CA Certificates and Nonroot User Built In)¶
FROM gcr.io/distroless/static-debian12:nonroot
COPY --from=builder /app/server /server
USER nonroot:nonroot
EXPOSE 8080
ENTRYPOINT ["/server"]
| Base Image | Size | Shell | CA Certs | Debugging |
|---|---|---|---|---|
| `scratch` | ~0MB | No | Must copy | None |
| `distroless/static` | ~2MB | No | Yes | Minimal |
| `distroless/base` | ~20MB | No | Yes | Some |
| `alpine` | ~7MB | Yes | Yes | Full |
Makefile Patterns¶
.PHONY: build test test-integration lint run docker migrate-up migrate-down generate tidy clean

VERSION ?= $(shell git describe --tags --always --dirty)
LDFLAGS := -s -w -X main.version=$(VERSION)

build:
	CGO_ENABLED=0 go build -ldflags="$(LDFLAGS)" -o bin/server ./cmd/server

test:
	go test -race -coverprofile=coverage.out ./...
	go tool cover -func=coverage.out

test-integration:
	go test -race -tags=integration ./...

lint:
	golangci-lint run ./...

run: build
	./bin/server

docker:
	docker build -t myapp:$(VERSION) .

migrate-up:
	migrate -path migrations -database "$(DATABASE_URL)" up

migrate-down:
	migrate -path migrations -database "$(DATABASE_URL)" down 1

generate:
	go generate ./...

tidy:
	go mod tidy
	go mod verify

clean:
	rm -rf bin/ coverage.out
Observability¶
Prometheus Metrics¶
import (
"github.com/prometheus/client_golang/prometheus"
"github.com/prometheus/client_golang/prometheus/promauto"
"github.com/prometheus/client_golang/prometheus/promhttp"
)
var (
httpRequestsTotal = promauto.NewCounterVec(
prometheus.CounterOpts{
Name: "http_requests_total",
Help: "Total HTTP requests by method, path, and status",
},
[]string{"method", "path", "status"},
)
httpRequestDuration = promauto.NewHistogramVec(
prometheus.HistogramOpts{
Name: "http_request_duration_seconds",
Help: "HTTP request duration in seconds",
Buckets: []float64{.005, .01, .025, .05, .1, .25, .5, 1, 2.5, 5},
},
[]string{"method", "path"},
)
activeConnections = promauto.NewGauge(prometheus.GaugeOpts{
Name: "active_connections",
Help: "Number of active connections",
})
)
func MetricsMiddleware(next http.Handler) http.Handler {
return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
start := time.Now()
wrapped := &statusRecorder{ResponseWriter: w, status: http.StatusOK}
activeConnections.Inc()
defer activeConnections.Dec()
next.ServeHTTP(wrapped, r)
duration := time.Since(start).Seconds()
status := strconv.Itoa(wrapped.status)
// NOTE: labeling by raw r.URL.Path explodes cardinality on parameterized
// routes (/users/1, /users/2, ...); prefer the matched route pattern.
httpRequestsTotal.WithLabelValues(r.Method, r.URL.Path, status).Inc()
httpRequestDuration.WithLabelValues(r.Method, r.URL.Path).Observe(duration)
})
}
type statusRecorder struct {
http.ResponseWriter
status int
}
func (r *statusRecorder) WriteHeader(status int) {
r.status = status
r.ResponseWriter.WriteHeader(status)
}
// Expose metrics endpoint
mux.Handle("GET /metrics", promhttp.Handler())
OpenTelemetry Setup¶
import (
"go.opentelemetry.io/otel"
"go.opentelemetry.io/otel/exporters/otlp/otlptrace/otlptracegrpc"
"go.opentelemetry.io/otel/sdk/resource"
sdktrace "go.opentelemetry.io/otel/sdk/trace"
semconv "go.opentelemetry.io/otel/semconv/v1.24.0"
)
func initTracer(ctx context.Context, serviceName, version string) (func(), error) {
exporter, err := otlptracegrpc.New(ctx,
otlptracegrpc.WithEndpoint(os.Getenv("OTEL_EXPORTER_OTLP_ENDPOINT")),
otlptracegrpc.WithInsecure(),
)
if err != nil {
return nil, fmt.Errorf("create exporter: %w", err)
}
res := resource.NewWithAttributes(
semconv.SchemaURL,
semconv.ServiceName(serviceName),
semconv.ServiceVersion(version),
)
tp := sdktrace.NewTracerProvider(
sdktrace.WithBatcher(exporter),
sdktrace.WithResource(res),
sdktrace.WithSampler(sdktrace.ParentBased(
sdktrace.TraceIDRatioBased(0.01), // 1% sampling
)),
)
otel.SetTracerProvider(tp)
shutdown := func() {
ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
defer cancel()
if err := tp.Shutdown(ctx); err != nil {
slog.Error("tracer shutdown failed", "error", err)
}
}
return shutdown, nil
}
Complete Production Service Structure¶
myservice/
├── cmd/
│ └── server/
│ └── main.go # entry point, wiring
├── internal/
│ ├── config/
│ │ └── config.go # configuration loading
│ ├── handler/
│ │ ├── user.go # HTTP handlers
│ │ └── middleware.go # logging, metrics, auth
│ ├── service/
│ │ └── user.go # business logic
│ ├── repository/
│ │ ├── user.go # interface
│ │ └── postgres.go # implementation
│ └── model/
│ └── user.go # domain types
├── migrations/
│ ├── 000001_users.up.sql
│ └── 000001_users.down.sql
├── proto/
│ └── user.proto
├── Dockerfile
├── Makefile
├── go.mod
└── go.sum
main.go (Putting It All Together)¶
package main
import (
"context"
"database/sql"
"log/slog"
"net/http"
"os"
"os/signal"
"syscall"
"time"

"github.com/prometheus/client_golang/prometheus/promhttp"
_ "github.com/lib/pq" // or the pgx stdlib driver

// plus your internal packages: config, handler, repository, service
)
var version = "dev"
func main() {
// 1. Load configuration
cfg, err := config.Load()
if err != nil {
slog.Error("failed to load config", "error", err)
os.Exit(1)
}
// 2. Setup logging
logLevel := new(slog.LevelVar)
logger := slog.New(slog.NewJSONHandler(os.Stdout, &slog.HandlerOptions{
Level: logLevel,
}))
slog.SetDefault(logger)
slog.Info("starting service", "version", version)
// 3. Initialize tracing
shutdownTracer, err := initTracer(context.Background(), "myservice", version)
if err != nil {
slog.Error("failed to init tracer", "error", err)
os.Exit(1)
}
defer shutdownTracer()
// 4. Connect to database
db, err := sql.Open("postgres", cfg.DatabaseURL)
if err != nil {
slog.Error("failed to open database", "error", err)
os.Exit(1)
}
defer db.Close()
db.SetMaxOpenConns(cfg.MaxConnections)
db.SetMaxIdleConns(cfg.MaxConnections / 2)
db.SetConnMaxLifetime(5 * time.Minute)
if err := db.PingContext(context.Background()); err != nil {
slog.Error("database unreachable", "error", err)
os.Exit(1)
}
// 5. Build dependency graph
userRepo := repository.NewPostgresUserRepository(db)
userSvc := service.NewUserService(userRepo)
userHandler := handler.NewUserHandler(userSvc)
// 6. Setup HTTP routes and middleware
mux := http.NewServeMux()
mux.HandleFunc("GET /api/v1/users/{id}", userHandler.GetUser)
mux.HandleFunc("POST /api/v1/users", userHandler.CreateUser)
health := NewHealthHandler(db, redisClient) // redisClient initialized alongside db (omitted here)
mux.HandleFunc("GET /healthz", health.LivenessHandler)
mux.HandleFunc("GET /readyz", health.ReadinessHandler)
mux.Handle("GET /metrics", promhttp.Handler())
wrapped := Chain(mux,
Recovery(logger),
RequestLogger(logger),
MetricsMiddleware,
)
srv := &http.Server{
Addr: cfg.Addr,
Handler: wrapped,
ReadTimeout: 5 * time.Second,
WriteTimeout: 10 * time.Second,
IdleTimeout: 60 * time.Second,
}
// 7. Start server
go func() {
slog.Info("HTTP server starting", "addr", cfg.Addr)
if err := srv.ListenAndServe(); err != http.ErrServerClosed {
slog.Error("server error", "error", err)
os.Exit(1)
}
}()
health.SetReady(true)
// 8. Wait for shutdown signal
ctx, stop := signal.NotifyContext(context.Background(), syscall.SIGINT, syscall.SIGTERM)
defer stop()
<-ctx.Done()
slog.Info("shutdown signal received")
health.SetReady(false) // stop receiving traffic
// Brief delay for load balancer to notice readiness change
time.Sleep(2 * time.Second)
shutdownCtx, cancel := context.WithTimeout(context.Background(), cfg.ShutdownTimeout)
defer cancel()
if err := srv.Shutdown(shutdownCtx); err != nil {
slog.Error("forced shutdown", "error", err)
}
slog.Info("shutdown complete")
}
Quick Reference¶
| Concern | Solution | Key Package |
|---|---|---|
| Structured logging | `log/slog` JSON handler | `log/slog` (stdlib) |
| Graceful shutdown | `signal.NotifyContext` + `srv.Shutdown` | `os/signal` (stdlib) |
| Liveness check | `/healthz` — always 200 | Manual handler |
| Readiness check | `/readyz` — check dependencies | Manual handler |
| Configuration | Env vars + flags + validation | `os`, `flag` (stdlib) |
| Metrics | Prometheus counters, histograms | `prometheus/client_golang` |
| Tracing | OpenTelemetry spans | `go.opentelemetry.io/otel` |
| Container image | Multi-stage, scratch/distroless | Dockerfile |
| Build | `CGO_ENABLED=0 go build -ldflags` | Makefile |
| Migrations | SQL files + migration tool | `golang-migrate` |
Best Practices¶
- Structured logging from day one — use `slog` with a JSON handler; add a request ID to every log
- Graceful shutdown is mandatory — handle SIGTERM, drain connections, stop workers before exit
- Separate liveness from readiness — liveness says "restart me", readiness says "don't send traffic"
- Configuration via environment variables — 12-factor, validate at startup, never log secrets
- Multi-stage Docker builds — build in `golang:`, run in `scratch` or `distroless`
- Set HTTP server timeouts — `ReadTimeout`, `WriteTimeout`, `IdleTimeout` prevent resource leaks
- Expose `/metrics` for Prometheus and trace with OpenTelemetry
- Version your binary — embed the git tag with `-ldflags` for traceability in production
- Use the `internal/` package — prevents external imports of implementation details
- Makefile as development interface — `make build`, `make test`, `make lint`, `make docker`
Common Pitfalls¶
No Graceful Shutdown
Without graceful shutdown, in-flight requests are terminated on deploy. This causes 502 errors, incomplete transactions, and data corruption. Always handle SIGTERM.
Logging Secrets
Never log database URLs, API keys, or tokens. Use json:"-" tags on config structs and redact sensitive fields.
Missing HTTP Server Timeouts
The default http.Server has no timeouts. A slow client can hold connections forever, eventually exhausting file descriptors.
Readiness Check During Shutdown
After receiving SIGTERM, set readiness to false before starting shutdown. This gives the load balancer time to stop routing traffic. Add a brief sleep (1-2s) between marking unready and shutting down.
Using fmt.Println in Production
Unstructured logs are impossible to parse, filter, or alert on. Use slog for all logging — even in library code.
Performance Considerations¶
- `slog` is fast — the JSON handler is allocation-efficient; use `LogAttrs` for zero-allocation hot-path logging
- Prometheus metrics — use `promauto` for simple registration; keep label cardinality low (no user IDs in labels)
- OpenTelemetry sampling — use 1-10% sampling in production; 100% in staging
- Docker image size — `scratch` base = ~5MB final image; faster pulls, less attack surface
- Connection pool tuning — monitor `db.Stats()` and Redis pool stats; alert on wait count/duration
- Graceful shutdown timeout — match Kubernetes `terminationGracePeriodSeconds` (default 30s); set your shutdown timeout slightly lower
Interview Tips¶
Interview Tip
"My production Go service template starts with: structured logging via slog, graceful shutdown via signal.NotifyContext, health checks for Kubernetes, Prometheus metrics middleware, and a multi-stage Dockerfile targeting scratch. I wire dependencies via constructor injection in main.go — explicit and testable, no magic."
Interview Tip
"For graceful shutdown, the order matters: mark readiness as false, sleep briefly for LB to drain, stop accepting new requests, wait for in-flight requests to complete, stop background workers, then close database connections. Each step has a purpose."
Interview Tip
"I separate liveness from readiness probes. Liveness just returns 200 — if the process is alive, don't restart it. Readiness checks actual dependencies (database, Redis). This way, a temporary database blip removes the pod from the load balancer without restarting it. A startup probe prevents premature liveness checks during slow initialization like migrations."
Key Takeaways¶
- Structured logging with `slog` — JSON output, request-scoped loggers via context, dynamic log levels
- Graceful shutdown — handle SIGTERM, drain HTTP, stop workers, close connections in order
- Three health probes — liveness (alive?), readiness (can serve?), startup (finished init?)
- 12-factor configuration — environment variables, validated at startup, secrets never logged
- Multi-stage Docker — build in `golang:`, run in `scratch`/`distroless`, non-root user
- Observability triad — metrics (Prometheus), traces (OpenTelemetry), structured logs (slog)
- HTTP server timeouts are mandatory — `ReadTimeout`, `WriteTimeout`, `IdleTimeout`
- The `main.go` orchestrates — loads config, initializes dependencies, wires the service, handles lifecycle
- Use `internal/` to protect implementation details from external consumers
- Makefile as developer UX — every common task should be one `make` command