
Low-Latency Feature Store Service

Repository: go-feature-store/ (sibling directory)

What This Project Demonstrates

Feature stores are a critical component of ML infrastructure. In ad-tech, features (user segments, contextual signals, historical behavior) must be served to ML models in real time during bid requests -- typically with P99 latency under 5ms.

  • gRPC services -- Protobuf-defined API with unary and batch RPCs
  • Low-latency design -- generic sharded LRU cache with sub-millisecond reads
  • ML infrastructure -- feature-serving patterns used in real ad-tech stacks
  • Generics -- the cache layer uses Go generics throughout
  • Observability -- Prometheus metrics, gRPC interceptors, structured logging
  • Production patterns -- graceful shutdown, health checks, configuration via env vars

Architecture

graph TD
    MLModel["ML Model\n(gRPC Client)"] -->|"GetFeatures / BatchGet"| GRPCServer["gRPC Server\n:50051"]
    GRPCServer --> Interceptors["Interceptors\n(logging, metrics)"]
    Interceptors --> Server["FeatureStoreServer"]
    Server --> Store["InMemoryStore"]
    Store --> ShardedCache["ShardedCache\n(N shards)"]
    ShardedCache --> Shard0["Shard 0\n(LRU Cache)"]
    ShardedCache --> Shard1["Shard 1\n(LRU Cache)"]
    ShardedCache --> ShardN["Shard N\n(LRU Cache)"]
    GRPCServer -.->|"HTTP :8081"| Health["/health"]
    GRPCServer -.->|"HTTP :8081"| Metrics["/metrics"]

Request Flow

  1. gRPC request arrives for features of a given entity (user ID, device ID, etc.)
  2. Interceptors log the request and record Prometheus metrics (duration, status)
  3. Store layer looks up features in the sharded cache
  4. Sharded cache hashes the key to select a shard, then performs LRU lookup within that shard
  5. Response returns the feature map with hit/miss metadata
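Step 2's interceptor follows the usual middleware-wrapping shape. A minimal stdlib-only sketch of that pattern (the `Handler` type and `withTiming` name are illustrative, not the project's actual gRPC types):

```go
package main

import (
	"fmt"
	"time"
)

// Handler stands in for a gRPC unary handler in this sketch.
type Handler func(req string) (string, error)

// withTiming wraps a handler, logging duration and status --
// the same shape a gRPC unary interceptor has.
func withTiming(next Handler) Handler {
	return func(req string) (string, error) {
		start := time.Now()
		resp, err := next(req)
		status := "ok"
		if err != nil {
			status = "error"
		}
		fmt.Printf("method=GetFeatures status=%s duration=%s\n",
			status, time.Since(start).Round(time.Millisecond))
		return resp, err
	}
}

func main() {
	base := func(req string) (string, error) { return "features for " + req, nil }
	handler := withTiming(base)
	resp, _ := handler("user:123")
	fmt.Println(resp)
}
```

In the real service the wrapping is done once at server construction, so business logic never sees the logging or metrics code.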

Project Structure

go-feature-store/
├── cmd/
│   └── server/
│       └── main.go                    # gRPC + HTTP server entry point
├── internal/
│   ├── cache/
│   │   ├── lru.go                     # Generic LRU cache with TTL
│   │   ├── lru_test.go                # Tests + benchmarks
│   │   └── sharded.go                 # Sharded cache for reduced contention
│   ├── store/
│   │   ├── store.go                   # FeatureStore interface + InMemoryStore
│   │   └── store_test.go              # Tests
│   ├── server/
│   │   ├── server.go                  # gRPC server implementation
│   │   └── server_test.go             # Tests
│   └── proto/featurestore/v1/
│       ├── featurestore.go            # Message types
│       └── featurestore_grpc.go       # gRPC service interface
├── proto/featurestore/v1/
│   └── featurestore.proto             # Canonical protobuf definition
├── pkg/client/
│   └── client.go                      # Reusable gRPC client library
├── go.mod
├── Makefile
├── Dockerfile
└── README.md

Key Design Decisions

Why Sharded Cache?

A single mutex-guarded map becomes a bottleneck under high concurrency. Sharding spreads keys across N independent LRU caches, each with its own lock:

  • 16 shards cut contention on any single lock to roughly 1/16th, assuming an even key distribution
  • Hash-based sharding distributes keys evenly across shards
  • A power-of-two shard count lets the shard index be computed with a bitwise AND instead of a modulo
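The shard-selection step can be sketched in a few lines (the `shardIndex` helper is illustrative; the project may hash differently):

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// shardIndex hashes a key and maps it to one of n shards.
// When n is a power of two, h & (n-1) equals h % n but avoids
// the slower modulo instruction on the hot path.
func shardIndex(key string, n uint32) uint32 {
	h := fnv.New32a()
	h.Write([]byte(key))
	return h.Sum32() & (n - 1)
}

func main() {
	for _, k := range []string{"user:123", "user:456", "device:abc"} {
		fmt.Printf("%s -> shard %d\n", k, shardIndex(k, 16))
	}
}
```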

Why Generic LRU Cache?

Go 1.18+ generics eliminate type assertions in the hot path:

type Cache[K comparable, V any] struct { ... }

cache := NewCache[string, FeatureMap](WithMaxSize(10000))
val, ok := cache.Get("user:12345")  // No type assertion needed

Why TTL on Cache Entries?

ML features have different freshness requirements:

  • User segments: may be stale for minutes
  • Real-time signals (last click): must be fresh within seconds

Lazy TTL expiration (check on access, no background goroutine) avoids unnecessary CPU overhead.

Why gRPC Over REST?

  • Protobuf encoding is typically several times smaller on the wire than JSON, and faster to encode and decode
  • Strongly typed contract prevents client/server drift
  • Streaming RPCs enable future batch streaming patterns
  • Interceptors provide clean middleware separation (like HTTP middleware but for gRPC)

Go Concepts Showcased

  • Generics -- Cache[K, V] and ShardedCache[K, V] with type constraints
  • gRPC -- service definition, server/client implementation, interceptors
  • Interfaces -- FeatureStore interface, Check health-check interface
  • sync.RWMutex -- read-heavy cache access (many reads, fewer writes)
  • sync/atomic -- lock-free hit/miss counters in the store layer
  • Functional options -- WithMaxSize, WithTTL, WithShardCount
  • Table-driven tests -- comprehensive test suites across all packages
  • Benchmarks -- concurrent read/write performance testing
  • Graceful shutdown -- signal handling, gRPC GracefulStop
  • Docker multi-stage build -- minimal production image

How to Talk About This in an Interview

Interview Talking Points

  1. Start with the problem: "ML models serving ads in real-time need features in under 5ms. This service provides that with a sharded in-memory cache and gRPC for efficient serialization."

  2. Explain the caching strategy: "I used a sharded LRU with per-shard locking to reduce contention. Under high concurrency, a single-lock LRU becomes a bottleneck."

  3. Discuss generics: "The cache is fully generic -- Cache[K, V] -- eliminating type assertions in the hot path. This matters when you're serving millions of feature lookups per second."

  4. Talk about the gRPC design: "Protobuf is significantly more efficient than JSON for structured data. Interceptors provide clean logging and metrics without polluting business logic."

  5. Production readiness: "The service includes health checks, Prometheus metrics, graceful shutdown, and configurable caching parameters -- everything needed to deploy on Kubernetes."


Running the Project

cd go-feature-store

# Install dependencies
go mod tidy

# Run tests
make test

# Run benchmarks
make bench

# Start the server
GRPC_PORT=50051 HTTP_PORT=8081 CACHE_SIZE=10000 SHARD_COUNT=16 make run

# Use grpcurl to test (install: go install github.com/fullstorydev/grpcurl/cmd/grpcurl@latest)
grpcurl -plaintext -d '{"entity_id":"user:123","features":{"age":{"int_value":25}}}' \
  localhost:50051 featurestore.v1.FeatureStoreService/SetFeatures