
Low-Latency Feature Store Service

Repository: go-feature-store/ (sibling directory)

What This Project Demonstrates

Feature stores are a critical component of ML infrastructure. In ad-tech, features (user segments, contextual signals, historical behavior) must be served to ML models in real time during bid requests -- typically with P99 latency under 5ms.

  • gRPC services -- Protobuf-defined API with unary and batch RPCs
  • Low-latency design -- generic sharded LRU cache with sub-millisecond reads
  • ML infrastructure -- feature-serving patterns used in real ad-tech stacks
  • Generics -- the cache layer uses Go generics throughout
  • Observability -- Prometheus metrics, gRPC interceptors, structured logging
  • Production patterns -- graceful shutdown, health checks, configuration via env vars

Architecture

graph TD
    MLModel["ML Model\n(gRPC Client)"] -->|"GetFeatures / BatchGet"| GRPCServer["gRPC Server\n:50051"]
    GRPCServer --> Interceptors["Interceptors\n(logging, metrics)"]
    Interceptors --> Server["FeatureStoreServer"]
    Server --> Store["InMemoryStore"]
    Store --> ShardedCache["ShardedCache\n(N shards)"]
    ShardedCache --> Shard0["Shard 0\n(LRU Cache)"]
    ShardedCache --> Shard1["Shard 1\n(LRU Cache)"]
    ShardedCache --> ShardN["Shard N\n(LRU Cache)"]
    GRPCServer -.->|"HTTP :8081"| Health["/health"]
    GRPCServer -.->|"HTTP :8081"| Metrics["/metrics"]

Request Flow

  1. gRPC request arrives for features of a given entity (user ID, device ID, etc.)
  2. Interceptors log the request and record Prometheus metrics (duration, status)
  3. Store layer looks up features in the sharded cache
  4. Sharded cache hashes the key to select a shard, then performs LRU lookup within that shard
  5. Response returns the feature map with hit/miss metadata
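Step 2's interceptor follows the usual middleware-wrapping shape. A minimal stdlib-only sketch of that pattern (the `Handler` type and `withTiming` name are illustrative, not the project's actual gRPC types):

```go
package main

import (
	"fmt"
	"time"
)

// Handler stands in for a gRPC unary handler in this sketch.
type Handler func(req string) (string, error)

// withTiming wraps a handler, logging duration and status --
// the same shape a gRPC unary interceptor has.
func withTiming(next Handler) Handler {
	return func(req string) (string, error) {
		start := time.Now()
		resp, err := next(req)
		status := "ok"
		if err != nil {
			status = "error"
		}
		fmt.Printf("method=GetFeatures status=%s duration=%s\n",
			status, time.Since(start).Round(time.Millisecond))
		return resp, err
	}
}

func main() {
	base := func(req string) (string, error) { return "features for " + req, nil }
	handler := withTiming(base)
	resp, _ := handler("user:123")
	fmt.Println(resp)
}
```

In the real service the wrapping is done once at server construction, so business logic never sees the logging or metrics code.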

Project Structure

go-feature-store/
├── cmd/
│   └── server/
│       └── main.go                    # gRPC + HTTP server entry point
├── internal/
│   ├── cache/
│   │   ├── lru.go                     # Generic LRU cache with TTL
│   │   ├── lru_test.go                # Tests + benchmarks
│   │   └── sharded.go                 # Sharded cache for reduced contention
│   ├── store/
│   │   ├── store.go                   # FeatureStore interface + InMemoryStore
│   │   └── store_test.go              # Tests
│   ├── server/
│   │   ├── server.go                  # gRPC server implementation
│   │   └── server_test.go             # Tests
│   └── proto/featurestore/v1/
│       ├── featurestore.go            # Message types
│       └── featurestore_grpc.go       # gRPC service interface
├── proto/featurestore/v1/
│   └── featurestore.proto             # Canonical protobuf definition
├── pkg/client/
│   └── client.go                      # Reusable gRPC client library
├── go.mod
├── Makefile
├── Dockerfile
└── README.md

Key Design Decisions

Why Sharded Cache?

A single mutex-guarded map becomes a bottleneck under high concurrency. Sharding spreads keys across N independent LRU caches, each with its own lock:

  • 16 shards cut contention on any single lock to roughly 1/16th, assuming an even key distribution
  • Hash-based sharding distributes keys evenly across shards
  • A power-of-two shard count lets the shard index be computed with a bitwise AND instead of a modulo
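The shard-selection step can be sketched in a few lines (the `shardIndex` helper is illustrative; the project may hash differently):

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// shardIndex hashes a key and maps it to one of n shards.
// When n is a power of two, h & (n-1) equals h % n but avoids
// the slower modulo instruction on the hot path.
func shardIndex(key string, n uint32) uint32 {
	h := fnv.New32a()
	h.Write([]byte(key))
	return h.Sum32() & (n - 1)
}

func main() {
	for _, k := range []string{"user:123", "user:456", "device:abc"} {
		fmt.Printf("%s -> shard %d\n", k, shardIndex(k, 16))
	}
}
```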

Why Generic LRU Cache?

Go 1.18+ generics eliminate type assertions in the hot path:

type Cache[K comparable, V any] struct { ... }

cache := NewCache[string, FeatureMap](WithMaxSize(10000))
val, ok := cache.Get("user:12345")  // No type assertion needed

Why TTL on Cache Entries?

ML features have different freshness requirements:

  • User segments: may be stale for minutes
  • Real-time signals (last click): must be fresh within seconds

Lazy TTL expiration (check on access, no background goroutine) avoids unnecessary CPU overhead.

Why gRPC Over REST?

  • Protobuf encoding is typically several times smaller on the wire than JSON, and faster to encode and decode
  • Strongly typed contract prevents client/server drift
  • Streaming RPCs enable future batch streaming patterns
  • Interceptors provide clean middleware separation (like HTTP middleware but for gRPC)

Go Concepts Showcased

  • Generics -- Cache[K, V] and ShardedCache[K, V] with type constraints
  • gRPC -- service definition, server/client implementation, interceptors
  • Interfaces -- FeatureStore interface, Check health-check interface
  • sync.RWMutex -- read-heavy cache access (many reads, fewer writes)
  • sync/atomic -- lock-free hit/miss counters in the store layer
  • Functional options -- WithMaxSize, WithTTL, WithShardCount
  • Table-driven tests -- comprehensive test suites across all packages
  • Benchmarks -- concurrent read/write performance testing
  • Graceful shutdown -- signal handling, gRPC GracefulStop
  • Docker multi-stage build -- minimal production image

How to Talk About This in an Interview

Interview Talking Points

  1. Start with the problem: "ML models serving ads in real-time need features in under 5ms. This service provides that with a sharded in-memory cache and gRPC for efficient serialization."

  2. Explain the caching strategy: "I used a sharded LRU with per-shard locking to reduce contention. Under high concurrency, a single-lock LRU becomes a bottleneck."

  3. Discuss generics: "The cache is fully generic -- Cache[K, V] -- eliminating type assertions in the hot path. This matters when you're serving millions of feature lookups per second."

  4. Talk about the gRPC design: "Protobuf is significantly more efficient than JSON for structured data. Interceptors provide clean logging and metrics without polluting business logic."

  5. Production readiness: "The service includes health checks, Prometheus metrics, graceful shutdown, and configurable caching parameters -- everything needed to deploy on Kubernetes."


Running the Project

cd go-feature-store

# Install dependencies
go mod tidy

# Run tests
make test

# Run benchmarks
make bench

# Start the server
GRPC_PORT=50051 HTTP_PORT=8081 CACHE_SIZE=10000 SHARD_COUNT=16 make run

# Use grpcurl to test (install: go install github.com/fullstorydev/grpcurl/cmd/grpcurl@latest)
grpcurl -plaintext -d '{"entity_id":"user:123","features":{"age":{"int_value":25}}}' \
  localhost:50051 featurestore.v1.FeatureStoreService/SetFeatures