Low-Latency Feature Store Service¶
Repository: go-feature-store/ (sibling directory)
What This Project Demonstrates¶
Feature stores are a critical component of ML infrastructure. In ad-tech, features (user segments, contextual signals, historical behavior) must be served to ML models in real time during bid requests -- typically with a P99 latency under 5ms.
| Skill Area | How It's Demonstrated |
|---|---|
| gRPC services | Protobuf-defined API with unary and batch RPCs |
| Low-latency design | Generic sharded LRU cache with sub-millisecond reads |
| ML infrastructure | Feature serving patterns used in real ad-tech stacks |
| Generics | Cache layer uses Go generics throughout |
| Observability | Prometheus metrics, gRPC interceptors, structured logging |
| Production patterns | Graceful shutdown, health checks, configurable via env vars |
Architecture¶
```mermaid
graph TD
    MLModel["ML Model\n(gRPC Client)"] -->|"GetFeatures / BatchGet"| GRPCServer["gRPC Server\n:50051"]
    GRPCServer --> Interceptors["Interceptors\n(logging, metrics)"]
    Interceptors --> Server["FeatureStoreServer"]
    Server --> Store["InMemoryStore"]
    Store --> ShardedCache["ShardedCache\n(N shards)"]
    ShardedCache --> Shard0["Shard 0\n(LRU Cache)"]
    ShardedCache --> Shard1["Shard 1\n(LRU Cache)"]
    ShardedCache --> ShardN["Shard N\n(LRU Cache)"]
    GRPCServer -.->|"HTTP :8081"| Health["/health"]
    GRPCServer -.->|"HTTP :8081"| Metrics["/metrics"]
```
Request Flow¶
- gRPC request arrives for features of a given entity (user ID, device ID, etc.)
- Interceptors log the request and record Prometheus metrics (duration, status)
- Store layer looks up features in the sharded cache
- Sharded cache hashes the key to select a shard, then performs LRU lookup within that shard
- Response returns the feature map with hit/miss metadata
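The flow above can be condensed into a minimal sketch. Note this is illustrative only: the types and names here (`FeatureMap`, `Store.Get`, `GetFeaturesResponse`) are assumptions standing in for the repo's actual protobuf-generated API.

```go
package main

import "fmt"

// FeatureMap is a simplified stand-in for the protobuf feature message.
type FeatureMap map[string]float64

// Store is a stand-in for InMemoryStore; in the real service the lookup
// goes through the sharded LRU cache.
type Store struct{ data map[string]FeatureMap }

func (s *Store) Get(entityID string) (FeatureMap, bool) {
	fm, ok := s.data[entityID]
	return fm, ok
}

// GetFeaturesResponse carries the feature map plus hit/miss metadata,
// mirroring step 5 of the request flow.
type GetFeaturesResponse struct {
	Features FeatureMap
	CacheHit bool
}

// GetFeatures sketches the handler: look up the entity, report hit/miss.
func GetFeatures(s *Store, entityID string) GetFeaturesResponse {
	fm, hit := s.Get(entityID)
	return GetFeaturesResponse{Features: fm, CacheHit: hit}
}

func main() {
	s := &Store{data: map[string]FeatureMap{"user:123": {"ctr_7d": 0.042}}}
	resp := GetFeatures(s, "user:123")
	fmt.Println(resp.CacheHit, resp.Features["ctr_7d"])
}
```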
Project Structure¶
```
go-feature-store/
├── cmd/
│   └── server/
│       └── main.go              # gRPC + HTTP server entry point
├── internal/
│   ├── cache/
│   │   ├── lru.go               # Generic LRU cache with TTL
│   │   ├── lru_test.go          # Tests + benchmarks
│   │   └── sharded.go           # Sharded cache for reduced contention
│   ├── store/
│   │   ├── store.go             # FeatureStore interface + InMemoryStore
│   │   └── store_test.go        # Tests
│   ├── server/
│   │   ├── server.go            # gRPC server implementation
│   │   └── server_test.go       # Tests
│   └── proto/featurestore/v1/
│       ├── featurestore.go      # Message types
│       └── featurestore_grpc.go # gRPC service interface
├── proto/featurestore/v1/
│   └── featurestore.proto       # Canonical protobuf definition
├── pkg/client/
│   └── client.go                # Reusable gRPC client library
├── go.mod
├── Makefile
├── Dockerfile
└── README.md
```
Key Design Decisions¶
Why Sharded Cache?¶
A single mutex-guarded map becomes a bottleneck under high concurrency. Sharding spreads keys across N independent LRU caches, each with its own lock:
- With 16 shards, each lock sees roughly 1/16th of the traffic (assuming an even key distribution)
- Hash-based sharding distributes keys evenly across shards
- A power-of-two shard count lets the shard index be computed with a bitwise AND (hash & (N-1)) instead of a slower modulo
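The shard-selection step can be sketched as follows. This is a minimal illustration, not the repo's exact code: the function name `shardIndex` and the choice of FNV-1a hashing are assumptions.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// shardIndex maps a string key to one of n shards. n must be a power of
// two so that the modulo reduces to a single bitwise AND.
func shardIndex(key string, n uint32) uint32 {
	h := fnv.New32a()
	h.Write([]byte(key))
	return h.Sum32() & (n - 1) // same as h.Sum32() % n when n is a power of two
}

func main() {
	for _, k := range []string{"user:123", "user:456", "device:abc"} {
		fmt.Printf("%s -> shard %d of 16\n", k, shardIndex(k, 16))
	}
}
```

Each shard then holds its own mutex and LRU list, so two goroutines hitting different shards never contend.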
Why Generic LRU Cache?¶
Go 1.18+ generics eliminate type assertions in the hot path:
```go
type Cache[K comparable, V any] struct { ... }

cache := NewCache[string, FeatureMap](WithMaxSize(10000))
val, ok := cache.Get("user:12345") // No type assertion needed
```
Why TTL on Cache Entries?¶
ML features have different freshness requirements:
- User segments: may be stale for minutes
- Real-time signals (last click): must be fresh within seconds
Lazy TTL expiration (check on access, no background goroutine) avoids unnecessary CPU overhead.
Why gRPC Over REST?¶
- Protobuf payloads are typically 3-10x smaller than JSON and faster to serialize
- Strongly typed contract prevents client/server drift
- Streaming RPCs enable future batch streaming patterns
- Interceptors provide clean middleware separation (like HTTP middleware but for gRPC)
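The interceptor idea -- wrapping a handler with logging or metrics without touching business logic -- can be illustrated with plain function types. This is a simplified stdlib-only sketch of the pattern; the real service uses gRPC's interceptor types rather than these hand-rolled ones.

```go
package main

import (
	"fmt"
	"time"
)

// Handler is the core RPC logic; Interceptor wraps a Handler, mirroring
// the shape of a gRPC unary interceptor (greatly simplified).
type Handler func(req string) (string, error)
type Interceptor func(next Handler) Handler

// logging records the duration and outcome of each call.
func logging(next Handler) Handler {
	return func(req string) (string, error) {
		start := time.Now()
		resp, err := next(req)
		fmt.Printf("handled %q in %v (err=%v)\n", req, time.Since(start), err)
		return resp, err
	}
}

// chain applies interceptors so the first one listed runs outermost.
func chain(h Handler, ics ...Interceptor) Handler {
	for i := len(ics) - 1; i >= 0; i-- {
		h = ics[i](h)
	}
	return h
}

func main() {
	getFeatures := func(req string) (string, error) {
		return "features for " + req, nil
	}
	wrapped := chain(getFeatures, logging)
	resp, _ := wrapped("user:123")
	fmt.Println(resp)
}
```

The business handler stays free of logging and metrics code; cross-cutting concerns live entirely in the wrappers.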
Go Concepts Showcased¶
| Concept | Where It's Used |
|---|---|
| Generics | Cache[K, V] and ShardedCache[K, V] with type constraints |
| gRPC | Service definition, server/client implementation, interceptors |
| Interfaces | FeatureStore interface, Check interface for health checks |
| sync.RWMutex | Read-heavy cache access (many reads, fewer writes) |
| sync/atomic | Lock-free hit/miss counters in the store layer |
| Functional options | WithMaxSize, WithTTL, WithShardCount |
| Table-driven tests | Comprehensive test suites across all packages |
| Benchmarks | Concurrent read/write performance testing |
| Graceful shutdown | Signal handling, gRPC server GracefulStop |
| Docker multi-stage | Minimal production image |
How to Talk About This in an Interview¶
Interview Talking Points
- Start with the problem: "ML models serving ads in real-time need features in under 5ms. This service provides that with a sharded in-memory cache and gRPC for efficient serialization."
- Explain the caching strategy: "I used a sharded LRU with per-shard locking to reduce contention. Under high concurrency, a single-lock LRU becomes a bottleneck."
- Discuss generics: "The cache is fully generic -- Cache[K, V] -- eliminating type assertions in the hot path. This matters when you're serving millions of feature lookups per second."
- Talk about the gRPC design: "Protobuf is significantly more efficient than JSON for structured data. Interceptors provide clean logging and metrics without polluting business logic."
- Production readiness: "The service includes health checks, Prometheus metrics, graceful shutdown, and configurable caching parameters -- everything needed to deploy on Kubernetes."
Running the Project¶
```shell
cd go-feature-store

# Install dependencies
go mod tidy

# Run tests
make test

# Run benchmarks
make bench

# Start the server
GRPC_PORT=50051 HTTP_PORT=8081 CACHE_SIZE=10000 SHARD_COUNT=16 make run

# Use grpcurl to test (install: go install github.com/fullstorydev/grpcurl/cmd/grpcurl@latest)
grpcurl -plaintext -d '{"entity_id":"user:123","features":{"age":{"int_value":25}}}' \
  localhost:50051 featurestore.v1.FeatureStoreService/SetFeatures
```