netflixdata-gatewayreferencearchitecture

Netflix Data Gateway Reference

This section contains research and learnings from Netflix's Data Gateway architecture, which serves as inspiration for the Prism data access layer.

What is Netflix's Data Gateway?

Netflix's Data Gateway is a battle-tested platform that provides data abstraction layers to simplify and scale data access across thousands of microservices. It decouples application logic from database implementations, enabling:

Unified APIs for diverse data stores (Cassandra, EVCache, DynamoDB, etc.)
Operational resilience through circuit breaking, load shedding, and failover
Zero-downtime migrations via shadow traffic and dual-write patterns
Massive scale: 8M+ QPS, 3,500+ use cases, petabytes of data

Why This Matters for Prism

Prism adopts many of Netflix's proven patterns while improving on performance and operational simplicity:

Aspect	Netflix Approach	Prism Enhancement
Proxy Layer	JVM-based gateway	Rust-based for 10-100x performance
Configuration	Runtime deployment configs	Client-originated requirements (apps declare needs)
Testing	Production-validated	Local-first with sqlite, local kafka
Deployment	Kubernetes-native	Flexible: bare metal, VMs, or containers

Core Concepts

Data Abstractions

Netflix built multiple abstraction layers on their Data Gateway platform:

Key-Value: Primary abstraction for 400+ use cases
TimeSeries: Handles 10M writes/sec for event tracking
Distributed Counter: Scalable counting with tunable accuracy
Write-Ahead Log (WAL): Durable, ordered mutation delivery

Scale & Performance

Netflix's Data Gateway operates at massive scale:

8 million queries per second (key-value abstraction)
10 million writes per second (time-series data)
3,500+ use cases across the organization
Petabyte-scale storage with low-latency retrieval

Read more about scale metrics →

Migration Patterns

Netflix's approach to zero-downtime migrations:

Dual-Write Pattern: Write to old and new datastores simultaneously
Shadow Traffic: Validate new systems with production load
Phased Cutover: Gradual migration with rollback capability
Schema Evolution: Automated compatibility checking

Key Learnings

1. Abstraction Simplifies Scale

Managing database API complexity becomes unmanageable as services scale. A robust data abstraction layer:

Shields applications from database breaking changes
Provides user-friendly gRPC/HTTP APIs tailored to access patterns
Enables backend changes without application code changes

2. Prioritize Reliability

Building for redundancy and resilience:

Circuit breaking and back-pressure prevent cascading failures
Automated load shedding protects backends during traffic spikes
Rigorous capacity planning prevents resource exhaustion

3. Data Management is Critical

Proactive data lifecycle management:

TTL and cleanup should be designed in from day one
Cost monitoring: Every byte has a cost
Tiering strategies: Move cold data to cost-effective storage

4. Sharding for Isolation

Product/feature sharding prevents noisy neighbor problems:

Dedicated proxy instances per product or SLA tier
Independent scaling and capacity planning
Clear ownership and blast radius containment

Read full lessons learned →

Reference Materials

Netflix Data Gateway Use Cases: Real-world applications
Scale Metrics: Performance and throughput numbers
Data Abstractions: Counter, WAL, and other patterns
Migration Strategies: Dual-write and shadow traffic
Video Transcripts: Conference talks on data abstractions (see sidebar)

PDF References

Original blog posts and articles are archived in the references/ directory:

Data Gateway platform overview
Key-Value abstraction deep dive
Time-series architecture
Real-time data processing

How Prism Uses These Learnings

Prism incorporates Netflix's battle-tested patterns:

Data Abstractions (ADR-026): KeyValue, TimeSeries, Graph, Entity patterns
Client Configuration (ADR-001): Apps declare requirements, Prism provisions
Backend Plugins (RFC-008): Clean abstraction for adding new backends
Shadow Traffic (ADR-031): Zero-downtime migrations like Netflix
Sharding Strategy (ADR-034): Product/feature isolation

See our ADRs and RFCs for implementation details.

Note: This documentation is derived from Netflix's public blog posts, conference talks, and open-source contributions. All credit goes to Netflix for pioneering these patterns at scale.

What is Netflix's Data Gateway?​

Why This Matters for Prism​

Core Concepts​

Data Abstractions​

Scale & Performance​

Migration Patterns​

Key Learnings​

1. Abstraction Simplifies Scale​

2. Prioritize Reliability​

3. Data Management is Critical​

4. Sharding for Isolation​

Reference Materials​

PDF References​

How Prism Uses These Learnings​