RFC-048: Cross-Proxy Partition Strategies and Request Forwarding
Abstract
This RFC specifies partition strategies for distributing namespace workloads across multiple Prism proxy instances. When a namespace is created, the admin plane assigns it to a partition using configurable strategies (consistent hashing, key range assignment, or explicit bucket mapping). Pattern runners execute on the proxy instance responsible for the partition. Proxies receiving requests for non-local partitions forward them to the appropriate proxy instance. The partition assignment is stored in namespace configuration and distributed across all proxies via the admin plane control channel.
Key Benefits:
- Horizontal Scaling: Distribute namespace load across multiple proxy instances
- Consistent Routing: Same namespace always routes to same proxy (until rebalancing)
- Flexible Strategies: Support consistent hashing, key ranges, and explicit mapping
- Transparent Forwarding: Clients can connect to any proxy; requests are automatically forwarded
- Rebalancing Support: Move partitions between proxies without client changes
Motivation
Problem Statement
Current proxy architecture lacks workload distribution mechanisms:
Problem 1: No Horizontal Scaling
- All namespaces run on all proxy instances
- Cannot scale specific namespace workloads independently
- Resource contention between unrelated namespaces
Problem 2: No Data Locality
- Pattern runners don't know which proxy owns partition
- Proxies cannot route requests to partition owner
- No optimization for co-location with backend data
Problem 3: Unpredictable Routing
- Client connections load-balanced randomly across proxies
- Same namespace may execute on different proxies per request
- No affinity for stateful operations
Problem 4: Rebalancing Complexity
- Adding/removing proxies requires manual reconfiguration
- No automatic redistribution of partitions
- Downtime required for topology changes
Goals
- Partition Assignment: Admin plane assigns namespaces to partitions using configurable strategies
- Consistent Routing: Deterministic mapping from namespace to partition to proxy
- Request Forwarding: Proxies forward requests to partition owner automatically
- Multiple Strategies: Support consistent hashing, key range, and bucket mapping
- Rebalancing Protocol: Move partitions between proxies with minimal disruption
- Pattern Runner Placement: Pattern runners execute only on partition-owning proxy
Non-Goals
- Data Plane Partitioning: This RFC covers control plane partition assignment only
- Backend-Level Partitioning: Backend data partitioning (Kafka partitions, DB sharding)
- Cross-Region Routing: Multi-cluster partition distribution (see RFC-012)
- Client-Side Routing: Clients connect to any proxy and never compute partition assignments themselves
Partition Strategies
Strategy 1: Consistent Hashing (Default)
Description: Hash the namespace name to determine a partition ID (0-255). Partition ranges are assigned to proxies using a consistent hash ring.
Use Cases:
- General-purpose workload distribution
- Minimal rebalancing on proxy additions/removals
- Good for large fleets (10+ proxies)
Characteristics:
- Partition Count: 256 (fixed)
- Distribution: ~Equal partitions per proxy
- Rebalancing: Only affected partitions move on topology change
- Locality: Deterministic, same namespace always on same proxy
Algorithm:
Hash Function: CRC32(namespace_name) % 256 → partition_id (0-255)
Consistent Hash Ring:
partition_id → proxy_id mapping
Example with 4 proxies:
Proxy A: partitions [0-63]
Proxy B: partitions [64-127]
Proxy C: partitions [128-191]
Proxy D: partitions [192-255]
Adding Proxy E:
Rebalance ~20% of partitions to E
Proxy A: [0-50] (loses 13)
Proxy B: [64-114] (loses 13)
Proxy C: [128-178] (loses 13)
Proxy D: [192-242] (loses 13)
Proxy E: [51-63, 115-127, 179-191, 243-255] (gains 52)
Configuration:
partition_strategy:
  type: consistent_hash
  partition_count: 256
  hash_function: crc32       # or murmur3, xxhash
  rebalance_threshold: 0.1   # Rebalance if imbalance > 10%
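A minimal Go sketch of this hash-and-lookup path, assuming CRC32 (IEEE) and the fixed 256-partition space described above; partitionFor, proxyRange, and ownerOf are illustrative names, not the shipped API:
package main

import (
	"fmt"
	"hash/crc32"
)

// partitionFor maps a namespace name to a partition ID in [0, 256).
func partitionFor(namespace string) uint32 {
	return crc32.ChecksumIEEE([]byte(namespace)) % 256
}

// proxyRange is one contiguous slice of the partition space.
type proxyRange struct {
	start, end uint32 // inclusive bounds
	proxyID    string
}

// ownerOf returns the proxy that owns the given partition, if any.
func ownerOf(partition uint32, ranges []proxyRange) (string, bool) {
	for _, r := range ranges {
		if partition >= r.start && partition <= r.end {
			return r.proxyID, true
		}
	}
	return "", false
}

func main() {
	ranges := []proxyRange{
		{0, 63, "proxy-a"}, {64, 127, "proxy-b"},
		{128, 191, "proxy-c"}, {192, 255, "proxy-d"},
	}
	p := partitionFor("orders-prod")
	if owner, ok := ownerOf(p, ranges); ok {
		fmt.Printf("orders-prod -> partition %d -> %s\n", p, owner)
	}
}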
Strategy 2: Key Range Assignment
Description: Namespace names assigned to partitions based on lexicographic key ranges. Useful for alphabetically organizing namespaces.
Use Cases:
- Multi-tenant SaaS (e.g., all "acme-*" namespaces on same proxy)
- Alphabetical organization requirements
- Predictable placement for operations
Characteristics:
- Partition Count: Configurable (e.g., 26 for A-Z)
- Distribution: May be uneven if namespace names non-uniform
- Rebalancing: Range splits required
- Locality: Alphabetically adjacent namespaces co-located
Algorithm:
Key Range Mapping:
namespace_name[0] → partition_id
Example with 4 proxies:
Proxy A: ['a'-'f'] → partitions [0-5]
Proxy B: ['g'-'m'] → partitions [6-12]
Proxy C: ['n'-'t'] → partitions [13-19]
Proxy D: ['u'-'z'] → partitions [20-25]
Result:
"orders-prod" → 'o' → partition 14 → Proxy C
"users-cache" → 'u' → partition 20 → Proxy D
"analytics-v2" → 'a' → partition 0 → Proxy A
Configuration:
partition_strategy:
  type: key_range
  ranges:
    - range: "a-f"
      partition_ids: [0, 1, 2, 3, 4, 5]
      proxy: proxy-01
    - range: "g-m"
      partition_ids: [6, 7, 8, 9, 10, 11, 12]
      proxy: proxy-02
    - range: "n-t"
      partition_ids: [13, 14, 15, 16, 17, 18, 19]
      proxy: proxy-03
    - range: "u-z"
      partition_ids: [20, 21, 22, 23, 24, 25]
      proxy: proxy-04
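For illustration, a Go sketch of the first-letter lookup under this strategy, assuming lowercase a-z namespace names (a real deployment would also need rules for digits and other leading characters):
package main

import "fmt"

// keyRangePartition maps a namespace to one of 26 partitions by first letter.
func keyRangePartition(namespace string) (int, error) {
	if namespace == "" {
		return 0, fmt.Errorf("empty namespace")
	}
	c := namespace[0]
	if c < 'a' || c > 'z' {
		return 0, fmt.Errorf("namespace must start with a-z: %q", namespace)
	}
	return int(c - 'a'), nil // 'a' -> 0, 'o' -> 14, 'u' -> 20
}

func main() {
	for _, ns := range []string{"orders-prod", "users-cache", "analytics-v2"} {
		p, err := keyRangePartition(ns)
		if err != nil {
			continue
		}
		fmt.Printf("%s -> partition %d\n", ns, p)
	}
}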
Strategy 3: Explicit Bucket Mapping
Description: Namespaces explicitly assigned to buckets/partitions in namespace configuration. Admin or operator controls placement.
Use Cases:
- Fine-grained control over placement
- Resource isolation (e.g., high-priority namespaces on dedicated proxies)
- Manual optimization based on observed workload
Characteristics:
- Partition Count: Flexible
- Distribution: Controlled by operator
- Rebalancing: Manual reassignment
- Locality: Operator-defined
Algorithm:
Explicit Mapping:
namespace → partition_id (stored in namespace config)
Example:
"orders-prod" → partition 0 → Proxy A
"orders-staging" → partition 1 → Proxy B
"users-prod" → partition 0 → Proxy A (co-located with orders-prod)
"analytics-heavy" → partition 5 → Proxy F (dedicated proxy)
Configuration:
partition_strategy:
  type: explicit
  # Partition assignment per namespace in namespace config

# Namespace configuration
namespace: orders-prod
partition:
  id: 0
  proxy: proxy-01
  reason: "High-priority workload, dedicated resources"
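Lookup under the explicit strategy reduces to a plain map read against namespace config. A hedged sketch, with types and error text as assumptions:
package main

import "fmt"

// explicitAssignments holds operator-defined namespace -> partition mappings.
type explicitAssignments map[string]int

// partitionFor returns the assigned partition or an error: unlike the hashed
// strategies, there is no derivable default for an unassigned namespace.
func (a explicitAssignments) partitionFor(namespace string) (int, error) {
	p, ok := a[namespace]
	if !ok {
		return 0, fmt.Errorf("namespace %q has no partition assignment", namespace)
	}
	return p, nil
}

func main() {
	a := explicitAssignments{"orders-prod": 0, "analytics-heavy": 5}
	p, err := a.partitionFor("orders-prod")
	fmt.Println(p, err) // 0 <nil>
}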
Architecture
Component Interactions
┌────────────────────────────────────────────────────────────────┐
│ Admin Control Plane │
│ │
│ ┌──────────────────────────────────────────────────────────┐ │
│ │ Partition Manager │ │
│ │ - Assign namespaces to partitions │ │
│ │ - Maintain partition → proxy mapping │ │
│ │ - Distribute assignments to proxies │ │
│ │ - Handle rebalancing requests │ │
│ └──────────────────────────────────────────────────────────┘ │
│ │
└────────────────────┬───────────────────────────────────────────┘
│ Control plane gRPC
│ (partition assignments pushed to proxies)
│
┌─────────────┼─────────────┬────────────────┐
│ │ │ │
┌──────▼──────┐ ┌───▼────────┐ ┌──▼───────────┐ ┌─▼────────────┐
│ Proxy A │ │ Proxy B │ │ Proxy C │ │ Proxy D │
│ │ │ │ │ │ │ │
│ Partitions: │ │ Partitions:│ │ Partitions: │ │ Partitions: │
│ [0-63] │ │ [64-127] │ │ [128-191] │ │ [192-255] │
│ │ │ │ │ │ │ │
│ Namespaces: │ │ Namespaces:│ │ Namespaces: │ │ Namespaces: │
│ - orders- │ │ - users- │ │ - analytics- │ │ - video- │
│ prod │ │ cache │ │ v2 │ │ events │
│ - payments- │ │ - sessions │ │ - logs-prod │ │ - transcoding│
│ api │ │ │ │ │ │ │
└─────────────┘ └────────────┘ └──────────────┘ └──────────────┘
▲ ▲ ▲ ▲
│ │ │ │
└─────────────┴─────────────┴────────────────┘
│
┌──────▼───────┐
│ Clients │
│ (connect to │
│ any proxy) │
└──────────────┘
Request Flow:
1. Client connects to Proxy B
2. Client requests operation on "orders-prod" namespace
3. Proxy B checks partition table:
- "orders-prod" → partition 12 → Proxy A
4. Proxy B forwards request to Proxy A via gRPC
5. Proxy A executes operation and returns result
6. Proxy B returns result to client
Partition Assignment Protocol
Protobuf Definition
// proto/prism/admin/v1/partition.proto
syntax = "proto3";
package prism.admin.v1;
import "google/protobuf/timestamp.proto";
service PartitionManagementService {
// Get partition assignment for a namespace
rpc GetPartitionAssignment(GetPartitionAssignmentRequest)
returns (GetPartitionAssignmentResponse);
// List all partition assignments
rpc ListPartitionAssignments(ListPartitionAssignmentsRequest)
returns (ListPartitionAssignmentsResponse);
// Rebalance partitions (admin operation)
rpc RebalancePartitions(RebalancePartitionsRequest)
returns (RebalancePartitionsResponse);
// Get partition topology (proxy → partitions mapping)
rpc GetPartitionTopology(GetPartitionTopologyRequest)
returns (GetPartitionTopologyResponse);
}
message GetPartitionAssignmentRequest {
string namespace = 1;
}
message GetPartitionAssignmentResponse {
PartitionAssignment assignment = 1;
}
message ListPartitionAssignmentsRequest {
// Optional: Filter by proxy
optional string proxy_filter = 1;
// Pagination
int32 page_size = 2;
string page_token = 3;
}
message ListPartitionAssignmentsResponse {
repeated PartitionAssignment assignments = 1;
string next_page_token = 2;
int32 total_count = 3;
}
message RebalancePartitionsRequest {
// Trigger rebalancing algorithm
RebalanceStrategy strategy = 1;
// Optional: Target proxy for adding/removing
optional string target_proxy = 2;
// Dry run mode
bool dry_run = 3;
}
message RebalancePartitionsResponse {
bool success = 1;
// Partitions to be moved
repeated PartitionMove moves = 2;
// Estimated impact
RebalanceImpact impact = 3;
}
message GetPartitionTopologyRequest {
// Empty request
}
message GetPartitionTopologyResponse {
// Current partition strategy
PartitionStrategy strategy = 1;
// Proxy → partition range mappings
repeated ProxyPartitionMapping mappings = 2;
// Partition → namespace mappings
map<int32, NamespaceList> partition_namespaces = 3;
// Cluster statistics
TopologyStatistics statistics = 4;
}
message PartitionAssignment {
string namespace = 1;
int32 partition_id = 2;
string proxy_id = 3;
google.protobuf.Timestamp assigned_at = 4;
PartitionStrategyType strategy_type = 5;
}
message ProxyPartitionMapping {
string proxy_id = 1;
string proxy_address = 2;
repeated PartitionRange ranges = 3;
int32 namespace_count = 4;
}
message PartitionRange {
int32 start = 1; // Inclusive
int32 end = 2; // Inclusive
}
message PartitionMove {
int32 partition_id = 1;
string from_proxy = 2;
string to_proxy = 3;
repeated string affected_namespaces = 4;
}
message RebalanceImpact {
int32 total_moves = 1;
int32 affected_namespaces = 2;
int64 estimated_data_transfer_bytes = 3;
google.protobuf.Duration estimated_duration = 4;
}
message PartitionStrategy {
PartitionStrategyType type = 1;
int32 partition_count = 2;
map<string, string> config = 3;
}
message NamespaceList {
repeated string namespaces = 1;
}
message TopologyStatistics {
int32 total_proxies = 1;
int32 total_partitions = 2;
int32 total_namespaces = 3;
double average_namespaces_per_proxy = 4;
double partition_distribution_variance = 5;
}
enum PartitionStrategyType {
PARTITION_STRATEGY_UNSPECIFIED = 0;
PARTITION_STRATEGY_CONSISTENT_HASH = 1;
PARTITION_STRATEGY_KEY_RANGE = 2;
PARTITION_STRATEGY_EXPLICIT = 3;
}
enum RebalanceStrategy {
REBALANCE_STRATEGY_UNSPECIFIED = 0;
REBALANCE_STRATEGY_MINIMIZE_MOVES = 1; // Fewest partitions moved
REBALANCE_STRATEGY_BALANCE_LOAD = 2; // Even distribution
REBALANCE_STRATEGY_DRAIN_PROXY = 3; // Move all partitions off proxy
}
Request Forwarding
Forwarding Modes
Mode 1: Transparent Forwarding (Recommended)
Clients connect to any proxy. Non-owner proxies forward requests to partition owner.
Client → Proxy B → Proxy A (partition owner) → Backend
↑
(transparent forwarding)
Pros:
- Simple client configuration (any proxy works)
- Load balancing across all proxies
- No client-side routing logic
Cons:
- Extra network hop for non-local requests
- Increased latency (~1-2ms)
Mode 2: Redirect with Discovery
Non-owner proxy returns "redirect" error with correct proxy address.
Client → Proxy B → Error: Redirect to Proxy A
Client → Proxy A (directly) → Backend
Pros:
- No extra hop after first request
- Client caches routing information
Cons:
- Client must implement redirect logic
- More complex client SDK
Mode 3: Client-Side Routing (Not Recommended)
Clients query admin plane for partition assignments and connect directly to owner.
Client → Admin (query) → partition → proxy address
Client → Proxy A (directly) → Backend
Pros:
- No forwarding overhead
- Optimal latency
Cons:
- Client complexity (partition calculation, caching, cache invalidation)
- Tight coupling between client and partition strategy
Decision: Use Transparent Forwarding (Mode 1) as default.
Forwarding Implementation
// prism-proxy/src/forwarding/mod.rs
use std::collections::HashMap;
use std::sync::Arc;
use tokio::sync::RwLock;
use tonic::transport::Channel;
pub struct RequestForwarder {
partition_table: Arc<RwLock<PartitionTable>>,
proxy_connections: Arc<RwLock<HashMap<String, Channel>>>,
local_proxy_id: String,
}
pub struct PartitionTable {
// partition_id → proxy_id
assignments: HashMap<u32, String>,
// proxy_id → gRPC endpoint
proxy_endpoints: HashMap<String, String>,
// namespace → partition_id (cache)
namespace_cache: HashMap<String, u32>,
}
impl RequestForwarder {
pub async fn route_request<T>(
&self,
namespace: &str,
request: T,
) -> Result<T::Response>
where
T: GrpcRequest,
{
// Lookup partition
let partition_id = self.get_partition_for_namespace(namespace).await?;
// Lookup owning proxy
let table = self.partition_table.read().await;
let owner_proxy_id = table.assignments.get(&partition_id)
.ok_or(Error::PartitionNotAssigned)?;
// Check if local
if *owner_proxy_id == self.local_proxy_id {
// Execute locally
return self.execute_local(namespace, request).await;
}
// Forward to remote proxy
self.forward_to_proxy(owner_proxy_id, namespace, request).await
}
async fn get_partition_for_namespace(&self, namespace: &str) -> Result<u32> {
// Check cache
{
let table = self.partition_table.read().await;
if let Some(&partition_id) = table.namespace_cache.get(namespace) {
return Ok(partition_id);
}
}
// Calculate partition (consistent hash)
let partition_id = self.calculate_partition(namespace);
// Cache result
{
let mut table = self.partition_table.write().await;
table.namespace_cache.insert(namespace.to_string(), partition_id);
}
Ok(partition_id)
}
fn calculate_partition(&self, namespace: &str) -> u32 {
// CRC32 hash
let hash = crc32fast::hash(namespace.as_bytes());
(hash % 256) as u32
}
async fn forward_to_proxy<T>(
&self,
proxy_id: &str,
namespace: &str,
request: T,
) -> Result<T::Response>
where
T: GrpcRequest,
{
// Get or create connection to remote proxy
let channel = self.get_proxy_connection(proxy_id).await?;
// Create gRPC client
let mut client = T::create_client(channel);
// Forward request with metadata
let mut req = tonic::Request::new(request);
req.metadata_mut().insert("x-forwarded-from", self.local_proxy_id.parse()?);
req.metadata_mut().insert("x-namespace", namespace.parse()?);
// Execute remote call
let response = client.call(req).await?;
Ok(response.into_inner())
}
async fn get_proxy_connection(&self, proxy_id: &str) -> Result<Channel> {
// Check existing connections
{
let connections = self.proxy_connections.read().await;
if let Some(channel) = connections.get(proxy_id) {
return Ok(channel.clone());
}
}
// Create new connection
let table = self.partition_table.read().await;
let endpoint = table.proxy_endpoints.get(proxy_id)
.ok_or(Error::ProxyNotFound)?;
let channel = Channel::from_shared(endpoint.clone())?
.connect()
.await?;
// Cache connection
{
let mut connections = self.proxy_connections.write().await;
connections.insert(proxy_id.to_string(), channel.clone());
}
Ok(channel)
}
// Update partition table from admin plane
pub async fn update_partition_table(&self, assignments: Vec<PartitionAssignment>) {
let mut table = self.partition_table.write().await;
for assignment in assignments {
table.assignments.insert(assignment.partition_id, assignment.proxy_id.clone());
// Also cache namespace → partition mapping
table.namespace_cache.insert(assignment.namespace, assignment.partition_id);
}
}
}
Forwarding Metadata
Forwarded requests include metadata to prevent loops and track hops:
x-forwarded-from: proxy-b # Originating proxy
x-forwarded-for: client-12.34.56 # Original client IP
x-forwarding-hop: 1 # Hop count (max 1)
x-namespace: orders-prod # Target namespace
x-partition-id: 42 # Target partition
Loop Prevention:
- Maximum 1 forwarding hop allowed
- Requests with x-forwarding-hop: 1 cannot be forwarded again
- Return error if partition assignment is incorrect
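A sketch of the hop check a receiving proxy might apply before forwarding, using the gRPC Go metadata API; the function names and error wording are assumptions, not the shipped implementation:
package forwarding

import (
	"context"
	"fmt"

	"google.golang.org/grpc/metadata"
)

// canForward reports whether this proxy may forward the request onward.
// Requests already carrying x-forwarding-hop: 1 must be executed locally
// or rejected, never forwarded a second time.
func canForward(ctx context.Context) bool {
	md, ok := metadata.FromIncomingContext(ctx)
	if !ok {
		return true // direct client request, no forwarding metadata yet
	}
	hops := md.Get("x-forwarding-hop")
	return len(hops) == 0 || hops[0] == "0"
}

// errMisrouted is returned when a forwarded request lands on a proxy that
// does not own the target partition (a sign of a stale partition table).
func errMisrouted(partition int) error {
	return fmt.Errorf("forwarded request for partition %d landed on non-owner; partition table may be stale", partition)
}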
Partition Rebalancing
Trigger Conditions
Rebalancing triggered when:
- New proxy joins cluster (admin registers new proxy)
- Proxy removed from cluster (maintenance or failure)
- Imbalance exceeds threshold (e.g., >10% variance in namespace counts)
- Manual operator request
Rebalancing Algorithm
// cmd/prism-admin/partition_rebalancer.go
package main
import (
	"context"
	"log"
	"math"
)
type PartitionRebalancer struct {
partitionManager *PartitionManager
minMoves bool // Minimize partition moves
}
func (r *PartitionRebalancer) Rebalance(ctx context.Context) (*RebalancePlan, error) {
// Get current topology
topology, err := r.partitionManager.GetTopology(ctx)
if err != nil {
return nil, err
}
// Calculate target distribution
targetPartitionsPerProxy := 256 / len(topology.Proxies)
// Calculate imbalance
variance := r.calculateVariance(topology)
if variance < 0.1 {
// Balanced enough, no action needed
return &RebalancePlan{Moves: []PartitionMove{}}, nil
}
// Generate move plan
plan := r.generateMovePlan(topology, targetPartitionsPerProxy)
return plan, nil
}
func (r *PartitionRebalancer) generateMovePlan(
topology *Topology,
targetPerProxy int,
) *RebalancePlan {
plan := &RebalancePlan{Moves: []PartitionMove{}}
// Sort proxies by partition count
overloaded := findOverloadedProxies(topology, targetPerProxy)
underloaded := findUnderloadedProxies(topology, targetPerProxy)
// Move partitions from overloaded to underloaded
for len(overloaded) > 0 && len(underloaded) > 0 {
fromProxy := overloaded[0]
toProxy := underloaded[0]
// Select partition to move (prefer partitions with fewer namespaces)
partition := selectPartitionToMove(fromProxy, r.partitionManager)
move := PartitionMove{
PartitionID: partition.ID,
FromProxy: fromProxy.ID,
ToProxy: toProxy.ID,
AffectedNamespaces: partition.Namespaces,
}
plan.Moves = append(plan.Moves, move)
// Update counts
fromProxy.PartitionCount--
toProxy.PartitionCount++
// Resort
overloaded = findOverloadedProxies(topology, targetPerProxy)
underloaded = findUnderloadedProxies(topology, targetPerProxy)
}
return plan
}
func (r *PartitionRebalancer) ExecuteRebalance(
ctx context.Context,
plan *RebalancePlan,
) error {
for _, move := range plan.Moves {
// 1. Notify target proxy to prepare for partition
err := r.notifyProxyPrepare(ctx, move.ToProxy, move.PartitionID)
if err != nil {
return err
}
// 2. Update partition table (admin storage)
err = r.partitionManager.UpdateAssignment(ctx, move.PartitionID, move.ToProxy)
if err != nil {
return err
}
// 3. Push new assignment to all proxies
err = r.partitionManager.PushAssignments(ctx)
if err != nil {
return err
}
// 4. Wait for target proxy to start pattern runners
err = r.waitForProxyReady(ctx, move.ToProxy, move.PartitionID)
if err != nil {
return err
}
// 5. Notify source proxy to stop pattern runners
err = r.notifyProxyRelease(ctx, move.FromProxy, move.PartitionID)
if err != nil {
return err
}
// Log rebalance operation
log.Printf("Moved partition %d: %s → %s (namespaces: %v)",
move.PartitionID, move.FromProxy, move.ToProxy, move.AffectedNamespaces)
}
return nil
}
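The rebalancer above calls calculateVariance without defining it. One plausible definition (an assumption, not the canonical metric) is the per-proxy partition-count deviation relative to the ideal even split, so 0.0 means perfectly balanced and the 0.1 threshold reads as 10% relative imbalance:
// calculateVariance returns the relative standard deviation of per-proxy
// partition counts against an even 256-partition split.
func (r *PartitionRebalancer) calculateVariance(topology *Topology) float64 {
	n := len(topology.Proxies)
	if n == 0 {
		return 0
	}
	ideal := 256.0 / float64(n)
	var sum float64
	for _, p := range topology.Proxies {
		d := float64(p.PartitionCount) - ideal
		sum += d * d
	}
	// Normalize by the ideal so the threshold (e.g. 0.1) is scale-free.
	return math.Sqrt(sum/float64(n)) / ideal
}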
Rebalancing Protocol
Partition Move: partition 42 from Proxy A to Proxy B
Step 1: Prepare Phase
Admin → Proxy B: PreparePartition(42, namespaces=["orders-prod"])
Proxy B: Allocate resources, start pattern runners (passive mode)
Proxy B → Admin: ACK
Step 2: Update Partition Table
Admin: UPDATE partitions SET proxy_id = 'proxy-b' WHERE partition_id = 42
Admin: Increment partition_table_version
Step 3: Distribute New Assignment
Admin → All Proxies: PartitionAssignmentUpdate(version, assignments)
All Proxies: Update local partition table, route new requests to Proxy B
Step 4: Activate Target
Admin → Proxy B: ActivatePartition(42)
Proxy B: Pattern runners switch to active mode
Proxy B → Admin: ACK
Step 5: Drain Source
Admin → Proxy A: ReleasePartition(42)
Proxy A: Gracefully stop pattern runners
Proxy A: Wait for in-flight requests to complete
Proxy A → Admin: ACK
Step 6: Confirm Complete
Admin: Log rebalance completion
Admin: Trigger garbage collection on Proxy A
Configuration
Admin Plane Configuration
# prism-admin.yaml
partition_management:
  # Partition strategy
  strategy: consistent_hash          # or key_range, explicit

  # Consistent hash settings
  consistent_hash:
    partition_count: 256
    hash_function: crc32             # or murmur3, xxhash
    rebalance_on_topology_change: true

  # Key range settings (alternative)
  key_range:
    ranges:
      - keys: "a-f"
        partition_ids: [0, 1, 2, 3, 4, 5]
      - keys: "g-m"
        partition_ids: [6, 7, 8, 9, 10, 11, 12]
      # ...

  # Rebalancing
  rebalancing:
    enabled: true
    auto_rebalance: false            # Manual trigger only
    imbalance_threshold: 0.1         # 10% variance triggers warning
    min_partition_move_interval: 5m  # Cooldown between moves
Proxy Configuration
# prism-proxy.yaml
request_forwarding:
  enabled: true
  mode: transparent        # or redirect, local_only

  # Forwarding timeouts
  forward_timeout: 30s
  max_forwarding_hops: 1

  # Connection pool to other proxies
  proxy_connections:
    max_idle: 10
    max_open: 100
    idle_timeout: 5m

# Partition table refresh
partition_table:
  refresh_interval: 30s    # Poll admin for updates
  cache_size: 10000        # Cache namespace → partition mappings
Monitoring and Metrics
Partition Distribution Metrics
# Prometheus metrics
# Partition assignments per proxy
prism_partition_count{proxy_id="proxy-a"} 64
# Namespace count per proxy
prism_namespace_count{proxy_id="proxy-a"} 127
# Partition imbalance (variance from ideal)
prism_partition_imbalance 0.05
# Forwarding metrics
prism_forwarding_requests_total{from="proxy-b",to="proxy-a"} 1234
prism_forwarding_latency_seconds{from="proxy-b",to="proxy-a",quantile="0.99"} 0.002
# Rebalancing metrics
prism_rebalance_moves_total 15
prism_rebalance_duration_seconds 45.2
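A sketch of how the forwarding metrics above could be wired with the Prometheus Go client; metric names match the examples, but the registration code is an assumption:
package metrics

import "github.com/prometheus/client_golang/prometheus"

var (
	// ForwardingRequests counts proxy-to-proxy forwarded requests.
	ForwardingRequests = prometheus.NewCounterVec(
		prometheus.CounterOpts{
			Name: "prism_forwarding_requests_total",
			Help: "Requests forwarded between proxies.",
		},
		[]string{"from", "to"},
	)
	// ForwardingLatency tracks forwarded-request latency with quantiles.
	ForwardingLatency = prometheus.NewSummaryVec(
		prometheus.SummaryOpts{
			Name:       "prism_forwarding_latency_seconds",
			Help:       "Latency of proxy-to-proxy forwarded requests.",
			Objectives: map[float64]float64{0.5: 0.05, 0.99: 0.001},
		},
		[]string{"from", "to"},
	)
)

func init() {
	prometheus.MustRegister(ForwardingRequests, ForwardingLatency)
}
Callers would then record each forward with ForwardingRequests.WithLabelValues(fromProxy, toProxy).Inc() and observe elapsed seconds on ForwardingLatency.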
Health Checks
# Check partition topology
grpcurl -d '{}' localhost:8981 prism.admin.v1.PartitionManagementService/GetPartitionTopology
# Check partition assignment for namespace
grpcurl -d '{"namespace":"orders-prod"}' localhost:8981 \
prism.admin.v1.PartitionManagementService/GetPartitionAssignment
# List all assignments
grpcurl -d '{}' localhost:8981 \
prism.admin.v1.PartitionManagementService/ListPartitionAssignments
Migration and Rollout
Phase 1: Single Proxy (No Partitioning)
Existing deployments with single proxy:
- No partition assignment needed
- All namespaces execute locally
- No forwarding required
Phase 2: Enable Partitioning (Multi-Proxy)
Steps to enable partitioning:
1. Deploy Admin Plane Updates
   - Add partition management tables
   - Deploy partition assignment logic
   - Keep forwarding disabled initially
2. Assign Existing Namespaces
   - Run backfill job to assign partitions to existing namespaces
   - Use consistent hashing strategy
   - All namespaces initially assigned to Proxy A
3. Enable Forwarding
   - Update proxy config: request_forwarding.enabled: true
   - Proxies start accepting forwarded requests
   - No routing changes yet (all on Proxy A)
4. First Rebalancing
   - Manually trigger rebalance
   - Move partitions to other proxies
   - Verify forwarding works correctly
5. Enable Auto-Rebalancing
   - Set rebalancing.auto_rebalance: true
   - Admin automatically balances on topology changes
Related Documents
- RFC-047: Namespace Reservation with Lease Management - Companion RFC for namespace reservation
- ADR-055: Proxy-Admin Control Plane - Control plane protocol with partition distribution
- ADR-006: Namespace and Multi-Tenancy - Namespace isolation
- RFC-043: Kubernetes Deployment Patterns - Kubernetes partitioning considerations
- Consistent Hashing (Wikipedia)
Revision History
- 2025-10-25: Initial draft - Cross-proxy partition strategies with consistent hashing, key range, and explicit mapping