RFC-043: Kubernetes Deployment Patterns and Scaling Strategies
Summary
Define deployment patterns (StatefulSet vs Deployment) and autoscaling strategies for Prism components in Kubernetes, with focus on backend binding, data locality, and network security.
Context
The PrismStack controller currently treats all components as Deployments with basic replica configuration. However, different components have different statefulness requirements, scaling characteristics, and data locality needs.
Key Questions
- Component Patterns: Which components should be StatefulSets vs Deployments?
- Autoscaling: KEDA vs operator-driven vs launcher-based scaling?
- Backend Binding: How do pattern runners bind to backends with data locality?
- Network Topology: Where do runners run relative to data sources?
- Scaling Triggers: What metrics drive scaling decisions?
Current Implementation
PrismStack:
admin: Deployment (3 replicas) # Should be StatefulSet?
proxy: Deployment (3 replicas) # Correct
webConsole: Deployment (2 replicas) # Correct
patterns:
- keyvalue: Deployment (2 replicas) # Should be StatefulSet?
Problems:
- Admin needs stable identity for Raft consensus
- Pattern runners may need persistent connections to backends
- No backend binding mechanism
- No data locality consideration
- Generic autoscaling doesn't account for pattern-specific metrics
Proposed Solution
1. Component Deployment Patterns
Admin Control Plane: StatefulSet
Rationale: Raft consensus requires stable network identities and persistent storage for log replication.
apiVersion: apps/v1
kind: StatefulSet
metadata:
name: prism-admin
spec:
serviceName: prism-admin-headless # Stable network IDs
replicas: 3
selector:
matchLabels:
app: prism-admin
template:
metadata:
labels:
app: prism-admin
spec:
containers:
- name: admin
image: ghcr.io/prism/prism-admin:latest
args:
- --node-id=$(POD_NAME) # Pod name as Raft node ID (POD_NAME injected via the downward API)
- --peers=prism-admin-0.prism-admin-headless:8981,prism-admin-1.prism-admin-headless:8981,prism-admin-2.prism-admin-headless:8981
volumeMounts:
- name: data
mountPath: /var/lib/prism/raft
volumeClaimTemplates:
- metadata:
name: data
spec:
accessModes: ["ReadWriteOnce"]
resources:
requests:
storage: 1Gi
Features:
- Stable pod names: prism-admin-0, prism-admin-1, prism-admin-2
- Headless service for direct pod DNS
- Persistent volumes for Raft logs
- Ordered deployment/scaling
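The StatefulSet references prism-admin-headless but the Service itself is not shown above. A minimal sketch, assuming the Raft peer port 8981 from the --peers list:
apiVersion: v1
kind: Service
metadata:
  name: prism-admin-headless
spec:
  clusterIP: None                  # Headless: gives each pod a stable DNS record
  publishNotReadyAddresses: true   # Let Raft peers resolve each other before pods are Ready
  selector:
    app: prism-admin
  ports:
  - name: raft
    port: 8981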
Proxy Data Plane: Deployment + HPA
Rationale: Proxies are stateless routers that scale based on request load.
apiVersion: apps/v1
kind: Deployment
metadata:
name: prism-proxy
spec:
replicas: 3 # Managed by HPA
selector:
matchLabels:
app: prism-proxy
template:
metadata:
labels:
app: prism-proxy
spec:
containers:
- name: proxy
image: ghcr.io/prism/prism-proxy:latest
---
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
name: prism-proxy-hpa
spec:
scaleTargetRef:
apiVersion: apps/v1
kind: Deployment
name: prism-proxy
minReplicas: 3
maxReplicas: 20
metrics:
- type: Resource
resource:
name: cpu
target:
type: Utilization
averageUtilization: 75
- type: Pods
pods:
metric:
name: grpc_requests_per_second
target:
type: AverageValue
averageValue: "1000"
Features:
- Stateless (can scale up/down freely)
- HPA based on CPU + custom metrics
- Service load balancing across replicas
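The grpc_requests_per_second pods metric above is not built into Kubernetes; it has to come from a custom metrics adapter. A possible prometheus-adapter rule, assuming the proxy exports a grpc_requests_total counter that Prometheus scrapes (both names are assumptions):
rules:
- seriesQuery: 'grpc_requests_total{namespace!="",pod!=""}'
  resources:
    overrides:
      namespace: {resource: "namespace"}
      pod: {resource: "pod"}
  name:
    matches: "^grpc_requests_total$"
    as: "grpc_requests_per_second"
  # Convert the counter into a per-second rate so the HPA target of "1000" applies
  metricsQuery: 'sum(rate(<<.Series>>{<<.LabelMatchers>>}[2m])) by (<<.GroupBy>>)'
Without an adapter rule like this, the HPA falls back to the CPU target alone.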
Web Console: Deployment + HPA
Rationale: Web console is stateless UI layer.
apiVersion: apps/v1
kind: Deployment
metadata:
name: prism-web-console
spec:
replicas: 2 # Managed by HPA
# ... same pattern as proxy
Pattern Runners: Deployment or StatefulSet?
Decision Matrix:
| Pattern Type | Deployment | StatefulSet | Rationale |
|---|---|---|---|
| Consumer | ❌ | ✅ | Needs stable consumer group membership, offset management |
| Producer | ✅ | ❌ | Stateless producers, no identity needed |
| KeyValue | ✅ | ❌ | Stateless request/response |
| Mailbox | ❌ | ✅ | Persistent message ownership, FIFO guarantees |
Consumer as StatefulSet:
apiVersion: apps/v1
kind: StatefulSet
metadata:
name: consumer-kafka-orders
spec:
serviceName: consumer-kafka-orders-headless
replicas: 5
selector:
matchLabels:
app: consumer-kafka
pattern: orders
template:
metadata:
labels:
app: consumer-kafka
pattern: orders
spec:
containers:
- name: consumer
image: ghcr.io/prism/consumer-runner:latest
env:
- name: CONSUMER_ID
valueFrom:
fieldRef:
fieldPath: metadata.name # Stable consumer ID
- name: KAFKA_BOOTSTRAP
value: "postgres-postgresql:9092"
- name: CONSUMER_GROUP
value: "prism-orders"
Why StatefulSet for Consumer:
- Stable identity for consumer group coordination
- Predictable partition assignment
- Graceful rebalancing on scale up/down
- Persistent offset tracking (if using local storage)
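If the consumer runner supports Kafka static group membership (group.instance.id, KIP-345), the stable pod name can also be reused there so a restarted pod reclaims its partitions without a full rebalance. A sketch of the extra environment variable, assuming the runner exposes a KAFKA_GROUP_INSTANCE_ID knob (an assumption about its configuration surface):
env:
- name: KAFKA_GROUP_INSTANCE_ID
  valueFrom:
    fieldRef:
      fieldPath: metadata.name   # e.g. consumer-kafka-orders-3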
Producer as Deployment:
apiVersion: apps/v1
kind: Deployment
metadata:
name: producer-kafka-events
spec:
replicas: 3 # Can scale freely
selector:
matchLabels:
app: producer-kafka
template:
# ... stateless producer
2. Autoscaling Strategies
Option A: KEDA (Event-Driven Autoscaling)
Pros:
- Kubernetes-native, battle-tested
- 60+ scalers (Kafka, NATS, SQS, Postgres, etc.)
- Scales to zero
- External metrics without custom code
Cons:
- Additional dependency (KEDA operator)
- Limited to supported scalers
- Can't leverage Prism admin metrics directly
Example: Consumer scaling on Kafka lag:
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
name: consumer-kafka-orders-scaler
spec:
scaleTargetRef:
kind: StatefulSet
name: consumer-kafka-orders
pollingInterval: 10
cooldownPeriod: 300
minReplicaCount: 2
maxReplicaCount: 50
triggers:
- type: kafka
metadata:
bootstrapServers: kafka:9092
consumerGroup: prism-orders
topic: orders
lagThreshold: "1000" # Scale up if lag > 1000 msgs
offsetResetPolicy: latest
Scaling Behavior:
- Lag < 1000: Scale down (respecting cooldown)
- Lag > 1000: Scale up, targeting roughly one replica per 1000 messages of lag (e.g., a total lag of 4,500 yields 5 replicas, capped at maxReplicaCount)
- Lag = 0 for extended period: Scale to minReplicaCount
Option B: Operator-Driven Autoscaling
Pros:
- Can leverage Prism admin metrics
- Pattern-specific scaling logic
- Deep integration with Prism semantics
- No external dependencies
Cons:
- More code to maintain
- Must implement metric collection
- Reinventing KEDA for common cases
Example: Custom PrismAutoscaler CRD:
apiVersion: prism.io/v1alpha1
kind: PrismAutoscaler
metadata:
name: consumer-orders-autoscaler
spec:
targetRef:
kind: StatefulSet
name: consumer-kafka-orders
minReplicas: 2
maxReplicas: 50
metrics:
- type: AdminMetric
adminMetric:
metricName: "pattern.consumer.lag"
target:
type: AverageValue
averageValue: "1000"
- type: AdminMetric
adminMetric:
metricName: "pattern.consumer.processing_time_p99"
target:
type: Value
value: "5s" # Scale up if p99 > 5s
Implementation: Operator queries admin gRPC API for metrics, calculates desired replicas, updates StatefulSet.
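A minimal sketch of that reconcile step, assuming a hypothetical AdminClient.QueryMetric helper for the admin gRPC API and treating the metric target as a plain number (imports, CRD type definitions, and status handling elided):
func (r *PrismAutoscalerReconciler) Reconcile(ctx context.Context, req ctrl.Request) (ctrl.Result, error) {
	var scaler prismv1alpha1.PrismAutoscaler
	if err := r.Get(ctx, req.NamespacedName, &scaler); err != nil {
		return ctrl.Result{}, client.IgnoreNotFound(err)
	}

	// Query the admin control plane for the pattern metric (hypothetical client).
	m := scaler.Spec.Metrics[0].AdminMetric
	current, err := r.AdminClient.QueryMetric(ctx, m.MetricName)
	if err != nil {
		return ctrl.Result{RequeueAfter: 30 * time.Second}, err
	}

	// One replica per "target" unit of the metric, clamped to the configured bounds.
	desired := int32(math.Ceil(current / m.TargetValue))
	if desired < scaler.Spec.MinReplicas {
		desired = scaler.Spec.MinReplicas
	}
	if desired > scaler.Spec.MaxReplicas {
		desired = scaler.Spec.MaxReplicas
	}

	// Patch the target StatefulSet's replica count.
	var sts appsv1.StatefulSet
	key := types.NamespacedName{Namespace: req.Namespace, Name: scaler.Spec.TargetRef.Name}
	if err := r.Get(ctx, key, &sts); err != nil {
		return ctrl.Result{}, err
	}
	sts.Spec.Replicas = &desired
	if err := r.Update(ctx, &sts); err != nil {
		return ctrl.Result{}, err
	}

	return ctrl.Result{RequeueAfter: 30 * time.Second}, nil
}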
Option C: prism-launcher (VM-Oriented)
Pros:
- Already implemented
- Works for single-tenant VM deployments
Cons:
- Cuts against Kubernetes primitives
- Doesn't leverage K8s autoscaling
- Complicates networking (launcher needs K8s API access)
- Not cloud-native
Verdict: Use launcher for VM deployments, not Kubernetes.
Recommendation: Hybrid Approach
For standard patterns (Kafka, NATS, SQS):
- Use KEDA for event-driven scaling
- Leverage 60+ built-in scalers
- Standard Kubernetes HPA for CPU/memory
For Prism-specific patterns:
- Implement PrismAutoscaler CRD
- Query admin control plane metrics
- Pattern-specific scaling logic
Example Stack Configuration:
apiVersion: prism.io/v1alpha1
kind: PrismStack
metadata:
name: production
spec:
patterns:
- name: consumer-orders
type: consumer
backend: kafka
autoscaling:
enabled: true
strategy: keda # Use KEDA for Kafka
minReplicas: 2
maxReplicas: 50
triggers:
- type: kafka
metadata:
bootstrapServers: kafka:9092
consumerGroup: prism-orders
topic: orders
lagThreshold: "1000"
- name: mailbox-users
type: mailbox
backend: postgres
autoscaling:
enabled: true
strategy: admin # Use admin metrics for custom pattern
minReplicas: 1
maxReplicas: 20
metrics:
- type: AdminMetric
name: "mailbox.queue_depth"
target: 100
3. Backend Binding and Data Locality
Problem Statement
Pattern runners need to access backends (Postgres, Kafka, etc.) with:
- Data locality: Minimize network hops
- Security: Proper namespace isolation and credentials
- Simplicity: Easy to "bind" backend to pattern
Example: Deploy Postgres via Helm, bind to pattern:
# Deploy Postgres to data namespace
helm install postgres bitnami/postgresql -n data-postgres --create-namespace
# How does pattern runner discover and connect?
Solution: Backend Binding via Labels and Services
1. Backend Resource: Deploy backends in their own namespaces
apiVersion: v1
kind: Namespace
metadata:
name: data-postgres
labels:
prism.io/backend-type: postgres
prism.io/backend-name: main-db
---
apiVersion: v1
kind: Service
metadata:
name: postgres
namespace: data-postgres
labels:
prism.io/backend-type: postgres
annotations:
prism.io/connection-string: "postgres:5432"
spec:
selector:
app.kubernetes.io/name: postgresql
ports:
- port: 5432
2. Backend Binding in PrismStack:
apiVersion: prism.io/v1alpha1
kind: PrismStack
metadata:
name: production
spec:
backends:
- name: main-db
type: postgres
# Option A: Explicit connection
connectionString: "postgres.data-postgres.svc:5432"
secretRef:
name: postgres-credentials
namespace: data-postgres
# Option B: Service discovery
serviceRef:
name: postgres
namespace: data-postgres
# Data locality: Deploy runners in same namespace
dataLocality:
strategy: collocate # Deploy in same namespace as backend
namespace: data-postgres
patterns:
- name: consumer-orders
type: consumer
backend: main-db # Binds to backend above
replicas: 5
3. Operator Behavior:
func (r *PrismStackReconciler) reconcilePattern(ctx context.Context, stack *PrismStack, pattern PatternSpec) error {
	// Find the backend binding referenced by the pattern
	backend := findBackend(stack.Spec.Backends, pattern.Backend)
	if backend == nil {
		return fmt.Errorf("pattern %q references unknown backend %q", pattern.Name, pattern.Backend)
	}

	// Determine namespace for the pattern runner
	namespace := stack.Namespace // Default: same namespace as the stack
	if backend.DataLocality.Strategy == "collocate" {
		namespace = backend.DataLocality.Namespace
	}

	// Create StatefulSet in the backend namespace for data locality
	// (selector, serviceName, image, and volumeClaimTemplates elided for brevity)
	statefulSet := &appsv1.StatefulSet{
		ObjectMeta: metav1.ObjectMeta{
			Name:      fmt.Sprintf("%s-%s", stack.Name, pattern.Name),
			Namespace: namespace, // Deploy near the data!
		},
		Spec: appsv1.StatefulSetSpec{
			Template: corev1.PodTemplateSpec{
				Spec: corev1.PodSpec{
					Containers: []corev1.Container{{
						Name: "runner",
						Env: []corev1.EnvVar{
							{Name: "BACKEND_TYPE", Value: backend.Type},
							{Name: "CONNECTION_STRING", Value: backend.ConnectionString},
							{Name: "PROXY_ENDPOINT", Value: getProxyService(stack)},
						},
						EnvFrom: []corev1.EnvFromSource{{
							SecretRef: &corev1.SecretEnvSource{
								LocalObjectReference: corev1.LocalObjectReference{
									Name: backend.SecretRef.Name,
								},
							},
						}},
					}},
				},
			},
		},
	}

	return r.Create(ctx, statefulSet)
}
4. Network Topology:
┌─────────────────────────────────────────────────────────┐
│ Namespace: prism-system │
│ ┌──────────────┐ ┌──────────────┐ │
│ │ Admin │ │ Proxy │ │
│ │ StatefulSet │◀──────│ Deployment │ │
│ │ (3 replicas) │ │ (3 replicas) │ │
│ └──────────────┘ └───────┬──────┘ │
│ │ │
└──────────────────────────────────┼───────────────────────┘
│
gRPC Pattern Requests
│
┌─────────────────────────┼─────────────────────┐
│ ▼ │
│ Namespace: data-postgres (Data Locality) │
│ ┌──────────────────────────────────┐ │
│ │ Consumer Pattern (StatefulSet) │ │
│ │ - consumer-0 │ │
│ │ - consumer-1 │ │
│ │ - consumer-2 │ │
│ └────────────┬─────────────────────┘ │
│ │ localhost/pod network │
│ │ (minimal latency) │
│ ▼ │
│ ┌──────────────────────────────────┐ │
│ │ PostgreSQL (Helm Chart) │ │
│ │ - postgres-0 │ │
│ │ - postgres-1 (replica) │ │
│ └──────────────────────────────────┘ │
│ │
└───────────────────────────────────────────────┘
Benefits:
- Pattern runners in same namespace as backend (data locality)
- NetworkPolicy can restrict access to backend namespace
- Secrets scoped to backend namespace
- Minimal network hops (pod-to-pod on same node if possible)
Security: NetworkPolicy Example
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
name: postgres-access
namespace: data-postgres
spec:
podSelector:
matchLabels:
app.kubernetes.io/name: postgresql
policyTypes:
- Ingress
ingress:
# Allow from pattern runners in same namespace
- from:
- podSelector:
matchLabels:
prism.io/component: pattern
ports:
- protocol: TCP
port: 5432
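A possible companion policy for the runner pods themselves, restricting ingress to the proxy in prism-system. It assumes the proxy pods carry the app: prism-proxy label, the namespace carries the standard kubernetes.io/metadata.name label, and the runners serve pattern gRPC on port 8980 (all assumptions):
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: pattern-runner-ingress
  namespace: data-postgres
spec:
  podSelector:
    matchLabels:
      prism.io/component: pattern
  policyTypes:
  - Ingress
  ingress:
  # Allow gRPC pattern requests only from the proxy pods in prism-system
  - from:
    - namespaceSelector:
        matchLabels:
          kubernetes.io/metadata.name: prism-system
      podSelector:
        matchLabels:
          app: prism-proxy
    ports:
    - protocol: TCP
      port: 8980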
4. Scaling Triggers and Metrics
Pattern-Specific Metrics
| Pattern | Primary Metric | Secondary Metric | Scaling Threshold |
|---|---|---|---|
| Consumer | Kafka lag | Processing time p99 | Lag > 1000 msgs |
| Producer | CPU utilization | Throughput | CPU > 75% |
| KeyValue | Request rate | Latency p99 | Requests > 1000/s |
| Mailbox | Queue depth | Message age | Queue > 100 msgs |
KEDA ScaledObject Examples
Consumer (Kafka Lag):
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
name: consumer-scaler
spec:
scaleTargetRef:
kind: StatefulSet
name: consumer-kafka-orders
triggers:
- type: kafka
metadata:
bootstrapServers: kafka:9092
consumerGroup: prism-orders
topic: orders
lagThreshold: "1000"
Consumer (NATS Queue Depth):
triggers:
- type: nats-jetstream
metadata:
natsServerMonitoringEndpoint: "nats:8222"
stream: "orders"
consumer: "prism-orders"
lagThreshold: "1000"
Producer (CPU):
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
name: producer-hpa
spec:
scaleTargetRef:
kind: Deployment
name: producer-kafka-events
metrics:
- type: Resource
resource:
name: cpu
target:
type: Utilization
averageUtilization: 75
Mailbox (Admin Metrics via PrismAutoscaler):
apiVersion: prism.io/v1alpha1
kind: PrismAutoscaler
metadata:
name: mailbox-scaler
spec:
targetRef:
kind: StatefulSet
name: mailbox-users
metrics:
- type: AdminMetric
adminMetric:
endpoint: "prism-admin:8981"
query:
pattern: "mailbox-users"
metric: "queue_depth"
target:
type: AverageValue
averageValue: "100" # Scale up if avg queue > 100
5. Updated PrismStack CRD
apiVersion: prism.io/v1alpha1
kind: PrismStack
metadata:
name: production
namespace: prism-system
spec:
# Admin: StatefulSet with persistent storage
admin:
enabled: true
kind: StatefulSet # NEW
replicas: 3
storage:
size: 1Gi
storageClass: fast-ssd
# Proxy: Deployment with HPA
proxy:
kind: Deployment # NEW (default)
replicas: 3
autoscaling:
enabled: true
strategy: hpa
minReplicas: 3
maxReplicas: 20
targetCPUUtilization: 75
# Web Console: Deployment with HPA
webConsole:
enabled: true
kind: Deployment
replicas: 2
autoscaling:
enabled: true
strategy: hpa
minReplicas: 2
maxReplicas: 10
# Backends with data locality
backends:
- name: main-db
type: postgres
serviceRef:
name: postgres
namespace: data-postgres
secretRef:
name: postgres-creds
namespace: data-postgres
dataLocality:
strategy: collocate
namespace: data-postgres
- name: event-bus
type: kafka
serviceRef:
name: kafka
namespace: data-kafka
dataLocality:
strategy: collocate
namespace: data-kafka
# Patterns with backend binding and autoscaling
patterns:
- name: consumer-orders
type: consumer
kind: StatefulSet # NEW
backend: event-bus
replicas: 5
autoscaling:
enabled: true
strategy: keda
minReplicas: 2
maxReplicas: 50
triggers:
- type: kafka
metadata:
bootstrapServers: kafka.data-kafka.svc:9092
consumerGroup: prism-orders
topic: orders
lagThreshold: "1000"
- name: producer-events
type: producer
kind: Deployment # Stateless
backend: event-bus
replicas: 3
autoscaling:
enabled: true
strategy: hpa
minReplicas: 3
maxReplicas: 15
targetCPUUtilization: 75
- name: keyvalue-cache
type: keyvalue
kind: Deployment
backend: redis-cache
replicas: 5
- name: mailbox-users
type: mailbox
kind: StatefulSet
backend: main-db
replicas: 2
autoscaling:
enabled: true
strategy: admin # Use admin metrics
minReplicas: 1
maxReplicas: 20
metrics:
- type: AdminMetric
name: "mailbox.queue_depth"
target: 100
Decision
Adopt the following deployment patterns:
Component Types
- Admin: StatefulSet with persistent volumes for Raft
- Proxy: Deployment with HPA (CPU + custom metrics)
- Web Console: Deployment with HPA
- Consumer Pattern: StatefulSet for stable identity
- Producer Pattern: Deployment for stateless operation
- KeyValue Pattern: Deployment for stateless requests
- Mailbox Pattern: StatefulSet for message ownership
Autoscaling Strategy
Hybrid Approach:
- KEDA for standard backends (Kafka, NATS, SQS) - event-driven scaling
- HPA for CPU/memory-based scaling (Proxy, Producer, KeyValue)
- PrismAutoscaler (future) for admin-driven metrics (Mailbox, custom patterns)
- No prism-launcher in Kubernetes (use for VM deployments)
Backend Binding
- Deploy backends in dedicated namespaces (e.g., data-postgres)
- Pattern runners deployed in backend namespace for data locality
- Service discovery via Kubernetes DNS
- Secrets scoped to backend namespace
- NetworkPolicy for security boundaries
Implementation Phases
Phase 1: Basic Deployment Patterns (Current sprint)
- Convert Admin to StatefulSet
- Keep Proxy/WebConsole as Deployments
- Add kind field to PrismStack CRD
Phase 2: KEDA Integration (Next sprint)
- Install KEDA operator
- Support Consumer scaling via Kafka lag
- Support NATS, SQS scalers
Phase 3: Backend Binding (Sprint 3)
- Implement backend service discovery
- Data locality with namespace colocation
- NetworkPolicy templates
Phase 4: PrismAutoscaler (Sprint 4)
- Custom CRD for admin-driven metrics
- Query admin control plane
- Pattern-specific scaling logic
Consequences
Positive
For Operators:
- Clear separation of stateful vs stateless components
- Kubernetes-native autoscaling (battle-tested)
- Data locality reduces latency and improves security
- Backend binding simplifies deployment (Helm + bind)
For Developers:
- Standard Kubernetes patterns (StatefulSet, Deployment, HPA, KEDA)
- No custom launcher complexity in K8s
- Easy to reason about scaling behavior
- Namespace-based security boundaries
For Performance:
- Data locality minimizes network hops
- Pattern-specific scaling metrics
- Efficient autoscaling (KEDA scales to zero)
Negative
Complexity:
- StatefulSet management more complex than Deployment
- KEDA adds another operator dependency
- Backend binding requires namespace coordination
- More CRD fields to configure
Operational:
- Must coordinate backend deployments with Prism
- NetworkPolicy management across namespaces
- Secret propagation to backend namespaces
Migration:
- Existing Deployment-based Admin must migrate to StatefulSet
- Data migration for Raft logs
- Downtime during conversion
Neutral
Alternatives Considered:
- All Deployments: Simpler but loses Raft identity, consumer stability
- All StatefulSets: Overly conservative, slower scaling
- Launcher-based: Not Kubernetes-native, adds complexity
- Pure HPA: Misses event-driven scaling opportunities
Open Questions
- Admin Migration: How to migrate existing Deployment-based Admin to StatefulSet without downtime?
- Rolling upgrade with Raft leadership transfer?
- Blue/green with data copy?
- Cross-Namespace Owner References: Kubernetes doesn't allow owner references across namespaces. How to handle PrismStack owning resources in data-postgres?
- Use labels + custom finalizer logic?
- Separate PrismPattern CRD per namespace?
- KEDA Scalability: Does KEDA handle 100+ ScaledObjects in a cluster?
- Need load testing
- Alternative: Single ScaledObject per backend type with multiple triggers?
- PrismAutoscaler Priority: When do we implement custom autoscaling vs relying on KEDA?
- Start with KEDA for common cases
- Add PrismAutoscaler only when KEDA insufficient
References
- Kubernetes StatefulSets
- KEDA Documentation
- KEDA Scalers - 60+ supported scalers
- Kubernetes HPA
- NetworkPolicy
- ADR-037: Kubernetes Operator with CRDs
- RFC-017: Multicast Registry Pattern (backend binding concepts)
Next Steps
- Update prism-operator/api/v1alpha1/prismstack_types.go (see the type sketch after this list):
- Add Kind field (StatefulSet | Deployment)
- Add Storage spec for StatefulSet volumes
- Add DataLocality to BackendSpec
- Update prism-operator/controllers/prismstack_controller.go:
- Implement reconcileAdminStatefulSet()
- Support backend namespace colocation
- Handle cross-namespace resources
- Create KEDA integration:
- Add ScaledObject reconciliation
- Support common scalers (Kafka, NATS, SQS)
- Document migration guide:
- Deployment → StatefulSet for Admin
- Data migration procedures
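A rough sketch of the new CRD fields from the first step above, as Go types; the surrounding struct names (ComponentSpec, ServiceReference, SecretReference) are assumptions about the existing prismstack_types.go layout, and existing fields are abbreviated:
// Sketch only; existing fields and most kubebuilder markers are omitted.
type ComponentSpec struct {
	// Kind selects the workload type: "StatefulSet" or "Deployment" (default).
	// +kubebuilder:validation:Enum=StatefulSet;Deployment
	Kind string `json:"kind,omitempty"`

	Replicas int32 `json:"replicas,omitempty"`

	// Storage is only honored when Kind is StatefulSet.
	Storage *StorageSpec `json:"storage,omitempty"`
}

type StorageSpec struct {
	Size         string `json:"size"`                   // e.g. "1Gi"
	StorageClass string `json:"storageClass,omitempty"` // e.g. "fast-ssd"
}

type BackendSpec struct {
	Name         string            `json:"name"`
	Type         string            `json:"type"`
	ServiceRef   *ServiceReference `json:"serviceRef,omitempty"`
	SecretRef    *SecretReference  `json:"secretRef,omitempty"`
	DataLocality *DataLocality     `json:"dataLocality,omitempty"`
}

type DataLocality struct {
	// Strategy "collocate" deploys pattern runners into the backend's namespace.
	Strategy  string `json:"strategy"`
	Namespace string `json:"namespace"`
}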