
RFC-043: Kubernetes Deployment Patterns and Scaling Strategies

Summary

Define deployment patterns (StatefulSet vs Deployment) and autoscaling strategies for Prism components in Kubernetes, with a focus on backend binding, data locality, and network security.

Context

The PrismStack controller currently treats all components as Deployments with basic replica configuration. However, different components have different statefulness requirements, scaling characteristics, and data locality needs.

Key Questions

  1. Component Patterns: Which components should be StatefulSets vs Deployments?
  2. Autoscaling: KEDA vs operator-driven vs launcher-based scaling?
  3. Backend Binding: How do pattern runners bind to backends with data locality?
  4. Network Topology: Where do runners run relative to data sources?
  5. Scaling Triggers: What metrics drive scaling decisions?

Current Implementation

PrismStack:
  admin: Deployment (3 replicas)        # Should be StatefulSet?
  proxy: Deployment (3 replicas)        # Correct
  webConsole: Deployment (2 replicas)   # Correct
  patterns:
    - keyvalue: Deployment (2 replicas) # Should be StatefulSet?

Problems:

  • Admin needs stable identity for Raft consensus
  • Pattern runners may need persistent connections to backends
  • No backend binding mechanism
  • No data locality consideration
  • Generic autoscaling doesn't account for pattern-specific metrics

Proposed Solution

1. Component Deployment Patterns

Admin Control Plane: StatefulSet

Rationale: Raft consensus requires stable network identities and persistent storage for log replication.

apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: prism-admin
spec:
  serviceName: prism-admin-headless  # Stable network IDs
  replicas: 3
  selector:
    matchLabels:
      app: prism-admin
  template:
    metadata:
      labels:
        app: prism-admin
    spec:
      containers:
      - name: admin
        image: ghcr.io/prism/prism-admin:latest
        env:
        - name: POD_NAME  # Required for $(POD_NAME) expansion in args
          valueFrom:
            fieldRef:
              fieldPath: metadata.name
        args:
        - --node-id=$(POD_NAME)  # Use pod name as Raft node ID
        - --peers=prism-admin-0.prism-admin-headless:8981,prism-admin-1.prism-admin-headless:8981,prism-admin-2.prism-admin-headless:8981
        volumeMounts:
        - name: data
          mountPath: /var/lib/prism/raft
  volumeClaimTemplates:
  - metadata:
      name: data
    spec:
      accessModes: ["ReadWriteOnce"]
      resources:
        requests:
          storage: 1Gi

Features:

  • Stable pod names: prism-admin-0, prism-admin-1, prism-admin-2
  • Headless service for direct pod DNS
  • Persistent volumes for Raft logs
  • Ordered deployment/scaling
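
The serviceName above refers to a headless Service that is not defined elsewhere in this RFC; a minimal sketch (the Raft port 8981 is taken from the --peers addresses above):

apiVersion: v1
kind: Service
metadata:
  name: prism-admin-headless
  namespace: prism-system
spec:
  clusterIP: None  # Headless: gives each pod a stable DNS record
  selector:
    app: prism-admin
  ports:
  - name: raft
    port: 8981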

Proxy Data Plane: Deployment + HPA

Rationale: Proxies are stateless routers that scale based on request load.

apiVersion: apps/v1
kind: Deployment
metadata:
  name: prism-proxy
spec:
  replicas: 3  # Managed by HPA
  selector:
    matchLabels:
      app: prism-proxy
  template:
    metadata:
      labels:
        app: prism-proxy
    spec:
      containers:
      - name: proxy
        image: ghcr.io/prism/prism-proxy:latest
---
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: prism-proxy-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: prism-proxy
  minReplicas: 3
  maxReplicas: 20
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 75
  - type: Pods
    pods:
      metric:
        name: grpc_requests_per_second
      target:
        type: AverageValue
        averageValue: "1000"

Features:

  • Stateless (can scale up/down freely)
  • HPA based on CPU + custom metrics
  • Service load balancing across replicas
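
The Service doing that load balancing is not shown in this RFC; a minimal sketch, assuming the proxy serves gRPC on 8980 (its listen port is not specified here):

apiVersion: v1
kind: Service
metadata:
  name: prism-proxy
  namespace: prism-system
spec:
  selector:
    app: prism-proxy
  ports:
  - name: grpc
    port: 8980        # Assumed port; adjust to the proxy's actual listen port
    targetPort: 8980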

Web Console: Deployment + HPA

Rationale: The web console is a stateless UI layer.

apiVersion: apps/v1
kind: Deployment
metadata:
  name: prism-web-console
spec:
  replicas: 2  # Managed by HPA
  # ... same pattern as proxy

Pattern Runners: Deployment or StatefulSet?

Decision Matrix:

Pattern Type | Deployment | StatefulSet | Rationale
Consumer     |            | ✓           | Needs stable consumer group membership, offset management
Producer     | ✓          |             | Stateless producers, no identity needed
KeyValue     | ✓          |             | Stateless request/response
Mailbox      |            | ✓           | Persistent message ownership, FIFO guarantees

Consumer as StatefulSet:

apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: consumer-kafka-orders
spec:
  serviceName: consumer-kafka-orders-headless
  replicas: 5
  selector:
    matchLabels:
      app: consumer-kafka
      pattern: orders
  template:
    metadata:
      labels:
        app: consumer-kafka
        pattern: orders
    spec:
      containers:
      - name: consumer
        image: ghcr.io/prism/consumer-runner:latest
        env:
        - name: CONSUMER_ID
          valueFrom:
            fieldRef:
              fieldPath: metadata.name  # Stable consumer ID
        - name: KAFKA_BOOTSTRAP
          value: "kafka:9092"
        - name: CONSUMER_GROUP
          value: "prism-orders"

Why StatefulSet for Consumer:

  • Stable identity for consumer group coordination
  • Predictable partition assignment
  • Graceful rebalancing on scale up/down
  • Persistent offset tracking (if using local storage)
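
As with the admin, the serviceName in the consumer StatefulSet points at a headless Service that is not defined in this RFC; a minimal sketch (the runner port is an assumption):

apiVersion: v1
kind: Service
metadata:
  name: consumer-kafka-orders-headless
spec:
  clusterIP: None  # Headless: stable per-pod DNS for consumer group membership
  selector:
    app: consumer-kafka
    pattern: orders
  ports:
  - name: grpc
    port: 8980  # Assumed runner port; not specified in this RFC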

Producer as Deployment:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: producer-kafka-events
spec:
  replicas: 3  # Can scale freely
  selector:
    matchLabels:
      app: producer-kafka
  template:
    # ... stateless producer

2. Autoscaling Strategies

Option A: KEDA (Event-Driven Autoscaling)

Pros:

  • Kubernetes-native, battle-tested
  • 60+ scalers (Kafka, NATS, SQS, Postgres, etc.)
  • Scales to zero
  • External metrics without custom code

Cons:

  • Additional dependency (KEDA operator)
  • Limited to supported scalers
  • Can't leverage Prism admin metrics directly

Example: Consumer scaling on Kafka lag:

apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: consumer-kafka-orders-scaler
spec:
  scaleTargetRef:
    kind: StatefulSet
    name: consumer-kafka-orders
  pollingInterval: 10
  cooldownPeriod: 300
  minReplicaCount: 2
  maxReplicaCount: 50
  triggers:
  - type: kafka
    metadata:
      bootstrapServers: kafka:9092
      consumerGroup: prism-orders
      topic: orders
      lagThreshold: "1000"  # Scale up if lag > 1000 msgs
      offsetResetPolicy: latest

Scaling Behavior:

  • Lag < 1000: Scale down (respecting cooldown)
  • Lag > 1000: Scale up (1 replica per 1000 msgs lag)
  • Lag = 0 for extended period: Scale to minReplicaCount
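
Scale-down pacing between minReplicaCount and maxReplicaCount is ultimately enforced by the HPA that KEDA creates under the hood. If the defaults prove too aggressive, KEDA can pass an HPA behavior block through; a sketch, assuming KEDA v2's advanced configuration:

spec:
  advanced:
    horizontalPodAutoscalerConfig:
      behavior:
        scaleDown:
          stabilizationWindowSeconds: 300  # Wait 5 minutes before removing replicas
          policies:
          - type: Pods
            value: 2             # Remove at most 2 replicas per minute
            periodSeconds: 60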

Option B: Operator-Driven Autoscaling

Pros:

  • Can leverage Prism admin metrics
  • Pattern-specific scaling logic
  • Deep integration with Prism semantics
  • No external dependencies

Cons:

  • More code to maintain
  • Must implement metric collection
  • Reinventing KEDA for common cases

Example: Custom PrismAutoscaler CRD:

apiVersion: prism.io/v1alpha1
kind: PrismAutoscaler
metadata:
  name: consumer-orders-autoscaler
spec:
  targetRef:
    kind: StatefulSet
    name: consumer-kafka-orders
  minReplicas: 2
  maxReplicas: 50
  metrics:
  - type: AdminMetric
    adminMetric:
      metricName: "pattern.consumer.lag"
      target:
        type: AverageValue
        averageValue: "1000"
  - type: AdminMetric
    adminMetric:
      metricName: "pattern.consumer.processing_time_p99"
      target:
        type: Value
        value: "5s"  # Scale up if p99 > 5s

Implementation: the operator queries the admin gRPC API for metrics, computes the desired replica count, and updates the StatefulSet's replicas field.

Option C: prism-launcher (VM-Oriented)

Pros:

  • Already implemented
  • Works for single-tenant VM deployments

Cons:

  • Works against Kubernetes primitives rather than with them
  • Doesn't leverage K8s autoscaling
  • Complicates networking (launcher needs K8s API access)
  • Not cloud-native

Verdict: Use launcher for VM deployments, not Kubernetes.

Recommendation: Hybrid Approach

For standard patterns (Kafka, NATS, SQS):

  • Use KEDA for event-driven scaling
  • Leverage 60+ built-in scalers
  • Standard Kubernetes HPA for CPU/memory

For Prism-specific patterns:

  • Implement PrismAutoscaler CRD
  • Query admin control plane metrics
  • Pattern-specific scaling logic

Example Stack Configuration:

apiVersion: prism.io/v1alpha1
kind: PrismStack
metadata:
  name: production
spec:
  patterns:
  - name: consumer-orders
    type: consumer
    backend: kafka
    autoscaling:
      enabled: true
      strategy: keda  # Use KEDA for Kafka
      minReplicas: 2
      maxReplicas: 50
      triggers:
      - type: kafka
        metadata:
          bootstrapServers: kafka:9092
          consumerGroup: prism-orders
          topic: orders
          lagThreshold: "1000"

  - name: mailbox-users
    type: mailbox
    backend: postgres
    autoscaling:
      enabled: true
      strategy: admin  # Use admin metrics for custom pattern
      minReplicas: 1
      maxReplicas: 20
      metrics:
      - type: AdminMetric
        name: "mailbox.queue_depth"
        target: 100

3. Backend Binding and Data Locality

Problem Statement

Pattern runners need to access backends (Postgres, Kafka, etc.) with:

  • Data locality: Minimize network hops
  • Security: Proper namespace isolation and credentials
  • Simplicity: Easy to "bind" a backend to a pattern

Example: Deploy Postgres via Helm, bind to pattern:

# Deploy Postgres to data namespace
helm install postgres bitnami/postgresql -n data-postgres --create-namespace

# How does pattern runner discover and connect?

Solution: Backend Binding via Labels and Services

1. Backend Resource: Deploy backends in their own namespaces

apiVersion: v1
kind: Namespace
metadata:
  name: data-postgres
  labels:
    prism.io/backend-type: postgres
    prism.io/backend-name: main-db
---
apiVersion: v1
kind: Service
metadata:
  name: postgres
  namespace: data-postgres
  labels:
    prism.io/backend-type: postgres
  annotations:
    prism.io/connection-string: "postgres:5432"
spec:
  selector:
    app.kubernetes.io/name: postgresql
  ports:
  - port: 5432
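
Credentials for the backend typically live in the same namespace so that pattern runners can consume them via envFrom (see the secretRef in the binding below). A hypothetical sketch; the key names are assumptions, not a contract defined by this RFC:

apiVersion: v1
kind: Secret
metadata:
  name: postgres-credentials
  namespace: data-postgres
type: Opaque
stringData:
  POSTGRES_USER: prism          # Hypothetical key names; the runner's expected
  POSTGRES_PASSWORD: change-me  # environment variables are not specified here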

2. Backend Binding in PrismStack:

apiVersion: prism.io/v1alpha1
kind: PrismStack
metadata:
  name: production
spec:
  backends:
  - name: main-db
    type: postgres
    # Option A: Explicit connection
    connectionString: "postgres.data-postgres.svc:5432"
    secretRef:
      name: postgres-credentials
      namespace: data-postgres

    # Option B: Service discovery
    serviceRef:
      name: postgres
      namespace: data-postgres

    # Data locality: Deploy runners in same namespace
    dataLocality:
      strategy: collocate  # Deploy in same namespace as backend
      namespace: data-postgres

  patterns:
  - name: consumer-orders
    type: consumer
    backend: main-db  # Binds to backend above
    replicas: 5

3. Operator Behavior:

func (r *PrismStackReconciler) reconcilePattern(ctx context.Context, stack *PrismStack, pattern PatternSpec) error {
    // Find backend binding
    backend := findBackend(stack.Spec.Backends, pattern.Backend)

    // Determine namespace for pattern runner
    namespace := stack.Namespace // Default
    if backend.DataLocality.Strategy == "collocate" {
        namespace = backend.DataLocality.Namespace
    }

    // Create StatefulSet in backend namespace for data locality
    statefulSet := &appsv1.StatefulSet{
        ObjectMeta: metav1.ObjectMeta{
            Name:      fmt.Sprintf("%s-%s", stack.Name, pattern.Name),
            Namespace: namespace, // Deploy near data!
        },
        Spec: appsv1.StatefulSetSpec{
            Template: corev1.PodTemplateSpec{
                Spec: corev1.PodSpec{
                    Containers: []corev1.Container{{
                        Env: []corev1.EnvVar{
                            {Name: "BACKEND_TYPE", Value: backend.Type},
                            {Name: "CONNECTION_STRING", Value: backend.ConnectionString},
                            {Name: "PROXY_ENDPOINT", Value: getProxyService(stack)},
                        },
                        EnvFrom: []corev1.EnvFromSource{{
                            SecretRef: &corev1.SecretEnvSource{
                                LocalObjectReference: corev1.LocalObjectReference{
                                    Name: backend.SecretRef.Name,
                                },
                            },
                        }},
                    }},
                },
            },
        },
    }

    return r.Create(ctx, statefulSet)
}

4. Network Topology:

┌──────────────────────────────────────────────────────┐
│ Namespace: prism-system                              │
│   ┌──────────────┐        ┌──────────────┐           │
│   │ Admin        │        │ Proxy        │           │
│   │ StatefulSet  │◀───────│ Deployment   │           │
│   │ (3 replicas) │        │ (3 replicas) │           │
│   └──────────────┘        └───────┬──────┘           │
│                                   │                  │
└───────────────────────────────────┼──────────────────┘
                                    │
                         gRPC Pattern Requests
                                    │
┌───────────────────────────────────┼──────────────────┐
│ Namespace: data-postgres (Data Locality)             │
│                                   ▼                  │
│   ┌──────────────────────────────────┐               │
│   │ Consumer Pattern (StatefulSet)   │               │
│   │   - consumer-0                   │               │
│   │   - consumer-1                   │               │
│   │   - consumer-2                   │               │
│   └────────────┬─────────────────────┘               │
│                │ localhost/pod network               │
│                │ (minimal latency)                   │
│                ▼                                     │
│   ┌──────────────────────────────────┐               │
│   │ PostgreSQL (Helm Chart)          │               │
│   │   - postgres-0                   │               │
│   │   - postgres-1 (replica)         │               │
│   └──────────────────────────────────┘               │
└──────────────────────────────────────────────────────┘

Benefits:

  • Pattern runners in same namespace as backend (data locality)
  • NetworkPolicy can restrict access to backend namespace
  • Secrets scoped to backend namespace
  • Minimal network hops (pod-to-pod on same node if possible)
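
Namespace colocation alone does not guarantee same-node placement; if that matters, the operator could add a soft pod affinity toward the backend's pods. A sketch of a pod template fragment, reusing the app.kubernetes.io/name=postgresql label from the Service above (purely optional):

affinity:
  podAffinity:
    preferredDuringSchedulingIgnoredDuringExecution:
    - weight: 100
      podAffinityTerm:
        labelSelector:
          matchLabels:
            app.kubernetes.io/name: postgresql
        topologyKey: kubernetes.io/hostname  # Prefer nodes already running Postgres pods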

Security: NetworkPolicy Example

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: postgres-access
  namespace: data-postgres
spec:
  podSelector:
    matchLabels:
      app.kubernetes.io/name: postgresql
  policyTypes:
  - Ingress
  ingress:
  # Allow from pattern runners in same namespace
  - from:
    - podSelector:
        matchLabels:
          prism.io/component: pattern
    ports:
    - protocol: TCP
      port: 5432
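
The policy above protects the backend itself; the pattern runners in the data namespace also need to accept gRPC traffic from the proxy in prism-system (the path shown in the topology diagram). A companion sketch; the runner port is an assumption:

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: pattern-runner-access
  namespace: data-postgres
spec:
  podSelector:
    matchLabels:
      prism.io/component: pattern
  policyTypes:
  - Ingress
  ingress:
  # Allow gRPC from the proxy pods in the prism-system namespace
  - from:
    - namespaceSelector:
        matchLabels:
          kubernetes.io/metadata.name: prism-system
      podSelector:
        matchLabels:
          app: prism-proxy
    ports:
    - protocol: TCP
      port: 8980  # Assumed runner gRPC port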

4. Scaling Triggers and Metrics

Pattern-Specific Metrics

Pattern  | Primary Metric  | Secondary Metric    | Scaling Threshold
Consumer | Kafka lag       | Processing time p99 | Lag > 1000 msgs
Producer | CPU utilization | Throughput          | CPU > 75%
KeyValue | Request rate    | Latency p99         | Requests > 1000/s
Mailbox  | Queue depth     | Message age         | Queue > 100 msgs

KEDA ScaledObject Examples

Consumer (Kafka Lag):

apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: consumer-scaler
spec:
  scaleTargetRef:
    kind: StatefulSet
    name: consumer-kafka-orders
  triggers:
  - type: kafka
    metadata:
      bootstrapServers: kafka:9092
      consumerGroup: prism-orders
      topic: orders
      lagThreshold: "1000"

Consumer (NATS Queue Depth):

triggers:
- type: nats-jetstream
  metadata:
    natsServerMonitoringEndpoint: "nats:8222"
    stream: "orders"
    consumer: "prism-orders"
    lagThreshold: "1000"

Producer (CPU):

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: producer-hpa
spec:
  scaleTargetRef:
    kind: Deployment
    name: producer-kafka-events
  minReplicas: 3
  maxReplicas: 15
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 75

Mailbox (Admin Metrics via PrismAutoscaler):

apiVersion: prism.io/v1alpha1
kind: PrismAutoscaler
metadata:
  name: mailbox-scaler
spec:
  targetRef:
    kind: StatefulSet
    name: mailbox-users
  metrics:
  - type: AdminMetric
    adminMetric:
      endpoint: "prism-admin:8981"
      query:
        pattern: "mailbox-users"
        metric: "queue_depth"
      target:
        type: AverageValue
        averageValue: "100"  # Scale up if avg queue > 100

5. Updated PrismStack CRD

apiVersion: prism.io/v1alpha1
kind: PrismStack
metadata:
  name: production
  namespace: prism-system
spec:
  # Admin: StatefulSet with persistent storage
  admin:
    enabled: true
    kind: StatefulSet  # NEW
    replicas: 3
    storage:
      size: 1Gi
      storageClass: fast-ssd

  # Proxy: Deployment with HPA
  proxy:
    kind: Deployment  # NEW (default)
    replicas: 3
    autoscaling:
      enabled: true
      strategy: hpa
      minReplicas: 3
      maxReplicas: 20
      targetCPUUtilization: 75

  # Web Console: Deployment with HPA
  webConsole:
    enabled: true
    kind: Deployment
    replicas: 2
    autoscaling:
      enabled: true
      strategy: hpa
      minReplicas: 2
      maxReplicas: 10

  # Backends with data locality
  backends:
  - name: main-db
    type: postgres
    serviceRef:
      name: postgres
      namespace: data-postgres
    secretRef:
      name: postgres-creds
      namespace: data-postgres
    dataLocality:
      strategy: collocate
      namespace: data-postgres

  - name: event-bus
    type: kafka
    serviceRef:
      name: kafka
      namespace: data-kafka
    dataLocality:
      strategy: collocate
      namespace: data-kafka

  # Patterns with backend binding and autoscaling
  patterns:
  - name: consumer-orders
    type: consumer
    kind: StatefulSet  # NEW
    backend: event-bus
    replicas: 5
    autoscaling:
      enabled: true
      strategy: keda
      minReplicas: 2
      maxReplicas: 50
      triggers:
      - type: kafka
        metadata:
          bootstrapServers: kafka.data-kafka.svc:9092
          consumerGroup: prism-orders
          topic: orders
          lagThreshold: "1000"

  - name: producer-events
    type: producer
    kind: Deployment  # Stateless
    backend: event-bus
    replicas: 3
    autoscaling:
      enabled: true
      strategy: hpa
      minReplicas: 3
      maxReplicas: 15
      targetCPUUtilization: 75

  - name: keyvalue-cache
    type: keyvalue
    kind: Deployment
    backend: redis-cache
    replicas: 5

  - name: mailbox-users
    type: mailbox
    kind: StatefulSet
    backend: main-db
    replicas: 2
    autoscaling:
      enabled: true
      strategy: admin  # Use admin metrics
      minReplicas: 1
      maxReplicas: 20
      metrics:
      - type: AdminMetric
        name: "mailbox.queue_depth"
        target: 100

Decision

Adopt the following deployment patterns:

Component Types

  1. Admin: StatefulSet with persistent volumes for Raft
  2. Proxy: Deployment with HPA (CPU + custom metrics)
  3. Web Console: Deployment with HPA
  4. Consumer Pattern: StatefulSet for stable identity
  5. Producer Pattern: Deployment for stateless operation
  6. KeyValue Pattern: Deployment for stateless requests
  7. Mailbox Pattern: StatefulSet for message ownership

Autoscaling Strategy

Hybrid Approach:

  • KEDA for standard backends (Kafka, NATS, SQS) - event-driven scaling
  • HPA for CPU/memory-based scaling (Proxy, Producer, KeyValue)
  • PrismAutoscaler (future) for admin-driven metrics (Mailbox, custom patterns)
  • No prism-launcher in Kubernetes (use for VM deployments)

Backend Binding

  • Deploy backends in dedicated namespaces (e.g., data-postgres)
  • Pattern runners deployed in backend namespace for data locality
  • Service discovery via Kubernetes DNS
  • Secrets scoped to backend namespace
  • NetworkPolicy for security boundaries

Implementation Phases

Phase 1: Basic Deployment Patterns (Current sprint)

  • Convert Admin to StatefulSet
  • Keep Proxy/WebConsole as Deployments
  • Add kind field to PrismStack CRD

Phase 2: KEDA Integration (Next sprint)

  • Install KEDA operator
  • Support Consumer scaling via Kafka lag
  • Support NATS, SQS scalers

Phase 3: Backend Binding (Sprint 3)

  • Implement backend service discovery
  • Data locality with namespace colocation
  • NetworkPolicy templates

Phase 4: PrismAutoscaler (Sprint 4)

  • Custom CRD for admin-driven metrics
  • Query admin control plane
  • Pattern-specific scaling logic

Consequences

Positive

For Operators:

  • Clear separation of stateful vs stateless components
  • Kubernetes-native autoscaling (battle-tested)
  • Data locality reduces latency and improves security
  • Backend binding simplifies deployment (Helm + bind)

For Developers:

  • Standard Kubernetes patterns (StatefulSet, Deployment, HPA, KEDA)
  • No custom launcher complexity in K8s
  • Easy to reason about scaling behavior
  • Namespace-based security boundaries

For Performance:

  • Data locality minimizes network hops
  • Pattern-specific scaling metrics
  • Efficient autoscaling (KEDA scales to zero)

Negative

Complexity:

  • StatefulSet management more complex than Deployment
  • KEDA adds another operator dependency
  • Backend binding requires namespace coordination
  • More CRD fields to configure

Operational:

  • Must coordinate backend deployments with Prism
  • NetworkPolicy management across namespaces
  • Secret propagation to backend namespaces

Migration:

  • Existing Deployment-based Admin must migrate to StatefulSet
  • Data migration for Raft logs
  • Downtime during conversion

Neutral

Alternatives Considered:

  • All Deployments: Simpler but loses Raft identity, consumer stability
  • All StatefulSets: Overly conservative, slower scaling
  • Launcher-based: Not Kubernetes-native, adds complexity
  • Pure HPA: Misses event-driven scaling opportunities

Open Questions

  1. Admin Migration: How to migrate existing Deployment-based Admin to StatefulSet without downtime?

    • Rolling upgrade with Raft leadership transfer?
    • Blue/green with data copy?
  2. Cross-Namespace Owner References: Kubernetes doesn't allow owner references across namespaces. How to handle PrismStack owning resources in data-postgres?

    • Use labels + custom finalizer logic?
    • Separate PrismPattern CRD per namespace?
  3. KEDA Scalability: Does KEDA handle 100+ ScaledObjects in a cluster?

    • Need load testing
    • Alternative: Single ScaledObject per backend type with multiple triggers?
  4. PrismAutoscaler Priority: When do we implement custom autoscaling vs relying on KEDA?

    • Start with KEDA for common cases
    • Add PrismAutoscaler only when KEDA insufficient


Next Steps

  1. Update prism-operator/api/v1alpha1/prismstack_types.go:

    • Add Kind field (StatefulSet | Deployment)
    • Add Storage spec for StatefulSet volumes
    • Add DataLocality to BackendSpec
  2. Update prism-operator/controllers/prismstack_controller.go:

    • Implement reconcileAdminStatefulSet()
    • Support backend namespace colocation
    • Handle cross-namespace resources
  3. Create KEDA integration:

    • Add ScaledObject reconciliation
    • Support common scalers (Kafka, NATS, SQS)
  4. Document migration guide:

    • Deployment → StatefulSet for Admin
    • Data migration procedures