patternsprocess-managementbulkheadisolationorchestration

Status: ProposedAuthor: Claude CodeCreated: Oct 15, 2025Updated: Oct 15, 2025

RFC-035: Pattern Process Launcher with Bulkhead Isolation

Summary

This RFC proposes a lightweight process launcher for pattern executables that can run headless and answer launch requests from the Prism proxy. The launcher will be an optional component (alternatives include Kubernetes deployments, systemd, or other orchestrators) that provides lifecycle management using the bulkhead isolation pattern (via pkg/isolation) and robust process management (via pkg/procmgr). The launcher will support three isolation levels: None, Namespace, and Session, ensuring fault isolation and proper resource boundaries.

Motivation

Current Situation

Prism patterns (Consumer, Producer, Multicast Registry, Claim Check, etc.) currently run as standalone executables that must be launched manually or via external orchestration. The Prism proxy needs a way to:

Launch pattern processes on demand: When a client requests a pattern operation, the proxy must ensure the corresponding pattern process is running
Manage process lifecycle: Start, monitor health, restart on failure, graceful shutdown
Isolate failures: Prevent one namespace/session's failures from affecting others (bulkhead pattern)
List available patterns: Proxy needs to know which patterns are available and their status
Support multiple deployment models: Local development (direct exec), containerized (Podman/Docker), orchestrated (Kubernetes)

Why an Optional Launcher?

The pattern process launcher is optional because different deployment models have different orchestration needs:

Deployment Model	Orchestration Method	When to Use Launcher
Local Development	Direct exec, Launcher	✅ Use Launcher - simplest local workflow
Docker Compose	Compose services	❌ Compose handles lifecycle
Kubernetes	Deployments, StatefulSets	❌ K8s handles lifecycle
Bare Metal / VMs	systemd, Launcher	✅ Use Launcher - lightweight alternative to systemd
Serverless (Lambda)	Function invocation	❌ Platform handles lifecycle

Key insight: The launcher provides proxy-driven lifecycle control (proxy decides when to start/stop patterns) rather than external orchestration (K8s/systemd decides independently).

Bulkhead Isolation Pattern

The bulkhead pattern (from ship design: compartmentalized hull sections prevent total flooding) isolates processes into separate "compartments" to prevent cascading failures:

┌─────────────────────────────────────────────────────┐
│  Pattern Process Launcher (Headless Daemon)         │
│                                                      │
│  ┌────────────────┐  ┌────────────────┐            │
│  │  Isolation     │  │  Process       │            │
│  │  Manager       │←→│  Manager       │            │
│  │  (Bulkhead)    │  │  (procmgr)     │            │
│  └────────────────┘  └────────────────┘            │
│          ↓                    ↓                     │
│  ┌──────────────────────────────────────────────┐  │
│  │  Process Pool (Isolated by Level)            │  │
│  │                                               │  │
│  │  Isolation Level: Namespace                  │  │
│  │  ┌─────────────┐  ┌─────────────┐           │  │
│  │  │ ns:tenant-a │  │ ns:tenant-b │           │  │
│  │  │  Consumer   │  │  Consumer   │           │  │
│  │  │  Process    │  │  Process    │           │  │
│  │  └─────────────┘  └─────────────┘           │  │
│  │                                               │  │
│  │  Isolation Level: Session                    │  │
│  │  ┌──────────────┐  ┌──────────────┐         │  │
│  │  │session:user-1│  │session:user-2│         │  │
│  │  │  Producer    │  │  Producer    │         │  │
│  │  │  Process     │  │  Process     │         │  │
│  │  └──────────────┘  └──────────────┘         │  │
│  │                                               │  │
│  │  Isolation Level: None                       │  │
│  │  ┌─────────────┐                             │  │
│  │  │   shared    │                             │  │
│  │  │  Registry   │                             │  │
│  │  │  Process    │                             │  │
│  │  └─────────────┘                             │  │
│  └──────────────────────────────────────────────┘  │
└─────────────────────────────────────────────────────┘
         ↑                           ↑
         │ gRPC Launch API           │ Health/Status
         │                           │
┌────────┴────────┐         ┌────────┴────────┐
│  Prism Proxy    │         │  Monitoring     │
│  (Rust)         │         │  (Prometheus)   │
└─────────────────┘         └─────────────────┘

Design

Architecture Components

1. Pattern Process Launcher (`cmd/pattern-launcher`)

Headless daemon that:

Listens on gRPC for launch requests from proxy
Uses pkg/isolation.IsolationManager to manage process pools
Uses pkg/procmgr.ProcessManager for robust process lifecycle
Discovers available patterns via filesystem (executable manifests)
Exports Prometheus metrics and health endpoints

type PatternLauncher struct {
    // Configuration
    config         *LauncherConfig
    isolationLevel isolation.IsolationLevel

    // Management
    isolationMgr   *isolation.IsolationManager

    // Pattern discovery
    patterns       map[string]*PatternManifest
    patternsMu     sync.RWMutex

    // gRPC server
    grpcServer     *grpc.Server
}

2. Pattern Manifest (`patterns/<name>/manifest.yaml`)

Declarative configuration for each pattern:

name: consumer
version: 1.0.0
executable: ./patterns/consumer/consumer
isolation_level: namespace  # none | namespace | session
healthcheck:
  port: 9090
  path: /health
  interval: 30s
resources:
  cpu_limit: 1.0
  memory_limit: 512Mi
backend_slots:
  - name: storage
    type: postgres
    required: true
  - name: messaging
    type: kafka
    required: true
environment:
  LOG_LEVEL: info
  METRICS_PORT: "9091"

3. Launch gRPC API

service PatternLauncher {
    // Launch or get existing pattern process
    rpc LaunchPattern(LaunchRequest) returns (LaunchResponse);

    // List all running pattern processes
    rpc ListPatterns(ListPatternsRequest) returns (ListPatternsResponse);

    // Terminate a pattern process
    rpc TerminatePattern(TerminateRequest) returns (TerminateResponse);

    // Health check
    rpc Health(HealthRequest) returns (HealthResponse);
}

message LaunchRequest {
    string pattern_name = 1;          // e.g., "consumer", "producer"
    IsolationLevel isolation = 2;     // NONE, NAMESPACE, SESSION
    string namespace = 3;             // Tenant namespace (for NAMESPACE isolation)
    string session_id = 4;            // Session ID (for SESSION isolation)
    map<string, string> config = 5;   // Pattern-specific config
}

message LaunchResponse {
    string process_id = 1;            // Unique process ID
    ProcessState state = 2;           // STARTING, RUNNING, TERMINATING, etc.
    string address = 3;               // gRPC address to connect to pattern
    bool healthy = 4;
}

message ListPatternsResponse {
    repeated PatternInfo patterns = 1;
}

message PatternInfo {
    string pattern_name = 1;
    string process_id = 2;
    ProcessState state = 3;
    string address = 4;
    bool healthy = 5;
    int64 uptime_seconds = 6;
    string namespace = 7;
    string session_id = 8;
}

enum IsolationLevel {
    ISOLATION_NONE = 0;
    ISOLATION_NAMESPACE = 1;
    ISOLATION_SESSION = 2;
}

enum ProcessState {
    STATE_STARTING = 0;
    STATE_RUNNING = 1;
    STATE_TERMINATING = 2;
    STATE_TERMINATED = 3;
    STATE_FAILED = 4;
}

Isolation Levels Explained

Isolation Level: NONE (Shared Process Pool)

All requests share the same process, regardless of namespace or session.

Use Case: Stateless patterns with no tenant-specific data (e.g., schema registry lookup)

Example:

Client A (namespace: tenant-a, session: user-1) ──┐
Client B (namespace: tenant-b, session: user-2) ──┼─→ shared:consumer (single process)
Client C (namespace: tenant-a, session: user-3) ──┘

Benefits:

✅ Lowest resource usage (one process serves all)
✅ Simplest management

Risks:

❌ No fault isolation (one bug affects all tenants)
❌ No resource isolation (noisy neighbor problem)

Isolation Level: NAMESPACE (Tenant Isolation)

Each namespace gets its own dedicated process. Multiple sessions within the same namespace share the process.

Use Case: Multi-tenant SaaS where tenants must be isolated (data security, billing, fault isolation)

Example:

Client A (namespace: tenant-a, session: user-1) ──┐
Client C (namespace: tenant-a, session: user-3) ──┼─→ ns:tenant-a:consumer

Client B (namespace: tenant-b, session: user-2) ────→ ns:tenant-b:consumer

Benefits:

✅ Fault isolation: tenant-a's crash doesn't affect tenant-b
✅ Resource quotas: limit CPU/memory per tenant
✅ Billing: track resource usage per tenant

Risks:

⚠️ Higher resource usage (one process per namespace)
⚠️ Cold start latency for new namespaces

Isolation Level: SESSION (Maximum Isolation)

Each session gets its own dedicated process. Maximum isolation guarantees.

Use Case: High-security environments, compliance requirements (PCI-DSS, HIPAA), debugging

Example:

Client A (namespace: tenant-a, session: user-1) ───→ session:user-1:consumer
Client B (namespace: tenant-b, session: user-2) ───→ session:user-2:consumer
Client C (namespace: tenant-a, session: user-3) ───→ session:user-3:consumer

Benefits:

✅ Maximum fault isolation: one session crash = one user affected
✅ Security: no cross-session data leakage possible
✅ Debugging: session-level logs and metrics

Risks:

❌ Highest resource usage (one process per session)
❌ Significant cold start latency
❌ Management overhead (thousands of processes possible)

Process Lifecycle with procmgr Integration

The launcher uses pkg/procmgr.ProcessManager for robust lifecycle management:

// Pattern process syncer implementation
type patternProcessSyncer struct {
    launcher *PatternLauncher
}

func (s *patternProcessSyncer) SyncProcess(ctx context.Context, updateType procmgr.UpdateType, config interface{}) (terminal bool, err error) {
    processConfig := config.(*ProcessConfig)
    manifest := s.launcher.patterns[processConfig.PatternName]

    // Build command
    cmd := exec.CommandContext(ctx, manifest.Executable)
    cmd.Env = append(os.Environ(),
        fmt.Sprintf("PATTERN_NAME=%s", processConfig.PatternName),
        fmt.Sprintf("NAMESPACE=%s", processConfig.Namespace),
        fmt.Sprintf("SESSION_ID=%s", processConfig.SessionID),
        fmt.Sprintf("GRPC_PORT=%d", processConfig.GRPCPort),
    )

    // Start process
    if err := cmd.Start(); err != nil {
        return false, fmt.Errorf("start process: %w", err)
    }

    // Store process handle
    s.launcher.storeProcessHandle(processConfig.ProcessID, cmd.Process)

    // Wait for health check to pass
    if err := s.launcher.waitForHealthy(ctx, processConfig); err != nil {
        cmd.Process.Kill()
        return false, fmt.Errorf("health check failed: %w", err)
    }

    // Check if process exited (terminal state)
    select {
    case <-ctx.Done():
        return false, ctx.Err()
    default:
        // Process still running
        return false, nil
    }
}

func (s *patternProcessSyncer) SyncTerminatingProcess(ctx context.Context, config interface{}, gracePeriodSecs *int64, statusFn procmgr.ProcessStatusFunc) error {
    processConfig := config.(*ProcessConfig)
    process := s.launcher.getProcessHandle(processConfig.ProcessID)

    // Send SIGTERM for graceful shutdown
    if err := process.Signal(syscall.SIGTERM); err != nil {
        return fmt.Errorf("send SIGTERM: %w", err)
    }

    // Wait for graceful exit
    timeout := time.Duration(*gracePeriodSecs) * time.Second
    done := make(chan error, 1)
    go func() {
        _, err := process.Wait()
        done <- err
    }()

    select {
    case err := <-done:
        // Process exited gracefully
        return err
    case <-time.After(timeout):
        // Grace period expired, force kill
        process.Kill()
        return fmt.Errorf("forced kill after grace period")
    }
}

func (s *patternProcessSyncer) SyncTerminatedProcess(ctx context.Context, config interface{}) error {
    processConfig := config.(*ProcessConfig)

    // Cleanup resources
    s.launcher.removeProcessHandle(processConfig.ProcessID)

    return nil
}

Launch Request Flow

1. Proxy receives client request for pattern operation
   │
   ├─→ Check if pattern process already running (cache lookup)
   │   ├─ Yes: Use existing process
   │   └─ No: Send LaunchPattern gRPC request
   │
2. Launcher receives LaunchPattern request
   │
   ├─→ Determine ProcessID based on isolation level
   │   ├─ NONE:      "shared:<pattern>"
   │   ├─ NAMESPACE: "ns:<namespace>:<pattern>"
   │   └─ SESSION:   "session:<session_id>:<pattern>"
   │
   ├─→ IsolationManager.GetOrCreateProcess(isolationKey, processConfig)
   │   │
   │   ├─→ Check if process exists and healthy
   │   │   ├─ Yes: Return existing handle
   │   │   └─ No: Create new process
   │   │
   │   └─→ ProcessManager.UpdateProcess(CREATE)
   │       │
   │       ├─→ patternProcessSyncer.SyncProcess()
   │       │   ├─ exec.Command() - start pattern executable
   │       │   ├─ Wait for health check
   │       │   └─ Return success
   │       │
   │       └─→ Return ProcessHandle
   │
3. Launcher returns LaunchResponse
   │
   └─→ Proxy caches process address and forwards client request

Pattern Discovery

The launcher discovers available patterns by scanning directories:

patterns/
├── consumer/
│   ├── consumer             # Executable binary
│   ├── manifest.yaml        # Pattern metadata
│   └── README.md
├── producer/
│   ├── producer
│   ├── manifest.yaml
│   └── README.md
├── multicast_registry/
│   ├── multicast_registry
│   ├── manifest.yaml
│   └── README.md
└── claimcheck/
    ├── claimcheck
    ├── manifest.yaml
    └── README.md

Discovery algorithm:

Scan patterns/ directory
For each subdirectory, check for manifest.yaml
Validate manifest schema
Check executable exists and is runnable
Load pattern into registry

Configuration

Launcher configuration (~/.prism/launcher-config.yaml):

launcher:
  # Port for gRPC API
  grpc_port: 8982

  # Pattern discovery
  patterns_dir: ./patterns

  # Default isolation level (can be overridden per pattern)
  default_isolation: namespace

  # Process manager settings
  process_manager:
    resync_interval: 30s
    backoff_period: 5s
    max_concurrent_starts: 10

  # Resource limits (applied to all pattern processes)
  resources:
    cpu_limit: 2.0
    memory_limit: 1Gi

  # Metrics and observability
  metrics:
    port: 9092
    path: /metrics

  health:
    port: 9093
    path: /health

Implementation Phases

Phase 1: Core Launcher (Week 1)

Deliverables:

cmd/pattern-launcher skeleton with gRPC server
Pattern manifest schema and validation
Pattern discovery (scan filesystem for manifests)
Integration with pkg/isolation.IsolationManager
LaunchPattern API (no actual process launch yet, just mock)

Tests:

Pattern discovery finds all valid manifests
Invalid manifests rejected
gRPC API responds to LaunchPattern requests
IsolationManager creates correct ProcessIDs

Phase 2: Process Launch (Week 2)

Deliverables:

patternProcessSyncer implementation
exec.Command() integration for spawning processes
Process handle tracking (PID, port, address)
Health check polling (HTTP /health endpoint)
LaunchPattern returns running process address

Tests:

Launch single pattern process successfully
Health check waits until process ready
Failed process launch returns error
Process address returned correctly

Phase 3: Isolation Levels (Week 3) ✅ COMPLETE

Deliverables:

✅ Namespace isolation: one process per namespace
✅ Session isolation: one process per session
✅ None isolation: shared process
✅ Concurrent launch requests handled correctly
✅ Process reuse for existing isolation keys

Tests (pkg/launcher/integration_test.go):

✅ TestIsolationLevels_Integration: All 3 isolation levels (NONE, NAMESPACE, SESSION)
- Verifies process reuse for same isolation key
- Verifies process separation for different keys
- Validates PID tracking and address assignment
✅ TestConcurrentLaunches: 5 concurrent requests correctly reuse process
✅ TestProcessTermination: Graceful termination with status verification
✅ TestHealthCheck: Health endpoint monitoring and service health reporting

Results:

Unit tests: 100% passing (run with -short flag, 0.2s)
Integration tests: Created (require actual process launching)
Test pattern binary: Built and ready (7.4MB, Go-based with HTTP health endpoint)

Phase 4: Termination and Cleanup (Week 4)

Deliverables:

TerminatePattern API
Graceful SIGTERM with timeout
Force SIGKILL after grace period
Process cleanup (remove from tracking)
Orphan process detection and cleanup

Tests:

Graceful shutdown completes within grace period
Force kill after timeout
Terminated processes removed from list
Orphaned processes detected and terminated

Phase 5: Metrics and Observability (Week 5)

Deliverables:

Prometheus metrics export
Process lifecycle metrics (starts, stops, failures)
Isolation level distribution metrics
Resource usage per process
Health endpoint with detailed status

Tests:

Metrics exported correctly
Counter increases on process start/stop
Health endpoint returns all processes
Resource metrics tracked accurately

Usage Examples

Example 1: Proxy Launches Consumer Pattern

// In Prism proxy (Rust), making gRPC call to launcher
func launchConsumerPattern(namespace string) (string, error) {
    client := NewPatternLauncherClient(conn)

    resp, err := client.LaunchPattern(ctx, &LaunchRequest{
        PatternName:  "consumer",
        Isolation:    IsolationLevel_ISOLATION_NAMESPACE,
        Namespace:    namespace,
        Config: map[string]string{
            "kafka_brokers": "localhost:9092",
            "consumer_group": fmt.Sprintf("%s-consumer", namespace),
        },
    })

    if err != nil {
        return "", fmt.Errorf("launch consumer: %w", err)
    }

    // Cache the process address for future requests
    proxyCache.Set(namespace, "consumer", resp.Address)

    return resp.Address, nil
}

Example 2: Local Development Workflow

# Terminal 1: Start pattern launcher
cd cmd/pattern-launcher
go run . --config ~/.prism/launcher-config.yaml

# Terminal 2: Use prismctl to launch pattern
prismctl pattern launch consumer --namespace tenant-a --isolation namespace

# Terminal 3: Check running patterns
prismctl pattern list

# Output:
# PATTERN              PROCESS ID             STATE      HEALTHY  UPTIME
# consumer             ns:tenant-a:consumer   RUNNING    true     5m30s
# producer             ns:tenant-a:producer   RUNNING    true     3m15s
# multicast_registry   shared:registry        RUNNING    true     10m45s

Example 3: Kubernetes Alternative (No Launcher Needed)

In Kubernetes, the launcher is not used. Instead, patterns are deployed as Deployments:

# patterns/consumer/k8s-deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: consumer-pattern
spec:
  replicas: 3
  selector:
    matchLabels:
      app: consumer
  template:
    metadata:
      labels:
        app: consumer
    spec:
      containers:
      - name: consumer
        image: prism/consumer:1.0.0
        env:
        - name: ISOLATION_LEVEL
          value: "namespace"
        - name: GRPC_PORT
          value: "50051"
        resources:
          limits:
            cpu: 1.0
            memory: 512Mi

The Prism proxy discovers pattern processes via Kubernetes service discovery (DNS, endpoints API).

Metrics and Observability

Prometheus Metrics

# Pattern lifecycle
pattern_launcher_process_starts_total{pattern, namespace, isolation} counter
pattern_launcher_process_stops_total{pattern, namespace, isolation} counter
pattern_launcher_process_failures_total{pattern, namespace, isolation} counter

# Process state
pattern_launcher_processes_running{pattern, isolation} gauge
pattern_launcher_processes_terminating{pattern, isolation} gauge

# Launch latency
pattern_launcher_launch_duration_seconds{pattern, isolation} histogram

# Isolation distribution
pattern_launcher_isolation_level{level} gauge

# Resource usage (per process)
pattern_launcher_process_cpu_usage{process_id} gauge
pattern_launcher_process_memory_bytes{process_id} gauge

Health Check Response

{
  "status": "healthy",
  "total_processes": 15,
  "running_processes": 13,
  "terminating_processes": 2,
  "failed_processes": 0,
  "isolation_distribution": {
    "none": 2,
    "namespace": 10,
    "session": 3
  },
  "processes": [
    {
      "pattern_name": "consumer",
      "process_id": "ns:tenant-a:consumer",
      "state": "RUNNING",
      "healthy": true,
      "uptime_seconds": 3600,
      "namespace": "tenant-a",
      "address": "localhost:50051"
    }
  ]
}

Security Considerations

Process Isolation: Use OS-level process isolation (cgroups, namespaces) to prevent cross-contamination
Resource Limits: Enforce CPU/memory limits per process to prevent resource exhaustion
Authentication: gRPC API requires mTLS or OIDC token authentication
Authorization: Only authorized namespaces can launch patterns
Audit Logging: All launch/terminate operations logged for security audit
Secret Management: Pattern configs may contain secrets (DB passwords) - use secret providers

Performance Considerations

Cold Start Latency: First request for a namespace incurs process spawn latency (~500ms-2s)
Process Reuse: Subsequent requests to same namespace reuse existing process (< 10ms)
Concurrent Launches: ProcessManager handles concurrent launch requests without race conditions
Memory Overhead: Each process consumes memory (baseline ~50MB + pattern-specific usage)
CPU Overhead: Process management goroutines negligible (< 1% CPU)

Optimization: Implement warm pool for common patterns (pre-launch consumer processes for popular namespaces).

Alternatives Considered

Alternative 1: Kubernetes StatefulSets

Pros:

Industry-standard orchestration
Built-in health checks, rolling updates
Service discovery via DNS

Cons:

Requires Kubernetes cluster (heavy dependency)
Overly complex for local development
Doesn't support proxy-driven launch (K8s decides lifecycle)

Verdict: Good for production, but launcher needed for local development.

Alternative 2: systemd

Pros:

Standard on Linux systems
Automatic restart on failure
Resource limits via cgroups

Cons:

Not cross-platform (Linux only)
Requires root privileges for system-level units
No dynamic isolation levels (fixed unit files)

Verdict: Complementary, not a replacement. Launcher can be managed by systemd.

Alternative 3: Docker Compose

Pros:

Simple YAML configuration
Built-in networking
Easy local testing

Cons:

Static configuration (can't launch dynamically per namespace)
No isolation level support
Requires Docker daemon

Verdict: Good for fixed deployments, but lacks dynamic launch capability.

Open Questions

Pattern Versioning: Should launcher support multiple versions of the same pattern running concurrently?
Auto-scaling: Should launcher automatically scale processes based on load (e.g., spawn more consumer workers)?
Checkpointing: Should process state be persisted for crash recovery?
Network Isolation: Should namespace-isolated processes run in separate network namespaces (Linux network namespaces)?
Resource Quotas: Should launcher enforce global resource quotas (e.g., max 100 processes total)?

Success Criteria

✅ Launcher can start/stop pattern processes on demand
✅ All three isolation levels work correctly (none, namespace, session)
✅ Graceful shutdown completes within grace period
✅ Health checks detect unhealthy processes and restart them
✅ Prometheus metrics exported and accurate
✅ Can handle 100+ concurrent namespaces without issues
✅ Cold start latency < 2 seconds
✅ Integrated with Prism proxy via gRPC API

References

RFC-034: Robust Process Manager Package - procmgr foundation
pkg/isolation - Bulkhead isolation implementation
pkg/procmgr - Process manager implementation
Bulkhead Pattern (Michael Nygard, Release It!)
Kubernetes Pod Lifecycle
systemd Service Management

Appendix A: Launch Request Sequence Diagram

┌──────┐           ┌──────┐           ┌──────────┐         ┌──────────┐
│Client│           │Proxy │           │Launcher  │         │Pattern   │
│      │           │      │           │          │         │Process   │
└──┬───┘           └──┬───┘           └────┬─────┘         └────┬─────┘
   │                  │                    │                    │
   │  Pattern Request │                    │                    │
   │─────────────────→│                    │                    │
   │                  │                    │                    │
   │                  │ LaunchPattern RPC  │                    │
   │                  │───────────────────→│                    │
   │                  │                    │                    │
   │                  │                    │ GetOrCreateProcess │
   │                  │                    │ (IsolationManager) │
   │                  │                    │──────────┐         │
   │                  │                    │          │         │
   │                  │                    │←─────────┘         │
   │                  │                    │                    │
   │                  │                    │ UpdateProcess(CREATE)
   │                  │                    │ (ProcessManager)   │
   │                  │                    │──────────┐         │
   │                  │                    │          │         │
   │                  │                    │←─────────┘         │
   │                  │                    │                    │
   │                  │                    │    exec.Command()  │
   │                  │                    │───────────────────→│
   │                  │                    │                    │
   │                  │                    │                    │ Start()
   │                  │                    │                    │────────┐
   │                  │                    │                    │        │
   │                  │                    │                    │←───────┘
   │                  │                    │                    │
   │                  │                    │   Health Check     │
   │                  │                    │───────────────────→│
   │                  │                    │       OK           │
   │                  │                    │←───────────────────│
   │                  │                    │                    │
   │                  │ LaunchResponse     │                    │
   │                  │ (process address)  │                    │
   │                  │←───────────────────│                    │
   │                  │                    │                    │
   │                  │  Forward Request   │                    │
   │                  │───────────────────────────────────────→│
   │                  │                    │                    │
   │                  │       Response     │                    │
   │                  │←───────────────────────────────────────│
   │                  │                    │                    │
   │     Response     │                    │                    │
   │←─────────────────│                    │                    │
   │                  │                    │                    │

Appendix B: Isolation Level Comparison Table

Aspect	NONE	NAMESPACE	SESSION
Process Count	1 (shared)	1 per namespace	1 per session
Fault Isolation	❌ None (all fail together)	✅ Per-tenant isolation	✅✅ Per-user isolation
Resource Isolation	❌ Shared resources	✅ Per-tenant quotas	✅✅ Per-user quotas
Cold Start Latency	None (always warm)	~1-2s per namespace	~1-2s per session
Memory Overhead	Lowest (~50MB)	Medium (~50MB × namespaces)	Highest (~50MB × sessions)
Security	❌ No isolation	✅ Tenant boundaries	✅✅ User boundaries
Use Case	Read-only lookups	Multi-tenant SaaS	High-security, debugging
Scalability	✅✅ Best (1 process)	✅ Good (10-100 processes)	⚠️ Limited (1000s processes)

Implementation Status

Overall Status: ✅ COMPLETE (All 5 Phases Implemented)

Phase 1 (Week 1): ✅ COMPLETE

cmd/pattern-launcher with gRPC server (port 8080)
Pattern manifest schema (patterns/<name>/manifest.yaml)
Pattern discovery and validation (pkg/launcher/discovery.go)
Integration with pkg/isolation and pkg/procmgr
All unit tests passing

Phase 2 (Week 2): ✅ COMPLETE

Real process launching with exec.Command()
patternProcessSyncer implementation (pkg/launcher/syncer.go)
Health check polling (HTTP /health endpoint)
Process handle tracking (PID, address, state)
Test pattern binary (Go-based, 7.4MB)

Phase 3 (Week 3): ✅ COMPLETE

Comprehensive integration tests for all isolation levels
Process reuse verification (same key → same process)
Process separation verification (different key → different process)
Concurrent launch handling
Health monitoring and termination tests

Phase 4 (Week 4): ✅ COMPLETE

✅ Production-ready error handling with retry limits
✅ Orphan process detection and cleanup (Linux /proc, macOS ps fallback)
✅ Resource cleanup verification after termination
✅ Health check monitoring (30s intervals)
✅ Error tracking across restarts (RestartCount, ErrorCount, LastError)
✅ Circuit breaker pattern (max 5 consecutive errors → terminal)

Phase 5 (Week 5): ✅ COMPLETE

✅ Prometheus metrics export (text format with HELP/TYPE)
✅ Process lifecycle metrics (starts, stops, failures, restarts)
✅ Health check metrics (success/failure counters)
✅ Launch duration percentiles (p50, p95, p99)
✅ Isolation level distribution tracking
✅ JSON format export for custom dashboards
✅ Uptime tracking and availability metrics

Implementation Complete - All 5 phases delivered:

✅ Core launcher with gRPC API and pattern discovery
✅ Real process launching with health checks
✅ All isolation levels (NONE, NAMESPACE, SESSION)
✅ Production error handling and crash recovery
✅ Prometheus metrics and observability

Next Steps for Production Deployment:

Integration testing with actual Prism proxy
Performance testing (100+ concurrent namespaces, stress testing)
Documentation for deployment alternatives (K8s, systemd, Docker Compose)
Runbook for operational procedures (scaling, troubleshooting)
SLO definition (launch latency, availability, restart rates)

Summary​

Motivation​

Current Situation​

Why an Optional Launcher?​

Bulkhead Isolation Pattern​

Design​

Architecture Components​

1. Pattern Process Launcher (cmd/pattern-launcher)​

2. Pattern Manifest (patterns/<name>/manifest.yaml)​

3. Launch gRPC API​

Isolation Levels Explained​

Isolation Level: NONE (Shared Process Pool)​

Isolation Level: NAMESPACE (Tenant Isolation)​

Isolation Level: SESSION (Maximum Isolation)​

Process Lifecycle with procmgr Integration​

Launch Request Flow​

Pattern Discovery​

Configuration​

Implementation Phases​

Phase 1: Core Launcher (Week 1)​

Phase 2: Process Launch (Week 2)​

Phase 3: Isolation Levels (Week 3) ✅ COMPLETE​

Phase 4: Termination and Cleanup (Week 4)​

Phase 5: Metrics and Observability (Week 5)​

Usage Examples​

Example 1: Proxy Launches Consumer Pattern​

Example 2: Local Development Workflow​

Example 3: Kubernetes Alternative (No Launcher Needed)​

Metrics and Observability​

Prometheus Metrics​

Health Check Response​

Security Considerations​

Performance Considerations​

Alternatives Considered​

Alternative 1: Kubernetes StatefulSets​

Alternative 2: systemd​

Alternative 3: Docker Compose​

Open Questions​

Success Criteria​

References​

Appendix A: Launch Request Sequence Diagram​

Appendix B: Isolation Level Comparison Table​

Implementation Status​

Summary

Motivation

Current Situation

Why an Optional Launcher?

Bulkhead Isolation Pattern

Design

Architecture Components

1. Pattern Process Launcher (`cmd/pattern-launcher`)

2. Pattern Manifest (`patterns/<name>/manifest.yaml`)

3. Launch gRPC API

Isolation Levels Explained

Isolation Level: NONE (Shared Process Pool)

Isolation Level: NAMESPACE (Tenant Isolation)

Isolation Level: SESSION (Maximum Isolation)

Process Lifecycle with procmgr Integration

Launch Request Flow

Pattern Discovery

Configuration

Implementation Phases

Phase 1: Core Launcher (Week 1)

Phase 2: Process Launch (Week 2)

Phase 3: Isolation Levels (Week 3) ✅ COMPLETE

Phase 4: Termination and Cleanup (Week 4)

Phase 5: Metrics and Observability (Week 5)

Usage Examples

Example 1: Proxy Launches Consumer Pattern

Example 2: Local Development Workflow

Example 3: Kubernetes Alternative (No Launcher Needed)

Metrics and Observability

Prometheus Metrics

Health Check Response

Security Considerations

Performance Considerations

Alternatives Considered

Alternative 1: Kubernetes StatefulSets

Alternative 2: systemd

Alternative 3: Docker Compose

Open Questions

Success Criteria

References

Appendix A: Launch Request Sequence Diagram

Appendix B: Isolation Level Comparison Table

Implementation Status