RFC-062: Unified Authentication and Session Management

Summary

This RFC consolidates Prism's authentication and session management into a unified reference, superseding RFC-010, RFC-011, and RFC-019. It provides:

  • Unified authentication flows for three user types: humans (OIDC/JWT), services (platform identities), and admins (OIDC+MFA)
  • Complete session lifecycle management from establishment through renewal to teardown
  • Credential hierarchy spanning four layers: client auth → proxy-plugin → Vault → backend
  • Cross-region session replication for global user mobility (RFC-024 integration)
  • Implementation status audit identifying what's working (Phase 1) vs designed-only (Phases 2-4)
  • Clear roadmap with 16-week implementation plan across 4 phases

Motivation

Current State

Prism has authentication and authorization concepts spread across multiple documents:

  1. ADR-007: High-level auth strategy (mTLS + OAuth2/JWT)
  2. RFC-010: Admin API OIDC authentication
  3. RFC-011: Data proxy mTLS + secrets provider abstraction
  4. RFC-019: Plugin-layer token validation and Vault integration
  5. RFC-024: Distributed session store for cross-region state
  6. MEMO-008: Vault token exchange implementation details
  7. MEMO-041: E2E auth integration testing with Dex

Gaps and Inconsistencies

  1. No unified session model: Human vs service authentication handled separately
  2. Credential management unclear: When to use Vault vs secrets managers vs static credentials
  3. Session lifecycle undefined: No clear guidance on session establishment, renewal, expiry
  4. Cross-cutting concerns: Tracing, audit logging, metrics scattered across documents
  5. Implementation gaps: Token refresh, session affinity, credential rotation not fully specified

Goals

  1. Consolidate scattered documentation: Unify auth concepts from 7 separate documents into a single reference
  2. Audit implementation status: Identify what's implemented vs designed-only to prevent confusion
  3. Define complete session model: End-to-end lifecycle for both human and service sessions
  4. Clarify credential management: Decision framework for Vault vs cloud secrets managers vs static credentials
  5. Provide implementation roadmap: Phased approach with timelines and priorities
  6. Document working patterns: Real code examples from implemented Phase 1 features

Architecture Overview

Unified Authentication Model

┌──────────────────────────────────────────────────────────────────────┐
│ Prism Authentication Architecture │
│ │
│ ┌────────────────────────────────────────────────────────────────┐ │
│ │ Identity Sources │ │
│ │ │ │
│ │ Human Users Service Identities Admin Users │ │
│ │ ├─ OIDC/JWT ├─ K8s SA Tokens ├─ OIDC/JWT │ │
│ │ ├─ Auth0/Okta ├─ AWS IAM Roles ├─ MFA Required │ │
│ │ ├─ Dex (local dev) ├─ Azure MI ├─ Short TTL │ │
│ │ └─ Google Workspace └─ GCP SA └─ Audit Logged │ │
│ └──────────────────┬───────────────────────┬────────────┬────────┘ │
│ │ │ │ │
│ ▼ ▼ ▼ │
│ ┌──────────────────────────────────────────────────────────────┐ │
│ │ Token Validation & Session Establishment │ │
│ │ │ │
│ │ ┌────────────────┐ ┌────────────────┐ ┌─────────────┐ │ │
│ │ │ Proxy Layer │ │ Plugin Layer │ │ Admin API │ │ │
│ │ │ - JWT verify │ │ - Token→Vault │ │ - RBAC │ │ │
│ │ │ - Namespace │ │ - Credential │ │ - Audit │ │ │
│ │ │ - AuthZ │ │ - Exchange │ │ - Policy │ │ │
│ │ └────────────────┘ └────────────────┘ └─────────────┘ │ │
│ └────────────────────┬────────────────────┬─────────┬─────────┘ │
│ │ │ │ │
│ ▼ ▼ ▼ │
│ ┌──────────────────────────────────────────────────────────────┐ │
│ │ Session Management │ │
│ │ │ │
│ │ ┌──────────────┐ ┌──────────────┐ ┌──────────────────┐ │ │
│ │ │ Session Store│ │ Credential │ │ Backend │ │ │
│ │ │ - Redis │ │ - Vault │ │ - Postgres/Redis │ │ │
│ │ │ - DynamoDB │ │ - AWS SM │ │ - Kafka/NATS │ │ │
│ │ │ - PostgreSQL │ │ - GCP SM │ │ - Per-session │ │ │
│ │ └──────────────┘ └──────────────┘ └──────────────────┘ │ │
│ └──────────────────────────────────────────────────────────────┘ │
└──────────────────────────────────────────────────────────────────────┘

Key Principles

  1. Defense-in-Depth: Multiple layers validate authentication (proxy + plugin)
  2. Token-Based Identity: JWT tokens as universal identity carrier across all flows
  3. Credential Isolation: Per-session backend credentials, never shared between sessions
  4. Pluggable Backends: Support multiple secrets managers (Vault, AWS SM, GCP SM) and session stores
  5. Zero-Trust: Plugins validate independently, never trust upstream components

Key Decisions

This RFC makes the following architectural decisions:

  1. JWT Authentication First: OIDC/JWT for human users is Phase 1 (implemented); mTLS for services is deferred to Phase 3
  2. Proxy-Layer Authorization: Namespace permissions enforced at proxy for fast rejection of invalid requests
  3. Plugin-Layer Credentials: Token→credential exchange at plugin layer for per-session isolation
  4. Eventual Consistency for Sessions: Cross-region session replication uses eventual consistency (acceptable for auth context)
  5. Vault for Dynamic Credentials: HashiCorp Vault preferred for per-user backend credentials, cloud secrets managers for static credentials
  6. Sliding Window TTL: Sessions extend on activity (sliding window) with absolute max lifetime

Authentication Flows

Flow 1: Human User with OIDC (Primary)

Use Case: Interactive application, web UI, CLI tool

Key Points:

  • JWT validated at proxy layer (fast rejection of invalid tokens)
  • Token exchanged for backend credentials at plugin layer (per-session isolation)
  • Background renewal ensures credentials remain valid during long sessions
  • Vault revokes credentials on session end (automatic cleanup)

Flow 2: Service-to-Service with K8s SA / IAM

Use Case: Background job, aggregator service, ETL pipeline

Key Points:

  • Service identity from platform (K8s SA, IAM role, managed identity)
  • Long-lived sessions (service lifetime vs user session)
  • Credential renewal for services running hours/days
  • Graceful shutdown with credential revocation

Flow 3: Admin API with OIDC + RBAC

Use Case: Platform team managing Prism infrastructure

Key Points:

  • Admin API on separate port (8981 vs 8980 for data plane)
  • RBAC with admin:read, admin:write, admin:operational scopes
  • All operations audit logged with admin identity
  • Token refresh for long CLI sessions

Session Management

Session Lifecycle

Sessions in Prism support both short-lived interactive use and long-running service operations.

┌──────────────────────────────────────────────────────────────────┐
│ Session Lifecycle │
│ │
│ 1. ESTABLISHMENT │
│ ├─ Token validation (JWT signature + claims) │
│ ├─ Permission check (namespace + operation) │
│ ├─ Session ID generation (UUID v4) │
│ └─ Session store entry created │
│ │
│ 2. CREDENTIAL ACQUISITION │
│ ├─ Exchange token for Vault token │
│ ├─ Fetch backend credentials (dynamic, per-session) │
│ ├─ Establish backend connection │
│ └─ Store connection handle in session │
│ │
│ 3. ACTIVE USE │
│ ├─ Requests use cached credentials │
│ ├─ Last-accessed timestamp updated │
│ ├─ Metrics collected (request count, latency) │
│ └─ Audit events logged │
│ │
│ 4. RENEWAL (Background) │
│ ├─ Vault token renewed every lease_duration/2 │
│ ├─ Backend credential lease renewed │
│ ├─ Session TTL extended (sliding window) │
│ └─ Connection pool refreshed if needed │
│ │
│ 5. EXPIRATION / TEARDOWN │
│ ├─ TTL reached or explicit close │
│ ├─ Vault lease revoked │
│ ├─ Backend credentials deleted (DROP USER) │
│ ├─ Session store entry removed │
│ └─ Metrics/audit log final event │
│ │
└──────────────────────────────────────────────────────────────────┘

Session Store Schema

Sessions are stored in a distributed session store (Redis Cluster, DynamoDB, or PostgreSQL).

message SessionData {
  // Unique session identifier
  string session_id = 1;

  // Session metadata
  SessionMetadata metadata = 2;

  // Credential information
  CredentialInfo credentials = 3;

  // Application-specific key-value data
  map<string, google.protobuf.Any> data = 4;

  // Lifecycle timestamps
  google.protobuf.Timestamp created_at = 5;
  google.protobuf.Timestamp last_accessed = 6;
  google.protobuf.Timestamp expires_at = 7;

  // Metrics
  SessionMetrics metrics = 8;
}

message SessionMetadata {
  // Identity
  string user_id = 1;       // JWT subject claim
  string user_email = 2;    // User email
  string service_name = 3;  // Service identity (if service)

  // Context
  string namespace = 4;     // Primary namespace
  string region = 5;        // Region where created
  string client_id = 6;     // Client application

  // Permissions
  repeated string scopes = 7;  // OAuth scopes
  Permission permission = 8;   // Read/Write/Admin

  // Additional metadata
  map<string, string> attributes = 9;
}

message CredentialInfo {
  string vault_token = 1;
  string vault_lease_id = 2;
  google.protobuf.Timestamp credential_expires_at = 3;

  // Backend connection info (opaque to session store)
  string backend_id = 4;
  google.protobuf.Any backend_handle = 5;
}

message SessionMetrics {
  int64 request_count = 1;
  int64 bytes_sent = 2;
  int64 bytes_received = 3;
  double avg_latency_ms = 4;
  google.protobuf.Timestamp last_activity = 5;
}

enum Permission {
  PERMISSION_UNSPECIFIED = 0;
  PERMISSION_READ = 1;
  PERMISSION_WRITE = 2;
  PERMISSION_ADMIN = 3;
}

Session Store Backends

Multiple backends supported for session storage:

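| Backend                | Consistency   | Latency (P99) | Use Case                     |
|------------------------|---------------|---------------|------------------------------|
| Redis Cluster          | Eventual      | 2ms           | High throughput, low latency |
| DynamoDB Global        | Eventual      | 15ms          | AWS native, auto-scaling     |
| PostgreSQL + pglogical | Strong (sync) | 10ms          | Strong consistency required  |
| CockroachDB            | Serializable  | 20ms          | Global distributed SQL       |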

Configuration Example:

session_store:
  backend: redis-cluster
  config:
    addresses:
      - redis-us-1:6379
      - redis-us-2:6379
      - redis-us-3:6379
    pool_size: 100

  replication:
    enabled: true
    strategy: eventual
    regions:
      - us-west-2
      - eu-central-1
      - ap-southeast-1
    sync_interval: 100ms

  ttl:
    default: 86400        # 24 hours
    sliding_window: true  # Extend on activity
    max_lifetime: 604800  # 7 days absolute max
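
The interaction between the sliding window and the absolute maximum can be illustrated with a small calculation. The following is a minimal sketch, not Prism code: the `TTLPolicy` type and `NextExpiry` method are hypothetical names, and timestamps are plain Unix seconds to keep the arithmetic visible.

```go
package main

import "fmt"

// TTLPolicy mirrors the ttl settings shown above. The type and its field
// names are illustrative assumptions, not part of the Prism codebase.
type TTLPolicy struct {
	DefaultSeconds     int64 // ttl.default
	SlidingWindow      bool  // ttl.sliding_window
	MaxLifetimeSeconds int64 // ttl.max_lifetime
}

// NextExpiry returns the new expires_at (Unix seconds) after activity at now.
// With a sliding window the expiry moves to now+default, but it is never
// allowed past createdAt+max_lifetime. Without one, expiry is fixed at creation.
func (p TTLPolicy) NextExpiry(createdAt, now int64) int64 {
	hardLimit := createdAt + p.MaxLifetimeSeconds
	base := createdAt
	if p.SlidingWindow {
		base = now
	}
	expiry := base + p.DefaultSeconds
	if expiry > hardLimit {
		expiry = hardLimit
	}
	return expiry
}

func main() {
	p := TTLPolicy{DefaultSeconds: 86400, SlidingWindow: true, MaxLifetimeSeconds: 604800}
	created := int64(0)
	fmt.Println(p.NextExpiry(created, 3600))      // 90000: one hour of activity pushes expiry out
	fmt.Println(p.NextExpiry(created, 6*86400+1)) // 604800: capped at the 7-day maximum
}
```

Under these assumptions, activity late on day 7 no longer extends the session, so the client has to re-authenticate once the absolute maximum is reached.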

Credential Management

Credential Hierarchy

Prism uses a layered approach to credentials based on context:

┌──────────────────────────────────────────────────────────────────┐
│ Credential Hierarchy │
│ │
│ Layer 1: CLIENT AUTHENTICATION │
│ ├─ Human Users: JWT tokens from OIDC provider │
│ ├─ Services: K8s SA tokens, AWS IAM, Azure MI, GCP SA │
│ └─ Admin: JWT tokens with admin scopes + MFA │
│ │
│ Layer 2: PROXY ← → PLUGIN │
│ ├─ mTLS: Client certificates for mutual authentication │
│ └─ Or: Plaintext HTTP/2 within trusted network │
│ │
│ Layer 3: VAULT AUTHENTICATION │
│ ├─ JWT Auth: Exchange client JWT for Vault token │
│ ├─ K8s Auth: Exchange K8s SA token for Vault token │
│ ├─ AWS Auth: IAM role authentication │
│ └─ Azure/GCP Auth: Managed identity authentication │
│ │
│ Layer 4: BACKEND CREDENTIALS │
│ ├─ Dynamic: Vault generates per-session credentials │
│ ├─ Static: Retrieve from AWS/GCP/Azure Secrets Manager │
│ └─ Rotation: Background renewal every lease_duration/2 │
│ │
└──────────────────────────────────────────────────────────────────┘

When to Use Vault vs Cloud Secrets Managers

Use HashiCorp Vault when:

  • ✅ Dynamic credential generation required (per-user DB accounts)
  • ✅ Short-lived credentials with automatic rotation
  • ✅ Multi-cloud deployment (AWS + Azure + GCP)
  • ✅ Fine-grained access control policies needed
  • ✅ Audit trail of credential access required

Use Cloud Secrets Managers (AWS/GCP/Azure) when:

  • ✅ Cloud-native deployment (single cloud provider)
  • ✅ Static credentials sufficient (shared service accounts)
  • ✅ Integration with cloud IAM (no separate auth system)
  • ✅ Lower operational complexity preferred
  • ✅ Cost optimization (avoid Vault infrastructure)

Hybrid Approach (Recommended):

credential_sources:
  # Vault for dynamic human-user credentials
  vault:
    enabled: true
    address: https://vault.internal:8200
    auth_methods:
      - jwt
      - kubernetes
      - aws

    backends:
      - postgres: database/creds/postgres-role
      - redis: database/creds/redis-role

  # AWS Secrets Manager for static service credentials
  aws_secrets:
    enabled: true
    region: us-west-2
    secrets:
      - kafka-broker-password
      - s3-backup-access-keys

  # Precedence: vault > aws_secrets > gcp_secrets > azure_keyvault
  precedence:
    - vault
    - aws_secrets
    - gcp_secrets
    - azure_keyvault
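
One way to read the precedence list is "ask each source in order and take the first hit". The sketch below illustrates that reading only; the `Source` interface, `staticSource` stand-in, and `Resolve` function are hypothetical and do not correspond to an existing Prism API.

```go
package main

import (
	"errors"
	"fmt"
)

// ErrNotFound is returned when no configured source holds the credential.
var ErrNotFound = errors.New("credential not found in any source")

// Source is a hypothetical lookup interface. Real implementations would
// call Vault, AWS Secrets Manager, and so on.
type Source interface {
	Lookup(name string) (string, bool)
}

// staticSource is an in-memory stand-in used only for this illustration.
type staticSource map[string]string

func (s staticSource) Lookup(name string) (string, bool) {
	v, ok := s[name]
	return v, ok
}

// Resolve walks the precedence list and returns the first hit together with
// the name of the source that supplied it. Sources that are listed in the
// precedence order but not configured are skipped.
func Resolve(precedence []string, sources map[string]Source, name string) (value, from string, err error) {
	for _, id := range precedence {
		src, ok := sources[id]
		if !ok {
			continue
		}
		if v, found := src.Lookup(name); found {
			return v, id, nil
		}
	}
	return "", "", ErrNotFound
}

func main() {
	sources := map[string]Source{
		"vault":       staticSource{"postgres": "dynamic-cred"},
		"aws_secrets": staticSource{"postgres": "static-cred", "kafka-broker-password": "example"},
	}
	precedence := []string{"vault", "aws_secrets", "gcp_secrets", "azure_keyvault"}

	_, from, _ := Resolve(precedence, sources, "postgres")
	fmt.Println(from) // vault: it precedes aws_secrets

	_, from, _ = Resolve(precedence, sources, "kafka-broker-password")
	fmt.Println(from) // aws_secrets: vault has no such entry
}
```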

Credential Rotation Strategy

For Dynamic Credentials (Vault):

  • Lease duration: 1 hour (configurable)
  • Renewal interval: Every 30 minutes (lease_duration/2)
  • Max lease: 2 hours (prevents infinite renewal)
  • Auto-revoke on session end

For Static Credentials (Secrets Managers):

  • Manual rotation: Quarterly or on compromise
  • Version tracking: Keep 2 previous versions
  • Zero-downtime rotation: New credentials fetched before old expire
  • Alert on rotation failures
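
For the dynamic (Vault) case, the numbers above imply a short renewal timeline. The following sketch works through that arithmetic under the stated defaults (1-hour lease, renewal at half the lease, 2-hour maximum). `RenewalPlan` and `Renewal` are illustrative names, and the model assumes each renewal extends the lease to "renewal time + lease duration", capped at the maximum.

```go
package main

import "fmt"

// Renewal records one renewal: when it happens and the expiry it yields,
// both in seconds since the credential was issued.
type Renewal struct {
	At        int64
	ExpiresAt int64
}

// RenewalPlan simulates renewing every lease/2 seconds. Each renewal extends
// the lease to at+lease, capped at maxLease. Renewals stop once the cap is
// reached, because further renewals could not extend the lease.
func RenewalPlan(lease, maxLease int64) []Renewal {
	interval := lease / 2
	if interval <= 0 {
		return nil
	}
	var plan []Renewal
	expiry := lease
	for at := interval; at < expiry && expiry < maxLease; at += interval {
		expiry = at + lease
		if expiry > maxLease {
			expiry = maxLease
		}
		plan = append(plan, Renewal{At: at, ExpiresAt: expiry})
	}
	return plan
}

func main() {
	// 1-hour lease, 2-hour maximum.
	fmt.Println(RenewalPlan(3600, 7200)) // [{1800 5400} {3600 7200}]
}
```

With these defaults only two renewals are useful: at 30 minutes (expiry moves to 1h30m) and at 60 minutes (expiry reaches the 2-hour cap). A session that outlives the cap would need freshly issued credentials rather than another renewal.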

Cross-Region Session Replication

Replication Architecture

Sessions replicate across regions to support global user mobility.

┌──────────────────────────────────────────────────────────────────┐
│ Cross-Region Session Replication │
│ │
│ ┌─────────────┐ ┌─────────────┐ ┌────────────┐│
│ │ US Region │ │ EU Region │ │ APAC Region││
│ │ │ │ │ │ ││
│ │ Session │ ──────> │ Session │ ──────> │ Session ││
│ │ Store │ Async │ Store │ Async │ Store ││
│ │ (Primary) │ Replic │ (Replica) │ Replic │ (Replica) ││
│ │ │ │ │ │ ││
│ │ sess-abc123 │ │ sess-abc123 │ │ sess-abc123││
│ │ { │ │ { │ │ { ││
│ │ user: │ │ user: │ │ user: ││
│ │ alice │ │ alice │ │ alice ││
│ │ region: │ │ region: │ │ region: ││
│ │ us-west-2 │ │ us-west-2 │ │ us-west-2││
│ │ data: {..}│ │ data: {..}│ │ data:{..}││
│ │ } │ │ } │ │ } ││
│ └─────────────┘ └─────────────┘ └────────────┘│
│ │
│ Replication Lag: ~100ms (Redis) | ~500ms (DynamoDB Global) │
│ Conflict Resolution: Last-Write-Wins (timestamp-based) │
│ Consistency: Eventual (acceptable for sessions) │
└──────────────────────────────────────────────────────────────────┘

User Mobility Flow:

  1. User creates session in US region
  2. Session data stored in US Redis Cluster
  3. Async replication copies session to EU/APAC (100-500ms lag)
  4. User travels to Europe, connects to EU proxy
  5. EU proxy queries local EU Redis for session
  6. Session found (replicated), user continues with same session_id
  7. Updates made in EU replicate back to US/APAC
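
When the same session is updated in two regions before replication catches up, the timestamp-based last-write-wins rule decides which version survives. A minimal sketch of that rule follows; `SessionRecord` and `Merge` are hypothetical, and the tie-break on region name is an assumption added here so that all regions pick the same winner.

```go
package main

import "fmt"

// SessionRecord is a simplified, hypothetical view of a replicated session.
type SessionRecord struct {
	SessionID string
	Writer    string // region that wrote this version
	UpdatedAt int64  // write timestamp, Unix nanoseconds
	Data      map[string]string
}

// Merge applies last-write-wins: the version with the newer timestamp survives.
// Equal timestamps are broken by comparing region names (an assumption made
// here so that every region converges on the same winner).
func Merge(a, b SessionRecord) SessionRecord {
	if b.UpdatedAt > a.UpdatedAt {
		return b
	}
	if b.UpdatedAt == a.UpdatedAt && b.Writer > a.Writer {
		return b
	}
	return a
}

func main() {
	us := SessionRecord{SessionID: "sess-abc123", Writer: "us-west-2", UpdatedAt: 100,
		Data: map[string]string{"cart": "1 item"}}
	eu := SessionRecord{SessionID: "sess-abc123", Writer: "eu-central-1", UpdatedAt: 250,
		Data: map[string]string{"cart": "2 items"}}

	// Both merge orders yield the EU write, because it carries the later timestamp.
	fmt.Println(Merge(us, eu).Writer, Merge(eu, us).Writer)
}
```

Last-write-wins keeps the winning record whole and drops the other, so concurrent updates to different keys of the same session are not combined. It also depends on reasonably synchronized clocks across regions.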

Implementation Guide

Plugin SDK Integration

Pattern plugins use the SDK for zero-boilerplate auth integration.

Step 1: Add interceptor to gRPC server:

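import (
	"log"
	"net"

	"google.golang.org/grpc"

	"github.com/jrepp/prism-data-layer/pkg/plugin"
	// pb is the generated KeyValue service package; its import path is not shown in this RFC.
)

func main() {
	// Install the SDK interceptors for unary and streaming RPCs
	grpcServer := grpc.NewServer(
		grpc.UnaryInterceptor(plugin.AuthLoggingInterceptor()),
		grpc.StreamInterceptor(plugin.AuthStreamInterceptor()),
	)

	// Register your service
	pb.RegisterKeyValueServiceServer(grpcServer, &MyPlugin{})

	// Start server, reporting listen and serve errors instead of discarding them
	lis, err := net.Listen("tcp", ":50051")
	if err != nil {
		log.Fatalf("listen: %v", err)
	}
	if err := grpcServer.Serve(lis); err != nil {
		log.Fatalf("serve: %v", err)
	}
}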

Step 2: Extract auth context in handlers:

func (p *MyPlugin) Get(ctx context.Context, req *pb.GetRequest) (*pb.GetResponse, error) {
	// Extract auth context (zero-boilerplate)
	authCtx := plugin.ExtractAuthContext(ctx)

	// Log with user identity
	slog.Info("Get operation",
		"key", req.Key,
		"trace_id", authCtx.TraceID,
		"user_id", authCtx.UserID,
		"namespace", authCtx.Namespace,
		"permission", authCtx.Permission,
	)

	// Optional: Secondary authorization check
	if !authCtx.HasPermission(plugin.PermissionRead) {
		return nil, status.Error(codes.PermissionDenied, "read permission required")
	}

	// Your handler logic here...
	// Placeholder return so the example compiles; replace with the real lookup.
	return nil, status.Error(codes.Unimplemented, "handler logic not shown")
}

Step 3: Use Vault for backend credentials (design sketch; Vault integration is not yet implemented, see Implementation Status):

import "github.com/jrepp/prism-data-layer/pkg/authz"

func (p *MyPlugin) createSession(ctx context.Context, token string) error {
	// Step 1: Validate token
	claims, err := p.tokenValidator.Validate(ctx, token)
	if err != nil {
		return fmt.Errorf("invalid token: %w", err)
	}

	// Step 2: Exchange for Vault token. The returned token and lease duration
	// are discarded here (unused variables do not compile in Go); the client is
	// assumed to retain the Vault token for the calls that follow.
	if _, _, err := p.vaultClient.AuthenticateWithJWT(ctx, token); err != nil {
		return fmt.Errorf("vault auth failed: %w", err)
	}

	// Step 3: Fetch backend credentials
	creds, err := p.vaultClient.GetBackendCredentials(ctx)
	if err != nil {
		return fmt.Errorf("credential fetch failed: %w", err)
	}

	// Step 4: Establish backend connection
	redisClient := redis.NewClient(&redis.Options{
		Addr:     "localhost:6379",
		Username: creds.Username, // Per-session username
		Password: creds.Password, // Short-lived password
	})

	// Step 5: Start background renewal; cancel() stops it at session teardown
	sessionCtx, cancel := context.WithCancel(context.Background())
	go p.vaultClient.StartCredentialRenewal(sessionCtx, creds)

	// Store session. Keyed by user ID here, so a second session for the same
	// user would replace the first; a per-session ID avoids that. Guard the map
	// with a mutex if createSession can run concurrently.
	p.sessions[claims.UserID] = &Session{
		Claims:      claims,
		Credentials: creds,
		Client:      redisClient,
		CancelFunc:  cancel,
	}

	return nil
}

Security Considerations

Token Security

  1. JWT Validation:

    • Verify signature with JWKS public keys
    • Check issuer, audience, expiration claims
    • Cache validated tokens (keyed by hash, expire with token)
  2. Token Theft Mitigation:

    • Short TTL (15 minutes for access tokens)
    • Refresh token rotation on each use
    • Token binding to client IP (optional, via custom claim)
  3. Token Revocation:

    • Maintain revocation list in Redis (check on validation)
    • Admin API to revoke user tokens
    • Automatic revocation on logout
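
The "cache validated tokens (keyed by hash, expire with token)" item can be sketched as follows. This is an illustration under assumptions, not the proxy's implementation: `TokenCache`, `Claims`, and their methods are hypothetical names, and expiry is passed in as Unix seconds so the behaviour is easy to follow.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"sync"
)

// Claims holds the subset of validated claims this sketch needs.
type Claims struct {
	Subject   string
	ExpiresAt int64 // JWT exp claim, Unix seconds
}

// TokenCache stores claims for already-validated tokens. Entries are keyed by
// the SHA-256 hash of the token so the raw token is never kept in memory here.
type TokenCache struct {
	mu      sync.Mutex
	entries map[string]Claims
}

func NewTokenCache() *TokenCache {
	return &TokenCache{entries: make(map[string]Claims)}
}

// cacheKey hashes the raw token into a hex string.
func cacheKey(token string) string {
	sum := sha256.Sum256([]byte(token))
	return hex.EncodeToString(sum[:])
}

// Put records claims for a token that has just passed full validation.
func (c *TokenCache) Put(token string, claims Claims) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.entries[cacheKey(token)] = claims
}

// Get returns cached claims if present and not expired at time now.
// Expired entries are evicted, so a cache entry never outlives its token.
func (c *TokenCache) Get(token string, now int64) (Claims, bool) {
	c.mu.Lock()
	defer c.mu.Unlock()
	k := cacheKey(token)
	claims, ok := c.entries[k]
	if !ok {
		return Claims{}, false
	}
	if now >= claims.ExpiresAt {
		delete(c.entries, k)
		return Claims{}, false
	}
	return claims, true
}

func main() {
	cache := NewTokenCache()
	cache.Put("header.payload.signature", Claims{Subject: "alice@example.com", ExpiresAt: 1000})

	_, hit := cache.Get("header.payload.signature", 999)
	fmt.Println(hit) // true: token still valid
	_, hit = cache.Get("header.payload.signature", 1000)
	fmt.Println(hit) // false: entry expired with the token
}
```

A cache hit only skips signature verification. The revocation list would still need to be consulted on every request, otherwise a revoked token stays usable until its cache entry expires.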

Credential Security

  1. Per-Session Isolation:

    • Each session gets unique backend username
    • Credentials never shared across sessions
    • Automatic cleanup on session end
  2. Credential Rotation:

    • Vault rotates every lease_duration (1h default)
    • Background renewal prevents expiration
    • Alert on renewal failures > 3 consecutive
  3. Secret Storage:

    • Never log passwords or tokens (redact in logs)
    • Encrypt session store if storing sensitive data
    • Use platform secret managers (K8s secrets, AWS SM)

Network Security

  1. TLS Everywhere:

    • Client → Proxy: TLS 1.3
    • Proxy → Plugin: mTLS (production) or plaintext (trusted network)
    • Plugin → Vault: TLS with cert pinning
    • Plugin → Backend: TLS if supported
  2. Network Segmentation:

    • Admin API on internal network only (port 8981)
    • Data plane accessible externally (port 8980)
    • Vault on internal network, not internet-facing

Monitoring and Observability

Metrics

Authentication Metrics:

prism_auth_requests_total{result="success|failed",method="jwt|service|admin"}
prism_auth_latency_seconds{method="jwt|service|admin"}
prism_token_validations_total{result="success|expired|invalid"}
prism_permission_checks_total{result="allowed|denied",permission="read|write"}

Session Metrics:

prism_sessions_active{region="us|eu|apac",type="human|service"}
prism_session_duration_seconds{type="human|service"}
prism_session_operations_total{namespace}
prism_session_replication_lag_seconds{source_region,target_region}

Credential Metrics:

prism_vault_auth_total{result="success|failed",method="jwt|k8s|aws"}
prism_credential_fetch_total{result="success|failed",backend="postgres|redis"}
prism_credential_renewals_total{result="success|failed"}
prism_credential_ttl_seconds{backend}

Logging

Audit Events:

{
  "event": "auth.request",
  "timestamp": "2025-11-16T10:00:00Z",
  "trace_id": "a7f3c8d1-9e4b-4f2a-8d6c-3e5f7a9b2c4d",
  "user_id": "alice@example.com",
  "namespace": "team-alpha",
  "operation": "keyvalue.Get",
  "permission": "read",
  "decision": "allowed",
  "latency_ms": 2.3
}

Session Events:

{
  "event": "session.created",
  "session_id": "sess-abc123",
  "user_id": "alice@example.com",
  "region": "us-west-2",
  "ttl_seconds": 86400
}

Credential Events:

{
  "event": "credential.fetched",
  "user_id": "alice@example.com",
  "backend": "postgres",
  "username": "v-jwt-alice-xyz",
  "lease_id": "database/creds/...",
  "lease_duration_seconds": 3600
}

Alerts

Critical:

  • Auth failure rate > 10% (possible attack or misconfiguration)
  • Vault unreachable (credential fetch failures)
  • Session replication lag > 5s (data loss risk)

Warning:

  • Token validation P99 latency > 10ms
  • Credential renewal failures > 3 consecutive
  • Active sessions > 80% capacity
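
Two of the thresholds above (auth failure rate > 10%, more than 3 consecutive renewal failures) are simple enough to spell out. The sketch below only illustrates the conditions; `FailureRateExceeded` and `ConsecutiveFailures` are hypothetical names, and in practice such rules would more likely live in the alerting system than in application code.

```go
package main

import "fmt"

// FailureRateExceeded reports whether failed/total is above the threshold
// (0.10 for the 10% critical alert). No traffic means no alert.
func FailureRateExceeded(failed, total int, threshold float64) bool {
	if total == 0 {
		return false
	}
	return float64(failed)/float64(total) > threshold
}

// ConsecutiveFailures tracks a run of failed credential renewals.
type ConsecutiveFailures struct {
	Limit int // alert once the run is longer than this (3 in the list above)
	run   int
}

// Record notes one renewal outcome and returns true when the alert should fire.
// Any success resets the run.
func (c *ConsecutiveFailures) Record(success bool) bool {
	if success {
		c.run = 0
		return false
	}
	c.run++
	return c.run > c.Limit
}

func main() {
	fmt.Println(FailureRateExceeded(11, 100, 0.10)) // true: 11% is above 10%

	c := &ConsecutiveFailures{Limit: 3}
	// Three failures, a success that resets the run, then four failures.
	for _, ok := range []bool{false, false, false, true, false, false, false, false} {
		fmt.Print(c.Record(ok), " ")
	}
	fmt.Println() // only the final failure, the fourth in a row, reports true
}
```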

Configuration Reference

Proxy Configuration

# prism-proxy.yaml
auth:
  enabled: true

  # JWT validation
  jwt:
    issuer: https://auth.example.com
    audience: prism-api
    jwks_url: https://auth.example.com/.well-known/jwks.json
    cache_ttl: 1h

  # Namespace policies (loaded from config store)
  policies:
    source: database # or: file, http
    refresh_interval: 60s

  # Auth context injection
  context:
    inject_headers: true
    trace_id_generator: uuid_v4

session_store:
  backend: redis-cluster
  config:
    addresses: [redis-1:6379, redis-2:6379, redis-3:6379]
    pool_size: 100

  ttl:
    default: 86400
    sliding_window: true
    max_lifetime: 604800

admin_api:
  enabled: true
  port: 8981

  auth:
    required: true
    issuer: https://auth.example.com
    audience: prism-admin

  rbac:
    policies:
      admin:
        permissions: [admin:read, admin:write, admin:operational, admin:audit]
      operator:
        permissions: [admin:read, admin:operational]
      viewer:
        permissions: [admin:read]
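
The RBAC policies in this configuration amount to a role-to-permission lookup. A minimal sketch of how such a check might look is shown below; the `rolePermissions` table copies the values from the config, while the `Allowed` function and its deny-by-default behaviour for unknown roles are assumptions (admin auth enforcement is not yet implemented).

```go
package main

import "fmt"

// rolePermissions copies the rbac.policies block from the proxy configuration.
var rolePermissions = map[string][]string{
	"admin":    {"admin:read", "admin:write", "admin:operational", "admin:audit"},
	"operator": {"admin:read", "admin:operational"},
	"viewer":   {"admin:read"},
}

// Allowed reports whether the role grants the permission.
// Unknown roles have no entry in the table and are therefore denied.
func Allowed(role, permission string) bool {
	for _, p := range rolePermissions[role] {
		if p == permission {
			return true
		}
	}
	return false
}

func main() {
	fmt.Println(Allowed("operator", "admin:operational")) // true
	fmt.Println(Allowed("viewer", "admin:write"))         // false
	fmt.Println(Allowed("intern", "admin:read"))          // false: unknown role
}
```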

Plugin Configuration

# patterns/redis/config.yaml
auth:
  token:
    enabled: true
    issuer: https://auth.example.com
    audience: prism-patterns
    cache_ttl: 1h

  vault:
    enabled: true
    address: https://vault.internal:8200
    role: prism-redis-plugin
    auth_path: auth/jwt
    secret_path: database/creds/redis-role
    renew_interval: 1800s # 30 minutes

    tls:
      enabled: true
      ca_cert: /etc/prism/vault-ca.pem

Migration Path

Phase 1: Core Auth Infrastructure (Weeks 1-2)

  1. Deploy Vault with JWT/K8s/AWS auth methods
  2. Configure secrets engines for each backend
  3. Set up OIDC provider (Dex for dev, Auth0/Okta for prod)
  4. Deploy session store (Redis Cluster or DynamoDB)

Phase 2: Proxy Integration (Weeks 3-4)

  1. Enable JWT validation in proxy
  2. Implement namespace authorization policies
  3. Add auth context header injection
  4. Deploy to staging, run integration tests

Phase 3: Plugin SDK (Weeks 5-6)

  1. Implement auth SDK in pkg/plugin
  2. Add Vault client with token exchange
  3. Update pattern plugins (keyvalue, consumer, producer)
  4. Test with real backends (Postgres, Redis, Kafka, NATS)

Phase 4: Session Replication (Weeks 7-8)

  1. Deploy session store replicas in EU/APAC
  2. Configure cross-region replication
  3. Test user mobility (connect to different regions)
  4. Measure replication lag, tune consistency

Phase 5: Production Rollout (Weeks 9-10)

  1. Enable auth in production (behind feature flag)
  2. Monitor auth metrics, session metrics
  3. Gradual rollout to applications
  4. Document patterns and troubleshooting

Open Questions

1. Session Affinity vs Session Mobility

Question: Should we prefer session affinity (route user to same proxy) or full mobility?

Options:

  • Affinity: Lower latency (no session store lookup), simpler
  • Mobility: Better load balancing, fault tolerance

Recommendation: Mobility with local caching (best of both worlds)

2. Credential Rotation During Long Operations

Question: What if credentials expire mid-operation (e.g., long scan)?

Options:

  • Fail operation (secure but poor UX)
  • Allow completion (better UX, security risk)
  • Token refresh (complex but best)

Recommendation: Allow completion with a TTL margin (5-minute buffer)

3. Multi-Tenancy Isolation Model

Question: How to isolate sessions across tenants?

Options:

  • Namespace-based (tenant = namespace)
  • User-based (user can access multiple tenants)
  • Hybrid (namespace + user permissions)

Recommendation: Namespace-based with user permissions (see RFC-056)

Implementation Status

What's Implemented ✅

  1. Proxy Layer Authentication (prism-proxy/src/auth.rs):

    • JWT token validation with JWKS from OIDC provider
    • Namespace-based authorization policies
    • Auth context header injection (trace-id, user-id, permission)
    • Support for Read/Write permissions based on gRPC method
    • Test policies for development (team-alpha, team-beta, shared namespaces)
  2. Pattern Plugin SDK (pkg/plugin/):

    • AuthContext extraction from gRPC metadata (zero-boilerplate)
    • Auth logging interceptors (unary and streaming RPCs)
    • Permission and scope checking helpers
    • Integration with pattern runners (keyvalue, consumer, producer)
  3. OIDC Integration (cmd/prismctl/internal/auth/oidc.go):

    • Device code flow for CLI authentication
    • Authorization code flow with local callback server
    • Password grant for testing
    • Token refresh capability
    • Token caching to ~/.prism/token
  4. E2E Integration Testing (tests/testing/auth_integration_test.go):

    • Testcontainers-based Dex backend
    • 10 comprehensive test scenarios
    • Real JWT validation with pattern runners
    • Auth context pass-through verification

What's Documented But Not Implemented ❌

  1. HashiCorp Vault Integration:

    • Token exchange (JWT → Vault token) - Documented in MEMO-008
    • Dynamic credential generation - Design complete
    • Credential renewal/rotation - Architecture defined
    • Status: No Vault SDK imports found in codebase
  2. Service Identity Authentication:

    • Kubernetes ServiceAccount token exchange - Designed
    • AWS IAM role authentication - Designed
    • Azure Managed Identity - Designed
    • GCP Service Account - Designed
    • Status: Code examples in MEMO-008, not implemented
  3. Distributed Session Store:

    • Cross-region session replication - RFC-024 complete
    • Session lifecycle management - Fully designed
    • Redis/DynamoDB/PostgreSQL backends - Not implemented
    • Status: Design complete, zero implementation
  4. Admin API Authentication:

    • RBAC for admin operations - RFC-010 proposed
    • Admin scopes (admin:read, admin:write) - Designed
    • Audit logging - Partially implemented (pattern layer only)
    • Status: Admin API exists, no auth enforcement yet

Implementation Priorities

Phase 1: Production-Ready Auth (Current)

  • ✅ JWT validation at proxy
  • ✅ Namespace authorization
  • ✅ Auth context to plugins
  • ✅ Integration testing

Phase 2: Vault Integration (Next, 4-6 weeks)

  • ❌ Vault client in pkg/authz
  • ❌ JWT → Vault token exchange
  • ❌ Dynamic credential fetching
  • ❌ Background credential renewal
  • ❌ Per-session backend credentials

Phase 3: Service Identity (8-10 weeks)

  • ❌ K8s ServiceAccount auth
  • ❌ AWS IAM role auth
  • ❌ Service-to-service patterns
  • ❌ Long-running service sessions

Phase 4: Distributed Sessions (12-14 weeks)

  • ❌ Session store backend selection
  • ❌ Cross-region replication
  • ❌ Session lifecycle management
  • ❌ User mobility support

Revision History

  • 2025-11-16: Initial unified RFC consolidating auth and session management
  • 2025-11-16: Added implementation status audit - Phase 1 complete, Vault integration pending