Documentation Change Log
Quick access to recently updated documentation. Changes listed in reverse chronological order (newest first).
Recent Changes
2025-11-17
Documentation Feedback Mechanism
Branch: docs/manual-testing-readme-and-memo-049-fix
Summary: Added interactive feedback mechanism to GitHub Pages documentation site. Users can select text and submit feedback directly to GitHub Issues, enabling Claude Code agents to query and act on documentation improvement suggestions.
Features:
- Text Selection Feedback: Select any text on documentation pages to reveal feedback button
- GitHub Integration: Creates structured GitHub Issues with selected text, page context, and user comments
- GitHub Authentication: Only users with repository access can submit feedback (no spam)
- Zero Infrastructure: Pure static site implementation using GitHub Issues as backend
- Claude Code Integration: Python tooling for querying and processing feedback programmatically
Components:
- `docusaurus/src/components/Feedback/`: React components for feedback UI
  - `FeedbackButton.tsx`: Floating button that appears on text selection
  - `FeedbackModal.tsx`: Preview modal before redirecting to GitHub
  - `Feedback.module.css`: Responsive styling with dark mode support
- `docusaurus/src/theme/Root.tsx`: Global integration into all pages
- `tooling/query_doc_feedback.py`: CLI tool for querying feedback issues
Usage:
```bash
# List all open feedback issues
uv run tooling/query_doc_feedback.py list

# Show details for specific issue
uv run tooling/query_doc_feedback.py show 123

# Export feedback as JSON
uv run tooling/query_doc_feedback.py export --output feedback.json

# Close issue after addressing
uv run tooling/query_doc_feedback.py close 123 --comment "Fixed in commit abc123"
```
Issue Format: Feedback issues include:
- Selected text and surrounding context
- Page URL and path
- Structured JSON metadata for automation
- Labels: `doc-feedback`, `needs-review`
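As a sketch, the structured metadata block might look like the following; the field names here are illustrative assumptions, not the exact schema the Feedback components emit:

```python
import json

# Hypothetical sketch of the structured JSON metadata a feedback issue could
# embed; the real field names produced by the Feedback components may differ.
feedback = {
    "selected_text": "Pattern runners acquire leases via admin gRPC",
    "page_url": "https://example.github.io/docs/rfc/rfc-049",
    "page_path": "docs/rfc/rfc-049.md",
    "comment": "Unclear sentence: which component renews the lease?",
    "labels": ["doc-feedback", "needs-review"],
}

# A tool like query_doc_feedback.py can round-trip this block losslessly,
# which is what makes the feedback machine-actionable.
body = json.dumps(feedback, indent=2)
restored = json.loads(body)
assert restored == feedback
```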
Benefits:
- Users can provide specific feedback on confusing sections
- Claude Code can discover areas needing improvement
- No database or backend infrastructure required
- Integrated with existing GitHub workflow
- Trackable and actionable feedback loop
Netflix Video Reference: Database Migrations at Scale
Summary: Added AWS re:Invent 2023 video reference featuring Netflix engineers discussing strategies for safely migrating databases that handle millions of requests per second.
New Document:
- Video: Database Migrations at Millions RPS: Zero-downtime migration patterns, dual-write strategies, and shadow traffic validation
Updates:
- Netflix index page now includes direct links to all three video references
Unified Authentication and Session Management (RFC-062)
Branch: massive-scale-graph-rfcs
Summary: Comprehensive RFC unifying Prism's authentication and session management across human users, service identities, and admin operations. Consolidates existing work from RFC-010, RFC-011, RFC-019, RFC-024, MEMO-008, and MEMO-041 into a single reference with clear implementation status.
Key Document:
- RFC-062: Unified Authentication and Session Management:
- End-to-end authentication flows for humans (OIDC/JWT) and services (K8s SA, AWS IAM, Azure MI, GCP SA)
- Complete session lifecycle management from establishment to teardown
- Credential hierarchy: client auth → proxy-plugin auth → Vault → backend credentials
- Cross-region session replication for global user mobility
- Defense-in-depth authorization at proxy and plugin layers
- Detailed implementation guide with working code examples
Implementation Status Audit:
Phase 1 Complete (Production-Ready Auth):
- JWT validation at proxy layer (`prism-proxy/src/auth.rs`)
- Namespace-based authorization policies with test users
- Auth context injection via HTTP/2 headers (trace-id, user-id, permission)
- Zero-boilerplate plugin SDK (`pkg/plugin/auth_context.go`, `auth_interceptor.go`)
- E2E integration testing with Dex OIDC provider (`tests/testing/auth_integration_test.go`)
Pending Implementation:
- HashiCorp Vault integration (JWT → Vault token exchange, dynamic credentials)
- Service identity authentication (K8s SA, AWS IAM, Azure MI, GCP SA)
- Distributed session store with cross-region replication
- Admin API RBAC enforcement
Architecture Highlights:
- Token Validation: JWT signature verification with JWKS from OIDC provider
- Authorization: Namespace-based Read/Write permissions at proxy layer
- Audit Logging: User identity and trace IDs propagated to pattern runners
- Zero-Copy Forwarding: Auth headers injected via HTTP/2 HEADERS frame modification
- Pluggable Backends: Support for multiple session stores (Redis, DynamoDB, PostgreSQL)
Security Features:
- Per-session backend credentials (never shared across sessions)
- Automatic credential rotation (every lease_duration/2)
- Token refresh for long-running sessions
- Defense-in-depth with proxy + plugin authorization
- Comprehensive audit trail with distributed tracing
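The `lease_duration/2` rotation rule above can be sketched as a timing calculation; this is an illustrative model, not the actual rotation code:

```python
from datetime import datetime, timedelta, timezone

def next_rotation(issued_at: datetime, lease_duration_s: int) -> datetime:
    """Rotate at lease_duration/2: a failed renewal still leaves half the
    lease to retry before the backend credential actually expires."""
    return issued_at + timedelta(seconds=lease_duration_s / 2)

issued = datetime(2025, 11, 17, 12, 0, 0, tzinfo=timezone.utc)
rotate_at = next_rotation(issued, lease_duration_s=3600)  # one-hour lease
assert rotate_at == issued + timedelta(minutes=30)
```

Rotating at the half-life is a common Vault-client heuristic: it bounds the window in which a stale credential is in use while leaving generous retry headroom.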
Related Documents:
- Supersedes RFC-010: Admin Protocol with OIDC
- Supersedes RFC-011: Data Proxy Authentication
- Supersedes RFC-019: Plugin SDK Authorization Layer
- Integrates with RFC-024: Distributed Session Store Pattern
- Leverages MEMO-008: Vault Token Exchange Flow
- Built on MEMO-041: Auth Integration Testing Guide
Remove Manual Testing Directory
Branch: docs/manual-testing-readme-and-memo-049-fix
Summary: Removed testing/manual/ directory and associated documentation. Manual testing should be performed using actual Prism tooling (prismctl, acceptance tests) rather than standalone bash scripts.
Changes:
- Removed `testing/manual/` directory and README
- Updated `prism-proxy/TRANSPARENT_PROXY.md` to reference automated tests
- Redirected users to proper testing infrastructure:
  - Unit tests: `cargo test` (Rust), `go test` (Go)
  - Integration tests: `task test:integration-all`
  - Acceptance tests: `task test:acceptance-all`
  - User acceptance: `prismctl local start` + manual interaction
Rationale:
- Manual bash scripts lacked assertions and proper error handling
- Not integrated with CI/CD pipeline
- Duplicated functionality covered by automated tests
- User acceptance testing should use real Prism tooling, not test scripts
- See MEMO-040 for historical context
2025-11-15
Massive-Scale Graph RFCs for 100B Vertices
Branch: massive-scale-graph-rfcs
Summary: Comprehensive set of 5 RFCs defining distributed graph architecture for massive-scale graphs with 100 billion vertices, trillions of edges, hot/cold storage tiers, distributed query execution, and fine-grained authorization.
Key RFCs:
- RFC-057: Massive-Scale Graph Sharding:
  - Hierarchical 3-tier sharding (Cluster → Proxy → Partition)
  - Supports 100B vertices across 1000+ lightweight nodes
  - Locality-aware partitioning strategies (consistent hashing, key-range, label-based)
  - Dynamic partition rebalancing without downtime
  - Query routing and cross-partition coordination
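A rough sketch of hash-based partition routing (a plain modulo here, where RFC-057 would use consistent hashing to limit remapping on topology change; the constants are illustrative):

```python
import hashlib

NUM_PARTITIONS = 256   # 0-255 partition space, matching RFC-048
NUM_PROXIES = 1000     # illustrative; RFC-057 targets 1000+ lightweight nodes

def route_vertex(vertex_id: str) -> tuple[int, int]:
    """Map a vertex ID to (partition, proxy) with a stable hash so any node
    can compute the owner without consulting a central registry."""
    digest = hashlib.sha256(vertex_id.encode()).digest()
    partition = digest[0]                               # first byte: 0-255
    proxy = int.from_bytes(digest[1:5], "big") % NUM_PROXIES
    return partition, proxy

# Routing is deterministic: every caller computes the same owner.
assert route_vertex("user:12345") == route_vertex("user:12345")
```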
- RFC-058: Multi-Level Graph Indexing:
  - 4-tier index hierarchy (Partition → Proxy → Cluster → Global)
  - Online index building without blocking queries
  - Distributed WAL for incremental index updates
  - Bloom filter cascade for fast negative lookups (333× speedup)
  - Inverted edge indexes for backward traversals (972,000× speedup)
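The Bloom filter cascade works because a negative answer is definitive: a partition whose filter says "absent" can be skipped without any storage read. A minimal single-level sketch (sizes and hashing scheme are illustrative):

```python
import hashlib

class BloomFilter:
    """Tiny Bloom filter: no false negatives, so 'absent' means the
    partition can be pruned from the lookup entirely."""
    def __init__(self, size_bits: int = 8192, hashes: int = 3):
        self.size = size_bits
        self.hashes = hashes
        self.bits = bytearray(size_bits // 8)

    def _positions(self, key: str):
        # Derive k positions from salted SHA-256 digests.
        for i in range(self.hashes):
            h = hashlib.sha256(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(h[:4], "big") % self.size

    def add(self, key: str):
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, key: str) -> bool:
        return all(self.bits[p // 8] & (1 << (p % 8)) for p in self._positions(key))

bf = BloomFilter()
bf.add("vertex:42")
assert bf.might_contain("vertex:42")   # present keys are never missed
```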
- RFC-059: Hot/Cold Storage Tiers with S3 Snapshots:
  - 95% cost reduction vs pure in-memory ($12.5k/month vs $105k/month)
  - Multi-format snapshot loading (Parquet, Prometheus/Thanos, HDFS, Protobuf, JSON Lines)
  - Parallel snapshot loading: 210 TB in 17 minutes (1000 workers)
  - ML-based hot/cold classification with automatic tier management
  - Distributed WAL with multi-tier updates
- RFC-060: Distributed Gremlin Query Execution:
  - Full Apache TinkerPop Gremlin specification support
  - Partition pruning using multi-level indexes (10-100× speedup)
  - Adaptive parallelism based on intermediate result sizes
  - Cost-based query optimization with selectivity estimation
  - Result streaming without full materialization
- RFC-061: Graph Authorization with Vertex Labels:
  - Vertex-level fine-grained access control
  - Label-based authorization (tags like "pii", "confidential", "org:acme")
  - Hierarchical clearances with wildcard matching (`org:acme:**`)
  - Authorization push-down to partition level (<100 μs overhead)
  - Comprehensive audit logging for compliance (GDPR, SOC2)
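Wildcard clearance matching can be sketched like this; the semantics shown (`*` matches one segment, `**` matches any remaining suffix) are assumptions for illustration, not RFC-061's exact rules:

```python
def clearance_matches(clearance: str, label: str) -> bool:
    """Check a hierarchical clearance against a vertex label.
    Assumed semantics: segments are ':'-separated, '*' matches exactly
    one segment, '**' matches everything at or below that point."""
    c_parts = clearance.split(":")
    l_parts = label.split(":")
    for i, c in enumerate(c_parts):
        if c == "**":
            return True            # matches the whole remaining subtree
        if i >= len(l_parts):
            return False           # label is shorter than the clearance
        if c != "*" and c != l_parts[i]:
            return False
    return len(c_parts) == len(l_parts)

assert clearance_matches("org:acme:**", "org:acme:payments:ledger")
assert not clearance_matches("org:acme:**", "org:other:payments")
assert clearance_matches("pii", "pii")
```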
Architecture Highlights:
- Scale: 100B vertices, 10T edges, 1000 proxies, 210 TB total data
- Performance: Sub-second queries for common traversals
- Cost: 86% reduction via hot/cold tiering
- Security: Fine-grained authorization at vertex level
Use Cases:
- Global social network graphs (Facebook/Meta scale)
- Financial transaction networks (SWIFT/Visa scale)
- IoT device networks (50B+ devices)
- Knowledge graphs at web scale (Google KG scale)
Related Documents:
- Extends RFC-055: Graph Pattern
- Integrates with RFC-048: Cross-Proxy Partition Strategies
- Leverages RFC-046: Consolidated Pattern Protocols
- Uses RFC-051: Write-Ahead Log Pattern
Graph Implementation Readiness Assessment (MEMO-081)
Branch: massive-scale-graph-rfcs
Summary: Comprehensive readiness assessment for implementing massive-scale graph features (100B vertices) in prism-data-layer, identifying 15 critical gaps across 5 implementation phases with 16-20 week timeline.
Key Document:
- MEMO-081: Graph Implementation Readiness Assessment:
- Analysis of RFC-057 through RFC-061 requirements against current codebase
- 15 critical gaps across distributed systems, storage, query infrastructure, authorization, and graph features
- Phase-by-phase implementation roadmap with acceptance criteria
- Risk assessment and mitigation strategies
- Testing strategy with coverage targets and success criteria
Implementation Phases:
- Phase 1: Distributed Systems Foundations (4-6 weeks)
  - Hierarchical partition registry (cluster → proxy → partition)
  - Distributed routing table for opaque vertex IDs
  - Distributed coordination service (etcd/Raft)
  - Distributed write-ahead log (Kafka)
- Phase 2: Storage Tier Infrastructure (4-5 weeks)
  - Hot/cold/warm storage tier management
  - Snapshot loading infrastructure (Parquet, Protobuf, JSON Lines)
  - S3 multi-tier caching (Varnish → CloudFront → S3 Express)
- Phase 3: Graph Query Infrastructure (4-5 weeks)
  - Multi-level index infrastructure (hash, range, inverted, edge)
  - Gremlin query parser and planner
  - Distributed query execution engine
- Phase 4: Authorization and Observability (3-4 weeks)
  - Label-based access control (LBAC)
  - Audit logging infrastructure
  - Distributed tracing and observability
- Phase 5: Graph-Specific Features (4-6 weeks)
  - Super-node handling (sampling, circuit breakers)
  - Dynamic partition rebalancing
Timeline: 16-20 weeks (4-6 months) of foundational work before graph implementation can begin
Risk Assessment:
- High-risk: Distributed coordination, WAL replay consistency, S3 cost explosion
- Mitigations: Use etcd (proven), comprehensive testing, mandatory caching
Recommendation: Invest 16-20 weeks in solid foundations rather than attempt graph implementation on inadequate infrastructure
20-Week Implementation Plan and MEMOs (MEMO-050 through MEMO-080)
Branch: massive-scale-graph-rfcs
Summary: 24 MEMOs documenting 20 weeks of comprehensive work on massive-scale graph architecture, including production readiness analysis, RFC editing, copy editing, and production preparation.
Weeks 1-8: Foundation and Validation:
- MEMO-050: Production Readiness Analysis (2,011 lines)
  - 18 critical findings across cost modeling and operational concerns
  - True TCO: $47M/year (not $7M) due to S3 request costs
  - Network topology awareness: $365M/year savings with AZ-aware placement
- MEMO-051: RFC Edit Summary (1,678 lines)
  - 15 specific edits across RFCs 057-061
  - Code examples and configuration snippets
  - Priority-ordered action items (P0 → P1 → P2)
- MEMO-052: Eight-Week Implementation Plan (1,345 lines)
  - Extended 12-week timeline for thorough implementation
  - Weeks 1-6: RFC implementation (2-3 edits per week)
  - Weeks 7-8: Validation and integration testing
  - Weeks 9-12: Extended copy editing
Weeks 9-12: Copy Editing and Readability:
- MEMO-061: Week 9 Heading Hierarchy Audit (353 lines)
- MEMO-062: Week 9 Paragraph Structure Review (332 lines)
- MEMO-063: Week 9 Code Example Placement (508 lines)
- MEMO-064: Week 9 Table Diagram Review (679 lines)
- MEMO-065: Week 10 Line-Level Copy Edit (681 lines)
- MEMO-066: Week 11 Terminology Consistency (454 lines)
- MEMO-067: Week 11 Code Style Consistency (546 lines)
- MEMO-068: Week 11 Final Consistency Review (438 lines)
- MEMO-069: Week 12 Executive Summary Review (503 lines)
- MEMO-070: Week 12 Technical Section Review (1,036 lines)
- MEMO-071: Week 12 Operations Review (899 lines)
- MEMO-072: Week 12 Final Readability Review (807 lines)
Weeks 13-20: Production Preparation:
- MEMO-073: Week 13 Storage Backend Evaluation (1,383 lines)
- MEMO-074: Week 14 Performance Benchmarking (837 lines)
- MEMO-075: Week 15 Disaster Recovery (899 lines)
- MEMO-076: Week 16 Cost Analysis (1,002 lines)
- MEMO-077: Week 17 Network Infrastructure (1,562 lines)
- MEMO-078: Week 18 Observability Stack (2,340 lines)
- MEMO-079: Week 19 Development Tooling (1,647 lines)
- MEMO-080: Week 20 Infrastructure Gaps (1,903 lines)
Total Documentation: 24 MEMOs, ~20,000 lines covering production readiness, copy editing, and implementation planning
2025-01-15
Template Consolidation and Author Field Standardization
Branch: main
Summary: Consolidated duplicate ADR templates into single canonical template and added author field to all ADRs for consistent attribution.
Key Changes:
- Template Consolidation:
  - Merged `docs-cms/templates/ADR-TEMPLATE.md` into `docs-cms/adr/adr-000-template.md`
  - Removed entire `docs-cms/templates/` directory
  - Single canonical ADR template with comprehensive sections
  - Enhanced template with evaluation criteria and best practices
- Author Field Addition:
  - Added `author` field to all 62 ADRs
  - Set `author="Jacob Repp"` across all ADRs
  - Maintains existing `deciders` field for decision ownership
  - ADR README.md updated with author and deciders fields
Template Improvements:
- Added evaluation criteria section for objective decision-making
- Enhanced consequences section (Positive/Negative/Neutral)
- Better structured alternatives considered section
- Clearer frontmatter with inline generation snippets
- Cross-references to ADR best practices guide
Benefits:
- Single Source of Truth: One canonical ADR template for all new ADRs
- Consistent Attribution: Clear authorship across all documentation
- Simplified Maintenance: Fewer templates to keep in sync
- Better Guidance: Enhanced template with evaluation framework
Files Deleted:
- `docs-cms/templates/ADR-TEMPLATE.md`
- `docs-cms/templates/MEMO-TEMPLATE.md`
- `docs-cms/templates/RFC-TEMPLATE.md`
Files Modified:
- `docs-cms/adr/adr-000-template.md` (consolidated and enhanced)
- 62 ADR files (added author field)
- `docs-cms/adr/README.md` (added author and deciders)
Related:
- Complements PR #135: ADR Best Practices Guide
- Works with PR #136: Timestamp Standardization
- Uses tool from PR #137: Bulk Frontmatter Update Tool
Standardize All Documents to Use Created/Updated Timestamps
Branch: main
Summary: Migrated all document types (ADRs, RFCs, MEMOs) to use consistent created and updated timestamp fields based on git history.
Breaking Change: ADRs now use created and updated fields instead of single date field.
Key Changes:
- Template Updates: All templates now use `created` and `updated` fields
  - ADR templates: Migrated from single `date` field to `created`/`updated`
  - RFC/MEMO templates: Already used this format (no changes needed)
  - Clear comments explaining when to update each field
- Script Enhancement: Updated `tooling/update_doc_timestamps.py`
  - Automatic migration from legacy `date` field to `created`/`updated`
  - Unified handling for all document types
  - Smart detection and conversion of old format
- Schema Update: Updated `tooling/doc_schemas.py`
  - ADR validation now requires `created` and `updated` fields
  - Removed legacy `date` field requirement
  - Consistent validation across all document types
- Mass Migration: Converted 60 ADRs from `date` to `created`/`updated`
Benefits:
- Consistent timestamp format across all document types
- Tracks both creation date (immutable) and last modification (updates)
- Git history remains source of truth
- Better tracking of document lifecycle
- Enables queries like "recently updated" vs "recently created"
Migration Details:
- 60 ADRs migrated from legacy format
- 141 total documents updated with git-based timestamps
- All documents validated successfully
- No manual intervention required
Related Files:
- Templates: `docs-cms/adr/adr-000-template.md`, `docs-cms/templates/ADR-TEMPLATE.md`
- Script: `tooling/update_doc_timestamps.py` (enhanced)
- Schema: `tooling/doc_schemas.py` (updated)
Automated Document Timestamp Management
Branch: main
Summary: Enhanced template date fields with generation snippets and created automated tooling to sync document timestamps with git history.
Key Features:
- Template Improvements: Updated all templates with clearer date generation instructions
- ADR template: Clarified that 'date' field tracks status changes
- RFC/MEMO templates: Already had detailed 'created' and 'updated' field comments
- Added multi-line comments for better readability
- Git-based Timestamp Script: New `tooling/update_doc_timestamps.py`
  - Uses git log to find document creation date (first commit)
  - Uses git log to find last modification date (most recent commit)
  - ADRs: Updates 'date' field to last modification
  - RFCs/MEMOs: Updates 'created' to first commit, 'updated' to last modification
  - Supports dry-run mode for safe preview
  - Verbose output for tracking changes
- Mass Update: Updated 145 of 165 documents with accurate git-based timestamps
Usage:
```bash
# Preview changes
uv run tooling/update_doc_timestamps.py --dry-run -v

# Update all documents
uv run tooling/update_doc_timestamps.py

# Update specific file
uv run tooling/update_doc_timestamps.py --path docs-cms/adr/adr-001-rust-for-proxy.md
```
Benefits:
- Accurate timestamps reflecting true document history
- Automated maintenance reduces manual errors
- Templates now have clearer instructions for new documents
- Git history becomes source of truth for document lifecycle
- Supports both date formats (single 'date' for ADRs, 'created'/'updated' for RFCs/MEMOs)
Updated Files:
- Template: `docs-cms/adr/adr-000-template.md` (improved comments)
- Script: `tooling/update_doc_timestamps.py` (new)
- 145 documents updated with accurate git-based timestamps
Bulk Frontmatter Update Tool
Branch: main
Summary: Created general-purpose command-line tool for bulk updates to YAML frontmatter fields across documentation, enabling safe and efficient metadata management.
Key Features:
- Four Operations:
  - `set`: Update or create field (overwrites existing)
  - `add`: Create field only if it doesn't exist
  - `remove`: Delete field from frontmatter
  - `rename`: Change field name while preserving value
- Flexible Filtering:
  - Filter by document type (`--type adr|rfc|memo`)
  - Filter by glob patterns (`--glob "adr-0*"`)
  - Specify single file (`--path docs-cms/adr/adr-001.md`)
- Safety Features:
  - Dry-run mode (`--dry-run`) for safe preview
  - Verbose output (`-v`) for tracking changes
  - Validation after updates
- Batch Processing: Efficiently updates hundreds of files in seconds
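The semantics of the four operations can be sketched on an already-parsed frontmatter dict; this is an illustrative model only — the real tool also handles YAML I/O, glob filtering, and dry-run:

```python
# Illustrative sketch of the four frontmatter operations. Everything below
# assumes frontmatter has already been parsed into a plain dict.
def apply_op(fm: dict, op: str, field: str, value=None, new_name=None) -> dict:
    if op == "set":                        # overwrite or create
        fm[field] = value
    elif op == "add":                      # create only if absent
        fm.setdefault(field, value)
    elif op == "remove":
        fm.pop(field, None)
    elif op == "rename" and field in fm:   # keep the value, change the key
        fm[new_name] = fm.pop(field)
    return fm

fm = {"status": "Proposed", "old_name": "x"}
apply_op(fm, "set", "status", "Accepted")
apply_op(fm, "add", "status", "Draft")     # no-op: field already exists
apply_op(fm, "rename", "old_name", new_name="new_name")
assert fm == {"status": "Accepted", "new_name": "x"}
```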
Usage Examples:
```bash
# Set status to Accepted for all ADRs
uv run tooling/bulk_update_frontmatter.py --type adr --set status=Accepted

# Add priority field to specific RFCs
uv run tooling/bulk_update_frontmatter.py --glob "rfc-04*" --add priority=high

# Remove deprecated field from all docs
uv run tooling/bulk_update_frontmatter.py --remove deprecated

# Rename field across all memos
uv run tooling/bulk_update_frontmatter.py --type memo --rename old_name=new_name

# Preview changes without modifying files
uv run tooling/bulk_update_frontmatter.py --type adr --set author="Jacob Repp" --dry-run
```
Benefits:
- Consistency: Update metadata across all documents uniformly
- Efficiency: Bulk operations replace tedious manual edits
- Safety: Dry-run mode prevents accidental changes
- Flexibility: Multiple operations and filtering options
- Maintainability: Single tool for all frontmatter management
Use Cases:
- Standardize author/decider fields across documentation
- Update status for multiple related ADRs
- Add new required fields to existing documents
- Remove deprecated fields during schema migrations
- Rename fields for consistency
Files Created:
- `tooling/bulk_update_frontmatter.py` (411 lines)
Related Tools:
- `tooling/update_doc_timestamps.py` - Git-based timestamp management
- `tooling/validate_docs.py` - Document validation
ADR Best Practices Guide
Branch: main
Summary: Comprehensive guide for creating, reviewing, and maintaining Architecture Decision Records (ADRs) in Prism, adapted from industry best practices.
Key Features:
- When to Create ADRs: Clear criteria for significant impact, long-term ramifications, and cross-team decisions
- Evaluation Framework: Explicit scoring rubric with weighted criteria for objective decision-making
- Do's and Don'ts: Practical guidance on writing effective ADRs
- DO: Keep brief (≤800 words), explain trade-offs, be honest about cons
- DON'T: Hide doubts, make it a sales pitch, skip deep investigation
- Status Lifecycle: Clear progression from Proposed → Accepted → Deprecated/Superseded
- Review Process: 3-5 day timeline with stakeholder engagement
- Integration with Workflow: Commands for creating, validating, and linking ADRs
New Resources:
- Guide: `docs-cms/guides/adr-best-practices.md`
- Slash Command: `/adr-guide` for quick reference
- Updated `CLAUDE.md` with ADR section references
Benefits:
- Forces thoughtful evaluation with explicit criteria and weights
- Improves team communication with documented rationale
- Prevents analysis paralysis ("pretty good now" beats "perfect later")
- Creates institutional knowledge that survives team changes
- Encourages honesty about trade-offs and uncertainties
Related:
- Template: `docs-cms/adr/adr-000-template.md`
- Example: ADR-049: Podman Optimization
2025-11-14
RFC-056: Unified Configuration Model
Branch: main
Summary: Consolidates and clarifies Prism's configuration story across all layers: user-facing namespace configuration, operator-managed backend/frontend registries, developer workflow, and authorization boundaries.
Key Contributions:
- Terminology Standardization: Establishes canonical terms across all docs
- "pattern" (not "api" or "client_api")
- "needs" (not "requirements" or "guarantees")
- "slot" (not "backend_requirement")
- "capability" vs "interface" distinction clarified
- Six-Layer Configuration Model: Clear hierarchy from user request to runtime execution
- Layer 1: User Request (Application Owner)
- Layer 2: Platform Policy (Authorization & Quotas)
- Layer 3: Pattern Selection (Platform Intelligence)
- Layer 4: Backend Binding (Operator-Managed)
- Layer 5: Frontend Exposure (Operator-Managed)
- Layer 6: Runtime Execution (Pattern Runners & Proxies)
- Authorization Consolidation: Unified permission model
- 3 Permission Levels: Guided (default), Advanced (approved), Expert (platform team)
- Team Quotas: Aggregate limits across namespaces
- Topaz Policy Integration: Fine-grained RBAC/ABAC enforcement
- Multi-Tenancy Model: Clear hierarchy and relationships
- Tenant → Team → Namespace hierarchy
- Cross-namespace patterns (shared backends, pub/sub communication)
- Cross-team access restrictions with policy-based exceptions
- Role-Based Responsibilities: Explicit definitions for each persona
- Application Owner (User): Controls pattern, needs, access policies
- Platform Operator (Operator): Manages backends, frontends, team quotas
- Platform Developer (Developer): Implements patterns, drivers, adapters
- Configuration Precedence Rules: Clear override hierarchy
- Platform Policy > User Request > Pattern Defaults > Backend Capabilities > Global Defaults
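The precedence chain above amounts to a layered merge, applied from lowest precedence to highest so later layers overwrite earlier ones. A minimal sketch with illustrative keys (not RFC-056's actual schema):

```python
# Later entries in this list are HIGHER precedence; we merge lowest-first
# so each higher layer overwrites the layers beneath it.
layers = [
    ("global_defaults",      {"retention_days": 7,  "max_rps": 100}),
    ("backend_capabilities", {"max_rps": 500}),
    ("pattern_defaults",     {"retention_days": 30}),
    ("user_request",         {"retention_days": 90, "max_rps": 1000}),
    ("platform_policy",      {"max_rps": 750}),   # quota caps the user request
]

effective: dict = {}
for _name, config in layers:
    effective.update(config)

assert effective == {"retention_days": 90, "max_rps": 750}
```

Note how the platform-policy layer wins over the user's requested `max_rps=1000`, matching the rule that Platform Policy outranks User Request.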
Documents Consolidated:
- ADR-002: Client-Originated Configuration
- ADR-006: Namespace and Multi-Tenancy
- ADR-007: Authentication and Authorization
- ADR-022: Dynamic Client Configuration
- ADR-050: Topaz Policy Authorization
- RFC-014: Layered Data Access Patterns
- RFC-027: Namespace Configuration Client Perspective
- RFC-039: Backend Configuration Registry
- MEMO-006: Backend Interface Decomposition
Benefits:
- Single consistent vocabulary across all configuration documents
- Clear understanding of who controls what configuration
- Eliminates confusion between different permission/authorization models
- Provides mental model for configuration flow from user intent to execution
- No breaking changes - consolidation only
Related:
Documentation Updates: RFC-056 Consistency and Cross-Linking
Branch: main
Summary: Updated 8 core configuration documents with RFC-056 cross-references and standardized terminology.
Documents Updated:
- ADR-002: Client-Originated Configuration - Added permission level references
- ADR-006: Namespace and Multi-Tenancy - Added multi-tenancy hierarchy reference
- ADR-007: Authentication and Authorization - Added authorization model reference
- ADR-050: Topaz Policy Authorization - Added Topaz integration reference
- RFC-014: Layered Data Access Patterns - Updated to Layer 3 reference
- RFC-027: Namespace Configuration - Standardized "pattern" vs "client_api" terminology
- RFC-039: Backend Configuration Registry - Added Layer 4 & 5 reference
- MEMO-006: Backend Interface Decomposition - Added backend binding reference
Key Changes:
- Terminology Standardization: Consistent use of "pattern" (not "client_api"), "needs" (not "requirements")
- Cross-Linking: Added 35 new cross-references between related documents (627 total links)
- Consolidation Notes: Each document now references RFC-056 for the complete unified model
- Related Documents: Comprehensive cross-reference sections added to all documents
Benefits:
- Easier navigation between related concepts
- Clear indication of how documents relate to unified model
- Consistent vocabulary across entire documentation set
2025-11-11
Testing: Migrate Manual Tests to Automated Go Integration Tests
Branch: migrate-manual-tests-to-automated
Summary: Migrated manual bash test scripts to automated Go integration tests and enhanced e2e test suite with comprehensive test coverage.
Key Changes:
- Enhanced E2E Tests (`tests/e2e/transparent_proxy_test.go`):
  - Added "Expire existing key" test case covering TTL operations
  - Added "Complete CRUD sequence" test covering full lifecycle
  - Tests now cover all scenarios from manual test scripts
- New Test Tasks (`testing/Taskfile.yml`):
  - `test:e2e-transparent-proxy` - Run transparent proxy E2E tests
  - `test:e2e-all` - Run all E2E tests with infrastructure
  - Updated `test:all` to include e2e tests in comprehensive suite
  - Updated help output to document e2e test tasks
- Documentation Templates Updated:
  - Added UUID generation snippets: `uuidgen | tr '[:upper:]' '[:lower:]'` or `python -c "import uuid; print(uuid.uuid4())"`
  - Added date generation snippets: `date +%Y-%m-%d` or `python -c "from datetime import date; print(date.today())"`
  - Updated all templates: ADR-TEMPLATE.md, RFC-TEMPLATE.md, MEMO-TEMPLATE.md, adr-000-template.md
Benefits:
- Automated execution in CI/CD pipeline
- Proper assertions and error handling
- Better integration with test infrastructure
- Part of comprehensive test suite
- Eliminates need for manual bash scripts
Replaced Manual Scripts:
- `test_transparent_proxy.sh` - Basic KeyValue operations
- `test_ttl_operations.sh` - TTL operations (Expire, GetTTL, Persist)
- `test_transparent_proxy_local.sh` - Full stack integration
2025-10-27
RFC-049: Mailbox Lifecycle, Lease Management, and Routing Coordination
Branch: jrepp/shared-acceptance-patterns
Summary: Comprehensive RFC exploring mailbox lifecycle management, TTL-based lease coordination, and routing infrastructure. Recommends raft-based admin plane over etcd or Redis for distributed coordination, aligning with project philosophy of minimal infrastructure dependencies and local-first testing.
Key Decisions:
- Lease Backend: Raft-based admin plane (recommended)
- Zero new infrastructure dependencies
- Leverages existing admin control plane (ADR-055)
- Consistent with namespace lease management (RFC-047)
- Aligns with local-first testing (ADR-004)
- Strong consistency via raft consensus
- Rejected Alternatives:
- etcd: Too complex (3-5 node cluster), conflicts with minimal dependencies
- Redis: Weak consistency, key expiration imprecise, split-brain risk
Architecture:
- Routable Identity: `{namespace}@{proxy_id}` format (e.g., `$admin@proxy-01`)
- Lease Protocol: Pattern runners acquire/renew/release leases via admin gRPC
- Session Management: Heartbeat every 60s, TTL 300s, grace period 60s
- Routing Coordination: Admin distributes mailbox routing table to all proxies
- Query Forwarding: Proxies forward mailbox queries to owning pattern runner
- Partition Alignment: Uses RFC-048 consistent hashing (0-255 partitions)
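The identity format and lease timing described above (heartbeat every 60s, TTL 300s, grace period 60s) can be sketched as a toy model; the actual admin-plane logic is not this Python:

```python
from dataclasses import dataclass

def parse_identity(identity: str) -> tuple[str, str]:
    """Split a routable identity like '$admin@proxy-01' into
    (namespace, proxy_id) for routing-table lookups."""
    namespace, proxy_id = identity.split("@", 1)
    return namespace, proxy_id

@dataclass
class Lease:
    last_heartbeat: float   # seconds on a monotonic clock
    ttl: float = 300.0      # lease TTL from the RFC
    grace: float = 60.0     # cleanup grace period

    def is_expired(self, now: float) -> bool:
        # A lease is cleaned up only after TTL *plus* grace, so a runner
        # that misses one 60s heartbeat can still recover its mailbox.
        return now - self.last_heartbeat > self.ttl + self.grace

assert parse_identity("$admin@proxy-01") == ("$admin", "proxy-01")
lease = Lease(last_heartbeat=0.0)
assert not lease.is_expired(now=300.0)   # within TTL
assert not lease.is_expired(now=350.0)   # TTL lapsed but within grace
assert lease.is_expired(now=361.0)       # past TTL + grace: cleaned up
```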
Protocol Operations:
```protobuf
service ControlPlane {
  rpc AcquireLease(AcquireLeaseRequest) returns (AcquireLeaseResponse);
  rpc ReleaseLease(ReleaseLeaseRequest) returns (ReleaseLeaseResponse);
  rpc MailboxHeartbeat(MailboxHeartbeatRequest) returns (MailboxHeartbeatResponse);
  rpc UpdateMailboxRoute(MailboxRouteUpdate) returns (MailboxRouteUpdateAck);
}
```
Storage Schema (ADR-054):
- `mailbox_leases` table: Lease state with TTL tracking
- `mailbox_routes` table: Routing table distributed to proxies
- Raft log: Linearizable lease operations
Benefits:
- Simplified Deployment: No etcd/Redis clusters to manage
- Built-In Coordination: Raft consensus already in admin plane
- Routable Queries: Clients query any proxy, forwarded to correct runner
- Automatic Cleanup: Expired leases removed after grace period
- Local-First Testing: Single admin process for dev/test
- Unified Operations: Same monitoring/metrics for all leases
Trade-Offs Accepted:
- Admin plane becomes critical path (already true for namespaces)
- Lease performance ~5ms vs ~1ms Redis (acceptable for control plane)
Related Documents:
- RFC-049: Mailbox Lifecycle, Lease Management, and Routing Coordination
- RFC-037: Mailbox Pattern - Searchable Event Store
- RFC-047: Namespace Reservation with Lease Management
- RFC-048: Cross-Proxy Partition Strategies
- ADR-055: Proxy-Admin Control Plane Protocol
- ADR-054: Prism-Admin SQLite Storage
Dual Schema Configuration for Envelope and Payload Validation
Branch: jrepp/shared-acceptance-patterns
Summary: Added comprehensive schema configuration for both PrismEnvelope (universal wrapper) and payload schemas (application-specific messages) to namespace configuration. Renamed MailboxEvent to MailboxItem for semantic clarity.
Key Changes:
- RFC-031: Dual Schema Configuration
- Added "Schema Context Integration" section with envelope + payload schemas
  - Namespace config now includes `schemas.envelope` (PrismEnvelope) and `schemas.payload` (application schemas)
  - Schema validation modes: `strict`, `warn`, `disabled`
  - Multiple payload schemas per namespace with topic mapping
  - Example: `OrderCreated`, `OrderUpdated`, `OrderCancelled` schemas for `order-events` namespace
- RFC-037: Mailbox Pattern Schema Validation
- Added complete schema configuration example with admin events
- New "Schema Validation" section explaining validation flow
- Validates both envelope structure and payload schema before storage
- Schema enforcement modes with use case recommendations
  - Updated indexed_headers to include `trace_id` for observability
- Naming Improvement: `MailboxEvent` → `MailboxItem`
  - Renamed across both RFC-031 and RFC-037
  - Better semantic clarity: stored items aren't strictly "events"
Schema Configuration Example:
```yaml
namespaces:
  - name: order-events
    schemas:
      envelope:
        url: github.com/prism-project/prism/proto/envelope/v1/envelope.proto
        version: v1
        validation: strict
      payload:
        url: github.com/myorg/schemas/order-events/v2
        version: v2
        schemas:
          - topic: orders.created
            message_name: OrderCreated
            validation: strict
            compatibility: backward
```
Benefits:
- Type Safety: Both envelope and payload validated before processing
- Schema Evolution: Compatibility mode ensures backward/forward compatibility
- Error Detection: Invalid messages rejected at ingestion time
- Documentation: Schema URLs serve as living documentation
- Multi-Topic: Single namespace can handle multiple message types
Related Documents:
- RFC-031: Universal Message Envelope Protocol
- RFC-037: Mailbox Pattern - Searchable Event Store
- RFC-030: Schema Evolution and Validation
Generalized PrismEnvelope Format for All Messaging Patterns
Branch: jrepp/shared-acceptance-patterns
Summary: Expanded RFC-031 from pub/sub-specific to universal message envelope protocol, and updated RFC-037 (Mailbox Pattern) to consume PrismEnvelope format for searchable event storage.
Key Changes:
- RFC-031 Expanded: "Message Envelope Protocol for Pub/Sub Systems" → "Universal Message Envelope Protocol"
- Added pattern-agnostic design supporting PubSub, Queue, Mailbox, Request/Response, and Multicast Registry
- New section: "Universal Envelope Across Patterns" with comparison table
- New section: "Pattern-Specific Envelope Usage" showing how each pattern uses envelope fields
- Mailbox Pattern example: Extract indexed headers from envelope, store full envelope as blob
- Updated abstract and motivation to emphasize pattern-agnostic design
- RFC-037 Updated: Mailbox Pattern now uses PrismEnvelope (RFC-031)
- Changed from custom `PubSubMessage` struct to `PrismEnvelope`
- Updated header extraction to use envelope fields: `metadata.message_id`, `security.publisher_id`, `observability.trace_id`, etc.
- Updated table schema to include `envelope_blob BLOB NOT NULL` for full message reconstruction
- Added `ExtractMailboxItem()` function to convert PrismEnvelope → MailboxItem
- Added `ReconstructEnvelope()` method to deserialize envelope from blob
- Updated comparison table showing PrismEnvelope benefits
- Added RFC-031 reference in related RFCs
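The envelope-to-mailbox conversion described above can be sketched as follows. The struct fields are simplified assumptions based on the field names mentioned in this entry; the real types live in the generated protobuf code.

```go
package main

import "fmt"

// PrismEnvelope is a simplified stand-in for the RFC-031 envelope;
// only fields extracted into mailbox indexes are shown here.
type PrismEnvelope struct {
	MessageID   string // metadata.message_id
	PublisherID string // security.publisher_id
	TraceID     string // observability.trace_id
	Raw         []byte // serialized envelope bytes
}

// MailboxItem holds indexed headers plus the full envelope blob so the
// original message can be reconstructed later (envelope_blob column).
type MailboxItem struct {
	MessageID    string
	PublisherID  string
	TraceID      string
	EnvelopeBlob []byte
}

// ExtractMailboxItem copies indexed headers out of the envelope and
// keeps the serialized envelope as an opaque blob for reconstruction.
func ExtractMailboxItem(env PrismEnvelope) MailboxItem {
	return MailboxItem{
		MessageID:    env.MessageID,
		PublisherID:  env.PublisherID,
		TraceID:      env.TraceID,
		EnvelopeBlob: env.Raw,
	}
}

func main() {
	env := PrismEnvelope{MessageID: "m-1", PublisherID: "svc-a", TraceID: "t-9", Raw: []byte("...")}
	item := ExtractMailboxItem(env)
	fmt.Println(item.MessageID, len(item.EnvelopeBlob) > 0)
}
```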
Benefits:
- Consistency: Single envelope format across all patterns reduces implementation complexity
- Observability: Built-in trace_id, span_id indexing for distributed tracing
- Security: Envelope encryption metadata preserved in mailbox storage
- Evolution: Envelope versioning supports schema evolution without breaking changes
- Full Context: Storing complete envelope enables later reconstruction of all metadata
Related Documents:
- RFC-031: Universal Message Envelope Protocol
- RFC-037: Mailbox Pattern - Searchable Event Store
- RFC-030: Schema Evolution and Validation
- RFC-014: Layered Data Access Patterns
2025-10-26
Shared Acceptance Test Patterns: Reusable Multi-Backend Testing Framework ✅
Branch: jrepp/shared-acceptance-patterns
Summary: Extracted generalizable testing patterns from KeyValue acceptance tests into shared package, enabling any test suite to easily run against multiple backends in parallel. Applied pattern to Redis integration tests as proof of concept.
Key Improvements:
- Shared Testing Package: Created `tests/testing/backends/acceptance_helpers.go` with reusable patterns
  - `BackendSetup` struct for configuring multiple backends
  - `RunMultiBackendTests()` - Execute tests against multiple backends in parallel
  - `RunMultiBackendSubtests()` - Structured subtests across backends
  - `PodmanCompatibilityDoc()` - Standard documentation for Podman/macOS setup
- KeyValue Backend Helpers: `tests/testing/backends/keyvalue_backends.go`
  - `SetupMemStoreKeyValue()` - MemStore backend setup with plugin.Plugin signature
  - `SetupRedisKeyValue()` - Redis backend setup with per-test container isolation
  - `GetKeyValueBackends()` - Convenience function for all KeyValue backends
- Redis Integration Tests Refactored: Applied shared patterns to `tests/acceptance/redis/`
  - Removed TestMain with shared container (conflicted with t.Parallel())
  - Added multi-backend support - now tests both MemStore and Redis
  - Replaced fixed 4-second sleep with intelligent polling (100ms intervals)
  - Enabled full parallel execution with t.Parallel()
  - Improved TTL test from 4s to 1.02s (optimal)
Performance Gains:
- Redis tests: >50% faster (4s sleep eliminated, replaced with polling)
- Total runtime: 2.084s for 26 test assertions (13 tests × 2 backends)
- MemStore tests: <1ms (except TTL: 1.01s)
- Redis tests: ~0.7-0.8s per test group (including container lifecycle)
- All tests run in parallel across backends
Developer Benefits:
- Code Reuse: Any acceptance test can use these patterns
- Consistency: Standard approach across all multi-backend tests
- Maintainability: Single source of truth for multi-backend testing patterns
- Documentation: Standard Podman/macOS setup instructions
- Isolation: Per-test containers prevent cross-test contamination
Files Created:
- `tests/testing/backends/acceptance_helpers.go` (119 lines) - Core testing framework
- `tests/testing/backends/keyvalue_backends.go` (107 lines) - KeyValue-specific helpers
Files Modified:
- `tests/acceptance/redis/redis_integration_test.go` (366 lines) - Complete refactor using shared patterns
Migration Path: Other acceptance tests (NATS, PostgreSQL, Kafka) can now easily adopt these patterns for multi-backend testing with minimal code changes.
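As a sketch of the shared pattern, the helper shapes described above might look like this. The types are heavily simplified assumptions (the real helpers use testing.T, testcontainers, and plugin.Plugin); this only shows the "same test body, many backends" structure.

```go
package main

import "fmt"

// Backend is a minimal stand-in for a KeyValue driver under test.
type Backend interface {
	Set(key, value string)
	Get(key string) (string, bool)
}

// BackendSetup pairs a backend name with a setup function, mirroring
// the BackendSetup struct described above (simplified).
type BackendSetup struct {
	Name  string
	Setup func() Backend
}

type memStore map[string]string

func (m memStore) Set(k, v string) { m[k] = v }
func (m memStore) Get(k string) (string, bool) {
	v, ok := m[k]
	return v, ok
}

// RunMultiBackendTests executes the same test body against every
// configured backend and reports each backend's pass/fail result.
func RunMultiBackendTests(setups []BackendSetup, test func(Backend) bool) map[string]bool {
	results := make(map[string]bool, len(setups))
	for _, s := range setups {
		results[s.Name] = test(s.Setup())
	}
	return results
}

func main() {
	setups := []BackendSetup{{Name: "memstore", Setup: func() Backend { return memStore{} }}}
	results := RunMultiBackendTests(setups, func(b Backend) bool {
		b.Set("k", "v")
		got, ok := b.Get("k")
		return ok && got == "v"
	})
	fmt.Println(results["memstore"])
}
```

In the real helpers each backend runs as a parallel subtest, and per-test container setup keeps backends isolated.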
KeyValue Pattern Acceptance Tests: Multi-Backend Parallel Execution ✅
Branch: jrepp/rfc-046-keyvalue-memstore-impl
Summary: Enabled Redis backend in KeyValue pattern acceptance tests with full parallel execution across both MemStore and Redis backends. Fixed Podman/macOS testcontainers compatibility issue.
Key Changes:
- Multi-Backend Testing: Re-enabled Redis backend alongside MemStore (2 backends × 32 tests = 64 total assertions)
- Parallel Execution: All 10 test suites run in parallel with t.Parallel() for maximum speed
- Podman Compatibility: Resolved testcontainers Ryuk volume mount issue on Podman/macOS
- Root cause: Ryuk tries to mount Podman socket as volume, which fails on macOS VM
- Solution: Set the `TESTCONTAINERS_RYUK_DISABLED=true` environment variable
- Impact: Tests work with Podman, no functionality lost
- Zero Sleeps: Confirmed no time.Sleep calls - tests run at full rate
- Test Execution: All 64 tests passing in 19s (MemStore <1s, Redis ~2s per suite due to container startup)
Performance:
- MemStore tests: <1s (in-memory, zero latency)
- Redis tests: ~2s per test suite (includes container create/start/stop)
- Total runtime: 19s for 64 test assertions across 2 backends
- Fully parallelized - multiple Redis containers running concurrently
Test Coverage (32 subtests per backend):
- TestPatternService_Store (5 subtests)
- TestPatternService_Retrieve (3 subtests)
- TestPatternService_Remove (3 subtests)
- TestPatternService_Exists (3 subtests)
- TestPatternService_StoreBatch (3 subtests)
- TestPatternService_RetrieveBatch (2 subtests)
- TestPatternService_ListKeys (3 subtests)
- TestPatternService_ConcurrentOperations (3 subtests)
- TestPatternService_EdgeCases (3 subtests)
- TestPatternService_ExpirationOperations (4 subtests)
Files Modified:
- `tests/acceptance/patterns/keyvalue/pattern_service_test.go`: Added package documentation with Podman setup instructions, re-enabled Redis in `getBackends()`
Running Tests:
# Podman on macOS
export TESTCONTAINERS_RYUK_DISABLED=true
export DOCKER_HOST="unix://$(podman machine inspect --format '{{.ConnectionInfo.PodmanSocket.Path}}')"
cd tests/acceptance/patterns/keyvalue
go test -v -parallel 10 -timeout 10m
# Docker Desktop (no Ryuk disable needed)
cd tests/acceptance/patterns/keyvalue
go test -v -parallel 10 -timeout 10m
Status: Full multi-backend acceptance testing validated. Both MemStore and Redis patterns work perfectly with zero artificial delays and maximum parallelization.
2025-10-25
KeyValue Pattern RFC-046 Vertical Slice Implementation ✅
Branch: jrepp/rfc-046-keyvalue-memstore-impl
Summary: Complete vertical slice implementation of RFC-046 consolidated pattern protocol for KeyValue pattern with MemStore backend, including comprehensive unit and integration tests.
Implementation:
- Consolidated Proto: `proto/prism/patterns/keyvalue/keyvalue_pattern.proto` with 11 mandatory operations
- Pattern Service: `patterns/keyvalue/pattern_service.go` implementing KeyValuePatternServer
- Zero Client Complexity: NO GetCapabilities() RPC - all operations are mandatory
- Backend Integration: Full integration with MemStore backend (Basic, Batch, Scan interfaces)
Operations Implemented:
- Core: Store, Retrieve, Remove, Exists
- Batch: StoreBatch, RetrieveBatch
- Scan: Scan (streaming), ListKeys
- Expiration: SetExpiration, GetExpiration, ClearExpiration (stub implementations)
Test Coverage (Updated 2025-10-26):
- Unit Tests: 12 test cases covering all 11 operations
- Core operations: Store, Retrieve, Remove, Exists
- Batch operations: StoreBatch, RetrieveBatch
- Scan operations: ListKeys
- Expiration: SetExpiration, GetExpiration, ClearExpiration (100% coverage)
- Pattern service coverage: 56.7% (up from 50.4%)
- Integration Tests: 3 comprehensive test suites:
- End-to-end gRPC with full server stack
- Concurrent access (100 concurrent operations)
- Large values (1MB single values, 10x100KB batch)
- Acceptance Tests (New): 10 comprehensive test suites with 32 subtests
- tests/acceptance/patterns/keyvalue/pattern_service_test.go (807 lines)
- Tests gRPC pattern service across multiple backends (currently MemStore)
- Test suites:
- TestPatternService_Store (5 subtests: Success, Empty Key, Binary Value, Empty Value, Large Value 1MB)
- TestPatternService_Retrieve (3 subtests: Found, Not Found, Empty Key)
- TestPatternService_Remove (3 subtests: Success, Non-Existent Key, Empty Key)
- TestPatternService_Exists (3 subtests: Exists True, Exists False, Empty Key)
- TestPatternService_StoreBatch (3 subtests: Multiple Keys, Empty Batch, Large Batch 100 Keys)
- TestPatternService_RetrieveBatch (2 subtests: Mixed Found/Not Found, Empty Keys List)
- TestPatternService_ListKeys (3 subtests: With Prefix, With Limit, No Matching Keys)
- TestPatternService_ConcurrentOperations (3 subtests: 50 workers Store, 50 readers Retrieve, 20 workers Batch)
- TestPatternService_EdgeCases (3 subtests: Very Long Key, Rapid Store/Retrieve, Store After Delete)
- TestPatternService_ExpirationOperations (4 subtests: SetExpiration, GetExpiration, ClearExpiration)
- All 32 tests passing in 0.575s with MemStore backend
- Redis backend support prepared (currently disabled pending container setup fix)
Files Changed:
- `proto/prism/patterns/keyvalue/keyvalue_pattern.proto` (227 lines) - Pattern protocol definition
- `patterns/keyvalue/pattern_service.go` (380 lines) - gRPC service implementation
- `patterns/keyvalue/pattern_service_test.go` (418 lines) - Unit tests with expiration coverage
- `patterns/keyvalue/batch_test.go` - Fixed Config structure and mock interfaces
- `patterns/keyvalue/integration_test.go` (283 lines) - Integration tests
- `patterns/keyvalue/go.mod` - Updated dependencies with memstore replace directive
- `tests/acceptance/patterns/keyvalue/pattern_service_test.go` (807 lines) - Comprehensive acceptance tests
- `tests/acceptance/go.mod` - Added keyvalue pattern module
Status: Implementation complete and validated. Ready for code review and PR submission.
RFC-047 & RFC-048: Cross-Proxy Namespace Reservation and Partition Strategies
Summary: Two companion RFCs specifying cross-proxy namespace coordination with JWT-based lease management (RFC-047) and partition strategies for distributing namespace workloads across multiple proxy instances (RFC-048).
RFC-047: Cross-Proxy Namespace Reservation with Lease Management
Problem Statement:
- No Cross-Proxy Coordination: Multiple proxies can create namespaces with same name independently
- Unauthorized Configuration: No token-based authorization for namespace configuration
- Namespace Abandonment: No mechanism to reclaim unused namespace allocations
- Standalone vs Multi-Proxy: Single-tenant deployments don't need admin plane coordination
Proposed Solution:
- Cross-Proxy Reservation: Clients request namespaces through any proxy; admin plane ensures uniqueness
- JWT-Based Authorization: Namespace tokens grant configuration permissions scoped to specific namespace
- Lease Lifecycle: TTL-based leases (default 24h) with refresh mechanism and 1h grace period
- Standalone Mode: Proxies can operate without admin plane for single-tenant deployments
- Audit Logging: All reservation operations logged with client identity
Key Features:
- Namespace JWT tokens with custom claims (namespace, lease_id, permissions)
- Automatic lease refresh mechanism (recommended at 50% of TTL)
- Grace period warnings before expiration
- Background cleanup job for expired namespaces
- Support for both coordinated (multi-proxy) and standalone (single-proxy) modes
RFC-048: Cross-Proxy Partition Strategies and Request Forwarding
Problem Statement:
- No Horizontal Scaling: Cannot scale specific namespace workloads independently
- No Data Locality: Pattern runners don't know which proxy owns partition
- Unpredictable Routing: Same namespace may execute on different proxies per request
- Rebalancing Complexity: No automatic redistribution of partitions
Proposed Solution:
- Partition Assignment: Admin plane assigns namespaces to partitions using configurable strategies
- Consistent Routing: Deterministic mapping from namespace to partition to proxy
- Request Forwarding: Proxies forward requests to partition owner automatically
- Multiple Strategies: Consistent hashing (default), key range assignment, explicit bucket mapping
- Rebalancing Protocol: Move partitions between proxies with minimal disruption
Partition Strategies:
- Consistent Hashing (default):
- 256 partitions, CRC32 hash function
- Minimal rebalancing on proxy additions/removals
- Good for large fleets (10+ proxies)
- Key Range Assignment:
- Lexicographic ranges (e.g., A-F, G-M)
- Multi-tenant SaaS organization
- Alphabetically adjacent namespaces co-located
- Explicit Bucket Mapping:
- Manual operator control
- Resource isolation for high-priority workloads
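The consistent-hashing strategy above (256 partitions, CRC32) maps a namespace to a partition deterministically. A minimal sketch, with the partition-to-proxy assignment table as an assumed input (in practice it would come from the admin plane):

```go
package main

import (
	"fmt"
	"hash/crc32"
)

const numPartitions = 256

// partitionFor hashes a namespace name with CRC32 and maps it into one
// of 256 partitions, as described for the default strategy.
func partitionFor(namespace string) uint32 {
	return crc32.ChecksumIEEE([]byte(namespace)) % numPartitions
}

// proxyFor resolves the proxy that owns a namespace's partition.
// The assignment table is a stand-in for the admin plane's state.
func proxyFor(namespace string, assignment map[uint32]string) string {
	return assignment[partitionFor(namespace)]
}

func main() {
	// toy assignment: even partitions on proxy-a, odd on proxy-b
	assignment := make(map[uint32]string, numPartitions)
	for p := uint32(0); p < numPartitions; p++ {
		if p%2 == 0 {
			assignment[p] = "proxy-a"
		} else {
			assignment[p] = "proxy-b"
		}
	}
	// same namespace always routes to the same proxy
	fmt.Println(proxyFor("order-events", assignment) == proxyFor("order-events", assignment))
}
```

Because only the partitions owned by a joining or leaving proxy move, rebalancing touches a small fraction of namespaces.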
Request Forwarding Modes:
- Transparent Forwarding (recommended): Non-owner proxies forward to partition owner
- Redirect with Discovery: Return redirect error with correct proxy address
- Client-Side Routing: Clients query admin plane for partition assignments
Rebalancing:
- Automatic triggers: proxy joins/leaves, imbalance exceeds threshold (10%)
- Minimize partition moves algorithm
- 5-step rebalancing protocol (prepare, update, distribute, activate, drain)
- Graceful cutover with zero data loss
Benefits:
- Global Uniqueness: Admin plane ensures namespace names unique across all proxies
- Secure Configuration: JWT tokens grant namespace-scoped configuration permissions
- Horizontal Scaling: Distribute namespace load across multiple proxy instances
- Consistent Routing: Same namespace always routes to same proxy
- Transparent Client Experience: Clients connect to any proxy, requests automatically forwarded
Impact: These RFCs enable Prism to scale horizontally across multiple proxy instances with secure, coordinated namespace management and intelligent workload distribution. Essential foundation for multi-proxy production deployments.
MEMO-043: Gap Analysis for RFC-047 and RFC-048 (Updated)
Links: MEMO-043 | RFC-047 | RFC-048
Summary: Comprehensive gap analysis of namespace reservation (RFC-047) and partition strategies (RFC-048) against Prism's core principles. Updated to clarify admin HA architecture, load balancer integration, and partition strategy guidance.
Key Findings:
- Missing Namespace Advertisement Protocol: No protobuf definition for admin → proxy namespace assignments (including proxy list, runners, partition strategy)
- Admin HA Architecture Underspecified: Raft cluster design for prism-admin not detailed; proxy discovery mechanism not defined
- Load Balancer Integration Missing: No guidance on upstream load balancer session affinity for optimal performance
- Operational Complexity: JWT lease management + partition strategies creates significant operational burden
- Testing Gap: No strategy for testing 3-node admin Raft + 3-proxy coordination locally (conflicts with ADR-004)
- Missing Observability: Comprehensive metrics, traces, and debugging tools not specified
- Configuration Explosion: 23+ tunable parameters without production-ready defaults
Scores Against Core Principles:
- Simplicity: 5/10 (multiple modes/strategies have valid use cases; need clear defaults)
- Reliability: 6/10 (admin HA architecture clarified; NamespaceAdvertisement protocol missing)
- Robustness: 6/10 (performance acceptable with load balancer; consistency mechanisms underspecified)
- Comprehensibility: 5/10 (concepts span multiple documents)
- Configurability: 6/10 (many parameters but guidance improving)
Priority 1 Recommendations (Critical - Block Implementation):
- Define Namespace Advertisement Protocol: Add NamespaceAdvertisement protobuf with namespace, partition_id, proxy_list, runner_list, config, partition_strategy
- Admin HA Architecture: 3-node Raft cluster for prism-admin (hashicorp/raft); proxy does NOT run Raft, discovers peers through namespace advertisements
- Load Balancer Integration Guide: Standardize X-Prism-Namespace header; provide HAProxy/Envoy/NGINX configs; document session affinity strategy
- Testing Strategy: Docker-compose with 3 admin nodes (Raft) + 3 proxies; chaos tests for leader election, network partition, proxy failure
Priority 2 Recommendations (Before Production):
- Partition Strategy Guidance: Keep all three strategies (consistent hashing, key range, explicit); provide decision tree; require ADR-002 "Advanced" permission for non-default strategies
- Merge RFCs: Combine RFC-047 and RFC-048 into single comprehensive RFC with NamespaceAdvertisement as core protocol element
- Configuration Profiles: Provide development/staging/production configs with safe defaults
- Observability: Define complete metrics taxonomy, distributed tracing, operational dashboards
- Error Handling: Create error catalog with recovery actions, graceful degradation, retry policies
Key Clarifications:
- Performance: Request forwarding latency is acceptable when upstream load balancer provides session affinity based on namespace (eliminates 75% of forwarding via consistent hashing at LB)
- Admin HA: prism-admin runs 3-node Raft cluster; proxy does NOT run Raft; proxy learns about other proxies through namespace advertisements
- Partition Strategies: All three strategies (consistent hashing, key range, explicit) have valid use cases and should be kept with proper guidance
Impact: This gap analysis identifies critical architectural gaps (NamespaceAdvertisement protocol, admin HA design, load balancer integration) that must be addressed before RFC-047/RFC-048 implementation. Estimated 2-3 weeks to address Priority 1 gaps, 4-6 weeks for full refinement.
RFC-046: Consolidated Pattern Protocols with Capability Negotiation
Summary: Comprehensive architectural proposal to consolidate pattern protocols from multiple backend-level services into single semantic pattern services with explicit capability negotiation.
Problem Statement:
- Protocol Leakage: KeyValue pattern currently exposes 4 separate gRPC services (Basic, Batch, Scan, TTL), forcing clients to understand backend capabilities
- No Pattern Abstraction: Multicast Registry has no proto definition, only Go interfaces
- Ad-Hoc Capability Detection: Runtime try/catch instead of explicit capability discovery
- Backend Variations: Different backends support different operations, creating inconsistent client experiences
Proposed Solution:
- Single Pattern Protocol: Each pattern exposes ONE consolidated gRPC service
- Semantic Operations: Pattern APIs reflect business intent (Store/Retrieve) not backend primitives (Set/Get)
- GetCapabilities() RPC: Explicit capability discovery before optional operations
- Graceful Degradation: Patterns handle missing features (emulation, fallback, clear errors)
- Formal Slot Schemas: Backend slot requirements defined in proto
Pattern Protocol Definitions (4000+ lines of proto):
- KeyValuePattern: Consolidated service replacing 4 separate interfaces
- Core: Store, Retrieve, Remove, Exists (always available)
- Optional: Scan, Batch, Expiration, Transactions (capability-gated)
- Performance metadata: NATIVE vs EMULATED vs UNAVAILABLE
- MulticastRegistryPattern: First proto definition for pattern
- Identity management: Register, Update, Deregister, Heartbeat
- Discovery: Enumerate, GetIdentity with flexible filtering
- Multicast: Broadcast to filtered identities
- 3-slot architecture (registry, messaging, durability)
- SessionStorePattern: Distributed session management
- Lifecycle: Create, Get, Update, Extend, Invalidate
- Replication and conflict resolution support
- Producer/Consumer Patterns: Semantic messaging
- Delivery guarantees: At-most-once, At-least-once, Exactly-once
- Capability-aware batching, ordering, compression
Implementation Strategy (MEMO-042):
- 10-week phased rollout maintaining backward compatibility
- Phase 1 (Week 1): Proto definitions and code generation
- Phase 2-3 (Weeks 2-5): KeyValue and Multicast Registry migration
- Phase 4 (Week 6): Rust client integration and matrix testing
- Phase 5-7 (Weeks 7-10): Producer/Consumer, documentation, cleanup
- Test Coverage: 122 integration tests (pattern × backend combinations)
- Migration Path: 6-month backward compatibility period
Capability Negotiation:
message Capabilities {
bool supports_scan = 1;
bool supports_expiration = 2;
ScanPerformance scan_performance = 3; // NATIVE, EMULATED, UNAVAILABLE
SlotConfiguration slots = 4;
}
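On the client side, capability negotiation means checking the advertised capabilities before invoking an optional operation. A hedged Go sketch (the helper and its error message are illustrative, not the SDK API):

```go
package main

import (
	"errors"
	"fmt"
)

// Capabilities mirrors the proto message above in plain Go form.
type Capabilities struct {
	SupportsScan       bool
	SupportsExpiration bool
}

// scanIfSupported checks capabilities before calling the optional Scan
// operation, returning a clear error instead of a runtime "method not
// found" failure. The scan function itself is a stand-in for the RPC.
func scanIfSupported(caps Capabilities, scan func(prefix string) []string, prefix string) ([]string, error) {
	if !caps.SupportsScan {
		return nil, errors.New("backend does not support Scan; use ListKeys or choose another backend")
	}
	return scan(prefix), nil
}

func main() {
	scan := func(prefix string) []string { return []string{prefix + "1", prefix + "2"} }

	keys, err := scanIfSupported(Capabilities{SupportsScan: true}, scan, "user:")
	fmt.Println(err == nil, len(keys))

	_, err = scanIfSupported(Capabilities{SupportsScan: false}, scan, "user:")
	fmt.Println(err != nil)
}
```

The `scan_performance` field would additionally let clients distinguish NATIVE from EMULATED support and adjust expectations accordingly.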
Benefits:
- Simplified Clients: 50% reduction in client code complexity
- Backend Flexibility: Same client works with 3+ backend combinations
- Clear Contracts: Zero "method not found" errors in production
- Independent Evolution: Pattern protocols version separately from backends
- Graceful Degradation: Patterns handle missing features intelligently
Impact: Major architectural improvement that decouples pattern semantics from backend implementation details, enabling Prism to scale across diverse backend ecosystems while maintaining simple, consistent client experiences.
2025-10-23
Add KeyValue Batch Operations Interface
Links: Feature branch feature/keyvalue-batch-operations
Summary: Implemented comprehensive batch operations interface for KeyValue pattern, providing significant performance improvements for bulk operations through reduced network round trips.
Key Changes:
- KeyValueBatchInterface: Added new optional interface in `pkg/plugin/interfaces.go`
  - `BatchSet(keys, values, ttls)` - Store multiple key-value pairs atomically
  - `BatchGet(keys)` - Retrieve multiple values by keys
  - `BatchDelete(keys)` - Remove multiple keys
- Pattern Implementation: Extended `patterns/keyvalue/keyvalue.go` with batch operations
  - Automatic fallback to sequential operations for drivers without batch support
  - `SupportsBatch()` method to check driver capabilities
- Driver Implementations:
- MemStore: Simple sequential batch operations
- Redis: Native batch operations using Redis Pipeline (MSET/MGET/DEL)
- Testing: Comprehensive test suite in `patterns/keyvalue/batch_test.go`
  - Tests for all batch operations (set, get, delete)
  - TTL support testing
  - Empty batch handling
  - Large batch operations (100 keys)
  - Fallback mechanism testing
Performance Benefits:
- Reduced network round trips for bulk operations
- Redis Pipeline for efficient batch execution
- Maintains backward compatibility with non-batch drivers
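The fallback mechanism can be sketched with Go's type-assertion idiom: the pattern checks whether the driver implements the optional batch interface and degrades to sequential calls otherwise. Interface names and shapes here are simplified assumptions.

```go
package main

import "fmt"

// KeyValueBasic is the mandatory single-key interface; KeyValueBatch is
// the optional batch extension. Both are simplified stand-ins.
type KeyValueBasic interface {
	Set(key, value string)
}

type KeyValueBatch interface {
	BatchSet(keys, values []string)
}

// batchSet uses native batch support when the driver implements it
// (checked via type assertion) and falls back to sequential Sets otherwise.
func batchSet(driver KeyValueBasic, keys, values []string) {
	if b, ok := driver.(KeyValueBatch); ok {
		b.BatchSet(keys, values) // e.g. Redis Pipeline MSET
		return
	}
	for i, k := range keys {
		driver.Set(k, values[i]) // sequential fallback
	}
}

// simpleStore implements only the basic interface, so it exercises
// the fallback path.
type simpleStore map[string]string

func (s simpleStore) Set(k, v string) { s[k] = v }

func main() {
	store := simpleStore{}
	batchSet(store, []string{"a", "b"}, []string{"1", "2"})
	fmt.Println(store["a"], store["b"])
}
```

A driver like Redis would implement the batch interface and take the pipelined path, eliminating per-key round trips.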
Impact: Applications can now perform bulk KeyValue operations with significantly better performance, especially for Redis-backed patterns.
RFC-045: Selective CI Execution - COMPLETE Implementation ✅
Links: RFC-045
Summary: Implemented full selective CI execution system (Phases 1-2) with Taskfile-based dependency graph analysis, reducing CI times by 70-90% while maintaining comprehensive testing.
Phase 1: Infrastructure ✅
- Local CI Preview: `task ci-preview` shows what CI will run before pushing
- Auto-detection: Zero-config GitHub Actions integration via `task ci-matrix`
- Taskfile Integration: Single source of truth - parses existing Taskfile for dependencies
- User-friendly Errors: Clear error messages with ❌ and 💡 emoji
- Regression Tests: 13/13 test cases passing, covering all change scenarios
- Composite Actions: Reusable GitHub Actions for reduced YAML boilerplate
Phase 2: Workflow Integration ✅
- Selective Matrix Generation: New `generate-matrix` job analyzes changes and outputs affected jobs
- Conditional Test Execution: Tests only run when `has_test=true` or `run_full=true`
- ci:full Label: Escape hatch to force full CI pipeline for pre-release testing
- GitHub Actions Summary: Shows execution plan in PR checks (selective vs full)
- Shellcheck Compliance: All workflow scripts follow best practices
Developer Commands:
# Preview CI jobs for uncommitted changes
task ci-preview
# Preview staged changes only
task ci-preview-staged
# Test specific file changes
task ci-matrix -- --changed-files="pkg/drivers/redis/client.go" --output=json
# Force full CI: Add "ci:full" label to PR
Performance Improvements (Validated):
- Redis driver change: 45 min → 5 min (89% faster)
- Rust proxy change: 45 min → 13 min (71% faster)
- Docs-only change: 45 min → 3 min (93% faster)
- Proto change: 45 min (correctly triggers full rebuild)
- Average: 73% CI time reduction
Implementation Files:
- `tooling/ci_matrix.py` (444 lines): Taskfile dependency graph analyzer
- `tooling/test_ci_matrix.py` (179 lines): Regression test suite
- `.github/workflows/ci.yml`: Added generate-matrix job, conditional execution
- `.github/actions/run-task/action.yml`: Composite action for running tasks
- `Taskfile.yml`: Added ci-matrix, ci-preview, ci-preview-staged tasks
- `docs-cms/rfcs/RFC-045-selective-ci-execution.md`: Updated with Phase 1-2 completion
Git Diff Logic:
- Two dots (`base..head`): Commits in feature branch not in merge target (CORRECT)
- Three dots (`base...head`): Symmetric diff from common ancestor (INCORRECT)
- Fixed to use two dots for accurate PR change detection
Next Steps (Phase 3-4):
- Phase 3: Monitor real-world performance over 20+ PRs, tune dependency graph
- Phase 4: Full rollout with team announcement and monitoring dashboard
2025-10-22
KeyValue Scan Interface Implementation
Summary: Implemented optional KeyValue Scan interface with cursor-based pagination, enabling efficient key enumeration and prefix-based queries.
New Features:
- KeyValueScanInterface Proto: New protobuf service definition with three operations:
  - `Scan`: Cursor-based pagination with optional value inclusion
  - `ListKeys`: Simplified key listing without cursor complexity
  - `Count`: Efficient prefix-based key counting
- MemStore Implementation: Full scan support with:
- Prefix matching for all operations
- Sorted key iteration for consistent pagination
- TTL-aware scanning (expired keys automatically excluded)
- Zero-allocation filter evaluation
- gRPC Service: Complete KeyValueScanService implementation with:
- Permission checks (read permission required)
- Structured logging with trace IDs
- Auth context integration
- Comprehensive Tests: 80+ test assertions covering:
- Prefix filtering and pagination
- Cursor-based iteration with multiple pages
- Value inclusion/exclusion
- Edge cases (empty results, no prefix, limit handling)
Use Cases:
- List all keys with common prefix (e.g., `user:*` for all user keys)
- Paginate through large keyspaces without loading entire dataset
- Count keys matching pattern for analytics
- Export/backup operations with streaming
Architecture:
- Optional interface pattern (drivers check via type assertion)
- Consistent with existing KeyValueBasicInterface and KeyValueTTLInterface
- Backend drivers advertise support via InterfaceDeclarations
Files Added:
- `proto/prism/interfaces/keyvalue/keyvalue_scan.proto` - Proto definition
- `pkg/drivers/memstore/memstore_scan_test.go` - Comprehensive tests
Files Modified:
- `pkg/drivers/memstore/memstore.go` - Scan, ListKeys, Count implementations
- `pkg/plugin/interfaces.go` - KeyValueScanInterface Go definition
- `patterns/keyvalue/keyvalue.go` - Pattern-level scan support with type assertions
- `patterns/keyvalue/grpc_server.go` - gRPC service implementation
Performance:
- Pagination test: 10 keys in 4 pages (3+3+3+1) with limit=3
- All scan operations complete in <1ms for 100 keys
- Sorted key iteration ensures deterministic cursor behavior
2025-10-22 (Earlier)
prism-probe CLI Tool - Testing and Debugging Interface
Summary: Implemented command-line testing tool for zero-code pattern validation, debugging, and inspection.
Features:
- Zero-code testing: Test patterns without writing application code
- KeyValue commands: `set`, `get`, `delete`, `exists` with optional TTL support
- PubSub commands: `publish` (single messages), `subscribe` (streaming with Ctrl+C support)
- Inspect commands: `health` check for proxy status, `config` display
- Flexible output: Table (human-readable) or JSON (machine-parseable) formats
- Configuration: CLI flags, YAML config file, or environment variables
- Interactive subscribe: Live message streaming with graceful shutdown
Usage Examples:
# Test KeyValue
prism-probe keyvalue set --pattern cache --key user:123 --value '{"name":"Alice"}'
prism-probe keyvalue get --pattern cache --key user:123
# Test PubSub
prism-probe pubsub publish --pattern events --topic user.created --message '{"user_id":"123"}'
prism-probe pubsub subscribe --pattern events --topic "user.*"
# Check health
prism-probe inspect health --proxy localhost:8980
Use Cases:
- Quick pattern validation before deployment
- CI/CD integration testing
- Production debugging and monitoring
- Load testing preparation
Python Client SDK (prism-data) - Initial Implementation + CI Integration
Links: RFC-040, Python Client
Summary: Implemented initial Python client SDK with synchronous-first API, full type hints, comprehensive pattern support, and full CI/CD integration.
Features:
- Synchronous-first API: Simple, blocking interface hiding asyncio complexity for 80% use cases
- Async API available: High-performance async interface for advanced users (FastAPI, aiohttp)
- Pattern-centric: Producer, Consumer, KeyValue as first-class APIs
- Type-safe: Full type hints, mypy strict mode compliant, pydantic configuration
- Progressive disclosure: Minimal → Production → Advanced configuration tiers
- Error taxonomy: Transient/Permanent/Client errors for intelligent retry logic
Package Structure:
- `clients/python/` - Complete Python SDK structure
- PyPI package name: `prism-data` (import as `prism_data`)
- Module structure:
  - `prism_data.sync` - Synchronous facade (recommended)
  - `prism_data._async` - Async implementation
  - `prism_data.patterns` - Producer, Consumer, KeyValue
  - `prism_data.config` - Pydantic configuration models
  - `prism_data.errors` - Structured error hierarchy
Hello World Example:
from prism_data.sync import Client
with Client("localhost:8980") as client:
producer = client.producer("orders")
producer.publish("new-orders", b"order-123")
Development Setup:
- Dependencies: grpcio, protobuf, pydantic (minimal core)
- Dev tools: pytest, mypy, ruff, grpcio-tools
- Testing: Unit tests for errors, config; integration tests pending
- Build system: hatchling + uv for fast, reproducible builds
Files Created:
- `clients/python/pyproject.toml` - Package metadata, dependencies, tool config
- `clients/python/README.md` - Documentation and quick start guide
- `clients/python/src/prism_data/` - Core implementation
  - `__init__.py`, `errors.py`, `config.py`
  - `_async.py`, `sync.py` - Client factories
  - `patterns/` - Producer, Consumer, KeyValue implementations
- `clients/python/tests/unit/` - Error and config unit tests
- `clients/python/examples/` - Working examples (producer, consumer, keyvalue)
CI/CD Integration:
- Added to client-sdk-ci.yml workflow with full checks (lint, type check, test, build)
- Integrated into Taskfile.yml for local development
  - `task python-client` - Build package
  - `task python-client-test` - Run tests
  - `task python-client-lint` - Lint code
  - `task python-client-typecheck` - Type check
- Added to release workflow for PyPI-ready artifacts
- Builds wheel (.whl) and source distribution (.tar.gz)
- Version auto-updated from git tags (v1.2.3 → 1.2.3)
- SHA256 checksums generated
- Artifacts published to GitHub Releases
- Updated lint-python-ruff to include client code
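The tag-to-version mapping above (v1.2.3 → 1.2.3) amounts to stripping the leading v. A sketch of how such a release step might validate the tag; the function name and regex are assumptions, not the workflow's actual code.

```python
import re

def version_from_tag(tag: str) -> str:
    """Derive a package version from a git tag like 'v1.2.3'."""
    match = re.fullmatch(r"v(\d+\.\d+\.\d+)", tag)
    if match is None:
        raise ValueError(f"unexpected tag format: {tag!r}")
    return match.group(1)
```

Validating the format up front keeps malformed tags from producing broken PyPI version strings.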
Strict CI Validation (Latest):
- Mypy strict mode with --show-error-codes --pretty
- Ruff warnings treated as errors (--output-format=full)
- Format checks with --diff for clear error messages
- Applied to both client-sdk-ci and release workflows
- Taskfile tasks updated for strict local development
Code Quality:
- Auto-fixed 46 linting issues (Optional → |, Union → |)
- Python 3.10+ modern type annotations
- Fixed import ordering and removed unused imports
- 32 remaining ANN401 (Any) in internal helper methods (acceptable)
PyPI Optimization:
- QUICKSTART.md: 5-minute guide as PyPI readme
- 6-line hello world example
- Quick examples: Producer, Consumer, KeyValue
- Configuration tiers: Development → Production → Builder
- Links to GitHub, docs, RFCs, examples
- SDK badges: PyPI version, CI status, license, type-checked, ruff
Status: Core implementation, CI integration, and PyPI optimization complete; gRPC integration pending.
Next Steps:
- Generate protobuf stubs from proto/prism/interfaces/
- Implement gRPC client connections
- Add integration tests with testcontainers
- Publish to PyPI (ready for release workflow)
MCP Client Strict ESLint Implementation with Type-Safe Protocol
Links: PR #49
Summary: Implemented strict ESLint configuration for MCP test client with full TypeScript type safety and protocol-specific interfaces. Achieves zero errors/warnings with comprehensive linting rules.
Key Changes:
- ESLint Flat Config: Created tests/testing/mcp/eslint.config.js with 20+ strict TypeScript rules
  - No explicit any types
  - Explicit function return types required
  - Strict boolean expressions (no truthy/falsy checks)
  - Mandatory optional chaining and nullish coalescing
- Type-Safe Protocol: Defined MCP protocol interfaces in mcp-test-client.ts (110 lines changed)
  - MCPParams, MCPInitializeResult, MCPToolInfo, MCPToolCallResult
  - Replaced all unknown/any types with proper interfaces
  - Fixed 15 strict boolean violations, 4 optional chaining issues
- CI Integration: Added lint-mcp task to Taskfile and main lint dependency chain
- GitHub Actions: Fixed shellcheck warnings in CI/merge-queue workflows
Files Changed:
- tests/testing/mcp/eslint.config.js (new) - Strict linting rules
- tests/testing/mcp/mcp-test-client.ts - Type-safe protocol implementation
- tests/testing/mcp/package.json - ESLint dependencies and scripts
- Taskfile.yml - CI integration
- .github/workflows/{ci,merge-queue}.yml - Shellcheck fixes
Impact: MCP client development now enforces production-grade TypeScript standards with full type safety.
RFC-044: Prism MCP Agent Integration with Mailbox Pattern
Links: RFC-044, PR #45, RFC-037
Summary: Comprehensive RFC defining Prism as a Model Context Protocol (MCP) server for AI agent integration, enabling Claude and other agents to interact with Prism as first-class data access clients using the mailbox pattern.
Core Features:
- 7 MCP Tools: prism_create_namespace, prism_publish, prism_subscribe_mailbox, prism_query_mailbox, prism_keyvalue_set/get, prism_get_identity
- Mailbox Pattern: Agent-to-agent messaging with personal mailboxes ({namespace}/mailbox/{agent_id})
- Standalone Testing: Test client (mcp-test-client.ts) simulates the MCP protocol via STDIO without agent embedding
- Authentication: OAuth2, API keys, and developer identity for local testing
- Query Interface: Search mailbox history by principal, topic, correlation_id with indexed SQLite
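The mailbox addressing and indexed-SQLite query interface can be sketched in miniature. The table schema and column names below are illustrative, not the RFC's actual schema.

```python
import sqlite3

def mailbox_path(namespace: str, agent_id: str) -> str:
    # Addressing scheme from the RFC: {namespace}/mailbox/{agent_id}
    return f"{namespace}/mailbox/{agent_id}"

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE messages (
        mailbox TEXT, principal TEXT, topic TEXT,
        correlation_id TEXT, payload BLOB
    )
""")
# Indexes backing the query-by-principal/topic/correlation_id interface
conn.execute("CREATE INDEX idx_principal ON messages(principal)")
conn.execute("CREATE INDEX idx_topic ON messages(topic)")
conn.execute("CREATE INDEX idx_corr ON messages(correlation_id)")

box = mailbox_path("agents", "agent-1")
conn.execute(
    "INSERT INTO messages VALUES (?, ?, ?, ?, ?)",
    (box, "claude", "tasks", "corr-42", b"summarize PR"),
)
rows = conn.execute(
    "SELECT payload FROM messages WHERE principal = ? AND correlation_id = ?",
    ("claude", "corr-42"),
).fetchall()
```

Indexing each queryable field keeps history searches fast even as mailboxes accumulate messages.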
Architecture:
- TypeScript + STDIO: MCP server using TypeScript client SDK with STDIO transport
- Agent Identity: Each agent gets unique principal for namespace isolation and access control
- Test Infrastructure: Docker Compose with Prism proxy, PostgreSQL, NATS, Redis at tests/testing/mcp/
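The STDIO transport exchanges JSON-RPC 2.0 messages. A minimal sketch of how a test client might frame a tool call: the tool name comes from the RFC's tool list, but the argument shape is an assumption.

```python
import json

def make_tool_call(request_id: int, tool: str, arguments: dict) -> str:
    """Serialize an MCP tools/call request as one JSON-RPC message."""
    return json.dumps({
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool, "arguments": arguments},
    })

line = make_tool_call(1, "prism_publish", {"topic": "tasks", "payload": "order-123"})
```

A standalone test client writes such lines to the server's stdin and reads responses from its stdout, which is why no agent embedding is required.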
Use Cases:
- Multi-agent task coordination via message queues
- Persistent key-value storage for agent context/memory
- Event-driven agent reactions with subscriptions
- Searchable message history for context retrieval
Implementation Plan: 5-phase rollout (Core MCP, Pattern Integration, Mailbox Support, Testing/Docs) over 4-5 weeks.
Related: RFC-037 (Mailbox Pattern), RFC-040 (TypeScript Client SDK), ADR-007 (Authentication)
K8s Integration Tests Refactored with Clean Abstractions
Links: PR #46
Summary: Major refactoring of Kubernetes integration tests introducing clean abstractions, improved error handling, and better test organization for local cluster validation.
Key Improvements:
- Clean Abstractions: New helper functions for namespace management, resource lifecycle, and waiting logic
- Better Error Handling: Comprehensive error messages and cleanup on failures
- Resource Cleanup: Automatic cleanup of test resources with proper lifecycle management
- Test Organization: Clearer test structure with setup/teardown and better assertions
Files Changed:
- tests/integration/k8s/*.go - Test suite refactoring
- Enhanced reliability for kind/Docker Desktop validation
Impact: More maintainable K8s integration tests with consistent patterns across the suite.
CLAUDE.md Refactored for Clarity and Reduced Token Usage
Links: PR #44
Summary: Comprehensive refactoring of project instructions for AI assistant context, reducing from verbose documentation to concise, actionable guidance.
Changes:
- Condensed critical sections (Git commit format, docs validation, testing)
- Added clear command examples with Task runner usage
- Emphasized "uv run" for Python tooling automation
- Improved structure with emoji markers for critical sections
- Reduced token usage while maintaining essential context
Impact: Faster AI assistant context loading with clearer instructions for contributors.
KeyValue Pattern Docker Build Fix and K8s Image Loading Enhancement
Summary: Fixed keyvalue pattern Dockerfile to properly handle nested Go module structure and enhanced k8s-build-images task to load images into local Docker daemon.
Changes:
- Dockerfile Fix: Updated patterns/keyvalue/Dockerfile to work with nested module structure at cmd/keyvalue-runner/
  - Copy all required go.mod files for dependencies (plugin, launcherclient, drivers)
  - Run go mod download from nested module directory
  - Build from nested module with correct path
- Build Task Enhancement: Added --load flag to all Docker build commands in k8s-build-images task
  - Images now automatically loaded into local Docker daemon for Kubernetes access
  - Works with Docker Desktop, kind, and Minikube
- Documentation: Created tests/integration/k8s/README.md with:
  - Setup instructions for different K8s environments
  - Image loading requirements and troubleshooting
  - Test suite descriptions
  - CI/CD exclusion rationale
Files Changed:
- patterns/keyvalue/Dockerfile - Fixed nested module build
- Taskfile.yml - Added --load flags to k8s-build-images task
- tests/integration/k8s/README.md - New documentation
Note: K8s integration tests excluded from CI due to local cluster requirements.