MEMO-082: Configuration Schema and Code Generation System

Overview

This memo describes a configuration schema system that prevents drift between documentation, schemas, and actual configuration files. The system uses JSON Schema as the source of truth for all configuration formats, with automated generation of documentation, validation, and examples.

Problem Statement

Currently, Prism has configuration drift across multiple layers:

  1. Documentation drift: RFCs describe ideal config, but examples don't match
  2. Schema drift: YAML configs use fields not defined in protobuf schemas
  3. Validation drift: Validation logic scattered across Go/Rust/Python code
  4. Example drift: Example configs in README don't match actual working configs

Concrete Examples of Drift

From patterns/multicast_registry/examples/redis-nats.yaml:

pattern: multicast-registry  # Not in any proto definition
slots:                       # Not in NamespaceConfig proto
  registry:
    backend: redis

From proto/prism/control_plane.proto:

message NamespaceConfig {
  map<string, BackendConfig> backends = 1;  // Different structure!
  map<string, PatternConfig> patterns = 2;
}

From RFC-056 (documentation):

needs:
  durability: strong  # Structured in RFC-056
  write_rps: 5000

From current code (actual implementation):

message NamespaceConfig {
  map<string, string> metadata = 4;  // Everything dumped in metadata!
}

Solution: Schema-Driven Configuration

Principles

  1. JSON Schema is the source of truth for all configuration formats
  2. Documentation is generated from schemas (never hand-written)
  3. Validation is centralized using JSON Schema validators
  4. Examples are validated against schemas in CI
  5. Protobuf is generated from JSON Schema for gRPC APIs

Architecture

┌─────────────────────────────────────────────────────────────┐
│                       Source of Truth                       │
│                schemas/config/*.schema.json                 │
│          (JSON Schema with custom x-* extensions)           │
└─────────────────────┬───────────────────────────────────────┘
                      │
          ┌───────────┼───────────┐
          │           │           │
          ▼           ▼           ▼
     ┌─────────┐ ┌─────────┐ ┌──────────┐
     │ Validate│ │Generate │ │ Generate │
     │  YAMLs  │ │  Docs   │ │ Examples │
     └─────────┘ └─────────┘ └──────────┘
          │           │           │
          ▼           ▼           ▼
     ┌─────────┐ ┌─────────┐ ┌──────────┐
     │ CI Check│ │ Markdown│ │   YAML   │
     │ (fail)  │ │ in docs │ │ in repo  │
     └─────────┘ └─────────┘ └──────────┘

Implementation

Phase 1: Schema Definition (Completed)

Created JSON Schema files with rich metadata:

schemas/config/namespace-request.schema.json:

  • Defines Layer 1 (User Request) from RFC-056
  • Includes validation rules, permission level constraints, examples
  • Custom x-* extensions for:
    • x-rfc: RFC/ADR references
    • x-permission-level: Permission constraints per field
    • x-triggers-pattern: Which patterns are triggered
    • x-backend-preference: Backend selection hints
    • x-validation-error: Custom error messages
    • x-quota: Which quota this counts against

Example schema snippet:

{
  "properties": {
    "needs": {
      "properties": {
        "durability": {
          "enum": ["strong", "eventual", "best-effort"],
          "x-enum-descriptions": {
            "strong": "Sync disk writes before ack (no data loss)",
            "eventual": "Async disk writes (minimal data loss window)",
            "best-effort": "In-memory only (fastest)"
          },
          "x-permission-level": {
            "guided": ["strong", "eventual", "best-effort"],
            "advanced": ["strong", "eventual", "best-effort"],
            "expert": ["strong", "eventual", "best-effort"]
          },
          "x-rfc": "RFC-056"
        }
      }
    }
  }
}
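
As an illustration of how tooling might consume the `x-permission-level` extension, here is a minimal sketch. It assumes the extension maps each permission level to the enum values allowed at that level, as in the snippet above; the helper name `allowed_at_level` and the "no extension means no restriction" rule are assumptions, not part of the existing tooling.

```python
from typing import Any

# Trimmed copy of the durability field from the snippet above.
DURABILITY_SCHEMA: dict[str, Any] = {
    "enum": ["strong", "eventual", "best-effort"],
    "x-permission-level": {
        "guided": ["strong", "eventual", "best-effort"],
        "advanced": ["strong", "eventual", "best-effort"],
        "expert": ["strong", "eventual", "best-effort"],
    },
}


def allowed_at_level(field_schema: dict[str, Any], level: str, value: Any) -> bool:
    """Return True if `value` is permitted at `level` (hypothetical helper)."""
    # The value must first be a legal enum member at all.
    if "enum" in field_schema and value not in field_schema["enum"]:
        return False
    levels = field_schema.get("x-permission-level")
    if levels is None:
        return True  # assumption: no extension means no per-level restriction
    # Unknown levels get an empty allow-list, so they are rejected.
    return value in levels.get(level, [])
```

With this shape, restricting a value to expert users would only require removing it from the `guided` and `advanced` lists in the schema.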

Phase 2: Validation and Documentation Tools (Completed)

tooling/config_schema_tools.py provides:

  1. Validation: uv run tooling/config_schema_tools.py validate <config.yaml>

    • Validates YAML against JSON Schema
    • Returns detailed error messages with field paths
    • Exit code 0 if valid, 1 if invalid
  2. Documentation Generation: uv run tooling/config_schema_tools.py docs <schema.json>

    • Generates Markdown documentation from schema
    • Includes all validation rules, permission levels, examples
    • Documents enum values with descriptions
    • References RFCs/ADRs automatically
  3. Example Generation: uv run tooling/config_schema_tools.py example <schema.json>

    • Generates valid example YAML from schema
    • Uses schema examples where provided
    • Generates valid values based on validation rules
  4. Drift Detection: uv run tooling/config_schema_tools.py drift <config-dir>

    • Finds configs that don't match schema
    • Detects deprecated fields (used but not in schema)
    • Detects unused fields (in schema but never used)
    • Provides summary report
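
The validation behavior listed above (errors reported with field paths) can be sketched with the standard library alone. This is a deliberately simplified stand-in that understands only `required`, `enum`, `properties`, and `additionalProperties: false`; it is not how `config_schema_tools.py` is known to work, and a real tool would more likely delegate to a full JSON Schema validator. The function name `validate` and the message wording are assumptions.

```python
from typing import Any


def _join(path: str, name: str) -> str:
    """Build a dotted field path such as 'needs.durability'."""
    return f"{path}.{name}" if path else name


def validate(config: Any, schema: dict[str, Any], path: str = "") -> list[str]:
    """Collect errors as 'field.path: message' strings (simplified sketch)."""
    errors: list[str] = []
    if "enum" in schema and config not in schema["enum"]:
        errors.append(f"{path or '<root>'}: {config!r} is not one of {schema['enum']}")
    if isinstance(config, dict):
        props = schema.get("properties", {})
        # Required properties that are absent from the config.
        for name in schema.get("required", []):
            if name not in config:
                errors.append(f"{_join(path, name)}: Required property missing")
        # Known properties are validated recursively; unknown ones are
        # rejected only when the schema forbids additional properties.
        for name, value in config.items():
            if name in props:
                errors.extend(validate(value, props[name], _join(path, name)))
            elif schema.get("additionalProperties") is False:
                errors.append(f"{_join(path, name)}: Additional property not allowed")
    return errors
```

A command-line wrapper could then exit with `1 if errors else 0`, matching the exit codes described above.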

Phase 3: CI Integration (To-Do)

Add to .github/workflows/ci.yml:

- name: Validate Configuration Schemas
  run: |
    # Validate all example configs against schemas
    uv run tooling/config_schema_tools.py validate patterns/*/examples/*.yaml

    # Check for drift
    uv run tooling/config_schema_tools.py drift patterns/

    # Ensure documentation is up-to-date
    uv run tooling/config_schema_tools.py docs schemas/config/namespace-request.schema.json \
      --output docs-cms/config/namespace-request.md

    # Fail if docs have changed (means they weren't regenerated)
    git diff --exit-code docs-cms/config/

Phase 4: Schema Coverage (To-Do)

Create schemas for all configuration types:

  1. namespace-request.schema.json (Layer 1: User Request)
  2. platform-policy.schema.json (Layer 2: Team Quotas & Permissions)
  3. pattern-selection.schema.json (Layer 3: Pattern Selection Output)
  4. backend-registry.schema.json (Layer 4: Backend Definitions)
  5. frontend-registry.schema.json (Layer 5: API Bindings)
  6. runtime-config.schema.json (Layer 6: Runtime Process Config)

Phase 5: Protobuf Generation (To-Do)

Generate protobuf definitions from JSON Schema:

tooling/json_schema_to_proto.py:

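from pathlib import Path


def generate_proto_from_schema(schema_path: Path) -> str:
    """Convert JSON Schema to .proto definition."""
    # Planned steps (Phase 5 is not implemented yet):
    # 1. Read schema
    # 2. Generate message types from object schemas
    # 3. Generate enums from enum schemas
    # 4. Add field options from x-* extensions
    # 5. Write .proto file
    raise NotImplementedError("Phase 5: protobuf generation is still to do")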

Usage:

uv run tooling/json_schema_to_proto.py \
  schemas/config/namespace-request.schema.json \
  --output proto/prism/config/v1/namespace_request.proto
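
Since this phase is still to do, the following is only a sketch of what the message and enum generation steps might look like. It assumes object schemas become nested proto3 messages, string enums become nested enums with a zero `_UNSPECIFIED` value, and field numbers follow property order; the function name `schema_to_proto` and the type mapping are assumptions, and `x-*` field options are not handled.

```python
from typing import Any

# Assumed mapping from JSON Schema scalar types to proto3 scalar types.
SCALARS = {"string": "string", "integer": "int64", "number": "double", "boolean": "bool"}


def _camel(name: str) -> str:
    """write_rps -> WriteRps (used for message and enum type names)."""
    return "".join(part.capitalize() for part in name.replace("-", "_").split("_"))


def _const(name: str) -> str:
    """best-effort -> BEST_EFFORT (used for enum value names)."""
    return name.replace("-", "_").upper()


def schema_to_proto(name: str, schema: dict[str, Any], indent: str = "") -> list[str]:
    """Render one object schema as proto3 message lines (hypothetical sketch)."""
    nested: list[str] = []  # nested enum and message definitions
    fields: list[str] = []  # field declarations of this message
    for number, (field, spec) in enumerate(schema.get("properties", {}).items(), start=1):
        if "enum" in spec:
            type_name, prefix = _camel(field), _const(field)
            nested.append(f"{indent}  enum {type_name} {{")
            nested.append(f"{indent}    {prefix}_UNSPECIFIED = 0;")
            for i, value in enumerate(spec["enum"], start=1):
                nested.append(f"{indent}    {prefix}_{_const(value)} = {i};")
            nested.append(f"{indent}  }}")
        elif spec.get("type") == "object" or "properties" in spec:
            type_name = _camel(field)
            nested.extend(schema_to_proto(type_name, spec, indent + "  "))
        else:
            type_name = SCALARS[spec.get("type", "string")]
        fields.append(f"{indent}  {type_name} {field} = {number};")
    return [f"{indent}message {name} {{", *nested, *fields, f"{indent}}}"]
```

Deriving field numbers from property order is the weakest part of this sketch: reordering or removing a property would renumber fields and break wire compatibility, so a real generator would probably need stable numbers stored in the schema, for example in a dedicated `x-*` extension.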

Usage Examples

Validating a Config File

# Validate a namespace request config
uv run tooling/config_schema_tools.py validate \
  patterns/multicast_registry/examples/redis-nats.yaml \
  --schema schemas/config/namespace-request.schema.json

# Output:
# ❌ patterns/multicast_registry/examples/redis-nats.yaml has validation errors:
#   - needs.durability: Required property missing
#   - slots: Additional property not allowed (not in schema)

Generating Documentation

# Generate Markdown documentation
uv run tooling/config_schema_tools.py docs \
  schemas/config/namespace-request.schema.json \
  --output docs-cms/config/namespace-request.md

# Output: docs-cms/config/namespace-request.md with:
# - Field descriptions
# - Validation rules
# - Permission level constraints
# - Examples
# - RFC references
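
To make the generation step concrete, here is a minimal sketch of rendering Markdown from a schema. It assumes one heading per field path, enum values listed with their `x-enum-descriptions`, and `x-rfc` emitted as a reference line; the function name `render_docs` and the output layout are assumptions rather than the actual format produced by `config_schema_tools.py`.

```python
from typing import Any


def render_docs(schema: dict[str, Any], prefix: str = "") -> list[str]:
    """Render Markdown lines for every property in `schema` (hypothetical sketch)."""
    lines: list[str] = []
    for name, spec in schema.get("properties", {}).items():
        path = f"{prefix}.{name}" if prefix else name
        lines.append(f"### `{path}`")
        if "description" in spec:
            lines.append(spec["description"])
        if "x-rfc" in spec:
            lines.append(f"Reference: {spec['x-rfc']}")
        # Enum values, annotated with their descriptions where the schema has them.
        descriptions = spec.get("x-enum-descriptions", {})
        for value in spec.get("enum", []):
            note = descriptions.get(value)
            lines.append(f"- `{value}`: {note}" if note else f"- `{value}`")
        # Nested objects are documented under their dotted path.
        lines.extend(render_docs(spec, path))
    return lines
```

Because the output is a pure function of the schema, regenerating it in CI and checking for a diff is enough to detect stale documentation.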

Checking for Drift

# Check all pattern configs for drift
uv run tooling/config_schema_tools.py drift patterns/

# Output:
# 📊 Configuration Drift Report
#
# Total configs: 15
# Invalid configs: 5
# Deprecated fields: 8
# Unused fields: 3
#
# ❌ Invalid Configs:
#   patterns/multicast_registry/examples/redis-nats.yaml:
#     - needs.durability: Required property missing
#     - slots: Additional property not allowed
#
# ⚠️ Deprecated Fields (used but not in schema):
#   - slots
#   - config.pattern_name
#   - behavior.max_identities
#
# 💤 Unused Fields (in schema but never used):
#   - policies.encryption.key_rotation
#   - needs.partition_count
Generating Example Configs

# Generate example YAML
uv run tooling/config_schema_tools.py example \
  schemas/config/namespace-request.schema.json \
  --output examples/namespace-request-example.yaml

# Output: Valid YAML with all required fields filled with example values
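
One plausible way to fill required fields is a fixed preference order: a schema-provided example, then a default, then the first enum value, then a type-based placeholder. The sketch below assumes that order and returns a plain dictionary, leaving YAML serialization to a YAML library. The name `example_for` is hypothetical, and constraints such as `pattern`, `minLength`, or `minItems` are not honored, so the output is only guaranteed valid for simple schemas.

```python
from typing import Any


def example_for(schema: dict[str, Any]) -> Any:
    """Build an example value for `schema`, covering required fields only."""
    if schema.get("examples"):
        return schema["examples"][0]  # prefer examples written by the schema author
    if "default" in schema:
        return schema["default"]
    if "enum" in schema:
        return schema["enum"][0]
    kind = schema.get("type")
    if kind == "object" or "properties" in schema:
        props = schema.get("properties", {})
        return {name: example_for(props.get(name, {})) for name in schema.get("required", [])}
    if kind in ("integer", "number"):
        return schema.get("minimum", 0)  # smallest value allowed by an inclusive minimum
    if kind == "boolean":
        return False
    if kind == "array":
        return []
    return "example"  # placeholder for strings and untyped fields
```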

Migration Plan

Step 1: Audit Current Configs (Week 1)

  1. Run drift detection on all existing configs
  2. Document all deprecated fields
  3. Create migration guide for each pattern

Step 2: Update Schemas (Week 1-2)

  1. Update JSON Schema to match actual usage
  2. Add missing fields from current configs
  3. Document which fields are deprecated vs new

Step 3: Update Configs (Week 2-3)

  1. Migrate example configs to new schema
  2. Validate all configs pass
  3. Update README/documentation

Step 4: Update Code (Week 3-4)

  1. Generate new protobuf from schemas
  2. Update Go/Rust code to use new proto
  3. Update validation logic to use schema

Step 5: CI Enforcement (Week 4)

  1. Add schema validation to CI
  2. Make schema validation required for PR merge
  3. Add documentation generation to CI

Benefits

Prevents Configuration Drift

Before:

RFC-056 (docs) ➜ Hand-written examples ➜ Protobuf ➜ Go/Rust code
      ↓                    ↓                ↓            ↓
   Drift 1              Drift 2          Drift 3      Drift 4

After:

JSON Schema (source of truth)
├── Validates ➜ YAML configs (CI enforced)
├── Generates ➜ Documentation (always in sync)
├── Generates ➜ Protobuf (single source)
└── Validates ➜ Examples (CI enforced)

Improves Developer Experience

  1. Faster validation: Immediate feedback on invalid configs
  2. Better errors: Detailed error messages with field paths
  3. Always current docs: Documentation generated from schema
  4. Correct examples: Examples validated in CI

Enables Advanced Features

  1. Auto-completion: IDE support using JSON Schema
  2. Web UI: JSON Schema powers form generation
  3. Migration tools: Automated config version upgrades
  4. Policy validation: Permission levels enforced at schema level

Alternatives Considered

Alternative 1: Protobuf as Source of Truth

Pros:

  • Already using protobuf
  • Native gRPC support

Cons:

  • Protobuf has no built-in way to express rich validation rules
  • No standard for documentation generation
  • Custom options not well-supported by tooling
  • YAML ↔ Protobuf conversion loses information

Decision: Use JSON Schema, generate protobuf from it

Alternative 2: Manual Documentation

Pros:

  • Full control over documentation format

Cons:

  • Documentation drifts from implementation
  • No automated validation
  • Examples become stale

Decision: Generate documentation from schema

Alternative 3: Code as Source of Truth

Pros:

  • Code always reflects actual behavior

Cons:

  • Code doesn't capture intent (only implementation)
  • No single source for validation logic
  • Hard to generate documentation from code

Decision: Schema is higher-level than code

Future Work

Schema Registry Service

Create a schema registry service that:

  • Stores all configuration schemas
  • Provides schema validation API
  • Tracks schema evolution history
  • Validates backward compatibility
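
What "backward compatible" means is not defined in this memo. As one possible reading, the sketch below treats a schema change as breaking if a config that was valid under the old schema could become invalid under the new one: a newly required property, a removed enum value, or a removed property. The function name `breaking_changes` and this rule set are assumptions.

```python
from typing import Any


def breaking_changes(old: dict[str, Any], new: dict[str, Any], prefix: str = "") -> list[str]:
    """List changes from `old` to `new` that could invalidate existing configs."""
    problems: list[str] = []
    here = prefix or "<root>"
    # Existing configs will not contain a property that was just made required.
    for name in sorted(set(new.get("required", [])) - set(old.get("required", []))):
        problems.append(f"{here}: new required property '{name}'")
    # Existing configs may still use an enum value that was removed.
    if "enum" in old and "enum" in new:
        for value in old["enum"]:
            if value not in new["enum"]:
                problems.append(f"{here}: enum value {value!r} removed")
    # Removed properties are treated as breaking; shared ones are compared recursively.
    new_props = new.get("properties", {})
    for name, old_spec in old.get("properties", {}).items():
        path = f"{prefix}.{name}" if prefix else name
        if name not in new_props:
            problems.append(f"{path}: property removed")
        else:
            problems.extend(breaking_changes(old_spec, new_props[name], path))
    return problems
```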

Config Migration Tool

Create a migration tool that:

  • Detects config version from schema
  • Automatically upgrades configs to latest version
  • Validates migration completeness
  • Generates migration reports
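
A common shape for such a tool is a chain of single-step upgrade functions applied until the config reaches the latest version. The sketch below assumes a top-level integer `version` field (defaulting to 1 when absent) and invents one illustrative step that moves a top-level `durability` under `needs`; neither the field nor the step is defined anywhere in this memo.

```python
from typing import Any, Callable

Config = dict[str, Any]


def _v1_to_v2(config: Config) -> Config:
    """Hypothetical step: move a top-level `durability` under `needs`."""
    upgraded = {key: value for key, value in config.items() if key != "durability"}
    if "durability" in config:
        upgraded["needs"] = {**config.get("needs", {}), "durability": config["durability"]}
    return upgraded


# Each entry upgrades a config from the keyed version to the next one.
MIGRATIONS: dict[int, Callable[[Config], Config]] = {1: _v1_to_v2}
LATEST_VERSION = 2


def upgrade(config: Config) -> Config:
    """Apply migration steps in order until the config is at LATEST_VERSION."""
    version = config.get("version", 1)  # assumption: unversioned configs are v1
    while version < LATEST_VERSION:
        config = MIGRATIONS[version](config)
        version += 1
        config = {**config, "version": version}
    return config
```

Validating the upgraded config against the latest schema afterwards would be one way to check migration completeness.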

Web-based Config Editor

Create a web UI that:

  • Uses JSON Schema for form generation
  • Validates in real-time
  • Shows inline documentation
  • Generates valid YAML

Related Documents

  • RFC-056: Unified Configuration Model (6-layer model)
  • ADR-002: Client-Originated Configuration (permission levels)
  • ADR-022: Dynamic Client Configuration (runtime updates)
  • RFC-039: Backend Configuration Registry (backend definitions)
  • RFC-027: Namespace Configuration Client Perspective (user-facing config)

Implementation Checklist

Phase 1: Schema Definition

  • Create namespace-request.schema.json
  • Create platform-policy.schema.json
  • Create pattern-selection.schema.json
  • Create backend-registry.schema.json
  • Create frontend-registry.schema.json
  • Create runtime-config.schema.json

Phase 2: Tooling

  • Build config_schema_tools.py (validation, docs, examples, drift)
  • Build json_schema_to_proto.py (protobuf generation)
  • Build config_migration_tool.py (version upgrades)

Phase 3: Migration

  • Audit all existing configs (drift report)
  • Update schemas to match reality
  • Migrate example configs
  • Update code to use new schemas

Phase 4: CI Integration

  • Add schema validation to CI
  • Add documentation generation to CI
  • Add drift detection to CI
  • Make validation required for merge

Phase 5: Documentation

  • Generate docs from schemas
  • Update READMEs with new config format
  • Create migration guide
  • Update RFCs with schema references

Success Metrics

  1. Zero drift: All configs validate against schemas
  2. 100% coverage: All config types have schemas
  3. Always current: Documentation auto-generated from schemas
  4. Fast feedback: CI validates configs in <1 minute
  5. Developer satisfaction: Positive feedback on DX improvements

Conclusion

The configuration schema system solves drift problems by making JSON Schema the single source of truth. All documentation, validation, and examples are generated from schemas, ensuring they stay in sync.

The system is implemented in tooling/config_schema_tools.py with schemas in schemas/config/. Next steps are CI integration and migrating existing configs to the new format.