# Protobuf as Single Source of Truth

## Context
In a data gateway system, multiple components need consistent understanding of data models:
- Proxy: Routes requests, validates data
- Backends: Store and retrieve data
- Client libraries: Make requests
- Admin UI: Display and manage data
- Documentation: Describe APIs
Traditionally, these are defined separately:
- Database schemas (SQL DDL)
- API schemas (OpenAPI/Swagger)
- Client code (hand-written)
- Documentation (hand-written)
This leads to:
- Drift: Schemas get out of sync
- Duplication: Same model defined 4+ times
- Errors: Manual synchronization fails
- Slow iteration: Every change requires updating multiple files
Problem: How do we maintain consistency across all components while keeping the architecture DRY (Don't Repeat Yourself)?
## Decision
Use Protocol Buffers (protobuf) as the single source of truth for all data models, with custom options for Prism-specific metadata. Generate all code, schemas, and configuration from proto definitions.
## Rationale

### Why Protobuf?
- Language Agnostic: Generate code for Rust, Python, JavaScript, TypeScript
- Strong Typing: Catch errors at compile time
- Backward Compatible: Evolve schemas without breaking clients
- Compact: Efficient binary serialization
- Extensible: Custom options for domain-specific metadata
- Tooling: Excellent IDE support, linters, formatters
### Custom Options for Prism
```protobuf
// prism/options.proto
syntax = "proto3";

package prism;

import "google/protobuf/descriptor.proto";

// Message-level options
extend google.protobuf.MessageOptions {
  string access_pattern = 50001;     // read_heavy | write_heavy | append_heavy
  int64 estimated_read_rps = 50002;  // Capacity planning
  int64 estimated_write_rps = 50003;
  string backend = 50004;            // postgres | kafka | nats | sqlite | neptune
  string consistency = 50005;        // strong | eventual | causal
  int32 retention_days = 50006;      // Auto-delete policy
  bool enable_cache = 50007;         // Add caching layer
}

// Field-level options
extend google.protobuf.FieldOptions {
  string index = 50101;          // primary | secondary | partition_key | clustering_key
  string pii = 50102;            // email | name | ssn | phone | address
  bool encrypt_at_rest = 50103;  // Field-level encryption
  string validation = 50104;     // email | uuid | url | regex:...
  int32 max_length = 50105;      // String length validation
}

// Service-level options (for future gRPC services)
extend google.protobuf.ServiceOptions {
  bool require_auth = 50201;     // All RPCs require auth
  int32 rate_limit_rps = 50202;  // Service-wide rate limit
}

// RPC-level options
extend google.protobuf.MethodOptions {
  bool idempotent = 50301;   // Safe to retry
  int32 timeout_ms = 50302;  // RPC timeout
  string cache_ttl = 50303;  // Cache responses
}
```
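These extensions are consumed by the code-generation tooling through the standard protobuf descriptor API. A minimal Python sketch, assuming `prism/options.proto` has been compiled to an `options_pb2` module (the module path and helper name are illustrative), that collects the field-level metadata used to drive index creation and field encryption:

```python
from prism import options_pb2  # generated from prism/options.proto (assumed module path)

def field_metadata(message_descriptor) -> dict:
    """Collect Prism field-level options (index, pii, encrypt_at_rest) for one message."""
    meta = {}
    for field in message_descriptor.fields:
        opts = field.GetOptions()
        meta[field.name] = {
            "index": opts.Extensions[options_pb2.index]
            if opts.HasExtension(options_pb2.index)
            else None,
            "pii": opts.Extensions[options_pb2.pii]
            if opts.HasExtension(options_pb2.pii)
            else None,
            "encrypt_at_rest": opts.Extensions[options_pb2.encrypt_at_rest],
        }
    return meta

# Usage (with the UserProfile example shown below compiled to user_profile_pb2):
#   field_metadata(user_profile_pb2.UserProfile.DESCRIPTOR)
#   -> {"user_id": {"index": "primary", ...}, "email": {"pii": "email", ...}, ...}
```

Message-level options are read the same way via `message_descriptor.GetOptions()`; a concrete use is sketched with the deployment-config example further down.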
### Code Generation Pipeline

```text
proto/*.proto
  │
  ├──> Rust code (prost)
  │      ├── Data structures
  │      ├── gRPC server traits
  │      └── Validation logic
  │
  ├──> Python code (protoc)
  │      ├── Data classes
  │      └── gRPC clients
  │
  ├──> TypeScript code (ts-proto)
  │      ├── Types for admin UI
  │      └── API client
  │
  ├──> SQL schemas
  │      ├── CREATE TABLE statements
  │      ├── Indexes
  │      └── Constraints
  │
  ├──> Kafka schemas
  │      ├── Topic configurations
  │      └── Serialization
  │
  ├──> OpenAPI docs
  │      └── REST API documentation
  │
  └──> Deployment configs
         ├── Capacity specs
         └── Backend routing
```
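The non-plugin targets (SQL, deployment configs, docs) can all be fed from a single serialized `FileDescriptorSet`. A minimal sketch of producing that shared input with `protoc` (the output path is illustrative):

```python
# Build a FileDescriptorSet once; downstream emitters (SQL, docs, configs) read it.
import subprocess

subprocess.run(
    [
        "protoc",
        "--proto_path=proto",
        "--include_imports",                      # embed prism/options.proto and other deps
        "--descriptor_set_out=build/prism.desc",  # illustrative output path
        "proto/examples/user_profile.proto",
    ],
    check=True,
)
```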
### Example: Complete Data Model
```protobuf
// user_profile.proto
syntax = "proto3";

package prism.example;

import "prism/options.proto";

message UserProfile {
  option (prism.backend) = "postgres";
  option (prism.consistency) = "strong";
  option (prism.estimated_read_rps) = 5000;
  option (prism.estimated_write_rps) = 500;
  option (prism.enable_cache) = true;

  // Primary key
  string user_id = 1 [
    (prism.index) = "primary",
    (prism.validation) = "uuid"
  ];

  // PII fields
  string email = 2 [
    (prism.pii) = "email",
    (prism.index) = "secondary",
    (prism.validation) = "email"
  ];

  string full_name = 3 [
    (prism.pii) = "name",
    (prism.max_length) = 256
  ];

  // Encrypted field
  string ssn = 4 [
    (prism.pii) = "ssn",
    (prism.encrypt_at_rest) = true
  ];

  // Metadata
  int64 created_at = 5;
  int64 updated_at = 6;

  // Nested message
  ProfileSettings settings = 7;
}

message ProfileSettings {
  bool email_notifications = 1;
  string timezone = 2;
  string language = 3;
}
```
This **single file** generates:
1. **Rust structs** with validation:
   ```rust
   #[derive(Clone, PartialEq, Message)]
   pub struct UserProfile {
       #[prost(string, tag = "1")]
       pub user_id: String,
       #[prost(string, tag = "2")]
       pub email: String,
       // ... remaining fields, plus generated validation methods
   }

   impl UserProfile {
       pub fn validate(&self) -> Result<(), ValidationError> {
           validate_uuid(&self.user_id)?;
           validate_email(&self.email)?;
           // ... remaining field validations
           Ok(())
       }
   }
   ```
2. **Postgres schema**:
   ```sql
   CREATE TABLE user_profile (
       user_id        UUID PRIMARY KEY,
       email          VARCHAR(255) NOT NULL,
       full_name      VARCHAR(256),
       ssn_encrypted  BYTEA,  -- Encrypted at application layer
       created_at     BIGINT NOT NULL,
       updated_at     BIGINT NOT NULL,
       settings       JSONB
   );

   CREATE INDEX idx_user_profile_email ON user_profile(email);
   ```
3. **TypeScript types** for admin UI:
   ```typescript
   export interface UserProfile {
     userId: string;
     email: string;
     fullName: string;
     ssn: string;
     createdAt: number;
     updatedAt: number;
     settings?: ProfileSettings;
   }
   ```
4. **Deployment config** (auto-generated):
   ```yaml
   name: user-profile
   backend: postgres
   capacity:
     read_rps: 5000
     write_rps: 500
     estimated_data_size_mb: 1000
   cache:
     enabled: true
     ttl_seconds: 300
   ```
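The message-level options are what drive that last artifact. A minimal emitter sketch in Python, assuming the same generated `options_pb2`/`user_profile_pb2` modules as above and that PyYAML is available (the exact YAML layout and module paths are illustrative):

```python
import re
import yaml  # PyYAML, assumed to be present in the codegen environment
from prism import options_pb2           # generated from prism/options.proto
from examples import user_profile_pb2   # generated from user_profile.proto

def kebab(name: str) -> str:
    """UserProfile -> user-profile."""
    return re.sub(r"(?<!^)(?=[A-Z])", "-", name).lower()

def deployment_config(descriptor) -> dict:
    """Derive a deployment spec from Prism message-level options."""
    opts = descriptor.GetOptions()
    return {
        "name": kebab(descriptor.name),
        "backend": opts.Extensions[options_pb2.backend],
        "capacity": {
            "read_rps": opts.Extensions[options_pb2.estimated_read_rps],
            "write_rps": opts.Extensions[options_pb2.estimated_write_rps],
        },
        "cache": {"enabled": opts.Extensions[options_pb2.enable_cache]},
    }

print(yaml.safe_dump(deployment_config(user_profile_pb2.UserProfile.DESCRIPTOR)))
```

Values that are not expressed as proto options (for example the estimated data size or cache TTL in the config above) would come from defaults or separate operator configuration.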
### Alternatives Considered
1. **OpenAPI/Swagger as Source of Truth**
   - Pros:
     - HTTP-first
     - Good tooling
     - Popular
   - Cons:
     - Doesn't support binary protocols (Kafka, NATS)
     - Weaker typing than protobuf
     - No field-level metadata
   - Rejected because: Doesn't cover all our use cases
2. **SQL DDL as Source of Truth**
   - Pros:
     - Natural for database-first design
     - DBAs comfortable with it
   - Cons:
     - Only works for SQL backends
     - Doesn't describe APIs
     - Poor code generation for clients
   - Rejected because: Too backend-specific
3. **JSON Schema**
   - Pros:
     - Simple
     - Widely understood
     - Works with HTTP APIs
   - Cons:
     - Runtime validation only
     - No compile-time safety
     - Verbose
   - Rejected because: Lack of strong typing
4. **Hand-Written Code**
   - Pros:
     - Full control
     - No code generation complexity
   - Cons:
     - Massive duplication
     - Drift between components
     - Error-prone
   - Rejected because: Doesn't scale
## Consequences
### Positive
- **Single Source of Truth**: One place to change data models
- **Consistency**: All components guaranteed to have same understanding
- **Type Safety**: Compile-time errors across all languages
- **Fast Iteration**: Change proto, regenerate, done
- **Documentation**: Proto files are self-documenting
- **Validation**: Generated validators ensure data integrity
- **Backward Compatibility**: Protobuf's rules prevent breaking changes
### Negative
- **Code Generation Complexity**: Must maintain codegen tooling
  - *Mitigation*: Use existing tools (prost, ts-proto); only customize for Prism options
- **Learning Curve**: Team must learn protobuf
  - *Mitigation*: Good documentation; protobuf is simpler than the alternatives
- **Build Step Required**: Can't edit generated code directly
  - *Mitigation*: Fast build times; clear separation of generated vs. hand-written code

### Neutral

- **Proto Language Limitations**: Can't express all constraints
  - Use custom options for Prism-specific needs
  - Keep complex validation logic in hand-written code
- **Version Management**: Proto file changes must be carefully reviewed
  - Enforce backward compatibility checks in CI
## Implementation Notes
### Project Structure
```text
proto/
├── prism/
│   ├── options.proto          # Custom Prism options
│   └── common/
│       ├── types.proto        # Common types (timestamps, UUIDs, etc.)
│       └── errors.proto       # Error definitions
├── examples/
│   ├── user_profile.proto     # Example from above
│   ├── user_events.proto      # Kafka example
│   └── social_graph.proto     # Neptune example
└── BUILD.bazel                # Or build.rs for Rust
```
### Code Generation Tool

```bash
# tooling/codegen/__main__.py
python -m tooling.codegen \
  --proto-path proto \
  --out-rust proxy/src/generated \
  --out-python tooling/generated \
  --out-typescript admin/app/models/generated \
  --out-sql backends/postgres/migrations \
  --out-docs docs/api
```
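A hypothetical skeleton of that driver, to show the shape of the tool rather than its actual implementation: it parses a `FileDescriptorSet` and hands each message to per-target emitters (the emitter functions and default paths here are illustrative placeholders):

```python
# Illustrative skeleton of tooling/codegen/__main__.py, not the real tool.
import argparse
from google.protobuf import descriptor_pb2

def emit_sql(message: descriptor_pb2.DescriptorProto, out_dir: str) -> None:
    # Placeholder: a real emitter reads the prism.* options on `message`
    # and writes a CREATE TABLE migration plus indexes.
    print(f"[sql] would write {out_dir}/{message.name.lower()}.sql")

def emit_docs(message: descriptor_pb2.DescriptorProto, out_dir: str) -> None:
    # Placeholder: a real emitter writes OpenAPI / reference documentation.
    print(f"[docs] would write {out_dir}/{message.name}.md")

def main() -> None:
    parser = argparse.ArgumentParser(prog="tooling.codegen")
    parser.add_argument("--descriptor-set", default="build/prism.desc")
    parser.add_argument("--out-sql", default="backends/postgres/migrations")
    parser.add_argument("--out-docs", default="docs/api")
    args = parser.parse_args()

    # Load the descriptor set produced by protoc (see the pipeline section above).
    fds = descriptor_pb2.FileDescriptorSet()
    with open(args.descriptor_set, "rb") as f:
        fds.ParseFromString(f.read())

    # Dispatch every message to each emitter; emitters decide what to generate
    # based on the Prism options attached to the message and its fields.
    for file_proto in fds.file:
        for message in file_proto.message_type:
            emit_sql(message, args.out_sql)
            emit_docs(message, args.out_docs)

if __name__ == "__main__":
    main()
```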
### CI Integration

```yaml
# .github/workflows/proto.yml
name: Protobuf

on: [push, pull_request]

jobs:
  validate:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - name: Check backward compatibility
        run: buf breaking --against '.git#branch=main'
      - name: Lint proto files
        run: buf lint
      - name: Generate code
        run: python -m tooling.codegen
      - name: Verify no changes
        run: git diff --exit-code  # Fail if generated code is stale
```
### Migration Strategy

When changing proto definitions:

- **Additive changes** (new fields): Safe, just regenerate
- **Renaming fields**: Use the `json_name` option for backward compatibility
- **Removing fields**: Mark the field number as `reserved` instead of deleting it
- **Changing types**: Create a new field, migrate data, deprecate the old field
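In addition to `buf breaking` in CI, a lightweight local check can diff two descriptor sets and flag removed fields, which covers the most common breaking change. A minimal sketch, assuming descriptor sets have been built for the baseline and the working tree (the file names are illustrative):

```python
# Compare two protoc-generated FileDescriptorSets (old vs. new) and report
# fields that disappeared from a message.
from google.protobuf import descriptor_pb2

def load(path: str) -> descriptor_pb2.FileDescriptorSet:
    fds = descriptor_pb2.FileDescriptorSet()
    with open(path, "rb") as f:
        fds.ParseFromString(f.read())
    return fds

def fields_by_message(fds: descriptor_pb2.FileDescriptorSet) -> dict:
    out = {}
    for file_proto in fds.file:
        for message in file_proto.message_type:
            key = f"{file_proto.package}.{message.name}"
            out[key] = {(field.number, field.name) for field in message.field}
            # Reserved numbers/names live in message.reserved_range / reserved_name
            # and could be checked here as well.
    return out

def removed_fields(old_path: str, new_path: str):
    old, new = fields_by_message(load(old_path)), fields_by_message(load(new_path))
    for msg, old_fields in old.items():
        for number, name in sorted(old_fields - new.get(msg, set())):
            yield f"{msg}: field {name} (={number}) was removed"

if __name__ == "__main__":
    # e.g. descriptor sets built from main and from the current working tree
    for problem in removed_fields("build/main.desc", "build/head.desc"):
        print(problem)
```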
## References
- Protocol Buffers Language Guide
- Buf Schema Registry
- prost (Rust protobuf)
- ts-proto (TypeScript)
- ADR-002: Client-Originated Configuration
- ADR-004: Local-First Testing Strategy
## Revision History
- 2025-10-05: Initial draft and acceptance