ADR-050: Topaz for Policy-Based Authorization
Status
Accepted
Context
Prism requires fine-grained authorization beyond simple OIDC authentication. We need:
- Multi-tenancy isolation: Users can only access their namespace's data
- Role-based access control (RBAC): Different permissions for developers, operators, admins
- Attribute-based access control (ABAC): Context-aware policies (time of day, IP address, data sensitivity)
- Resource-level policies: Per-namespace, per-backend, per-pattern permissions
- Audit trail: Who accessed what, when, and why
Design Constraints:
- Authorization decisions must be fast (<5ms P99)
- Policies must be centrally managed and versioned
- Must integrate with existing OIDC authentication (RFC-010, RFC-011)
- Should support both synchronous (proxy) and asynchronous (admin) authorization
- Need local policy enforcement for low-latency decisions
Decision
We will use Topaz by Aserto as our policy engine for authorization decisions.
What is Topaz?
- Open-source authorization engine based on Open Policy Agent (OPA)
- Built-in directory service for storing user/group/resource relationships
- Supports fine-grained, real-time authorization (FGA)
- Sidecar deployment model for low-latency local decisions
- Centralized policy management with decentralized enforcement
Rationale
Alternatives Considered
Alternative A: Open Policy Agent (OPA) Alone
Pros:
- Industry standard policy engine
- Flexible Rego policy language
- Wide adoption and ecosystem
Cons:
- ❌ No built-in directory service (need separate user/resource store)
- ❌ No relationship/graph modeling (need to implement ourselves)
- ❌ Limited real-time updates (bundle-based refresh only)
- ❌ No built-in audit logging
Verdict: Too much plumbing required. We'd essentially rebuild Topaz.
Alternative B: Cloud Provider IAM (AWS IAM, Google Cloud IAM)
Pros:
- Integrated with cloud infrastructure
- No additional infrastructure to manage
Cons:
- ❌ Cloud-specific (not portable)
- ❌ Coarse-grained (resource-level, not fine-grained)
- ❌ High latency (API calls to cloud control plane)
- ❌ No support for on-premises deployments
Verdict: Doesn't meet latency or portability requirements.
Alternative C: Zanzibar-based Systems (SpiceDB, Ory Keto)
Pros:
- Google Zanzibar-inspired relationship-based access control
- Fine-grained permissions with relationships
- High performance with caching
Cons:
- ⚠️ More complex to set up than Topaz
- ⚠️ SpiceDB requires separate deployment (not sidecar)
- ⚠️ Ory Keto still maturing
Verdict: Good alternative, but Topaz provides simpler integration.
Alternative D: Topaz (Selected)
Pros:
- ✅ Built on OPA (reuses Rego policy language)
- ✅ Includes directory service (users, groups, resources)
- ✅ Sidecar deployment for <1ms authorization checks
- ✅ Relationship-based authorization (like Zanzibar)
- ✅ Real-time policy updates (no bundle delays)
- ✅ Built-in audit logging
- ✅ Open source with commercial support option (Aserto)
- ✅ gRPC and REST APIs for integration
- ✅ Works on-premises and in cloud
Cons:
- ⚠️ Smaller ecosystem than OPA
- ⚠️ Requires learning Aserto-specific concepts (manifests, directory API)
Verdict: ✅ Best fit - combines OPA's power with Zanzibar-style relationships and local enforcement.
Architecture
Deployment Model: Sidecar Pattern
┌────────────────────────────────────────────┐
│ Prism Proxy (Rust) │
│ │
│ ┌──────────────────────────────────────┐ │
│ │ gRPC Request Handler │ │
│ └──────────┬───────────────────────────┘ │
│ │ │
│ │ 1. Authorize(user, resource) │
│ ▼ │
│ ┌──────────────────────────────────────┐ │
│ │ Topaz Sidecar (localhost:8282) │ │
│ │ - Policy Engine (Rego) │ │
│ │ - Directory Service (users/groups) │ │
│ │ - Decision Cache (local) │ │
│ └──────────┬───────────────────────────┘ │
└─────────────┼───────────────────────────────┘
│
│ 2. Policy sync (background)
▼
┌──────────────────────────────┐
│ Central Policy Repository │
│ (Git + Aserto Control Plane)│
└──────────────────────────────┘
Flow:
- Prism proxy receives gRPC request
- Proxy calls Topaz sidecar:
Is(user, "can-read", namespace)
- Topaz evaluates policy locally (<1ms)
- Topaz returns
{ allowed: true, reasons: [...] }
- Proxy enforces decision (allow or deny request)
Policy updates:
- Policies stored in Git (as Rego files)
- Topaz syncs policies every 30s from central control plane
- No proxy restarts required for policy changes
Authorization Model: Relationship-Based Access Control
Inspired by Google Zanzibar, Topaz models authorization as relationships between subjects (users), objects (resources), and permissions.
Example Relationships:
# User alice is a member of team platform-engineering
alice | member | group:platform-engineering
# Group platform-engineering is an admin of namespace iot-devices
group:platform-engineering | admin | namespace:iot-devices
# Namespace iot-devices contains backend redis-001
namespace:iot-devices | contains | backend:redis-001
# Policy: Admins of a namespace can read/write its backends
allow(user, "read", backend) if
user | admin | namespace
namespace | contains | backend
Result: Alice can read backend redis-001
because:
- Alice ∈ platform-engineering (member)
- platform-engineering → admin → iot-devices
- iot-devices → contains → redis-001
- Policy: admin → can read contained backends
Integration Points
1. Prism Proxy (Rust)
Authorization Middleware:
// src/authz/topaz.rs
use tonic::Request;
use anyhow::Result;
pub struct TopazAuthz {
client: TopazClient,
}
impl TopazAuthz {
pub async fn authorize(&self, req: &AuthzRequest) -> Result<bool> {
let decision = self.client.is(IsRequest {
subject: req.user,
relation: req.permission, // "read", "write", "admin"
object: req.resource, // "namespace:iot-devices"
}).await?;
if decision.is {
info!("Authorized: {} can {} {}", req.user, req.permission, req.resource);
Ok(true)
} else {
warn!("Denied: {} cannot {} {}", req.user, req.permission, req.resource);
Ok(false)
}
}
}
// Middleware applied to all gRPC requests
pub async fn authz_middleware(
req: Request<()>,
authz: &TopazAuthz,
) -> Result<Request<()>, Status> {
let metadata = req.metadata();
let user = metadata.get("x-user-id")
.ok_or_else(|| Status::unauthenticated("Missing user ID"))?;
let resource = extract_resource_from_request(&req)?;
let permission = infer_permission_from_method(&req)?;
let allowed = authz.authorize(&AuthzRequest {
user: user.to_str()?,
permission,
resource,
}).await?;
if allowed {
Ok(req)
} else {
Err(Status::permission_denied("Access denied by policy"))
}
}
2. Admin CLI (Go)
Authorization Check Before Commands:
// prismctl/internal/authz/topaz.go
package authz
import (
"context"
"fmt"
topazpb "github.com/aserto-dev/go-grpc/aserto/authorizer/v2"
"google.golang.org/grpc"
)
type TopazClient struct {
client topazpb.AuthorizerClient
}
func NewTopazClient(endpoint string) (*TopazClient, error) {
conn, err := grpc.Dial(endpoint, grpc.WithInsecure())
if err != nil {
return nil, fmt.Errorf("connect to Topaz: %w", err)
}
return &TopazClient{
client: topazpb.NewAuthorizerClient(conn),
}, nil
}
func (c *TopazClient) CanUser(ctx context.Context, user, permission, resource string) (bool, error) {
req := &topazpb.IsRequest{
Subject: user,
Relation: permission,
Object: resource,
}
resp, err := c.client.Is(ctx, req)
if err != nil {
return false, fmt.Errorf("authorization check: %w", err)
}
return resp.Is, nil
}
// Usage in CLI commands (prismctl/cmd/namespace.go)
var namespaceDeleteCmd = &cobra.Command{
Use: "delete NAME",
Short: "Delete a namespace (requires admin permission)",
Args: cobra.ExactArgs(1),
RunE: func(cmd *cobra.Command, args []string) error {
namespace := args[0]
user := getCurrentUser()
// Check authorization before dangerous operation
authz, err := authz.NewTopazClient("localhost:8282")
if err != nil {
return fmt.Errorf("connect to authz: %w", err)
}
allowed, err := authz.CanUser(cmd.Context(), user, "admin", fmt.Sprintf("namespace:%s", namespace))
if err != nil {
return fmt.Errorf("authorization check: %w", err)
}
if !allowed {
uiInstance.Error(fmt.Sprintf("Access denied: You don't have admin permission on %s", namespace))
return fmt.Errorf("permission denied")
}
// Proceed with deletion
uiInstance.Info(fmt.Sprintf("Deleting namespace %s...", namespace))
// ... deletion logic
return nil
},
}
3. Admin UI (FastAPI)
Protect API Endpoints:
# admin/app/authz.py
from fastapi import Depends, HTTPException, status
from topaz import TopazClient
authz = TopazClient(endpoint="localhost:8282")
async def require_permission(
permission: str,
resource_type: str
):
"""FastAPI dependency for authorization."""
async def check_permission(
resource_id: str,
current_user: str = Depends(get_current_user)
):
resource = f"{resource_type}:{resource_id}"
allowed = await authz.can_user(
user=current_user,
permission=permission,
resource=resource
)
if not allowed:
raise HTTPException(
status_code=status.HTTP_403_FORBIDDEN,
detail=f"You don't have '{permission}' permission on {resource}"
)
return check_permission
# Usage
@app.delete("/api/namespaces/{namespace_id}")
async def delete_namespace(
namespace_id: str,
_: None = Depends(require_permission("admin", "namespace"))
):
"""Delete namespace (requires admin permission)."""
# ... deletion logic
Policy Examples
Policy 1: Namespace Isolation (Multi-Tenancy)
# policies/namespace_isolation.rego
package prism.authz
# Default deny
default allow = false
# Users can only access namespaces they have explicit permission for
allow {
input.permission == "read"
input.resource_type == "namespace"
has_namespace_access(input.user, input.resource_id)
}
allow {
input.permission == "write"
input.resource_type == "namespace"
has_namespace_write_access(input.user, input.resource_id)
}
has_namespace_access(user, namespace) {
# Check if user is member of group with access
user_groups := data.directory.user_groups[user]
group := user_groups[_]
data.directory.group_namespaces[group][namespace]
}
has_namespace_write_access(user, namespace) {
# Only admins can write
is_namespace_admin(user, namespace)
}
is_namespace_admin(user, namespace) {
user_groups := data.directory.user_groups[user]
group := user_groups[_]
data.directory.group_roles[group][namespace] == "admin"
}
Policy 2: Time-Based Access (Maintenance Windows)
# policies/maintenance_windows.rego
package prism.authz
# Allow writes only during maintenance window
allow {
input.permission == "write"
input.resource_type == "backend"
is_maintenance_window()
}
is_maintenance_window() {
# Maintenance: Sundays 02:00-06:00 UTC
now := time.now_ns()
day_of_week := time.weekday(now)
hour := time.clock(now)[0]
day_of_week == "Sunday"
hour >= 2
hour < 6
}
Policy 3: Data Sensitivity (PII Protection)
# policies/pii_protection.rego
package prism.authz
# PII data can only be accessed by users with pii-access role
allow {
input.permission == "read"
contains_pii(input.resource_id)
user_has_pii_access(input.user)
}
contains_pii(resource) {
# Check if resource is marked as containing PII
data.directory.resource_attributes[resource].sensitivity == "pii"
}
user_has_pii_access(user) {
user_roles := data.directory.user_roles[user]
user_roles[_] == "pii-access"
}
Directory Schema
Topaz Directory Models Users, Groups, Resources, and Relationships:
# topaz/directory/schema.yaml
model:
version: 3
types:
# Subjects
user:
relations:
member: group
group:
relations:
admin: namespace
developer: namespace
viewer: namespace
# Resources
namespace:
relations:
contains: backend
contains: pattern
backend:
relations:
exposed_by: namespace
pattern:
relations:
used_by: namespace
permissions:
# Namespace permissions
namespace:
read:
- viewer
- developer
- admin
write:
- developer
- admin
admin:
- admin
# Backend permissions
backend:
read:
- admin@namespace[exposed_by]
- developer@namespace[exposed_by]
write:
- admin@namespace[exposed_by]
admin:
- admin@namespace[exposed_by]
Populating the Directory:
# Add users
topaz directory set user alice@example.com
# Add groups
topaz directory set group platform-engineering
# Add user to group
topaz directory set relation alice@example.com member group:platform-engineering
# Add namespace
topaz directory set namespace iot-devices
# Grant group admin access to namespace
topaz directory set relation group:platform-engineering admin namespace:iot-devices
# Add backend
topaz directory set backend redis-001
# Link backend to namespace
topaz directory set relation namespace:iot-devices contains backend:redis-001
Performance Characteristics
Latency
Local sidecar authorization checks:
- P50: <0.5ms
- P95: <2ms
- P99: <5ms
Why so fast?
- Topaz sidecar runs locally (no network round-trip to remote authz service)
- Decisions cached in-memory
- Policy compiled ahead of time (not interpreted)
Comparison:
- Remote authz service (e.g., AWS IAM): 50-200ms
- Database lookup: 10-50ms
- Topaz local: <5ms
Throughput
Topaz sidecar can handle:
- 10,000+ authorization checks per second (local)
- Limited only by proxy throughput (not authz)
Deployment
Local Development
Docker Compose:
# docker-compose.yml
services:
topaz:
image: ghcr.io/aserto-dev/topaz:latest
ports:
- "8282:8282" # gRPC API
- "8383:8383" # REST API
- "8484:8484" # Console UI
volumes:
- ./topaz/config:/config
- ./topaz/policies:/policies
environment:
- TOPAZ_DB_PATH=/data/topaz.db
- TOPAZ_POLICY_ROOT=/policies
command: run -c /config/topaz-config.yaml
prism-proxy:
build: ./proxy
depends_on:
- topaz
environment:
- TOPAZ_ENDPOINT=topaz:8282
ports:
- "50051:50051"
Configuration (topaz/config/topaz-config.yaml
):
# Topaz configuration
version: 2
api:
grpc:
listen_address: "0.0.0.0:8282"
rest:
listen_address: "0.0.0.0:8383"
directory:
db:
type: sqlite
path: /data/topaz.db
policy:
engine: opa
policy_root: /policies
edge:
enabled: true
sync_interval: 30s
remote: https://topaz.aserto.com # Central policy repo
Production Deployment
Kubernetes Sidecar:
# k8s/prism-proxy-deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
name: prism-proxy
spec:
template:
spec:
containers:
# Main proxy container
- name: proxy
image: prism-proxy:latest
env:
- name: TOPAZ_ENDPOINT
value: "localhost:8282"
# Topaz sidecar
- name: topaz
image: ghcr.io/aserto-dev/topaz:latest
ports:
- containerPort: 8282
name: grpc
volumeMounts:
- name: topaz-config
mountPath: /config
- name: topaz-policies
mountPath: /policies
volumes:
- name: topaz-config
configMap:
name: topaz-config
- name: topaz-policies
configMap:
name: topaz-policies
Security Considerations
1. Policy Isolation
Risk: Malicious user modifies policies to grant themselves unauthorized access.
Mitigation:
- Policies stored in Git with branch protection
- Only CI/CD can push policy changes to Topaz
- Audit all policy changes in Git history
2. Directory Integrity
Risk: Unauthorized modification of user/group/resource relationships.
Mitigation:
- Directory API requires authentication (admin token)
- All directory changes logged to audit trail
- Periodic snapshots for disaster recovery
3. Sidecar Compromise
Risk: Attacker gains access to Topaz sidecar and bypasses authorization.
Mitigation:
- Topaz sidecar bound to localhost only (not exposed externally)
- Proxy and sidecar run in same pod/VM (network isolation)
- mTLS between proxy and sidecar (optional, for paranoid mode)
4. Denial of Service
Risk: Flood of authorization checks overwhelms Topaz sidecar.
Mitigation:
- Rate limiting in proxy before authz checks
- Circuit breaker pattern (fail open/closed configurable)
- Horizontal scaling of proxy+sidecar pairs
Migration Path
Phase 1: Basic RBAC (Week 1)
- Deploy Topaz sidecar alongside Prism proxy
- Implement simple RBAC policies (admin/developer/viewer roles)
- Integrate with existing OIDC authentication
- Test authorization checks in local development
Phase 2: Namespace Isolation (Week 2)
- Model namespaces in Topaz directory
- Implement namespace isolation policies
- Migrate existing namespace ACLs to Topaz
- Validate multi-tenancy enforcement
Phase 3: Fine-Grained Permissions (Week 3)
- Model backends and patterns in directory
- Implement resource-level policies
- Add attribute-based policies (time, IP, data sensitivity)
- Enable audit logging
Phase 4: Production Rollout (Week 4)
- Deploy to staging environment
- Load test authorization performance
- Gradual rollout to production (canary deployment)
- Monitor authorization latency and error rates
Monitoring and Observability
Metrics
Authorization Decision Metrics:
prism_authz_decisions_total{decision="allowed|denied"}
- Total authorization checksprism_authz_latency_seconds
- Authorization check latency histogramprism_authz_errors_total
- Failed authorization checksprism_authz_cache_hit_ratio
- Decision cache hit rate
Policy Evaluation Metrics:
topaz_policy_evaluations_total
- Policy evaluation counttopaz_policy_errors_total
- Policy evaluation errorstopaz_directory_queries_total
- Directory lookups
Logging
Authorization Audit Trail:
{
"timestamp": "2025-10-09T14:32:15Z",
"event": "authorization_decision",
"user": "alice@example.com",
"permission": "read",
"resource": "namespace:iot-devices",
"decision": "allowed",
"reasons": [
"user is member of group platform-engineering",
"platform-engineering has developer role on iot-devices"
],
"latency_ms": 1.2
}
Alerts
Authorization Failures:
- Alert if authorization error rate > 1%
- Alert if authorization latency P99 > 10ms
- Alert if policy sync fails
Unusual Access Patterns:
- Alert if user accesses namespace they've never accessed before
- Alert if admin actions outside maintenance window
- Alert if PII data accessed by non-authorized user
Open Questions
1. Fail-Open vs Fail-Closed?
Question: If Topaz sidecar is unavailable, should proxy allow or deny requests?
Options:
- Fail-closed (deny all): More secure, but impacts availability
- Fail-open (allow all): Better availability, but security risk
- Degraded mode (allow read-only): Compromise between security and availability
Recommendation: Fail-closed by default, with opt-in fail-open per namespace.
2. How to Handle Policy Conflicts?
Question: What happens if multiple policies conflict (one allows, one denies)?
Options:
- Deny wins: Conservative approach (deny if any policy denies)
- Allow wins: Permissive approach (allow if any policy allows)
- Explicit priority: Policies have precedence order
Recommendation: Deny wins (secure by default).
3. Should We Cache Authorization Decisions?
Question: Can we cache authorization decisions to reduce Topaz load?
Pros:
- Reduces latency for repeated checks
- Reduces load on Topaz sidecar
Cons:
- Stale decisions if policies/relationships change
- Cache invalidation complexity
Recommendation: Yes, with short TTL (5 seconds). Trade-off between performance and freshness.
Related Documents
- RFC-010: Admin Protocol with OIDC - OIDC authentication
- RFC-011: Data Proxy Authentication - Secrets provider abstraction
- ADR-046: Dex IDP for Local Testing - Local OIDC provider
Consequences
Positive
- ✅ Fine-grained authorization: Per-resource, per-user, attribute-based policies
- ✅ Low latency: <5ms P99 authorization checks (local sidecar)