Architecture

This document explains the internal architecture of the Vault Access Operator.

High-Level Overview

graph TD
    subgraph K8s["Kubernetes Cluster"]
        subgraph Operator["Vault Access Operator"]
            CF["Connection Feature"]
            PF["Policy Feature"]
            RF["Role Feature"]
            DF["Discovery Feature"]
            EB["Event Bus<br/>(shared/events)"]
            CC["Client Cache"]
            VC["Vault Client<br/>(pkg/vault)"]
        end
    end
    V["HashiCorp Vault"]

    CF --> EB
    PF --> EB
    RF --> EB
    CF --> CC
    PF --> CC
    RF --> CC
    DF --> CC
    CC --> VC
    VC --> V

Feature-Driven Design

The operator follows a feature-driven architecture where each domain (connection, policy, role, discovery) is self-contained:

features/
├── connection/
│   ├── feature.go        # Feature registration
│   └── controller/       # Reconciler, handler
├── policy/
│   ├── feature.go
│   ├── controller/       # Policy + ClusterPolicy reconcilers
│   └── domain/           # Business logic adapters
├── role/
│   ├── feature.go
│   ├── controller/
│   └── domain/
└── discovery/
    ├── feature.go
    └── controller/       # Scanner implementation
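
The registration pattern can be sketched with a minimal, self-contained example. Note that the `Feature` interface and `RegisterFeatures` helper below are illustrative stand-ins, not the operator's actual API:

```go
package main

import "fmt"

// Feature is a hypothetical interface each self-contained feature satisfies;
// the operator's real registration API may differ.
type Feature interface {
	Name() string
	Setup() error // would wire the feature's controllers with the manager
}

type stubFeature struct{ name string }

func (f stubFeature) Name() string { return f.name }
func (f stubFeature) Setup() error { return nil }

// RegisterFeatures sets up only the features enabled at startup and
// returns the names that were registered.
func RegisterFeatures(all []Feature, enabled map[string]bool) ([]string, error) {
	var names []string
	for _, f := range all {
		if !enabled[f.Name()] {
			continue // pluggability: skip disabled features
		}
		if err := f.Setup(); err != nil {
			return nil, err
		}
		names = append(names, f.Name())
	}
	return names, nil
}

func main() {
	names, _ := RegisterFeatures(
		[]Feature{stubFeature{"connection"}, stubFeature{"policy"}, stubFeature{"discovery"}},
		map[string]bool{"connection": true, "policy": true},
	)
	fmt.Println(names) // [connection policy]
}
```

Because disabled features are simply never set up, their controllers add no watches and no load to the API server.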

Benefits

  1. Isolation: Each feature can be developed and tested independently
  2. Pluggability: Features can be enabled/disabled at startup
  3. Maintainability: Clear ownership and separation of concerns
  4. Testability: Domain logic separated from K8s/Vault infrastructure

Controller Pattern

Each resource type has a dedicated controller following the standard Kubernetes controller pattern:

flowchart LR
    A["Watch Events"] --> B["Enqueue<br/>(workqueue)"]
    B --> C["Reconcile<br/>(business logic)"]
    C --> D["Requeue<br/>(delay)"]
    D -.-> A

BaseReconciler

All controllers use a shared BaseReconciler that provides:

  • Finalizer management
  • Status updates
  • Error handling
  • Requeue logic
  • ReconcileID tracking

// BaseReconciler provides common reconciliation patterns
type BaseReconciler[T client.Object] struct {
    client    client.Client
    scheme    *runtime.Scheme
    log       logr.Logger
    finalizer string
    recorder  record.EventRecorder
}

// Reconcile handles the common reconciliation flow
func (r *BaseReconciler[T]) Reconcile(
    ctx context.Context,
    req ctrl.Request,
    handler FeatureHandler[T],
    newObj func() T,
) (ctrl.Result, error)

FeatureHandler Interface

Each feature implements the FeatureHandler interface:

type FeatureHandler[T client.Object] interface {
    // Sync creates or updates the resource in Vault
    Sync(ctx context.Context, obj T) error

    // Cleanup removes the resource from Vault
    Cleanup(ctx context.Context, obj T) error
}

Reconciliation Flow

Sync Flow (Create/Update)

flowchart TD
    A["1. Reconcile triggered"] --> B["2. Get resource from API server"]
    B --> C{"3. Being deleted?"}
    C -- Yes --> cleanup["Cleanup Flow"]
    C -- No --> D["4. Ensure finalizer exists"]
    D --> E["5. Get VaultConnection"]
    E --> F["6. Get Vault client"]
    F --> G{"7. Exists in Vault?"}
    G -- No --> H["Create in Vault"]
    G -- Yes --> I["Check drift → Update if needed"]
    H --> J["8. Update status"]
    I --> J
    J --> K["9. Requeue after interval"]
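
Steps 7–8 reduce to a create-or-correct-drift decision. A self-contained sketch, with a map standing in for Vault:

```go
package main

import "fmt"

// syncPolicy creates the policy if absent, or rewrites it only when the
// stored rules drift from the desired rules. Returns what was done.
func syncPolicy(vault map[string]string, name, desired string) string {
	current, exists := vault[name]
	switch {
	case !exists:
		vault[name] = desired
		return "created"
	case current != desired:
		vault[name] = desired // drift corrected
		return "updated"
	default:
		return "unchanged"
	}
}

func main() {
	vault := map[string]string{}
	fmt.Println(syncPolicy(vault, "p", "rules-v1")) // created
	fmt.Println(syncPolicy(vault, "p", "rules-v1")) // unchanged
	fmt.Println(syncPolicy(vault, "p", "rules-v2")) // updated
}
```

The "unchanged" path matters: skipping the write when nothing drifted keeps periodic requeues cheap against Vault.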

Cleanup Flow (Delete)

flowchart TD
    A["1. Resource has deletion timestamp"] --> B{"2. deletionPolicy?"}
    B -- Retain --> skip["Skip Vault deletion"]
    B -- Delete --> C["3. Get Vault client"]
    skip --> E["5. Remove finalizer"]
    C --> D["4. Delete from Vault"]
    D --> E
    E --> F["6. Resource deleted"]
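
The deletionPolicy branch can be sketched as follows; `deleteFromVault` is a hypothetical stand-in for the real Vault call:

```go
package main

import "fmt"

// cleanup applies the deletion policy: "Retain" skips the Vault call,
// "Delete" removes the resource first. The finalizer is removed only
// once cleanup succeeds, so a failed Vault delete is retried.
func cleanup(deletionPolicy string, deleteFromVault func() error) (finalizerRemoved bool, err error) {
	if deletionPolicy == "Delete" {
		if err := deleteFromVault(); err != nil {
			return false, err // keep finalizer; reconcile will retry
		}
	}
	return true, nil
}

func main() {
	calls := 0
	ok, _ := cleanup("Retain", func() error { calls++; return nil })
	fmt.Println(ok, calls) // true 0
	ok, _ = cleanup("Delete", func() error { calls++; return nil })
	fmt.Println(ok, calls) // true 1
}
```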

Domain Adapters

Domain adapters abstract the differences between namespaced and cluster-scoped resources:

// PolicyAdapter provides a unified interface for VaultPolicy and VaultClusterPolicy
type PolicyAdapter interface {
    GetName() string
    GetNamespace() string
    GetVaultPolicyName() string
    GetConnectionRef() vaultv1alpha1.ConnectionReference
    GetRules() []vaultv1alpha1.PolicyRule
    GetDriftMode() vaultv1alpha1.DriftMode
    // ... status setters
}

This allows the same business logic to handle both:

  • VaultPolicy (namespaced) → policy name: namespace-name
  • VaultClusterPolicy (cluster-scoped) → policy name: name
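
The naming rule is simple enough to express directly (a sketch of the convention; the real adapters carry this logic):

```go
package main

import "fmt"

// vaultPolicyName derives the Vault policy name: namespaced resources are
// prefixed with their namespace, cluster-scoped resources use the bare name.
func vaultPolicyName(namespace, name string) string {
	if namespace == "" { // cluster-scoped
		return name
	}
	return namespace + "-" + name
}

func main() {
	fmt.Println(vaultPolicyName("production", "db-read")) // production-db-read
	fmt.Println(vaultPolicyName("", "org-admin"))         // org-admin
}
```

The namespace prefix prevents two VaultPolicy resources in different namespaces from silently overwriting the same Vault policy.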

Vault Client Layer

The pkg/vault package provides a client abstraction:

type Client struct {
    client     *api.Client
    authConfig AuthConfig
    log        logr.Logger
}

// Policy operations
func (c *Client) ReadPolicy(ctx context.Context, name string) (string, error)
func (c *Client) WritePolicy(ctx context.Context, name, rules string) error
func (c *Client) DeletePolicy(ctx context.Context, name string) error

// Role operations
func (c *Client) ReadKubernetesAuthRole(ctx context.Context, authPath, roleName string) (*RoleConfig, error)
func (c *Client) WriteKubernetesAuthRole(ctx context.Context, authPath, roleName string, config *RoleConfig) error
func (c *Client) DeleteKubernetesAuthRole(ctx context.Context, authPath, roleName string) error

// Authentication
func (c *Client) Authenticate(ctx context.Context) error
func (c *Client) RenewToken(ctx context.Context) error

Client Cache

The ClientCache is a thread-safe cache that shares authenticated Vault clients between features. The Connection feature creates and caches clients; Policy, Role, and Discovery features retrieve them by VaultConnection name. This avoids redundant authentication and ensures all features use the same token lifecycle.
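
A minimal sketch of such a cache, assuming a read-heavy access pattern (the `vaultClient` type is a stand-in for the operator's authenticated client):

```go
package main

import (
	"fmt"
	"sync"
)

// vaultClient stands in for an authenticated Vault client.
type vaultClient struct{ token string }

// clientCache shares clients between features, keyed by VaultConnection
// name; sync.RWMutex makes it safe for concurrent reconcilers.
type clientCache struct {
	mu      sync.RWMutex
	clients map[string]*vaultClient
}

func newClientCache() *clientCache {
	return &clientCache{clients: map[string]*vaultClient{}}
}

// Put is called by the Connection feature after authentication.
func (c *clientCache) Put(conn string, cl *vaultClient) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.clients[conn] = cl
}

// Get is used by the Policy, Role, and Discovery features.
func (c *clientCache) Get(conn string) (*vaultClient, bool) {
	c.mu.RLock()
	defer c.mu.RUnlock()
	cl, ok := c.clients[conn]
	return cl, ok
}

func main() {
	cache := newClientCache()
	cache.Put("primary", &vaultClient{token: "s.example"})
	if cl, ok := cache.Get("primary"); ok {
		fmt.Println(cl.token)
	}
}
```

An RWMutex fits here because many reconcilers read the cache on every pass while only the Connection feature writes to it.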

Event Bus

The operator uses an in-process event bus (shared/events) for inter-feature communication, avoiding direct coupling between features.

Event Types

| Category   | Events                                                               | Description                            |
|------------|----------------------------------------------------------------------|----------------------------------------|
| Connection | connection.ready, connection.disconnected, connection.health_changed | VaultConnection lifecycle              |
| Policy     | policy.created, policy.updated, policy.deleted                       | VaultPolicy/VaultClusterPolicy changes |
| Role       | role.created, role.updated, role.deleted                             | VaultRole/VaultClusterRole changes     |
| Token      | token.renewed, token.renewal_failed, token.reviewer_refreshed        | Token lifecycle events                 |

Pattern

Features publish events via EventBus.Publish() (synchronous) or EventBus.PublishAsync() (fire-and-forget goroutine). Other features subscribe with typed handlers using Go generics: events.Subscribe[ConnectionReady](bus, handler).
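
A self-contained sketch of this typed-subscription pattern (the real shared/events package may differ in its internals):

```go
package main

import (
	"fmt"
	"sync"
)

// bus routes events by concrete type; handlers are stored untyped and
// re-typed on subscribe, mirroring the Subscribe[T](bus, handler) shape.
type bus struct {
	mu       sync.RWMutex
	handlers map[string][]func(any)
}

func newBus() *bus { return &bus{handlers: map[string][]func(any){}} }

// key derives a routing key from the event's concrete type.
func key[T any]() string { var t T; return fmt.Sprintf("%T", t) }

// Subscribe registers a typed handler for events of type T.
func Subscribe[T any](b *bus, h func(T)) {
	b.mu.Lock()
	defer b.mu.Unlock()
	k := key[T]()
	b.handlers[k] = append(b.handlers[k], func(e any) { h(e.(T)) })
}

// Publish delivers an event synchronously to all matching handlers.
func Publish[T any](b *bus, e T) {
	b.mu.RLock()
	defer b.mu.RUnlock()
	for _, h := range b.handlers[key[T]()] {
		h(e)
	}
}

type ConnectionReady struct{ Name string }

func main() {
	b := newBus()
	Subscribe(b, func(e ConnectionReady) { fmt.Println("ready:", e.Name) })
	Publish(b, ConnectionReady{Name: "primary"}) // ready: primary
}
```

Generics keep subscribers type-safe: a handler for ConnectionReady can never receive a policy event, with no runtime type switches in feature code.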

Managed Markers and Orphan Detection

The operator tracks which Vault resources it manages using KV v2 markers stored at:

secret/data/vault-access-operator/managed/policies/{name}
secret/data/vault-access-operator/managed/roles/{name}

These markers enable:

  • Orphan detection: Periodic scans compare markers against existing K8s resources. A marker without a matching CR indicates an orphaned Vault resource (e.g., CR deleted while operator was down).
  • Cleanup queue: Orphaned resources are queued for cleanup with retry logic and exponential backoff.
  • Conflict detection: When creating a resource, the operator checks for existing markers to detect conflicts with other CRs.
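
The marker path and the orphan diff can be sketched as follows (assuming marker names mirror the derived Vault policy/role names):

```go
package main

import "fmt"

// markerPath builds the KV v2 data path for a managed-resource marker;
// resourceType is "policies" or "roles".
func markerPath(resourceType, name string) string {
	return fmt.Sprintf("secret/data/vault-access-operator/managed/%s/%s", resourceType, name)
}

// orphans returns marker names that have no matching K8s resource,
// i.e. Vault resources whose CR disappeared (e.g. while the operator
// was down).
func orphans(markers, crs map[string]bool) []string {
	var out []string
	for m := range markers {
		if !crs[m] {
			out = append(out, m)
		}
	}
	return out
}

func main() {
	fmt.Println(markerPath("policies", "production-db-read"))
	fmt.Println(orphans(
		map[string]bool{"production-db-read": true},
		map[string]bool{"production-db-read": true},
	)) // []
}
```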

Authentication Flow

sequenceDiagram
    participant VC as VaultConnection
    participant K8s as Kubernetes API
    participant V as Vault Server

    VC->>K8s: TokenRequest (ServiceAccount)
    K8s-->>VC: SA JWT Token
    VC->>V: Login with SA JWT
    V->>K8s: Validate Token
    K8s-->>V: Token Valid
    V-->>VC: Vault Token
    Note over VC: Token Cached + Renewal
    Note over VC,V: Shown: Kubernetes auth. The operator supports 8 auth methods — see Authentication Methods docs

Renewal Strategy

The operator supports two token renewal strategies:

| Strategy        | Behavior                                 |
|-----------------|------------------------------------------|
| renew (default) | Renew existing token before expiration   |
| reauth          | Re-authenticate with fresh credentials   |

Status Management

Phase Transitions

stateDiagram-v2
    [*] --> Pending
    Pending --> Syncing
    Syncing --> Active
    Syncing --> Error
    Syncing --> Conflict
    Active --> Error
    Active --> Syncing : spec change
    Error --> Syncing : retry
    Pending --> Conflict
    Error --> Conflict
    Active --> Deleting
    Deleting --> [*]

Conditions

Resources track detailed conditions:

| Condition        | Description                                           |
|------------------|-------------------------------------------------------|
| Ready            | Resource is ready for use                             |
| Synced           | Resource is synced to Vault                           |
| ConnectionReady  | VaultConnection is available                          |
| PoliciesResolved | Referenced policies exist                             |
| DependencyReady  | All dependencies (connection, policies) are satisfied |
| Drifted          | Vault resource differs from desired K8s state         |
| Deleting         | Resource is being deleted (finalizer in progress)     |

Metrics and Observability

Prometheus Metrics

# Connection health
vault_access_operator_connection_healthy{connection}
vault_access_operator_connection_health_checks_total{connection, result}
vault_access_operator_connection_consecutive_fails{connection}

# Reconciliation
vault_access_operator_policy_reconcile_total{kind, namespace, result}
vault_access_operator_role_reconcile_total{kind, namespace, result}

# Drift
vault_access_operator_vault_drift_detected{kind, namespace, name}
vault_access_operator_vault_drift_corrected_total{kind, namespace}

# Orphan & Cleanup
vault_access_operator_vault_orphaned_resources{connection, type}
vault_access_operator_cleanup_queue_size
vault_access_operator_cleanup_retries_total{resource_type, result}

# Safety
vault_access_operator_safety_destructive_blocked_total{kind, namespace}

# Discovery
vault_access_operator_discovery_adoptions_total{kind, namespace, result}
vault_access_operator_discovery_unmanaged_resources{connection, type}
vault_access_operator_discovery_scans_total{connection, result}

Structured Logging

All log entries include:

{
  "level": "info",
  "ts": "2026-01-15T10:30:00Z",
  "logger": "vaultpolicy",
  "msg": "reconciling resource",
  "reconcileID": "abc123",
  "namespace": "production",
  "name": "my-policy"
}

Filter logs by reconcileID:

kubectl logs deploy/vault-access-operator-controller-manager | \
  jq 'select(.reconcileID == "abc123")'

Error Handling

Retry with Backoff

Failed operations are automatically retried with exponential backoff:

Attempt 1: immediate
Attempt 2: 30s delay
Attempt 3: 1m delay
Attempt 4: 2m delay
...
Max delay: 5m

Error Categories

| Category                     | Behavior                          |
|------------------------------|-----------------------------------|
| Transient (network, timeout) | Requeue with backoff              |
| Conflict                     | Set status, wait for resolution   |
| Validation                   | Set error status, no requeue      |
| Permanent (auth failed)      | Set error status, alert           |

See Also