# Architecture
This document explains the internal architecture of the Vault Access Operator.
## High-Level Overview

```mermaid
graph TD
    subgraph K8s["Kubernetes Cluster"]
        subgraph Operator["Vault Access Operator"]
            CF["Connection Feature"]
            PF["Policy Feature"]
            RF["Role Feature"]
            DF["Discovery Feature"]
            EB["Event Bus<br/>(shared/events)"]
            CC["Client Cache"]
            VC["Vault Client<br/>(pkg/vault)"]
        end
    end
    V["HashiCorp Vault"]
    CF --> EB
    PF --> EB
    RF --> EB
    CF --> CC
    PF --> CC
    RF --> CC
    DF --> CC
    CC --> VC
    VC --> V
```
## Feature-Driven Design

The operator follows a feature-driven architecture in which each domain (connection, policy, role, discovery) is self-contained:

```text
features/
├── connection/
│   ├── feature.go      # Feature registration
│   └── controller/     # Reconciler, handler
├── policy/
│   ├── feature.go
│   ├── controller/     # Policy + ClusterPolicy reconcilers
│   └── domain/         # Business logic adapters
├── role/
│   ├── feature.go
│   ├── controller/
│   └── domain/
└── discovery/
    ├── feature.go
    └── controller/     # Scanner implementation
```
### Benefits
- Isolation: Each feature can be developed and tested independently
- Pluggability: Features can be enabled/disabled at startup
- Maintainability: Clear ownership and separation of concerns
- Testability: Domain logic separated from K8s/Vault infrastructure
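The pluggability benefit can be illustrated with a minimal, dependency-free sketch. The `Feature` interface and `registerFeatures` helper below are hypothetical names invented for this example; the operator's actual registration API likely differs.

```go
package main

import "fmt"

// Feature is a hypothetical minimal contract each feature package could
// satisfy; invented for illustration only.
type Feature interface {
	Name() string
	Enabled() bool
}

type policyFeature struct{ enabled bool }

func (p policyFeature) Name() string  { return "policy" }
func (p policyFeature) Enabled() bool { return p.enabled }

// registerFeatures wires up only the features enabled at startup, which
// is what makes per-feature enable/disable possible.
func registerFeatures(all []Feature) []string {
	var active []string
	for _, f := range all {
		if f.Enabled() {
			active = append(active, f.Name())
		}
	}
	return active
}

func main() {
	active := registerFeatures([]Feature{policyFeature{enabled: true}})
	fmt.Println(active) // [policy]
}
```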
## Controller Pattern

Each resource type has a dedicated controller following the standard Kubernetes controller pattern:

```mermaid
flowchart LR
    A["Watch Events"] --> B["Enqueue<br/>(workqueue)"]
    B --> C["Reconcile<br/>(business logic)"]
    C --> D["Requeue<br/>(delay)"]
    D -.-> A
```
### BaseReconciler
All controllers use a shared BaseReconciler that provides:
- Finalizer management
- Status updates
- Error handling
- Requeue logic
- ReconcileID tracking
```go
// BaseReconciler provides common reconciliation patterns.
type BaseReconciler[T client.Object] struct {
	client    client.Client
	scheme    *runtime.Scheme
	log       logr.Logger
	finalizer string
	recorder  record.EventRecorder
}

// Reconcile handles the common reconciliation flow.
func (r *BaseReconciler[T]) Reconcile(
	ctx context.Context,
	req ctrl.Request,
	handler FeatureHandler[T],
	newObj func() T,
) (ctrl.Result, error)
```
### FeatureHandler Interface

Each feature implements the `FeatureHandler` interface:

```go
type FeatureHandler[T client.Object] interface {
	// Sync creates or updates the resource in Vault
	Sync(ctx context.Context, obj T) error

	// Cleanup removes the resource from Vault
	Cleanup(ctx context.Context, obj T) error
}
```
## Reconciliation Flow
### Sync Flow (Create/Update)

```mermaid
flowchart TD
    A["1. Reconcile triggered"] --> B["2. Get resource from API server"]
    B --> C{"3. Being deleted?"}
    C -- Yes --> cleanup["Cleanup Flow"]
    C -- No --> D["4. Ensure finalizer exists"]
    D --> E["5. Get VaultConnection"]
    E --> F["6. Get Vault client"]
    F --> G{"7. Exists in Vault?"}
    G -- No --> H["Create in Vault"]
    G -- Yes --> I["Check drift → Update if needed"]
    H --> J["8. Update status"]
    I --> J
    J --> K["9. Requeue after interval"]
```
### Cleanup Flow (Delete)

```mermaid
flowchart TD
    A["1. Resource has deletion timestamp"] --> B{"2. deletionPolicy?"}
    B -- Retain --> skip["Skip Vault deletion"]
    B -- Delete --> C["3. Get Vault client"]
    skip --> E["5. Remove finalizer"]
    C --> D["4. Delete from Vault"]
    D --> E
    E --> F["6. Resource deleted"]
```
## Domain Adapters

Domain adapters abstract the differences between namespaced and cluster-scoped resources:

```go
// PolicyAdapter provides a unified interface for VaultPolicy and VaultClusterPolicy.
type PolicyAdapter interface {
	GetName() string
	GetNamespace() string
	GetVaultPolicyName() string
	GetConnectionRef() vaultv1alpha1.ConnectionReference
	GetRules() []vaultv1alpha1.PolicyRule
	GetDriftMode() vaultv1alpha1.DriftMode
	// ... status setters
}
```
This allows the same business logic to handle both:
- `VaultPolicy` (namespaced) → policy name: `namespace-name`
- `VaultClusterPolicy` (cluster-scoped) → policy name: `name`
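The naming rule above can be sketched as a small pure function. The `-` separator is taken from the `namespace-name` form shown; `vaultPolicyName` is an invented helper, not the operator's actual function.

```go
package main

import "fmt"

// vaultPolicyName derives the Vault policy name: namespaced resources
// are prefixed with their namespace, cluster-scoped ones are not.
func vaultPolicyName(namespace, name string) string {
	if namespace == "" { // cluster-scoped resource
		return name
	}
	return namespace + "-" + name
}

func main() {
	fmt.Println(vaultPolicyName("team-a", "read-secrets")) // team-a-read-secrets
	fmt.Println(vaultPolicyName("", "global-admin"))       // global-admin
}
```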
## Vault Client Layer

The `pkg/vault` package provides a client abstraction:

```go
type Client struct {
	client     *api.Client
	authConfig AuthConfig
	log        logr.Logger
}

// Policy operations
func (c *Client) ReadPolicy(ctx context.Context, name string) (string, error)
func (c *Client) WritePolicy(ctx context.Context, name, rules string) error
func (c *Client) DeletePolicy(ctx context.Context, name string) error

// Role operations
func (c *Client) ReadKubernetesAuthRole(ctx context.Context, authPath, roleName string) (*RoleConfig, error)
func (c *Client) WriteKubernetesAuthRole(ctx context.Context, authPath, roleName string, config *RoleConfig) error
func (c *Client) DeleteKubernetesAuthRole(ctx context.Context, authPath, roleName string) error

// Authentication
func (c *Client) Authenticate(ctx context.Context) error
func (c *Client) RenewToken(ctx context.Context) error
```
## Client Cache
The ClientCache is a thread-safe cache that shares authenticated Vault clients between features. The Connection feature creates and caches clients; Policy, Role, and Discovery features retrieve them by VaultConnection name. This avoids redundant authentication and ensures all features use the same token lifecycle.
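A minimal sketch of such a thread-safe cache, keyed by `VaultConnection` name. This is illustrative only: `vaultClient` stands in for the operator's authenticated client, and the real cache likely also handles eviction and token renewal.

```go
package main

import (
	"fmt"
	"sync"
)

// vaultClient stands in for the operator's authenticated Vault client.
type vaultClient struct{ token string }

// ClientCache shares clients between features; an RWMutex allows
// concurrent reads from multiple reconcilers.
type ClientCache struct {
	mu      sync.RWMutex
	clients map[string]*vaultClient
}

func NewClientCache() *ClientCache {
	return &ClientCache{clients: map[string]*vaultClient{}}
}

// Set stores the client created by the Connection feature.
func (c *ClientCache) Set(conn string, cl *vaultClient) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.clients[conn] = cl
}

// Get retrieves a cached client by VaultConnection name.
func (c *ClientCache) Get(conn string) (*vaultClient, bool) {
	c.mu.RLock()
	defer c.mu.RUnlock()
	cl, ok := c.clients[conn]
	return cl, ok
}

func main() {
	cache := NewClientCache()
	cache.Set("primary", &vaultClient{token: "s.example"})
	_, ok := cache.Get("primary")
	fmt.Println(ok) // true
}
```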
## Event Bus

The operator uses an in-process event bus (`shared/events`) for inter-feature communication, avoiding direct coupling between features.
### Event Types

| Category | Events | Description |
|---|---|---|
| Connection | `connection.ready`, `connection.disconnected`, `connection.health_changed` | VaultConnection lifecycle |
| Policy | `policy.created`, `policy.updated`, `policy.deleted` | VaultPolicy/VaultClusterPolicy changes |
| Role | `role.created`, `role.updated`, `role.deleted` | VaultRole/VaultClusterRole changes |
| Token | `token.renewed`, `token.renewal_failed`, `token.reviewer_refreshed` | Token lifecycle events |
### Pattern

Features publish events via `EventBus.Publish()` (synchronous) or `EventBus.PublishAsync()` (fire-and-forget goroutine). Other features subscribe with typed handlers using Go generics: `events.Subscribe[ConnectionReady](bus, handler)`.
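A dependency-free sketch of how such a generics-based typed bus can work. The `Bus` internals here are an assumption; only the `Subscribe[ConnectionReady](bus, handler)` shape mirrors the documented API.

```go
package main

import (
	"fmt"
	"sync"
)

// ConnectionReady is one example event payload.
type ConnectionReady struct{ Connection string }

// Bus routes events to handlers registered per concrete event type.
type Bus struct {
	mu       sync.RWMutex
	handlers map[string][]func(any)
}

func NewBus() *Bus { return &Bus{handlers: map[string][]func(any){}} }

// Subscribe registers a typed handler; the type parameter selects
// which events it receives.
func Subscribe[E any](b *Bus, h func(E)) {
	key := fmt.Sprintf("%T", *new(E))
	b.mu.Lock()
	defer b.mu.Unlock()
	b.handlers[key] = append(b.handlers[key], func(v any) { h(v.(E)) })
}

// Publish delivers an event synchronously to all matching handlers.
func Publish[E any](b *Bus, ev E) {
	key := fmt.Sprintf("%T", ev)
	b.mu.RLock()
	defer b.mu.RUnlock()
	for _, h := range b.handlers[key] {
		h(ev)
	}
}

func main() {
	bus := NewBus()
	Subscribe(bus, func(e ConnectionReady) { fmt.Println("ready:", e.Connection) })
	Publish(bus, ConnectionReady{Connection: "primary"}) // prints "ready: primary"
}
```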
## Managed Markers and Orphan Detection

The operator tracks which Vault resources it manages using KV v2 markers stored at:

```text
secret/data/vault-access-operator/managed/policies/{name}
secret/data/vault-access-operator/managed/roles/{name}
```
These markers enable:
- Orphan detection: Periodic scans compare markers against existing K8s resources. A marker without a matching CR indicates an orphaned Vault resource (e.g., CR deleted while operator was down).
- Cleanup queue: Orphaned resources are queued for cleanup with retry logic and exponential backoff.
- Conflict detection: When creating a resource, the operator checks for existing markers to detect conflicts with other CRs.
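The orphan-detection comparison boils down to a set difference, sketched below; `findOrphans` is an invented helper operating on marker names and a set of existing custom resources.

```go
package main

import "fmt"

// findOrphans reports every marker without a matching custom resource,
// i.e. a Vault resource whose CR was deleted while the operator was down.
func findOrphans(markers []string, crs map[string]bool) []string {
	var orphans []string
	for _, m := range markers {
		if !crs[m] {
			orphans = append(orphans, m)
		}
	}
	return orphans
}

func main() {
	markers := []string{"prod-read", "prod-write"}
	crs := map[string]bool{"prod-read": true}
	fmt.Println(findOrphans(markers, crs)) // [prod-write]
}
```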
## Authentication Flow

```mermaid
sequenceDiagram
    participant VC as VaultConnection
    participant K8s as Kubernetes API
    participant V as Vault Server
    VC->>K8s: TokenRequest (ServiceAccount)
    K8s-->>VC: SA JWT Token
    VC->>V: Login with SA JWT
    V->>K8s: Validate Token
    K8s-->>V: Token Valid
    V-->>VC: Vault Token
    Note over VC: Token Cached + Renewal
    Note over VC,V: Shown: Kubernetes auth. The operator supports 8 auth methods — see Authentication Methods docs
```
### Renewal Strategy

The operator supports two token renewal strategies:

| Strategy | Behavior |
|---|---|
| `renew` (default) | Renew the existing token before expiration |
| `reauth` | Re-authenticate with fresh credentials |
## Status Management

### Phase Transitions

```mermaid
stateDiagram-v2
    [*] --> Pending
    Pending --> Syncing
    Syncing --> Active
    Syncing --> Error
    Syncing --> Conflict
    Active --> Error
    Active --> Syncing : spec change
    Error --> Syncing : retry
    Pending --> Conflict
    Error --> Conflict
    Active --> Deleting
    Deleting --> [*]
```
### Conditions

Resources track detailed conditions:

| Condition | Description |
|---|---|
| `Ready` | Resource is ready for use |
| `Synced` | Resource is synced to Vault |
| `ConnectionReady` | VaultConnection is available |
| `PoliciesResolved` | Referenced policies exist |
| `DependencyReady` | All dependencies (connection, policies) are satisfied |
| `Drifted` | Vault resource differs from desired K8s state |
| `Deleting` | Resource is being deleted (finalizer in progress) |
## Metrics and Observability

### Prometheus Metrics

```text
# Connection health
vault_access_operator_connection_healthy{connection}
vault_access_operator_connection_health_checks_total{connection, result}
vault_access_operator_connection_consecutive_fails{connection}

# Reconciliation
vault_access_operator_policy_reconcile_total{kind, namespace, result}
vault_access_operator_role_reconcile_total{kind, namespace, result}

# Drift
vault_access_operator_vault_drift_detected{kind, namespace, name}
vault_access_operator_vault_drift_corrected_total{kind, namespace}

# Orphan & Cleanup
vault_access_operator_vault_orphaned_resources{connection, type}
vault_access_operator_cleanup_queue_size
vault_access_operator_cleanup_retries_total{resource_type, result}

# Safety
vault_access_operator_safety_destructive_blocked_total{kind, namespace}

# Discovery
vault_access_operator_discovery_adoptions_total{kind, namespace, result}
vault_access_operator_discovery_unmanaged_resources{connection, type}
vault_access_operator_discovery_scans_total{connection, result}
```
### Structured Logging

All log entries include:

```json
{
  "level": "info",
  "ts": "2026-01-15T10:30:00Z",
  "logger": "vaultpolicy",
  "msg": "reconciling resource",
  "reconcileID": "abc123",
  "namespace": "production",
  "name": "my-policy"
}
```
Filter logs by `reconcileID`:

```shell
kubectl logs deploy/vault-access-operator-controller-manager | \
  jq 'select(.reconcileID == "abc123")'
```
## Error Handling

### Retry with Backoff

Failed operations are retried automatically with exponential backoff.
### Error Categories
| Category | Behavior |
|---|---|
| Transient (network, timeout) | Requeue with backoff |
| Conflict | Set status, wait for resolution |
| Validation | Set error status, no requeue |
| Permanent (auth failed) | Set error status, alert |
## See Also
- Drift Detection - How drift is detected and corrected
- Discovery - Finding unmanaged resources
- API Reference - Complete CRD reference