Drift Detection
Drift detection monitors the actual state of database resources and compares them to the desired state defined in your Custom Resources. When differences (drift) are found, the operator can report or automatically correct them.
Overview
Configuration drift occurs when the actual database state no longer matches what's defined in Kubernetes. This can happen due to:
- Manual changes via SQL clients
- Scripts or applications modifying database objects
- Other operators or tools managing the same resources
- Database migrations that modify schema
Without drift detection, these changes can lead to:
- Security misconfigurations (e.g., excessive privileges)
- Inconsistent environments between staging and production
- Deployment failures when expected state differs from actual state
- Audit compliance issues
Drift Modes
The operator supports three drift detection modes:
| Mode | Behavior |
|---|---|
| `ignore` | No drift detection. Changes outside Kubernetes are not tracked. |
| `detect` | Detects drift and reports it in status/events. Does not auto-correct. |
| `correct` | Detects drift and automatically corrects it to match the CR spec. |
Configuration
Instance-Level Policy (Default)
Set a default drift policy on DatabaseInstance that applies to all child resources:
```yaml
apiVersion: dbops.dbprovision.io/v1alpha1
kind: DatabaseInstance
metadata:
  name: postgres-primary
spec:
  engine: postgres
  connection:
    host: postgres.database.svc.cluster.local
    port: 5432
    secretRef:
      name: postgres-credentials
  driftPolicy:
    mode: detect  # Default for all databases/users/roles
    interval: "5m"
```
Resource-Level Override
Individual resources can override the instance's default policy:
```yaml
apiVersion: dbops.dbprovision.io/v1alpha1
kind: Database
metadata:
  name: production-db
spec:
  instanceRef:
    name: postgres-primary
  name: production
  driftPolicy:
    mode: correct  # Override: auto-correct for this database
    interval: "2m"
```
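The override behaves as a precedence chain. The Go sketch below assumes per-field fallback: a field set on the resource wins, otherwise the instance value is used, otherwise the documented default (`detect`, `5m`). That merge rule and the `DriftPolicy`/`effectivePolicy` names are assumptions for illustration. The operator may instead replace the whole policy block.

```go
package main

import "fmt"

// DriftPolicy holds the two documented fields. Empty strings mean "not set".
type DriftPolicy struct {
	Mode     string
	Interval string
}

// Documented defaults (see Policy Fields).
var defaultPolicy = DriftPolicy{Mode: "detect", Interval: "5m"}

// effectivePolicy resolves the policy for a child resource.
// ASSUMPTION: resolution is per field: resource first, then instance,
// then the built-in default. The real operator may merge differently.
func effectivePolicy(resource, instance *DriftPolicy) DriftPolicy {
	out := defaultPolicy
	for _, p := range []*DriftPolicy{instance, resource} { // later entries win
		if p == nil {
			continue
		}
		if p.Mode != "" {
			out.Mode = p.Mode
		}
		if p.Interval != "" {
			out.Interval = p.Interval
		}
	}
	return out
}

func main() {
	instance := &DriftPolicy{Mode: "detect", Interval: "5m"}
	productionDB := &DriftPolicy{Mode: "correct", Interval: "2m"}

	fmt.Println(effectivePolicy(productionDB, instance)) // {correct 2m}
	fmt.Println(effectivePolicy(nil, instance))          // {detect 5m}
	fmt.Println(effectivePolicy(nil, nil))               // {detect 5m}
}
```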
Policy Fields
| Field | Type | Default | Description |
|---|---|---|---|
| `mode` | string | `detect` | `ignore`, `detect`, or `correct` |
| `interval` | string | `5m` | How often to check for drift (Go duration) |
Status Fields
When drift is detected, the status shows details:
```yaml
status:
  phase: Ready
  drift:
    detected: true
    lastChecked: "2024-01-15T10:30:00Z"
    diffs:
      - field: "encoding"
        expected: "UTF8"
        actual: "LATIN1"
        destructive: true
        immutable: false
      - field: "connectionLimit"
        expected: "100"
        actual: "50"
        destructive: false
        immutable: false
```
Diff Fields
| Field | Description |
|---|---|
| `field` | The name of the field that differs |
| `expected` | The value defined in the CR spec |
| `actual` | The current value in the database |
| `destructive` | Whether correcting this field would cause data loss |
| `immutable` | Whether this field cannot be changed after creation |
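Each entry in `diffs` pairs a spec value with the observed value and flags it using the classifications documented on this page. The Go sketch below shows how such a list could be computed for a Database. It assumes string-valued fields and hard-coded classification sets, and the `computeDiffs` helper and field sets are hypothetical.

```go
package main

import (
	"fmt"
	"sort"
)

// Diff mirrors one entry of status.drift.diffs.
type Diff struct {
	Field       string
	Expected    string
	Actual      string
	Destructive bool
	Immutable   bool
}

// Database field classifications taken from this page's tables.
// Modeling them as fixed lookup sets is an illustrative assumption.
var destructiveFields = map[string]bool{"encoding": true, "collation": true, "template": true}
var immutableFields = map[string]bool{"template": true}

// computeDiffs compares spec values with values read from the database.
// Only fields present in the spec are compared.
func computeDiffs(expected, actual map[string]string) []Diff {
	var diffs []Diff
	for field, want := range expected {
		got := actual[field]
		if got == want {
			continue
		}
		diffs = append(diffs, Diff{
			Field:       field,
			Expected:    want,
			Actual:      got,
			Destructive: destructiveFields[field],
			Immutable:   immutableFields[field],
		})
	}
	// Map iteration order is random; sort for stable status output.
	sort.Slice(diffs, func(i, j int) bool { return diffs[i].Field < diffs[j].Field })
	return diffs
}

func main() {
	spec := map[string]string{"encoding": "UTF8", "connectionLimit": "100", "owner": "app_owner"}
	db := map[string]string{"encoding": "LATIN1", "connectionLimit": "50", "owner": "app_owner"}
	for _, d := range computeDiffs(spec, db) {
		fmt.Printf("%+v\n", d)
	}
}
```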
Events
The operator emits events for drift-related activities:
| Event | Type | Description |
|---|---|---|
| `DriftDetected` | Warning | Drift was detected in one or more fields |
| `DriftCorrected` | Normal | Drift was successfully corrected |
| `DriftCorrectionFailed` | Warning | Attempted to correct drift but failed |
View events with:
```bash
kubectl describe database production-db
# or
kubectl get events --field-selector involvedObject.name=production-db
```
Destructive Drift Corrections
Some drift corrections are potentially destructive (e.g., changing encoding requires recreating the database). By default, the operator skips destructive corrections even in `correct` mode.
Allowing Destructive Corrections
To allow destructive corrections, add an annotation:
```yaml
apiVersion: dbops.dbprovision.io/v1alpha1
kind: Database
metadata:
  name: dev-db
  annotations:
    dbops.dbprovision.io/allow-destructive-drift: "true"
spec:
  instanceRef:
    name: postgres-primary
  driftPolicy:
    mode: correct
```
Use with Caution
Destructive corrections may cause data loss. Only enable this for:
- Development/test environments
- Resources where data loss is acceptable
- After verifying the current state and having backups
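This annotation combines with the immutable-field rule (see Immutable Fields). Together, they make a `correct`-mode reconcile split diffs into those it applies and those it only reports. Below is a Go sketch of that gate. `planCorrections` is a hypothetical helper, and treating only the exact string `"true"` as enabling the annotation is an assumption.

```go
package main

import "fmt"

// Diff carries the two flags reported in status.drift.diffs.
type Diff struct {
	Field       string
	Destructive bool
	Immutable   bool
}

const allowDestructiveAnnotation = "dbops.dbprovision.io/allow-destructive-drift"

// planCorrections (hypothetical) splits diffs into those a correct-mode
// reconcile would apply and those it would only report:
//   - immutable fields are never corrected
//   - destructive fields are corrected only with the annotation set to "true"
func planCorrections(diffs []Diff, annotations map[string]string) (apply, skip []Diff) {
	allowDestructive := annotations[allowDestructiveAnnotation] == "true"
	for _, d := range diffs {
		switch {
		case d.Immutable:
			skip = append(skip, d)
		case d.Destructive && !allowDestructive:
			skip = append(skip, d)
		default:
			apply = append(apply, d)
		}
	}
	return apply, skip
}

func main() {
	diffs := []Diff{
		{Field: "encoding", Destructive: true},
		{Field: "connectionLimit"},
		{Field: "template", Destructive: true, Immutable: true},
	}
	apply, skip := planCorrections(diffs, nil)
	fmt.Println("without annotation: apply", apply, "skip", skip)
	apply, skip = planCorrections(diffs, map[string]string{allowDestructiveAnnotation: "true"})
	fmt.Println("with annotation:    apply", apply, "skip", skip)
}
```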
Destructive vs Non-Destructive Changes
Operator Drift Corrections (automatic)
These are checked by the operator's reconciliation loop and require the allow-destructive-drift annotation for destructive corrections:
| Resource | Non-Destructive | Destructive |
|---|---|---|
| Database | Owner, connection limit | Encoding, collation, template |
| User | Password, connection limit, roles | Username |
| Role | Privileges, role membership | Role name |
| Grant | Adding privileges | Revoking privileges |
Migration Tool Corrections (dbctl migrate)
The dbctl migrate reverse-privileges command performs ownership integrity checks that go beyond what the operator checks during reconciliation. These run only when explicitly invoked:
| Check | Classification | Details |
|---|---|---|
| Owner role existence | Non-destructive | Creates missing NOLOGIN INHERIT role |
| App user existence | Non-destructive | Creates missing LOGIN INHERIT role |
| Role membership | Non-destructive | Grants app user membership in owner role |
| Database owner | Destructive | Transfers ownership via ALTER DATABASE ... OWNER TO ... |
| Forward default privileges | Non-destructive | Idempotent ALTER DEFAULT PRIVILEGES |
| Reverse default privileges | Non-destructive | Idempotent ALTER DEFAULT PRIVILEGES |
No annotation required for migration tool
Unlike the operator's automatic drift correction, the migration command does not require the `allow-destructive-drift` annotation. Running `dbctl migrate` is an explicit user action: by invoking the command, the user has already opted in. Use `--dry-run` to preview destructive changes before applying them.
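The migration checks follow a plan-then-apply pattern. Each check contributes a statement only when something is missing, so a second run against a repaired database would produce an empty plan. The Go sketch below builds such a plan for the first four checks in the table. The `State` fields, role names, and helpers are hypothetical. Default-privilege statements are omitted, and the sketch prints SQL rather than executing it.

```go
package main

import (
	"fmt"
	"strings"
)

// State is a hypothetical snapshot of what the ownership checks observe.
type State struct {
	Database      string
	OwnerRole     string
	AppUser       string
	OwnerExists   bool
	AppUserExists bool
	IsMember      bool   // app user is a member of the owner role
	CurrentOwner  string // current owner of the database
}

// Step is one planned change; Destructive mirrors the table's classification.
type Step struct {
	SQL         string
	Destructive bool
}

// quoteIdent double-quotes a PostgreSQL identifier, doubling embedded quotes.
func quoteIdent(s string) string {
	return `"` + strings.ReplaceAll(s, `"`, `""`) + `"`
}

// planOwnership emits a step only for what is missing or wrong.
// Default-privilege checks are intentionally left out of this sketch.
func planOwnership(st State) []Step {
	var steps []Step
	owner, app := quoteIdent(st.OwnerRole), quoteIdent(st.AppUser)
	if !st.OwnerExists {
		steps = append(steps, Step{SQL: "CREATE ROLE " + owner + " NOLOGIN INHERIT"})
	}
	if !st.AppUserExists {
		steps = append(steps, Step{SQL: "CREATE ROLE " + app + " LOGIN INHERIT"})
	}
	if !st.IsMember {
		steps = append(steps, Step{SQL: "GRANT " + owner + " TO " + app})
	}
	if st.CurrentOwner != st.OwnerRole {
		steps = append(steps, Step{
			SQL:         "ALTER DATABASE " + quoteIdent(st.Database) + " OWNER TO " + owner,
			Destructive: true,
		})
	}
	return steps
}

func main() {
	st := State{Database: "my_database", OwnerRole: "my_database_owner", AppUser: "my_database_app",
		OwnerExists: true, AppUserExists: true, CurrentOwner: "postgres"}
	// Like a dry run: print the plan without changing anything.
	for _, s := range planOwnership(st) {
		tag := "non-destructive"
		if s.Destructive {
			tag = "DESTRUCTIVE"
		}
		fmt.Printf("%-15s %s;\n", tag, s.SQL)
	}
}
```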
Immutable Fields
Some fields cannot be changed after resource creation:
| Resource | Immutable Fields |
|---|---|
| Database | Template (PostgreSQL) |
| User | Username (in most cases) |
| DatabaseInstance | Engine type |
When drift is detected in immutable fields, the operator reports it but cannot correct it automatically.
Resource Discovery
The operator can discover database resources that exist but are not managed by Kubernetes CRs.
Enabling Discovery
```yaml
apiVersion: dbops.dbprovision.io/v1alpha1
kind: DatabaseInstance
metadata:
  name: postgres-primary
spec:
  engine: postgres
  discovery:
    enabled: true
    interval: "30m"
```
Discovered Resources Status
```yaml
status:
  discoveredResources:
    lastScan: "2024-01-15T10:00:00Z"
    databases:
      - name: legacy_app
        discovered: "2024-01-15T10:00:00Z"
        adopted: false
      - name: temp_data
        discovered: "2024-01-15T10:00:00Z"
        adopted: false
    users:
      - name: old_admin
        discovered: "2024-01-15T10:00:00Z"
        adopted: false
```
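Conceptually, discovery is a set difference between objects that exist on the server and objects already claimed by CRs. The Go sketch below works under that assumption. `discoverUnmanaged` is hypothetical. Excluding PostgreSQL's built-in `postgres`, `template0`, and `template1` databases is an assumption about what the operator filters out.

```go
package main

import (
	"fmt"
	"sort"
)

// discoverUnmanaged (hypothetical) returns names that exist on the server
// but are neither claimed by a CR nor in the exclusion set.
func discoverUnmanaged(actual, managed []string, exclude map[string]bool) []string {
	claimed := make(map[string]bool, len(managed))
	for _, n := range managed {
		claimed[n] = true
	}
	var out []string
	for _, n := range actual {
		if !claimed[n] && !exclude[n] {
			out = append(out, n)
		}
	}
	sort.Strings(out)
	return out
}

func main() {
	// Hypothetical result of listing databases on the instance.
	actual := []string{"postgres", "template0", "template1", "production", "legacy_app", "temp_data"}
	// Database names referenced by Database CRs (spec.name).
	managed := []string{"production"}
	// ASSUMPTION: built-in databases are never reported as discovered.
	system := map[string]bool{"postgres": true, "template0": true, "template1": true}
	fmt.Println(discoverUnmanaged(actual, managed, system)) // [legacy_app temp_data]
}
```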
Adopting Discovered Resources
To bring a discovered resource under management, use adoption annotations:
```yaml
apiVersion: dbops.dbprovision.io/v1alpha1
kind: DatabaseInstance
metadata:
  name: postgres-primary
  annotations:
    dbops.dbprovision.io/adopt-databases: "legacy_app,temp_data"
    dbops.dbprovision.io/adopt-users: "old_admin"
```
When adopted:
- The operator creates a CR for the discovered resource
- The CR is labeled with `dbops.dbprovision.io/adopted: "true"`
- Future changes are managed through the CR
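The adoption annotations take comma-separated names. The Go sketch below parses them and matches the names against discovery results. `parseAdoptList` and `toAdopt` are hypothetical. Tolerating spaces and duplicate names is an assumption, not documented behavior.

```go
package main

import (
	"fmt"
	"strings"
)

// parseAdoptList splits a comma-separated annotation value, trimming
// whitespace and dropping empty entries and duplicates (an assumption).
func parseAdoptList(value string) []string {
	seen := map[string]bool{}
	var names []string
	for _, part := range strings.Split(value, ",") {
		name := strings.TrimSpace(part)
		if name == "" || seen[name] {
			continue
		}
		seen[name] = true
		names = append(names, name)
	}
	return names
}

// toAdopt keeps requested names that discovery found and that are not yet
// adopted; names discovery never saw are returned as unknown.
// discovered maps a name to its "adopted" flag from discoveredResources.
func toAdopt(requested []string, discovered map[string]bool) (adopt, unknown []string) {
	for _, name := range requested {
		adopted, found := discovered[name]
		switch {
		case !found:
			unknown = append(unknown, name)
		case !adopted:
			adopt = append(adopt, name)
		}
	}
	return adopt, unknown
}

func main() {
	annotations := map[string]string{
		"dbops.dbprovision.io/adopt-databases": "legacy_app, temp_data",
	}
	discovered := map[string]bool{"legacy_app": false, "temp_data": false}
	requested := parseAdoptList(annotations["dbops.dbprovision.io/adopt-databases"])
	adopt, unknown := toAdopt(requested, discovered)
	fmt.Println("adopt:", adopt, "unknown:", unknown)
}
```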
Metrics
The operator exposes drift-related metrics:
| Metric | Description |
|---|---|
| `dbprovision_drift_detected` | Gauge: 1 if drift is detected, 0 otherwise |
| `dbprovision_drift_corrections_total` | Counter of drift corrections |
Example Prometheus query to alert on drift: `dbprovision_drift_detected == 1`
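Best Practices
Development Environments
Use `correct` mode (`driftPolicy.mode: correct`) to keep environments consistent.
Production Environments
Use `detect` mode (`driftPolicy.mode: detect`) with alerts.
Set up alerts for detected drift. The following Prometheus alert rule fires on the `dbprovision_drift_detected` gauge rather than on `DriftDetected` events: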
```yaml
# Example Prometheus alert rule
- alert: DatabaseDriftDetected
  expr: dbprovision_drift_detected == 1
  for: 5m
  labels:
    severity: warning
  annotations:
    summary: "Configuration drift detected"
    description: "{{ $labels.name }} has configuration drift"
```
Audit Trails
For compliance, combine drift detection with event logging:
```bash
# Select drift-related events as JSON, e.g. for export to an external logging system
kubectl get events -o json | jq '.items[] | select(.reason | contains("Drift"))'
```
Supported Resources
| Resource | Drift Detection | Auto-Correction |
|---|---|---|
| Database | Yes | Yes |
| DatabaseUser | Yes | Yes |
| DatabaseRole | Yes | Yes |
| DatabaseGrant | Yes | Yes |
| DatabaseBackup | No | No |
| DatabaseRestore | No | No |
| DatabaseBackupSchedule | No | No |
Example: Full Drift Detection Setup
```yaml
---
# Instance with detect-only default
apiVersion: dbops.dbprovision.io/v1alpha1
kind: DatabaseInstance
metadata:
  name: postgres-primary
spec:
  engine: postgres
  connection:
    host: postgres.database.svc.cluster.local
    port: 5432
    secretRef:
      name: postgres-credentials
  driftPolicy:
    mode: detect
    interval: "5m"
  discovery:
    enabled: true
    interval: "30m"
---
# Production database: detect only, alert on drift
apiVersion: dbops.dbprovision.io/v1alpha1
kind: Database
metadata:
  name: production-db
spec:
  instanceRef:
    name: postgres-primary
  name: production
  # Uses instance default (detect mode)
---
# Dev database: auto-correct drift
apiVersion: dbops.dbprovision.io/v1alpha1
kind: Database
metadata:
  name: dev-db
  annotations:
    dbops.dbprovision.io/allow-destructive-drift: "true"
spec:
  instanceRef:
    name: postgres-primary
  name: development
  driftPolicy:
    mode: correct
    interval: "1m"
```
Troubleshooting
Drift Not Detected
- Check that the drift mode is not `ignore`
- Verify that the interval has elapsed since the last check
- Check the controller logs for errors:
```bash
kubectl logs -l app.kubernetes.io/name=db-provision-operator -n db-provision-operator-system | grep -i drift
```
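Drift Correction Failing
- Check whether the field is marked as `immutable`
- Check whether destructive corrections need the `allow-destructive-drift` annotation
- Verify that the operator's database credentials have the privileges needed to apply the correction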
Ownership Drift (Roles, Membership, Default Privileges)
The operator's drift detection checks database-level properties (owner, extensions, schemas) but does not re-verify role existence, membership, or default privileges after initial creation. If these are broken (e.g., after manual role deletion), use the `dbctl migrate reverse-privileges` command to diagnose and repair:
```bash
# Diagnose
dbctl migrate reverse-privileges my-database -n my-namespace --dry-run
# Repair
dbctl migrate reverse-privileges my-database -n my-namespace
```
See Migrations for full documentation.
Drift Detected but Expected
For known differences that should be ignored, consider:
- Updating the CR spec to match the actual state
- Setting `mode: ignore` for that resource
- Using the skip-reconcile annotation temporarily