Troubleshooting¶
Common issues and solutions for DB Provision Operator.
Diagnostic Commands¶
Quick Health Check¶
# Operator status
kubectl get pods -n db-provision-operator-system
# Resource status
kubectl get databaseinstances,databases,databaseusers -A
# Recent events
kubectl get events -n db-provision-operator-system --sort-by='.lastTimestamp' | tail -20
Detailed Debugging¶
# Operator logs
kubectl logs -n db-provision-operator-system deployment/db-provision-operator -f
# Specific resource details
kubectl describe databaseinstance postgres-primary
# Resource YAML
kubectl get databaseinstance postgres-primary -o yaml
Common Issues¶
DatabaseInstance Issues¶
Instance Stuck in Pending¶
Symptoms:
Causes and Solutions:
- Cannot connect to database:
- Invalid credentials:
- DNS resolution:
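The connectivity and DNS causes can be checked from inside the cluster. The following is a minimal sketch, not part of the operator: `check_db_endpoint` is a hypothetical helper, and the pod name `db-net-check` and the `busybox:1.36` image are assumptions. It also assumes the image's `nc` supports `-z` and `-w`.

```bash
# Hypothetical helper: resolve the host and probe the port from a throwaway pod.
# Usage: check_db_endpoint <host> <port> [namespace]
check_db_endpoint() {
  local host="$1" port="$2" ns="${3:-default}"
  kubectl run db-net-check --rm -i --restart=Never -n "$ns" \
    --image=busybox:1.36 -- \
    sh -c "nslookup $host && nc -z -w 5 $host $port && echo reachable"
}
```

For example, `check_db_endpoint postgres.default.svc.cluster.local 5432` prints `reachable` only when both the lookup and the TCP probe succeed. A failing `nslookup` points at DNS, and a failing `nc` points at the service, port, or a network policy.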
Instance Shows Failed¶
Check events:
Common errors:
| Error | Solution |
|---|---|
| connection refused | Check database is running, correct port |
| authentication failed | Verify credentials in Secret |
| SSL required | Add sslMode: require to spec |
| unknown host | Check hostname and DNS |
Database Issues¶
Database Not Created¶
Check dependencies:
# Instance must be Ready
kubectl get databaseinstance postgres-primary -o jsonpath='{.status.phase}'
Check logs:
kubectl logs -n db-provision-operator-system deployment/db-provision-operator | grep "myapp-database"
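Because the instance must be Ready first, it can help to block on that condition instead of polling by hand. This is a sketch assuming kubectl 1.23 or newer (for `--for=jsonpath`) and that `Ready` is the exact value of `.status.phase`. `wait_instance_ready` is a hypothetical helper name.

```bash
# Block until the instance reports Ready, or fail after the timeout.
# Usage: wait_instance_ready <name> [timeout]
wait_instance_ready() {
  local name="$1" timeout="${2:-120s}"
  kubectl wait "databaseinstance/$name" \
    --for=jsonpath='{.status.phase}'=Ready --timeout="$timeout"
}
```

`wait_instance_ready postgres-primary 60s` returns a non-zero status if the instance has not reached Ready within a minute, which makes it usable as a guard in scripts.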
Extension Installation Failed¶
Common causes:
- Extension not available:
- Insufficient permissions:
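For the first cause on PostgreSQL, the `pg_available_extensions` view lists what the server can install. The sketch below only builds the query. `extension_check_sql` is a hypothetical helper, and the connection string in the comment is a placeholder.

```bash
# Print the SQL that shows whether an extension is installable on the server.
# Usage: extension_check_sql <extension>
extension_check_sql() {
  printf "SELECT name, default_version, installed_version FROM pg_available_extensions WHERE name = '%s';\n" "$1"
}

# Example (placeholder connection string):
# psql "postgresql://admin@postgres:5432/myapp" -c "$(extension_check_sql pgcrypto)"
```

An empty result means the extension files are not present on the database server, so no permission change will help. A row with an empty `installed_version` means the extension is available but not yet created in that database.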
User Issues¶
User Not Created¶
Check instance status:
Verify username is valid:
- PostgreSQL: alphanumeric and underscores
- MySQL: max 32 characters
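The two rules above can be checked locally before applying a manifest. This is a sketch that encodes only those stated rules. `valid_db_username` is a hypothetical helper, and the operator's own validation may be stricter.

```bash
# Returns 0 when the name satisfies the rule stated above for the engine.
# Usage: valid_db_username <postgres|mysql> <username>
valid_db_username() {
  local engine="$1" name="$2"
  case "$engine" in
    postgres) [[ "$name" =~ ^[A-Za-z0-9_]+$ ]] ;;
    mysql)    [[ -n "$name" && ${#name} -le 32 ]] ;;
    *)        echo "unknown engine: $engine" >&2; return 2 ;;
  esac
}
```

For example, `valid_db_username postgres my-app` fails because of the hyphen, while `valid_db_username postgres myapp_user` succeeds.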
Password Not Generated¶
Check secret:
If missing:
1. Verify passwordSecret.generate: true in spec
2. Check passwordSecret.secretName is specified
3. Look for errors in operator logs
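The first two checks can be combined into one command. The sketch below assumes the `passwordSecret` block sits directly under `.spec` of the DatabaseUser, as the field names above suggest. `check_password_secret` is a hypothetical helper.

```bash
# Show the passwordSecret spec and report whether the target Secret exists.
# Usage: check_password_secret <databaseuser> <secret> [namespace]
check_password_secret() {
  local user="$1" secret="$2" ns="${3:-default}"
  echo "passwordSecret spec:"
  kubectl get databaseuser "$user" -n "$ns" -o jsonpath='{.spec.passwordSecret}{"\n"}'
  if kubectl get secret "$secret" -n "$ns" >/dev/null 2>&1; then
    echo "secret $secret exists"
  else
    echo "secret $secret is missing"
    return 1
  fi
}
```

With the names used elsewhere on this page, the call would be `check_password_secret myapp-user myapp-user-credentials`.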
Cannot Connect with Generated Password¶
Verify credentials:
# Get password
kubectl get secret myapp-user-credentials -o jsonpath='{.data.password}' | base64 -d
# Test connection
kubectl run psql-test --rm -it --image=postgres:15 -- \
psql "postgresql://myapp_user:PASSWORD@postgres:5432/myapp"
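If the generated password contains characters that are reserved in a URI (for example `@`, `:` or `/`), pasting it verbatim into the connection string can make the login fail even though the password is correct. The sketch below percent-encodes it first. `urlencode` is a hypothetical helper and handles ASCII input only.

```bash
# Percent-encode a string so it is safe inside a connection URI (ASCII input).
urlencode() {
  local LC_ALL=C s="$1" out="" c i
  for (( i = 0; i < ${#s}; i++ )); do
    c="${s:i:1}"
    case "$c" in
      [a-zA-Z0-9.~_-]) out+="$c" ;;                      # unreserved: keep as-is
      *) printf -v c '%%%02X' "'$c"; out+="$c" ;;        # everything else: %XX
    esac
  done
  printf '%s\n' "$out"
}

# Example with the names used above:
# PASSWORD="$(kubectl get secret myapp-user-credentials -o jsonpath='{.data.password}' | base64 -d)"
# psql "postgresql://myapp_user:$(urlencode "$PASSWORD")@postgres:5432/myapp"
```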
Grant Issues¶
Grants Not Applied¶
Prerequisites:
1. User/Role must exist and be Ready
2. Database must exist
3. Tables/schemas must exist
Check user status:
Verify grants in database:
-- PostgreSQL
\dp tablename
-- or
SELECT * FROM information_schema.table_privileges WHERE grantee = 'myapp_user';
Backup Issues¶
Backup Stuck in Running¶
Check backup job:
Common causes:
- Large database taking a long time
- Network issues reaching storage
- Insufficient resources
Backup Failed¶
Check job logs:
Storage issues:
# Verify storage credentials
kubectl get secret s3-credentials -o yaml
# Test S3 connectivity
kubectl run aws-cli --rm -it --image=amazon/aws-cli -- s3 ls s3://my-bucket/
Schedule Not Triggering¶
Verify schedule:
Check timezone:
# Schedule uses UTC by default
kubectl get databasebackupschedule myapp-backup -o jsonpath='{.spec.timezone}'
Restore Issues¶
Restore Failed¶
Check restore job:
Common errors:
| Error | Solution |
|---|---|
| database exists | Set dropExisting: true or delete database |
| backup not found | Verify backup exists and path is correct |
| permission denied | Check storage credentials |
Drift Detection Issues¶
Drift Not Detected¶
Check drift mode:
kubectl get database myapp -o jsonpath='{.spec.driftPolicy.mode}'
# Must be "detect" or "correct", not "ignore"
Check last drift check time:
Verify interval has passed:
# Default is 5m, check configured interval
kubectl get database myapp -o jsonpath='{.spec.driftPolicy.interval}'
Check operator logs:
Drift Detected But Not Corrected¶
Check drift mode:
# Mode must be "correct", not "detect"
kubectl get database myapp -o jsonpath='{.spec.driftPolicy.mode}'
Check if field is immutable:
kubectl get database myapp -o jsonpath='{.status.drift.diffs}' | jq '.[] | select(.immutable==true)'
Check if correction is destructive:
kubectl get database myapp -o jsonpath='{.status.drift.diffs}' | jq '.[] | select(.destructive==true)'
Allow destructive corrections (use with caution):
Drift Correction Failed¶
Check events:
Common causes:
| Cause | Solution |
|---|---|
| Insufficient privileges | Check admin account permissions |
| Field is immutable | Cannot auto-correct, update spec to match |
| Database object locked | Check for active transactions or locks |
Drift Keeps Reappearing¶
Possible causes:
- External processes making changes:
  - Check for migration scripts, other operators, or manual changes
- Application creating/modifying objects:
  - Review application database connection and permissions
- Spec doesn't match intended state:
  - Update CR spec to match desired configuration
Deletion Protection Issues¶
Cannot Delete Protected Resource¶
Check if deletion protection is enabled:
# Spec-based resources (Database, Instance, Grant, BackupSchedule)
kubectl get database myapp -o jsonpath='{.spec.deletionProtection}'
# Annotation-based resources (DatabaseUser, DatabaseRole)
kubectl get databaseuser myapp-user -o jsonpath='{.metadata.annotations.dbops\.dbprovision\.io/deletion-protection}'
Method 1: Disable protection:
# Spec-based (Database, Instance, Grant)
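# --type=merge is required: custom resources do not support the default strategic merge patch
kubectl patch database myapp --type=merge -p '{"spec":{"deletionProtection":false}}'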
kubectl delete database myapp
# Annotation-based (User, Role)
kubectl annotate databaseuser myapp-user dbops.dbprovision.io/deletion-protection-
kubectl delete databaseuser myapp-user
Method 2: Force delete annotation:
kubectl annotate database myapp dbops.dbprovision.io/force-delete="true"
# Resource will be deleted on next reconciliation
# Note: if the resource has children, you must also confirm the cascade — see deletion-protection docs
Method 3: One-liner (spec-based only):
kubectl patch database myapp --type=merge -p '{"spec":{"deletionProtection":false}}' && kubectl delete database myapp
Resource Stuck in Terminating¶
Check for DeletionBlocked events:
Check finalizers:
If operator is running, disable protection:
If operator is down (emergency only):
# WARNING: Bypasses cleanup logic, may leave orphaned database objects
kubectl patch database myapp -p '{"metadata":{"finalizers":null}}' --type=merge
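Before removing finalizers, it is worth looking at what is actually holding the resource. The sketch below prints the finalizers and any matching events. It assumes the event reason is exactly `DeletionBlocked`, and `inspect_stuck_deletion` is a hypothetical helper.

```bash
# Show finalizers and DeletionBlocked events for one resource.
# Usage: inspect_stuck_deletion <kind> <name> [namespace]
inspect_stuck_deletion() {
  local kind="$1" name="$2" ns="${3:-default}"
  echo "--- finalizers"
  kubectl get "$kind" "$name" -n "$ns" -o jsonpath='{.metadata.finalizers}{"\n"}'
  echo "--- DeletionBlocked events"
  kubectl get events -n "$ns" \
    --field-selector "involvedObject.name=$name,reason=DeletionBlocked"
}
```

For example, `inspect_stuck_deletion database myapp` shows whether the operator's finalizer is still present and whether it has reported why deletion is blocked.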
Namespace Stuck Deleting¶
Find protected resources blocking deletion:
# Spec-protected resources (Database, Instance, Grant, BackupSchedule)
kubectl get databases,databaseinstances,databasegrants,databasebackupschedules -n my-namespace \
-o jsonpath='{range .items[?(@.spec.deletionProtection==true)]}{.kind}/{.metadata.name}{"\n"}{end}'
# Annotation-protected resources (User, Role)
kubectl get databaseusers,databaseroles -n my-namespace -o json | \
jq -r '.items[] | select(.metadata.annotations["dbops.dbprovision.io/deletion-protection"]=="true") | "\(.kind)/\(.metadata.name)"'
Also check for dependency blocks (resources with children that block deletion even without protection):
kubectl get databases,databaseusers,databaseroles,databaseinstances -n my-namespace -o json | \
jq -r '.items[] | select(any(.status.conditions[]?; .reason=="DependenciesExist")) | "\(.kind)/\(.metadata.name)"'
Force delete all protected resources:
for kind in database databaseuser databaserole databasegrant databasebackupschedule; do
  for name in $(kubectl get "$kind" -n my-namespace -o name 2>/dev/null); do
    kubectl annotate "$name" -n my-namespace dbops.dbprovision.io/force-delete="true" --overwrite
  done
done
# Wait for cascade confirmation hashes to appear, then confirm them
sleep 5
for kind in databaseinstance database databaseuser databaserole; do
  for name in $(kubectl get "$kind" -n my-namespace -o name 2>/dev/null); do
    HASH=$(kubectl get "$name" -n my-namespace -o jsonpath='{.status.deletionConfirmation.hash}' 2>/dev/null)
    if [ -n "$HASH" ]; then
      kubectl annotate "$name" -n my-namespace dbops.dbprovision.io/confirm-force-delete="$HASH" --overwrite
    fi
  done
done
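The fixed `sleep 5` above may be too short on a busy cluster. As a sketch, the wait can be replaced by polling the same status field until a hash appears. `wait_for_confirmation_hash` is a hypothetical helper, and the one-second poll interval is an arbitrary choice.

```bash
# Poll until .status.deletionConfirmation.hash is set, then print it.
# Usage: wait_for_confirmation_hash <resource> <namespace> [attempts]
wait_for_confirmation_hash() {
  local res="$1" ns="$2" attempts="${3:-10}" hash i
  for (( i = 0; i < attempts; i++ )); do
    hash="$(kubectl get "$res" -n "$ns" \
      -o jsonpath='{.status.deletionConfirmation.hash}' 2>/dev/null)"
    if [ -n "$hash" ]; then
      printf '%s\n' "$hash"
      return 0
    fi
    sleep 1
  done
  return 1   # no hash appeared within the allotted attempts
}
```

A non-zero return does not necessarily indicate an error: if no hash is ever published for a given resource, there is nothing to confirm for it.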
CockroachDB Issues¶
Connection Failed in Insecure Mode¶
Error: password authentication failed
Cause: CockroachDB insecure mode doesn't support passwords.
Solution: Use empty password in secret:
apiVersion: v1
kind: Secret
metadata:
name: cockroach-admin-credentials
type: Opaque
stringData:
username: dbprovision_admin
password: "" # Must be empty
Verify insecure mode:
kubectl exec -it cockroachdb-0 -- cockroach sql --insecure -e "SHOW CLUSTER SETTING server.host_based_authentication.enabled;"
Cannot Create User with Password¶
Error: password is not supported in insecure mode
Cause: Attempting to create user with password on insecure cluster.
Solution: The operator automatically handles this. Users are created without passwords in insecure mode.
Permission Denied for Grants¶
Error: must be admin or owner
Cause: Admin user doesn't have admin role membership.
Solution:
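Grant the `admin` role to the operator's SQL user from an account that already holds it. The sketch below assumes an insecure cluster and the pod name `cockroachdb-0` used elsewhere on this page. On a secure cluster, replace `--insecure` with the appropriate certificate options. `grant_admin_role` is a hypothetical helper.

```bash
# Grant the admin role to a SQL user via the cockroach CLI inside a pod.
# Usage: grant_admin_role <sql-user> [pod]
grant_admin_role() {
  local user="$1" pod="${2:-cockroachdb-0}"
  kubectl exec "$pod" -- cockroach sql --insecure \
    -e "GRANT admin TO $user;"
}
```

With the credentials shown above, the call would be `grant_admin_role dbprovision_admin`.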
Cannot Create Database¶
Error: user does not have CREATEDB privilege
Solution:
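Give the operator's SQL user the `CREATEDB` role option. This is a sketch under the same assumptions as the other CockroachDB commands on this page (insecure cluster, pod `cockroachdb-0`). `allow_createdb` is a hypothetical helper.

```bash
# Add the CREATEDB role option to a SQL user.
# Usage: allow_createdb <sql-user> [pod]
allow_createdb() {
  local user="$1" pod="${2:-cockroachdb-0}"
  kubectl exec "$pod" -- cockroach sql --insecure \
    -e "ALTER USER $user WITH CREATEDB;"
}
```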
TLS Certificate Errors¶
Error: certificate signed by unknown authority
Check TLS configuration:
Verify certificates:
# Check secret contains required keys
kubectl get secret cockroach-tls -o jsonpath='{.data}' | jq -r 'keys'
# Should include: ca.crt, tls.crt, tls.key
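The key check can also be done without `jq`, reporting exactly which key is absent. This is a sketch in which `check_tls_secret_keys` is a hypothetical helper. The key names are the three listed above.

```bash
# Report which of the expected keys are missing from a TLS secret.
# Usage: check_tls_secret_keys <secret> [namespace]
check_tls_secret_keys() {
  local secret="$1" ns="${2:-default}" key value missing=0
  for key in ca.crt tls.crt tls.key; do
    # Dots inside a key name must be escaped in kubectl's JSONPath.
    value="$(kubectl get secret "$secret" -n "$ns" \
      -o jsonpath="{.data.${key//./\\.}}" 2>/dev/null)"
    if [ -z "$value" ]; then
      echo "missing key: $key"
      missing=1
    fi
  done
  return "$missing"
}
```

`check_tls_secret_keys cockroach-tls` prints nothing and returns 0 when all three keys are present.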
Test connection manually:
kubectl run cockroach-test --rm -it --image=cockroachdb/cockroach:v24.1.0 -- \
sql --url="postgresql://user:pass@cockroachdb:26257/defaultdb?sslmode=verify-full&sslrootcert=/certs/ca.crt"
Backup Failed¶
Error: BACKUP requires admin role
Solution:
Check backup job in CockroachDB:
Operator Issues¶
Operator Not Starting¶
Check pod status:
Common issues:
-
Image pull error:
-
RBAC issues:
High Memory Usage¶
Check resource usage:
Solutions:
1. Increase memory limits
2. Reduce concurrent reconciles
3. Check for resource leaks in logs
Reconciliation Loops¶
Symptoms: Constant reconciliation, high CPU
Debug:
# Watch reconciliation
kubectl logs -n db-provision-operator-system deployment/db-provision-operator -f | grep "Reconciling"
Causes:
- Status updates triggering reconciles
- External changes to managed resources
- Conflicting controllers
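To tell a loop from normal activity, it helps to put a number on it. The sketch below counts log lines containing `Reconciling` over a recent window. It assumes each reconcile logs one such line, and `reconcile_count` is a hypothetical helper.

```bash
# Count "Reconciling" log lines emitted by the operator in a recent window.
# Usage: reconcile_count [since]
reconcile_count() {
  local since="${1:-5m}"
  kubectl logs -n db-provision-operator-system deployment/db-provision-operator \
    --since="$since" | grep -c "Reconciling"
}
```

Running `reconcile_count 1m` a few times in a row gives a rough rate. A count that stays high while no resources are being changed suggests one of the causes above.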
Recovery Procedures¶
Force Delete Stuck Resource¶
# Remove finalizers (use with caution!)
kubectl patch database myapp-database -p '{"metadata":{"finalizers":null}}' --type=merge
kubectl delete database myapp-database
Reset Resource State¶
Operator Recovery¶
# Restart operator
kubectl rollout restart deployment/db-provision-operator -n db-provision-operator-system
# Watch rollout
kubectl rollout status deployment/db-provision-operator -n db-provision-operator-system
Debug Mode¶
Enable Debug Logging¶
Or patch deployment:
Trace Specific Resource¶
kubectl logs -n db-provision-operator-system deployment/db-provision-operator | \
grep "myapp-database"
Getting Help¶
Collect Debug Information¶
#!/bin/bash
# debug-bundle.sh
echo "=== Operator Pods ===" > debug.txt
kubectl get pods -n db-provision-operator-system -o wide >> debug.txt
echo "=== Operator Logs ===" >> debug.txt
kubectl logs -n db-provision-operator-system deployment/db-provision-operator --tail=500 >> debug.txt
echo "=== All Resources ===" >> debug.txt
kubectl get databaseinstances,databases,databaseusers,databaseroles,databasegrants,databasebackups -A >> debug.txt
echo "=== Events ===" >> debug.txt
kubectl get events -n db-provision-operator-system >> debug.txt
echo "Debug bundle saved to debug.txt"
Report Issues¶
Open an issue at: https://github.com/panteparak/db-provision-operator/issues
Include:
1. Operator version
2. Kubernetes version
3. Database engine and version
4. Resource YAML (redact secrets)
5. Operator logs
6. Steps to reproduce