Troubleshooting Guide
This guide helps you diagnose and resolve common issues with the Vault Access Operator.
Table of Contents
- Common Issues
  - VaultConnection Issues
  - VaultPolicy Issues
  - VaultRole Issues
  - Webhook Issues
- Debugging Techniques
- Log Analysis
- Status Conditions Explained
Common Issues
VaultConnection Issues
Connection Stuck in "Pending" Phase
Symptoms:
kubectl get vaultconnection
NAME ADDRESS PHASE VERSION AGE
vault-primary https://vault.example.com:8200 Pending 5m
Possible causes:
- Operator not running: Check if the operator pod is running.
- Invalid Vault address: Verify the address is correct and reachable.
- Network connectivity: The operator cannot reach Vault.
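The first cause can be checked directly from the cluster. The helper below is a minimal sketch: the function name `check_operator` is made up for this example, and it assumes the operator runs as `deploy/vault-access-operator-controller-manager` in the `vault-access-operator-system` namespace, as in the other commands in this guide.

```shell
# Hypothetical helper: confirm the operator is running and show its recent logs.
# $1 = operator namespace (defaults to the namespace used throughout this guide)
check_operator() {
  ns="${1:-vault-access-operator-system}"
  # Is the operator pod scheduled and Running?
  kubectl get pods -n "$ns"
  # Recent log lines often show address or connectivity errors.
  kubectl logs -n "$ns" deploy/vault-access-operator-controller-manager --since=10m
}
```

Run `check_operator` with no arguments for the default namespace. If the pod is healthy but the connection stays in Pending, the address and network causes are the more likely ones.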
Connection in "Error" Phase
Symptoms:
kubectl get vaultconnection
NAME ADDRESS PHASE VERSION AGE
vault-primary https://vault.example.com:8200 Error 5m
Check the error message:
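One way to read the message is to print the resource's status conditions. This sketch reuses the jsonpath expression from the "Reading Conditions" section and assumes VaultConnection exposes `.status.conditions` in the same shape as VaultPolicy; `show_conditions` is a hypothetical helper name.

```shell
# Hypothetical helper: print "type: status (reason) - message" for each condition.
# $1 = resource kind, $2 = resource name; any further arguments (for example
# "-n my-app") are passed through to kubectl unchanged.
show_conditions() {
  kind="$1"
  name="$2"
  shift 2
  kubectl get "$kind" "$name" "$@" \
    -o jsonpath='{range .status.conditions[*]}{.type}: {.status} ({.reason}) - {.message}{"\n"}{end}'
}
```

For the connection shown above this would be `show_conditions vaultconnection vault-primary`. `kubectl describe vaultconnection vault-primary` shows the same information together with recent events.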
Common error causes:
- TLS Certificate Issues
  - Error: x509: certificate signed by unknown authority
  - Solution: Provide the correct CA certificate or set skipVerify: true (not recommended for production).
- Authentication Failure
  - Error: permission denied or invalid role
  - Solution: Verify the Vault role exists and the service account is bound correctly.
- Vault Sealed
  - Error: Vault is sealed
  - Solution: Unseal the Vault server.
Connection Authentication Failures
Symptoms:
Debugging steps:
- Verify Kubernetes auth is enabled.
- Check the Vault role configuration.
- Verify service account binding.
- Test authentication manually.
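The four steps can be run with the Vault CLI and kubectl. The sketch below assumes the Kubernetes auth method is mounted at the default `kubernetes/` path, that `VAULT_ADDR` and a token able to read auth configuration are already set, and that kubectl is version 1.24 or later (for `kubectl create token`). The function name and its arguments are placeholders.

```shell
# Hypothetical helper walking through the four debugging steps.
# $1 = Vault role name, $2 = service account name, $3 = service account namespace
debug_k8s_auth() {
  role="$1"
  sa="$2"
  ns="$3"
  # Step 1: the kubernetes auth method should appear in the list.
  vault auth list
  # Step 2: inspect the role (bound service accounts, namespaces, policies).
  vault read "auth/kubernetes/role/$role"
  # Step 3: the service account named in the role must exist.
  kubectl get serviceaccount "$sa" -n "$ns"
  # Step 4: log in manually with a short-lived service account token.
  jwt="$(kubectl create token "$sa" -n "$ns")"
  vault write auth/kubernetes/login role="$role" jwt="$jwt"
}
```

If step 4 succeeds from a workstation but the operator still fails, the difference is usually the service account or namespace the operator actually runs as.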
VaultPolicy Issues
Policy Stuck in "Syncing" Phase
Symptoms:
kubectl get vaultpolicy -n my-app
NAME VAULT NAME PHASE RULES AGE
app-secrets my-app-app-secrets Syncing 3 5m
Possible causes:
- VaultConnection not ready.
- Insufficient Vault permissions.
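Both causes can be checked quickly. In this sketch, `vault token capabilities` reports what the token currently used by the CLI may do on the ACL policy path, so it is only meaningful when run with the same Vault identity the operator uses. The function name is hypothetical.

```shell
# Hypothetical helper: check the two usual blockers for a policy stuck in Syncing.
# $1 = policy name as it appears in Vault (the "VAULT NAME" column)
check_policy_sync_blockers() {
  policy="$1"
  # Cause 1: the referenced VaultConnection must not be Pending or Error.
  kubectl get vaultconnection
  # Cause 2: writing a policy needs create/update on this path.
  vault token capabilities "sys/policies/acl/$policy"
}
```

For the example above: `check_policy_sync_blockers my-app-app-secrets`. An output of `deny` for the second command points to missing Vault permissions.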
Policy in "Conflict" Phase
Symptoms:
kubectl get vaultpolicy -n my-app
NAME VAULT NAME PHASE RULES AGE
app-secrets my-app-app-secrets Conflict 3 5m
Explanation: A policy with the same name already exists in Vault and is either:
- Managed by a different Kubernetes resource
- Not managed by the operator at all
Solutions:
- Use the Adopt conflict policy.
- Delete the existing policy from Vault.
- Rename your VaultPolicy resource.
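Before choosing between these options, it helps to look at what is currently stored in Vault. The sketch below uses the standard `vault policy read` and `vault policy delete` commands; the two function names are made up for this example, and deleting is irreversible, so read first.

```shell
# Hypothetical helpers for the "delete the existing policy" route.
# $1 = policy name as it appears in Vault

# Show the rules of the policy that is causing the conflict.
inspect_conflicting_policy() {
  vault policy read "$1"
}

# Remove the policy from Vault so the operator can create its own copy.
remove_conflicting_policy() {
  vault policy delete "$1"
}
```

For the example above: `inspect_conflicting_policy my-app-app-secrets`, then `remove_conflicting_policy my-app-app-secrets` once it is confirmed that nothing else depends on it.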
Namespace Boundary Validation Error
Symptoms:
Error: validation failed: rule[0]: path "secret/data/*" must contain {{namespace}} when namespace boundary enforcement is enabled
Solution: Add the {{namespace}} variable to your paths, or disable namespace boundary enforcement:
Option 1: Add namespace variable (recommended for multi-tenant environments):
spec:
  enforceNamespaceBoundary: true # Explicitly enabled (default is false)
  rules:
    - path: "secret/data/{{namespace}}/*" # Correct
      capabilities: [read, list]
Option 2: Disable enforcement (default behavior):
spec:
  enforceNamespaceBoundary: false # This is the default
  rules:
    - path: "secret/data/*"
      capabilities: [read, list]
Wildcard Before Namespace Error
Symptoms:
Error: validation failed: rule[0]: path "secret/*/{{namespace}}/data" contains wildcard (*) before {{namespace}} which is a security risk
Explanation: Having a wildcard before {{namespace}} could allow access to secrets in other namespaces.
Solution: Restructure your path:
# Bad
- path: "secret/*/{{namespace}}/data"
# Good
- path: "secret/data/{{namespace}}/*"
- path: "kv/{{namespace}}/data/*"
VaultRole Issues
Role in "Error" Phase with PolicyNotFound
Symptoms:
kubectl describe vaultrole app-role -n my-app
# Status shows: policy VaultPolicy "missing-policy" not found in namespace "my-app"
Solutions:
- Create the missing policy.
- Fix the policy reference.
- If referencing a policy in another namespace, correct the reference accordingly.
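To see which policies actually exist and what the role refers to, compare the two side by side. This is a sketch with a hypothetical function name; it lists both namespaced VaultPolicy resources and cluster-scoped VaultClusterPolicy resources, then dumps the role so its policy references can be checked by eye.

```shell
# Hypothetical helper: list available policies and print the role's spec.
# $1 = VaultRole name, $2 = namespace
list_policies_for_role() {
  role="$1"
  ns="$2"
  # Policies defined in the role's own namespace.
  kubectl get vaultpolicy -n "$ns"
  # Cluster-scoped policies, which are not tied to a namespace.
  kubectl get vaultclusterpolicy
  # The role itself, including the policy names it references.
  kubectl get vaultrole "$role" -n "$ns" -o yaml
}
```

For the example above: `list_policies_for_role app-role my-app`. A name that appears in the role but in neither list is the one to create or correct.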
Service Account Not Authorized
Symptoms:
- Application pods cannot authenticate to Vault
- Error: role not found or service account not authorized
Debugging steps:
- Check the VaultRole status.
- Verify the service account exists.
- Check the role in Vault.
- Test authentication.
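A sketch of the first three checks follows. It assumes the Kubernetes auth method is mounted at `kubernetes/` and, as a guess based on the policy naming seen earlier (`my-app-app-secrets`), that the role is stored in Vault as `<namespace>-<name>`; pass the Vault role name explicitly as the fourth argument if that assumption does not hold. The function name is hypothetical.

```shell
# Hypothetical helper: compare the Kubernetes side with the role stored in Vault.
# $1 = VaultRole name, $2 = service account, $3 = namespace,
# $4 = role name in Vault (optional; defaults to "<namespace>-<name>", an assumption)
check_role_binding() {
  role="$1"
  sa="$2"
  ns="$3"
  vault_role="${4:-$ns-$role}"
  # The VaultRole should not be in an Error phase.
  kubectl get vaultrole "$role" -n "$ns"
  # The service account the pods run as must exist.
  kubectl get serviceaccount "$sa" -n "$ns"
  # The role in Vault must list that service account and namespace.
  vault read -field=bound_service_account_names "auth/kubernetes/role/$vault_role"
  vault read -field=bound_service_account_namespaces "auth/kubernetes/role/$vault_role"
}
```

Example: `check_role_binding app-role my-sa my-app`, where `my-sa` stands for the service account the application pods use.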
Webhook Issues
Webhook Certificate Errors
Symptoms:
Error creating: Internal error occurred: failed calling webhook:
x509: certificate signed by unknown authority
Solutions:
- If using cert-manager, verify the certificate.
- Check that cert-manager is working.
- Restart the operator to pick up the new certificate.
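These steps map onto a few kubectl commands. The sketch assumes cert-manager is installed in its default `cert-manager` namespace and that the webhook certificate lives in the operator namespace; both function names are made up for this example.

```shell
# Hypothetical helpers for webhook certificate problems.
# $1 = operator namespace (defaults to the namespace used throughout this guide)

# Show the Certificate resources and the cert-manager pods.
check_webhook_cert() {
  ns="${1:-vault-access-operator-system}"
  # READY should be True for the webhook certificate.
  kubectl get certificate -n "$ns"
  # All cert-manager pods should be Running.
  kubectl get pods -n cert-manager
}

# Restart the operator so it loads the renewed certificate.
restart_operator() {
  ns="${1:-vault-access-operator-system}"
  kubectl rollout restart deployment/vault-access-operator-controller-manager -n "$ns"
}
```

Run `check_webhook_cert` first and call `restart_operator` only once the certificate reports ready.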
Webhook Timeout
Symptoms:
Solutions:
- Check operator health.
- Increase the webhook timeout.
- Check network policies.
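A sketch of the timeout and network policy checks is shown below. The name of the operator's webhook configuration is not given in this guide, so it is passed in as an argument; find it with `kubectl get validatingwebhookconfigurations`. Kubernetes accepts `timeoutSeconds` values from 1 to 30. The function names are hypothetical, and a Helm or operator upgrade may overwrite a manual patch.

```shell
# Hypothetical helpers for webhook timeouts.

# Set timeoutSeconds on the first webhook of a ValidatingWebhookConfiguration.
# $1 = webhook configuration name, $2 = timeout in seconds (default 30, the maximum)
set_webhook_timeout() {
  cfg="$1"
  seconds="${2:-30}"
  kubectl patch validatingwebhookconfiguration "$cfg" --type=json \
    -p "[{\"op\":\"replace\",\"path\":\"/webhooks/0/timeoutSeconds\",\"value\":$seconds}]"
}

# List network policies that could block the API server from reaching the webhook.
# $1 = operator namespace (defaults to the namespace used throughout this guide)
list_operator_network_policies() {
  kubectl get networkpolicy -n "${1:-vault-access-operator-system}"
}
```

If the configuration defines several webhooks, repeat the patch with `/webhooks/1/...` and so on.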
Debugging Techniques
Enable Debug Logging
Increase log verbosity for more detailed output:
Or set environment variable:
kubectl set env deployment/vault-access-operator-controller-manager \
-n vault-access-operator-system \
-- ZAP_LOG_LEVEL=debug
Check Resource Events
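Kubernetes events often carry the first hint of what went wrong. The helper below is a small sketch (the function name is made up) that lists events for one named resource, oldest first, using standard kubectl field selectors.

```shell
# Hypothetical helper: show events that reference a given resource.
# $1 = resource name, $2 = namespace
resource_events() {
  name="$1"
  ns="$2"
  kubectl get events -n "$ns" \
    --field-selector "involvedObject.name=$name" \
    --sort-by=.lastTimestamp
}
```

Example: `resource_events app-secrets my-app`. `kubectl describe vaultpolicy app-secrets -n my-app` prints the same events at the end of its output.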
Inspect Resource Details
# Full resource with status
kubectl get vaultpolicy app-secrets -n my-app -o yaml
# Just the status
kubectl get vaultpolicy app-secrets -n my-app -o jsonpath='{.status}'
# Just conditions
kubectl get vaultpolicy app-secrets -n my-app \
-o jsonpath='{.status.conditions}' | jq .
Test Vault Connectivity from Operator
# Get a shell in the operator pod
kubectl exec -it deploy/vault-access-operator-controller-manager \
-n vault-access-operator-system -- /bin/sh
# Test Vault health
wget -qO- https://vault.example.com:8200/v1/sys/health
# Check environment
env | grep VAULT
Verify CRD Installation
# List CRDs
kubectl get crds | grep vault.platform.io
# Check CRD details
kubectl describe crd vaultpolicies.vault.platform.io
Log Analysis
Operator Log Locations
# Stream logs from the operator
kubectl logs -f -n vault-access-operator-system deploy/vault-access-operator-controller-manager
# Get logs from the last hour
kubectl logs -n vault-access-operator-system deploy/vault-access-operator-controller-manager \
--since=1h
# Get logs from the previous container instance (for example, after a crash)
kubectl logs -n vault-access-operator-system deploy/vault-access-operator-controller-manager \
--previous
Common Log Patterns
Successful Reconciliation
{"level":"info","ts":"...","msg":"Reconciling VaultPolicy","namespace":"my-app","name":"app-secrets"}
{"level":"info","ts":"...","msg":"VaultPolicy reconciled successfully","namespace":"my-app","name":"app-secrets","vaultName":"my-app-app-secrets"}
Connection Error
{"level":"error","ts":"...","msg":"Failed to get Vault client","error":"connection \"vault-primary\" not ready: VaultConnection is in Pending phase"}
Conflict Detection
{"level":"error","ts":"...","msg":"Conflict detected","error":"conflict: policy \"my-app-app-secrets\": already managed by other-namespace/other-policy"}
Retry Scheduling
Filtering Logs
# Only errors
kubectl logs -n vault-access-operator-system deploy/vault-access-operator-controller-manager \
| grep '"level":"error"'
# Specific resource
kubectl logs -n vault-access-operator-system deploy/vault-access-operator-controller-manager \
| grep '"name":"app-secrets"'
# Using jq for JSON logs
kubectl logs -n vault-access-operator-system deploy/vault-access-operator-controller-manager \
| jq 'select(.level == "error")'
Status Conditions Explained
Condition Types
| Type | Description | Healthy State |
|---|---|---|
| Ready | Resource is fully reconciled and operational | True |
| Synced | Resource has been successfully synced to Vault | True |
| ConnectionReady | Referenced VaultConnection is available | True |
| PoliciesResolved | All referenced policies have been found and resolved | True |
| DependencyReady | All dependencies (connection, policies) are satisfied | True |
| Drifted | Vault resource differs from desired K8s state | False |
| Deleting | Resource is being deleted (finalizer in progress) | N/A |
Condition Reasons
| Reason | Description | Action |
|---|---|---|
| Succeeded | Operation completed successfully | None needed |
| Failed | Operation failed | Check message and logs |
| InProgress | Operation is ongoing | Wait for completion |
| Conflict | Conflict with existing Vault resource | Use Adopt policy or resolve manually |
| ValidationFailed | Resource spec validation failed | Fix spec according to error |
| ConnectionNotReady | VaultConnection is not active | Fix VaultConnection |
| PolicyNotFound | Referenced policy doesn't exist | Create the policy |
| DependencyNotReady | A dependency (connection, policy) is not ready | Check dependent resources |
| DependencyReady | All dependencies are satisfied | None needed |
| DriftDetected | Vault resource differs from desired state | Review drift, add allow-destructive annotation if correction is desired |
| DriftCorrected | Drift was auto-corrected | None needed |
| NoDrift | No drift detected | None needed |
| DeletionBlocked | Deletion cannot proceed (e.g., dependent resources exist) | Remove dependent resources first |
| DeletionInProgress | Resource is being deleted from Vault | Wait for completion |
| ChildrenExist | Dependent resources still reference this resource | Delete child resources first |
| ObservedGenerationStale | Controller has not yet processed the latest spec change | Wait for reconciliation |
| PolicyNotInVault | Referenced policy does not exist in Vault | Ensure policy CR is synced first |
| ImmutableFieldChanged | An immutable field was changed | Revert the change or recreate the resource |
Reading Conditions
# Get all conditions
kubectl get vaultpolicy app-secrets -n my-app \
-o jsonpath='{range .status.conditions[*]}{.type}: {.status} ({.reason}) - {.message}{"\n"}{end}'
# Example output:
# Ready: False (ConnectionNotReady) - connection "vault-primary" not ready: VaultConnection is in Error phase
# Synced: False (Failed) - failed to sync policy to Vault
Interpreting Multiple Conditions
Healthy resource:
Ready: True (Succeeded) - Policy synced to Vault
Synced: True (Succeeded) - Policy synced successfully
Connection issue:
Ready: False (ConnectionNotReady) - connection "vault-primary" not ready
Synced: False (Failed) - cannot sync without connection
Conflict detected:
Ready: False (Conflict) - policy already exists and is managed by other-ns/other-policy
Synced: False (Conflict) - cannot sync due to conflict
Getting Help
If you're still experiencing issues:
- Check the GitHub Issues: github.com/panteparak/vault-access-operator/issues
- Collect diagnostic information:
# Export all vault resources
kubectl get vaultconnections,vaultclusterpolicies,vaultpolicies,vaultclusterroles,vaultroles \
-A -o yaml > vault-resources.yaml
# Get operator logs
kubectl logs -n vault-access-operator-system deploy/vault-access-operator-controller-manager \
--since=1h > operator-logs.txt
# Get non-Normal (warning) events; events are filtered by their type field
kubectl get events -A --field-selector type!=Normal > events.txt
- Open an issue with the diagnostic information (remove any sensitive data like tokens or secrets).