# Troubleshooting — PostgreSQL
## PostgreSQL pod stuck in Pending state

Cause: the PersistentVolumeClaim (PVC) cannot bind to a volume. This can be due to a non-existent storage class, an exceeded storage quota, or insufficient resources on the nodes.

Solution:

- Check the pod status and associated events:

  ```shell
  kubectl describe pod pg-<name>-1
  ```

- Check the PVC status:

  ```shell
  kubectl get pvc
  kubectl describe pvc pg-<name>-1
  ```

- Make sure the `storageClass` used is one of the available classes: `local`, `replicated`, or `replicated-async`.
- Check that your storage quota has not been reached.
- If needed, fix the `storageClass` in your manifest and reapply:

  ```shell
  kubectl apply -f postgresql.yaml
  ```
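A typo in the storage class name is an easy mistake to catch before applying the manifest. A minimal sketch of that check, hardcoding the three classes listed above (the helper itself is illustrative, not part of any tooling):

```python
# Storage classes available in this setup, as listed in the step above.
ALLOWED_STORAGE_CLASSES = {"local", "replicated", "replicated-async"}

def storage_class_is_valid(name: str) -> bool:
    """Return True if the manifest's storageClass is one of the allowed classes."""
    return name in ALLOWED_STORAGE_CLASSES

print(storage_class_is_valid("replicated"))  # True
print(storage_class_is_valid("replicatd"))   # False: typo, the PVC would stay Pending
```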
## Replication desynchronized between primary and standby

Cause: replication lag can occur due to high network load, insufficient resources on standby nodes, or a high volume of transactions on the primary.

Solution:

- Connect to the primary and check the replication status:

  ```sql
  SELECT client_addr, state, sent_lsn, write_lsn, flush_lsn, replay_lsn
  FROM pg_stat_replication;
  ```

- Compare the LSN positions `sent_lsn` and `replay_lsn`. A large gap indicates lag.
- Check the resources allocated to standby nodes (CPU, memory). If needed, increase the `resourcesPreset` or set explicit `resources`.
- Check network connectivity between pods:

  ```shell
  kubectl logs pg-<name>-2
  ```

- If lag persists, consider reducing the write load on the primary or increasing resources.
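To put a number on the gap between `sent_lsn` and `replay_lsn`: an LSN such as `0/5000060` is a 64-bit position written as two hexadecimal halves, so the lag in bytes can be computed as below (a minimal sketch; the example LSN values are made up):

```python
def lsn_to_bytes(lsn: str) -> int:
    """Convert a PostgreSQL LSN like '0/5000060' to an absolute byte position."""
    hi, lo = lsn.split("/")
    return (int(hi, 16) << 32) | int(lo, 16)

def lag_bytes(sent_lsn: str, replay_lsn: str) -> int:
    """Byte gap between what the primary sent and what the standby replayed."""
    return lsn_to_bytes(sent_lsn) - lsn_to_bytes(replay_lsn)

print(lag_bytes("0/5000060", "0/5000000"))  # 96
```

On the server side, `pg_wal_lsn_diff(sent_lsn, replay_lsn)` in the same `pg_stat_replication` query gives the same byte difference directly.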
## Connection refused to PostgreSQL

Cause: pods are not running, the Secret name is incorrect, or the service is not accessible.

Solution:

- Check that the PostgreSQL pods are in the `Running` state:

  ```shell
  kubectl get pods -l app=pg-<name>
  ```

- Check that the service exists and points to the correct endpoints:

  ```shell
  kubectl get svc pg-<name>-rw
  kubectl get endpoints pg-<name>-rw
  ```

- Make sure you are using the correct Secret name for credentials. The pattern is `pg-<name>-app`:

  ```shell
  kubectl get tenantsecret pg-<name>-app
  ```

- Test the connection from a pod in the same namespace:

  ```shell
  kubectl run test-pg --rm -it --image=postgres:16 -- psql -h pg-<name>-rw -p 5432 -U <user>
  ```
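If `psql` is unavailable, plain TCP reachability of the service separates "connection refused" from authentication problems. A generic sketch using only the standard library (the hostname `pg-mydb-rw` is a placeholder for your `pg-<name>-rw` service):

```python
import socket

def can_reach(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# Placeholder service name; replace with your actual pg-<name>-rw service.
print(can_reach("pg-mydb-rw", 5432))
```

If this returns `True` but `psql` still fails, the problem is credentials or PostgreSQL configuration rather than the network path.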
## PITR restore failed

Cause: bootstrap parameters are misconfigured. The `bootstrap.oldName` field must match exactly the name of the original instance, and the new instance name must be different.

Solution:

- Verify that `bootstrap.oldName` exactly matches the name of the original PostgreSQL instance:

  ```yaml
  # postgresql-restore.yaml
  apiVersion: apps.cozystack.io/v1alpha1
  kind: Postgres
  metadata:
    name: restored-db                         # Must be a new name
  spec:
    bootstrap:
      enabled: true
      oldName: "original-db"                  # Exact name of the old instance
      recoveryTime: "2025-06-15T14:30:00Z"    # RFC 3339 format
  ```

- The `recoveryTime` must be in RFC 3339 format (e.g., `2025-06-15T14:30:00Z`). If left empty, the restore uses the latest available state.
- The name in `metadata.name` must be different from `bootstrap.oldName`.
- Make sure the backups from the original instance are still accessible in the S3 storage.
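Since a malformed `recoveryTime` is an easy way to break a restore, the timestamp can be validated locally before applying the manifest. A minimal sketch using only the standard library:

```python
from datetime import datetime

def parse_recovery_time(value: str) -> datetime:
    """Parse an RFC 3339 timestamp such as '2025-06-15T14:30:00Z'.

    Raises ValueError if the string is not a valid timestamp.
    """
    # datetime.fromisoformat() only accepts a trailing 'Z' from Python 3.11 on,
    # so normalize it to an explicit UTC offset first.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))

print(parse_recovery_time("2025-06-15T14:30:00Z"))  # 2025-06-15 14:30:00+00:00
```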
## Slow performance

Cause: PostgreSQL parameters are not tuned for the workload, or the allocated resources are insufficient.

Solution:

- Adjust the PostgreSQL parameters in your manifest:

  ```yaml
  # postgresql.yaml
  spec:
    postgresql:
      parameters:
        shared_buffers: 512MB         # ~25% of allocated RAM
        work_mem: 64MB                # Memory per sort operation
        max_connections: 200          # Adjust based on load
        effective_cache_size: 1536MB  # ~75% of RAM
  ```

- Check that the `resourcesPreset` suits your workload:
  - Development: `nano` or `micro`
  - Production: `medium`, `large`, or higher
- Monitor resource usage:

  ```shell
  kubectl top pod pg-<name>-1
  ```

- If queries are slow, identify them with `pg_stat_statements` and optimize indexes.
- Increase resources if needed by switching to a higher preset or defining explicit `resources`.
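The sizing comments in the manifest above follow simple rules of thumb: `shared_buffers` at roughly 25% of allocated RAM and `effective_cache_size` at roughly 75%. A small illustrative helper that derives both from a pod's memory limit (the ratios come from this guide; the helper itself is a sketch, not official tooling):

```python
def tuning_hints(ram_mb: int) -> dict:
    """Derive parameter values from allocated RAM using the rules of thumb above."""
    return {
        "shared_buffers": f"{ram_mb * 25 // 100}MB",        # ~25% of allocated RAM
        "effective_cache_size": f"{ram_mb * 75 // 100}MB",  # ~75% of RAM
    }

print(tuning_hints(2048))
# {'shared_buffers': '512MB', 'effective_cache_size': '1536MB'}
```

For a 2 GiB pod this reproduces the example values in the manifest above; these are starting points, and real tuning should be driven by `pg_stat_statements` and observed load.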