Troubleshooting — Redis
Data loss after restart
Cause: the storageClass used is local, which means data is stored only on the physical node where the pod was running. If the pod is rescheduled on another node, the previous data is lost.
Solution:
- Check the `storageClass` being used:

  ```shell
  kubectl get pvc -l app=redis-<name>
  ```

- If you are using a single replica (`replicas: 1`), switch to `storageClass: replicated` so that storage compensates for the lack of application replication. If you have multiple replicas (`replicas >= 3`), `storageClass: local` is appropriate since Redis Sentinel already ensures high availability:

  ```yaml
  # redis.yaml
  spec:
    storageClass: replicated  # If replicas = 1
    # storageClass: local     # If replicas >= 3 (Sentinel ensures HA)
  ```

- Apply the change. Note that changing `storageClass` typically requires recreating the PVCs.
- Also ensure that `replicas >= 3` to benefit from Redis Sentinel replication.
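The decision rule above can be sketched as a small helper (hypothetical, for illustration only; the real choice is made in your manifest):

```python
def pick_storage_class(replicas: int) -> str:
    """Illustrative helper mirroring the rule above: with fewer than
    3 replicas, only replicated storage protects the data; with >= 3
    replicas, Sentinel replication makes local storage acceptable."""
    if replicas >= 3:
        return "local"       # Sentinel already ensures HA
    return "replicated"      # storage compensates for missing replication

print(pick_storage_class(1))  # replicated
print(pick_storage_class(3))  # local
```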
Redis Sentinel not converging
Cause: the number of replicas is even or less than 3, which prevents the Sentinel quorum from working correctly. Sentinel requires a majority to elect a new primary.
Solution:
- Check the number of replicas:

  ```shell
  kubectl get pods -l app=redis-<name>
  ```

- Make sure to use an odd number >= 3:

  ```yaml
  # redis.yaml
  spec:
    replicas: 3  # Or 5, never 2 or 4
  ```

- Check the Sentinel logs to identify convergence issues:

  ```shell
  kubectl logs -l app=rfs-redis-<name>
  ```

- Verify network connectivity between Redis pods. DNS or network issues can prevent node discovery.
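The majority requirement also explains why even replica counts are wasteful. A sketch of the arithmetic (plain Python, not Sentinel code):

```python
def majority(n: int) -> int:
    # Smallest strict majority of n Sentinels
    return n // 2 + 1

def tolerated_failures(n: int) -> int:
    # A failover remains possible while a majority is still alive
    return n - majority(n)

for n in (2, 3, 4, 5):
    print(n, majority(n), tolerated_failures(n))
# 3 nodes tolerate 1 failure; 4 nodes still tolerate only 1,
# so the extra even node adds cost without adding resilience.
```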
Memory saturated (OOMKilled)
Cause: the Redis dataset exceeds the memory allocated to the container. Kubernetes kills the pod when it exceeds its memory limit.
Solution:
- Check if the pod was killed due to OOM:

  ```shell
  kubectl describe pod rfr-redis-<name>-0 | grep -i oom
  ```

- Increase the allocated memory via `resources.memory` or a higher `resourcesPreset`:

  ```yaml
  # redis.yaml
  spec:
    resources:
      cpu: 1000m
      memory: 2Gi  # Increase memory
  ```

- Check the Redis eviction policy (`maxmemory-policy`). By default, Redis returns an error on write commands when memory is full. Consider using `allkeys-lru` if Redis is used as a cache.
- Monitor the dataset size:

  ```shell
  redis-cli -h rfr-redis-<name> -p 6379 -a <password> INFO memory
  ```
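When sizing the limit, it can help to work out how much of it Redis should actually be allowed to use. A hedged sketch (the 75% headroom figure is a common rule of thumb, not a documented requirement, and `parse_gi` is a hypothetical helper that handles only two suffixes):

```python
def parse_gi(limit: str) -> int:
    """Convert a Kubernetes quantity like '2Gi' or '512Mi' to bytes
    (only these two suffixes handled in this sketch)."""
    units = {"Gi": 1024**3, "Mi": 1024**2}
    for suffix, factor in units.items():
        if limit.endswith(suffix):
            return int(limit[:-len(suffix)]) * factor
    raise ValueError(f"unsupported quantity: {limit}")

def suggested_maxmemory(container_limit: str, headroom: float = 0.75) -> int:
    # Leave room for fragmentation, replication buffers and fork overhead
    return int(parse_gi(container_limit) * headroom)

print(suggested_maxmemory("2Gi"))  # 1610612736 bytes (~1.5 GiB)
```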
Connection timeout
Cause: Redis pods are not running, service endpoints are empty, or the client-side authentication configuration does not match the server configuration.
Solution:
- Check that pods are in `Running` state:

  ```shell
  kubectl get pods -l app=redis-<name>
  ```

- Check that services have endpoints:

  ```shell
  kubectl get endpoints rfr-redis-<name>
  kubectl get endpoints rfs-redis-<name>
  ```

- If `authEnabled: true`, make sure your client provides the correct password.
- Test the connection from a debug pod:

  ```shell
  kubectl run test-redis --rm -it --image=redis:7 -- redis-cli -h rfr-redis-<name> -p 6379 -a <password> PING
  ```
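If no Redis image is available for the debug pod, the PING above is simple enough to frame by hand: this sketch shows the standard RESP2 encoding a client such as redis-cli sends over the TCP connection (framing only, no network code):

```python
def encode_resp(args: list[str]) -> bytes:
    """Encode a Redis command as a RESP2 array of bulk strings."""
    out = [f"*{len(args)}\r\n".encode()]
    for arg in args:
        data = arg.encode()
        out.append(b"$%d\r\n%s\r\n" % (len(data), data))
    return b"".join(out)

print(encode_resp(["PING"]))            # b'*1\r\n$4\r\nPING\r\n'
print(encode_resp(["AUTH", "s3cret"]))  # b'*2\r\n$4\r\nAUTH\r\n$6\r\ns3cret\r\n'
```

A healthy server answers the PING frame with `+PONG\r\n`.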
Authentication fails
Cause: the password used does not match the one stored in the Kubernetes Secret, or authEnabled is not enabled on the server while the client sends a password (or vice versa).
Solution:
- Get the correct password from the Secret:

  ```shell
  kubectl get tenantsecret redis-<name>-auth -o jsonpath='{.data.password}' | base64 -d
  ```

- Verify that `authEnabled: true` is configured in your manifest:

  ```yaml
  # redis.yaml
  spec:
    authEnabled: true
  ```

- Make sure your client uses exactly the password retrieved in step 1.
- If you changed the `authEnabled` configuration, existing clients must be updated to reflect the change.
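The value stored in the Secret is base64-encoded, so the `base64 -d` step above can be reproduced in any client language when wiring the password into an application. A minimal Python sketch (the secret value shown is hypothetical):

```python
import base64

# What `kubectl get ... -o jsonpath='{.data.password}'` returns (hypothetical value)
encoded = "czNjcmV0LXBhc3N3b3Jk"

# Decode it before passing it to the Redis client
password = base64.b64decode(encoded).decode()
print(password)  # s3cret-password
```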