Troubleshooting — RabbitMQ
Queue blocked (flow control)
Cause: RabbitMQ has triggered a memory alarm or disk alarm, blocking publications to protect the system. This occurs when memory consumption exceeds the threshold (high watermark) or when disk space is insufficient.
Solution:
- Check cluster status and active alarms:
  ```shell
  kubectl exec <rabbitmq-pod> -- rabbitmqctl status | grep -A 10 "alarms"
  ```
- Identify the resource causing the issue (memory or disk):
  ```shell
  kubectl exec <rabbitmq-pod> -- rabbitmqctl status | grep -E "mem_|disk_"
  ```
- Increase allocated resources in your manifest (rabbitmq.yaml):
  ```yaml
  replicas: 3
  resources:
    cpu: 1
    memory: 2Gi
    size: 20Gi
  ```
- Purge unused queues if necessary:
  ```shell
  kubectl exec <rabbitmq-pod> -- rabbitmqctl purge_queue <queue-name>
  ```
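The alarm check above can also be scripted. A minimal sketch that parses the output of `rabbitmqctl status --formatter json` (the top-level `alarms` key exists in recent RabbitMQ versions, but the exact shape of each entry varies by release, so treat the sample below as an assumption):

```python
import json

def active_alarms(status_output: str) -> list:
    """Return the list of active resource alarms from
    `rabbitmqctl status --formatter json` output.

    Assumes a top-level "alarms" key; the shape of each
    entry may differ across RabbitMQ versions.
    """
    status = json.loads(status_output)
    return status.get("alarms", [])

# Hypothetical output with an active memory alarm on one node:
sample = '{"alarms": [{"resource": "memory", "node": "rabbit@rabbitmq-0"}]}'
for alarm in active_alarms(sample):
    print(alarm["resource"], "alarm on", alarm["node"])
```

An empty list means no alarm is active and publishers are not being blocked by flow control.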
RabbitMQ node not joining the cluster
Cause: a RabbitMQ node cannot join the cluster, often due to DNS resolution issues, Erlang cookie inconsistency, or restrictive network policies.
Solution:
- Check cluster status from a working node:
  ```shell
  kubectl exec <rabbitmq-pod> -- rabbitmqctl cluster_status
  ```
- Check the logs of the failing pod:
  ```shell
  kubectl logs <problematic-rabbitmq-pod>
  ```
- Verify DNS resolution works between pods:
  ```shell
  kubectl exec <rabbitmq-pod> -- nslookup <problematic-rabbitmq-pod>.<headless-service>
  ```
- If the problem persists, delete the failing pod to force its recreation:
  ```shell
  kubectl delete pod <problematic-rabbitmq-pod>
  ```
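For the DNS check, the name to resolve follows the StatefulSet pattern `<pod>.<headless-service>.<namespace>.svc.<cluster-domain>`. A small helper to build it (the `cluster.local` default is an assumption; some clusters use a custom domain):

```python
def pod_fqdn(pod: str, headless_service: str, namespace: str,
             cluster_domain: str = "cluster.local") -> str:
    # Stable DNS name a StatefulSet pod gets through its headless Service.
    return f"{pod}.{headless_service}.{namespace}.svc.{cluster_domain}"

# Example with hypothetical pod and service names:
print(pod_fqdn("rabbitmq-1", "rabbitmq-headless", "default"))
# rabbitmq-1.rabbitmq-headless.default.svc.cluster.local
```

If this name does not resolve from another pod, the node cannot discover its peers and will fail to join the cluster.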
Messages not routed (misconfigured exchange)
Cause: published messages are not reaching queues, usually because of a wrong exchange type, incorrect routing key, or missing binding between the exchange and the queue.
Solution:
- List existing bindings to identify configured routes:
  ```shell
  kubectl exec <rabbitmq-pod> -- rabbitmqctl list_bindings -p <vhost>
  ```
- Check the exchange type and expected routing key:
  ```shell
  kubectl exec <rabbitmq-pod> -- rabbitmqctl list_exchanges -p <vhost>
  ```
- Configure a dead letter exchange to capture unrouted messages and facilitate diagnosis:
  ```shell
  kubectl exec <rabbitmq-pod> -- rabbitmqctl set_policy DLX ".*" '{"dead-letter-exchange":"dlx"}' -p <vhost>
  ```
- Verify that the producer uses the correct exchange and routing key in its configuration.
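With topic exchanges, a frequent cause of unrouted messages is a binding key that does not actually match the routing key. The AMQP matching rules (`*` matches exactly one dot-separated word, `#` matches zero or more words) can be checked offline with a small sketch:

```python
def topic_matches(binding_key: str, routing_key: str) -> bool:
    """Check whether a topic-exchange binding key matches a routing key.

    '*' matches exactly one word; '#' matches zero or more words.
    """
    def match(b, r):
        if not b:
            return not r          # both exhausted -> match
        if b[0] == "#":
            # '#' may absorb zero or more of the remaining words.
            return any(match(b[1:], r[i:]) for i in range(len(r) + 1))
        if not r:
            return False
        if b[0] in ("*", r[0]):   # wildcard word or exact word match
            return match(b[1:], r[1:])
        return False
    return match(binding_key.split("."), routing_key.split("."))

print(topic_matches("logs.*", "logs.error"))      # True
print(topic_matches("logs.*", "logs.app.error"))  # False: '*' is one word only
print(topic_matches("logs.#", "logs"))            # True: '#' matches zero words
```

The second case is a classic trap: a producer publishing with a deeper routing key than the binding anticipates silently loses its messages unless a dead letter exchange is configured.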
Memory saturated (memory alarm)
Cause: RabbitMQ has reached the memory threshold (high watermark, 40% of available memory by default). All publications are blocked until memory drops below the threshold.
Solution:
- Check memory consumption:
  ```shell
  kubectl exec <rabbitmq-pod> -- rabbitmqctl status | grep "mem_used"
  ```
- Identify the largest queues:
  ```shell
  kubectl exec <rabbitmq-pod> -- rabbitmqctl list_queues name messages memory -p <vhost> --formatter table
  ```
- Increase the memory allocated to RabbitMQ (rabbitmq.yaml):
  ```yaml
  resources:
    cpu: 1
    memory: 4Gi
  ```
- Purge unused queues, or queues containing a large number of unconsumed messages.
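To relate the observed `mem_used` value to the threshold: the alarm fires when memory use crosses `vm_memory_high_watermark` × total memory (0.4 by default, as noted above). A quick sanity check:

```python
def memory_alarm_triggered(mem_used_bytes: int, total_bytes: int,
                           high_watermark: float = 0.4) -> bool:
    # Default vm_memory_high_watermark is 0.4: the alarm
    # fires once usage reaches 40% of available memory.
    return mem_used_bytes >= total_bytes * high_watermark

GIB = 1024 ** 3
# With 4Gi allocated, the default alarm threshold is 1.6 GiB:
print(memory_alarm_triggered(int(1.5 * GIB), 4 * GIB))  # False
print(memory_alarm_triggered(int(1.7 * GIB), 4 * GIB))  # True
```

This also shows why raising the memory limit clears the alarm: the absolute threshold scales with the allocated memory.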
AMQP connection refused
Cause: the client cannot connect to the RabbitMQ broker. This can be due to incorrect credentials, missing vhost permissions, or a network accessibility issue.
Solution:
- Check connection credentials in the Kubernetes Secret:
  ```shell
  kubectl get tenantsecret <rabbitmq-name>-credentials -o jsonpath='{.data.<key>}' | base64 -d
  ```
  (Decode one key at a time: `{.data}` alone returns the whole map, which is not valid base64 input.)
- Verify the user has the necessary permissions on the vhost:
  ```shell
  kubectl exec <rabbitmq-pod> -- rabbitmqctl list_permissions -p <vhost>
  ```
- Test connectivity to the AMQP port (5672):
  ```shell
  kubectl exec <rabbitmq-pod> -- rabbitmq-diagnostics check_port_connectivity
  ```
- If connecting from outside the cluster, make sure `external: true` is configured in your manifest.
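Putting this together with the manifest fields shown earlier in this guide, an externally reachable instance might be declared as follows. This is only a sketch: the field names other than those already shown (`replicas`, `resources`, `external`) are not assumed, and your manifest schema may differ.

```yaml
# rabbitmq.yaml
replicas: 3
resources:
  cpu: 1
  memory: 2Gi
  size: 20Gi
external: true
```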