Troubleshooting — RabbitMQ
Queue blocked (flow control)
Cause: RabbitMQ has triggered a memory alarm or disk alarm, blocking publications to protect the system. This occurs when memory consumption exceeds the threshold (high watermark) or when disk space is insufficient.
Solution:
- Check cluster status and active alarms:
  ```shell
  kubectl exec <rabbitmq-pod> -- rabbitmqctl status | grep -A 10 "alarms"
  ```
- Identify the resource causing the issue (memory or disk):
  ```shell
  kubectl exec <rabbitmq-pod> -- rabbitmqctl status | grep -E "mem_|disk_"
  ```
- Increase allocated resources in your manifest (rabbitmq.yaml):
  ```yaml
  replicas: 3
  resources:
    cpu: 1
    memory: 2Gi
    size: 20Gi
  ```
- Purge unused queues if necessary:
  ```shell
  kubectl exec <rabbitmq-pod> -- rabbitmqctl purge_queue <queue-name>
  ```
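The alarm check above can also be scripted. A minimal sketch that parses the output of `rabbitmqctl status --formatter json` (the top-level `alarms` key exists in recent RabbitMQ versions, but the exact shape of each entry varies by release, so treat the sample below as an assumption):

```python
import json

def active_alarms(status_output: str) -> list:
    """Return the list of active resource alarms from
    `rabbitmqctl status --formatter json` output.

    Assumes a top-level "alarms" key; the shape of each
    entry may differ across RabbitMQ versions.
    """
    status = json.loads(status_output)
    return status.get("alarms", [])

# Hypothetical output with an active memory alarm on one node:
sample = '{"alarms": [{"resource": "memory", "node": "rabbit@rabbitmq-0"}]}'
for alarm in active_alarms(sample):
    print(alarm["resource"], "alarm on", alarm["node"])
```

An empty list means no alarm is active and publishers are not being blocked by flow control.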
RabbitMQ node not joining the cluster
Cause: a RabbitMQ node cannot join the cluster, often due to DNS resolution issues, Erlang cookie inconsistency, or restrictive network policies.
Solution:
- Check cluster status from a working node:
  ```shell
  kubectl exec <rabbitmq-pod> -- rabbitmqctl cluster_status
  ```
- Check the logs of the failing pod:
  ```shell
  kubectl logs <problematic-rabbitmq-pod>
  ```
- Verify DNS resolution works between pods:
  ```shell
  kubectl exec <rabbitmq-pod> -- nslookup <problematic-rabbitmq-pod>.<headless-service>
  ```
- If the problem persists, delete the failing pod to force its recreation:
  ```shell
  kubectl delete pod <problematic-rabbitmq-pod>
  ```
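For the DNS check, the name to resolve follows the StatefulSet pattern `<pod>.<headless-service>.<namespace>.svc.<cluster-domain>`. A small helper to build it (the `cluster.local` default is an assumption; some clusters use a custom domain):

```python
def pod_fqdn(pod: str, headless_service: str, namespace: str,
             cluster_domain: str = "cluster.local") -> str:
    # Stable DNS name a StatefulSet pod gets through its headless Service.
    return f"{pod}.{headless_service}.{namespace}.svc.{cluster_domain}"

# Example with hypothetical pod and service names:
print(pod_fqdn("rabbitmq-1", "rabbitmq-headless", "default"))
# rabbitmq-1.rabbitmq-headless.default.svc.cluster.local
```

If this name does not resolve from another pod, the node cannot discover its peers and will fail to join the cluster.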
Messages not routed (misconfigured exchange)
Cause: published messages are not reaching queues, usually because of a wrong exchange type, incorrect routing key, or missing binding between the exchange and the queue.
Solution:
- List existing bindings to identify configured routes:
  ```shell
  kubectl exec <rabbitmq-pod> -- rabbitmqctl list_bindings -p <vhost>
  ```
- Check the exchange type and expected routing key:
  ```shell
  kubectl exec <rabbitmq-pod> -- rabbitmqctl list_exchanges -p <vhost>
  ```
- Configure a dead letter exchange to capture unrouted messages and facilitate diagnosis:
  ```shell
  kubectl exec <rabbitmq-pod> -- rabbitmqctl set_policy DLX ".*" '{"dead-letter-exchange":"dlx"}' -p <vhost>
  ```
- Verify that the producer uses the correct exchange and routing key in its configuration.
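With topic exchanges, a frequent cause of unrouted messages is a binding key that does not actually match the routing key. The AMQP matching rules (`*` matches exactly one dot-separated word, `#` matches zero or more words) can be checked offline with a small sketch:

```python
def topic_matches(binding_key: str, routing_key: str) -> bool:
    """Check whether a topic-exchange binding key matches a routing key.

    '*' matches exactly one word; '#' matches zero or more words.
    """
    def match(b, r):
        if not b:
            return not r          # both exhausted -> match
        if b[0] == "#":
            # '#' may absorb zero or more of the remaining words.
            return any(match(b[1:], r[i:]) for i in range(len(r) + 1))
        if not r:
            return False
        if b[0] in ("*", r[0]):   # wildcard word or exact word match
            return match(b[1:], r[1:])
        return False
    return match(binding_key.split("."), routing_key.split("."))

print(topic_matches("logs.*", "logs.error"))      # True
print(topic_matches("logs.*", "logs.app.error"))  # False: '*' is one word only
print(topic_matches("logs.#", "logs"))            # True: '#' matches zero words
```

The second case is a classic trap: a producer publishing with a deeper routing key than the binding anticipates silently loses its messages unless a dead letter exchange is configured.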
Memory saturated (memory alarm)
Cause: RabbitMQ has reached the memory threshold (high watermark, 40% of available memory by default). All publications are blocked until memory drops below the threshold.
Solution:
- Check memory consumption:
  ```shell
  kubectl exec <rabbitmq-pod> -- rabbitmqctl status | grep "mem_used"
  ```
- Identify the largest queues:
  ```shell
  kubectl exec <rabbitmq-pod> -- rabbitmqctl list_queues name messages memory -p <vhost> --formatter table
  ```
- Increase the memory allocated to RabbitMQ (rabbitmq.yaml):
  ```yaml
  resources:
    cpu: 1
    memory: 4Gi
  ```
- Purge unused queues, or queues containing a large number of unconsumed messages.
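To relate the observed `mem_used` value to the threshold: the alarm fires when memory use crosses `vm_memory_high_watermark` × total memory (0.4 by default, as noted above). A quick sanity check:

```python
def memory_alarm_triggered(mem_used_bytes: int, total_bytes: int,
                           high_watermark: float = 0.4) -> bool:
    # Default vm_memory_high_watermark is 0.4: the alarm
    # fires once usage reaches 40% of available memory.
    return mem_used_bytes >= total_bytes * high_watermark

GIB = 1024 ** 3
# With 4Gi allocated, the default alarm threshold is 1.6 GiB:
print(memory_alarm_triggered(int(1.5 * GIB), 4 * GIB))  # False
print(memory_alarm_triggered(int(1.7 * GIB), 4 * GIB))  # True
```

This also shows why raising the memory limit clears the alarm: the absolute threshold scales with the allocated memory.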
AMQP connection refused
Cause: the client cannot connect to the RabbitMQ broker. This can be due to incorrect credentials, missing vhost permissions, or a network accessibility issue.
Solution:
- Check connection credentials in the Kubernetes Secret:
  ```shell
  kubectl get tenantsecret <rabbitmq-name>-credentials -o jsonpath='{.data.<key>}' | base64 -d
  ```
  (Decode one key at a time: `{.data}` alone returns the whole map, which is not valid base64 input.)
- Verify the user has the necessary permissions on the vhost:
  ```shell
  kubectl exec <rabbitmq-pod> -- rabbitmqctl list_permissions -p <vhost>
  ```
- Test connectivity to the AMQP port (5672):
  ```shell
  kubectl exec <rabbitmq-pod> -- rabbitmq-diagnostics check_port_connectivity
  ```
- If connecting from outside the cluster, make sure `external: true` is configured in your manifest.
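Putting this together with the manifest fields shown earlier in this guide, an externally reachable instance might be declared as follows. This is only a sketch: the field names other than those already shown (`replicas`, `resources`, `external`) are not assumed, and your manifest schema may differ.

```yaml
# rabbitmq.yaml
replicas: 3
resources:
  cpu: 1
  memory: 2Gi
  size: 20Gi
external: true
```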