# Troubleshooting — NATS

## Lost messages (no JetStream)
Cause: JetStream is not enabled or no stream is configured to capture messages. Without JetStream, NATS operates in fire-and-forget mode: messages are only delivered to subscribers connected at the time of publication.
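To make the two delivery semantics concrete, here is a minimal in-process Python model of them. It simulates the broker's behavior and is not the NATS client API; all class and method names are ours:

```python
# Illustrative model of core NATS vs. JetStream delivery semantics.
# This simulates the broker in-process; it is not a NATS client.

class CoreNats:
    """Fire-and-forget: only subscribers present at publish time get the message."""
    def __init__(self):
        self.subscribers = []

    def subscribe(self):
        inbox = []
        self.subscribers.append(inbox)
        return inbox

    def publish(self, msg):
        for inbox in self.subscribers:
            inbox.append(msg)

class JetStreamNats(CoreNats):
    """A stream persists messages so late subscribers can replay them."""
    def __init__(self):
        super().__init__()
        self.stream = []

    def publish(self, msg):
        self.stream.append(msg)        # persisted before delivery
        super().publish(msg)

    def subscribe(self, deliver_all=False):
        inbox = super().subscribe()
        if deliver_all:
            inbox.extend(self.stream)  # replay history to the late consumer
        return inbox

# Core NATS: a message published before anyone subscribes is lost.
core = CoreNats()
core.publish("order-1")
assert core.subscribe() == []                          # lost

# JetStream: the same message is replayed to the late consumer.
js = JetStreamNats()
js.publish("order-1")
assert js.subscribe(deliver_all=True) == ["order-1"]   # recovered from the stream
```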
Solution:
- Verify that JetStream is enabled in your manifest:

  ```yaml
  # nats.yaml
  jetstream:
    enabled: true
    size: 10Gi
  ```

- Reapply the manifest if necessary:

  ```bash
  kubectl apply -f nats.yaml
  ```

- Create a stream to capture messages from the desired subjects:

  ```bash
  nats stream add --subjects "orders.>" --storage file --replicas 3 --retention limits orders-stream
  ```

- Verify the stream is created and capturing messages:

  ```bash
  nats stream info orders-stream
  ```
## Consumer not receiving messages
Cause: the consumer is subscribed to a subject that does not match the one used by the producer. Common errors include typos in the subject name, incorrect wildcard usage, or wrong queue group configuration.
Solution:
- Verify the exact subject used by the producer and the consumer — subjects are case-sensitive
- Test reception with a diagnostic subscription, which shows all messages published on the server:

  ```bash
  nats sub ">"
  ```

- Check the wildcards used: `orders.*` does not match `orders.new.urgent` (use `orders.>` for sub-levels)
- If using queue groups, verify the consumer is a member of the expected group and that the group name is identical
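The wildcard rules above can be checked programmatically. Below is a sketch of NATS subject matching in Python (`*` matches exactly one token, `>` matches one or more trailing tokens); the function is ours, not part of any client library:

```python
def subject_matches(pattern: str, subject: str) -> bool:
    """Return True if a NATS subscription pattern matches a subject.

    '*' matches exactly one token; '>' matches one or more trailing tokens.
    Matching is case-sensitive, like NATS itself.
    """
    p_tokens = pattern.split(".")
    s_tokens = subject.split(".")
    for i, p in enumerate(p_tokens):
        if p == ">":  # only valid as the last pattern token
            return i == len(p_tokens) - 1 and len(s_tokens) > i
        if i >= len(s_tokens):
            return False
        if p != "*" and p != s_tokens[i]:
            return False
    return len(p_tokens) == len(s_tokens)

assert subject_matches("orders.*", "orders.new")
assert not subject_matches("orders.*", "orders.new.urgent")  # '*' is one token only
assert subject_matches("orders.>", "orders.new.urgent")
assert not subject_matches("orders.new", "Orders.new")       # case-sensitive
```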
## JetStream storage full
Cause: the JetStream volume has reached its maximum capacity (jetstream.size). New messages can no longer be persisted and publications fail.
Solution:
- Check JetStream storage usage:

  ```bash
  nats account info
  ```

- Identify the largest streams:

  ```bash
  nats stream list
  ```

- Purge old messages from streams that allow it:

  ```bash
  nats stream purge <stream-name>
  ```

- Check the stream retention policy — use `limits` with `max-age` to automatically delete old messages:

  ```bash
  nats stream edit <stream-name> --max-age 72h
  ```

- If needed, increase `jetstream.size` in your manifest:

  ```yaml
  # nats.yaml
  jetstream:
    enabled: true
    size: 50Gi
  ```
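When picking a new `jetstream.size`, the steady-state footprint of an age-limited stream is roughly message rate × average message size × retention window, multiplied by the replica count. A back-of-the-envelope helper (our own function, not part of the NATS tooling; it ignores per-message overhead, so treat the result as a floor):

```python
def stream_storage_bytes(msgs_per_sec: float, avg_msg_bytes: int,
                         max_age_hours: float, replicas: int = 1) -> int:
    """Rough steady-state storage for an age-limited stream, across replicas.

    Ignores per-message metadata overhead, so the real usage will be
    somewhat higher than this estimate.
    """
    retained = msgs_per_sec * avg_msg_bytes * max_age_hours * 3600
    return int(retained * replicas)

# 100 msg/s of 1 KiB messages kept 72h on 3 replicas:
total = stream_storage_bytes(100, 1024, 72, replicas=3)
print(f"{total / 1024**3:.1f} GiB")  # prints "74.2 GiB"
```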
## Insufficient memory
Cause: the NATS server consumes more memory than the allocated limit, often due to a high number of connections, large messages (max_payload too high), or in-memory JetStream streams.
Solution:
- Check pod events to confirm an OOMKill:

  ```bash
  kubectl describe pod <nats-pod> | grep -A 5 "Last State"
  ```

- Increase resources allocated to NATS:

  ```yaml
  # nats.yaml
  replicas: 3
  resources:
    cpu: 1
    memory: 2Gi
  ```

- Check the `max_payload` value in `config.merge` — reduce it if very large messages are not needed
- Reapply the manifest:

  ```bash
  kubectl apply -f nats.yaml
  ```
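The same OOMKill check can be done against the pod's JSON status instead of grepping `describe` output. A minimal sketch — the field paths follow the Kubernetes Pod API, and the sample status dict is fabricated for illustration:

```python
def was_oom_killed(pod_status: dict) -> bool:
    """True if any container in the pod last terminated with reason OOMKilled.

    pod_status is the 'status' object of a Pod, e.g. from
    `kubectl get pod <nats-pod> -o json`.
    """
    for cs in pod_status.get("containerStatuses", []):
        terminated = cs.get("lastState", {}).get("terminated", {})
        if terminated.get("reason") == "OOMKilled":
            return True
    return False

# Abridged, fabricated status of a NATS pod that ran out of memory:
sample = {"containerStatuses": [
    {"name": "nats",
     "lastState": {"terminated": {"reason": "OOMKilled", "exitCode": 137}}}
]}
assert was_oom_killed(sample)
```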
## Connection refused
Cause: the client cannot connect to the NATS server. This can be due to pods not running, incorrect credentials, or attempting an external connection without external: true.
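Before digging into credentials, it helps to separate "TCP port unreachable" from "authentication rejected". A small reachability probe sketch (4222 is the default NATS client port; the helper name and the demo listener are ours):

```python
import socket

def can_connect(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds.

    Success only means the port is reachable (rules out network/Service
    issues); bad credentials would still connect at this layer and fail
    later, inside the NATS protocol handshake.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# Demo against a throwaway local listener instead of a real NATS server:
server = socket.socket()
server.bind(("127.0.0.1", 0))               # OS picks a free port
server.listen(1)
port = server.getsockname()[1]
assert can_connect("127.0.0.1", port)       # listener up -> reachable
server.close()
assert not can_connect("127.0.0.1", port)   # listener gone -> refused
```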
Solution:
- Verify NATS pods are in `Running` state:

  ```bash
  kubectl get pods -l app.kubernetes.io/component=nats
  ```

- Check pod logs for errors:

  ```bash
  kubectl logs <nats-pod>
  ```

- Verify user credentials in the Kubernetes Secret (each value under `.data` is base64-encoded and must be decoded individually):

  ```bash
  kubectl get secret <nats-name>-credentials -o jsonpath='{.data}'
  ```

- If connecting from outside the cluster, make sure `external: true` is configured:

  ```yaml
  # nats.yaml
  external: true
  ```

- Test connectivity from a pod within the cluster:

  ```bash
  kubectl exec <nats-pod> -- nats-server --help 2>&1 | head -1
  ```