Version: 3.0.0-alpha (Diátaxis)

Hikube Global Troubleshooting

This guide covers the most common issues encountered on Hikube and their solutions.

1. General diagnosis

Before looking for a specific solution, start with these diagnostic commands:

# Resource status in your namespace
kubectl get all

# Recent events (sorted by date)
kubectl get events --sort-by=.metadata.creationTimestamp

# Detailed description of a resource
kubectl describe <type> <name>

# Pod logs
kubectl logs <pod-name>

# Previous container logs (in case of crash)
kubectl logs <pod-name> --previous
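When triaging a single pod, the commands above can be chained into one helper. This is a minimal sketch, not a Hikube tool: the function name `diagnose_pod` and the 50-line log limit are assumptions made for illustration.

```shell
# Hypothetical helper: runs the diagnostic commands above for one pod.
# Usage: diagnose_pod <pod-name>
diagnose_pod() (
  pod=${1:-}
  if [ -z "$pod" ]; then
    echo "usage: diagnose_pod <pod-name>" >&2
    exit 2
  fi
  echo "== describe =="
  kubectl describe pod "$pod"
  echo "== current logs (last 50 lines) =="
  kubectl logs "$pod" --tail=50
  echo "== previous container logs =="
  # This call fails when the container has never restarted; that is expected.
  kubectl logs "$pod" --previous --tail=50 || true
  echo "== recent events =="
  kubectl get events --sort-by=.metadata.creationTimestamp
)
```

Calling `diagnose_pod <pod-name>` prints the four sections in order, which makes it easy to paste the full picture into a support request.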

2. Pods in error

CrashLoopBackOff

Symptom: The pod restarts in a loop and its status shows CrashLoopBackOff.

Diagnosis:

kubectl describe pod <pod-name>
kubectl logs <pod-name> --previous

Solutions:

Insufficient memory: increase resources.memory or use a higher resourcesPreset
Configuration error: check the logs for errors related to environment variables or configuration files
Missing dependency: verify that required services (database, secrets) are available

Pending

Symptom: The pod remains in Pending state without starting.

Diagnosis:

kubectl describe pod <pod-name>
# Look for the "Events" section at the bottom of the output

Solutions:

Insufficient resources: the cluster does not have enough CPU/memory. Check available nodes with kubectl get nodes and kubectl top nodes
Unbound PVC: the requested persistent volume is not available (see Storage section)
NodeSelector/Affinity: the pod has placement constraints that do not match any node
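To narrow down which of these causes applies, it can help to list only the Pending pods and then read the events attached to one of them. The sketch below uses standard kubectl field selectors; the function names are hypothetical.

```shell
# List the pods of the current namespace that are stuck in Pending.
pending_pods() {
  kubectl get pods --field-selector=status.phase=Pending
}

# Usage: pod_events <pod-name>
# Shows only the events that reference the given pod, oldest first.
pod_events() {
  kubectl get events \
    --field-selector "involvedObject.name=$1" \
    --sort-by=.lastTimestamp
}
```

A scheduling failure usually shows up as a FailedScheduling event whose message names the constraint that could not be satisfied.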

ImagePullBackOff

Symptom: The pod does not start and its status shows ImagePullBackOff or ErrImagePull.

Diagnosis:

kubectl describe pod <pod-name>
# Look for "Failed to pull image" in the events

Solutions:

Image not found: check the image name and tag in your manifest
Private registry: make sure an imagePullSecret is configured
Network issue: check connectivity to the registry
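Before changing the manifest, it is worth confirming which image reference and pull secrets the pod actually carries. A minimal sketch with hypothetical function names, relying only on standard pod fields:

```shell
# Usage: pod_images <pod-name>
# Prints one "container<TAB>image" line per container.
pod_images() {
  kubectl get pod "$1" \
    -o jsonpath='{range .spec.containers[*]}{.name}{"\t"}{.image}{"\n"}{end}'
}

# Usage: pod_pull_secrets <pod-name>
# Prints the names of the imagePullSecrets attached to the pod, if any.
pod_pull_secrets() {
  kubectl get pod "$1" -o jsonpath='{.spec.imagePullSecrets[*].name}'
}
```

If `pod_pull_secrets` prints nothing for an image hosted on a private registry, the missing secret is the likely cause.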

OOMKilled

Symptom: The pod is killed with exit code 137 and reason OOMKilled.

Diagnosis:

kubectl describe pod <pod-name>
# Look for "Last State: Terminated - Reason: OOMKilled"

Solutions:

Increase the memory limit in resources.memory or switch to a higher resourcesPreset
Check if the application has a memory leak by monitoring consumption with kubectl top pod
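The termination reason can also be read directly from the pod status rather than scanned by eye in the describe output. The sketch below assumes nothing beyond the standard `lastState.terminated` fields; the function names are hypothetical.

```shell
# Usage: last_termination <pod-name>
# Prints one "container reason exitCode" line per container; reason and
# exit code are empty for containers that have never been terminated.
last_termination() {
  kubectl get pod "$1" -o jsonpath='{range .status.containerStatuses[*]}{.name}{" "}{.lastState.terminated.reason}{" "}{.lastState.terminated.exitCode}{"\n"}{end}'
}

# Usage: was_oom_killed <pod-name>
# Succeeds if any container of the pod was last terminated as OOMKilled.
was_oom_killed() {
  last_termination "$1" | grep -q ' OOMKilled '
}
```

For example, `was_oom_killed <pod-name> && echo "raise the memory limit"` turns the check into a one-liner.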

3. Cluster access

Invalid kubeconfig

Symptom: error: You must be logged in to the server (Unauthorized)

Diagnosis:

# Check the kubeconfig file being used
echo $KUBECONFIG
kubectl config current-context

Solutions:

Regenerate the kubeconfig from your Hikube cluster:

kubectl get secret <cluster-name>-admin-kubeconfig \
  -o go-template='{{ printf "%s\n" (index .data "super-admin.conf" | base64decode) }}' \
  > my-cluster-kubeconfig.yaml
# Use an absolute path so the setting still works after changing directory
export KUBECONFIG="$PWD/my-cluster-kubeconfig.yaml"

Verify that the KUBECONFIG variable points to the correct file
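A frequent cause of this error is an empty or misplaced kubeconfig file, for instance when the Secret name was wrong and the redirect produced an empty file. The helper below is a sketch with a hypothetical name, `use_kubeconfig`, that refuses to export such a file.

```shell
# Usage: use_kubeconfig <path>
# Exports KUBECONFIG as an absolute path after checking the file is non-empty.
use_kubeconfig() {
  if [ ! -s "$1" ]; then
    echo "kubeconfig '$1' is missing or empty" >&2
    return 1
  fi
  # Resolve to an absolute path so the setting survives a later `cd`.
  KUBECONFIG="$(cd "$(dirname "$1")" && pwd)/$(basename "$1")"
  export KUBECONFIG
}
```

After `use_kubeconfig my-cluster-kubeconfig.yaml`, running `kubectl config current-context` confirms which context is active.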

Expired certificate

Symptom: Unable to connect to the server: x509: certificate has expired

Diagnosis:

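# Validity dates of the CA certificate; use '{.users[0].user.client-certificate-data}' to check the client certificate instead
kubectl config view --raw --minify -o jsonpath='{.clusters[0].cluster.certificate-authority-data}' | base64 -d | openssl x509 -noout -dates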

Solution: Retrieve an up-to-date kubeconfig from the cluster Secret (see above).

Connection refused

Symptom: The connection to the server was refused

Diagnosis:

# Test connectivity
kubectl cluster-info

Solutions:

Check that the cluster is in Ready state: kubectl get kubernetes <cluster-name>
Verify that the control plane is accessible from your network
If you are using a VPN, make sure it is active

4. Storage

PVC in Pending state

Symptom: The PVC remains in Pending and dependent pods do not start.

Diagnosis:

kubectl get pvc
kubectl describe pvc <pvc-name>

Solutions:

Invalid StorageClass: check that the specified storageClass exists with kubectl get storageclass
Insufficient capacity: reduce the requested size or contact support to increase quotas
Empty StorageClass: with storageClass: "", the cluster's default class is used. If the PVC still does not bind, set storageClass: replicated explicitly
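The first check above can be automated by reading the class that the PVC requests and looking it up. This is a sketch with hypothetical function names; it relies only on the standard `spec.storageClassName` field of a PVC.

```shell
# Usage: pvc_storage_class <pvc-name>
# Prints the StorageClass requested by the PVC (empty when none is set).
pvc_storage_class() {
  kubectl get pvc "$1" -o jsonpath='{.spec.storageClassName}'
}

# Usage: check_pvc_storage_class <pvc-name>
# Fails when the PVC names a StorageClass that does not exist.
check_pvc_storage_class() (
  class=$(pvc_storage_class "$1") || exit 1
  if [ -z "$class" ]; then
    echo "PVC $1 requests no explicit StorageClass"
    exit 0
  fi
  if kubectl get storageclass "$class" >/dev/null 2>&1; then
    echo "StorageClass $class exists"
  else
    echo "StorageClass $class not found" >&2
    exit 1
  fi
)
```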

Insufficient disk space

Symptom: Pods crash with errors such as No space left on device.

Diagnosis:

# Check PVC usage
kubectl exec -it <pod-name> -- df -h

Solutions:

Increase the size value in the manifest and reapply
Delete unnecessary data (logs, temporary files)
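To spot the full volume quickly, the `df` output can be filtered by usage percentage. The sketch below reads the POSIX format (`df -P`) from standard input; the function name and the default threshold of 90% are assumptions.

```shell
# Usage: kubectl exec <pod-name> -- df -P | full_filesystems [threshold-percent]
# Prints "mount-point usage" for every filesystem at or above the threshold.
full_filesystems() {
  awk -v limit="${1:-90}" '
    NR > 1 {                      # skip the header line
      use = $5
      sub(/%$/, "", use)          # "95%" -> "95"
      if (use + 0 >= limit + 0) print $6, $5
    }
  '
}
```

Mount points that contain spaces are not handled by this sketch.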

5. Network

Service not accessible

Symptom: Unable to connect to the service from outside or between pods.

Diagnosis:

# Check that the service exists and has an endpoint
kubectl get svc
kubectl get endpoints <service-name>

# Test connectivity from a pod
kubectl run test-net --image=busybox --rm -it --restart=Never -- wget -qO- http://<service-name>:<port>

Solutions:

No endpoint: the service selector labels do not match any pod
External not enabled: add external: true in the manifest to create a LoadBalancer
Incorrect port: check that the service port matches the port exposed by the application
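When a service has no endpoint, comparing its selector with the labels of the running pods usually reveals the mismatch. The following sketch, with hypothetical function names, turns the selector into the `key=value` form that `kubectl get pods -l` accepts.

```shell
# Usage: service_selector <service-name>
# Prints the Service selector as "key=value,key=value" (empty if it has none).
service_selector() (
  raw=$(kubectl get svc "$1" \
    -o go-template='{{range $k, $v := .spec.selector}}{{$k}}={{$v}},{{end}}') || exit 1
  # Drop the trailing comma left by the template loop.
  printf '%s\n' "${raw%,}"
)

# Usage: service_pods <service-name>
# Lists the pods that the Service selector currently matches.
service_pods() (
  selector=$(service_selector "$1") || exit 1
  if [ -z "$selector" ]; then
    echo "service $1 has no selector" >&2
    exit 1
  fi
  kubectl get pods -l "$selector" --show-labels
)
```

If `service_pods` lists no pods, either the selector or the pod labels need to be corrected.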

DNS not resolved

Symptom: Could not resolve host when accessing a service by its name.

Diagnosis:

# Check cluster DNS
kubectl run test-dns --image=busybox --rm -it --restart=Never -- nslookup <service-name>

# Check CoreDNS pods
kubectl get pods -n kube-system -l k8s-app=kube-dns

Solutions:

Use the full DNS name: <service>.<namespace>.svc.cluster.local
Check that the CoreDNS pods are in Running state
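As an illustration of the full DNS name format, the sketch below builds it from a service and namespace and shows how to inspect the resolver configuration of a pod. Both function names are hypothetical.

```shell
# Usage: svc_fqdn <service> <namespace>
# Prints the fully qualified in-cluster DNS name of a service.
svc_fqdn() {
  printf '%s.%s.svc.cluster.local\n' "$1" "$2"
}

# Usage: pod_resolv_conf <pod-name>
# Shows the nameserver and search domains the pod actually uses.
pod_resolv_conf() {
  kubectl exec "$1" -- cat /etc/resolv.conf
}
```

The name can then be passed to the test pod, for example `kubectl run test-dns --image=busybox --rm -it --restart=Never -- nslookup "$(svc_fqdn <service> <namespace>)"`.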

Ingress returns 404 or 502

Symptom: The Ingress URL returns a 404 (Not Found) or 502 (Bad Gateway) error.

Diagnosis:

kubectl describe ingress <ingress-name>
kubectl logs -n ingress-nginx deploy/ingress-nginx-controller

Solutions:

404: check that the Ingress path and host match your configuration
502: the backend service is not responding. Check that the backend pods are in Running state and that the port is correct
Missing IngressClass: add ingressClassName: nginx in the Ingress spec
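The IngressClass and backend checks can be scripted against the Ingress object itself. A minimal sketch with hypothetical function names, assuming the Ingress uses the `networking.k8s.io/v1` schema (older objects may set the class through an annotation instead, which this sketch does not inspect):

```shell
# Usage: check_ingress_class <ingress-name>
# Fails when spec.ingressClassName is not set.
check_ingress_class() (
  class=$(kubectl get ingress "$1" -o jsonpath='{.spec.ingressClassName}') || exit 1
  if [ -z "$class" ]; then
    echo "ingress $1 has no ingressClassName" >&2
    exit 1
  fi
  echo "ingress $1 uses class $class"
)

# Usage: ingress_backend_services <ingress-name>
# Prints the names of the Services referenced by the Ingress rules.
ingress_backend_services() {
  kubectl get ingress "$1" \
    -o jsonpath='{.spec.rules[*].http.paths[*].backend.service.name}'
}
```

Each service name printed by `ingress_backend_services` can be passed to `kubectl get endpoints` to confirm that the backend has ready pods.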

6. Databases

Connection refused

Symptom: Connection refused when attempting to connect to the database.

Diagnosis:

# Check the database pod status
kubectl get pods | grep <database-name>

# Check services
kubectl get svc | grep <database-name>

Solutions:

Check that the database pods are in Running state
Check the credentials: kubectl get secret <name>-auth -o json | jq -r '.data | to_entries[] | "\(.key): \(.value|@base64d)"'
If external: false, use kubectl port-forward to connect locally

Replication lag

Symptom: Replicas have significant replication lag compared to the master.

Diagnosis:

# Redis - Check replication
# $REDIS_PASSWORD is expanded by your local shell, so set it before running the command
kubectl exec -it rfr-redis-<name>-0 -- redis-cli -a "$REDIS_PASSWORD" INFO replication

# PostgreSQL - Check lag
kubectl exec -it <name>-1 -- psql -c "SELECT * FROM pg_stat_replication;"

Solutions:

Increase the resources (CPU/memory) for the replicas
Check the network load between datacenters
Reduce write load if the lag persists
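For Redis, the lag can be quantified from the `INFO replication` output of the master by comparing `master_repl_offset` with the offset reported for each replica. The filter below is a sketch; the function name is hypothetical and the output must come from the pod that currently holds the master role.

```shell
# Usage: <INFO replication output> | redis_replica_gap
# Prints "replica ip gap" per connected replica, where gap is
# master_repl_offset minus the replica's offset, in bytes.
redis_replica_gap() {
  tr -d '\r' | awk '
    /^master_repl_offset:/ { master = substr($0, index($0, ":") + 1) }
    /^slave[0-9]+:/ {
      name = substr($0, 1, index($0, ":") - 1)
      n = split(substr($0, index($0, ":") + 1), fields, ",")
      for (i = 1; i <= n; i++) {
        eq = index(fields[i], "=")
        key = substr(fields[i], 1, eq - 1)
        val = substr(fields[i], eq + 1)
        if (key == "ip") ip[name] = val
        if (key == "offset") offset[name] = val
      }
    }
    END {
      for (name in offset) printf "%s %s %d\n", name, ip[name], master - offset[name]
    }
  ' | sort
}
```

It can be fed directly from the diagnostic command, without `-it` because the output is piped: `kubectl exec rfr-redis-<name>-0 -- redis-cli -a "$REDIS_PASSWORD" INFO replication | redis_replica_gap`. A gap that keeps growing between two runs indicates that the replica is falling behind.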

Failover not triggered

Symptom: The master is down but no replica is promoted.

Diagnosis:

# Redis - Check Sentinel
kubectl exec -it rfs-redis-<name>-<id> -- redis-cli -p 26379 SENTINEL masters

# Check events
kubectl get events --sort-by=.metadata.creationTimestamp | grep <database-name>

Solutions:

Check that replicas > 1 in the manifest (failover requires at least one replica in addition to the master)
Verify that Sentinel pods (Redis) or the operator are in Running state
Check the operator logs for errors

7. Messaging (NATS, RabbitMQ)

Producer/consumer disconnected

Symptom: Clients lose connection to the message broker.

Diagnosis:

# Check broker pod status
kubectl get pods | grep -E 'nats|rabbitmq'

# Check logs
kubectl logs <broker-pod-name>

Solutions:

Check that the broker pods are in Running state
Implement automatic reconnection logic on the client side
Check the configured connection limits
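The reconnection logic mentioned above usually follows a retry pattern with exponential backoff. Real applications should rely on the reconnect options of their NATS or RabbitMQ client library; the shell sketch below only illustrates the pattern, and the function name and parameters are hypothetical.

```shell
# Usage: retry_with_backoff <max-attempts> <initial-delay-seconds> <command...>
# Runs the command until it succeeds, doubling the delay after each failure.
retry_with_backoff() (
  max_attempts=$1
  delay=$2
  shift 2
  attempt=1
  while :; do
    if "$@"; then
      exit 0
    fi
    if [ "$attempt" -ge "$max_attempts" ]; then
      echo "giving up after $attempt attempts" >&2
      exit 1
    fi
    sleep "$delay"
    delay=$((delay * 2))
    attempt=$((attempt + 1))
  done
)
```

For example, `retry_with_backoff 5 1 <connect-command>` tries up to five times, waiting 1, 2, 4, and 8 seconds between attempts.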

Lost messages

Symptom: Sent messages are never received by consumers.

Diagnosis:

# RabbitMQ - Check queues
kubectl exec -it <rabbitmq-pod> -- rabbitmqctl list_queues name messages consumers

# NATS - Check JetStream streams
kubectl exec -it <nats-pod> -- nats stream ls

Solutions:

RabbitMQ: use Quorum Queues to ensure message durability
NATS: enable JetStream for message persistence
Verify that consumers are connected and active
Make sure queues/subjects exist before sending messages
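For RabbitMQ, queues that accumulate messages without any consumer are a common reason for messages that never arrive. The filter below is a sketch that reads the output of the `list_queues` command shown above; the function name is hypothetical.

```shell
# Usage: kubectl exec <rabbitmq-pod> -- rabbitmqctl list_queues name messages consumers | stalled_queues
# Prints the queues that hold messages but have no consumer attached.
stalled_queues() {
  awk -F '\t' '
    # Informational and header lines have no numeric columns and are skipped.
    $2 ~ /^[0-9]+$/ && $3 ~ /^[0-9]+$/ && $2 > 0 && $3 == 0 {
      print $1 ": " $2 " messages, no consumers"
    }
  '
}
```

An empty result means every non-empty queue has at least one consumer, so the cause is more likely on the publishing side (wrong exchange, routing key, or subject).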
