# Hikube Global Troubleshooting
This guide covers the most common issues encountered on Hikube and their solutions.
## 1. General diagnosis
Before looking for a specific solution, start with these diagnostic commands:
```bash
# Resource status in your namespace
kubectl get all

# Recent events (sorted by date)
kubectl get events --sort-by=.metadata.creationTimestamp

# Detailed description of a resource
kubectl describe <type> <name>

# Pod logs
kubectl logs <pod-name>

# Previous container logs (in case of a crash)
kubectl logs <pod-name> --previous
```
## 2. Pods in error

### CrashLoopBackOff

Symptom: The pod restarts in a loop; the status shows `CrashLoopBackOff`.
Diagnosis:
```bash
kubectl describe pod <pod-name>
kubectl logs <pod-name> --previous
```
Solutions:
- Insufficient memory: increase `resources.memory` or use a higher `resourcesPreset`
- Configuration error: check environment variables and configuration files in the logs
- Missing dependency: verify that required services (database, secrets) are available
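For the memory case, the change is typically a small manifest edit. A minimal sketch, assuming the `resources.memory` and `resourcesPreset` fields mentioned above; the surrounding `apiVersion`/`kind` depend on your application type and are omitted here:

```yaml
# Illustrative fragment only - the full manifest depends on your application type.
spec:
  resources:
    memory: 1Gi        # example value; raise it above the observed peak usage
  # ...or switch to a larger preset instead:
  # resourcesPreset: large   # preset name is an assumption; check the catalog for valid values
```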
### Pending

Symptom: The pod remains in `Pending` state without starting.
Diagnosis:
```bash
kubectl describe pod <pod-name>
# Look for the "Events" section at the bottom of the output
```
Solutions:
- Insufficient resources: the cluster does not have enough CPU/memory. Check available nodes with `kubectl get nodes` and `kubectl top nodes`
- Unbound PVC: the requested persistent volume is not available (see the Storage section)
- NodeSelector/Affinity: the pod has placement constraints that do not match any node
### ImagePullBackOff

Symptom: The pod does not start; the status shows `ImagePullBackOff` or `ErrImagePull`.
Diagnosis:
```bash
kubectl describe pod <pod-name>
# Look for "Failed to pull image" in the events
```
Solutions:
- Image not found: check the image name and tag in your manifest
- Private registry: make sure an `imagePullSecret` is configured
- Network issue: check connectivity to the registry
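For the private-registry case, the standard Kubernetes pattern is to create a registry secret and reference it from the pod spec. A minimal sketch (`regcred` and the pod name are placeholder names):

```yaml
# Create the secret first:
#   kubectl create secret docker-registry regcred \
#     --docker-server=<registry> --docker-username=<user> --docker-password=<password>
apiVersion: v1
kind: Pod
metadata:
  name: private-image-demo      # placeholder name
spec:
  imagePullSecrets:
    - name: regcred             # must match the secret created above
  containers:
    - name: app
      image: <registry>/<image>:<tag>
```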
### OOMKilled

Symptom: The pod is killed with exit code 137 and reason `OOMKilled`.
Diagnosis:
```bash
kubectl describe pod <pod-name>
# Look for "Last State: Terminated - Reason: OOMKilled"
```
Solutions:
- Increase the memory limit in `resources.memory` or switch to a higher `resourcesPreset`
- Check whether the application has a memory leak by monitoring consumption with `kubectl top pod`
## 3. Cluster access

### Invalid kubeconfig

Symptom: `error: You must be logged in to the server (Unauthorized)`
Diagnosis:
```bash
# Check the kubeconfig file being used
echo $KUBECONFIG
kubectl config current-context
```
Solutions:
- Regenerate the kubeconfig from your Hikube cluster:

  ```bash
  kubectl get secret <cluster-name>-admin-kubeconfig \
    -o go-template='{{ printf "%s\n" (index .data "super-admin.conf" | base64decode) }}' \
    > my-cluster-kubeconfig.yaml
  export KUBECONFIG=my-cluster-kubeconfig.yaml
  ```

- Verify that the `KUBECONFIG` variable points to the correct file
### Expired certificate

Symptom: `Unable to connect to the server: x509: certificate has expired`
Diagnosis:
```bash
kubectl config view --raw -o jsonpath='{.clusters[0].cluster.certificate-authority-data}' \
  | base64 -d | openssl x509 -text -noout | grep -A2 Validity
```
Solution: Retrieve an up-to-date kubeconfig from the cluster Secret (see above).
### Connection refused

Symptom: `The connection to the server was refused`
Diagnosis:
```bash
# Test connectivity
kubectl cluster-info
```
Solutions:
- Check that the cluster is in `Ready` state: `kubectl get kubernetes <cluster-name>`
- Verify that the control plane is accessible from your network
- If you are using a VPN, make sure it is active
## 4. Storage

### PVC in Pending state

Symptom: The PVC remains in `Pending` and dependent pods do not start.
Diagnosis:
```bash
kubectl get pvc
kubectl describe pvc <pvc-name>
```
Solutions:
- Invalid StorageClass: check that the specified `storageClass` exists with `kubectl get storageclass`
- Insufficient capacity: reduce the requested size or contact support to increase quotas
- Empty StorageClass: with `storageClass: ""`, the default class is used. Try `storageClass: replicated` explicitly
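If you are writing a raw PersistentVolumeClaim rather than an application manifest, the equivalent standard field is `storageClassName`. A minimal sketch (name and size are placeholders):

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: data-pvc                # placeholder name
spec:
  storageClassName: replicated  # list available classes with: kubectl get storageclass
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 10Gi             # placeholder size
```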
### Insufficient disk space

Symptom: Pods crash with errors such as `No space left on device`.
Diagnosis:
```bash
# Check PVC usage
kubectl exec -it <pod-name> -- df -h
```
Solutions:
- Increase the `size` value in the manifest and reapply it
- Delete unnecessary data (logs, temporary files)
## 5. Network

### Service not accessible

Symptom: Unable to connect to the service from outside the cluster or between pods.
Diagnosis:
```bash
# Check that the service exists and has endpoints
kubectl get svc
kubectl get endpoints <service-name>

# Test connectivity from a pod
kubectl run test-net --image=busybox --rm -it -- wget -qO- http://<service-name>:<port>
```
Solutions:
- No endpoint: the service `selector` labels do not match any pod
- External access not enabled: add `external: true` in the manifest to create a LoadBalancer
- Incorrect port: check that the service port matches the port exposed by the application
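The `external: true` fix is a one-line manifest change. A hedged sketch, assuming the field sits at the top level of the application's `spec` as described above (the surrounding `apiVersion`/`kind` depend on your application type):

```yaml
# Illustrative fragment only - the full manifest depends on your application type.
spec:
  external: true   # exposes the application through a LoadBalancer service
```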
### DNS not resolving

Symptom: `Could not resolve host` when accessing a service by its name.
Diagnosis:
```bash
# Check cluster DNS
kubectl run test-dns --image=busybox --rm -it -- nslookup <service-name>

# Check CoreDNS pods
kubectl get pods -n kube-system -l k8s-app=kube-dns
```
Solutions:
- Use the fully qualified DNS name: `<service>.<namespace>.svc.cluster.local`
- Check that the CoreDNS pods are in `Running` state
### Ingress returns 404 or 502
Symptom: The Ingress URL returns a 404 (Not Found) or 502 (Bad Gateway) error.
Diagnosis:
```bash
kubectl describe ingress <ingress-name>
kubectl logs -n ingress-nginx deploy/ingress-nginx-controller
```
Solutions:
- 404: check that the Ingress `path` and `host` match your configuration
- 502: the backend service is not responding. Check that the backend pods are in `Running` state and that the port is correct
- Missing IngressClass: add `ingressClassName: nginx` in the Ingress spec
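Putting the 404 and IngressClass fixes together, a minimal standard Ingress looks like this (host, service name, and port are placeholders):

```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: app-ingress                 # placeholder name
spec:
  ingressClassName: nginx           # the IngressClass the solution above asks for
  rules:
    - host: app.example.com         # must match the Host header you actually send
      http:
        paths:
          - path: /                 # must match the URL path you request
            pathType: Prefix
            backend:
              service:
                name: app           # must point at a Service with ready endpoints
                port:
                  number: 8080      # placeholder port
```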
## 6. Databases

### Connection refused

Symptom: `Connection refused` when attempting to connect to the database.
Diagnosis:
```bash
# Check the database pod status
kubectl get pods | grep <database-name>

# Check services
kubectl get svc | grep <database-name>
```
Solutions:
- Check that the database pods are in `Running` state
- Check the credentials: `kubectl get secret <name>-auth -o json | jq -r '.data | to_entries[] | "\(.key): \(.value|@base64d)"'`
- If `external: false`, use `kubectl port-forward` to connect locally
### Replication lag
Symptom: Replicas have significant replication lag compared to the master.
Diagnosis:
```bash
# Redis - check replication
kubectl exec -it rfr-redis-<name>-0 -- redis-cli -a "$REDIS_PASSWORD" INFO replication

# PostgreSQL - check lag
kubectl exec -it <name>-1 -- psql -c "SELECT * FROM pg_stat_replication;"
```
Solutions:
- Increase the resources (CPU/memory) for the replicas
- Check the network load between datacenters
- Reduce write load if the lag persists
### Failover not triggered
Symptom: The master is down but no replica is promoted.
Diagnosis:
```bash
# Redis - check Sentinel
kubectl exec -it rfs-redis-<name>-<id> -- redis-cli -p 26379 SENTINEL masters

# Check events
kubectl get events --sort-by=.metadata.creationTimestamp | grep <database-name>
```
Solutions:
- Check that `replicas > 1` in the manifest (failover requires at least one replica)
- Verify that the Sentinel pods (Redis) or the operator pods are in `Running` state
- Check the operator logs for errors
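The `replicas` requirement is again a small manifest change. A hedged sketch (the surrounding `apiVersion`/`kind` depend on the database type and are omitted here):

```yaml
# Illustrative fragment only - the full manifest depends on the database type.
spec:
  replicas: 2   # failover needs at least one replica available to promote
```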
## 7. Messaging (NATS, RabbitMQ)

### Producer/consumer disconnected
Symptom: Clients lose connection to the message broker.
Diagnosis:
```bash
# Check broker pod status
kubectl get pods | grep -E 'nats|rabbitmq'

# Check logs
kubectl logs <broker-pod-name>
```
Solutions:
- Check that the broker pods are in `Running` state
- Implement automatic reconnection logic on the client side
- Check the configured connection limits
### Lost messages
Symptom: Sent messages are never received by consumers.
Diagnosis:
```bash
# RabbitMQ - check queues
kubectl exec -it <rabbitmq-pod> -- rabbitmqctl list_queues name messages consumers

# NATS - check JetStream streams
kubectl exec -it <nats-pod> -- nats stream ls
```
Solutions:
- RabbitMQ: use Quorum Queues to ensure message durability
- NATS: enable JetStream for message persistence
- Verify that consumers are connected and active
- Make sure queues/subjects exist before sending messages