Version: 2.0.2

API Reference - GPU

This reference details the APIs for using GPUs on Hikube, whether with virtual machines or Kubernetes clusters.


πŸ–₯️ GPU with Virtual Machines

VirtualMachine API

apiVersion: apps.cozystack.io/v1alpha1
kind: VirtualMachine
metadata:
  name: vm-gpu
spec:
  running: true
  instanceProfile: ubuntu
  instanceType: u1.xlarge
  gpus:
    - name: "nvidia.com/AD102GL_L40S"

GPU Parameters for VM

| Parameter | Type | Description | Required |
|-----------|------|-------------|----------|
| gpus | []GPU | List of GPUs to attach | βœ… |
| gpus[].name | string | NVIDIA GPU type | βœ… |

Available GPU Types

# GPU for inference and development
gpus:
  - name: "nvidia.com/AD102GL_L40S"

# GPU for ML training
gpus:
  - name: "nvidia.com/GA100_A100_PCIE_80GB"

# GPU for LLM and exascale computing
gpus:
  - name: "nvidia.com/H100_94GB"

Hardware Specifications

| GPU | Architecture | Memory | Performance |
|------|--------------|--------|-------------|
| L40S | Ada Lovelace | 48 GB GDDR6 | 362 TOPS (INT8) |
| A100 | Ampere | 80 GB HBM2e | 312 TOPS (INT8) |
| H100 | Hopper | 94 GB HBM3 | 1979 TOPS (INT8) |

Complete GPU VM Example

apiVersion: apps.cozystack.io/v1alpha1
kind: VirtualMachine
metadata:
  name: ai-workstation
spec:
  running: true
  instanceProfile: ubuntu
  instanceType: u1.2xlarge # 8 vCPU, 32 GB RAM
  gpus:
    - name: "nvidia.com/GA100_A100_PCIE_80GB"
  systemDisk:
    size: 200Gi
    storageClass: replicated
  external: true
  externalMethod: PortList
  externalPorts:
    - 22
    - 8888 # Jupyter
  cloudInit: |
    #cloud-config
    users:
      - name: ubuntu
        sudo: ALL=(ALL) NOPASSWD:ALL

    packages:
      - python3-pip
      - build-essential

    runcmd:
      # NVIDIA drivers
      - wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64/cuda-keyring_1.0-1_all.deb
      - dpkg -i cuda-keyring_1.0-1_all.deb
      - apt-get update
      - apt-get install -y cuda-toolkit nvidia-driver-535

      # PyTorch with CUDA
      - pip3 install torch torchvision --index-url https://download.pytorch.org/whl/cu121
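Once cloud-init finishes (a reboot may be needed before the driver loads), a quick check from inside the VM confirms the stack end to end; a minimal sketch, assuming the packages above installed cleanly:

# Driver sees the GPU
nvidia-smi

# The CUDA build of PyTorch sees it too
python3 -c "import torch; print(torch.cuda.is_available())"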

☸️ GPU with Kubernetes

Kubernetes API with GPU Workers

apiVersion: apps.cozystack.io/v1alpha1
kind: Kubernetes
metadata:
  name: cluster-gpu
spec:
  controlPlane:
    replicas: 1

  nodeGroups:
    gpu-workers:
      minReplicas: 1
      maxReplicas: 5
      instanceType: "u1.xlarge"
      ephemeralStorage: 100Gi
      gpus:
        - name: "nvidia.com/AD102GL_L40S"

GPU Parameters for NodeGroups

| Parameter | Type | Description | Required |
|-----------|------|-------------|----------|
| nodeGroups.<name>.gpus | []GPU | GPUs for workers | ❌ |
| gpus[].name | string | NVIDIA GPU type | βœ… |

Multi-GPU Configuration

nodeGroups:
  gpu-intensive:
    minReplicas: 1
    maxReplicas: 2
    instanceType: "u1.4xlarge" # 16 vCPU, 64 GB RAM
    gpus:
      - name: "nvidia.com/GA100_A100_PCIE_80GB"
      - name: "nvidia.com/GA100_A100_PCIE_80GB"
      - name: "nvidia.com/GA100_A100_PCIE_80GB"
      - name: "nvidia.com/GA100_A100_PCIE_80GB"

Usage in Pods

apiVersion: v1
kind: Pod
metadata:
  name: ml-training
spec:
  containers:
    - name: trainer
      image: pytorch/pytorch:2.0.1-cuda11.7-cudnn8-runtime
      resources:
        limits:
          nvidia.com/gpu: 1
        requests:
          nvidia.com/gpu: 1
      command:
        - python
        - train.py

Multi-GPU Job

apiVersion: batch/v1
kind: Job
metadata:
  name: distributed-training
spec:
  template:
    spec:
      containers:
        - name: trainer
          image: pytorch/pytorch:2.0.1-cuda11.7-cudnn8-runtime
          resources:
            limits:
              nvidia.com/gpu: 4
            requests:
              nvidia.com/gpu: 4
          env:
            - name: CUDA_VISIBLE_DEVICES
              value: "0,1,2,3"
      restartPolicy: Never
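The Job reserves four GPUs but leaves the command to the image. For a PyTorch DDP entrypoint (train.py here is a hypothetical script baked into the image), a common pattern is to launch one worker process per allocated GPU with torchrun, e.g. by adding to the container:

command:
  - torchrun
  - --standalone
  - --nproc_per_node=4   # one DDP worker process per allocated GPU
  - train.py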

πŸ“‹ Approach Comparison

VM GPU vs Kubernetes GPU

| Aspect | VM GPU | Kubernetes GPU |
|--------|--------|----------------|
| Allocation | 1 GPU = 1 VM (exclusive) | 1+ GPUs per worker (shareable) |
| Isolation | Complete, at the VM level | Namespace/Pod level |
| Scaling | Vertical (more GPUs) | Horizontal + vertical |
| Management | Manual, via YAML | Orchestrated by K8s |
| Sharing | No | Yes (between pods) |
| Overhead | Minimal | Orchestration overhead |

When to use each approach

Choose VM GPU for:

  • Legacy non-containerized applications
  • Need for direct and complete GPU access
  • Development and prototyping
  • Monolithic workloads
  • Graphics applications (rendering, CAD)

Choose Kubernetes GPU for:

  • Containerized applications
  • Workloads requiring automatic scaling
  • Parallel and distributed jobs
  • GPU resource sharing
  • Complex ML/AI pipelines

πŸ”§ Advanced Configuration

Multi-GPU on VM

apiVersion: apps.cozystack.io/v1alpha1
kind: VirtualMachine
metadata:
  name: multi-gpu-vm
spec:
  instanceType: u1.8xlarge # 32 vCPU, 128 GB RAM
  gpus:
    - name: "nvidia.com/H100_94GB"
    - name: "nvidia.com/H100_94GB"
    - name: "nvidia.com/H100_94GB"
    - name: "nvidia.com/H100_94GB"

Specialized GPU NodeGroup

nodeGroups:
  gpu-inference:
    minReplicas: 2
    maxReplicas: 10
    instanceType: "u1.large"
    gpus:
      - name: "nvidia.com/AD102GL_L40S"

  gpu-training:
    minReplicas: 1
    maxReplicas: 3
    instanceType: "u1.4xlarge"
    gpus:
      - name: "nvidia.com/GA100_A100_PCIE_80GB"
      - name: "nvidia.com/GA100_A100_PCIE_80GB"

Pod with Specific GPU

apiVersion: v1
kind: Pod
metadata:
  name: specific-gpu-pod
spec:
  nodeSelector:
    gpu-type: "L40S"
  containers:
    - name: app
      image: nvidia/cuda:12.0.0-runtime-ubuntu20.04
      resources:
        limits:
          nvidia.com/gpu: 1
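The nodeSelector only matches workers that carry the corresponding label; the gpu-type key is illustrative rather than one applied automatically. If your workers are not labeled yet, the label can be added by hand (a sketch, with <node-name> standing in for a real worker node):

# Label a GPU worker so the selector above can match it
kubectl label nodes <node-name> gpu-type=L40S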

βœ… Verification and Monitoring

VM GPU Verification

# Access VM
virtctl ssh ubuntu@vm-gpu

# Check GPUs
nvidia-smi

# Query GPU name, memory, and utilization
nvidia-smi --query-gpu=name,memory.total,utilization.gpu --format=csv

Kubernetes GPU Verification

# See GPU resources on nodes
kubectl describe nodes

# Check GPU allocation
kubectl get nodes -o custom-columns=NAME:.metadata.name,GPU:.status.allocatable.'nvidia\.com/gpu'

# Monitor node resource usage (kubectl top reports CPU/memory, not GPU)
kubectl top nodes

GPU Monitoring in a Pod

# Exec into a pod with GPU
kubectl exec -it <pod-name> -- nvidia-smi

# See GPU metrics
kubectl exec -it <pod-name> -- nvidia-smi --query-gpu=utilization.gpu,memory.used,memory.total --format=csv -l 5

πŸ’‘ Best Practices

For VM GPU:

  • Use the replicated storage class for production
  • Size CPU/RAM according to the GPU count (8-16 vCPU per GPU)
  • Install NVIDIA drivers via cloud-init
  • Stop VMs when unused to optimize costs (see the sketch after this list)
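Stopping and starting can be done with virtctl, the same CLI the verification section uses for SSH; a minimal sketch, assuming the VM from the first example:

# Stop the VM to free its resources while unused
virtctl stop vm-gpu

# Start it again when needed
virtctl start vm-gpu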

For Kubernetes GPU:

  • Configure appropriate resource limits
  • Use nodeSelector or nodeAffinity to target specific GPUs
  • Implement PodDisruptionBudgets for critical workloads (see the sketch after this list)
  • Monitor GPU usage with custom metrics
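As referenced above, a PodDisruptionBudget keeps voluntary disruptions (node drains, scale-downs) from evicting every replica of a critical GPU workload at once. A minimal sketch, where app: gpu-inference is a hypothetical label on your own pods:

apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: gpu-inference-pdb
spec:
  minAvailable: 1        # keep at least one replica running during drains
  selector:
    matchLabels:
      app: gpu-inference # hypothetical label on the GPU pods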

General:

  • L40S for inference/development
  • A100 for standard ML training
  • H100 for LLM and exascale computing
  • Test with L40S before moving to more expensive GPUs