Overview
GPU slicing (time-slicing) enables efficient GPU resource sharing on Amazon EKS clusters, particularly for AI workloads. By dividing GPU access into smaller time intervals, multiple tasks or processes can share GPU resources, leading to cost optimization and improved utilization.
Amazon EKS supports GPU slicing through NVIDIA’s Kubernetes device plugin, which exposes GPU resources to Kubernetes, allowing the scheduler to manage GPU allocation dynamically.
Here’s how to enable GPU slicing on EKS clusters.
Steps to Enable GPU Slicing on EKS Clusters
1. Prepare Your EKS Cluster
Ensure your EKS cluster has NVIDIA GPU-backed EC2 instances. Instance types such as p3.8xlarge (V100 GPUs) or the A100-based p4d family support GPU time-slicing. Use eksctl to set up the cluster.
Example commands with eksctl:
eksctl create nodegroup --name gpu --node-type p3.8xlarge --nodes 1 --cluster <cluster-name>
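The node group command above assumes the cluster already exists. If it does not, a minimal cluster-creation sketch (the cluster name and region are placeholders):
eksctl create cluster --name <cluster-name> --region <region> --without-nodegroup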
Verify the nodes:
kubectl get nodes
2. Install the NVIDIA Device Plugin
Deploy the NVIDIA Kubernetes device plugin, which manages GPU resource allocation.
- Label GPU-enabled nodes:
kubectl label node <node-name> eks-node=gpu
- Install the device plugin with Helm:
helm repo add nvdp https://nvidia.github.io/k8s-device-plugin
helm install nvdp nvdp/nvidia-device-plugin \
  --namespace kube-system \
  --version 0.17.0
Verify the plugin:
kubectl get daemonset -n kube-system | grep nvidia
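As an additional sanity check, you can list the GPU capacity each node now advertises (a quick sketch; at this point each physical GPU is still reported as a single nvidia.com/gpu resource, because time-slicing is not configured yet):
kubectl get nodes -o custom-columns='NAME:.metadata.name,GPU:.status.capacity.nvidia\.com/gpu'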
3. Enable GPU Time-Slicing
Configure the device plugin for time-slicing by creating a ConfigMap
with the desired GPU slices.
Example configuration:
cat << EOF > nvidia-device-plugin.yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: nvidia-device-plugin
  namespace: kube-system
data:
  any: |-
    version: v1
    flags:
      migStrategy: none
    sharing:
      timeSlicing:
        resources:
        - name: nvidia.com/gpu
          replicas: 10 # number of virtual GPUs per physical GPU
EOF
kubectl apply -f nvidia-device-plugin.yaml
Here we advertise 10 virtual GPUs per physical GPU; on a p3.8xlarge with 4 physical GPUs, the node will expose a capacity of 40 nvidia.com/gpu.
Update the plugin with the time-slicing configuration:
helm upgrade --install nvdp nvdp/nvidia-device-plugin \
  --namespace kube-system \
  --set config.name=nvidia-device-plugin \
  --force
Validate the GPU slices:
kubectl get nodes -o json | jq -r '.items[] | select(.status.capacity."nvidia.com/gpu" != null) | {name: .metadata.name, capacity: .status.capacity}'
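Illustrative output for a single p3.8xlarge node (4 physical GPUs × 10 replicas = 40 advertised slices); the node name is a placeholder and the capacity object is abbreviated, so your values will differ:
{
  "name": "ip-192-168-xx-xx.us-west-2.compute.internal",
  "capacity": {
    "cpu": "32",
    "nvidia.com/gpu": "40",
    ...
  }
}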
Using NVIDIA GPU Operator
Time-slicing can also be enabled by applying the configuration through the NVIDIA GPU Operator. Prior to installing the operator, create a ConfigMap similar to the one from the Enable GPU Time-Slicing section; it applies the configuration cluster-wide, and its name must match the value passed to devicePlugin.config.name below (time-slicing-config in this example). (Note: node-specific configurations are also possible; see the NVIDIA GPU Operator documentation.)
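A minimal sketch of such a ConfigMap, assuming the name time-slicing-config and 4 replicas per physical GPU (the replica count is only an example; tune it to your workload):
cat << EOF > time-slicing-config.yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: time-slicing-config
data:
  any: |-
    version: v1
    flags:
      migStrategy: none
    sharing:
      timeSlicing:
        resources:
        - name: nvidia.com/gpu
          replicas: 4 # number of virtual GPUs per physical GPU
EOF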
kubectl create namespace gpu-operator
kubectl apply -f time-slicing-config.yaml -n gpu-operator
Afterwards, install the operator:
helm repo add nvidia https://helm.ngc.nvidia.com/nvidia && helm repo update
helm install gpu-operator nvidia/gpu-operator \
  -n gpu-operator \
  --version=v24.9.0 \
  --set devicePlugin.config.name=time-slicing-config
You can confirm that the slicing configuration was applied successfully by describing the node and looking for the following labels and capacity values (here, 4 physical GPUs with replicas: 4 yield a capacity of 16 nvidia.com/gpu):
kubectl describe node <node-name>
...
Labels:
    nvidia.com/gpu.count=4
    nvidia.com/gpu.product=Tesla-T4-SHARED
    nvidia.com/gpu.replicas=4
Capacity:
  nvidia.com/gpu: 16
...
Allocatable:
  nvidia.com/gpu: 16
...
Validating GPU slicing by deploying a workload
Create a deployment.yaml with the following manifest:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: time-slicing-verification
  labels:
    app: time-slicing-verification
spec:
  replicas: 5
  selector:
    matchLabels:
      app: time-slicing-verification
  template:
    metadata:
      labels:
        app: time-slicing-verification
    spec:
      tolerations:
        - key: nvidia.com/gpu
          operator: Exists
          effect: NoSchedule
      hostPID: true
      containers:
        - name: cuda-sample-vector-add
          image: "nvcr.io/nvidia/k8s/cuda-sample:vectoradd-cuda11.7.1-ubuntu20.04"
          command: ["/bin/bash", "-c", "--"]
          args:
            - while true; do /cuda-samples/vectorAdd; done
          resources:
            limits:
              nvidia.com/gpu: 1
Deploy it:
kubectl apply -f deployment.yaml
Check out the logs from the pods.
kubectl logs deploy/time-slicing-verification
Found 5 pods, using pod/time-slicing-verification-7cdc7f87c5-s8qwk
[Vector addition of 50000 elements]
Copy input data from the host memory to the CUDA device
CUDA kernel launch with 196 blocks of 256 threads
Copy output data from the CUDA device to the host memory
Test PASSED
Done
[Vector addition of 50000 elements]
Copy input data from the host memory to the CUDA device
CUDA kernel launch with 196 blocks of 256 threads
Copy output data from the CUDA device to the host memory
...
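To see that all five replicas were scheduled even though each requests nvidia.com/gpu: 1, list the pods and the node they landed on (a quick check using the label from the manifest above):
kubectl get pods -l app=time-slicing-verification -o wide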
Karpenter and GPU device plugin
The NVIDIA device plugin registers GPUs as extended resources (e.g., nvidia.com/gpu). This means that the Kubernetes scheduler is aware of the availability of GPUs on a node but does not directly control how GPUs are physically shared or allocated.
Karpenter, as a cluster autoscaler, provisions nodes based on pod resource requests. Since GPUs are registered as extended resources, Karpenter treats them as abstract quantities, not tied to specific physical configurations like MIG or time-slicing. Karpenter will launch GPU-enabled nodes if pods require nvidia.com/gpu, but it won’t understand advanced GPU-sharing configurations (e.g., splitting a single GPU into multiple logical slices).
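For illustration, a minimal NodePool sketch, assuming Karpenter v1 APIs and an existing EC2NodeClass named default. With a configuration like this, Karpenter would launch a GPU instance from the g or p families when pending pods request nvidia.com/gpu; how many slices those pods can then consume is still governed by the device plugin's time-slicing configuration on the node:
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: gpu
spec:
  template:
    spec:
      requirements:
        - key: karpenter.k8s.aws/instance-category
          operator: In
          values: ["g", "p"]
      taints:
        - key: nvidia.com/gpu
          effect: NoSchedule
      nodeClassRef:
        group: karpenter.k8s.aws
        kind: EC2NodeClass
        name: default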
Read more
Time-Slicing GPUs in Kubernetes
GPU sharing on Amazon EKS with NVIDIA time-slicing and accelerated EC2 instances