Overview

GPU slicing (time-slicing) enables efficient GPU resource sharing on Amazon EKS clusters, particularly for AI workloads. By dividing GPU time into small intervals, it lets multiple tasks or processes share a single GPU, improving utilization and reducing cost.

Amazon EKS supports GPU slicing through NVIDIA’s Kubernetes device plugin, which exposes GPU resources to Kubernetes, allowing the scheduler to manage GPU allocation dynamically.

Here’s how to enable GPU slicing on EKS clusters.


Steps to Enable GPU Slicing on EKS Clusters

1. Prepare Your EKS Cluster

Ensure your EKS cluster has NVIDIA GPU-backed EC2 instances. Instance types such as p3.8xlarge (NVIDIA V100) or p4d.24xlarge (NVIDIA A100) support GPU time-slicing. Use eksctl to set up the cluster and node group.

Example commands with eksctl:

eksctl create nodegroup --name gpu --node-type p3.8xlarge --nodes 1 --cluster <cluster-name>
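
If you are creating the cluster and its GPU node group from scratch, a ClusterConfig file keeps the settings in one place. A minimal sketch (cluster name, region, and capacity are placeholders):

apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig
metadata:
  name: gpu-demo        # placeholder cluster name
  region: us-west-2     # placeholder region
managedNodeGroups:
  - name: gpu
    instanceType: p3.8xlarge
    desiredCapacity: 1

eksctl create cluster -f cluster-config.yaml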

Verify the nodes:

kubectl get nodes

2. Install the NVIDIA Device Plugin

Deploy the NVIDIA Kubernetes device plugin, which manages GPU resource allocation.

  • Label GPU-enabled nodes:

    kubectl label node <node-name> eks-node=gpu
    
  • Install the device plugin with Helm:

    helm repo add nvdp https://nvidia.github.io/k8s-device-plugin
    helm install nvdp nvdp/nvidia-device-plugin \
      --namespace kube-system \
      --version 0.17.0
    

Verify the plugin:

kubectl get daemonset -n kube-system | grep nvidia
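
Each GPU node should now advertise its physical GPU count as an extended resource. You can note this value before enabling time-slicing so you can compare it afterwards (the backslash escapes the dots in the resource name):

kubectl get nodes "-o=custom-columns=NAME:.metadata.name,GPU:.status.allocatable.nvidia\.com/gpu"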

3. Enable GPU Time-Slicing

Configure the device plugin for time-slicing by creating a ConfigMap that specifies how many slices (replicas) to expose per GPU.

Example configuration:

cat << EOF > nvidia-device-plugin.yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: nvidia-device-plugin
  namespace: kube-system
data:
  any: |-
    version: v1
    flags:
      migStrategy: none
    sharing:
      timeSlicing:
        resources:
        - name: nvidia.com/gpu
          replicas: 10 # number of virtual GPUs per physical GPU
EOF
kubectl apply -f nvidia-device-plugin.yaml

This configuration advertises 10 virtual GPUs per physical GPU. Each replica is scheduled as a separate nvidia.com/gpu resource, but time-slicing provides no memory or fault isolation between the workloads sharing a GPU.

Update the plugin with the time-slicing configuration:

helm upgrade --install nvdp nvdp/nvidia-device-plugin \
  --namespace kube-system \
  --version 0.17.0 \
  --set config.name=nvidia-device-plugin \
  --force
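
The device plugin pods restart to pick up the new configuration. You can wait for the rollout to finish before validating; the DaemonSet name below assumes the nvdp release name used above, so adjust it if yours differs:

kubectl rollout status daemonset/nvdp-nvidia-device-plugin -n kube-system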

Validate the GPU slices:

kubectl get nodes -o json | jq -r '.items[] | select(.status.capacity."nvidia.com/gpu" != null) | {name: .metadata.name, capacity: .status.capacity}'
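
After the plugin reloads the configuration, each physical GPU is advertised replicas times. For example, a p3.8xlarge node (4 V100 GPUs) with replicas: 10 should report nvidia.com/gpu: 40 in both capacity and allocatable.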

Using NVIDIA GPU Operator

Time-slicing can also be enabled with the NVIDIA GPU Operator. Before installing the operator, create a ConfigMap similar to the one in the Enable GPU Time-Slicing section to apply the configuration cluster-wide. Its metadata.name must match the value passed to devicePlugin.config.name when installing the operator (time-slicing-config in the example below). Node-specific configurations are also possible; see the NVIDIA GPU Operator documentation for details.

kubectl apply -f time-slicing-config.yaml -n gpu-operator

Afterwards, the operator should be installed.
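
If the NVIDIA Helm repository has not been added yet, add and update it first:

helm repo add nvidia https://helm.ngc.nvidia.com/nvidia
helm repo update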

helm install gpu-operator nvidia/gpu-operator \
    -n gpu-operator \
    --version=v24.9.0 \
    --set devicePlugin.config.name=time-slicing-config

You can confirm that the slicing configuration has been applied successfully by describing a node and checking for these labels and capacity values:

kubectl describe node <node-name>
...
Labels:
                  nvidia.com/gpu.count=4
                  nvidia.com/gpu.product=Tesla-T4-SHARED
                  nvidia.com/gpu.replicas=4
Capacity:
  nvidia.com/gpu: 16
  ...
Allocatable:
  nvidia.com/gpu: 16
  ...
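
The gpu.product label gets a -SHARED suffix, and the advertised capacity equals gpu.count multiplied by gpu.replicas: here, 4 Tesla T4 GPUs with 4 replicas each yield 16 allocatable nvidia.com/gpu.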

Validating GPU slicing by deploying a workload

Save the following manifest as deployment.yaml and apply it with kubectl apply -f deployment.yaml:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: time-slicing-verification
  labels:
    app: time-slicing-verification
spec:
  replicas: 5
  selector:
    matchLabels:
      app: time-slicing-verification
  template:
    metadata:
      labels:
        app: time-slicing-verification
    spec:
      tolerations:
        - key: nvidia.com/gpu
          operator: Exists
          effect: NoSchedule
      hostPID: true
      containers:
        - name: cuda-sample-vector-add
          image: "nvcr.io/nvidia/k8s/cuda-sample:vectoradd-cuda11.7.1-ubuntu20.04"
          command: ["/bin/bash", "-c", "--"]
          args:
            - while true; do /cuda-samples/vectorAdd; done
          resources:
            limits:
              nvidia.com/gpu: 1

Check the logs from the pods:

kubectl logs deploy/time-slicing-verification
Found 5 pods, using pod/time-slicing-verification-7cdc7f87c5-s8qwk
[Vector addition of 50000 elements]
Copy input data from the host memory to the CUDA device
CUDA kernel launch with 196 blocks of 256 threads
Copy output data from the CUDA device to the host memory
Test PASSED
Done
[Vector addition of 50000 elements]
Copy input data from the host memory to the CUDA device
CUDA kernel launch with 196 blocks of 256 threads
Copy output data from the CUDA device to the host memory
...
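
Because the deployment sets hostPID: true and all five pods share the same physical GPU, you can also run nvidia-smi from one of the pods to see the vectorAdd processes executing on a single device (this assumes the NVIDIA container toolkit mounts nvidia-smi into the container, which it typically does on the EKS GPU-optimized AMIs):

kubectl exec -it deploy/time-slicing-verification -- nvidia-smi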

Karpenter and GPU device plugin

The NVIDIA device plugin registers GPUs as extended resources (e.g., nvidia.com/gpu). This means that the Kubernetes scheduler is aware of the availability of GPUs on a node but does not directly control how GPUs are physically shared or allocated.

Karpenter, as a cluster autoscaler, provisions nodes based on pod resource requests. Since GPUs are registered as extended resources, Karpenter treats them as abstract quantities, not tied to specific physical configurations like MIG or time-slicing. Karpenter will launch GPU-enabled nodes if pods require nvidia.com/gpu, but it won’t understand advanced GPU-sharing configurations (e.g., splitting a single GPU into multiple logical slices).
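
As an illustration, a minimal Karpenter NodePool that allows GPU instance types might look like the sketch below; field names follow the karpenter.sh/v1 API, and the instance families, taint, and EC2NodeClass name are placeholders for your environment. Karpenter will launch a node from this pool when pending pods request nvidia.com/gpu, but as noted above it does not account for time-slicing when estimating how many GPUs a new node will provide.

apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: gpu
spec:
  template:
    spec:
      taints:
        - key: nvidia.com/gpu        # keep non-GPU pods off these nodes
          effect: NoSchedule
      requirements:
        - key: karpenter.k8s.aws/instance-family
          operator: In
          values: ["p3", "g5"]       # placeholder GPU instance families
      nodeClassRef:                  # references an existing EC2NodeClass
        group: karpenter.k8s.aws
        kind: EC2NodeClass
        name: default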

Read more

Time-Slicing GPUs in Kubernetes

GPU sharing on Amazon EKS with NVIDIA time-slicing and accelerated EC2 instances