Working with NVIDIA GPU Node Group
The NVIDIA GPU Operator simplifies the deployment and management of GPU nodes in Kubernetes clusters. It provides a set of Kubernetes custom resources and controllers that work together to automate the management of GPU resources in a cluster.
In this guide, we will show you how to:
Create a nodegroup with NVIDIA GPUs in a VKS cluster.
Install the NVIDIA GPU Operator in a VKS cluster.
Deploy your GPU workload in a VKS cluster.
Configure GPU Sharing in a VKS cluster.
Monitor GPU resources in a VKS cluster.
Autoscale GPU resources in a VKS cluster.
Before you begin, you need the following:
A VKS cluster with at least one NVIDIA GPU nodegroup.
The kubectl command-line tool installed on your machine. For more information, see Install and Set Up kubectl.
The helm command-line tool installed on your machine. For more information, see Installing Helm.
(Optional) Other tools and libraries that you can use to monitor and manage your Kubernetes resources:
The kubectl-view-allocations plugin for monitoring cluster resources. For more information, see kubectl-view-allocations.
The image below shows my machine setup, which will be used in this guide:
And this is my VKS cluster with 1 NVIDIA GPU nodegroup, which will be used in this guide. Execute the following command to check the nodegroups in your cluster:
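For example, assuming your kubeconfig already points at the VKS cluster, you can list the worker nodes and confirm the GPU nodegroup with something like:

```bash
# List the worker nodes; the GPU node(s) will later advertise the
# nvidia.com/gpu resource once the GPU Operator is installed.
kubectl get nodes -o wide
```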
This guide only covers installing the NVIDIA GPU Operator; for more information about it, see the NVIDIA GPU Operator Documentation. We install the NVIDIA GPU Operator manually in a VKS cluster by using Helm charts. Execute the following command to install the NVIDIA GPU Operator in your VKS cluster:
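A sketch of the installation using the operator's official Helm chart; the release name and namespace are assumptions, and the chart defaults are kept:

```bash
# Add NVIDIA's Helm repository and install the GPU Operator
# into its own namespace.
helm repo add nvidia https://helm.ngc.nvidia.com/nvidia
helm repo update
helm install gpu-operator nvidia/gpu-operator \
  --namespace gpu-operator \
  --create-namespace \
  --wait
```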
You MUST wait for the installation to complete (about 5-10 minutes). Execute the following command to check that all the pods in the gpu-operator namespace are running:
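For example, assuming the operator was installed into the gpu-operator namespace as above:

```bash
# All pods should reach Running or Completed before you continue.
kubectl get pods -n gpu-operator
```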
The operator labels GPU nodes with the nvidia.com/gpu label. The NVIDIA GPU Operator uses this label to identify nodes that have GPUs and only deploys the NVIDIA GPU device plugin on those nodes.
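For example, you can inspect the GPU-related labels and the advertised GPU capacity on your nodes with a command like the following (the exact label set depends on the GPU feature discovery component bundled with the operator):

```bash
# Show the nvidia.com/* labels (e.g. gpu.product, gpu.count, gpu.memory)
# and the nvidia.com/gpu capacity reported by the device plugin.
kubectl describe nodes | grep "nvidia.com"
```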
In the result above, the single node in the cluster has the nvidia.com/gpu label, which means that the node has GPUs. The labels also show that this node has one RTX 2080 Ti card, along with the number of available GPUs, the GPU memory, and other information.
On the nvidia-device-plugin-daemonset pod in the gpu-operator namespace, you can execute the nvidia-smi command to check the GPU information of the node:
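A sketch of that check; the pod name suffix is generated, so look it up first (the label selector shown is an assumption and may differ between operator versions):

```bash
# Find the device-plugin pod, then run nvidia-smi inside it.
POD=$(kubectl get pods -n gpu-operator -l app=nvidia-device-plugin-daemonset \
  -o jsonpath='{.items[0].metadata.name}')
kubectl exec -n gpu-operator "$POD" -- nvidia-smi
```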
In this section, we will show you how to deploy a GPU workload in a VKS cluster, using the cuda-vectoradd-test workload as an example. This workload is a simple CUDA program that adds two vectors together, provided as a container image that you can deploy in your VKS cluster. See the file cuda-vectoradd-test.yaml.
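A minimal sketch of such a manifest, assuming NVIDIA's public CUDA vector-add sample image (the image tag and names here are illustrative):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: cuda-vectoradd-test
spec:
  restartPolicy: OnFailure
  containers:
    - name: cuda-vectoradd
      # NVIDIA's CUDA vector-add sample image (tag is an example).
      image: nvcr.io/nvidia/k8s/cuda-sample:vectoradd-cuda11.7.1-ubuntu20.04
      resources:
        limits:
          nvidia.com/gpu: 1   # request one GPU from the device plugin
```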
In this section, we apply a Deployment manifest for a TensorFlow GPU application. The purpose of this Deployment is to create and manage a single pod running a TensorFlow container that uses GPU resources to compute the sum of random values drawn from a normal distribution of size 100000 by 100000. For more detail about the manifest, see the file tensorflow-gpu.yaml.
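A minimal sketch of such a Deployment, assuming the public tensorflow/tensorflow:latest-gpu image (the names are illustrative, and the tensor size follows the guide; reduce it if it exceeds your GPU memory):

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: tensorflow-gpu
spec:
  replicas: 1
  selector:
    matchLabels:
      app: tensorflow-gpu
  template:
    metadata:
      labels:
        app: tensorflow-gpu
    spec:
      containers:
        - name: tensorflow
          image: tensorflow/tensorflow:latest-gpu
          command: ["python", "-c"]
          # Sum random values from a normal distribution on the GPU.
          args:
            - |
              import tensorflow as tf
              print(tf.reduce_sum(tf.random.normal([100000, 100000])))
          resources:
            limits:
              nvidia.com/gpu: 1
```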
GPU sharing strategies allow multiple containers to efficiently use your attached GPUs and save running costs. The following table summarizes the differences between the GPU sharing modes supported by NVIDIA GPUs:
VKS uses the built-in timesharing ability provided by the NVIDIA GPU and the software stack. Starting with the Pascal architecture, NVIDIA GPUs support instruction level preemption. When doing context switching between processes running on a GPU, instruction-level preemption ensures every process gets a fair timeslice. GPU time-sharing provides software-level isolation between the workloads in terms of address space isolation, performance isolation, and error isolation.
To enable GPU time-slicing, you need to configure a ConfigMap with the following settings:
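A sketch of such a ConfigMap, following the layout of the NVIDIA GPU Operator's device-plugin configuration (the ConfigMap name is an assumption):

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: time-slicing-config
  namespace: gpu-operator
data:
  # "any" is the configuration key referenced later when patching the ClusterPolicy.
  any: |-
    version: v1
    flags:
      migStrategy: none        # MIG is not used on these GPUs
    sharing:
      timeSlicing:
        resources:
          - name: nvidia.com/gpu
            replicas: 4        # up to 4 pods may share each GPU
```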
The above manifest allows 4 pods to share each GPU. The replicas field specifies how many pods can share a single GPU; the device plugin advertises that many nvidia.com/gpu resources for each physical GPU on the node. The nvidia.com/gpu entry names the GPU resource being shared, and the migStrategy field is set to none to disable MIG.
This configuration will apply to all nodes in the cluster that have the nvidia.com/gpu label. To apply the configuration, execute the following command:
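For example, assuming the ConfigMap above was saved as time-slicing-config.yaml (the file name is an assumption):

```bash
kubectl apply -f time-slicing-config.yaml
```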
And then you need to patch the ClusterPolicy to enable GPU time-slicing using the any setting:
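A sketch of the patch, assuming the ConfigMap above is named time-slicing-config; it points the device plugin at that ConfigMap and uses any as the default configuration key:

```bash
kubectl patch clusterpolicies.nvidia.com/cluster-policy \
  --type merge \
  -p '{"spec": {"devicePlugin": {"config": {"name": "time-slicing-config", "default": "any"}}}}'
```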
Your new configuration will be applied to all nodes in the cluster that have the nvidia.com/gpu label. The configuration is considered successful when the ClusterPolicy STATUS is ready. Because sharing.timeSlicing.resources.replicas is set to 4, you can deploy up to 4 pods that share each GPU. My cluster has only 1 GPU node, so I can deploy up to 4 pods that share its GPU.
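For example, you can verify both conditions with commands like:

```bash
# The ClusterPolicy should report STATUS "ready".
kubectl get clusterpolicies.nvidia.com

# Capacity and Allocatable on the GPU node should now show nvidia.com/gpu: 4.
kubectl describe nodes | grep "nvidia.com/gpu:"
```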
Up to this point we have configured GPU time-slicing. Now we will deploy 5 pods that share the GPU using a Deployment; because only 4 pods can share the GPU, the 5th pod will stay in the Pending state. See the file time-slicing-verification.yaml.
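A sketch of such a verification Deployment, reusing the CUDA sample image from earlier (names and image tag are illustrative); each replica claims one shared GPU slice and sleeps so the scheduling behaviour stays visible:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: time-slicing-verification
spec:
  replicas: 5                      # one more than the shared-GPU capacity of 4
  selector:
    matchLabels:
      app: time-slicing-verification
  template:
    metadata:
      labels:
        app: time-slicing-verification
    spec:
      containers:
        - name: cuda-vectoradd
          image: nvcr.io/nvidia/k8s/cuda-sample:vectoradd-cuda11.7.1-ubuntu20.04
          command: ["sleep", "infinity"]   # hold the GPU slice
          resources:
            limits:
              nvidia.com/gpu: 1            # each replica claims one shared slice
```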
VKS uses NVIDIA's Multi-Process Service (MPS). NVIDIA MPS is an alternative, binary-compatible implementation of the CUDA API designed to transparently enable co-operative multi-process CUDA workloads to run concurrently on a single GPU device. GPU with NVIDIA MPS provides software-level isolation in terms of resource limits (active thread percentage and pinned device memory).
To enable GPU MPS, you need to update the previous ConfigMap with the following settings:
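A sketch of the updated ConfigMap data, keeping the same any key but switching the sharing mode to MPS (the ConfigMap name is the same assumption as before):

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: time-slicing-config
  namespace: gpu-operator
data:
  any: |-
    version: v1
    sharing:
      mps:
        resources:
          - name: nvidia.com/gpu
            replicas: 4          # up to 4 pods may share each GPU through MPS
```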
Now let's apply this new ConfigMap and then patch the ClusterPolicy in the same way as in the GPU time-slicing section.
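A sketch of those two steps, under the same file and resource-name assumptions as before:

```bash
# Re-apply the updated ConfigMap.
kubectl apply -f time-slicing-config.yaml

# Point the device plugin at the ConfigMap again (a no-op if already set).
kubectl patch clusterpolicies.nvidia.com/cluster-policy \
  --type merge \
  -p '{"spec": {"devicePlugin": {"config": {"name": "time-slicing-config", "default": "any"}}}}'
```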
Your new configuration will be applied to all nodes in the cluster that have the nvidia.com/gpu label. The configuration is considered successful when the ClusterPolicy STATUS is ready. Because sharing.mps.resources.replicas is set to 4, you can deploy up to 4 pods that share each GPU.
Up to this point we have configured GPU MPS. Now we will deploy 5 pods that share the GPU using a Deployment; because only 4 pods can share the GPU, the 5th pod will stay in the Pending state. See the file mps-verification.yaml.
An alternative to applying one cluster-wide configuration is to specify multiple time-slicing configurations in the ConfigMap and to apply labels node-by-node to control which configuration is applied to which nodes.
In this guide, I add a new RTX 4090 node to the cluster.
This configuration is useful if your cluster has multiple nodes with different GPU models. For example:
NodeGroup 1 includes the instance of GPU RTX 2080Ti.
NodeGroup 2 includes the instance of GPU RTX 4090.
And if you want to run multiple GPU sharing strategies in the same cluster, you can assign different configurations to different nodes by using labels. For example:
NodeGroup 1 includes the instance of GPU RTX 2080Ti with 4 pods sharing the GPU using time-slicing.
NodeGroup 2 includes the instance of GPU RTX 4090 with 8 pods sharing the GPU using MPS.
To use this feature, you need to update the previous ConfigMap with the following settings:
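A sketch of a ConfigMap with two named configurations, one per GPU model (the key names, sharing modes, and replica counts follow the example above and are assumptions):

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: time-slicing-config
  namespace: gpu-operator
data:
  # For RTX 2080 Ti nodes: 4 pods per GPU via time-slicing.
  rtx-2080ti: |-
    version: v1
    sharing:
      timeSlicing:
        resources:
          - name: nvidia.com/gpu
            replicas: 4
  # For RTX 4090 nodes: 8 pods per GPU via MPS.
  rtx-4090: |-
    version: v1
    sharing:
      mps:
        resources:
          - name: nvidia.com/gpu
            replicas: 8
```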
Apply the configuration above, then label each node with the configuration it should use.
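A sketch of those steps; the node names are placeholders, and nvidia.com/device-plugin.config is the node label the GPU Operator's device plugin reads to pick a configuration key:

```bash
# Apply the multi-configuration ConfigMap.
kubectl apply -f time-slicing-config.yaml

# Tell the device plugin which configuration key each node should use.
kubectl label node <rtx-2080ti-node-name> nvidia.com/device-plugin.config=rtx-2080ti
kubectl label node <rtx-4090-node-name> nvidia.com/device-plugin.config=rtx-4090
```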