Tanzu Kubernetes Grid GPU Integration

2023-03-01 16 min read Cloud Native Kubernetes Tanzu TKG

I recently had to demonstrate Tanzu Kubernetes Grid and its GPU integration capabilities. Developing a good use case and assembling the demo required some preliminary research.

During my research, I reached out to Jay Vyas, staff engineer at VMware, SIG Windows lead for Kubernetes, a Kubernetes legend, and an awesome guy in general. :) For those who don’t know Jay, he is also one of the authors of the fantastic book Core Kubernetes (look it up!).

Jay walked me through the improved integration now available in TKG 2.x and helped me get started with my deployment (thanks again for your support, Jay!).

In this post, I will go over the use case and share the steps I have taken (and the code I have used). All files and configurations are available on my tkg-gpu-integration GitHub repository.

The Use Case

The following is a high-level diagram of the use case I’ve built. Since there are quite a few moving parts, the diagram helps illustrate how they fit together.

Screenshot

High-level architecture:

  • The underlying vSphere environment has one or more ESXi hosts with a physical GPU installed.
  • The GPU-powered ESXi host(s) leverage PCI passthrough to connect the GPU device(s) to the TKG/Kubernetes worker nodes.
  • The TKG/Kubernetes cluster has the following installed:
    • Prometheus, Prometheus Operator, and Grafana enabled for observability purposes.
    • Portworx Operator - as ReadWriteMany volumes are part of the solution.
    • NVIDIA GPU Operator - for Kubernetes GPU support, allocation operations, etc.
  • Prometheus scrapes metrics from the NVIDIA GPU Operator. This is based on the Prometheus Operator ServiceMonitor resource and is enabled in the NVIDIA GPU Operator Helm Chart values. Grafana then visualizes these metrics using dashboards.
  • The sample application used for this use case is a supply chain pipeline consisting of three microservices:
    • Repository service - a REST API service used to upload/download/inspect artifacts (such as video files).
    • Coordinator service - a controller that watches for new artifacts in the repository and triggers Kubernetes jobs to process those artifacts.
    • FFMpeg worker - a processing job initiated by the coordinator service to transcode the artifacts (videos) using FFmpeg, leveraging NVIDIA CUDA hardware acceleration. Each FFmpeg job is allocated GPU resources.
  • All three microservices are connected to a ReadWriteMany shared PVC, provisioned and managed by the Portworx Operator.

Prerequisites

PCI passthrough must be enabled on all ESXi hosts where the GPUs are installed. From the vSphere Client, right-click on the ESXi host and select Settings. On the Configure tab, select Hardware > PCI Devices and go to the All PCI Devices tab. Locate the GPU on the list of devices. You can filter by name using the Vendor Name or the Device Name columns. Select the checkbox for the GPU device and click Toggle Passthrough.

Screenshot Screenshot

Deploy a GPU-Enabled TKG Workload Cluster

This demo uses TKGm 2.1.1, which was the current release as of this writing. You can refer to the cluster configs located in the tkg-cluster-configs directory. Update any relevant parameters in the configuration file as needed.

You can deploy the workload cluster as you normally would, with a few additional parameters that are required for the GPU integration:

#! The Vendor ID and Device ID of the GPU device
VSPHERE_WORKER_PCI_DEVICES: "0x10DE:0x1EB8"

#! Set 'pciPassthru.64bitMMIOSizeGB' to the total GB of framebuffer memory of all GPUs in the cluster rounded up to the next higher power of two.
VSPHERE_WORKER_CUSTOM_VMX_KEYS: 'pciPassthru.allowP2P=true,pciPassthru.RelaxACSforP2P=true,pciPassthru.use64bitMMIO=true,pciPassthru.64bitMMIOSizeGB=16'

#! 'false' if you are using the NVIDIA Tesla T4 GPU and 'true' if you are using the NVIDIA V100 GPU
VSPHERE_IGNORE_PCI_DEVICES_ALLOW_LIST: "false"

#! 'RollingUpdate' if you have extra PCI devices which can be used by the worker nodes during upgrades, otherwise use 'OnDelete'.
WORKER_ROLLOUT_STRATEGY: OnDelete

#! The virtual machine hardware version the worker VMs should be upgraded to. The minimum version required for GPU nodes is 17.
VSPHERE_WORKER_HARDWARE_VERSION: vmx-17

Refer to the official documentation for more information on the above parameters.

Generate the cluster deployment manifest from tkg-wld-gpu.yaml.

tanzu cluster create -f tkg-cluster-configs/tkg-wld-gpu.yaml --dry-run > tkg-cluster-configs/gpu-workload-cluster-spec.yaml

You can refer to gpu-workload-cluster-spec-example.yaml to view an example of the generated manifest.

As you can see, the PCI passthrough configuration is added to the Cluster object based on the configuration specified in the cluster config file.

...
    - name: worker
      value:
        count: 1
        machine:
          customVMXKeys:
            pciPassthru.64bitMMIOSizeGB: "16"
            pciPassthru.RelaxACSforP2P: "true"
            pciPassthru.allowP2P: "true"
            pciPassthru.use64bitMMIO: "true"
          diskGiB: 300
          memoryMiB: 16384
          numCPUs: 4
    - name: pci
      value:
        worker:
          devices:
          - deviceId: 7864
            vendorId: 4318
          hardwareVersion: vmx-17
...

Deploy the workload cluster.

tanzu cluster create -f tkg-cluster-configs/gpu-workload-cluster-spec.yaml

Retrieve the admin kubeconfig for the workload cluster and deploy any packages you need on it. For example, I have cert-manager, External DNS, and Contour running on the cluster, as ingress is used in the configuration.
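
For reference, retrieving the admin kubeconfig and switching to it looks like this (assuming the workload cluster is named tkg-wld-gpu, as in my cluster config):

tanzu cluster kubeconfig get tkg-wld-gpu --admin
kubectl config use-context tkg-wld-gpu-admin@tkg-wld-gpu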

Deploy Prometheus Operator and Grafana

Deploy the kube-prometheus-stack Helm Chart for Prometheus/Prometheus Operator and Grafana.

helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo update
helm upgrade -i kube-prometheus-stack prometheus-community/kube-prometheus-stack -n observability -f helm-chart-configs/kube-prometheus-stack-values.yaml --create-namespace

Note: as you can see in my kube-prometheus-stack-values.yaml file, I embedded two Grafana dashboards in the values file.
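
If you prefer to build your own values file, the upstream Grafana sub-chart supports defining dashboard providers and dashboards directly in the values. A minimal sketch, assuming the standard dashboardProviders/dashboards mechanism (the dashboard ID and revision are illustrative, not necessarily the ones I embed):

grafana:
  dashboardProviders:
    dashboardproviders.yaml:
      apiVersion: 1
      providers:
        - name: nvidia
          orgId: 1
          folder: NVIDIA
          type: file
          disableDeletion: false
          editable: true
          options:
            path: /var/lib/grafana/dashboards/nvidia
  dashboards:
    nvidia:
      # Illustrative: import the DCGM Exporter dashboard from grafana.com
      nvidia-dcgm-exporter:
        gnetId: 12239
        revision: 2
        datasource: Prometheus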

Ensure the pods are up and running and the Prometheus/Grafana ingress resources have been created.

kubectl get pods,ingress -n observability
NAME                                                            READY   STATUS    RESTARTS       AGE
pod/kube-prometheus-stack-grafana-8fbf7df87-8vckd               3/3     Running   0              18s
pod/kube-prometheus-stack-kube-state-metrics-7c44b8c9c4-w8p8t   1/1     Running   0              18s
pod/kube-prometheus-stack-operator-75b7b9747d-dqrcn             1/1     Running   0              18s
pod/kube-prometheus-stack-prometheus-node-exporter-7fvkk        1/1     Running   0              18s
pod/kube-prometheus-stack-prometheus-node-exporter-m7trz        1/1     Running   0              18s
pod/prometheus-kube-prometheus-stack-prometheus-0               2/2     Running   0              18s

NAME                                                         CLASS    HOSTS                                  ADDRESS        PORTS   AGE
ingress.networking.k8s.io/kube-prometheus-stack-grafana      <none>   l02-grafana.cloudnativeapps.cloud      172.16.53.77   80      18s
ingress.networking.k8s.io/kube-prometheus-stack-prometheus   <none>   l02-prometheus.cloudnativeapps.cloud   172.16.53.77   80      18s

Deploy NVIDIA GPU Operator

Deploy the NVIDIA GPU Operator.

helm repo add nvidia https://helm.ngc.nvidia.com/nvidia
helm repo update
helm upgrade -i nvidia-gpu-operator nvidia/gpu-operator -n nvidia-gpu-operator -f helm-chart-configs/nvidia-gpu-operator-values.yaml --create-namespace

Ensure the pods are up and running and the validator pods have completed.

kubectl get pods -n nvidia-gpu-operator
NAME                                                              READY   STATUS      RESTARTS   AGE
gpu-feature-discovery-rvspn                                       1/1     Running     0          18s
gpu-operator-69b7976bf4-5xbc9                                     1/1     Running     0          18s
nvidia-container-toolkit-daemonset-l27qm                          1/1     Running     0          18s
nvidia-cuda-validator-mwfkf                                       0/1     Completed   0          18s
nvidia-dcgm-exporter-r4jjf                                        1/1     Running     0          18s
nvidia-device-plugin-daemonset-f8r52                              1/1     Running     0          18s
nvidia-device-plugin-validator-97fkq                              0/1     Completed   0          18s
nvidia-driver-daemonset-s7jc6                                     1/1     Running     0          18s
nvidia-gpu-operator-node-feature-discovery-master-68495df82vfxd   1/1     Running     0          18s
nvidia-gpu-operator-node-feature-discovery-worker-nlflb           1/1     Running     0          18s
nvidia-operator-validator-5np5q                                   1/1     Running     0          18s

Open the Prometheus UI (e.g., l02-prometheus.cloudnativeapps.cloud) in your web browser, navigate to Status > Targets and search for nvidia. Ensure the serviceMonitor/nvidia-gpu-operator/nvidia-dcgm-exporter/0 and the serviceMonitor/nvidia-gpu-operator/gpu-operator/0 targets are UP.

Screenshot

These service monitors are enabled in the nvidia-gpu-operator-values.yaml file.
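
The DCGM exporter part of those values looks roughly like the following sketch (check the file in the repo for the exact values I use; the ServiceMonitor toggle is the important part):

dcgmExporter:
  serviceMonitor:
    enabled: true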

Deploy Portworx Operator

This tutorial uses the Portworx Operator to demonstrate a storage-agnostic way of providing a ReadWriteMany (RWX) PVC on non-vSAN storage.

Sign in to Portworx Central. From the main landing page, click Continue under the Portworx Enterprise option, then click Continue under the Portworx Essentials/Portworx CSI section. The Essentials tier is sufficient for the purposes of this demo.

For the Platform field, select VMware Tanzu.

For the Storage Class field, enter default.

Keep the default value for the Namespace field (portworx).

Select the Kubernetes version, e.g., 1.24.10.

Click Save Spec and enter any name and tag for the new spec, e.g., l02-tkg-wld-gpu.

Use the generated links/commands from the Portworx portal to deploy the Portworx Operator and the Portworx StorageCluster.

For example:

Screenshot

kubectl apply -f 'https://install.portworx.com/2.13?comp=pxoperator&kbver=1.24.10&ns=portworx'
kubectl apply -f 'https://install.portworx.com/2.13?operator=true&mc=false&kbver=1.24.10&ns=portworx&oem=esse&user=672c1bf0-dc2a-11eb-a2c5-c24e499c7467&b=true&csicd=true&kd=sc%3Ddefault%2Csize%3D150&s=%22sc%3Ddefault%2Csize%3D150%22&c=px-cluster-c2b81686-8694-4bd4-a63b-4fbe7794ce72&stork=true&csi=true&mon=true&tel=false&st=k8s&e=PRE-EXEC%3Diptables%20-A%20INPUT%20-p%20tcp%20--match%20multiport%20--dports%209001%3A9020%20-j%20ACCEPT&promop=true'

Create a storage class for Portworx.
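
A minimal sketch of what such a storage class can look like, assuming Portworx sharedv4 volumes for RWX support (the repl value is illustrative; the actual file is in the repo):

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: px-rwx-sc
provisioner: kubernetes.io/portworx-volume
parameters:
  repl: "2"
  sharedv4: "true"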

kubectl apply -f portworx/portworx-storage-class.yaml

Ensure the Portworx pods are up and running using kubectl get pod -n portworx.

Example output:

NAME                                                    READY   STATUS        RESTARTS   AGE
autopilot-7fc887b75b-d8d77                              1/1     Running       0          9m55s
portworx-api-xcwqt                                      1/1     Running       0          9m55s
portworx-kvdb-7bdfd                                     1/1     Running       0          3s
portworx-operator-59b4447bfd-hl84g                      1/1     Running       0          10m
portworx-pvc-controller-5d98c8dbbc-c7f2f                1/1     Running       0          9m55s
prometheus-px-prometheus-0                              2/2     Running       0          4s
px-cluster-7474d9a1-a507-4409-a911-39b4160ab0ba-62dms   2/2     Running       0          9m54s
px-csi-ext-846c7f4978-95582                             4/4     Running       0          9m57s
px-csi-ext-846c7f4978-hxbpd                             4/4     Running       0          9m57s
px-csi-ext-846c7f4978-lkrmr                             4/4     Running       0          9m57s
px-prometheus-operator-7d884bc8bc-ddnqg                 1/1     Running       0          10m
stork-6649cb56f5-ddbhb                                  1/1     Running       0          10m
stork-scheduler-65864d695b-fxt45                        1/1     Running       0          10m

Prepare Sample App Container Images

You must prepare and push the sample app container images to your Harbor.

You can use public images for the Repository service and the FFmpeg worker. The Coordinator service image, however, must be built with Docker.

Create a public project on Harbor.

HARBOR_HOSTNAME=l02-harbor.cloudnativeapps.cloud
HARBOR_USERNAME=admin
HARBOR_PASSWORD=VMware1!
HARBOR_PROJECT=ffmpeg-supply-chain-gpu-demo

./scripts/create-public-harbor-project.sh "$HARBOR_HOSTNAME" "$HARBOR_USERNAME" "$HARBOR_PASSWORD" "$HARBOR_PROJECT"
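
If you prefer not to use the script, a public project can also be created directly against the Harbor v2 REST API; a roughly equivalent call (a sketch, not the script itself) looks like this:

curl -k -u "$HARBOR_USERNAME:$HARBOR_PASSWORD" \
  -X POST "https://$HARBOR_HOSTNAME/api/v2.0/projects" \
  -H "Content-Type: application/json" \
  -d "{\"project_name\": \"$HARBOR_PROJECT\", \"metadata\": {\"public\": \"true\"}}"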

Extract the Harbor CA certificate, add it to the trust store, and log in to Harbor.

sudo mkdir -p "/etc/docker/certs.d/${HARBOR_HOSTNAME}"
HARBOR_CA_CERT=$(kubectl get secret harbor-tls -n tanzu-system-registry -o=jsonpath="{.data.ca\.crt}" | base64 -d)
sudo bash -c 'echo '"'$HARBOR_CA_CERT'"' > /etc/docker/certs.d/'${HARBOR_HOSTNAME}'/ca.crt'
sudo bash -c 'echo '"'$HARBOR_CA_CERT'"' > /usr/local/share/ca-certificates/harbor-ca.crt'
sudo update-ca-certificates

docker login "$HARBOR_HOSTNAME" -u "$HARBOR_USERNAME" -p "$HARBOR_PASSWORD"

Relocate the required images.

# FFmpeg worker image
# Source: https://github.com/jrottenberg/ffmpeg
FFMPEG_IMAGE=jrottenberg/ffmpeg
FFMPEG_IMAGE_TAG=5.1.2-nvidia2004

docker pull "$FFMPEG_IMAGE:$FFMPEG_IMAGE_TAG"
docker tag "$FFMPEG_IMAGE:$FFMPEG_IMAGE_TAG" "$HARBOR_HOSTNAME/$HARBOR_PROJECT/ffmpeg:$FFMPEG_IMAGE_TAG"
docker push "$HARBOR_HOSTNAME/$HARBOR_PROJECT/ffmpeg:$FFMPEG_IMAGE_TAG"

# Repository service image
# Source: https://github.com/mayth/go-simple-upload-server
REPOSITORY_IMAGE=mayth/simple-upload-server
REPOSITORY_IMAGE_TAG=latest

docker pull "$REPOSITORY_IMAGE:$REPOSITORY_IMAGE_TAG"
docker tag "$REPOSITORY_IMAGE:$REPOSITORY_IMAGE_TAG" "$HARBOR_HOSTNAME/$HARBOR_PROJECT/simple-upload-server:$REPOSITORY_IMAGE_TAG"
docker push "$HARBOR_HOSTNAME/$HARBOR_PROJECT/simple-upload-server:$REPOSITORY_IMAGE_TAG"

Build and push the Coordinator service image.

docker build coordinator-service-image/ -t "$HARBOR_HOSTNAME/$HARBOR_PROJECT/coordinator-service:v1.0.0" --push

Environment Validation

To validate the GPU functionality, you can use the validations/gpu-validation-deployment.yaml deployment to deploy a sample workload leveraging the GPU.
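
If you want to roll your own instead, a minimal GPU validation deployment could look like the following sketch (the CUDA image tag is illustrative; the key part is the nvidia.com/gpu resource limit):

apiVersion: apps/v1
kind: Deployment
metadata:
  name: gpu-validation
spec:
  replicas: 1
  selector:
    matchLabels:
      app: gpu-validation
  template:
    metadata:
      labels:
        app: gpu-validation
    spec:
      containers:
        - name: gpu-validation
          # Illustrative CUDA base image; any image with the NVIDIA userspace tools works
          image: nvidia/cuda:11.8.0-base-ubuntu20.04
          command: ["sleep", "infinity"]
          resources:
            limits:
              nvidia.com/gpu: 1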

kubectl apply -f validations/gpu-validation-deployment.yaml

Ensure the pod is running using kubectl get pods -l app=gpu-validation.

Run the nvidia-smi command on the pod to ensure the GPU is present.

kubectl exec $(kubectl get pod -l app=gpu-validation -o jsonpath='{.items[].metadata.name}') -- nvidia-smi

Example output:

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 525.60.13    Driver Version: 525.60.13    CUDA Version: 12.1     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Tesla T4            On   | 00000000:13:00.0 Off |                    0 |
| N/A   44C    P8    16W /  70W |      2MiB / 15360MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+

Delete the sample deployment.

kubectl delete -f validations/gpu-validation-deployment.yaml

To validate the Portworx functionality, you can use the validations/portworx-rwx-validation-deployment.yaml deployment to deploy sample NGINX instances utilizing a shared PVC.
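
The key piece of that manifest is a ReadWriteMany PVC backed by the Portworx storage class; a minimal sketch (with names matching the output below) looks like this:

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: nginx-pvc
  labels:
    app: nginx-pvc
spec:
  accessModes:
    - ReadWriteMany
  storageClassName: px-rwx-sc
  resources:
    requests:
      storage: 1Gi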

kubectl apply -f validations/portworx-rwx-validation-deployment.yaml

Ensure the NGINX pods are up and running and the PVC is attached to both of them.

kubectl get pods
NAME                     READY   STATUS    RESTARTS   AGE
nginx-84d6758fcb-844sm   1/1     Running   0          13h
nginx-84d6758fcb-trb65   1/1     Running   0          13h
kubectl describe pvc nginx-pvc
Name:          nginx-pvc
Namespace:     default
StorageClass:  px-rwx-sc
Status:        Bound
Volume:        pvc-db6f9b31-0947-4315-872b-013ee4accb59
Labels:        app=nginx-pvc
Annotations:   pv.kubernetes.io/bind-completed: yes
               pv.kubernetes.io/bound-by-controller: yes
               volume.beta.kubernetes.io/storage-provisioner: kubernetes.io/portworx-volume
               volume.kubernetes.io/storage-provisioner: kubernetes.io/portworx-volume
Finalizers:    [kubernetes.io/pvc-protection]
Capacity:      1Gi
Access Modes:  RWX
VolumeMode:    Filesystem
Used By:       nginx-84d6758fcb-844sm
               nginx-84d6758fcb-trb65

Delete the sample deployment.

kubectl delete -f validations/portworx-rwx-validation-deployment.yaml

Deploy and Use the Sample App

Deploy the application.

kubectl apply -f deployment/ffmpeg-supply-chain-gpu-app-deployment.yaml

Set the namespace to ffmpeg-supply-chain-gpu-demo.

kubectl config set-context --current --namespace ffmpeg-supply-chain-gpu-demo

Ensure the Repository service and the Coordinator service are running.

kubectl get pods
NAME                                   READY   STATUS    RESTARTS   AGE
coordinator-service-66bdfb7d74-qv4qs   1/1     Running   0          110s
repository-service-7c48d9b666-hdkf8    1/1     Running   0          110s

Ensure the Repository service has been assigned an external IP address from NSX ALB and capture it in a variable.

kubectl get svc
NAME                           TYPE           CLUSTER-IP       EXTERNAL-IP    PORT(S)        AGE
px-530793372492974469-server   ClusterIP      100.66.188.180   <none>         2049/TCP       2m32s
repository-service             LoadBalancer   100.64.254.233   172.16.53.76   80:32361/TCP   2m36s
REPO_SVC_IP=$(kubectl get svc repository-service -o jsonpath='{.status.loadBalancer.ingress[].ip}')
REPO_SVC_TOKEN=SoMuchForSecurity

From a new terminal, tail the logs of the Repository service.

kubectl logs -l app=repository-service -f
time="2023-03-23T11:42:50Z" level=info msg="starting up simple-upload-server"
time="2023-03-23T11:42:50Z" level=info msg="start listening" cors=false ip=0.0.0.0 port=25478 protected_method="[GET POST HEAD PUT]" root=/ffmpeg-processing token=SoMuchForSecurity upload_limit=524288000000000

From a new terminal, tail the logs of the Coordinator service.

kubectl logs -l app=coordinator-service -f
Validating connectivity to Kubernetes cluster...
Kubernetes control plane is running at https://100.64.0.1:443
CoreDNS is running at https://100.64.0.1:443/api/v1/namespaces/kube-system/services/kube-dns:dns/proxy

To further debug and diagnose cluster problems, use 'kubectl cluster-info dump'.
Processing files in /ffmpeg-processing

Upload a video file to the repository.

curl -Ffile=@sample-videos/sample-video-01.mp4 "http://$REPO_SVC_IP/upload?token=$REPO_SVC_TOKEN"

Output:

{"ok":true,"path":"/files/sample-video-01.mp4"}

The terminal you are tailing the Repository service logs from should display the following entry, indicating the file has been uploaded:

time="2023-03-23T11:59:43Z" level=info msg="file uploaded by POST" path=/ffmpeg-processing/sample-video-01.mp4 size=16489417 url=/files/sample-video-01.mp4

The terminal you are tailing the Coordinator service logs from should display the following entry, indicating the uploaded file has been detected and is being processed:

Creating Kubernetes job for sample-video-01.mp4...
job.batch/ffmpeg-nvidia-5780 created

Check out the new processing job that has been created.

kubectl get job

Output:

NAME                 COMPLETIONS   DURATION   AGE
ffmpeg-nvidia-5780   0/1           47s        47s
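
For context, each job created by the Coordinator service requests a full GPU and runs the relocated FFmpeg image against the shared volume. A simplified sketch of such a job, not necessarily identical to what the coordinator actually generates, could look like this (it assumes the image's entrypoint is ffmpeg):

apiVersion: batch/v1
kind: Job
metadata:
  name: ffmpeg-nvidia-5780
spec:
  backoffLimit: 0
  template:
    spec:
      restartPolicy: Never
      containers:
        - name: ffmpeg
          image: l02-harbor.cloudnativeapps.cloud/ffmpeg-supply-chain-gpu-demo/ffmpeg:5.1.2-nvidia2004
          # Transcode the uploaded file with the NVENC hardware encoder and copy the audio stream
          args:
            - "-i"
            - "/ffmpeg-processing/sample-video-01.mp4"
            - "-c:v"
            - "h264_nvenc"
            - "-c:a"
            - "copy"
            - "/ffmpeg-processing/sample-video-01_processed-output.mkv"
          resources:
            limits:
              nvidia.com/gpu: 1   # one full GPU per transcoding job
          volumeMounts:
            - name: processing
              mountPath: /ffmpeg-processing
      volumes:
        - name: processing
          persistentVolumeClaim:
            claimName: ffmpeg-supply-chain-gpu-rwx-shared-pvc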

Check out the running pods. You should see an ffmpeg-nvidia pod.

kubectl get pods

Output:

NAME                                   READY   STATUS              RESTARTS   AGE
coordinator-service-66bdfb7d74-qv4qs   1/1     Running             0          17m
ffmpeg-nvidia-5780-zkf6k               0/1     Running             0          24s
repository-service-7c48d9b666-hdkf8    1/1     Running             0          17m

Check out the logs of this pod.

kubectl logs ffmpeg-nvidia-5780-zkf6k

You should see the output of the FFmpeg job!

ffmpeg version 5.1.2 Copyright (c) 2000-2022 the FFmpeg developers
  built with gcc 9 (Ubuntu 9.4.0-1ubuntu1~20.04.1)
  configuration: --disable-debug --disable-doc --disable-ffplay --enable-cuda --enable-cuvid --enable-fontconfig --enable-gpl --enable-libaom --enable-libaribb24 --enable-libass --enable-libbluray --enable-libfdk_aac --enable-libfreetype --enable-libkvazaar --enable-libmp3lame --enable-libnpp --enable-libopencore-amrnb --enable-libopencore-amrwb --enable-libopenjpeg --enable-libopus --enable-libsrt --enable-libtheora --enable-libvidstab --enable-libvorbis --enable-libvpx --enable-libwebp --enable-libx264 --enable-libx265 --enable-libxcb --enable-libxvid --enable-libzmq --enable-nonfree --enable-nvenc --enable-openssl --enable-postproc --enable-shared --enable-small --enable-version3 --extra-cflags='-I/opt/ffmpeg/include -I/opt/ffmpeg/include/ffnvcodec -I/usr/local/cuda/include/' --extra-ldflags='-L/opt/ffmpeg/lib -L/usr/local/cuda/lib64 -L/usr/local/cuda/lib32/' --extra-libs=-ldl --extra-libs=-lpthread --prefix=/opt/ffmpeg
  libavutil      57. 28.100 / 57. 28.100
  libavcodec     59. 37.100 / 59. 37.100
  libavformat    59. 27.100 / 59. 27.100
  libavdevice    59.  7.100 / 59.  7.100
  libavfilter     8. 44.100 /  8. 44.100
  libswscale      6.  7.100 /  6.  7.100
  libswresample   4.  7.100 /  4.  7.100
  libpostproc    56.  6.100 / 56.  6.100
Input #0, mov,mp4,m4a,3gp,3g2,mj2, from '/ffmpeg-processing/sample-video-01.mp4':
  Metadata:
    major_brand     : mp42
    minor_version   : 0
    compatible_brands: isommp42
    creation_time   : 2022-08-02T02:32:57.000000Z
  Duration: 00:03:28.52, start: 0.000000, bitrate: 632 kb/s
  Stream #0:0[0x1](und): Video: h264 (avc1 / 0x31637661), yuv420p(tv, bt709, progressive), 1280x720 [SAR 1:1 DAR 16:9], 501 kb/s, 23.98 fps, 23.98 tbr, 24k tbn (default)
    Metadata:
      creation_time   : 2022-08-02T02:32:57.000000Z
      handler_name    : ISO Media file produced by Google Inc. Created on: 08/01/2022.
      vendor_id       : [0][0][0][0]
  Stream #0:1[0x2](und): Audio: aac (mp4a / 0x6134706D), 44100 Hz, stereo, fltp, 128 kb/s (default)
    Metadata:
      creation_time   : 2022-08-02T02:32:57.000000Z
      handler_name    : ISO Media file produced by Google Inc. Created on: 08/01/2022.
      vendor_id       : [0][0][0][0]
Stream mapping:
  Stream #0:0 -> #0:0 (h264 (native) -> h264 (h264_nvenc))
  Stream #0:1 -> #0:1 (copy)
Press [q] to stop, [?] for help
Output #0, matroska, to '/ffmpeg-processing/sample-video-01_processed-output.mkv':
  Metadata:
    major_brand     : mp42
    minor_version   : 0
    compatible_brands: isommp42
    encoder         : Lavf59.27.100
  Stream #0:0(und): Video: h264 (H264 / 0x34363248), cuda(tv, bt709, progressive), 1280x720 [SAR 1:1 DAR 16:9], q=2-31, 5000 kb/s, 23.98 fps, 1k tbn (default)
    Metadata:
      creation_time   : 2022-08-02T02:32:57.000000Z
      handler_name    : ISO Media file produced by Google Inc. Created on: 08/01/2022.
      vendor_id       : [0][0][0][0]
      encoder         : Lavc59.37.100 h264_nvenc
    Side data:
      cpb: bitrate max/min/avg: 0/0/5000000 buffer size: 10000000 vbv_delay: N/A
  Stream #0:1(und): Audio: aac ([255][0][0][0] / 0x00FF), 44100 Hz, stereo, fltp, 128 kb/s (default)
    Metadata:
      creation_time   : 2022-08-02T02:32:57.000000Z
      handler_name    : ISO Media file produced by Google Inc. Created on: 08/01/2022.
      vendor_id       : [0][0][0][0]
frame= 4999 fps=1036 q=9.0 Lsize=  112307kB time=00:03:28.51 bitrate=4412.2kbits/s speed=43.2x
video:108947kB audio:3258kB subtitle:0kB other streams:0kB global headers:0kB muxing overhead: 0.090445%

Go back to the terminal you are tailing the Coordinator logs from. You should see the following entry:

sample-video-01.mp4 has already been processed

You can view the processed/output files from any of the pods (either the Repository service or the Coordinator service, as they are both connected to the same PVC).

kubectl exec -it $(kubectl get pods -l app=repository-service -o jsonpath='{.items[].metadata.name}') -- ls -lh /ffmpeg-processing

Output:

total 125M
-rw-r--r--    1 root     root       15.7M Mar 23 11:59 sample-video-01.mp4
-rw-r--r--    1 root     root           0 Mar 23 11:59 sample-video-01.mp4.processed
-rw-r--r--    1 root     root      109.7M Mar 23 12:00 sample-video-01_processed-output.mkv

The *.processed file is a marker file indicating the file has been processed, and the *-output.mkv file is the resulting, transcoded/processed output file.

The Coordinator service is aware the file has already been processed.

Run the same ls -lh command on the Coordinator service.

kubectl exec -it $(kubectl get pods -l app=coordinator-service -o jsonpath='{.items[].metadata.name}') -- ls -lh /ffmpeg-processing

You should get the same output. This is because, again, all our pods are connected to the same Portworx-managed PVC. This also indicates that the ReadWriteMany (RWX) capability is functioning as expected.

You can also verify this by describing the PVC.

kubectl describe pvc ffmpeg-supply-chain-gpu-rwx-shared-pvc

Output:

Name:          ffmpeg-supply-chain-gpu-rwx-shared-pvc
Namespace:     ffmpeg-supply-chain-gpu-demo
StorageClass:  px-rwx-sc
Status:        Bound
Volume:        pvc-fd12f4aa-b254-48ed-aec1-a89e583ef6ed
Labels:        app=ffmpeg-supply-chain-gpu-rwx-shared-pvc
Annotations:   pv.kubernetes.io/bind-completed: yes
               pv.kubernetes.io/bound-by-controller: yes
               volume.beta.kubernetes.io/storage-provisioner: kubernetes.io/portworx-volume
               volume.kubernetes.io/storage-provisioner: kubernetes.io/portworx-volume
Finalizers:    [kubernetes.io/pvc-protection]
Capacity:      50Gi
Access Modes:  RWX
VolumeMode:    Filesystem
Used By:       coordinator-service-66bdfb7d74-qv4qs
               ffmpeg-nvidia-5780-zkf6k
               repository-service-7c48d9b666-hdkf8
Events:
  Type    Reason                 Age   From                         Message
  ----    ------                 ----  ----                         -------
  Normal  ProvisioningSucceeded  31m   persistentvolume-controller  Successfully provisioned volume pvc-fd12f4aa-b254-48ed-aec1-a89e583ef6ed using kubernetes.io/portworx-volume

As you can see, the Used By field shows that this PVC is indeed connected to all pods.

Empty the /ffmpeg-processing directory from the Repository service.

kubectl exec -it $(kubectl get pods -l app=repository-service -o jsonpath='{.items[].metadata.name}') -- sh -c 'rm /ffmpeg-processing/*'

Ensure the processing directory is now empty.

kubectl exec -it $(kubectl get pods -l app=repository-service -o jsonpath='{.items[].metadata.name}') -- ls -lh /ffmpeg-processing

Output:

total 0

Now, let’s upload two video files for processing at the same time while also monitoring the process from the Grafana UI.

Open the Grafana UI (e.g., l02-grafana.cloudnativeapps.cloud) in your web browser. Log in to Grafana and navigate to Dashboards > Browse. Expand the NVIDIA folder and click the Nvidia GPU dashboard. Set the time range to the Last 5 minutes. Leave the dashboard open in the background and upload the files:

Screenshot

curl -Ffile=@sample-videos/sample-video-02.mp4 "http://$REPO_SVC_IP/upload?token=$REPO_SVC_TOKEN"
curl -Ffile=@sample-videos/sample-video-03.mp4 "http://$REPO_SVC_IP/upload?token=$REPO_SVC_TOKEN"
{"ok":true,"path":"/files/sample-video-02.mp4"}{"ok":true,"path":"/files/sample-video-03.mp4"}

Check the pods again (kubectl get pods). You should see that one pod is running and the other is pending.

NAME                                   READY   STATUS      RESTARTS   AGE
coordinator-service-66bdfb7d74-qv4qs   1/1     Running     0          63m
ffmpeg-nvidia-29029-7wbbv              1/1     Running     0          14s # The running pod
ffmpeg-nvidia-5780-zkf6k               0/1     Completed   0          45m
ffmpeg-nvidia-8696-rnvdm               0/1     Pending     0          3s # The pending pod
repository-service-7c48d9b666-hdkf8    1/1     Running     0          63m

This is because the only GPU in this environment is assigned to the running pod. The pending pod is waiting for the running pod to complete its job and release the GPU. Once the running job completes, the pending job starts processing as well.

NAME                                   READY   STATUS      RESTARTS   AGE
coordinator-service-66bdfb7d74-qv4qs   1/1     Running     0          63m
ffmpeg-nvidia-29029-7wbbv              1/1     Completed   0          28s # The completed job
ffmpeg-nvidia-5780-zkf6k               0/1     Completed   0          45m
ffmpeg-nvidia-8696-rnvdm               1/1     Running     0          14s # The now-running job
repository-service-7c48d9b666-hdkf8    1/1     Running     0          63m

Once all jobs are complete:

NAME                                   READY   STATUS      RESTARTS   AGE
coordinator-service-66bdfb7d74-qv4qs   1/1     Running     0          64m
ffmpeg-nvidia-29029-7wbbv              0/1     Completed   0          112s
ffmpeg-nvidia-5780-zkf6k               0/1     Completed   0          47m
ffmpeg-nvidia-8696-rnvdm               0/1     Completed   0          101s
repository-service-7c48d9b666-hdkf8    1/1     Running     0          64m

If you go back to the Grafana UI, you should start seeing the graphs visualizing the metrics of the GPU and its running workloads.

Note: you may want to use larger video files so the jobs run longer and produce more data to visualize in these dashboards.

Screenshot

Navigate to the NVIDIA DCGM Exporter Dashboard, which shows additional data on the GPU metrics.

Screenshot

Using the Repository service, you can also retrieve the processed files.

To check whether a file exists in the repository, you can invoke an HTTP HEAD request. For example:

curl -I "http://$REPO_SVC_IP/files/sample-video-02_processed-output.mkv?token=$REPO_SVC_TOKEN"

An HTTP 200 status code in the response indicates the file exists.

HTTP/1.1 200 OK
Accept-Ranges: bytes
Content-Length: 307853707
Content-Type: video/webm
Last-Modified: Thu, 23 Mar 2023 12:58:22 GMT
Date: Thu, 23 Mar 2023 13:07:27 GMT

You can invoke an HTTP GET request to download a file from the repository. For example:

curl "http://$REPO_SVC_IP/files/sample-video-02_processed-output.mkv?token=$REPO_SVC_TOKEN" -o sample-video-02_processed-output.mkv

Wrap Up

While this post explicitly leverages the PCI passthrough method for GPU allocation, there are alternative solutions, such as the vGPU option, which allows for logical slicing of the GPU rather than allocating the entire GPU device to a single Kubernetes node. Each approach has pros and cons: the PCI passthrough approach may limit your vSphere infrastructure, for example the ability to vMotion your Kubernetes nodes and to scale out your Kubernetes cluster, but it may also deliver better performance for your workloads. The vGPU approach, on the other hand, may impact the performance of the workloads utilizing the GPU.

If you plan to use GPUs in your TKG environment, I hope this guide gives you a good understanding of how the integration works on TKG and serves as a useful reference for getting started.