Streamlining and Customizing Windows Image Builder for TKG
Tanzu Kubernetes Grid (TKG) is one of the few platforms providing out-of-the-box support and streamlined deployment of Windows Kubernetes clusters. VMware is actively investing in this area and constantly improving the support and capabilities around Windows on Kubernetes.
Unlike Linux-based clusters, for which VMware provides pre-packaged base OS images (typically based on Ubuntu and Photon OS), VMware cannot offer pre-packaged Windows images, most likely due to licensing restrictions. Therefore, building your own Windows base OS image is a prerequisite for deploying a TKG Windows workload cluster. Fortunately, VMware leverages the upstream Image Builder project - a fantastic collection of cross-provider Kubernetes virtual machine image-building utilities intended to simplify and streamline the creation of base OS images for Kubernetes.
This process still requires you to go through several manual steps, which can often be tedious and error-prone, especially when managing those images’ lifecycle - patching, customizing, etc.
I have been involved in several activities involving Windows on Kubernetes and recently decided to build a single-click, automated process for these tasks. Some of these activities also required further customization of the Windows image, such as injecting the CSI Proxy to the Windows nodes, modifying the Windows registry, and more.
This post will walk you through building a Windows image for TKG using my automated approach and customizing the image to meet your needs. The customization parts were not obvious to me initially and took quite a bit of research, so I hope to save you some time on that as well. :)
Prerequisites
To follow along, you will need the following:
- A Linux workstation with Docker and Ansible installed. If you already use a jumpbox machine to manage a TKG environment, it probably makes sense to use that machine.
- A recent Windows Server 2019 ISO image (newer than April 2021). Download through your Microsoft Developer Network (MSDN) or Volume Licensing (VL) account. The use of evaluation media is not supported or recommended but should be OK for testing purposes.
- The latest VMware Tools Windows ISO image.
- A TKG management cluster. In this guide, I use TKG 2.1.1.
- A clone of my tkg-windows-image-builder GitHub repository.
Overview
My approach for building the Windows image for TKG is a two-step process:
Run an Ansible Playbook to generate the configuration needed for building the image. The Ansible Playbook leverages two Ansible roles.
roles
├── generate-image-builder-config
│   ├── defaults
│   │   └── main.yml
│   ├── files
│   │   └── ansible
│   │       ├── playbook.yml
│   │       └── roles
│   │           ├── csi-proxy-setup
│   │           │   ├── defaults
│   │           │   │   └── main.yml
│   │           │   ├── files
│   │           │   │   ├── csi-proxy.exe
│   │           │   │   └── exe_version_notes.txt
│   │           │   ├── playbook.yaml
│   │           │   └── tasks
│   │           │       └── main.yml
│   │           └── win-regedit
│   │               ├── defaults
│   │               │   └── main.yml
│   │               ├── tasks
│   │               │   └── main.yml
│   │               └── vars
│   │                   └── main.yml
│   ├── tasks
│   │   └── main.yml
│   └── templates
│       ├── autounattend.xml.j2
│       ├── build-image.sh.j2
│       ├── builder.yaml.j2
│       └── windows.json.j2
└── upload-datastore-files
    └── tasks
        └── main.yml

The generate-image-builder-config role is responsible for the following:
- Deploying the TKG Windows Resource Bundle on your TKG management cluster. The bundle is a web server pod containing the required artifacts for building the image. During the build process, the artifacts are downloaded to the Windows image.
- Parsing the resource bundle artifact information from the Windows Resource Bundle and templating/generating the windows.json file using the parsed data.
- Templating/generating the autounattend.xml file, used by the Windows image for generalization and initial customization of various system settings.
- Generating the build-image.sh shell script used to trigger the build process in the next step.

The upload-datastore-files role is responsible for uploading your VMware Tools ISO and Windows ISO to a vSphere datastore.

Execute the shell script generated by the Ansible Playbook in the previous step to trigger the build process.
Building the Windows Image
First, install the Ansible collections and pip packages the Ansible Playbook requires (kubernetes and pyvmomi):
pip3 install --upgrade -r requirements.txt
ansible-galaxy install -r requirements.yml
Place your VMware Tools ISO and Windows ISO on the jumpbox/machine you are working on. Mine are placed under /home/k8s/tkg:
$ tree /home/k8s/tkg
/home/k8s/tkg
├── VMware-tools-windows-12.2.0-21223074.iso
└── en-us_windows_server_2019_updated_aug_2021_x64_dvd_a6431a28.iso
0 directories, 2 files
Modify the vars.yml file. Refer to the example on my GitHub repository.
# TKG management cluster kubectl context
k8s_context: l02-tkg-mgmt-admin@l02-tkg-mgmt
# vSphere & Packer
vcenter_hostname: vcsa.cloudnativeapps.cloud
vcenter_username: your-vsphere-username
vcenter_password: your-vsphere-password
vsphere_datacenter: Main
vsphere_cluster: LAB-V3
vsphere_vm_folder: LABS/itay/l01/tkg/templates
vsphere_datastore: LAB-V3-vSANDatastore
vsphere_network: itay-k8s-mgmt
# Uncomment and set vsphere_resource_pool if desired
# vsphere_resource_pool: US
# Windows OS
vsphere_iso_datastore: LAB-V3-vSANDatastore
vsphere_datastore_dir_path: iso/tkg-windows
vm_tools_iso_file_name: VMware-tools-windows-12.2.0-21223074.iso
windows_iso_file_name: en-us_windows_server_2019_updated_aug_2021_x64_dvd_a6431a28.iso
# Valid options typically include: Windows Server 2019 SERVERSTANDARDCORE, Windows Server 2019 SERVERDATACENTERCORE
windows_image_id: Windows Server 2019 SERVERSTANDARDCORE
windows_updates_categories: CriticalUpdates SecurityUpdates UpdateRollups
# Comment out windows_product_key if an evaluation of Windows is desired
# windows_product_key: "your-windows-product-key"
vsphere_datastore_files:
- "/home/{{ ansible_user_id }}/tkg/{{ vm_tools_iso_file_name }}"
- "/home/{{ ansible_user_id }}/tkg/{{ windows_iso_file_name }}"
docker_enable_custom_ansible_roles: true
ansible_user_vars: "node_custom_roles_post='/home/imagebuilder/ansible/windows/custom_roles/csi-proxy-setup /home/imagebuilder/ansible/windows/custom_roles/win-regedit'"
Setting the docker_enable_custom_ansible_roles parameter to true includes additional customizations for the Windows image in the build process.
As shown in the above example, these customizations are typically Ansible roles you specify in ansible_user_vars in a space-separated format. Refer to this directory in the GitHub repository as an example. For this example, I have included the injection of CSI Proxy into the Windows image and sample registry modifications.
To add your custom Ansible roles to the build process, place them under the roles/generate-image-builder-config/files/ansible/roles folder. Then, edit vars.yml and add your roles in the ansible_user_vars parameter. For example: "node_custom_roles_post='/home/imagebuilder/ansible/windows/custom_roles/csi-proxy-setup /home/imagebuilder/ansible/windows/custom_roles/win-regedit'"
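As an illustration of such a role, the snippet below is a minimal sketch of what a registry-tweaking custom role's tasks/main.yml could look like. The registry path, value name, and data shown here are hypothetical placeholders for illustration, not the values shipped in the repository's win-regedit role:

```yaml
# tasks/main.yml -- hypothetical sketch of a custom Windows role.
# Sets a sample registry value on the image being built; the path,
# name, and data below are placeholders -- adjust them to whatever
# your environment actually needs.
- name: Apply a sample registry customization
  ansible.windows.win_regedit:
    path: HKLM:\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters
    name: KeepAliveTime
    data: 300000
    type: dword
    state: present
```

Anything you can express as Windows-targeted Ansible tasks can be injected into the image this way.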
Note: if you are using a different TKG release, you must override some of the default values specified in the Ansible role. These values are bound to a specific TKG release, so ensure the configuration matches your version of TKG. Refer to the official documentation for further clarification on these parameters. You can override any of the parameters by specifying them in your vars.yml file.
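For illustration only, such an override in vars.yml could look like the sketch below. The variable names shown are hypothetical placeholders; copy the real names from roles/generate-image-builder-config/defaults/main.yml for your TKG release:

```yaml
# Hypothetical override sketch -- the actual variable names live in
# roles/generate-image-builder-config/defaults/main.yml and are bound
# to a specific TKG release; do not copy these names verbatim.
some_release_bound_version: "v1.24.10+vmware.1"      # hypothetical variable name
some_release_bound_artifact_url: "http://example"    # hypothetical variable name
```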
Execute the Playbook:
ansible-playbook playbook.yml
Example output:
[WARNING]: provided hosts list is empty, only localhost is available. Note that the implicit localhost does not match 'all'
PLAY [Generate Configurations and Prerequisites for Windows TKG Workload Clusters] ***************************************************************************
TASK [Gathering Facts] ***************************************************************************************************************************************
ok: [localhost]
TASK [generate-image-builder-config : Get Kubernetes Control Plane nodes] ************************************************************************************
ok: [localhost]
TASK [generate-image-builder-config : Extract a Control Plane node's IP address] *****************************************************************************
ok: [localhost]
TASK [generate-image-builder-config : Deploy TKG Windows Resource Bundle] ************************************************************************************
changed: [localhost]
TASK [generate-image-builder-config : Wait for TKG Windows Resource Bundle deployment] ***********************************************************************
ok: [localhost]
TASK [generate-image-builder-config : Wait for TKG Windows Resource Bundle deployment] ***********************************************************************
ok: [localhost]
TASK [generate-image-builder-config : Wait for TKG Windows Resource Bundle website to be available and retrieve JSON output] *********************************
ok: [localhost]
TASK [generate-image-builder-config : Convert TKG Windows Resource Bundle output to JSON] ********************************************************************
ok: [localhost]
TASK [generate-image-builder-config : Parse TKG Windows Resource Bundle component details] *******************************************************************
ok: [localhost]
TASK [generate-image-builder-config : Generate Windows JSON] *************************************************************************************************
ok: [localhost]
TASK [generate-image-builder-config : Create output directory for generated files] ***************************************************************************
ok: [localhost]
TASK [generate-image-builder-config : Generate windows.json] *************************************************************************************************
changed: [localhost]
TASK [generate-image-builder-config : Generate autounattend.xml content] *************************************************************************************
ok: [localhost]
TASK [generate-image-builder-config : Generate autounattend.xml] *********************************************************************************************
ok: [localhost]
TASK [generate-image-builder-config : Generate build-image.sh] ***********************************************************************************************
ok: [localhost]
TASK [generate-image-builder-config : Copy custom Ansible roles for Docker volume] ***************************************************************************
ok: [localhost]
TASK [upload-datastore-files : Create directory on datastore] ************************************************************************************************
ok: [localhost]
TASK [upload-datastore-files : Upload files to datastore] ****************************************************************************************************
changed: [localhost] => (item=/home/k8s/tkg/VMware-tools-windows-12.2.0-21223074.iso)
changed: [localhost] => (item=/home/k8s/tkg/en-us_windows_server_2019_updated_aug_2021_x64_dvd_a6431a28.iso)
PLAY RECAP ***************************************************************************************************************************************************
localhost : ok=18 changed=3 unreachable=0 failed=0 skipped=0 rescued=0 ignored=0
An output directory has now been created for you by the Ansible Playbook.
$ tree output/
output/
├── ansible
│ ├── playbook.yml
│ └── roles
│ ├── csi-proxy-setup
│ │ ├── defaults
│ │ │ └── main.yml
│ │ ├── files
│ │ │ ├── csi-proxy.exe
│ │ │ └── exe_version_notes.txt
│ │ ├── playbook.yaml
│ │ └── tasks
│ │ └── main.yml
│ └── win-regedit
│ ├── defaults
│ │ └── main.yml
│ ├── tasks
│ │ └── main.yml
│ └── vars
│ └── main.yml
├── autounattend.xml
├── build-image.sh
└── windows.json
10 directories, 12 files
If you look at build-image.sh and windows.json, you will see the templated Packer configuration files and the shell script you will use to trigger the Packer build process.
Also, notice the ansible directory. This directory contains the customizations/additions we include in our build. The ansible directory will be mounted to the Docker container responsible for building the image.
windows.json example content:
{
"additional_download_files": "",
"additional_executables": "true",
"additional_executables_destination_path": "C:\\ProgramData\\Temp",
"additional_executables_list": "http://172.16.52.174:30008/files/antrea-windows/antrea-windows-advanced.zip,http://172.16.52.174:30008/files/kubernetes/kube-proxy.exe",
"additional_prepull_images": "mcr.microsoft.com/windows/servercore:ltsc2019",
"ansible_user_vars": "node_custom_roles_post='/home/imagebuilder/ansible/windows/custom_roles/csi-proxy-setup /home/imagebuilder/ansible/windows/custom_roles/win-regedit'",
"build_version": "windows-2019-kube-v1.24.10",
"cloudbase_init_url": "http://172.16.52.174:30008/files/cloudbase_init/CloudbaseInitSetup_1_1_4_x64.msi",
"cluster": "LAB-V3",
"containerd_sha256_windows": "d29f5276584e869a5933db668fd6f17b7417c48ac04dd1c2a2c7f412f948f89c",
"containerd_url": "http://172.16.52.174:30008/files/containerd/cri-containerd-v1.6.6+vmware.3.windows-amd64.tar",
"convert_to_template": "true",
"create_snapshot": "false",
"datacenter": "Main",
"datastore": "LAB-V3-vSANDatastore",
"disable_hypervisor": "false",
"disk_size": "81920",
"folder": "LABS/itay/l01/tkg/templates",
"goss_inspect_mode": "true",
"insecure_connection": "true",
"kubernetes_base_url": "http://172.16.52.174:30008/files/kubernetes/",
"kubernetes_semver": "v1.24.10+vmware.1",
"kubernetes_series": "v1.24.10",
"linked_clone": "false",
"load_additional_components": "true",
"netbios_host_name_compatibility": "false",
"network": "itay-k8s-mgmt",
"os_iso_path": "[LAB-V3-vSANDatastore] iso/tkg-windows/en-us_windows_server_2019_updated_aug_2021_x64_dvd_a6431a28.iso",
"password": "your-vsphere-password",
"pause_image": "mcr.microsoft.com/oss/kubernetes/pause:3.6",
"prepull": "false",
"resource_pool": "US",
"runtime": "containerd",
"template": "",
"unattend_timezone": "Eastern Standard Time",
"username": "your-vsphere-username",
"vcenter_server": "[email protected]",
"vmtools_iso_path": "[LAB-V3-vSANDatastore] iso/tkg-windows/VMware-tools-windows-12.2.0-21223074.iso",
"windows_updates_categories": "CriticalUpdates SecurityUpdates UpdateRollups",
"windows_updates_kbs": "",
"wins_url": "http://172.16.52.174:30008/files/wins/wins.exe",
"wins_version": "0.0.4"
}
You didn’t have to fill in all that information manually, though. Ansible did that for you. :)
If you are curious about the builder pod (the web server hosting the Windows artifacts), you can browse (or curl) to any of your management cluster node IPs via port 30008 to view the available content on that web server. For example:
# Get the IP address of any node in the cluster
NODE_IP_ADDRESS=$(kubectl get nodes -o jsonpath="{.items[0].status.addresses[?(@.type=='ExternalIP')].address}")
# Build URL
BUILDER_URL=http://$NODE_IP_ADDRESS:30008
# Invoke the request using 'curl' and beautify using 'jq'
curl -s "$BUILDER_URL" | jq
Example output:
{
"nssm": {
"version": "v2.24",
"path": "files/nssm/nssm.exe"
},
"wins": {
"version": "v0.4.11",
"path": "files/wins/wins.exe"
},
"ssh": {
"version": "v9.2.0.0p1-Beta",
"path": "files/ssh/OpenSSH-Win64.zip"
},
"cloudbase_init": {
"version": "1.1.4",
"path": "files/cloudbase_init/CloudbaseInitSetup_1_1_4_x64.msi"
},
"goss": {
"version": "v0.3.21",
"path": "files/goss/goss-alpha-windows-amd64.exe"
},
"containerd": {
"version": "v1.6.6+vmware.3",
"path": "files/containerd/cri-containerd-v1.6.6+vmware.3.windows-amd64.tar",
"sha256": "d29f5276584e869a5933db668fd6f17b7417c48ac04dd1c2a2c7f412f948f89c"
},
"antrea-windows": {
"version": "v1.7.2+vmware.1-advanced",
"path": "files/antrea-windows/antrea-windows-advanced.zip",
"sha256": "c556ec4960e6a4bcdef1bde1ad0975615948aab83e4c5a20a25d864988001b3a"
},
"kubelet": {
"version": "v1.24.10+vmware.1",
"path": "files/kubernetes/kubelet.exe",
"sha256": "4612625b473634729002c37a606ecd3223aa45f37306a865ae4c273507017ae5"
},
"kubeadm": {
"version": "v1.24.10+vmware.1",
"path": "files/kubernetes/kubeadm.exe",
"sha256": "c18ab789a5cfa2f5883291e8371623ecdfdceef4ca41b34d34e6491ca501dc75"
},
"kubectl": {
"version": "v1.24.10+vmware.1",
"path": "files/kubernetes/kubectl.exe",
"sha256": "9a97de8d3c88e6f75bd96692738ec3e9c030fcb216ff02f1f2a3d108d600e6b1"
},
"kube-proxy": {
"version": "v1.24.10+vmware.1",
"path": "files/kubernetes/kube-proxy.exe",
"sha256": "3a14fc13181bfbb39246e81461b5351740c833243f5a4354f68c194685a329d2"
}
}
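Because the bundle listing is plain JSON, it is easy to script against. As a small example, the snippet below uses jq to pull a single artifact path out of the listing; an inline, trimmed copy of the JSON above stands in for a live curl against the builder pod:

```shell
# Extract one artifact's path from the resource bundle listing.
# The inline JSON is a trimmed copy of the bundle output shown above;
# against a live cluster you would pipe `curl -s "$BUILDER_URL"` instead.
BUNDLE_JSON='{
  "containerd": {
    "version": "v1.6.6+vmware.3",
    "path": "files/containerd/cri-containerd-v1.6.6+vmware.3.windows-amd64.tar"
  }
}'
echo "$BUNDLE_JSON" | jq -r '.containerd.path'
# -> files/containerd/cri-containerd-v1.6.6+vmware.3.windows-amd64.tar
```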
The Windows image build will retrieve these artifacts from this web server during the build.
Run the build-image.sh shell script to build your image.
./output/build-image.sh
Your build process is now running Packer and Ansible inside a Docker container. This may take a while.
Example output:
...
vsphere: output will be in this color.
==> vsphere: the vm/template LABS/itay/l01/tkg/templates/windows-2019-kube-v1.24.10 already exists, but deleting it due to -force flag
==> vsphere: LABS/itay/l01/tkg/templates/windows-2019-kube-v1.24.10 is a template, attempting to convert it to a vm
==> vsphere: Creating VM...
==> vsphere: Customizing hardware...
==> vsphere: Mounting ISO images...
==> vsphere: Adding configuration parameters...
==> vsphere: Creating floppy disk...
vsphere: Copying files flatly from floppy_files
vsphere: Copying file: ./packer/ova/windows/windows-2019/autounattend.xml
vsphere: Copying file: ./packer/ova/windows/disable-network-discovery.cmd
vsphere: Copying file: ./packer/ova/windows/disable-winrm.ps1
vsphere: Copying file: ./packer/ova/windows/enable-winrm.ps1
vsphere: Copying file: ./packer/ova/windows/sysprep.ps1
vsphere: Done copying files from floppy_files
vsphere: Collecting paths from floppy_dirs
vsphere: Resulting paths from floppy_dirs : [./packer/ova/windows/pvscsi]
vsphere: Recursively copying : ./packer/ova/windows/pvscsi
vsphere: Done copying paths from floppy_dirs
vsphere: Copying files from floppy_content
vsphere: Done copying files from floppy_content
==> vsphere: Uploading created floppy image
==> vsphere: Adding generated Floppy...
==> vsphere: Set boot order temporary...
==> vsphere: Power on VM...
==> vsphere: Waiting for IP...
...
In vSphere, you can see your Windows image being prepared.
If you run docker ps from a different terminal, you will see the running container.
For example:
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
fbd09e298f59 projects.registry.vmware.com/tkg/image-builder:v0.1.13_vmware.2 "/usr/bin/make build…" About a minute ago Up About a minute great_bose
If you exec into the container, you can see our custom_roles directory mounted on the container.
docker exec -it <your-container-id> bash
ls /home/imagebuilder/ansible/windows/custom_roles
Output:
csi-proxy-setup win-regedit
This directory contains the custom logic we add to the build process, as described above.
Once the build process is finished, the output of your build should be similar to the following:
vsphere (shell-local): Opening OVF source: windows-2019-kube-v1.24.10+vmware.1.ovf
vsphere (shell-local): Opening OVA target: windows-2019-kube-v1.24.10+vmware.1.ova
vsphere (shell-local): Writing OVA package: windows-2019-kube-v1.24.10+vmware.1.ova
vsphere (shell-local): Transfer Completed
vsphere (shell-local): Completed successfully
vsphere (shell-local): image-build-ova: cd .
vsphere (shell-local): image-build-ova: loaded windows-2019-kube-v1.24.10+vmware.1
vsphere (shell-local): image-build-ova: create ovf windows-2019-kube-v1.24.10+vmware.1.ovf
vsphere (shell-local): image-build-ova: creating OVA from windows-2019-kube-v1.24.10+vmware.1.ovf using ovftool
vsphere (shell-local): image-build-ova: create ova checksum windows-2019-kube-v1.24.10+vmware.1.ova.sha256
2023/03/28 04:56:21 [INFO] (telemetry) ending shell-local
==> Wait completed after 50 minutes 20 seconds
Build 'vsphere' finished after 50 minutes 20 seconds.
And your Windows machine should now be converted to a template.
Wrap Up
You can also leverage the approach described in this post in any of your automated pipelines for generating your Windows images (such as Jenkins, vRA/vRO, etc.). Again, the main goal is to eliminate the burden of manually maintaining such processes.
You can even leverage native Kubernetes Cluster API (CAPI) capabilities to trigger a rollout of your Kubernetes nodes by patching your Windows MachineDeployment resource from your TKG management cluster whenever a new Windows image is generated.
For example:
From your management cluster, you can get the present MachineDeployments.
kubectl get md
Example output:
NAME CLUSTER REPLICAS READY UPDATED UNAVAILABLE PHASE AGE VERSION
l02-tkg-wld-multi-md-0-lin-qbkr5 l02-tkg-wld-multi 3 3 3 0 Running 3d14h v1.24.10+vmware.1
l02-tkg-wld-multi-md-1-win-wz2nv l02-tkg-wld-multi 3 3 3 0 Running 3d14h v1.24.10+vmware.1
l02-tkg-wld-multi-md-1-win-wz2nv is my Windows MachineDeployment in this case, so I can run something like the following to trigger a rollout for my Windows nodes:
WIN_MD_NAME=l02-tkg-wld-multi-md-1-win-wz2nv
kubectl patch machinedeployment "$WIN_MD_NAME" --type merge -p "{\"spec\":{\"template\":{\"metadata\":{\"annotations\":{\"date\":\"`date +'%s'`\"}}}}}"
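This works because Cluster API treats any change to spec.template (including its metadata annotations) as a new machine spec and rolls the MachineDeployment's nodes accordingly. Broken out of the one-liner above, the merge patch being sent is simply:

```shell
# Build the same JSON merge patch the kubectl one-liner sends; the
# annotation value is the current Unix epoch, so every run yields a
# template change and therefore a node rollout.
PATCH="{\"spec\":{\"template\":{\"metadata\":{\"annotations\":{\"date\":\"$(date +%s)\"}}}}}"
echo "$PATCH"
# kubectl patch machinedeployment "$WIN_MD_NAME" --type merge -p "$PATCH"
```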
When a rollout is initiated, the running nodes are replaced with new nodes based on the newly created or updated Windows image.
If you are looking into Windows on TKG/Kubernetes, I hope this guide gives you a good starting point as a reference.


