Replacing your vCenter server certificate? TKG needs to know about it…
I recently ran into an issue where TKGm suddenly failed to connect to the vCenter server.
The issue turned out to be TLS-related, and I noticed that the vCenter server certificate had been replaced…
Because of the certificate issue, the Cluster API components could not communicate with vSphere, so cluster reconciliation and other vSphere-related operations failed.
Since all TKG clusters in the environment were deployed with the VSPHERE_TLS_THUMBPRINT parameter specified, replacing the vCenter certificate breaks the connection to vSphere: the new certificate has a different TLS thumbprint, which no longer matches the one the clusters were deployed with.
Unfortunately, there is no official reference for updating the TLS thumbprint in TKG, so I did some research of my own.
I found the TLS thumbprint in two places on the TKG management cluster: the vspherecluster custom resources and the CPI (Cloud Provider Interface) add-on secrets.
$ kubectl describe vspherecluster tkg-wld-cls
...
  Owner References:
    API Version:  cluster.x-k8s.io/v1beta1
    ...
Spec:
  Identity Ref:
    Kind:      Secret
    Name:      tkg-wld-cls
  Server:      my-vc-01.domain.com
  Thumbprint:  26:3A:FF:3E:01:84:36:F5:BC:18:80:27:0E:14:59:AB:8E:88:EB:46
Status:
  Conditions:
    Last Transition Time:  2022-06-10T01:53:26Z
    Status:                True
    ...
$ kubectl get secret tkg-wld-cls-vsphere-cpi-addon -o jsonpath='{.data.values\.yaml}' | base64 -d
#@data/values
#@overlay/match-child-defaults missing_ok=True
---
vsphereCPI:
  tlsThumbprint: 26:3A:FF:3E:01:84:36:F5:BC:18:80:27:0E:14:59:AB:8E:88:EB:46
  server: my-vc-01.domain.com
  datacenter: /my-datacenter
  ...
In case you’re wondering (as I was), the TLS thumbprint is, surprisingly, not present in the vSphere CSI config. At least as of TKGm 1.5.4, vSphere CSI appears to skip TLS verification, even if the TLS thumbprint is specified in your cluster config, so there is currently nothing to update for vSphere CSI.
After gathering this information, I realized that manually updating the new thumbprint on every cluster would be tedious, so I automated the process with a script.
The script loops through all TKG clusters and patches the vspherecluster custom resources and the CPI add-on secrets to replace the old TLS thumbprint with the new one.
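To illustrate what such a patch involves, here is a minimal sketch; it is not the author's script. It assumes your kubectl context already points at the management cluster, and that each cluster's CPI secret is named <cluster>-vsphere-cpi-addon and lives in the same namespace as its vspherecluster object, as in the examples above. The function names are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical sketch (not the author's script): update the vCenter TLS thumbprint
# stored on the management cluster for one or all TKG clusters.
# Assumes the current kubectl context points at the management cluster.

# Replace the value of the tlsThumbprint key in a CPI values.yaml read from stdin.
rewrite_cpi_values() {
  local new_tp="$1"
  sed -E "s/^([[:space:]]*tlsThumbprint:).*/\1 ${new_tp}/"
}

# Patch the CPI add-on secret and the VSphereCluster object of one cluster.
# Assumes the secret is named <cluster>-vsphere-cpi-addon (as in the examples above).
update_cluster_thumbprint() {
  local ns="$1" cluster="$2" new_tp="$3"
  local secret="${cluster}-vsphere-cpi-addon"
  local values encoded

  # Decode the current values.yaml; stop if nothing came back so the
  # secret is never overwritten with empty data.
  values="$(kubectl get secret "$secret" -n "$ns" \
    -o jsonpath='{.data.values\.yaml}' | base64 -d)"
  if [ -z "$values" ]; then
    echo "no values.yaml found in secret ${ns}/${secret}" >&2
    return 1
  fi

  # Swap the thumbprint and re-encode as a single-line base64 string.
  encoded="$(printf '%s\n' "$values" | rewrite_cpi_values "$new_tp" | base64 | tr -d '\n')"

  kubectl patch secret "$secret" -n "$ns" --type merge \
    -p "{\"data\":{\"values.yaml\":\"${encoded}\"}}"
  kubectl patch vspherecluster "$cluster" -n "$ns" --type merge \
    -p "{\"spec\":{\"thumbprint\":\"${new_tp}\"}}"
}

# Apply the update to every VSphereCluster in every namespace.
update_all_clusters() {
  local new_tp="$1" ns cluster
  kubectl get vsphereclusters -A \
    -o jsonpath='{range .items[*]}{.metadata.namespace}{" "}{.metadata.name}{"\n"}{end}' \
    | while read -r ns cluster; do
        update_cluster_thumbprint "$ns" "$cluster" "$new_tp" </dev/null
      done
}
```

Patching spec.thumbprint with a merge patch leaves the rest of the vspherecluster spec untouched, while the secret has to be decoded, edited, and re-encoded because values.yaml is stored as a single base64 blob.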
You can find the script and detailed instructions in my TKG repository on GitHub.
Usage Instructions
Clone my GitHub repository and cd into vmware-tkg/helpers/tkg-update-vcenter-tls-thumbprint.
git clone https://github.com/itaytalmi/vmware-tkg.git
cd vmware-tkg/helpers/tkg-update-vcenter-tls-thumbprint
Ensure the script is executable.
chmod +x tkg-update-vcenter-tls-thumbprint.sh
Retrieve the new vCenter TLS thumbprint. The thumbprint must be in exactly this format:
26:3A:FF:3E:01:84:36:F5:BC:18:80:27:0E:14:59:AB:8E:1B:9E:53
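If you want to catch a malformed value before running the script, a small check like the one below may help. It is a hypothetical helper, not part of the author's repository, and rejecting lower-case hex is an assumption based on the example above.

```shell
# Hypothetical helper (not part of the author's repository): succeed only if $1
# matches the format above, i.e. 20 colon-separated pairs of upper-case hex
# digits (a SHA-1 thumbprint).
is_valid_thumbprint() {
  # 59 characters = 40 hex digits + 19 colons; the length check also rejects
  # multi-line input that grep would otherwise match line by line.
  [ "${#1}" -eq 59 ] &&
    printf '%s\n' "$1" | grep -Eq '^([0-9ABCDEF]{2}:){19}[0-9ABCDEF]{2}$'
}

# Example (NEW_TP is a placeholder): abort early if the value looks wrong.
# is_valid_thumbprint "$NEW_TP" || { echo "unexpected thumbprint format" >&2; exit 1; }
```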
You can easily extract the thumbprint in this format using govc. For example:
export GOVC_INSECURE=true
export GOVC_URL=your_vcenter_fqdn
export GOVC_USERNAME=your_vsphere_user
export GOVC_PASSWORD=your_vsphere_password
govc about.cert -thumbprint
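If govc is not installed, openssl can usually read the same SHA-1 thumbprint. This is an alternative sketch, not what the author used; it assumes vCenter serves its certificate on port 443, and the function names are hypothetical. openssl prints a label such as "SHA1 Fingerprint=" before the value, and since the exact label may vary between OpenSSL versions, the helper strips everything up to the first "=".

```shell
# Hypothetical alternative to govc (not the author's method).
# Strip the "<digest> Fingerprint=" label that openssl prints before the value.
fingerprint_value() {
  sed 's/^[^=]*=//'
}

# Print the SHA-1 thumbprint of the certificate served on <host>:443 as
# colon-separated upper-case hex pairs.
vcenter_thumbprint() {
  local host="$1"
  openssl s_client -connect "${host}:443" -servername "$host" </dev/null 2>/dev/null \
    | openssl x509 -noout -fingerprint -sha1 \
    | fingerprint_value
}

# Example (hypothetical host name):
# vcenter_thumbprint my-vc-01.domain.com
```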
Execute the script using the following syntax:
./tkg-update-vcenter-tls-thumbprint.sh <TKG_MGMT_CLUSTER_NAME> <VCENTER_TLS_THUMBPRINT>
For example:
./tkg-update-vcenter-tls-thumbprint.sh tkg-mgmt-cls '26:3A:FF:3E:01:84:36:F5:BC:18:80:27:0E:14:59:AB:8E:1B:9E:53'
Note: Make sure you wrap the thumbprint in single quotes, as shown above.
Example output:
✔ successfully logged in to management cluster using the kubeconfig tkg-mgmt-cls
Checking for required plugins...
All required plugins are already installed and up-to-date
Tanzu context tkg-mgmt-cls has been set
Setting kubectl context
Switched to context "tkg-mgmt-cls-admin@tkg-mgmt-cls".
kubectl context tkg-mgmt-cls-admin@tkg-mgmt-cls has been set
Updating vCenter TLS thumbprint for workload cluster 'tkg-shared-services-cls'
secret/tkg-shared-services-cls-vsphere-cpi-addon patched
vspherecluster.infrastructure.cluster.x-k8s.io/tkg-shared-services-cls patched
Updating vCenter TLS thumbprint for workload cluster 'tkg-win-wld-cls'
secret/tkg-win-wld-cls-vsphere-cpi-addon patched
vspherecluster.infrastructure.cluster.x-k8s.io/tkg-win-wld-cls patched
Updating vCenter TLS thumbprint for workload cluster 'tkg-wld-cls'
secret/tkg-wld-cls-vsphere-cpi-addon patched
vspherecluster.infrastructure.cluster.x-k8s.io/tkg-wld-cls patched
secret/tkg-mgmt-cls-vsphere-cpi-addon patched
vspherecluster.infrastructure.cluster.x-k8s.io/tkg-mgmt-cls patched
Running the script causes the CPI to reconcile on all clusters. Once the CPI has reconciled, it trusts the new TLS thumbprint.
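To confirm the change, you could list the thumbprint each vspherecluster object now stores and flag any that differ from the new one. This is a hypothetical check, not part of the author's script; it only inspects the vspherecluster objects (not the CPI secrets), assumes kubectl points at the management cluster, and uses illustrative function names.

```shell
# Hypothetical verification helpers (not part of the author's script).
# Print "namespace/name thumbprint" for every VSphereCluster.
list_thumbprints() {
  kubectl get vsphereclusters -A \
    -o jsonpath='{range .items[*]}{.metadata.namespace}{"/"}{.metadata.name}{" "}{.spec.thumbprint}{"\n"}{end}'
}

# Read "namespace/name thumbprint" lines on stdin, print every entry whose
# thumbprint differs from $1, and return non-zero if any mismatch was found.
report_mismatches() {
  local expected="$1" rc=0 name tp
  while read -r name tp; do
    [ -z "$name" ] && continue
    if [ "$tp" != "$expected" ]; then
      echo "MISMATCH ${name}: ${tp:-<empty>}"
      rc=1
    fi
  done
  return "$rc"
}

# Example (NEW_TP is a placeholder for the new thumbprint):
# list_thumbprints | report_mismatches "$NEW_TP" && echo "all clusters updated"
```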
Although I did not run into any issues after running the script, it has been reported in the TCE community that, in some cases, the cluster nodes must be rebooted for the new thumbprint to take effect.
