Upgrading NSX ALB in a TKG Environment
For quite a long time, the highest NSX ALB version supported by TKG was 20.1.6/20.1.3, even though 21.1.x had been available for a while, and I had been wondering when TKG would support it.
I recently noticed a note regarding NSX ALB 21.1.x that was added to the TKG 1.5.4 release notes, under the Configuration variables section:
AVI_CONTROLLER_VERSION sets the NSX Advanced Load Balancer (ALB) version for NSX ALB v21.1.x deployments in Tanzu Kubernetes Grid.
However, I couldn't find any official reference for upgrading an existing NSX ALB instance in a TKG environment, so I did my own research. I found two references to the NSX ALB controller version on my TKG management cluster:
In the AKO Operator Add-on secret:
kubectl get secret tkg-mgmt-cls-ako-operator-addon -n tkg-system -o jsonpath='{.data.values\.yaml}' | base64 -d

#@data/values
#@overlay/match-child-defaults missing_ok=True
---
akoOperator:
  avi_enable: true
  namespace: tkg-system-networking
  cluster_name: tkg-mgmt-cls
  config:
    avi_disable_ingress_class: true
    avi_ingress_default_ingress_controller: false
    avi_ingress_shard_vs_size: ""
    avi_ingress_service_type: ""
    avi_ingress_node_network_list: '""'
    avi_admin_credential_name: avi-controller-credentials
    avi_ca_name: avi-controller-ca
    avi_controller: it-nsxalb-ctrl.terasky.demo
    avi_username: admin
    avi_password: sample-password
    avi_cloud_name: Default-Cloud
    avi_service_engine_group: Default-Group
    avi_management_cluster_service_engine_group: Default-Group
    avi_data_network: k8s-vips
    avi_data_network_cidr: 10.100.154.0/24
    avi_control_plane_network: k8s-vips
    avi_control_plane_network_cidr: 10.100.154.0/24
    avi_ca_data_b64: LS0tLS1CRUdJTiBDR......
    avi_labels: '""'
    avi_disable_static_route_sync: true
    avi_cni_plugin: antrea
    avi_control_plane_ha_provider: true
    avi_management_cluster_vip_network_name: k8s-vips
    avi_management_cluster_vip_network_cidr: 10.100.154.0/24
    avi_management_cluster_control_plane_vip_network_name: k8s-vips
    avi_management_cluster_control_plane_vip_network_cidr: 10.100.154.0/24
    avi_control_plane_endpoint_port: 6443
    avi_controller_version: 20.1.3 # The NSX ALB controller version

In the AKO Deployment Configs:
kubectl get akodeploymentconfigs.networking.tkg.tanzu.vmware.com install-ako-for-all -o yaml
# And
kubectl get akodeploymentconfigs.networking.tkg.tanzu.vmware.com install-ako-for-management-cluster -o yaml

apiVersion: networking.tkg.tanzu.vmware.com/v1alpha1
kind: AKODeploymentConfig
metadata:
  ...
  name: install-ako-for-all
  ...
spec:
  adminCredentialRef:
    name: avi-controller-credentials
    namespace: tkg-system-networking
  certificateAuthorityRef:
    name: avi-controller-ca
    namespace: tkg-system-networking
  cloudName: Default-Cloud
  controlPlaneNetwork:
    cidr: 10.100.154.0/24
    name: k8s-vips
  controller: it-nsxalb-ctrl.terasky.demo
  controllerVersion: 20.1.3 # The NSX ALB controller version
  dataNetwork:
    cidr: 10.100.154.0/24
    name: k8s-vips
  extraConfigs:
    cniPlugin: antrea
    disableStaticRouteSync: true
    ingress:
      defaultIngressController: false
      disableIngressClass: true
    l4Config:
      autoFQDN: disabled
    networksConfig: {}
  serviceEngineGroup: Default-Group
As you can see, both contain a reference to the default NSX ALB controller version, which is 20.1.3. Since the AKO Operator uses the 20.1.3 API when interacting with NSX ALB, I realized that upgrading NSX ALB without updating the AKO Operator to match the new version might break compatibility between the two, so I came up with this shell script. The script takes two inputs, the TKG management cluster name and the target NSX ALB controller version, and patches the AKO Operator Add-on secret and the AKODeploymentConfig resources using kubectl. The script is executed only on the management cluster; once the AKO Operator configuration is updated and the ako-operator, load-balancer-and-ingress-service, and tanzu-addons-manager packages are reconciled, kapp-controller applies the new configuration to all workload clusters.
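The full script is on GitHub (see the instructions below), but to illustrate the idea, here is a minimal sketch of the kind of patching it performs. It is not the actual script: the resource names come from the examples above, and the reconciliation step assumes the Carvel kctrl CLI is available, which is just one of several ways to kick the packages.

#!/usr/bin/env bash
# Minimal sketch only - not the actual script. Updates the NSX ALB controller
# version referenced by TKG on the management cluster.
set -euo pipefail

MGMT_CLUSTER="${1:?management cluster name, e.g. tkg-mgmt-cls}"
ALB_VERSION="${2:?target NSX ALB controller version, e.g. 21.1.4}"

# 1. Patch the AKO Operator add-on secret: decode values.yaml, replace
#    avi_controller_version, re-encode, and patch it back in.
SECRET="${MGMT_CLUSTER}-ako-operator-addon"
NEW_VALUES=$(kubectl get secret "${SECRET}" -n tkg-system \
  -o jsonpath='{.data.values\.yaml}' | base64 -d \
  | sed -E "s/^( *avi_controller_version:).*/\1 ${ALB_VERSION}/" \
  | base64 | tr -d '\n')
kubectl patch secret "${SECRET}" -n tkg-system --type merge \
  -p "{\"data\":{\"values.yaml\":\"${NEW_VALUES}\"}}"

# 2. Patch controllerVersion in every AKODeploymentConfig.
for adc in $(kubectl get akodeploymentconfigs.networking.tkg.tanzu.vmware.com -o name); do
  kubectl patch "${adc}" --type merge \
    -p "{\"spec\":{\"controllerVersion\":\"${ALB_VERSION}\"}}"
done

# 3. Trigger reconciliation so kapp-controller rolls the change out
#    (assumes the Carvel kctrl CLI; the actual script takes care of this step).
for app in ako-operator load-balancer-and-ingress-service tanzu-addons-manager; do
  kctrl app kick -a "${app}" -n tkg-system -y
done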
Upgrade Instructions
For this example, I am upgrading a 3-node NSX ALB cluster from version 20.1.x to 21.1.4.
First, obtain the relevant upgrade package from https://portal.avipulse.vmware.com/software/vantage. It is typically a .pkg file.
Before starting the upgrade, log in to any of the NSX ALB controllers, go to Administration > Controller > Nodes, and ensure all NSX ALB controllers are healthy and active.
Go to Administration > Controller > Software and click Upload From Computer, then select your upgrade file, (e.g. controller-21.1.4-2p3-9009.pkg).
Wait for the upload to complete.
Once the upload completes, go to Administration > Controller > System Update, select the new version you have just uploaded at the bottom, and click Upgrade.
In the upgrade dialog, ensure the Upgrade All Service Engine Groups option is selected and keep the defaults, then click Continue.
Review any warnings raised by the pre-checks. It is usually safe to proceed. Click Confirm when ready.
Wait for the upgrade to complete.
The NSX ALB cluster VIP will be unavailable during the upgrade. However, if you wish to monitor the upgrade process, you can browse to any of the controllers. You should see the Upgrade in progress page.
Once the upgrade completes, you can access the UI from the cluster VIP address and observe the status of the nodes under Administration > Controller > Nodes. Ensure all nodes are active.
If you have also just upgraded to version 21.x.x, you have probably noticed the new UI. As of NSX ALB 21.x.x, the UI is Clarity-based, like many other VMware products these days.
Updating TKG Configuration
Now that NSX ALB is upgraded, you must update the TKG configuration to reflect the version you upgraded to.
As mentioned before, you can do so using my script, which is available on GitHub.
Clone my TKG GitHub repository and cd into vmware-tkg/helpers/tkg-nsxalb-upgrade.
Ensure the script is executable on your machine.
chmod +x tkg-update-nsxalb-version.sh
Execute the script using the following syntax:
./tkg-update-nsxalb-version.sh <TKG_MGMT_CLUSTER_NAME> <NSXALB_CONTROLLER_VERSION>
For example:
./tkg-update-nsxalb-version.sh tkg-mgmt-cls '21.1.4'
Example output:
Base directory: .
✔ successfully logged in to management cluster using the kubeconfig tkg-mgmt-cls
Checking for required plugins...
All required plugins are already installed and up-to-date
Tanzu context tkg-mgmt-cls has been set
Setting kubectl context
Switched to context "tkg-mgmt-cls-admin@tkg-mgmt-cls".
kubectl context tkg-mgmt-cls-admin@tkg-mgmt-cls has been set
Patching AKO Operator config
secret/tkg-mgmt-cls-ako-operator-addon patched
Patching AKODeploymentConfig resources
Patching resource 'akodeploymentconfig.networking.tkg.tanzu.vmware.com/install-ako-for-all'
akodeploymentconfig.networking.tkg.tanzu.vmware.com/install-ako-for-all patched
Patching resource 'akodeploymentconfig.networking.tkg.tanzu.vmware.com/install-ako-for-management-cluster'
akodeploymentconfig.networking.tkg.tanzu.vmware.com/install-ako-for-management-cluster patched
Target cluster 'https://10.100.154.230:6443' (nodes: tkg-mgmt-cls-control-plane-cvcmd, 5+)
App 'ako-operator' is owned by 'PackageInstall/ako-operator'
Triggering reconciliation for app 'ako-operator' in namespace 'tkg-system'
2:03:00AM: Triggering reconciliation for app 'ako-operator' in namespace 'tkg-system'
2:03:00AM: Waiting for app reconciliation for 'ako-operator'
2:03:56AM: Fetching
| apiVersion: vendir.k14s.io/v1alpha1
| directories:
| - contents:
| - imgpkgBundle:
| image: projects.registry.vmware.com/tkg/packages/core/ako-operator@sha256:f1fd17e8de5b92f66c566050c557fd688ffe75205a97d2a646569d3587108462
| path: .
| path: "0"
| kind: LockConfig
|
2:03:56AM: Fetch succeeded
2:03:56AM: Template succeeded
2:03:56AM: Deploy started (2s ago)
2:04:16AM: Deploying
| Target cluster 'https://100.64.0.1:443' (nodes: tkg-mgmt-cls-control-plane-cvcmd, 5+)
| Changes
| Namespace Name Kind Conds. Age Op Op st. Wait to Rs Ri
| (cluster) akodeploymentconfigs.networking.tkg.tanzu.vmware.com CustomResourceDefinition 0/0 t 3h - - reconcile ongoing Condition Established is not set
| Op: 0 create, 0 delete, 0 update, 1 noop
| Wait to: 1 reconcile, 0 delete, 0 noop
| 2:04:14AM: ---- applying 1 changes [0/1 done] ----
| 2:04:14AM: noop customresourcedefinition/akodeploymentconfigs.networking.tkg.tanzu.vmware.com (apiextensions.k8s.io/v1) cluster
| 2:04:14AM: ---- waiting on 1 changes [0/1 done] ----
2:04:18AM: Deploying
| Target cluster 'https://100.64.0.1:443' (nodes: tkg-mgmt-cls-control-plane-cvcmd, 5+)
| Changes
| Namespace Name Kind Conds. Age Op Op st. Wait to Rs Ri
| (cluster) akodeploymentconfigs.networking.tkg.tanzu.vmware.com CustomResourceDefinition 0/0 t 3h - - reconcile ongoing Condition Established is not set
| Op: 0 create, 0 delete, 0 update, 1 noop
| Wait to: 1 reconcile, 0 delete, 0 noop
| 2:04:14AM: ---- applying 1 changes [0/1 done] ----
| 2:04:14AM: noop customresourcedefinition/akodeploymentconfigs.networking.tkg.tanzu.vmware.com (apiextensions.k8s.io/v1) cluster
| 2:04:14AM: ---- waiting on 1 changes [0/1 done] ----
| 2:04:18AM: ok: reconcile customresourcedefinition/akodeploymentconfigs.networking.tkg.tanzu.vmware.com (apiextensions.k8s.io/v1) cluster
| 2:04:18AM: ---- applying complete [1/1 done] ----
| 2:04:18AM: ---- waiting complete [1/1 done] ----
| Succeeded
2:04:18AM: Deploy succeeded (1s ago)
Succeeded
Target cluster 'https://10.100.154.230:6443' (nodes: tkg-mgmt-cls-control-plane-cvcmd, 5+)
App 'load-balancer-and-ingress-service' is owned by 'PackageInstall/load-balancer-and-ingress-service'
Triggering reconciliation for app 'load-balancer-and-ingress-service' in namespace 'tkg-system'
2:04:19AM: Triggering reconciliation for app 'load-balancer-and-ingress-service' in namespace 'tkg-system'
2:04:19AM: Waiting for app reconciliation for 'load-balancer-and-ingress-service'
2:04:28AM: Fetching
| apiVersion: vendir.k14s.io/v1alpha1
| directories:
| - contents:
| - imgpkgBundle:
| image: projects.registry.vmware.com/tkg/packages/core/load-balancer-and-ingress-service@sha256:10bbc6abb07ea096ca82924ad0d44881d4c076131751c46c0dc64b1b57275423
| path: .
| path: "0"
| kind: LockConfig
|
2:04:28AM: Fetch succeeded
2:04:28AM: Template succeeded
2:04:28AM: Deploy started (2s ago)
2:04:48AM: Deploying
| Target cluster 'https://100.64.0.1:443' (nodes: tkg-mgmt-cls-control-plane-cvcmd, 5+)
| Changes
| Namespace Name Kind Conds. Age Op Op st. Wait to Rs Ri
| (cluster) gatewayclasses.networking.x-k8s.io CustomResourceDefinition 0/0 t 3h - - reconcile ongoing Condition Established is not set
| ^ gateways.networking.x-k8s.io CustomResourceDefinition 0/0 t 3h - - reconcile ongoing Condition Established is not set
| Op: 0 create, 0 delete, 0 update, 2 noop
| Wait to: 2 reconcile, 0 delete, 0 noop
| 2:04:47AM: ---- applying 2 changes [0/2 done] ----
| 2:04:47AM: noop customresourcedefinition/gatewayclasses.networking.x-k8s.io (apiextensions.k8s.io/v1) cluster
| 2:04:47AM: noop customresourcedefinition/gateways.networking.x-k8s.io (apiextensions.k8s.io/v1) cluster
| 2:04:47AM: ---- waiting on 2 changes [0/2 done] ----
2:04:50AM: Deploying
| Target cluster 'https://100.64.0.1:443' (nodes: tkg-mgmt-cls-control-plane-cvcmd, 5+)
| Changes
| Namespace Name Kind Conds. Age Op Op st. Wait to Rs Ri
| (cluster) gatewayclasses.networking.x-k8s.io CustomResourceDefinition 0/0 t 3h - - reconcile ongoing Condition Established is not set
| ^ gateways.networking.x-k8s.io CustomResourceDefinition 0/0 t 3h - - reconcile ongoing Condition Established is not set
| Op: 0 create, 0 delete, 0 update, 2 noop
| Wait to: 2 reconcile, 0 delete, 0 noop
| 2:04:47AM: ---- applying 2 changes [0/2 done] ----
| 2:04:47AM: noop customresourcedefinition/gatewayclasses.networking.x-k8s.io (apiextensions.k8s.io/v1) cluster
| 2:04:47AM: noop customresourcedefinition/gateways.networking.x-k8s.io (apiextensions.k8s.io/v1) cluster
| 2:04:47AM: ---- waiting on 2 changes [0/2 done] ----
| 2:04:50AM: ok: reconcile customresourcedefinition/gatewayclasses.networking.x-k8s.io (apiextensions.k8s.io/v1) cluster
| 2:04:50AM: ok: reconcile customresourcedefinition/gateways.networking.x-k8s.io (apiextensions.k8s.io/v1) cluster
| 2:04:50AM: ---- applying complete [2/2 done] ----
| 2:04:50AM: ---- waiting complete [2/2 done] ----
| Succeeded
2:04:50AM: Deploy succeeded (1s ago)
Succeeded
Target cluster 'https://10.100.154.230:6443' (nodes: tkg-mgmt-cls-control-plane-cvcmd, 5+)
App 'tanzu-addons-manager' is owned by 'PackageInstall/tanzu-addons-manager'
Triggering reconciliation for app 'tanzu-addons-manager' in namespace 'tkg-system'
2:04:51AM: Triggering reconciliation for app 'tanzu-addons-manager' in namespace 'tkg-system'
2:04:52AM: Waiting for app reconciliation for 'tanzu-addons-manager'
2:04:52AM: Waiting for generation 4 to be observed
2:05:01AM: Fetching
| apiVersion: vendir.k14s.io/v1alpha1
| directories:
| - contents:
| - imgpkgBundle:
| image: projects.registry.vmware.com/tkg/packages/core/addons-manager@sha256:248662abcbf966fdda0b342906a6b70c19f94459f8f4d6a8d78210e6ae23c694
| path: .
| path: "0"
| kind: LockConfig
|
2:05:01AM: Fetch succeeded
2:05:01AM: Template succeeded
2:05:01AM: Deploy started (2s ago)
2:05:19AM: Deploying
| Target cluster 'https://100.64.0.1:443' (nodes: tkg-mgmt-cls-control-plane-cvcmd, 5+)
| Changes
| Namespace Name Kind Conds. Age Op Op st. Wait to Rs Ri
| Op: 0 create, 0 delete, 0 update, 0 noop
| Wait to: 0 reconcile, 0 delete, 0 noop
| Succeeded
2:05:19AM: Deploy succeeded (2s ago)
Succeeded
Done!
That’s it. If you now look at the AKO Operator Add-on secret and the AKO Deployment Configs, you will see that the controller version has been updated. You can do so using the same commands I mentioned before:
kubectl get secret tkg-mgmt-cls-ako-operator-addon -n tkg-system -o jsonpath='{.data.values\.yaml}' | base64 -d
kubectl get akodeploymentconfigs.networking.tkg.tanzu.vmware.com install-ako-for-all -o yaml
kubectl get akodeploymentconfigs.networking.tkg.tanzu.vmware.com install-ako-for-management-cluster -o yaml
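To quickly confirm just the version fields, you can also filter the output, for example (assuming the same resource names as above):

kubectl get secret tkg-mgmt-cls-ako-operator-addon -n tkg-system -o jsonpath='{.data.values\.yaml}' | base64 -d | grep avi_controller_version
kubectl get akodeploymentconfigs.networking.tkg.tanzu.vmware.com -o custom-columns='NAME:.metadata.name,VERSION:.spec.controllerVersion'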
For new TKG management clusters, ensure the AVI_CONTROLLER_VERSION parameter in your cluster config YAML file is set to the NSX ALB controller version, for example: AVI_CONTROLLER_VERSION: 21.1.4. According to the TKG 1.6 release notes, this is no longer required, as the NSX ALB version will be detected automatically.
I hope this helps anyone looking to upgrade NSX ALB in a TKG environment, and hopefully, this process will be automated within TKG at some point.










