MinIO on vSphere - Automated Deployment and Onboarding

In the world of Kubernetes, reliable S3-compatible object storage is essential for tasks such as storing backups. However, not everyone has access to a native S3-compatible solution, and setting one up can feel daunting. MinIO, an open-source object storage solution, is a popular choice to fill this gap: its lightweight, high-performance architecture makes it an excellent option for Kubernetes users seeking quick and reliable storage.

MinIO is also one of the most widely adopted open-source object storage solutions, thanks to its simplicity and S3 compatibility, which makes it a good fit for Kubernetes environments that need a reliable, scalable storage layer for backups, logs, or other data.

Fixing Missing TKRs in Existing TKGS Deployments

2024-05-01 4 min read Cloud Native Kubernetes Tanzu TKG

I regularly check the Tanzu Kubernetes Releases (TKR) release notes page for updates. Yesterday, a new TKR was released with support for Kubernetes 1.28.8. When I tried to test this new version in my TKGS environment, I realized the TKR was not available. Normally, new TKRs become available for deployment immediately after release, since the vCenter server is subscribed to the VMware public content library that hosts all TKRs. This time, that was not the case, so I started investigating.

CAPV: Addressing Node Provisioning Issues Due to an Invalid State of ETCD

2023-12-01 7 min read Cloud Native Kubernetes Tanzu TKG

I recently ran into a strange scenario on a Kubernetes cluster after it crashed suddenly and unexpectedly due to an issue in the underlying vSphere environment. In this case, the cluster was a TKG cluster (in fact, it happened to be the TKG management cluster); however, the same situation could occur on any cluster managed by Cluster API Provider vSphere (CAPV).

I have seen clusters crash unexpectedly many times before, and most of the time they came back online once all nodes were up and running. In this case, however, some of the nodes could not boot properly, and Cluster API started attempting to reconcile them.

CAPV: Fixing and Cleaning Up Idle vCenter Server Sessions

2023-11-01 4 min read Cloud Native Kubernetes Tanzu TKG

I recently ran into an issue that caused the vCenter server to crash almost daily. What initially seemed to be a random vCenter issue turned out to be related to CAPV (Cluster API Provider vSphere) running on some of our Kubernetes clusters. It was also an edge case I had not seen before, so I decided to document and share it here.

Initially, the issue we were witnessing on the vCenter server was the following:

Streamlining and Customizing Windows Image Builder for TKG

2023-03-01 11 min read Cloud Native Kubernetes Tanzu TKG

Tanzu Kubernetes Grid (TKG) is one of the few platforms providing out-of-the-box support and streamlined deployment of Windows Kubernetes clusters. VMware is actively investing in this area and constantly improving the support and capabilities around Windows on Kubernetes.

Unlike Linux-based clusters, for which VMware provides pre-packaged base OS images (typically based on Ubuntu and Photon OS), Windows clusters get no pre-packaged images from VMware, presumably due to licensing restrictions. Therefore, building your own Windows base OS image is a prerequisite for deploying a TKG Windows workload cluster. Fortunately, VMware leverages the upstream Image Builder project, a fantastic collection of cross-provider Kubernetes virtual machine image-building utilities intended to simplify and streamline the creation of base OS images for Kubernetes.

Tanzu Kubernetes Grid GPU Integration

2023-03-01 16 min read Cloud Native Kubernetes Tanzu TKG

I recently had to demonstrate Tanzu Kubernetes Grid and its GPU integration capabilities. Developing a good use case and assembling the demo required some preliminary research.

During my research, I reached out to Jay Vyas, staff engineer at VMware, SIG Windows lead for Kubernetes, a Kubernetes legend, and an awesome guy in general. :) For those who don’t know Jay, he is also one of the authors of the fantastic book Core Kubernetes (look it up!).

Replacing your vCenter server certificate? TKG needs to know about it…

2023-01-01 3 min read Cloud Native Kubernetes Tanzu TKG

I recently ran into an issue where TKGm suddenly failed to connect to the vCenter server.

The issue turned out to be TLS-related, and I noticed that the vCenter server certificate had been replaced…

Because of the certificate issue, Cluster API components could not communicate with vSphere, causing cluster reconciliation and other vSphere-related operations to fail.

Since all TKG clusters in the environment were deployed with the VSPHERE_TLS_THUMBPRINT parameter specified, replacing the vCenter certificate breaks the connection to vSphere: the new certificate has a different thumbprint, which no longer matches the value the clusters were configured with.
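
A thumbprint is simply a hash of the certificate, so any replacement certificate, even one issued for the same hostname, yields a different value. The sketch below illustrates this, assuming the SHA-1, colon-separated uppercase hex format (the same format `openssl x509 -fingerprint -sha1` prints) is what VSPHERE_TLS_THUMBPRINT holds. It fetches the certificate vCenter currently serves and compares its thumbprint with the configured one. All function names are hypothetical helpers for illustration, not part of TKG or any VMware tooling.

```python
"""Hypothetical helpers: compute and compare a vCenter SHA-1 TLS thumbprint.

Assumption: VSPHERE_TLS_THUMBPRINT uses the SHA-1 fingerprint of the
DER-encoded certificate, formatted as colon-separated uppercase hex.
"""
import hashlib
import ssl


def thumbprint_from_der(der_cert: bytes) -> str:
    """Return the SHA-1 digest of DER bytes as colon-separated uppercase hex."""
    digest = hashlib.sha1(der_cert).hexdigest().upper()
    return ":".join(digest[i:i + 2] for i in range(0, len(digest), 2))


def thumbprint_from_pem(pem_cert: str) -> str:
    """Convert a PEM certificate to DER, then compute its thumbprint."""
    return thumbprint_from_der(ssl.PEM_cert_to_DER_cert(pem_cert))


def current_vcenter_thumbprint(host: str, port: int = 443) -> str:
    """Fetch the certificate the endpoint serves right now.

    get_server_certificate() without ca_certs does not validate the chain,
    which is fine here: we only want to hash whatever certificate is served.
    """
    pem = ssl.get_server_certificate((host, port))
    return thumbprint_from_pem(pem)


def thumbprint_matches(configured: str, current: str) -> bool:
    """Compare thumbprints case-insensitively, ignoring surrounding whitespace."""
    return configured.strip().upper() == current.strip().upper()
```

For example, `thumbprint_matches(configured, current_vcenter_thumbprint("vcenter.example.com"))` returning `False` would suggest the certificate has changed since the clusters were deployed; the hostname here is a placeholder.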
