<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Build. Run. Repeat.</title><link>https://buildrunrepeat.com/</link><description>Recent content on Build. Run. Repeat.</description><generator>Hugo -- gohugo.io</generator><language>en</language><lastBuildDate>Wed, 01 Jan 2025 09:00:00 -0400</lastBuildDate><atom:link href="https://buildrunrepeat.com/index.xml" rel="self" type="application/rss+xml"/><item><title>HashiCorp Consul Service Mesh on Kubernetes Series - Part 1 - Introduction and Setup</title><link>https://buildrunrepeat.com/posts/hashicorp-consul-k8s-service-mesh-series-01-intro-and-setup/</link><pubDate>Wed, 01 Jan 2025 09:00:00 -0400</pubDate><guid>https://buildrunrepeat.com/posts/hashicorp-consul-k8s-service-mesh-series-01-intro-and-setup/</guid><description>&lt;p&gt;Modern cloud-native architectures rely heavily on microservices, and Kubernetes has become the go-to platform for deploying, managing, and scaling these distributed applications. As the number of microservices grows, ensuring secure, reliable, and observable service-to-service communication becomes increasingly complex. This is where service mesh solutions, such as HashiCorp Consul, step in to provide a seamless approach to managing these challenges. In this blog post, we will delve into the integration of HashiCorp Consul Service Mesh with Kubernetes, exploring its architecture, features, and step-by-step deployment guide.&lt;/p&gt;</description></item><item><title>HashiCorp Consul Service Mesh on Kubernetes Series - Part 2 - Observability</title><link>https://buildrunrepeat.com/posts/hashicorp-consul-k8s-service-mesh-series-02-observability/</link><pubDate>Wed, 01 Jan 2025 09:00:00 -0400</pubDate><guid>https://buildrunrepeat.com/posts/hashicorp-consul-k8s-service-mesh-series-02-observability/</guid><description>&lt;p&gt;Modern service meshes require robust observability to ensure seamless operations, proactive troubleshooting, and performance optimization. 
In this section, we explore the observability features of HashiCorp Consul Service Mesh, including visualizing the service mesh, querying metrics, distributed tracing, and logging and auditing.&lt;/p&gt;
&lt;hr&gt;
&lt;h2 id="visualizing-the-service-mesh"&gt;Visualizing the Service Mesh&lt;/h2&gt;
&lt;p&gt;The Consul UI is used for visualizing the service mesh and its topology.&lt;/p&gt;
&lt;p&gt;Use the &lt;code&gt;watch&lt;/code&gt; command to send requests to the application continually. Make sure HTTP status code &lt;code&gt;200&lt;/code&gt; is returned in the output.&lt;/p&gt;</description></item><item><title>HashiCorp Consul Service Mesh on Kubernetes Series - Part 3 - Traffic Management</title><link>https://buildrunrepeat.com/posts/hashicorp-consul-k8s-service-mesh-series-03-traffic-mgmt/</link><pubDate>Wed, 01 Jan 2025 09:00:00 -0400</pubDate><guid>https://buildrunrepeat.com/posts/hashicorp-consul-k8s-service-mesh-series-03-traffic-mgmt/</guid><description>&lt;p&gt;Efficient traffic management is essential for maintaining application reliability, optimizing performance, and implementing advanced deployment strategies in a service mesh. HashiCorp Consul provides powerful traffic management capabilities through service routers, splitters, and resolvers. In this section, we explore request routing, traffic shifting, request timeouts, and circuit breaking.&lt;/p&gt;
&lt;hr&gt;
&lt;h2 id="request-routing"&gt;Request Routing&lt;/h2&gt;
&lt;p&gt;This section shows you how to route requests dynamically to multiple versions of a microservice.&lt;/p&gt;
&lt;p&gt;The Bookinfo sample consists of four separate microservices, each with multiple versions. Three different versions of one of the microservices, &lt;code&gt;reviews&lt;/code&gt;, have been deployed and are running concurrently. To illustrate the problem this causes, access the Bookinfo app&amp;rsquo;s &lt;code&gt;/productpage&lt;/code&gt; in a browser and refresh several times.&lt;/p&gt;</description></item><item><title>HashiCorp Consul Service Mesh on Kubernetes Series - Part 4 - Security</title><link>https://buildrunrepeat.com/posts/hashicorp-consul-k8s-service-mesh-series-04-security/</link><pubDate>Wed, 01 Jan 2025 09:00:00 -0400</pubDate><guid>https://buildrunrepeat.com/posts/hashicorp-consul-k8s-service-mesh-series-04-security/</guid><description>&lt;p&gt;Security is a fundamental aspect of any service mesh, ensuring that all service-to-service communication is secure, controlled, and auditable. HashiCorp Consul provides robust security features, including mutual TLS (mTLS), access control, and rate limiting.&lt;/p&gt;
&lt;hr&gt;
&lt;h2 id="mtls"&gt;mTLS&lt;/h2&gt;
&lt;p&gt;In this section, we will demonstrate mTLS with Consul. Consul enables and strictly enforces mTLS by default. All traffic sent through the Consul Connect Service Mesh is encrypted.&lt;/p&gt;
&lt;p&gt;This section is slightly different from the Istio mTLS section because:&lt;/p&gt;</description></item><item><title>HashiCorp Vault Enterprise - Performance Replication on Kubernetes</title><link>https://buildrunrepeat.com/posts/hashicorp-vault-enterprise-performance-replication-on-k8s/</link><pubDate>Wed, 01 Jan 2025 09:00:00 -0400</pubDate><guid>https://buildrunrepeat.com/posts/hashicorp-vault-enterprise-performance-replication-on-k8s/</guid><description>&lt;p&gt;This blog post dives into the technical implementation of Vault Enterprise replication within a Kubernetes environment. We’ll explore how to set up performance and disaster recovery replication, overcome common challenges, and ensure smooth synchronization between clusters. Whether you’re aiming for redundancy or better data locality, this guide will equip you with the insights and tools needed to leverage Vault’s enterprise-grade features in Kubernetes effectively.&lt;/p&gt;
&lt;h2 id="architecture"&gt;Architecture&lt;/h2&gt;
&lt;p&gt;
&lt;a href="https://buildrunrepeat.com/posts/hashicorp-vault-enterprise-performance-replication-on-k8s/images/001.png" data-dimbox data-dimbox-caption="Screenshot"&gt;
&lt;img alt="Screenshot" src="https://buildrunrepeat.com/posts/hashicorp-vault-enterprise-performance-replication-on-k8s/images/001.png"/&gt;
&lt;/a&gt;
&lt;/p&gt;
&lt;h2 id="prerequisites"&gt;Prerequisites&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;2 Kubernetes clusters. Note: for simulation purposes, you can also use a single Kubernetes cluster with multiple namespaces to host both Vault clusters.&lt;/li&gt;
&lt;li&gt;Helm installed&lt;/li&gt;
&lt;li&gt;kubectl installed&lt;/li&gt;
&lt;li&gt;Vault CLI installed&lt;/li&gt;
&lt;li&gt;jq installed&lt;/li&gt;
&lt;li&gt;Vault Enterprise license&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Note: for this implementation, LoadBalancer services are used on Kubernetes to expose the Vault services (the API/UI and the cluster address for replication). It is highly recommended to use a LoadBalancer rather than ingress to expose the cluster address for replication. Vault itself performs the TLS termination as the TLS certificates are mounted to the Vault pods from Kubernetes. Additionally, note that when enabling the replication, the primary cluster points to the secondary cluster address (port 8201) and not the API/UI address (port 8200). When the secondary cluster applies the replication token, however, it points to the API/UI address (port 8200) to unwrap it and complete the setup of the replication. We will see this in more detail in the implementation section.&lt;/p&gt;</description></item><item><title>Harbor Registry - Automating LDAP/S Configuration - Part 1</title><link>https://buildrunrepeat.com/posts/harbor-registry-automating-ldap-configuration-part-1/</link><pubDate>Fri, 01 Nov 2024 09:00:00 -0400</pubDate><guid>https://buildrunrepeat.com/posts/harbor-registry-automating-ldap-configuration-part-1/</guid><description>&lt;p&gt;The Harbor Registry is involved in many of my Kubernetes implementations in the field, and in almost every implementation I am asked about the options to configure LDAP/S authentication for the registry. Unfortunately, neither the community Helm chart nor the Tanzu Harbor package provides native inputs for this setup. Fortunately, the Harbor REST API enables LDAP configuration programmatically. 
Automating this process ensures consistency across environments, faster deployments, and reduced chances of human error.&lt;/p&gt;</description></item><item><title>MinIO on vSphere - Automated Deployment and Onboarding</title><link>https://buildrunrepeat.com/posts/minio-on-vsphere-automated-deployment-and-onboarding/</link><pubDate>Fri, 01 Nov 2024 09:00:00 -0400</pubDate><guid>https://buildrunrepeat.com/posts/minio-on-vsphere-automated-deployment-and-onboarding/</guid><description>&lt;p&gt;In the world of Kubernetes, reliable S3-compliant object storage is essential for tasks like storing backups. However, not everyone has access to a native S3-compatible solution, and setting one up can feel like a daunting task. MinIO, an open-source object storage solution, is a popular choice to fill this gap. Its lightweight, high-performance architecture makes it an excellent option for Kubernetes users seeking quick and reliable storage.&lt;/p&gt;
&lt;p&gt;MinIO is also one of the most widely adopted open-source object storage solutions, thanks to its simplicity and S3 compatibility. It’s perfect for Kubernetes environments that need a reliable and scalable storage layer for backups, logs, or other data.&lt;/p&gt;</description></item><item><title>Fixing Missing TKRs in Existing TKGS Deployments</title><link>https://buildrunrepeat.com/posts/fixing-missing-tkrs-in-existing-tkgs-deployment/</link><pubDate>Wed, 01 May 2024 09:00:00 -0400</pubDate><guid>https://buildrunrepeat.com/posts/fixing-missing-tkrs-in-existing-tkgs-deployment/</guid><description>&lt;p&gt;I regularly check the &lt;a href="https://docs.vmware.com/en/VMware-Tanzu-Kubernetes-releases/services/rn/vmware-tanzu-kubernetes-releases-release-notes/index.html"&gt;Tanzu Kubernetes Releases (TKR) release notes page&lt;/a&gt; for new updates.
Yesterday, a new TKR was released with support for Kubernetes 1.28.8. While attempting to test this new version in my TKGS environment, I realized that the TKR was not present and started wondering why. Normally, new TKRs become available for deployment as soon as they are released, since the vCenter is subscribed to the VMware public content library where all the TKRs are hosted. This time, that was not the case, so I started investigating.&lt;/p&gt;</description></item><item><title>HashiCorp Vault Intermediate CA Setup with Cert-Manager and Microsoft Root CA</title><link>https://buildrunrepeat.com/posts/hashicorp-vault-intermediate-ca-setup-with-cert-manager-and-ms-root-ca/</link><pubDate>Mon, 01 Jan 2024 09:00:00 -0400</pubDate><guid>https://buildrunrepeat.com/posts/hashicorp-vault-intermediate-ca-setup-with-cert-manager-and-ms-root-ca/</guid><description>&lt;p&gt;In this post, we&amp;rsquo;ll explore how to set up HashiCorp Vault as an Intermediate Certificate Authority (CA) on a Kubernetes cluster, using a Microsoft CA as the Root CA. We&amp;rsquo;ll then integrate this setup with cert-manager, a powerful Kubernetes add-on for automating the management and issuance of TLS certificates.&lt;/p&gt;
&lt;p&gt;The following is an architecture diagram for the use case I&amp;rsquo;ve built.&lt;/p&gt;
&lt;p&gt;
&lt;a href="https://buildrunrepeat.com/posts/hashicorp-vault-intermediate-ca-setup-with-cert-manager-and-ms-root-ca/images/019.png" data-dimbox data-dimbox-caption="Screenshot"&gt;
&lt;img alt="Screenshot" src="https://buildrunrepeat.com/posts/hashicorp-vault-intermediate-ca-setup-with-cert-manager-and-ms-root-ca/images/019.png"/&gt;
&lt;/a&gt;
&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;A Microsoft Windows server is used as the Root CA of the environment.&lt;/li&gt;
&lt;li&gt;A Kubernetes cluster hosting shared/common services, including HashiCorp Vault. This is a cluster that can serve many other purposes/solutions, consumed by other clusters. The Vault server is deployed on this cluster and serves as an intermediate CA server, under the Microsoft Root CA server.&lt;/li&gt;
&lt;li&gt;A second Kubernetes cluster hosting the application(s). Cert-Manager is deployed on this cluster, integrated with Vault, and handles the management and issuance of TLS certificates against Vault using the ClusterIssuer resource. A web application, exposed via ingress, is running on this cluster. The ingress resource consumes its TLS certificate from Vault.&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id="prerequisites"&gt;Prerequisites&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;At least one running Kubernetes cluster. To follow along, you will need two Kubernetes clusters, one serving as the shared services cluster and the other as the workload/application cluster.&lt;/li&gt;
&lt;li&gt;Access to a Microsoft Root Certificate Authority (CA).&lt;/li&gt;
&lt;li&gt;The Helm CLI installed.&lt;/li&gt;
&lt;li&gt;Clone my &lt;a href="https://github.com/itaytalmi/k8s-vault-int-ca.git"&gt;GitHub repository&lt;/a&gt;. This repository contains all involved manifests, files and configurations needed.&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id="setting-up-hashicorp-vault-as-intermediate-ca"&gt;Setting Up HashiCorp Vault as Intermediate CA&lt;/h2&gt;
&lt;h3 id="deploy-initialize-and-configure-vault"&gt;Deploy Initialize and Configure Vault&lt;/h3&gt;
&lt;p&gt;Install the Vault CLI. In the following example, Linux Ubuntu is used. If you are using a different operating system, refer to &lt;a href="https://developer.hashicorp.com/vault/install"&gt;these instructions&lt;/a&gt;.&lt;/p&gt;</description></item><item><title>Using HashiCorp Vault as Ingress TLS Certificate Issuer in TAP</title><link>https://buildrunrepeat.com/posts/tap-using-hashicorp-vault-as-ingress-tls-certificate-issuer/</link><pubDate>Mon, 01 Jan 2024 09:00:00 -0400</pubDate><guid>https://buildrunrepeat.com/posts/tap-using-hashicorp-vault-as-ingress-tls-certificate-issuer/</guid><description>&lt;h1 id="using-hashicorp-vault-as-ingress-tls-certificate-issuer-in-tap"&gt;Using HashiCorp Vault as Ingress TLS Certificate Issuer in TAP&lt;/h1&gt;
&lt;p&gt;Tanzu Application Platform (TAP) uses Contour HTTPProxy resources to expose several web components externally via ingress. Some of these components include the API Auto Registration, API Portal, Application Live View, Metadata Store, and TAP GUI. Web workloads deployed through TAP also leverage the same method for their ingress resources. For example:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-text" data-lang="text"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;$ kubectl get httpproxy -A
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;NAMESPACE NAME FQDN TLS SECRET STATUS STATUS DESCRIPTION
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;api-auto-registration api-auto-registration-controller api-auto-registration.tap.cloudnativeapps.cloud api-auto-registration-cert valid Valid HTTPProxy
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;api-portal api-portal api-portal.tap.cloudnativeapps.cloud api-portal-tls-cert valid Valid HTTPProxy
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;app-live-view appliveview appliveview.tap.cloudnativeapps.cloud appliveview-cert valid Valid HTTPProxy
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;metadata-store amr-cloudevent-handler-ingress amr-cloudevent-handler.tap.cloudnativeapps.cloud amr-cloudevent-handler-ingress-cert valid Valid HTTPProxy
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;metadata-store amr-graphql-ingress amr-graphql.tap.cloudnativeapps.cloud amr-ingress-cert valid Valid HTTPProxy
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;metadata-store metadata-store-ingress metadata-store.tap.cloudnativeapps.cloud ingress-cert valid Valid HTTPProxy
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;tap-demo-01 spring-petclinic-contour-76691bbb1936a7b010ca900ce58a3f57spring spring-petclinic.tap-demo-01.svc.cluster.local valid Valid HTTPProxy
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;tap-demo-01 spring-petclinic-contour-88f827fbdc09abbb4ee2b887bba100edspring spring-petclinic.tap-demo-01.tap.cloudnativeapps.cloud tap-demo-01/route-a4b7b2c7-0a56-48b9-ad26-6b0e06ca1925 valid Valid HTTPProxy
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;tap-demo-01 spring-petclinic-contour-spring-petclinic.tap-demo-01 spring-petclinic.tap-demo-01 valid Valid HTTPProxy
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;tap-demo-01 spring-petclinic-contour-spring-petclinic.tap-demo-01.svc spring-petclinic.tap-demo-01.svc valid Valid HTTPProxy
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;tap-gui tap-gui tap-gui.tap.cloudnativeapps.cloud tap-gui-cert valid Valid HTTPProxy
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;TAP uses a shared ingress issuer as a centralized certificate authority representation, providing a method to set up TLS for the entire platform. All participating components get their ingress certificates issued by it. This is the recommended best practice for issuing ingress certificates on the platform.&lt;/p&gt;</description></item><item><title>CAPV: Addressing Node Provisioning Issues Due to an Invalid State of ETCD</title><link>https://buildrunrepeat.com/posts/capv-addressing-node-provisioning-issues-due-to-invalid-state-of-etcd/</link><pubDate>Fri, 01 Dec 2023 09:00:00 -0400</pubDate><guid>https://buildrunrepeat.com/posts/capv-addressing-node-provisioning-issues-due-to-invalid-state-of-etcd/</guid><description>&lt;p&gt;I recently ran into a strange scenario on a Kubernetes cluster after a sudden and unexpected crash it had experienced due to an issue in the underlying vSphere environment. In this case, the cluster was a TKG cluster (in fact, it happened to be the TKG management cluster). However, the same situation could have occurred on any cluster managed by Cluster API Provider vSphere (CAPV).&lt;/p&gt;
&lt;p&gt;I have seen clusters crash unexpectedly many times before, and most of the time they came back online once all nodes were up and running. In this case, however, some of the nodes could not boot properly, and Cluster API started attempting their reconciliation.&lt;/p&gt;</description></item><item><title>CAPV: Fixing and Cleaning Up Idle vCenter Server Sessions</title><link>https://buildrunrepeat.com/posts/capv-fixing-and-cleaning-up-idle-vcenter-sessions/</link><pubDate>Wed, 01 Nov 2023 09:00:00 -0400</pubDate><guid>https://buildrunrepeat.com/posts/capv-fixing-and-cleaning-up-idle-vcenter-sessions/</guid><description>&lt;p&gt;I recently ran into an issue causing the vCenter server to crash almost daily. What initially seemed to be a random vCenter issue turned out to be related to CAPV (Cluster API Provider vSphere), which was running on some of our Kubernetes clusters. That was also an edge case I had not seen before, so I decided to document and share it here.&lt;/p&gt;
&lt;p&gt;Initially, the issue we were witnessing on the vCenter server was the following:&lt;/p&gt;</description></item><item><title>TKG 2.3: Fixing the Prometheus Data Source in the Grafana Package</title><link>https://buildrunrepeat.com/posts/tkg-2-3-fixing-the-prometheus-data-source-in-the-grafana-package/</link><pubDate>Fri, 01 Sep 2023 09:00:00 -0400</pubDate><guid>https://buildrunrepeat.com/posts/tkg-2-3-fixing-the-prometheus-data-source-in-the-grafana-package/</guid><description>&lt;p&gt;With the release of TKG 2.3, the Grafana package was finally updated from version 7.5.x to 9.5.1.
If you have deployed the new Grafana package (&lt;code&gt;9.5.1+vmware.2-tkg.1&lt;/code&gt;) or upgraded your existing one to this version, you may have run into error messages in your Grafana dashboards.&lt;/p&gt;
&lt;p&gt;For example, in the &lt;code&gt;TKG Kubernetes cluster monitoring&lt;/code&gt; default dashboard, you may have run into the &lt;code&gt;Failed to call resource&lt;/code&gt; error when opening the dashboard and noticed that a lot of the data is missing.&lt;/p&gt;</description></item><item><title>TKG: Updating Pinniped Configuration and Addressing Common Issues</title><link>https://buildrunrepeat.com/posts/tkg-updating-pinniped-config-and-addressing-common-issues/</link><pubDate>Thu, 01 Jun 2023 09:00:00 -0400</pubDate><guid>https://buildrunrepeat.com/posts/tkg-updating-pinniped-config-and-addressing-common-issues/</guid><description>&lt;p&gt;Most of the TKG engagements I&amp;rsquo;ve been involved in included Pinniped for Kubernetes authentication.
On many occasions, I have seen issues where the configuration provided to Pinniped was incorrect or partially incorrect. For example, common issues may be related to the LDAPS integration. Many environments I have seen utilize Active Directory as the authentication source, and Pinniped requires the LDAPS certificate, username, and password, which are often specified incorrectly. Since this configuration is not validated during the deployment, you end up with an invalid state of Pinniped on your management cluster.&lt;/p&gt;</description></item><item><title>Streamlining and Customizing Windows Image Builder for TKG</title><link>https://buildrunrepeat.com/posts/streamlining-and-customizing-windows-image-builder-in-tkg/</link><pubDate>Wed, 01 Mar 2023 09:00:00 -0400</pubDate><guid>https://buildrunrepeat.com/posts/streamlining-and-customizing-windows-image-builder-in-tkg/</guid><description>&lt;p&gt;Tanzu Kubernetes Grid (TKG) is one of the few platforms providing out-of-the-box support and streamlined deployment of Windows Kubernetes clusters. VMware is actively investing in this area and constantly improving the support and capabilities around Windows on Kubernetes.&lt;/p&gt;
&lt;p&gt;Unlike Linux-based clusters, for which VMware provides pre-packaged base OS images (typically based on Ubuntu and Photon OS), VMware cannot offer Windows pre-packaged images, primarily due to licensing restrictions, I suppose. Therefore, building your own Windows base OS image is one of the prerequisites for deploying a TKG Windows workload cluster.
Fortunately, VMware leverages the &lt;a href="https://github.com/kubernetes-sigs/image-builder"&gt;upstream Image Builder project&lt;/a&gt; - a fantastic collection of cross-provider Kubernetes virtual machine image-building utilities intended to simplify and streamline the creation of base OS images for Kubernetes.&lt;/p&gt;</description></item><item><title>Tanzu Kubernetes Grid GPU Integration</title><link>https://buildrunrepeat.com/posts/tkg-gpu-integration/</link><pubDate>Wed, 01 Mar 2023 09:00:00 -0400</pubDate><guid>https://buildrunrepeat.com/posts/tkg-gpu-integration/</guid><description>&lt;p&gt;I recently had to demonstrate Tanzu Kubernetes Grid and its GPU integration capabilities.
Developing a good use case and assembling the demo required some preliminary research.&lt;/p&gt;
&lt;p&gt;During my research, I reached out to Jay Vyas, staff engineer at VMware, SIG Windows lead for Kubernetes, a Kubernetes legend, and an awesome guy in general. :) For those who don&amp;rsquo;t know Jay, he is also one of the authors of the fantastic book &lt;code&gt;Core Kubernetes&lt;/code&gt; (look it up!).&lt;/p&gt;</description></item><item><title>Backstage Introduction, KubeCon &amp; CloudNativeCon Europe 2022</title><link>https://buildrunrepeat.com/posts/backstage-introduction-kubecon-cloudnativecon-europe-2022/</link><pubDate>Sun, 01 Jan 2023 09:00:00 -0400</pubDate><guid>https://buildrunrepeat.com/posts/backstage-introduction-kubecon-cloudnativecon-europe-2022/</guid><description>&lt;p&gt;Thanks to TeraSky’s education program, I recently attended KubeCon &amp;amp; CloudNativeCon Europe 2022 in Valencia, Spain.&lt;/p&gt;
&lt;p&gt;The experience was incredible. While there were many interesting technical sessions on many exciting topics, I was most curious about &lt;a href="https://github.com/backstage/backstage"&gt;Backstage&lt;/a&gt; - which has sparked my interest ever since I started exploring VMware Tanzu Application Platform (TAP).&lt;/p&gt;
&lt;p&gt;I decided to attend a session entitled &amp;ldquo;Backstage: Restoring Order to Your Chaos&amp;rdquo;, given by Spotify software engineer Dave Zolotusky. Going into the session, I was stunned by the huge line of people trying to get into the room. That was something I had never seen before in any other session. Fortunately, I was lucky enough to secure one of the last seats.&lt;/p&gt;</description></item><item><title>Getting Started with Carvel ytt - Real-World Examples</title><link>https://buildrunrepeat.com/posts/getting-started-with-carvel-ytt-real-world-examples/</link><pubDate>Sun, 01 Jan 2023 09:00:00 -0400</pubDate><guid>https://buildrunrepeat.com/posts/getting-started-with-carvel-ytt-real-world-examples/</guid><description>&lt;p&gt;Over the years of working with Tanzu Kubernetes Grid (TKG), one tool has stood out as a game-changer for resource customization: Carvel’s ytt. Whether tailoring cluster manifests, customizing TKG packages, or addressing unique deployment requirements, ytt has consistently been a fundamental part of the workflow. Its flexibility, power, and declarative approach make it an essential tool for anyone working deeply with Kubernetes in a TKG ecosystem.&lt;/p&gt;
&lt;p&gt;But what exactly is ytt? Short for &lt;code&gt;YAML Templating Tool&lt;/code&gt;, ytt is part of the Carvel suite of tools designed for Kubernetes resource management. It provides a powerful, programmable approach to templating YAML configurations by combining straightforward data values, overlays, and scripting capabilities. Unlike many traditional templating tools, ytt prioritizes structure and intent, making it easier to maintain, validate, and debug configurations—particularly in complex, large-scale Kubernetes environments.&lt;/p&gt;</description></item><item><title>Harbor Registry – Automating LDAP/S Configuration – Part 2</title><link>https://buildrunrepeat.com/posts/harbor-registry-automating-ldap-configuration-part-2/</link><pubDate>Sun, 01 Jan 2023 09:00:00 -0400</pubDate><guid>https://buildrunrepeat.com/posts/harbor-registry-automating-ldap-configuration-part-2/</guid><description>&lt;p&gt;This post continues our two-part series on automating LDAP configuration for Harbor Registry. In the &lt;a href="https://buildrunrepeat.com/posts/harbor-registry-automating-ldap-configuration-part-1/"&gt;previous post&lt;/a&gt;, we demonstrated how to achieve this using Ansible, running externally. However, external automation has its challenges, such as firewall restrictions or limited API access in some cases/environments.&lt;/p&gt;
&lt;p&gt;Note: make sure you review the previous post as it provides a lot of additional background and clarifications on this process, LDAPS configuration, and more.&lt;/p&gt;
&lt;p&gt;Here, we explore an alternative approach using Terraform, running the automation directly inside the Kubernetes cluster hosting Harbor.
This method leverages native Kubernetes scheduling capabilities for running the configuration job in a fully declarative approach and does not require any network access to Harbor from the machine running the job.&lt;/p&gt;</description></item><item><title>Replacing your vCenter server certificate? TKG needs to know about it…</title><link>https://buildrunrepeat.com/posts/replacing-your-vcenter-server-certificate-tkg-needs-to-know-about-it/</link><pubDate>Sun, 01 Jan 2023 09:00:00 -0400</pubDate><guid>https://buildrunrepeat.com/posts/replacing-your-vcenter-server-certificate-tkg-needs-to-know-about-it/</guid><description>&lt;p&gt;I recently ran into an issue where TKGm had suddenly failed to connect to the vCenter server.&lt;/p&gt;
&lt;p&gt;The issue turned out to be TLS-related, and I noticed that the vCenter server certificate had been replaced&amp;hellip;&lt;/p&gt;
&lt;p&gt;Due to the certificate issue, Cluster API components failed to communicate with vSphere, causing cluster reconciliation to fail, among other vSphere-related operations.&lt;/p&gt;
&lt;p&gt;Since all TKG clusters in the environment were deployed with the &lt;code&gt;VSPHERE_TLS_THUMBPRINT&lt;/code&gt; parameter specified, replacing the vCenter certificate breaks the connection to vSphere, as the TLS thumbprint changes as well.&lt;/p&gt;</description></item><item><title>Upgrading NSX ALB in a TKG Environment</title><link>https://buildrunrepeat.com/posts/upgrading-nsx-alb-in-a-tkg-environment/</link><pubDate>Thu, 01 Sep 2022 09:00:00 -0400</pubDate><guid>https://buildrunrepeat.com/posts/upgrading-nsx-alb-in-a-tkg-environment/</guid><description>&lt;p&gt;For quite a long time, the highest NSX ALB version supported by TKG was &lt;code&gt;20.1.6/20.1.3&lt;/code&gt;, although &lt;code&gt;21.1.x&lt;/code&gt; had been available for a while, and I had been wondering when TKG would support it.
In the TKG &lt;code&gt;1.5.4&lt;/code&gt; release notes, I recently noticed a new note regarding NSX ALB &lt;code&gt;21.1.x&lt;/code&gt; under the &lt;code&gt;Configuration variables&lt;/code&gt; section:&lt;/p&gt;
&lt;p&gt;&lt;em&gt;&lt;code&gt;AVI_CONTROLLER_VERSION&lt;/code&gt; sets the NSX Advanced Load Balancer (ALB) version for NSX ALB v21.1.x deployments in Tanzu Kubernetes Grid.&lt;/em&gt;&lt;/p&gt;</description></item><item><title>Customizing Elasticsearch indices using Fluent-Bit in TKG</title><link>https://buildrunrepeat.com/posts/customizing-elasticsearch-indices-using-fluent-bit-in-tkg/</link><pubDate>Mon, 01 Aug 2022 09:00:00 -0400</pubDate><guid>https://buildrunrepeat.com/posts/customizing-elasticsearch-indices-using-fluent-bit-in-tkg/</guid><description>&lt;p&gt;Fluent-Bit is currently the preferred option for log shipping in TKG and is provided out of the box as a Tanzu package that can be easily deployed on each TKG/Kubernetes cluster.&lt;/p&gt;
&lt;p&gt;A recent implementation required shipping all Kubernetes logs to Elasticsearch, complying with a specific naming convention for the Elasticsearch indices.&lt;/p&gt;
&lt;p&gt;Applying such customizations requires you to utilize the &lt;a href="https://docs.fluentbit.io/manual/pipeline/filters/lua"&gt;Lua filter&lt;/a&gt;. Using the Lua filter, you can modify incoming records by invoking custom scripts to apply your logic when processing the records.&lt;/p&gt;</description></item><item><title>Getting Harbor to trust your LDAPS certificate in TKG</title><link>https://buildrunrepeat.com/posts/getting-harbor-to-trust-your-ldaps-certificate-in-tkg/</link><pubDate>Mon, 01 Aug 2022 09:00:00 -0400</pubDate><guid>https://buildrunrepeat.com/posts/getting-harbor-to-trust-your-ldaps-certificate-in-tkg/</guid><description>&lt;p&gt;In a recent TKG implementation, I was required to configure Harbor with LDAPS rather than LDAP.&lt;/p&gt;
&lt;p&gt;I deployed the Harbor package on the TKG shared services cluster and configured LDAP. However, when testing the connection, I received an error message that was not informative at all:&lt;/p&gt;
&lt;pre tabindex="0"&gt;&lt;code&gt;Failed to verify LDAP server with error: error: ldap server network timeout.
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;
&lt;a href="https://buildrunrepeat.com/posts/getting-harbor-to-trust-your-ldaps-certificate-in-tkg/images/001.png" data-dimbox data-dimbox-caption="Screenshot"&gt;
&lt;img alt="Screenshot" src="https://buildrunrepeat.com/posts/getting-harbor-to-trust-your-ldaps-certificate-in-tkg/images/001.png"/&gt;
&lt;/a&gt;
&lt;/p&gt;
&lt;p&gt;Although the error message doesn&amp;rsquo;t explicitly say there&amp;rsquo;s a certificate issue and there is nothing in the &lt;code&gt;harbor-core&lt;/code&gt; container logs, it immediately made sense to me that the &lt;code&gt;harbor-core&lt;/code&gt; container didn&amp;rsquo;t trust my LDAPS/CA certificate, so I started investigating how the certificate could be injected somehow into Harbor. The Harbor package doesn&amp;rsquo;t have any input for the LDAPS/CA certificate in its data values file, so I knew I had to create &lt;a href="https://github.com/itaytalmi/vmware-tkg/blob/main/ytt-overlays/tkg-packages/harbor/ldaps-overlay/overlay-harbor-ldaps-cert.yaml"&gt;my own YTT overlay&lt;/a&gt;.&lt;/p&gt;</description></item><item><title>Getting kapp-controller to trust your CA certificates in TKG</title><link>https://buildrunrepeat.com/posts/getting-kapp-controller-to-trust-your-ca-certificates-in-tkg/</link><pubDate>Mon, 01 Aug 2022 09:00:00 -0400</pubDate><guid>https://buildrunrepeat.com/posts/getting-kapp-controller-to-trust-your-ca-certificates-in-tkg/</guid><description>&lt;p&gt;Have you ever had to deploy a package using kapp-controller from your Harbor private registry?&lt;/p&gt;
&lt;p&gt;I recently deployed the Tanzu RabbitMQ package to a TKGm workload cluster in an air-gapped/internet-restricted environment.&lt;/p&gt;
&lt;p&gt;Doing so in air-gapped environments requires you to push the packages into Harbor, then have kapp-controller deploy the package from Harbor.&lt;/p&gt;
&lt;p&gt;After adding the PackageRepository referencing my Harbor registry, I observed that it couldn&amp;rsquo;t finish reconciling due to a certificate issue.&lt;/p&gt;</description></item><item><title>Harbor Registry: is your LDAP user unique?</title><link>https://buildrunrepeat.com/posts/harbor-registry-is-your-ldap-user-unique/</link><pubDate>Mon, 01 Aug 2022 09:00:00 -0400</pubDate><guid>https://buildrunrepeat.com/posts/harbor-registry-is-your-ldap-user-unique/</guid><description>&lt;p&gt;A recent project I was working on required granting different permission levels to several Active Directory service accounts on the Harbor registry, so that some could only pull images from the registry while others could also push, and so on.&lt;/p&gt;
&lt;p&gt;On the Harbor project, I had the following configuration for my users:&lt;/p&gt;
&lt;p&gt;
&lt;a href="https://buildrunrepeat.com/posts/harbor-registry-is-your-ldap-user-unique/images/001.png" data-dimbox data-dimbox-caption="Screenshot"&gt;
&lt;img alt="Screenshot" src="https://buildrunrepeat.com/posts/harbor-registry-is-your-ldap-user-unique/images/001.png"/&gt;
&lt;/a&gt;
&lt;/p&gt;
&lt;p&gt;The &lt;code&gt;harbor-group-01&lt;/code&gt; group contains an Active Directory user named &lt;code&gt;harbor-user-01&lt;/code&gt; and &lt;code&gt;harbor-group-02&lt;/code&gt; contains &lt;code&gt;harbor-user-02&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;From the command line, I was able to log in to Harbor with &lt;code&gt;harbor-user-01&lt;/code&gt;:&lt;/p&gt;</description></item><item><title>Is your TKG cluster name too long, or is it your DHCP Server…?</title><link>https://buildrunrepeat.com/posts/is-your-tkg-cluster-name-too-long-or-is-it-your-dhcp-server/</link><pubDate>Mon, 01 Aug 2022 09:00:00 -0400</pubDate><guid>https://buildrunrepeat.com/posts/is-your-tkg-cluster-name-too-long-or-is-it-your-dhcp-server/</guid><description>&lt;p&gt;Recently, when working on a TKGm implementation project, I initially ran into an issue that seemed very odd, as I hadn&amp;rsquo;t encountered such behavior in any other implementation before.&lt;/p&gt;
&lt;p&gt;The issue was that a workload cluster deployment hung after deploying the first control plane node. Until then, everything seemed just fine: the cluster deployment had initialized successfully, and NSX ALB had allocated a control plane VIP. After that, however, the deployment hung completely and seemed like it wouldn&amp;rsquo;t proceed.&lt;/p&gt;</description></item><item><title>Kubernetes Data Protection: Getting Started with Kasten (K10)</title><link>https://buildrunrepeat.com/posts/kubernetes-data-protection-getting-started-with-kasten/</link><pubDate>Mon, 01 Aug 2022 09:00:00 -0400</pubDate><guid>https://buildrunrepeat.com/posts/kubernetes-data-protection-getting-started-with-kasten/</guid><description>&lt;p&gt;In a recent Kubernetes project I was involved in, our team had to conduct an in-depth proof of concept for several Kubernetes data protection solutions. The main highlights of the PoC covered data protection for stateful applications and databases, disaster recovery, and application mobility, including relocating applications across Kubernetes clusters and even across different types of Kubernetes clusters (for example, from TKG on-premises to AWS EKS).&lt;/p&gt;
&lt;p&gt;One of the solutions we evaluated was Kasten (K10), a data management platform for Kubernetes, which is now a part of Veeam. The implementation of Kasten was one of the smoothest we have ever experienced in terms of ease of use, stability, and general clarity around getting things done, as everything is very well documented, which certainly cannot be taken for granted these days. :)&lt;/p&gt;</description></item><item><title>Production-Grade Multi-Cluster TAP Installation Guide</title><link>https://buildrunrepeat.com/posts/production-grade-multi-cluster-tap-installation-guide/</link><pubDate>Mon, 01 Aug 2022 09:00:00 -0400</pubDate><guid>https://buildrunrepeat.com/posts/production-grade-multi-cluster-tap-installation-guide/</guid><description>&lt;ul&gt;
&lt;li&gt;&lt;a href="#introduction"&gt;Introduction&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#prerequisites"&gt;Prerequisites&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#prepare-your-workstation"&gt;Prepare your Workstation&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#relocate-tap-images-to-your-private-registry"&gt;Relocate TAP Images to your Private Registry&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#install-tap"&gt;Install TAP&lt;/a&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="#view-cluster"&gt;View Cluster&lt;/a&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="#set-up-the-installation-namespace"&gt;Set up the Installation Namespace&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#issue-a-tls-certificate-for-tap-gui"&gt;Issue a TLS Certificate for TAP GUI&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#set-up-a-database-for-tap-gui"&gt;Set up a Database for TAP GUI&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#set-up-the-tap-gui-catalog-git-repository"&gt;Set up the TAP GUI Catalog Git Repository&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#set-up-rbac-for-the-metadata-store"&gt;Set up RBAC for the Metadata Store&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#set-up-an-authentication-provider-for-tap-gui"&gt;Set up an Authentication Provider for TAP GUI&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#set-up-rbac-for-the-build-run-and-iterate-clusters"&gt;Set up RBAC for the Build, Run and Iterate Clusters&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#set-an-ingress-domain-tap-gui-hostname-and-ca-certificate"&gt;Set an Ingress Domain, TAP GUI Hostname and CA Certificate&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#deploy-the-tap-package"&gt;Deploy the TAP Package&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href="#build-cluster"&gt;Build Cluster&lt;/a&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="#set-up-the-installation-namespace-1"&gt;Set up the Installation Namespace&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#set-up-metadata-store-authentication-and-ca-certificate"&gt;Set up Metadata Store Authentication and CA Certificate&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#prepare-a-sample-source-code-git-repository"&gt;Prepare a Sample Source Code Git Repository&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#update-the-tap-values-file"&gt;Update the TAP Values File&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#deploy-the-tap-package-1"&gt;Deploy the TAP Package&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#deploy-the-tbs-full-dependencies-package"&gt;Deploy the TBS Full Dependencies Package&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#set-up-the-developer-namespace-and-deploy-a-workload"&gt;Set up the Developer Namespace and Deploy a Workload&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href="#run-cluster"&gt;Run Cluster&lt;/a&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="#set-up-the-installation-namespace-2"&gt;Set up the Installation Namespace&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#update-the-tap-values-file-1"&gt;Update the TAP Values File&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#deploy-the-tap-package-2"&gt;Deploy the TAP Package&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#set-up-the-developer-namespace-and-deploy-a-workload-1"&gt;Set up the Developer Namespace and Deploy a Workload&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href="#iterate-cluster"&gt;Iterate Cluster&lt;/a&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="#set-up-the-installation-namespace-3"&gt;Set up the Installation Namespace&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#update-the-tap-values-file-2"&gt;Update the TAP Values File&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#deploy-the-tap-package-3"&gt;Deploy the TAP Package&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#deploy-the-tbs-full-dependencies-package-1"&gt;Deploy the TBS Full Dependencies Package&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#set-up-the-developer-namespace-and-deploy-a-workload-2"&gt;Set up the Developer Namespace and Deploy a Workload&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#iterate-on-your-application"&gt;Iterate on your Application&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href="#wrap-up"&gt;Wrap Up&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id="introduction"&gt;Introduction&lt;/h2&gt;
&lt;p&gt;Since my previous posts on &lt;a href="https://buildrunrepeat.com/posts/vmware-tanzu-application-platform-overview/"&gt;TAP Overview&lt;/a&gt; and &lt;a href="https://buildrunrepeat.com/posts/backstage-introduction-kubecon-cloudnativecon-europe-2022/"&gt;Backstage&lt;/a&gt;, I have been diving deeper into TAP, trying to establish the practices around it.&lt;/p&gt;</description></item><item><title>VMware Tanzu Application Platform Overview</title><link>https://buildrunrepeat.com/posts/vmware-tanzu-application-platform-overview/</link><pubDate>Mon, 01 Aug 2022 09:00:00 -0400</pubDate><guid>https://buildrunrepeat.com/posts/vmware-tanzu-application-platform-overview/</guid><description>&lt;p&gt;In the &lt;a href="https://buildrunrepeat.com/posts/backstage-introduction-kubecon-cloudnativecon-europe-2022/"&gt;first part&lt;/a&gt; of this series, I described what Backstage is and some of the challenges it aims to solve. VMware uses Backstage to enable its Tanzu Application Platform (TAP). Before we can understand how, however, we need to understand what TAP is and what it aims to do.&lt;/p&gt;
&lt;h2 id="so-what-exactly-is-the-tanzu-application-platform"&gt;So, what exactly is the Tanzu Application Platform?&lt;/h2&gt;
&lt;p&gt;TAP is a robust application development platform entirely focused on the developer experience. It provides a rich set of developer tools in a centralized user interface. It is the latest innovation in this space from VMware. It is a true game-changer, building upon community-adopted tooling and the existing products within the Tanzu Advanced Suite to offer a next-gen PaaS solution that aims to solve the same challenges the traditional PaaS systems solve, as well as the issues they introduced.&lt;/p&gt;</description></item></channel></rss>