TKG 2.3: Fixing the Prometheus Data Source in the Grafana Package
With the release of TKG 2.3, the Grafana package was finally updated from version 7.5.x to 9.5.1.
If you have deployed the new Grafana package (9.5.1+vmware.2-tkg.1) or upgraded your existing one to this version, you may have run into error messages in your Grafana dashboards.
For example, opening the default TKG Kubernetes cluster monitoring dashboard may produce a "Failed to call resource" error, with much of the data missing.
In other dashboards, such as the Kubernetes / API Server dashboard, no errors appear, but the data is missing all the same.
Customized/non-default dashboards may also show similar symptoms.
I started investigating this issue by looking at the default Prometheus Data Source and immediately noticed an error in the URL parameter under the HTTP section.
Testing the connection to the Data Source, I received the following error message:
Error reading Prometheus: parse "prometheus-server.tanzu-system-monitoring.svc.cluster.local": invalid URI for request
Looking at the official Grafana documentation, I realized it was now required to specify the full URL to the Prometheus instance, including http:// or https:// before the Prometheus instance IP address/FQDN.
Reference: https://grafana.com/docs/grafana/v9.5/datasources/prometheus/#provisioning-example
I then retrieved the configuration used by the Grafana package with the following commands:
IMAGE_URL=$(kubectl -n tkg-system get packages grafana.tanzu.vmware.com.9.5.1+vmware.2-tkg.1 -o jsonpath='{.spec.template.spec.fetch[0].imgpkgBundle.image}')
echo $IMAGE_URL
imgpkg pull -b $IMAGE_URL -o /tmp/tkg-grafana
And looked at the default configuration specified in the values.yaml file of the package. Under grafana.config.datasource_yaml in the file, I found the default URL of the Prometheus Data Source:
grafana:
  config:
    datasource_yaml: |-
      apiVersion: 1
      datasources:
        - name: Prometheus
          type: prometheus
          url: prometheus-server.tanzu-system-monitoring.svc.cluster.local
          access: proxy
          isDefault: true
The default Data Source configuration does not specify http:// or https:// before the Prometheus FQDN. That was fine in older Grafana versions, but newer versions require an explicit scheme, which means the Data Source configuration shipped with the Grafana package is no longer valid for modern Grafana versions.
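The stricter behavior is easy to sanity-check locally. Here is a minimal sketch of just the scheme check that the newer Grafana versions enforce (this is only an illustration of the rule, not Grafana's actual URL parsing):

```shell
# Reject datasource URLs that lack an explicit http:// or https:// scheme,
# mirroring the validation newer Grafana versions apply
check_url() {
  case "$1" in
    http://*|https://*) echo "valid: $1" ;;
    *)                  echo "invalid URI: $1" ;;
  esac
}

# The package's default (scheme-less) URL fails the check
check_url "prometheus-server.tanzu-system-monitoring.svc.cluster.local"
# Adding the scheme makes it pass
check_url "http://prometheus-server.tanzu-system-monitoring.svc.cluster.local"
```

Running this prints `invalid URI:` for the first URL and `valid:` for the second, matching the "invalid URI for request" error seen in the Data Source test above.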
To address this issue, I added grafana.config.datasource_yaml with the correct Prometheus URL to my values.yaml overrides as follows:
grafana:
  secret:
    admin_password: Vk13YXJlMSE=
  config:
    datasource_yaml: |-
      apiVersion: 1
      datasources:
        - name: Prometheus
          type: prometheus
          url: http://prometheus-server.tanzu-system-monitoring.svc.cluster.local
          access: proxy
          isDefault: true
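As a side note, the admin_password value in the overrides must be base64-encoded. To generate the value for your own password (the password below is just the example value used in the overrides above):

```shell
# grafana.secret.admin_password expects a base64-encoded string;
# encode your chosen password like so (example password shown)
printf '%s' 'VMware1!' | base64
```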
And updated the Grafana package on my cluster using the updated data values.
tanzu package install grafana \
--package "$PKG_NAME" \
--version "$PKG_VERSION" \
--values-file grafana-data-values.yaml \
--namespace tkg-packages
In the output, you should see that the grafana-datasource ConfigMap is updated due to this change.
8:25:53PM: Pausing reconciliation for package installation 'grafana' in namespace 'tkg-packages'
8:25:54PM: Updating secret 'grafana-tkg-packages-values'
8:25:54PM: Creating overlay secrets
8:25:54PM: Resuming reconciliation for package installation 'grafana' in namespace 'tkg-packages'
8:25:54PM: Waiting for PackageInstall reconciliation for 'grafana'
8:25:54PM: Waiting for generation 3 to be observed
8:25:56PM: Fetching
| apiVersion: vendir.k14s.io/v1alpha1
| directories:
| - contents:
| - imgpkgBundle:
| image: projects.registry.vmware.com/tkg/packages/standard/grafana@sha256:7e9225bb461b470534f347a7990437c01956a603f916d0214159ad7634db08b2
| path: .
| path: "0"
| kind: LockConfig
|
8:25:56PM: Fetch succeeded
8:25:56PM: Template succeeded
8:25:56PM: Deploy started (2s ago)
8:25:58PM: Deploying
| Target cluster 'https://100.64.0.1:443' (nodes: it-tkg-wld-02-npm26-n7lrs, 5+)
| Changes
| Namespace Name Kind Age Op Op st. Wait to Rs Ri
| tanzu-system-dashboards grafana-datasource ConfigMap 2h update - reconcile ok -
| Op: 0 create, 0 delete, 1 update, 0 noop, 0 exists
| Wait to: 1 reconcile, 0 delete, 0 noop
| 8:25:59PM: ---- applying 1 changes [0/1 done] ----
| 8:25:59PM: update configmap/grafana-datasource (v1) namespace: tanzu-system-dashboards
| 8:25:59PM: ---- waiting on 1 changes [0/1 done] ----
| 8:25:59PM: ok: reconcile configmap/grafana-datasource (v1) namespace: tanzu-system-dashboards
| 8:25:59PM: ---- applying complete [1/1 done] ----
| 8:25:59PM: ---- waiting complete [1/1 done] ----
| Succeeded
8:25:59PM: Deploy succeeded
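Before restarting anything, you can confirm that the rendered grafana-datasource ConfigMap now carries the scheme. The snippet below inlines the expected url line for illustration; against a live cluster you would feed the grep from `kubectl -n tanzu-system-dashboards get configmap grafana-datasource -o yaml` instead:

```shell
# Expected url line after the update (inlined sample; on a cluster, read it
# from the grafana-datasource ConfigMap shown in the deploy output above)
expected='url: http://prometheus-server.tanzu-system-monitoring.svc.cluster.local'

printf '%s\n' "$expected" | grep -Eq 'url: https?://' \
  && echo "scheme present" \
  || echo "scheme missing"
```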
You may have to delete the Grafana Pod so that a new one is created and mounts the updated configuration from the ConfigMap.
kubectl delete pod -l app.kubernetes.io/name=grafana -n tanzu-system-dashboards
Ensure that the new Pod is running.
kubectl get pod -n tanzu-system-dashboards
Example output:
NAME READY STATUS RESTARTS AGE
grafana-58bf6bbb6c-vmvzx 2/2 Running 0 89s
If you refresh the Grafana UI and view the Prometheus Data Source, you should see the updated URL with no errors.
You should also be able to test the connection to the Data Source successfully.
Since the Data Source is now valid, your dashboards should start displaying data properly.
I have reported the issue in the Grafana package to VMware. Hopefully, a future version of the package will address this issue, but until then, I believe the workaround provided in this post is fairly easy to implement.