# Troubleshooting
This page tracks known issues and their remedies.
## MongoDB Persistent Volume Claim Error
When installing the MongoDB Helm chart in a new cluster, we sometimes run into an error where the chart is unable to create a persistent volume, which leads to the persistent volume claim failing to bind:

```
no persistent volumes available for this claim and no storage class is set
```
This happens because a new cluster does not yet have a default Kubernetes storage class set. Run the following commands to fix this:
```sh
# Get the name of the storage class
kubectl get storageclasses.storage.k8s.io

# If you get the following output, notice how there is no (default) tag next to the class name
# NAME   PROVISIONER             RECLAIMPOLICY   VOLUMEBINDINGMODE      ALLOWVOLUMEEXPANSION   AGE
# gp2    kubernetes.io/aws-ebs   Delete          WaitForFirstConsumer   false                  632d

# Patch the storage class to make it the default
kubectl patch storageclass gp2 -p '{"metadata": {"annotations":{"storageclass.kubernetes.io/is-default-class":"true"}}}'

# Now running the following should show the default storage class
kubectl get storageclasses.storage.k8s.io
# NAME            PROVISIONER             RECLAIMPOLICY   VOLUMEBINDINGMODE      ALLOWVOLUMEEXPANSION   AGE
# gp2 (default)   kubernetes.io/aws-ebs   Delete          WaitForFirstConsumer   false                  632d
```
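Once the default class is set, the pending claim should bind on its own. A quick sanity check, assuming the MongoDB release runs in the `default` namespace (adjust the namespace and PVC name to your installation):

```sh
# List claims and confirm the MongoDB PVC has moved from Pending to Bound
kubectl get pvc -n default
# Note: with WaitForFirstConsumer binding mode, the claim only binds once a pod
# using it is scheduled, so it may stay Pending until the MongoDB pod starts.
```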
## Nginx Ingress Helm Upgrade Errors
For the `666628074417` account, upgrading Ingress controllers through Helm results in various errors due to the custom networking rules.
### 400 Bad Request - The plain HTTP request was sent to HTTPS port
To fix this issue, apply the correct nginx controller config defined here after updating nginx.
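This error typically appears when TLS terminates at the AWS load balancer but the controller's Service still forwards decrypted traffic to the HTTPS backend port. As an illustration only (the config linked above is authoritative for this account), the relevant ingress-nginx Helm values usually look something like this:

```yaml
# Sketch of an ingress-nginx values fragment for ELB TLS termination.
# The certificate ARN is a placeholder.
controller:
  service:
    targetPorts:
      http: http
      https: http   # send decrypted HTTPS traffic to the plain HTTP backend port
    annotations:
      service.beta.kubernetes.io/aws-load-balancer-backend-protocol: http
      service.beta.kubernetes.io/aws-load-balancer-ssl-ports: https
      service.beta.kubernetes.io/aws-load-balancer-ssl-cert: <certificate-arn>
```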
## Keycloak Invalid Requester Error
When the forwarded headers do not reach Keycloak, the following error appears in the browser's network tab when the site is accessed:
```
Mixed Content: The page at 'https://keycloak.cbioportal.mskcc.org/auth/admin/master/console/' was loaded over HTTPS, but requested an insecure script 'http://keycloak.cbioportal.mskcc.org/auth/js/keycloak.js?version=4cbzu'. This request has been blocked; the content must be served over HTTPS.
```
The following errors appear in the Kubernetes pods for the Keycloak instance:
```
14:45:56,925 WARN [org.keycloak.events] (default task-2) type=LOGIN_ERROR, realmId=msk, clientId=null, userId=null, ipAddress=10.1.141.180, error=invalid_authn_request, reason=invalid_destination
14:46:39,551 WARN [org.keycloak.events] (default task-5) type=LOGIN_ERROR, realmId=msk, clientId=null, userId=null, ipAddress=10.1.140.190, error=invalid_authn_request, reason=invalid_destination
```
To solve this issue, use the correct forwarding rules by applying the configmap defined here after updating nginx.
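The settings involved are typically along these lines; this is a sketch of the kind of forwarding configuration that resolves the error, not the exact configmap referenced above:

```yaml
# ingress-nginx controller ConfigMap fragment: trust and propagate the
# X-Forwarded-* headers so Keycloak sees the original scheme and host.
apiVersion: v1
kind: ConfigMap
metadata:
  name: ingress-nginx-controller
data:
  use-forwarded-headers: "true"
  compute-full-forwarded-for: "true"
```

If the Keycloak image is the legacy WildFly-based one (as the `/auth` path suggests), it also needs the `PROXY_ADDRESS_FORWARDING=true` environment variable set to honor those headers.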
## Datadog failed to auto-detect cluster name
When deployed on AWS EC2 nodes using the `BOTTLEROCKET_*` AMI types, Datadog fails to auto-detect the cluster name, resulting in the following errors:
```
2025-03-10 17:39:09 UTC | CLUSTER | WARN | (subcommands/start/command.go:235 in start) | Failed to auto-detect a Kubernetes cluster name. We recommend you set it manually via the cluster_name config option
2025-03-10 17:39:10 UTC | CLUSTER | ERROR | (pkg/collector/corechecks/loader.go:64 in Load) | core.loader: could not configure check orchestrator: orchestrator check is configured but the cluster name is empty
2025-03-10 17:39:10 UTC | CLUSTER | ERROR | (pkg/collector/scheduler.go:208 in getChecks) | Unable to load a check from instance of config 'orchestrator': Core Check Loader: Could not configure check orchestrator: orchestrator check is configured but the cluster name is empty
```
To solve this, either use the `AL2_*` AMI type (NOT RECOMMENDED) or manually specify the cluster name in the Datadog Helm values:

```yaml
datadog:
  clusterName: <cluster-name>
```
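For reference, a minimal sketch of rolling the change out manually, assuming the chart comes from the official `https://helm.datadoghq.com` repo and the values live in `values.yaml` (both assumptions; a deployment managed by ArgoCD would instead sync from the values in git):

```sh
# Apply the updated values to the existing Datadog release
helm upgrade --install datadog datadog/datadog -f values.yaml
```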
## Datadog always out-of-sync with Helm and ArgoCD
By default, Datadog auto-assigns random values to the following on every restart:

- The token used for authentication between the cluster agent and node agents.
- The configmap for the APM Instrumentation KPIs.
This leads to Datadog being permanently out-of-sync when deployed with Helm and ArgoCD. To solve this issue, follow the steps below:
1. Provide a cluster agent token as a secret.

   Generate a random 32-byte (64 hex digit) value:

   ```sh
   openssl rand -hex 32
   ```

   Create a secret containing the value generated above:

   ```yaml
   apiVersion: v1
   kind: Secret
   metadata:
     name: datadog
   stringData:
     token: <random-hex-32>
   ```

   Use the existing secret in the Datadog Helm values:

   ```yaml
   clusterAgent:
     tokenExistingSecret: datadog
   ```

2. Disable APM Instrumentation KPIs by setting `datadog.apm.instrumentation.skipKPITelemetry` to `true` in the Datadog Helm values:

   ```yaml
   datadog:
     apm:
       instrumentation:
         skipKPITelemetry: true
   ```
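Equivalently, the token secret can be created imperatively in one step (a sketch; the `datadog` namespace is an assumption, so adjust it to wherever Datadog is deployed):

```sh
# Create the cluster agent token secret directly from a fresh random value
kubectl create secret generic datadog \
  --from-literal=token="$(openssl rand -hex 32)" \
  -n datadog
```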