Configure Grafana Mimir autoscaling with Helm
Warning
Autoscaling support in the Helm chart is currently experimental. Use with caution in production environments and thoroughly test in a non-production environment first.
You can configure autoscaling for Mimir components using Kubernetes Event-driven Autoscaling (KEDA) and the Kubernetes Horizontal Pod Autoscaler (HPA).
Before you begin
- Ensure you have a running Mimir cluster deployed with Helm.
- Verify you have the required permissions to modify Helm deployments.
- Familiarize yourself with Kubernetes HPA concepts.
Prerequisites
To use autoscaling, you need:
- KEDA installed in your Kubernetes cluster
- Prometheus metrics available for scaling decisions
Warning
Don’t use the same Mimir or Grafana Enterprise Metrics cluster for storing and querying autoscaling metrics. Using the same cluster can create a dangerous feedback loop.
For instance, if the Mimir or GEM cluster becomes unavailable, autoscaling stops working, because it cannot query the metrics. This prevents the cluster from automatically scaling up during high load or recovery. This inability to scale further exacerbates the cluster’s unavailability, which might, in turn, prevent the cluster from recovering.
Instead, use a separate Prometheus instance or a different metrics backend for autoscaling metrics.
Supported components
The Mimir Helm chart supports autoscaling for the following components:
About KEDA
KEDA is a Kubernetes operator that simplifies the setup of HPA with custom metrics from Prometheus. It consists of:
- An operator and external metrics server
- Support for multiple metric sources, including Prometheus
- Custom resources (`ScaledObject`) that define scaling parameters
- Automatic HPA resource management
For more information, refer to the KEDA documentation.
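For illustration, a `ScaledObject` broadly similar to the ones the chart generates might look like the following. This is a hand-written sketch, not the chart's exact output: the resource names, namespace, query, and threshold are assumptions.

```yaml
# Illustrative ScaledObject sketch. The names, namespace, query, and
# threshold are assumptions, not the Helm chart's exact output.
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: mimir-querier
  namespace: mimir
spec:
  scaleTargetRef:
    name: mimir-querier        # Deployment that KEDA/HPA scales
  minReplicaCount: 2
  maxReplicaCount: 10
  pollingInterval: 10          # How often KEDA queries the metric source
  triggers:
    - type: prometheus
      metadata:
        serverAddress: http://prometheus.monitoring:9090
        query: sum(cortex_query_scheduler_inflight_requests{namespace="mimir"})
        threshold: "12"
```

KEDA reconciles this resource into an HPA that scales the target Deployment whenever the Prometheus query result crosses the threshold.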
Configure autoscaling for a new installation
Follow these steps to enable autoscaling when deploying Mimir for the first time.
Steps
Configure the Prometheus metrics source in your values file:
```yaml
kedaAutoscaling:
  prometheusAddress: "http://prometheus.monitoring:9090"
  pollingInterval: 10
```
Enable and configure autoscaling for the desired components:
```yaml
querier:
  kedaAutoscaling:
    enabled: true
    minReplicaCount: 2
    maxReplicaCount: 10
```
Deploy Mimir using Helm:
```shell
helm upgrade --install mimir grafana/mimir-distributed -f values.yaml
```
Expected outcome
After deployment:
- KEDA creates `ScaledObject` resources for configured components.
- HPA resources are automatically created and begin monitoring metrics.
- Components scale based on configured thresholds and behaviors.
Migrate existing deployments to autoscaling
Follow these steps to enable autoscaling for an existing Mimir deployment.
Warning
Autoscaling support in the Helm chart is currently experimental. Migrating to autoscaling carries risks for cluster availability.
Enabling autoscaling removes the `replicas` field from deployments. If KEDA/HPA hasn't started autoscaling a deployment yet, Kubernetes interprets a missing `replicas` field as 1 replica. This can cause an outage if the transition is not handled carefully. If you're using GitOps tools like FluxCD or ArgoCD, you might need to take additional steps to manage the transition. Consider testing the migration in a non-production environment first.
Before you begin
- Back up your current Helm values.
- Plan for potential service disruption.
- Consider testing in a non-production environment first.
- Ensure you have a rollback plan ready.
- Consider migrating one component at a time to minimize risk.
Steps
Add the autoscaling configuration with `preserveReplicas` enabled:

```yaml
querier:
  kedaAutoscaling:
    enabled: true
    preserveReplicas: true  # Maintains stability during migration
    # ... autoscaling configuration ...
```
Apply the changes and verify the KEDA setup:
```shell
# Apply changes
helm upgrade mimir grafana/mimir-distributed -f values.yaml

# Verify setup
kubectl get hpa
kubectl get scaledobject
kubectl describe hpa
```
Wait 2-3 polling intervals (for example, about 30 seconds with `pollingInterval: 10`) to confirm that KEDA is managing scaling.
Remove `preserveReplicas`:

```yaml
querier:
  kedaAutoscaling:
    enabled: true
    # Remove preserveReplicas
```
Apply the updated configuration:
```shell
helm upgrade mimir grafana/mimir-distributed -f values.yaml
```
Troubleshooting
If pods scale down to 1 replica after removing `preserveReplicas`:
Revert changes:
```yaml
querier:
  kedaAutoscaling:
    enabled: true
    preserveReplicas: true
```
Verify KEDA setup:
- Check HPA status
- Verify metrics are being received
- Check for conflicts with other tools
- Ensure enough time was given for KEDA to take control (at least 2-3 polling intervals)
Try migrating again after resolving issues.
Note
If you’re using GitOps tools like FluxCD or ArgoCD, they might try to reconcile the state and conflict with HPA’s scaling decisions. Consult your GitOps tool’s documentation for handling HPA transitions.
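For example, with ArgoCD one common approach is to tell the Application to ignore the `replicas` field on Deployments so it doesn't revert HPA scaling decisions. The following is a sketch; the application name is hypothetical and the fragment must be merged into your actual Application spec.

```yaml
# Hypothetical ArgoCD Application fragment: ignore replicas on Deployments
# so ArgoCD does not fight HPA's scaling decisions during reconciliation.
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: mimir
spec:
  ignoreDifferences:
    - group: apps
      kind: Deployment
      jsonPointers:
        - /spec/replicas
```

FluxCD has analogous mechanisms; consult your tool's documentation for the exact configuration.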
Monitor autoscaling health
The following conditions indicate unhealthy autoscaling:
- KEDA operator is down: `ScaledObject` changes don't propagate to HPA.
- KEDA metrics server is down: HPA can't receive updated metrics.
- HPA is unable to scale: the `MimirAutoscalerNotActive` alert fires.
For production deployments, configure high availability for KEDA.
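For instance, if you installed KEDA with its Helm chart, running multiple replicas of the operator and metrics server might look like the following. The value names here are assumptions; verify them against the values of the KEDA chart version you use.

```yaml
# Sketch of KEDA Helm chart values for high availability.
# Value names are assumptions; check your chart version's values.yaml.
operator:
  replicaCount: 2
metricsServer:
  replicaCount: 2
```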
For more information about monitoring autoscaling, refer to Monitor Grafana Mimir.