Scaling HTTP Workloads to Zero with KEDA: A Journey into 'Serverless' on Kubernetes

When I first started running workloads on Kubernetes, I loved the container orchestration but missed the efficient resource usage of serverless platforms. That's why implementing KEDA (Kubernetes Event-Driven Autoscaling) in my cluster has been such a game-changer, especially for HTTP workloads. Let me share how I've set this up and what I've learned along the way.
The Problem with Traditional Kubernetes Scaling
Traditional Kubernetes deployments always run at least one pod, even when there's no traffic. For development environments or low-traffic services, this means paying for resources that sit idle most of the time. While the Horizontal Pod Autoscaler (HPA) helps scale up with load, it can't scale down to zero.
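To make the limitation concrete, here's what a conventional HPA looks like (the names are illustrative): minReplicas must be at least 1 unless the cluster enables the alpha HPAScaleToZero feature gate, so an idle service always keeps at least one pod running.

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: my-service # illustrative name
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-service
  minReplicas: 1 # the floor: a standard HPA cannot take this to zero
  maxReplicas: 5
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 60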
Enter KEDA: The Missing Piece
KEDA extends Kubernetes' autoscaling capabilities with the ability to scale deployments to zero pods when there's no traffic and quickly scale back up when requests start flowing. For HTTP workloads, it sits in the request path and translates incoming traffic into signals that Kubernetes' native scaling mechanisms can act on.
The Implementation
Here's how I've set up HTTP-based scaling with KEDA in my cluster. The magic happens through the HTTPScaledObject custom resource; behind the scenes, the HTTP add-on generates a corresponding ScaledObject from it:
kind: HTTPScaledObject
apiVersion: http.keda.sh/v1alpha1
metadata:
  name: vaultwarden
spec:
  hosts:
    - vaultwarden
  scaleTargetRef:
    name: bitwarden-vaultwarden
    kind: StatefulSet
    apiVersion: apps/v1
    service: bitwarden-vaultwarden
    port: 80
  replicas:
    min: 0
    max: 1
  scalingMetric:
    requestRate:
      granularity: 1s
      targetValue: 5
      window: 1m0s
  scaledownPeriod: 3600 # 1 hour cooldown
The real beauty of this setup is how it integrates with my existing Kong gateway. KEDA's HTTP add-on acts as an interceptor, monitoring traffic patterns and making scaling decisions before requests reach the actual service.
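To give a rough idea of the wiring, here's a sketch of one way to route gateway traffic through the interceptor rather than straight to the app's Service. The interceptor proxy's name follows the Helm chart defaults and the rest of the names are illustrative, so treat this as a shape, not my exact config: an ExternalName Service exposes the interceptor (which lives in the keda namespace) locally, and the Ingress points at it.

apiVersion: v1
kind: Service
metadata:
  name: keda-interceptor-proxy # illustrative name
spec:
  type: ExternalName
  # Helm chart default service name; adjust to match your install
  externalName: keda-add-ons-http-interceptor-proxy.keda.svc.cluster.local
  ports:
    - port: 8080
---
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: vaultwarden
spec:
  ingressClassName: kong
  rules:
    - host: vaultwarden
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: keda-interceptor-proxy
                port:
                  number: 8080

With this in place, every request for the host passes through the interceptor, which both counts it for scaling purposes and holds it while a scaled-to-zero backend wakes up.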
The Results Have Been Impressive
After implementing KEDA for several HTTP services, I've seen:
- Resource utilization improvements as pods scale to zero during quiet periods
- Faster scale-up responses compared to traditional HPA
- Better cost efficiency, especially for development environments
Ultimately, this has reduced my cluster's overall resource demands, letting me run fewer nodes without compromising performance.
Lessons Learned Along the Way
The journey wasn't without its learning moments. Here are some key insights:
First, cold starts are a real consideration. Just like with any serverless platform, the first request after scaling from zero takes longer. I've addressed this by setting appropriate activation thresholds for critical services.
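The knob I mean lives on core ScaledObjects: most scalers accept an activationThreshold alongside the regular threshold, and it alone decides the zero-to-one wake-up, while the threshold drives scaling from one to n. A hypothetical Prometheus-backed example (the server address, query, and names are all illustrative):

apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: my-api # illustrative name
spec:
  scaleTargetRef:
    name: my-api
  minReplicaCount: 0
  maxReplicaCount: 5
  triggers:
    - type: prometheus
      metadata:
        serverAddress: http://prometheus.monitoring:9090
        query: sum(rate(http_requests_total{app="my-api"}[1m]))
        threshold: "5" # drives 1 -> n scaling
        activationThreshold: "1" # wake up on the first trickle of traffic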
Second, scaling stateful workloads is a tricky endeavour. I've managed to get them to scale down to zero, but it took some thoughtful tweaking: the key was ensuring the data survives intact when the workload powers down and is reattached when it comes back. It's about striking the right balance between efficiency and data integrity, and with careful implementation it's achievable.
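For what that looks like in practice, here's a minimal sketch of a StatefulSet that keeps its data across scale-to-zero (the image, labels, and sizes are illustrative). Retain is already the default retention policy, but spelling it out documents the intent; the field requires the StatefulSetAutoDeletePVC feature, beta and on by default since Kubernetes 1.27.

apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: bitwarden-vaultwarden
spec:
  serviceName: bitwarden-vaultwarden
  replicas: 1
  persistentVolumeClaimRetentionPolicy:
    whenScaled: Retain # keep the PVC (and its data) across scale-to-zero
    whenDeleted: Retain
  selector:
    matchLabels:
      app: vaultwarden
  template:
    metadata:
      labels:
        app: vaultwarden
    spec:
      containers:
        - name: vaultwarden
          image: vaultwarden/server:latest
          ports:
            - containerPort: 80
          volumeMounts:
            - name: data
              mountPath: /data
  volumeClaimTemplates:
    - metadata:
        name: data
      spec:
        accessModes: ["ReadWriteOnce"]
        resources:
          requests:
            storage: 1Gi

Because the claim is retained, the pod reattaches to the same volume on the next scale-up and carries on where it left off.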
Beyond Basic HTTP Scaling
What makes KEDA particularly powerful is its ability to handle complex scaling scenarios. I've experimented with combining HTTP triggers with other metrics; the shape looks like this (the scale target's name below is a placeholder):
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: advanced-scaler
spec:
  scaleTargetRef:
    name: my-app # placeholder: the workload this ScaledObject drives
  triggers:
    - type: http
      metadata:
        requestsPerSecond: "10"
    - type: cpu
      metricType: Utilization
      metadata:
        value: "60"
This setup ensures that scaling decisions consider both incoming traffic and resource utilization, providing more nuanced control over the scaling behaviour.
Conclusion
Implementing KEDA for HTTP workloads has transformed how I think about resource utilization in my Kubernetes cluster. It's bridged the gap between traditional Kubernetes deployments and serverless architectures, offering the best of both worlds: the flexibility and control of Kubernetes with the efficiency of serverless.
I prefer this approach to Knative because it feels more straightforward and doesn't introduce an entire serverless framework. While Knative is definitely a powerful option, many of its features go unused in most setups. That's why I generally lean towards KEDA: it offers a more modular and flexible solution.
I highly recommend exploring KEDA for anyone running HTTP workloads on Kubernetes, especially in development or testing environments.