Discussion on Horizontal Pod Autoscaler with a demo on a local k8s cluster

Asish M Madhu
Published in DevOps for you · 11 min read · Jun 27, 2021


“Resilience is our ability to bounce back from life’s challenges and to thrive, grow and expand.”

Photo Credit: Ryo Yoshitake https://unsplash.com/photos/cusz0Bg-5mQ

HorizontalPodAutoscaler (HPA) is a separate resource type in kubernetes which scales the number of pods based on CPU or memory utilization, or on custom metrics. HPA helps optimise the number of replicas that need to be maintained for your applications, which helps in distributing load. Behind the scenes, the autoscaler controller updates the replica count of a k8s resource such as a Deployment, ReplicaSet or StatefulSet.

The value it brings to the table is a more resilient application that can take care of itself at times of increased demand. But there should be enough physical resources in the cluster for the pods to expand into. In this article I will try to explain HPA along with a demo.

The HPA controller periodically checks metrics. When the average CPU or memory utilization goes too high, it tells k8s to increase the replica count of the target deployment. So it needs to know how to get the metrics from the cluster in the first place; we have to connect the controller to a metrics collector. Values for CPU, memory etc. are averaged across your pods. The most important thing is to have resource requests (and limits) defined in your deployments, since utilization is measured against the requested values. Then provide the minimum and maximum pod count in the HPA configuration. HPA will also scale down after a cooldown period. It uses the formula below to calculate the number of replicas to maintain.

Algorithm to calculate Replicas

desired_replicas = ceil(current_replicas * (current_value / target_value))

To understand this, let's consider two different scenarios where a scaling decision needs to be taken.

Scenario 1

Assume that an application has some business requirement and, based on the physical limits, we come up with ideal CPU/memory numbers. We wish to maintain the target CPU utilization below 60% for the application's pods. Then there is a spike in traffic and the current utilization reaches 90%. The deployment object has 3 replicas and the pods are taking heavy load. Let's use the algorithm to find the desired pod count.

Target CPU utilization : 60%
Current Utilization: 90%
Current pods: 3
Desired Pods = ceil(current_pods * (current_value / target_value))
Desired Pods = ceil(3*(.9/.6)) = 5

In this scenario, the HPA controller will change the replicas in the deployment to 5, and the scheduler will place the 2 additional pods.

Scenario 2

Let's assume that after some time the traffic has come down, and the current utilization is now 20%. We wish the additional replicas to be reduced automatically. Let's see how the HPA controller grants our wish.

Target CPU Utilization : 60%
Current Utilization: 20%
Current pods : 5
Desired Pods = ceil(5*(.2/.6)) = 2

Here it determines that 2 replicas are sufficient and tells the deployment controller to reduce to that number. But the HPA is configured with a minimum replica count of 3, so Kubernetes honours that and reduces the replica count from 5 to 3 instead.

Lab Setup

I am going to use a "kind" cluster on my local laptop. KIND is Kubernetes in Docker, which is good for demos and for testing kubernetes applications. With KIND you get the flexibility to add worker nodes and manage multiple clusters. For more details, check out my article on KIND: https://faun.pub/from-minikube-to-kind-c5b3a5cb95

I am using KIND's extraPortMappings feature while creating the cluster to forward ports from the host to the ingress controller. We will be using the nginx ingress controller.

kind_cluster.yaml

kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
name: hpacluster
nodes:
- role: control-plane
  kubeadmConfigPatches:
  - |
    kind: InitConfiguration
    nodeRegistration:
      kubeletExtraArgs:
        node-labels: "ingress-ready=true"
  extraPortMappings:
  - containerPort: 80
    hostPort: 80
    protocol: TCP
  - containerPort: 443
    hostPort: 443
    protocol: TCP
- role: worker
- role: worker
- role: worker
- role: worker

Create the k8s cluster

asishs-MacBook-Air:kind$ kind create cluster --config hpa-lab.yaml
Creating cluster "hpacluster" ...
✓ Ensuring node image (kindest/node:v1.21.1) 🖼
✓ Preparing nodes 📦 📦 📦 📦 📦
✓ Writing configuration 📜
✓ Starting control-plane 🕹️
✓ Installing CNI 🔌
✓ Installing StorageClass 💾
✓ Joining worker nodes 🚜
Set kubectl context to "kind-hpacluster"
You can now use your cluster with:
kubectl cluster-info --context kind-hpacluster

Have a nice day! 👋

Check the cluster.
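A quick sanity check might look like this (a sketch; the kubectl context name follows kind's kind-&lt;cluster name&gt; convention):

# List kind clusters and confirm that all five nodes (1 control-plane + 4 workers) are Ready
kind get clusters
kubectl get nodes --context kind-hpacluster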

Let's create our deploy object. I am using an nginx image for this.

frontend.yaml

apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: frontend
  name: frontend
spec:
  progressDeadlineSeconds: 600
  replicas: 1
  revisionHistoryLimit: 10
  selector:
    matchLabels:
      app: frontend
  strategy:
    rollingUpdate:
      maxSurge: 25%
      maxUnavailable: 25%
    type: RollingUpdate
  template:
    metadata:
      labels:
        app: frontend
    spec:
      containers:
      - image: nginx
        imagePullPolicy: Always
        name: nginx
        ports:
        - containerPort: 80
        resources: {}
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
      dnsPolicy: ClusterFirst
      restartPolicy: Always
      schedulerName: default-scheduler
      securityContext: {}
      terminationGracePeriodSeconds: 30

frontend-service.yaml

apiVersion: v1
kind: Service
metadata:
  labels:
    app: frontend
  name: frontend-svc
spec:
  ports:
  - name: http
    port: 80
    protocol: TCP
    targetPort: 80
  selector:
    app: frontend
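Assuming the manifests are saved under the file names above, they can be applied and verified like this (a sketch):

# Create the deployment and its ClusterIP service, then confirm both exist
kubectl apply -f frontend.yaml
kubectl apply -f frontend-service.yaml
kubectl get deploy,svc -l app=frontend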

Install the nginx ingress controller. The manifest contains KIND-specific patches to forward the hostPorts to the ingress controller.

kubectl -n ingress-nginx apply -f https://raw.githubusercontent.com/kubernetes/ingress-nginx/master/deploy/static/provider/kind/deploy.yaml

Wait till the ingress controller is ready to process requests.

kubectl wait --namespace ingress-nginx \
  --for=condition=ready pod \
  --selector=app.kubernetes.io/component=controller \
  --timeout=90s

Create the ingress manifest for the frontend service.

frontend-ingress.yaml

apiVersion: networking.k8s.io/v1beta1
kind: Ingress
metadata:
  name: frontend-ingress
spec:
  rules:
  - http:
      paths:
      - path: /
        backend:
          serviceName: frontend-svc
          servicePort: 80
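Assuming the file name above, apply it the same way:

# Expose the frontend service through the nginx ingress controller
kubectl apply -f frontend-ingress.yaml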

After applying the above objects, we should be able to reach the nginx service on localhost.

asishs-MacBook-Air:hpa$ kubectl get pods
NAME READY STATUS RESTARTS AGE
frontend-86968456b9-p7nc2 1/1 Running 0 60m
asishs-MacBook-Air:hpa$ kubectl get svc
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
frontend-svc ClusterIP 10.96.161.97 <none> 80/TCP 60m
kubernetes ClusterIP 10.96.0.1 <none> 443/TCP 61m
asishs-MacBook-Air:hpa$ kubectl get ingress
NAME CLASS HOSTS ADDRESS PORTS AGE
frontend-ingress <none> * localhost 80 57m
asishs-MacBook-Air:hpa$ curl -I http://localhost
HTTP/1.1 200 OK
Date: Sun, 20 Jun 2021 16:28:19 GMT
Content-Type: text/html
Content-Length: 612
Connection: keep-alive
Last-Modified: Tue, 25 May 2021 12:28:56 GMT
ETag: "60aced88-264"
Accept-Ranges: bytes

Let's add our HPA manifest and see what happens when we apply it directly.

hpa.yaml

apiVersion: autoscaling/v1
kind: HorizontalPodAutoscaler
metadata:
  name: frontend-hpa
  namespace: default
spec:
  minReplicas: 3
  maxReplicas: 10
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: frontend
  targetCPUUtilizationPercentage: 10

We are telling HPA to keep the target CPU utilization at 10%. Create the HPA object.

asishs-MacBook-Air:hpa$ kubectl apply -f hpa.yaml
horizontalpodautoscaler.autoscaling/frontend-hpa created
asishs-MacBook-Air:hpa$ kubectl get hpa
NAME REFERENCE TARGETS MINPODS MAXPODS REPLICAS AGE
frontend-hpa Deployment/frontend <unknown>/10% 3 10 1 29s

Notice that the target percentage is shown as "unknown", which means our HPA controller is not able to get resource metrics for this deployment. We can use the basic metrics server to capture them. If we need more advanced metrics, we can consider monitoring solutions like Prometheus.
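Before installing anything, you can confirm that the resource metrics API is indeed absent (a sketch; the apiservice name matches what metrics-server registers):

# Without a metrics pipeline, this apiservice does not exist and top fails
kubectl get apiservice v1beta1.metrics.k8s.io
kubectl top pods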

Let's install the metrics server in our cluster.

asishs-MacBook-Air:hpa$ kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml
serviceaccount/metrics-server created
clusterrole.rbac.authorization.k8s.io/system:aggregated-metrics-reader created
clusterrole.rbac.authorization.k8s.io/system:metrics-server created
rolebinding.rbac.authorization.k8s.io/metrics-server-auth-reader created
clusterrolebinding.rbac.authorization.k8s.io/metrics-server:system:auth-delegator created
clusterrolebinding.rbac.authorization.k8s.io/system:metrics-server created
service/metrics-server created
deployment.apps/metrics-server created
apiservice.apiregistration.k8s.io/v1beta1.metrics.k8s.io created

Check that we can gather metrics from the cluster.

asishs-MacBook-Air:hpa$ k top nodes
W0620 22:53:30.277142 49629 top_node.go:119] Using json format to get metrics. Next release will switch to protocol-buffers, switch early by passing --use-protocol-buffers flag
Error from server (ServiceUnavailable): the server is currently unable to handle the request (get nodes.metrics.k8s.io)

When I check the logs of the metrics-server pod, I see some certificate-related errors.

E0620 17:29:41.525715       1 scraper.go:139] "Failed to scrape node" err="Get \"https://172.18.0.4:10250/stats/summary?only_cpu_and_memory=true\": x509: cannot validate certificate for 172.18.0.4 because it doesn't contain any IP SANs" node="hpacluster-worker3"
E0620 17:29:41.534082 1 scraper.go:139] "Failed to scrape node" err="Get \"https://172.18.0.6:10250/stats/summary?only_cpu_and_memory=true\": x509: cannot validate certificate for 172.18.0.6 because it doesn't contain any IP SANs" node="hpacluster-worker4"

Let's skip kubelet certificate verification by adding the --kubelet-insecure-tls flag (acceptable for a local demo cluster, not for production). Below is the complete set of args in the metrics-server deploy manifest that I am using.

...
spec:
  containers:
  - args:
    - --cert-dir=/tmp
    - --secure-port=443
    - --kubelet-preferred-address-types=InternalIP,ExternalIP,Hostname
    - --kubelet-use-node-status-port
    - --metric-resolution=15s
    - --kubelet-insecure-tls
...
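If you prefer not to edit the downloaded manifest by hand, the same flag can be added with a JSON patch (a sketch; it assumes metrics-server runs in the kube-system namespace and is the first container in its pod spec):

# Append --kubelet-insecure-tls to the metrics-server container args
kubectl -n kube-system patch deployment metrics-server --type=json \
  -p='[{"op": "add", "path": "/spec/template/spec/containers/0/args/-", "value": "--kubelet-insecure-tls"}]'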

Now let's check the metrics server again.

asishs-MacBook-Air:hpa$ kubectl top nodes
W0621 07:24:43.894564 52298 top_node.go:119] Using json format to get metrics. Next release will switch to protocol-buffers, switch early by passing --use-protocol-buffers flag
NAME CPU(cores) CPU% MEMORY(bytes) MEMORY%
hpacluster-control-plane 184m 4% 573Mi 28%
hpacluster-worker 126m 3% 122Mi 6%
hpacluster-worker2 25m 0% 106Mi 5%
hpacluster-worker3 85m 2% 93Mi 4%
hpacluster-worker4 74m 1% 93Mi 4%

The metrics server looks good now. Let's check the HPA.

asishs-MacBook-Air:hpa$ kubectl get hpa
NAME REFERENCE TARGETS MINPODS MAXPODS REPLICAS AGE
frontend-hpa Deployment/frontend <unknown>/10% 3 10 3 85s

Okay, the HPA is still showing unknown. The missing part is adding resource requests and limits to the deploy object. Let's add that and see.

deploy manifest for frontend

spec:
  containers:
  - image: nginx
    imagePullPolicy: Always
    name: nginx
    ports:
    - containerPort: 80
      protocol: TCP
    resources:
      limits:
        cpu: 600m
        memory: 128Mi
      requests:
        cpu: 200m
        memory: 64Mi
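The same change can also be made imperatively, which triggers a rolling restart of the pods (a sketch; container name as in the manifest above):

# Add CPU/memory requests and limits to the nginx container of the frontend deployment
kubectl set resources deployment frontend -c=nginx \
  --requests=cpu=200m,memory=64Mi --limits=cpu=600m,memory=128Mi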

HPA works only after we add resource requests to the deploy object, since CPU utilization is calculated as a percentage of the requested CPU. Now let's check the HPA again.

asishs-MacBook-Air:hpa$ kubectl get hpa
NAME REFERENCE TARGETS MINPODS MAXPODS REPLICAS AGE
frontend-hpa Deployment/frontend <unknown>/10% 3 10 3 9m51s
asishs-MacBook-Air:hpa$ kubectl get hpa
NAME REFERENCE TARGETS MINPODS MAXPODS REPLICAS AGE
frontend-hpa Deployment/frontend 0%/10% 3 10 3 10m

Please be patient, as the HPA controller takes some time to reflect the metrics. After a while the TARGETS column shows the current utilization; in our case it shows 0% current utilization against the 10% target, along with the min and max pods and the current replica count.
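While waiting, kubectl describe shows the HPA's conditions and any problems it has fetching metrics (a sketch):

# Inspect the HPA's current metrics, conditions and recent events
kubectl describe hpa frontend-hpa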

Let us hit our service with some traffic using the Apache benchmarking (ab) tool. Here I am starting a load against our frontend deploy object, which currently has 3 replica pods. As the traffic increases, we should see a spike in CPU and memory utilisation for the pods, which triggers the HPA controller to increase the replicas. When the ab test finishes, the load reduces and the HPA controller correspondingly brings the pod count back down to the initial replica count.

  1. Starting the traffic to the service:
asishs-MacBook-Air:kind$ ab -n 1000000 -c 100 http://localhost/
This is ApacheBench, Version 2.3 <$Revision: 1879490 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/
Benchmarking localhost (be patient)

Server Software:
Server Hostname: localhost
Server Port: 80
Document Path: /
Document Length: 0 bytes
Concurrency Level: 100
Time taken for tests: 224.176 seconds
Complete requests: 98073
Failed requests: 0
Total transferred: 0 bytes
HTML transferred: 0 bytes
Requests per second: 437.48 [#/sec] (mean)
Time per request: 228.581 [ms] (mean)
Time per request: 2.286 [ms] (mean, across all concurrent requests)
Transfer rate: 0.00 [Kbytes/sec] received
Connection Times (ms)
min mean[+/-sd] median max
Connect: 0 1 129.0 0 19662
Processing: 0 1 4.3 1 544
Waiting: 0 0 0.0 0 0
Total: 0 2 129.1 1 19663
Percentage of the requests served within a certain time (ms)
50% 1
66% 1
75% 1
80% 1
90% 1
95% 1
98% 2
99% 2
100% 19663 (longest request)

2. Check HPA resource usage

asishs-MacBook-Air:hpa$ kubectl get hpa
NAME REFERENCE TARGETS MINPODS MAXPODS REPLICAS AGE
frontend-hpa Deployment/frontend 0%/10% 3 10 3 138m
asishs-MacBook-Air:hpa$ k get hpa -w
NAME REFERENCE TARGETS MINPODS MAXPODS REPLICAS AGE
frontend-hpa Deployment/frontend 0%/10% 3 10 3 138m
frontend-hpa Deployment/frontend 3%/10% 3 10 3 138m
frontend-hpa Deployment/frontend 25%/10% 3 10 3

3. Check the POD resource usage using metrics server

asishs-MacBook-Air:hpa$ kubectl top pods
W0621 09:56:37.103195 53592 top_pod.go:140] Using json format to get metrics. Next release will switch to protocol-buffers, switch early by passing --use-protocol-buffers flag
NAME CPU(cores) MEMORY(bytes)
frontend-78764b4d8-5k5ln 0m 1Mi
frontend-78764b4d8-fzmsd 0m 1Mi
frontend-78764b4d8-grdjr 0m 1Mi
asishs-MacBook-Air:hpa$ kubectl top pods
W0621 09:56:48.363569 53619 top_pod.go:140] Using json format to get metrics. Next release will switch to protocol-buffers, switch early by passing --use-protocol-buffers flag
NAME CPU(cores) MEMORY(bytes)
frontend-78764b4d8-5k5ln 38m 2Mi
frontend-78764b4d8-fzmsd 35m 3Mi
frontend-78764b4d8-grdjr 73m 1Mi

There is an increase in CPU and memory usage. In the HPA manifest for this deploy object, we had specified targetCPUUtilizationPercentage: 10, so the controller should now start scaling out.

asishs-MacBook-Air:hpa$ kubectl get hpa
NAME REFERENCE TARGETS MINPODS MAXPODS REPLICAS AGE
frontend-hpa Deployment/frontend 0%/10% 3 10 3 138m
asishs-MacBook-Air:hpa$ k get hpa -w
NAME REFERENCE TARGETS MINPODS MAXPODS REPLICAS AGE
frontend-hpa Deployment/frontend 0%/10% 3 10 3 138m
frontend-hpa Deployment/frontend 3%/10% 3 10 3 138m
frontend-hpa Deployment/frontend 25%/10% 3 10 3 138m
frontend-hpa Deployment/frontend 4%/10% 3 10 6 138m
frontend-hpa Deployment/frontend 0%/10% 3 10 8 138m
frontend-hpa Deployment/frontend 0%/10% 3 10 8 139m

You can see that as soon as the CPU utilization crossed 10%, the HPA scaled out the replicas. This is evident from the event logs.

asishs-MacBook-Air:hpa$ kubectl get events
LAST SEEN TYPE REASON OBJECT MESSAGE
26m Normal ScalingReplicaSet deployment/frontend Scaled up replica set frontend-78764b4d8 to 6
26m Normal ScalingReplicaSet deployment/frontend Scaled up replica set frontend-78764b4d8 to 8

4. After a while, when the ab test is over, the load on the pods gradually reduces. When it falls below 10% and stays there for the cooldown period, the HPA controller reduces the replica count back to the normal values.

asishs-MacBook-Air:hpa$ k get hpa -w
NAME REFERENCE TARGETS MINPODS MAXPODS REPLICAS AGE
frontend-hpa Deployment/frontend 0%/10% 3 10 8 145m
frontend-hpa Deployment/frontend 4%/10% 3 10 6 138m
frontend-hpa Deployment/frontend 0%/10% 3 10 3

From the event logs:

asishs-MacBook-Air:hpa$ kubectl get events
LAST SEEN TYPE REASON OBJECT MESSAGE
2m51s Normal ScalingReplicaSet deployment/frontend Scaled up replica set frontend-78764b4d8 to 6
2m36s Normal ScalingReplicaSet deployment/frontend Scaled up replica set frontend-78764b4d8 to 8
4m6s Normal ScalingReplicaSet deployment/frontend Scaled down replica set frontend-78764b4d8 to 3

One noticeable thing is that HPA is quick to scale out to handle the extra load, but takes some time to scale in.

HPA has the following default timings:

  1. 30 seconds as the interval between metrics checks
  2. 3 minutes cooldown between scale-out operations
  3. 5 minutes cooldown between scale-in operations

These values are configurable on the controller side.
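In addition to those controller-wide settings, newer API versions (autoscaling/v2beta2 and later) expose a per-HPA behavior section, so the scale-up and scale-down windows can be tuned for a single autoscaler. A sketch for our frontend HPA (the window values are illustrative, not recommendations):

# Recreate the HPA with explicit scale-up/scale-down behavior
kubectl apply -f - <<EOF
apiVersion: autoscaling/v2beta2
kind: HorizontalPodAutoscaler
metadata:
  name: frontend-hpa
  namespace: default
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: frontend
  minReplicas: 3
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 10
  behavior:
    scaleUp:
      stabilizationWindowSeconds: 0    # react to spikes immediately
    scaleDown:
      stabilizationWindowSeconds: 300  # wait 5 minutes of low load before scaling in
EOF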

HPA thrashing

  • If the HPA monitored the deployment and made immediate changes on every fluctuation, it would add and remove pods in quick succession, leading to thrashing and an unstable service.
  • We need to find a balance where the cluster responds to a trend in the metrics rather than to every momentary change.
  • We want to scale out fairly quickly to handle spikes and scale in a bit slower.
  • This is accomplished with "cool down" periods: delays between two scale-out or scale-in operations that give the cluster a chance to stabilize and honour other scaling operations.

Best Practices

  • Resource requests and limits should be specified on the pods. Without them, HPA won't work.
  • The minimum replica count should be calculated properly and set explicitly.
  • If your application needs to scale on metrics other than CPU, deep dive into custom metrics and use those, perhaps by integrating with solutions like Prometheus.
  • Remember that your application takes its own sweet time to start up (consider readiness/liveness probes, for example; see the sketch after this list), so autoscaling is not immediate. It can take several minutes to scale out.
  • If your cluster is not able to handle the load, you might have to consider vertically scaling the nodes or using the cluster autoscaler.
  • Give a suitable buffer so that your application can handle spikes in traffic.
  • Your application should be stateless, with short requests and no coupling between requests.
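For the probe point above, a readiness probe keeps newly created pods out of the Service until nginx actually answers, so a scale-out event does not send traffic to pods that are still starting. A minimal sketch patched onto the demo deployment (probe timings are illustrative):

# Add a readiness probe to the nginx container of the frontend deployment
kubectl patch deployment frontend --patch '
spec:
  template:
    spec:
      containers:
      - name: nginx
        readinessProbe:
          httpGet:
            path: /
            port: 80
          initialDelaySeconds: 5
          periodSeconds: 10
'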

Conclusion

HPA is a great feature in kubernetes which adds resilience to your application. It helps in mitigating a quick spike in traffic, but all of this is limited by the existing cluster capacity. HPA will not increase your cluster's capacity; for that you would need something like the cluster autoscaler. A complementary option is the vertical pod autoscaler, which right-sizes individual pods and will be my next topic. Thanks for the read, and feel free to ask questions.


I enjoy exploring various open-source tools, technologies and ideas related to cloud computing, DevOps and SRE, and sharing my experience and understanding of the subject.