Discussion on Horizontal Pod Autoscaler with a demo on a local k8s cluster

Asish M Madhu
Published in DevOps for you · 11 min read · Jun 27, 2021


“Resilience is our ability to bounce back from life’s challenges and to thrive, grow and expand.”

Photo Credit: Ryo Yoshitake https://unsplash.com/photos/cusz0Bg-5mQ

HorizontalPodAutoscaler (HPA) is a separate resource type in kubernetes which scales the number of pods based on CPU or memory utilization, or on custom metrics. HPA helps optimise the number of replicas that need to be maintained for your applications, which helps in distributing load. Behind the scenes, the autoscaler controller updates the replica count of a k8s resource such as a Deployment, ReplicaSet or StatefulSet.

The value it brings to the table is a more resilient application that can take care of itself at times of increased demand. But there should be enough physical resources in the cluster for the pods to expand into. In this article I will try to explain HPA along with a demo.

The HPA controller periodically checks metrics. When the average CPU or memory utilization goes too high, it tells k8s to increase the replica count of the target deployment. So it needs to know how to get the metrics from the cluster in the first place; we have to connect the controller to a metrics collector. Values for CPU, memory etc. are averaged across your pods. The most important thing is to have resource requests (and limits) defined in your deployments, since utilization is measured against the requested values. Then provide the minimum and maximum pod count in the HPA configuration. HPA will also scale down after a cooldown period. It uses the formula below to calculate the number of replicas to maintain.

Algorithm to calculate Replicas

desired_replicas = ceil(current_replicas * (current_value / target_value))

To understand this, let's consider two different scenarios where a scaling decision needs to be taken.

Scenario 1

Assume that an application has some business requirement and, based on the physical limits, we come up with ideal CPU/memory numbers. We wish to maintain the target CPU utilization below 60% for the application's pods. Then there is a spike in traffic and the current utilization reaches 90%. The deployment object has 3 replicas and the pods are taking heavy load. Let's use the algorithm to find the desired pod count.

Target CPU utilization : 60%
Current Utilization: 90%
Current pods: 3
Desired Pods = ceil(current_pods * (current_value / target_value))
Desired Pods = ceil(3*(.9/.6)) = 5

In this scenario, the HPA controller will change the replicas in the deployment to 5, and the scheduler will place the 2 additional pods.

Scenario 2

Let's assume that after some time the traffic has come down, and the current utilization is now 20%. We wish the additional replicas to be reduced automatically. Let's see how the HPA controller grants our wish.

Target CPU Utilization : 60%
Current Utilization: 20%
Current pods : 5
Desired Pods = ceil(5*(.2/.6)) = 2

Here it determines that 2 replicas are sufficient and tells the deployment controller to reduce to that number. But the HPA is configured with a minimum replica count of 3, so Kubernetes honours that and reduces the replica count from 5 to 3 instead.

Lab Setup

I am going to use a "kind" cluster on my local laptop. KIND is Kubernetes in Docker, which is good for demos and for testing kubernetes applications. With KIND you get the flexibility to add worker nodes and manage multiple clusters. For more details, check out my article on KIND: https://faun.pub/from-minikube-to-kind-c5b3a5cb95

I am using KIND's extraPortMappings feature while creating the cluster to forward ports from the host to the ingress controller. We will be using the nginx ingress controller.

kind_cluster.yaml

kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
name: hpacluster
nodes:
- role: control-plane
  kubeadmConfigPatches:
  - |
    kind: InitConfiguration
    nodeRegistration:
      kubeletExtraArgs:
        node-labels: "ingress-ready=true"
  extraPortMappings:
  - containerPort: 80
    hostPort: 80
    protocol: TCP
  - containerPort: 443
    hostPort: 443
    protocol: TCP
- role: worker
- role: worker
- role: worker
- role: worker

Create the k8s cluster

asishs-MacBook-Air:kind$ kind create cluster --config hpa-lab.yaml
Creating cluster "hpacluster" ...
✓ Ensuring node image (kindest/node:v1.21.1) 🖼
✓ Preparing nodes 📦 📦 📦 📦 📦
✓ Writing configuration 📜
✓ Starting control-plane 🕹️
✓ Installing CNI 🔌
✓ Installing StorageClass 💾
✓ Joining worker nodes 🚜
Set kubectl context to "kind-hpacluster"
You can now use your cluster with:
kubectl cluster-info --context kind-hpacluster

Have a nice day! 👋

Check the cluster.
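A quick sanity check might look like this (a sketch; the kubectl context name follows kind's kind-&lt;cluster name&gt; convention):

# List kind clusters and confirm that all five nodes (1 control-plane + 4 workers) are Ready
kind get clusters
kubectl get nodes --context kind-hpacluster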

Let's create our deploy object. I am using an nginx image for this.

frontend.yaml

apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: frontend
  name: frontend
spec:
  progressDeadlineSeconds: 600
  replicas: 1
  revisionHistoryLimit: 10
  selector:
    matchLabels:
      app: frontend
  strategy:
    rollingUpdate:
      maxSurge: 25%
      maxUnavailable: 25%
    type: RollingUpdate
  template:
    metadata:
      labels:
        app: frontend
    spec:
      containers:
      - image: nginx
        imagePullPolicy: Always
        name: nginx
        ports:
        - containerPort: 80
        resources: {}
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
      dnsPolicy: ClusterFirst
      restartPolicy: Always
      schedulerName: default-scheduler
      securityContext: {}
      terminationGracePeriodSeconds: 30

frontend-service.yaml

apiVersion: v1
kind: Service
metadata:
  labels:
    app: frontend
  name: frontend-svc
spec:
  ports:
  - name: http
    port: 80
    protocol: TCP
    targetPort: 80
  selector:
    app: frontend
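Assuming the manifests are saved under the file names above, they can be applied and verified like this (a sketch):

# Create the deployment and its ClusterIP service, then confirm both exist
kubectl apply -f frontend.yaml
kubectl apply -f frontend-service.yaml
kubectl get deploy,svc -l app=frontend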

Install the nginx ingress controller. The manifest contains KIND-specific patches to forward the hostPorts to the ingress controller.

kubectl -n ingress-nginx apply -f https://raw.githubusercontent.com/kubernetes/ingress-nginx/master/deploy/static/provider/kind/deploy.yaml

Wait till the ingress controller is ready to process requests.

kubectl wait --namespace ingress-nginx \
  --for=condition=ready pod \
  --selector=app.kubernetes.io/component=controller \
  --timeout=90s

Create the ingress manifest for the frontend service.

frontend-ingress.yaml

apiVersion: networking.k8s.io/v1beta1
kind: Ingress
metadata:
  name: frontend-ingress
spec:
  rules:
  - http:
      paths:
      - path: /
        backend:
          serviceName: frontend-svc
          servicePort: 80
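Assuming the file name above, apply it the same way:

# Expose the frontend service through the nginx ingress controller
kubectl apply -f frontend-ingress.yaml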

After applying the above objects, we should be able to reach the nginx service on localhost.

asishs-MacBook-Air:hpa$ kubectl get pods
NAME READY STATUS RESTARTS AGE
frontend-86968456b9-p7nc2 1/1 Running 0 60m
asishs-MacBook-Air:hpa$ kubectl get svc
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
frontend-svc ClusterIP 10.96.161.97 <none> 80/TCP 60m
kubernetes ClusterIP 10.96.0.1 <none> 443/TCP 61m
asishs-MacBook-Air:hpa$ kubectl get ingress
NAME CLASS HOSTS ADDRESS PORTS AGE
frontend-ingress <none> * localhost 80 57m
asishs-MacBook-Air:hpa$ curl -I http://localhost
HTTP/1.1 200 OK
Date: Sun, 20 Jun 2021 16:28:19 GMT
Content-Type: text/html
Content-Length: 612
Connection: keep-alive
Last-Modified: Tue, 25 May 2021 12:28:56 GMT
ETag: "60aced88-264"
Accept-Ranges: bytes

Let's add our HPA manifest and see what happens when we apply it directly.

hpa.yaml

apiVersion: autoscaling/v1
kind: HorizontalPodAutoscaler
metadata:
  name: frontend-hpa
  namespace: default
spec:
  minReplicas: 3
  maxReplicas: 10
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: frontend
  targetCPUUtilizationPercentage: 10

We are telling HPA to keep the target CPU utilization at 10%. Create the HPA object.

asishs-MacBook-Air:hpa$ kubectl apply -f hpa.yaml
horizontalpodautoscaler.autoscaling/frontend-hpa created
asishs-MacBook-Air:hpa$ kubectl get hpa
NAME REFERENCE TARGETS MINPODS MAXPODS REPLICAS AGE
frontend-hpa Deployment/frontend <unknown>/10% 3 10 1 29s

Notice that the target percentage is shown as "unknown", which means our HPA controller is not able to get resource metrics for this deployment. We can use the basic metrics server to capture them. If we need more advanced metrics, we can consider monitoring solutions like Prometheus.
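Before installing anything, you can confirm that the resource metrics API is indeed absent (a sketch; the apiservice name matches what metrics-server registers):

# Without a metrics pipeline, this apiservice does not exist and top fails
kubectl get apiservice v1beta1.metrics.k8s.io
kubectl top pods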

Let's install the metrics server in our cluster.

asishs-MacBook-Air:hpa$ kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml
serviceaccount/metrics-server created
clusterrole.rbac.authorization.k8s.io/system:aggregated-metrics-reader created
clusterrole.rbac.authorization.k8s.io/system:metrics-server created
rolebinding.rbac.authorization.k8s.io/metrics-server-auth-reader created
clusterrolebinding.rbac.authorization.k8s.io/metrics-server:system:auth-delegator created
clusterrolebinding.rbac.authorization.k8s.io/system:metrics-server created
service/metrics-server created
deployment.apps/metrics-server created
apiservice.apiregistration.k8s.io/v1beta1.metrics.k8s.io created

Check that we can gather metrics from the cluster.

asishs-MacBook-Air:hpa$ k top nodes
W0620 22:53:30.277142 49629 top_node.go:119] Using json format to get metrics. Next release will switch to protocol-buffers, switch early by passing --use-protocol-buffers flag
Error from server (ServiceUnavailable): the server is currently unable to handle the request (get nodes.metrics.k8s.io)

When I check the logs of the metrics-server pod, I see some certificate-related errors.

E0620 17:29:41.525715       1 scraper.go:139] "Failed to scrape node" err="Get \"https://172.18.0.4:10250/stats/summary?only_cpu_and_memory=true\": x509: cannot validate certificate for 172.18.0.4 because it doesn't contain any IP SANs" node="hpacluster-worker3"
E0620 17:29:41.534082 1 scraper.go:139] "Failed to scrape node" err="Get \"https://172.18.0.6:10250/stats/summary?only_cpu_and_memory=true\": x509: cannot validate certificate for 172.18.0.6 because it doesn't contain any IP SANs" node="hpacluster-worker4"

Let's skip kubelet certificate verification by adding the --kubelet-insecure-tls flag (acceptable for a local demo cluster, not for production). Below is the complete set of args in the metrics-server deploy manifest that I am using.

...
spec:
  containers:
  - args:
    - --cert-dir=/tmp
    - --secure-port=443
    - --kubelet-preferred-address-types=InternalIP,ExternalIP,Hostname
    - --kubelet-use-node-status-port
    - --metric-resolution=15s
    - --kubelet-insecure-tls
...
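If you prefer not to edit the downloaded manifest by hand, the same flag can be added with a JSON patch (a sketch; it assumes metrics-server runs in the kube-system namespace and is the first container in its pod spec):

# Append --kubelet-insecure-tls to the metrics-server container args
kubectl -n kube-system patch deployment metrics-server --type=json \
  -p='[{"op": "add", "path": "/spec/template/spec/containers/0/args/-", "value": "--kubelet-insecure-tls"}]'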

Now let's check the metrics server again.

asishs-MacBook-Air:hpa$ kubectl top nodes
W0621 07:24:43.894564 52298 top_node.go:119] Using json format to get metrics. Next release will switch to protocol-buffers, switch early by passing --use-protocol-buffers flag
NAME CPU(cores) CPU% MEMORY(bytes) MEMORY%
hpacluster-control-plane 184m 4% 573Mi 28%
hpacluster-worker 126m 3% 122Mi 6%
hpacluster-worker2 25m 0% 106Mi 5%
hpacluster-worker3 85m 2% 93Mi 4%
hpacluster-worker4 74m 1% 93Mi 4%

The metrics server looks good now. Let's check the HPA.

asishs-MacBook-Air:hpa$ kubectl get hpa
NAME REFERENCE TARGETS MINPODS MAXPODS REPLICAS AGE
frontend-hpa Deployment/frontend <unknown>/10% 3 10 3 85s

Okay, the HPA is still showing unknown. The missing part is adding resource requests and limits to the deploy object. Let's add that and see.

deploy manifest for frontend

spec:
  containers:
  - image: nginx
    imagePullPolicy: Always
    name: nginx
    ports:
    - containerPort: 80
      protocol: TCP
    resources:
      limits:
        cpu: 600m
        memory: 128Mi
      requests:
        cpu: 200m
        memory: 64Mi
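The same change can also be made imperatively, which triggers a rolling restart of the pods (a sketch; container name as in the manifest above):

# Add CPU/memory requests and limits to the nginx container of the frontend deployment
kubectl set resources deployment frontend -c=nginx \
  --requests=cpu=200m,memory=64Mi --limits=cpu=600m,memory=128Mi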

HPA works only after we add resource requests to the deploy object, since CPU utilization is calculated as a percentage of the requested CPU. Now let's check the HPA again.

asishs-MacBook-Air:hpa$ kubectl get hpa
NAME REFERENCE TARGETS MINPODS MAXPODS REPLICAS AGE
frontend-hpa Deployment/frontend <unknown>/10% 3 10 3 9m51s
asishs-MacBook-Air:hpa$ kubectl get hpa
NAME REFERENCE TARGETS MINPODS MAXPODS REPLICAS AGE
frontend-hpa Deployment/frontend 0%/10% 3 10 3 10m

Please be patient, as the HPA controller takes some time to reflect the metrics. After a while the TARGETS column shows the current utilization; in our case it shows 0% current utilization against the 10% target, along with the min and max pods and the current replica count.
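While waiting, kubectl describe shows the HPA's conditions and any problems it has fetching metrics (a sketch):

# Inspect the HPA's current metrics, conditions and recent events
kubectl describe hpa frontend-hpa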

Let us hit our service with some traffic using the Apache benchmarking (ab) tool. Here I am starting a load against our frontend deploy object, which currently has 3 replica pods. As the traffic increases, we should see a spike in CPU and memory utilisation for the pods, which triggers the HPA controller to increase the replicas. When the ab test finishes, the load reduces and the HPA controller correspondingly brings the pod count back down to the initial replica count.

  1. Starting the traffic to the service:
asishs-MacBook-Air:kind$ ab -n 1000000 -c 100 http://localhost/
This is ApacheBench, Version 2.3 <$Revision: 1879490 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/
Benchmarking localhost (be patient)

Server Software:
Server Hostname: localhost
Server Port: 80
Document Path: /
Document Length: 0 bytes
Concurrency Level: 100
Time taken for tests: 224.176 seconds
Complete requests: 98073
Failed requests: 0
Total transferred: 0 bytes
HTML transferred: 0 bytes
Requests per second: 437.48 [#/sec] (mean)
Time per request: 228.581 [ms] (mean)
Time per request: 2.286 [ms] (mean, across all concurrent requests)
Transfer rate: 0.00 [Kbytes/sec] received
Connection Times (ms)
min mean[+/-sd] median max
Connect: 0 1 129.0 0 19662
Processing: 0 1 4.3 1 544
Waiting: 0 0 0.0 0 0
Total: 0 2 129.1 1 19663
Percentage of the requests served within a certain time (ms)
50% 1
66% 1
75% 1
80% 1
90% 1
95% 1
98% 2
99% 2
100% 19663 (longest request)

2. Check HPA resource usage

asishs-MacBook-Air:hpa$ kubectl get hpa
NAME REFERENCE TARGETS MINPODS MAXPODS REPLICAS AGE
frontend-hpa Deployment/frontend 0%/10% 3 10 3 138m
asishs-MacBook-Air:hpa$ k get hpa -w
NAME REFERENCE TARGETS MINPODS MAXPODS REPLICAS AGE
frontend-hpa Deployment/frontend 0%/10% 3 10 3 138m
frontend-hpa Deployment/frontend 3%/10% 3 10 3 138m
frontend-hpa Deployment/frontend 25%/10% 3 10 3

3. Check the POD resource usage using metrics server

asishs-MacBook-Air:hpa$ kubectl top pods
W0621 09:56:37.103195 53592 top_pod.go:140] Using json format to get metrics. Next release will switch to protocol-buffers, switch early by passing --use-protocol-buffers flag
NAME CPU(cores) MEMORY(bytes)
frontend-78764b4d8-5k5ln 0m 1Mi
frontend-78764b4d8-fzmsd 0m 1Mi
frontend-78764b4d8-grdjr 0m 1Mi
asishs-MacBook-Air:hpa$ kubectl top pods
W0621 09:56:48.363569 53619 top_pod.go:140] Using json format to get metrics. Next release will switch to protocol-buffers, switch early by passing --use-protocol-buffers flag
NAME CPU(cores) MEMORY(bytes)
frontend-78764b4d8-5k5ln 38m 2Mi
frontend-78764b4d8-fzmsd 35m 3Mi
frontend-78764b4d8-grdjr 73m 1Mi

There is an increase in CPU and memory usage. In the HPA manifest for this deploy object, we had specified targetCPUUtilizationPercentage: 10, so the controller should now start scaling out.

asishs-MacBook-Air:hpa$ kubectl get hpa
NAME REFERENCE TARGETS MINPODS MAXPODS REPLICAS AGE
frontend-hpa Deployment/frontend 0%/10% 3 10 3 138m
asishs-MacBook-Air:hpa$ k get hpa -w
NAME REFERENCE TARGETS MINPODS MAXPODS REPLICAS AGE
frontend-hpa Deployment/frontend 0%/10% 3 10 3 138m
frontend-hpa Deployment/frontend 3%/10% 3 10 3 138m
frontend-hpa Deployment/frontend 25%/10% 3 10 3 138m
frontend-hpa Deployment/frontend 4%/10% 3 10 6 138m
frontend-hpa Deployment/frontend 0%/10% 3 10 8 138m
frontend-hpa Deployment/frontend 0%/10% 3 10 8 139m

You can see that as soon as the CPU utilization crossed 10%, the HPA scaled out the replicas. This is evident from the event logs.

asishs-MacBook-Air:hpa$ kubectl get events
LAST SEEN TYPE REASON OBJECT MESSAGE
26m Normal ScalingReplicaSet deployment/frontend Scaled up replica set frontend-78764b4d8 to 6
26m Normal ScalingReplicaSet deployment/frontend Scaled up replica set frontend-78764b4d8 to 8

4. After a while, when the ab test is over, the load on the pods gradually reduces. When it falls below 10% and stays there for the cooldown period, the HPA controller reduces the replica count back to the normal values.

asishs-MacBook-Air:hpa$ k get hpa -w
NAME REFERENCE TARGETS MINPODS MAXPODS REPLICAS AGE
frontend-hpa Deployment/frontend 0%/10% 3 10 8 145m
frontend-hpa Deployment/frontend 4%/10% 3 10 6 138m
frontend-hpa Deployment/frontend 0%/10% 3 10 3

From the event logs:

asishs-MacBook-Air:hpa$ kubectl get events
LAST SEEN TYPE REASON OBJECT MESSAGE
2m51s Normal ScalingReplicaSet deployment/frontend Scaled up replica set frontend-78764b4d8 to 6
2m36s Normal ScalingReplicaSet deployment/frontend Scaled up replica set frontend-78764b4d8 to 8
4m6s Normal ScalingReplicaSet deployment/frontend Scaled down replica set frontend-78764b4d8 to 3

One noticeable thing is that HPA is quick to scale out to handle the extra load, but takes some time to scale in.

HPA has the following default timings:

  1. 30 seconds as the interval between metrics checks
  2. 3 minutes cooldown between scale-out operations
  3. 5 minutes cooldown between scale-in operations

These values are configurable on the controller side.
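In addition to those controller-wide settings, newer API versions (autoscaling/v2beta2 and later) expose a per-HPA behavior section, so the scale-up and scale-down windows can be tuned for a single autoscaler. A sketch for our frontend HPA (the window values are illustrative, not recommendations):

# Recreate the HPA with explicit scale-up/scale-down behavior
kubectl apply -f - <<EOF
apiVersion: autoscaling/v2beta2
kind: HorizontalPodAutoscaler
metadata:
  name: frontend-hpa
  namespace: default
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: frontend
  minReplicas: 3
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 10
  behavior:
    scaleUp:
      stabilizationWindowSeconds: 0    # react to spikes immediately
    scaleDown:
      stabilizationWindowSeconds: 300  # wait 5 minutes of low load before scaling in
EOF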

HPA thrashing

  • If the HPA monitored the deployment and made immediate changes on every fluctuation, it would add and remove pods in quick succession, leading to thrashing and an unstable service.
  • We need to find a balance where the cluster responds to a trend in the metrics rather than to every momentary change.
  • We want to scale out fairly quickly to handle spikes and scale in a bit slower.
  • This is accomplished with "cool down" periods: delays between two scale-out or scale-in operations that give the cluster a chance to stabilize and honour other scaling operations.

Best Practices

  • Resource requests and limits should be specified on the pods. Without them, HPA won't work.
  • The minimum replica count should be calculated properly and set explicitly.
  • If your application needs to scale on metrics other than CPU, deep dive into custom metrics and use those, perhaps by integrating with solutions like Prometheus.
  • Remember that your application takes its own sweet time to start up (consider readiness/liveness probes, for example; see the sketch after this list), so autoscaling is not immediate. It can take several minutes to scale out.
  • If your cluster is not able to handle the load, you might have to consider vertically scaling the nodes or using the cluster autoscaler.
  • Give a suitable buffer so that your application can handle spikes in traffic.
  • Your application should be stateless, with short requests and no coupling between requests.
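For the probe point above, a readiness probe keeps newly created pods out of the Service until nginx actually answers, so a scale-out event does not send traffic to pods that are still starting. A minimal sketch patched onto the demo deployment (probe timings are illustrative):

# Add a readiness probe to the nginx container of the frontend deployment
kubectl patch deployment frontend --patch '
spec:
  template:
    spec:
      containers:
      - name: nginx
        readinessProbe:
          httpGet:
            path: /
            port: 80
          initialDelaySeconds: 5
          periodSeconds: 10
'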

Conclusion

HPA is a great feature in kubernetes which adds resilience to your application. It helps in mitigating a quick spike in traffic, but all of this is limited by the existing cluster capacity. HPA will not increase your cluster's capacity; for that you would need something like the cluster autoscaler. A complementary option is the vertical pod autoscaler, which right-sizes individual pods and will be my next topic. Thanks for the read, and feel free to ask questions.


I enjoy exploring various open-source tools, technologies and ideas related to cloud computing, DevOps and SRE, and sharing my experience and understanding of the subject.