POD Auto Scaling in K8s using HPA

POD auto-scaling is the process of automatically increasing and decreasing capacity based on demand

Auto Scaling can be done in 2 ways

  1. Horizontal Scaling

  2. Vertical Scaling

Horizontal Scaling means increasing the number of instances/systems

Vertical Scaling means increasing the capacity of a single system

Note: For production, we will use Horizontal Scaling
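
For example, horizontal scaling of a Deployment can be done manually with kubectl scale (hpa-demo-deployment is the Deployment created later in this guide); HPA simply automates this adjustment:

# Manually scale the Deployment to 5 replicas (this is what HPA automates)
$ kubectl scale deployment hpa-demo-deployment --replicas=5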

HPA: Horizontal POD Autoscaling

VPA: Vertical POD Autoscaling (we don't use this)

HPA: The Horizontal Pod Autoscaler scales the number of pod replicas of a Deployment, ReplicaSet, or ReplicationController up or down dynamically, based on observed metrics (CPU or memory utilization)
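
The same autoscaler can also be created imperatively with kubectl autoscale instead of the hpa.yaml manifest shown later, for example (targeting the Deployment defined further below):

# Create an HPA imperatively: scale between 1 and 10 replicas at 50% average CPU
$ kubectl autoscale deployment hpa-demo-deployment --cpu-percent=50 --min=1 --max=10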

HPA interacts with the Metrics Server to determine the CPU/memory utilization of the pods.

Note: By default, the Metrics Server is not installed in the cluster

The Metrics Server is a component that collects resource metrics (CPU and memory) from objects such as pods and nodes and exposes them to the cluster in near real time.

The Metrics Server can be installed in the cluster as an add-on. You can install it directly from the repository below
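
Before installing, you can check whether a Metrics Server is already present by looking for the resource metrics APIService (v1beta1.metrics.k8s.io is the name registered by the standard Metrics Server):

# Check whether the resource metrics API is already registered in the cluster
$ kubectl get apiservices v1beta1.metrics.k8s.io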

Metrics Server installation steps:

# Clone the repo
$ git clone https://github.com/rjthapaa/k8s_metrics_server.git

$ cd k8s_metrics_server
$ ls deploy/1.8+/

# Apply the manifest files from the deploy/1.8+/ directory
$ kubectl apply -f deploy/1.8+/
Note: This creates the service account, roles, role bindings, and the rest of the Metrics Server objects.

# We can see the Metrics Server running in the kube-system namespace
$ kubectl get all -n kube-system

# Check the top nodes using metric server
$ kubectl top nodes

# Check the top pods using metric server
$ kubectl top pods

Note: When we install the Metrics Server, it is installed in the kube-system namespace
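
To confirm the rollout, check the Metrics Server Deployment in kube-system (the Deployment is assumed to be named metrics-server, which is what the standard manifests use):

# Verify the Metrics Server Deployment is available
$ kubectl get deployment metrics-server -n kube-system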

# (Optional) To delete all objects (pods, services, deployments, etc.) in the current namespace
$ kubectl delete all --all

The Metrics Server is now installed. Next, create the application Deployment, Service, and HPA.

$ vi deployment.yaml

---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: hpa-demo-deployment
spec:
  replicas: 2
  selector:
    matchLabels:
      name: hpa-example
  template:
    metadata:
      labels:
        name: hpa-example
    spec:
      containers:
      - name: hpa-container
        image: k8s.gcr.io/hpa-example
        ports:
        - name: http
          containerPort: 80
        resources:
          requests:
            cpu: 100m
            memory: 64Mi
          limits:
            cpu: 200m
            memory: 256Mi
...

$ vi service.yaml

---
apiVersion: v1
kind: Service
metadata:
  name: hpa-examplesvc
  labels:
    name: hpaservice
spec:
  type: ClusterIP
  selector:
    name: hpa-example    # must match the pod labels defined in the Deployment
  ports:
  - port: 80
    targetPort: 80
...

$ vi hpa.yaml

---
apiVersion: autoscaling/v1
kind: HorizontalPodAutoscaler
metadata:
  name: hpa-example-autoscaler
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: hpa-demo-deployment    # must match the Deployment name
  minReplicas: 1
  maxReplicas: 10
  targetCPUUtilizationPercentage: 50
...
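
Note: targetCPUUtilizationPercentage is measured against the pods' CPU requests (100m in the Deployment above). The controller's scaling decision is roughly:

desiredReplicas = ceil( currentReplicas * currentCPUUtilization / targetCPUUtilization )

For example, 2 replicas averaging 150% of their requested CPU against a 50% target gives ceil(2 * 150 / 50) = 6 replicas.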

Apply the manifests and check the created objects with the below commands

$ kubectl apply -f deployment.yaml
$ kubectl apply -f service.yaml
$ kubectl apply -f hpa.yaml
$ kubectl get pods
$ kubectl get svc
$ kubectl get hpa

Use the below commands to check the CPU utilization and Pod information

$ kubectl get all -n kube-system
$ kubectl top nodes
$ kubectl top pods
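
You can also inspect the HPA's current and target utilization and its scaling events (hpa-example-autoscaler is the name used in hpa.yaml above):

# Show current/target CPU utilization and scaling events of the HPA
$ kubectl describe hpa hpa-example-autoscaler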

As of now, we don't have any load on the application

Now we need to simulate load on the application

We can generate the load using a BusyBox pod

Let's connect to the master node and generate the load

# Start an interactive BusyBox pod to use as a load generator
$ kubectl run -it --rm loadgenerator --image=busybox -- /bin/sh

# From inside the BusyBox shell, hit the service in a loop to generate CPU load
while true; do wget -q -O- http://hpa-examplesvc; done
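
While the load generator is running, watch the HPA react from another terminal (scale-up can take a minute or two):

# Watch the HPA raise the replica count as CPU utilization crosses the 50% target
$ kubectl get hpa -w

# Watch the new pods being created
$ kubectl get pods -w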

Here, the service name hpa-examplesvc is resolved to its ClusterIP by the cluster's internal DNS, which is how the load generator reaches the application.
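
Once the load generator is stopped (Ctrl+C, then exit the BusyBox shell), the HPA scales the Deployment back down towards minReplicas after the downscale stabilization window (5 minutes by default). Finally, the demo objects can be removed:

# Clean up the demo resources
$ kubectl delete -f hpa.yaml -f service.yaml -f deployment.yaml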