Pod autoscaling is the process of increasing and decreasing capacity based on demand.
Autoscaling can be done in 2 ways:
Horizontal Scaling
Vertical Scaling
Horizontal Scaling means increasing the number of instances/systems.
Vertical Scaling means increasing the capacity of a single system.
Note: For production, we use Horizontal Scaling.
HPA: Horizontal Pod Autoscaler
VPA: Vertical Pod Autoscaler (we don't use this)
HPA: the Horizontal Pod Autoscaler scales the number of pod replicas of a Deployment, ReplicaSet, or ReplicationController up/down dynamically based on observed metrics (CPU or memory utilization).
The HPA interacts with the metrics server to identify the CPU/memory utilization of pods.
Note: By default, the metrics server is not available.
The metrics server is an application that collects resource metrics (CPU and RAM usage) from objects such as pods and nodes and keeps them up to date.
The metrics server can be installed in the cluster as an add-on. You can take it and install it directly from the repo.
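Alternatively, the upstream metrics-server project publishes an official install manifest; a sketch of that route (assuming the cluster can reach GitHub):

```shell
# Install the official metrics-server add-on from the kubernetes-sigs project
kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml

# Note: on self-managed/dev clusters with self-signed kubelet certificates,
# you may need to add --kubelet-insecure-tls to the metrics-server container args
```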
Metric installation Steps:
# Clone the repo
$ git clone https://github.com/rjthapaa/k8s_metrics_server.git
$ cd k8s_metrics_server
$ ls deploy/1.8+/
Apply the manifest files from the metrics-server repo directly
$ kubectl apply -f deploy/1.8+/
Note: It will create the service account, role, role bindings, and the other required objects.
# We can see metric server running in kube-system ns
$ kubectl get all -n kube-system
# Check the top nodes using metric server
$ kubectl top nodes
# Check the top pods using metric server
$ kubectl top pods
Note: When we install the metrics server, it is installed under the kube-system namespace.
# To delete all the pods and services (if you need to clean up)
$ kubectl delete all --all
The metrics server is installed now.
$ vi deployment.yaml
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: hpa-demo-deployment
spec:
  replicas: 2
  selector:
    matchLabels:
      name: hpa-example
  template:
    metadata:
      labels:
        name: hpa-example
    spec:
      containers:
        - name: hpa-container
          image: k8s.gcr.io/hpa-example
          ports:
            - name: http
              containerPort: 80
          resources:
            requests:
              cpu: 100m
              memory: 64Mi
            limits:
              cpu: 200m
              memory: 256Mi
...
$ vi service.yaml
---
apiVersion: v1
kind: Service
metadata:
  name: hpa-examplesvc
  labels:
    name: hpaservice
spec:
  type: ClusterIP
  selector:
    name: hpa-example   # must match the pod label set in the Deployment template
  ports:
    - port: 80
      targetPort: 80
...
$ vi hpa.yaml
---
apiVersion: autoscaling/v1
kind: HorizontalPodAutoscaler
metadata:
  name: hpa-example-autoscaler   # object names must be lowercase (DNS-1123)
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: hpa-demo-deployment    # must match the Deployment's metadata.name
  minReplicas: 1
  maxReplicas: 10
  targetCPUUtilizationPercentage: 50
...
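For reference, autoscaling/v1 supports only CPU utilization. The same autoscaler written against the newer autoscaling/v2 API (which also supports memory and custom metrics) would look roughly like this (an equivalent sketch, not required for this exercise):

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: hpa-example-autoscaler
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: hpa-demo-deployment
  minReplicas: 1
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 50   # same 50% CPU target as the v1 manifest
```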
Apply the manifests and verify with the below commands
$ kubectl apply -f deployment.yaml
$ kubectl apply -f service.yaml
$ kubectl apply -f hpa.yaml
$ kubectl get pods
$ kubectl get svc
$ kubectl get hpa
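The same autoscaler can also be created imperatively, without an hpa.yaml (equivalent flags, targeting the deployment created above):

```shell
# Imperative equivalent of the HPA manifest
kubectl autoscale deployment hpa-demo-deployment --cpu-percent=50 --min=1 --max=10
```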
Use the below commands to check the CPU utilization and Pod information
$ kubectl get all -n kube-system
$ kubectl top nodes
$ kubectl top pods
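Behind these numbers, the HPA picks the replica count with a simple formula (documented in the Kubernetes docs): desiredReplicas = ceil(currentReplicas × currentMetricValue / targetMetricValue), clamped to [minReplicas, maxReplicas]. A minimal sketch of the arithmetic (plain shell, no cluster needed):

```shell
# HPA scaling formula: desired = ceil(current * currentCPU% / targetCPU%),
# clamped between minReplicas and maxReplicas
current=2; cpu_now=100; cpu_target=50; min_r=1; max_r=10

# Integer ceiling division: (a + b - 1) / b
desired=$(( (current * cpu_now + cpu_target - 1) / cpu_target ))
if [ "$desired" -lt "$min_r" ]; then desired=$min_r; fi
if [ "$desired" -gt "$max_r" ]; then desired=$max_r; fi

echo "desired replicas: $desired"  # 2 pods at 100% CPU vs a 50% target -> 4
```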
As of now, we don't have any load on the application
Now we need to simulate the load on worker nodes
We can simulate the load using a BusyBox pod.
Let's connect to the master node and generate the load:
$ kubectl run -it --rm load-generator --image=busybox -- /bin/sh
# From inside the BusyBox shell, hit the service in a loop
/ # while true; do wget -q -O- http://hpa-examplesvc; done
Here the service name (hpa-examplesvc) resolves to the ClusterIP through the cluster's internal DNS, so the load generator can reach it by name.
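While the load generator runs, you can watch the HPA scale the deployment out and back in from another terminal (assuming the manifests above were applied):

```shell
# Watch the HPA's current/target utilization and replica count update live
kubectl get hpa -w

# In a separate terminal, watch pods being added and removed
kubectl get pods -w
```

When the load stops, the HPA scales back down to minReplicas after a stabilization window (about 5 minutes by default).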