1. Preface

After business applications are containerized, two problems are inevitable: if the Node resources of the Kubernetes cluster are not provisioned sufficiently, Pods cannot be run in time; if too many Nodes are purchased, resources sit idle and are wasted.

So how do you use Kubernetes' container orchestration capabilities, together with the flexibility and scale of cloud resources, to achieve both high elasticity and low cost for your services?

This article explores how to use Kubernetes' auto scaling components to improve the auto scaling capability of your applications and optimize computing costs.

Note: In this article, "Node" and "node" are used interchangeably; both refer to a node in the cluster.

2. Auto scaling overview

Auto scaling is a management service that automatically and economically adjusts elastic computing resources according to business requirements and policies.

Auto scaling can be divided into two dimensions:

  • Scheduling-layer elasticity, which is mainly responsible for changing the scheduling capacity of a Workload (e.g. a Deployment). For example, HPA is a typical scheduling-layer elastic component: it adjusts the number of replicas of an application, and the adjusted replica count changes the scheduling capacity occupied by the Workload, thereby scaling at the scheduling layer.
  • Resource-layer elasticity, which supplements the scheduling capacity by horizontally scaling out Nodes when the cluster's capacity planning cannot meet the cluster's scheduling demand.

The elastic components and capabilities of the two layers can be used separately or in combination, and they are decoupled from each other through the capacity state at the scheduling layer.

There are three different auto scaling strategies in Kubernetes: HPA (HorizontalPodAutoscaler), VPA (VerticalPodAutoscaler), and CA (ClusterAutoscaler). The scaling object of HPA and VPA is the Pod, while the scaling object of CA is the Node.

  • HPA: scheduling-layer elastic component, built into Kubernetes; a Pod horizontal scaling component, mainly for online services.
  • VPA: scheduling-layer elastic component, open sourced by the Kubernetes community; a Pod vertical scaling component, mainly for large monolithic applications that cannot scale horizontally, with recommendations typically applied when a Pod is recreated.
  • CA: resource-layer elastic component, open sourced by the Kubernetes community; a Node horizontal scaling component, applicable in all scenarios.

In addition, major cloud vendors (such as Alibaba Cloud and Tencent Cloud) also provide Virtual Node components to offer a serverless runtime environment. Users do not need to care about Node resources; they only pay for the Pods they actually use. This is suitable for scenarios such as bursts of online traffic, CI/CD, and big data jobs. When introducing Virtual Node, this article uses Alibaba Cloud as an example.

3. Pod scales horizontally with HPA

HPA (Horizontal Pod Autoscaler) is a built-in component of Kubernetes and the most commonly used Pod elasticity solution. HPA automatically adjusts the number of replicas of a Workload. This makes Kubernetes highly adaptive: it can quickly scale out multiple Pod replicas within user-defined limits to cope with a surge in business load, and it can also scale in when the load is small, freeing compute resources for other services. The whole process is automated and requires no human intervention, which makes it suitable for services whose load fluctuates and that need to scale frequently.

HPA is suitable for objects that implement the scale subresource, such as Deployment and StatefulSet, but not for objects that cannot be scaled, such as DaemonSet.

Kubernetes provides the HorizontalPodAutoscaler as a built-in resource. You usually create one HorizontalPodAutoscaler object for each Workload that needs horizontal auto scaling; each Workload corresponds to one HorizontalPodAutoscaler.
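
As an illustration, a minimal HorizontalPodAutoscaler for a hypothetical Deployment named web-app might look like the following (using the autoscaling/v2beta2 API version, the same one used in the later examples):

apiVersion: autoscaling/v2beta2
kind: HorizontalPodAutoscaler
metadata:
  name: web-app-hpa
spec:
  scaleTargetRef:               # the Workload this HPA adjusts
    apiVersion: apps/v1
    kind: Deployment
    name: web-app
  minReplicas: 2
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 60  # target average CPU utilization relative to the Request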

3.1 HPA scaling process

The Pod-level auto scaling feature is implemented by a Kubernetes API resource and a controller. Resource utilization metrics determine the behavior of the controller, which periodically adjusts the number of replicas of the service's Pods based on Pod resource utilization so that the measured level of the Workload matches the target value set by the user. Taking Deployment and CPU usage as an example, the scaling process is shown in the following figure:

The default HPA only supports CPU- and memory-based auto scaling, such as automatically increasing the number of application instances when CPU usage exceeds a threshold and automatically reducing the number of instances when CPU usage falls below the threshold.

However, the elasticity dimensions driven by the default HPA are relatively limited and cannot meet day-to-day operational needs. HPA can be used together with the open source KEDA, which can drive elasticity based on events, schedules, custom metrics, and more.
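
As a sketch of the schedule-driven case, a KEDA ScaledObject using the cron trigger might look like the following; the workload name my-service, the time window, and the replica counts are illustrative, and the field names follow KEDA's ScaledObject API:

apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: my-service-cron-scaler
spec:
  scaleTargetRef:
    name: my-service             # Deployment to scale (hypothetical)
  minReplicaCount: 2
  maxReplicaCount: 20
  triggers:
  - type: cron                   # schedule-driven elasticity via KEDA
    metadata:
      timezone: Asia/Shanghai
      start: 0 8 * * *           # scale out at 08:00
      end: 0 20 * * *            # scale back at 20:00
      desiredReplicas: "10"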

3.2 HPA considerations

  • If multiple auto scaling metrics are set, HPA calculates the target number of replicas for each metric and takes the maximum value for scaling.
  • When the metric type is CPU Utilization (Per Request), you must set the CPU Request for the container (see the snippet after this list).
  • HPA applies a 10% tolerance when calculating the target number of replicas; if the change is within this range, HPA does not adjust the number of replicas.
  • If the Deployment.spec.replicas value of the service is 0, HPA does not work.
  • If you bind multiple HPAs to a single Deployment at the same time, all of them take effect simultaneously, causing the Workload's replica count to be adjusted repeatedly and conflict.
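
For the CPU Utilization (Per Request) metric mentioned above, the container must declare a CPU Request, which the utilization is computed against. A minimal container-spec fragment, with a hypothetical container name and illustrative values:

# Container-spec fragment of a Deployment Pod template; the CPU Request
# below is what the CPU Utilization (Per Request) metric is computed against.
containers:
- name: web-app                  # hypothetical container name
  image: nginx:1.21
  resources:
    requests:
      cpu: 500m                  # required for CPU utilization-based HPA
      memory: 512Mi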

4. Pod scales vertically with VPA

VPA (VerticalPodAutoscaler) is a community open source component that needs to be manually deployed into a Kubernetes cluster; it provides the ability to scale Pods vertically.

VPA automatically sets a Pod's resource Requests based on its actual resource usage, allowing the cluster to schedule the Pod to a node that has sufficient resources. VPA also maintains the ratio between Requests and Limits defined in the original container spec. In addition, VPA can be used to recommend more reasonable Requests to users, improving container resource utilization while ensuring that containers have sufficient resources.
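
A minimal sketch of a VerticalPodAutoscaler, assuming a Deployment named web-app; the autoscaling.k8s.io/v1 API comes from the community VPA project, and the min/max bounds are illustrative:

apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: web-app-vpa
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web-app
  updatePolicy:
    updateMode: "Auto"           # VPA evicts Pods and recreates them with updated Requests
  resourcePolicy:
    containerPolicies:
    - containerName: "*"
      minAllowed:                # illustrative bounds for the recommendations
        cpu: 100m
        memory: 128Mi
      maxAllowed:
        cpu: "2"
        memory: 2Gi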

4.1 VPA benefits

Compared to HPA, VPA offers the following advantages:

  • VPA can scale stateful applications, while HPA is not suitable for horizontally scaling stateful applications.
  • Some applications have Requests set too large, so resource utilization remains low even when scaled in to a single Pod; in this case, VPA can scale the Pod down vertically to improve resource utilization.

4.2 VPA limitations

There are the following limitations and considerations for using VPA:

  • Updating the resource configuration of a running Pod is an experimental VPA feature; it causes the Pod to be rebuilt and restarted, and it may be scheduled to another Node.
  • VPA does not evict Pods that are not managed by a replica controller. For such Pods, Auto mode is currently equivalent to Initial mode.
  • VPA cannot currently run alongside an HPA that monitors CPU and memory metrics, unless the HPA only monitors metrics other than CPU and memory.
  • VPA uses an admission webhook as its admission controller. If there are other admission webhooks in the cluster, make sure they do not conflict with VPA. The execution order of admission controllers is defined in the API Server's configuration parameters.
  • VPA handles the vast majority of OOM events, but does not guarantee effectiveness in all scenarios.
  • VPA performance has not been tested in large clusters.
  • The Request value set by VPA may exceed actual resource limits, such as the Node's capacity, the available free resources, or the resource quota, causing the Pod to be Pending and unschedulable. Using ClusterAutoscaler at the same time can alleviate this problem to some extent.
  • Multiple VPA objects matching the same Pod result in undefined behavior.

5. Node scales horizontally with CA

Both HPA and VPA are elastic components at the scheduling layer and address Pod auto scaling. If the overall resource capacity of the cluster cannot meet the cluster's scheduling demand, the Pods created by HPA and VPA scaling will remain in the Pending state. In this case, the resource layer needs to scale.

In Kubernetes, Node horizontal auto scaling is achieved through the community's open source CA (ClusterAutoscaler) component. The community CA supports configuring multiple scaling groups and setting scale-out and scale-in policies. On top of the community CA, major cloud vendors add their own features, such as support for multiple availability zones, multiple instance types, and multiple scaling modes, to cover different Node scaling scenarios.

In Kubernetes, Node auto scaling works differently than the traditional usage threshold-based model.

5.1 Traditional auto scaling model

The traditional auto scaling model is based on usage. For example, a cluster has 3 Nodes, and a new Node is added when the CPU or memory usage of the cluster's Nodes exceeds a certain threshold. However, a closer look reveals the following problems:

  • How is the Node resource usage threshold selected and judged?

In a cluster, some hotspot Nodes have higher utilization while other Nodes have lower utilization. If average utilization is used, auto scaling may not be timely; if the lowest Node utilization is used, the newly added resources are likely to be wasted.

  • How is load relieved after a new Node is added?

In Kubernetes, the smallest unit of an application is the Pod. When a Pod's resource utilization is high, even if the Node it runs on, or the cluster as a whole, triggers a scale-out, the application's Pod count and the Pods' resource limits do not change, so the load pressure cannot be transferred to the newly added Node.

  • How is a Node scale-in decided and executed?

If Node scale-in is judged by resource usage, Pods with a large Request but very low actual usage are likely to be evicted. When the cluster has many such Pods, the cluster's schedulable resources can become exhausted and some Pods cannot be scheduled.

5.2 Kubernetes Node scaling model

How does Kubernetes Node scaling solve these problems? Kubernetes solves them with a two-layer elastic model that decouples scheduling from resources.

Changes in the number of application replicas, i.e. changes in the scheduling unit (the Pod), are triggered based on resource usage. When the cluster's scheduling water level reaches 100%, a scale-out at the resource layer is triggered; once new Nodes are added, the unschedulable Pods are automatically scheduled onto them, reducing the load on the whole application.

  • How is a Node scale-out decided?

CA is triggered by watching Pods in the Pending state. When a Pod is Pending due to insufficient schedulable resources, CA performs a simulated scheduling run: the simulated scheduler calculates which of the configured scaling groups could accommodate these Pending Pods after scaling out. If a scaling group can satisfy them, the corresponding Nodes are added.

Simulated scheduling treats a scaling group as an abstract Node: the instance specification configured for the scaling group corresponds to the Node's CPU and memory capacity, and the Labels and Taints set on the scaling group are the Node's Labels and Taints. The simulated scheduler includes this abstract Node in its scheduling decision. If the Pending Pods can be scheduled onto the abstract Node, the number of Nodes required is calculated and the scaling group is driven to add Nodes.

  • How is a Node scale-in decided?

First of all, only Nodes added by auto scaling can be scaled in; static Nodes cannot be taken over by CA. The scale-in decision is made for each Node individually: when the scheduling utilization of a Node falls below the configured scale-in threshold, the scale-in check for that Node is triggered. CA then simulates evicting the Pods on the Node to determine whether it can be fully drained. If there are special Pods (non-DaemonSet Pods in the kube-system namespace, Pods protected by a PDB, Pods not created by a controller, and so on), the Node is skipped and other candidate Nodes are considered. When a Node is scaled in, it is drained first: the Pods on it are evicted to other Nodes, and then the Node is taken offline.

  • How is a scaling group chosen when scaling out Nodes with multiple scaling groups?

Choosing between different scaling groups is effectively choosing between different abstract Nodes, and as with scheduling, there is a scoring mechanism. Nodes that satisfy the scheduling policy are filtered first; among those, affinity and similar policies are applied. If none of these policies apply, CA by default chooses using the least-waste policy, whose core idea is to pick the scaling group that leaves the fewest resources idle after the simulated scale-out, minimizing waste.
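
The behavior described above maps to cluster-autoscaler startup parameters. The following fragment of the CA container spec is a hedged sketch: the flags are from the community ClusterAutoscaler, while the scaling-group name and the values are illustrative, and cloud vendors usually manage them for you:

# Fragment of the cluster-autoscaler container spec (community ClusterAutoscaler
# flags; the scaling-group name "my-scaling-group" is hypothetical).
containers:
- name: cluster-autoscaler
  image: registry.k8s.io/autoscaling/cluster-autoscaler:v1.26.0
  command:
  - ./cluster-autoscaler
  - --expander=least-waste                  # choose the scaling group that wastes the least resources
  - --nodes=0:10:my-scaling-group           # min:max:scaling-group
  - --scale-down-utilization-threshold=0.5  # scheduling utilization below 50% triggers the scale-in check
  - --scale-down-unneeded-time=10m          # Node must stay underutilized this long before removal
  - --skip-nodes-with-system-pods=true      # skip Nodes running non-DaemonSet kube-system Pods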

5.3 CA limitations

There are the following limitations and considerations for using CA:

  • The number of Nodes that can be added is limited by the private network, the container network, the cloud provider's Kubernetes cluster node quota, and the purchasable ECS quota.
  • Scaling out Nodes depends on instance availability; if the instance type is sold out, Nodes cannot be added.
  • There is a long wait between triggering a scale-out and Node delivery, so CA is not suitable for scenarios that require Pods to start quickly.
  • When scaling in, if a Node has Pods that cannot be evicted, the Node cannot be taken offline, resulting in wasted resources.

6. Virtual Node

Virtual Node is a plugin developed by major cloud vendors based on Virtual Kubelet, a community open source project that implements a virtual Kubelet to connect Kubernetes clusters with other platforms. The main scenario for Virtual Kubelet is extending the Kubernetes API onto serverless container platforms.

With virtual nodes, Kubernetes clusters can easily gain great resiliency without being limited by the compute capacity of the cluster’s nodes. Users can also flexibly and dynamically create pods on demand, eliminating the need for cluster capacity planning.

6.1 Introduction to Virtual Kubelet

Each node in a Kubernetes cluster starts a Kubelet process, which can be understood as an Agent in a Server-Agent architecture.

Virtual Kubelet is implemented based on the typical features of the Kubelet. Upwards it masquerades as a Kubelet, simulating a Node object and interfacing with Kubernetes' native resource objects; downwards it provides an API to connect with providers from other resource management platforms. Different platforms implement the provider interface defined by Virtual Kubelet, so the Node can be backed by the corresponding provider, enabling serverless back ends or managing other Kubernetes clusters through providers.

Virtual Kubelet emulates the Node resource object; Pods scheduled to the virtual node disguised by Virtual Kubelet are then managed by it throughout their lifecycle.

From the perspective of the Kubernetes API Server, Virtual Kubelet looks like a normal Kubelet, but the key difference is that Virtual Kubelet runs Pods elsewhere, for example on a cloud serverless API, rather than on a real Node.

The architecture of Virtual Kubelet is as follows:

6.2 Alibaba Cloud ECI elastic scheduling

Major cloud vendors generally provide serverless container services and Kubernetes Virtual Node capabilities. This article takes Alibaba Cloud as an example and introduces its elastic scheduling based on Virtual Node and ECI.

6.2.1 Introduction to Alibaba Cloud ECI and Virtual Node

Elastic Container Instance (ECI) is a container runtime service provided by Alibaba Cloud that combines container and serverless technology. With ECI, you can run Pods and containers on Alibaba Cloud directly, without purchasing and managing ECS instances. Compared with purchasing and configuring ECS to deploy containers (ECS mode), deploying containers directly (ECI mode) eliminates the operation and maintenance of the underlying servers, and you only pay for the resources configured for the containers (billed by the second), which saves costs.

Alibaba Cloud's Kubernetes Virtual Node is implemented through the ack-virtual-node component, which is based on the community open source project Virtual Kubelet. It adds support for an Alibaba Cloud provider and makes extensive optimizations, seamlessly connecting Kubernetes with Elastic Container Instance (ECI).

With a virtual node, when the cluster's Node resources are insufficient, there is no need to plan Node compute capacity: you can create Pods on demand under the virtual node. Each such Pod corresponds to an ECI instance, and ECI Pods and Pods on the cluster's real Nodes can communicate with each other.

Virtual nodes are ideal for the following scenarios, greatly reducing computing costs and improving elasticity efficiency:

  • Using ECI as an elastic resource pool to handle bursts of traffic.
  • Online services with obvious peak and trough characteristics; virtual nodes significantly reduce the maintenance of a fixed resource pool and lower computing costs.
  • Offline computing tasks, such as machine learning, that are not latency-sensitive but are cost-sensitive.
  • CI/CD pipelines, such as Jenkins and GitLab Runner.
  • Job tasks and scheduled tasks.

Virtual nodes and ECI are like the "magic pocket" of a Kubernetes cluster: they free us from the trouble of insufficient Node computing power, avoid the waste of idle Nodes, approach the ideal of unlimited computing power, create Pods on demand, and easily cope with the peaks and troughs of computing.

6.2.2 Scheduling Pods to ECI

In a mode that uses a mix of ECI and normal Node, you can generally schedule a pod to ECI in the following three ways:

(1) Configure the Pod Label

If individual Pods need to be scheduled to run on ECI, you can add the label alibabacloud.com/eci=true directly to the Pod, and the Pod will run on an ECI instance of the virtual node.
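
A minimal sketch of such a Pod (the name and image are illustrative):

apiVersion: v1
kind: Pod
metadata:
  name: eci-demo
  labels:
    alibabacloud.com/eci: "true"   # schedule this Pod to an ECI instance via the virtual node
spec:
  containers:
  - name: nginx
    image: nginx:1.21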

(2) Configure the Namespace Label

If a whole class of Pods needs to be scheduled to run on ECI, you can create a Namespace and add the label alibabacloud.com/eci=true to that Namespace; all Pods in that Namespace will then run on ECI instances of the virtual node.

(3) Configure ECI elastic scheduling

ECI elastic scheduling is an elastic scheduling strategy provided by Alibaba Cloud. When you deploy a service, you can add annotations to the Pod Template to declare that it should use only ordinary Node resources, use only the ECI resources of virtual nodes, or automatically use ECI resources when ordinary Node resources are insufficient, meeting the different elasticity needs of different scenarios.

The corresponding annotation key is alibabacloud.com/burst-resource, and its values are as follows:

  • Default (annotation not set): only the cluster's existing ECS resources are used.
  • eci: when the cluster's ECS resources are insufficient, ECI elastic resources are used.
  • eci_only: only ECI elastic resources are used, never the cluster's ECS resources.
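
For example, a hedged sketch of a Deployment whose Pods prefer ECS and fall back to ECI when ECS resources are insufficient (the workload name and image are illustrative):

apiVersion: apps/v1
kind: Deployment
metadata:
  name: web-app
spec:
  replicas: 4
  selector:
    matchLabels:
      app: web-app
  template:
    metadata:
      labels:
        app: web-app
      annotations:
        alibabacloud.com/burst-resource: eci   # use ECI only when ECS resources are insufficient
    spec:
      containers:
      - name: web-app
        image: nginx:1.21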

All three of the above methods require some modification of existing resources and cannot achieve zero intrusion. For this case, ECI supports the ECI Profile configuration.

In the ECI Profile, you can declare the Labels that a Namespace or Pod needs to match; Pods matched by these Labels are automatically scheduled to ECI.

You can also declare in the ECI Profile the Annotations and Labels that need to be appended to Pods; for Pods matched by the Labels, the configured Annotations and Labels are appended automatically.

6.3 Problems with mixing Virtual Node with normal Node

Still taking Alibaba Cloud as an example: a Kubernetes cluster on Alibaba Cloud deploys Virtual Node and uses a mix of ECI and ordinary Nodes.

Imagine this scenario: an application (Deployment) is configured with both HPA and ECI elastic scheduling. When ordinary Node resources are insufficient and HPA triggers a scale-out, some Pods are scheduled to ECI. But when HPA scales in, the ECI instances are not necessarily removed first; Pods on ordinary Nodes may be deleted while the ECI instances are retained. Because ECI is pay-as-you-go, if it is used for too long it becomes more expensive than subscription (annual/monthly) ECS (Alibaba Cloud's Elastic Compute Service).

This leads to two problems that need to be solved:

  • Scheduling: how to change the scheduling policy once the number of replicas reaches a certain value.
  • Lifecycle management: how to prioritize the deletion of certain Pods during scale-in.

Neither Kubernetes' native controllers nor its native Workloads handle these scenarios well. Alibaba Cloud Kubernetes' Elastic Workload component (not open source) and Alibaba's open source OpenKruise both provide good solutions.

7. Elastic Workload and OpenKruise

7.1 Introduction to Elastic Workload

Elastic Workload is a component unique to Alibaba Cloud Kubernetes. After installing it, a new resource type, ElasticWorkload, becomes available. ElasticWorkload is used in a similar way to HPA: it is attached externally and does not intrude on the original workload.

A typical ElasticWorkload is divided into two main parts:

  • The sourceTarget section defines the type of the original Workload and the range within which its replica count can vary.
  • The elasticUnit section is an array that defines the scheduling policy of each elastic unit; if there are multiple elastic units, they take effect in the order defined in the template.

The ElasticWorkload controller watches the original Workload and clones it to generate a Workload for each elastic unit, applying the scheduling policy configured for that unit. Based on changes to the total replicas in the ElasticWorkload, it dynamically distributes replicas between the original Workload and the elastic units.

Here’s an example of an ElasticWorkload:

apiVersion: autoscaling.alibabacloud.com/v1beta1
kind: ElasticWorkload
metadata:
  name: elasticworkload-sample
spec:
  sourceTarget:
    name: nginx-deployment-basic
    kind: Deployment
    apiVersion: apps/v1
    min: 2
    max: 4
  replicas: 6
  elasticUnit:
  - name: virtual-kubelet
    labels:
      alibabacloud.com/eci: "true"
    # min: 0   # Each unit can also specify its own min/max.
    # max: 10


In addition, ElasticWorkload also supports use with HPA, which can act on ElasticWorkload, as shown in the following figure:

ElasticWorkload dynamically adjusts the replica distribution of each unit based on the state of HPA. For example, if HPA scales from 6 replicas down to 4, the replicas in the elastic unit are scaled in first.

Here’s an example of HPA acting on an ElasticWorkload:

apiVersion: autoscaling/v2beta2
kind: HorizontalPodAutoscaler
metadata:
  name: elastic-workload-demo
  namespace: default
spec:
  scaleTargetRef:
    apiVersion: autoscaling.alibabacloud.com/v1beta1
    kind: ElasticWorkload
    name: elasticworkload-sample
  minReplicas: 2
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 50


On the one hand, ElasticWorkload generates multiple Workloads by cloning and overriding the scheduling policy, achieving scheduling-policy management; on the other hand, it adjusts the replica allocation between the original Workload and the elastic units through upper-layer replica calculation, so that a subset of Pods can be handled with priority.

The original Workload of an ElasticWorkload currently only supports Deployment, not CloneSet (an enhanced Workload from OpenKruise, described below), and there are no plans to support it in the near term.

7.2 Introduction to OpenKruise

OpenKruise is an enhanced capability suite for Kubernetes, focusing on the deployment, upgrade, operation and maintenance of cloud-native applications, and stability protection. All features are extended via standards such as CRD and can be applied to any Kubernetes cluster with versions 1.16 or later.

7.2.1 OpenKruise Capabilities

  • Enhanced version of Workload

OpenKruise includes a series of enhanced versions of Workload, such as CloneSet, Advanced StatefulSet, Advanced DaemonSet, BroadcastJob, and more. Not only do they support basic features similar to Kubernetes native Workloads, but they also provide capabilities such as in-place upgrades, configurable scaling/publishing policies, concurrent operations, and more.

  • Bypass management for applications

OpenKruise provides several bypass ways to manage application sidecar containers and multi-region deployments; "bypass" means users can get these capabilities without modifying the application's Workload.

For example, UnitedDeployment can define an application with one template and manage Pods in multiple regions by managing multiple Workloads. WorkloadSpread, on the other hand, constrains how a stateless Workload's Pods spread across regions when scaling out, giving a single Workload multi-region and elastic deployment capabilities.

OpenKruise uses WorkloadSpread to solve the problem of mixing Virtual Node and ordinary Node mentioned above.

  • High availability protection

OpenKruise also puts a lot of effort into protecting applications for high availability. It currently protects Kubernetes resources against cascading deletion, including CRDs, namespaces, and almost all Workload-type resources. Compared with Kubernetes' native PDB, which only protects against Pod eviction, PodUnavailableBudget protects Pod deletion, eviction, update, and many other scenarios.

7.2.2 WorkloadSpread

When OpenKruise is installed in a Kubernetes cluster, an additional WorkloadSpread resource is added. WorkloadSpread distributes Workload’s Pods to different types of Nodes according to certain rules, giving a single Workload multi-region deployment, elastic deployment, and granular management capabilities in a non-intrusive manner.

Some common rules include:

  • Spread evenly, for example evenly across Nodes or Availability Zones.
  • Spread in a specified ratio, for example deploying Pods proportionally into several specified Availability Zones.
  • Prioritized partition management, for example:
  • Deploy to ECS first, and to ECI when resources are insufficient.
  • Deploy a fixed number of Pods to ECS first and the rest to ECI.
  • Customized partition management, for example:
  • Control how many Pods the Workload deploys to each CPU architecture.
  • Ensure that Pods on different CPU architectures have different resource quotas.

Each WorkloadSpread defines multiple regions (called subsets), and each subset can have a maxReplicas value. WorkloadSpread uses a webhook to inject the domain information defined by the subsets and controls the order in which Pods are scaled in.

Here’s an example of WorkloadSpread:

apiVersion: apps.kruise.io/v1alpha1
kind: WorkloadSpread
metadata:
  name: workloadspread-demo
spec:
  targetRef:
    apiVersion: apps/v1 | apps.kruise.io/v1alpha1
    kind: Deployment | CloneSet
    name: workload-xxx
  subsets:
  - name: subset-a
    requiredNodeSelectorTerm:
      matchExpressions:
      - key: topology.kubernetes.io/zone
        operator: In
        values:
        - zone-a
    preferredNodeSelectorTerms:
    - weight: 1
      preference:
        matchExpressions:
        - key: another-node-label-key
          operator: In
          values:
          - another-node-label-value
    maxReplicas: 3
    tolerations: []
    patch:
      metadata:
        labels:
          xxx-specific-label: xxx
  - name: subset-b
    requiredNodeSelectorTerm:
      matchExpressions:
      - key: topology.kubernetes.io/zone
        operator: In
        values:
        - zone-b
  scheduleStrategy:
    type: Adaptive | Fixed
    adaptive:
      rescheduleCriticalSeconds: 30


Unlike ElasticWorkload, which manages multiple Workloads, a WorkloadSpread acts on a single Workload; a Workload and its WorkloadSpread correspond one to one.

WorkloadSpread currently supports Workload types such as CloneSet and Deployment.

7.3 How to choose between ElasticWorkload and WorkloadSpread

ElasticWorkload is unique to Alibaba Cloud Kubernetes, which ties you to the cloud vendor and carries a relatively high cost of use, and it only supports the native Deployment Workload.

WorkloadSpread is open source and can be used on any Kubernetes cluster of version 1.16 or later; it supports the native Deployment Workload and OpenKruise's extended CloneSet Workload.

However, WorkloadSpread's priority deletion rules rely on Kubernetes' deletion-cost feature. CloneSet already supports the deletion-cost feature. For native Workloads, the Kubernetes version must be 1.21 or later; in 1.21 the PodDeletionCost feature gate must be enabled explicitly, and it is enabled by default from 1.22.
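
The deletion-cost feature works through a Pod annotation: during ReplicaSet scale-in, Pods with a lower deletion cost are removed first. A brief sketch with an illustrative value:

apiVersion: v1
kind: Pod
metadata:
  name: web-app-on-eci             # hypothetical Pod running on ECI
  annotations:
    controller.kubernetes.io/pod-deletion-cost: "-100"   # lower cost means deleted earlier on scale-in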

Therefore, if you use Alibaba Cloud's Kubernetes, you can refer to the following to choose:

  • If you use Deployment and the Kubernetes version is below 1.21, you can only choose ElasticWorkload.
  • If you use Deployment and the Kubernetes version is 1.21 or later, choose WorkloadSpread.
  • If you use CloneSet and the Kubernetes version is 1.16 or later, choose WorkloadSpread.

8. Low-cost and highly elastic practice based on Alibaba Cloud ECI

The sections above introduced Kubernetes' commonly used auto scaling components and, using Alibaba Cloud as an example, Virtual Node and ECI, Alibaba Cloud's Elastic Workload, and the open source OpenKruise. This chapter explores how to use these components appropriately for low-cost, highly elastic practice based on ECI on Alibaba Cloud.

Scenarios where auto scaling can be used:

  • Job tasks, such as Flink compute tasks or Jenkins pipelines.
  • Core applications that need HPA to handle traffic bursts.
  • During promotions, configuring a scheduled HPA for the applications involved, scaling out when the event starts and scaling in when it ends.
  • Pods created by HPA scale-out that are Pending due to insufficient Node resources.
  • Pods that are Pending due to insufficient Node resources when an application is launched or released.

For these scenarios, you can use the elastic components of Kubernetes in combination to achieve high elasticity and low cost of the service.

Due to the long delivery time of Node horizontal scaling, the use of Node horizontal auto scaling is not considered.

The overall idea for Pod horizontal scaling is to combine Kubernetes HPA with Alibaba Cloud's Virtual Node and ECI, mixing ECS and ECI on Alibaba Cloud: the stable base load is carried by subscription (annual/monthly) ECS to save cost, while the elastic part is carried by ECI, eliminating capacity planning for it. Combined with Alibaba Cloud's Elastic Workload or the open source OpenKruise, ECI instances are preferentially deleted when the application is scaled in.

The following sections briefly introduce horizontal scaling for three commonly used resources: Job tasks, Deployment, and CloneSet.

As for Pod vertical scaling, because VPA is not yet mature and has many usage restrictions, VPA's automatic scaling capability is not considered. However, you can use VPA's ability to recommend a reasonable Request, improving container resource utilization while ensuring that containers have sufficient resources, and avoiding unreasonable Request settings.

8.1 Job tasks use only ECI

For Job tasks, add the label alibabacloud.com/eci=true directly to the Pod so that all Job tasks run on ECI; the ECI instances are released when the tasks finish.
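
A minimal sketch of such a Job (the name, image, and command are illustrative):

apiVersion: batch/v1
kind: Job
metadata:
  name: pi-job
spec:
  template:
    metadata:
      labels:
        alibabacloud.com/eci: "true"   # run this Job's Pods on ECI
    spec:
      restartPolicy: Never
      containers:
      - name: pi
        image: perl:5.34
        command: ["perl", "-Mbignum=bpi", "-wle", "print bpi(2000)"]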

With this approach, there is no need to reserve compute resources for Job tasks, which eliminates both the problem of insufficient compute capacity and the need to scale the cluster for them.

8.2 Deployment

If the Kubernetes cluster version is below 1.21 and you want ECI instances to be deleted first when a Deployment scales in, you can only use Alibaba Cloud's Elastic Workload component.

If the Kubernetes cluster version is 1.21 or later, you can use OpenKruise's WorkloadSpread, which is described in the next section on CloneSet.

Add the annotation alibabacloud.com/burst-resource: eci to the Pod Template of all Deployments to enable ECI elastic scheduling: when the cluster's ECS resources (ordinary Nodes) are insufficient, ECI elastic resources are used.

For applications without HPA, only ECI elastic scheduling is used. The final result:

  • When ECS resources are sufficient, ECS is preferred.
  • When ECS resources are insufficient, Pods are scheduled to ECI. However, the ECI instances are not automatically released until the next release, even if ordinary Node resources become sufficient again after ordinary Nodes are scaled out.
  • If you manually scale in the application, ECI instances are not preferentially removed.

For applications configured with HPA, you can add ElasticWorkload resources to those applications. One application corresponds to one ElasticWorkload. HPA works on ElasticWorkloads.

The final result:

  • Normal Pods are scheduled to ECS first.
  • When ECS resources are insufficient, normal Pods are also scheduled to ECI. However, the ECI instances are not automatically released until the next release, even if ordinary Node resources become sufficient again.
  • All Pods created by HPA scale-out are scheduled to ECI.
  • HPA scale-in removes only ECI instances.
  • When the application is released, you only need to update the image in the source Deployment; the image in the elastic units is updated automatically.

8.3 CloneSet

Before you create a CloneSet, create a WorkloadSpread resource. A WorkloadSpread operates on only one CloneSet.

For applications without HPA, neither the ECS subset nor the ECI subset of WorkloadSpread sets a maximum number of replicas.

The final result:

  • When ECS resources are sufficient, ECS is preferred.
  • When ECS resources are insufficient, Pods are scheduled to ECI. However, the ECI instances are not automatically released until the next release, even if ordinary Node resources become sufficient again.
  • When you manually scale in the application, ECI instances are deleted first.

For applications with HPA, HPA still acts on the CloneSet. Set the maximum number of replicas of WorkloadSpread's ECS subset equal to HPA's minimum number of replicas, leave the ECI subset's maximum unset, and update the ECS subset's maximum whenever you change HPA's minimum replica count (see the sketch below).
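
A hedged sketch of such a WorkloadSpread, assuming HPA's minimum replica count is 4 and assuming the virtual node carries the label type=virtual-kubelet (check the actual node labels in your cluster); the CloneSet name is illustrative:

apiVersion: apps.kruise.io/v1alpha1
kind: WorkloadSpread
metadata:
  name: web-app-workloadspread
spec:
  targetRef:
    apiVersion: apps.kruise.io/v1alpha1
    kind: CloneSet
    name: web-app
  subsets:
  - name: subset-ecs                  # ordinary ECS Nodes, filled first
    requiredNodeSelectorTerm:
      matchExpressions:
      - key: type
        operator: NotIn
        values:
        - virtual-kubelet
    maxReplicas: 4                    # keep equal to the HPA minReplicas
  - name: subset-eci                  # virtual node / ECI, no upper limit
    requiredNodeSelectorTerm:
      matchExpressions:
      - key: type
        operator: In
        values:
        - virtual-kubelet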

The final result:

  • Normal Pods are scheduled to ECS first.
  • When ECS resources are insufficient, normal Pods are also scheduled to ECI. However, the ECI instances are not automatically released until the next release, even if ordinary Node resources become sufficient again.
  • All Pods created by HPA scale-out are scheduled to ECI.
  • ECI instances are also preferentially deleted when HPA scales in.

8.4 Monitor computing resources

As the Deployment and CloneSet horizontal scaling approaches above show, ECI instances are not always released promptly and automatically.

ECI is pay-as-you-go, and if it runs for too long it becomes more expensive than subscription ECS. Therefore, monitoring is also needed: scale out ordinary Node resources in time when they are insufficient, and if ECI instances have been running for a long time (for example, three days), notify the application owners so they can restart those Pods; the new Pods will then be scheduled to ECS.

8.5 Use VPA to get the Request recommendation value

Some applications have Requests set too large, so resource utilization remains low even when scaled in to a single Pod; VPA's vertical scale-down could improve utilization. However, VPA's automatic updating of Pod resources is still experimental and is not recommended. You can instead use VPA only to obtain a reasonable Request recommendation.

Once the VPA component is deployed on Kubernetes, a new resource type, VerticalPodAutoscaler, is added. You can create a VerticalPodAutoscaler object with updateMode set to Off for each Deployment. VPA then periodically obtains resource usage metrics for all containers under the Deployment from the Metrics Server, calculates a reasonable Request recommendation, and records the recommendation in the VerticalPodAutoscaler object corresponding to the Deployment.
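
A brief sketch of such a recommendation-only VerticalPodAutoscaler (the Deployment name is illustrative); the recommended values appear in the object's status:

apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: web-app-vpa-recommender
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web-app
  updatePolicy:
    updateMode: "Off"                 # only record recommendations; never evict or modify Pods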

You can write your own code to read the recommended values from the VerticalPodAutoscaler objects, aggregate them per application, and display the results on a page. Application owners can then see at a glance whether their Request settings are reasonable, and operators can use this data to drive right-sizing of applications.

9. Summary

This article briefly introduced auto scaling components such as HPA, VPA, CA, Virtual Kubelet, Alibaba Cloud ECI, Alibaba Cloud ElasticWorkload, and OpenKruise WorkloadSpread, and explored how to use these components to achieve low cost and high elasticity for Kubernetes.