Why does a container platform need tenant isolation on top of container isolation?
There are two reasons. First, container isolation has limits: it cannot isolate host resources 100%. Second, the platform must restrict and isolate the resources each business line occupies, for example to prevent department A from taking over the machines that department B requested.
Problems that can be solved with code are often the small ones; collaboration between people is the real efficiency problem.
Real-world cases
We often say that containers use cgroups to isolate host resources, but in fact Docker's isolation does not yet cover resources such as disks and network cards, and even CPU and memory are not 100% isolated (CPU nodes, for example).
Here are a few examples from real scenarios, all of which occurred without any resource overselling.
Case 1: Competition for host CPU/memory degrades performance
Applications of groups A and B perform normally when scheduled to different machines, but their performance degrades when they land on the same machine. The initial suspicion is simultaneous competition for CPU.
Some attribute this to the pods' CPUs not being pinned to cores, which causes frequent CPU switching while the pods run; that is another plausible line of investigation.
Case 2: Disk IOPS contention
A cache application and a database application, both relying on local SSDs, were scheduled to the same machine. When the database runs its archiving jobs, it consumes a large share of the disk's IOPS, which degrades the cache application's SSD read and write performance.
Database instances subsequently became unwelcome neighbors: such applications need large SSDs and generate heavy disk reads and writes at peak times, so instances from other departments' groups are reluctant to share hosts with them.
Case 3: Contention for high-performance machines
To prepare for an upcoming event, group A requested a batch of high-performance machines for expansion. We can limit group B's expansion quota, but we cannot prevent group B's instances from occupying the machines group A requested. Meanwhile, group A's instances may also spill over into the ordinary machine pool.
Group A's requirement is therefore that the high-end machines it requested run only its own application instances.
From cases 1 and 2, we found that container isolation cannot be 100% guaranteed. The specific causes are many and complex, and need to be analyzed by specialists.
In case 3, we consider group A's demand reasonable: the group's application instances should run only on the machines the group requested.
Overall, each namespace corresponds to a department or group, and that department's or group's resource usage is capped by the namespace's ResourceQuota.
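To make the quota idea concrete, here is a toy Python model of the admission check a namespace ResourceQuota performs: a new pod is admitted only if the namespace's current usage plus the pod's requests stays within every hard limit. This is a sketch under simplifying assumptions, not the real Kubernetes quota controller; the namespace name, resource keys and numbers are hypothetical, and quantities are plain integers (millicores, MiB) instead of Kubernetes quantity strings. Real quota admission also handles scopes, quantity parsing and pods that omit requests, none of which is modeled here.

```python
from dataclasses import dataclass, field


@dataclass
class NamespaceQuota:
    """Toy stand-in for a namespace ResourceQuota: hard caps plus tracked usage.

    Units are plain integers chosen for illustration (CPU in millicores,
    memory in MiB); real Kubernetes uses quantity strings such as "500m".
    """

    namespace: str
    hard: dict[str, int]
    used: dict[str, int] = field(default_factory=dict)

    def admit(self, requests: dict[str, int]) -> bool:
        """Admit a pod only if every capped resource stays within its limit."""
        for resource, limit in self.hard.items():
            if self.used.get(resource, 0) + requests.get(resource, 0) > limit:
                return False  # would exceed the department's quota: reject
        # Every check passed, so charge the pod's requests to the namespace.
        for resource, amount in requests.items():
            self.used[resource] = self.used.get(resource, 0) + amount
        return True


# Hypothetical quota for the advertising department's namespace.
ads = NamespaceQuota("dept-ads", hard={"cpu": 4000, "memory": 8192})
print(ads.admit({"cpu": 3000, "memory": 4096}))  # True
print(ads.admit({"cpu": 2000, "memory": 1024}))  # False: 3000 + 2000 > 4000
```

Because the cap is attached to the namespace rather than to individual workloads, one department exhausting its quota cannot eat into another department's share.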
If we relied on affinity rules alone, we would need to configure many rules, which are complex to manage and can even add load to the scheduler. A few examples:
1. No business line wants its applications to share machines with the storage group's applications, so the platform applies an antiAffinity against the storage group to all applications.
2. The storage group, in turn, does not want to share with other groups: the machines it requests often carry high-performance SSDs, and it does not want other groups consuming those machines' CPU and memory. So Taints must be configured on this batch of nodes, with matching Tolerations on the storage group's pods.
3. Even within one group, some applications are "disliked". The search group, for example, runs cache or storage applications that are sensitive to disk reads and writes, and AI data model applications that are sensitive to CPU (or even GPU); these services also want to be isolated from the group's other services.
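The Taints and Tolerations mentioned in example 2 can be sketched as a simplified Python model of the scheduler's hard filter: a pod may land on a node only if every NoSchedule or NoExecute taint on that node is tolerated by one of the pod's tolerations. The matching rules follow the documented Kubernetes semantics as we understand them (an `Exists` toleration ignores the value, an empty key with `Exists` tolerates everything, an empty effect matches any effect), but this is a sketch, not scheduler code; the `dedicated=storage` taint is a hypothetical example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Taint:
    key: str
    value: str
    effect: str  # "NoSchedule", "PreferNoSchedule" or "NoExecute"


@dataclass(frozen=True)
class Toleration:
    key: str = ""            # empty key + operator "Exists" tolerates every taint
    operator: str = "Equal"  # "Equal" or "Exists"
    value: str = ""
    effect: str = ""         # empty effect matches any taint effect

    def tolerates(self, taint: Taint) -> bool:
        if self.effect and self.effect != taint.effect:
            return False
        if self.operator == "Exists":
            return self.key == "" or self.key == taint.key
        return self.key == taint.key and self.value == taint.value


def schedulable(taints: list[Taint], tolerations: list[Toleration]) -> bool:
    """Hard filter only: every NoSchedule/NoExecute taint must be tolerated.

    PreferNoSchedule is a soft preference, so it never blocks placement here.
    """
    return all(
        any(t.tolerates(taint) for t in tolerations)
        for taint in taints
        if taint.effect in ("NoSchedule", "NoExecute")
    )


# Hypothetical taint on the storage group's SSD machines.
ssd_node = [Taint("dedicated", "storage", "NoSchedule")]
storage_pod = [Toleration(key="dedicated", value="storage", effect="NoSchedule")]
print(schedulable(ssd_node, storage_pod))  # True
print(schedulable(ssd_node, []))           # False: other groups are kept off
```

Note that a taint only keeps other pods away; it does not pull the storage group's pods onto those machines. That still takes a nodeSelector or node affinity, which is one reason the rule set grows quickly when everything is expressed this way.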
Our strategy is divide and conquer: first partition into large groups, confining the originally complex problem within each business line; then subdivide within each group, where existing resources are fixed by machine performance and the group's applications adapt to them.
First, divide into large groups
Each namespace corresponds to one department group:
1. Partition the machines by the department group they belong to, and mark each Node with a label identifying that department.
2. Give every pod in the namespace a nodeSelector (alternatively, set podTemplate.nodeSelector in the parent object), so that the pods are scheduled onto the department's machines through that selector.
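The two steps above can be sketched in Python as follows. The label key `tenant.example.com/department` and the namespace-to-department mapping are hypothetical. `inject_selector` adds the department label to a pod's nodeSelector, loosely similar in spirit to what Kubernetes' PodNodeSelector admission plugin does with a per-namespace annotation (our comparison, not the article's setup), and `matches` follows nodeSelector semantics: a node qualifies only if its labels contain every key/value pair in the selector.

```python
# Hypothetical label key marking which department owns a node.
DEPT_LABEL = "tenant.example.com/department"

# Hypothetical mapping: each namespace belongs to exactly one department.
NAMESPACE_TO_DEPT = {"ads-prod": "ads", "storage-prod": "storage"}


def inject_selector(namespace: str, pod_selector: dict[str, str]) -> dict[str, str]:
    """Return the pod's nodeSelector with its namespace's department label added.

    Raises ValueError if the pod already pins itself to a different department.
    """
    dept = NAMESPACE_TO_DEPT[namespace]
    existing = pod_selector.get(DEPT_LABEL)
    if existing is not None and existing != dept:
        raise ValueError(f"pod in {namespace} asks for department {existing!r}")
    return {**pod_selector, DEPT_LABEL: dept}


def matches(node_labels: dict[str, str], selector: dict[str, str]) -> bool:
    """nodeSelector semantics: every selector key/value must appear on the node."""
    return all(node_labels.get(k) == v for k, v in selector.items())


# Hypothetical nodes: one per department plus an unlabeled shared machine.
nodes = {
    "gpu-01": {DEPT_LABEL: "ads", "machine-type": "gpu"},
    "ssd-01": {DEPT_LABEL: "storage", "machine-type": "ssd"},
    "std-01": {"machine-type": "standard"},
}

selector = inject_selector("ads-prod", {})
print([name for name, labels in nodes.items() if matches(labels, selector)])  # ['gpu-01']
```

Because the selector is derived from the namespace, application owners do not write any placement rules themselves; the platform only maintains node labels and the namespace mapping.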
From the figure above, we expect that:
The advertising department's applications are scheduled to GPU machine models;
The storage department's applications are scheduled to high-performance SSD machine models.
To achieve this, the left side of the figure takes an indirect approach (Pod-to-Node affinity), while the right side takes a direct approach (the Pod selects Nodes through nodeSelector). The approach on the right is clearly simpler and easier to manage.
In addition, evaluating affinity rules can put a heavy load on the scheduler, especially in large clusters, so we use affinity as little as possible.
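The scheduling-cost point applies most to inter-pod (anti-)affinity, such as the storage-group antiAffinity described earlier; the Kubernetes documentation itself warns that inter-pod affinity and anti-affinity require substantial processing in large clusters. The deliberately naive Python model below illustrates why, under our own assumptions rather than kube-scheduler's real implementation (which precomputes much of this work): filtering by nodeSelector compares a few labels per node, while filtering by pod anti-affinity must inspect the pods already running on every candidate node. The cluster layout, the `department` label and the sizes are hypothetical.

```python
def build_cluster(num_nodes: int, pods_per_node: int) -> dict[str, dict]:
    """Hypothetical cluster: even nodes belong to 'ads', odd nodes to 'storage'."""
    cluster = {}
    for i in range(num_nodes):
        dept = "ads" if i % 2 == 0 else "storage"
        cluster[f"node-{i}"] = {
            "labels": {"department": dept},
            "pods": [{"department": dept} for _ in range(pods_per_node)],
        }
    return cluster


def filter_by_node_selector(cluster, selector):
    """Direct approach: compare the pod's nodeSelector with each node's labels."""
    feasible, checks = [], 0
    for name, node in cluster.items():
        checks += len(selector)  # one label comparison per selector entry
        if all(node["labels"].get(k) == v for k, v in selector.items()):
            feasible.append(name)
    return feasible, checks


def filter_by_anti_affinity(cluster, avoid):
    """Indirect approach: reject a node if any pod already on it matches `avoid`."""
    feasible, checks = [], 0
    for name, node in cluster.items():
        checks += len(node["pods"])  # every resident pod must be inspected
        if not any(all(p.get(k) == v for k, v in avoid.items()) for p in node["pods"]):
            feasible.append(name)
    return feasible, checks


cluster = build_cluster(num_nodes=100, pods_per_node=30)
a, cost_a = filter_by_node_selector(cluster, {"department": "ads"})
b, cost_b = filter_by_anti_affinity(cluster, {"department": "storage"})
print(a == b, cost_a, cost_b)  # True 100 3000
```

Both filters pick the same nodes here, but the anti-affinity cost grows with the number of running pods, while the nodeSelector cost grows only with the number of nodes. A real scheduler must also check the anti-affinity terms of pods already on each node against the incoming pod, which this model leaves out.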
When pods interfere with each other and both belong to the same department group, the issue can be handed directly to that group's developers to adjust. If the applications belong to different departments, developers on both sides must investigate, which makes things complicated. Separating tenants by department therefore has the added advantage of avoiding cross-departmental finger-pointing.
This article breaks the overall complex problem into smaller ones through departmental isolation, so that each business department manages its own resource allocation. How to solve the remaining problems within a single business unit will be covered in a later article.