  • 1 Background
  • 2 Quick verification, landing MVP version
    • 2.1 Technology selection
    • 2.2 Architecture design
    • 2.3 Process design
    • 2.4 Function triggering
    • 2.5 Function execution
    • 2.6 Auto scaling
  • 3 Optimize core technologies to ensure business stability
    • 3.1 Auto scaling optimization
    • 3.2 Cold start optimization
    • 3.3 High availability guarantee
    • 3.4 Container stability optimization
  • 4 Improve the ecology and implement the benefits
    • 4.1 Provide R&D tools
    • 4.2 Converge with the technology ecosystem
    • 4.3 Open platform capabilities
    • 4.4 Support consolidated deployment
  • 5 Landing scenarios and benefits
  • 6 Future Planning
  • Author Profile
  • Recruitment Information

1 Background

The term serverless was coined in 2012 and became widely known in 2014 with the rise of Amazon’s AWS Lambda serverless computing service. Read literally, the word means “no server”, but serverless computing really means the ability to build and run applications without thinking about servers. Applications still run on servers, but all server management is handled by the serverless platform: requesting machines, releasing code, handling machine downtime, scaling instances out and in, and data-center disaster recovery are all completed automatically by the platform, so business developers only need to consider the implementation of business logic.

Looking back at the development of the computing industry, infrastructure has moved from physical machines to virtual machines and then to containers, while service architecture has moved from the traditional monolith to SOA and then to microservices. Along both lines, the trend is from large to small, from giant to micro, and the underlying motivation is always to reduce resource cost or to improve R&D efficiency. Serverless is no exception; it addresses the same two problems:

  • Resource utilization: Serverless products support rapid elastic scaling, which helps services improve resource utilization. When business traffic peaks, computing capacity expands automatically to carry more user requests; when traffic drops, the resources in use shrink accordingly to avoid waste.
  • R&D and O&M efficiency: On serverless, developers generally only need to fill in a code path or upload a code package, and the platform completes building and deployment. Developers do not face machines directly: machine management, machine health, and scaling for traffic peaks are all handled by the serverless product. This frees developers from tedious operations work, moves them from DevOps to NoOps, and lets them focus on implementing business logic.

Although AWS launched its first serverless product, Lambda, in 2014, adoption of serverless technology in China remained tepid for years. In the past two or three years, however, driven by containers, Kubernetes, and cloud native technologies, serverless technology has developed rapidly, and major domestic Internet companies are actively building serverless products and exploring how to put the technology into practice. In this context, Meituan began building its serverless platform in early 2019, under the internal project name Nest.

The Nest platform has now been under construction for two years. Looking back, the work has gone through three main stages:

  • Quick verification, landing the MVP version: Through technology selection, product and architecture design, and development iteration, we quickly implemented the basic capabilities of a serverless product, such as build, release, auto scaling, event-source triggering, and function execution. After going live, we onboarded some pilot businesses to help verify and polish the product.
  • Optimizing core technologies to ensure business stability: Through the early pilot businesses, we quickly found stability problems in the product, mainly the stability of auto scaling, the speed of cold start, the availability of the system and of business functions, and the stability of containers. We made targeted optimizations to the technical points behind each problem.
  • Improving the technology ecology and delivering the benefits: After the core technologies were optimized, the product gradually became mature and stable, but it still faced ecosystem problems: a lack of R&D tools, no connection to upstream and downstream products, and insufficient open platform capabilities, all of which hindered promotion and use of the product. We therefore continued to improve the technical ecology around the product, removed barriers to business access and use, and delivered the product’s business benefits.

2 Quick verification, landing MVP version

2.1 Technology selection

To build the Nest platform, the first problem to solve was technology selection. For Nest this mainly involved three key decisions: evolution route, infrastructure, and development language.

Evolution route

At the beginning, serverless services mainly consisted of FaaS (Function as a Service) and BaaS (Backend as a Service). The serverless product area has expanded in recent years, and it now also includes application-oriented serverless services.

  • FaaS: a function service that runs in stateless compute containers. Functions are usually event-driven, short-lived (possibly living for only one invocation), and fully managed by a third party. FaaS products in the industry include AWS Lambda and Alibaba Cloud Function Compute.
  • BaaS: back-end services built on top of the cloud service ecosystem. BaaS products in the industry include AWS S3 and DynamoDB.
  • Application-oriented serverless services: services such as Knative, which provide comprehensive hosting capabilities from code package to image build, deployment, and instance auto scaling. Public cloud products include Google Cloud Run (based on Knative) and Alibaba Cloud SAE (Serverless Application Engine).

Within Meituan, BaaS products are in effect the internal middleware and underlying services, which have become rich and mature after years of development. The evolution of Meituan’s serverless products therefore has two main directions: function computing services and application-oriented serverless services. Which should come first? At the time, our main consideration was that FaaS function computing services in the industry were more mature and better defined than application-oriented serverless services. We therefore decided to build the FaaS function computing service first and the application-oriented serverless service afterwards.


Infrastructure

Since auto scaling is a necessary capability of a serverless platform, serverless inevitably involves the scheduling and management of underlying resources. This is why many open source serverless products (such as OpenFaaS, Fission, Nuclio, and Knative) are based on Kubernetes: this choice takes full advantage of Kubernetes’ infrastructure management capabilities. Meituan’s internal infrastructure product is Hulk. Although Hulk is built on Kubernetes, the difficulty of adoption and various other considerations at the time meant that it does not use Kubernetes in a native way, and it also adopts a rich container model at the container layer.

Against this background, we faced two options for infrastructure: use the company’s Hulk (non-native Kubernetes) as Nest’s infrastructure, or adopt native Kubernetes. Because native Kubernetes is the mainstream trend in the industry, and because using it lets us take full advantage of Kubernetes’ native capabilities and reduces duplicate development, we adopted native Kubernetes as our infrastructure.


Development language

Although Golang is more mainstream in the cloud native field and in the Kubernetes ecosystem, Java is the most widely used language at Meituan, and compared with Golang, Java has a better internal ecosystem. Therefore, we chose Java. At the beginning of Nest development, the Kubernetes community’s Java client was incomplete, but as the project progressed the client was gradually enriched and is now fully sufficient. We also contributed some pull requests along the way, giving back to the community.

2.2 Architecture design

Based on the above choices of evolution route, infrastructure, and development language, we designed the architecture of Nest.

In the overall architecture, traffic enters the Nest platform from an EventTrigger (an event source such as Nginx, an application gateway, a scheduled task, a message queue, or an RPC call). The platform routes the traffic to a specific function instance according to its characteristics and triggers function execution. The function’s code can call the company’s BaaS services, and finally the function completes and returns its result.

Figure 1 FaaS architecture

In terms of technical implementation, the Nest platform uses Kubernetes as its base and borrows some of Knative’s excellent designs. Its architecture consists of the following core parts:

  • Event gateway: Its core capability is to receive traffic from external event sources and route it to function instances. The gateway also collects inbound and outbound traffic statistics for each function, which give the auto scaling module the data it needs for scaling decisions.
  • Auto scaling: Its core capability is the auto scaling of function instances. It calculates the target number of instances from the function’s runtime traffic data and the per-instance threshold, and then adjusts the number of instances through Kubernetes’ resource control capabilities.
  • Controller: Its core capability is implementing the control logic for the Kubernetes CRDs (Custom Resource Definitions).
  • Function instance: A running instance of a function. When traffic arrives from the event gateway, the corresponding function code is executed inside the function instance.
  • Governance platform: A user-facing platform responsible for building, versioning, and releasing functions, and for managing function metadata.
Figure 2 Nest architecture diagram

2.3 Process design

How does Nest’s CI/CD process differ from the traditional model? To answer this, let’s first look at the overall lifecycle of a function on the Nest platform. There are four stages: build, version, deploy, and scale.

  • Build: The developed code and configuration are built into an image or an executable file.
  • Version: The image or executable file produced by the build, plus the release configuration, forms an immutable version.
  • Deploy: The version is released, which completes the deployment.
  • Scale: Function instances are scaled out and in elastically based on information such as their traffic and load.

Across these four stages, the essential differences between Nest and the traditional CI/CD process lie in deployment and scaling. Traditional deployment is machine-aware: code packages are generally published to specific machines. Serverless hides machines from the user (at deployment time, the number of instances of the function may still be 0). In addition, the traditional model generally has no dynamic scaling, whereas a serverless platform scales out and in dynamically according to business traffic. Auto scaling is explained in detail in later sections, so here we only discuss the design of deployment.

The core question of deployment is how to hide machines from users. To solve it, we abstracted the machine away and introduced the concept of a group, which consists of three pieces of information: SET (the identifier of the unitized architecture, carried on the machine), swimlane (the test-environment isolation identifier, also carried on the machine), and region (Shanghai, Beijing, etc.). A user deployment only operates on the corresponding group, not on specific machines. Behind the scenes, the Nest platform manages machine resources for users, and each deployment initializes the corresponding machine instances in real time according to the group information.

Figure 3 Function life cycle
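
The grouping idea described above can be pictured as a small value type that a deployment targets instead of a machine. The following is only a minimal sketch: `Group`, `Deployment`, `deploy`, and all field names are hypothetical and are not Nest's actual data model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Group:
    """A deployment target that hides machines (hypothetical model)."""
    set_name: str   # unitized-architecture (SET) identifier
    swimlane: str   # test-environment isolation tag; "" means the main lane
    region: str     # e.g. "shanghai" or "beijing"


@dataclass
class Deployment:
    """Binds an immutable function version to a group, never to a machine."""
    function: str
    version: str
    group: Group
    instances: int = 0  # may legitimately be 0 right after deployment


def deploy(function: str, version: str, group: Group) -> Deployment:
    # The platform, not the user, decides when and where instances appear.
    return Deployment(function=function, version=version, group=group)


d = deploy("order-notify", "v3", Group("default", "", "shanghai"))
print(d.group.region, d.instances)  # shanghai 0
```

Making `Group` frozen (and therefore hashable) is a design choice of this sketch: it lets a group act as a lookup key when traffic is later routed by SET, swimlane, and region.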

2.4 Function triggering

Function execution is triggered by events. Triggering a function involves the following four processes:

  • Traffic ingestion: Register the event gateway’s information with the event source so that traffic is directed to the event gateway. For example, for an MQ event source, MQ traffic is directed to the event gateway by registering an MQ consumer group.
  • Traffic adaptation: The event gateway adapts the traffic coming in from each event source.
  • Function discovery: The process of obtaining function metadata (function instance information, configuration information, etc.), similar to service discovery in microservices. Event traffic accepted by the event gateway must be sent to a concrete function instance, which requires function discovery. Discovery here is essentially reading information stored in Kubernetes built-in resources or CRD resources.
  • Function routing: The process of routing event traffic to a specific function instance. To support both traditional routing logic (SET, swimlane, and region routing) and version routing, we use multi-layer routing: the first layer routes to a group (SET, swimlane, region), and the second layer routes to a specific version. Among the instances of the same version, a load balancer selects the specific instance. Version routing also lets us support canary and blue-green releases easily.
Figure 4 Function triggering
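
The multi-layer routing described above (group, then version, then instance) can be sketched as follows. This is an illustration under assumptions, not the event gateway's real code: the routing table layout, the weighted version choice used to model a canary release, and the random instance pick standing in for the load balancer are all hypothetical.

```python
import random
from typing import Dict, List, Tuple

# (set, swimlane, region) -> version -> instance addresses; all names hypothetical.
GroupKey = Tuple[str, str, str]
Table = Dict[GroupKey, Dict[str, List[str]]]


def pick_version(weights: Dict[str, int], rng: random.Random) -> str:
    """Second layer: weighted choice between versions (canary / blue-green)."""
    versions = list(weights)
    return rng.choices(versions, weights=[weights[v] for v in versions], k=1)[0]


def route(table: Table, group: GroupKey, weights: Dict[str, int],
          rng: random.Random) -> str:
    """Resolve an event to one instance: group -> version -> instance."""
    versions = table[group]                # first layer: SET / swimlane / region
    version = pick_version(weights, rng)   # second layer: version
    instances = versions[version]
    if not instances:
        raise LookupError(f"no instance for {version} in {group}")
    return rng.choice(instances)           # load balancing inside one version


table = {("default", "", "shanghai"): {"v1": ["10.0.0.1", "10.0.0.2"],
                                       "v2": ["10.0.0.9"]}}
rng = random.Random(0)
# About 90% of traffic stays on v1; about 10% goes to the canary v2.
print(route(table, ("default", "", "shanghai"), {"v1": 90, "v2": 10}, rng))
```

In this sketch a blue-green switch is simply the weights `{"v1": 0, "v2": 100}`, which is why version routing makes both release styles cheap to support.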

2.5 Function execution

A function is different from a traditional service. A traditional service is an executable program, whereas a function is a code fragment that cannot run on its own. So how is the function executed once traffic reaches the function instance?

The primary problem of function execution is the function’s runtime environment. Since the Nest platform is based on Kubernetes, a function must run in a Kubernetes Pod (instance). Inside the Pod is the container, inside the container is the runtime, and the runtime is the entry point that receives function traffic. Ultimately, it is the runtime that triggers the execution of the function. This all sounds straightforward, but we ran into difficulties in practice. The main one was letting developers seamlessly use the company’s components inside a function, such as OCTO (service framework), Cellar (caching system), and databases.

In Meituan’s technology system, after years of accumulation, it is difficult to run the company’s business logic in a pure container (one without any other dependencies), because the company’s containers have accumulated many environment and service governance capabilities, such as agents for service governance, instance environment configuration, and network configuration.

Therefore, so that businesses can seamlessly use the company’s components, we reuse the company’s container system, which reduces the cost of writing functions. Reusing it was not simple, though: nobody in the company had tried this path before, Nest was the company’s first platform built on native Kubernetes, and the “first person to eat crabs” always runs into pitfalls. We could only “cut roads through mountains and build bridges across rivers”, solving the problems one by one as we went. In summary, the core of the work was connecting the CMDB and other technical systems during container startup, so that the container running a function is no different from the machine a developer would normally apply for.

Figure 5 Function execution

2.6 Auto scaling

There are three core problems in auto scaling: when to scale, how much to scale, and how fast scaling is. In other words: scaling timing, scaling algorithm, and scaling speed.

  • Scaling timing: The expected number of instances of a function is calculated in real time from traffic metrics, and the function is scaled accordingly. The traffic metrics come from the event gateway, which mainly tracks each function’s concurrency. The auto scaling component actively pulls the metrics from the event gateway once per second.
  • Scaling algorithm: Expected number of instances = concurrency / single-instance threshold. Based on the collected metrics and the threshold configured by the business, the algorithm calculates the expected number of instances, which is then set through the Kubernetes API. Although the algorithm looks simple, it is very stable and robust.
  • Scaling speed: This depends mainly on the cold start time, which is explained in detail in the next section.
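
The scaling algorithm above is a single division, which the following sketch makes concrete. Rounding up, the clamping to configured bounds, the default bound of 100, and the function and parameter names are assumptions made for illustration; the article only states the ratio itself and that minimum and maximum instance counts can be configured.

```python
import math


def desired_instances(concurrency: float, per_instance_threshold: float,
                      min_instances: int = 0, max_instances: int = 100) -> int:
    """Expected instances = concurrency / single-instance threshold,
    rounded up and clamped to the configured bounds."""
    if per_instance_threshold <= 0:
        raise ValueError("per_instance_threshold must be positive")
    raw = math.ceil(concurrency / per_instance_threshold)
    return max(min_instances, min(max_instances, raw))


print(desired_instances(250, 100))                 # 3
print(desired_instances(0, 100))                   # 0
print(desired_instances(0, 100, min_instances=2))  # 2
```

Rounding up rather than down is the conservative choice here: a function with 250 concurrent requests and a threshold of 100 gets three instances, so no instance is pushed over its threshold.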

In addition to basic scaling, we support scaling to 0 and configuring the maximum and minimum number of instances (the minimum instances are reserved instances). Scaling to 0 is implemented with an activator module inside the event gateway. When a function has no instances, its request traffic is cached inside the activator, which immediately drives the auto scaling component to scale out through the traffic metrics. Once the new instance has started, the activator replays the cached requests to it to trigger function execution.

Figure 6 Auto scaling
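
The activator behavior for scale-to-zero can be sketched as a buffer plus a replay step. This is a simplified illustration, not the actual module: the `Activator` class, its method names, and the direct `scale_out` callback (standing in for the metrics-driven signal to the auto scaling component) are all hypothetical.

```python
from collections import deque
from typing import Callable, Deque, List, Optional


class Activator:
    """Buffers requests for a function that currently has zero instances."""

    def __init__(self, scale_out: Callable[[], None]) -> None:
        self._pending: Deque[str] = deque()
        self._scale_out = scale_out   # hook that asks the autoscaler for capacity
        self._requested = False

    def handle(self, request: str,
               instances: List[Callable[[str], str]]) -> Optional[str]:
        if instances:                  # warm path: forward immediately
            return instances[0](request)
        self._pending.append(request)  # cold path: hold the request
        if not self._requested:        # trigger scale-out once per cold period
            self._requested = True
            self._scale_out()
        return None

    def on_instance_ready(self, instance: Callable[[str], str]) -> List[str]:
        """Replay buffered requests, in arrival order, to the new instance."""
        results = []
        while self._pending:
            results.append(instance(self._pending.popleft()))
        self._requested = False
        return results


calls = []
act = Activator(scale_out=lambda: calls.append("scale-out"))
act.handle("req-1", instances=[])
act.handle("req-2", instances=[])
print(calls)                                         # ['scale-out']
print(act.on_instance_ready(lambda r: f"done:{r}"))  # ['done:req-1', 'done:req-2']
```

A real activator would also need a bound on the buffer and a timeout for requests that wait too long; both are omitted here to keep the sketch short.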

3 Optimize core technologies to ensure business stability

3.1 Auto scaling optimization

The three elements described above (scaling timing, scaling algorithm, and scaling speed) form an idealized model. Scaling speed in particular falls short: current technology cannot scale at the millisecond level. In real online scenarios, auto scaling therefore sometimes behaves unexpectedly: instances scale too frequently, or scale-out comes too late, and services become unstable.

  • For frequent scaling, we maintain a sliding window of statistics in the auto scaling component and smooth the metric by taking its mean. We also mitigate frequent scaling by delaying scale-in while keeping scale-out real-time. In addition, we added a scaling strategy based on QPS, because QPS is more stable than concurrency.
  • For late scale-out, we scale out in advance: scale-out starts when an instance reaches 70% of its threshold, which alleviates the problem well. We also support multi-metric hybrid scaling (concurrency, QPS, CPU, memory), scheduled scaling, and other strategies to meet different business needs.

The following figure shows a real case of online auto scaling (the configured minimum number of instances is 4, the single-instance threshold is 100, and the threshold utilization is 0.7). The upper half shows the service’s requests per second, and the lower half shows the scaling decisions for instance count. The business handled the traffic spikes smoothly.

Figure 7 Auto scaling case
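
To make the two optimizations concrete, the sketch below combines a sliding-window mean with early scale-out, using the configuration of the case above (minimum 4 instances, threshold 100, utilization 0.7). The window length of 3 samples, the sample traffic values, and the class and method names are assumptions for illustration; delayed scale-in is left out.

```python
import math
from collections import deque


class SmoothedScaler:
    """Sliding-window mean plus early scale-out at a target utilization."""

    def __init__(self, threshold: float, utilization: float,
                 min_instances: int, window: int) -> None:
        self.capacity = threshold * utilization  # effective per-instance load
        self.min_instances = min_instances
        self.samples = deque(maxlen=window)      # oldest sample drops out

    def observe(self, metric: float) -> int:
        """Record one metric sample and return the desired instance count."""
        self.samples.append(metric)
        mean = sum(self.samples) / len(self.samples)
        return max(self.min_instances, math.ceil(mean / self.capacity))


scaler = SmoothedScaler(threshold=100, utilization=0.7, min_instances=4, window=3)
for qps in (100, 400, 1000, 1000):
    print(scaler.observe(qps))  # prints 4, 4, 8, 12 on successive lines
```

The jump from 400 to 1000 raises the target to 8 rather than straight to 15, which shows the trade-off of smoothing: fewer oscillations, at the price of a slower reaction to a genuine spike. Dividing by 70 instead of 100 is what provides the headroom that compensates for that lag.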

3.2 Cold start optimization

Cold start covers the steps in the function call path that prepare an instance: resource scheduling, image/code download, container start, runtime initialization, and user code initialization. Once the cold start completes, the function instance is ready and subsequent requests are executed directly by the function. Cold start is critical in the serverless world because its duration determines the speed of auto scaling.

As the martial arts saying goes, “of all the kung fu in the world, only speed is unbeatable”, and this applies to serverless as well. Imagine that an instance could be started fast enough, at the millisecond level. Then almost all function instances could be scaled down to 0 and scaled out only when traffic arrives, which would greatly reduce machine cost for businesses whose traffic has pronounced peaks and troughs. Of course, the ideal is far from the reality: millisecond-level startup is almost impossible. Still, the shorter the cold start, the lower the cost, and a very short cold start greatly benefits the availability and stability of a function while it scales.

Figure 8 Stages of cold start

Cold start optimization is a step-by-step process. We went through three main stages: image startup optimization, resource pool optimization, and core path optimization.

  • Image startup optimization: We made targeted optimizations to the time-consuming parts of image startup (container startup and runtime initialization), mainly container IO throttling, the startup time of some special agents, and data copying between the boot disk and the data disk. This reduced the system time in the startup process from 42s to about 12s.
Figure 9 Image startup optimization results
  • Resource pool optimization: With image startup down to 12s, we were close to the bottleneck and had little room for further optimization. So we asked whether the time-consuming image startup could be bypassed altogether. We adopted the relatively simple idea of trading space for time with a resource pool: some already started instances are cached, and when capacity needs to be expanded, instances are taken directly from the pool, bypassing image startup. The effect was significant: system startup time dropped from 12s to 3s. Note that the resource pool itself is managed through a Kubernetes Deployment, so instances taken from the pool are replenished automatically and immediately.
Figure 10 Resource pool optimization results
  • Core path optimization: On top of the resource pool, we optimized the two time-consuming steps of downloading and decompressing code during startup, using high-performance compression algorithms (LZ4 and Zstd) together with parallel download and decompression, with very good results. We also support sinking common logic (middleware, dependency packages, etc.) into the platform and preloading it, which brought the end-to-end startup time of a function down to 2s. In other words, scaling out a function instance takes only 2s, including the start of the function itself. Excluding the function’s own initialization and startup time, the platform-side time is already at the millisecond level.
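
The "space for time" idea behind the resource pool can be sketched in a few lines. This is an in-process illustration only: in the article the pool is a Kubernetes Deployment that replenishes itself, whereas here `WarmPool`, `start_instance`, and the eager refill loop are hypothetical stand-ins.

```python
from collections import deque
from typing import Callable, Deque


class WarmPool:
    """Keeps pre-started instances so scale-out skips image startup."""

    def __init__(self, size: int, start_instance: Callable[[], str]) -> None:
        self._size = size
        self._start = start_instance  # slow path: boots a container
        self._idle: Deque[str] = deque(start_instance() for _ in range(size))

    def acquire(self) -> str:
        """Hand out a warm instance, falling back to a cold start if empty."""
        instance = self._idle.popleft() if self._idle else self._start()
        self.replenish()
        return instance

    def replenish(self) -> None:
        # Stand-in for the Deployment controller restoring the replica count.
        while len(self._idle) < self._size:
            self._idle.append(self._start())

    def idle_count(self) -> int:
        return len(self._idle)


counter = iter(range(1000))
pool = WarmPool(size=2, start_instance=lambda: f"pod-{next(counter)}")
print(pool.acquire(), pool.idle_count())  # pod-0 2
```

The cost of this design is the idle capacity held in the pool, so the pool size is a direct trade-off between machine cost and how many simultaneous scale-outs can avoid a cold image start.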

3.3 High availability guarantee

For most platforms, high availability refers to the availability of the platform itself. Nest is different: its high availability also covers the functions hosted on the platform. Nest’s high availability guarantee therefore has to start from both the platform and the business functions.

3.3.1 Platform high availability

For platform high availability, Nest provides comprehensive guarantees at the architecture layer, the service layer, the monitoring and operations layer, and the business perspective layer.


  • Architecture layer: For stateful services, such as the auto scaling module, we adopt a master-slave architecture in which a slave node takes over immediately when the master node fails. We have also implemented several layers of isolation in the architecture. Horizontal regional isolation: Kubernetes is strongly isolated between the two regional clusters, while services (event gateway, auto scaling) are weakly isolated between the two regions. Shanghai’s auto scaling is only responsible for scaling businesses in the Shanghai Kubernetes cluster, whereas the event gateway serves calls from both regions and needs to access Kubernetes in both. Vertical line-of-business isolation: business lines are strongly isolated at the service level, with different business lines using different service clusters, while resources at the Kubernetes layer are weakly isolated by namespace.
Figure 11 Deployment architecture
  • Service layer: This mainly refers to the event gateway service. Because all function traffic passes through the event gateway, its availability is particularly important. At this layer we support throttling and asynchronous processing to keep the service stable.
  • Monitoring and operations layer: We mainly improve system monitoring and alerting, sort out the core paths, and push the owners of our dependencies to fix governance issues. We also review SOPs regularly and run fault injection drills on the fault drill platform to uncover hidden problems in the system.
  • Business perspective layer: We developed a continuous real-time online inspection service that simulates request traffic to user functions and checks in real time whether the system’s core paths are healthy.

3.3.2 Business high availability

For business high availability, Nest provides guarantees at the service layer and the platform layer.

  • Service layer: Supports business degradation and throttling. When a backend function fails, a degraded result can be returned according to the degradation configuration. For abnormal function traffic, the platform supports rate limiting to prevent backend function instances from being overwhelmed.
  • Platform layer: Supports instance keep-alive, multi-level disaster recovery, and rich monitoring and alerting. When a function instance becomes abnormal, the platform automatically isolates it and immediately scales out a new instance. The platform supports multi-region deployment, and within a region, function instances are spread across different data centers as much as possible. When a host, data center, or region fails, new instances are immediately rebuilt on an available host, data center, or region. In addition, the platform automatically monitors business metrics such as latency, success rate, instance scaling, and request count, and automatically triggers alerts to business developers and administrators when these metrics fall outside expectations.

Figure 12 Service monitoring
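
The degradation and throttling guarantees described for the service layer can be illustrated with a small wrapper around a function call. The article does not say which rate-limiting algorithm Nest uses; the token bucket below is a common technique chosen for the sketch, and `TokenBucket`, `invoke`, and the decision to return the degraded result for throttled calls are all assumptions.

```python
import time
from typing import Callable


class TokenBucket:
    """Simple rate limiter: `rate` tokens per second, up to `burst` tokens."""

    def __init__(self, rate: float, burst: int,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.rate, self.burst, self.clock = rate, burst, clock
        self.tokens = float(burst)
        self.last = clock()

    def allow(self) -> bool:
        now = self.clock()
        # Refill in proportion to elapsed time, never above the burst size.
        self.tokens = min(self.burst, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


def invoke(fn: Callable[[str], str], event: str, limiter: TokenBucket,
           fallback: str) -> str:
    """Return the function result, or the configured degraded result when
    the call is throttled or the function raises."""
    if not limiter.allow():
        return fallback
    try:
        return fn(event)
    except Exception:
        return fallback


def broken(event: str) -> str:
    raise RuntimeError("backend down")


limiter = TokenBucket(rate=1, burst=2, clock=lambda: 0.0)  # frozen clock for the demo
print(invoke(str.upper, "ok", limiter, "degraded"))  # OK
print(invoke(broken, "ok", limiter, "degraded"))     # degraded
print(invoke(str.upper, "ok", limiter, "degraded"))  # degraded
```

The third call is degraded because the bucket is empty, not because the function failed; injecting the clock keeps the example deterministic.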

3.4 Container stability optimization

As mentioned earlier, serverless differs from the traditional model in the CI/CD process: the traditional model prepares machines in advance and then deploys the program, whereas serverless scales instances elastically in real time according to traffic peaks and troughs. A newly scaled-out instance starts handling business traffic immediately. This sounds fine, but it causes problems in the rich container ecosystem: we found that the load on newly scaled-out machines was very high, causing some business requests to fail and affecting business availability.

Analysis showed that the main cause was that, after the container starts, the O&M tooling performs agent upgrades, configuration changes, and other operations that are very CPU-intensive. Running in the same rich container, they compete with the function process for resources and make the user process unstable. In addition, function instances are generally configured with far fewer resources than traditional service machines, which aggravates the problem. We therefore followed industry practice and worked with the container infrastructure team to implement lightweight containers: all O&M agents run in a sidecar container, and the business process runs separately in the App container. This container isolation protects business stability. At the same time, we pushed forward a container trimming plan to remove unnecessary agents.

Figure 13 Lightweight container

4 Improve the ecology and implement the benefits


Serverless is a systems engineering effort. Technically it involves Kubernetes, containers, operating systems, JVMs, runtimes, and other technologies, and in terms of platform capabilities it touches every aspect of the CI/CD process.

To give users the best possible development experience, we provide development tools such as a CLI (Command Line Interface) and a WebIDE. To solve the problem of interaction with existing upstream and downstream technology products, we integrated with the company’s existing technology ecosystem, which makes the platform easier for developers to use. To make it easier for downstream integration platforms to connect, we opened up the platform’s APIs so that Nest can empower them. Finally, because containers are heavy and carry a large system overhead, low-frequency business functions have low resource utilization; to address this, we support consolidated deployment of functions, which multiplies resource utilization.

4.1 Provide R&D tools

Development tools lower the cost of using the platform and help developers move through the CI/CD process quickly. Nest currently provides a CLI that helps developers quickly create applications, build locally, test locally, debug, and release remotely. Nest also provides a WebIDE that supports one-stop online code modification, build, release, and testing.

4.2 Converge with the technology ecosystem

Providing these R&D tools alone is not enough. After the project was promoted and put into use, we soon found that developers had new requirements for the platform, such as operating functions from the delivery pipeline and the offline service instance orchestration platform; the lack of this support became an obstacle to promoting the project. Therefore, we integrated with the company’s mature technology ecosystem, connected platforms such as the pipeline, and fitted Nest into the existing upstream and downstream technology systems to remove users’ concerns.

4.3 Open Platform Capabilities

There are many downstream solution platforms of Nest, such as SSR (Server Side Render ), service orchestration platform, etc., by docking with Nest’s OpenAPI, further liberating productivity. For example, without letting developers apply, manage, and maintain machine resources by themselves, users can quickly create, publish, and host an SSR project or orchestration program from 0 to 1.

In addition to opening up the platform’s APIs, Nest also lets users customize resource pools. With this capability, developers can define their own resource pools and machine environments, and can even push some common logic down into the pool to further optimize cold starts.

4.4 Support for consolidated deployment

Consolidated deployment means that multiple functions are deployed in a single machine instance. There are two main motivations for it:


  • Current containers are heavy, and the container itself has a large system overhead, which leads to low resource utilization by business processes (especially for low-frequency services).
  • When the cold start time cannot meet the latency requirements of a business, we use reserved instances to meet them.

Given these two points, we decided to support consolidated deployment: some low-frequency functions are deployed into the same machine instance to improve the resource utilization of business processes in reserved instances.

For the implementation, we followed the design of Kubernetes and built a Sandbox-based consolidated deployment system for functions, in which each Sandbox is a function resource. In this analogy, a Pod corresponds to a Kubernetes Node, a Sandbox corresponds to a Kubernetes Pod, and the Nest sidecar corresponds to the Kubelet. To give Sandboxes their own deployment and scheduling capabilities, we also defined some custom Kubernetes resources (such as SandboxDeployment, SandboxReplicaSet, and SandboxEndpoints) so that functions can be dynamically plugged into and unplugged from specific Pod instances.

Figure 14 Consolidated deployment architecture
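
The Pod-as-Node, Sandbox-as-Pod analogy can be illustrated with a small model. The following is only a sketch under stated assumptions, not Nest’s actual scheduler: the class names `Sandbox` and `PodInstance`, the memory-based capacity model, and the "most free memory first" placement rule are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Sandbox:
    """One function instance; plays the role a Pod plays in Kubernetes."""
    function: str
    memory_mb: int


@dataclass
class PodInstance:
    """A container instance; plays the role a Node plays in Kubernetes."""
    name: str
    capacity_mb: int
    sandboxes: list = field(default_factory=list)

    def free_mb(self) -> int:
        return self.capacity_mb - sum(s.memory_mb for s in self.sandboxes)


def schedule(sandbox, pods):
    """Place the sandbox on the pod with the most free memory that still fits it.

    Returns the chosen pod, or None when no pod has room.
    The placement rule is an assumption made for this sketch.
    """
    candidates = [p for p in pods if p.free_mb() >= sandbox.memory_mb]
    if not candidates:
        return None
    target = max(candidates, key=lambda p: p.free_mb())
    target.sandboxes.append(sandbox)  # "plug" the function into the pod
    return target


def unplug(function, pods):
    """Remove every sandbox of the given function from all pods."""
    removed = 0
    for pod in pods:
        kept = [s for s in pod.sandboxes if s.function != function]
        removed += len(pod.sandboxes) - len(kept)
        pod.sandboxes = kept
    return removed
```

In this model, plugging and unplugging a function only changes the list of sandboxes inside an existing pod instance; the pod itself is never recreated, which is the property the custom Sandbox resources are described as providing.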

In addition, with consolidated deployment, isolation between functions is an unavoidable problem. To minimize interference between functions merged into the same instance, the runtime adopts different strategies for Node.js and Java: Node.js functions are isolated by running in separate processes, while Java functions are isolated by class loading. The main reason for this choice is that a Java process takes up much more memory than a Node.js process.
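
The effect of the two isolation strategies on the process layout of one instance can be sketched as follows. This is an illustrative model only: the `Isolation` enum, the runtime keys `"nodejs"` and `"java"`, and the `plan_processes` helper are hypothetical names, not part of Nest.

```python
from enum import Enum


class Isolation(Enum):
    PROCESS = "process"            # one OS process per function
    CLASS_LOADER = "class_loader"  # one shared process, one class loader per function


# Strategy per runtime, as described in the text above.
ISOLATION_BY_RUNTIME = {
    "nodejs": Isolation.PROCESS,
    "java": Isolation.CLASS_LOADER,
}


def plan_processes(runtime, function_names):
    """Return the process layout for functions merged into one instance.

    Each inner list is one OS process and contains the functions it hosts.
    """
    if ISOLATION_BY_RUNTIME[runtime] is Isolation.PROCESS:
        return [[name] for name in function_names]
    # Class-loader isolation: all functions share a single process.
    return [list(function_names)]
```

For three merged functions, the Node.js layout contains three processes, while the Java layout contains one process hosting all three, which is why the per-function memory overhead stays low for Java.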

5 Landing scenarios and benefits

At present, Nest is very popular in Meituan’s front-end Node.js field, which is also the technology stack where it is most widely adopted. Nest has been adopted at scale across Meituan’s front end, covering almost all business lines and serving a large amount of core B-side and C-side traffic.


The specific front-end scenarios are: BFF (Backend For Frontend), CSR (Client Side Render)/SSR (Server Side Render), back-office management platforms, scheduled tasks, data processing, and so on.

  • BFF scenario: The BFF layer mainly provides data for front-end pages. With the serverless model, front-end developers do not need to deal with the operation and maintenance work they are less familiar with, and can easily move from BFF to the SFF (Serverless For Frontend) model.
  • CSR/SSR scenario: CSR/SSR refers to client-side rendering and server-side rendering. With the serverless platform, there is no need to consider operation and maintenance, so more front-end businesses are trying SSR to render the first screen of front-end pages quickly.
  • Back-office management platform scenario: The company has many back-office management platform Web services. Although they are heavier than functions, they can be hosted directly on the serverless platform and fully benefit from its release and O&M efficiency.
  • Scheduled task scenario: The company has many periodic tasks, such as pulling data every few seconds, cleaning logs at 0 o’clock every day, and collecting full data and generating reports every hour. The serverless platform is connected directly to the task scheduling system, so users only need to write the task processing logic and configure a timer trigger on the platform to complete the access of a scheduled task, with no need to manage machine resources at all.
  • Data processing scenario: An MQ topic is connected to the serverless platform as an event source, and the platform automatically subscribes to the topic’s messages. When a message is consumed, function execution is triggered. As in the scheduled task scenario, users only need to write the data processing logic and configure an MQ trigger on the platform to complete the access of the MQ consumer, with no need to manage machine resources at all.
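
The scheduled task and data processing scenarios share one shape: the user writes a handler and binds it to a trigger, and the platform invokes the handler when the event source fires. A minimal sketch of that shape is shown here. The `TriggerRegistry` class, the `"timer:..."` and `"mq:..."` source names, and the event fields are hypothetical; they are not Nest’s real trigger API.

```python
from collections import defaultdict


class TriggerRegistry:
    """Toy stand-in for the platform side: maps event sources to handlers."""

    def __init__(self):
        self._handlers = defaultdict(list)

    def bind(self, source, handler):
        """Configure a trigger, e.g. 'timer:clean-logs' or 'mq:orders'."""
        self._handlers[source].append(handler)

    def dispatch(self, source, event):
        """Called when the event source fires; runs every bound handler."""
        return [handler(event) for handler in self._handlers.get(source, [])]


# User side: only the business logic is written.
def clean_logs(event):
    return f"cleaned logs at {event['fired_at']}"


def on_order_message(event):
    return event["body"].upper()
```

A usage example: after `registry.bind("timer:clean-logs", clean_logs)` and `registry.bind("mq:orders", on_order_message)`, the platform side would call `registry.dispatch(...)` whenever the timer fires or a message arrives, and the user never touches machine resources.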



The benefits of serverless are very obvious, especially in the front-end field, where the large number of businesses that have adopted it is the best illustration. The specific benefits fall into the following two aspects:


  • Cost reduction: Through the elastic scaling of serverless, the resource utilization of high-frequency businesses can be raised to 40%~50%; low-frequency business functions can be deployed together, which greatly reduces the running cost of functions.
  • Efficiency improvement: Overall R&D efficiency is increased by about 40%.
    • From the perspective of code development, the platform provides complete R&D tools such as the CLI and WebIDE, which help developers generate code scaffolding, focus on writing business logic, and quickly complete local testing. In addition, business services can view logs and monitoring online at zero cost.
    • From the perspective of release, with the cloud-native model the business does not need to apply for machines, and release and rollback take seconds. In addition, the business can use the platform’s built-in capabilities together with the event gateway to shift traffic and complete canary testing.
    • From the perspective of daily O&M, the business does not need to pay attention to traditional problems such as machine failure, insufficient resources, and disaster recovery in the computer room. When a business process becomes abnormal, Nest automatically isolates the abnormal instance and quickly pulls up a new instance to replace it, reducing the impact on the business.
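
The traffic shifting used for canary testing can be illustrated with a hash-based split. This is a generic sketch of percentage-based routing, not the event gateway’s actual algorithm; the function name `pick_version` and the use of a request id as the routing key are assumptions.

```python
import hashlib


def pick_version(request_id, canary_percent):
    """Route a request to 'canary' or 'stable' by hashing its id into 0..99.

    The same request id always lands in the same bucket, so raising
    canary_percent only moves additional traffic onto the canary.
    """
    if not 0 <= canary_percent <= 100:
        raise ValueError("canary_percent must be between 0 and 100")
    digest = hashlib.sha256(request_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:4], "big") % 100
    return "canary" if bucket < canary_percent else "stable"
```

Setting `canary_percent` to 0 sends everything to the stable version, and setting it to 100 completes the rollout; a rollback is just setting the value back to 0.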

6 Future planning

  • Scenario-based solutions: There are many scenarios for adopting serverless, such as SSR, back-office management, and BFF. Different scenarios have different project templates and scenario configurations, such as scaling configuration and trigger configuration, and different languages also have different configurations. This increases the cost for businesses to use the platform and hinders the onboarding of new businesses. We therefore plan to build the platform around scenarios, tie the platform’s capabilities closely to each scenario, and build up the basic configuration and resources of each scenario, so that in any scenario a business needs only simple configuration to use serverless.
  • Serverless for traditional microservices: This is the application-oriented serverless service mentioned in the route selection. The most widely used development language at Meituan is Java, and the company has a large number of traditional microservice projects; migrating all of them to the function model is obviously unrealistic. If these traditional microservice projects could enjoy the technical dividends of serverless directly, without transformation, the business value would be self-evident. Making traditional microservices serverless is therefore an important direction for expanding our business in the future. On the implementation path, we will consider integrating the service governance system (such as ServiceMesh) with serverless, so that the service governance components provide scaling metrics for serverless and allocate traffic accurately during scaling.
  • Cold start optimization: Cold start optimization of functions has already achieved good results, in particular for system startup time on the platform side, where little room for improvement remains. However, the startup time of the business code itself is still significant, especially for traditional Java microservices, which basically take minutes to start. Our subsequent cold start optimization will therefore focus on the startup time of the business itself and aim to reduce it substantially. As for specific methods, we will consider technologies such as AppCDS and GraalVM to reduce the time required for business startup.
  • Other plans
    • Enrich and improve R&D tools, such as IDE plugins, to raise R&D efficiency.
    • Connect with the upstream and downstream technology ecosystem, integrate deeply into the company’s existing technology system, and reduce the obstacles caused by upstream and downstream platforms.
    • Lightweight containers: Lightweight containers bring better startup times and better resource utilization, so making containers lighter has always been a pursuit of serverless. In practice, we plan to work with the container infrastructure team to deploy some of the agents that currently run inside the container in DaemonSet mode, sinking them to the host and increasing the share of the container available to the business payload.

About the authors

  • Yin Qi, Hua Heng, Fei Fei, Zhiyang, Yi Kun, and others, from the application middleware team of the Infrastructure Department.
  • Jiawen, Kaixin, Yahui, and others, from the big front-end team of the financial technology platform.