0. Preface

  • 1. Turing Platform Introduction

  • 2. The construction background of Turing OS

  • 3. Turing OS 1.0

    • 3.1 Turing OS 1.0 Introduction

    • 3.2 Turing OS 1.0 legacy issues

  • 4. Turing OS 2.0

    • 4.1 Standardized lightweight SDK

    • 4.2 Algorithm plug-in

    • 4.3 Data channel

    • 4.4 Algorithm orchestration

    • 4.5 Multi-mode integration

    • 4.6 Turing sandbox

    • 4.7 Unified playback platform

    • 4.8 Performance stress test and tuning

  • 5. Turing OS 2.0 construction achievements

    • 5.1 Algorithm development process

    • 5.2 Turing OS 2.0 usage summary

  • 6. Summary and future outlook

  • 7. About the author

  • 8. Recruitment information

AI is the hot “star” of the current Internet industry: established giants and traffic upstarts alike are investing heavily in AI technology to empower their businesses. Meituan began exploring the application of machine learning models in various business scenarios very early, moving from the initial linear and tree models to deep neural networks, BERT, DQN, and more in recent years. These models have been successfully applied to search, recommendation, advertising, delivery, and other businesses, with good results.

The algorithm platform built by Meituan’s delivery technology department, Turing (hereinafter referred to as the Turing platform), aims to provide a one-stop service covering the whole process of data preprocessing, feature generation, model training, model evaluation, model deployment, online prediction, AB experiments, and algorithm effect evaluation. It lowers the threshold for algorithm engineers, freeing them from tedious engineering development so they can focus on the iterative optimization of business and algorithm logic. For specific practice, please refer to an earlier post from the Meituan technical team, “One-stop Machine Learning Platform Construction Practice”.

With the machine learning platform, feature platform, AB platform, etc. largely complete, the delivery technology team found that online prediction had gradually become the bottleneck of algorithm development and iteration, so we began the overall development of the Turing online service framework. This article discusses in detail the design and practice of the online service framework in the Turing platform, Turing OS (Online Serving), and we hope it helps or inspires you.

As the Turing platform has gradually matured, more than 18 business parties inside and outside Meituan Delivery have accessed it. The overall picture is roughly as follows: more than 10 BUs (business units) are covered, including 100% of Meituan Delivery’s core business scenarios, with 500+ online models, 2500+ features, and 180+ algorithm strategies supported, serving tens of billions of online predictions every day. Enabled by the Turing platform, the algorithm iteration cycle has been reduced from days to hours, greatly improving the iteration efficiency of delivery algorithms.

1. Turing platform introduction

The Turing platform is a one-stop algorithm platform. Its overall architecture is shown in Figure 1 below. The underlying layer relies on Kubernetes and Docker to achieve unified scheduling and management of CPU/GPU and other resources, and integrates machine learning/deep learning frameworks such as Spark ML, XGBoost, and TensorFlow. It provides one-stop functions such as feature production, model training, model deployment, online inference, and AB experiments, supporting AI applications such as delivery scheduling, time estimation, delivery range, search, and recommendation for business units such as Meituan Delivery, flash sales, cycling, grocery shopping, and maps. The Turing platform mainly includes four parts: the machine learning platform, the feature platform, Turing online service (Online Serving), and the AB experiment platform.

Figure 1 Overall architecture of Turing platform
  • Machine learning platform: provides model training, task scheduling, model evaluation and model tuning, and implements drag-and-drop visual model training based on DAG.
  • Feature platform: provides online and offline feature production, feature extraction and feature aggregation, and pushes them to the online feature database to provide high-performance feature acquisition services.
  • Turing online service: Online Serving, hereinafter referred to as Turing OS, provides a unified platform solution for feature acquisition, data preprocessing, online deployment of models and algorithm strategies, and high-performance computing.
  • AB experiment platform: provides functions such as AA grouping in advance, AB triage during the event and effect evaluation after the event, covering the complete life cycle of AB experiments.


Turing OS refers to the online service module of the Turing platform. It focuses on machine learning/deep learning online services, with the goal of bringing offline-trained models online quickly, effectively improving the algorithm iteration efficiency of each business department so that results come quickly and generate value for the business. The following focuses on Turing Online Serving.

2. The construction background of Turing OS

In the early stage of Meituan’s delivery business, in order to support rapid business development, fast algorithm launches, and rapid trial and error, the engineering side of each business line independently developed a series of online prediction functions, in what is known as the “chimney mode”. This mode is siloed but flexible, and can quickly support the individual needs of each business. However, as the business scale gradually expanded, the shortcomings of the “chimney mode” became prominent, mainly in the following three aspects:

  • Reinventing the wheel: feature acquisition and preprocessing, feature version switching, model loading and switching, online prediction, and AB experiments are all developed independently, from scratch, by each business line.
  • Lack of platform capabilities: there is no platform-based O&M, management, monitoring, or tracking for the complete life cycle of feature and model iteration and launch, resulting in low R&D efficiency.
  • Serious coupling between algorithm and engineering: the boundary between algorithm and engineering is blurred, the coupling is heavy, the two restrict each other, and algorithm iteration efficiency is low.

The “chimney mode” made an indelible contribution in the early stages of business development, but as business volume grew, the marginal benefits of this approach gradually decreased to an intolerable level, and a unified online service framework was urgently needed.

At present, most of the mainstream open source machine learning online service frameworks on the market only provide model prediction functions, not including pre- and post-processing modules, as shown in Figure 2 below.

Figure 2 Machine learning online service

For example, Google’s TensorFlow Serving is a high-performance open-source online service framework for machine learning model serving. It provides gRPC/HTTP interfaces for external calls, supports model hot updates and automatic model version management, and solves pain points such as resource scheduling and service discovery, providing a stable and reliable service. However, TensorFlow Serving does not contain preprocessing or post-processing modules: business engineers need to preprocess inputs into tensors, pass them to TensorFlow Serving for model computation, and then post-process the results. Pre- and post-processing logic is very important to the algorithm strategy and iterates frequently; it is tightly coupled with the model, so it is better owned by algorithm engineers. If it is implemented on the engineering side, engineering staff merely implement logic designed by algorithm engineers, the coupling is too heavy, iteration efficiency is low, and design and implementation can easily diverge, causing online incidents.

In order to solve the above problems and provide users with a more convenient and easy-to-use algorithm platform, the Turing platform built a unified online service framework that integrates model computation and preprocessing/post-processing modules, packages them as algorithm versions for iteration, and shields the complex interaction between algorithms and engineering.

Here we broaden the definition of “algorithm”: an algorithm (also called an algorithm strategy) in this article can be understood as a combined function y = f1(x) + f2(x) + … + fn(x), where each fi(x) can be a rule computation, a model computation (machine learning or deep learning), or a non-model algorithm computation (such as genetic algorithms or operations research optimization). Any adjustment of a factor in this combined function (such as a change of model inputs/outputs, a change of model type, or a rule adjustment) can be regarded as an iteration of the algorithm version. Algorithm iteration is a cyclical process of development, launch, effect evaluation, and improvement. The goal of Turing OS is to optimize the iteration efficiency of this cycle.
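The combined-function view above can be sketched in a few lines. This is a minimal illustration, not Turing code: the two factors stand in for a rule computation and a model computation.

```java
import java.util.List;
import java.util.function.DoubleUnaryOperator;

// Illustrative sketch: an algorithm version as a sum of factors f1..fn.
public class AlgorithmVersion {
    private final List<DoubleUnaryOperator> factors;

    public AlgorithmVersion(List<DoubleUnaryOperator> factors) {
        this.factors = factors;
    }

    // y = f1(x) + f2(x) + ... + fn(x)
    public double compute(double x) {
        return factors.stream().mapToDouble(f -> f.applyAsDouble(x)).sum();
    }

    public static void main(String[] args) {
        // One "rule" factor and one "model" factor (both stand-ins).
        AlgorithmVersion v1 = new AlgorithmVersion(List.of(
                x -> 2 * x,    // rule computation
                x -> x * x));  // model computation (stand-in)
        System.out.println(v1.compute(3.0)); // 2*3 + 3*3 = 15.0
    }
}
```

Swapping, adding, or re-tuning any factor in the list is exactly what the article calls an iteration of the algorithm version.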

3. Turing OS 1.0

3.1 Turing OS 1.0 Introduction

In order to solve the problems of reinventing the wheel and lacking platform capabilities in the “chimney mode”, we set out to build the Turing OS 1.0 framework. The framework integrates model computation, preprocessing, and post-processing modules, and encapsulates the complex logic of feature acquisition and preprocessing, model computation, and post-processing in the Turing online service framework in the form of an SDK. Algorithm engineers develop personalized pre- and post-processing logic based on the Turing online service SDK; business engineering integrates the Turing online service SDK and the algorithm package, and calls the interfaces provided by the SDK for model and algorithm computation.

Through Turing OS 1.0, we solved the problem of each business party independently developing, independently iterating, and reinventing the wheel, greatly simplifying the work of algorithm engineers and engineering R&D personnel. Business engineering calls algorithm preprocessing and model computation indirectly through the Turing online service framework rather than interacting with the algorithm directly, which alleviated the coupling between engineering and algorithm to a certain extent.

As shown in Figure 3, the Turing Online Services framework at this stage integrates the following features:

Figure 3 Turing OS 1.0

3.1.1 Feature acquisition

  1. Provides highly available, high-performance online feature acquisition and computing capabilities.
  2. The feature acquisition process is configured through a custom MLDL (Machine Learning Definition Language), unifying the feature acquisition process and improving the ease of use of online service features.
  3. DLBox (Deep Learning Box) supports placing the original vectorized features and the model on the same node for local computation, solving the performance problem of recalling large-scale data in deep learning scenarios, and supporting the high concurrency and rapid algorithm iteration of various services.

3.1.2 Model calculation

  1. Supports two model deployment modes, Local and Remote, corresponding to deploying models inside the business service or in a dedicated model online service cluster. Solves the performance problems of large-scale model computation through multi-machine asynchronous parallel computing and support for heterogeneous CPU/GPU resources, and solves the problem that a single machine cannot load a hyper-scale model through model sharding.
  2. For deep learning model computation, uses compilation and optimization technologies such as the high-performance computing acceleration library MKL-DNN and TVM to further improve inference performance.
  3. Through the model-feature association relationships and preprocessing logic encapsulated in MLDL configuration, automates feature acquisition, feature processing, and assembly, improving the efficiency of model development and iteration.
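The Local vs. Remote split in item 1 can be pictured as a tiny routing function. This is a hedged sketch with invented names, not the Turing implementation: Local stands for in-JVM computation, Remote for an RPC to a model cluster.

```java
import java.util.function.Function;

// Illustrative sketch of the Local/Remote model deployment choice.
public class ModelRouter {
    public enum Mode { LOCAL, REMOTE }

    // Returns a "model" as a function from a feature vector to a score.
    public static Function<double[], Double> resolve(Mode mode) {
        if (mode == Mode.LOCAL) {
            // LOCAL: the model runs in-process with the business service.
            return features -> {
                double s = 0;
                for (double f : features) s += f;
                return s;
            };
        }
        // REMOTE: stand-in for an RPC call to a dedicated model cluster.
        return features -> (double) features.length;
    }

    public static void main(String[] args) {
        System.out.println(resolve(Mode.LOCAL).apply(new double[]{1, 2, 3}));  // 6.0
        System.out.println(resolve(Mode.REMOTE).apply(new double[]{1, 2, 3})); // 3.0
    }
}
```

In the real framework this choice (and model-cluster routing) is hidden behind the serving layer, so callers never branch on the mode themselves.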

3.1.3 Algorithm calculation

  1. Supports algorithm version management, AB routing, dynamic acquisition of the models, features, and parameters associated with an algorithm version, and hot updates of models and parameters.
  2. Supports AB experiments and flexible grayscale release and ramp-up, and evaluates AB experiment effects through unified tracking logs.

3.2 Turing OS 1.0 legacy issues

Turing OS 1.0 solved the problems of reinventing the wheel, chaotic feature handling, and lack of platform capabilities across business lines, and supported the large-scale online prediction and high-performance computing needs of each business line of Meituan Delivery through one-stop platform services, letting algorithm engineers focus more on the iterative optimization of the algorithm strategy itself and improving algorithm iteration efficiency. However, the aforementioned three-way coupling between engineering, algorithm, and platform had not been well resolved, mainly reflected in the following:

  1. Business engineering statically depends on the algorithm package; the algorithm package is deployed inside business engineering, so updating and iterating the algorithm package requires a business engineering release.
  2. The algorithm package runs in the same JVM as business engineering. Although this reduces RPC overhead, the computing performance of the algorithm package affects the performance of business engineering, and the stability of business engineering becomes uncontrollable, for example through excessive CPU consumption during TensorFlow model computation, or memory consumption when loading and switching large models.
  3. As the Turing platform provides more and more functions, the Turing online service SDK becomes increasingly bloated. Business engineering must upgrade the SDK to use new Turing features, but upgrading the SDK is risky and slows down the deployment of business engineering.
Figure 4 Schematic diagram of three-way high coupling

Based on the above points, it can be seen that the high coupling of algorithm, engineering, and the Turing platform causes pain points for all three parties, as shown in Figure 4. These problems seriously affect algorithm iteration efficiency: the algorithm iteration and online testing cycle is long, and efficiency is low:


  • Algorithm pain points: algorithm package iteration strongly relies on business engineering releases; each release needs a complete R&D and testing cycle, so the process is long and efficiency is low.
  • Engineering pain points: The algorithm package is in the same JVM as the business engineering, and the performance of the algorithm calculation will affect the performance of the business engineering service. At the same time, business engineering needs to follow the iteration of the algorithm package and release frequently, and the change may only involve upgrading the version of the algorithm package.
  • Turing platform pain points: the Turing online service SDK is deployed inside business engineering, making version convergence and compatibility difficult; at the same time, promoting new Turing features is hard, since it requires business engineering to upgrade the SDK.

Therefore, it is necessary to better decouple the algorithm, engineering, and the Turing platform, meeting both the need for rapid algorithm iteration and the stability requirements of business engineering, achieving a win-win.

4. Turing OS 2.0

In response to the pain points of high coupling between algorithm, engineering, and the Turing platform in the Turing OS 1.0 framework, we developed the Turing OS 2.0 framework. Its goal is to decouple the three parties, so that algorithm iteration does not rely on engineering releases and new Turing platform features do not require business engineering to upgrade the SDK, further improving the efficiency of algorithm iteration and engineering development.

Focusing on the goal of decoupling the algorithm, engineering, and the Turing platform, in the Turing OS 2.0 framework we designed and developed functions such as a plug-in hot deployment framework for algorithm packages, algorithm data channels, and an algorithm orchestration framework, supporting self-service iterative launch of algorithms. At the same time, we built an algorithm verification platform integrating sandbox traffic diversion, real-time playback, performance stress testing, and debug testing to ensure the high performance, correctness, and stability of algorithm strategies. The Turing OS 2.0 framework decouples the three parties and closes the iteration loops of algorithm and engineering separately: most algorithm iterations require no participation from engineering R&D personnel or test engineers, and algorithm engineers can complete an online iteration of an algorithm strategy within an hour. Empowered by Turing OS 2.0, algorithm R&D and iteration efficiency has improved greatly.

Figure 5 Turing OS Framework V2.0

The specific functional features of Turing OS 2.0 are as follows:

  • Standardized lightweight SDK: business engineering only needs to depend on a lightweight Turing OS SDK, without frequent upgrades, reducing the difficulty of engineering access and decoupling business engineering from the Turing platform.
  • Algorithm plug-ins: a self-developed Turing algorithm plug-in framework supports hot-deploying algorithm packages as plug-ins inside the Turing OS service, decoupling algorithm and engineering; multiple versions of multiple algorithm packages can be deployed in one Turing OS service, each with independent thread pool resources.
  • Data channels: in some complex algorithm scenarios, the algorithm strategy still relies on business engineering in two ways: 1) to obtain data, the algorithm can only have business engineering call an interface and pass the result in; 2) to call another algorithm, it can only go through business engineering as a relay. To solve these two points, we propose the concept of data channels, giving the algorithm the ability to obtain data autonomously instead of having all data fetched by business engineering and passed through.
  • Algorithm orchestration: multiple algorithms are combined serially or in parallel into a directed acyclic graph (DAG), which can be regarded as an algorithm orchestration; the abstraction and consolidation of business algorithms corresponds, in the new architecture, to the combination and orchestration of algorithms. Algorithm orchestration further empowers business launches and algorithm iteration, further improves iteration efficiency, and further decouples algorithm and engineering.
  • Sandbox traffic diversion: the Turing sandbox is a service physically isolated from Turing OS but with a completely consistent operating environment; traffic passing through the sandbox has no impact on online services. The sandbox can verify the correctness of algorithm logic and evaluate the performance of algorithm computation, improving R&D and testing efficiency.
  • Turing playback and unified event tracking: algorithm and model computation generate a lot of important data (algorithm strategies, models, features, parameters, data channel data, etc.). This data not only helps quickly locate system problems, but also provides an important data foundation for AB experiment reports, sandbox traffic diversion, and performance stress testing. To better record, store, and use this data automatically, we designed a real-time playback platform and unified event tracking.
  • Performance stress testing: Turing OS integrates the capabilities of Quake, Meituan’s full-link stress testing system, reuses the traffic data collected by the unified playback platform to construct requests, and stress tests the sandbox where a new version of the algorithm package is deployed, ensuring the performance and stability of algorithm strategy iteration.
Figure 6 Overall architecture of Turing OS 2.0

The following will introduce the above several functional features to see how Turing OS 2.0 solves the pain points of three-way coupling of algorithms, engineering and the Turing platform.

4.1 Standardized lightweight SDK

In order to solve the coupling pain point between business engineering and the Turing platform, namely that the Turing online service SDK is deployed inside business engineering and its versions are hard to converge, we split and transformed the Turing online service SDK with lightweighting, ease of access, stability and extensibility, and safety and reliability in mind:

  • Lightweight SDK: sink the original Turing OS SDK logic into the Turing OS service, leaving only a simple, general batch prediction interface. The SDK does not expose algorithm-related details; algorithm version routing, real-time/offline feature acquisition, model computation, etc. are hidden inside Turing OS. The lightweight SDK integrates Turing OS’s custom routing, so the business side does not need to care which Turing OS cluster an algorithm package is deployed in; it is completely transparent to the user.
  • Simple and easy to access: provides a unified, general Thrift interface for algorithm computation and uses Protobuf/Thrift to define algorithm inputs and outputs. Compared with defining interfaces via Java classes, compatibility is easier to guarantee; once the Protobuf interface is defined, algorithm and engineering can code independently.
  • Extensible: the lightweight SDK version is stable, eliminating repeated upgrades on the engineering side; Protobuf naturally supports serialization, on which subsequent traffic copying, playback, and event tracking can be based.
  • High performance: for large-batch algorithm computation scenarios that require high availability, such as batch prediction for C-end users, we designed asynchronous batching and high parallelism to improve computing performance; for scenarios where a single task takes a long time, consumes much CPU, and requires high availability, such as city-wide scheduling path planning, we designed a client fast-fail optimal retry mechanism to ensure high availability and balance Turing OS computing resources.
  • Safe and reliable: provides thread-pool-level resource isolation for multiple algorithm packages deployed in a single Turing OS, vertically splits algorithm packages by business scenario for different business lines with physical cluster-level resource isolation, and adds circuit-breaker and degradation mechanisms to ensure a stable and reliable computing process.
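The “single general batch prediction interface” idea can be sketched as follows. All names here are illustrative stand-ins, not the real Turing API; the stub client takes the place of the Thrift-RPC implementation that would talk to a Turing OS cluster.

```java
import java.util.List;
import java.util.Map;

// Hypothetical sketch of a lightweight SDK surface: one batch-predict call
// that hides routing, feature fetching, and model computation.
public class TuringOsSdkSketch {

    // One prediction request: which algorithm to run, an AB key, raw inputs.
    public record PredictRequest(String algorithmName, String abKey,
                                 Map<String, Object> features) {}

    public record PredictResponse(String algorithmVersion, double score) {}

    // The single, general interface the business side depends on.
    public interface TuringOsClient {
        List<PredictResponse> batchPredict(List<PredictRequest> requests);
    }

    // Stub standing in for the real Thrift-RPC client.
    public static TuringOsClient stubClient() {
        return requests -> requests.stream()
                .map(r -> new PredictResponse("v1.0", r.features().size() * 1.0))
                .toList();
    }

    public static void main(String[] args) {
        TuringOsClient client = stubClient();
        List<PredictResponse> out = client.batchPredict(List.of(
                new PredictRequest("eta_model", "exp_42", Map.of("distance", 3.2))));
        System.out.println(out.get(0).score());
    }
}
```

Because the interface is batch-shaped and data-only, new platform features can land behind it without the business side upgrading anything.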

4.2 Algorithm plug-in

By standardizing and lightweighting the Turing OS SDK, we solved the coupling pain point between business engineering and the Turing platform; by turning Turing OS into a service, we solved the coupling pain point between algorithm and business engineering. However, the coupling between the algorithm and the Turing platform remained and even grew: the iterative launch of an algorithm relied on a Turing OS service release, falling short of the goal of three-way decoupling.

In order to solve the coupling pain point between the algorithm and the Turing platform and further improve the iteration efficiency of algorithm strategies, our next design idea was algorithm plug-ins and Turing OS containerization: the algorithm package is deployed to Turing OS as a plug-in, and releasing an algorithm package requires neither a Turing OS release nor even a Turing OS restart, as shown in Figure 7.

  • Algorithm plug-ins: we developed the Turing OS algorithm plug-in framework, which supports deploying algorithm packages to the Turing OS service in the form of plug-ins. The specific implementation customizes an algorithm ClassLoader, with different ClassLoaders loading different algorithm package versions; hot deployment of algorithm packages is achieved by loading multiple versions and replacing the pointer.

  • Turing OS containerization: Turing OS acts as a plug-in container that loads algorithm packages of different algorithm versions and performs algorithm version routing and algorithm strategy computation. After the containerization transformation: 1) if an algorithm version needs no new parameters, neither the engineering side nor Turing OS needs a release; 2) the main job of business engineering is to pass parameters to the algorithm, and its logic is simple: if the input parameters do not change, no release is needed, and the release rhythm of algorithm package versions is controlled by the algorithm side itself.
Figure 7 Turing OS containerization-algorithm plug-in diagram
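The per-version ClassLoader and “pointer replacement” mechanism can be sketched in a few lines. This is a minimal illustration only; the real plug-in framework would use parent-last loading and manage plugin lifecycles, and all names here are invented.

```java
import java.net.URL;
import java.net.URLClassLoader;
import java.util.concurrent.atomic.AtomicReference;

// Minimal hot-deployment sketch: each algorithm-package version gets its own
// ClassLoader, and activating a new version is an atomic pointer swap.
public class AlgorithmPluginContainer {

    // Holds the "current" loader; swapping it activates the new version.
    private final AtomicReference<ClassLoader> current = new AtomicReference<>();

    public void deploy(URL[] algorithmPackageJars) {
        ClassLoader newLoader = new URLClassLoader(algorithmPackageJars,
                AlgorithmPluginContainer.class.getClassLoader());
        current.set(newLoader); // pointer replacement: new requests see the new version
        // the old loader and its classes become garbage once no thread uses them
    }

    public Class<?> loadAlgorithm(String className) throws ClassNotFoundException {
        return current.get().loadClass(className);
    }

    public static void main(String[] args) throws Exception {
        AlgorithmPluginContainer container = new AlgorithmPluginContainer();
        container.deploy(new URL[0]); // empty plugin: delegates to the parent loader
        System.out.println(container.loadAlgorithm("java.lang.String").getName());
    }
}
```

Because each version lives in its own loader, two versions of the same class can coexist during a grayscale rollout, and rollback is just swapping the pointer back.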

4.3 Data channel

Through the above means, we solved the coupling problems of algorithm, engineering, and the Turing platform in release iteration. However, in some complex algorithm scenarios there is still coupling between algorithm and business engineering, mainly reflected in the algorithm’s reliance on two kinds of data from business engineering:

  1. The algorithm obtains data internally: at present, business engineering calls an interface and passes the result to the algorithm, such as service interface data or distributed KV cache data, so algorithm and business engineering must develop and go online together for each iteration.
  2. The algorithm is called internally: at present, business engineering calls algorithm A and algorithm B and writes the intermediate logic, for example when algorithm A’s input needs algorithm B’s result, or when the results of A and B must be combined into the final output. One option is to merge algorithms A and B into one large algorithm, but the disadvantage of that scheme is the extra cost of running AB experiments and grayscale releases on A and B independently.

To solve the above two points, we propose the concept of a data channel, giving the algorithm the ability to obtain data autonomously. Inside the algorithm, data channels are declared through annotations provided by Turing OS; the interface between the algorithm and business engineering then only needs to pass a few key parameters and context data, and the algorithm assembles the parameters required by the data channel itself. After this transformation, the algorithm interface is further simplified and the coupling between algorithm and engineering is further reduced; the problem of algorithms calling algorithms is solved by the algorithm orchestration introduced below.
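The annotation-driven data channel idea can be sketched as follows. The annotation name, the `source` attribute, and the registry are all hypothetical; the point is that the algorithm declares what it needs and a framework injects it, so business engineering no longer fetches and forwards the data.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Field;
import java.util.Map;

// Illustrative sketch of an annotation-based data channel.
public class DataChannelSketch {

    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.FIELD)
    public @interface DataChannel {
        String source(); // e.g. a KV cache or service interface (illustrative)
    }

    public static class EtaAlgorithm {
        @DataChannel(source = "rider_position_kv")
        public Object riderPosition; // filled in by the framework, not the caller
    }

    // Stand-in for the framework: resolves each annotated field from a registry.
    public static void inject(Object algorithm, Map<String, Object> registry)
            throws IllegalAccessException {
        for (Field f : algorithm.getClass().getDeclaredFields()) {
            DataChannel dc = f.getAnnotation(DataChannel.class);
            if (dc != null) {
                f.setAccessible(true);
                f.set(algorithm, registry.get(dc.source()));
            }
        }
    }

    public static void main(String[] args) throws Exception {
        EtaAlgorithm algo = new EtaAlgorithm();
        inject(algo, Map.of("rider_position_kv", "(31.23, 121.47)"));
        System.out.println(algo.riderPosition);
    }
}
```

With this shape, the caller passes only key parameters and context; the data dependency lives with the algorithm, where it iterates.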

4.4 Algorithm orchestration

A complete algorithm computation process includes the algorithm computation part as well as preprocessing logic for the input and post-processing logic for the computation results. The algorithm computation can be N rule computations, N model computations (machine learning, deep learning, etc.), non-model algorithm computations (such as genetic algorithms or operations research optimization), or a combination of several types. We abstract such a computational logic unit with independent input and output as an operator, which can be orchestrated and reused. There are generally two types of operators:

  1. Model calculation operator: the model computing engine performs model computation. We support two computing modes, Local and Remote; in Remote mode, models may be deployed in different model clusters. The operator is a further encapsulation of model computation: Local/Remote selection, model cluster routing, and similar functions are transparent to users, algorithm engineers do not need to perceive them, and we adjust them dynamically according to overall computing performance.
  2. Algorithm calculation operator: the algorithm computing engine inside Turing OS performs algorithm strategy computation. Different algorithm plug-ins may be deployed in different Turing OS instances, and the Turing OS cluster routing is likewise encapsulated and transparent to users.

Multiple operators are combined into a directed acyclic graph (DAG) by serial or parallel to form operator orchestration, and we currently have two ways to achieve operator orchestration:

  1. Algorithm data channel: algorithm computing engines in different Turing OS instances call each other, or an algorithm computing engine calls a model computing engine; the algorithm data channel is a concrete means of realizing operator orchestration.
  2. Algorithm master control logic: we extract a master control logic layer above algorithm invocation to handle complex algorithm scenarios with multiple inter-algorithm dependencies. The master control logic is implemented by algorithm engineers inside the algorithm package; through it, algorithm engineers can arbitrarily orchestrate the relationships between algorithms and further decouple algorithm and engineering.

From the perspective of algorithm engineers, Turing OS provides services in the form of building blocks, and connects them in series and parallel in a standard way by combining independent sub-functions and operators, so as to form an online system that meets various needs.

Figure 8 Algorithm online service architecture based on operator orchestration

Under this architecture, the work of the algorithm mainly has the following three parts: 1) the algorithm engineer abstracts and models the business process; 2) Algorithm engineers conduct independent operator development and testing; 3) Algorithm engineers orchestrate and combine operators based on business process abstraction. Operator orchestration further empowers the launch of business functions and algorithm iteration, and further improves the efficiency of business algorithm iteration.
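The “building blocks” composition of operators can be sketched as a tiny serial pipeline. This is an illustration under invented names, not the Turing orchestration engine; a real DAG engine would also run independent branches in parallel.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

// Sketch of operator orchestration: independent operators composed serially.
public class OperatorOrchestration {

    interface Operator extends Function<Map<String, Object>, Map<String, Object>> {}

    // Serial composition: the output map of one operator feeds the next.
    static Operator serial(Operator... ops) {
        return ctx -> {
            Map<String, Object> cur = ctx;
            for (Operator op : ops) cur = op.apply(cur);
            return cur;
        };
    }

    // A toy two-operator pipeline: a "model" operator scores the distance,
    // then a rule operator turns the score into an ETA.
    public static double etaFor(double distance) {
        Operator modelOp = ctx -> {
            Map<String, Object> out = new LinkedHashMap<>(ctx);
            out.put("score", ((Number) ctx.get("distance")).doubleValue() * 0.5);
            return out;
        };
        Operator ruleOp = ctx -> {
            Map<String, Object> out = new LinkedHashMap<>(ctx);
            out.put("eta_min", 5 + (double) out.get("score"));
            return out;
        };
        Map<String, Object> input = Map.of("distance", distance);
        return (double) serial(modelOp, ruleOp).apply(input).get("eta_min");
    }

    public static void main(String[] args) {
        System.out.println(etaFor(4.0)); // score = 4.0*0.5 = 2.0, eta = 5 + 2.0 = 7.0
    }
}
```

Because each operator has an independent input and output, it can be tested, AB-tested, and reused on its own, which is exactly what makes the orchestration approach iterate faster than one monolithic algorithm.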

4.5 Multi-mode integration

As described above, Turing OS, acting as a container, can deploy multiple versions of multiple algorithm packages and supports hot deployment of algorithm packages. Through plug-in hot deployment and orchestration, Turing OS decouples business engineering, algorithms, and the Turing platform, greatly improving algorithm iteration efficiency. To further meet business requirements, we provide two Turing OS deployment and integration modes: Standalone mode and Embedded mode.

Standalone (independent mode)

In Standalone mode, Turing OS is deployed independently of business services. A business service calls algorithms through the lightweight SDK, which encapsulates Turing OS's custom routing and the Thrift-RPC logic for calling Turing OS services.

Embedded (inline mode)

Some complex scenarios with high concurrency and high performance requirements place higher demands on Turing OS's integration mode and performance. In the standalone deployment mode, the business service pays an RPC cost for every algorithm calculation, so we implemented a new integration mode for Turing OS: Embedded. In Embedded mode, we provide the Turing OS framework as a code package; the business side integrates this framework package into its own service, which then also acts as a Turing OS container while still calling algorithms through the lightweight SDK, performing the algorithm calculations locally inside the business service. The characteristics of embedded Turing OS are as follows:


  1. Because it integrates the Turing OS framework code, business engineering inherits the algorithm-package plug-in and hot-deployment capabilities and takes on a dual role: business service and Turing OS container.
  2. Business engineering does not depend on algorithm packages directly; they are managed dynamically by the Turing OS framework, and plug-in hot deployment of algorithm packages decouples algorithms from engineering.
  3. Business engineering performs algorithm calculations locally, which eliminates the RPC and serialization cost of algorithm calls and reuses the business service's server resources, further reducing cluster resource consumption and improving resource utilization.

When an algorithm package is deployed as a plug-in, a business service integrated in Embedded mode loads the corresponding algorithm package as a container and routes the call to the local machine for algorithm calculation, as shown in Figure 9 below.

Figure 9 Schematic diagram of Turing OS integrated mode Embed/RPC
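The routing decision behind the two modes can be sketched roughly as follows. This is an illustrative model, not the real lightweight SDK: all class, method, and algorithm names ("eta", "price") are assumptions. A local (Embedded) plugin is preferred when present; otherwise the call falls back to the remote (Standalone) cluster:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

public class TuringClientSketch {
    private final Map<String, Function<String, String>> localAlgorithms; // plugins loaded in-process (Embedded)
    private final Function<String, String> remoteCall;                   // stands in for the Thrift-RPC path (Standalone)

    public TuringClientSketch(Map<String, Function<String, String>> localAlgorithms,
                              Function<String, String> remoteCall) {
        this.localAlgorithms = localAlgorithms;
        this.remoteCall = remoteCall;
    }

    // Route to a local plugin when present, avoiding RPC and serialization cost;
    // otherwise fall back to the remote Turing OS cluster.
    public String calculate(String algorithm, String request) {
        Function<String, String> local = localAlgorithms.get(algorithm);
        return local != null ? local.apply(request) : remoteCall.apply(request);
    }

    public static void main(String[] args) {
        Map<String, Function<String, String>> plugins = new HashMap<>();
        plugins.put("eta", req -> "local:" + req); // hypothetical locally-loaded algorithm
        TuringClientSketch client = new TuringClientSketch(plugins, req -> "rpc:" + req);
        System.out.println(client.calculate("eta", "order-1"));   // local:order-1
        System.out.println(client.calculate("price", "order-1")); // rpc:order-1
    }
}
```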

Standalone and Embedded modes each have their own advantages and disadvantages; neither is absolutely better, and the choice depends on the specific business scenario. The two modes compare as follows:

| Deployment mode | Advantages | Disadvantages | Applicable scenarios |
| --- | --- | --- | --- |
| Standalone | Lower coupling; the business side depends only on the Turing lightweight SDK | Requires building a Turing OS cluster, which occupies machine resources; each call incurs RPC overhead | Large-scale calls; business scenarios that need distributed, multi-machine asynchronous parallel computing |
| Embedded | Reuses business-side machines, high resource utilization; no RPC or serialization cost, high performance | Cannot fully exploit multi-machine asynchronous distributed parallelism; parallelism is limited to a single machine | Small-batch calls; business scenarios with strict RT requirements for a single call |

4.6 Turing sandbox

With hot deployment of algorithm plug-ins, algorithm iteration on Turing OS became far more efficient than before, and algorithm engineers gained much more freedom to go online, no longer needing to schedule development and testing work with the business engineering side. However, this also introduced new problems:

  1. Before an algorithm iteration goes online, it cannot be fed traffic for pre-computation, so its effect is hard to verify in advance and algorithm engineers' testing efficiency is low.
  2. Real-time online evaluation and verification are currently difficult; evaluating an algorithm strategy's online performance and effect lacks process-oriented automated tools.
  3. Frequent iterative releases also pose a great challenge to the stability of Turing OS and the business services.

At the time, the stopgap was to deploy the algorithm strategy online first, cut over a small grayscale portion of traffic, and then analyze the unified buried-point logs to evaluate the algorithm's effect. The flaw of this approach is that the effect cannot be evaluated before launch, so problems are discovered too late; and if the grayscale version has a problem, it affects online business and produces bad cases. In response to these problems in pre-launch verification, we developed the Turing sandbox to run full-link simulation experiments on algorithms without interfering with the stability of online business.

The Turing sandbox is a service that is physically isolated from the Turing OS service but runs in a fully identical environment; traffic passing through the sandbox has no impact on online services. As shown in Figure 10 below, online traffic is diverted to the sandbox in the online environment, and every aspect of the Turing OS and Turing sandbox environments is kept consistent (versions, parameters, features, models, etc.). A new algorithm version (version V3 of algorithm package 1 in Figure 10) is first deployed to the sandbox, where diverted traffic verifies the algorithm's correctness; diverted traffic can also be used in the sandbox for algorithm performance stress testing. As an automated tool for the algorithm verification process, the Turing sandbox improves algorithm testing efficiency and further speeds up algorithm version iteration.

Figure 10 Schematic diagram of Turing sandbox drainage verification
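The key property of sandbox drainage, that mirrored traffic must never affect the online response, can be sketched as below. The names are hypothetical, and the real Turing sandbox diverts traffic at the infrastructure level rather than in application code:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

public class SandboxMirror {
    // Sandbox results are recorded for offline verification, never returned to callers.
    public static final List<String> sandboxLog = new ArrayList<>();

    public static String handle(String request,
                                Function<String, String> online,
                                Function<String, String> sandbox) {
        try {
            sandboxLog.add(sandbox.apply(request)); // mirror a copy of the traffic
        } catch (RuntimeException ignored) {
            // a buggy sandbox version must stay invisible to production traffic
        }
        return online.apply(request); // only the online path answers the caller
    }

    public static void main(String[] args) {
        String resp = handle("req-1",
                r -> "v2:" + r,                                     // stable online version
                r -> { throw new RuntimeException("bug in v3"); }); // broken candidate in the sandbox
        System.out.println(resp); // v2:req-1 — the sandbox failure did not leak out
    }
}
```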


4.7 Unified playback platform

To facilitate analyzing algorithm effects and troubleshooting anomalies, we need to record the inputs, outputs, and the features and models used during algorithm calculation, so that the scene can be reconstructed afterwards. However, algorithm calculation produces a large amount of data, which poses challenges for recording and storage:

  1. Large data volume: a single request may correspond to multiple algorithm and model calculations, often using rich feature values, so the intermediate data produced is several times the request volume.
  2. High concurrency: centrally collecting and storing the data produced by all Turing OS services requires the capacity to carry the sum of these services' peak QPS.
  3. Strong customization: dozens of different algorithms are deployed on Turing OS, their request and response formats vary widely, and data such as features and data sources are hard to unify.

To record and store this important data well, Turing OS designed and built a unified playback platform that addresses the problems above, as shown in Figure 11 below:

  1. ES and HBase are combined to store the playback data: ES stores the key index fields and HBase stores the complete record. This plays to the strengths of both, meeting the needs of fast query and search as well as massive data storage.
  2. Using Google Protobuf's DynamicMessage capability, we extended the original Protobuf format to dynamically support defining and assembling playback data formats, with synchronization to the ES index. This preserves the high performance of serialization and storage while giving each algorithm's data efficient access.
  3. Since these queries have low timeliness requirements, a message queue decouples sending from storage, achieving traffic peak shaving and valley filling; algorithms on the Turing OS platform are connected to playback automatically through the playback client.

Figure 11 Schematic diagram of the Turing playback platform
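The ES-plus-HBase split can be illustrated with a toy sketch: only a few searchable key fields are extracted for the index, while the full record (features, model outputs, etc.) is kept for complete playback. The field names here are invented for illustration:

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class PlaybackRecord {
    // Extract the small set of searchable key fields destined for ES;
    // the full record is stored in HBase as-is under the same row key.
    public static Map<String, String> indexFields(Map<String, String> record, String... keys) {
        Map<String, String> index = new LinkedHashMap<>();
        for (String k : keys) {
            if (record.containsKey(k)) {
                index.put(k, record.get(k));
            }
        }
        return index;
    }

    public static void main(String[] args) {
        Map<String, String> full = Map.of(
                "traceId", "t-42",
                "algorithm", "eta-v3",
                "features", "large feature blob",
                "modelOutput", "0.87");
        // Only the searchable keys are indexed: {traceId=t-42, algorithm=eta-v3}
        System.out.println(indexFields(full, "traceId", "algorithm"));
    }
}
```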

4.8 Performance stress testing and tuning

Through the Turing sandbox and unified playback, Turing OS can quickly verify the correctness of algorithm data, but it still lacked automated tools for analyzing algorithm calculation performance. By integrating the company's full-link stress-testing system Quake (see "Practice of Quake in Meituan" for an introduction), Turing OS reuses the traffic data collected by the unified playback platform to construct requests, and stress-tests the Turing OS or Turing sandbox instance on which the new version of the algorithm package is deployed.

During the stress test, the algorithm's performance under different QPS levels is recorded, covering application metrics such as CPU and memory, and response metrics such as TP latency and timeout rate. These are compared with real online performance, historical stress-test data, and the committed service SLA, and a stress-test report with tuning guidance is produced. Turing OS is also connected to Meituan's internal performance diagnosis and optimization platform Scalpel, which generates analysis reports on thread stacks and performance hotspots during the stress test, helping users quickly locate performance bottlenecks and suggesting concrete optimization directions.

Figure 12 Turing full-link stress test and performance diagnosis diagram
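As a small aside on the TP-latency metrics compared in the stress-test report, a top-percentile latency can be computed from recorded samples as in this toy sketch (not part of Turing OS):

```java
import java.util.Arrays;

public class TpLatency {
    // Smallest recorded latency such that `percentile` percent of samples are <= it.
    public static long tp(long[] latenciesMs, double percentile) {
        long[] sorted = latenciesMs.clone();
        Arrays.sort(sorted);
        int idx = (int) Math.ceil(percentile / 100.0 * sorted.length) - 1;
        return sorted[Math.max(idx, 0)];
    }

    public static void main(String[] args) {
        long[] samples = {12, 15, 11, 90, 14, 13, 16, 10, 17, 500};
        System.out.println("TP50=" + tp(samples, 50)); // TP50=14
        System.out.println("TP90=" + tp(samples, 90)); // TP90=90
    }
}
```

A TP90 far above TP50, as in the sample above, is exactly the long-tail signal a report would flag against the SLA.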

5. Turing OS 2.0 construction achievements

5.1 Algorithm R&D process

Through Turing OS's algorithm plug-in transformation and dynamic hot-deployment capability, we decoupled algorithms, engineering, and the Turing platform, giving algorithm and engineering iteration their own closed loops. This improves R&D efficiency and greatly shortens the algorithm iteration cycle:

  • When models are iterated, features are changed, or algorithm strategies are iterated, the algorithm engineer can independently complete development and testing across the whole link, without involving engineering developers or test engineers. The algorithm package can also be deployed independently, with no business-service release; after launch, the engineering and product sides are notified to watch the changes in the relevant metrics.
  • When new business scenarios or new algorithm strategies are introduced, algorithms and engineering need joint development; once the Protobuf interface is defined, algorithm engineers and engineering developers can develop and launch their code independently.

With the automated tools Turing OS provides, such as sandbox drainage verification and stress-test diagnosis, algorithm strategy iteration becomes even more efficient, and the launch cycle of an algorithm iteration shrinks from days to hours. An algorithm engineer develops independently, deploys to Turing OS for self-testing and debugging, deploys a sandbox for drainage testing, evaluates effect and performance on the stress-test platform, and finally deploys online independently. The whole process requires neither engineering developers nor Turing engineers, achieving automated operation and maintenance; at the same time, these means safeguard the execution performance of algorithm strategies and the operational stability of Turing OS.

Figure 13 Turing algorithm research and development process

5.2 Turing OS 2.0 usage summary

Turing OS (i.e., the Turing online service framework 2.0) has now been under construction for more than half a year. The overall picture is roughly as follows: 20+ Turing OS clusters have been built, 25+ algorithm packages and 50+ algorithms have been onboarded, with 200+ algorithm-package deployments per month, supporting tens of billions of algorithm strategy calculations per day. Empowered by Turing OS, most algorithm iterations require no engineering developers or test engineers, and algorithm engineers can complete the iterative launch of an algorithm strategy within hours.

Currently, one Turing OS cluster can host multiple algorithm packages of a single business line, or the algorithm packages of multiple sub-business-lines within one department. Algorithm packages and Turing OS clusters can be dynamically associated and dynamically deployed, and Turing OS supports physical resource isolation at both the business-line and the algorithm-package level. To make adoption easy, we provide comprehensive onboarding documents and video tutorials; besides having the Turing platform build a Turing OS cluster, any business team can basically stand up its own Turing OS service within one hour. We also provide best-practice documents and performance-tuning configurations, so business teams can solve most problems on their own without guidance. We are now building automated operation and maintenance tools to further lower Turing OS's access threshold and O&M costs.

6. Summary and future outlook

Of course, no algorithm platform or algorithm online-service framework is perfect, and Turing OS still has plenty of room to grow. As we continue to explore online services for machine learning and deep learning, more and more application scenarios will need Turing OS support. Going forward, we will continue building in the following directions:

  1. Build Turing OS automated O&M tools and automated testing tools, support semi-automated algorithm development, and further reduce platform access and O&M costs.
  2. Further improve the Turing OS framework and its algorithm-support capabilities: support running in a Spark environment, so that when an algorithm is iterated, its new functionality can be verified for correctness, performance, and effect against massive data.
  3. Advance the construction of the Turing OS full-graph engine: by abstracting the common components of algorithm business, provide graphical process-orchestration tools and a graph execution engine, further empowering business launches and algorithm iteration and improving iteration efficiency.

7. About the author


Yongbo, Ji Shang, Yanwei, Extraordinary, and others are all from the algorithm platform group of the Meituan Delivery Technology Department, responsible for building the Turing platform and related work.