• 1. Business background

  • 2. Overall idea

  • 3. Scheme design

  • 4. Experimentation

    • >4.1

      Experiment setting4.2 Experiment 1: Flat machine resources

    • increase business revenue4.3 Experiment 2

    • : Flat business income, reduce machine resources

  • 5. Summary and prospect

At present, Meituan’s daily takeaway orders have exceeded 40 million, becoming one of Meituan’s most important businesses. The takeaway advertising service has also developed from the initial support of a single business line to more than ten business lines now, and the overall traffic of the service is increasing, and the machine resources consumed have also reached a certain scale.

In the takeaway scenario, the traffic presents an obvious double-peak structure, that is, the traffic peaks during lunch and dinner, and the traffic is small during the rest of the time. Under this traffic characteristics, the takeaway advertising service faces greater performance pressure during peak hours, and there is a large amount of computing power redundancy during off-peak hours. On the one hand, the computing power consumed by traffic is not dynamically allocated according to the traffic value, resulting in insufficient allocation of computing power in high-value traffic, and the value is not fully tapped, while on low-value traffic, there is a phenomenon of wasting a lot of computing power. On the other hand, during off-peak hours, the system traffic is low, resulting in low overall resource utilization of the system and limiting the system to obtain higher business benefits.

Therefore, computing power needs to be allocated more rationally to be used more efficiently. At present, there is little research on dynamic computing power allocation in the industry, represented by DCAF [1] of Alibaba’s targeted advertising platform, which distributes differentiated computing power according to traffic value, and allocates different candidate queue lengths to traffic of different values to maximize benefits under limited resource constraints. DCAF provides an excellent solution, but it has certain limitations in the takeaway advertising scenario.

For the takeaway advertising scenario, the takeaway advertising technology team has carried out a series of explorations and improvements on the basis of the DCAF solution, and for the first time tried to combine the elastic allocation of the queue and the elastic allocation of the model, and achieved good benefits. On the one hand, under the condition that machine resources are flat, CPM can be increased by 2.3%; On the other hand, machine resources can be reduced by 40% when the business income is flat, and finally we pushed the plan of flat machine resources in the stage of food delivery list advertising.

2. The overall idea

In the takeaway advertising engine, in order to cope with the extreme online traffic pressure and the huge candidate set, we design the entire retrieval process into a funnel-type cascade architecture with decreasing candidate sets, mainly including recall, coarse row, fine arrangement, mechanism and other modules.

The

overall idea of realizing intelligent computing power is to allocate differentiated computing power for different value traffic under the constraints of the system’s computing capacity, so as to improve the efficiency of computing power distribution in the advertising retrieval process and maximize revenue. Intelligent computing power mainly includes the following four elements

:

1. Traffic value quantification: Traffic value refers to the benefits brought by traffic to the platform, advertisers, and users, and the system needs to have the ability to quantify traffic value.

2. Traffic computing power: Traffic computing power refers to the machine resources consumed by traffic in the system,

in the takeaway advertising scenario, the computing power consumed by traffic is closely related to system variables such as candidate set size, number of recall channels, model size, link complexity, etc., and the same system needs to have the ability to quantify traffic computing power.

3. Quantification of system computing capacity: system computing capacity refers to the sum of machine resources of the system, which is consistent with the dimension of traffic computing power, and the computing capacity of the system can usually be obtained by pressure testing and other means; In the process of system computing power allocation, it is necessary to ensure that the overall traffic computing power consumption does not exceed the computing capacity of the system.

4. Intelligent computing power allocation: Based on the above three elements, in the whole link of the advertising engine for intelligent computing power allocation, we define the means of computing power distribution as “flexible actions”, in the takeaway advertising scenario, we mainly summarize the following four actions:

  • elastic queue: Online retrieval is a funnel process, and different value flows can be assigned different candidate queue lengths in each module of the cascade funnel.
  • Elastic model: In the model estimation

  • service, different models can be allocated for different value traffic, and the large model has a better estimation effect than the small model and consumes more computing power.
  • Elastic channels: In multi-channel recalls, different value flows can be assigned different recall channels.
  • Elastic link: On the retrieval link

  • , different value traffic can be assigned to the retrieval link with different complexity.

The selectable range of these elastic actions is defined as “elastic gears”, e.g. queue lengths 100 and 200 correspond to two different gears of elastic queues. Under intelligent computing power, the distribution process of computing power is the intelligent decision-making process of elastic action and elastic gear.

Challenge analysis

In order to land intelligent computing power in the takeaway advertising scenario, we mainly face the following challenges:

    > problem solving
    • challenge: the goal of intelligent computing power is to optimize the allocation of computing power resources, which requires us to solve the problem of “maximizing traffic benefits under the constraints of system computing power”.
    • Solution: Refer to the existing scheme, disassemble the problem into three sub-problems of traffic value estimation, traffic computing power estimation and computing power allocation, and explore and improve the takeaway advertising scenario.
  • System stability assurance
    • challenge point : The allocation of system computing power to the intelligent computing power framework, from equal amount of computing power to intelligent computing power allocation, not only needs to ensure the stability of the intelligent computing power framework itself, but also needs to ensure the smooth operation of the whole link of the system.
    • Response ideas: In addition to conventional support methods such as monitoring alarms and circuit breaker and downgrading, we have realized real-time control functions based on system status to ensure system stability.
  • Versatility & Extensibility
    • > challenge point : Taking into account the reuse of basic capabilities and the expansion of personalized capabilities, it supports the two major directions of takeaway advertising recommendation and search, and the access of multiple business scenarios.
    • Response idea: The core components provide reusable and scalable capabilities in the form of SDKs, and support the combination of multiple flexible actions and efficient access in multiple business scenarios based on general value evaluation indicators, computing power evaluation indicators and intelligent computing power frameworks.

3. Scheme design

After in-depth Co-Design by the engineering team and the algorithm team, we designed a set of intelligent computing power framework for multi-action combination decision-making. The whole framework is composed of decision-making components, collection components and regulation components, of which the decision-making component, as the core of the intelligent computing power framework, embeds the application service in the form of SDK, providing reusable and scalable multi-action combination optimal gear decision-making ability and system stability guarantee ability, empowering all stages of the advertising engine; Acquisition components and control components provide support for system stability. The following is mainly a detailed introduction to the two modules of optimal gear decision and system stability assurance.

3.1

Based

on the existing flexible queue solution scheme in the industry, we have carried out a series of explorations and improvements:

  • by selecting a more general traffic computing power evaluation index, At the same time, the traffic computing power estimation module is added to ensure the universality of quantitative indicators and improve the accuracy, so as to solve the problem of in-generalization and inaccuracy of computing power.
  • By constructing special gears, the combination of elastic queue and elastic model was tried for the first time, and the problem that some traffic could not be modeled was solved.

Based on the above strategies, we have realized the optimal gear decision of multi-elastic action combinations.

3.1.1 Problem modeling

existing schemes

DCAF [1] The problem is transformed into the corresponding dual problem to be solved, and the decision formula is obtained to realize flexible queue allocation.

The

above modeling scheme has the following problems in the takeaway advertising scenario:

    > the problem of computing power
    • problem description: In the original paper, the hashrate is only related to the queue length. In fact, in the takeaway advertising system, the computing power consumption of different traffic of the same length may be different (for example, traffic from different sources will follow different models and links), and the queue length alone cannot accurately quantify the computing power.
    • Solution: In order to model the computing power consumption of the traffic and fit the characteristics of the traffic side, we will use the computing power by Expand to , and increase the traffic computing power estimation module.
  • Low-value traffic modeling
  • problem

      > problem description : The derivation requires that arbitrary requests be satisfied, at which point some low-value, high-computing power consumption () will not be modeled. In addition, in the takeaway advertising scenario, because the return queue must meet business requirements such as fill rate, the queue length cannot be arbitrarily selected, and the gear space must be met and Indicates the upper and lower bounds of the currently requested gear, and the value is generally the minimum number of filled advertisements), which further leads to more traffic that does not meet the constraint.
    • Solution (elastic model): In order to reach all traffic in the takeaway ad scenario, we increase the number of stalls , in the actual landing, this gear corresponds to assigning the request to the simple model; Relatively complex models, after the simple model of traffic distribution, the computing power consumption tends to 0 (simple models can be vocabulary or LR models, etc.).

As shown in the figure, since the computing power and value of the gear are known, there is no need to estimate the value and computing power of different models. Subsequent traffic value estimation and traffic computing power estimation work are oriented to elastic queues.

3.1.2 Decision Framework

as shown The optimal gear decision-making module is divided into two stages: offline and online, including the following four submodules:

  • traffic value estimation module (offline + online): estimate the value of traffic in different gears.
  • Traffic computing power estimation module (offline + online): Estimate the computing power of traffic in different gears.
  • Offline λ Solve Module (offline): Solves the optimal λ using a dichotomous search algorithm by playing back historical traffic.
  • Online decision-making module (online): For online traffic, the optimal gear is calculated based on the gear decision formula, and different models and queue lengths are assigned according to the calculation results.

3.1.3 Traffic

value estimation

Traffic value estimation is the core of intelligent computing power decision-making, which requires certain accuracy. While online model estimation will increase the retrieval link time-consuming, we use the offline XGB model estimation + online vocabulary search scheme to ensure the accuracy of estimation and be lightweight enough.

Selection of value evaluation indicators: Generally speaking, traffic value refers to the revenue brought by the current traffic to the advertising platform; In the takeaway advertising scenario, we pay attention to the revenue of the platform as well as the revenue of the merchant, so the indicator of our traffic estimate is selected as flat platform income Merchants receive in.

As shown in the figure, the traffic value estimation module consists of two stages: offline and online.

Offline stage

:

    feature screening

  • & binning: Based on offline feature importance analysis and distribution, feature screening and binning are carried out.
  • The main problem < model

  • trainingul
  • class=”list-paddingleft-2>”

  • : in the early stage, we use a statistical scheme, and in the case of more feature bins, Data sparseness is a serious problem, and the larger the queue length, the more sparse the data.
  • Solution: XGB model is used instead of statistical scheme to enhance generalization ability.
  • Value store of buckets: The estimated value results of buckets with different features are written into the vocabulary in KV structure.
  • Online stage

    :

    • feature extraction & analysis: feature extraction and analysis, The key value is generated according to the offline bucket sorting rule.
    • Raw value estimate: Find the value of the corresponding bucket based on the key value, and then calculate the requested value under the original queue length by linear interpolation.
    • Gear value estimation: As shown in the figure below, with the help of rough row scoring, the value estimation of different gears is realized by calculating the attenuation of the value under different gears.

    3.1.4

    The landing of intelligent computing power in the traffic computing power estimation

    industry is mainly based on elastic queues, which generally use queue length as a traffic computing power evaluation index, and queue length as a traffic computing power evaluation index faces the following two problems:

      > Generality problem: In the elastic model, elastic channel, and elastic link, the computing power consumption of traffic is not uniquely determined by the queue length, for example, traffic from different sources may follow different models (the computing power consumption of different models may be different).
    • Accuracy: In the takeaway advertising scenario, even for elastic queue actions, the traffic computing power consumption and queue length are not simply linear.

    Selection of computing power evaluation

    indicators: In order to solve the above problems, we use the CPU time consumed by the traffic as the traffic computing power evaluation index.

    As shown in the figure, the traffic computing power estimation includes two stages: offline and online.

    Offline stage

    :

      feature screening

    • & binning: Based on offline feature importance analysis and distribution, feature screening and binning are carried out.
    • Model
    • training

      • training process: first divide the samples into different feature bins (the queue length is different in the same bin , other characteristics are the same), and then fit the relationship between the hashrate and queue length for different bins.
      • Main problem: Due to the uneven distribution of data, after the queue length is greater than a certain threshold, due to the sparse data, the statistical value of computing power in the bucket begins to fluctuate, which is not conducive to online decision-making.
      • In order to solve the problem of data sparseness and fit the unpacking phenomenon in real business, we use a piecewise linear fitting scheme to fit the relationship between computing power and queue length as a piecewise linear function (the figure below is the fitting result of the relationship between computing power and queue length in a feature bin).
    • Computing power vocabulary storage: The estimated computing power results of different feature buckets are written to the vocabulary in KV structure.

    Online stage

    :

    • feature extraction & analysis: feature extraction and analysis, The key value is generated according to the offline bucket sorting rule.
    • Computing power estimate: Find the computing power of the corresponding bucket according to the key value.

    3.1.5 Gear Decisions

    1. The offline λ solution

    is based on the value estimation and computing power estimation modules, and uses the binary search algorithm to solve the optimal λ by playing back the historical traffic.

    The core step of offline lambda solution is traffic playback: by replaying historical simultaneous traffic, online logic is multiplexed to simulate the decision of the optimal gear under the current lambda for each request.

    Main problem and solution

    • problem description: After the offline traffic is large, after dividing the traffic into multiple time slices, the solution process of λ needs to be played back multiple times, and the simulation complexity is high.
    • Solution: Quickly solve the optimal λ through flow sampling, multiple time slices and parallel solving.

    2. Online gear decision

    first defines a set of candidate gears (not included) based on the business scenario ), then iterate through the currently requested candidate gears and calculate the payoff for each gear , and finally make the decision according to the following rules.

    Optimal gear calculation:

    perform the computing power allocation action according to the

    gear:

      when the

    • maximum gear corresponds to the revenue, select the maximum revenue gear.
    • When, that is, the income corresponding to the maximum gear, the traffic distribution simple model.

    3.2 System stability guarantee

    Under intelligent computing power, the system

    is converted from equal computing power allocation to dynamic computing power allocation, in order to ensure the stability of system services, we provide conventional measures such as circuit breaker and downgrade, and also realize the PID real-time control function based on the system status.

      traffic access

    • : Provide different traffic access rules such as advertising space, city, and time period to control intelligent computing power traffic.
    • Monitoring and alarm:

    • Provides monitoring and threshold alarm for changes in gear, performance, and computing power in two dimensions: PV and the system as a whole.
    • Circuit breaker and degrading: Real-time monitoring of intelligent computing power abnormalities, after reaching the configured abnormal threshold, intelligent computing power will automatically fuse down and downgrade, and equal amount of computing power will be allocated.
    • Asynchronous decision-making

    • : In order to ensure that the overall link time of the main process does not grow, intelligent computing power decision-making is an asynchronous process, and equal computing power is allocated after timeout.

    3.2.2 PID

    real-time regulation

    PID (Proportion Integration Differentiation) based on system state is a mainstream control algorithm controlled by proportion, integration, and differentiation, and we perceive changes through real-time monitoring and change perception of system status. Real-time regulation of system status based on PID algorithm to ensure the stability of system status.

    The system status can usually be measured by the CPU/GPU utilization of the system, QPS, RT (Avg, TP99, TP999, etc.), call failure rate (FailRate) and other indicators.

    Based

    on this principle, we selected TP999, FailRate, and CpuUtils as the control targets.

    The control strategy

    is based on the PID governor and supports a variety of control strategies:

    • control action gear : For elastic queues, you can adjust the coefficient or upper limit of the queue length to adjust the candidate queue length.
    • Regulation decision formula

    • : The lambda coefficient of the decision decision formula can be adjusted to adjust the overall computing power consumption of the system.

    Governance process

    • System status reporting: The delivery engine service provides real-time feedback on the current status of the system through monitoring the system and synchronizes to MQ, in which the core indicators have a timeliness of 10s.
    • Acquisition components: Based on the Flink stream processing framework, real-time analysis and aggregation system status data are performed, denoising and smoothing are performed, and the processed data is written to KV storage.
    • Regulation components: Based on the PID algorithm, the system state changes are perceived by polling, the computing power of the system is regulated in real time according to the selected control target, and the control results are fed back to the decision-making components of the delivery engine to form a closed loop of regulation.

    After the intelligent computing power is connected to PID real-time regulation, when the system load is high, it can be quickly and stably and effectively feedback adjustment to maintain the system performance at the target water level.

    4

    . Experiment

    4.1 Experiment setting

    • selection of system computing capacity : In order to ensure that the online system can be quickly adjusted according to the real-time traffic, combined with the traffic characteristics of takeaway, we choose 15min as the smallest control unit; In the actual scenario, we select the peak time slice with the highest traffic and stable performance in the past few days, and count the total CPU time consumed in the slice as the system computing capacity C.
    • Baseline selection: The online traffic without intelligent computing power is selected as the control group.
    • Traffic value estimation: The traffic of the last multiple days is selected as the training data, and the XGB model is used to estimate and write the results to the vocabulary.
    • Traffic estimation: The traffic of the last multiple days is selected as the training data, the change of computing power with the length of the queue is fitted as a piecewise linear function, and the final estimated result is written into the vocabulary.
    • Offline λ solution: In the takeaway scenario, which is basically the same as the month-on-month traffic change trend, we calculate the optimal λ in each time slice (15 minutes as a time slice) offline by replaying yesterday’s traffic and storing it as a vocabulary.
    • Experimental idea: The system computing capacity C used in offline simulation can control the consumption of online computing power, so the experiment of flat machine resources or flat business income can be achieved by adjusting the system computing capacity C used for offline solution.

    4.2 Experiment 1: Flat machine resources to improve business revenue

    revenue source analysis

    • peak traffic hours, ensure system stability, and improve business benefits through differentiated computing power allocation.
    • During off-peak traffic hours, improve system resource utilization and convert idle machine resources into business benefits.

    4.3 Experiment 2: Flat business revenue, reduced machine resources

    Analysis of revenue sources:

    • to achieve the goal of reducing machine resources by suppressing the computing power consumption of the afternoon peak and evening peak. As can be seen from the figure below, during peak hours, the machine resource consumption of the experimental group accounts for about 60% of the control group.
    • At the same time, differentiated computing power allocation is carried out during peak hours, and resource utilization is improved during off-peak hours to fill in the overall business revenue.

    5. Summarize and look forward

    This article mainly introduces the thinking and optimization ideas of intelligent computing power in the construction process of takeaway advertising from 0 to 1 from two aspects: optimal gear decision-making and system stability guarantee.

    In the future, in terms of algorithm strategy, we will try to model and solve the optimal allocation of computing power under the full-link combination of the system based on the evolution algorithm and the reinforcement learning algorithm. In terms of engine architecture, the system simulation ability, online decision-making ability and stability guarantee ability are continuously optimized, and at the same time, it is tried to combine with the company’s elastic scaling system to exert the greater value of intelligent computing power.

    6. References

    [1] Jiang, B., Zhang, P., Chen, R., Luo, X., Yang, Y., Wang, G., … & Gai, K. (2020). DCAF: A Dynamic Computation Allocation Framework for Online Serving System. arXiv preprint arXiv:2006.09684.

    7. About the author

    Shunhui, Jiahong, Song Wei, Guoliang, Qianlong, Le Bin, etc., are all from Meituan’s takeaway advertising technology team.