Each article in this series grows out of real production needs: it tackles problems met in production practice and aims to help readers solve similar practical problems. Most of us have watched a live broadcast at some point, but if you were responsible for building your company’s overall live broadcast real-time data, have you thought about how you would do it? This series walks through the whole process of building live real-time data. If it helps you, likes and shares are welcome ~

  • “WHAT: Most of us have watched live broadcasts, and some of you may even be anchors or work on a business related to live broadcasting. Have you ever thought about which metrics matter most in a live broadcast business scenario, and which live broadcast data you need to track and build?”
  • “WHY: Why do you need to build real-time live broadcast data? Isn’t offline data enough?”
  • “HOW: How can real-time live broadcast data empower the business? How should live real-time data be divided into modules according to the needs of the company’s live broadcast scenarios? How do you build it?”
  • “WHO: Which components are needed to build live real-time data, and what is each component responsible for?”

Let’s start with the above questions ~

Live broadcast + short video: the next battlefield of content operations

With the development of Internet technology, online live broadcasting has attracted more and more attention, although its popularity has cooled recently after the explosive growth of a few years ago. As mobile devices spread and networks became faster, short videos, with their short, flat, fast, high-traffic style of distribution, quickly won over major platforms, fans, and investors, so many live broadcast apps began adding short-video features. At the same time, some apps built mainly around short videos added live broadcast features. Live broadcasts and short videos make up for each other’s shortcomings and complement each other, giving users a better experience and bringing more traffic to the major platforms, and the “live broadcast + short video” model has become a new development trend.

This series revolves around building live real-time data. This article, the first in the series, covers Requirements and Architecture. It is organized in three parts, in “WHY – WHAT – HOW” order, which answer the first three questions raised at the beginning; the “WHO” question is covered in the construction details of later articles in this series!

WHY: Why do you need to build real-time live broadcast data?

Compared with short videos, the connection between a live broadcast anchor and the audience is established inside the live broadcast room, and all their interaction happens only there. A live broadcast usually lasts only a few hours, so its production and consumption are far more time-sensitive than those of short videos. Live broadcast data therefore has a much stronger demand for real-time processing: an offline report produced after the fact would arrive once the broadcast it describes is already over.

WHAT: What live real-time data do you need to pay attention to and build?

In other words, which live real-time data to build is decided by the “needs of the data analysis business”.

A live broadcast is the channel through which the anchor and the audience connect and interact, and every operation in it revolves around these two parties. Data analysts start from this most basic perspective, so the first way to divide all live broadcast data is into “live production” (the anchor side) and “live consumption” (the audience side).

Data analysts also look for insights at two different levels, “global live broadcast business insights” and “single live room insights”, so the data can also be divided into “overall market data” and “single live room data”.

Together, these two perspectives cover essentially all the analysis requirements of the live broadcast business, so live real-time data can naturally be divided and built along them.

Putting this together, the overall “live real-time data business division and enabling application architecture” is shown in the following figure.

In this architecture, “real-time data of the overall live market” monitors the live broadcast business at a macro level and makes it possible to forecast the market. Its minute-grained time series quickly locates the peak moment of each live broadcast behavior (such as watching, liking, or commenting), and detailed attribution analysis can start from that moment. In addition, when the live broadcast business runs operational campaigns, real-time data shows a campaign’s effect quickly, so campaign strategies can be optimized in real time.
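To make the minute-grained time series concrete, here is a minimal Python sketch that buckets behavior events into one-minute windows and finds the peak minute of each behavior. The `(timestamp_ms, behavior)` event shape and the function names are hypothetical, and a production job would compute these windows continuously in a streaming engine rather than over an in-memory list.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Tuple

MINUTE_MS = 60_000  # one-minute buckets


def minute_series(events: Iterable[Tuple[int, str]]) -> Dict[str, Counter]:
    """Count events per behavior per minute; keys are minute-start timestamps in ms."""
    series: Dict[str, Counter] = defaultdict(Counter)
    for ts_ms, behavior in events:
        bucket = ts_ms - ts_ms % MINUTE_MS  # truncate to the start of the minute
        series[behavior][bucket] += 1
    return series


def peak_minutes(series: Dict[str, Counter]) -> Dict[str, Tuple[int, int]]:
    """Return (minute_start_ms, count) of the busiest minute for each behavior."""
    # Highest count wins; ties go to the earliest minute so the result is deterministic.
    return {
        behavior: min(counts.items(), key=lambda kv: (-kv[1], kv[0]))
        for behavior, counts in series.items()
    }


events = [(0, "like"), (30_000, "like"), (61_000, "like"),
          (62_000, "comment"), (65_000, "like"), (70_000, "like")]
print(peak_minutes(minute_series(events)))
# {'like': (60000, 3), 'comment': (60000, 1)}
```

The per-minute counts are the time series itself; the peak minute returned for each behavior is the moment from which detailed attribution can start.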

“Real-time data of a single live broadcast room” monitors the broadcast of one room at a fine-grained level. It can produce live data battle reports while the broadcast is in progress, and those reports in turn make it possible to evaluate, in real time, the effect of the resources invested in that room.

The detailed live real-time data requirements, with examples, are listed below.

The overall market

  • “Production side”
    • “Metrics”: total number of live broadcast rooms…
    • “Dimensions”: live broadcast room portrait, anchor user portrait
    • “Example”: [total number of live streamers] where [the live broadcast room is a game live broadcast]
  • “Consumption side”
    • “Metrics”: overall number of viewers, likes, comments…
    • “Dimensions”: audience user portrait and other dimensions reported in the logs
    • “Example”: [total number of viewers] [currently watching live broadcasts in Hebei]

A single live broadcast room

  • “Production side”: this is mostly portrait information, so there are few metrics of this kind, and it is not discussed for now.
  • “Consumption side”
    • “Metrics”: number of viewers, likes, and comments in a single live broadcast room…
    • “Dimensions”: audience user portrait and other dimensions reported in the logs
    • “Example”: [total number of viewers] in a given live broadcast room [in the 18-23 age group]

Now that we know what live real-time data needs to be built, it’s time to get down to work.

HOW: How do we build it?

In other words, from a technical point of view, how do we turn the “business requirements for live real-time data” into a “technical solution for live real-time data”?

From a technical point of view, all of the live real-time data requirements above can be summed up in one phrase: “live real-time multi-dimensional metrics”.

Multi-dimensional: the output metrics are multi-dimensional, covering both public and non-public dimensions.

The first category is the “public dimension”. It has three parts: the live broadcast room portrait, the anchor user portrait, and the audience user portrait. “Public” means the dimension can be shared by multiple metrics. For example, once a room goes live, its portrait only needs to be built once and can then be reused by many metrics: it serves as a dimension both for the production and consumption metrics of the overall market and for the production and consumption metrics of a single live broadcast room.

The second category is the “non-public dimension”. A non-public dimension is bound to a specific consumption behavior, that is, to a particular metric, and is reported together with that behavior’s log. For example, the client type a viewer uses to watch a live broadcast (Android or iOS) and the province the viewer is in while watching relate only to the current consumption behavior and cannot be shared by other metrics.
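The difference between the two kinds of dimension can be sketched in Python: public dimensions come from lookup tables that are built once per room, anchor, or audience user and shared by every metric, while non-public dimensions simply travel inside each log record. All table and field names here (`room_dim`, `anchor_dim`, `audience_dim`, `anchor_id`, `level`, `age_group`, `client_type`, and so on) are hypothetical, and plain dicts stand in for the real-time dimension tables a production job would query.

```python
from typing import Any, Dict

Record = Dict[str, Any]


def enrich(log: Record,
           room_dim: Dict[str, Record],
           anchor_dim: Dict[str, Record],
           audience_dim: Dict[str, Record]) -> Record:
    """Attach public dimensions from shared lookup tables to one consumption log record."""
    room = room_dim.get(log["room_id"], {})             # live room portrait
    anchor = anchor_dim.get(room.get("anchor_id"), {})  # anchor user portrait
    viewer = audience_dim.get(log["user_id"], {})       # audience user portrait
    return {
        **log,  # non-public dimensions (client_type, province, ...) arrive inside the log itself
        "room_category": room.get("category"),
        "anchor_level": anchor.get("level"),
        "viewer_age_group": viewer.get("age_group"),
    }


log = {"room_id": "r1", "user_id": "u1", "action": "like",
       "client_type": "Android", "province": "Hebei"}
out = enrich(log, {"r1": {"anchor_id": "a1", "category": "game"}},
             {"a1": {"level": 5}}, {"u1": {"age_group": "18-23"}})
print(out["room_category"], out["client_type"])  # game Android
```

Because the same `room_dim` entry is reused for every view, like, and comment in that room, it serves overall-market and single-room metrics alike, which is what makes it a public dimension.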



Metrics: these are essentially PV and UV metrics (total occurrences and distinct users). Put simply, each one is the corresponding xx count under each dimension.
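A minimal Python sketch of that definition, assuming records already enriched with their dimensions and carrying a hypothetical `user_id` field: PV counts every matching record, UV counts each user once. Grouping by `room_id` gives single-room metrics, and grouping by no dimension at all gives the overall-market total. A real streaming job would also keep this state per time window, which this in-memory version does not attempt.

```python
from collections import defaultdict
from typing import Any, Dict, Iterable, Set, Tuple

Record = Dict[str, Any]


def pv_uv(records: Iterable[Record], dims: Tuple[str, ...]) -> Dict[tuple, Tuple[int, int]]:
    """Return {dimension values: (PV, UV)}, grouping records by the named dimensions."""
    pv: Dict[tuple, int] = defaultdict(int)
    users: Dict[tuple, Set[str]] = defaultdict(set)
    for r in records:
        key = tuple(r.get(d) for d in dims)  # () when dims is empty: one overall group
        pv[key] += 1                  # PV: every behavior record counts
        users[key].add(r["user_id"])  # UV: each user counts once
    return {key: (pv[key], len(users[key])) for key in pv}


records = [
    {"room_id": "r1", "user_id": "u1", "province": "Hebei"},
    {"room_id": "r1", "user_id": "u1", "province": "Hebei"},
    {"room_id": "r1", "user_id": "u2", "province": "Henan"},
    {"room_id": "r2", "user_id": "u3", "province": "Hebei"},
]
print(pv_uv(records, ("province",)))  # {('Hebei',): (3, 2), ('Henan',): (1, 1)}
print(pv_uv(records, ("room_id",)))   # {('r1',): (3, 2), ('r2',): (1, 1)}
```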

The technical architecture of live real-time data construction mirrors the construction process, which consists of two main parts: a public part and a non-public part.

The public part is the construction of the real-time public dimension tables.

The non-public part is the construction of the metrics’ non-public dimensions and of the corresponding production-side and consumption-side metrics.

The overall “technical architecture” diagram is given here directly; later articles in this series will explain in detail why the architecture is designed this way.

A brief description of the technical architecture:

The data sources include production-side and consumption-side data sources.

The data processing part covers building the public real-time dimension tables and building the metrics; some public dimension tables are also built offline to support the real-time ones.

Finally, the data collection part produces the multi-dimensional production-side and consumption-side metrics for data analysts to use.
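The point that some public dimension tables are also supported by offline construction can be illustrated with a small Python sketch: a dimension table is first loaded from an offline snapshot and then patched by real-time updates, with the newer version winning. The `DimTable` class, the per-row `version` field, and the merge rule are all hypothetical assumptions for illustration, not the design described in this series.

```python
from typing import Any, Dict, Optional

Row = Dict[str, Any]


class DimTable:
    """Public dimension table: offline snapshot as the base, real-time upserts on top."""

    def __init__(self, snapshot: Dict[str, Row]) -> None:
        # Rows built offline (e.g. by a hypothetical daily batch job) give full coverage.
        self._rows: Dict[str, Row] = {k: dict(v) for k, v in snapshot.items()}

    def upsert(self, key: str, row: Row) -> None:
        """Merge a real-time update unless the stored row carries a newer version."""
        current = self._rows.get(key)
        if current is None or row.get("version", 0) >= current.get("version", 0):
            self._rows[key] = {**(current or {}), **row}

    def get(self, key: str) -> Optional[Row]:
        return self._rows.get(key)


rooms = DimTable({"r1": {"category": "game", "version": 1}})
rooms.upsert("r1", {"category": "music", "version": 2})  # newer real-time change wins
rooms.upsert("r1", {"category": "game", "version": 1})   # late, older update is ignored
print(rooms.get("r1"))  # {'category': 'music', 'version': 2}
```

The offline snapshot covers entities that produced no real-time event recently, while the real-time upserts keep frequently changing attributes fresh; the version check keeps a late-arriving update from overwriting newer data.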

That completes the whole “WHY – WHAT – HOW” analysis. This article gives the technical architecture design first; later articles introduce the technical solutions used in detail ~


Summary

This article began by raising several questions about building live real-time data and then answered them in the following sections.

The first section briefly explained why live broadcasts are highly time-sensitive and why their demand for real-time data is therefore stronger.

From the data analysis perspective, the second section introduced what the live real-time data to be built includes and divided it into modules by overall market/single live broadcast room and by production/consumption.

The third section designed the overall architecture of the technical solution for these data requirements.

This last section summarizes the article.

If you have similar construction needs or have already built live real-time data yourself, feel free to leave a comment or a link to your own article so we can exchange ideas ~
