Each article in this series starts from a real production need and works through the problems that came up in practice, in the hope that it serves as a starting point for readers facing similar problems. This is the second article in the live-streaming real-time data construction series. It mainly covers the whole process of building the real-time dimension table for live rooms. If you find it helpful, likes and shares are welcome~

Previous article:

Production Practices | Flink-based live real-time data construction (1) | Requirements and Architecture

Let's first review the “Technical Architecture” diagram from the previous article.

The architecture runs from the data sources through data processing to the data sinks, and as a whole it is relatively easy to understand.

However, readers' questions will probably center on how the three dimension tables are built: the “anchor user portrait dimension table”, the “audience user portrait dimension table”, and the “live room portrait dimension table”.

Technical architecture

As before, we start from the following questions, analyze the scenarios, and answer the questions one by one to walk through how these three dimension tables are built.

Questions

  • “WHAT: What is the live-streaming real-time public portrait dimension table? What is the offline public portrait dimension table? How do they differ?”
  • “WHY: Why are the three types of public portrait dimension tables in the architecture diagram divided into real-time and offline? Why do we need to build a real-time public portrait dimension table, and why can't offline public portrait tables meet the need?”
  • “HOW: How can we build a real-time public portrait dimension table that serves live-streaming real-time data?”
  • “WHO: Which components are needed to build the live-streaming real-time public portrait dimension table? Why were these components chosen?”

WHAT: Real-time & offline public portrait tables?

Concept

First, a brief introduction. A “real-time or offline public portrait dimension table” stores the inherent attributes of an entity (for example, the age of a user entity). As I understand it, both are abstract concepts, and the “anchor user portrait dimension table, audience user portrait dimension table, and live room portrait dimension table” introduced in this article are their concrete implementations.

Other experts give deeper explanations of the “real-time public portrait dimension table” and the “offline public portrait dimension table” in their own articles. Here I only describe my understanding from the process of building live-streaming real-time data~

Difference

The difference between the two is in fact visible from their names: the biggest difference between the real-time public portrait dimension table and the offline public portrait dimension table is the “timeliness” that data construction and the application scenarios require.

Offline public portrait dimension table

Features:

  • “Scenario”: suitable for offline scenarios, i.e. scenarios with “relatively weak timeliness requirements”, where it provides portrait dimension filling or tagging services for indicators
  • “Construction”: generally built offline in T+1 mode
  • “Application”: the data used is offline T+1 data
  • Example: a user portrait table in a data warehouse provides portrait services for application-layer data; for example, when we need to count not only the total UV but also the UV of each age group.
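The UV example above can be sketched as a simple dimension fill followed by a distinct count. This is only a minimal illustration, not the warehouse job itself: the class name `AgeGroupUvSketch`, the in-memory map that stands in for the user portrait table, and the `"unknown"` fallback group are all assumptions made for the example.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Hypothetical sketch: count UV per age group by filling the age-group dimension
// from a user portrait table (represented here by an in-memory map).
final class AgeGroupUvSketch {

    /** Counts distinct users per age group; users missing from the portrait table go to "unknown". */
    static Map<String, Integer> uvByAgeGroup(List<String> viewerIds, Map<String, String> userAgeDim) {
        Map<String, Set<String>> usersPerGroup = new TreeMap<>();
        for (String userId : viewerIds) {
            // Dimension fill: look up the age group of the viewer
            String ageGroup = userAgeDim.getOrDefault(userId, "unknown");
            usersPerGroup.computeIfAbsent(ageGroup, k -> new HashSet<>()).add(userId);
        }
        Map<String, Integer> uv = new TreeMap<>();
        for (Map.Entry<String, Set<String>> e : usersPerGroup.entrySet()) {
            uv.put(e.getKey(), e.getValue().size());
        }
        return uv;
    }

    /** Total UV is the number of distinct viewers, independent of any dimension. */
    static int totalUv(List<String> viewerIds) {
        return new HashSet<>(viewerIds).size();
    }

    public static void main(String[] args) {
        Map<String, String> userAgeDim = new HashMap<>();
        userAgeDim.put("u1", "18-23");
        userAgeDim.put("u2", "18-23");
        userAgeDim.put("u3", "24-30");

        // u1 watched twice and must only be counted once; u4 has no portrait yet
        List<String> viewerIds = Arrays.asList("u1", "u2", "u1", "u3", "u4");

        System.out.println("total UV = " + totalUv(viewerIds));
        System.out.println("UV by age group = " + uvByAgeGroup(viewerIds, userAgeDim));
    }
}
```

The point of the sketch is that the portrait table only supplies the grouping attribute; the indicator itself (UV) is still computed from the behavior data.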

Real-time public portrait dimension table

Features:

  • “Scenario”: suitable for real-time scenarios, i.e. scenarios with “strong timeliness requirements”, where it provides portrait dimension filling or tagging services for indicators
  • “Construction”: built in real time, with latency generally at the second level
  • “Application”: the data used is built in real time and must be available for real-time retrieval and use (after a second-level delay)

WHY: Why build a real-time public portrait dimension?

Why are the three types of public portrait dimension tables in the architecture diagram divided into real-time and offline? Why do we need to build a real-time public portrait dimension table, and why can't offline public portrait tables meet the requirements?

These questions can actually be answered around our live real-time data construction and application scenarios.

Continuing from the technical architecture diagram above, the public dimension tables that live-streaming real-time data needs fall into the following three categories:

  • “Live room portrait dimension table”: contains the live category, broadcasting client, title, broadcast address, and other information for a live room
  • “Anchor portrait dimension table”: the anchor's name, anchor category, gender, age group, and so on
  • “Audience portrait dimension table”: the viewer's gender, age group, and so on

The live room portrait dimension table

First, the conclusion: “Live room portraits are all inherent attributes of the live room, and the live room portrait dimension table is built in real time.”

Most live broadcasts last a few hours. When a broadcast starts, interaction between the anchor and the audience begins, and live production and consumption indicators start to be produced. When the broadcast ends, the interaction stops and those indicators stop as well, so the value that the live room portrait provides to other indicators as a dimension table disappears quickly. The application scenario of the live room portrait (title, broadcast address) is therefore “highly time-sensitive”. To serve the construction and application of live production and consumption indicators, the live room portrait table must support both real-time construction and real-time lookup.

Anchor & audience user portrait dimension table

Conclusion: “This kind of portrait describes the inherent attributes of the user, not of the live room, and is not strongly tied to the live room. The anchor & audience user portrait dimension tables can be built offline.”

Regardless of when a live room starts or ends, and regardless of the production and consumption during a broadcast, the anchor portrait and the audience portrait basically do not change. (For example, in most cases, once a user's age group has been determined to be 18-23, it will basically not change even if the user starts 10 broadcasts or watches 10 broadcasts.) Therefore, for the construction and application of live production and consumption indicators, the anchor and audience user portrait dimension tables can be built offline in T+1 mode, as long as they are served through a data service that supports real-time lookup.

Notes:

Anchor and audience user portraits are produced with machine learning, which infers portrait information such as gender and age group from users' production and consumption behavior and other information. In many scenarios such portraits are built in real time for real-time personalized recommendation. The live-streaming real-time data construction in this article, however, has weak timeliness requirements for these two kinds of portraits, so they are built offline.

HOW + WHO: How to build? Built with what?

Live room life cycle

The entire life cycle of a live room is shown in the figure.

Life cycle

  • 1. The anchor creates a live room, and the live room enters the broadcasting state;
  • 2. Viewers enter the live room and interact with the anchor;
  • 3. Finally, the anchor closes the live room, which marks the end of the live room's life cycle.

Live room portrait table: real-time

Construction of the real-time portrait dimension table. The “red” text in the figure above shows how the real-time portrait dimension table is built and applied.

Real-time data flow of the live room portrait

  • 1. When the anchor starts broadcasting and the live room goes live, the live room generates its portrait information, which can then be written in real time into the live room real-time portrait dimension table. At the same time, production-side real-time indicators can be built, using the “live room real-time portrait dimension table + anchor & audience offline portrait dimension tables” to fill in their dimensions;
  • 2. When viewers enter the live room, interact with the anchor, and generate a series of consumption behaviors, consumption-side real-time indicators can be built, again using the “live room real-time portrait dimension table + anchor & audience offline portrait dimension tables” to fill in their dimensions;
  • 3. When the anchor closes the live room, the live room's portrait can be deleted from the live room real-time portrait dimension table.
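The dimension filling described in steps 1 and 2 can be sketched roughly as follows. This is a simplified illustration under stated assumptions: the two dimension tables are represented by in-memory maps rather than a real store, and the class name `DimensionFillSketch`, the event keys (`liveStreamId`, `userId`), and the output field names are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: enrich one consumption-side event with attributes from the
// live room portrait table (real-time) and the user portrait table (offline T+1).
final class DimensionFillSketch {

    /** Returns a copy of the event with live room and viewer attributes filled in when available. */
    static Map<String, String> fill(Map<String, String> event,
                                    Map<String, Map<String, String>> liveRoomDim,
                                    Map<String, Map<String, String>> userDim) {
        Map<String, String> enriched = new HashMap<>(event);

        // Live room attributes, keyed by live stream id
        Map<String, String> room = liveRoomDim.get(event.get("liveStreamId"));
        if (room != null) {
            enriched.put("room_type", room.get("type"));
            enriched.put("room_title", room.get("title"));
        }

        // Viewer attributes, keyed by user id
        Map<String, String> viewer = userDim.get(event.get("userId"));
        if (viewer != null) {
            enriched.put("viewer_gender", viewer.get("gender"));
            enriched.put("viewer_age_group", viewer.get("age_group"));
        }
        return enriched;
    }

    public static void main(String[] args) {
        Map<String, String> room = new HashMap<>();
        room.put("type", "game");
        room.put("title", "Friday stream");
        Map<String, Map<String, String>> liveRoomDim = new HashMap<>();
        liveRoomDim.put("1001", room);

        Map<String, String> viewer = new HashMap<>();
        viewer.put("gender", "female");
        viewer.put("age_group", "18-23");
        Map<String, Map<String, String>> userDim = new HashMap<>();
        userDim.put("u1", viewer);

        Map<String, String> event = new HashMap<>();
        event.put("liveStreamId", "1001");
        event.put("userId", "u1");
        event.put("action", "like");

        System.out.println(fill(event, liveRoomDim, userDim));
    }
}
```

In a real job the two lookups would go to the dimension store instead of local maps, and an event whose live room has already been removed (step 3) would simply pass through without live room attributes.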

From the analysis above, the requirements for the live room real-time portrait dimension table are as follows:

  • Real-time portrait: support real-time construction and real-time access;
  • Real-time portrait: the live data being built consists of real-time indicators, so access must be low latency (millisecond-level response time);
  • Public portrait: it must serve lookups from many high-traffic production and consumption real-time tasks, i.e. provide a high-QPS portrait data service;
  • Public portrait: high stability.

The component choice therefore naturally falls into the cache category, and after comparing options we chose Redis as the storage engine for our real-time dimension table.

A Redis hash is used as the dimension table storage structure. The storage design of the live room portrait dimension table is shown in the following figure.

Dimension storage
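To make the hash layout concrete without a running Redis instance, here is a small in-memory stand-in. It is only a sketch: `LiveRoomDimStoreSketch` and its methods are hypothetical and merely mimic the semantics of the Redis HMSET, EXPIRE, and HGET commands. The key is the live stream id and the fields are the portrait attributes (`type`, `client`, `title`, `address`), matching the fields written by the job in this article; the current time is passed in explicitly so the expiry behavior is easy to follow.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical in-memory stand-in for the Redis hash layout:
//   key   = live stream id
//   field = portrait attribute name (type, client, title, address)
final class LiveRoomDimStoreSketch {
    private final Map<String, Map<String, String>> hashes = new HashMap<>();
    private final Map<String, Long> expireAtSeconds = new HashMap<>();

    /** Mimics HMSET: merges the given fields into the hash stored under key. */
    void hmset(String key, Map<String, String> fields) {
        hashes.computeIfAbsent(key, k -> new HashMap<>()).putAll(fields);
    }

    /** Mimics EXPIRE: an existing key disappears ttlSeconds after nowSeconds. */
    void expire(String key, long ttlSeconds, long nowSeconds) {
        if (hashes.containsKey(key)) {
            expireAtSeconds.put(key, nowSeconds + ttlSeconds);
        }
    }

    /** Mimics HGET: returns null when the key or field is missing or the key has expired. */
    String hget(String key, String field, long nowSeconds) {
        Long expireAt = expireAtSeconds.get(key);
        if (expireAt != null && nowSeconds >= expireAt) {
            hashes.remove(key);
            expireAtSeconds.remove(key);
            return null;
        }
        Map<String, String> hash = hashes.get(key);
        return hash == null ? null : hash.get(field);
    }

    public static void main(String[] args) {
        LiveRoomDimStoreSketch store = new LiveRoomDimStoreSketch();
        long twoDays = 2L * 24 * 60 * 60;

        Map<String, String> portrait = new HashMap<>();
        portrait.put("type", "game");
        portrait.put("client", "android");
        portrait.put("title", "Friday stream");
        portrait.put("address", "rtmp://example.invalid/live/1001");

        store.hmset("1001", portrait);
        store.expire("1001", twoDays, 0L);

        System.out.println("during TTL: " + store.hget("1001", "title", 60L));
        System.out.println("after TTL:  " + store.hget("1001", "title", twoDays));
    }
}
```

One hash per live room keeps all attributes of a room under a single key, so a lookup task can fetch one field or the whole portrait with a single request, and a single expiry covers the entire portrait.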

Flink real-time dimension table construction code example

import com.google.common.collect.ImmutableMap;
import lombok.Data;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.functions.ProcessFunction;
import org.apache.flink.util.Collector;

public class LiveStreamRealtimeDimBuilderJob {

    public static void main(String[] args) throws Exception {

        final StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // SourceFactory and RedisConfig are project-internal helpers that are not shown here
        DataStream<byte[]> source = SourceFactory.getSourceDataStream();

        source.process(new ProcessFunction<byte[], String>() {
            @Override
            public void processElement(byte[] bytes, Context context, Collector<String> collector) throws Exception {
                CommonModel c = CommonModel.parseFrom(bytes);

                if (c.isStartLiveStream()) {
                    // Broadcast starts: write the live room portrait into a Redis hash
                    RedisConfig
                            .get()
                            .hmset(c.getLiveStreamId()
                                    , ImmutableMap.<String, String>builder()
                                            .put("type", c.getType())
                                            .put("client", c.getClient())
                                            .put("title", c.getTitle())
                                            .put("address", c.getAddress())
                                            .build()
                            );
                    // Expire the portrait after 30 days
                    RedisConfig
                            .get()
                            .expire(c.getLiveStreamId(), 30 * 24 * 60 * 60);
                } else if (c.isEndLiveStream()) {
                    // Broadcast ends: shorten the expiry to 2 days
                    RedisConfig
                            .get()
                            .expire(c.getLiveStreamId(), 2 * 24 * 60 * 60);
                }
            }
        });

        env.execute();
    }

    @Data
    public static class CommonModel {
        private String liveStreamId; // live room id
        private String type;         // live room type
        private String client;       // broadcasting client
        private String title;        // live room title
        private String address;      // live broadcast address

        public static CommonModel parseFrom(byte[] bytes) {
            // Placeholder: parse according to the business logic
            return null;
        }

        public boolean isStartLiveStream() {
            // Placeholder: decide according to the business logic
            return false;
        }

        public boolean isEndLiveStream() {
            // Placeholder: decide according to the business logic
            return false;
        }
    }
}

Anchor & audience user portrait dimension table: offline

Construction of the offline portrait dimension tables, which mainly contain user portrait information such as the gender and age of anchors and viewers. The “blue” text in the figure below shows how the offline portrait dimension tables are applied.

Life cycle: anchor & audience portrait data flow

When the production-side and consumption-side real-time data of the live room is built, the anchor and audience portraits are used to fill in the portrait dimensions.

Storage component

The offline portrait dimension tables use the same storage component as the real-time one, namely Redis, and the portrait information is likewise stored in Redis hashes.

Portrait data is built offline in T+1 mode, and the full set of anchor and audience user portraits is then synchronized to the Redis cache.
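The T+1 full synchronization can be sketched as a simple batch loop. This is a hedged illustration only: the article does not show the actual sync job, so the class `OfflineProfileSyncSketch`, the `user_profile:` key prefix, and the field names `gender` and `age_group` are assumptions, and a plain map stands in for the Redis cache.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of a daily full sync: every row of yesterday's user portrait
// snapshot is written as one hash, keyed by user id, overwriting the previous value.
final class OfflineProfileSyncSketch {

    /** One row of the offline user portrait snapshot. */
    static final class UserProfile {
        final String userId;
        final String gender;
        final String ageGroup;

        UserProfile(String userId, String gender, String ageGroup) {
            this.userId = userId;
            this.gender = gender;
            this.ageGroup = ageGroup;
        }
    }

    /** Writes the whole snapshot into the cache and returns the number of rows written. */
    static int syncAll(List<UserProfile> snapshot, Map<String, Map<String, String>> cache) {
        int written = 0;
        for (UserProfile p : snapshot) {
            Map<String, String> fields = new HashMap<>();
            fields.put("gender", p.gender);
            fields.put("age_group", p.ageGroup);
            // Assumed key layout: one hash per user
            cache.put("user_profile:" + p.userId, fields);
            written++;
        }
        return written;
    }

    public static void main(String[] args) {
        List<UserProfile> snapshot = new ArrayList<>();
        snapshot.add(new UserProfile("u1", "female", "18-23"));
        snapshot.add(new UserProfile("u2", "male", "24-30"));

        Map<String, Map<String, String>> cache = new HashMap<>();
        int written = syncAll(snapshot, cache);

        System.out.println("rows written: " + written);
        System.out.println("u1 portrait: " + cache.get("user_profile:u1"));
    }
}
```

Because the sync is a full overwrite, a portrait that changed in yesterday's snapshot replaces the old value on the next run, which is sufficient given the weak timeliness requirement for these two portraits.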

Summary

Continuing from the previous article, this article introduces the whole process of building real-time and offline public portrait dimension tables. It raises several questions about the construction and, starting from them, covers the following sections.

The first section briefly introduces the concepts of real-time and offline public portrait dimension tables.

From the perspective of data application scenarios, the second section explains why a real-time public portrait dimension table is needed.

The third section covers the construction process of the real-time portrait dimension table and the detailed technical solution.

The last section summarizes the article.

If you have also built a real-time portrait dimension table or have similar needs, feel free to leave a comment or a link to your own article so we can exchange ideas~
