❝
Each article in this series starts from a real production need and works through the problems that came up in practice, in the hope that sharing a modest solution prompts better ones and helps readers with their own production problems. This is the second article in the live real-time data construction series. It mainly introduces the whole process of building the real-time dimension table for live rooms. If you find it helpful, you are welcome to like and share it~
❞
Here is the previous article:

Production Practices | Flink-based live real-time data construction (1) | Requirements and Architecture
First, recall the “Technical Architecture” diagram from the previous article.
The overall architecture is relatively easy to understand.
However, readers' doubts are likely to focus on the construction of the three dimension tables: “host user portrait dimension, audience user portrait dimension, live room portrait dimension”.
As before, we start from the following perspectives, analyze the scenario, and answer these questions in order to introduce how the three dimension tables above are built.
Questions
- “WHAT: What is the live real-time public portrait dimension table? What is an offline public portrait dimension table? How do they differ?”
- “WHY: Why are the three types of public portrait dimension tables in the architecture diagram divided into real-time and offline? Why do we need to build a real-time public portrait dimension table, and why can offline public portrait tables not meet the need?”
- “HOW: How can we build a real-time public portrait dimension table that satisfies live real-time data?”
- “WHO: Which components are needed to build a live real-time public portrait dimension table? Why were these components chosen?”
WHAT: Real-time & offline public portrait dimension tables?
Concept
First, a brief introduction: a “real-time & offline public portrait dimension table” stores the inherent attributes of an entity (for example, the age of a user entity). As I understand it, these two concepts are abstract, and the “host user portrait dimension, audience user portrait dimension, live room portrait dimension” tables introduced in this article are their concrete implementations.
Other experts explain the “real-time public portrait dimension” and the “offline public portrait dimension” in more depth in their articles; here I only describe my own understanding, formed while building live real-time data~
Difference
In fact, the difference between the two is clear from their names: the biggest difference between a real-time and an offline public portrait dimension table is the “timeliness” required by data construction and by the application scenarios.
Offline public portrait dimension table
Features:
- “Scenario”: suitable for offline scenarios, that is, scenarios with “relatively weak timeliness requirements”, where it provides portrait dimension filling or tagging services for indicators
- “Construction”: generally built in an offline T+1 way
- “Application”: the data used is offline T+1 data
- Example: a user portrait table in a data warehouse provides portrait services for application-layer data, for example when it is necessary to count not only the total UV but also the UV of each age group.
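The age-group UV example above can be sketched in a few lines. This is only an illustration under stated assumptions: the class name `AgeGroupUv`, the in-memory map standing in for the offline portrait table, and the "unknown" bucket for users without a portrait are all hypothetical and do not come from the original pipeline.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Sketch only: AgeGroupUv and its inputs are hypothetical illustrations.
final class AgeGroupUv {

    /** Total UV is the number of distinct viewer ids. */
    static int totalUv(List<String> viewerIds) {
        return new HashSet<>(viewerIds).size();
    }

    /**
     * Counts distinct users per age group by looking each user up in the
     * portrait table. Users without a portrait go to the assumed "unknown" bucket.
     */
    static Map<String, Integer> uvByAgeGroup(List<String> viewerIds,
                                             Map<String, String> ageGroupByUser) {
        Map<String, Set<String>> usersPerGroup = new TreeMap<>();
        for (String userId : viewerIds) {
            String group = ageGroupByUser.getOrDefault(userId, "unknown");
            usersPerGroup.computeIfAbsent(group, g -> new HashSet<>()).add(userId);
        }
        Map<String, Integer> uv = new TreeMap<>();
        usersPerGroup.forEach((group, users) -> uv.put(group, users.size()));
        return uv;
    }
}
```

The portrait lookup is what turns a single total into per-dimension figures, which is the "dimension filling" role the portrait table plays for indicators.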
Real-time public portrait dimension table
Features:
- “Scenario”: suitable for real-time scenarios, that is, scenarios with “strong timeliness requirements”, where it provides portrait dimension filling or tagging services for indicators
- “Construction”: built in real time, with a delay generally at the second level
- “Application”: the data used is built in real time and must be available for use in real time (obtainable after a second-level delay)
WHY: Why build a real-time public portrait dimension table?
Why are the three types of public portrait dimension tables in the architecture diagram divided into real-time and offline? Why do we need to build a real-time public portrait dimension table, and why can offline public portrait tables not meet the requirements?
These questions can be answered by looking at our live real-time data construction and application scenarios.
Following the technical architecture diagram above, the public dimension tables that need to be built for live real-time data fall into the following three categories:
- “Live room portrait dimension table”: contains information such as the live category, broadcast client, title, and broadcast address of the live room
- “Anchor portrait dimension table”: contains the anchor name, anchor category, gender, age group, etc. of the anchor
- “Audience portrait dimension table”: contains the gender, age group, etc. of the viewer
Live room portrait dimension table
Conclusion first: “The live room portrait consists entirely of inherent attributes of the live room, and the live room portrait dimension table is built in real time”.
Most live broadcasts last only up to a few hours. When a broadcast begins, the host and the audience start to interact, and the live production and consumption indicators start to be produced. When the broadcast ends, the interaction between the host and the audience ends too, and the corresponding production and consumption indicators are no longer produced. The value that the live room portrait can provide to other indicators as a dimension table therefore disappears quickly, so the application scenario of the live room portrait (title, broadcast address) is “very time-sensitive”. For the construction and application of live production and consumption indicators, the live room portrait dimension table therefore has to be built in real time and be available for query in real time.
Anchor & Audience User Portrait Dimension Table
Conclusion: “This kind of portrait describes inherent attributes of the user, not of the live room, and is not strongly tied to the live room. The Anchor & Audience User Portrait Dimension Table can be built offline.”
Regardless of when a live room starts or ends, and regardless of the production and consumption during the broadcast, the anchor portrait and the audience portrait basically do not change. (For example, in most cases, once a user's age group has been determined to be 18 to 23, it basically does not change even if the user hosts 10 live broadcasts or watches 10 live broadcasts.) For the construction and application of live production and consumption indicators, it is therefore enough to build the host user portrait dimension table and the audience user portrait dimension table offline (T+1) and to serve the data for real-time lookup.
❝
Notes:
Anchor & audience user portraits need machine learning to infer portrait information such as gender and age group from the user's production and consumption behavior and other information. There are also many scenarios where such portraits are built in real time for real-time personalized recommendation. However, the live real-time data construction in this article has weak timeliness requirements for these two types of portraits, so they are built offline.
❞
HOW + WHO: How to build? Built with what?
Live room life cycle
The entire life cycle of a live room is shown in the figure.
1. The host creates a live room, and the live room enters the broadcasting state;
2. After viewers enter the live room, they interact with the host in the live room;
3. Finally, the host closes the live room, marking the end of the live room's life cycle.
Live room portrait dimension table: real-time
Real-time portrait dimension table construction. The “red” text in the figure above shows the construction and application process of the real-time portrait dimension table.
Real-time data flow of the live room portrait
1. When the host starts broadcasting and the live room goes live, the live room generates its portrait information. At this point the portrait information can be written in real time to the real-time portrait dimension table of the live room. At the same time, the production-side real-time indicators can be built, and the “real-time live room portrait dimension table + offline anchor & audience portrait dimension tables” can be used to fill in the dimensions of the production-side indicators;
2. When viewers enter the live room, interact with the anchor, and generate a series of consumption behaviors, the consumption-side real-time indicators can be built, and the “real-time live room portrait dimension table + offline anchor & audience portrait dimension tables” can be used to fill in the dimensions of the consumption-side indicators;
3. When the host closes the live room, the portrait of that live room can be deleted from the real-time live room portrait dimension table.
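The three steps above can be sketched as follows. This is a minimal illustration, assuming an in-memory map in place of the real dimension table store; the class `LiveRoomDimFlow`, its method names, and the event fields `liveStreamId` and `viewerId` are hypothetical and only serve to show the write, lookup, and delete phases.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch only: an in-memory map stands in for the real-time dimension table.
final class LiveRoomDimFlow {
    private final Map<String, Map<String, String>> liveRoomDim = new HashMap<>();

    /** Step 1: the host starts broadcasting, so the portrait is written to the dimension table. */
    void onLiveStart(String liveStreamId, Map<String, String> portrait) {
        liveRoomDim.put(liveStreamId, new HashMap<>(portrait));
    }

    /** Step 2: a consumption event is enriched with the live room portrait, if one exists. */
    Map<String, String> enrich(String liveStreamId, String viewerId) {
        Map<String, String> enriched = new HashMap<>();
        enriched.put("liveStreamId", liveStreamId);
        enriched.put("viewerId", viewerId);
        Map<String, String> portrait = liveRoomDim.get(liveStreamId);
        if (portrait != null) {
            enriched.putAll(portrait); // dimension filling
        }
        return enriched;
    }

    /** Step 3: the host closes the room, so its portrait can be removed. */
    void onLiveEnd(String liveStreamId) {
        liveRoomDim.remove(liveStreamId);
    }
}
```

An event that arrives after `onLiveEnd` is returned without portrait fields in this sketch, which is one reason a production job might prefer to keep the entry for a while instead of deleting it immediately.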
Through the above analysis, we can understand that the requirements for the construction of real-time dimension tables of portraits in the live broadcast room are as follows:
- Real-time portrait: the live data being built consists of real-time indicators, so access must be low latency (millisecond-level response time);
- Public portrait: it must serve many access requests from high-traffic real-time production and consumption tasks, that is, provide a high-QPS portrait data service;
- Public portrait: high stability.
Therefore, the component selection naturally falls into the cache category, and after comparing candidate schemes we finally chose Redis as the storage engine for our real-time dimension table.
The Redis hash is used as the storage structure of the dimension table; the storage design of the live room portrait dimension is shown in the following figure.
Flink real-time dimension table construction code example
import com.google.common.collect.ImmutableMap;
import lombok.Data;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.functions.ProcessFunction;
import org.apache.flink.util.Collector;

// SourceFactory and RedisConfig are project-specific helpers that are not shown in this article.
public class LiveStreamRealtimeDimBuilderJob {
    public static void main(String[] args) throws Exception {
        final StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        DataStream<byte[]> source = SourceFactory.getSourceDataStream();
        source.process(new ProcessFunction<byte[], String>() {
            @Override
            public void processElement(byte[] bytes, Context context, Collector<String> collector) throws Exception {
                CommonModel c = CommonModel.parseFrom(bytes);
                if (c.isStartLiveStream()) {
                    // Start of broadcast: write the portrait into a Redis hash keyed by live room id
                    RedisConfig.get().hmset(c.getLiveStreamId(), ImmutableMap.<String, String>builder()
                            .put("type", c.getType())
                            .put("client", c.getClient())
                            .put("title", c.getTitle())
                            .put("address", c.getAddress())
                            .build());
                    // Keep the portrait for at most 30 days
                    RedisConfig.get().expire(c.getLiveStreamId(), 30 * 24 * 60 * 60);
                } else if (c.isEndLiveStream()) {
                    // End of broadcast: shorten the expiry to 2 days
                    RedisConfig.get().expire(c.getLiveStreamId(), 2 * 24 * 60 * 60);
                }
            }
        });
        env.execute();
    }

    @Data
    public static class CommonModel {
        private String liveStreamId; // live room id
        private String type;         // live room type
        private String client;       // client used to start the broadcast
        private String title;        // live room title
        private String address;      // live broadcast address

        public static CommonModel parseFrom(byte[] bytes) {
            return null; // placeholder: parse according to the business logic
        }
        public boolean isStartLiveStream() {
            return false; // placeholder: decide according to the business logic
        }
        public boolean isEndLiveStream() {
            return false; // placeholder: decide according to the business logic
        }
    }
}
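The job above sets a 30-day expiry when a broadcast starts and shortens it to 2 days when the broadcast ends. The sketch below mimics that hash-plus-expiry behaviour in memory so the effect of the two calls is easier to see. It is not a Redis client: the class `TtlHashStore`, its method signatures, and the explicit `nowSeconds` clock parameter are assumptions made for illustration. The article does not say why the entry is kept for 2 days; keeping it so that late-arriving data can still be enriched is one plausible reading.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch only: mimics hash-plus-TTL behaviour in memory; it is not a Redis client.
final class TtlHashStore {
    static final long START_TTL_SECONDS = 30L * 24 * 60 * 60; // 30 days, as in the job above
    static final long END_TTL_SECONDS = 2L * 24 * 60 * 60;    // 2 days, as in the job above

    private final Map<String, Map<String, String>> hashes = new HashMap<>();
    private final Map<String, Long> expiresAt = new HashMap<>();

    /** Writes the given fields into the hash stored under key. */
    void hmset(String key, Map<String, String> fields) {
        hashes.computeIfAbsent(key, k -> new HashMap<>()).putAll(fields);
    }

    /** Sets the expiry relative to nowSeconds; returns false when the key is missing. */
    boolean expire(String key, long ttlSeconds, long nowSeconds) {
        if (hgetAll(key, nowSeconds).isEmpty()) {
            return false;
        }
        expiresAt.put(key, nowSeconds + ttlSeconds);
        return true;
    }

    /** Returns the stored fields, or an empty map once the key has expired. */
    Map<String, String> hgetAll(String key, long nowSeconds) {
        Long deadline = expiresAt.get(key);
        if (deadline != null && nowSeconds >= deadline) {
            hashes.remove(key);
            expiresAt.remove(key);
        }
        return hashes.getOrDefault(key, Map.of());
    }
}
```

Passing the clock in as a parameter keeps the sketch deterministic; a real store tracks time itself.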
Anchor & Audience User Portrait Dimension Table: offline
Offline portrait dimension table construction. The table mainly contains user portrait information such as the gender and age of anchors and viewers. The “blue” text in the figure below shows the application process of the offline portrait dimension table.
Anchor & audience portrait data flow in the life cycle
During the live room life cycle, the portrait dimensions of the indicators are filled by using the host & audience portrait dimension tables.
Storage component
The storage component chosen for the offline portrait dimension table is the same as for the real-time one, namely Redis, and the portrait information is likewise stored in the Redis hash structure.
The portrait data is built and synchronized in offline T+1 mode: the full set of host and audience user portraits that has been built is synchronized to the Redis cache.
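A full T+1 synchronization of this kind can be sketched as below. This is an illustration under assumptions: plain maps stand in for both the offline snapshot and the Redis cache, and the class `OfflinePortraitSync`, the key prefix `user_portrait:`, and the method names are hypothetical rather than taken from the original system.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch only: the key prefix and the in-memory "cache" are hypothetical stand-ins for Redis.
final class OfflinePortraitSync {
    static final String KEY_PREFIX = "user_portrait:"; // assumed naming, not from the article

    /** Writes every user's portrait from the T+1 snapshot into the cache, one hash per user. */
    static int syncFull(Map<String, Map<String, String>> snapshot,
                        Map<String, Map<String, String>> cache) {
        int written = 0;
        for (Map.Entry<String, Map<String, String>> entry : snapshot.entrySet()) {
            cache.put(KEY_PREFIX + entry.getKey(), new HashMap<>(entry.getValue()));
            written++;
        }
        return written;
    }

    /** Lookup a real-time task could use to fill gender and age group dimensions. */
    static Map<String, String> lookup(Map<String, Map<String, String>> cache, String userId) {
        return cache.getOrDefault(KEY_PREFIX + userId, Map.of());
    }
}
```

Because each day's run overwrites the per-user hash, the real-time side always reads the latest available T+1 portrait without needing to know when the batch ran.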
Summary
This article continues from the previous one and introduces the whole process of building real-time & offline public portrait dimension tables. It raised several construction questions and, starting from those questions, covered the following sections.
The first section briefly introduces the concept of real-time & offline public portrait dimension tables.
The second section explains, from the perspective of data application scenarios, why it is necessary to build a real-time public portrait dimension table.
The third section mainly introduces the construction process of the real-time portrait dimension table and the detailed technical solution.
The last section summarizes this article.
If you have also built a real-time portrait dimension table, or have the same needs, you are welcome to leave a comment or a link to your own article so we can exchange ideas~