It is mainly divided into two parts:

A daily scheduled job generates the T+1 offline portrait of daily active users and imports it into HBase.

1) Flink subscribes to user behavior data and processes it according to the specific business requirements of the profile.

2) Package the processed behavior data as an Action, the operator message used by the unified portrait framework, and send it to Kafka. The Action contains information such as the tag name, the tag value, the processing operator corresponding to the tag, and the behavior time.

3) The portrait framework consumes the Action messages and applies the processing that matches each operator type, according to the configured information, for example Map, List, String, and a series of other types.

4) Write the processed real-time portrait to Redis.
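As a rough illustration of steps 2) to 4), the sketch below models an Action and the per-operator processing in plain Python. The field names, the operator names `"string"`, `"list"`, and `"map"`, and the update semantics of each operator are assumptions made for illustration; the article does not specify them. A dict stands in for the Redis-backed portrait, and Kafka is omitted entirely.

```python
from dataclasses import dataclass
from typing import Any, Dict


@dataclass
class Action:
    """Hypothetical shape of one Action message (field names are assumptions)."""
    tag_name: str       # tag name
    tag_value: Any      # tag value
    operator: str       # processing operator for the tag: "string", "list", or "map"
    behavior_time: int  # behavior time, e.g. epoch milliseconds


def apply_action(profile: Dict[str, Any], action: Action) -> Dict[str, Any]:
    """Update an in-memory profile dict according to the action's operator."""
    if action.operator == "string":
        # String-type tag: the latest value overwrites the previous one.
        profile[action.tag_name] = action.tag_value
    elif action.operator == "list":
        # List-type tag: append the new value to the existing list.
        profile.setdefault(action.tag_name, []).append(action.tag_value)
    elif action.operator == "map":
        # Map-type tag: store the behavior time under the tag value as key.
        profile.setdefault(action.tag_name, {})[action.tag_value] = action.behavior_time
    else:
        raise ValueError(f"unknown operator: {action.operator}")
    return profile


# A dict stands in for the real-time portrait that would be written to Redis.
profile: Dict[str, Any] = {}
apply_action(profile, Action("last_city", "Shanghai", "string", 1000))
apply_action(profile, Action("viewed_items", "sku_1", "list", 1001))
apply_action(profile, Action("viewed_items", "sku_2", "list", 1002))
apply_action(profile, Action("category_last_seen", "shoes", "map", 1003))
print(profile)
```

Keeping the operator inside the message means the consumer needs no tag-specific code: a new tag only requires choosing one of the existing operator types.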

The above describes the complete converged link for offline and real-time profiling. From data preparation and data processing, through data fusion, to finally serving a complete portrait, it is in fact similar to the Lambda architecture. At the batch layer, however, we process data differently depending on how strictly each business domain requires the T+1 daily-active portrait to be complete. For example, this part of the daily-active portrait can be written directly into Redis instead of being updated through lazy loading, so the algorithm side can choose the approach that fits its actual scenario. Another open question is whether the batch layer can be further optimized to reduce maintenance costs. HBase's role as intermediate storage is one candidate: we are currently exploring daily snapshots of the generated offline portraits that are loaded directly from ODPS, and more broadly how to make full use of offline portraits while reducing development costs.
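The lazy-loading path mentioned above can be sketched as follows. This is a minimal illustration, not the team's implementation: the dicts `offline_store` and `realtime_cache` are hypothetical stand-ins for HBase and Redis, and the merge rule (offline tags only fill tags the real-time side has not set) is an assumption, since the article does not state how conflicts are resolved.

```python
from typing import Any, Dict

# In-memory stand-ins (assumptions): a real deployment would use HBase and Redis clients.
offline_store: Dict[str, Dict[str, Any]] = {"u1": {"gender": "f", "last_city": "Beijing"}}
realtime_cache: Dict[str, Dict[str, Any]] = {"u1": {"last_city": "Shanghai"}}
offline_loaded = set()  # users whose offline portrait has already been pulled in


def get_portrait(user_id: str) -> Dict[str, Any]:
    """Return the fused portrait, pulling the offline part in on first access."""
    cached = realtime_cache.setdefault(user_id, {})
    if user_id not in offline_loaded:
        # Lazy load: offline tags only fill tags the real-time side has not set.
        for tag, value in offline_store.get(user_id, {}).items():
            cached.setdefault(tag, value)
        offline_loaded.add(user_id)
    return cached


print(get_portrait("u1"))
```

Writing the daily-active portrait into Redis ahead of time, as the paragraph describes, would replace the `offline_loaded` check with a batch job that fills the cache for every daily active user.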

*Text/Kim
