Authors of this issue
Wei Zefeng
Bilibili Senior Development Engineer
Joined Station B in 2021 and now serves on the Infrastructure Real-Time Team, responsible for real-time streaming and Flink CDC-related work.
Gao Ruichao
Bilibili Senior Development Engineer
Joined Station B in 2021 and now serves on the Infrastructure Real-Time Team, responsible for real-time streaming and Flink connector-related work.
01 Background
Lancer is Station B's (Bilibili's) real-time streaming platform. It carries data reporting/collection, transmission, and integration for servers and clients across the whole site; as the entry point of the data warehouse, its second-level latency is the lifeline of Station B's data platform. It currently handles a daily peak of 50 million records per second (5000w/s RPS), 3 PB of data per day, and more than 4,000 data streams.
Data at this scale places high demands on the reliability, scalability, and maintainability of the product, and implementing streaming transmission that is fast, accurate, and stable is very challenging. Lancer's architecture has evolved through three stages: a large shared-pipeline model, a BU-granularity pipeline model, and a single-stream, single-job model.
02 Key terms
logid: Each data stream reported by a business party is identified by a logid, which serves as the metadata identifier of the data throughout transmission and integration.
Data source: the entry point through which data enters Lancer, for example log-agent, bfe-agent, or flink cdc.
lancer-gateway: The gateway that receives data reports.
Data buffer layer: Also known as internal kafka, it is used to decouple data reporting and data distribution.
lancer-collector: the data distribution (synchronization) layer; it completes different end-to-end data synchronization tasks according to the actual scenario.
03 Technology Evolution
Station B's streaming data transmission architecture has evolved through three stages.
3.1 Architecture V1.0 – flume-based large-pipeline data transfer architecture (before 2019)
When Station B's streaming architecture was first built, both the traffic volume and the number of data streams were relatively small, so all data streams across the site were mixed into one pipeline for processing. The data transmission architecture was a customized fork of flume, as follows:
From data generation to landing, the architecture is divided into the data source, data gateway, data buffer, and data distribution layers.
Data reporting clients mostly used SDKs that sent HTTP or gRPC requests directly.
The data gateway, lancer-gateway, was a customized fork of flume used to receive reported data. It supported two protocols: HTTP for public-network reporting (web/app) and gRPC for server-side reporting inside the IDC.
The data buffer layer is implemented using kafka to decouple data reporting and data distribution.
The data distribution layer, lancer-collector, was also based on a customized flume fork and synchronized data from the buffer layer to the ODS layer.
The v1.0 architecture exposes some pain points in use:
1. The data source side has poor controllability and fault tolerance for data reporting, for example:
When the data gateway fails, the data source has no local caching and cannot apply backpressure directly, creating a risk of data loss.
Heavy SDKs: various adaptation logic had to be added to the SDK to handle reporting exceptions.
2. The overall architecture is a large shared pipeline: resource division and isolation are unclear, maintenance cost is high, and fault isolation is poor.
3. Drawbacks of the customized flume fork:
Complex logic and poor performance, while the functionality we actually needed was fairly simple.
The HDFS distribution scenario did not support exactly-once semantics, and every restart caused a large amount of duplicated data.
3.2 Architecture V2.0 – BU-granularity pipeline architecture (2020-2021)
To address the flaws of v1.0, we introduced architecture v2.0, as follows:
The key details of this architecture are as follows:
1. Enhanced edge controllability at the source end of data reporting
log-agent is deployed on servers to carry server-side data reporting.
bfe-agent is deployed on the CDN to carry public-network (web, app) data reporting.
log-agent/bfe-agent integrate data buffering, pre-aggregation, flow control, retry, degradation, and so on, so the reporting SDK only needs to focus on generating data and making the reporting call.
The agent routes data to different pipelines based on the BU attribute of the logid.
2. Data pipelines are built at BU granularity: resources are isolated between pipelines, each pipeline contains a complete, independent transmission link, and pipelines can be built quickly based on airflow. Fault isolation is achieved at the BU level.
3. The data gateway is upgraded to the self-developed lancer-gateway 2.0: simplified logic, flow-control backpressure, adaptation to kafka failover, and deployment on k8s.
4. HDFS distribution is implemented as a flink jar job and supports exactly-once semantics.
Compared with v1.0, V2.0 focused on improving controllability at the reporting edge and on resource division and isolation between BU-granularity pipelines. However, as the scale of Station B's streaming transmission grew rapidly, the requirements on timeliness, cost, and quality kept rising, and V2.0 gradually exposed some defects:
1. Poor logid level isolation:
If the traffic of one logid inside a pipeline spikes by several or even dozens of times, distribution of the entire pipeline is still delayed.
If a distribution-layer component in a pipeline fails and restarts, for example the flink jar job for HDFS distribution crashes and resumes from a checkpoint, HDFS distribution of every logid in that pipeline risks archive delay.
2. The gateway is an asynchronous sending model, and in extreme cases (component crashes), there is a risk of data loss.
3. Amplification of ODS-layer local hotspots and faults
Because one distribution-layer job handles many logids at once, this large-job model is more vulnerable to local hotspots in the ODS layer; for example, a single hot datanode in HDFS can block the whole distribution job's writes, which in turn affects the job's other logids.
If all replicas of a single HDFS block become unavailable, the corresponding distribution job hangs and has to restart as a whole.
4. HDFS small file problem amplification
To guarantee throughput, the flink jar job for HDFS distribution is relatively large, so for every logid in the pipeline it opens as many files as its parallelism at the same time; for low-traffic logids this produces a large number of small files.
In view of the above pain points, the most direct solution is to further isolate the overall architecture and realize data transmission + distribution in the dimension of a single logid. The main challenges are as follows:
How do we isolate the whole link at logid granularity, control traffic reasonably, and guarantee isolation between data streams while keeping resource usage under control?
The link interacts heavily with external systems; how do we adapt to their problems, such as local hotspots and failures?
The number of integration jobs grows dramatically; how do we guarantee high performance and stability while keeping management, O&M, and quality monitoring efficient?
3.3 Architecture V3.0 – Flink SQL-based single-stream, single-job data integration
In the V3.0 architecture, we isolated the whole transmission link per data stream, optimized around single-job data streams, and supported data distribution based on Flink SQL. The architecture is as follows:
Compared with v2.0, the resource pool capacity management is still based on BU as the granularity, but the transmission and distribution of each logid are independent of each other and do not affect each other. The specific logic is as follows:
agent: The overall reporting SDK and the agent receive + send logic are isolated according to the logid, and the collection and transmission between the logids are isolated from each other.
lancer-gateway 3.0: request processing is isolated per logid, and when sending to kafka is blocked, backpressure is applied directly to the agent; details are given below.
Data buffer layer: each logid has its own internal kafka topic for data buffering.
Data distribution layer: a separate flink sql job is started to distribute each logid's data, so if processing of one logid is blocked, only that logid's data accumulates.
Compared to previous implementations, the v3.0 architecture has the following advantages:
1. Reliability:
The whole link guarantees that data is not lost: the gateway sends data synchronously, acknowledging a request only after the data is persisted to the internal kafka, and flink's state recovery and exactly-once semantics likewise ensure no loss.
2. Maintainability:
logids are isolated from each other; a problem with one logid does not affect the others.
Resource allocation is based on the minimum unit of logid, which can precisely control the resource usage of a single logid.
3. Scalability:
Management and control can be applied to a single logid, for example flexibly scaling its resources.
04 V3.0 architecture concrete implementation
This section highlights the implementation of each layer of the current V3.0 architecture.
4.1 Data reporting edge layer
4.1.1 log-agent
Developed in Go with a plug-in architecture and deployed on physical machines, log-agent provides reliable and efficient server-side data reporting.
Its runtime architecture is divided into three layers, collection, processing, and transmission, with the following main characteristics:
Supports two reporting methods: file collection and unix socket.
Communicates with the gateway over gRPC, with ACKs, backoff retries, and flow control.
The reporting SDK and the agent's receive + send logic are isolated per logid, so the handling of individual logids does not interfere: each logid gets its own pipeline for collection, parsing, and sending.
Gateways are discovered via service discovery, with adaptive adjustment to gateway changes.
When sending is blocked, data is buffered locally on disk.
Logid-granularity buried-point metrics for real-time monitoring of data processing status.
CGroup resource limit: CPU + memory
Aggregates data before sending to improve transmission efficiency.
Supports collecting logs from both physical machines and containers; configuration is released together with the application, and configuration additions, deletions, and changes are picked up adaptively.
4.1.2 bfe-agent
Developed in Go and deployed on CDN nodes, bfe-agent carries public-network data reporting.
On edge CDN nodes, each CDN server runs nginx and bfe-agent. The overall implementation of bfe-agent is similar to log-agent; because web- and app-side reporting has high request QPS and strong burstiness, the following capabilities are strengthened:
Handling traffic spikes: local buffering on edge nodes shaves peaks.
Policies (degradation, flow control) are pushed to the edge to enhance controllability.
Logid-level triage and isolation, with support for tiering.
Data is aggregated and compressed before being sent back to the origin, improving transmission efficiency, reducing cost, and cutting back-to-origin QPS by more than 90%.
4.2 Data reporting gateway layer
In v3.0 the architecture of the data gateway is as follows:
The data gateway features are as follows:
A generic proxy layer in front of kafka, supporting the gRPC and HTTP protocols.
A synchronous sending model built on the kafka send callback guarantees no data loss: the request is acknowledged only after the data has been written to kafka.
Requests are not split: because the agent already aggregates data, each request carries a single record, so one request corresponds to one message in the buffer-layer kafka.
lancer-gateway 3.0 sends the request to the corresponding kafka cluster based on the topic information of the request
lancer-gateway 3.0 adapts to local hotspots in kafka clusters: partitions can be culled dynamically.
logids map one-to-one to topics, and their processing is isolated: if sending to one topic is blocked, other topics are unaffected.
The main implementation difficulty in the gateway is guaranteeing isolation and fairness when a single gateway instance handles many logids. We took inspiration from Golang's GMP scheduling model; the overall data flow is as follows (a sketch follows the list):
1. When a request is received, it is placed into the request queue of its logid; if that queue is full, the request is rejected immediately.
2. Each kafka cluster initializes a producer pool of size N; each producer traverses all queues and sends their data.
3. For each logid's request queue, resource consumption is limited along two dimensions to guarantee fairness and isolation:
Limit the number of producers bound to a single logid queue.
Limit, via time slices, how long a producer serves a single queue.
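The article does not show the gateway implementation (and the gateway itself is not necessarily written in Java), so the following is only a minimal, hypothetical Java sketch of the scheduling idea above: bounded per-logid queues, a fixed producer pool per kafka cluster, and a time slice that caps how long one producer serves a single queue. The per-queue producer cap mentioned in the list is omitted for brevity; all names and values are made up.

```java
import java.util.Map;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

/** Illustrative sketch of per-logid queueing with time-sliced, fair draining. */
public class LogidDispatcher {

    private static final int QUEUE_CAPACITY = 10_000; // per-logid queue bound (hypothetical value)
    private static final long TIME_SLICE_MS = 5;      // max time one producer serves a single queue

    /** Abstraction over the real kafka producer; the actual gateway acks the request in the send callback. */
    public interface KafkaSendFunction {
        void send(String logid, byte[] record);
    }

    private final Map<String, BlockingQueue<byte[]>> queues = new ConcurrentHashMap<>();
    private final KafkaSendFunction sender;
    private final ExecutorService producerPool;

    public LogidDispatcher(int producerCount, KafkaSendFunction sender) {
        this.sender = sender;
        this.producerPool = Executors.newFixedThreadPool(producerCount);
        for (int i = 0; i < producerCount; i++) {
            producerPool.submit(this::drainLoop);
        }
    }

    /** Step 1: enqueue into the logid's own bounded queue; a full queue means immediate rejection (e.g. HTTP 429). */
    public boolean accept(String logid, byte[] record) {
        BlockingQueue<byte[]> queue =
                queues.computeIfAbsent(logid, k -> new ArrayBlockingQueue<>(QUEUE_CAPACITY));
        return queue.offer(record);
    }

    /** Steps 2-3: each producer round-robins over all queues, bounded by a time slice per queue. */
    private void drainLoop() {
        while (!Thread.currentThread().isInterrupted()) {
            boolean drainedAny = false;
            for (Map.Entry<String, BlockingQueue<byte[]>> entry : queues.entrySet()) {
                long deadline = System.currentTimeMillis() + TIME_SLICE_MS;
                byte[] record;
                while (System.currentTimeMillis() < deadline && (record = entry.getValue().poll()) != null) {
                    sender.send(entry.getKey(), record);
                    drainedAny = true;
                }
            }
            if (!drainedAny) {
                try {
                    Thread.sleep(1); // avoid a pure busy loop when all queues are empty
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }
        }
    }
}
```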
4.3 Data distribution layer
Flink's maturity in real-time computing, its high performance, low latency, exactly-once guarantees, batch-stream unification, rich data source support, and active community led us to choose flink sql for the data distribution layer. We currently support three main scenarios: kafka→hive, kafka→kafka, and cdc→kafka→hudi/hive:
1. kafka→hive
Streaming import of data into hive in real time.
Files are rolled on checkpoint, guaranteeing exactly-once.
Partitions are written and archived by event time, with archive delay under 15 minutes.
Two storage formats are supported: text+lzo (row store) and orc+zstd (column store).
Supports incremental synchronization of downstream jobs.
2. kafka→kafka
Streaming; supports real-time data synchronization.
Supports the transparent transmission of kafka header metadata information
3. cdc→kafka->hudi/hive
Full and incremental data are synchronized as a real-time stream. The CDC scenario consists of two links:
cdc → kafka
Based on flink cdc 2.1, synchronizes mysql full data and incremental binlog.
A single sql job can synchronize sharded databases as well as multiple databases and tables.
Supports routing to different buffer-layer kafka topics based on custom db/table policies.
kafka→hudi/hive
Consumes a topic in order and synchronizes it into a single hudi/hive table, with event_time partitioning supported.
Ensure eventual consistency of data
05 Flink connector feature iteration
While supporting these Flink SQL distribution scenarios, we optimized the community connectors according to the actual needs we encountered; the main changes are introduced below.
5.1 hive sink connector optimization
Empty-partition commit on stream interruption
Background: launching Station B's offline jobs depends on upstream partition commits, and HDFS partition commit is driven by the advancement of the job's overall watermark. How should partitions be committed when some logids stop flowing?
Workaround:
As shown in the figure: when none of the StreamingFileWriters has processed any data for two consecutive checkpoints, the StreamingFileCommitter concludes that the stream has been interrupted and commits the partition based on the current time.
Supports downstream incremental data synchronization
Background: traditional ods-to-dwd synchronization only starts after the ods partition is ready, so timeliness is poor. How can the synchronization be accelerated?
Workaround:
Instead of waiting for the ods partition to be ready, processing starts as soon as files are generated in the ods directory, reading the data files incrementally.
Discovering the files to read via HDFS list operations puts heavy pressure on the NameNode, so we provide a file-list index (containing file names and record counts); downstream jobs only need to read the index to get the incremental file list.
The implementation keeps the file index in flink state, writes a temporary .inflight file during snapshot, and renames it to the committed index file in notifyCheckpointComplete, providing exactly-once semantics.
The downstream job reads the file index and supports incremental data synchronization from ods to dwd.
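The article does not include the index-writer code; below is a minimal Java sketch of the two-phase commit idea it describes (write a .inflight index file at snapshot time, rename it once the checkpoint completes), assuming a Hadoop FileSystem and a hypothetical line-based index format of "fileName,recordCount". Class and method names are illustrative rather than Lancer's actual implementation.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.List;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

/** Illustrative two-phase commit of a file-list index: .inflight on snapshot, rename on checkpoint complete. */
public class IndexFileCommitter {

    /** One index entry: data file name plus its record count, as described in the article. */
    public record IndexEntry(String fileName, long recordCount) {}

    private final FileSystem fs;
    private final Path indexDir;

    public IndexFileCommitter(Configuration hadoopConf, Path indexDir) throws IOException {
        this.fs = indexDir.getFileSystem(hadoopConf);
        this.indexDir = indexDir;
    }

    /** Called from the writer's snapshot hook: persist the pending entries as a temporary .inflight file. */
    public Path writeInflight(long checkpointId, List<IndexEntry> entries) throws IOException {
        Path inflight = new Path(indexDir, "index-" + checkpointId + ".inflight");
        try (FSDataOutputStream out = fs.create(inflight, true)) {
            for (IndexEntry e : entries) {
                out.write((e.fileName() + "," + e.recordCount() + "\n").getBytes(StandardCharsets.UTF_8));
            }
        }
        return inflight;
    }

    /** Called from notifyCheckpointComplete: atomically promote the temporary file to the committed index. */
    public void commit(long checkpointId) throws IOException {
        Path inflight = new Path(indexDir, "index-" + checkpointId + ".inflight");
        Path committed = new Path(indexDir, "index-" + checkpointId);
        if (fs.exists(inflight) && !fs.rename(inflight, committed)) {
            throw new IOException("Failed to commit index file for checkpoint " + checkpointId);
        }
    }
}
```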
orc+zstd
Background: compared with row storage, columnar storage has a significant advantage in compression ratio.
Workaround: support orc+zstd; testing shows a space saving of more than 40% compared with text+lzo.
hdfs asynchronous close
Background: the snapshot phase flushes data and closes files, and individual slow files often drag down overall throughput.
Workaround:
Files to be closed are handed off to an asynchronous queue, so the close action no longer blocks the main processing path, which improves throughput when HDFS has local hotspots. The list of asynchronously closing files is saved in pendingPartsForCurrentCheckpoint and persisted to state, so the files can still be closed after failure recovery.
Asynchronous close introduces the risk of committing a partition too early, so a bucket-state check is added: the commit operator commits a partition only when pendingPartsForCurrentCheckpoint is empty (all files closed) for every bucket belonging to that partition.
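As an illustration only (not the connector's real code), here is a hedged Java sketch of the asynchronous close idea: close calls are handed to a background pool, and the pending set plays the role of pendingPartsForCurrentCheckpoint, gating partition commit. State persistence and retries are omitted.

```java
import java.io.Closeable;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

/** Illustrative async-close helper: the main path never blocks on slow HDFS close calls. */
public class AsyncFileCloser {

    private final ExecutorService closePool = Executors.newFixedThreadPool(4);
    // Files still closing; the real connector persists this set to state and checks it before partition commit.
    private final Set<String> pendingFiles = ConcurrentHashMap.newKeySet();

    /** Hand the file off to the background pool instead of closing it inline. */
    public void closeAsync(String fileName, Closeable file) {
        pendingFiles.add(fileName);
        closePool.submit(() -> {
            try {
                file.close();
            } catch (IOException e) {
                throw new UncheckedIOException(e); // real code would retry or report instead
            } finally {
                pendingFiles.remove(fileName);
            }
        });
    }

    /** Partition commit is allowed only when no file of that partition is still pending. */
    public boolean canCommit(Set<String> filesOfPartition) {
        return filesOfPartition.stream().noneMatch(pendingFiles::contains);
    }
}
```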
Small file merge
Background: the roll-on-checkpoint strategy makes the number of files balloon, putting heavy pressure on the namenode.
Workaround:
A small-file merge feature was introduced: after a checkpoint completes, the streaming writer's notifyCheckpointComplete method triggers the merge by sending an EndCheckpoint signal downstream.
After the coordinator has received EndCheckpoint from every writer, it groups the files, wraps the groups into CompactUnits, and broadcasts them downstream; once all units have been sent, it broadcasts EndCompaction.
Each compact operator picks out its own units and starts processing them; when it receives EndCompaction, it sends the partition-commit information downstream.
5.2 kafka connector optimization
Support protobuf format
Background: users need to process data in protobuf format.
Workaround:
Use protoc to generate Java classes, package jars, and upload them to the real-time computing platform.
Implement the corresponding DeserializationSchema and SerializationSchema, which dynamically load the pb class and invoke its methods via reflection to convert between pb bytes and RowData.
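The connector code itself is not shown in the article; the snippet below is a hedged sketch of the reflection step on the deserialization side, assuming a protoc-generated class with the standard static parseFrom(byte[]) method. Mapping the parsed message field by field into Flink's RowData is omitted.

```java
import java.lang.reflect.Method;

import com.google.protobuf.Message;

/** Illustrative sketch: load a protoc-generated class by name and parse raw kafka bytes via reflection. */
public class PbReflectionDecoder {

    private final Method parseFrom;

    public PbReflectionDecoder(String pbClassName) throws ReflectiveOperationException {
        // protoc-generated message classes expose a static parseFrom(byte[]) factory method
        Class<?> pbClass = Thread.currentThread().getContextClassLoader().loadClass(pbClassName);
        this.parseFrom = pbClass.getMethod("parseFrom", byte[].class);
    }

    /** Decode one kafka record; the real DeserializationSchema would then map fields into RowData. */
    public Message decode(byte[] rawBytes) throws ReflectiveOperationException {
        return (Message) parseFrom.invoke(null, (Object) rawBytes);
    }
}
```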
kafka sink supports custom offload
Background: Users want the flexibility to customize sending messages to specified kafka clusters and topics as needed in a SQL job.
Workaround:
A user-defined UDF is supported: the user selects SQL fields as UDF inputs, implements business-specific logic inside the UDF, and returns the target topic or broker list. The sink then sends each record to the corresponding kafka cluster and topic.
The kafka sink loads the UDF dynamically, obtains the target broker and topic at runtime through reflection, and caches the results.
Example:
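The original article shows a SQL example here (as an image that is not reproduced in this text). As a hedged illustration of the mechanism, a routing UDF might look like the sketch below; the class name, input fields, and routing rule are invented for the example.

```java
import org.apache.flink.table.functions.ScalarFunction;

/** Illustrative routing UDF: returns a kafka topic name derived from record fields; the sink resolves it at runtime. */
public class TopicRouter extends ScalarFunction {

    /** Example rule: route by database and table name; a high-traffic table gets a dedicated topic. */
    public String eval(String db, String table) {
        if ("orders".equals(table)) {
            return "ods_" + db + "_orders_dedicated"; // hypothetical topic naming convention
        }
        return "ods_" + db + "_default";
    }
}
```

In SQL, such a function would be registered and referenced by the sink's routing configuration; the kafka sink then resolves (and caches) the returned topic as described above.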
5.3 cdc connector optimization
Multi-database, multi-table scenarios supported in SQL
Background: the native flink cdc source can only synchronize tables with identical DDL within a single sql job; synchronizing tables with heterogeneous DDL requires multiple independent jobs, which costs extra resources.
Workaround:
Decouple the sql definition from the table DDL:
The native flink cdc source converts and parses all captured data according to the DDL defined in the sql during deserialization and passes it downstream as RowData. We added a new format to the cdc source: a changelog-bytes serialization format. This format no longer converts and parses columns during deserialization; it serializes all columns directly into changelog-json binary, wraps the binary into RowData, and passes it downstream. This is transparent to downstream consumers, which simply deserialize the kafka data with changelog-json. Because one round of column conversion and parsing is removed, real-world tests show roughly double the throughput, in addition to schema changes being picked up automatically. In the kafka sink connector, routing by db and table supports sending to different topics.
Extend metadata with a sequence field:
When incremental data is synchronized into kafka, its multiple partitions inevitably cause message reordering, so a strictly monotonically increasing sequence within a single task is needed for downstream consumers to sort by and reach eventual consistency. We extract the gtid from the binlog as the sequence id of each binlog message, expose it through metadata, and write it into the kafka record header; for full-snapshot data the sequence is set to 0.
Partition commit support on stream interruption
Background: the CDC scheme consists of two separate jobs, upstream and downstream, and both commit partitions by advancing event-time watermarks. The downstream watermark may stall either because no data is flowing (which is normal) or because the upstream job is abnormal; how do we tell the two apart?
Workaround:
A new record type, HeartbeatRecord, carrying the current time, is defined inside the cdc source connector. When a table is detected to have stopped producing data, mock heartbeat records are sent periodically. In the normal no-data case the downstream job can still advance its watermark from these heartbeats, and the heartbeat records themselves are filtered out and discarded.
Example of the final cdc connector sql:
06 Architecture stability optimization
In order to ensure the stable and efficient operation of streaming, we have made some optimizations in the following aspects, which are introduced separately:
6.1 Pipeline hotspot optimization
During normal operation, jobs often run into local hotspot problems: kafka/hdfs IO hotspots slow down consumption or block writes for some parallel subtasks, and uneven load across yarn queue machines slows down part of a job's parallelism. The causes vary, but what these problems share is local data delay caused by a local hotspot. We address this from two angles: local traffic scheduling and global traffic scheduling.
Local traffic scheduling
The optimization idea of local traffic scheduling is to redistribute traffic between partitions within a single producer and task. Optimization is currently done at two points:
Optimizing upstream-downstream subtask communication inside the bsql task manager:
Integration jobs have no need for keyBy. With Flink's credit-based flow control backpressure mechanism, the backlog size reflects the processing load of the downstream task, so instead of plain round-robin we can choose the downstream channel with the lower backlog. Note: this strategy only takes effect when there is a rebalance/rescale between the source and sink, and it adds some serialization overhead, which testing showed to be acceptable.
Automatic kafka producer partition rejection:
When the kafka producer's send callbacks report exceptions (mostly timeouts) for a topic-partition (tp) beyond a certain threshold, that tp is culled from the available partition list and subsequent records are no longer sent to it. Rejected tps are probed periodically, and once data can be sent normally again they are put back into the available list. This mechanism is implemented in both the flink kafka sink connector and the standard kafka client; a sketch follows.
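The following is a minimal Java sketch (not the actual connector code) of the cull-and-probe idea, built on the standard kafka producer callback; the failure threshold, the probe record, and class names are assumptions.

```java
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

import org.apache.kafka.clients.producer.Callback;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

/** Illustrative sketch: cull partitions whose send callbacks keep failing, and probe them for recovery. */
public class PartitionRejector {

    private static final int FAIL_THRESHOLD = 50; // assumed threshold of accumulated failures

    private final Map<Integer, AtomicInteger> failureCounts = new ConcurrentHashMap<>();
    private final Set<Integer> rejected = ConcurrentHashMap.newKeySet();

    /** Partitions currently considered healthy; the partitioner should only pick from these. */
    public boolean isAvailable(int partition) {
        return !rejected.contains(partition);
    }

    /** Wrap the producer callback: count timeouts/errors per partition and cull past the threshold. */
    public Callback trackingCallback(int partition) {
        return (metadata, exception) -> {
            if (exception == null) {
                failureCounts.remove(partition);
                rejected.remove(partition); // a successful send (or probe) re-admits the partition
            } else {
                int fails = failureCounts
                        .computeIfAbsent(partition, p -> new AtomicInteger())
                        .incrementAndGet();
                if (fails >= FAIL_THRESHOLD) {
                    rejected.add(partition);
                }
            }
        };
    }

    /** Periodic probe: send one record to each rejected partition; success in the callback restores it. */
    public void probe(KafkaProducer<byte[], byte[]> producer, String topic, byte[] probeValue) {
        for (Integer p : rejected) {
            producer.send(new ProducerRecord<>(topic, p, null, probeValue), trackingCallback(p));
        }
    }
}
```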
Global traffic scheduling
Global traffic scheduling allocates traffic across the whole transmission link. Currently we link the producer (lancer-gateway) with the consumer (the flink sql kafka source): when the consumer sees lag on some tps, it registers a blacklist of lagging partitions in zookeeper; the upstream producer picks up the blacklist and stops sending data to those high-lag partitions.
The flink kafka source uses flink's AggregateFunction mechanism: each kafka source subtask reports its lag to the job manager, and the job manager registers the blacklist in zookeeper based on a global lag judgment.
Blacklist judgment logic: a tp is blacklisted when its lag > min(global average lag, global median lag) × multiplier AND its lag exceeds an absolute threshold. The absolute-threshold condition keeps the mechanism from being over-sensitive, while the min(average, median) × multiplier condition picks out the head of the lag distribution. To keep the blacklist from growing too large, the number of blacklisted tps is capped at a certain fraction of the total tp count.
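Expressed as code, the judgment reduces to the hedged sketch below; the multiplier and absolute threshold are configurable values whose defaults are not given in the article, and the cap on blacklist size is applied separately.

```java
import java.util.Arrays;

/** Illustrative form of the tp-lag blacklist predicate described above. */
public class LagBlacklistJudge {

    private final double multiplier;      // the "multiple" in the text, configurable
    private final long absoluteLagFloor;  // minimum absolute lag before a tp may be blacklisted

    public LagBlacklistJudge(double multiplier, long absoluteLagFloor) {
        this.multiplier = multiplier;
        this.absoluteLagFloor = absoluteLagFloor;
    }

    public boolean shouldBlacklist(long tpLag, long[] allTpLags) {
        long[] sorted = allTpLags.clone();
        Arrays.sort(sorted);
        double mean = Arrays.stream(sorted).average().orElse(0);
        double median = sorted[sorted.length / 2];
        return tpLag > Math.min(mean, median) * multiplier && tpLag > absoluteLagFloor;
    }
}
```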
Local and global traffic scheduling complement each other in mitigating pipeline hotspots, and each has its own strengths.
6.2 Quality monitoring of full-link buried points
Data quality is a key concern. It usually covers completeness, timeliness, accuracy, consistency, uniqueness, and more; for data transmission we currently focus on two aspects: completeness and timeliness.
The overall quality scheme covers two broad areas, monitoring-data collection and rule configuration, with the following overall architecture:
Monitor data acquisition
We built our own trace system: for each logid, buried-point metrics are reported at every layer of the data processing path.
Each layer reports three aspects: receiving, sending, and internal errors. All buried-point data is windowed by data creation time (ctime), and processing latency between and within layers is measured by updating utime.
From these buried points, processing latency, completeness, and error counts can be calculated in real time: end to end, between layers, and within a layer.
A known defect of the current solution: when a flink sql job fails and recovers from a checkpoint, the monitoring data cannot be guaranteed to be idempotent; further improvement is planned.
Monitor alarm rules
Data streams are tiered, and each tier is assigned a different protection level (SLA); when the SLA is broken, an alarm notifies the on-call engineer.
Archive delay alarm: an alarm is triggered when hdfs partition commit is delayed.
Real-time completeness monitoring: based on trace data, end-to-end completeness is monitored in real time by comparing received and landed record counts.
Offline completeness check: after the hdfs partition is ready, a dqc rule runs and compares the received count (trace data) against the landed count (a hive query).
Transmission delay monitoring: Based on trace data, calculate the quantile of end-to-end data transmission delay.
DQC Blocking: After an offline data integrity exception occurs, the scheduling of downstream jobs is blocked.
6.3 Kafka synchronous interrupt repeat optimization
Compared with the flume-based scheme in 2.0, an obvious change in the flink sql kafka-to-kafka implementation is that job restarts and failure recovery interrupt the whole stream and, when recovering from a checkpoint, replay a certain proportion of data. Reducing how noticeable such events are to users is therefore crucial.
First, the possible causes of interruption are: 1) job upgrades and restarts, 2) task manager failures, 3) job manager failures, and 4) continuous checkpoint failures. Looking at the overall flink job submission process, the key step that determines recovery speed is resource allocation. Based on this analysis and targeted tests, the following optimizations are applied to interruptions in the kafka synchronization scenario (a configuration sketch follows the list):
Checkpoint interval is set to 10s: reduce the proportion of data duplicates caused by recovery from checkpoint
Submit jobs based on session mode: There is no need to repeatedly request resources for job restart
jobmanager.execution.failover-strategy=region: when a single tm dies, only the affected region is restarted instead of the whole job. Integration job DAGs are simple, so rebalance edges are avoided as much as possible to keep the recovery scope small.
Use small task managers (2 CPU cores, 8 GB memory, 2 slots): for the same total resources there are more tms, so the impact of a single tm failure drops significantly.
Redundant task manager for high-priority jobs: one extra tm is kept, so when a single tm dies the traffic is almost unaffected.
Job manager HA based on zookeeper: with jm HA enabled, a jm failure no longer interrupts the job.
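As a hedged illustration, most of these settings map to standard Flink configuration keys, sketched below in Java. Session-mode submission is a deployment choice, and the redundant task manager and the regional checkpoint are internal modifications, so they do not appear here; the zookeeper address is a placeholder.

```java
import org.apache.flink.configuration.Configuration;

/** Hedged sketch of the recovery-oriented settings above, expressed as Flink configuration keys. */
public class FastRecoveryConfig {

    public static Configuration build() {
        Configuration conf = new Configuration();
        conf.setString("execution.checkpointing.interval", "10s");            // small interval => less replay
        conf.setString("jobmanager.execution.failover-strategy", "region");   // restart only the failed region
        conf.setString("taskmanager.numberOfTaskSlots", "2");                 // small TMs (2 cores / 8 GB / 2 slots)
        conf.setString("taskmanager.memory.process.size", "8g");
        conf.setString("high-availability", "zookeeper");                     // JM HA backed by zookeeper
        conf.setString("high-availability.zookeeper.quorum", "zk-host:2181"); // placeholder quorum address
        return conf;
    }
}
```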
For continuous checkpoint failures we introduced the regional checkpoint, which manages checkpoints at the granularity of a region rather than the whole topology. This prevents individual task failures from failing the entire checkpoint, greatly limits the amount of data that must be replayed when individual tasks keep failing, and reduces the impact of cluster fluctuations (network, HDFS IO, etc.) on checkpoints.
After these optimizations, tests on a job of (50 cores, 400 GB memory, 50 slots) scale showed the following results:
6.4 Kafka traffic dynamic failover capability
To keep data reporting timely, Lancer depends heavily on a high write success rate to the buffer-layer kafka, yet peak periods and traffic jitter often cause kafka write bottlenecks. Borrowing from the Netflix Hystrix circuit-breaker principle, we implemented a dynamic kafka failover mechanism at the gateway layer: the gateway computes a breaking ratio from real-time send results and uses it to dynamically shift traffic between the normal kafka and the failover kafka.
The breaking ratio is computed over a sliding time window: the window has 10 buckets, and each bucket counts the successes and failures within one second.
Breaker states: Closed / Open / Half-Open, with breaking ratio = fail_total / sum_total. To avoid all traffic being cut over to failover in extreme cases, the ratio has a configurable upper limit. Degradation strategy after breaking: when the normal kafka trips, traffic is shifted to failover; if the failover kafka also trips, traffic is shifted back to normal.
Judgment logic:
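The original article presents the judgment logic as a diagram. Below is a hedged Java sketch of the described mechanism: a 10-bucket, one-second-per-bucket sliding window, breaking ratio = fail_total / sum_total with a configured cap, and ratio-proportional routing to the failover cluster. The Closed/Open/Half-Open state transitions are left out; thresholds and names are assumptions.

```java
import java.util.concurrent.ThreadLocalRandom;

/** Illustrative sliding-window breaker: 10 buckets of 1 second each, ratio capped so traffic is never fully cut over. */
public class KafkaFailoverBreaker {

    private static final int WINDOW_BUCKETS = 10;          // 10 x 1s sliding window, as described above
    private static final double MAX_FAILOVER_RATIO = 0.5;  // assumed configurable upper limit

    private final long[] success = new long[WINDOW_BUCKETS];
    private final long[] failure = new long[WINDOW_BUCKETS];
    private final long[] bucketSecond = new long[WINDOW_BUCKETS];

    /** Record one send result into the bucket of the current second. */
    public synchronized void record(boolean ok) {
        int idx = bucketFor(nowSeconds());
        if (ok) {
            success[idx]++;
        } else {
            failure[idx]++;
        }
    }

    /** fail_total / sum_total over the window, capped by the configured upper limit. */
    public synchronized double failoverRatio() {
        long now = nowSeconds();
        long fail = 0;
        long total = 0;
        for (int i = 0; i < WINDOW_BUCKETS; i++) {
            if (now - bucketSecond[i] < WINDOW_BUCKETS) { // only count buckets inside the window
                fail += failure[i];
                total += success[i] + failure[i];
            }
        }
        double ratio = total == 0 ? 0.0 : (double) fail / total;
        return Math.min(ratio, MAX_FAILOVER_RATIO);
    }

    /** Route one record: roughly `ratio` of the traffic goes to failover, the rest stays on the normal kafka. */
    public boolean shouldUseFailover() {
        return ThreadLocalRandom.current().nextDouble() < failoverRatio();
    }

    private int bucketFor(long epochSecond) {
        int idx = (int) (epochSecond % WINDOW_BUCKETS);
        if (bucketSecond[idx] != epochSecond) {           // bucket belongs to an old second: reuse it
            success[idx] = 0;
            failure[idx] = 0;
            bucketSecond[idx] = epochSecond;
        }
        return idx;
    }

    private long nowSeconds() {
        return System.currentTimeMillis() / 1000;
    }
}
```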
6.5 Full-link flow control, back-voltage, and downgrade
Across the whole path from client reporting to data landing, and in addition to the means described above, we also use end-to-end flow control, backpressure, and degradation to keep the system stable and controllable; they are summarized below.
From back to front, the link consists of the following stages:
1. Data distribution layer:
When consumption is delayed, data backs up in the buffer-layer kafka.
Within a single job, traffic is balanced between subtasks via backlog-based backpressure.
2. Data Gateway Layer:
When writes to kafka are delayed, a flow-control code (429) is returned directly to the reporting side.
Local tp processing delays between the gateway and distribution layers are handled via kafka tp-level traffic scheduling.
3. Data reporting layer:
Adapts to the gateway's flow-control responses with backoff retries.
Buffers data on local disk.
Dynamically pushed configuration takes effect for active sampling and degradation of buffered data.
6.6 Quality verification during the development phase
To guarantee the correctness and stability of the overall service, we designed a complete testing framework for the development phase.
Before a new version goes live, we run the old and new job links side by side, land the data in two separate hive tables, and verify record counts and md5 checksums of content for whole partitions; the results are published as hourly and daily reports. This framework guarantees end-to-end correctness during version iteration.
To also guarantee correctness in abnormal, extreme cases, we introduced chaos testing and actively inject failures, including job manager crashes, task manager crashes, random job restarts, local hotspots, dirty data, and so on.
07 Future outlook
Upgrade the link architecture to connect to the company-level data gateway (Databus), unifying the architecture and covering more data reporting scenarios.
Go cloud-native: embrace K8S, user-facing quota management, and automatic resource autoscaling.
Embrace batch-stream unification, strengthen incremental integration, cover offline batch integration scenarios, and build a unified Flink-based integration framework.