Abstract: This article is compiled from the topic “Pravega Flink Connector Past, Present and Future” shared by Dell Technologies software engineer Yumin Zhou at Flink Forward Asia 2020
Tips: Click “Read Original“ to view the original video~ Scan the code at the end of the article to follow the Pravega Maker Contest
Pravega and the Pravega connector Introduction to the Pravega
Review Flink 1.11 high-end feature sharing
Pravega Maker Contest Introduction
welcome to give Flink likes and sends star~
Pravega and the Pravega connector
The name of the Pravega project is derived from Sanskrit, meaning good speed. The project originated in 2016, based on the Apache V2 protocol on Github, and joined the CNCF family in November 2020 as a sandbox project of CNCF.
The Pravega project is a new enterprise-class storage system designed for large-scale data streaming scenarios that complements the shortcomings of traditional Message Queuing storage. It also adds enterprise-level features to maintain boundaryless, high-performance reads and writes for streams, such as auto scaling and tiered storage, which can help enterprise users reduce the cost of use and maintenance. At the same time, we also have many years of technical precipitation in the field of storage, and can rely on the company’s commercial storage products to provide customers with persistent storage.
The above architecture diagram describes Pravega’s typical read and write scenarios, which is used to introduce Pravega terminology to help you further understand the system architecture.
The middle part is a Pravega’s cluster, which is an abstract system of streams as a whole. Stream can be thought of as an analogy to Kafka’s topic. Similarly, Pravega’s Segment can be compared to Kafka’s Partition as a concept for data partitioning while providing dynamic scaling.
Segment stores binary data streams and, depending on the size of the traffic, merges or splits occur to free up or pool resources. At this time, the segment will perform a seal operation to prevent new data from being written, and then the newly created segment will receive the new data.
On the left side of the picture is the scene of data writing, which supports the writing of append only. Users can specify a routing key for each event to determine the ownership of a segment. This can be compared to Kafka Partitioner. The data on a single routing key is in order, ensuring that the order of reads is the same as that of writing.
On the right side of the picture is the scene of data reading, and multiple readers will have a Reader Group to control. The Reader Group controls the load balancing between readers to ensure that all segments are evenly distributed among readers. At the same time, it also provides a checkpoint mechanism to form a consistent stream slicing to ensure data failure recovery. For “read”, we support both batch and stream semantics. For streaming scenarios, we support tail read; For batch scenarios, we will consider high concurrency more to achieve high throughput.
The past of the Pravega Flink connector
Pravega Flink connector is the connector originally supported by Pravega, also because Pravega and Flink are very consistent in their design philosophy, both are stream-based batch streaming integrated systems, capable of forming a complete solution for storage and computing.
1. Pravega’s history
The connector has been a standalone Github project since 2017. In 2017, we developed on Flink version 1.3, when Flink PMC members, including Stephan Ewen, joined to build the most basic source/sink function, supporting the most basic reads and writes, and also included Pravega Checkpoint integration, which will be described later.
One of the most important highlights of 2018 was end-to-end support for precise, one-time semantics. There was a lot of discussion between the team and the Flink community, and Pravega first supported the feature of transactional write clients, and the community collaborated on this basis to implement a distributed checkpoint-based transaction function based on a set of two-phase commit semantics based on the sink function. Later, Flink further abstracted the two-phase commit API, the well-known TwoPhaseCommitSinkFunction interface, and was also adopted by the Kafka connector. The community has blogs dedicated to this interface, as well as end-to-end one-off semantics.
In 2019, more connectors have made some additions to other APIs, including batch reading and table API support.
main focus in 2020 is the integration of Flink 1.11, with a focus on FLIP-27 and the integration of new features in FLIP-95.
2. Checkpoint integration implementation
Taking Kafka as an example, let’s first look at how Kafka integrates with Flink Checkpoint.
The diagram above shows a typical Kafka “read” architecture. The Flink checkpoint implementation based on the Chandy-Lamport algorithm sends an RPC request to Task Executor when the Job master triggers a checkpoint. After receiving it, it merges the Kafka commit offset in its state store back into the Job Manager to form a Checkpoint Metadata.
After careful thinking, you can actually find some of these small problems:
> scaling and dynamic balancing support. When Partition is adjusted, or for Pravega, how to ensure the consistency of Merge when Partition is dynamically expanded and scaled in.
Another point is that the Task needs to maintain an offset message, and the entire design is coupled to Kafka’s internal abstract offset coupling.
Based on these shortcomings, Pravega has its own in-house Checkpoint mechanism, let’s see how it integrates with Flink’s Checkpoint.
Read Pravega Stream as well. Instead of sending RPC requests to Task Executor, the Job master sends a Checkpoint request to Pravega using the interface of ExternallyInducedSource.
At the same time, Pravega uses the StateSynchronizer component to synchronize and coordinate all readers, and sends Checkpoint events between all readers. When Task Executor reads the Checkpoint Event, the entire Pravega marks the completion of the Checkpoint, and the returned Pravega Checkpoint is saved to the Job master state to complete the Checkpoint.
This implementation is actually cleaner for Flink, because it has no implementation details for coupling external systems, and the entire Checkpoint work is left to Pravega.
Third, review the high-end features of Flink 1.11 to share
Flink 1.11 is an important release in 2020, and there are actually many challenges for the connector, mainly focusing on the implementation of two FLIPs: FLIP-27 and FLIP-95. The team also spent a lot of time integrating these two new features, and encountered some issues and challenges along the way. Let’s share with you how we step on pits and fill in pits. This article will use FLIP-95 as an example.
1. FLIP-95 INTEGRATION
FLIP-95 is a new Table API with similar motivations to FLIP-27, which is also to achieve a batch-flow integrated interface, while also better supporting CDC integration. For the lengthy configuration keys, the corresponding FLIP-122 is also proposed to simplify the configuration key setting.
1.1 Pravega’s old Table API
From the image above, you can see
Pravega’s Table API before Flink 1.10, and from the DDL of the table in the figure, you can see that
> Use update mode and append to distinguish between batch and stream, and the distinction between batch stream data is not intuitive.
In response to these problems, we also have a great incentive to implement such a new set of APIs, allowing users to better use table abstractions. The entire framework is shown in the figure, and with the help of the entire new framework, all configuration items are defined through the ConfigOption interface and managed centrally in the PravegaOptions class.
The configuration is also very long and complex, and the read Stream needs to be configured through a very long configuration key such as connector.reader.stream-info.0.
At the code level, there is also a lot of coupling with the DataStream API that is difficult to maintain.
■ 1.2 Pravega’s new Table API
The following figure shows the implementation of the latest Table API table building, which is very simplified compared with the previous one, and there are many optimizations in terms of functions, such as the configuration of enterprise-level security options, the specified function of multiple streams and starting streamcuts.
2. Flink-18641 Solution process experience sharing
Next, I would like to share here a small experience of Flink 1.11 integration, which is about an issue resolution process. Flink-18641 is an issue we encountered when integrating version 1.11.0. During the upgrade process, a CheckpointException is reported in the unit test. Next is our complete debug process.
We have also summarized the following considerations for our compatriots in the open source community:
will first debug the breakpoint step by step, by looking at the error log and analyzing the relevant Pravega and Flink source code to determine that it is Flink Some questions related to CheckpointCoordinator;
Then we also looked at some of the community’s commit records and found that after Flink 1.10, the CheckpointCoordinator threading model, from the original lock-controlled model to the Mailbox model. This model caused some of our original synchronous serialization to execute some logic, which was run in parallel, which led to the error;
Looking further at the pull request for this change, I also got in touch via email and some relevant Committers. Finally, I confirmed the problem on the dev mailing list and opened this JIRA ticket.
> Search the mailing lists and JIRA for anyone else who has asked a similar question;
In fact, as a developer in China, there are in addition to mailing lists and JIRA. We also have DingTalk groups and videos to contact a lot of Committers. In fact, it is more of a process of communication, doing open source is to communicate with the community more, which can promote the common growth between projects.
Completely describe the problem, provide detailed version information, error log and reproduction steps;
After receiving feedback from community members, further meetings can be held to discuss solutions;
English is required in non-Chinese environments.
Finally, I also hope that the community can pay more attention to the Pravega project and promote the common development of Pravega connector and Flink.
Registration for the Pravega Maker Contest by Dell Technologies will open on July 15 and officially open on August 13. The competition aims to bring together developers, designers and entrepreneurs to discuss with like-minded people how Pravega’s best features can be used to solve challenges and create valuable solutions.
👇 Scan the QR code to pay attention to the event details & competition judging criteria 👇
The bigger work in the future is Pravega schema registry integration. The Pravega schema registry provides the management of Pravega stream’s metadata, including the data schema and how it is serialized, and stores it. This feature accompanied Pravega 0.8 with the release of the project’s first open source version. We will implement Pravega’s Catalog based on this project in a later version 0.10, making the Flink table API easier to use.
Secondly, we also always pay attention to the new trends of the Flink community, and will actively integrate new versions and new functions of the community, including FLIP-143 and FLIP-129;
community is also gradually completing the transition to the new Test Framework based on docker containers, and we are also monitoring and integrating.
For more technical questions related to Flink, you can scan the code to join the community DingTalk communication group~
Follow “Flink Chinese Community” to get more technical dry goods ▼
poke me , view the original video~