When a digital enterprise first encounters the concept of stream-batch integration (unified stream and batch processing), it usually has some questions. What is stream-batch integration? Where does the concept come from? What benefits does it bring to users, developers, and the business? Follow along with the blogger's understanding and line of thought.


What exactly is stream-batch integration?

Where does batch come from? Where does stream come from?

Why do we need stream-batch integration?

Starting from the current situation of data development, we explore the ideal capability support for stream-batch integration and, finally, how it lands in the data warehouse.

The engine capabilities of n years ago (Hive, etc.) were very friendly to file-based and batch data processing. Data mostly arrived with hour-level or day-level delay.

Conclusion: batch was proposed from the perspective of what batch storage and processing engines were able to support.

In recent years, engines (Flink, etc.) have offered better support for streaming data processing and fault tolerance, and data delay has gradually come down to the second and minute level.

From the user's point of view: for the same metric there are both offline and real-time versions, and in some scenarios their definitions (the statistical caliber) cannot be unified!
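To make the caliber problem concrete, here is a minimal Python sketch of one way the two versions of a metric can stay consistent: both the offline path and the real-time path call the same filter function. Everything in it is hypothetical and only for illustration (the event shape, the "revenue" metric, and the names counts_toward_revenue, batch_revenue, and streaming_revenue); it is not the API of Hive, Flink, or any real platform.

```python
from typing import Iterable, Iterator, Tuple

# Hypothetical event shape: (user_id, amount, is_test_order)
Event = Tuple[str, float, bool]


def counts_toward_revenue(event: Event) -> bool:
    """The single shared definition (caliber) of which orders count."""
    _, amount, is_test_order = event
    return amount > 0 and not is_test_order


def batch_revenue(events: Iterable[Event]) -> float:
    """Offline path: aggregate a complete, bounded data set in one pass."""
    return sum(e[1] for e in events if counts_toward_revenue(e))


def streaming_revenue(events: Iterable[Event]) -> Iterator[float]:
    """Real-time path: emit the running total after every incoming event."""
    total = 0.0
    for e in events:
        if counts_toward_revenue(e):
            total += e[1]
        yield total


# Made-up sample data, used for both paths.
sample_events = [
    ("u1", 10.0, False),
    ("u2", 5.0, True),    # test order, excluded by the shared definition
    ("u3", -3.0, False),  # non-positive amount, excluded
    ("u4", 2.5, False),
]

offline_result = batch_revenue(sample_events)
realtime_results = list(streaming_revenue(sample_events))
```

Because both paths reuse counts_toward_revenue, the last value emitted by the real-time path matches the offline result for the same input. If each path carried its own copy of the filter, the two copies could drift apart over time, which is the kind of mismatch described above.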

Stream-batch integration as the blogger understands it is more a matter of platform capability support, so the focus here is on what we expect from the engine and the toolchain.