❝ When a digital enterprise first encounters the concept of stream-batch integration, it tends to have some questions: What is stream-batch integration? Where does the concept come from? What benefits does it bring to users, developers, and the business? Follow along with the blogger's understanding and brainstorming. ❞

Preface

What exactly is stream-batch integration? Where does "batch" come from? Where does "stream" come from? Why do stream-batch integration at all? Starting from the current state of data development, we explore what ideal capability support for stream-batch integration looks like, and finally how it lands in the data warehouse.
Engines from n years ago (Hive, etc.) were very well suited to file-based, batch data processing. Data was mostly delayed at the hour or day level.
Conclusion: "batch" is a notion that comes from the perspective of what batch storage and processing engines were able to support.
In recent years, engines (Flink, etc.) have gradually improved their support for streaming data processing and fault tolerance, so data can now be delivered with second- and minute-level delays.
From the user's point of view: for the same metric there is both an offline version and a real-time version, and in some scenarios their calibers (the metric definitions) cannot be unified!
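To make the caliber problem concrete, here is a minimal Python sketch that is not tied to any particular engine. It assumes a hypothetical event schema `(user_id, amount, status)` and hypothetical function names (`is_counted`, `update`, `run_batch`, `run_stream`). The point it illustrates is only this: if the offline and real-time paths share one definition of the metric, their results cannot drift apart.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, Tuple

# Hypothetical event schema for illustration: (user_id, amount, status)
Event = Tuple[str, float, str]


def is_counted(event: Event) -> bool:
    """Single shared definition of which records count toward the metric."""
    _, amount, status = event
    return status == "paid" and amount > 0


def update(totals: Dict[str, float], event: Event) -> None:
    """Single shared aggregation step, used by both execution modes."""
    if is_counted(event):
        user_id, amount, _ = event
        totals[user_id] += amount


def run_batch(events: Iterable[Event]) -> Dict[str, float]:
    """Batch mode: read the whole bounded data set, emit one final result."""
    totals: Dict[str, float] = defaultdict(float)
    for event in events:
        update(totals, event)
    return dict(totals)


def run_stream(events: Iterable[Event]) -> Iterator[Dict[str, float]]:
    """Streaming mode: emit an updated snapshot after every incoming event."""
    totals: Dict[str, float] = defaultdict(float)
    for event in events:
        update(totals, event)
        yield dict(totals)


events = [
    ("u1", 10.0, "paid"),
    ("u2", 5.0, "refunded"),
    ("u1", 2.5, "paid"),
    ("u2", 7.0, "paid"),
]

batch_result = run_batch(events)
stream_result = list(run_stream(events))[-1]  # latest real-time snapshot
print(batch_result == stream_result)
```

When the offline job and the real-time job each carry their own copy of the filter and aggregation logic, any change made to one copy but not the other produces exactly the caliber mismatch described above.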
Stream-batch integration, as the blogger understands it, is more a matter of platform capability support, so the focus here is on what we expect from the engine + toolchain.