Data Warehouse Modeling: Implementing OneData in Practice
Data Governance Problems
- Data islands: the data of different departments, products, and businesses is isolated, and it is difficult to connect it through a common ID.
- Duplicate construction: repeated development, computation, and storage drive up data costs.
- Data ambiguity: indicator definitions are inconsistent, which leads to calculation deviations and makes the data hard to use.
The OneData System
OneData is a methodology that Alibaba has built up over years of big data development and governance practice. It consists of three concepts: OneModel, OneService, and OneID.
OneModel: unified data construction and management
OneModel breaks an indicator definition down into atomic indicators, time periods, and modifiers (statistical granularity, business qualifiers, and so on), and derived indicators are designed by combining these definitions. On top of data layering, dimension tables, detailed fact tables, and summary fact tables are designed.
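The composition of a derived indicator can be sketched as follows. This is a minimal illustration, not Alibaba's implementation: the class names, the `ds` partition column, the table `dwd_trade_pay`, and the naming rule (atomic indicator, then time period, then modifiers) are all assumptions made for the example.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass(frozen=True)
class AtomicIndicator:
    name: str        # hypothetical, e.g. "pay_amt"
    expression: str  # aggregation over a fact-table column


@dataclass(frozen=True)
class TimePeriod:
    name: str        # hypothetical, e.g. "7d" for the last 7 days
    days: int


@dataclass(frozen=True)
class Modifier:
    name: str        # hypothetical, e.g. "app"
    predicate: str   # business qualifier expressed as a filter


@dataclass(frozen=True)
class DerivedIndicator:
    atomic: AtomicIndicator
    period: TimePeriod
    modifiers: tuple = ()

    @property
    def name(self) -> str:
        # Assumed naming rule: atomic indicator + time period + modifiers.
        parts = [self.atomic.name, self.period.name]
        parts += [m.name for m in self.modifiers]
        return "_".join(parts)

    def to_sql(self, table: str, granularity: str, bizdate: date) -> str:
        # The time period becomes a date-range filter on an assumed
        # partition column "ds"; modifiers become extra predicates;
        # the statistical granularity becomes the GROUP BY key.
        start = bizdate - timedelta(days=self.period.days - 1)
        conditions = [f"ds BETWEEN '{start}' AND '{bizdate}'"]
        conditions += [m.predicate for m in self.modifiers]
        return (
            f"SELECT {granularity}, {self.atomic.expression} AS {self.name} "
            f"FROM {table} WHERE {' AND '.join(conditions)} "
            f"GROUP BY {granularity}"
        )


pay_amt = AtomicIndicator("pay_amt", "SUM(pay_amount)")
last_7d = TimePeriod("7d", 7)
via_app = Modifier("app", "channel = 'app'")
indicator = DerivedIndicator(pay_amt, last_7d, (via_app,))
sql = indicator.to_sql("dwd_trade_pay", "seller_id", date(2024, 1, 7))
```

Because every derived indicator is assembled from the same shared definitions, two teams that both ask for "payment amount in the last 7 days via the app" end up with the same name and the same calculation logic.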
OneService: unified data service
OneService is based on the idea of reusing data rather than copying it. Its capabilities include:
- thematic data services, which use thematic logical tables to hide complex physical tables;
- unified and diversified data services, covering general queries, OLAP analysis, and online services;
- cross-source data services, which hide multiple heterogeneous data sources.
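One way to picture a thematic logical table is as a mapping from logical columns to the physical tables that store them. The sketch below is only an assumption about how such routing could look; `LogicalTable`, the column names, and the table names are hypothetical and do not come from any real OneService API.

```python
from dataclasses import dataclass


@dataclass
class LogicalTable:
    name: str
    key: str               # shared key used to combine physical tables
    column_sources: dict   # logical column -> physical table (hypothetical)

    def plan(self, columns):
        """Group the requested columns by the physical table that stores them."""
        plan = {}
        for col in columns:
            if col not in self.column_sources:
                raise KeyError(f"{col} is not defined on logical table {self.name}")
            plan.setdefault(self.column_sources[col], []).append(col)
        return plan


member = LogicalTable(
    name="member",
    key="member_id",
    column_sources={
        "member_level": "dim_member",
        "pay_amt_7d": "ads_member_trade",
        "order_cnt_7d": "ads_member_trade",
    },
)
query_plan = member.plan(["pay_amt_7d", "member_level", "order_cnt_7d"])
```

The caller only names logical columns; which physical tables are read, and how many, is decided behind the service.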
OneID: unified data extraction
OneID achieves data integration through unified entity recognition, connection, and label production. It includes:
- automatic ID identification and connection;
- behavioral elements and behavioral rules;
- label production.
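ID connection can be illustrated with a union-find structure: whenever two identifiers are observed together, they are merged into the same entity. This is a generic technique chosen for illustration, not a description of how OneID is actually built; the identifier types and values are made up.

```python
from collections import defaultdict


class IdGraph:
    """Union-find over identifiers such as ("device", "d1")."""

    def __init__(self):
        self._parent = {}

    def _find(self, node):
        self._parent.setdefault(node, node)
        while self._parent[node] != node:
            # Path halving keeps the trees shallow.
            self._parent[node] = self._parent[self._parent[node]]
            node = self._parent[node]
        return node

    def link(self, a, b):
        """Record that identifiers a and b belong to the same entity."""
        root_a, root_b = self._find(a), self._find(b)
        if root_a != root_b:
            self._parent[root_b] = root_a

    def entities(self):
        """Return the connected groups of identifiers."""
        groups = defaultdict(set)
        for node in list(self._parent):
            groups[self._find(node)].add(node)
        return list(groups.values())


graph = IdGraph()
graph.link(("device", "d1"), ("account", "a1"))   # login seen on device d1
graph.link(("account", "a1"), ("phone", "p1"))    # phone bound to account a1
graph.link(("device", "d2"), ("account", "a2"))   # a different user
entities = graph.entities()
```

Each resulting group can then be assigned a single unified ID, and labels produced for that ID draw on the behavior recorded under any of its linked identifiers.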
Guidelines
- First, when building a big data warehouse, conduct sufficient business research and requirements analysis. This is the cornerstone of data warehouse construction, and how thoroughly it is done directly determines whether the construction succeeds.
- Second, carry out the overall data architecture design, which mainly means dividing the data by data domain. Following dimensional modeling theory, build the bus matrix and abstract the business processes and dimensions.
- Third, abstract and organize the indicator system from the report requirements, and use tools to complete the specification definition and model design of the indicators.
- Finally, carry out code development and O&M.
Implementation process
Whether the business research is sufficient directly determines whether the data warehouse construction succeeds.
There are two ways to conduct requirements research:
- communicate with analysts and business operations staff (by email, IM, or in person) to understand their needs;
- study and analyze the existing reports in the reporting system to clarify what data needs to be produced.
In many cases, it is specific data requirements that drive the data warehouse team to understand the business data in the business systems, so there is no strict order between business research and requirements research.
Data domain division
- A data domain is a collection of business processes or dimensions, abstracted for business analysis.
- A business process can be summarized as an indivisible behavioral event, such as placing an order, making a payment, or requesting a refund.
- To keep the whole system viable, data domains need to be abstracted and refined, maintained and updated over the long term, but not changed lightly.
- The division of data domains should cover all current business needs and also allow new business to be absorbed into existing data domains, or new data domains to be added, without impact.
Build the bus matrix
After sufficient business research and requirements research, the next step is to build the bus matrix.
Two things need to be done. These include determining which dimensions each business process relates to.
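As a rough illustration, a bus matrix can be represented as a grid with business processes as rows and dimensions as columns, where a cell is marked when the process relates to that dimension. The processes below (order, payment, refund) come from the examples in this article; the dimensions and the helper functions are assumptions made for the sketch.

```python
def build_bus_matrix(process_dimensions):
    """Turn {business process: set of dimensions} into a boolean grid."""
    dimensions = sorted({d for dims in process_dimensions.values() for d in dims})
    matrix = {
        process: {d: d in dims for d in dimensions}
        for process, dims in process_dimensions.items()
    }
    return dimensions, matrix


def render(dimensions, matrix):
    """Format the grid as plain text, marking related dimensions with 'x'."""
    width = max(len(process) for process in matrix)
    lines = [" " * width + " | " + " | ".join(dimensions)]
    for process, row in matrix.items():
        cells = [("x" if row[d] else "").center(len(d)) for d in dimensions]
        lines.append(process.ljust(width) + " | " + " | ".join(cells))
    return "\n".join(lines)


# Hypothetical dimensions for a trade data domain.
trade_domain = {
    "order": {"buyer", "seller", "product"},
    "payment": {"buyer", "payment_channel"},
    "refund": {"buyer", "product"},
}
dimensions, matrix = build_bus_matrix(trade_domain)
print(render(dimensions, matrix))
```

Reading a column of the grid shows which business processes share a dimension, which is where conformed dimension tables pay off.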
Summary
The OneData implementation process is highly iterative and dynamic, and it generally follows a spiral implementation approach. Once the overall architecture design is complete, model design and review proceed iteratively, data domain by data domain.
Throughout architecture design, specification definition, and model design, a review mechanism is introduced to ensure that the model is implemented correctly.