Data Warehouse Modeling: Implementing OneData in Practice

Data Governance Problem

  • Data islands: the data of different departments, products, and businesses is isolated, and it is difficult to connect it through a common ID
  • Duplicate construction: repeated development, computation, and storage drive up data costs
  • Data ambiguity: indicator definitions are inconsistent, which leads to calculation discrepancies and makes the data hard to use

OneData System

OneData is a methodology that Alibaba has built up over years of big data development and governance practice. It comprises three concepts: OneModel, OneService, and OneID.

OneModel: unified data construction and management

OneModel breaks an indicator definition down into atomic indicators, time periods, and modifiers (statistical granularity, business qualifiers, and so on), and derived indicators are designed by combining these definitions. Dimension tables, detailed fact tables, and summary fact tables are then designed on top of the data layering.
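The way a derived indicator is assembled from an atomic indicator, a time period, and modifiers can be sketched as follows. This is only an illustration, not the OneData tooling itself: the class `DerivedIndicator`, the naming convention, and the example values (`pay_amt`, `1d`, `app`) are all assumptions made for this example.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class DerivedIndicator:
    """Hypothetical model: derived indicator = atomic indicator + time period + modifiers."""
    atomic: str                       # atomic indicator, e.g. payment amount
    period: str                       # time period, e.g. the last 1 day
    modifiers: Tuple[str, ...] = ()   # business qualifiers, e.g. the channel

    @property
    def name(self) -> str:
        # Assumed naming convention: modifiers first, then atomic indicator, then period.
        return "_".join([*self.modifiers, self.atomic, self.period])


indicator = DerivedIndicator(atomic="pay_amt", period="1d", modifiers=("app",))
print(indicator.name)  # app_pay_amt_1d
```

Because every derived indicator is built from the same small set of definitions, two teams that pick the same atomic indicator, period, and modifiers end up with the same name and the same meaning, which is the point of standardizing the definitions.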

OneService: unified data service

OneService is based on the idea of reusing data rather than copying it. Its capabilities include:

  • Thematic data services, which use thematic logical tables to mask complex physical tables
  • Unified yet diverse data services covering general queries, OLAP analysis, and online services
  • Cross-source data services, which mask multiple heterogeneous data sources
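To make the idea of a logical table masking physical tables more concrete, here is a minimal sketch. It assumes a hypothetical mapping from logical columns to physical tables; the table names (`dim_user`, `dws_trade_user_1d`), the column names, and the `resolve` function are invented for illustration and do not describe the actual OneService implementation.

```python
# Hypothetical logical table: each logical column maps to (physical table, physical column).
LOGICAL_TABLE = {
    "user_id":      ("dim_user", "user_id"),
    "user_level":   ("dim_user", "level"),
    "pay_amt_1d":   ("dws_trade_user_1d", "pay_amt"),
    "order_cnt_1d": ("dws_trade_user_1d", "order_cnt"),
}


def resolve(columns):
    """Group the requested logical columns by the physical table that stores them."""
    plan = {}
    for col in columns:
        table, physical_col = LOGICAL_TABLE[col]
        plan.setdefault(table, []).append(physical_col)
    return plan


# The caller only names logical columns; the physical layout stays hidden.
plan = resolve(["user_level", "pay_amt_1d"])
print(plan)
```

A consumer written against the logical columns would not need to change if a physical table were split or renamed; only the mapping would.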

OneID: unified data extraction

OneID achieves data integration through unified entity recognition, connection, and label production. It includes:

  • Automatic ID identification and connection
  • Behavioral elements and behavior rules
  • Label production
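One common way to connect identifiers that belong to the same entity is a union-find (disjoint-set) structure. The sketch below uses that technique as a stand-in; the source does not say how OneID performs the connection, and the `IdGraph` class and the sample identifiers are hypothetical.

```python
class IdGraph:
    """Union-find over identifiers: linked identifiers share one root."""

    def __init__(self):
        self.parent = {}

    def find(self, x):
        # Register unseen identifiers as their own root.
        self.parent.setdefault(x, x)
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def link(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            # Deterministic choice: the smaller string becomes the root.
            self.parent[max(ra, rb)] = min(ra, rb)


graph = IdGraph()
# Hypothetical observations: two identifiers seen together in one login or order event.
graph.link("device:abc", "user:1001")
graph.link("user:1001", "phone:555-0100")
graph.link("device:xyz", "user:2002")

same_entity = graph.find("device:abc") == graph.find("phone:555-0100")
different_entity = graph.find("device:abc") != graph.find("device:xyz")
print(same_entity, different_entity)  # True True
```

The root returned by `find` can serve as the unified ID under which labels for that entity are produced.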

Guidelines

  • First, building a big data warehouse requires thorough business research and requirements analysis. This is the cornerstone of data warehouse construction, and how well it is done directly determines whether the construction succeeds.
  • Second, the overall data architecture is designed, mainly by dividing the data into data domains. Following dimensional modeling theory, the bus matrix is built and the business processes and dimensions are abstracted.
  • Third, the indicator system is abstracted and organized from the report requirements, and tooling is used to complete the specification definition and model design of the indicators.
  • Finally, code development and O&M are carried out.

Implementation process

Whether the business research is sufficient directly determines whether the data warehouse construction succeeds.

There are two ways to conduct requirements research:

  1. Communicate with analysts and business operators (by email, IM, or in person) to understand their needs.
  2. Study and analyze the existing reports in the reporting system to determine what data needs to be produced.

In many cases, specific data requirements are what drive the data warehouse team to study the business data in the business systems, so there is no strict order between business research and requirements research.

Data domain division

  • A data domain is a collection of business processes or dimensions, abstracted for business analysis.
  • A business process can be summarized as an indivisible behavioral event, such as placing an order, making a payment, or issuing a refund.
  • To keep the whole system viable, data domains need to be abstracted and refined, and maintained and updated over the long term, but they should not be changed lightly.
  • The division of data domains should cover all current business needs, and when a new business arrives, it should either fit into an existing data domain or be accommodated by adding a new data domain without affecting the existing ones.

Build the bus matrix

Once the business research and requirements research are sufficiently complete, the bus matrix is built.

This involves two things:


  1. Clarify which business processes belong to each data domain.
  2. Clarify which dimensions each business process is related to, and define the business processes and dimensions under each data domain.
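A bus matrix can be pictured as a table with business processes as rows and dimensions as columns, grouped by data domain. The sketch below represents it as nested dictionaries. The domains, processes, and dimensions are hypothetical e-commerce examples (only order, payment, and refund come from the text), and the helper functions are invented for illustration.

```python
# Hypothetical bus matrix: data domain -> business process -> related dimensions.
BUS_MATRIX = {
    "trade": {
        "order":   {"date", "user", "product", "shop"},
        "payment": {"date", "user", "payment_channel"},
        "refund":  {"date", "user", "product"},
    },
    "traffic": {
        "page_view": {"date", "user", "page"},
    },
}


def processes_using(dimension):
    """Return sorted (domain, process) pairs whose business process involves the dimension."""
    return sorted(
        (domain, process)
        for domain, processes in BUS_MATRIX.items()
        for process, dims in processes.items()
        if dimension in dims
    )


def shared_dimensions():
    """Dimensions that appear in every business process across all data domains."""
    all_dims = [dims for procs in BUS_MATRIX.values() for dims in procs.values()]
    return set.intersection(*all_dims)


print(processes_using("product"))
print(sorted(shared_dimensions()))
```

Dimensions that show up across many business processes are the ones that most need a single, consistent definition, since they are what allow facts from different processes to be analyzed together.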
Summary

The OneData implementation process is highly iterative and dynamic, and generally follows a spiral approach. Once the overall architecture design is complete, iterative model design and review begin, organized by data domain.

A review mechanism is introduced during architecture design, specification definition, and model design to ensure that the model is implemented correctly.
