1. Pain point analysis
Pain points fall under three perspectives: business, technology, and product.
Business:
The indicators and dimensions of business analysis scenarios are not clearly defined;
Requirements change and iterate frequently, data reports become bloated, and data quality is uneven;
Users pay a high cost to find, check, and confirm data when analyzing a specific business problem.
Technology:
Indicator definition: naming is confusing, indicators are not unique, and the calculation caliber is maintained inconsistently;
Indicator production: construction is duplicated, and exchanging data is costly;
Indicator consumption: data outlets are not unified, output is repeated, and output calibers are inconsistent.
Product:
There is no systematic, productized support, so the data flow from production to consumption is not connected at the system and product level.
2. Management objectives
Technical objectives: unified management of indicators and dimensions, with standardized indicator naming, a single calculation caliber, a unique statistical source, standardized dimension definitions, and consistent dimension values.
Business objectives: unified data outlets and scenario-based coverage.
Product objectives: productize the indicator system management tools, and productize the content of the indicator system to support decision-making, analysis, and operations, for example the decision-making Polaris and intelligent operation analysis products.
3. Model architecture
Business segment
Definition principles: abstract at the business logic level, subdivide at the physical organizational structure level, and refine the levels according to the actual business situation. At most three levels of splitting are recommended. Level one can be fixed by a unified company-level specification; level two and below can be split by each business line according to its actual business.
For example, at the business logic level, Didi Chuxing's two-wheeled and four-wheeled vehicles both belong to the abstract travel business segment (level one). At the physical organizational structure level, this segment is subdivided into Puhui, online car-hailing, taxi, and hitchhiking (level two). Further subdivision follows actual business needs: online car-hailing can be split into solo rides and carpools, and Puhui can be split into bicycles and enterprise level.
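The three-level splitting rule can be sketched as a small tree with a depth limit. This is only an illustration: the `Segment` class, the `MAX_LEVELS` constant, and the path format are assumptions made for the example, not part of Didi's tooling.

```python
from __future__ import annotations

from dataclasses import dataclass, field

MAX_LEVELS = 3  # the text recommends at most three levels of splitting


@dataclass
class Segment:
    """One node in a hypothetical business segment hierarchy."""
    name: str
    level: int = 1
    children: list[Segment] = field(default_factory=list)

    def add(self, name: str) -> Segment:
        """Attach a child one level deeper, enforcing the depth limit."""
        if self.level >= MAX_LEVELS:
            raise ValueError(f"{self.name!r} is already at level {MAX_LEVELS}")
        child = Segment(name, self.level + 1)
        self.children.append(child)
        return child

    def paths(self, prefix: str = "") -> list[str]:
        """Return every segment as a slash-separated path from the root."""
        here = f"{prefix}/{self.name}" if prefix else self.name
        result = [here]
        for child in self.children:
            result.extend(child.paths(here))
        return result


# Level one: abstract business segment.
travel = Segment("travel")
# Level two: split by organizational structure.
hailing = travel.add("online car-hailing")
puhui = travel.add("Puhui")
travel.add("taxi")
travel.add("hitchhiking")
# Level three: split by actual business needs.
hailing.add("solo ride")
hailing.add("carpool")
puhui.add("bicycle")

for path in travel.paths():
    print(path)
```

Trying to add a fourth level, for example under "solo ride", raises `ValueError`, which mirrors the recommendation to stop at three levels.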
Data domain: a collection that abstracts business processes or dimensions for business analysis. A business process can be summarized as an indivisible behavioral event, and indicators are defined under a business process. A dimension is the environment of a measure; for example, in a passenger call-order event, the call-order type is a dimension. To keep the whole system viable, data domains need to be abstracted and refined, maintained and updated over the long term, and changed only through the change process.
Business process: a business activity event of the company, such as calling an order or making a payment. A business process cannot be split further.
Time period: specifies the time range or point in time of the statistics, such as the last 30 days, a natural week, or up to the current day.
Modifier type: an abstract grouping of modifiers. A modifier type belongs to a business domain; for example, the access terminal type in the log domain covers modifiers such as APP and PC.
Modifier: an abstraction of the business-scenario qualification of an indicator, other than its statistical dimensions. A modifier belongs to a modifier type; for example, APP and PC are modifiers under the access terminal type of the log domain.
Atomic indicator: atomic indicators and measures mean the same thing. An atomic indicator is based on a behavior within a business event, cannot be split further in its business definition, and has a clear business meaning, such as the amount paid.
Dimension: the environment of a measure. It reflects a class of business attributes, and the collection of those attributes makes up the dimension, which can also be called an entity object. A dimension belongs to a data domain. Examples are the geographic dimension (countries, regions, provinces, cities, and so on) and the time dimension (year, quarter, month, week, day, and so on).
Dimension attribute: belongs to a dimension, such as the country name, country ID, and province name in the geographic dimension.
Derived indicator: one atomic indicator + multiple modifiers (optional) + a time period. It delimits the statistical scope of the atomic indicator's business. Derived indicators are divided into the following two types:
Transactional indicators: indicators that measure business processes, for example the number of call orders or the order payment amount. For such indicators, the atomic indicators and modifiers are maintained first, and derived indicators are created on that basis.
Derivative indicators are composed from transactional indicators and stock indicators, mainly as ratios, proportions, and statistical averages.
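The composition rule "one atomic indicator + optional modifiers + a time period" can be sketched with a few immutable records. All class names, codes such as `pay_amt` and `30d`, and the naming convention used to build the code are hypothetical; the source does not specify a naming format.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class AtomicIndicator:
    code: str   # hypothetical English code, e.g. "pay_amt"
    name: str   # business name, e.g. "amount paid"


@dataclass(frozen=True)
class Modifier:
    modifier_type: str  # e.g. "access terminal type"
    code: str           # e.g. "app"


@dataclass(frozen=True)
class TimePeriod:
    code: str           # e.g. "30d" for the last 30 days


@dataclass(frozen=True)
class DerivedIndicator:
    atomic: AtomicIndicator
    period: TimePeriod
    modifiers: tuple[Modifier, ...] = ()

    @property
    def code(self) -> str:
        # Sort modifier codes so the same combination always yields one code.
        parts = sorted(m.code for m in self.modifiers)
        return "_".join([*parts, self.atomic.code, self.period.code])


def ratio(numerator: float, denominator: float) -> float | None:
    """A ratio-style compound of two indicators; None if the denominator is 0."""
    return None if denominator == 0 else numerator / denominator


pay_amt = AtomicIndicator("pay_amt", "amount paid")
app = Modifier("access terminal type", "app")
derived = DerivedIndicator(pay_amt, TimePeriod("30d"), (app,))
print(derived.code)
```

Deriving the code from its parts, rather than typing it by hand, is one way to keep an indicator name unique for a given combination of atomic indicator, modifiers, and time period.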
The model is mainly built with dimensional modeling. The basic business detail fact table mainly stores the collection of dimension attributes and measures/atomic indicators. The analysis business summary fact tables are stored by indicator category (deduplicated and non-deduplicated): the non-deduplicated summary fact table stores the collection of statistical dimensions together with atomic or derived indicators, while the deduplicated summary fact table stores only the collection of statistical labels of the analysis entity.
At the physical implementation level, the indicator system is combined with the layered architecture of the data warehouse model. Didi's indicator data is mainly stored in the DWM layer, which serves as the core management layer for indicators.
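One way to see why the two indicator categories are stored differently: additive measures can be rolled up from a per-dimension summary, while distinct counts cannot, so an entity-level table of labels is kept instead. The sketch below uses made-up rows and names purely for illustration.

```python
from collections import defaultdict

# Hypothetical detail fact rows: (city, passenger_id, paid_amount)
detail = [
    ("Beijing", "p1", 20.0),
    ("Beijing", "p2", 35.0),
    ("Shanghai", "p1", 15.0),
    ("Shanghai", "p3", 40.0),
]

# Non-deduplicated summary: one row per dimension value, additive measure.
amount_by_city = defaultdict(float)
for city, _, amount in detail:
    amount_by_city[city] += amount

# Deduplicated summary: one row per entity, carrying its set of labels.
labels_by_passenger = defaultdict(set)
for city, passenger, _ in detail:
    labels_by_passenger[passenger].add(("city", city))


def paying_passengers(cities):
    """Count distinct passengers that carry at least one of the city labels."""
    wanted = {("city", c) for c in cities}
    return sum(1 for labels in labels_by_passenger.values() if labels & wanted)


cities = list(amount_by_city)
total_amount = sum(amount_by_city.values())                # safe: additive
naive_total = sum(paying_passengers([c]) for c in cities)  # counts p1 twice
true_total = paying_passengers(cities)                     # counts p1 once
print(total_amount, naive_total, true_total)
```

Summing per-city distinct counts overstates the total because passenger `p1` paid in both cities; counting over the entity-level label table avoids that.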
Dimension management includes basic information and technical information, which are maintained and managed by different roles.
Basic information is the business information of the dimension. It is maintained by business managers, data product managers, or BI analysts, and mainly includes the dimension name, business definition, and business classification.
Technical information is the data information of the dimension. It is maintained by data developers and mainly includes whether the dimension has a dimension table (that is, whether it is an enumerated dimension or has an independent physical dimension table), whether it is a date dimension, the English and Chinese names of its code, and the English and Chinese names of its name. If the dimension has a physical dimension table, it must be bound to that table, and the fields corresponding to code and name must be set. If the dimension is an enumerated dimension, the corresponding codes and names must be filled in. Managing dimensions in a unified way helps standardize data tables later and makes dimensions easier for users to query and use.
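The binding rules for dimension metadata (a physical dimension table needs a bound table plus code and name fields; an enumerated dimension needs its code/name pairs) can be expressed as a small validation routine. The field names and example tables below are assumptions for illustration, not the schema of an actual tool.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Dimension:
    # Basic (business) information
    name: str
    business_definition: str
    # Technical information
    has_dimension_table: bool
    is_date_dimension: bool = False
    physical_table: str | None = None   # bound physical dimension table
    code_field: str | None = None       # field that holds the code
    name_field: str | None = None       # field that holds the display name
    enum_values: dict[str, str] = field(default_factory=dict)  # code -> name

    def validate(self) -> list[str]:
        """Return a list of problems; an empty list means the entry is complete."""
        problems = []
        if self.has_dimension_table:
            if not self.physical_table:
                problems.append("physical dimension table is not bound")
            if not (self.code_field and self.name_field):
                problems.append("code and name fields are not set")
        elif not self.enum_values:
            problems.append("enumerated dimension has no code/name pairs")
        return problems


city = Dimension("city", "city where the order is placed", True,
                 physical_table="dim.city",
                 code_field="city_id", name_field="city_name")
terminal = Dimension("access terminal", "client used for access", False,
                     enum_values={"app": "APP", "pc": "PC"})
broken = Dimension("channel", "acquisition channel", True)

for dim in (city, terminal, broken):
    print(dim.name, dim.validate())
```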
Indicator management includes basic information, technical information, and derivative information, which are maintained and managed by different roles.
Basic information is the business information of the indicator. It is maintained by business managers, data product managers, or BI analysts, and mainly includes attribution information (business segment, data domain, business process), basic information (indicator name, indicator English name, indicator definition, description of the statistical algorithm, and indicator type: deduplicated or non-deduplicated), and business scenario information (analysis dimensions, scenario description);
Technical information is the physical model information of the indicator. It is maintained by data developers and mainly includes the corresponding physical table and field information;
Derivative information covers the related derived or derivative indicators, the related data applications, and business scenario information. It lets users look up which other indicators and data applications use an indicator, and it supports indicator lineage analysis for tracing data sources.
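Lineage lookups of this kind amount to walking a dependency graph in both directions. The sketch below assumes a simple dictionary of edges; the indicator names, table names, and function names are invented for the example.

```python
# Hypothetical lineage edges: each item maps to what it is computed from.
upstream = {
    "ops_dashboard": ["order_pay_rate_7d"],
    "order_pay_rate_7d": ["pay_order_cnt_7d", "call_order_cnt_7d"],
    "pay_order_cnt_7d": ["dwm.order_pay_summary"],
    "call_order_cnt_7d": ["dwm.order_call_summary"],
}


def trace_sources(item, graph=upstream):
    """Return the leaf nodes (nothing recorded upstream) that `item` depends on."""
    seen, stack, leaves = set(), [item], set()
    while stack:
        node = stack.pop()
        if node in seen:
            continue
        seen.add(node)
        parents = graph.get(node, [])
        if not parents:
            leaves.add(node)
        stack.extend(parents)
    return leaves


def used_by(item, graph=upstream):
    """Return everything that depends on `item`, directly or transitively."""
    result, frontier = set(), [item]
    while frontier:
        current = frontier.pop()
        for child, parents in graph.items():
            if current in parents and child not in result:
                result.add(child)
                frontier.append(child)
    return result


print(sorted(trace_sources("ops_dashboard")))
print(sorted(used_by("pay_order_cnt_7d")))
```

`trace_sources` answers "where does this number come from", and `used_by` answers "which indicators and applications are affected if this one changes".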
Atomic indicator definition: attribution information + basic information + business scenario information
Derived indicator definition: time period + modifier set + atomic indicator
Modifier type: mainly includes type description, statistical algorithm description, and data source (optional)
5. The construction process of the indicator system
The modeling process guides engineers to abstract and classify, from a business perspective, the indicators involved in a requirement scenario. It unifies business terminology, reduces communication costs, and avoids building the same indicators again later.
The analysis data system is the physical collection of the summary fact tables in the model architecture. At the business logic level, the indicator system is abstracted by business analysis object or scenario. Didi Chuxing mainly abstracts themes by analysis object, such as the driver theme, safety theme, experience theme, and city theme. Indicators are classified mainly by abstracting actual business processes, for example driver transaction indicators, driver registration indicators, and driver growth indicators. The basic data system is the physical collection of the detail fact tables and basic dimension tables in the model architecture. At the business logic level, it is abstracted by actual business scenario, such as driver compliance and passenger registration, and it reproduces the core business processes of the business.
The development process guides engineers, from a technical perspective, in producing the indicator system, operating and maintaining it, and controlling its quality. It is also a bridge for communication and coordination between data product managers or data analysts and data warehouse R&D.
Overview of the indicator system map
The indicator system map, which can also be called the data analysis map, abstracts business analysis entities from actual business scenarios and then integrates and organizes the business classifications, analysis indicators, and dimensions involved in each entity. Construction method: build it with business thinking and from the user's perspective, link business and data closely, and organize indicators into structured categories.
It helps users quickly locate the indicators and dimensions they need, and because the indicator system is accumulated by business scenario, it can respond quickly to users' data needs.
It supports the design of subsequent indicator production models, the delineation of data content boundaries, the quantification of iterations in data system construction, and the realization of data assets.
Indicator system map model
Example of indicator system graph
Productization of the indicator system
The product set of the indicator system is built around the indicator life cycle: product tools connect the data flow so that indicator system management becomes unified, automated, and standardized. Because the essential goal of building the indicator system is to serve the business and realize data-driven business value, the core construction principle is “light on standards, heavy on scenarios, from control to service”. Combining tools, products, technology, and organization improves the efficiency with which users use data and accelerates business innovation and iteration.
Among these products, the one most closely tied to the indicator system methodology is the indicator dictionary tool. Its positioning and value are:
Carry the indicator management specification from method to tool, generate standardized indicators automatically, solve the problems of confusing indicator names and non-unique indicators, and eliminate ambiguity in the data;
Provide standard indicator calibers and metadata information externally in a unified way.
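The uniqueness guarantee described for the indicator dictionary can be pictured as a registry that refuses a second, conflicting caliber for the same indicator code. This is a minimal sketch under that assumption; the class and method names are hypothetical and do not describe the actual tool.

```python
class IndicatorDictionary:
    """A toy registry: one indicator code maps to exactly one caliber."""

    def __init__(self):
        self._entries = {}

    def register(self, code, caliber):
        """Store a caliber; re-registering the same one is a no-op."""
        existing = self._entries.get(code)
        if existing is not None and existing != caliber:
            raise ValueError(
                f"{code!r} is already defined as {existing!r}; "
                "a different caliber needs a different indicator"
            )
        self._entries[code] = caliber

    def lookup(self, code):
        """Return the single standard caliber for a code (KeyError if unknown)."""
        return self._entries[code]


dictionary = IndicatorDictionary()
dictionary.register("app_pay_amt_30d", "sum of paid amount, APP, last 30 days")
dictionary.register("app_pay_amt_30d", "sum of paid amount, APP, last 30 days")

try:
    dictionary.register("app_pay_amt_30d", "sum of order amount, APP, last 30 days")
    conflict_rejected = False
except ValueError:
    conflict_rejected = True

print(dictionary.lookup("app_pay_amt_30d"), conflict_rejected)
```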
Tool design process (Methodology -> Definition -> Production -> Consumption)
This completes the overall introduction to the methodology and practice of indicator system construction and to the construction of the tool products. The indicator dictionary and the development tools are already connected end to end; the connection with data consumption products will be provided in the future as data services through DataAPI.