2022 Episode 055

0. Introduction

1. Data technology challenges to security

2. Tokenization – Digital World Bank System

3. Introduction to tokenization solutions

3.1 What is tokenization

3.2 Tokenization basic design

3.3 Token generation logic

3.4 Logical architecture of tokenization scheme

3.5 Tokenized application panorama

4. Tokenized security implementation

4.1 Security essentials of tokenization

4.2 Security Risks and Security Design

5. Engineering practice experience

6. Unfinished Matters

Enterprises face two demands at once: they must innovate with data technology to survive and grow, and they must also keep user data secure. In choosing and balancing between the two, some enterprises have fallen, while others have survived and found new vitality.

This shows that only by changing our thinking and having the courage to innovate can we turn crises into opportunities and develop over the long run. We must recognize the turning point: the digital era has shifted from the extensive, inefficient "carbon growth" of its first half to high-quality, efficient "data carbon neutrality" built on efficient data management and governance. To survive and stand out in this transformation, technological innovation is a key lever, and the focus is on grasping two core ideas:

In the digital application environment, data has the following characteristics:

The tokenization solution is modeled on the real-world banking system. Before banks existed, economic activity consisted mainly of cash transactions. The overexposure of cash led to widespread theft and robbery; although armed escort agencies flourished, only a few wealthy people could afford to hire them, so a large amount of social wealth was lost. The banking system arose in response: as soon as users obtain cash, they deposit it in a bank in exchange for a deposit balance (an equivalent substitute), and it is this substitute, electronic money, that circulates throughout society; it is exchanged back for cash only in a few cases. As the banking system has spread and online payment applications have become popular, scenarios that use cash have become rarer and rarer. Anyone who wants to steal money can only go to the bank, and the bank is heavily protected.

As shown in Figure 3 above, by promoting tokenization, we can reduce the number of services that can actually access plaintext to double digits and reduce the exposure of data services to less than 1%.

Tokenization is a scheme that replaces sensitive personal data with an equivalent, non-sensitive substitute called a token. The token circulates in business systems in place of the original data, which reduces data risk and helps meet privacy compliance requirements. Tokenization is a kind of de-identification technology. It first appeared in the payment card industry (PCI) to replace bank card numbers (PANs), and there is now a trend toward using it to replace personally identifiable information (PII) in general digital scenarios.
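The definition above can be made concrete with a minimal sketch. It assumes a keyed hash (HMAC-SHA256 with a secret salt) as the token generator and an in-memory dictionary as the mapping store; the article does not specify either choice, and the `TokenVault` class, the `tok_` prefix, and the sample card number are hypothetical names used only for illustration.

```python
import hashlib
import hmac
import secrets
from typing import Dict, Optional


class TokenVault:
    """Toy in-memory vault that maps tokens back to plaintext (hypothetical design)."""

    def __init__(self, salt: Optional[bytes] = None) -> None:
        # The salt is the secret that makes tokens unguessable from plaintext alone.
        self._salt = salt or secrets.token_bytes(32)
        self._token_to_plain: Dict[str, str] = {}

    def tokenize(self, plaintext: str) -> str:
        # A keyed hash yields the same token for the same input, so tokens
        # can still be used as join keys in downstream systems.
        digest = hmac.new(self._salt, plaintext.encode("utf-8"), hashlib.sha256)
        token = "tok_" + digest.hexdigest()[:32]
        self._token_to_plain[token] = plaintext
        return token

    def detokenize(self, token: str) -> str:
        # Only the vault can map a token back to the original value.
        return self._token_to_plain[token]


vault = TokenVault()
card_number = "6222020000000000000"  # made-up sample value
token = vault.tokenize(card_number)
```

Business systems would store and pass around `token` only; a production vault would also keep the plaintext encrypted rather than in a plain dictionary.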

1. Availability Implementation

2. Invisibility implementation

To provide data protection in complex scenarios, a tokenization solution must meet several main architectural requirements:

Tokenization services need to meet the compatibility, security, and availability requirements of all business scenarios, mainly by offering a variety of access and integration options and by integrating the necessary security measures. The tokenization service is logically divided into an access layer, a service layer, and a storage layer.

Component Description:

1. Online data sources

Online data sources are the main source of sensitive data. Once sensitive data enters the company, it must be converted into a token through the tokenization service API and stored in the database in that form. In some scenarios, the data also flows into the data warehouse. The data source also shares sensitive data with downstream consumers, which can be done through APIs, MQ, or shared storage media such as S3.

2. Data source of the data warehouse

Sensitive data that is loaded into the data warehouse, either directly or from online sources, must pass through a tokenization task that converts the plaintext into tokens before it is provided to downstream big data applications.

3. Tokenized Services

a) The tokenization online service exposes APIs that exchange plaintext for tokens, serving online transactions and real-time tasks.

a) Distribute the encryption key for tokenization and encrypt the plaintext into a ciphertext field.

a) Regular intermediate applications: services that can complete their business functions using tokens alone. They get the token from the data source and pass it downstream.
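The warehouse tokenization task described in the component list can be sketched as a batch step that rewrites only the sensitive columns before rows are handed downstream. This is an illustrative sketch, not the article's implementation: `fake_tokenize` is a hypothetical stand-in for a call to the tokenization service API, and the field names are made up.

```python
import hashlib
from typing import Callable, Dict, Iterable, List, Optional

Row = Dict[str, Optional[str]]


def fake_tokenize(plaintext: str) -> str:
    """Hypothetical stand-in for the tokenization service API.

    An unsalted hash is NOT a secure token; it only keeps the sketch runnable.
    """
    return "tok_" + hashlib.sha256(plaintext.encode("utf-8")).hexdigest()[:16]


def tokenize_rows(
    rows: Iterable[Row],
    sensitive_fields: List[str],
    tokenize: Callable[[str], str],
) -> List[Row]:
    """Return copies of the rows with sensitive fields replaced by tokens."""
    result: List[Row] = []
    for row in rows:
        converted = dict(row)  # never mutate the input batch
        for field in sensitive_fields:
            value = converted.get(field)
            if value is not None:  # leave missing or NULL values untouched
                converted[field] = tokenize(value)
        result.append(converted)
    return result


raw_rows = [
    {"user_id": "1", "phone": "13800000000"},
    {"user_id": "2", "phone": None},
]
warehouse_rows = tokenize_rows(raw_rows, ["phone"], fake_tokenize)
```

Non-sensitive columns such as `user_id` pass through unchanged, so downstream big data jobs can keep joining and aggregating on them.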

1. Security risks and controls of the tokenization service itself

(2) It can only be used within the runtime of the tokenized service; 

(3) Rotate the salt regularly, daily or weekly is recommended, and securely delete salts that are no longer in use;

b) Tokenization runtime security: Tokenized services use dedicated systems and have been specially hardened. 

c) Tokenized storage security: Considering the big data scenario and various storage requirements, tokenized storage itself does not store sensitive information, but only contains indexes, tokens, and ciphertexts. At the same time, tokenized storage requires strict access control. 

(1) The API requires reliable service authentication; it is recommended to enable mTLS plus OAuth2 tickets and to turn on access log auditing;

(2) The token-to-plaintext exchange returns only ciphertext; the requesting service decrypts it locally with a key obtained from KMS, so decryption permission is controlled centrally;
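The ciphertext-only exchange in item (2) can be illustrated with a small sketch. Everything here is an assumption made for illustration: `FakeKMS`, `TokenService`, and `decrypt_locally` are hypothetical names, the caller names are made up, and `_keystream_xor` is a toy cipher that stands in for a real authenticated cipher such as AES-GCM. It must not be used to protect real data.

```python
import hashlib
import hmac
import secrets
from typing import Dict, Set


def _keystream_xor(key: bytes, nonce: bytes, data: bytes) -> bytes:
    """Toy stream cipher for illustration only; NOT real cryptography."""
    stream = bytearray()
    counter = 0
    while len(stream) < len(data):
        block = hmac.new(key, nonce + counter.to_bytes(4, "big"), hashlib.sha256)
        stream.extend(block.digest())
        counter += 1
    return bytes(a ^ b for a, b in zip(data, stream))


class FakeKMS:
    """Hypothetical key service: hands the key only to authorized callers."""

    def __init__(self, authorized: Set[str]) -> None:
        self._key = secrets.token_bytes(32)
        self._authorized = set(authorized)

    def get_key(self, caller: str) -> bytes:
        if caller not in self._authorized:
            raise PermissionError(f"{caller} is not allowed to decrypt")
        return self._key


class TokenService:
    """Stores only token -> ciphertext; never returns plaintext."""

    def __init__(self, kms: FakeKMS) -> None:
        self._key = kms.get_key("token-service")
        self._store: Dict[str, bytes] = {}

    def tokenize(self, plaintext: str) -> str:
        token = "tok_" + secrets.token_hex(8)
        nonce = secrets.token_bytes(12)
        body = _keystream_xor(self._key, nonce, plaintext.encode("utf-8"))
        self._store[token] = nonce + body
        return token

    def detokenize(self, token: str) -> bytes:
        return self._store[token]  # ciphertext only


def decrypt_locally(kms: FakeKMS, caller: str, blob: bytes) -> str:
    key = kms.get_key(caller)  # the central permission check happens here
    nonce, body = blob[:12], blob[12:]
    return _keystream_xor(key, nonce, body).decode("utf-8")


kms = FakeKMS({"token-service", "billing"})
service = TokenService(kms)
token = service.tokenize("13800000000")
blob = service.detokenize(token)
plain = decrypt_locally(kms, "billing", blob)
```

A caller that is not on the KMS allow list can still fetch `blob`, but cannot turn it into plaintext, which is the point of separating the exchange from the decryption permission.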

2. Secondary security risks introduced by upstream and downstream services and applications in the ecosystem

Both data sources and downstream plaintext consumers are authorized to call the tokenization interface, so either one is technically able to call it remotely and enumerate the full mapping between tokens and plaintext. Security measures therefore need to extend to these systems and their users, to ensure that mistaken behavior or program vulnerabilities on their side do not cause data leakage.

b) Strictly prohibit any unauthorized handling of plaintext, in particular forwarding or re-storing data that maps tokens to plaintext;

c) It is forbidden to set up a proxy, and the data service subject must directly connect with the tokenized service;

d) All ecosystems must undergo a full security review, including subsequent changes. Ensure baseline compliance;

e) Bring all upstream and downstream services into the monitoring system, including their storage, data interfaces, application code logic, and data lineage;
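One way to monitor for the traversal risk described above is to count each caller's token-to-plaintext requests in a sliding time window and flag callers that exceed a quota. The sketch below is an assumption for illustration, not a control the article specifies; the `DetokenizeMonitor` class, the caller name, and the thresholds are hypothetical.

```python
from collections import defaultdict, deque
from typing import DefaultDict, Deque


class DetokenizeMonitor:
    """Flags callers whose detokenize volume looks like mapping traversal."""

    def __init__(self, max_calls: int, window_seconds: float) -> None:
        self.max_calls = max_calls
        self.window_seconds = window_seconds
        self._calls: DefaultDict[str, Deque[float]] = defaultdict(deque)

    def record(self, caller: str, now: float) -> bool:
        """Record one call at time `now`; return True while within quota."""
        calls = self._calls[caller]
        calls.append(now)
        # Drop timestamps that have fallen out of the sliding window.
        while calls and calls[0] <= now - self.window_seconds:
            calls.popleft()
        return len(calls) <= self.max_calls


monitor = DetokenizeMonitor(max_calls=3, window_seconds=60.0)
results = [monitor.record("order-service", t) for t in (0.0, 1.0, 2.0, 3.0)]
after_quiet_period = monitor.record("order-service", 100.0)
```

In practice the signal would feed the audit log and alerting rather than a return value, and quotas would be set per service from its normal traffic.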

Tokenized services are not complicated by design, and once implemented, they will completely change the organization’s data usage habits and fundamentally solve the contradiction between data use efficiency and security compliance.

However, this strong protection depends on changing how data is used and breaking the old habit of working with plaintext, so the rollout faces great challenges: unmaintained application code, redundant and messy historical data, and complex, tangled access logic all obstruct system transformation. Every business that handles sensitive data must cooperate with the transformation. A project of this scale has to be coordinated across process planning, organizational support, and technical support, and Meituan accumulated a good deal of experience in promoting this transformation across the company that can serve as a reference.

Subsequently, data security governance will continue to be extended.

At the data level, tokenization does not solve the problem of unstructured data such as pictures and videos, which may need to be protected directly with encryption. Nor does it solve the problem of data exchange across enterprise trust boundaries, which requires newer technologies such as privacy-preserving computation and secure multi-party computation. The main target of tokenization is structured PII stored in databases and Hive. Semi-structured data hidden in JSON and unstructured PII in logs and files are not covered and must be handled with powerful data discovery and data governance tools.
