Click on the blue letters above to follow us

Relational database itself is relatively easy to become a bottleneck in system performance, stand-alone storage capacity, number of connections, processing capacity, etc. are very limited, the “statefulness” of the database itself makes it not as easy to scale as the Web and application servers. Under the test of massive data and high concurrent access in the Internet industry, smart technicians have proposed database sub-table technology (also known as sharding and sharding in some places). At the same time, popular distributed system middleware (such as MongoDB, ElasticSearch, etc.) all support Sharding in a friendly way, and its principles and ideas are similar.

At present, for the optimization of massive data, its database sub-table is MySQL eternal topic, generally believe that MySQL is a simple database, after a certain amount of data processing query efficiency is reduced, if you need to continue to maintain high performance operation, you must be database or table. Regarding the amount of data is the limit of how large, this article will not discuss, students who study source code have confirmed that the problem of too large lock granularity inside MySQL or Innodb greatly limits MySQL’s ability to provide QPS or process large-scale data. At this point, the average user has to wait for the official continuous launch of the optimized version.

From the perspective of general O&M, under what circumstances do we need to consider database sharding and table sharding?

First of all, the database sharding table mentioned here refers to the physical splitting of database data into multiple instances or multiple machines, rather than slicing in situ similar to partition tables.

Principle zero: If you can not divide, you will not divide

Yes, MySQL is a relational database, and the relationships between database tables map business logic from a certain perspective. Any behavior of database and table separation will improve the complexity of business logic to some extent, and in addition to the storage and access of data hosted by the database, it is also one of the important tasks to assist the business to better implement the requirements and logic. Database shards and tables bring about multiple consequences such as data consolidation, separation of query or update conditions, separation of transactions, etc., and the complexity of business implementation often doubles or increases exponentially. Therefore, before sorting the database and sub-tables, don’t divide for the sake of partitioning, do other things that you can do, such as upgrading hardware, upgrading, upgrading the network, upgrading the database version, reading and writing splitting, load balancing, and so on. The premise of all database shards is that you have tried your best.

Principle 1: The amount of data is too large, and normal O&M affects normal business access

The operation and maintenance mentioned here, for example:

1) A backup of the database. If a single table or a single instance is too large, a large number of disk IO or network IO resources are required to make backups. For example, 1T of data, when the network transmission occupies 50MB, it takes 20,000 seconds to transmit it, and the maintenance risk in this whole process is higher than usual. Our approach at Qunar is to add a second network card to all database machines for backup, or SST, Group Communication, and other internal data transfers. 1T data backup, will also occupy a lot of disk IO, if it is SSD is fine, of course, here ignore some manufacturers of products in the centralized IO will have some BUG problems. If it is an ordinary physical disk, xtrabackup is executed without limiting the flow, and the instance is basically unavailable.

2) Modification of the data table. If a table is too large, when DDL is done on this table, MySQL will lock the whole table, which may be very long, and the business cannot access the table during this time, which has a great impact. The solution is similar to Tencent Game DBA’s own transformation of the online second table, but they can only add fields at present, which is still invalid for other DDLs; Or use pt-online-schema-change, of course, in the process of use, it needs to build triggers and shadow tables, but also takes a long, long time, all the time in the process of this operation, can be regarded as risk time. Splitting the table and reducing the total amount helps to improve this risk.

3) The entire table hotspot, data access and update frequently, often have locks waiting, you do not have the ability to modify the source code, reduce the granularity of the lock, then will only physically split the data in it, exchange space for time, disguised to reduce access pressure.

Principle two: The table is not well designed and needs to be split vertically for some fields

Here’s an example, if you have a user table, it might look like this when it was originally designed:

Imagine scenario one: your business wins the lottery and the number of users soars from 100w to 1 billion. In order to count active users, you record each person’s recent login time when he logs in. And the user is very active, constantly to update this login_time, engage in your table is constantly being updated, the pressure is very large. So, at this time, as long as you consider splitting it, from the perspective of the business, the best way is to split the last_login_time first, we call it user_time. In doing so, the code of the business can only be modified when this field is used. If you don’t do this and just slices the users table horizontally, then all the places that access the users table must be modified. Maybe you say, I have proxy, I can merge data dynamically. So far I’ve never seen anyone whose proxy doesn’t affect performance.

Scenario two: personal_info this field is not useful, you just let the user fill in some personal hobbies when registering, basically do not query. It didn’t matter if there was it in the beginning. But then I found two problems, first, this field takes up a lot of space, because it is text, there are many people who like to introduce themselves at length. What’s worse is that second, I don’t know which product manager on a whim, saying that personal information is allowed to be made public, so that everyone can better understand each other. Then, under the influence of everyone’s curiosity and voyeurism, access to this field has increased significantly. The database pressure can not resist instantly, at this time, we have to consider the vertical split of this table.

Principle Three: Some data tables have seen infinite growth

Good examples are given, various comments, messages, logging. This growth is not proportional to the population, but uncontrollable, such as the Weibo feed broadcast, I send a message, will spread to many, many people. Although the principal may only have one copy, it is not excluded that some indexes or routes have such storage requirements. At this time, increasing storage and improving the configuration of the machine has been pale and weak, and horizontal slicing is the best practice. There are many criteria for splitting, by user, by time, by use, not one by one example.

Principle Four: Security and usability considerations

This is easy to understand, eggs do not put in a basket, I do not want my database to have problems, but I hope that when there is a problem do not affect 100% of the users, the less the proportion of this impact, the better, then, horizontal slicing can solve this problem, the user, inventory, orders, etc. are originally with the unified resource slicing, each small database instance to bear a small part of the business, so that the overall availability will be improved. This is still more suitable for a business like Qunar, between people, between some inventory and inventory, the correlation is not too large, you can do some such cutting.

Principle 5: Business coupling considerations

This is a bit similar to the above, mainly at the level of business, our train ticket business and roast leg of lamb business is completely unrelated business, although the amount of data for each business may not be too large, put in a MySQL instance is completely no problem, but it is likely that the DBA or developer level of the roasted leg of lamb business is very poor, and it is impossible to give you some moths at every turn, directly hang up the database. At this time, although the personnel of the train ticket business were very skilled and worked hard, they were still spanked by the boss. The solution is simple: can’t afford it, can’t afford it.

Split vertically

There are two common types of vertical splitting: vertical database and vertical table splitting. Vertical table splitting is more common in daily development and design, and the popular saying is called “large table splitting small table”, and the split is based on “columns” (fields) in relational databases. Usually, there are many fields in a table, you can create a new “extended table”, and split the fields that are not frequently used or have a large length into the “extended table”, as shown in the following figure:

In the case of many fields, splitting is indeed easier to develop and maintain (I have seen a legacy system with a large table containing more than 100 columns). In a sense, it can also avoid the problem of “spreading” (MySQL, MSSQL underlying are stored through “data pages”, “spread” problems may cause additional performance overhead, not expanded here, interested friends can consult relevant materials for research).

Splitting fields is recommended to be done during the database design phase. If it is split in the development process, you need to rewrite the previous query statement, which will bring additional costs and risks, and caution is recommended.

Vertical database is based on the correlation of the data table in the database for splitting, for example: a database has both user data and order data, then vertical splitting can put user data into the user database, order data into the order library. Vertical table is a way to vertically split the data table, the common is to split a multi-field large table by common fields and non-use fields, the number of data records in each table is generally the same, but the fields are not the same, using the primary key association.

In addition, in today’s prevalence of “microservices” has become very popular, according to the business module to divide different databases, is also a vertical split. Instead of putting all the data tables in the same database as in the early days. As shown in the following figure:

Vertical splitting advantages:

You can make the row data smaller, and a block can hold more data, and the number of I/Os when querying is reduced (fewer blocks are read per query).

The purpose of maximizing the use of Cache can be achieved by putting together fields that do not change frequently when splitting vertically, and putting together fields that change frequently.

Simple data maintenance.

Disadvantages of vertical splitting:

The primary key is redundant and you need to manage the redundant columns.

The table join operation that causes the table join (which increases CPU overhead) can reduce database pressure by joining on the business server.

There is still the problem of too much data per table (horizontal splitting is required).

Transaction processing is complex.

Vertical split summary:

The “service-oriented” split operation at the system level can solve the coupling and performance bottlenecks at the business system level, which is conducive to the expansion and maintenance of the system. The splitting at the database level is also the same. Similar to the “governance” and “downgrade” mechanisms of services, we can also “hierarchically” manage, maintain, monitor, and scale different types of business types of data.

As we all know, the database is often the easiest to become the bottleneck of the application system, and the database itself is “stateful”, compared to the Web and application server, it is more difficult to achieve “scale-out”. In the case of high concurrency, vertical database can break through the bottleneck of IO, number of connections and stand-alone hardware resources to a certain extent, which is an important means to optimize the database architecture in large-scale distributed systems.

Then, many people do not fundamentally understand why they want to split, nor do they master the principles and techniques of splitting, but blindly imitate the practices of large manufacturers. Causes many problems (e.g. cross-library joins, distributed transactions, etc.) after splitting.

Split horizontally

Horizontal splitting is to store data shards through a certain strategy, divided into two parts: database table and database table, each piece of data will be scattered to different MySQL tables or libraries, to achieve distributed effect, can support a very large amount of data.

The database sub-table, just to solve the problem of a single table data is too large, because the table data is not distributed to different machines, so it does not have much effect on reducing the pressure of MySQL server, we still compete for IO, CPU, network on the same physical machine, this has to be solved by sub-database sub-table.

The most common way is to hash and split after modulo through fields such as primary key or time. This is shown in the following figure:

There are two kinds of current sub-tables: static sub-tables and dynamic sub-tables:

Static sub-table: estimate the amount that the table can achieve in advance, and then directly calculate the number of tables that need to be created according to how much data each table needs to store. For example, 100 million data per table 100W data then you have to build 100 tables, and then calculate each piece of data stored in that table through a certain hash algorithm. It’s actually a bit like using a partition table. One of the fatalities of the static sub-table is that when so many tables are not satisfied, the difficulty and cost of re-expansion will be very high.

Dynamic table splitting: It is also a table of big data quantities, which can avoid the sequelae caused by static table sharding. Of course, there is also a need to have something more in design (which is often acceptable to us).

In a sense, some systems use “separation of hot and cold data” (migrating some less used historical data to other databases). In terms of business functions, queries that usually only provide hot data by default are similar. In the scenario of high concurrency and massive data, database sub-table can effectively alleviate the performance bottleneck and pressure of single machine and single database, and break through the bottleneck of IO, number of connections, and hardware resources. Of course, the cost of the invested hardware will also be higher. At the same time, this also introduces some complex technical problems and challenges (e.g., complex queries across shards, cross-shard transactions, etc.).

Advantages of horizontal splitting:

There is no single-database big data and high concurrency performance bottlenecks.

There are fewer transformations on the application side.

Improves system stability and load capacity.

Horizontal split disadvantages:

Shard transaction consistency is difficult to resolve.

Cross-node Join has poor performance and complex logic.

Data is difficult to expand multiple times and the amount of maintenance is extremely large.

Problems and solutions brought about by vertical database triage:

Cross-library join issues

Before splitting, many of the lists and detail pages in the system can be done with sql join. After splitting, the database may be distributed on different instances and different hosts, and joins will become very troublesome. And based on architectural specifications, performance, security and other considerations, it is generally forbidden to join across libraries. So what to do? First of all, we must consider the design of the vertical sub-library, if you can adjust, then give priority to adjustment. If the situation cannot be adjusted, the following author will combine the actual experience of the past, summarize several common solutions and analyze their applicable scenarios.

Several solutions for cross-library Join

Global tables

The so-called global table is a number of tables that may be dependent on by all modules in the system. Compare it to what we understand as a “data dictionary”. To avoid cross-library joins, we can keep a copy of such tables in every other database. At the same time, this type of data is often rarely modified (or even almost nevertheless), so there is no need to worry too much about “consistency”.

Field redundancy

This is a typical anti-paradigm design, which is more common in the Internet industry, usually for performance to avoid join queries.

For example, a very simple scenario in the e-commerce business: while saving the “Seller Id” in the “order table”, the seller’s “Name” field is also redundant, so that when querying the order details, there is no need to query the “Seller User Table”.

Field redundancy can bring convenience and is a manifestation of “space for time”. However, its application scenarios are also relatively limited, which is more suitable for cases with fewer dependent fields. The most complex is the problem of data consistency, which is difficult to guarantee, and can be guaranteed with the help of triggers in the database or at the business code level. Of course, it is also necessary to look at the consistency requirements in light of actual business scenarios. Just like the example above, if the seller modifies the Name, does it need to be updated in the order information synchronously?

Data synchronization

The tab_a table in the Timer A database is tbl_b related to the B database, and the specified table can be synchronized at regular intervals. Of course, synchronization would have had an impact on the database, requiring a balance between performance impact and data timeliness. This avoids complex cross-database queries. I used to implement it through ETL tools in my projects.

System layer assembly

At the system level, by calling components or services of different modules, data is obtained and field assembly is performed. It’s easy to say, but it’s not that simple in practice, especially when there are problems with database design but can’t be easily adjusted. The situation is often complex.

Cross-database transactions (distributed transactions) issues

After splitting the database by business, the problem of “distributed transactions” is inevitable. To understand distributed transactions, you need to understand the “XA interface” and “two-phase commit”. It is worth mentioning that xa support in MySQL 5.5x and 5.6x is problematic and can lead to master-slave data inconsistencies. It wasn’t fixed until version 5.7x. Java applications can implement XA transactions (JTA in J2EE) using the Atomikos framework. Interested readers can refer to the Distributed Transaction Consistency Solution by themselves, link to

Distributed globally unique ID

In many small and medium-sized projects, we often use the database self-increment feature directly to generate the primary key ID, which is really simple. In the environment of database sharding and tables, the data is distributed on different shards, and it can no longer be directly generated with the help of the self-growth characteristics of the database, otherwise the primary keys of the data tables on different shards will be duplicated. A brief introduction to several ID generation algorithms that have been used and understood.

1. Twitter’s Snowflake (aka “Snowflake Algorithm”)

2. UUID/GUID (supported by both general applications and databases)

3. MongoDB ObjectID (UUID-like approach)

4. Ticket Server (database surviving, Flickr uses this way)

Among them, Twitter’s Snowflake algorithm is the most used in distributed systems projects in recent years, and no problems of duplication or concurrency have been found. The algorithm generates a 64-bit unique Id (consisting of a 41-bit timestamp + 10-bit custom machine code + a 13-bit cumulative counter). There is not much to introduce here, and interested readers can consult the relevant materials on their own.

How to select the shard field

Before starting a shard, we first determine the shard field (also known as the “shard key”). Many common examples and scenarios are split using ID or time fields. This is not absolute, my suggestion is to combine the actual business, through statistical analysis of the sql statements executed in the system, to select the most frequently used or the most important field in the table that needs to be sharded as the shard field.

Common sharding strategies are random sharding and continuous sharding, as shown in the following figure:

When you need to use shard fields for range lookup, continuous shards can quickly locate shards for efficient querying, and in most cases can effectively avoid the problem of cross-shard queries. Later, if you want to expand the entire sharded cluster, you only need to add nodes, and you do not need to migrate the data of other shards. However, continuous sharding may also have the problem of data hotspots, just like the example of sharding by time field in the figure, some nodes may be frequently queried, and hot data nodes become the bottleneck of the entire cluster. Some nodes may store historical data and rarely need to be queried.

Random sharding is not actually random, and it also follows certain rules. Usually, we use the hash modulo method to split the shard, so sometimes it is also called discrete sharding. The data of random sharding is relatively uniform, and it is not easy to have bottlenecks such as hot spots and concurrent access. However, later sharding clusters need to migrate old data when they are expanded. The use of the consistent hash algorithm can largely avoid this problem, so many middleware sharding clusters will use the consistent hash algorithm. Discrete shards are also prone to complex problems with cross-shard queries.

Data migration, capacity planning, capacity expansion, etc

Few projects will start to consider sharding design at an early stage, and will generally be prepared in advance when the rapid development of the business faces performance and storage bottlenecks. Therefore, it is inevitable that the problem of historical data migration needs to be considered. The general practice is to read out the historical data through the program, and then write the data to each shard node according to the specified sharding rules.

In addition, we need to calculate how many shards are needed based on the current data volume and QPS, etc., and calculate how many shards are needed (generally it is recommended that the amount of single-table data on a single shard should not exceed 1000W).

If you use random sharding, you need to consider the later expansion problem, which will be relatively troublesome. If range sharding is used, only nodes need to be added to automatically expand.

Sort pagination across shards

In general, pagination requires sorting by specified fields. When the sort field is a shard field, we can easily locate the specified shard through the shard rule, and when the sort field is not a shard field, the situation will become more complicated. For the accuracy of the final result, we need to sort and return the data in different shard nodes, summarize and sort the result sets returned by different shards, and finally return to the user. This is shown in the following figure:

The image above is just the simplest case (taking the first page of data) and doesn’t seem to have much of a performance impact. However, if you want to take out the data on page 10, the situation will become much more complicated, as shown in the following figure:

Some readers may not quite understand why it can’t be handled as simply as it is to get the first page of data (sorting takes out the first 10 and then merging and sorting). In fact, it is not difficult to understand, because the data in each shard node may be random, in order to sort accurately, the first N pages of data of all shard nodes must be sorted and merged, and finally the overall sorting must be performed. Obviously, such operations are more resource-intensive, and the more the user turns backwards, the worse the system performance will be.

Function handling across shards

When using functions such as Max, Min, Sum, Count, etc. for statistics and calculations, it is necessary to perform the corresponding function processing on each shard data source, and then reprocess each result set, and finally return the processing result. This is shown in the following figure:

Past Recommendations

Time series database InfluxDB engine analysis

Quintessence! Analyze 12 Zookeeper knowledge points in a simple way

Can’t tell the difference between ARM and X86 architecture, let’s look at the product form of the server

Get a comprehensive understanding of the OpenStack architecture

MAC authentication technology

An overview of access and authentication technologies

Poke here