First, an analysis of the expansion schemes

At the beginning of the project, we deployed three databases, A, B, and C, whose combined capacity was sufficient for the business needs at the time. To spread the data evenly across the three databases, we shard by uid%3 (modulo) at the service layer.

As shown in the figure:

Later, as the number of users grew, user-generated data kept being written into the databases until they reached their optimal storage capacity. Adding more data beyond this point slows down basic CRUD operations and drags down the response time of the whole service.

At this point we need to add nodes and scale the databases horizontally. After a new database D is added, the number of shards changes from 3 to 4.

As shown in the figure:

Now, because the sharding rule has changed (uid%3 becomes uid%4), most of the existing data can no longer be located under the new rule. It has to be re-allocated, which means a large amount of data migration. For example, for uid=3 the old rule gives 3%3=0, so the record lives in database A; after database D is added, the new rule gives 3%4=3, so the record belongs in database D.

Adding one node this way means roughly 90% of the data has to be migrated, which puts heavy pressure on the databases and makes the service very unstable.

2. Downtime scheme

Publish an announcement

Because the data has to be re-split, users must be notified before the service is stopped, for example: "Our service will be upgraded at yyyy-MM-dd; we apologize for any inconvenience."

Stop the service

Shut down the service.

Offline data migration (splitting, reallocating data)

Split and redistribute the data in the old databases according to the new sharding algorithm of the service layer.

Data validation

Develop a custom program to verify and compare the data in the old and new databases.
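As a minimal sketch of where such a comparison can start (the table name account and key uid are assumptions for illustration; the original does not name them), coarse checks such as row counts and table checksums can be run on the old and new databases before any row-by-row comparison:

```sql
-- Run against both the old and the new databases and compare the results.
-- `account` and `uid` are illustrative names.
SELECT COUNT(*) AS row_cnt, MIN(uid) AS min_uid, MAX(uid) AS max_uid FROM account;

-- Identical data produces identical checksums (MariaDB/MySQL).
CHECKSUM TABLE account;
```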

Change the configuration

Modify the sharding algorithm in the service-layer configuration, i.e. change the original uid%3 to uid%4.

Restore service

Restart the service.

Rollback plan

Each of the steps above must have a data rollback plan: if any step (such as data migration or service restoration) fails, roll back immediately and start over.

Stopping the service guarantees that the migration can proceed undisturbed, but the downtime hurts the user experience and creates time pressure: the migration must be completed within the announced window.

3. Stop-write scheme

Support for read/write splitting

The databases support read/write splitting. Before the expansion, every database serves both reads and writes; during the data redistribution, every database is set to read-only and writes are turned off.
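One way to enforce the read-only state at the database level, in addition to the service-layer interception described below, is MariaDB's read_only flag. A sketch (note that accounts with the SUPER privilege can still write, so this complements rather than replaces the interception):

```sql
-- Put a shard into read-only mode for the duration of the redistribution.
SET GLOBAL read_only = 1;

-- Restore read/write once migration and verification are finished.
SET GLOBAL read_only = 0;
```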

Upgrade announcement

Because the data has to be re-split, users must be notified before writes are stopped, for example: "Our service will be upgraded at yyyy-MM-dd; we apologize for any inconvenience."

Interrupt write operations and isolate the write data sources (or intercept writes and return a unified message)

At the service layer, all write requests are intercepted and a unified message is returned, such as "the service is being upgraded"; only read operations are served externally.

Data synchronization

Redistribute and migrate (copy) the data in the old databases according to the new sharding algorithm of the service layer.

Data validation

Develop a custom program that backs up the data in the old databases and then verifies the reallocated data against the backup.

Change the configuration

Through the configuration center, modify the sharding algorithm of the service layer, i.e. change the original uid%3 to uid%4; this does not require restarting the service.

Resume write operations

Restore the read/write state of the databases and remove the write interception at the service layer.

Data cleanup

Use DELETE statements to remove the redundant data.

Rollback plan

Each of the steps above must have a data rollback plan: if any step (such as data migration) fails, roll back immediately and start over.

Disadvantage: the data copy takes a long time, so writes stay suspended for too long; the data has to be copied first and the redundant data cleaned up afterwards.

4. Log scheme

The core idea is to drive the migration through logs. The main steps are as follows:

1) Before the data migration, the business application keeps accessing the old database nodes.

2) Logging

Before the upgrade, start recording a log of every data modification made to the old databases (insertions, deletions, and updates). The log does not need to capture the detailed row data; it mainly records:

The database that was modified;

The table that was modified;

The unique primary key of the modified row;

The type of operation.

The log does not care which fields were written or in what format; it only records the items above, so the log format stays fixed and the tooling remains generic.
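A minimal sketch of such a change log stored as a table (the table and column names are illustrative; the original does not prescribe a storage format, and a file or an MQ topic would work just as well):

```sql
-- Change log written by the upgraded service alongside each write to the old databases.
CREATE TABLE migration_change_log (
  id         BIGINT AUTO_INCREMENT PRIMARY KEY,
  db_name    VARCHAR(64)  NOT NULL,  -- which database was modified
  tbl_name   VARCHAR(64)  NOT NULL,  -- which table was modified
  pk_value   VARCHAR(128) NOT NULL,  -- unique primary key of the modified row
  op_type    ENUM('INSERT','UPDATE','DELETE') NOT NULL,  -- type of operation
  created_at DATETIME NOT NULL DEFAULT CURRENT_TIMESTAMP,
  KEY idx_created_at (created_at)
);
```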

Upgrading the service to write this log carries little risk:

Write and update interfaces are a minority, so the change surface is small;

The upgrade only adds logging, which is done asynchronously and has little impact on business functionality.

3) Data migration

Develop a custom data migration tool to migrate data from the old databases to the new ones.

Throughout the process, the old databases keep serving production traffic.

The data synchronization tool is not complex to implement.

It only reads from the old databases; if synchronization goes wrong, the new databases can simply be rolled back.

The migration can be throttled or run in batches, so there is no time pressure.

Even after the bulk migration finishes, you still cannot switch traffic to the new databases.

Because the old databases are still serving online traffic, their data keeps changing, and those changes have not been synchronized to the new databases; the old and new data are inconsistent, so a direct switch is impossible until the data is fully synchronized.

4) Incremental log migration

Develop a log replay tool to catch up on the data that changed during the bulk migration above. The processing steps are as follows (a sketch follows the list):

Read the change log to find the databases, tables, and primary keys that were modified;

Read the corresponding rows from the old databases by primary key;

Replace the rows in the new databases based on the primary key.
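A sketch of the replay for a single log entry, reusing the illustrative migration_change_log and account names from above (the real tool reads the current row from the old database by primary key and writes it to the new one):

```sql
-- Executed against the new database for each logged primary key.
-- REPLACE is idempotent: it inserts the row if missing and overwrites it if present.
REPLACE INTO account (uid, balance, updated_at)
VALUES (?, ?, ?);   -- values freshly read from the old database by primary key

-- For a logged DELETE, remove the row from the new database as well.
DELETE FROM account WHERE uid = ?;
```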

This brings the data as close to consistent as possible. Risk analysis:

Throughout the process, the old databases keep serving online traffic;

The log replay tool has low implementation complexity;

If a problem is found at any time, the replay can be redone from the log, so there is plenty of room for fault tolerance;

The log can be replayed at a throttled rate, so there is no time pressure and no impact on the online service.

However, even after the incremental log replay completes, you still cannot switch to the new databases.

During the incremental replay the old databases may receive further changes, again making the data inconsistent, so the log has to be read and replayed once more; new changes can arrive at any moment, and the new databases only ever approach the state of the old ones asymptotically.

5) Data verification

Prepare data validation tools and compare the data in the old and new databases until it is exactly the same.

6) Switch to the new databases

Once the data comparison passes, traffic is switched over and the new databases take over, completing the migration.

In the extreme case, however, even with the verification above you may reach only 99.99% consistency rather than full consistency. In that case you can briefly make the old databases read-only (or mask/degrade the write traffic), wait for the incremental log replay tool to fully catch up, and only then switch to the new databases.

At this point the log-based migration and expansion is complete. The service stays online through almost the entire process; availability is affected only for a short moment.

The drawback of this scheme is that the procedure is cumbersome: several synchronization tools have to be built and adapted, the cost is high, the synchronization logic tends to be service-specific rather than generic, and the whole cycle takes a long time.

5. Double-write scheme (small to medium data volumes)

The double-write scheme can be implemented with Canal or an MQ.

Add new databases: add the corresponding number of new nodes based on the existing ones.

Data migration: to avoid interference from incremental writes, first break the master-slave link, then import the full data (this takes a long time), synchronize, and verify.

Incremental synchronization: start the Canal synchronization service listening on the slave database, then re-enable master-slave replication; the changes received by the slave node are forwarded to the new database nodes through Canal.

Switch to the new databases: switch the access traffic to the new service via Nginx.

Fix anomalies from the switchover: during the switch, some requests (such as placing an order or modifying a balance) may hit the new databases before Canal has synchronized the corresponding rows; a custom program can scan the anomaly log and repair the data automatically or hand it over for manual processing.

For this reason it is best to perform the switchover in the early morning, when few users are online, to reduce the amount of anomalous data.

Data verification: to guarantee full consistency, check the data for completeness (row counts, checksums, and so on).

6. Smooth 2N scheme (large data volumes)

1) For high availability, each online master database is usually paired with a slave: the master handles reads and writes, the slave handles reads. As shown in the figure below, A and B are masters, and A0 and B0 are their slaves.

2) When capacity needs to be expanded, A0 and B0 are promoted to new master nodes, so the 2 shards become 4. At the same time, the shard mapping in the upper-layer configuration is updated as follows:

Data with uid%4=0 and uid%4=2 is assigned to masters A and A0, respectively;

Data with uid%4=1 and uid%4=3 is assigned to masters B and B0, respectively.

3) Because A and A0 hold the same data (as do B and B0), no data migration is needed at this point. Only the shard configuration has to be adjusted, and it can be pushed through the configuration center without a restart.

Since the data was previously sharded across 2 databases by uid%2 and now has to be spread across 4, each node still holds data that no longer belongs to it (for example, the uid%4=0 node also holds the uid%4=2 half), so the redundant data has to be cleaned up.

This cleanup does not affect the consistency of the online data and can be carried out at any time.
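A sketch of the cleanup on node A, assuming it keeps only uid%4=0 after the expansion and that the sharded table is called account (both names are illustrative):

```sql
-- Node A now serves only uid%4=0; remove rows that route to other shards.
-- Deleting in bounded batches keeps locks short and binlog events small.
DELETE FROM account
WHERE uid % 4 <> 0
LIMIT 10000;
-- Repeat (e.g. from a scheduled job) until no more rows are affected.
```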

4) After the processing is complete, to keep the data highly available and to prepare for the next round of expansion, assign a new slave to each of the existing masters.

Second, putting the smooth 2N expansion scheme into practice

1. Implement dynamic expansion at the application service level

Deployment architecture before expansion:

1) MariaDB service installation

(1) Switch to the Alibaba Cloud mirror (the default YUM mirror is too slow)

(2) Configure the YUM source

Add the following:

(3) Perform the installation

(4) If MariaDB was installed before, remove it first (skip this step if there was no previous installation)

Stop the MariaDB service

Uninstall the MariaDB service

Delete data and configurations

(5) After starting MariaDB, run the security configuration wizard (mysql_secure_installation) and follow its prompts to harden the database

(6) Enable remote connection permission for the user

Enable remote connections for the root user;

Log in to the MySQL client and execute the following:
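The exact statements are not shown in the text; a typical sketch looks like the following (the password is a placeholder, and opening root to '%' is acceptable only in a lab environment):

```sql
-- Allow root to connect from any host.
GRANT ALL PRIVILEGES ON *.* TO 'root'@'%' IDENTIFIED BY 'your_password' WITH GRANT OPTION;
FLUSH PRIVILEGES;
```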

2) MariaDB dual-master synchronization

(1) Add configuration in Server1

Add the following configuration in /etc/my.cnf:

Note that Server1 generates odd auto-increment values:

auto-increment-offset=1: the starting offset of the auto-increment primary key (start at 1).

auto-increment-increment=2: the auto-increment step (increase by 2 each time).

(2) Add configuration in Server2

Modify /etc/my.cnf:

Server2 generates even auto-increment values:

auto-increment-offset=2: the starting offset of the auto-increment primary key (start at 2).

auto-increment-increment=2: the auto-increment step (increase by 2 each time).

After the configuration modification is complete, restart the database.

(3) Synchronize the authorization configuration

On Server1, create a replication user (replica) for master-slave synchronization:

Query the binlog file name and offset; they are needed when synchronization is enabled:

Similarly, create a replication user (replica) on Server2 for master-slave synchronization:

Query the binlog file name and offset:
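The statements themselves are omitted; a typical sketch for either server (the user name replica is taken from the wording above, the password is a placeholder) is:

```sql
-- Run on Server1, and equivalently on Server2: create the replication account.
CREATE USER 'replica'@'%' IDENTIFIED BY 'replica_password';
GRANT REPLICATION SLAVE ON *.* TO 'replica'@'%';
FLUSH PRIVILEGES;

-- Record the current binlog file and position; they are needed for CHANGE MASTER TO.
SHOW MASTER STATUS;
```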

(4) Configure master-slave synchronization information

In Server1, execute:

In Server2, execute:
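The exact statements are not shown; the usual form is the following sketch, where the host is the other server's IP and the binlog file/position come from SHOW MASTER STATUS on that server (the values below are placeholders):

```sql
-- On Server1, point replication at Server2 (swap the host when running on Server2).
CHANGE MASTER TO
  MASTER_HOST = '192.168.116.141',
  MASTER_USER = 'replica',
  MASTER_PASSWORD = 'replica_password',
  MASTER_LOG_FILE = 'mysql-bin.000001',
  MASTER_LOG_POS = 4;
```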

(5) Enable dual-master synchronization

In Server1 and Server2, respectively:
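The command itself is omitted; it is typically just:

```sql
-- Run on both Server1 and Server2 to start pulling changes from the peer.
START SLAVE;
```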

Query the synchronization information in Server1:

Query synchronization information in Server2:
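The query is likewise omitted; on each node it is usually:

```sql
-- Inspect replication state; check Slave_IO_Running and Slave_SQL_Running.
SHOW SLAVE STATUS\G
```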

If both Slave_IO_Running and Slave_SQL_Running are Yes, the dual-master synchronization has been configured successfully.

3) Keepalived installation and high-availability configuration

(1) Install keepalived on both Server1 and Server2 nodes

(2) Turn off the firewall

(3) Set the host name

Server1 node:

Server2 node:

(4) Server1 node configuration

Create a mariadb.sh shutdown script

/usr/local/shell/mariadb.sh:

Add execution permissions:

(5) Server2 node configuration

Differences from Server1:

Note: Both nodes are set to BACKUP

If you do not want a restarted node to take the VIP back from the node currently holding it, set this option (non-preemptive mode)

Note: this option can only be set on a host whose state is BACKUP, and it should be set on the host whose priority is higher than the other's

(6) Verify high availability

Stop the MariaDB service on the master node and verify that the VIP fails over automatically.

4) Build an application service project

(1) Introduction to ShardingJDBC

ShardingJDBC, a product under the ShardingSphere umbrella, is positioned as a lightweight Java framework that provides extra services at the JDBC layer. The client connects directly to the database, and the framework is shipped as a jar with no extra deployment or dependencies; it can be understood as an enhanced JDBC driver, fully compatible with JDBC and all kinds of ORM frameworks.

It works with any JDBC-based ORM framework, such as JPA, Hibernate, MyBatis, Spring JDBC Template, or plain JDBC.

It supports any third-party database connection pool, such as DBCP, C3P0, BoneCP, Druid, HikariCP, and so on.

It supports any database that implements the JDBC specification; MySQL, Oracle, SQLServer, PostgreSQL, and any database following the SQL92 standard are currently supported.

(2) ShardingJDBC initialization process

1) Configure the ShardingRuleConfiguration object

2) Configure the TableRuleConfiguration object with the table sharding rules, setting the database sharding and table sharding strategies

3) Assemble the Rule objects with the DataSource objects through the Factory object

4) ShardingJDBC uses the resulting DataSource object to perform sharded database access

(3) ShardingJDBC integrated configuration

Maven dependencies

Rule configuration in application.yml

Create a DataSource

(4) Verify dynamic expansion of the application service

Configure two data sources, pointing to Server1 and Server2

Initially configure sharding with only one data source

Dynamically add another data source

(5) Precautions

ShardingJDBC, MyCat, DRDS and similar products are distributed database middleware; compared with operating a data source directly, they come with some limitations. When using ShardingJDBC, pay attention to the following:

Limited support for subqueries

HAVING is not supported

OR, UNION, and UNION ALL are not supported

Special INSERTS are not supported

Each INSERT statement can insert only one row; statements with multiple rows of data after VALUES are not supported

DISTINCT aggregation is not supported

Dual virtual table queries are not supported

SELECT LAST_INSERT_ID() and auto-increment sequences are not supported

CASE WHEN is not supported

2. Implement smooth, second-level 2N expansion of the database

Expansion deployment architecture:

1) Add database VIP

(1) On the Server2 node, add the VIP

Modify /etc/keepalived/keepalived.conf

Note the configuration items:

2) Add a dynamic data source to the application service

Modify the application service configuration and add a new data source pointing to the newly added VIP: 192.168.116.151.

Then perform the dynamic capacity expansion adjustment through the application service's interface.

3) Cancel the original dual-master synchronization

mysql -uroot -p654321

(1) Enter Server1

(2) Enter Server2
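The statements executed on each node are omitted; breaking a dual-master link typically looks like this sketch:

```sql
-- Run on Server1 and on Server2 to stop pulling changes from the other node.
STOP SLAVE;
-- Discard the replication configuration so the link cannot be resumed by accident.
RESET SLAVE ALL;
```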

(3) Verify through the application service interface that the data is no longer synchronized between the two nodes

4) Install MariaDB on the expansion servers

(1) Create two new virtual machines, Server3 and Server4

(2) Install the MariaDB service on both Server3 and Server4 nodes

Refer to the MariaDB service installation above.

(3) Configure Server3 and Server1 to implement a new dual-master synchronization

On the Server3 node, modify /etc/my.cnf

Restart the Server3 database

Create a replication user (replica) for master-slave synchronization

On the Server1 node, perform a full backup of the data

View and record master status information

Copy the backup file server1.sql to the Server3 node via scp

Restore the data to the Server3 node

Configure master-slave synchronization information

Using the master status information recorded above, execute on Server3:

Turn on master-slave synchronization

If a problem occurs, reset the master-slave synchronization information and configure it again:

Check the synchronization status information

Configure the reverse synchronization, from Server3 to Server1

View the binlog information on Server3:

On the Server1 node, configure the synchronization information:
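The individual statements are omitted in the text; under the same assumptions as before (replication user replica, placeholder passwords, binlog file/position values taken from SHOW MASTER STATUS), the sequence for these steps is roughly:

```sql
-- On Server3: replicate from Server1, using the master status recorded on Server1
-- right after the full backup (file and position below are placeholders).
CHANGE MASTER TO
  MASTER_HOST = '192.168.116.140',
  MASTER_USER = 'replica',
  MASTER_PASSWORD = 'replica_password',
  MASTER_LOG_FILE = 'mysql-bin.000002',
  MASTER_LOG_POS = 154;
START SLAVE;
SHOW SLAVE STATUS\G

-- On Server1: complete the dual-master pair by replicating from Server3,
-- using the binlog file/position shown by SHOW MASTER STATUS on Server3.
CHANGE MASTER TO
  MASTER_HOST = '192.168.116.142',
  MASTER_USER = 'replica',
  MASTER_PASSWORD = 'replica_password',
  MASTER_LOG_FILE = 'mysql-bin.000001',
  MASTER_LOG_POS = 4;
START SLAVE;
```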

(4) Configure the dual-master synchronization of Server4 and Server2 (the procedure mirrors the previous step, with Server2 and Server4 in place of Server1 and Server3)

On the Server4 node, modify /etc/my.cnf

Restart the Server4 database

Create a replication user (replica) for master-slave synchronization

On the Server2 node, perform a full backup of the data

View and record master status information

Copy the backup file server2.sql to the Server4 node via scp

Restore the data to the Server4 node

Configure master-slave synchronization information

Using the master status information recorded above, execute on Server4:

Turn on master-slave synchronization

Note: if a problem occurs, reset the master-slave synchronization information and configure it again:

Check the synchronization status information

Configure the reverse synchronization, from Server4 to Server2

View the binlog information on Server4:

On the Server2 node, configure the synchronization information:

5) Add the Keepalived service for high availability

Ensure that the newly added Server3 and Server4 nodes have the Keepalived service installed.

Modify the Server3 node configuration

Make sure the IP addresses are configured correctly, and restart the service after the modification is complete.

Create a mariadb.sh shutdown script

/usr/local/shell/mariadb.sh:

Add execution permissions:

Modify the Server4 node configuration

Create a mariadb.sh shutdown script

/usr/local/shell/mariadb.sh:

Add execute permission for all user groups:

Restart the Keepalived service after the modification

6) Clean the data and verify

Use the application service's dynamic scaling API to make adjustments and verify them

Clean the data on the Server1 node

According to the modulo rule, keep only the rows whose accountNo is even

Clean the data on the Server2 node

According to the modulo rule, keep only the rows whose accountNo is odd
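The delete statements are not shown; a sketch, assuming the sharded table is named account (only the column accountNo is named in the text):

```sql
-- On Server1: keep even accountNo values, delete the odd ones.
DELETE FROM account WHERE accountNo % 2 = 1;

-- On Server2: keep odd accountNo values, delete the even ones.
DELETE FROM account WHERE accountNo % 2 = 0;
```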

Third, Keepalived high-availability configuration summary

In Server1 (192.168.116.140):

In Server2 (192.168.116.141):

In Server3 (192.168.116.142):

In Server4 (192.168.116.143):

1. Server1 and Server2 dual-master relationship

1) Server1: keepalived.conf

vi /etc/keepalived/keepalived.conf

2) Server2: keepalived.conf

vi /etc/keepalived/keepalived.conf

2. Added database VIP

1) Server2: keepalived.conf

vi /etc/keepalived/keepalived.conf

3. Server1 and Server3 dual-master relationship

1) Server3: keepalived.conf

vi /etc/keepalived/keepalived.conf

4. Server2 and Server4 dual-master relationship

1) Server4: keepalived.conf

vi /etc/keepalived/keepalived.conf