First, analysis of the expansion schemes
At the beginning of the project, we deployed three databases, A, B, and C, whose combined capacity met our business needs. To spread the data evenly across the three databases, the Service layer shards by taking the modulus of the user ID: uid % 3.
As shown in the figure (data distributed across A, B, and C by uid % 3).
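A minimal sketch of this service-layer modulo routing (class and method names are illustrative):

```java
// Routes a uid to one of the shard databases by simple modulo arithmetic.
public class ShardRouter {
    private final int dbCount; // number of databases, e.g. 3 for A, B, C

    public ShardRouter(int dbCount) {
        this.dbCount = dbCount;
    }

    // 0 -> database A, 1 -> database B, 2 -> database C
    public int route(long uid) {
        return (int) (uid % dbCount);
    }
}
```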
Later, as the number of users grew, user-generated data kept flowing into the databases until they reached their optimal storage capacity. Continuing to add data beyond that point slows down basic CRUD operations and hurts the response time of the entire service.
At this point we need to add nodes and scale the databases horizontally: after adding a new database D, the number of databases grows from 3 to 4.
As shown in the figure (after adding database D, data is distributed by uid % 4).
Now the sharding rule changes from uid % 3 to uid % 4, so most of the existing data can no longer be located under the new rule; it must be reallocated, which means a large data migration. For example, for uid = 3 the old rule gives 3 % 3 = 0, so the record sits in database A; after database D is added, the new rule gives 3 % 4 = 3, which maps the same record to database D.
Adding one node this way forces roughly 75% of the data to migrate (only uids where uid % 3 == uid % 4, i.e. 3 out of every 12, stay put), which creates enormous data pressure and severely destabilizes the service.
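That figure is easy to verify with a throwaway loop (the bound is arbitrary):

```java
// Counts how many uids change shards when the rule moves from uid % 3 to uid % 4.
public class MigrationRatio {
    public static void main(String[] args) {
        long moved = 0, total = 1_000_000;
        for (long uid = 0; uid < total; uid++) {
            if (uid % 3 != uid % 4) moved++;
        }
        System.out.printf("moved: %.1f%%%n", 100.0 * moved / total); // ~75.0%
    }
}
```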
2. Downtime scheme
Publish an announcement
To re-split the data, we must notify users before stopping the service, for example: "Our service will be upgraded at yyyy-MM-dd; we apologize for the inconvenience."
Stop the service
Shut down the Service so that no new reads or writes arrive.
Offline data migration (splitting, reallocating data)
Split and redistribute the data from the old databases according to the service layer's new sharding algorithm.
Data validation
Develop a custom program to verify the migration by comparing the data in the old and new databases.
Change the configuration
Modify the sharding algorithm in the service layer configuration, i.e., change the original uid % 3 to uid % 4.
Restore service
Restart the Service.
Rollback plan
Every step above must have a rollback plan: if any link in the chain fails (data migration, service recovery, etc.), roll back immediately and start over.
Stopping the service guarantees that the migration proceeds safely, but the outage hurts the user experience and creates time pressure: the migration must finish within the announced window.
3. Stop-write scheme
Supports read/write splitting
The databases support read/write splitting. Before the expansion each database serves both reads and writes; during the data redistribution every database is switched to read-only and writes are disabled.
Upgrade announcement
To re-split the data, we must notify users before suspending writes, for example: "Our service will be upgraded at yyyy-MM-dd; we apologize for the inconvenience."
Suspend write operations and isolate the write data source (or intercept writes and return a uniform message)
At the service layer, intercept all write requests and return a uniform message, such as: "The service is being upgraded; only reads are currently available."
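A minimal sketch of such a write-blocking filter, assuming a servlet-based service (Servlet API 4.0; the upgrade flag and message text are illustrative):

```java
import javax.servlet.*;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;
import java.io.IOException;

public class WriteBlockFilter implements Filter {
    // Flipped on (e.g., via a config center) while the data is being redistributed.
    private volatile boolean upgrading = true;

    @Override
    public void doFilter(ServletRequest req, ServletResponse res, FilterChain chain)
            throws IOException, ServletException {
        HttpServletRequest request = (HttpServletRequest) req;
        // Treat everything except GET as a write and reject it uniformly.
        if (upgrading && !"GET".equalsIgnoreCase(request.getMethod())) {
            HttpServletResponse response = (HttpServletResponse) res;
            response.setStatus(HttpServletResponse.SC_SERVICE_UNAVAILABLE);
            response.getWriter().write("The service is being upgraded; only reads are available.");
            return;
        }
        chain.doFilter(req, res);
    }
}
```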
Synchronize the data
Redistribute and migrate (copy) the data from the old databases according to the service layer's new sharding algorithm.
Data validation
Develop a custom program to back up the data in the old databases, then verify by comparing the backup with the reallocated data.
Change the configuration
Through the configuration center, modify the sharding algorithm of the service layer (change the original uid % 3 to uid % 4); this does not require a service restart.
Resume write operations
Restore read/write mode on the databases and remove the interception message at the service layer.
Data cleanup
Use DELETE statements to remove the redundant rows that now belong to other shards.
Rollback plan
Every step above must have a rollback plan: if any link (such as the data migration) fails, roll back immediately and start over.
Disadvantages: copying the data takes a long time, so the stop-write window is long, and the data must be copied first and the redundant rows cleaned up afterwards.
4. Log scheme
The core idea is to migrate the database through log-based synchronization. The main steps are as follows:
1) Before the data migration, the business application accesses the old database nodes.
2) Logging
Before the upgrade, start recording a log of every data modification on the old databases (modification here covers inserts, deletes, and updates). The log does not need the full row data; it mainly records:
The database that was modified;
The table that was modified;
The unique primary key of the modified row;
The type of operation (insert, delete, or update).
The log does not need to capture which fields were added or how the data was formatted; recording only the fields above keeps the log format fixed and the scheme generic.
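A minimal sketch of such a fixed-format change record (field names are illustrative):

```java
// One entry in the change log: just enough to find and re-copy the affected row later.
public class ChangeLogRecord {
    String database;  // the database that was modified
    String table;     // the table that was modified
    long primaryKey;  // the unique primary key of the modified row
    String opType;    // "INSERT", "DELETE", or "UPDATE"
}
```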
Adding this logging during the service upgrade carries little risk:
Only a minority of interfaces perform writes, so the change surface is small;
The upgrade only adds some logging, implemented asynchronously, so it barely affects business functionality.
3) Data migration
Develop a custom data migration tool to migrate data from the old databases to the new ones.
Throughout the process, the old databases continue to serve traffic.
The data synchronization tool is not complex to implement.
The tool only reads the old databases; if synchronization goes wrong, the new databases can simply be wiped and rebuilt.
Migration can be rate-limited or done in batches, so there is no time pressure.
After the bulk migration completes, you still cannot switch traffic to the new databases: the old databases are still serving online, so their data keeps changing, and those changes have not been synchronized to the new databases. Old and new data are inconsistent, so a direct switch is impossible until the data is fully synchronized.
4) Incremental log migration
Develop a log replay tool to level out the data that changed during the bulk migration above. The processing steps (see the sketch after this list):
Read the change log to get the specific database, table, and primary key that changed;
Read the current record for that primary key from the old database;
Replace the corresponding record in the new database, keyed on the primary key.
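Illustratively, replaying one log entry boils down to an upsert keyed on the primary key (table and column names are assumptions):

```sql
-- 1) Fetch the current state of the changed row from the old database.
SELECT id, name, balance FROM old_db.t_user WHERE id = 3;

-- 2) Upsert it into the new database so the new copy converges on the old one.
REPLACE INTO new_db.t_user (id, name, balance) VALUES (3, 'alice', 100.00);
```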
This keeps the data as consistent as possible. Risk analysis:
Throughout the process, the old databases still serve online traffic;
The log replay tool is simple to implement;
If a problem is found at any time, the new databases can be rebuilt from scratch, so there is ample room for error;
Log replay can be rate-limited so it does not impact the live service, removing any time pressure.
However, even after the incremental log replay completes, you still cannot switch to the new databases. While the replay runs, the old databases may keep changing and create new inconsistencies, so the logs must be read and replayed again; since new data can appear at any moment during replay, the new databases only ever approach the old ones asymptotically.
5) Data verification
Prepare data validation tools and keep comparing the old and new databases until the data is exactly the same.
6) Switch to the new databases
Once the comparison passes, switch traffic over to the new databases; they take over serving and the migration is complete.
However, in the extreme case, even with the verification above you may reach 99.99% consistency yet never guarantee 100%. At that point you can put the old databases into read-only mode (or mask write traffic with a degradation switch), wait for the incremental replay tool to level out the last differences, and only then switch to the new databases.
At this point the log scheme's migration and expansion is complete. The whole process stays online and only briefly affects service availability.
The disadvantage of this scheme is that the procedure is cumbersome: it requires several synchronization tools, the cost is high, the tooling must be tailored to the specific business (so it is not generic), and the whole cycle takes a long time.
5. Double-write scheme (small and medium data volumes)
The double-write scheme can be implemented with Canal or an MQ.
Add new databases, increasing the count in line with the existing nodes.
Data migration: to keep incremental writes out of the way, first break the master-slave replication, then import the stock data (this takes a long time), and verify it once synchronized.
Incremental synchronization: start the Canal synchronization service and have it listen to the slave node's database; then re-enable master-slave replication, so the changes arriving at the slave are forwarded to the new database nodes through Canal.
Switch to the new databases: switch access traffic to the new service via Nginx.
Fix data broken during the switch: while cutting over, some requests may hit the new databases before Canal has caught up (for example an order placement or a funds update that has not yet been synchronized). A custom program can scan the exception log and repair such records automatically, or they can be handled manually.
Given this risk, it is best to cut over in the early morning, when few users are online, to minimize the amount of abnormal data.
Data verification: to guarantee full consistency, check the data for completeness, e.g., that row counts match.
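A simple spot-check after the cutover might look like this (table name and checksum choice are assumptions):

```sql
-- Compare row counts between the old and new databases.
SELECT COUNT(*) FROM old_db.t_order;
SELECT COUNT(*) FROM new_db.t_order;

-- For a stronger check, compare a content checksum of the same table on both sides.
SELECT BIT_XOR(CRC32(CONCAT_WS('#', id, amount, status))) FROM new_db.t_order;
```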
6. Smooth 2N scheme (large data volumes)
1) To guarantee high availability, each online master database is normally paired with a slave: the master handles reads and writes, the slave handles reads. As shown in the figure below, A and B are masters, and A0 and B0 are their slaves.
2) When expansion is needed, we promote A0 and B0 to new master nodes, turning 2 shard databases into 4. At the same time, we update the mapping in the sharding configuration layer above them, with the following rules:
Data with uid % 4 == 0 and uid % 4 == 2 is assigned to masters A and A0 respectively;
Data with uid % 4 == 1 and uid % 4 == 3 is assigned to masters B and B0 respectively.
3) Because A and A0 hold identical data (as do B and B0), no data migration is needed at this point. Only the sharding configuration must be adjusted, and it can be pushed through the configuration center without a restart.
Since the data was previously distributed across 2 databases by uid % 2 and must now serve 4, each old copy still contains rows that belong to its new sibling (for example, the node now serving uid % 4 == 0 still holds the half of its old data where uid % 4 == 2), so the redundant data must be cleaned up.
This cleanup does not affect the consistency of online data and can be carried out at any time.
4) Once this is done, to keep the data highly available and to prepare for the next round of expansion, assign a new slave to each existing master.
Second, practice of the smooth 2N expansion scheme
1. Implement dynamic expansion at the application service level
Deployment architecture before expansion:
1) MariaDB service installation
(1) Switch to the Alibaba Cloud mirror (the default YUM source is too slow)
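A commonly used sequence on CentOS 7 (adjust the repo URL for your OS version):

```bash
# Back up the stock repo file and replace it with the Alibaba Cloud mirror.
mv /etc/yum.repos.d/CentOS-Base.repo /etc/yum.repos.d/CentOS-Base.repo.bak
curl -o /etc/yum.repos.d/CentOS-Base.repo https://mirrors.aliyun.com/repo/Centos-7.repo
yum clean all && yum makecache
```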
(2) Configure the YUM source
Add the following:
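The original repo file is not reproduced here; a plausible /etc/yum.repos.d/mariadb.repo (version and mirror are assumptions) looks like:

```ini
[mariadb]
name = MariaDB
baseurl = https://mirrors.aliyun.com/mariadb/yum/10.4/centos7-amd64
gpgkey = https://mirrors.aliyun.com/mariadb/yum/RPM-GPG-KEY-MariaDB
gpgcheck = 1
```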
(3) Perform the installation
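With the repo in place, the install is (package names follow the MariaDB yum repo convention):

```bash
yum install -y MariaDB-server MariaDB-client
```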
(4) If MariaDB was installed before, remove it first (skip this step otherwise):
Stop the MariaDB service
Uninstall the MariaDB packages
Delete the data and configuration
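Illustratively (paths are the MariaDB defaults):

```bash
systemctl stop mariadb                        # stop the MariaDB service
yum remove -y MariaDB-server MariaDB-client   # uninstall the packages
rm -rf /var/lib/mysql /etc/my.cnf             # delete data and configuration
```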
(5) After starting MariaDB, run the security configuration wizard (mysql_secure_installation) and follow it to harden the database
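For example:

```bash
systemctl start mariadb       # start MariaDB
mysql_secure_installation     # interactive hardening wizard (set root password, drop test db, etc.)
```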
(6) Enable remote connections for the root user
Grant remote connection permission to the connecting user root;
log in to the MySQL console and execute the following:
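A typical grant (the password 654321 matches the login used later in this article):

```sql
GRANT ALL PRIVILEGES ON *.* TO 'root'@'%' IDENTIFIED BY '654321' WITH GRANT OPTION;
FLUSH PRIVILEGES;
```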
2) MariaDB dual-master synchronization
(1) Add the configuration on Server1
Add the following to /etc/my.cnf:
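The original block is not reproduced here; a minimal sketch consistent with the notes below (server-id and log name are assumptions):

```ini
[mysqld]
server-id = 1
log-bin = mysql-bin
auto-increment-offset = 1
auto-increment-increment = 2
```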
Note that Server1 generates odd auto-increment values:
auto-increment-offset=1 — the auto-increment base for primary keys; counting starts at 1.
auto-increment-increment=2 — the auto-increment step for primary keys; values advance by 2 each time.
(2) Add the configuration on Server2
Modify /etc/my.cnf:
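Again a minimal sketch (server-id is an assumption):

```ini
[mysqld]
server-id = 2
log-bin = mysql-bin
auto-increment-offset = 2
auto-increment-increment = 2
```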
Server2 generates even auto-increment values:
auto-increment-offset=2 — the auto-increment base for primary keys; counting starts at 2.
auto-increment-increment=2 — the auto-increment step for primary keys; values advance by 2 each time.
After the configuration modification is complete, restart the database.
(3) Configure the synchronization grants
On Server1, create the user replica for master-slave synchronization, then query the binary log file name and offset; they will be needed when enabling synchronization:
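Illustrative statements (the user name replica follows the text; the password is a placeholder):

```sql
CREATE USER 'replica'@'%' IDENTIFIED BY 'replica';
GRANT REPLICATION SLAVE ON *.* TO 'replica'@'%';
FLUSH PRIVILEGES;

-- Record File and Position from the output; CHANGE MASTER TO needs them.
SHOW MASTER STATUS;
```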
Similarly, create the replica user on Server2 and record its log file and offset in the same way.
(4) Configure master-slave synchronization information
On Server1 and on Server2, execute the matching CHANGE MASTER statement, each pointing at the other node:
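An illustrative pair of statements (log file and position are placeholders; substitute the values recorded from SHOW MASTER STATUS; the IPs follow the address list at the end of this article):

```sql
-- On Server1, pointing at Server2 (192.168.116.141):
CHANGE MASTER TO
  MASTER_HOST = '192.168.116.141',
  MASTER_USER = 'replica',
  MASTER_PASSWORD = 'replica',
  MASTER_LOG_FILE = 'mysql-bin.000001',
  MASTER_LOG_POS = 389;

-- On Server2, pointing at Server1 (192.168.116.140):
CHANGE MASTER TO
  MASTER_HOST = '192.168.116.140',
  MASTER_USER = 'replica',
  MASTER_PASSWORD = 'replica',
  MASTER_LOG_FILE = 'mysql-bin.000001',
  MASTER_LOG_POS = 389;
```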
(5) Enable dual-master synchronization
On Server1 and Server2 respectively, start replication, then query the synchronization status on each node (sketched below):
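Illustratively:

```sql
START SLAVE;             -- run on both Server1 and Server2
SHOW SLAVE STATUS \G     -- then check the status on each node
```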
If both Slave_IO_Running and Slave_SQL_Running show Yes, the dual-master synchronization is configured successfully.
3) Keepalived installation and high-availability configuration
(1) Install keepalived on both Server1 and Server2 nodes
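For example:

```bash
yum install -y keepalived
```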
(2) Turn off the firewall
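Typically:

```bash
systemctl stop firewalld && systemctl disable firewalld
```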
(3) Set the host name on each node:
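Illustrative commands (the host names are assumptions):

```bash
hostnamectl set-hostname server1   # on the Server1 node
hostnamectl set-hostname server2   # on the Server2 node
```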
(4) Server1 node configuration
Create a mariadb.sh shutdown script
/usr/local/shell/mariadb.sh:
Add execution permissions:
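The script body is not reproduced in the original; a minimal sketch of the usual pattern (if the local MariaDB dies, stop keepalived so the VIP fails over to the peer):

```bash
cat > /usr/local/shell/mariadb.sh <<'EOF'
#!/bin/bash
# Health check: when MariaDB is no longer active, stop keepalived so the
# standby node takes over the VIP.
if ! systemctl is-active --quiet mariadb; then
    systemctl stop keepalived
fi
EOF
chmod a+x /usr/local/shell/mariadb.sh   # add execute permission
```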
(5) Server2 node configuration
Differences from Server1:
Note: both nodes are set to state BACKUP;
if you do not want a restarted node to grab the VIP back from the standby, set the nopreempt option;
note that nopreempt may only be set on the host whose priority is higher than the other's.
(6) Verify high availability
Stop the MariaDB service on the master node and verify that the VIP fails over automatically.
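For instance (the interface name is an assumption):

```bash
systemctl stop mariadb    # on the current master
ip addr show ens33        # on the standby: the VIP should now be bound here
```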
4) Build an application service project
(1) Introduction to ShardingJDBC
ShardingJDBC is a product under the ShardingSphere umbrella, positioned as a lightweight Java framework that provides extra services at the JDBC layer. It connects to the database directly from the client side and ships as a plain jar, with no extra deployment or dependencies. It can be thought of as an enhanced JDBC driver, fully compatible with JDBC and all major ORM frameworks.
Works with any JDBC-based ORM framework, such as JPA, Hibernate, MyBatis, or Spring JDBC Template, or with JDBC used directly.
Support for any third-party database connection pool, such as: DBCP, C3P0, BoneCP, Druid, HikariCP, etc.
Any database that implements the JDBC specification is supported; currently that includes MySQL, Oracle, SQLServer, PostgreSQL, and any database following the SQL92 standard.
(2) ShardingJDBC initialization process
1) Configure the ShardingRuleConfiguration object
2) Configure a TableRuleConfiguration object for each table's sharding rules, setting the database and table sharding strategies
3) Assemble the rule objects and the DataSource objects through the factory object
4) ShardingJDBC performs sharded database access through the resulting DataSource object
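A minimal sketch of this flow, assuming the ShardingSphere 4.x API (data source, table, and column names are illustrative):

```java
import org.apache.shardingsphere.api.config.sharding.ShardingRuleConfiguration;
import org.apache.shardingsphere.api.config.sharding.TableRuleConfiguration;
import org.apache.shardingsphere.api.config.sharding.strategy.InlineShardingStrategyConfiguration;
import org.apache.shardingsphere.shardingjdbc.api.ShardingDataSourceFactory;

import javax.sql.DataSource;
import java.sql.SQLException;
import java.util.HashMap;
import java.util.Map;
import java.util.Properties;

public class ShardingInit {
    public static DataSource build(DataSource ds0, DataSource ds1) throws SQLException {
        // 1) sharding rule configuration
        ShardingRuleConfiguration ruleConfig = new ShardingRuleConfiguration();
        // 2) table rule: route t_trade_order across ds0/ds1 by uid
        TableRuleConfiguration tableRule =
                new TableRuleConfiguration("t_trade_order", "ds${0..1}.t_trade_order");
        tableRule.setDatabaseShardingStrategyConfig(
                new InlineShardingStrategyConfiguration("uid", "ds${uid % 2}"));
        ruleConfig.getTableRuleConfigs().add(tableRule);
        // 3) assemble the rule and the physical data sources through the factory
        Map<String, DataSource> dataSourceMap = new HashMap<>();
        dataSourceMap.put("ds0", ds0);
        dataSourceMap.put("ds1", ds1);
        // 4) the returned DataSource transparently shards all JDBC access
        return ShardingDataSourceFactory.createDataSource(dataSourceMap, ruleConfig, new Properties());
    }
}
```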
(3) ShardingJDBC integration configuration
Maven dependency
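For example (artifact and version are assumptions; use sharding-jdbc-spring-boot-starter instead if you configure everything through Spring Boot):

```xml
<dependency>
    <groupId>org.apache.shardingsphere</groupId>
    <artifactId>sharding-jdbc-core</artifactId>
    <version>4.1.1</version>
</dependency>
```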
Configure the rules in application.yml
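A sketch of what this might look like with the 4.x Spring Boot starter (URLs, credentials, and names are placeholders):

```yaml
spring:
  shardingsphere:
    datasource:
      names: ds0
      ds0:
        type: com.zaxxer.hikari.HikariDataSource
        driver-class-name: org.mariadb.jdbc.Driver
        jdbc-url: jdbc:mariadb://192.168.116.150:3306/smooth
        username: root
        password: "654321"
    sharding:
      tables:
        t_trade_order:
          actual-data-nodes: ds0.t_trade_order
          database-strategy:
            inline:
              sharding-column: accountNo
              algorithm-expression: ds$->{accountNo % 1}
```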
Create a DataSource
(4) Verify dynamic expansion of the application service
Configure two data sources, pointing at Server1 and Server2
Initially configure the sharding rule with only one data source
Dynamically add the second data source
(5) Precautions
Sharding-JDBC, MyCat, DRDS, and similar products are distributed database middleware. Compared with operating a data source directly, they come with some limitations; when using Sharding-JDBC, pay attention to the following issues:
Limited support for subqueries
HAVING is not supported
OR, UNION, and UNION ALL are not supported
Special INSERT statements are not supported
Each INSERT statement can insert only one row; VALUES clauses carrying multiple rows are not supported
DISTINCT aggregation is not supported
Dual virtual table queries are not supported
SELECT LAST_INSERT_ID() and auto-increment sequences are not supported
CASE WHEN is not supported
2. Implement smooth, second-level 2N expansion of the database
Expansion deployment architecture:
1) Add a database VIP
(1) On the Server2 node, add the new VIP
Modify /etc/keepalived/keepalived.conf
Note the configuration items:
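The fragment itself is not reproduced here; a plausible sketch of the added VRRP instance (interface, router id, priority, and auth values are assumptions; the VIP matches the address used below):

```
vrrp_instance VI_2 {
    state BACKUP
    interface ens33
    virtual_router_id 52
    priority 100
    advert_int 1
    authentication {
        auth_type PASS
        auth_pass 1111
    }
    virtual_ipaddress {
        192.168.116.151
    }
}
```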
2) Add a dynamic data source to the application service
Modify the application service configuration and add a new data source pointing at the newly added VIP: 192.168.116.151.
Then perform the dynamic expansion adjustment through the application service's API.
3) Cancel the original dual-master synchronization
mysql -uroot -p654321
(1) On Server1:
(2) On Server2:
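The statements are not shown in the original; breaking a dual-master link is typically done with (run on each node):

```sql
STOP SLAVE;
RESET SLAVE ALL;   -- discard this node's replication configuration entirely
```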
(3) Verify through the application service API that the two nodes no longer synchronize data
4) Install MariaDB on the expansion servers
(1) Create two new virtual machines, Server3 and Server4
(2) Install the MariaDB service on both Server3 and Server4 nodes
Refer to the MariaDB service installation above.
(3) Configure Server3 and Server1 as a new dual-master pair
On the Server3 node, modify /etc/my.cnf
Restart the Server3 database
Create the replica user for master-slave synchronization
On the Server1 node, perform a full backup of the data
View and record master status information
Copy the backup file server1.sql to the Server3 node via the scp command
Restore the data to the Server3 node
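Illustratively (passwords and paths follow the conventions used earlier in this article; adjust as needed):

```bash
mysqldump -uroot -p654321 --all-databases --master-data=2 > server1.sql   # on Server1
scp server1.sql root@192.168.116.142:/root/                               # copy to Server3
mysql -uroot -p654321 < /root/server1.sql                                 # restore on Server3
```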
Configure master-slave synchronization information
Using the master status information recorded above, execute on Server3:
Turn on master-slave synchronization
If a problem occurs, reset the master-slave synchronization information first:
Check the synchronization status information
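A sketch of these steps (log file and position are placeholders; Server1's IP follows the address list below):

```sql
-- On Server3:
CHANGE MASTER TO
  MASTER_HOST = '192.168.116.140',
  MASTER_USER = 'replica',
  MASTER_PASSWORD = 'replica',
  MASTER_LOG_FILE = 'mysql-bin.000002',
  MASTER_LOG_POS = 154;

START SLAVE;            -- turn on master-slave synchronization

-- If a problem occurs, reset and reconfigure:
-- STOP SLAVE; RESET SLAVE;

SHOW SLAVE STATUS \G    -- check the synchronization status
```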
Configure the reverse direction of the Server1/Server3 synchronization
View the binary log file and offset on Server3:
On the Server1 node, configure the synchronization information (a CHANGE MASTER TO pointing at Server3) and start replication.
(4) Configure dual-master synchronization between Server4 and Server2
On the Server4 node, modify /etc/my.cnf
Restart the Server4 database
Create the replica user for master-slave synchronization
On the Server2 node, perform a full backup of the data
View and record master status information
Copy the backup file server2.sql to the Server4 node via the scp command
Restore the data to the Server4 node
Configure master-slave synchronization information
Using the master status information recorded above, execute on Server4:
Turn on master-slave synchronization
Note: if a problem occurs, reset the master-slave synchronization information first:
Check the synchronization status information
Configure the reverse direction of the Server2/Server4 synchronization
View the binary log file and offset on Server4:
On the Server2 node, configure the synchronization information and start replication.
5) Add the Keepalived service for high availability
Ensure that the newly added Server3 and Server4 nodes have the Keepalived service installed.
Modify the Server3 node configuration
Make sure the IP address is configured correctly, and restart the service once the modification is complete.
Create a mariadb.sh shutdown script
/usr/local/shell/mariadb.sh:
Add execution permissions:
Modify the Server4 node configuration
Create a mariadb.sh shutdown script
/usr/local/shell/mariadb.sh:
Add execute permission for all user groups:
Restart the Keepalived service after the modification
6) Clean up the data and verify
Use the application service's dynamic scaling API to adjust and verify
Clean the data on the Server1 node
According to the modulo rule, keep only the rows whose accountNo is even
Clean the data on the Server2 node
According to the modulo rule, keep only the rows whose accountNo is odd
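Illustrative cleanup statements (the table name is an assumption):

```sql
-- On Server1: keep even accountNo values, delete the rest.
DELETE FROM t_trade_order WHERE MOD(accountNo, 2) = 1;

-- On Server2: keep odd accountNo values, delete the rest.
DELETE FROM t_trade_order WHERE MOD(accountNo, 2) = 0;
```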
Third, Keepalived high-availability configuration reference
In Server1 (192.168.116.140):
In Server2 (192.168.116.141):
In Server3 (192.168.116.142):
In Server4 (192.168.116.143):
1. Server1 and Server2 dual-master relationship
1) Server1: keepalived.conf
vi /etc/keepalived/keepalived.conf
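The config bodies are not reproduced here; a plausible sketch for Server1 (interface, router id, priority, auth values, and the original VIP are assumptions; the health-check script matches the one created earlier):

```
vrrp_script check_mariadb {
    script "/usr/local/shell/mariadb.sh"
    interval 2
}
vrrp_instance VI_1 {
    state BACKUP             # both nodes run as BACKUP (non-preempt setup)
    interface ens33
    virtual_router_id 51
    priority 100             # higher than Server2's
    nopreempt                # only on the higher-priority node
    advert_int 1
    authentication {
        auth_type PASS
        auth_pass 1111
    }
    virtual_ipaddress {
        192.168.116.150      # original database VIP (assumed)
    }
    track_script {
        check_mariadb
    }
}
```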
2) Server2: keepalived.conf
vi /etc/keepalived/keepalived.conf
2. Newly added database VIP
1) Server2: keepalived.conf
vi /etc/keepalived/keepalived.conf
3. Server1 and Server3 dual-master relationship
1) Server3: keepalived.conf
vi /etc/keepalived/keepalived.conf
4. Server2 and Server4 dual-master relationship
1) Server4: keepalived.conf
vi /etc/keepalived/keepalived.conf