Redis offers high-performance reads and writes and is widely used as a cache, both to improve business-system performance and to shield the database from high-concurrency traffic.
Using Redis as a caching component means guarding against several issues that can otherwise cause production incidents.
How do we handle data consistency between Redis and MySQL?
Today, let's explore how caching works and how to deal with cache consistency.
Before we get started, we need to agree on the following premise:
It is enough to ensure eventual consistency between the database and the cache; there is no need to pursue strong consistency.
1. What is database and cache consistency?
Data consistency means one of the following holds:
The cache contains the data, and the cached value = the latest value in the database;
The cache does not contain the data, and the value in the database = the latest value.
Conversely, the cache is inconsistent with the database when:
The cache or the database holds stale data, causing threads to read old values.
Why is there a data consistency issue?
When using Redis as a cache, whenever the data changes we need to write to both places to keep the cache consistent with the database.
The database and the cache are, after all, two separate systems. Guaranteeing strong consistency would require a distributed consistency protocol such as 2PC or Paxos, or distributed locks, all of which are hard to implement and are bound to hurt performance.
If we really require strong consistency of the data, should we be introducing a cache at all?
2. Cache usage strategies
When using a cache, there are several common strategies for improving system performance:
2.1 Cache-Aside
The so-called Cache-Aside (bypass cache) pattern means that reading the cache, reading the database, and updating the cache are all done by the application itself. It is the most commonly used caching strategy in business systems.
The read logic is as follows:
Read from the cache first; if it hits, return directly;
If it misses, query the database, write the result into the cache, and return it.
The timing diagram is as follows:
Simple implementation and performance gains.
The pseudocode implemented is as follows:
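Since the original pseudocode did not survive extraction, here is a minimal sketch of the Cache-Aside read path, with plain dicts standing in for Redis and MySQL (all names are illustrative; real code would make client calls instead):

```python
# Cache-Aside read path sketch. `cache` and `db` are plain dicts standing in
# for Redis and MySQL.
cache = {}
db = {"user:1": "Alice"}

def read(key):
    # 1. Try the cache first.
    value = cache.get(key)
    if value is not None:
        return value          # cache hit: return directly
    # 2. Cache miss: fall back to the database.
    value = db.get(key)
    if value is not None:
        # 3. Populate the cache so later reads hit.
        cache[key] = value
    return value
```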
Because data is loaded into the cache only after a miss, the first request for a piece of data incurs extra overhead: the cache must be filled and the database queried, both of which take time.
When writing data under the Cache-Aside pattern, the process is as follows:
Update the database;
Invalidate or update the corresponding data in the cache.
With Cache-Aside, the most common write strategy is to write the data to the database first, but the cache may then be inconsistent with the database.
We should set an expiration time on cache entries; this is the fallback that guarantees eventual consistency.
If the expiration time is too short, the application keeps querying the database. If it is too long and the cache is not invalidated on update, the cached data is very likely dirty.
The most common approach is to delete the cache entry, invalidating the cached data.
Why not update the cache?
When updating the cache is expensive, for example when it requires joining several tables, it is recommended to delete the cache entry directly rather than update it, to keep things consistent.
Under high concurrency, an update may also leave stale values in the cache; Code Brother will analyze the specifics below, so don't worry.
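The Cache-Aside write path described above (update the database, then delete the cache entry) can be sketched as follows, again with dicts standing in for the real stores:

```python
# Cache-Aside write path: update the source of truth first, then invalidate.
cache = {"user:1": "Alice"}
db = {"user:1": "Alice"}

def write(key, value):
    db[key] = value          # 1. update the database (the source of truth)
    cache.pop(key, None)     # 2. delete the cache entry; the next read repopulates it

write("user:1", "Bob")
```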
2.2 Read-Through
With Read-Through, when the cache misses, the data is likewise loaded from the database, written into the cache, and returned to the application.
Read-Through and Cache-Aside are very similar; the difference is that in Cache-Aside the application itself is responsible for fetching data from the database and populating the cache.
Read-Through shifts the responsibility for fetching values from the data store onto the cache provider.
Read-Through implements the separation-of-concerns principle: the code interacts only with the cache, and the cache component manages the synchronization of data between itself and the database.
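A minimal sketch of the Read-Through idea: the application talks only to a cache object, and the cache component itself loads from the backing store on a miss (class and method names are illustrative):

```python
class ReadThroughCache:
    """Cache provider that loads from the backing store on a miss;
    the application never touches the database directly."""
    def __init__(self, db):
        self.db = db          # backing store, e.g. a database client
        self.data = {}        # the cache itself

    def get(self, key):
        if key in self.data:
            return self.data[key]      # hit: return the cached value
        value = self.db.get(key)       # miss: the cache component loads it
        if value is not None:
            self.data[key] = value     # populate for subsequent reads
        return value
```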
2.3 Write-Through (synchronous write)
Similar to Read-Through, when a write request occurs, Write-Through shifts the write responsibility onto the caching system: the cache abstraction layer updates both the cache and the database.
The timing flow chart is as follows:
The main benefit of Write-Through is that the application does not need to handle failures and retries; the cache abstraction layer manages all of that.
Used on its own, this policy brings little benefit: it writes the cache first and then the database, adding extra latency to every write operation.
When Write-Through is used together with Read-Through, you get the benefits of Read-Through while also ensuring data consistency, without needing to design cache invalidation yourself.
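A sketch of the Write-Through write path, under the same dict-based assumptions as the earlier examples: the application makes one call, and the cache layer writes both stores synchronously.

```python
class WriteThroughCache:
    """On a write, update the cache and the database in one synchronous call."""
    def __init__(self, db):
        self.db = db
        self.data = {}

    def put(self, key, value):
        self.data[key] = value   # write the cache first...
        self.db[key] = value     # ...then synchronously write the database
```

Because both writes happen before `put` returns, reads through the same cache object always see data the database also has.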
This strategy reverses the order in which Cache-Aside fills the cache: instead of lazily loading data after a miss, the data is written to the cache first, and the cache component then writes it to the database.
Query performance is best, because the data being queried may well already be in the cache.
On the other hand, infrequently requested data is also written to the cache, resulting in a larger and more expensive cache.
2.4 Write-Behind (asynchronous write)
This diagram looks like Write-Through at first glance, but it is not; the difference is the final arrow, which changes from solid to dashed.
The dashed arrow means the caching system updates the database asynchronously, while the application interacts only with the caching system.
Applications do not have to wait for database updates to complete, which improves application performance because updates to the database are the slowest operations.
Under this strategy, consistency between the cache and the database is not strong; it is not recommended for systems with high consistency requirements.
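The asynchrony (and the resulting consistency window) can be sketched like this; the explicit `flush` stands in for the background thread a real write-behind cache would run:

```python
from collections import deque

class WriteBehindCache:
    """Writes hit the cache immediately; database writes are deferred."""
    def __init__(self, db):
        self.db = db
        self.data = {}
        self.pending = deque()           # queued database writes

    def put(self, key, value):
        self.data[key] = value           # the application returns right away
        self.pending.append((key, value))

    def flush(self):
        # In a real system a background thread drains this queue;
        # until it runs, the cache and the database can disagree.
        while self.pending:
            key, value = self.pending.popleft()
            self.db[key] = value
```

Between `put` and `flush`, the cache holds the new value while the database still holds the old one, which is exactly the weak-consistency window described above.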
3. Consistency analysis under Cache-Aside
The strategy most commonly used in business scenarios is Cache-Aside (bypass cache). Under it, a client read checks the cache first and returns on a hit; on a miss, it reads from the database and writes the result into the cache. Read operations therefore never make the cache inconsistent with the database.
The focus is on writes: both the database and the cache need to be modified, and the ordering of the two operations may lead to inconsistency. For writes, we need to consider two questions:
When the data changes, do we update the cache or delete it?
Do we update the database first, or operate on the cache first?
The following analysis does not need to be memorized by rote; the key is that, while working through each option, we only need to consider whether the following two scenarios cause serious problems:
What happens if the first operation succeeds and the second fails?
Will reads return inconsistent data under high concurrency?
Why not consider the case where the first operation fails and the second succeeds?
If the first operation fails, the second is simply never executed; we can return an exception such as a 50x to the client at the first step, and no inconsistency arises.
Only the case where the first operation succeeds and the second fails is a headache; guaranteeing atomicity across the two falls into the realm of distributed transactions.
3.1 Update the cache before updating the database
If the cache is updated successfully first and the database write then fails, the cache holds the latest data while the database holds the old data, so the cache contains dirty data.
Other queries arriving immediately afterwards will get this data, even though it does not exist in the database.
Caching data that does not exist in the database and returning it to the client makes no sense.
This option is ruled out directly.
3.2 Update the database before updating the cache
When everything succeeds, the flow is as follows:
Update the database, successfully;
Then update the cache, also successfully.
Now let's work out what happens if the atomicity of these two operations is broken: what problem arises if the first step succeeds and the second fails?
The database would hold the latest data and the cache the old data, causing a consistency issue.
I won't draw the diagram again; it is similar to the previous one with the positions of Redis and MySQL swapped.
Xie Bage often works 996; his back aches, his neck hurts, and he writes more and more bugs, so he wants a massage to relax and improve his programming skills.
Affected by the epidemic, orders are hard to come by, and the technicians of the high-end club are scrambling to take this one: high concurrency, brothers.
After he enters the shop, the front desk enters the customer's information into the system, executing set Xie Bage's service technician = to-be-determined to indicate that no one is serving him yet, saving it to both the database and the cache, and then arranges a massage.
This is shown in the following figure:
Technician No. 98 strikes first, sending the system the instruction set Xie Bage's service technician = 98, which is written to the database. At this moment the system's network jitters and lags, and the data has not yet been written to the cache.
Next, technician No. 520 also sends set Xie Bage's service technician = 520 to the system; it is written to the database and also written to the cache.
At this point, technician No. 98's earlier write-cache request finally executes, and set Xie Bage's service technician = 98 is written to the cache successfully.
The final result: the value in the database is service technician = 520, while the value in the cache is service technician = 98.
Technician No. 520's latest data in the cache has been overwritten by technician No. 98's stale data.
So under high concurrency, when multiple threads write data and then write to the cache, the cache can end up holding the old value while the database holds the latest value: inconsistency.
This option is ruled out as well.
If the first step fails, a 50x exception is returned and no data inconsistency occurs.
3.3 Delete the cache before updating the database
Following the routine from earlier, assume the first operation succeeds: what happens if the second operation fails? And what happens in a high-concurrency scenario?
Suppose you now have two requests: write request A and read request B.
Write request A deletes the cache successfully in its first step, but then fails to write the data to the database; the new data is lost, and the database keeps the old value.
Then read request B comes in, finds nothing in the cache, reads the old data from the database, and writes it into the cache.
Again technician No. 98 strikes first: the system receives the request and deletes the cached entry, but just as it is about to write set Xiao Caiji's service technician = 98 to the database, it stalls, and the write is delayed.
Meanwhile the lobby manager issues a read request to check whether Xiao Caiji has been assigned a technician, so as to arrange services. The system finds nothing in the cache, so it reads the old value from the database, set Xiao Caiji's service technician = to-be-determined, and writes it into the cache.
Only then does the stalled write from technician No. 98 complete, setting Xiao Caiji's service technician = 98 in the database.
The result: the cache holds stale data, and the latest data cannot be read until the cache expires. Xiao Caiji is already being served by technician No. 98, but the lobby manager thinks no one is serving him.
This option is rejected as well: if the first step succeeds and the second fails, the database keeps the old data; since the cache is empty, reads keep fetching the old value from the database and writing it back into the cache, producing inconsistent data plus one extra cache miss.
Both the failure case and the high-concurrency case lead to data inconsistency.
3.4 Update the database before deleting the cache
The previous three options have all been ruled out; let's analyze whether this final option works.
Following the same routine, we check what problems arise in the failure case and in the high-concurrency case, respectively.
With this policy, if the write to the database fails, an exception is returned to the client directly, and no cache operation is needed.
So the first step failure does not result in data inconsistencies.
The focus is on the case where the first step succeeds, writing the latest data to the database, but the cache deletion then fails. What then?
We could put the two operations in a single transaction and roll back the database write when the cache deletion fails.
But that is unsuitable under high concurrency: it easily produces large transactions and can lead to deadlocks.
If we do not roll back, the database holds new data while the cache keeps the old data, so the data is inconsistent. What do we do?
We must therefore find a way to make the cache deletion eventually succeed; otherwise we can only wait for the entry to expire.
Use a retry mechanism.
For example, retry three times; if all three attempts fail, write a record to the database and use a distributed scheduling component such as xxl-job for follow-up processing.
In high-concurrency scenarios, retries are best done asynchronously, for example by sending a message to MQ middleware for asynchronous decoupling.
Alternatively, use the Canal framework to subscribe to the MySQL binlog, listen for the relevant update events, and delete the corresponding cache entries.
Let's analyze what can go wrong under highly concurrent reads and writes…
Technician No. 98 strikes first and takes Xiao Caiji's order; the database executes set Xiao Caiji's service technician = 98, but the network stalls and the cache deletion has not executed yet.
Supervisor Candy issues a read request to the system to check whether Xiao Caiji has a technician; the cache still holds Xiao Caiji's service technician = to-be-determined, so that value is returned directly, and the supervisor thinks no one is serving him.
Then the stalled cache deletion from technician No. 98's write finally executes successfully.
A read request may see a little stale data, but the stale entry is deleted soon after, and subsequent requests get the latest data. This is not a big problem.
There is an even more extreme case: the cache expires on its own just as highly concurrent reads and writes arrive. Suppose there are two requests, thread A doing a query and thread B doing an update; then the following can happen:
The cache entry reaches its expiration time and expires.
Thread A reads the cache, misses, and queries the database, obtaining an old value (old relative to the new value B is about to write). Just as it is about to write this value into the cache, a network hiccup stalls it.
Thread B performs a write operation, writing the new value to the database.
Thread B deletes the cache entry.
Thread A resumes, wakes from the stall, and writes the stale value it queried into the cache.
"Code Brother, played out this way, there is still an inconsistency!"
Don’t panic, the probability of this happening is minimal, and the necessary conditions for the above situation to occur are:
The database write in step (3) takes little time and is faster than the read in step (2), so that step (4) precedes step (5).
The cache must have just reached its expiration time.
Usually a single MySQL instance handles about 5k QPS and about 1k TPS (for comparison, Tomcat's QPS is about 4k and TPS about 1k).
Database reads are much faster than writes (which is why read/write splitting exists), so it is hard for step (3) to be faster than step (2), and this must also coincide with the cache expiring. The probability is very small.
Therefore, when using the Cache-Aside (bypass cache) strategy, the recommended write path is: update the database first, then delete the cache.
4. What are the consistency solutions?
Finally, for the Cache-Aside (bypass cache) strategy with writes done as update-the-database-then-delete-the-cache, let's analyze the available data consistency solutions.
4.1 Delayed double delete
How can we avoid dirty data if we delete the cache first and then update the database?
Adopt the delayed double-delete strategy:
Delete the cache;
Update the database;
Sleep for 500 milliseconds, then delete the cache again.
This bounds the window for reading dirty data to at most 500 milliseconds. The key question is: how do we determine this sleep time?
The purpose of the delay is to ensure the read request has finished, so that the write request can delete any dirty cache entry the read request may have written.
Therefore, we need to evaluate the time our project's read path takes ourselves, and add a few hundred milliseconds on top of that as the delay.
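The delayed double delete can be sketched as follows; the 500 ms figure is the article's example value, and the dicts stand in for Redis and MySQL:

```python
import threading
import time

cache = {}
db = {}

def delayed_double_delete(key, value, delay_seconds=0.5):
    """Delete the cache, update the database, then delete the cache again
    after a delay. 0.5s is illustrative; tune it to your read path."""
    cache.pop(key, None)                  # 1. first cache delete
    db[key] = value                       # 2. update the database

    def second_delete():
        time.sleep(delay_seconds)         # 3. wait for in-flight reads to finish
        cache.pop(key, None)              # 4. second delete clears dirty data

    t = threading.Thread(target=second_delete)
    t.start()                             # run off the request path
    return t                              # returned so callers can join if needed
```

Running the second delete on a separate thread keeps the caller from blocking for the full delay.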
4.2 Cache-deletion retry mechanism
What if the cache deletion fails? For example, if the second deletion in the delayed double delete fails, the dirty data can never be removed.
Use a retry mechanism to ensure the cache deletion succeeds.
For example, retry three times; if all three attempts fail, write a log record to the database and raise an alert for manual intervention.
In high-concurrency scenarios, retries are best done asynchronously, for example by sending a message to MQ middleware for asynchronous decoupling.
In step (5), if the deletion fails and the maximum retry count has not been reached, the message is re-queued until the deletion succeeds; otherwise, it is logged to the database for manual intervention.
This solution has a drawback: it intrudes into the business code. Hence the next solution: run a separate service that subscribes to the database binlog and handles the cache deletions.
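A minimal sketch of bounded-retry deletion. In production the failed attempt would be re-queued through MQ; here a simple loop plays that role, and `FlakyCache` is a hypothetical test double, not a real client:

```python
MAX_RETRIES = 3
failed_deletes = []   # stands in for a "needs manual intervention" log table

class FlakyCache(dict):
    """Test double: raises on the first `fail_times` delete attempts."""
    def __init__(self, fail_times):
        super().__init__()
        self.fail_times = fail_times

    def pop(self, key, default=None):
        if self.fail_times > 0:
            self.fail_times -= 1
            raise ConnectionError("cache unreachable")
        return super().pop(key, default)

def delete_with_retry(cache, key, attempts=MAX_RETRIES):
    for _ in range(attempts):
        try:
            cache.pop(key, None)          # in real code: redis.delete(key)
            return True
        except ConnectionError:
            continue                      # equivalent to re-queueing the message
    failed_deletes.append(key)            # retries exhausted: log for humans
    return False
```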
4.3 Read the binlog and delete asynchronously
Update the database;
The database records the operation in its binlog;
Use canal to subscribe to the binlog and obtain the target data and key;
The cache-deletion system receives the canal data, parses out the target key, and tries to delete the cache entry;
If the deletion fails, a message is sent to the message queue;
The cache-deletion system consumes the message from the queue and performs the deletion again.
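The consumer side of the steps above can be sketched like this. The JSON message shape and the `table:pk` key scheme are assumptions for illustration; a real canal event carries a richer schema:

```python
import json

def handle_binlog_event(event_json, cache, retry_queue):
    """Consume one simulated binlog message and delete the matching cache key."""
    event = json.loads(event_json)
    key = "{}:{}".format(event["table"], event["pk"])  # assumed key scheme
    try:
        cache.pop(key, None)              # try to delete the cached entry
    except ConnectionError:
        retry_queue.append(event_json)    # on failure, hand off to the queue
```

The business code never calls this directly, which is exactly the non-intrusiveness this solution buys.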
The best practice for caching is the Cache-Aside pattern, divided into read-path and write-path best practices.
Best practice for reads: read the cache first and return on a hit; on a miss, query the database and then write the result into the cache.
Best practices for writes:
Write the database first, then operate on the cache;
Delete the cache directly rather than modifying it: when updating the cache is expensive, for example when it requires joining several tables, deleting is recommended over updating. Moreover, deletion is simple and its only side effect is one extra cache miss, so this strategy is recommended.
Under these best practices, to keep the cache and the database as consistent as possible, we can use delayed double delete.
To guard against deletion failures, we use an asynchronous retry mechanism to ensure the deletion eventually succeeds: either send delete messages to MQ middleware, or use canal to subscribe to the MySQL binlog and delete the corresponding cache entries on update events.
So what if we must guarantee absolute consistency? The conclusion first:
There is no way to achieve absolute consistency; this is determined by the CAP theorem. Caching systems fit non-strong-consistency scenarios, so they belong to AP in CAP.
So we settle for second best and achieve the eventual consistency described by the BASE theory.
In fact, once a solution adopts caching, it usually means we give up strong consistency of the data, but in exchange the system gains a performance boost.
That is exactly what is called a tradeoff.
– EOF –