As a high-performance cache, Redis is widely used in many services, in scenarios such as game leaderboards and distributed locks. After operating Redis in IEG for a long time, we have also run into some of its pain points, such as a high memory footprint, poor data reliability, and the burden on business teams of keeping cache and storage consistent. Tendis, jointly developed by Tencent Interactive Entertainment’s CROS DBA team and the Tencent Cloud database team, comes in three product forms for different business needs: Cache Edition, Hybrid Storage Edition, and Storage Edition. This article mainly introduces the overall architecture of the Hybrid Storage Edition and explains its internal principles in detail.

This article first describes the pain points Tencent IEG encountered in operating Redis, then introduces the three product forms of Tendis. Finally, it presents the architecture of the hot and cold hybrid storage edition and highlights the functional characteristics of each component.

Background introduction

In the process of using Redis, we mainly encountered the following pain points:

  • High memory cost
    1. QPS requirements differ at different stages of a business. Take games as an example: a newly launched game is particularly popular, and machines must be added continuously to support tens of millions of simultaneous online users. After a period of operation there may be fewer players and fewer requests (lower QPS), yet the service still occupies a lot of machines and maintenance costs stay high.
    2. Memory must be reserved for fork. When Redis saves its full data set, it forks a child process. Linux’s fork system call is based on the copy-on-write mechanism, so if Redis receives a large number of writes during this time, the parent and child processes each end up holding their own copy of the modified memory. Machines running Redis therefore often need to keep half of their memory in reserve.
  • The Redis + MySQL architecture requires a lot of effort on the business side to maintain the consistency of the cache and database.
  • Data reliability: Redis is essentially an in-memory database. Users can run AOF with the always fsync policy (appendfsync always) to ensure data reliability, but this significantly reduces performance, so it is rarely used in production. In addition, rollback is not supported.
  • Asynchronous replication: Redis master/standby replication is asynchronous. This gives low response latency and high performance, but data that has not yet been replicated is lost when the master fails, which is an inherent problem of asynchronous replication.

What is Tendis?

Tendis is a Redis storage solution that combines the advantages of Tencent’s massive KV storage systems, and is 100% compatible with the Redis protocol and the Redis 4.0 data model. As a highly available, high-performance distributed KV storage database, Tendis comes in three product forms, Cache Edition, Hybrid Storage Edition, and Storage Edition, which differ in access latency, persistence requirements, and overall cost. The Storage Edition has been open-sourced. Interested readers can follow the project on GitHub: Tencent/Tendis

Tendis Cache Edition is suitable for services that are particularly sensitive to latency and have high QPS requirements. It is custom-developed on top of community Redis 4.0.

Tendis Storage Edition is suitable for large-capacity, latency-insensitive services. All data is stored on disk, which makes it a good fit for warm and cold data. It is an open-source, distributed, high-performance KV storage system independently designed and developed by Tencent Interactive Entertainment’s CROS DBA team and the Tencent Cloud database team. It is fully compatible with the Redis protocol and uses RocksDB as the underlying storage engine. In addition, many optimizations have been made to reliability, the replication mechanism, concurrency control, the gossip implementation, and data migration, and some difficult problems in Redis clusters have been solved.

Tendis Hot and Cold Hybrid Storage Edition combines the advantages of the Cache Edition and the Storage Edition: the cache layer stores hot data, and the storage layer stores the full data set. This ensures both the access performance of hot data and the reliability of the full data set, and data that has stopped being hot is cooled down automatically.

Overall architecture of Tendis Hot and Cold Hybrid Storage Edition

Tendis Hot and Cold Hybrid Storage Edition is mainly composed of the Proxy, the cache layer Redis, the storage layer Tendis Storage Edition, and the synchronization layer Redis-sync. The functions of each component are as follows:

Proxy component: responsible for routing client requests and distributing commands for different keys to the correct shards. The Proxy also collects part of the monitoring data and can disable high-risk commands online.

Cache layer Redis Cluster: the cache layer Redis is developed based on community Redis 4.0 and adds the following features: 1) version control; 2) automatic eviction of cold data from the cache layer and loading of hot data from the storage layer into the cache layer; 3) a Cuckoo Filter that represents the full key set to prevent cache penetration; 4) scaling based on RDB+AOF, which is more efficient and convenient.

Storage layer Tendis Cluster: Tendis Storage Edition is a KV storage engine compatible with the Redis protocol, developed by Tencent on top of RocksDB. It has been running inside Tencent for many years, and its performance and stability have been fully verified. In the hybrid storage system it is mainly responsible for storing and reading the full data set, as well as data backup, incremental log backup, and other functions.

Synchronization layer Redis-sync: 1) imports data into the storage layer Tendis in parallel; 2) the service is stateless and is simply restarted after a failure; 3) routes data automatically.

Some important features of Tendis hot and cold hybrid storage:

  • The cache layer Redis Cluster and the storage layer Tendis Cluster are scaled out and in separately, and each cluster manages itself autonomously.
  • Cold data is cooled automatically to reduce memory cost; hot data is cached automatically to reduce access latency.

Cache layer Redis Cluster

The cache layer Redis of hot and cold hybrid storage adds the following functions on top of the community edition:

  • Version control
  • Hot and cold data interaction
  • Cuckoo Filter to avoid cache penetration
  • Intelligent eviction algorithm
  • Scaling based on RDB+AOF

The following describes these features in detail.

Version control

The first change made on top of community Redis is version control. We add a Version to each key and each AOF entry, and the Version increases monotonically. After each update or addition of a key, the current node’s Version is assigned to the key and value, and then the global Version is incremented. As shown below, 64 bits are added to redisObject, of which 48 bits are used for the version.

    typedef struct redisObject {
      unsigned type:4;
      unsigned encoding:4;
      unsigned lru:LRU_BITS; /* LRU time (relative to global lru_clock) or
                              * LFU data (least significant 8 bits frequency
                              * and most significant 16 bits access time). */
      int refcount;

      /* for hybrid storage */
      unsigned flag:4;                           /* OBJ_FLAG_... */
      unsigned reserved:4;
      unsigned counter:8;                        /* for cold-data-cache-policy */
      unsigned long long revision:REVISION_BITS; /* for value version */

      void *ptr;
    } robj;

The introduction of version control mainly brings the following advantages:

In community Redis, after the master and standby disconnect and reconnect, if the data corresponding to the psync_offset sent by the slave is no longer in the master’s repl_backlog, the two must perform a full synchronization again. With Version introduced, the reconnecting slave sends the master a PSYNC replid psync_offset version command that carries its Version. If the situation above occurs, the master generates an incremental RDB containing only the data whose Version is greater than or equal to that Version and sends it to the slave, which avoids a slow full synchronization.

If the synchronization layer Redis-sync experiences a network glitch (a brief disconnection from the cache layer or the storage layer), then, being a stateless synchronization component, it re-pulls the incremental data that has not yet been synchronized to Tendis and resends it. Each AOF entry carries a Version, and Tendis only executes AOF entries whose Version is larger than its current Version, which avoids data inconsistencies caused by executing the same AOF entry more than once.
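The version check that makes resending safe can be sketched in a few lines of C. This is only an illustration under stated assumptions: the names aof_entry, replay_state, and apply_aof are hypothetical and not taken from the Tendis source, a single stream of entries is assumed, and the "command" is reduced to adding a number to a counter.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical AOF entry: a version plus a stand-in for the real command. */
typedef struct {
    uint64_t version; /* monotonically increasing version carried by the entry */
    int      delta;   /* stand-in command: add delta to a counter */
} aof_entry;

/* Hypothetical receiver state on the storage side. */
typedef struct {
    uint64_t applied_version; /* highest version executed so far */
    long     counter;         /* stand-in for the stored data */
} replay_state;

/* Execute the entry only if it is newer than everything applied so far.
 * Returns true if it was executed, false if it was skipped as a resend. */
bool apply_aof(replay_state *st, const aof_entry *e) {
    if (e->version <= st->applied_version)
        return false;
    st->counter += e->delta;
    st->applied_version = e->version;
    return true;
}
```

If entries with versions 1 and 2 are applied and then resent after a reconnect, the second delivery is skipped and the counter is changed only once per entry.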

Hot and cold data interaction

Cold data recovery means that when a user accesses a key that is not in the cache layer, the data must be reloaded from the storage layer into the cache layer. Here the cache layer interacts with the storage layer directly. When there are many requests for cold keys, data recovery can easily become a bottleneck, so a connection pool is established for each Tendis node, dedicated to cold data recovery with that node.

    The specific process of users accessing a key is as follows:

  1. First determine whether the key is in the cache layer. If it is, execute the command. If it is not, query the Cuckoo Filter to determine whether the key may be in the storage layer.
  2. If the key may be in the storage layer, send the dumpx dbid key withttl command to the storage layer to try to fetch the data, and block the requesting client.
  3. The storage layer receives dumpx. If the key is in the storage layer, it returns RESTOREEX dbid key ttl value to the cache layer. If the key is not in the storage layer (a false positive of the Cuckoo Filter), it returns DUMPXERROR key to the cache layer.
  4. After the cache layer receives RESTOREEX or DUMPXERROR, cold data recovery is complete. The blocked client can then be woken up to execute its request.
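The decision flow in the steps above can be sketched as a small C function. This is a simplified model, not Tendis code: key_state, read_outcome, and read_key are hypothetical names, the three lookups are reduced to boolean flags, and the network exchange (dumpx, RESTOREEX, DUMPXERROR) is represented only by a request counter.

```c
#include <assert.h>
#include <stdbool.h>

typedef enum {
    SERVED_FROM_CACHE,     /* key was already in the cache layer */
    RESTORED_FROM_STORAGE, /* RESTOREEX received, key loaded into the cache */
    KEY_NOT_FOUND          /* filter miss, or DUMPXERROR on a false positive */
} read_outcome;

/* Hypothetical snapshot of where a key currently lives. */
typedef struct {
    bool in_cache;     /* present in the cache layer */
    bool filter_maybe; /* Cuckoo Filter answers "possibly present" */
    bool in_storage;   /* actually present in the storage layer */
} key_state;

/* Follows the read path; *storage_requests counts dumpx round trips. */
read_outcome read_key(key_state *k, int *storage_requests) {
    if (k->in_cache)
        return SERVED_FROM_CACHE;      /* step 1: cache hit */
    if (!k->filter_maybe)
        return KEY_NOT_FOUND;          /* filter says the key cannot exist */
    (*storage_requests)++;             /* step 2: send dumpx, block client */
    if (!k->in_storage)
        return KEY_NOT_FOUND;          /* step 3: DUMPXERROR (false positive) */
    k->in_cache = true;                /* step 4: RESTOREEX restores the key */
    return RESTORED_FROM_STORAGE;
}
```

The point of the filter shows up in the counter: a key the filter rules out never costs a storage round trip, and only false positives pay for a wasted one.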

Key cooling and Cuckoo Filter

Here we mainly explain how hybrid storage evolved from the 1:1 version, in which the cache layer caches the full key set, to the N:M version, in which the cache layer evicts keys and values together. Along the way we introduced the Cuckoo Filter to avoid cache penetration while saving a lot of memory.

  1. Background of key cooling. In the 1:1 version of hot and cold hybrid storage, launched in June 2020, the cache layer Redis stores the full key set and hot values (All Keys + Hot Values), and the storage layer Tendis stores the full set of keys and values (All Keys + All Values). After it had run online for a period of time, we found that the memory overhead of the full key set was particularly large and that the benefit of mixing hot and cold data was not obvious. To free up more memory and improve cache efficiency, we abandoned the scheme of caching the full key set in Redis, and now evict both keys and values from the cache layer.
  2. The Cuckoo Filter addresses cache breakdown and cache penetration. If the cache layer does not store the full key set, cache breakdown and cache penetration can occur. To solve this, the cache layer introduces a Cuckoo Filter to represent the full key set. We needed a membership query structure that supports deletion, is dynamically scalable, and has high space utilization. After research and comparative analysis, we chose the Dynamic Cuckoo Filter.
  3. Dynamic Cuckoo Filter implementation. The project initially referenced the Cuckoo Filter implementation in RedisBloom and ran into some pitfalls during development: the Cuckoo Filter implemented by RedisBloom could delete entries by mistake when deleting. We submitted a PR to RedisBloom (Fix Cuckoo filter compact cause deleted by mistake #260) that fixed the issue.
  4. Benefits of key cooling. Once keys and values are evicted from the cache layer together, the memory savings are large. For example, one production service has 66.2 million keys in total; it occupied 18408 MB of memory when the full key set was cached, and only 593 MB after key cooling.
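To make the membership structure concrete, here is a minimal sketch of a plain, fixed-size cuckoo filter in C. It is not the Dynamic Cuckoo Filter used by Tendis and is not taken from RedisBloom: the bucket count, the 8-bit fingerprint, the FNV-1a hash, the deterministic victim choice, and every function name are assumptions made for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define NUM_BUCKETS 1024u /* power of two, so the XOR below stays in range */
#define SLOTS 4u          /* fingerprints per bucket */
#define MAX_KICKS 500u    /* relocation attempts before giving up */

typedef struct {
    uint8_t slot[NUM_BUCKETS][SLOTS]; /* 0 marks an empty slot */
} cuckoo_filter;

/* FNV-1a, used only as a convenient stand-in hash function. */
uint32_t fnv1a(const void *data, size_t len) {
    const uint8_t *p = data;
    uint32_t h = 2166136261u;
    for (size_t i = 0; i < len; i++) {
        h ^= p[i];
        h *= 16777619u;
    }
    return h;
}

/* 8-bit fingerprint taken from the top hash bits; never 0. */
uint8_t fingerprint(uint32_t h) {
    uint8_t fp = (uint8_t)(h >> 24);
    return fp ? fp : 1;
}

/* Partial-key cuckoo hashing: the other bucket depends only on (i, fp),
 * and applying the function twice returns the original index. */
uint32_t alt_index(uint32_t i, uint8_t fp) {
    return (i ^ fnv1a(&fp, 1)) & (NUM_BUCKETS - 1);
}

bool bucket_put(cuckoo_filter *f, uint32_t i, uint8_t fp) {
    for (uint32_t s = 0; s < SLOTS; s++)
        if (f->slot[i][s] == 0) { f->slot[i][s] = fp; return true; }
    return false;
}

/* Finds fp in bucket i; clears that slot when erase is true. */
bool bucket_find(cuckoo_filter *f, uint32_t i, uint8_t fp, bool erase) {
    for (uint32_t s = 0; s < SLOTS; s++)
        if (f->slot[i][s] == fp) {
            if (erase) f->slot[i][s] = 0;
            return true;
        }
    return false;
}

bool cf_insert(cuckoo_filter *f, const char *key) {
    uint32_t h = fnv1a(key, strlen(key));
    uint8_t fp = fingerprint(h);
    uint32_t i = h & (NUM_BUCKETS - 1);
    if (bucket_put(f, i, fp) || bucket_put(f, alt_index(i, fp), fp))
        return true;
    for (uint32_t n = 0; n < MAX_KICKS; n++) {
        uint32_t s = n % SLOTS;        /* deterministic victim choice */
        uint8_t victim = f->slot[i][s];
        f->slot[i][s] = fp;            /* take the victim's place */
        fp = victim;
        i = alt_index(i, fp);          /* move the victim to its other bucket */
        if (bucket_put(f, i, fp)) return true;
    }
    /* Full: the last victim is dropped here. A production filter would
     * stash it or grow, which is what a dynamic filter is for. */
    return false;
}

bool cf_contains(cuckoo_filter *f, const char *key) {
    uint32_t h = fnv1a(key, strlen(key));
    uint8_t fp = fingerprint(h);
    uint32_t i = h & (NUM_BUCKETS - 1);
    return bucket_find(f, i, fp, false) ||
           bucket_find(f, alt_index(i, fp), fp, false);
}

bool cf_delete(cuckoo_filter *f, const char *key) {
    uint32_t h = fnv1a(key, strlen(key));
    uint8_t fp = fingerprint(h);
    uint32_t i = h & (NUM_BUCKETS - 1);
    return bucket_find(f, i, fp, true) ||
           bucket_find(f, alt_index(i, fp), fp, true);
}
```

A filter is initialized by zeroing the struct (for example with memset). Because only a one-byte fingerprint is kept per key, lookups can return false positives, which is why the read path still has to handle a "not found" answer from the storage layer. Deletion works because a fingerprint can be removed from its bucket, something a classic Bloom filter cannot do.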

Intelligent eviction/loading strategy

In a hybrid hot and cold storage system, hot data lives in the cache layer and the full data set lives in the storage layer. The key problem is the eviction and loading strategy, which directly affects cache efficiency. It breaks down into two questions: 1) when the cache layer’s memory is full, which data should be evicted; 2) when a user accesses data in the storage layer, should that data be put into the cache layer.

  1. Eviction strategy. Hybrid storage mainly has the following two eviction strategies:
      • maxmemory-policy: when Redis memory usage reaches maxmemory, the system evicts keys/values from the cache layer according to the configured maxmemory-policy to free up memory. (Eviction means removing the key/value from the cache layer; the storage layer and the cache layer’s Cuckoo Filter still hold the key.)
      • value-eviction-policy: if value-eviction-policy is configured, a background task periodically evicts from memory the keys/values that users have not accessed for N days.
  2. Cache loading strategy. To avoid cache pollution (for example, scan-like access that traverses the data in the storage layer and pushes the truly hot data out of the cache layer, making the cache inefficient), we implemented a cache loading strategy: only data that is accessed more often than a certain threshold within a specified period of time is loaded into the cache. Both the period and the threshold are configurable.
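One way to read the loading rule is as a fixed-window access counter. The sketch below assumes exactly that; the source does not say how Tendis counts accesses or where the counter is stored, so access_stat, load_policy, should_load, and the choice of "at least min_hits" as the comparison are all hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-key access statistics. */
typedef struct {
    uint64_t window_start; /* start of the current counting window, seconds */
    uint32_t hits;         /* accesses seen in the current window */
} access_stat;

/* Hypothetical configuration: the period and the threshold. */
typedef struct {
    uint64_t window_secs;  /* length of the counting window */
    uint32_t min_hits;     /* accesses needed within one window */
} load_policy;

/* Records one access at time `now` (seconds, assumed non-decreasing) and
 * reports whether the key is now hot enough to be loaded into the cache. */
bool should_load(access_stat *st, const load_policy *p, uint64_t now) {
    if (st->hits == 0 || now - st->window_start >= p->window_secs) {
        st->window_start = now; /* window expired: start counting again */
        st->hits = 0;
    }
    st->hits++;
    return st->hits >= p->min_hits;
}
```

With a 60-second window and a threshold of 3, a single pass over the data (one access per key) never reaches the threshold, so a scan cannot push hot keys out of the cache.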

Scaling based on RDB+AOF

Community Redis scaling has some problems:

  1. Setting importing and migrating is not atomic. The slot must first be set to the importing state on the target node and only then set to the migrating state on the source node. If the order is reversed, requests can be redirected back and forth between the two non-atomic operations, while the source node is already migrating but the target node has not yet been set to importing.
  2. The Migrate command migrates one or more keys at a time, so migrating an entire slot to the target node takes many network round trips.
  3. Migrate is a synchronous command, so the node cannot process other users’ requests while a migration is in progress, which may affect the business (latency fluctuates greatly).

Because of the above problems in community Redis, we implemented scaling based on RDB+AOF. The general process is as follows:

  1. The controller adds the new nodes and plans which slots are to be migrated.
  2. The controller issues the slot synchronization command to the target node: cluster slotsync beginSlot endSlot [[beginSlot endSlot]...].
  3. The target node sends sync [slot ...] to the source node, requesting synchronization of the slot data.
  4. The source node generates a consistent snapshot of the full data of the specified slots (RDB) and sends it to the target node.
  5. The source node starts to send incremental data (AOF) continuously.
  6. The controller obtains the lag (diff_bytes) between the source node and the target node. If the lag is within the specified threshold, the controller sends cluster slotfailover to the target node. (The process is similar to Redis’s cluster failover: it first blocks writes on the source node, then waits until the lag between the target node and the source node reaches 0, and finally assigns the migrated slots to the target node.)
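The two conditions in the last step can be written down as a short C sketch. It models the lag as the difference between two byte offsets, which is an assumption; slot_migration and the function names are hypothetical and only illustrate the order of the checks.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical view of one in-flight slot migration. */
typedef struct {
    uint64_t source_offset;         /* incremental bytes produced by the source */
    uint64_t target_offset;         /* incremental bytes applied by the target */
    bool     source_writes_blocked; /* set once the failover has started */
} slot_migration;

/* Lag between source and target; 0 once the target has caught up. */
uint64_t diff_bytes(const slot_migration *m) {
    return m->source_offset > m->target_offset
               ? m->source_offset - m->target_offset
               : 0;
}

/* The controller may send cluster slotfailover once the lag is small. */
bool may_start_slotfailover(const slot_migration *m, uint64_t threshold) {
    return diff_bytes(m) <= threshold;
}

/* Slots change owner only after writes are blocked and the lag is 0. */
bool may_reassign_slots(const slot_migration *m) {
    return m->source_writes_blocked && diff_bytes(m) == 0;
}
```

Waiting for a small lag before blocking writes keeps the write pause on the source node short, because only the last few bytes remain to be replayed.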

Synchronization layer Redis-sync

Redis-sync emulates the behavior of a Redis slave: it receives RDB and AOF data and imports it into the storage layer Tendis in parallel. The synchronization layer mainly needs to solve the following problems:

  • When importing into the storage layer Tendis concurrently, how is correct ordering guaranteed?
  • How are special commands such as FLUSHALL, FLUSHDB, SWAPDB, SELECT, and MULTI handled?
  • As a stateless synchronization component, how does it resume from the breakpoint after a failure?
  • The cache layer and the storage layer are scaled out and in separately, so how are requests routed to the correct Tendis node?

To solve the above four problems, we implemented the following functions:

  • Serial within a slot, parallel across slots. For problem 1, Redis-sync uses the same slot calculation algorithm as Redis. After parsing a specific command, it puts the command into the queue (slot % QueueSize) that corresponds to the slot of the key. Data in the same slot is therefore written serially, while data in different slots can be written in parallel, so the ordering is never violated.
  • Serial-parallel conversion. For problem 2, Redis-sync switches between parallel and serial modes. For example, when it receives the FLUSHDB command, it must first finish executing all the commands that precede FLUSHDB and only then execute FLUSHDB.
  • Periodic position reporting. For problem 3, Redis-sync periodically persists to the storage layer the Version of the AOF entries it has already sent there. If Redis-sync fails, it first fetches the last sent position from the storage layer and then sends psync to the corresponding Redis node to request synchronization.
  • Automatic data routing. For problem 4, Redis-sync periodically obtains the mapping of slots to Tendis nodes from the storage layer and maintains a connection pool for these Tendis nodes. When a request arrives from the cache layer, Redis-sync computes the slot the request belongs to and sends it to the correct Tendis node.
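The slot-to-queue mapping can be sketched as follows. Redis Cluster hashes a key with CRC16 (the XMODEM variant) modulo 16384; the sketch reuses that and then applies slot % QueueSize as described above. Hash tag handling ({...}) is left out, and key_slot and queue_index are illustrative names, not Redis-sync functions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define CLUSTER_SLOTS 16384u

/* CRC16-CCITT (XMODEM): polynomial 0x1021, initial value 0, processed
 * most significant bit first. Bitwise version, kept short for clarity. */
uint16_t crc16_xmodem(const char *buf, size_t len) {
    uint16_t crc = 0;
    for (size_t i = 0; i < len; i++) {
        crc = (uint16_t)(crc ^ ((uint8_t)buf[i] << 8));
        for (int b = 0; b < 8; b++)
            crc = (crc & 0x8000) ? (uint16_t)((crc << 1) ^ 0x1021)
                                 : (uint16_t)(crc << 1);
    }
    return crc;
}

/* Slot of a key, ignoring hash tags (a simplification). */
uint16_t key_slot(const char *key) {
    return (uint16_t)(crc16_xmodem(key, strlen(key)) % CLUSTER_SLOTS);
}

/* All commands for one slot land in one queue, so they stay ordered. */
size_t queue_index(uint16_t slot, size_t queue_size) {
    return slot % queue_size;
}
```

Because the queue is a pure function of the slot, two commands that touch the same key always go to the same queue and are written in the order they were received, while commands whose slots map to different queues can be written concurrently.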

Storage layer Tendis Cluster

Tendis is a distributed, high-performance KV database compatible with Redis core data structures and protocols. Its main characteristics are:

  • Compatible with the Redis protocol: fully compatible with the Redis protocol, supports the main Redis data structures and interfaces, and is compatible with most native Redis commands.
  • Persistent storage: uses RocksDB as the storage engine. All data is stored in RocksDB in a specific format, and storage of up to petabytes is supported.
  • Decentralized architecture: similar to the distributed implementation of Redis Cluster, all nodes communicate through the Gossip protocol, and hashtags can be specified to control data distribution and access, so usage and O&M costs are extremely low.
  • Horizontal scaling: the cluster supports adding and removing nodes, and data can be migrated by slot between any two nodes. Scaling out and in is transparent to applications and O&M personnel, and clusters can grow to 1,000 nodes.
  • Automatic failover: failed nodes are detected automatically, and when a failure occurs the slave is automatically promoted to master and continues to provide service.

      Abstract: Original source cloud.tencent.com/developer/article/1815554 “Tencent Technical Engineering Official Number” is welcome to reprint, keep the summary, thank you!
