friend and I discussed a topic he interviewed for a large company some time ago :

Enterprise IM such as enterprise WeChat, DingTalk in the group message has a read unread function, when the sender just sent the message, the other group members in the current group are unread, one after another people read this message, at this time the details of the message become x people read, y people unread, as shown in the figure below, there is a specific read unread list (evil function, see the message of colleagues or bosses can not pretend not to see), Each message corresponds to a unique messageid (uint64_t), each user corresponds to a unique userid (uint64_t), how should the read and unread details corresponding to this message be saved?

I gave a very simple and rude solution for the first time:

for each messageid, save the current readids + unreadids,

when group member A has read a certain message, remove A userid from unreadids and write it to readids, and the client updates to the details list corresponding to messageids, you can show that m people have read and n people have not read

Obviously, the interviewer will not be satisfied with such a simple and rough plan, and ask if there is a better plan?

Careful analysis, according to the current design, each message, read unread

details to occupy 8B * the number of group members of the memory, if an active group of 200 people, each message, read unread to 1600B, if the average daily message volume is 1k, then each such group, 1.6MB disk space per day, for the client, especially the mobile phone, occupying disk space is unacceptable to the user, and can not delete the work message, for the server side, if the user group is particularly large, then the cost of database storage is not small


in fact, unread read is a 0/1 mark, can maintain a bitmap to achieve? What exactly should be done?

The group metadata holds the mapping of userid to auto-incrementing mapid

struct UserInfo { uint64_t userid; uint32_t mapid; }; struct GroupMetaInfo { vector  members; string name; uint32_t maxid;  other info}; 

In this way, every time a group member joins a group, there is a two-way mapping of mapid<->usreid, if there are 5 members ABCDE in the group, then corresponding to mapid 1-5, the message details storage corresponding to messageid can be designed as

{ uint32_t maxid, uint8_t readbit[]} 

For example, the above case is {5, readbit[0] = bin(0000 0000)}; It occupies 5B(4+1), when A sends a message, D has read the message, it is updated to {5,readbit[0]= bin(0000 1000)}, and when the remaining 4 people have read the message, it is updated to {5, readbit[0]=bin(0001 1110)}

This is a rough scheme, and there are some details worth considering:


    What about the withdrawn members? For example, C quits the group, maxid is still 5 when sending the message, the total number of read + unread should be 3 (excluding the sender himself), the current information is only 5 bits (0/1), can not identify who has left

    the group chat

  1. How to deal with the members who quit the group chat? Is it removed from GruopMetaInfo? How do I assign an ID to a member who leaves a group chat and rejoins?

First of all, 2 this point, the member who quits the group chat can only mark the deletion, can not

physically delete, otherwise when the client displays the read unread details, the corresponding userid can not be found through the mapid, and the exited member rejoins the group chat It is easy to do, change the marker deletion to non-marker deletion, or use the old mapid

As for 1? I currently think of a better way is to add another bitmap, record whether the member has left the group chat when the message is sent, and the exit group chat is set to 1, so the final solution is to


the user ID of the group information, self-increase mapid bidirectional mapping, exit group chat member marker deletion, messageid read unread details stored {maxid, readbit[], quitbit[]}

What are the benefits of the new approach?

  1. Add self-increasing mapid field, a group chat maintains a copy, the cost is almost negligible

  2. per member’s read and unread by 8B (64bit) optimized to 2bit, reducing 62/64, 200 people have read unread the old scheme 1600B, now only need (200/8) * 2 + 4 = 54, saving 95%+ per message

If maxid reaches the level of millions or even tens of millions, wouldn’t it be a disaster? In general actual scenarios, group chats will limit the number of people, even if you keep kicking people and adding new people, then maxid can only reach the number of people in the enterprise at most. If maxid reaches a particularly large number, the storage corresponding to the read and unread can be increased by one more flag, and if the bitmap storage cost far exceeds the original solution, it can be implemented with the initial scheme, and the client can bury the compatibility logic in advance to | toutiao.com/i6686735232772604429




public number (zhisheng ) reply to Face, ClickHouse, ES, Flink, Spring, Java, Kafka, Monitor keywords such as to view more articles corresponding to keywords.

like + Looking, less bugs 👇