Usually when we open a web page, say a certain shopping site, we click an item in the product list and get redirected to the product details page.

From the perspective of the HTTP protocol, clicking a button on the page makes the front end send an HTTP request, and the website returns an HTTP response.

This model, where the client actively sends a request and the server responds, already covers the functional needs of most web pages.

But have you noticed that in this model the server never takes the initiative to send a message to the client?

Just like the girl you like never takes the initiative to contact you.

But suppose that while you are browsing, a small ad suddenly pops up in the lower right corner, inviting you to “play it secretly at home, alone”.

Curiosity, studiousness, diligence: the things engraved in your DNA begin to stir.

You click on it to find out.

The plain-looking Mr. Gu tells you, “A Taoist with 9 dogs can swagger across the whole server.”

The best-actor winner Mr. Hui tells you, “If you’re my brother, come and slash me.”

Since they have all shown up, you pick a character and enter the game.

At this point a mob walks up from a distance and starts frantically beating you with a wooden stick.

You didn’t click the mouse even once, yet the server automatically sent you the monster’s movement data and attack data.

This… is just too heart-warming.

Moved as you are, a question arises:

How does a scenario like this, where the server actively sends messages to the client, actually work?

Before answering this question, let’s cover some background.

The real pain point is how to receive a message and update the web page without the user doing anything.

The most common solution is for the page’s front-end code to keep sending HTTP requests to the server at regular intervals; the server receives each request and responds with any message it has.

This is a form of pseudo server push.

The server is not actually sending messages to the client on its own initiative. The client keeps quietly asking the server, and the user simply isn’t aware of it.

This approach is used in many scenarios, the most common being QR-code login.

Take a certain messaging app’s official-account platform as an example. After the QR code appears on the login page, the front end has no way of knowing whether the user has scanned it, so it keeps asking the back-end server whether anyone has. It sends a request roughly every 1 to 2 seconds, which ensures the user gets feedback within 1 to 2 seconds of scanning and doesn’t wait too long.
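To make the polling loop concrete, here is a minimal sketch in Python. It is not how any particular site implements it: `check_status` is a hypothetical stand-in for one HTTP request to the back end, and the interval and attempt limit are assumed values.

```python
import time


def poll_until_scanned(check_status, interval=1.5, max_attempts=20, sleep=time.sleep):
    """Ask the server again and again until it says the code was scanned.

    check_status is a hypothetical callable standing in for one HTTP request;
    it returns True once the user has scanned the code.
    Returns the number of requests made, or None if we gave up.
    """
    for attempt in range(1, max_attempts + 1):
        if check_status():
            return attempt
        sleep(interval)  # wait 1 to 2 seconds before asking again
    return None


# Simulated server: the scan only becomes visible on the 4th request.
responses = iter([False, False, False, True])
requests_made = poll_until_scanned(lambda: next(responses), sleep=lambda s: None)
print(requests_made)  # 4
```

Every `False` in the simulated responses is a request that cost bandwidth and told the page nothing new.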

But this approach has two fairly obvious problems.

If you open the browser’s developer tools with F12, you will see a screen full of HTTP requests. Each one is small, but together they consume bandwidth and add load to the downstream servers.

In the worst case, after scanning the code the user has to wait 1 to 2 seconds for the next HTTP request to fire before the page jumps, and the lag is noticeable.

In practice it feels like this: the QR code appears, you scan it with your phone and tap to confirm, then the page stalls for 1 to 2 seconds before it jumps.

So the question arises again: is there a better solution?

Yes, and it costs very little to implement.

We know that after an HTTP request is sent, the server is generally given a certain amount of time to respond, say 3s, and if no response arrives within that time the request is considered timed out.

Suppose we set a large timeout on the HTTP request, say 30s. If the server learns of a scan within those 30s, it immediately returns a response to the client page. If the request times out, the client immediately sends the next one.

This reduces the number of HTTP requests, and since in most cases the user scans the code within the 30s window, the response is still timely.
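The idea of holding a request open can be simulated in-process. The sketch below is only an illustration under assumptions: the `ScanStatus` class and `long_poll` function are hypothetical names, a `threading.Event` stands in for the server holding the HTTP request, and the timeouts are shortened so the demo finishes quickly.

```python
import threading


class ScanStatus:
    """Hypothetical in-memory server-side state for one QR code."""

    def __init__(self):
        self._scanned = threading.Event()

    def mark_scanned(self):
        # Called when the phone confirms the scan.
        self._scanned.set()

    def wait_for_scan(self, timeout=30.0):
        # Hold the "request" open until the scan happens or the timeout expires.
        # Returns True if scanned, False on timeout.
        return self._scanned.wait(timeout)


def long_poll(status, timeout=30.0, max_rounds=10):
    """Client loop: each round stands in for one long-lived HTTP request."""
    for round_no in range(1, max_rounds + 1):
        if status.wait_for_scan(timeout):
            return round_no
        # Timed out with no scan: issue the next request immediately.
    return None


status = ScanStatus()
# Simulate the user scanning 0.2 s after the page loads.
threading.Timer(0.2, status.mark_scanned).start()
rounds = long_poll(status, timeout=0.5)
print(rounds)  # 1: a single request was enough, answered as soon as the scan happened
```

Compared with polling every 1 to 2 seconds, the client here makes one request and still hears about the scan right away.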

A certain cloud drive does exactly this. You scan the code, tap to confirm on your phone, and the web page on the computer jumps almost instantly. The experience is very good.

Two birds with one stone.

A mechanism like this, where the client sends a request and waits a long time for the server to respond, is known as long polling. The commonly used message queue RocketMQ also takes this approach when consumers fetch data.

Techniques like these, where the server pushes data to the browser without the user noticing, are known as server push. They also go by an English name that sounds completely unrelated, Comet; it’s enough to have heard of it.

Both of the solutions above are, in essence, still the client actively fetching data.

That works fine for simple scenarios like QR-code login.

But a web game generally has a large amount of data that needs to be actively pushed from the server to the client.

This brings us to WebSocket.

We know that the two ends of a TCP connection can both actively send data to each other at the same time. This is called full duplex.

HTTP/1.1, currently the most widely used version, is also built on TCP, yet at any given moment only one of the client and the server can be actively sending data. This is called half duplex.

In other words, a perfectly good full-duplex TCP connection is used by HTTP as if it were half duplex.


This is because HTTP was originally designed for viewing text on web pages. Having the client send a request and the server respond was enough; nobody was thinking about web games, where the client and the server both need to actively send each other large amounts of data.

To support such scenarios properly, we need another new protocol based on TCP.

So a new application-layer protocol, WebSocket, was designed.

Don’t be misled by the name. Although it contains the word “socket”, WebSocket and sockets are like Lei Feng and Leifeng Pagoda: the two have next to nothing to do with each other.

We usually browse in a browser, flipping through text and pictures, which uses HTTP. If we then open a web game, we need to switch to the WebSocket protocol just introduced.

To accommodate both usage scenarios, after the TCP three-way handshake establishes the connection, the browser always starts by communicating over HTTP.

If it is an ordinary HTTP request, the two sides simply carry on with ordinary HTTP, no question about it.

If the browser wants to establish a WebSocket connection, it puts some special headers in the HTTP request.

These headers say that the browser wants to upgrade the protocol (Connection: Upgrade) and that it wants to upgrade to WebSocket (Upgrade: websocket).

The request also carries a randomly generated base64-encoded value (Sec-WebSocket-Key), which is sent to the server.
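As a rough illustration, the upgrade request can be assembled by hand. This is a sketch, not what a browser literally runs: `build_upgrade_request`, the host `example.com` and the path `/game` are made-up names for the example. Per RFC 6455 the key is 16 random bytes encoded as base64, and the request also carries `Sec-WebSocket-Version: 13`, a header the text above does not mention.

```python
import base64
import os


def build_upgrade_request(host, path="/"):
    """Return (request_text, key) for a WebSocket opening handshake."""
    # Sec-WebSocket-Key: 16 random bytes, base64-encoded.
    key = base64.b64encode(os.urandom(16)).decode("ascii")
    lines = [
        f"GET {path} HTTP/1.1",
        f"Host: {host}",
        "Connection: Upgrade",        # "I want to upgrade the protocol"
        "Upgrade: websocket",         # "...and upgrade it to WebSocket"
        f"Sec-WebSocket-Key: {key}",
        "Sec-WebSocket-Version: 13",  # required by RFC 6455
        "",
        "",                           # blank line ends the HTTP headers
    ]
    return "\r\n".join(lines), key


request, key = build_upgrade_request("example.com", "/game")
print(request)
```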

If the server supports upgrading to WebSocket, it goes through the WebSocket handshake: it transforms the client’s base64 value into another string using a publicly known algorithm, puts the result in the Sec-WebSocket-Accept header of the HTTP response, and returns it to the browser with status code 101.

HTTP status code 200 (normal response) is the one we see most often. 101 is much rarer; it means the protocol is being switched.

The browser then applies the same public algorithm to its base64 value, and if the result matches the string the server sent back, verification passes.
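The public algorithm is defined in RFC 6455: append a fixed GUID to the Sec-WebSocket-Key, take the SHA-1 hash, and base64-encode the digest. A minimal sketch follows; the function name `compute_accept` is my own, while the GUID and the sample key/answer pair come from the RFC.

```python
import base64
import hashlib

# Fixed GUID defined by RFC 6455; every implementation uses the same value.
WS_GUID = "258EAFA5-E914-47DA-95CA-C5AB0DC85B11"


def compute_accept(sec_websocket_key):
    """Turn a Sec-WebSocket-Key into the matching Sec-WebSocket-Accept value."""
    digest = hashlib.sha1((sec_websocket_key + WS_GUID).encode("ascii")).digest()
    return base64.b64encode(digest).decode("ascii")


# Sample key from RFC 6455, section 1.3.
client_key = "dGhlIHNhbXBsZSBub25jZQ=="
server_accept = compute_accept(client_key)
print(server_accept)  # s3pPLMBiTxaQ9kYGzzhZRbK+xOo=

# The browser repeats the same computation and compares the two strings.
print(compute_accept(client_key) == server_accept)  # True
```

Since the algorithm is public, this check is no secret. It merely shows that the other end really speaks WebSocket and is not an ordinary HTTP server or cache replaying a response.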

With that, after two handshake messages carried over HTTP, the WebSocket connection is established and both sides can communicate using the WebSocket data format.

We can capture the packets with Wireshark and see what they actually look like.

In the capture above, the packet boxed in red, No. 2445, is the first step of the WebSocket handshake: an HTTP request carrying the special headers.

The packet boxed in red in the capture above, No. 4714, is the server’s reply to that request, the second step of the handshake. It is also an HTTP message, and its status code is 101. The response headers also carry WebSocket-related information such as Sec-WebSocket-Accept.

The capture above shows the whole picture. As the annotations on the screenshot show, WebSocket and HTTP are both protocols built on TCP: after the TCP three-way handshake, the HTTP connection is upgraded to WebSocket.

You may come across the claim online that “WebSocket is a new protocol based on HTTP”. That is not accurate: WebSocket only uses HTTP while establishing the connection, and once the upgrade is complete it has nothing more to do with HTTP.

It’s as if the girl you like got your college roommate’s WeChat through you, and then the two of them started chatting on their own. Would you say she communicates with your roommate through you? No. Like HTTP, you were just a tool.

It is a bit like “borrowing a shell to lay an egg”.

As mentioned above, once the protocol upgrade is complete, the two ends communicate in the WebSocket data format.

In WebSocket, packets are called frames.

Let’s take a look at what the frame format looks like.

There are a lot of fields, but we only need to pay attention to the following ones.

opcode field: indicates what type of frame this is. For example:

A value of 1 means a text (string) frame.

A value of 2 means a binary ([]byte) frame.

A value of 8 is the signal to close the connection.
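In code, the opcode is simply the low 4 bits of a frame’s first byte. The sketch below assumes that layout from RFC 6455; the names `Opcode` and `read_opcode` are my own, and only the three values listed above are modelled, so any other opcode raises `ValueError` here.

```python
from enum import IntEnum


class Opcode(IntEnum):
    TEXT = 0x1    # text (string) frame
    BINARY = 0x2  # binary frame
    CLOSE = 0x8   # close the connection


def read_opcode(first_byte):
    """Extract the opcode from the first byte of a frame."""
    # The high 4 bits hold FIN and reserved flags; the low 4 bits are the opcode.
    return Opcode(first_byte & 0x0F)


print(read_opcode(0x81).name)  # TEXT   (0x81 = FIN bit set + opcode 1)
print(read_opcode(0x88).name)  # CLOSE
```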

payload length field: holds the length, in bytes, of the data we actually want to transfer. For example, if the data to send is the string “111”, the length is 3.

As you can see, there are several fields for storing the payload length: we can use just the first 7 bits, or 7 + 16 bits, or 7 + 64 bits.

That raises a question.

At the data level, everything is a stream of binary 0s and 1s. How do we know when to read 7 bits and when to read 7 + 16 bits?

WebSocket uses the first 7 bits as a flag. However large the data that follows, we always read those 7 bits first and decide from their value whether to read another 16 bits or another 64 bits.

If the value of the first 7 bits is 0 to 125, it is the full payload length, and only those 7 bits need to be read.

If it is 126 (0x7E), the payload length is between 126 and 65535, and the next 16 bits must be read. Those 16 bits contain the actual payload length.

If it is 127 (0x7F), the payload length is >= 65536, and the next 64 bits must be read. Those 64 bits contain the payload length. That allows a payload on the order of 2 to the 63rd power bytes (the highest bit must be 0), which is millions of terabytes and definitely enough.
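The three cases translate almost line for line into code. This is a minimal sketch with a made-up function name, `read_payload_length`. It assumes the 7-bit flag sits in the low bits of the frame’s second byte (the top bit of that byte is the MASK flag) and that the extended lengths are big-endian, as RFC 6455 specifies.

```python
import struct


def read_payload_length(frame):
    """Return (payload_length, offset_where_length_fields_end) for raw frame bytes."""
    flag = frame[1] & 0x7F  # low 7 bits of the second byte
    if flag <= 125:
        return flag, 2  # the 7 bits are the length itself
    if flag == 126:
        (length,) = struct.unpack("!H", frame[2:4])  # next 16 bits, big-endian
        return length, 4
    (length,) = struct.unpack("!Q", frame[2:10])  # next 64 bits, big-endian
    return length, 10


small = bytes([0x81, 3]) + b"111"
medium = bytes([0x82, 126]) + struct.pack("!H", 300)
large = bytes([0x82, 127]) + struct.pack("!Q", 70000)

print(read_payload_length(small))   # (3, 2)
print(read_payload_length(medium))  # (300, 4)
print(read_payload_length(large))   # (70000, 10)
```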

payload data field: holds the actual data to be transmitted. Once the payload length above is known, the corresponding bytes can be sliced out according to that value.
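Putting the fields together, here is a sketch that builds a complete frame the way a server might. It is simplified under stated assumptions: `encode_server_frame` is a hypothetical helper, the frame is a single unfragmented one, and it is unmasked, which RFC 6455 allows only in the server-to-client direction (frames sent by the client must additionally be masked, a detail not covered in this article).

```python
import struct


def encode_server_frame(payload, opcode=0x1):
    """Build one unmasked, unfragmented frame: header (with length) + payload data."""
    first = 0x80 | opcode  # FIN bit set, plus the opcode in the low 4 bits
    n = len(payload)
    if n <= 125:
        header = bytes([first, n])
    elif n <= 0xFFFF:
        header = bytes([first, 126]) + struct.pack("!H", n)
    else:
        header = bytes([first, 127]) + struct.pack("!Q", n)
    return header + payload


frame = encode_server_frame(b"111")
print(frame.hex())  # 8103313131 -> 0x81 (FIN + text), 0x03 (length 3), then "111"
```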

Did you notice a small detail? The WebSocket data format also takes the form of a header (which includes the payload length) followed by the payload data.

The earlier article “Since there is an HTTP protocol, why should there be RPC” mentioned that TCP itself is full duplex, but transmitting data directly over raw TCP runs into the “sticky packet” problem, where message boundaries are lost in the byte stream. To solve it, upper-layer protocols generally wrap the data to be sent in a message header + message body format.

The message header generally contains the length of the message body, and that length is used to slice out the actual message body.

HTTP, most RPC protocols, and the WebSocket protocol introduced today are all designed this way.
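The header + body idea can be shown with a toy format. The sketch below is not the wire format of HTTP, of any real RPC protocol, or of WebSocket: the 4-byte big-endian length header and the names `pack_message` and `split_messages` are assumptions made for the demonstration.

```python
import struct

HEADER = struct.Struct("!I")  # hypothetical header: 4-byte big-endian body length


def pack_message(body):
    """Prefix the body with its length."""
    return HEADER.pack(len(body)) + body


def split_messages(buffer):
    """Cut a received byte buffer into (complete_messages, leftover_bytes)."""
    messages = []
    offset = 0
    while len(buffer) - offset >= HEADER.size:
        (length,) = HEADER.unpack_from(buffer, offset)
        end = offset + HEADER.size + length
        if end > len(buffer):
            break  # the body has not fully arrived yet
        messages.append(buffer[offset + HEADER.size:end])
        offset = end
    return messages, buffer[offset:]


# Two messages "stuck together" in one TCP byte stream.
stream = pack_message(b"hello") + pack_message(b"world")

# Simulate TCP delivering the first 12 bytes, then the rest.
first_batch, leftover = split_messages(stream[:12])
second_batch, leftover = split_messages(leftover + stream[12:])
print(first_batch, second_batch, leftover)  # [b'hello'] [b'world'] b''
```

However TCP happens to slice the stream, the length in the header tells the receiver exactly where each message ends.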

WebSocket fully inherits TCP’s full-duplex capability and also provides a solution to the sticky-packet problem. It suits most scenarios that require frequent interaction between the server and the client (browser), such as web and mini-program games, web chat rooms, and web-based collaborative office software like Feishu.

Back to the question from the beginning of the article: in a web game that uses WebSocket, behavior such as monsters moving and attacking the player is generated by server-side logic. Data such as the damage dealt to the player has to be actively sent from the server to the client, and the client displays the corresponding effect once it receives the data.

TCP itself is full duplex, but HTTP/1.1, the version we use most, is half duplex even though it is built on TCP. That makes it a poor fit for scenarios where the server needs to actively push data to the client, which is why we need WebSocket, a protocol that supports full duplex.

In HTTP/1.1, as long as the client doesn’t ask, the server doesn’t answer. Given this, for simple scenarios such as login pages, you can use timed polling or long polling to achieve the effect of server push (Comet).

For complex scenarios that require frequent interaction between the client and the server, such as web games, consider the WebSocket protocol.

WebSocket and sockets have almost nothing to do with each other; the names just happen to be similar.

Because every browser supports HTTP, WebSocket first uses HTTP plus some special headers to perform the handshake and upgrade. Once the upgrade succeeds it has nothing more to do with HTTP, and data is sent and received in the WebSocket data format.

Recently the read counts on my original articles have been steadily declining, and thinking about it keeps me tossing and turning at night.

I have a small, immature request.

I’ve been away from Guangdong for a long time, and no one has called me “Liangzai” (pretty boy) in ages.

Could you call me a pretty boy in the comments section?

Can such a kind and simple wish be granted?

If you really can’t bring yourself to say it, could you help me by clicking Follow and the Like in the lower right corner?
