This article is from members of the Education-Adult & Innovation-Front End team and is licensed to ELab for publication.

This kind of live broadcast does not have strict real-time requirements. It uses a CDN for content distribution, which introduces a delay of a few seconds or even more than ten seconds; the main concerns are picture quality and whether the audio and video stutter. RTMP and HLS are the usual protocol choices.

RTMP (Real Time Messaging Protocol) is a real-time messaging protocol developed by Adobe for transmitting audio and video data. Despite its name, it cannot achieve true real-time delivery; latency generally ranges from a few seconds to tens of seconds. RTMP runs on top of TCP and includes the basic protocol plus variants such as RTMPT, RTMPS, and RTMPE. It is one of the mainstream streaming protocols: CDN support is good and implementation is not difficult, which is why most live broadcast platforms choose it. But RTMP has major drawbacks: browsers do not support it, Apple's iOS does not support it, and Adobe has stopped updating it.

RTMP is still widely used on PCs

HLS (HTTP Live Streaming) is an HTTP-based streaming protocol defined by Apple and widely used for both video-on-demand and live broadcasting. The HLS specification requires a player to download at least one TS segment before playback can start, so in theory HLS has at least one segment's worth of delay.

HLS compatibility is better on the mobile side: iOS supports it natively, and Android now basically supports the HLS protocol as well. On the PC side, you need an adapter such as hls.js.
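As a sketch of the usual wiring on the PC side (the stream URL and element lookup below are placeholder assumptions, and `Hls` is the global provided by the hls.js library):

```javascript
// Decide how to play an HLS stream. A sketch, assuming hls.js:
// 'native' -> the browser (e.g. Safari) plays HLS via <video src> directly
// 'hlsjs'  -> the browser supports Media Source Extensions, so hls.js can work
function pickHlsStrategy(hasMse, canPlayNativeHls) {
  if (canPlayNativeHls) return 'native';
  if (hasMse) return 'hlsjs';
  return 'unsupported';
}

// Browser-only wiring, guarded so the sketch is inert elsewhere.
if (typeof window !== 'undefined' && typeof Hls !== 'undefined') {
  const video = document.querySelector('video');
  const src = 'https://example.com/live/playlist.m3u8'; // placeholder URL
  const strategy = pickHlsStrategy(
    Hls.isSupported(),
    video.canPlayType('application/vnd.apple.mpegurl') !== ''
  );
  if (strategy === 'hlsjs') {
    const hls = new Hls();
    hls.loadSource(src);   // fetch and parse the M3U8 index
    hls.attachMedia(video); // feed demuxed segments to the <video> via MSE
  } else if (strategy === 'native') {
    video.src = src;
  }
}
```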

The principle of HLS is to split the whole stream into many small files that are downloaded a few at a time. The server keeps packaging the latest live data into new small files, and when a client tunes in, it plays the most recent fragments, so viewers always see reasonably fresh content and get an approximately live experience. HLS latency is generally higher than that of other live streaming protocols. The transmitted content has two parts: an M3U8 index file and the TS files that store the audio and video media data.
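For illustration, a minimal live M3U8 index looks like the following (the segment names and durations are made up); the player keeps refetching this index and downloading the newly listed TS segments:

```
#EXTM3U
#EXT-X-VERSION:3
#EXT-X-TARGETDURATION:6
#EXT-X-MEDIA-SEQUENCE:2680
#EXTINF:6.0,
segment2680.ts
#EXTINF:6.0,
segment2681.ts
#EXTINF:6.0,
segment2682.ts
```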

A live broadcast system consists of three parts: the live broadcast client, the signaling server, and the CDN network.

The live broadcast client covers audio and video capture, encoding, stream pushing, stream pulling, decoding, and playback. In practice, though, these functions are not all implemented in the same client. Why? Because an anchor does not need to see the audience's video or hear their voices, and the audience communicates with the anchor through text, so they do not need to share their own audio and video with the anchor.

The anchor client captures data from the device's camera and microphone, encodes the captured audio and video, and finally pushes the encoded data to the CDN.

The audience client first obtains the streaming media address of the anchor's room; after entering the room it pulls the audio and video data from the CDN, decodes it, and finally renders and plays it.

The signaling server mainly receives signaling and handles the related business logic, such as creating a room, joining a room, leaving a room, and text chat.

The CDN network is mainly used to distribute media data: media pushed to it can be delivered quickly to users everywhere.

As people's demands for real-time performance and interactivity grow, traditional live broadcasting technology is increasingly unable to meet them. WebRTC is a newer technology proposed precisely to address those real-time and interactive needs.

WebRTC (Web Real-Time Communication) is an open technology that lets browsers carry out real-time voice and video calls. All mainstream browsers now support it, and it remains fairly stable even under ordinary network conditions. WebRTC supports point-to-point communication with low latency between the two sides, and users can communicate in real time without downloading or installing any plug-ins.

Before WebRTC was released, the cost of developing a real-time audio and video application was very high. Many technical problems had to be solved: audio and video codecs, data transmission, latency, packet loss, jitter, echo processing and cancellation, and so on, and real-time communication in the browser additionally required installing plug-ins. WebRTC greatly lowers the barrier to audio and video development: developers only need to call the WebRTC APIs to build an audio and video application quickly.

The following walks through WebRTC's real-time communication process to build a general understanding of WebRTC.

The main job of an audio input device is to capture audio data. Capturing audio is essentially analog-to-digital (A/D) conversion: the analog signal is sampled, then quantized and encoded, finally forming a digital signal. That is the work the audio device completes.

A video device is very similar to an audio input device. Its analog-to-digital (A/D) module is an optical sensor that converts light into a digital signal, i.e. RGB (Red, Green, Blue) data. The RGB data then goes through DSP (Digital Signal Processor) optimization: automatic enhancement, color saturation adjustment, and so on all happen at this stage. The optimized RGB image is then compressed and transmitted, but the input format encoders generally use is YUV, so cameras contain a dedicated module that converts RGB images to YUV format.

So what is YUV?

YUV is another color encoding method. It separates luminance information (Y) from chrominance information (UV): even without the UV components a complete image can still be displayed, just in black and white. This design neatly solved the compatibility problem between color and black-and-white televisions, which was the original motivation for YUV. Compared with the RGB color space, YUV is designed for encoding and convenient transmission, reducing bandwidth usage and transmission errors. The human eye is more sensitive to brightness and position than to color, so a video coding system can save bandwidth by keeping more luminance information and less chrominance information.
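As a concrete sketch, the commonly used BT.601 full-range conversion from RGB to YUV looks like this (the coefficients follow the BT.601 convention; the helper name is ours):

```javascript
// Convert one RGB pixel (0-255 per channel) to YUV using BT.601
// full-range coefficients. U and V are centered on 128.
function rgbToYuv(r, g, b) {
  const y = Math.round( 0.299 * r + 0.587 * g + 0.114 * b);
  const u = Math.round(-0.169 * r - 0.331 * g + 0.500 * b + 128);
  const v = Math.round( 0.500 * r - 0.419 * g - 0.081 * b + 128);
  return { y, u, v };
}

// A gray pixel carries no chroma: U and V sit at the 128 midpoint,
// which is why dropping UV still yields a valid black-and-white image.
rgbToYuv(255, 255, 255); // → { y: 255, u: 128, v: 128 }
```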


The navigator.mediaDevices.enumerateDevices() method returns a list of the available media input and output devices, such as microphones, cameras, and headsets.

The format of the returned deviceInfo information is as follows:
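Each entry is a MediaDeviceInfo with four fields; as a sketch, the deviceId/groupId/label values below are made-up placeholders (labels stay empty until the user has granted media permission):

```javascript
// Illustrative shape of the list returned by enumerateDevices();
// the concrete values here are invented.
const sampleDevices = [
  { deviceId: 'default', groupId: 'a1b2', kind: 'audioinput',  label: 'Built-in Microphone' },
  { deviceId: '9f3c',    groupId: 'a1b2', kind: 'audiooutput', label: 'Built-in Speakers'   },
  { deviceId: '52d1',    groupId: 'c3d4', kind: 'videoinput',  label: 'Integrated Camera'   },
];

// Group a device list by kind, e.g. to build per-kind device pickers.
function groupByKind(devices) {
  const groups = {};
  for (const d of devices) (groups[d.kind] = groups[d.kind] || []).push(d);
  return groups;
}

// In a browser, the real list is fetched asynchronously:
if (typeof navigator !== 'undefined' && navigator.mediaDevices) {
  navigator.mediaDevices.enumerateDevices()
    .then((devices) => console.log(groupByKind(devices)));
}
```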

Device detection is performed by calling the getUserMedia method (described below in the audio and video capture section).

Frame rate is the number of images shown per second of video. At roughly 10–12 fps the human eye already perceives motion as continuous. The higher the frame rate, the more images are processed per second, so traffic is larger and the performance demands on the device are higher; for this reason live broadcast systems usually do not set the frame rate too high. Higher frame rates do give smoother, more realistic motion: 30 fps is generally acceptable, and going up to 60 fps noticeably improves interactivity and realism, but above about 75 fps a further improvement in smoothness is generally hard to notice.

WebRTC's "track" borrows the concept from multimedia: two tracks never intersect. In multimedia, each track's data is independent and does not mix with other tracks; for example, the audio track and video track in an MP4 file are stored separately within the file.


srcObject[1]: this property sets or returns the object serving as the media source for the HTMLMediaElement, which is usually a MediaStream. According to the specification it can also be a MediaSource, Blob, or File, but browser compatibility for those types is currently poor, so for them you can create a URL with URL.createObjectURL() and assign it to HTMLMediaElement.src instead.
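That fallback can be sketched as a small helper (the function name is ours):

```javascript
// Attach a media source to a media element, preferring srcObject and
// falling back to an object URL where srcObject is unavailable.
function attachSource(mediaEl, source) {
  if ('srcObject' in mediaEl) {
    mediaEl.srcObject = source;
  } else {
    mediaEl.src = URL.createObjectURL(source);
  }
  return mediaEl;
}

// Browser usage: attachSource(document.querySelector('video'), stream);
```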

The MediaStreamConstraints parameter specifies which types of media tracks (audio tracks, video tracks) the MediaStream should contain, and lets you set constraints on those tracks.
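For example, a constraints object asking for echo-cancelled audio and 720p/30fps video might look like this (the exact values are illustrative):

```javascript
// Illustrative constraints: which tracks to capture, plus preferences
// for the video track ("ideal" values are hints, not hard requirements).
const constraints = {
  audio: {
    echoCancellation: true,
    noiseSuppression: true,
  },
  video: {
    width:     { ideal: 1280 },
    height:    { ideal: 720  },
    frameRate: { ideal: 30   },
  },
};

// Browser-only: hand the constraints to getUserMedia to start capture.
if (typeof navigator !== 'undefined' && navigator.mediaDevices) {
  navigator.mediaDevices.getUserMedia(constraints)
    .then((stream) => console.log('got tracks:', stream.getTracks().length))
    .catch((err) => console.error('capture failed:', err.name));
}
```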

We know that a video consists of a sequence of image frames plus audio, so taking a photo is really just extracting the currently displayed frame (one image) from the continuously playing video stream. We said that the video stream can be obtained through getUserMedia, so how do we grab the frame being displayed from it?

The canvas's drawImage method[2] is used here.

The first parameter of drawImage accepts an HTMLVideoElement, so you can pass $video directly as the first argument; the photo is then obtained from the canvas.

Then the photo can be downloaded and saved locally via the download attribute of an a tag.
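Putting the two steps together, a browser-only sketch (the function names are ours):

```javascript
// Grab the frame currently shown in a <video> element as a PNG data URL.
function takePhoto(videoEl) {
  const canvas = document.createElement('canvas');
  canvas.width = videoEl.videoWidth;
  canvas.height = videoEl.videoHeight;
  // drawImage accepts an HTMLVideoElement as its first argument.
  canvas.getContext('2d').drawImage(videoEl, 0, 0);
  return canvas.toDataURL('image/png');
}

// Save a data URL locally via the download attribute of an <a> tag.
function savePhoto(dataUrl, filename) {
  const a = document.createElement('a');
  a.href = dataUrl;
  a.download = filename;
  a.click();
}

// Browser usage:
// savePhoto(takePhoto(document.querySelector('video')), 'photo.png');
```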

The ArrayBuffer object represents a generic, fixed-length binary data buffer that can be used to store pictures, videos, and other content. An ArrayBuffer cannot be accessed directly: it only describes a stretch of space for holding binary data, and the data can only be read and written once a concrete typed view is created over it.

ArrayBufferView is a general term for types such as Int32Array, Uint8Array, and DataView. They are all implemented on top of the ArrayBuffer class, so they are collectively called ArrayBufferView.

Blob (Binary Large Object) is JavaScript's large binary object type. WebRTC ultimately uses it to save the recorded audio and video stream as a multimedia file, and under the hood it is implemented with the ArrayBuffer wrapper classes mentioned above, i.e. types such as Int8Array and Uint8Array.
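The relationship between the three can be seen in a few lines (runnable in any modern JavaScript environment):

```javascript
// An ArrayBuffer is a fixed-length run of bytes with no read/write API.
const buffer = new ArrayBuffer(8);

// A Uint8Array view interprets those same bytes as unsigned 8-bit ints.
const bytes = new Uint8Array(buffer);
bytes[0] = 0xff;

// A DataView reads the same memory with explicit offsets and endianness.
const view = new DataView(buffer);
// view.getUint8(0) now yields 255 -- same underlying bytes.

// A Blob wraps binary data (here, the view) into an immutable object
// that can be saved as a file -- this is what MediaRecorder hands back.
const blob = new Blob([bytes], { type: 'application/octet-stream' });
// blob.size is 8: one byte per element of the view.
```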

The stream parameter is the stream to be recorded; it can come from a stream created with navigator.mediaDevices.getUserMedia or from an audio, video, or canvas DOM element.

The MediaRecorder.ondataavailable event can be used to get the recorded data (an available Blob object is provided in the data property of the event).

The recording process is as follows:
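A minimal recording sketch (browser-only; the mimeType and timeslice values are illustrative, and codec support varies by browser):

```javascript
// Record a MediaStream into a Blob. The stream comes from getUserMedia
// or from a <video>/<audio>/<canvas> element's captureStream().
function startRecording(stream, onFile) {
  const chunks = [];
  const recorder = new MediaRecorder(stream, { mimeType: 'video/webm' });

  // ondataavailable fires with a Blob chunk in event.data.
  recorder.ondataavailable = (event) => {
    if (event.data && event.data.size > 0) chunks.push(event.data);
  };

  // When stopped, stitch the chunks into one multimedia Blob.
  recorder.onstop = () => onFile(new Blob(chunks, { type: 'video/webm' }));

  recorder.start(1000); // ask for a data chunk roughly every second
  return recorder;
}

// Browser usage:
// const rec = startRecording(stream, (blob) => { /* download the blob */ });
// rec.stop();
```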

Once data capture is complete, the next step is to establish the connection, after which data communication can begin.

To implement a one-to-one communication system, the usual idea is to create a socket at each end and connect to the peer through it; once the socket connection succeeds, data can be sent to and received from the peer over it. WebRTC provides the RTCPeerConnection class, which works basically the same way as a socket, but it is more powerful and more complex to implement. Let's look at RTCPeerConnection in WebRTC.

In audio and video communication, each party only needs one RTCPeerConnection object to send and receive audio and video data. In a real scenario, however, establishing an end-to-end call also requires a signaling server to exchange information, such as the IP addresses and ports of both parties, so the two sides can connect to each other.

The WebRTC specification constrains WebRTC's functionality and APIs in great detail: it defines how audio and video data is captured, recorded, and transmitted, and even which APIs exist and what they do. But these constraints apply only to the client; the specification places no restrictions on the server, which means that to use WebRTC we must implement the signaling service ourselves. I won't focus on how to implement a signaling server here; let's just look at how RTCPeerConnection enables one-to-one communication.

How does RTCPeerConnection work?

Each side of the connection creates an RTCPeerConnection object and adds a local stream to it, which is obtained from getUserMedia.

Once you have the audio and video stream, media negotiation with the peer can begin (media negotiation means checking which codecs your device supports and whether my device supports them too; if it does, the negotiation succeeds). This process is carried out through the signaling server.

Now suppose A and B need to communicate

At this point, the exchange and negotiation of media information is complete.
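The offer/answer exchange above can be sketched as follows (browser-only; `signaling` stands for whatever channel your signaling server provides, e.g. a WebSocket, and is an assumption here):

```javascript
// A's side of media negotiation: create and send an offer whose SDP
// lists A's supported codecs.
async function negotiateAsCaller(pc, signaling) {
  const offer = await pc.createOffer();
  await pc.setLocalDescription(offer);
  signaling.send({ type: 'offer', sdp: pc.localDescription });
}

// B's side: apply the received offer, then create and send back an
// answer whose SDP reflects the codecs both ends agreed on.
async function negotiateAsCallee(pc, signaling, offer) {
  await pc.setRemoteDescription(offer);
  const answer = await pc.createAnswer();
  await pc.setLocalDescription(answer);
  signaling.send({ type: 'answer', sdp: pc.localDescription });
}

// A finishes by calling pc.setRemoteDescription(answer) on receipt.
```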

In this way, new Candidates are gathered. In a real system, whenever a new Candidate is obtained it is forwarded to the peer through the signaling server; the peer calls the addIceCandidate() method of its RTCPeerConnection object to store the received Candidate and then runs connectivity checks in order of Candidate priority. Once a Candidate passes the connectivity check, a connection is established between the two ends, and media data can be transmitted over it.
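The candidate exchange on each side looks roughly like this (browser-only; `signaling` is again an assumed transport, and `onCandidate` is a hypothetical hook it exposes):

```javascript
// Wire up trickle ICE: forward each newly gathered local candidate to
// the peer, and store candidates the peer sends back to us.
function wireIceExchange(pc, signaling) {
  // Fired whenever the ICE agent gathers a new local candidate.
  pc.onicecandidate = (event) => {
    if (event.candidate) {
      signaling.send({ type: 'candidate', candidate: event.candidate });
    }
  };

  // Called when the signaling channel delivers a remote candidate; the
  // browser then runs connectivity checks by candidate priority.
  signaling.onCandidate = (candidate) => pc.addIceCandidate(candidate);
}
```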

Video is a continuous image sequence: it is composed of consecutive frames, and one frame is one image. Because of the persistence of vision of the human eye, playing a frame sequence at a certain rate produces what we perceive as continuous motion. Consecutive frames are highly similar, so to make storage and transmission practical, the original video must be encoded and compressed to remove redundancy in both the spatial and temporal dimensions.

Video codecs use algorithms to remove redundant information from video data so that images can be compressed, stored, and transmitted, and then decoded and format-converted on the other side. The goal is the highest possible video reconstruction quality and the highest possible compression ratio within the available computing resources, meeting bandwidth and storage-capacity requirements.

The most important codec standards in video streaming are the H.26X series (H.261, H.263, H.264), the MPEG series, Apple’s QuickTime, and so on

After A and B establish a connection through their RTCPeerConnection objects, the local media data is encoded and transmitted to the remote end, and the remote end decodes the received media data. How is it then displayed? Taking video as an example, let's see how to combine the media data RTCPeerConnection receives with a video tag.

When a data stream arrives from the far end, the browser calls the onaddstream callback. In the callback, the received stream is assigned to the video tag's srcObject property; this binds the video element to the RTCPeerConnection, so the video element can obtain the video data from the RTCPeerConnection and finally display it.
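A sketch of that binding; note that onaddstream is deprecated, and in current browsers the equivalent is the ontrack event:

```javascript
// Show the remote party's media: whenever a remote track arrives,
// attach its parent stream to the <video> element (browser-only sketch).
function showRemoteVideo(pc, videoEl) {
  pc.ontrack = (event) => {
    // event.streams[0] is the MediaStream the incoming track belongs to.
    if (videoEl.srcObject !== event.streams[0]) {
      videoEl.srcObject = event.streams[0];
    }
  };
}

// Browser usage: showRemoteVideo(pc, document.querySelector('#remote'));
```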

There is a great deal more to WebRTC; this has been only a very brief walkthrough of the general process of using WebRTC for real-time communication. If you are interested, you can dig into the details.




That’s all for this sharing and hope it helps you ^_^

If you liked it, don't forget to share, like, and favorite.

Welcome to follow the ELab team's official account for first-hand quality articles from major companies ~

ByteDance campus/social recruitment referral code: F231V92

Application link: