Author of this issue

Zhao Hailin

Senior Development Engineer, B-side Technology Center

01 Preface

Bilibili Live was founded in 2014. After eight years of development it has grown from an initial business experiment into one of the company's key business segments, and its technical architecture has evolved from a single service into a complex system of hundreds of microservices. This article reviews the step-by-step evolution of Bilibili Live's architecture over those eight years and shows how it grew from zero into a microservice system capable of serving tens of millions of concurrent online users.

02 From 0 to 1

Like most websites, Bilibili Live started with a LAMP architecture: Linux + Apache + MySQL + PHP. The front end, server side, and scheduled tasks were all concentrated in a single project called "live-app-web".

 Live streaming system architecture

A typical live-streaming backend consists of three parts: the business system implements the platform's various business features; the push/pull streaming system lets streamers push their streams and viewers pull them to watch; and the long-connection system pushes real-time business data to clients in the live room. live-app-web played the role of the application server.

The application architecture of live-app-web

live-app-web contained pages rendered with the PHP template engine Smarty, front-end pages written in JavaScript, and resident PHP message-queue consumers (a producer-consumer model built on Redis Lists). These lived in their own code directories, were developed by front-end and back-end engineers respectively, and were ultimately deployed to a physical machine.

The original project structure of live-app-web

Although this architecture looks rudimentary now, in the first two years we implemented inside live-app-web all the business systems a live-streaming platform needs, and these systems continued to grow into important, independent business systems in the later evolution.


Like any fast-growing business, the live-app-web codebase grew rapidly: from 50K lines of PHP code in 2015 to 80K by the end of that year, then 130K by mid-2016. During this period we did a partial front-end/back-end separation, splitting the front-end parts into standalone front-end applications such as the live home page, the live-room page, and the live personal-center page. But as the business grew and the team expanded, the problems of the monolith became more and more prominent: merge conflicts, release queues, and whole releases blocked by a problem in a single sub-module of a parallel project. These problems finally erupted during one important event.

The git status at one point in 2016

03 The Bureau Chief arrives

Readers unfamiliar with the "Bureau Chief" incident on Bilibili Live can read the background here: "What do you think of General Zhang Zhaozhong's July 13 live broadcast on Bilibili?" [1]. A one-sentence summary: the Bureau Chief (General Zhang Zhaozhong) came to stream, the service crashed, and CEO Chen Rui publicly apologized.

At the time the outage was ridiculed across the internet, with overwhelming commentary on Weibo and Zhihu. In retrospect, on one hand we had underestimated the audience size; on the other we were constrained by the monolithic architecture and sorely lacked monitoring and alerting, resource elasticity, rate limiting and degradation, and fault isolation. After this painful lesson, Live began its microservice journey.

04 Microservices

How do we do microservices? This was a big question for the team at the time: most members came from PHP backgrounds and the business was iterating rapidly, so switching languages from scratch was clearly unrealistic. Swoole, a high-performance PHP service framework, was very popular in China then; it significantly improves PHP service performance through resident processes. After some technical research, the team decided to build the live-streaming microservice framework on top of Swoole and defined the following principles:

Split microservices by business area

Each microservice has its own independent database and cache

Each microservice may access only its own database and cache; services communicate with each other only via RPC

The owner of each microservice is responsible for the stability of that service

Service framework: We developed our own microservice framework based on Swoole. It implements service process management, graceful restart, ORM, caching, logging, and so on; the business only needs to implement controllers and the corresponding service code according to the template, enabling fast and easy development.

Communication protocol: For inter-service communication we used a self-encapsulated RPC protocol over TCP called liverpc. The protocol is very simple and intuitive: a fixed-length header followed by a variable-length JSON body, with short-lived TCP connections between services.
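The exact liverpc header layout is not described here, so as an illustration only, a minimal frame in the same spirit might use a 4-byte big-endian body length as the "fixed-length header" followed by the JSON body:

```go
package main

import (
	"bytes"
	"encoding/binary"
	"encoding/json"
	"fmt"
)

// Request is a hypothetical liverpc-style payload; the real protocol's
// fields are not public, so this sketch only shows the framing idea.
type Request struct {
	Method string                 `json:"method"`
	Params map[string]interface{} `json:"params"`
}

// encodeFrame serializes the request as header (4-byte length) + JSON body.
func encodeFrame(req Request) ([]byte, error) {
	body, err := json.Marshal(req)
	if err != nil {
		return nil, err
	}
	buf := new(bytes.Buffer)
	if err := binary.Write(buf, binary.BigEndian, uint32(len(body))); err != nil {
		return nil, err
	}
	buf.Write(body)
	return buf.Bytes(), nil
}

// decodeFrame reads one frame back out of a byte slice.
func decodeFrame(frame []byte) (Request, error) {
	var req Request
	if len(frame) < 4 {
		return req, fmt.Errorf("frame too short")
	}
	n := binary.BigEndian.Uint32(frame[:4])
	if int(n) > len(frame)-4 {
		return req, fmt.Errorf("truncated body")
	}
	err := json.Unmarshal(frame[4:4+n], &req)
	return req, err
}

func main() {
	frame, _ := encodeFrame(Request{Method: "Room.get_info", Params: map[string]interface{}{"room_id": 1024}})
	req, _ := decodeFrame(frame)
	fmt.Println(req.Method) // Room.get_info
}
```

A length-prefixed header like this lets a receiver read the fixed bytes first and then know exactly how much body to read from the TCP stream.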

Service discovery: One of the first questions after introducing microservices is how to do service discovery. Given the circumstances at the time, we chose ZooKeeper as the service-discovery component. To hide the complexity of service registration, discovery, and health checks, we developed a companion program called Apollo that handles configuration pulls, service registration, and service-node discovery; the business framework detects configuration changes through file watching and hot-reloads them.

Configuration management: Similarly, we stored each service's configuration files in ZooKeeper, with pulls and change notification handled by Apollo.

Message queue: We set up a dedicated Kafka cluster as the message queue. Because it is relatively complicated for PHP to talk to Kafka directly, we built a dedicated delivery-agent service, publisher, and a message-callback notification service, notify.

Unified gateway: To address the lack of unified rate limiting and degradation, we developed a standalone gateway service, live-api, and required all external traffic to be forwarded to the corresponding services through it. In this unified gateway layer we implemented traffic forwarding, URL rewriting, timeout control, rate limiting, caching, and degradation. live-api is also implemented on Swoole; the difference is that it uses a purely asynchronous client, which gives its performance some guarantee.

At this point, the prototype of a complete microservice system had emerged.

On top of this microservice system, we gradually refactored the logic of each business domain out of live-app-web and into the corresponding services. At the same time, we worked with the DBAs to split the live-streaming database online, moving the business tables that had been concentrated in one database into independent databases: changing the wheels on a moving car, and completely eliminating the risks of a shared storage layer.

In December 2017 the Bureau Chief came to Bilibili to stream again, bringing far more traffic than the previous year. This time, we held steady.

05 Containerization

Live-streaming services had always been deployed on physical machines, which has obvious drawbacks: each service needs its own port to avoid conflicts, deployment directories must be isolated, services compete for resources, and the capacity of a single service cannot be accurately evaluated. As Bilibili's business grew, the company's infrastructure team provided a more stable container platform, and after thorough research we started Dockerizing our services and completed container deployment of all services in a short time.

One of the questions faced in Docker-based deployment is: how should CPU scheduling be configured?

Docker usually offers two options: 1. CFS (Completely Fair Scheduler), which schedules by proportional CPU time slices; it is more flexible in resource allocation and can improve overall utilization through overcommitment. 2. CPUSET (core pinning), which binds a pod to one or more specific CPUs via CPU affinity, giving it exclusive use of those resources.

When we migrated the PHP services to Docker, we found that interface timeouts under CFS were severe, reaching an unacceptable level, so all PHP services were deployed with CPUSET. We also tuned the number of PHP worker processes through stress testing; 3 to 4 times the number of allocated CPUs performed best.

CPUSET mode has a similar problem, one that is hard to spot on Prometheus dashboards because monitoring data is usually collected in 30-second cycles to draw the curves. In the request logs, however, we could clearly see second-level request spikes far exceeding the CPUSET quota configured for the service. Raising the quota to absorb these bursts is clearly unrealistic, because it would waste enormous resources.

For this scenario, we divided the application resource pool into two groups: a fixed pool and an elastic pool. Services in the fixed pool use CPUSET with fixed resource allocation; the elastic pool co-locates multiple services without limiting any single service's resource usage. The gateway splits the traffic, directing bursts into the elastic pool to resolve the capacity bottleneck they cause.

The gateway service live-api counts each interface's QPS in real time: when QPS is below a threshold X, all traffic is forwarded to the fixed pool; when QPS exceeds X, the requests above the threshold are forwarded to the elastic pool. We also implemented request tagging: requests in the elastic pool preferentially call services that are also in the elastic pool. By observing the utilization of the elastic pool we can also decide whether the fixed-pool services need to be scaled up. The ultimate goal is to use a small, co-located elastic pool to absorb the frequent traffic bursts of individual services.

The CFS timeout problem was later analyzed in more detail in an article published by Alibaba Cloud, and it was greatly alleviated by CPU Burst technology. The core of CPU Burst is to bring the familiar token-bucket rate-limiting algorithm into the Linux kernel's CPU scheduler: when CPU usage is below the configured quota, the unused quota accumulates, and later scheduling is allowed to spend the accumulated quota to absorb traffic spikes. The kernel team subsequently resolved cgroup leakage, scheduling imbalance, timeouts, and other issues through kernel upgrades and optimization. Through scheduler optimizations such as CPU Burst, Group Identity, and SMT expeller, they delivered major capabilities such as co-locating online and offline workloads without mutual interference and site-wide resource pooling, greatly improving resource capacity and utilization. Business applications no longer use CPUSET's relatively fixed resource allocation; instead, under CFS scheduling, they obtain the resources they need dynamically and on demand through elastic resource-management policies such as VPA and HPA.
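The accumulation idea behind CPU Burst can be illustrated with a userspace token-bucket sketch (all numbers and names here are illustrative; the real mechanism lives inside the kernel's CFS bandwidth controller):

```go
package main

import "fmt"

// Bucket sketches CPU Burst's token bucket: each period the cgroup is
// granted `quota` tokens; unused tokens accumulate up to `burst`, and
// a later busy period may spend the surplus instead of being throttled.
type Bucket struct {
	quota  int // tokens granted per period
	burst  int // maximum accumulated surplus
	tokens int // currently available tokens
}

// Tick starts a new period and grants this period's quota.
func (b *Bucket) Tick() {
	b.tokens += b.quota
	if b.tokens > b.quota+b.burst {
		b.tokens = b.quota + b.burst
	}
}

// Spend consumes up to `want` tokens and returns how many were granted;
// demand beyond the available tokens is what CFS would throttle.
func (b *Bucket) Spend(want int) int {
	got := want
	if got > b.tokens {
		got = b.tokens
	}
	b.tokens -= got
	return got
}

func main() {
	b := &Bucket{quota: 100, burst: 100}
	b.Tick()
	b.Spend(40)               // a quiet period leaves 60 tokens unused
	b.Tick()                  // next period: 100 new + 60 accumulated
	fmt.Println(b.Spend(160)) // a spike can now run past the plain quota: 160
}
```

Without the burst allowance, the second period could spend at most its 100-token quota and the spike would be throttled, which is exactly the second-level timeout pattern described above.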

06 Golang is really fragrant

2018 was the year Golang caught fire. Mr. Mao, as a Golang evangelist, very successfully drove the Golang service evolution of Bilibili's main site, developing along the way a series of Golang microservice frameworks and middleware such as Kratos (a Go microservice framework), Discovery (service discovery), and Overlord (a cache proxy); a considerable part of this work is also open-sourced on GitHub.

At that time, Live faced the next step in its technical evolution, because the PHP microservice architecture built on Swoole could no longer support the growing traffic. Its main problems were:

PHP's multi-process synchronous model makes it very easy for a single downstream failure to take down the whole service: when a downstream responds slowly, PHP workers cannot be released in time, new requests queue for idle workers, and the cascading waits lead to a system avalanche.

Concurrent RPC calls are difficult to implement, so in some complex business scenarios downstream interfaces can only be called serially, making the final external interface very slow.

Scaling out PHP services put pressure on database and cache connections. There was no mature database proxy at the time, and every PHP worker connected directly to the database, which caused the connection count to explode and further limited the scalability of PHP services.

Golang's goroutine model solves these problems, and once the main site's Golang evolution was largely complete, Mr. Mao personally came to Live to guide our own Golang service evolution.

For this Golang service-oriented evolution, we divided services into three types:

Service gateway (interface): service gateways are divided by scenario, such as app and web gateways. The gateway handles API access for its scenario: aggregating data from downstream business services, handling differences between app versions, and degrading functional modules.

Business service (service): business services are divided by business domain, such as the room service and the gift service; each implements its own business logic.

Business job (job): a job is attached to a business service and is typically used for scheduled tasks and asynchronous queue consumption.

The design of the service gateway deserves special mention. On the live home page and room page, business logic is complex and the client typically had to call ten or even dozens of interfaces, some with ordering dependencies. This not only complicated the client code but also delayed page rendering. In the new Golang gateways, therefore, we aggregate the display data of a single scene into one interface: opening a page requires only one or two calls to fully render it. We later added features at the service gateway such as active caching of hot data and automatic degradation when downstream services fail.

After several pilot services, we found that Golang-based services far outperformed PHP services in interface latency and stability, especially when the gateway had to aggregate more than 10 downstream data sources: with goroutine-based concurrent processing, the interface took on average less than half the time of the original PHP service. Since then, more and more Golang services have been created, and more APIs have been exposed to Bilibili's Web, PC, Android, iOS, and other clients through the Golang gateways.

07 The end of live-app-web

In 2019 the earliest live-streaming service, live-app-web, finally completed its mission: all online functions had been refactored and migrated, and live-app-web was taken offline entirely. By the time it was retired, live-app-web had accumulated 190K lines of code and hundreds of contributors. Thank you all!

08 The birth of a new gateway

Back to the evolution of the Golang microservices: we did not let the old live-api gateway take over traffic for the Golang business gateways, partly because Swoole had no mature asynchronous HTTP client at the time, and partly because the PHP-based purely asynchronous gateway was gradually revealing performance bottlenecks. The problems became fully apparent in 2019:

The Golang service gateways required each service to integrate rate limiting separately, and configuration changes required a restart to take effect. During one incident we even found that some services had not integrated the rate-limiting component at all.

live-api's poor performance under heavier business traffic had become a bottleneck, while the existing PHP services still needed to keep iterating and serving traffic for a long time.

After investigating Kong, Tyk, Envoy, and other open-source gateways, we decided to build the new gateway with Envoy as the data plane and a self-developed Golang service as the control plane. Envoy is arguably No. 1 in the service-mesh space and is well suited to traffic forwarding. We named the new gateway Ekango.

To further retire live-api, we upgraded the original TCP-based liverpc protocol to support HTTP calls, so that requests could be forwarded directly from Ekango to the corresponding PHP services; this also greatly reduced development and debugging costs.

In the Ekango gateway we implemented distributed rate limiting, conditional interface rewriting, interface degradation, unified authentication, interface risk control, per-zone degradation for multi-active deployments, and more, providing 150K+ QPS of capacity per machine.

Building on the design and development experience of Ekango, we implemented a service-mesh application based on Envoy: Yuumi. Yuumi solves the problem of PHP, JS, and other languages accessing gRPC services written in Golang: our microservice construction had long revolved around the Golang ecosystem, and support for other languages was comparatively weak. For Live, we wanted PHP services to enjoy the same service-governance capabilities as the Golang ecosystem and to be able to call gRPC services easily.

Yuumi solves this: through the service mesh, a PHP/JS process talks HTTP to a local sidecar process, and the sidecar forwards the request to the target HTTP or gRPC service. The business service no longer needs to care about microservice-governance concerns such as service-node discovery, retry on node errors, and load balancing.

Ekango helped Live support several mega-events at the million and tens-of-millions scale with stable performance. But it also has some flaws: deployment and configuration are complex, the C++ codebase is hard to extend, and above all it lacks a visual control plane for traffic governance, so only a few developers could configure it correctly. As introduced in a previous article, the microservice team developed a unified gateway for Bilibili that supports not only conventional traffic-governance capabilities but also advanced features such as fully visualized access and control planes, API metadata management, and full-link gray release. After thorough evaluation, Live migrated all of its gateway traffic to the unified gateway, which now controls and governs the ingress traffic of the whole site. The unified gateway is also published on GitHub as part of the Kratos open-source project.

At this point the architectural evolution of Live had largely come to an end. Afterwards we split out the roles of message queues and scheduled tasks and introduced distributed task scheduling to completely eliminate single points in service deployment. We also actively pushed multi-active deployment of the business to address availability at a broader scope and serve the rapid development of the live-streaming business. In the course of this evolution we encountered some typical problems; here is a summary of how we handled them, which we hope will inspire your own thinking.

09 About hot keys

Hotspot problems are everywhere: flash sales, seckills, lotteries, big events, and breaking news all create hotspots, and the most direct consequence is hot data, which in turn can overwhelm a single node and trigger a service avalanche. For live streaming, the most likely sources of hot keys are popular rooms, which we call high-online rooms. As the live architecture iterated, the way we handle hot keys also changed, but it always revolves around multi-level caching and divide-and-conquer. We also have to consider data consistency and freshness; we cannot blindly solve hot keys by piling on caches.

 9.1 High-online hotspot cache for PHP services

In the PHP microservice era, we collected data from the CDN and the danmaku long connections through a centralized monitor-service, identified rooms with high online counts, and pushed that room information to the message queue. Jobs of the services that care about popular rooms consume this information and actively push the hot data their business might involve into the cache. A timer inside each service process also watches these popular cache keys and periodically pulls the data directly into the PHP process's memory. In this way, the business data of popular rooms is served straight from the in-process memory cache.


 9.2 High-online hotspot cache for Golang services

When building the Golang services, we simplified popular-room detection by providing a popular-room SDK: a business service can directly ask the SDK whether a given room_id/uid is a hotspot, and the SDK pulls the popular-room list internally. The service then caches the hotspot rooms' data directly in memory via a timer.

The benefits of this are:

Each business service controls its own popularity threshold: service A may consider 10K online popular and worth pre-warming, while service B only pre-warms data above 50K online. This gives non-popular data high freshness and consistency, and gives popular data higher availability at the cost of some consistency.

Popular-room handling can be simulated and rehearsed: before an expected large event we mark the activity's rooms as popular in the admin background in advance, then verify through stress tests that the popular-room handling logic works and performs as expected.
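A minimal sketch of such a popular-room SDK (names hypothetical; here the snapshot is refreshed manually, whereas the real SDK pulls the popular-room list itself on a timer):

```go
package main

import (
	"fmt"
	"sync"
)

// HotRoomSDK keeps a snapshot of popular rooms in memory; each business
// service passes its own threshold, so service A can pre-warm at 10K
// online while service B only pre-warms above 50K.
type HotRoomSDK struct {
	mu     sync.RWMutex
	online map[int64]int // room_id -> current online count
}

func NewHotRoomSDK() *HotRoomSDK {
	return &HotRoomSDK{online: make(map[int64]int)}
}

// Refresh replaces the in-memory popular-room snapshot.
func (s *HotRoomSDK) Refresh(snapshot map[int64]int) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.online = snapshot
}

// IsHot reports whether a room exceeds this service's own threshold.
func (s *HotRoomSDK) IsHot(roomID int64, threshold int) bool {
	s.mu.RLock()
	defer s.mu.RUnlock()
	return s.online[roomID] >= threshold
}

func main() {
	sdk := NewHotRoomSDK()
	sdk.Refresh(map[int64]int{1024: 30000, 2048: 800})
	fmt.Println(sdk.IsHot(1024, 10000)) // true:  hot at service A's threshold
	fmt.Println(sdk.IsHot(1024, 50000)) // false: not hot at service B's threshold
}
```

Marking a room popular for a rehearsal then amounts to injecting it into the snapshot, which is what makes the pre-event stress tests above possible.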

 9.3 Active detection of hotspot data

As Live became more integrated with the Bilibili main site, we found that hotspots do not come only from popular live rooms: trending articles and popular comments can also create hot-key problems for some live services. So we designed a more general hotspot-detection and handling SDK.

After a service receives a user request it calls the counting API; the SDK asynchronously computes the Top-K via a sliding window + LFU + priority queue and periodically calls back to the service with the IDs of the hot data, which the service then preloads from the data source into memory. In this way, hotspot statistics and judgment depend entirely on the business's own QPS, with no reliance on external data. In the end we achieved second-level detection of hot data and pre-warming of it into cache.
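Greatly simplified, the counting side of such a detector can be sketched as follows (the production SDK uses a sliding window with LFU and a priority queue and runs asynchronously; this version only shows per-window counting and Top-K extraction):

```go
package main

import (
	"fmt"
	"sort"
)

// Detector counts each requested ID inside the current window and
// reports the Top-K by frequency.
type Detector struct {
	counts map[string]int
}

func NewDetector() *Detector { return &Detector{counts: make(map[string]int)} }

// Incr is the counting API called on every request.
func (d *Detector) Incr(id string) { d.counts[id]++ }

// TopK returns the k most requested IDs in the current window.
func (d *Detector) TopK(k int) []string {
	ids := make([]string, 0, len(d.counts))
	for id := range d.counts {
		ids = append(ids, id)
	}
	sort.Slice(ids, func(i, j int) bool {
		if d.counts[ids[i]] != d.counts[ids[j]] {
			return d.counts[ids[i]] > d.counts[ids[j]]
		}
		return ids[i] < ids[j] // deterministic order for equal counts
	})
	if len(ids) > k {
		ids = ids[:k]
	}
	return ids
}

// ResetWindow starts a new counting window.
func (d *Detector) ResetWindow() { d.counts = make(map[string]int) }

func main() {
	d := NewDetector()
	for _, id := range []string{"room:1", "room:1", "room:2", "room:1", "room:3", "room:2"} {
		d.Incr(id)
	}
	fmt.Println(d.TopK(2)) // [room:1 room:2]
}
```

The callback described above would then hand `TopK` results to the service every few hundred milliseconds so it can preload those IDs into memory.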

 9.4 Memory caching at the proxy layer

Redis 6.0 introduced a client-side caching mechanism to address hotspot data. Our middleware team implemented similar client-side data caching in our internal cache proxy: in the middleware admin console we can configure a regular expression to match a class of cache keys, and keys matching the rule are cached at the proxy layer, so subsequent accesses to those keys hit the local cache directly, with no trip to the cache server, until the local cache expires.

Proxy-layer caching is particularly suited to the emergency-response flow once hot keys have already been identified: directly setting the discovered hot keys to local caching can greatly mitigate the risk. But it is not suitable as a general, pre-configured hot-key solution, especially regex matching over a whole class of keys, which hurts the data consistency of those keys.


 9.5 Proxyless Redis client with integrated hotspot cache

The hotspot-detection SDK requires active integration by each service, and the proxy-layer cache scheme is too crude. After hot keys triggered alarms several times, we discussed with the infrastructure team and explored embedding the hotspot-cache SDK in the Redis client so that access becomes transparent. For this scheme, the infrastructure team redesigned the hotspot-detection SDK around the HeavyKeeper algorithm. HeavyKeeper computes a highly accurate Top-K over streaming data with very little memory overhead, and that Top-K is exactly the set of hot keys we want to know. Transparent integration for services plus dynamic cache-configuration updates makes this a killer solution for hot keys.

 9.6 Handling hotspot writes

In live-streaming scenarios there are write hotspots as well as read hotspots, usually caused by large numbers of users gifting the same streamer, sending danmaku, and so on, producing massive concurrent writes to a single record. Analyzing these scenarios further, we found they are usually increments or decrements of a single record's value, such as experience points, points, and likes, and such operations naturally support aggregation. We therefore developed an aggregated-write SDK that uses memory or Redis aggregation to merge changes over a configured window before writing them out; for example, the three operations +1, +2, -1 can be aggregated directly into a single +2. Implementing this SDK requires considering the aggregation window size, downstream DB pressure, and data-consistency guarantees when a service restarts abnormally.
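The memory-aggregation variant can be sketched as a map of pending deltas that is drained once per window (window scheduling and the crash-consistency guarantees mentioned above are omitted):

```go
package main

import (
	"fmt"
	"sync"
)

// AggWriter merges deltas to the same record in memory during an
// aggregation window and flushes them downstream as a single write,
// so +1, +2, -1 becomes one +2.
type AggWriter struct {
	mu     sync.Mutex
	deltas map[string]int64 // record key -> pending aggregated delta
}

func NewAggWriter() *AggWriter { return &AggWriter{deltas: make(map[string]int64)} }

// Add merges one increment/decrement into the pending window.
func (w *AggWriter) Add(key string, delta int64) {
	w.mu.Lock()
	defer w.mu.Unlock()
	w.deltas[key] += delta
}

// Flush drains the window; the caller applies each merged delta to the
// DB with one UPDATE per key instead of one per original operation.
func (w *AggWriter) Flush() map[string]int64 {
	w.mu.Lock()
	defer w.mu.Unlock()
	out := w.deltas
	w.deltas = make(map[string]int64)
	return out
}

func main() {
	w := NewAggWriter()
	w.Add("anchor:42:exp", 1)
	w.Add("anchor:42:exp", 2)
	w.Add("anchor:42:exp", -1)
	fmt.Println(w.Flush()["anchor:42:exp"]) // 2: three operations merged into one write
}
```

The trade-off is exactly the one named above: a larger window means fewer downstream writes but more data at risk if the process dies before the flush.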

10 About request amplification

The room service is one of the highest-traffic, most core services of Live, with daily QPS sustained at 200K+. While operating the room service we found request amplification in the following scenarios:

 10.1 Requesting more data than needed

When analyzing the sources of the room service's high QPS, we found that some callers need only part of the room information but request all of it. For example, some business parties only need to know whether a user has a live room, yet they call the full room-information interface: a problem solvable with one field returns dozens of fields, wasting bandwidth and adding latency. Borrowing the FieldMask design (see Netflix's API design practice on using FieldMask), we split room information into modules, such as playback-related and live-card-display-related, and the business side assembles API calls to fetch only the modules its scenario needs, achieving on-demand requests.
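The split can be sketched as follows (module names and fields are illustrative, not the real room-service schema):

```go
package main

import "fmt"

// modules maps each FieldMask-style module name to a builder for that
// slice of room information. Real modules would call storage/downstreams.
var modules = map[string]func(roomID int64) map[string]interface{}{
	"play_url": func(id int64) map[string]interface{} {
		return map[string]interface{}{"url": fmt.Sprintf("https://live.example/%d.flv", id)}
	},
	"show_card": func(id int64) map[string]interface{} {
		return map[string]interface{}{"title": "room title", "cover": "cover.jpg"}
	},
	"status": func(id int64) map[string]interface{} {
		return map[string]interface{}{"live_status": 1}
	},
}

// GetRoomInfo assembles only the requested modules; a caller that just
// needs the live status no longer pays for dozens of unused fields.
func GetRoomInfo(roomID int64, mask []string) map[string]map[string]interface{} {
	resp := make(map[string]map[string]interface{})
	for _, m := range mask {
		if build, ok := modules[m]; ok {
			resp[m] = build(roomID)
		}
	}
	return resp
}

func main() {
	resp := GetRoomInfo(1024, []string{"status"})
	fmt.Println(len(resp)) // 1: only the requested module is returned
}
```

The mask doubles as documentation of what each caller actually depends on, which also makes the amplification analysis above much easier.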

 10.2 Duplicate requests

The live room carries nearly 80% of Live's business functions, and users request the room interface when entering a room. In this interface the gateway aggregates multiple downstream data sources and returns them to the user in one response, and we found that room information was being requested repeatedly in this scenario.

As the figure above shows, besides room-gateway requesting room information, gift-panel and dm-service each requested it again, directly amplifying the load on the room service. As more and more such downstream services appeared, a single room entry by a user generated more than 10x traffic amplification on the room service, and this amplification was clearly unnecessary. The solution was to pass the room information that dm-service and gift-panel depend on to those services directly through the interface. The call order was adjusted to: first call the room service to obtain room information, then call the business services concurrently to obtain their module data, and finally assemble the data the business needs and return it.

 10.3 Request amplification for business services

The live room carries dozens of business functions, and a user entering a room used to trigger separate requests to all of those downstream services. Each downstream therefore had to be provisioned for room-entry QPS, i.e. at least 20K+ QPS. That is unbearable for some niche services, and the data showed that 99% of those downstream requests were empty-handed queries. To reduce the load on services in the room-entry path and avoid wasted resources, we implemented a TAG mechanism on the room service: business services sync their data TAGs to the room service, and after requesting room information, the gateway and client decide from the TAG states whether to request each business service at all, sparing a large number of services from having to handle room-entry-level QPS.
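The TAG check can be sketched like this (service names and the TAG representation are hypothetical):

```go
package main

import "fmt"

// RoomInfo carries, alongside the room data, the TAG states that
// business services have synced to the room service.
type RoomInfo struct {
	RoomID int64
	Tags   map[string]bool // business service -> "has data for this room"
}

// ServicesToCall filters the room-entry fan-out list by TAG state, so
// niche services stop receiving room-entry-level QPS for rooms where
// they have nothing to return.
func ServicesToCall(info RoomInfo, all []string) []string {
	var out []string
	for _, svc := range all {
		if info.Tags[svc] {
			out = append(out, svc)
		}
	}
	return out
}

func main() {
	info := RoomInfo{
		RoomID: 1024,
		Tags:   map[string]bool{"gift-panel": true, "lottery": false, "guard": true},
	}
	fmt.Println(ServicesToCall(info, []string{"gift-panel", "lottery", "guard"}))
	// [gift-panel guard]: the lottery service is skipped for this room
}
```

Since the room interface is requested on every entry anyway, piggybacking the TAG states on it adds no extra calls while removing the 99% of fan-out requests that would return nothing.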

11 About event protection

Technical support for a large event is a technical feast and a big test for all R&D staff. Over the years, Live has accumulated a set of tools and methodologies for event protection: standardized solutions, tools, and platforms for scenario analysis, service capacity estimation, full-link stress testing, degradation planning, and on-site support.

 11.1 Scenario analysis

The purpose of scenario analysis is to understand which business functions, services, and interfaces are involved in a live event, so that the subsequent protection work can target the business modules involved. Usually the functions used in an event room are a subset of those in an ordinary live room, which requires every in-room function to have a control switch, and the switch must be implemented on the client side: once it is off, the client sends no requests at all to that function's service. Scenario analysis records requests along users' real operation paths; automated scenario recording can be done with proxy packet capture, and the Trace links of the recorded requests are then used to quickly generate a scenario dependency graph that clarifies the services, resources, and other elements involved.

 11.2 Capacity Assessment

Capacity assessment determines the resources an event requires so that they can be procured and scaled up in advance. It must be extrapolated from historical data and event estimates, with different growth factors for different businesses.

 11.3 Service stress testing

Service stress testing thoroughly measures the real capacity of online services; we generally stress test before and after scaling up to verify that service capacity meets the event's requirements. For write scenarios in particular, full-link stress testing must isolate test data from real data, so that dirty data generated by the tests does not affect online business.

 11.4 Degradation plans

No protection without a plan: every foreseeable technical risk has a corresponding SOP, and these SOPs must be validated through rehearsal.

 11.5 On-site support

On-site duty support usually suffers from information overload and difficult collaboration; sudden system alarms in particular tend to startle everyone at once. Information must be distributed efficiently and collaboration must happen in real time, so that the protection work flows in an orderly way, with no overlap and no gaps, and is executed quickly. Given the particularities of the support scenario, we developed a real-time event-support platform.

In the real-time support platform, scene owners and duty staff are assigned by business scenario, and all online service alarms and abnormal metrics are pushed in real time to the duty page of the corresponding support staff. Common alarm types, such as high CPU or service throttling, are directly linked to SOP manuals, and the on-duty staff can complete the handling according to the manual's guidance. After the event, we can generate a support report from the platform's records and review the problems that occurred, response timeliness, execution results, and follow-up TODOs.

12 Highlight moments

At the 2021 League of Legends World Championship finals, Bilibili Live set a record of more than 10 million concurrent viewers on a single platform. Services ran stably throughout the matches and viewing was smooth: a highlight moment for Live.

13 Future outlook

Live's technical architecture is still evolving, continuing in the directions of business-architecture governance, multi-active deployment, and unitization, all centered on service stability and high-availability construction. We expect another record online count at this year's LOL S12.

If there are details you would like to know more about or topics of interest, feel free to leave a comment to discuss.