
1. Project Background

2. Project Objectives

3. Solution Selection

4. Practice and Exploration

4.1 Problems and Challenges

4.2 Precondition Preparation

4.3 Data Consistency Between Use Case Recording and Playback

4.4 Operational Consistency Between Use Case Recording and Playback

4.5 Traceable Automated Testing

4.6 Use Case Maintenance

4.7 Cross-App Use Case Playback

4.8 Recording and Playback of Tracking Events

5. Testing Process

5.1 Automated Task Triggering

5.2 Playback Cluster Scheduling

5.3 Assertion Service

5.4 Message Push

6. Adoption and Practice

6.1 Co-Construction with Business Teams

6.2 Results in Practice

One-click environment simulation solves the cumbersome environment preparation that precedes use case execution.

Before a use case can be executed, a great deal of preparatory work is often required, such as switching the API environment, setting the device to a specific location, and logging in with a specified account. We refer to these environmental conditions collectively as preconditions. Preparing preconditions is usually not a matter of one or two steps. Take account login/switching: we have to open the login page, fill in the phone number plus password/verification code, and tap the login button to complete the flow. It is tedious, it must be prepared before every test, and it is highly repetitive. We therefore designed a dedicated precondition module for AlphaTest, splitting each use case into two parts: preconditions + action steps.

Unlike other testing frameworks, AlphaTest is integrated as an SDK in a way that is non-intrusive to business code, so preconditions can be configured automatically through white-box code. You only need to add the required instructions on the platform; once they are delivered to the SDK, the preconditions are set up automatically according to those instructions, and the manual steps no longer need to be repeated. These preconditions are reusable and do not have to be reconfigured every time a use case is prepared. AlphaTest's preconditions not only come with default implementations based on Meituan's internal services and underlying hooks, but also provide APIs for business-side customization, such as integrating a different account system.
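As an illustration, here is a minimal Python sketch of how such a precondition module could be modeled; all names (Precondition, register_applier, the "login" applier) are hypothetical, not AlphaTest's actual API:

```python
# A minimal sketch (hypothetical names) of a precondition module: each
# precondition is a reusable, declarative instruction that the SDK applies
# before the recorded action steps run.
from dataclasses import dataclass, field
from typing import Callable, Dict, List

@dataclass
class Precondition:
    kind: str                                  # e.g. "login", "api_env", "location"
    params: Dict[str, str] = field(default_factory=dict)

# Registry mapping precondition kinds to SDK-side appliers; the business side
# can register its own applier (e.g. a different account system).
APPLIERS: Dict[str, Callable[[Dict[str, str]], None]] = {}

def register_applier(kind: str):
    def deco(fn):
        APPLIERS[kind] = fn
        return fn
    return deco

@register_applier("login")
def apply_login(params: Dict[str, str]) -> None:
    # A default implementation would call an internal login service directly,
    # skipping the UI flow (open page, fill phone + code, tap login).
    print(f"logging in as {params['account']}")

def apply_preconditions(preconditions: List[Precondition]) -> None:
    """Run once before the use case's action steps are replayed."""
    for p in preconditions:
        APPLIERS[p.kind](p.params)

apply_preconditions([Precondition("login", {"account": "test_user_01"})])
```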

It is not only the code that affects use case execution, but also the data.

In many cases an automated use case fails to complete not because of the code, but because the local data and network data at playback time differ from those at recording time, blocking the execution flow or changing what the interface displays. This is also the main reason most automated testing tools/platforms have low pass rates. To guarantee the test success rate, we must control the variables and eliminate the influence of data.

The data an app depends on at runtime consists of two parts: local data and network data.
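For network data, the idea is essentially record-and-replay: responses captured at recording time are stored and served back at playback time, so the app sees exactly the data the recording saw. A minimal sketch, with a hypothetical request-signature scheme (the real SDK's interception and matching logic is more involved):

```python
# A minimal sketch (hypothetical, simplified) of record-and-replay for
# network data: responses captured at recording time are keyed by a request
# signature and served back verbatim at playback time.
import hashlib

class NetworkRecorder:
    def __init__(self):
        self.store = {}  # request signature -> recorded response

    @staticmethod
    def signature(method: str, url: str, body: bytes = b"") -> str:
        return hashlib.sha256(f"{method} {url}".encode() + body).hexdigest()

    def record(self, method, url, body, response):
        self.store[self.signature(method, url, body)] = response

    def replay(self, method, url, body=b""):
        # At playback time the SDK intercepts the request and returns the
        # recorded response instead of hitting the real backend.
        return self.store.get(self.signature(method, url, body))

rec = NetworkRecorder()
rec.record("GET", "/api/shop/list", b"", {"shops": ["A", "B"]})
assert rec.replay("GET", "/api/shop/list") == {"shops": ["A", "B"]}
```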

Accuracy of target positioning and accuracy of gesture positioning.

The essence of UI automation testing is to replace a human in performing operations step by step (tap, long-press, input, swipe, etc.). Whether the operations at recording time and playback time are consistent and accurate directly affects the test success rate and determines the usability of the tool/platform.

Consistent operation behavior first requires confirming that the action target is consistent. Unlike typical testing tools/platforms, AlphaTest adopts a multi-level positioning scheme of ViewPath + image + coordinates. Thanks to the SDK integration, our ViewPath can record richer view features of each element and apply different matching strategies. When lookup of the target control fails, image matching and coordinate matching act as fallbacks, ensuring the target control can still be found accurately when the interface has changed only slightly.
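Conceptually, the fallback chain looks like the following sketch; the function names and stand-in results are hypothetical:

```python
# A minimal sketch (all names hypothetical) of the ViewPath + image +
# coordinates fallback chain. Each strategy returns the target control's
# frame, or None so the next strategy takes over.
from typing import Optional, Tuple

Frame = Tuple[int, int, int, int]  # x, y, width, height

def find_by_viewpath(viewpath: str) -> Optional[Frame]:
    """Match the recorded view features against the current view tree."""
    return None  # stand-in: pretend the control changed and lookup failed

def find_by_image(control_image: bytes) -> Optional[Frame]:
    """Template-match the recorded control screenshot on the current screen."""
    return (40, 300, 120, 44)  # stand-in: image matching succeeded

def locate_target(viewpath: str, control_image: bytes,
                  recorded_frame: Frame) -> Frame:
    for result in (find_by_viewpath(viewpath), find_by_image(control_image)):
        if result is not None:
            return result
    # Last resort: fall back to the frame recorded at capture time.
    return recorded_frame

print(locate_target("Window/Stack/Button[2]", b"...", (40, 290, 120, 44)))
```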

With control-based target positioning, common simple gestures such as tap, long-press, assertion, and even input are well supported: find the corresponding control, then dispatch the corresponding touch event at the control's location. The touch events an app actually receives are precise touch points on the screen. The system processes them and dispatches them to the current app window; the app keeps dispatching the event until the best responder is found, which then consumes it through the responder chain. So which coordinate point should a restored touch event use? Since the control is all we know for certain, the natural choice is the control's center point.

In most cases this works well, but when multiple responsive controls overlap, unexpected operation errors can occur. To solve this class of problem, we combined control positioning with coordinate positioning. Pure coordinate positioning is extremely precise but very unstable: it is reliable only when the screen resolution is identical and every control on the playback page sits in exactly the same position, which is usually unrealistic and places excessive demands on the machines in the test environment.

Control-based positioning, in turn, suffers from insufficient precision. With coordinate positioning, the smaller the positioning area, the less it is affected by screen size, since only a relative position within a small range needs to be determined. Control-based positioning can narrow the target down to a specified area, so combining the two solves the precision problem and the stability problem at the same time.

Complex gestures can be supported in the same spirit, by decomposing a complex gesture into a combination of simple ones. A swipe, for example, splits into two parts, a start position and an end position; positioning each of them becomes an ordinary single-point gesture positioning, handled with the target-control + relative-coordinates scheme above, as sketched below. The core idea is to shrink the area that screen-coordinate positioning operates on down to the target control, achieving consistent operation behavior regardless of device resolution.
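A minimal sketch of the relative-coordinate idea: record the touch point as a ratio inside the target control's frame, then map it back into whatever frame the control occupies on the playback device. A swipe is just two such points (start and end). All names here are illustrative:

```python
# A minimal sketch of target-control + relative-coordinate positioning: a
# touch point is stored as an offset ratio inside the control's frame, so it
# replays correctly at any screen resolution.
Frame = tuple  # (x, y, width, height)

def to_relative(point, frame: Frame):
    """At recording time: store the touch point as ratios within the control."""
    x, y, w, h = frame
    return ((point[0] - x) / w, (point[1] - y) / h)

def to_absolute(rel, frame: Frame):
    """At playback time: map the stored ratios back into the (possibly
    resized/moved) control frame found on the playback device."""
    x, y, w, h = frame
    return (x + rel[0] * w, y + rel[1] * h)

# Recorded on one device...
rel = to_relative((60, 320), (40, 300, 120, 44))
# ...replayed on a device where the same control sits elsewhere and is larger;
# the touch lands at the same relative spot inside the control.
print(to_absolute(rel, (55, 410, 160, 58)))
```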

The whole test process is recorded, so problems can be traced back with one click.

The purpose of testing is to ensure the app runs stably. When a bug occurs during testing and a test fails, we need to trace the cause of the problem, the scenario in which it occurred, and even the specific execution step. This is what most automated testing tools/platforms lack: even when problems are found, troubleshooting is difficult. The problem is even worse in manual testing, where many defects cannot be reproduced and are therefore hard to localize.

The smallest execution unit of an AlphaTest automation use case is the action instruction. During a test we record the execution status of every instruction along with an interface snapshot, and if an instruction fails we run a preliminary analysis of the cause. The execution records of the whole use case are then assembled into a complete test report from which the problem step can be traced quickly. On top of that, we report extensive logs and record a video of the entire use case run to further help troubleshooting. This makes use case playback tests genuinely traceable.
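A minimal sketch of what such per-instruction records and the assembled report might look like (the field names are hypothetical):

```python
# A minimal sketch (hypothetical structure) of per-instruction execution
# records assembled into a traceable test report.
from dataclasses import dataclass, asdict
from typing import List, Optional
import json, time

@dataclass
class InstructionResult:
    name: str                         # e.g. "tap LoginButton"
    status: str                       # "passed" / "failed"
    snapshot_path: str                # interface snapshot taken at this step
    error_hint: Optional[str] = None  # preliminary failure analysis
    timestamp: float = 0.0

def build_report(case_id: str, results: List[InstructionResult]) -> str:
    failed = [r for r in results if r.status == "failed"]
    return json.dumps({
        "case_id": case_id,
        "passed": not failed,
        "first_failed_step": failed[0].name if failed else None,
        "steps": [asdict(r) for r in results],
    }, indent=2)

print(build_report("case-001", [
    InstructionResult("tap LoginButton", "passed", "snap_0.png", None, time.time()),
    InstructionResult("assert OrderPrice", "failed", "snap_1.png",
                      "text mismatch: expected '25', got '26'", time.time()),
]))
```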

Do automation use cases require continuous human investment to maintain? Do architecture upgrades and page refactorings mean use cases must be re-recorded?

Among the many automation tools/platforms, a major obstacle to long-term use is the high maintenance cost of use cases: many tools/platforms let us automate, but we still have to keep investing in use case maintenance, so the final efficiency gain is minimal. Use case update and maintenance can be sorted into three scenarios:

When the same code runs in different apps, do multiple use cases have to be written?

Some Meituan businesses are reused across multiple apps. Takeaway, for example, has its own standalone app but must also be reused inside the Meituan and Dianping apps. These functions share almost all of their code, yet testers have to test the business functionality on every app and maintain multiple copies of the use cases. Since the business implementation itself is consistent, we can adapt to the differences between apps so that one business use case can play back across multiple apps, cutting the cost severalfold. These differences are mainly reflected in:

The AlphaTest platform supports configuring different data per app dimension. When the SDK detects that a use case's playback environment differs from its recording environment, it automatically maps and adapts the data, allowing the use case to run on a different app.
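A minimal sketch of per-app mapping, with a hypothetical configuration: app-specific values in a recorded step (URL scheme, page class, and so on) are swapped for the playback app's equivalents:

```python
# A minimal sketch (hypothetical config and class names) of app-dimension
# mapping: when the playback app differs from the recording app, any
# app-specific field in a step is remapped automatically.
APP_DIMENSION_CONFIG = {
    "waimai":   {"scheme": "waimai://",   "shop_page": "WMShopPage"},
    "meituan":  {"scheme": "imeituan://", "shop_page": "MTShopPage"},
    "dianping": {"scheme": "dianping://", "shop_page": "DPShopPage"},
}

def adapt_step(step: dict, recorded_app: str, playback_app: str) -> dict:
    if recorded_app == playback_app:
        return step
    src = APP_DIMENSION_CONFIG[recorded_app]
    dst = APP_DIMENSION_CONFIG[playback_app]
    adapted = dict(step)
    for key, value in src.items():
        if adapted.get(key) == value:   # remap any app-specific field
            adapted[key] = dst[key]
    return adapted

step = {"action": "open", "scheme": "waimai://", "shop_page": "WMShopPage"}
print(adapt_step(step, "waimai", "meituan"))
```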

Besides functional testing, another important problem we face in daily development and testing is tracking event (埋点, "buried point") testing, so we extended the automation to cover tracking events as well. The core idea of automated tracking event testing is to judge the timing and the reported parameters of each tracking event by comparing the recording run with the playback run. To keep automated tracking event testing stable, we mainly rely on the following safeguard mechanisms:

Tracking event timing verification: the solution deliberately does not support exposure rules such as "1px exposure", "pull-to-refresh exposure", "page-switch exposure", or "foreground/background-switch exposure", mainly because every business defines its exposure rules differently and implementing them would couple the solution tightly to business code. For timing validation we currently support only the following:

[1] Click report timing verification: using event listeners together with tracking event type information, the solution determines whether a click tracking event was reported as the result of an actual click operation, and reports an error if not.

[2] Duplicate report detection: under normal circumstances a single user operation does not produce two identical tracking event reports, so the solution checks all tracking event logs produced under one operation against each other; if two or more logs are exactly identical, an error is reported (a sketch of this check follows below).
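A minimal sketch of the duplicate-report check in [2], assuming a hypothetical JSON-like log format: logs produced under one operation are serialized deterministically and counted, and any report appearing twice or more is flagged:

```python
# A minimal sketch (hypothetical log format) of the duplicate-report check:
# within the logs produced by a single operation, two or more exactly
# identical tracking event reports are flagged as an error.
import json
from collections import Counter
from typing import Dict, List

def find_duplicate_reports(event_logs: List[Dict]) -> List[Dict]:
    # Serialize each log deterministically so "exactly identical" is
    # well-defined regardless of key order.
    counts = Counter(json.dumps(log, sort_keys=True) for log in event_logs)
    return [json.loads(k) for k, n in counts.items() if n >= 2]

logs_for_one_tap = [
    {"bid": "b_order_submit", "cid": "c_shop", "params": {"shop_id": 7}},
    {"bid": "b_order_submit", "cid": "c_shop", "params": {"shop_id": 7}},  # dup
    {"bid": "b_page_view",    "cid": "c_shop", "params": {}},
]
dups = find_duplicate_reports(logs_for_one_tap)
assert dups, "expected a duplicate tracking event report to be detected"
print("duplicates:", dups)
```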

AlphaTest's core testing process always revolves around use case recording and playback. The end-to-end flow involves core modules such as automated task triggering, playback cluster scheduling, the assertion service, and message push.

Taking the UI automation and tracking event automation flow as an example: AlphaTest takes the business team as its basic unit, associating with each team's test cases and synchronizing their status periodically. Meanwhile, based on online requirement reviews, automation use cases are tied to nodes in the R&D process such as PRs, integration packaging, and the second-round regression; the use cases are triggered on schedule and the resulting report is pushed to the responsible owners.

Recording a use case:

[1] First select the test case to be recorded on the AlphaTest platform, then open the app under test and scan the QR code to enter the use case to be recorded. At this point you can set the preconditions the use case requires (account information, mock data, location information, etc.). After you click the Start button, the phone automatically restarts and recording begins.

[2] The user operates the phone normally, following the test case steps. AlphaTest records all of the user's operation behavior and automatically generates a semantic description for display on the AlphaTest platform; at the same time, network data, tracking event data, and other verification information are stored alongside it.

[3] During recording you can open assertion mode at any time and record text extractions/screenshots of the page elements you want to verify, so that the same elements can be checked during subsequent playback.

[4] Once all the test steps have been performed, click the Save button to generate the automation use case.

Use case playback:

[1] Scan the QR code corresponding to the automation use case to play it back. The user's recorded behavior and network data are restored step by step, and the whole process is video-recorded to assist later troubleshooting and traceability.

[2] When an assertion event is encountered during playback, the asserted element is text-extracted/screenshotted and uploaded to the AlphaTest platform. After playback completes, the playback-time assertion screenshot is compared with the recording-time one as part of the overall test result.

[3] Tracking event data produced during playback is likewise recorded, compared with the recording-time tracking event data and reporting timing, and the differences are extracted automatically.

[4] After playback completes, a full test report is generated and the results are pushed to the relevant personnel via the OA system.

Playback plans: in the second-round regression the number of use cases to play back runs into the hundreds. To automate the whole process, we introduced the concept of a playback plan: multiple automation use cases can be grouped for management, and each group is a playback plan (a sketch follows below). Triggering a plan's playback automatically triggers every automation use case within it, and when the entire plan has finished executing, the designated plan owner or group is notified.
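A minimal sketch of the playback plan concept (names hypothetical): a plan is just a named group of use cases executed as a whole, with the owner notified at the end:

```python
# A minimal sketch (hypothetical) of a playback plan: a named group of use
# cases triggered together, notifying the owner once all runs finish.
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class PlaybackPlan:
    name: str
    case_ids: List[str]
    owner: str

def run_plan(plan: PlaybackPlan, play: Callable[[str], bool],
             notify: Callable[[str, str], None]) -> None:
    results = {cid: play(cid) for cid in plan.case_ids}
    passed = sum(results.values())
    notify(plan.owner,
           f"plan '{plan.name}': {passed}/{len(results)} use cases passed")

plan = PlaybackPlan("regression-round-2", ["case-001", "case-002"], "qa_owner")
run_plan(plan, play=lambda cid: True, notify=lambda who, msg: print(who, msg))
```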

In the takeaway C-end agile iteration process, the packaging platform mainly carries the flow from business requirement initiation to requirement delivery. As the upstream platform of AlphaTest, it provides packaging information and triggers automated use case playback tasks. Here is a brief illustration of how AlphaTest interacts with the agile collaboration platform:

Only when the entire testing process is truly hands-free can it count as automation. So we set out to build our own cluster of automated test machines, capable of executing test tasks 24/7. To ensure playback tasks complete smoothly, we added corresponding keep-alive strategies at each stage, which greatly improved the task completion rate.

Use case assertion is the core step of automated use case verification. Our assertion service can make text and image assertions according to the actual situation of the use case. The image assertion relies on a self-built image comparison algorithm service, which efficiently compares recording-time and playback-time assertion images with a comparison accuracy above 99%.

Recording phase:

[1] During recording, assertion decision information is captured automatically.

[2] As in the normal flow, screenshot information is extracted for the asserted area.

[3] If the element is a text component, its text content is extracted; if it is an image component, the image's binary encoding or URL is extracted; the layout information within the area is extracted as well.

Playback phase:

[1] During playback, the same content as at recording time is extracted (text information, image encoding, area screenshots, layout information).

[2] The playback-time assertion information is uploaded to the AlphaTest platform.

[3] The AlphaTest platform verifies the assertion results, starting with a model-based image comparison; if the verdict is "consistent", the result is labeled directly (the overall decision flow is sketched after this list).

[4] If the verdict is "inconsistent", the pair is matched against the "assertion failure dataset"; if a match is found, the result is labeled. If no match is found, the match type must be selected manually.

[5] The match types are "text check", "image-information check", and "manual check". If either of the first two judges the pair consistent, the result is labeled directly. If the "manual check" confirms that the two images are indeed inconsistent, the result is labeled and the flow ends.

[6] If the "manual check" finds the pair consistent, all of the automated judgments above were inaccurate; the cause of the misjudgment must then be categorized manually (specific categories to be determined) and the assertion pair is stored in the failure dataset.

[7] Automatic model training: once the dataset exceeds a certain threshold, model training is triggered on a schedule or manually; after training, the model is deployed to the AlphaTest platform automatically, iterating continuously.
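The decision flow in steps [3] to [7] can be summarized in the following sketch. It is simplified in one respect: the text/image/manual checks run in sequence here, whereas on the platform the match type is selected manually; all names are hypothetical:

```python
# A minimal sketch (hypothetical names, simplified flow) of the assertion
# verification pipeline: model comparison first, then the assertion-failure
# dataset, then fallback checks, with misjudged pairs stored for retraining.
def verify_assertion(recorded, replayed, model, failure_dataset,
                     text_check, image_check, manual_check):
    if model.is_consistent(recorded, replayed):           # step [3]
        return "consistent"
    if failure_dataset.matches(recorded, replayed):       # step [4]
        return "consistent (known false failure)"
    for check in (text_check, image_check):               # step [5]
        if check(recorded, replayed):
            return "consistent"
    if not manual_check(recorded, replayed):              # step [5], manual path
        return "inconsistent"                             # genuinely different
    # Step [6]: manual check says consistent, so every automated judgment
    # above was wrong; store the pair for later retraining (step [7]).
    failure_dataset.add(recorded, replayed)
    return "consistent (stored for model retraining)"

class FailureDataset:
    def __init__(self):
        self.pairs = set()
    def matches(self, a, b):
        return (a, b) in self.pairs
    def add(self, a, b):
        self.pairs.add((a, b))

class StubModel:
    def is_consistent(self, a, b):
        return False  # pretend the model judged the pair inconsistent

print(verify_assertion("rec.png", "play.png", StubModel(), FailureDataset(),
                       text_check=lambda a, b: False,
                       image_check=lambda a, b: False,
                       manual_check=lambda a, b: True))
```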

Image service: the image comparison model uses a comparison algorithm based on metric learning, turning the consistency judgment of an image pair into a semantic similarity measurement problem. Metric learning, also known as distance metric learning (DML), is a branch of machine learning. Its essence is learning similarity, which can equally be viewed as learning distance, since under certain conditions similarity and distance are interconvertible; for two vectors in a coordinate space, for example, similarity can be measured by cosine similarity or by Euclidean distance. The metric learning network adopts the classic Siamese structure: a ResNeXt-50-based backbone extracts the image's high-level semantic features, an SPP layer performs multi-scale feature fusion, the fused features are output as a feature vector expressing the image's semantics, and ContrastiveLoss is used for metric learning (a runnable sketch follows the list below).

[1] Pre-training: the ResNeXt-50 network uses a model pre-trained on ImageNet.

[2] Data augmentation: to increase data richness and improve the network's generalization, augmentation mainly consists of randomly cropping the lower-right part of the image and adding a black mask (with the image pair's label changed accordingly). This augmentation matches the actual characteristics of assertion screenshots and does not shift the data distribution.

[3] Contrastive loss: the loss function is ContrastiveLoss, a pair-based loss in Euclidean space. Its role is to reduce the distance between consistent image pairs and ensure the distance between inconsistent pairs exceeds a margin, here margin = 2.

[4] Similarity metric: similarity is computed from the Euclidean distance between the two images' feature vectors, normalized to the interval [0, 1] and output as the image pair's similarity.
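A minimal PyTorch sketch of the setup described above, under stated assumptions: torchvision's resnext50_32x4d stands in for the ResNeXt-50 backbone, simple multi-scale adaptive pooling stands in for the SPP layer, and the [0, 1] normalization of distance is one possible choice, since the exact formula is not specified:

```python
# A minimal sketch of the Siamese metric-learning setup: ResNeXt-50 backbone
# (ImageNet pre-trained), multi-scale pooling as an SPP stand-in, and
# ContrastiveLoss with margin = 2.
import torch
import torch.nn as nn
import torch.nn.functional as F
from torchvision.models import resnext50_32x4d, ResNeXt50_32X4D_Weights

class SiameseNet(nn.Module):
    def __init__(self, embed_dim: int = 256):
        super().__init__()
        # [1] Backbone pre-trained on ImageNet; drop avgpool and fc.
        backbone = resnext50_32x4d(weights=ResNeXt50_32X4D_Weights.IMAGENET1K_V1)
        self.features = nn.Sequential(*list(backbone.children())[:-2])
        # Multi-scale pooling as a simplified stand-in for the SPP layer.
        self.pools = nn.ModuleList([nn.AdaptiveAvgPool2d(s) for s in (1, 2, 4)])
        self.fc = nn.Linear(2048 * (1 + 4 + 16), embed_dim)

    def embed(self, x: torch.Tensor) -> torch.Tensor:
        f = self.features(x)
        multi = torch.cat([p(f).flatten(1) for p in self.pools], dim=1)
        return self.fc(multi)  # feature vector expressing image semantics

    def forward(self, a, b):
        return self.embed(a), self.embed(b)

def contrastive_loss(ea, eb, label, margin: float = 2.0):
    """[3] Pull consistent pairs (label=1) together; push inconsistent pairs
    (label=0) at least `margin` apart in Euclidean space."""
    d = F.pairwise_distance(ea, eb)
    return (label * d.pow(2) + (1 - label) * F.relu(margin - d).pow(2)).mean()

def similarity(ea, eb, margin: float = 2.0):
    """[4] Map Euclidean distance into [0, 1]; this linear normalization is
    one possible choice, not necessarily the one used in production."""
    d = F.pairwise_distance(ea, eb)
    return (1 - d / margin).clamp(0, 1)

net = SiameseNet().eval()
a, b = torch.randn(2, 3, 224, 224), torch.randn(2, 3, 224, 224)
with torch.no_grad():
    ea, eb = net(a, b)
print(contrastive_loss(ea, eb, torch.tensor([1.0, 0.0])), similarity(ea, eb))
```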

As the final stage of the playback process, we rely on Meituan's self-built message queue service and the OA SDK's push capability to deliver test reports in real time. On top of this, message templates can be customized for different teams' push needs.

The takeaway C-end carries all the core flows of browsing, ordering, and delivery on the app side. Its scenarios are numerous and its business logic complex, which poses many challenges for testers during version testing; the most central and labor-intensive of these is the second-round regression. The C-end currently follows bi-weekly agile iterations, and each iteration gives testers three days to run two rounds of core-flow regression. The C-end test team has therefore invested heavily in manpower, yet even so it remains difficult to cover every flow. AlphaTest was designed to solve exactly this problem: full coverage of the UI testing flow with automated verification.

Use case conversion and maintenance

In the early stage of AlphaTest's adoption by the takeaway C-end test team, we used a co-construction model: business R&D engineers and the corresponding testers jointly record and maintain use cases. The core reason for recommending this working mode lies in the existing second-round workflow of C-end feature iteration: R&D engineers run the second-round smoke test and, once it passes, hand the second-round build to testers for the second-round regression, so this is a step both sides already participate in. As the most important test pass before a release goes live, the second round is also the shared focus of testers and R&D engineers in ensuring the core flows work.

After several rounds of use and adjustment, this model has proved effective. In the overall C-end second-round use case conversion process, testers are mainly responsible for recording and iterating the use cases, while R&D engineers are mainly responsible for compiling version playback statistics and for finding and fixing problematic use cases.

Adoption in the takeaway second-round regression

At present, AlphaTest has been adopted in multiple takeaway businesses, supporting the second-round regression of more than 15 versions with use case coverage reaching 70%. It now covers test work across the Native, Mach, React Native, Meituan Mini Program, and H5 technology stacks, and supports UI automation testing, tracking event automation testing, dynamic-loading success rate automation testing, and accessibility adaptation rate automation testing.

Going forward, we will explore the two directions of "intelligence" and "precision", covering more test scenarios and further improving testers' efficiency.

