Author / Mu Yan

Editor / Ah Han

This is the fourth article in the “Label Portrait Series”. Earlier articles covered the methodology for building a label portrait system, label system design and processing, and label processing and storage. This article introduces label scoring.

Label scoring is an important part of label governance. Scoring makes it possible to evaluate labels clearly and intuitively across several dimensions, understand how labels are actually used, and keep optimizing them to support business operations. It also helps the data team decide which labels deserve more computing and storage resources, so that cluster resources can be planned sensibly.


Why use label scoring?

After the label system has been designed and the labels have been processed, the labels finally go online, where business staff can use them and they can start delivering value.

Once the labels have been online for a while, new questions come up. The labels take up computing resources and storage space every day: of the hundreds of labels being computed, how many do business users actually use, and does the business benefit cover the cost of the data? How good is a label's quality after launch, and have any of its original rules stopped applying, so that it needs continuous optimization?

With these questions in mind, we need a way to evaluate how labels are used after they go live and to identify the value of each label. Following the example of movie ratings, Huabei ratings, and similar schemes, we decided to give each label a score and a rank, which is simple and clear.


Label scoring model

After consideration, we selected five dimensions as the parameters of the label scoring model:

Total label score = a * Label usage score + b * Label attention score + c * Label quality score + d * Label continuous optimization score + e * Label security score

Label usage, label attention, label quality, and continuous optimization are the core dimensions; label security can be included according to the actual situation. a, b, c, d, and e are weights that add up to 100%.
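As a rough illustration, the formula above can be sketched in Python. The function name, the dimension keys, and the weight values below are assumptions made for the example, not values taken from the product; each dimension score is assumed to be on a 0 to 100 scale.

```python
def total_label_score(scores, weights):
    """Combine per-dimension scores into one weighted total."""
    if set(scores) != set(weights):
        raise ValueError("scores and weights must cover the same dimensions")
    # The weights a..e must add up to 100% (allowing for float rounding).
    if abs(sum(weights.values()) - 1.0) > 1e-9:
        raise ValueError("weights must add up to 100%")
    return sum(weights[dim] * scores[dim] for dim in scores)


# Hypothetical weights a, b, c, d, e and hypothetical dimension scores.
weights = {"usage": 0.30, "attention": 0.20, "quality": 0.20,
           "optimization": 0.20, "security": 0.10}
scores = {"usage": 80, "attention": 60, "quality": 90,
          "optimization": 50, "security": 100}

total = total_label_score(scores, weights)
print(round(total, 2))
```

Keeping the weights in a plain mapping makes it easy to set the security weight to zero for enterprises that do not want to include that dimension.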


Label usage scoring

Label usage evaluates how much a label is used in analysis and by external systems.

In the Kangaroo Cloud label product, a label is used in several scenarios:


• Label reference: For example, atomic labels are referenced by derived labels, and derived labels are referenced by combined labels. This scenario yields the “number of label references” indicator.

• Label analysis: When a label is analyzed in portrait analysis functions such as label-based group selection, group portraits, group comparison, and significance analysis, the “number of label analyses” indicator is counted.

• Label calls: The number of times a label is queried by an external application through the data API yields the “number of label calls” indicator.

We first use the Sigmoid function to convert each of these three indicators into a score, and then combine the indicator scores into a label usage score.
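The article does not give the exact Sigmoid parameters or how the three indicator scores are combined, so the following is only a minimal sketch. The midpoint and steepness values, the 0 to 100 scale, and the plain average are all assumptions chosen for illustration.

```python
import math


def sigmoid_score(count, midpoint, steepness):
    """Map a non-negative count to a score in (0, 100) with a logistic curve.

    A count equal to `midpoint` scores 50; `steepness` controls how fast
    the score saturates as the count grows.
    """
    return 100.0 / (1.0 + math.exp(-steepness * (count - midpoint)))


# Hypothetical (midpoint, steepness) per indicator; tune these to real data.
PARAMS = {
    "references": (5, 0.5),
    "analyses": (20, 0.1),
    "calls": (1000, 0.005),
}


def usage_score(counts):
    """Average the Sigmoid scores of the three usage indicators (assumed)."""
    parts = [sigmoid_score(counts[name], mid, k)
             for name, (mid, k) in PARAMS.items()]
    return sum(parts) / len(parts)


print(usage_score({"references": 5, "analyses": 20, "calls": 1000}))
```

The Sigmoid is convenient here because raw counts are unbounded, while the curve squeezes them into a fixed range and dampens the effect of a few extremely heavily used labels.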


Label attention score

Label attention evaluates how often a label is searched, viewed, and favorited.

In the Kangaroo Cloud label product, label attention is related to the following scenarios:


• Label search: The “tag search” indicator counts the number of times users search for a label in the label market.

• Label viewing: The “Tag View” indicator counts the number of times a label is clicked to view its basic information, analysis pages, and so on.

• Label favorites: The “Number of Favorite Users” indicator counts the users who have added the label to their favorites.

These three indicators reflect how much attention a label receives. As before, we use the Sigmoid function to convert each indicator into a score, and then combine the indicator scores by weight into a label attention score.


Label quality score

Label quality evaluates how well users are actually covered by a label, which reflects how reasonable the labeling rules are.

Suppose we define a label and its values, and after computation only a small share of users receive a value. That means the rules are not reasonable. For example, we define an “activity” label with values such as “high activity”, “medium activity”, and “low activity”, but fewer than 70% of real users end up with a value; for a large share the value is null and no label is applied. This indicates that the label value rules we defined have gaps and need to be improved.

The system calculates the “label coverage” of each label and normalizes the coverage into a score.
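A minimal sketch of this calculation follows. It assumes that coverage means the share of users with a non-null label value and that the score is simply the coverage scaled to 0 to 100; the article does not specify the normalization, and the function names are hypothetical.

```python
def label_coverage(label_values):
    """Share of users whose label value is not null (None)."""
    if not label_values:
        return 0.0
    covered = sum(1 for value in label_values if value is not None)
    return covered / len(label_values)


def quality_score(coverage):
    """Scale a coverage ratio in [0, 1] to a 0-100 score (assumed mapping)."""
    clamped = min(max(coverage, 0.0), 1.0)
    return round(100.0 * clamped, 2)


# Hypothetical "activity" label values for five users; None means unlabeled.
activity = ["high activity", "low activity", None, "medium activity", None]
coverage = label_coverage(activity)
print(coverage, quality_score(coverage))
```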


Continuous optimization score

Continuous optimization evaluates whether a label has been optimized since it went online.

Throughout the customer life cycle, new users keep arriving and silent users keep churning. Company strategy adjustments, product releases, and similar events also affect customer behavior. We need the data to reflect these changes, so we must keep adjusting our labeling strategy as the business and the customers change, so that labels reflect the customer situation directly and quickly and can guide business operations.

We evaluate continuous optimization through the “Tag Optimizations” indicator, which is the number of times a label has been edited and republished after going live. Here too we use the Sigmoid function to convert the indicator into a score.


Security score

Label security does not reflect the popularity of a label, but it can also serve as a dimension of the label score, to be included according to the situation of the enterprise.

In the Kangaroo Cloud label product, the security-related policies for a label are:


• Label visibility: The range of users who can edit and view the label.

• Whether authorization is required to use the label: Whether other users must apply for approval before using the label after it is published.

• Whether the label has row-level permission control: The items above control column-level permissions on the label; this item reflects whether row-level permissions have also been set.

• Whether the label is desensitized: Whether the label values are masked.

Based on the security policy configuration of each label, we likewise assign it a score.
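The article does not say how the policy configuration is converted into a score. One simple possibility, shown here purely as an assumption, is to give each of the four policies an equal share of 100 points when it is enabled. The class and field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class SecurityPolicy:
    """Hypothetical security configuration of one label."""
    restricted_visibility: bool   # only a limited user range can view/edit
    requires_authorization: bool  # others must apply for approval to use it
    row_level_permissions: bool   # row-level permissions have been set
    desensitized: bool            # label values are masked


def security_score(policy):
    """Give each enabled policy an equal share of 100 points (assumed rule)."""
    flags = [
        policy.restricted_visibility,
        policy.requires_authorization,
        policy.row_level_permissions,
        policy.desensitized,
    ]
    return 100.0 * sum(flags) / len(flags)


print(security_score(SecurityPolicy(True, True, False, False)))
```

An enterprise that considers one policy more important than the others could replace the equal shares with per-policy weights.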

Finally, we combine the scores of the five dimensions above by weight, using the formula given earlier, to obtain the total score.


Applications of label scoring

To let label administrators and business staff see popular labels, silent labels, and so on more intuitively, we present the label scores as leaderboards:


Top label leaderboard

A popularity score is calculated for each label from three dimensions: usage, attention, and continuous optimization. The TOP N most popular labels are displayed.


Silent label leaderboard

Ranking labels by popularity in reverse order gives the silent labels. These labels have very low usage, and you can consider taking them offline periodically to save cluster resources.
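The two leaderboards described so far can be sketched as a single sort read from both ends. The popularity score here is assumed to be an unweighted average of the three dimension scores, and the label names and scores are made up for the example.

```python
def popularity(scores):
    """Unweighted mean of usage, attention, and optimization (assumed)."""
    return (scores["usage"] + scores["attention"] + scores["optimization"]) / 3


def leaderboards(labels, n):
    """Return (top N popular labels, top N silent labels)."""
    ranked = sorted(labels, key=lambda name: popularity(labels[name]),
                    reverse=True)
    top = ranked[:n]
    silent = ranked[::-1][:n]  # same ranking, read from the bottom
    return top, silent


# Hypothetical labels with hypothetical dimension scores.
labels = {
    "activity": {"usage": 90, "attention": 80, "optimization": 70},
    "gender": {"usage": 60, "attention": 50, "optimization": 40},
    "city_tier": {"usage": 30, "attention": 20, "optimization": 10},
    "legacy_segment": {"usage": 5, "attention": 10, "optimization": 0},
}

top, silent = leaderboards(labels, 2)
print(top, silent)
```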


Comprehensive leaderboard

The comprehensive leaderboard is sorted by each label's total score, which evaluates the label across usage, attention, continuous optimization, quality, and security.


Sub-leaderboards for label usage, attention, continuous optimization, quality, and security

Users can view the leaderboard for each individual dimension (usage, attention, continuous optimization, quality, or security) according to what they care about most. They can also view the specific indicators of each label. In the usage dimension, for example, they can see each label's current number of references, analyses, and calls, and analyze these indicators to suit different label analysis scenarios.

After the label scoring model is launched, the weights of the dimensions need to be adjusted to fit each organization's actual conditions. Once the model has been in use for a while and everyone accepts the evaluation logic, the static score display can be turned into dynamic alarms, automated governance, and so on: set label quality alarms and score alarms that automatically notify label administrators and owners.
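As a sketch of what such alarms might look like, the check below flags labels whose quality score or total score falls under a threshold. The threshold values, the function name, and the sample data are assumptions; how the notification is actually delivered is left out.

```python
def find_alerts(label_scores, quality_threshold=60.0, total_threshold=40.0):
    """Return (label, kind, score) for every score under its threshold.

    Both thresholds are hypothetical defaults, not product settings.
    """
    alerts = []
    for name, scores in label_scores.items():
        if scores["quality"] < quality_threshold:
            alerts.append((name, "quality", scores["quality"]))
        if scores["total"] < total_threshold:
            alerts.append((name, "total", scores["total"]))
    return alerts


# Hypothetical scores for two labels.
label_scores = {
    "activity": {"quality": 55.0, "total": 72.0},
    "legacy_segment": {"quality": 90.0, "total": 20.0},
}

alerts = find_alerts(label_scores)
for name, kind, score in alerts:
    print(f"notify owner of {name}: {kind} score {score} is below threshold")
```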

The above is the scoring logic applied in the product. We hope it is helpful, and we welcome different ideas for optimizing the scoring model to achieve better label governance.

