This article is from members of the Education-Intelligent Learning-Front End team and has been published with the authorization of ELab.

Since its inception, the Intelligent Learning front-end team has focused on breaking the public's stereotypes about education, moving beyond entrenched teaching mindsets, and removing barriers in teaching. It aims to develop the most suitable learning plan for each student and to teach according to their aptitude, so that quality education is within everyone's reach.

Learning NLP is like fighting monsters to level up: each stage is a hurdle, and to leave the novice village you have to clear them one by one.

Having previously shared how front-end engineers can quickly use an NLP model, this article takes one small step further.

This article should take about 30 minutes to read, and it covers the following key points:

The development of NLP can be split into two distinct stages, with the BERT model as the dividing point: the first half is the basic neural-network stage (before BERT), and the second half is the BERTology stage (after BERT).

Refer to https://zhuanlan.zhihu.com/p/148007742

1950–1970 – rule-based approaches

1970–early 21st century – statistics-based approaches

2008–2018 – deep learning, with RNN, LSTM, and GRU

Nowadays – the pre-trained model (BERTology) stage

The field is divided into two directions:

https://zhuanlan.zhihu.com/p/56802149

The image below shows the NLP task categories available on Hugging Face.

Neural networks are a big topic; to avoid being long-winded, this article highlights just two points so you can form a general picture.

Individual neurons are the building blocks of neural networks, just like neurons in the biological world (dendrites receive the input signals; axons transmit the output).

The mathematical representation is: Output = f(∑ᵢ₌₁ⁿ xᵢwᵢ + θ)

A neuron can accept multiple inputs (x1, x2, …, xn), each of which has a corresponding weight (w1, w2, …, wn). The weighted sum is computed, a bias value θ is added, and the result is passed through an activation function f to produce the output.

Activation function: introduces non-linearity to make up for the limited expressive power of a purely linear model, so the network can fit more cases.
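To make the formula concrete, here is a minimal sketch of a single neuron in PyTorch; the input values, weights, bias, and the choice of sigmoid as the activation function are all made up for illustration.

```python
import torch

# Made-up inputs, weights and bias, purely for illustration
x = torch.tensor([0.5, -1.2, 3.0])   # inputs x1..xn
w = torch.tensor([0.8, 0.1, -0.4])   # weights w1..wn
theta = torch.tensor(0.2)            # bias θ

# Output = f(Σ xi·wi + θ), here with sigmoid as the activation function f
output = torch.sigmoid(torch.dot(x, w) + theta)
print(output)  # a single value squashed into (0, 1)
```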

The values of w and θ are obtained through training: training a neural network means adjusting the weights of every neuron until the overall prediction is as good as possible.

Loss function: Calculates the error between the output value and the target value

Backpropagation: propagates the error back to the weights so they can be adjusted appropriately, ultimately minimizing the error between the forward-pass output and the label.

Learning rate: the step size of backpropagation; it controls how much the weights are adjusted, striking a balance between precision and speed.

Optimizer: finding the right weights normally takes many iterations, which is time-consuming, so a set of strategies (an optimizer) is used to adjust the parameters faster and better.

In practice, though, we don't need to hand-write a loss function or an optimizer; PyTorch already wraps them in APIs. In most scenarios you don't even need to write a neural network yourself or train one written by someone else, because you can use a pre-trained model directly, out of the box.
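As a rough illustration of how these pieces fit together, here is a minimal PyTorch sketch with made-up random data; the model, loss, optimizer, and hyperparameters are arbitrary choices, not the article's actual code.

```python
import torch
import torch.nn as nn

model = nn.Linear(4, 1)                                    # a single layer of neurons
loss_fn = nn.MSELoss()                                     # loss function: error between output and target
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)   # optimizer with its learning rate

x = torch.randn(16, 4)   # 16 fake samples with 4 features each
y = torch.randn(16, 1)   # fake target values

for epoch in range(10):
    pred = model(x)              # forward pass
    loss = loss_fn(pred, y)      # how far off are we?
    optimizer.zero_grad()
    loss.backward()              # backpropagation: push the error back to the weights
    optimizer.step()             # adjust the weights by one learning-rate-sized step
```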

The BERT model mentioned earlier is a pre-trained model; here is a brief introduction to pre-trained models.

For details, you can revisit the earlier article on how front-end engineers can quickly use an NLP model.

Pre-trained models are models that third parties (mainly research institutions) have already trained on large datasets; usually we can use them out of the box.

The training cost of some pre-trained models

Many open-source pre-trained models are committed to GitHub or published on Hugging Face[1].

There is also a similar platform in China, Baidu Paddle[2], but Hugging Face remains the most widely used.

Huggingface is mainly used in two ways:

Method 1: use the pipeline encapsulated by Hugging Face and call it with a single line of code (a minimal sketch follows after this list).

Method 2: use the atomized APIs (models, tokenizers, etc.) provided by Hugging Face transformers.
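As a sketch of Method 1, the following assumes the fill-mask task and the bert-base-chinese checkpoint, which match this article's topic but are not necessarily the exact calls used by the author.

```python
from transformers import pipeline

# One line gives a ready-to-use model; the task and checkpoint are illustrative choices
fill_mask = pipeline("fill-mask", model="bert-base-chinese")

print(fill_mask("我爱[MASK]国"))  # prints the top candidates for the masked character
```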

Using the atomized APIs involves three steps:
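The three steps were shown as screenshots in the original article; the sketch below shows what they typically look like with the transformers atomized APIs (the checkpoint name is again an illustrative choice): load the tokenizer and model, tokenize the input, then run the model and post-process the output.

```python
import torch
from transformers import AutoTokenizer, AutoModelForMaskedLM

# Step 1: load the tokenizer and the pre-trained model
tokenizer = AutoTokenizer.from_pretrained("bert-base-chinese")
model = AutoModelForMaskedLM.from_pretrained("bert-base-chinese")

# Step 2: tokenize the input text
inputs = tokenizer("我爱[MASK]国", return_tensors="pt")

# Step 3: run the model and post-process the output
with torch.no_grad():
    logits = model(**inputs).logits
mask_pos = (inputs["input_ids"] == tokenizer.mask_token_id).nonzero()[0, 1]
predicted_id = logits[0, mask_pos].argmax(-1)
print(tokenizer.decode(predicted_id))
```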

Merits:

Problems:

A pre-trained model is like an all-round "hexagonal warrior": it has learned features from massive amounts of data, so all of its ability metrics look good, but in a specific scenario it has not focused on the particular characteristics of a particular business, so it is not as precise as a sharp knife.

So how do you fix it?

The answer is fine-tuning: let the pre-trained model learn the characteristics of the dataset in a specific business scenario and work better in a particular domain.

Based on the BERT model, fine-tune a Chinese cloze (fill-in-the-blank) task

This task was chosen because there are relatively few articles on fine-tuning Chinese models, and articles on fine-tuning a cloze task are almost nowhere to be found.

BERT is trained by predicting masked sub-words, which gives it excellent results in sentence-level semantic analysis.

Masking sub-words: first mask some of the words in a sentence, then have the model predict the masked words.

Mask example

Original sentence: I love China (我爱中国)

After masking: I love [MASK] country (我爱[MASK]国)

During training, BERT masks 15% of the words in the text, and there are special rules for how those 15% are handled: 80% are replaced with [MASK], 10% are replaced with a random word, and 10% are left unchanged.

As you can see, the model's inference results are not bad, and it can predict common personal names.

But our goal is to "change history": we want the model to predict the "Three Kingdoms character Zhuge Tao", achieving a bit of time travel. So how do we do it?

Runnable online notebook:

https://colab.research.google.com/drive/12SCpFa4gtgufiJ4JepLMuItjkWb6yfck?usp=sharing

train.json

Code to load the corpus
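The original train.json and its loading code appear only as screenshots; below is a hedged sketch under the assumption that the corpus is a JSON list of objects with a "text" field, each containing a sentence that mentions Zhuge Tao.

```python
import json

# Assumed format of train.json: [{"text": "...a sentence mentioning Zhuge Tao..."}, ...]
with open("train.json", encoding="utf-8") as f:
    corpus = json.load(f)

texts = [item["text"] for item in corpus]
print(f"{len(texts)} sentences loaded")
```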

Define training and test sets
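A sketch of one way the training and test sets could be defined, continuing from the `texts` list above; the Dataset class, tokenizer, split ratio, and maximum length are all illustrative choices rather than the author's actual code.

```python
import random
from torch.utils.data import Dataset
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-chinese")

class MaskedTextDataset(Dataset):
    """Tokenized sentences for masked-language-model fine-tuning."""
    def __init__(self, texts, tokenizer, max_len=64):
        self.encodings = tokenizer(texts, truncation=True, max_length=max_len,
                                   padding="max_length", return_tensors="pt")

    def __len__(self):
        return self.encodings["input_ids"].size(0)

    def __getitem__(self, idx):
        return {key: val[idx] for key, val in self.encodings.items()}

# A simple 90/10 split (the ratio is an arbitrary choice)
random.shuffle(texts)
split = int(len(texts) * 0.9)
train_dataset = MaskedTextDataset(texts[:split], tokenizer)
test_dataset = MaskedTextDataset(texts[split:], tokenizer)
```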

Training code
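A hedged sketch of what the fine-tuning step could look like, continuing from the datasets above and using the Trainer API with DataCollatorForLanguageModeling to apply BERT's 15% masking; the output directory and hyperparameters are illustrative, not the author's actual settings.

```python
from transformers import (AutoModelForMaskedLM, DataCollatorForLanguageModeling,
                          Trainer, TrainingArguments)

model = AutoModelForMaskedLM.from_pretrained("bert-base-chinese")

# Applies the masked-language-model masking strategy (15% of tokens) on the fly
collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm=True, mlm_probability=0.15)

args = TrainingArguments(
    output_dir="bert-zhuge-finetuned",   # illustrative output path
    num_train_epochs=10,                 # illustrative hyperparameters
    per_device_train_batch_size=8,
    learning_rate=5e-5,
)

trainer = Trainer(
    model=model,
    args=args,
    data_collator=collator,
    train_dataset=train_dataset,
    eval_dataset=test_dataset,
)
trainer.train()
trainer.save_model("bert-zhuge-finetuned")
```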

Training logs

Training is over

"Zhuge Tao" now appears successfully in the predictions
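A sketch of how the result can be checked, loading the fine-tuned weights saved in the sketch above and running fill-mask again; the prompt is an illustrative sentence, not the author's exact test case.

```python
from transformers import pipeline

# Load the fine-tuned weights saved earlier (path matches the sketch above)
fill_mask = pipeline("fill-mask", model="bert-zhuge-finetuned")

# "Three Kingdoms character Zhuge [MASK]" — the masked slot is the given name
print(fill_mask("三国人物诸葛[MASK]"))
```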

With that, the first exploration of level 3 can be considered a success!

Have both flags been achieved?

Pre-trained models are a boon for ordinary users: by fine-tuning them, anyone can collect and build their own corpus and create a unique NLP model of their own; perhaps everyone can be a staff engineer 🤫

Hugging Face course

“Natural Language Processing Based on Bert Model”

[1] huggingface: https://huggingface.co/

[2] Baidu Paddle: https://www.paddlepaddle.org.cn/hublist

That’s all for this sharing and hope it helps you ^_^

If you liked it, don't forget to like, share, and bookmark.

Welcome to follow the official account "ELab Team" to receive quality articles from major tech companies.


ByteDance School/Social Recruit Internal Push Code: C4QC2V7

Application link: https://job.toutiao.com/s/2jML178