This article is from members of the Education-Intelligent Learning-Front End team and has been published with the authorization of ELab.
Since its inception, the Intelligent Learning Front-end Team has focused on breaking the public's stereotypes about education, challenging entrenched ways of teaching, and removing barriers to learning. Its goal is to develop the most suitable learning plan for each student and teach them according to their aptitude, so that quality education is within everyone's reach.
Learning NLP is like leveling up in a game: each stage is a hurdle, and to leave the novice village you have to clear them one by one.
Having previously shared how front-end engineers can quickly use an NLP model, this article takes a small step further.
This article takes roughly 30 minutes to read and mainly covers the following knowledge points:
The development of NLP falls into two distinct stages, with the BERT model as the dividing point: the first half is the basic neural network stage (before BERT), and the second half is the BERTology stage (after BERT).
Refer to https://zhuanlan.zhihu.com/p/148007742
1950–1970 – rule-based approaches
1970 – early 2000s – statistics-based approaches
2008–2018 – deep learning with RNN, LSTM, and GRU
Today – the era of pre-trained models
NLP is broadly divided into two directions.
https://zhuanlan.zhihu.com/p/56802149
The image below shows the NLP task categories available on Hugging Face.
Neural networks are a big topic; to keep things from getting redundant and cumbersome, this article emphasizes just two points so you can build a general understanding.
The individual neuron is the basic unit of a neural network, much like a neuron in the biological world (dendrites receive the input; the axon transmits the output signal).
The mathematical representation is: Output = f( ∑ᵢ₌₁ⁿ xᵢ·wᵢ + θ )
As you can see, a neuron accepts multiple inputs (x1, x2, …, xn), each with a corresponding weight (w1, w2, …, wn); the weighted sum plus a bias value θ is passed through an activation function f to produce the output.
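To make the formula concrete, here is a minimal sketch of a single neuron in Python with PyTorch (the input values, weights, and the choice of sigmoid as the activation are illustrative assumptions, not taken from the article):

```python
import torch

def neuron(x: torch.Tensor, w: torch.Tensor, theta: float) -> torch.Tensor:
    """One neuron: weighted sum of the inputs plus a bias, passed through an activation f."""
    z = torch.sum(x * w) + theta   # ∑ x·w + θ
    return torch.sigmoid(z)        # activation function f (sigmoid chosen for illustration)

# Example: three inputs with three weights and one bias (made-up numbers)
x = torch.tensor([1.0, 2.0, 3.0])
w = torch.tensor([0.5, -0.2, 0.1])
print(neuron(x, w, theta=0.3))     # a single output value between 0 and 1
```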
Activation function: introduces non-linearity, making up for the limited expressive power of a purely linear model so that more cases can be fitted.
The values of w and θ are learned during training; training a neural network is the process of adjusting the weights of every neuron until the overall prediction is as good as possible.
Loss function: Calculates the error between the output value and the target value
Backpropagation: propagates the error back to the weights so they can be adjusted appropriately, ultimately minimizing the error between the forward-pass output and the label.
Learning rate: the step size used during backpropagation; it controls how much the weights are adjusted each time and strikes a balance between precision and speed.
Optimizer: finding the right weights usually takes many iterations, which is time-consuming, so we use a set of strategies (an optimizer) to adjust the parameters faster and better.
But when writing code we don't need to hand-write a loss function or an optimizer; PyTorch wraps them up as APIs. In most scenarios you don't even need to write or train a neural network yourself, because we can use a pre-trained model directly, right out of the box.
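As a hedged illustration of those building blocks (loss function, backpropagation, learning rate, optimizer) as PyTorch exposes them, here is a minimal training loop on made-up data; the network shape, data, and hyperparameters are placeholders:

```python
import torch
from torch import nn

# Made-up data: 100 samples, 4 features, binary labels
x = torch.randn(100, 4)
y = torch.randint(0, 2, (100,))

model = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 2))

loss_fn = nn.CrossEntropyLoss()                           # loss function: error between output and target
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)   # optimizer, with the learning rate as step size

for epoch in range(10):
    optimizer.zero_grad()
    logits = model(x)            # forward propagation
    loss = loss_fn(logits, y)    # compute the error
    loss.backward()              # backpropagation: pass the error back to the weights
    optimizer.step()             # adjust the weights by a learning-rate-sized step
    print(epoch, loss.item())
```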
The BERT model mentioned earlier is one such pre-trained model; below is a brief introduction to pre-trained models.
For details, you can revisit the earlier article on how front-end engineers can quickly use an NLP model.
A pre-trained model is a model that a third party (usually a large institution) has already trained on its datasets; in most cases we can use it out of the box.
The training cost of some pre-trained models
Many open-source pre-trained models are committed to GitHub or published to Hugging Face[1].
There is a similar platform in China, Baidu PaddlePaddle[2], but Hugging Face remains the most widely used.
Hugging Face is mainly used in two ways:
Method 1: use the pipeline encapsulated by Hugging Face; a single call does everything
Method 2: use the atomized APIs (model, tokenizer, etc.) provided by huggingface/transformers
Using the atomized APIs takes three steps (both methods are sketched below):
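Here is a minimal sketch of both methods with the transformers library (the sentiment-analysis task and the checkpoint name are assumptions for illustration; the article's own screenshots may use a different task and model):

```python
import torch
from transformers import pipeline, AutoTokenizer, AutoModelForSequenceClassification

# Method 1: the encapsulated pipeline -- one call does everything
classifier = pipeline("sentiment-analysis")
print(classifier("I love this course"))

# Method 2: the atomized APIs (tokenizer + model), in three steps
checkpoint = "distilbert-base-uncased-finetuned-sst-2-english"   # assumed checkpoint
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForSequenceClassification.from_pretrained(checkpoint)

# Step 1: tokenize the raw text into model inputs
inputs = tokenizer("I love this course", return_tensors="pt")

# Step 2: run the model to get the raw logits
with torch.no_grad():
    logits = model(**inputs).logits

# Step 3: post-process the logits into a human-readable label
probs = torch.softmax(logits, dim=-1)
label_id = probs.argmax(dim=-1).item()
print(model.config.id2label[label_id], probs[0, label_id].item())
```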
Advantages:
Problems:
A pre-trained model is like a hexagonal (all-around) warrior: it has learned features from massive amounts of data, so every general capability metric looks good, but in a specific scenario it cannot focus on the particular characteristics of a specific business, so it is not as precise as a sharp knife.
So how do you fix it?
The answer is fine-tuning: let the pre-trained model learn the characteristics of the dataset in a specific business scenario and work better in a particular domain.
Fine-tune a Chinese cloze (fill-in-the-blank) task based on the BERT model
This task was chosen because there are relatively few articles on fine-tuning Chinese models, and articles on fine-tuning a cloze task are almost nowhere to be found:
BERT is trained by predicting masked tokens, and it achieves excellent results in sentence-level semantic analysis.
Masking tokens: some words in a sentence are masked first, and the model is then asked to predict the masked words.
Mask example
Original sentence: I love China
After masking: I love [MASK] country
During training, BERT selects 15% of the tokens in the text for the mask operation, and there are special rules for how those 15% are handled: 80% are replaced with [MASK], 10% are replaced with a random token, and the remaining 10% are left unchanged.
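A hedged sketch of what the pre-fine-tuning inference might look like with the fill-mask pipeline; the prompt below is a hypothetical stand-in for the sentence shown in the article's screenshot:

```python
from transformers import pipeline

# Vanilla Chinese BERT, before any fine-tuning
fill_mask = pipeline("fill-mask", model="bert-base-chinese")

# Hypothetical prompt about a Three Kingdoms figure named "Zhuge ..."
for candidate in fill_mask("三国时期的人物诸葛[MASK]。"):
    print(candidate["token_str"], candidate["score"])
# The unmodified model tends to fill in characters from well-known personal names here
```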
As you can see, the model's inference results are not bad, and it can infer common personal names.
But our goal is to "change history" and have the model predict the "Three Kingdoms character Zhuge Tao", pulling off a bit of time travel. So how do we do it?
Run it online at:
https://colab.research.google.com/drive/12SCpFa4gtgufiJ4JepLMuItjkWb6yfck?usp=sharing
train.json
Load the corpus code
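The original loading code is shown as a screenshot; below is a minimal sketch of what it might look like, assuming train.json holds one record per sentence under a "text" field (the file layout is an assumption):

```python
from datasets import load_dataset

# Assumed format of train.json: records like {"text": "..."} containing
# sentences that mention the fictional character Zhuge Tao
dataset = load_dataset("json", data_files="train.json")
print(dataset["train"][0])
```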
Define training and test sets
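A sketch of the split, assuming the corpus loaded above; the 90/10 ratio is a placeholder since the article's exact value is not shown here:

```python
from datasets import load_dataset

dataset = load_dataset("json", data_files="train.json")        # loaded as in the sketch above
splits = dataset["train"].train_test_split(test_size=0.1)      # hold out 10% for evaluation
train_ds, test_ds = splits["train"], splits["test"]
print(len(train_ds), len(test_ds))
```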
Train the code
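The actual training code lives in the Colab notebook linked above; here is a sketch of a typical masked-language-model fine-tune with the Trainer API (the checkpoint, hyperparameters, and output path are placeholders):

```python
from datasets import load_dataset
from transformers import (AutoTokenizer, AutoModelForMaskedLM,
                          DataCollatorForLanguageModeling,
                          Trainer, TrainingArguments)

checkpoint = "bert-base-chinese"
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForMaskedLM.from_pretrained(checkpoint)

# Corpus loading and split, as in the sketches above
dataset = load_dataset("json", data_files="train.json")
splits = dataset["train"].train_test_split(test_size=0.1)

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=64)

train_tok = splits["train"].map(tokenize, batched=True, remove_columns=splits["train"].column_names)
test_tok = splits["test"].map(tokenize, batched=True, remove_columns=splits["test"].column_names)

# The collator randomly masks 15% of the tokens, reproducing BERT's MLM objective
collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm_probability=0.15)

args = TrainingArguments(
    output_dir="bert-finetuned-zhuge",    # placeholder output path
    num_train_epochs=3,                   # placeholder hyperparameters
    per_device_train_batch_size=8,
    learning_rate=5e-5,
)

trainer = Trainer(model=model, args=args, data_collator=collator,
                  train_dataset=train_tok, eval_dataset=test_tok)
trainer.train()

# Save the fine-tuned weights and tokenizer for later inference
trainer.save_model("bert-finetuned-zhuge")
tokenizer.save_pretrained("bert-finetuned-zhuge")
```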
Training logs
Training is over
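To check the result, one can re-run the same fill-mask inference against the fine-tuned weights; a hedged sketch (the directory name matches the placeholder used above, and the prompt is again hypothetical):

```python
from transformers import pipeline

# Load the fine-tuned weights saved by the Trainer above
fill_mask = pipeline("fill-mask", model="bert-finetuned-zhuge")

for candidate in fill_mask("三国时期的人物诸葛[MASK]。"):
    print(candidate["token_str"], candidate["score"])
# After fine-tuning on the custom corpus, the character for "Tao" should now rank highly
```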
"Zhuge Tao" now appears successfully in the predictions.
With this complete, the first exploration of level 3 can be considered a success!
Have both flags been achieved?
Pre-trained models are a boon for ordinary users: by fine-tuning a pre-trained model, anyone can collect and build their own corpus and create a unique NLP model of their own. Perhaps everyone can be a staff engineer 🤫
Hugging Face course
“Natural Language Processing Based on Bert Model”
huggingface: https://huggingface.co/
Baidu paddle: https://www.paddlepaddle.org.cn/hublist
That’s all for this sharing and hope it helps you ^_^
If you liked it, don't forget to share, like, and bookmark.
Follow the ELab team official account to receive quality articles from major tech companies.
ByteDance School/Social Recruit Internal Push Code: C4QC2V7
Application link: https://job.toutiao.com/s/2jML178