Fine-Tuning a TensorFlow BERT Model for Sentiment Analysis
Bidirectional Encoder Representations from Transformers (BERT) is one of the major advances in Natural Language Processing (NLP) in recent years. BERT achieves strong performance on many NLP tasks, such as text classification, text summarisation and question answering.
In this article, I will walk through how to fine-tune a BERT model on your own dataset for text classification (sentiment analysis in my case). When browsing the web for guides, I came across mostly PyTorch implementations or tutorials that fine-tune on pre-existing datasets such as GLUE. Therefore, I would like to provide a guide to the TensorFlow implementation using my own customised dataset.
The Hugging Face transformers library provides convenient pretrained transformer models, including BERT. We will be using TFBertForSequenceClassification, the TensorFlow implementation of BERT with a sequence-classification head for fine-tuning. The underlying BERT model is pretrained on English Wikipedia and the BookCorpus. The following code installs the library and loads the pretrained model.
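A minimal sketch of that step, assuming the transformers package and the bert-base-uncased checkpoint (the exact checkpoint used in the original code may differ):

```python
# Install the Hugging Face transformers library (TensorFlow 2.x is assumed to be available):
# pip install transformers

import tensorflow as tf
from transformers import BertTokenizer, TFBertForSequenceClassification

# Load the pretrained tokenizer and the sequence-classification model (2 labels: negative/positive).
tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
model = TFBertForSequenceClassification.from_pretrained('bert-base-uncased', num_labels=2)
```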
After loading the pretrained model, it is time to load our dataset. In my project, my dataset consists of two columns — sentence and polarity. A polarity of 0 means negative sentiment for the corresponding sentence, while a polarity of 1 means positive.
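A short sketch of this step, assuming the data lives in a CSV file called train.csv with the two columns described above (the filename and the train/validation split are illustrative assumptions, not details from the original code):

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical CSV with two columns: 'sentence' (text) and 'polarity' (0 = negative, 1 = positive).
df = pd.read_csv('train.csv')

# Hold out a small validation set to monitor fine-tuning.
train_df, val_df = train_test_split(df, test_size=0.1, random_state=42)
```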
Next, we need to format the data so that it is recognised by the TFBertForSequenceClassification model. The pretrained BERT model takes three input features: input ids, token type ids and attention masks. Input ids are the numeric ids assigned to each token from the BERT vocabulary. Because the BERT tokenizer pads every sentence with zeros so that all sequences are the same length, attention masks are needed to distinguish actual tokens from padding. Token type ids indicate which sentence a token belongs to, which matters for tasks that take a pair of sentences as input. The BERT tokenizer provides a function, encode_plus, which converts your raw sentences into these three input features. The following code organises the dataset into tensors so that it is compatible with the TensorFlow implementation of BERT.
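One way to build those features with encode_plus, using the tokenizer loaded earlier, and wrap them in a tf.data.Dataset is sketched below; the 128-token maximum length and the batch size of 16 are assumptions rather than values from the original code:

```python
def encode_examples(sentences, labels, max_length=128):
    # Convert raw sentences into the three BERT input features.
    input_ids, token_type_ids, attention_masks = [], [], []
    for sentence in sentences:
        # encode_plus adds [CLS]/[SEP], truncates/pads to max_length and
        # returns input ids, token type ids and the attention mask.
        encoded = tokenizer.encode_plus(
            sentence,
            add_special_tokens=True,
            max_length=max_length,
            padding='max_length',
            truncation=True,
            return_token_type_ids=True,
            return_attention_mask=True,
        )
        input_ids.append(encoded['input_ids'])
        token_type_ids.append(encoded['token_type_ids'])
        attention_masks.append(encoded['attention_mask'])
    return tf.data.Dataset.from_tensor_slices((
        {
            'input_ids': input_ids,
            'token_type_ids': token_type_ids,
            'attention_mask': attention_masks,
        },
        labels,
    ))

# Build batched training and validation datasets from the dataframes loaded earlier.
train_dataset = encode_examples(train_df['sentence'].tolist(),
                                train_df['polarity'].tolist()).shuffle(1000).batch(16)
val_dataset = encode_examples(val_df['sentence'].tolist(),
                              val_df['polarity'].tolist()).batch(16)
```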
With the dataset and the pretrained BERT model in place, we can fine-tune the model for our purposes. The TensorFlow implementation lets you train the model with the standard Keras workflow, as shown in the code below.
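In Keras terms, that amounts to compiling and fitting the model; the learning rate and number of epochs below are typical values for BERT fine-tuning, not necessarily those used in the original code:

```python
# TFBertForSequenceClassification outputs raw logits, so the loss needs from_logits=True.
optimizer = tf.keras.optimizers.Adam(learning_rate=2e-5)
loss = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)

model.compile(optimizer=optimizer, loss=loss, metrics=['accuracy'])
model.fit(train_dataset, validation_data=val_dataset, epochs=2)
```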
And now you’re done! If you would like to make sentiment predictions on your test dataset, simply follow the code below, where pred_sentences is a list containing your test sentences.
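A sketch of that prediction step, reusing the tokenizer and fine-tuned model from above (the example sentences and the max_length value are assumptions for illustration):

```python
# Example test sentences; replace with your own list.
pred_sentences = ['The movie was fantastic!', 'The plot made no sense at all.']

# Tokenize the whole list at once; padding makes all sequences the same length.
pred_encodings = tokenizer(pred_sentences, padding=True, truncation=True,
                           max_length=128, return_tensors='tf')

# The model returns logits; softmax converts them into class probabilities.
outputs = model(dict(pred_encodings))
probs = tf.nn.softmax(outputs.logits, axis=-1)
labels = tf.argmax(probs, axis=-1).numpy()  # 0 = negative, 1 = positive

for sentence, label in zip(pred_sentences, labels):
    print(sentence, '->', 'positive' if label == 1 else 'negative')
```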
Of course, there are models other than BERT, for example XLNet, RoBERTa and GPT. However, different models take different input data formats, so you might need to spend some time converting your raw dataset into their accepted format. One thing to note is that if you only need to do sentiment analysis on fairly general sentences, you can often achieve good results without fine-tuning at all. Fine-tuning is usually only needed when you are doing sentiment analysis on a very specific subject matter, for example the outlook for Bitcoin prices: rising prices can be positive news for some readers and negative for others, so it is worth fine-tuning your sentiment classifier on domain-specific data.
Happy exploring!