Date of Award

Spring 2022

Document Type

Thesis (Undergraduate)

Department

Computer Science

First Advisor

Sarah Masud Preum

Abstract

This work explores entity-based sentiment analysis of textual health advice using deep learning. We fine-tuned a pretrained BERT model to classify sentiment as positive, negative, or neutral toward entities in five predetermined categories: food, medicine, disease, exercise, and vitality. The experiments used an original annotated medical dataset from Dartmouth College’s Persist Lab. To tailor the data to entity-based sentiment analysis, we explored data transformation techniques for generating optimal training examples. During the experiments, we found that the wide variety and complexity of terms in the medicine and disease categories posed difficulty for the BERT model, which we mitigated with masking techniques. We also showed that our model struggles to learn neutral sentiment in cases where instances of all labels are balanced (in both training and testing). Our system achieved an overall F1 score of 0.739 on masked text with a balanced dataset.
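
As a rough illustration of the masking idea mentioned in the abstract, the sketch below replaces annotated entity mentions with a category placeholder before the text would be passed to a model. The abstract does not specify how masking was performed, so everything here is an assumption made for illustration: the function name `mask_entities`, the placeholder tokens, and the `(mention, category)` annotation format are all hypothetical.

```python
import re

# Hypothetical placeholder tokens, one per category named in the abstract.
CATEGORY_TOKENS = {
    "food": "[FOOD]",
    "medicine": "[MEDICINE]",
    "disease": "[DISEASE]",
    "exercise": "[EXERCISE]",
    "vitality": "[VITALITY]",
}


def mask_entities(text, entities):
    """Replace each annotated entity mention with its category placeholder.

    `entities` is a list of (mention, category) pairs. This format is an
    assumption, not the dataset's actual annotation schema.
    """
    # Map lowercased mentions to placeholders for case-insensitive lookup.
    lookup = {mention.lower(): CATEGORY_TOKENS[category]
              for mention, category in entities}
    if not lookup:
        return text
    # Try longer mentions first so a multi-word mention is not split by a
    # shorter mention contained in it.
    alternatives = sorted(lookup, key=len, reverse=True)
    pattern = re.compile(
        r"\b(?:" + "|".join(re.escape(m) for m in alternatives) + r")\b",
        flags=re.IGNORECASE,
    )
    # A single pass over the text, so inserted placeholders are never re-masked.
    return pattern.sub(lambda m: lookup[m.group(0).lower()], text)


masked = mask_entities(
    "Taking ibuprofen can worsen asthma.",
    [("ibuprofen", "medicine"), ("asthma", "disease")],
)
print(masked)
```

Replacing rare or highly varied drug and disease names with one shared token per category is one plausible way to reduce vocabulary variety for those categories; the thesis itself may have used a different masking scheme.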
