Date of Award

6-1-2022

Document Type

Thesis (Undergraduate)

Department

Computer Science

First Advisor

Prof. Sarah Preum

Abstract

The ability of patients to understand health-related text is important for optimal health outcomes. A system that can automatically annotate medical entities could help patients better understand health-related text. Such a system would also accelerate manual data annotation for this low-resource domain as well as assist in downstream medical NLP tasks such as finding textual similarity, identifying conflicting medical advice, and aspect-based sentiment analysis. In this work, we investigate a state-of-the-art entity set expansion model, BootstrapNet, for the task of medical entity classification on a new dataset of medical advice text. We also propose EP SBERT, a simple model that utilizes Sentence-BERT embeddings of entities and context patterns to more effectively capture the semantics of the entities. Our experiments show that EP SBERT significantly outperforms a random classifier baseline, outperforms the more complex BootstrapNet by 5.2 F1 points, and achieves a 5-fold cross-validated weighted F1 score of 0.835. Further experiments show that EP SBERT achieves a weighted F1 score of 0.870 when we remove a peripheral class whose inclusion is nonessential to the problem formulation, and a weighted F1 score of 0.949 when using top-2 evaluation. This makes us confident that EP SBERT can be useful when building human-in-the-loop data annotation tools. Finally, we perform an extensive error analysis of EP SBERT, identifying two core challenges and future work. Our code will be made available at https://github.com/garrettjohnston99/EP-SBERT.
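The core idea the abstract describes, embedding an entity together with its context pattern and classifying by embedding similarity, can be sketched as follows. This is an illustrative sketch only, not the thesis implementation: `embed` is a toy deterministic stand-in for a real Sentence-BERT encoder (a real system would call `SentenceTransformer(...).encode(text)` from the `sentence-transformers` library), and the class labels, training pairs, and `[SEP]` joining convention are hypothetical.

```python
from zlib import crc32

import numpy as np


def embed(text: str, dim: int = 64) -> np.ndarray:
    """Toy stand-in for a Sentence-BERT encoder: seeds a random unit
    vector from a stable hash of the text (identical text -> identical
    vector), so the pipeline shape can be demonstrated offline."""
    rng = np.random.default_rng(crc32(text.encode()))
    v = rng.standard_normal(dim)
    return v / np.linalg.norm(v)


def centroid_classifier(train: dict[str, list[tuple[str, str]]]):
    """Build a nearest-centroid classifier over (entity, context pattern)
    pairs. `train` maps a class label to example pairs; each pair is
    embedded jointly and averaged into a per-class centroid."""
    centroids = {
        label: np.mean([embed(f"{e} [SEP] {p}") for e, p in pairs], axis=0)
        for label, pairs in train.items()
    }

    def predict(entity: str, pattern: str) -> str:
        # Classify by highest dot-product similarity to a class centroid.
        q = embed(f"{entity} [SEP] {pattern}")
        return max(centroids, key=lambda label: float(q @ centroids[label]))

    return predict
```

With a real Sentence-BERT encoder in place of `embed`, semantically similar entity/pattern pairs land near the same centroid, which is the behavior the abstract attributes to EP SBERT; the nearest-centroid step here is one simple choice of classifier head, not necessarily the one used in the thesis.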
