Date of Award

Spring 6-1-2021

Document Type

Thesis (Undergraduate)

Department or Program

Department of Computer Science

First Advisor

Soroush Vosoughi

Abstract

Tuning the complexity of one's writing is essential to presenting ideas in a logical, intuitive manner to audiences. This paper describes a system submitted by team BigGreen to LCP 2021 for predicting the lexical complexity of English words in a given context. We assemble a feature engineering-based model and a deep neural network model with an underlying Transformer architecture based on BERT. While BERT itself performs competitively, our feature engineering-based model helps in extreme cases, e.g., separating instances of easy and neutral difficulty. Our handcrafted features comprise a breadth of lexical, semantic, syntactic, and novel phonetic measures. Visualizations of BERT attention maps offer insight into potential features that Transformer models may implicitly learn when fine-tuned for the purposes of lexical complexity prediction. Our assembly technique performs reasonably well at predicting the complexities of single words, and we demonstrate how such techniques can be harnessed to perform well on multi-word expressions (MWEs) too.

Original Citation

Aadil Islam, Weicheng Ma, and Soroush Vosoughi. (2021). BigGreen at SemEval-2021 Task 1: Lexical Complexity Prediction with Assembly Models. In Proceedings of the 15th International Workshop on Semantic Evaluation (SemEval-2021).

Recommended Citation

Islam, Aadil, "Lexical Complexity Prediction with Assembly Models" (2021). Dartmouth College Undergraduate Theses. 222.
https://digitalcommons.dartmouth.edu/senior_theses/222

Included in

Artificial Intelligence and Robotics Commons, Data Science Commons

Dartmouth College Undergraduate Theses

Lexical Complexity Prediction with Assembly Models
