Date of Award

Spring 6-1-2021

Document Type

Thesis (Undergraduate)


Computer Science

First Advisor

Soroush Vosoughi


The growing popularity of social media as a platform to obtain information and share one's opinions on various topics makes it a rich source of information for research. In this study, we aimed to develop a framework to infer relationships between the demographic and psychographic characteristics of a user and their opinion on a specific narrative, in this case their stance on taking the COVID-19 vaccine. Twitter was the chosen platform due to its large U.S. user base and easily available data. Demographic traits included Race, Age, Gender, and Human-vs-Organization Status. Psychographic traits included the Big Five personality traits (Conscientiousness, Neuroticism, Openness, Agreeableness, Extraversion), Risk Seeking, Risk Aversion, Inward Focus, and Outward Focus. Our pipeline involved preprocessing the data, labelling tweets as vaccine-hesitant using distant supervision, training a vaccine hesitancy classifier to classify a second dataset, obtaining demographic and psychographic inferences for each user, and finally running a logistic regression with vaccine hesitancy as the dependent variable and sets of demographic and psychographic characteristics as the independent variables. We achieved an F1 score of 0.947 for our classifier and found statistically significant trends in vaccine hesitancy for race, age, gender, and human-vs-organization status. On the other hand, there were no significant relationships between any of the psychographic traits and vaccine hesitancy. It should be noted that this study was not pre-registered and the values for all variables (dependent and independent) come from noisy classifiers. As such, these results should only be viewed as a preliminary analysis of the demographic and psychographic factors correlated with vaccine hesitancy. We conclude that such a framework is a useful tool to identify the relations between different demographics and popular narratives.
Further work and better data are necessary to improve the framework to the point where the strength of the correlations, and not just the overall relationships, can be assessed. Furthermore, while psychographic traits yielded no significant results, there were several limitations in their inference, and improving psychographic trait inference is an important avenue for future studies.
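
The final step of the pipeline described above, a logistic regression with vaccine hesitancy as the dependent variable, can be illustrated with a minimal sketch. This is not the thesis's implementation: the data below are synthetic, the feature names `age_over_50` and `is_organization` and the simulated effect sizes are hypothetical, and the fit uses a plain Newton-Raphson routine with Wald tests rather than whatever statistical package the study used.

```python
import math
import numpy as np


def fit_logistic(features, labels, n_iter=25, tol=1e-8):
    """Fit a logistic regression by Newton-Raphson.

    Returns (coefficients, standard_errors); index 0 is the intercept.
    """
    X = np.column_stack([np.ones(len(features)), features])
    y = np.asarray(labels, dtype=float)
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(X @ beta)))
        weights = p * (1.0 - p)
        information = X.T @ (X * weights[:, None])  # negative Hessian of the log-likelihood
        step = np.linalg.solve(information, X.T @ (y - p))
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    # Standard errors come from the inverse information matrix at the final estimate.
    p = 1.0 / (1.0 + np.exp(-(X @ beta)))
    weights = p * (1.0 - p)
    covariance = np.linalg.inv(X.T @ (X * weights[:, None]))
    return beta, np.sqrt(np.diag(covariance))


def wald_p_values(beta, se):
    """Two-sided p-values for H0: coefficient = 0, using the normal approximation."""
    z = beta / se
    return np.array([math.erfc(abs(v) / math.sqrt(2.0)) for v in z])


# Synthetic stand-in for per-user inferences (hypothetical features and effects).
rng = np.random.default_rng(0)
n_users = 2000
age_over_50 = rng.integers(0, 2, n_users)
is_organization = rng.integers(0, 2, n_users)
true_logit = -0.5 + 0.8 * age_over_50 - 1.0 * is_organization
hesitant = (rng.random(n_users) < 1.0 / (1.0 + np.exp(-true_logit))).astype(float)

features = np.column_stack([age_over_50, is_organization])
beta, se = fit_logistic(features, hesitant)
p_values = wald_p_values(beta, se)

for name, b, p in zip(["intercept", "age_over_50", "is_organization"], beta, p_values):
    print(f"{name:16s} coef={b:+.3f}  p={p:.3g}")
```

In a setup like this, a small p-value indicates only that a trait is associated with the hesitancy label; when both the labels and the traits come from noisy classifiers, as the abstract cautions, the coefficient magnitudes should be read with care.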