Date of Award

Spring 5-31-2023

Document Type

Thesis (Undergraduate)


Computer Science

First Advisor

Inas Khayal


Improving patient-centered care requires accurate documentation of care preferences, a crucial aspect that is often underrepresented in administrative data. Most studies examine care documentation in specific patient populations rather than in the broader, more appropriate population of 'seriously ill' patients. This paper addresses this gap by leveraging transformer-based machine learning models, which improve on traditional keyword-based search methods in identifying care preference documentation.

To capture a broad spectrum of seriously ill patients, we matched decedent patients to non-decedent counterparts using propensity score matching, accounting for important variables such as age, gender, primary diagnoses, and comorbidities. We trained and fine-tuned Bio_ClinicalBERT and ClinicalLongformer models on a large dataset of patient discharge summaries from last-visit admissions and admissions within 6 months of the last visit. By concentrating on key textual components within these summaries, we enhanced the signal-to-noise ratio, which in turn allowed the models to capture contextually nuanced mentions of care preference documentation missed by traditional keyword-based search methods.
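
As an illustration of the matching step, the following is a minimal sketch of greedy 1:1 nearest-neighbor matching on propensity scores, without replacement. It assumes the scores have already been estimated (for example, by a logistic regression on age, gender, and diagnoses). The function name `greedy_match`, the caliper of 0.05, and the patient IDs and scores are all hypothetical; the abstract does not state which matching algorithm or caliper was used.

```python
def greedy_match(treated, controls, caliper=0.05):
    """Greedy 1:1 nearest-neighbor matching without replacement.

    treated, controls: dicts mapping a patient ID to a precomputed
    propensity score. The caliper (maximum allowed score difference)
    is an assumed value, not one taken from the thesis.
    Returns a list of (treated_id, control_id) pairs.
    """
    available = dict(controls)  # controls not yet used in a pair
    pairs = []
    # Process treated patients in score order so the result is deterministic.
    for t_id, t_score in sorted(treated.items(), key=lambda kv: kv[1]):
        if not available:
            break
        # Closest remaining control by absolute score difference.
        c_id = min(available, key=lambda c: abs(available[c] - t_score))
        if abs(available[c_id] - t_score) <= caliper:
            pairs.append((t_id, c_id))
            del available[c_id]
    return pairs


# Invented example scores, for illustration only.
decedents = {"d1": 0.82, "d2": 0.40, "d3": 0.10}
non_decedents = {"n1": 0.80, "n2": 0.43, "n3": 0.55, "n4": 0.30}

pairs = greedy_match(decedents, non_decedents)
print(pairs)
```

In this toy example, `d3` stays unmatched because no control falls within the caliper, which is the usual trade-off between match quality and sample size.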

These models demonstrated high sensitivity and specificity compared to industry-standard keyword-search methods, proving adept at interpreting complex clinical concepts, particularly in the multi-label text classification task of identifying care preference subdomains such as goals-of-care conversations, code status clarification, referral to a palliative care specialist, and hospice care. This study makes a strong argument for the continued need for domain-specific pre-training of language models, particularly in the biomedical domain. Our findings not only contribute to enhancing end-of-life communication and aligning treatment with patients' care objectives but also pave the way for future research in this promising domain, with potential implications for improving patient care quality.
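
To make the comparison concrete, the sketch below shows what a keyword-search baseline for the four subdomains might look like, together with per-label sensitivity and specificity. The keyword lists, the example notes, and their reference labels are invented for illustration. They are not the thesis's keyword set, data, or results.

```python
import re

# Hypothetical keyword patterns per care preference subdomain.
KEYWORDS = {
    "goals_of_care": [r"goals of care", r"family meeting"],
    "code_status": [r"\bdnr\b", r"\bdni\b", r"full code", r"code status"],
    "palliative_care": [r"palliative care"],
    "hospice": [r"hospice"],
}


def keyword_labels(text):
    """Multi-label prediction: a label is True if any of its patterns match."""
    lowered = text.lower()
    return {
        label: any(re.search(p, lowered) for p in patterns)
        for label, patterns in KEYWORDS.items()
    }


def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity); None where the denominator is zero."""
    tp = sum(t and p for t, p in zip(y_true, y_pred))
    fn = sum(t and not p for t, p in zip(y_true, y_pred))
    tn = sum(not t and not p for t, p in zip(y_true, y_pred))
    fp = sum(not t and p for t, p in zip(y_true, y_pred))
    sens = tp / (tp + fn) if (tp + fn) else None
    spec = tn / (tn + fp) if (tn + fp) else None
    return sens, spec


# Invented notes, each paired with the set of labels a reviewer might assign.
notes = [
    ("Code status changed to DNR/DNI after discussion with family.", {"code_status"}),
    ("Patient wishes to focus on comfort and go home rather than pursue "
     "further chemotherapy.", {"goals_of_care"}),
    ("Patient discharged home with hospice.", {"hospice"}),
    ("No acute events overnight.", set()),
]

predictions = [keyword_labels(text) for text, _ in notes]
results = {}
for label in KEYWORDS:
    y_true = [label in truth for _, truth in notes]
    y_pred = [pred[label] for pred in predictions]
    results[label] = sensitivity_specificity(y_true, y_pred)
    print(label, results[label])
```

The second note expresses a goals-of-care preference without any listed keyword, so the baseline misses it. This is the kind of contextually nuanced mention a fine-tuned transformer classifier is meant to catch.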

Available for download on Tuesday, June 04, 2024