Date of Award

Fall 2024

Document Type

Thesis (Master's)

Department or Program

Computer Science

Abstract

This case study investigates the use of Large Language Models (LLMs) and prompt engineering techniques for data augmentation, in the context of post-trial interview data (low-resource natural language data) from a randomized clinical trial on Pain Reprocessing Therapy (PRT) for chronic back pain. The limited sample size of the original study necessitates the generation of synthetic interview samples, which are needed to draw meaningful inferences from the data and to further extend the study. This study explores the efficacy of various prompting techniques across different approaches, such as sentiment inversion and personality-based augmentation. The goal is to elicit desired responses from LLMs in order to generate high-quality synthetic interview transcripts that accurately reflect the diversity of patient responses. This process simultaneously served as a platform for gaining a deeper understanding of LLMs’ capabilities. The study explored a combination of prompting techniques, including zero-shot, few-shot, generated knowledge, maieutic, and least-to-most prompting. The final approach involved proposing a customized data augmentation pipeline that incorporated psychological profiles (O.C.E.A.N. personality traits) to generate diverse perspectives on treatment effectiveness. The LLM-generated qualitative segments of the synthetic interviews were evaluated with online AI checkers to ensure they appeared human-like, whereas the quantitative data was compared with the original clinical trial data. Through our study, we intend to demonstrate the potential of LLMs to generate domain-specific data. This study indicates the promising capabilities of LLMs in generating realistic interview data and simulating complex outcomes, offering valuable insights for future research in data augmentation and in LLM applications steered through various prompting techniques.
