Author ORCID Identifier

https://orcid.org/0000-0002-5163-966X

Date of Award

Spring 6-9-2024

Document Type

Thesis (Ph.D.)

Department or Program

Computer Science

First Advisor

Soroush Vosoughi

Abstract

The field of Natural Language Processing (NLP) has undergone a significant transformation with the emergence of large language models (LMs). These models have enabled the development of human-like conversational assistants (e.g., OpenAI's ChatGPT) and expert-level AI software engineering agents (e.g., Devin from Cognition Lab). However, these models face a fundamental challenge related to their training methodology. Predominantly trained on vast datasets scraped from the web, their self-supervised learning objective of predicting missing tokens unintentionally perpetuates the biases, inaccuracies, and sensitive information inherent in their training data. These issues lead to what are termed "misaligned" behaviors and pose a significant hurdle in the development of reliable and trustworthy AI systems.

The goal of "aligning language models with the human world" is to mitigate these challenges by ensuring that language models align more closely with human knowledge and societal values. This thesis introduces innovative training paradigms that enable language models to learn from grounded experiences or simulated social interactions, enhancing their alignment with human expectations. This work also explores the concept of "scalable oversight", an area focused on leveraging the superior capabilities of AI models to assist humans in ensuring that LMs operate in a manner consistent with human values and knowledge. Through these innovations, this thesis contributes to the development of more reliable, accurate, and societally aligned LMs, addressing one of the key challenges in the field of NLP.
