Author ORCID Identifier

https://orcid.org/0009-0001-6686-2725

Date of Award

Spring 5-2024

Document Type

Thesis (Master's)

Department or Program

Computer Science

First Advisor

Soroush Vosoughi

Second Advisor

James Mahoney

Third Advisor

Michael Cohen

Abstract

As the virtual reality (VR) industry continues to evolve, effectively capturing VR experiences for an audience remains a challenge. The predominant method of showcasing VR applications through first-person recordings lacks cinematic interest, failing to capture other viewpoints and the essence of the moment. Meanwhile, manually setting up cameras and editing videos requires technical expertise on the part of the user. In this thesis, we propose the use of machine learning (ML) to automatically select the most compelling predefined viewpoint in a VR environment at any given moment. Our models, trained on actor motion and voice volume, aim to enhance the communication of VR experiences to viewers.

We train two models: a Researcher Model, based on camera selections made by the author, and an Expert Model, based on selections made by an outside expert. In a within-subjects study, 33 participants viewed videos with camera selections made by a human director, by our models, and by randomly selecting cameras in the scene. Afterwards, participants rated the models on four questions and matched each video to one of the three camera-selection methods. Participants correctly identified the videos made by our models at a rate higher than chance (Researcher Model: 51.52%; Expert Model: 39.39%), suggesting that there are still noticeable differences between our models' predictions and actual human camera choices. We found that participants, on average, most preferred watching the human director's video, then our models' videos, and lastly the randomly selected camera video.
