Author ORCID Identifier
https://orcid.org/0009-0008-4516-0135
Date of Award
1-2025
Document Type
Thesis (Master's)
Department or Program
Computer Science
First Advisor
Michael Casey
Second Advisor
Tim Tregubov
Third Advisor
Elizabeth Murnane
Abstract
This master's thesis introduces OpenMUSE (Open Multimodal Unified Sound Engine), a platform that demonstrates the potential of open-source AI music generation by integrating state-of-the-art deep learning models into a single system. By combining ten open-source models, including MusicGen, AudioLDM2, and custom-trained text-to-symbolic music generation models, OpenMUSE provides a user-friendly interface that empowers artists to produce complex, adaptive musical compositions. The system enhances accessibility through a simple web interface and natural language controls, and improves controllability through features such as melody conditioning and semantic audio editing. Specifically, OpenMUSE offers a digital audio workstation (DAW)-inspired interface that lowers the technical barriers to music production while adding AI-powered generation and editing capabilities.
Quantitative and qualitative evaluations demonstrate significant improvements in both objective audio quality metrics and user satisfaction compared to existing single-model approaches. User studies validate that integrating specialized models for different aspects of the music creation process, from melody generation to stem separation, enhances creative control without sacrificing ease of use. The system's architecture demonstrates how thoughtful interface design and model integration can make sophisticated AI music tools accessible to users of varying technical expertise.
Key contributions include:
- A web interface that integrates multiple open-source music generation models, providing intuitive controls for practical music creation workflows.
- A custom-trained text-to-symbolic music generation model that translates textual descriptions into symbolic musical representations in MIDI format.
- A unified system that bridges different modalities (text, audio, image) and distills the capabilities of various models into five core music creation tasks: accompaniment generation, text-to-music generation, audio editing, stem extraction, and text-to-MIDI generation.
- Detailed documentation of the approaches attempted and the challenges encountered in integrating and optimizing the various AI models that make up the platform.
Recommended Citation
Vergho, Tyler K., "OpenMUSE: Integrating Open-Source Models into Music Creation Workflows" (2025). Dartmouth College Master’s Theses. 202.
https://digitalcommons.dartmouth.edu/masters_theses/202
Included in
Artificial Intelligence and Robotics Commons, Graphics and Human Computer Interfaces Commons, Software Engineering Commons