Date of Award
Spring 5-2026
Document Type
Thesis (Master's)
Department or Program
Engineering Sciences
First Advisor
Peter Chin
Second Advisor
Eugene Santos
Third Advisor
Nikhil Singh
Abstract
Generative music has reached an inflection point. AI systems turn text prompts into album-quality audio in seconds, and the browser has made music production one click away. What these systems cannot do is collaborate at the level musicians actually think. True artistry lies in chord theory, polyphony, subjective nuance, and the high-level musical reasoning that spans every harmonic choice and mix decision.
This thesis presents MODULO (Musician-Owned DAW for User-Led Orchestration), an AI-native desktop generative audio workstation. Built in C++ on Tracktion Engine, MODULO embeds structured agentic co-creation directly inside a professional multi-track timeline with Virtual Studio Technology (VST) and Audio Unit (AU) plugin hosting, low-latency audio, and full mixing infrastructure, so that every generated note remains subject to the musician's ear and intent.
In a single session, MODULO offers parallel customizable chord generation, a Chord Workshop that treats harmony as an editable abstraction, and a rule-based adaptive harmony engine that produces multiple independent voice lines in seconds. An audio-to-MIDI (Musical Instrument Digital Interface) pipeline converts live recordings into editable material, and one-click stem separation opens finished tracks back up for rework. Prompt-conditioned music generation handles full-song synthesis with automatic section-aware stem layout, a sound-effects module fills in non-melodic content, and a composition planner turns free-form intent into section-level blueprints. Underneath sits a parametric-equalization (EQ) mixer with buses, sends, and plugin hosting, wrapped in a musician-facing interaction layer of domain-specific hotkeys and contextual affordances.
Five within-subjects user studies with working musicians drawn from both university and industry settings, four at N=25 and one at N=10, validated the system using the NASA Task Load Index (NASA-TLX), the User Engagement Scale Short Form (UES-SF), the System Usability Scale (SUS), and paired t-tests with Benjamini-Hochberg correction. Treatment arms cut time-to-satisfaction by 50-89%, reduced NASA-TLX workload by 47-74%, and drove forced-choice preference to 64-100% across every measured dimension. A final integrated evaluation with five professionals returned a median SUS of 85.0, well above the 68-point acceptability threshold.
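As a minimal illustration of two of the analysis tools named above, the sketch below computes a standard SUS score from one respondent's ten item ratings (each on a 1-5 scale) and applies the Benjamini-Hochberg step-up adjustment to a list of p-values. The function names and any inputs passed to them are hypothetical and are not taken from the thesis, whose own analysis code is not reproduced here.

```python
def sus_score(responses):
    """Standard SUS scoring for one respondent (hypothetical helper).

    `responses` holds ten ratings in questionnaire order, each from 1 to 5.
    Odd-numbered items contribute (rating - 1), even-numbered items
    contribute (5 - rating); the sum is scaled by 2.5 to a 0-100 range.
    """
    if len(responses) != 10:
        raise ValueError("SUS requires exactly ten item responses")
    total = 0
    for index, rating in enumerate(responses):
        if not 1 <= rating <= 5:
            raise ValueError("each SUS rating must be between 1 and 5")
        # index 0, 2, 4, ... are the odd-numbered questionnaire items
        total += (rating - 1) if index % 2 == 0 else (5 - rating)
    return total * 2.5


def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted p-values, returned in the input order.

    Each adjusted value is the smallest p * m / rank over this p-value
    and every larger one, which keeps the adjusted values monotone.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down to the smallest.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted
```

A comparison would then be reported as significant when its adjusted p-value falls below the chosen false discovery rate, for example 0.05; that threshold is an assumption here, since the abstract does not state it.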
This thesis argues that MODULO's contribution lies in the joint design of model and interface: a generative model is only useful inside a DAW if the surrounding interface lets a musician audition, compare, and selectively commit; an interface that promises agency is only useful if the underlying model produces meaningfully diverse, theory-aware candidates for the musician to choose between. Each subsystem in this thesis pairs a specific modeling strategy with a specific interaction paradigm, and the empirical claims belong to that pairing, not to either component in isolation. With these joint design strategies, generative power and musical agency are not in tension but mutually reinforcing.
Recommended Citation
Khoo, Taka, "MODULO: A Generative Audio Workstation for AI-Native Music Co-Creation" (2026). Dartmouth College Master’s Theses. 279.
https://digitalcommons.dartmouth.edu/masters_theses/279
Included in
Applied Statistics Commons, Artificial Intelligence and Robotics Commons, Audio Arts and Acoustics Commons, Cognition and Perception Commons, Cognitive Science Commons, Composition Commons, Computer Engineering Commons, Ergonomics Commons, Graphics and Human Computer Interfaces Commons, Human Factors Psychology Commons, Music Theory Commons, Signal Processing Commons, Software Engineering Commons
