Date of Award

Spring 5-2026

Document Type

Thesis (Master's)

Department or Program

Engineering Sciences

First Advisor

Peter Chin

Second Advisor

Eugene Santos

Third Advisor

Nikhil Singh

Abstract

Generative music has reached an inflection point. AI systems turn text prompts into album-quality audio in seconds, and the browser has made music production one click away. What these systems cannot do is collaborate at the level musicians actually think. True artistry lies in chord theory, polyphony, subjective nuance, and the high-level musical reasoning that spans every harmonic choice and mix decision.

This thesis presents MODULO (Musician-Owned DAW for User-Led Orchestration), an AI-native desktop generative audio workstation. Built in C++ on Tracktion Engine, MODULO embeds structured agentic co-creation directly inside a professional multi-track timeline with Virtual Studio Technology (VST) and Audio Unit (AU) plugin hosting, low-latency audio, and full mixing infrastructure, so that every generated note remains subject to the musician's ear and intent.

In one session MODULO offers parallel customizable chord generation, a Chord Workshop that treats harmony as an editable abstraction, and a rule-based adaptive harmony engine that produces multiple independent voice lines in seconds. An audio-to-MIDI (Musical Instrument Digital Interface) pipeline converts live recordings into editable material, and one-click stem separation opens finished tracks back up for rework. Prompt-conditioned music generation handles full-song synthesis with automatic section-aware stem layout, a sound-effects module fills in non-melodic content, and a composition planner turns free-form intent into section-level blueprints. Underneath sits a parametric-equalization (EQ) mixer with buses, sends, and plugin hosting, wrapped in a musician-facing interaction layer of domain-specific hotkeys and contextual affordances.

Five within-subjects user studies with working musicians drawn from both university and industry settings, four at N=25 and one at N=10, validated the system using the NASA Task Load Index (NASA-TLX), the User Engagement Scale Short Form (UES-SF), the System Usability Scale (SUS), and paired t-tests with Benjamini-Hochberg correction. Treatment arms cut time-to-satisfaction by 50-89%, reduced NASA-TLX workload by 47-74%, and drove forced-choice preference to 64-100% across every measured dimension. A final integrated evaluation with five professionals returned a median SUS of 85.0, well above the 68-point acceptability threshold.

This thesis argues that the contribution lies in the joint design of model and interface. A generative model is only useful inside a DAW if the surrounding interface lets a musician audition, compare, and selectively commit; conversely, an interface that promises agency is only useful if the underlying model produces meaningfully diverse, theory-aware candidates for the musician to choose between. Each subsystem in this thesis pairs a specific modeling strategy with a specific interaction paradigm, and the empirical claims belong to that pairing, not to either component in isolation. Under these joint design strategies, generative power and musical agency are not in tension but mutually reinforcing.

Available for download on Tuesday, May 11, 2027
