Fig.1 Interface of MuseLeader
Designing control methods for music generation systems is essential for generating music that aligns with user preferences. In particular, parameter control provides an effective means of adjusting the atmosphere of the generated music, such as its ``brightness.'' Some systems additionally allow users to specify transitions in atmospheric intensity over time. However, parameter control faces a frame problem: users can manipulate only the parameters predefined by the system. To overcome this limitation, this study proposes an approach that leverages large language models (LLMs) to let users define the meaning of parameters through text. We also introduce MuseLeader, a working music composition system equipped with a graphical user interface for customizing semantically defined time-series parameters. User studies indicate that parameters with clear semantic definitions (e.g., ``Powerful,'' ``Robotic'') can be controlled effectively according to user intent. Additionally, some users refined their expressive intentions by redefining the parameter axes. For further advancement, it is essential not only to enhance the inference capabilities of LLMs but also to explore multimodal inputs beyond text to better interpret complex and nuanced musical concepts.
The intensity of a mood is represented as a numerical value between 0.0 and 1.0 and can be set for every four measures. The higher the intensity in a given section, the more strongly that mood is expected to be reflected in the music.
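As a minimal illustrative sketch (not the actual MuseLeader data model), a semantically defined time-series parameter can be represented as a user-named axis plus one intensity value per four-measure segment; the class and method names here are hypothetical.

```python
# Hypothetical sketch of a semantically defined time-series parameter,
# assuming a piece whose mood is controlled in four-measure segments.
from dataclasses import dataclass

@dataclass
class ParameterAxis:
    name: str                 # user-defined semantic label, e.g. "Brightness"
    intensities: list[float]  # one value in [0.0, 1.0] per four-measure segment

    def intensity_for_measure(self, measure: int) -> float:
        """Return the intensity governing a given (1-indexed) measure."""
        segment = (measure - 1) // 4
        return self.intensities[segment]

# A 16-measure piece that gradually brightens, as in column (a) of the table below.
brightness = ParameterAxis("Brightness", [0.00, 0.25, 0.50, 1.00])
print(brightness.intensity_for_measure(6))  # measure 6 lies in segment 2 -> 0.25
```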
The music pieces were generated using gpt-4o-2024-11-20.
| Parameter Axis | (a) 0.00 → 0.25 → 0.50 → 1.00 | (b) 0.00 → 0.25 → 1.00 → 1.00 | (c) 1.00 → 0.75 → 0.50 → 0.00 |
|---|---|---|---|
| Strength | | | |
| Robotic | | | |
| Brightness | | | |
| Classic | | | |
| Jazziness | | | |
| Urban | | | |
| Heart-Pounding | | | |
| Emotional | | | |
Fig.2 System composition of MuseLeader
We utilize the inference capabilities of a large language model (gpt-4o-2024-08-06) to associate the specified words with editing operations along the axes designated by the user. However, the system struggles to accurately reflect user intent in time-series control. To address this issue, we refine the prompts given to the LLM as follows.
Task Decomposition: Inspired by ComposerX, our system splits the composition task among multiple LLM agents: a leader agent, a melody agent, a chord agent, and an instrument agent.
Planning of Musical Elements: The leader agent plans the musical elements in four-measure segments and documents the plan in a table. The other agents use this table to decide which sections to edit.
Use of Four-Measure Delimiters: Our system instructs the LLM to insert a newline every four measures along with a comment (e.g., % measure [start]-[end]) to clearly indicate which sections should be edited.
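To make the delimiter convention concrete, the sketch below shows a hypothetical ABC-notation tune body annotated with `% measure [start]-[end]` comments, and a small helper (not part of MuseLeader) that locates the section an agent should edit. The notes in the snippet are invented for illustration.

```python
# Illustrative sketch of the four-measure delimiter convention:
# each four-measure line of ABC notation is preceded by a
# "% measure [start]-[end]" comment so an LLM agent can find it.
import re

abc_body = """\
% measure 1-4
C2 E2 G2 E2 | C2 E2 G2 E2 | D2 F2 A2 F2 | G4 z4 |
% measure 5-8
E2 G2 c2 G2 | E2 G2 c2 G2 | F2 A2 d2 A2 | c4 z4 |
"""

def section_for_measures(abc: str, start: int, end: int) -> str:
    """Return the music line tagged with the given measure range."""
    lines = abc.splitlines()
    for i, line in enumerate(lines):
        m = re.match(r"%\s*measure\s+(\d+)-(\d+)", line)
        if m and int(m.group(1)) == start and int(m.group(2)) == end:
            return lines[i + 1]
    raise ValueError(f"no section tagged measure {start}-{end}")

print(section_for_measures(abc_body, 5, 8))
# -> E2 G2 c2 G2 | E2 G2 c2 G2 | F2 A2 d2 A2 | c4 z4 |
```

An editing agent asked to strengthen a mood only in measures 5–8 can thus rewrite a single line while leaving the rest of the score untouched.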
Fig.3 Workflow of Multi-Agent LLM-Based Music Composition and Arrangement
The following are actual prompts used for each agent, translated into English.
This paper does not investigate the optimality of these prompt engineering techniques. Establishing reliable evaluation methods and optimizing prompt design are important challenges for future research. Moreover, these techniques may become unnecessary as the inference capabilities of large language models advance.
We use the following tools to render scores in the ABC notation format.