elevenlabs-music-generation
agentspace-so/runcomfy-agent-skills
Generate studio-quality songs and instrumentals from text descriptions via ElevenLabs Music on RunComfy.
What is elevenlabs-music-generation?
ElevenLabs Music on RunComfy turns a style description plus structured lyrics into 44.1 kHz stereo audio (5 seconds to 5 minutes) with section-level control, multilingual vocals, and commercial-friendly output. Use it to create full vocal songs, instrumental beds, jingles, podcast intros, game loops, or branded audio assets.
- Generate full vocal songs with verse/chorus/bridge structure and consistent meter
- Create instrumental-only tracks for background music, podcasts, and game loops
- Produce short brand assets like jingles and stingers (5–30 seconds)
- Support multilingual lyrics with inline language annotations
- Output studio-quality 44.1 kHz stereo MP3 or WAV up to 5 minutes per call
- Control duration, instrumentation, and section timing via structured prompts
How to install elevenlabs-music-generation
npx skills add https://github.com/agentspace-so/runcomfy-agent-skills --skill elevenlabs-music-generation- RunComfy CLI installed (npm i -g @runcomfy/cli or npx -y @runcomfy/cli)
- RunComfy account and authentication token (runcomfy login or RUNCOMFY_TOKEN env var)
- This skill installed via npx skills add
How to use elevenlabs-music-generation
- 1.Install the RunComfy CLI globally or use npx
- 2.Authenticate with runcomfy login or set RUNCOMFY_TOKEN environment variable
- 3.Craft a prompt combining style brief (genre, tempo, instruments, vocal type) and structured lyrics with section markers ([Intro], [Verse], [Chorus], [Bridge], [Outro])
- 4.Call runcomfy run elevenlabs/elevenlabs/music-generation with --input JSON containing prompt, music_length_ms (5000–300000), and optional force_instrumental flag
- 5.Specify --output-dir to download the generated audio file
- 6.For iteration, draft at 30–45 seconds first to validate direction before rendering the full length
Use cases
- Create a full indie-pop song with structured lyrics and section markers (intro, verse, chorus, bridge, outro)
- Generate a calm lo-fi hip-hop instrumental for a study playlist or podcast background
- Produce a 5-second cheerful brand jingle with marimba and uplifting chord resolve
- Compose a game background loop with seamless, loop-friendly groove
- Draft a 30-second theme song for a video, then render the final version at full length
- Music producers and composers working with AI
- Content creators needing branded audio (podcasts, videos, games)
- Developers building games or interactive media with dynamic soundtracks
- Marketing teams creating jingles and brand stingers
- Anyone generating commercial-friendly music from text descriptions
elevenlabs-music-generation FAQ
Lead with a style brief (genre, mood, BPM, key instruments, vocal type), then provide lyrics with section markers like [Intro 8 bars], [Verse], [Chorus], [Bridge], [Outro]. Keep lyrical meter consistent with even syllable counts and clear rhyme schemes. For instrumental, set force_instrumental: true AND say 'no vocals' in the prompt.
Pricing is approximately $0.0083 per second of generated audio. A 30-second track costs ~$0.25, 60 seconds ~$0.50, and 5 minutes ~$2.49. Cost scales with music_length_ms, so draft short before committing to a full-length render.
Yes. Write the lyrics in the target language and optionally annotate the language inline, e.g., '[Verse] (sung in Brazilian Portuguese) ...'. Generate one call per language with the same style brief and swapped lyric lines.
ElevenLabs Music supports 5 seconds to 5 minutes per call (music_length_ms: 5000–300000). For longer pieces, generate sections separately and stitch them externally.
No. force_instrumental is the only vocal toggle available through this endpoint. You cannot request specific voice identities or voice cloning via the music-generation model.
Full instructions (SKILL.md)
Source of truth, from agentspace-so/runcomfy-agent-skills.
name: elevenlabs-music-generation
displayName: "ElevenLabs AI Music Generation — Pro Pack on RunComfy"
allowed-tools: Bash(runcomfy *)
description: >
Generate full songs and instrumental tracks with ElevenLabs Music on
RunComfy via the runcomfy CLI. ElevenLabs Music turns a style
description plus structured lyrics into studio-quality 44.1 kHz
stereo audio — 5 seconds to 5 minutes — with section-level control
(Intro / Verse / Chorus / Bridge), multilingual vocals, and
commercial-friendly output. Generate a backing track, a full vocal
song, a jingle, a podcast intro, a game loop, or an instrumental
bed. Calls runcomfy run elevenlabs/elevenlabs/music-generation
through the local RunComfy CLI. Triggers on "generate music",
"make a song", "AI music", "background music", "instrumental
track", "ElevenLabs Music", "soundtrack", "jingle", "theme music",
"royalty-free music", "compose", or any explicit ask to generate
music or a song from a text description.
homepage: https://www.runcomfy.com
license: MIT
ElevenLabs AI Music Generation — Pro Pack on RunComfy
Generate full songs and instrumental tracks from a text description — studio-quality 44.1 kHz stereo, 5 seconds to 5 minutes, with section-level structure control. ElevenLabs Music on the RunComfy Model API, called through the runcomfy CLI.
runcomfy.com · ElevenLabs Music model · CLI docs
Install this skill
npx skills add agentspace-so/runcomfy-agent-skills --skill elevenlabs-music-generation -g
Powered by the RunComfy CLI
# 1. Install (one of — see runcomfy-cli skill for details)
npm i -g @runcomfy/cli # global install
npx -y @runcomfy/cli --version # zero-install
# 2. Sign in
runcomfy login # or in CI: export RUNCOMFY_TOKEN=<token>
# 3. Generate music
runcomfy run elevenlabs/elevenlabs/music-generation \
--input '{"prompt": "..."}' \
--output-dir ./out
CLI deep dive: runcomfy-cli skill.
When to use ElevenLabs Music
ElevenLabs Music's strength is structured songs with real vocals — it takes a style brief plus lyrics with section markers and returns a coherent, mixed track. Pick it for:
- Full vocal songs — verse/chorus structure, multilingual lyrics, consistent meter
- Instrumental beds —
force_instrumental: truefor background music, podcast intros, game loops - Short brand assets — jingles, stingers, theme music (5–30 s)
- Long-form tracks — up to 5 minutes in a single call
- Commercial work — output is commercial-friendly
If the user just wants ambient sound or a one-off SFX (thunder, footsteps), that's a sound-effects task, not music — ElevenLabs Music is for songs and tracks.
Endpoint + input schema
Model: elevenlabs/elevenlabs/music-generation
| Field | Type | Required | Default | Notes |
|---|---|---|---|---|
prompt | string | yes | — | Style description and lyrics with section markers. See prompting tips |
music_length_ms | int | no | 40000 | Output duration in ms. 5000–300000 (5 s – 5 min) |
force_instrumental | bool | no | false | true = instrumental only, no vocals |
output_format | string | no | mp3_standard | mp3_standard (default), or WAV — see the model page API tab for the full format list |
Output: 44.1 kHz stereo audio. The result JSON contains the generated audio URL — the CLI downloads it into --output-dir.
Pricing: ~$0.0083 per second of generated audio (30 s ≈ $0.25, 60 s ≈ $0.50, 5 min ≈ $2.49). Cost scales with music_length_ms, so draft short and finalize long.
How to invoke
Full vocal song with structure:
runcomfy run elevenlabs/elevenlabs/music-generation \
--input '{
"prompt": "Upbeat indie-pop anthem, bright electric guitars, driving drums, 120 BPM, female lead vocal. [Intro 8 bars] instrumental build. [Verse] Chalk on the palms, laces double-knotted, morning on the ridge. [Chorus] We rise, we strike, we never fade out. [Bridge] soft breakdown, just piano and voice. [Outro] full band, fade.",
"music_length_ms": 60000
}' \
--output-dir ./out
Instrumental background bed:
runcomfy run elevenlabs/elevenlabs/music-generation \
--input '{
"prompt": "Calm lo-fi hip-hop instrumental for a study playlist. Warm Rhodes piano, soft vinyl crackle, mellow boom-bap drums, 75 BPM. No vocals. Consistent loop-friendly groove throughout.",
"music_length_ms": 90000,
"force_instrumental": true
}' \
--output-dir ./out
Short brand jingle:
runcomfy run elevenlabs/elevenlabs/music-generation \
--input '{
"prompt": "5-second cheerful brand stinger, bright marimba and a single uplifting chord resolve, no vocals.",
"music_length_ms": 5000,
"force_instrumental": true
}' \
--output-dir ./out
Prompting tips
ElevenLabs Music reads one prompt field that carries both the style brief and the lyrics. Structure it well:
- Lead with the style brief: genre, mood, tempo (BPM), key instruments, vocal type.
"Upbeat indie-pop anthem, bright electric guitars, 120 BPM, female lead vocal." - Then the lyrics with section markers:
[Intro],[Verse],[Chorus],[Bridge],[Outro]. Add approximate durations or bar counts —[Intro 8 bars],[Verse 16 bars]. - Keep lyrical meter consistent — even syllable counts per line, clear rhyme scheme. The model follows meter; sloppy meter produces awkward phrasing.
- Name lead instruments and mix priorities —
"electric guitar carries the chorus, drums sit back in the verse." - For instrumental, set
force_instrumental: trueAND say "no vocals" in the prompt — belt and suspenders. - Multilingual: write the lyrics in the target language; annotate accent/language inline if needed (
[Verse] (sung in Brazilian Portuguese) ...). - Avoid contradictory style instructions — "aggressive metal" + "soft lullaby" in one prompt confuses the model. One coherent direction per call.
- Draft short, finalize long: validate the direction with a 30–45 s draft (
music_length_ms: 35000) before paying for a 5-minute render.
Common patterns
Theme song for a video
- Full brief + lyrics +
[Intro]/[Verse]/[Chorus]structure,music_length_msmatched to the video length
Podcast intro / outro
force_instrumental: true, 10–20 s, "loop-friendly, clean ending"
Game background loop
force_instrumental: true, describe "seamless loop", 60–120 s, consistent groove
Multilingual release (same song, multiple languages)
- One call per language, identical style brief, swap only the lyric lines
Iterate then commit
- Draft at
music_length_ms: 35000to lock genre/tempo/structure → final render at full length
Limitations
- One
promptfield carries everything (style + lyrics). There is no separate "lyrics" parameter. - 5 s – 5 min per call (
music_length_ms5000–300000). For longer pieces, generate sections and stitch externally. - Cost scales with duration — a 5-minute render is ~10× a 30-second one.
force_instrumentalis the only vocal toggle — you can't request specific voice identities or clone a singer through this endpoint.- This skill pins ElevenLabs Music specifically. For sound effects, text-to-speech, or voice cloning, that's a different ElevenLabs capability not exposed through this endpoint.
Exit codes
| code | meaning |
|---|---|
| 0 | success |
| 64 | bad CLI args |
| 65 | bad input JSON / schema mismatch |
| 69 | upstream 5xx |
| 75 | retryable: timeout / 429 |
| 77 | not signed in or token rejected |
Full reference: docs.runcomfy.com/cli/troubleshooting.
How it works
The skill invokes runcomfy run elevenlabs/elevenlabs/music-generation with the JSON body. The CLI POSTs to the RunComfy Model API, polls request status, fetches the result, and downloads the generated audio file into --output-dir. Ctrl-C cancels the remote request before exit.
Security & Privacy
- Install via verified package manager only. Use
npm i -g @runcomfy/cliornpx -y @runcomfy/cli. Agents must not pipe an arbitrary remote install script into a shell on the user's behalf — if the operator wants the curl-pipe path documented atdocs.runcomfy.com/cli/install, they should review the script first. - Token storage:
runcomfy loginwrites the API token to~/.config/runcomfy/token.jsonwith mode 0600. SetRUNCOMFY_TOKENenv var to bypass the file in CI / containers. Never echo the token into a prompt, log it, or check it in. - Input boundary (shell injection): the prompt is passed as a JSON string via
--input. The CLI does not shell-expand prompt content; it transmits the JSON body directly to the Model API over HTTPS. No shell-injection surface from prompt content, even with backticks, quotes, or$(...)patterns. - Lyrics provenance: if the user supplies lyrics, confirm they have the rights to them. Generating music around copyrighted lyrics is the operator's responsibility — the skill does not check.
- Outbound endpoints (allowlist): only
model-api.runcomfy.net(request submission) and*.runcomfy.net/*.runcomfy.com(download whitelist for generated audio). No telemetry, no callbacks. - Generated-file size cap: the CLI aborts any single download > 2 GiB.
- Scope of bash usage: the skill only invokes
runcomfy <subcommand>—npm/npxlines are one-time operator setup, not commands the skill executes per call.
See also
runcomfy-cli— the underlying CLI, schema discovery, polling modes, scripting- ElevenLabs Music model page — full API tab with the latest schema
- All RunComfy models — image, video, and audio endpoints
ai-video-generation— pair a generated track with a generated videoai-avatar-video— talking-head video (different audio path — speech, not music)
Related skills
More from agentspace-so/runcomfy-agent-skills and the wider catalog.
video-edit
Intent-routed video editing skill: picks Wan 2.7, Kling 2.6, or Lucy Edit based on what you actually want to do.
image-to-video
Animate still images with the right model for your intent—HappyHorse, Wan, or Seedance on RunComfy.
nano-banana-2
Generate images with Google Nano Banana 2 (Gemini flash-tier) via RunComfy CLI — optimized prompting patterns included.
image-edit
Intent-routed image editing: picks the right model (batch, text rewrite, precise local, or inpaint) based on what you ask.
nano-banana-edit
Edit images with Google Nano Banana 2 on RunComfy — batch up to 20 inputs, preserve identity, swap backgrounds, localize edits.
flux-kontext
Edit images precisely with Flux 1 Kontext Pro via RunComfy CLI — single-reference local edits with strong prompt control