elevenlabs-music-generation

agentspace-so/runcomfy-agent-skills

Generate studio-quality songs and instrumentals from text descriptions via ElevenLabs Music on RunComfy.

What is elevenlabs-music-generation?

ElevenLabs Music on RunComfy turns a style description plus structured lyrics into 44.1 kHz stereo audio (5 seconds to 5 minutes) with section-level control, multilingual vocals, and commercial-friendly output. Use it to create full vocal songs, instrumental beds, jingles, podcast intros, game loops, or branded audio assets.

Generate full vocal songs with verse/chorus/bridge structure and consistent meter
Create instrumental-only tracks for background music, podcasts, and game loops
Produce short brand assets like jingles and stingers (5–30 seconds)
Support multilingual lyrics with inline language annotations
Output studio-quality 44.1 kHz stereo MP3 or WAV up to 5 minutes per call
Control duration, instrumentation, and section timing via structured prompts

How to install elevenlabs-music-generation

npx skills add https://github.com/agentspace-so/runcomfy-agent-skills --skill elevenlabs-music-generation

Prerequisites

RunComfy CLI installed (npm i -g @runcomfy/cli or npx -y @runcomfy/cli)
RunComfy account and authentication token (runcomfy login or RUNCOMFY_TOKEN env var)
This skill installed via npx skills add

How to use elevenlabs-music-generation

1. Install the RunComfy CLI globally (npm i -g @runcomfy/cli) or run it on demand with npx -y @runcomfy/cli.
2. Authenticate with runcomfy login, or set the RUNCOMFY_TOKEN environment variable.
3. Write a prompt that combines a style brief (genre, tempo, instruments, vocal type) with structured lyrics using section markers ([Intro], [Verse], [Chorus], [Bridge], [Outro]).
4. Call runcomfy run elevenlabs/elevenlabs/music-generation with an --input JSON object containing prompt (required) plus the optional music_length_ms (5000–300000, default 40000) and force_instrumental fields.
5. Pass --output-dir to choose where the generated audio file is downloaded.
6. When iterating, draft at 30–45 seconds to validate the direction before rendering the full length.

Use cases

Good for

Create a full indie-pop song with structured lyrics and section markers (intro, verse, chorus, bridge, outro)
Generate a calm lo-fi hip-hop instrumental for a study playlist or podcast background
Produce a 5-second cheerful brand jingle with marimba and an uplifting chord resolve
Compose a game background loop with seamless, loop-friendly groove
Draft a 30-second theme song for a video, then render the final version at full length

Who it's for

Music producers and composers working with AI
Content creators needing branded audio (podcasts, videos, games)
Developers building games or interactive media with dynamic soundtracks
Marketing teams creating jingles and brand stingers
Anyone generating commercial-friendly music from text descriptions

elevenlabs-music-generation FAQ

How do I structure the prompt for best results?

Lead with a style brief (genre, mood, BPM, key instruments, vocal type), then provide lyrics with section markers like [Intro 8 bars], [Verse], [Chorus], [Bridge], [Outro]. Keep lyrical meter consistent with even syllable counts and clear rhyme schemes. For instrumental, set force_instrumental: true AND say 'no vocals' in the prompt.

What is the cost per track?

Pricing is approximately $0.0083 per second of generated audio. A 30-second track costs ~$0.25, 60 seconds ~$0.50, and 5 minutes ~$2.49. Cost scales with music_length_ms, so draft short before committing to a full-length render.

Can I generate multilingual songs?

Yes. Write the lyrics in the target language and optionally annotate the language inline, e.g., '[Verse] (sung in Brazilian Portuguese) ...'. Generate one call per language with the same style brief and swapped lyric lines.

What are the duration limits?

ElevenLabs Music supports 5 seconds to 5 minutes per call (music_length_ms: 5000–300000). For longer pieces, generate sections separately and stitch them externally.

Can I request a specific voice or clone a singer?

No. force_instrumental is the only vocal toggle available through this endpoint. You cannot request specific voice identities or voice cloning via the music-generation model.

Full instructions (SKILL.md)

Source of truth, from agentspace-so/runcomfy-agent-skills.

name: elevenlabs-music-generation
displayName: "ElevenLabs AI Music Generation — Pro Pack on RunComfy"
allowed-tools: Bash(runcomfy *)
description: >
  Generate full songs and instrumental tracks with ElevenLabs Music on RunComfy via the `runcomfy` CLI. ElevenLabs Music turns a style description plus structured lyrics into studio-quality 44.1 kHz stereo audio — 5 seconds to 5 minutes — with section-level control (Intro / Verse / Chorus / Bridge), multilingual vocals, and commercial-friendly output. Generate a backing track, a full vocal song, a jingle, a podcast intro, a game loop, or an instrumental bed. Calls `runcomfy run elevenlabs/elevenlabs/music-generation` through the local RunComfy CLI. Triggers on "generate music", "make a song", "AI music", "background music", "instrumental track", "ElevenLabs Music", "soundtrack", "jingle", "theme music", "royalty-free music", "compose", or any explicit ask to generate music or a song from a text description.
homepage: https://www.runcomfy.com
license: MIT

ElevenLabs AI Music Generation — Pro Pack on RunComfy

Generate full songs and instrumental tracks from a text description — studio-quality 44.1 kHz stereo, 5 seconds to 5 minutes, with section-level structure control. ElevenLabs Music on the RunComfy Model API, called through the runcomfy CLI.

Install this skill

npx skills add agentspace-so/runcomfy-agent-skills --skill elevenlabs-music-generation -g

Powered by the RunComfy CLI

# 1. Install (one of — see runcomfy-cli skill for details)
npm i -g @runcomfy/cli                              # global install
npx -y @runcomfy/cli --version                      # zero-install

# 2. Sign in
runcomfy login                                      # or in CI: export RUNCOMFY_TOKEN=<token>

# 3. Generate music
runcomfy run elevenlabs/elevenlabs/music-generation \
  --input '{"prompt": "..."}' \
  --output-dir ./out

CLI deep dive: runcomfy-cli skill.

When to use ElevenLabs Music

ElevenLabs Music's strength is structured songs with real vocals — it takes a style brief plus lyrics with section markers and returns a coherent, mixed track. Pick it for:

Full vocal songs — verse/chorus structure, multilingual lyrics, consistent meter
Instrumental beds — force_instrumental: true for background music, podcast intros, game loops
Short brand assets — jingles, stingers, theme music (5–30 s)
Long-form tracks — up to 5 minutes in a single call
Commercial work — output is commercial-friendly

If the user just wants ambient sound or a one-off SFX (thunder, footsteps), that's a sound-effects task, not music — ElevenLabs Music is for songs and tracks.

Endpoint + input schema

Model: elevenlabs/elevenlabs/music-generation

Field	Type	Required	Default	Notes
`prompt`	string	yes	—	Style description and lyrics with section markers. See prompting tips
`music_length_ms`	int	no	`40000`	Output duration in ms. 5000–300000 (5 s – 5 min)
`force_instrumental`	bool	no	`false`	`true` = instrumental only, no vocals
`output_format`	string	no	`mp3_standard`	`mp3_standard` (default), or WAV — see the model page API tab for the full format list

Output: 44.1 kHz stereo audio. The result JSON contains the generated audio URL — the CLI downloads it into --output-dir.
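The schema above can be checked locally before a call is made. The following is a minimal sketch in Python, assuming only the field names, defaults, and duration range listed in the table; the helper name build_input and its error messages are hypothetical and not part of the RunComfy CLI.

```python
import json

# Duration bounds taken from the schema table (5 s to 5 min).
MIN_LENGTH_MS = 5_000
MAX_LENGTH_MS = 300_000


def build_input(prompt, music_length_ms=40_000, force_instrumental=False,
                output_format="mp3_standard"):
    """Return a JSON string suitable for --input (hypothetical helper)."""
    if not isinstance(prompt, str) or not prompt.strip():
        raise ValueError("prompt is required and must be a non-empty string")
    if not isinstance(music_length_ms, int):
        raise ValueError("music_length_ms must be an integer number of milliseconds")
    if not MIN_LENGTH_MS <= music_length_ms <= MAX_LENGTH_MS:
        raise ValueError(
            f"music_length_ms must be between {MIN_LENGTH_MS} and {MAX_LENGTH_MS}"
        )
    payload = {
        "prompt": prompt,
        "music_length_ms": music_length_ms,
        "force_instrumental": bool(force_instrumental),
        "output_format": output_format,
    }
    # ensure_ascii=False keeps non-English lyrics readable in the JSON body.
    return json.dumps(payload, ensure_ascii=False)
```

Catching an out-of-range duration locally avoids a round trip that would presumably end in exit code 65 (bad input JSON / schema mismatch).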

Pricing: ~$0.0083 per second of generated audio (30 s ≈ $0.25, 60 s ≈ $0.50, 5 min ≈ $2.49). Cost scales with music_length_ms, so draft short and finalize long.
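The pricing arithmetic can be sketched as a one-line estimate. This assumes the approximate rate quoted above holds linearly across the whole duration range; the function name estimate_cost_usd is hypothetical, and the actual billed amount may differ.

```python
# Approximate rate quoted in this document; treat it as an estimate only.
RATE_USD_PER_SECOND = 0.0083


def estimate_cost_usd(music_length_ms):
    """Estimate the cost of one render, rounded to whole cents."""
    seconds = music_length_ms / 1000
    return round(seconds * RATE_USD_PER_SECOND, 2)
```

For example, estimate_cost_usd(35000) gives the rough cost of a 35-second draft, which can be compared against estimate_cost_usd(300000) before committing to a 5-minute render.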

How to invoke

Full vocal song with structure:

runcomfy run elevenlabs/elevenlabs/music-generation \
  --input '{
    "prompt": "Upbeat indie-pop anthem, bright electric guitars, driving drums, 120 BPM, female lead vocal. [Intro 8 bars] instrumental build. [Verse] Chalk on the palms, laces double-knotted, morning on the ridge. [Chorus] We rise, we strike, we never fade out. [Bridge] soft breakdown, just piano and voice. [Outro] full band, fade.",
    "music_length_ms": 60000
  }' \
  --output-dir ./out

Instrumental background bed:

runcomfy run elevenlabs/elevenlabs/music-generation \
  --input '{
    "prompt": "Calm lo-fi hip-hop instrumental for a study playlist. Warm Rhodes piano, soft vinyl crackle, mellow boom-bap drums, 75 BPM. No vocals. Consistent loop-friendly groove throughout.",
    "music_length_ms": 90000,
    "force_instrumental": true
  }' \
  --output-dir ./out

Short brand jingle:

runcomfy run elevenlabs/elevenlabs/music-generation \
  --input '{
    "prompt": "5-second cheerful brand stinger, bright marimba and a single uplifting chord resolve, no vocals.",
    "music_length_ms": 5000,
    "force_instrumental": true
  }' \
  --output-dir ./out

Prompting tips

ElevenLabs Music reads one prompt field that carries both the style brief and the lyrics. Structure it well:

Lead with the style brief: genre, mood, tempo (BPM), key instruments, vocal type. "Upbeat indie-pop anthem, bright electric guitars, 120 BPM, female lead vocal."
Then the lyrics with section markers: [Intro], [Verse], [Chorus], [Bridge], [Outro]. Add approximate durations or bar counts — [Intro 8 bars], [Verse 16 bars].
Keep lyrical meter consistent — even syllable counts per line, clear rhyme scheme. The model follows meter; sloppy meter produces awkward phrasing.
Name lead instruments and mix priorities — "electric guitar carries the chorus, drums sit back in the verse."
For instrumental, set force_instrumental: true AND say "no vocals" in the prompt — belt and suspenders.
Multilingual: write the lyrics in the target language; annotate accent/language inline if needed ([Verse] (sung in Brazilian Portuguese) ...).
Avoid contradictory style instructions — "aggressive metal" + "soft lullaby" in one prompt confuses the model. One coherent direction per call.
Draft short, finalize long: validate the direction with a 30–45 s draft (music_length_ms: 35000) before paying for a 5-minute render.
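The tips above amount to a simple assembly order: style brief first, an explicit "No vocals." for instrumentals, then the sections in order. A minimal sketch follows, assuming the model accepts everything as one space-joined string as in the examples in this document; the helper compose_prompt is hypothetical.

```python
def compose_prompt(style_brief, sections=(), instrumental=False):
    """Join a style brief and (marker, text) sections into one prompt string.

    sections is an iterable of pairs such as ("Intro 8 bars", "instrumental build.").
    """
    # Normalize the brief so it ends with exactly one period.
    parts = [style_brief.strip().rstrip(".") + "."]
    if instrumental:
        # Stated in the prompt in addition to force_instrumental: true.
        parts.append("No vocals.")
    for marker, text in sections:
        parts.append(f"[{marker}] {text.strip()}")
    return " ".join(parts)
```

The result goes into the prompt field. Keeping sections as data makes it easy to swap only the lyric lines while leaving the style brief untouched.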

Common patterns

Theme song for a video

Full brief + lyrics + [Intro]/[Verse]/[Chorus] structure, music_length_ms matched to the video length

Podcast intro / outro

force_instrumental: true, 10–20 s, "loop-friendly, clean ending"

Game background loop

force_instrumental: true, describe "seamless loop", 60–120 s, consistent groove

Multilingual release (same song, multiple languages)

One call per language, identical style brief, swap only the lyric lines

Iterate then commit

Draft at music_length_ms: 35000 to lock genre/tempo/structure → final render at full length
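The iterate-then-commit pattern can be expressed as two payloads that differ only in duration. This is a sketch under the assumption that the same prompt is reused unchanged for the final render; draft_and_final is a hypothetical helper name.

```python
def draft_and_final(prompt, final_length_ms, draft_length_ms=35_000):
    """Return (draft, final) input dicts that share one prompt."""
    # Never draft longer than the final render itself.
    draft_ms = min(draft_length_ms, final_length_ms)
    draft = {"prompt": prompt, "music_length_ms": draft_ms}
    final = {"prompt": prompt, "music_length_ms": final_length_ms}
    return draft, final
```

Each dict can be serialized with json.dumps and passed to --input, the draft first and the final only once the direction is confirmed.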

Limitations

One prompt field carries everything (style + lyrics). There is no separate "lyrics" parameter.
5 s – 5 min per call (music_length_ms 5000–300000). For longer pieces, generate sections and stitch externally.
Cost scales with duration — a 5-minute render is ~10× a 30-second one.
force_instrumental is the only vocal toggle — you can't request specific voice identities or clone a singer through this endpoint.
This skill pins ElevenLabs Music specifically. Sound effects, text-to-speech, and voice cloning are separate ElevenLabs capabilities that this endpoint does not expose.
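For pieces longer than the per-call cap, the durations of the separate calls have to be planned up front. The sketch below splits a total length into the fewest near-equal segments that fit the 5000–300000 ms range. The even split is an assumption made for illustration, plan_segments is a hypothetical name, and stitching the resulting files is outside its scope.

```python
MIN_LENGTH_MS = 5_000
MAX_LENGTH_MS = 300_000


def plan_segments(total_ms):
    """Split total_ms into the fewest near-equal segments within the per-call limits."""
    if total_ms < MIN_LENGTH_MS:
        raise ValueError(f"total length must be at least {MIN_LENGTH_MS} ms")
    # Ceiling division: smallest number of calls that keeps each one under the cap.
    count = -(-total_ms // MAX_LENGTH_MS)
    base, extra = divmod(total_ms, count)
    # Spread the remainder one millisecond at a time over the first segments.
    return [base + (1 if i < extra else 0) for i in range(count)]
```

A 12-minute piece (720000 ms) comes out as three 240000 ms calls. Each call still needs its own prompt describing that part of the piece.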

Exit codes

code	meaning
0	success
64	bad CLI args
65	bad input JSON / schema mismatch
69	upstream 5xx
75	retryable: timeout / 429
77	not signed in or token rejected

Full reference: docs.runcomfy.com/cli/troubleshooting.
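A wrapper script can use the table above to decide when a retry is worthwhile. The following sketch retries only on code 75, the one the table marks as retryable, with exponential backoff. The retry policy, the delay values, and the name run_with_retry are assumptions for illustration, not documented CLI behavior.

```python
import time

# Meanings copied from the exit-code table above.
EXIT_MEANINGS = {
    0: "success",
    64: "bad CLI args",
    65: "bad input JSON / schema mismatch",
    69: "upstream 5xx",
    75: "retryable: timeout / 429",
    77: "not signed in or token rejected",
}
RETRYABLE = {75}


def run_with_retry(run_once, max_attempts=3, base_delay_s=2.0, sleep=time.sleep):
    """Call run_once() until it returns a non-retryable code or attempts run out.

    run_once is any zero-argument callable that returns a process exit code.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        code = run_once()
        if code not in RETRYABLE or attempt == max_attempts:
            return code
        # Back off 2 s, 4 s, 8 s, ... between attempts (assumed policy).
        sleep(base_delay_s * 2 ** (attempt - 1))
```

In practice run_once would wrap a subprocess call to runcomfy run and return its returncode. Passing the callable in keeps the retry logic testable without the CLI installed.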

How it works

The skill invokes runcomfy run elevenlabs/elevenlabs/music-generation with the JSON body. The CLI POSTs to the RunComfy Model API, polls request status, fetches the result, and downloads the generated audio file into --output-dir. Ctrl-C cancels the remote request before exit.

Security & Privacy

Install via verified package manager only. Use npm i -g @runcomfy/cli or npx -y @runcomfy/cli. Agents must not pipe an arbitrary remote install script into a shell on the user's behalf — if the operator wants the curl-pipe path documented at docs.runcomfy.com/cli/install, they should review the script first.
Token storage: runcomfy login writes the API token to ~/.config/runcomfy/token.json with mode 0600. Set RUNCOMFY_TOKEN env var to bypass the file in CI / containers. Never echo the token into a prompt, log it, or check it in.
Input boundary (shell injection): the prompt is passed as a JSON string via --input. The CLI does not shell-expand prompt content; it transmits the JSON body directly to the Model API over HTTPS. No shell-injection surface from prompt content, even with backticks, quotes, or $(...) patterns.
Lyrics provenance: if the user supplies lyrics, confirm they have the rights to them. Generating music around copyrighted lyrics is the operator's responsibility — the skill does not check.
Outbound endpoints (allowlist): only model-api.runcomfy.net (request submission) and *.runcomfy.net / *.runcomfy.com (download whitelist for generated audio). No telemetry, no callbacks.
Generated-file size cap: the CLI aborts any single download > 2 GiB.
Scope of bash usage: the skill only invokes runcomfy <subcommand> — npm / npx lines are one-time operator setup, not commands the skill executes per call.
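When the CLI is driven from a script rather than typed into a shell, the input-boundary point above is easiest to preserve by passing arguments as a list, so that no shell ever parses the prompt. A minimal sketch, assuming the runcomfy run, --input, and --output-dir usage shown in this document; build_argv and run are hypothetical helper names.

```python
import json
import subprocess

MODEL = "elevenlabs/elevenlabs/music-generation"


def build_argv(payload, output_dir="./out"):
    """Build the argument list for one generation call."""
    # json.dumps handles quoting inside the JSON; the list form means
    # backticks, quotes, and $(...) in the prompt reach the CLI as literal text.
    return [
        "runcomfy", "run", MODEL,
        "--input", json.dumps(payload),
        "--output-dir", output_dir,
    ]


def run(payload, output_dir="./out"):
    """Run the CLI without a shell and return its exit code."""
    return subprocess.run(build_argv(payload, output_dir)).returncode
```

Because subprocess.run receives a list and shell=True is not set, the prompt is handed to the CLI as a single argument with no expansion on the local side.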

Related skills

More from agentspace-so/runcomfy-agent-skills and the wider catalog.

video-edit

agentspace-so/runcomfy-agent-skills

Intent-routed video editing skill: picks Wan 2.7, Kling 2.6, or Lucy Edit based on what you actually want to do.

323k installs

image-to-video

agentspace-so/runcomfy-agent-skills

Animate still images with the right model for your intent—HappyHorse, Wan, or Seedance on RunComfy.

322k installs

nano-banana-2

agentspace-so/runcomfy-agent-skills

Generate images with Google Nano Banana 2 (Gemini flash-tier) via RunComfy CLI — optimized prompting patterns included.

322k installs

image-edit

agentspace-so/runcomfy-agent-skills

Intent-routed image editing: picks the right model (batch, text rewrite, precise local, or inpaint) based on what you ask.

322k installs

nano-banana-edit

agentspace-so/runcomfy-agent-skills

Edit images with Google Nano Banana 2 on RunComfy — batch up to 20 inputs, preserve identity, swap backgrounds, localize edits.

322k installs

flux-kontext

agentspace-so/runcomfy-agent-skills

Edit images precisely with Flux 1 Kontext Pro via RunComfy CLI — single-reference local edits with strong prompt control

322k installs
