Audit score 70

seedance-v2

agentspace-so/runcomfy-agent-skills

Generate cinematic short-form video with ByteDance Seedance 2.0 Pro — multi-modal refs, native lip-sync, via RunComfy CLI

What is seedance-v2?

Seedance 2.0 Pro is a multimodal cinematic video generation model by ByteDance, accessible through the RunComfy Model API. This skill lets coding agents invoke it via `runcomfy run bytedance/seedance-v2/pro`, accepting up to 9 image refs, 3 video clips, and 3 audio refs in a single call. It produces 4–15s videos at up to 720p with in-pass synchronized audio including lip-sync, ambient sound, and music. The skill also documents when to route to alternative models (HappyHorse 1.0, Wan 2.7, Kling, LTX 2) based on use case.

  • Generates 4–15s cinematic video clips from text prompts via ByteDance Seedance 2.0 Pro
  • Accepts up to 9 image references, 3 video clips, and 3 audio references in one call
  • Produces in-pass synchronized audio including speech, lip-sync, SFX, and ambient sound
  • Supports an adaptive default plus fixed aspect ratios 16:9, 9:16, 1:1, 4:3, 3:4, and 21:9
  • Provides seed control for reproducible variant testing
  • Routes CLI calls to RunComfy's hosted model API, polls for results, and downloads output files

How to install seedance-v2

npx skills add https://github.com/agentspace-so/runcomfy-agent-skills --skill seedance-v2
Prerequisites
  • Node.js environment to run npx / npm
  • RunComfy CLI installed globally: `npm i -g @runcomfy/cli`
  • RunComfy account with `runcomfy login` completed (browser device-code flow)
  • For CI/containers: set the `RUNCOMFY_TOKEN=<token>` environment variable instead of running `runcomfy login`
  • Reference media (images, video, audio) must meet spec: videos 2–15s, audio ≤15MB and 2–15s

How to use seedance-v2

  1. Install the skill: `npx skills add https://github.com/agentspace-so/runcomfy-agent-skills --skill seedance-v2`
  2. Ensure the RunComfy CLI is installed and you are logged in (`runcomfy login`)
  3. Invoke the model with a text-only prompt: `runcomfy run bytedance/seedance-v2/pro --input '{"prompt": "<your prompt>"}' --output-dir <path>`
  4. Add an `image_url` array for stable identity references (faces, logos, brand marks)
  5. Add `video_url` and/or `audio_url` arrays for scene or tone references
  6. Set `duration` (4–15s), `aspect_ratio`, `resolution`, and `generate_audio` as needed
  7. Use `seed` for reproducible outputs across variant tests
  8. Check exit codes: 0 = success, 64 = bad CLI args, 65 = bad input, 69 = upstream 5xx, 75 = retryable (timeout / 429), 77 = auth failure
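Step 8 can be automated. The following is a minimal bash sketch, not part of the RunComfy CLI: a hypothetical `run_with_retry` wrapper that reruns a command only while it exits 75 (the documented retryable code) and backs off exponentially. The attempt limit and the `RETRY_BASE_DELAY` variable are this sketch's own assumptions.

```shell
# Hypothetical helper (not part of the RunComfy CLI): rerun a command while it
# exits 75, the CLI's "retryable: timeout / 429" code, with exponential backoff.
# RETRY_BASE_DELAY (seconds, default 2) is this sketch's own knob.
run_with_retry() {
  local max_attempts=$1
  shift
  local attempt=1
  local delay=${RETRY_BASE_DELAY:-2}
  local status
  while true; do
    status=0
    "$@" || status=$?
    # Success, non-retryable failures (64, 65, 69, 77, ...) and the final
    # attempt all return the command's own exit code unchanged.
    if [ "$status" -ne 75 ] || [ "$attempt" -ge "$max_attempts" ]; then
      return "$status"
    fi
    echo "attempt ${attempt} exited 75; retrying in ${delay}s" >&2
    sleep "$delay"
    attempt=$((attempt + 1))
    delay=$((delay * 2))
  done
}
```

Usage, assuming the CLI is installed: `run_with_retry 3 runcomfy run bytedance/seedance-v2/pro --input '{"prompt": "<your prompt>"}' --output-dir <path>`. Codes such as 64, 65 and 77 return immediately, since they need a fix rather than another attempt.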

Use cases

Good for
  • Lip-synced spokesperson or dialogue ads using a character image reference
  • Brand-consistent multi-language video narratives with stable identity via image refs
  • Cinematic short-form film previs combining image, video, and audio references
  • Ad creatives where audio references guide voice tone or mood
  • Reproducible video variant testing using seed control
Who it's for
  • Video marketers producing spokesperson or dialogue ad content
  • Creative directors doing cinematic short-form previs
  • Developers automating video generation pipelines via CLI
  • Brand teams needing identity-consistent video across languages
  • Agents orchestrating multi-modal media production workflows

seedance-v2 FAQ

How is Seedance 2.0 Pro different from HappyHorse 1.0 or Wan 2.7?

Seedance 2.0 Pro excels at multi-modal reference inputs (image + video + audio) and native in-pass lip-sync. HappyHorse 1.0 currently ranks #1 in blind-vote video quality. Wan 2.7 is better when you want lip-sync driven by your own audio track via `audio_url`. Use Kling for motion editing on existing footage.

Can I generate videos longer than 15 seconds?

No. The endpoint enforces a 4–15 second duration limit. For longer content, segment into multiple calls.
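Segmenting means choosing a duration for each call. A minimal bash sketch of one approach, assuming you want the fewest calls with evenly spread lengths: `split_duration` is a hypothetical helper, and joining the resulting clips (and keeping them visually continuous) is outside what this endpoint documents.

```shell
# Hypothetical helper: split a total length (whole seconds) into per-call
# durations that each fall inside the endpoint's 4–15s range.
split_duration() {
  local total=$1 min=4 max=15
  if ! [[ $total =~ ^[1-9][0-9]*$ ]] || (( total < min )); then
    echo "total must be a whole number of seconds >= ${min}" >&2
    return 1
  fi
  # Fewest calls that fit, then spread the seconds as evenly as possible.
  local n=$(( (total + max - 1) / max ))
  local base=$(( total / n )) rem=$(( total % n ))
  local i out=""
  for (( i = 0; i < n; i++ )); do
    if (( i < rem )); then
      out+="$(( base + 1 )) "
    else
      out+="${base} "
    fi
  done
  echo "${out% }"
}
```

For example, a 31-second target becomes three calls of 11, 10 and 10 seconds. Reusing the same `image_url` references in each call should help hold identity across segments.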

How do I keep a character's face consistent across the video?

Put the face or identity in `image_url` rather than describing it in the prompt. Verbal descriptions of stable identity waste tokens and cause drift.

Does generate_audio produce lip-sync automatically?

Yes, when `generate_audio` is true (the default), the model generates synchronized speech, SFX, and music in-pass. Lip-sync quality depends on prompt clarity and is not guaranteed perfect under all conditions.

How do I authenticate in a CI environment without a browser?

Set the `RUNCOMFY_TOKEN=<token>` environment variable instead of running `runcomfy login`. The CLI will use the token directly.
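In CI it can help to fail fast before the first model call. A minimal bash sketch, assuming the token locations described under Security & Privacy (the `RUNCOMFY_TOKEN` variable, or `~/.config/runcomfy/token.json` written by `runcomfy login`). `has_runcomfy_credentials` is a hypothetical name; it only checks that a token is present, not that it is valid, so the CLI can still exit 77 if the token is rejected.

```shell
# Hypothetical pre-flight check: succeed if RUNCOMFY_TOKEN is set and non-empty,
# or if the token file written by `runcomfy login` exists and is non-empty.
# It does not validate the token; the CLI does that on the actual call.
has_runcomfy_credentials() {
  if [ -n "${RUNCOMFY_TOKEN:-}" ]; then
    return 0
  fi
  [ -s "${HOME}/.config/runcomfy/token.json" ]
}
```

A CI step could then run `has_runcomfy_credentials || { echo "set RUNCOMFY_TOKEN" >&2; exit 77; }`, reusing the CLI's own "not signed in" code.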

Full instructions (SKILL.md)

Source of truth, from agentspace-so/runcomfy-agent-skills.


name: seedance-v2
displayName: "Seedance 2.0 Pro — Pro Pack on RunComfy"
description: >
  Generate cinematic short-form video with ByteDance Seedance 2.0 Pro on RunComfy.
  Documents Seedance 2.0 Pro's strengths (multi-modal references — up to 9 images,
  3 videos, 3 audio — synchronized in-pass audio with natural lip-sync, cinematic
  motion refinement), the 4–15s duration schema, and when to route to
  HappyHorse 1.0 / Wan 2.7 / Kling instead. Calls runcomfy run bytedance/seedance-v2/pro
  through the local RunComfy CLI. Triggers on "seedance", "seedance 2", "seedance v2",
  "seedance pro", "bytedance video", or any explicit ask to generate video with this model.
homepage: https://www.runcomfy.com
license: MIT

Seedance 2.0 Pro — Pro Pack on RunComfy


ByteDance Seedance 2.0 Pro — multimodal cinematic video generator with native lip-synced audio — hosted on the RunComfy Model API.

npx skills add agentspace-so/runcomfy-skills --skill seedance-v2 -g

When to pick this model (vs siblings)

Seedance 2.0 Pro's distinct strength is multi-modal cinematic short-form: combine character images + scene videos + reference audio into one coherent shot. Pick it when fidelity to a reference identity / scene matters and you want native lip-sync.

| You want | Use |
| --- | --- |
| Lip-synced spokesperson / dialogue ad | Seedance 2.0 Pro |
| Multi-modal references (image + video + audio) | Seedance 2.0 Pro |
| Brand-consistent multi-language narrative | Seedance 2.0 Pro |
| Currently-#1 blind-vote video quality | HappyHorse 1.0 |
| Audio-driven lip-sync from your own track | Wan 2.7 (audio_url) |
| Motion editing on existing footage | Kling Video O1 |
| Ultra-fast iteration | LTX 2 |

If the user said "Seedance" / "Seedance 2" / "ByteDance video" explicitly, route here regardless.

Prerequisites

  1. RunComfy CLI: `npm i -g @runcomfy/cli`
  2. RunComfy account: `runcomfy login` opens a browser device-code flow.
  3. CI / containers: set `RUNCOMFY_TOKEN=<token>` instead of running `runcomfy login`.

Endpoints + input schema

bytedance/seedance-v2/pro

| Field | Type | Required | Default | Notes |
| --- | --- | --- | --- | --- |
| prompt | string | yes | | CN ≤ 500 chars OR EN ≤ 1000 words. |
| image_url | array | no | [] | 0–9 references (JPEG/PNG/WebP/BMP/TIFF/GIF). |
| video_url | array | no | [] | 0–3 clips (MP4/MOV), 2–15s each. |
| audio_url | array | no | [] | 0–3 audio refs (WAV/MP3), 2–15s, < 15MB each. |
| aspect_ratio | enum | no | adaptive | adaptive, 16:9, 9:16, 4:3, 3:4, 1:1, 21:9. |
| duration | int | no | 5 | 4–15 (whole seconds). |
| resolution | enum | no | 720p | 480p or 720p. |
| generate_audio | bool | no | true | In-pass synchronized speech / SFX / music. |
| seed | int | no | | Reproducibility. |
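The table's numeric and enum limits can be checked locally before a call is spent. A minimal bash sketch, assuming you track the reference counts yourself: `validate_seedance_input` is a hypothetical helper covering duration, aspect_ratio, resolution and the 9/3/3 reference caps; it does not parse the JSON body or check prompt length.

```shell
# Hypothetical pre-flight check mirroring the schema table; the Model API
# remains the authority and may enforce rules this sketch does not know about.
# Args: duration aspect_ratio resolution [image_count video_count audio_count]
validate_seedance_input() {
  local duration=$1 aspect_ratio=$2 resolution=$3
  local images=${4:-0} videos=${5:-0} audios=${6:-0}
  # Whole seconds only; the regex also avoids bash reading "08" as octal.
  if ! [[ $duration =~ ^[1-9][0-9]*$ ]] || (( duration < 4 || duration > 15 )); then
    echo "duration must be whole seconds in 4-15, got: ${duration}" >&2
    return 1
  fi
  case $aspect_ratio in
    adaptive|16:9|9:16|4:3|3:4|1:1|21:9) ;;
    *) echo "unsupported aspect_ratio: ${aspect_ratio}" >&2; return 1 ;;
  esac
  case $resolution in
    480p|720p) ;;
    *) echo "unsupported resolution: ${resolution}" >&2; return 1 ;;
  esac
  if (( images > 9 || videos > 3 || audios > 3 )); then
    echo "too many references: ${images} images, ${videos} videos, ${audios} audio" >&2
    return 1
  fi
}
```

For example, `validate_seedance_input 8 9:16 720p 1 0 0` succeeds, while a duration of 16 fails before any request is made.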

How to invoke

Default (text only, 5s, 720p with audio):

runcomfy run bytedance/seedance-v2/pro \
  --input '{"prompt": "<user prompt>"}' \
  --output-dir <absolute/path>

Lip-synced ad with character reference (stable identity from the image, evolving action from the prompt):

runcomfy run bytedance/seedance-v2/pro \
  --input '{
    "prompt": "Medium close-up. The woman explains today'\''s special in a warm friendly tone, slow push-in, soft window light, gentle cafe ambience.",
    "image_url": ["https://.../barista-headshot.jpg"],
    "duration": 8,
    "aspect_ratio": "9:16"
  }' \
  --output-dir <absolute/path>
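The `'\''` sequence in the prompt above is how a single quote is written inside a single-quoted shell string. Prompts that also contain double quotes, backslashes or newlines need JSON escaping as well. A minimal bash sketch with a hypothetical `json_escape` helper, covering only the characters listed in its comments:

```shell
# Hypothetical helper: print a prompt as a quoted JSON string so it can be
# embedded in --input without hand-writing '\'' sequences. Handles backslash,
# double quote, newline, carriage return and tab; other control characters
# are not escaped by this sketch. Non-ASCII text passes through unchanged.
json_escape() {
  local s=$1
  s=${s//\\/\\\\}       # backslash first, so later escapes are not doubled
  s=${s//\"/\\\"}       # double quote
  s=${s//$'\n'/\\n}     # newline
  s=${s//$'\r'/\\r}     # carriage return
  s=${s//$'\t'/\\t}     # tab
  printf '"%s"' "$s"
}
```

A call could then pass `--input "{\"prompt\": $(json_escape "$prompt"), \"duration\": 8}"`, letting the shell handle the single quote in "today's" without special escaping.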

Multi-modal (image + video + audio refs):

runcomfy run bytedance/seedance-v2/pro \
  --input '{
    "prompt": "Subject from image 1 walks through the café from video 1, voice tone matches audio 1.",
    "image_url": ["https://.../subject.jpg"],
    "video_url": ["https://.../cafe-locked-shot.mp4"],
    "audio_url": ["https://.../voice-ref.mp3"]
  }' \
  --output-dir <absolute/path>

The CLI submits the request, polls it, fetches the result, and downloads any *.runcomfy.net / *.runcomfy.com URLs into --output-dir.
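The download rule amounts to a host allowlist. A minimal bash sketch of that check, for example when post-processing result URLs in your own scripts: `is_allowed_download_url` is hypothetical, the `https://` requirement is an assumption, and the CLI's actual filtering may differ in details.

```shell
# Hypothetical check expressing the documented download allowlist
# (*.runcomfy.net / *.runcomfy.com). Requiring https:// is this sketch's own
# assumption. Uppercase hosts are rejected rather than normalized.
is_allowed_download_url() {
  local url=$1 rest host
  case $url in
    https://*) rest=${url#https://} ;;
    *) return 1 ;;
  esac
  host=${rest%%[/?#]*}   # cut path, query, or fragment
  host=${host##*@}       # drop any userinfo before the host
  host=${host%%:*}       # drop an explicit port
  case $host in
    *.runcomfy.net|*.runcomfy.com) return 0 ;;
    *) return 1 ;;
  esac
}
```

Cutting at `?` and `#` before matching matters: otherwise a URL such as `https://evil.example#.runcomfy.net` would appear to end in an allowed domain.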

Prompting — what actually works

Image vs text division. This is the single most important rule. Stable identity (face, costume, brand mark, logo) → put in image_url. Evolving narrative (action, mood, lighting, camera) → put in prompt. Trying to verbally describe a face in detail wastes tokens and produces drift.

Camera + motion in plain language. "Medium close-up", "slow push-in", "handheld follow", "locked-off wide" all work as directives. Combine: "Medium close-up. Slow push-in over 3 seconds. Handheld, slight breathing motion."

Audio direction. With generate_audio: true, state the tone: "warm friendly conversational", "calm instructional", "crisp newsroom delivery". For ambient sound: "gentle cafe chatter, distant traffic, no foreground music".

Reference media specs. Videos must be 2–15s; audio must be ≤15MB and 2–15s. Out-of-range files are rejected. Match the aspect ratio of your references to the output to avoid crops.
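Reference media are passed as URLs, so these limits apply to files you host yourself; if the files are local before upload, a size check can catch oversized audio early. A minimal bash sketch with `check_audio_ref_size` as a hypothetical helper:

```shell
# Hypothetical pre-upload check for a local audio reference file.
# "15MB" is read as 15,000,000 bytes (the stricter reading) and the check is
# strict "<", so it may reject a few files the API would accept.
# Duration (2–15s) is not checked: that would need a media probe tool.
check_audio_ref_size() {
  local file=$1 limit=15000000 size
  if [ ! -f "$file" ]; then
    echo "not a regular file: ${file}" >&2
    return 1
  fi
  size=$(wc -c < "$file")
  size=${size//[[:space:]]/}   # BSD wc pads the count with spaces
  if (( size >= limit )); then
    echo "${file} is ${size} bytes; audio references must be under 15MB" >&2
    return 1
  fi
}
```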

Anti-patterns:

  • Mixing radically different aesthetic refs (watercolor + photoreal) → confuses the model.
  • Conflicting style cues in prompt → simplify by removing contradictions.
  • Trying to describe stable identity verbally → use image_url instead.
  • Asking for >15s clips → 422; segment into multiple calls.

Where it shines

| Use case | Why Seedance 2.0 Pro |
| --- | --- |
| Spokesperson / dialogue ads | Native in-pass lip-sync, no separate TTS step |
| Brand-consistent multi-language narratives | Image refs hold identity; text drives translation |
| Cinematic short-form film previs | Camera-shot grammar + multi-modal refs |
| Ad creatives with reference music / VO tone | Audio refs guide voice / mood without locking lip-sync |
| Reproducible variant testing | Seed control + fixed schema |
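For seed-based variant testing, the idea is to hold every field fixed except `seed`. A minimal bash sketch: `variant_inputs` is hypothetical, the duration and aspect_ratio values are illustrative only, and the prompt is assumed to need no JSON escaping.

```shell
# Hypothetical helper: print one --input body per seed, identical otherwise.
# The prompt must not contain double quotes, backslashes or newlines, because
# it is inserted into the JSON without escaping.
variant_inputs() {
  local prompt=$1
  shift
  local seed
  for seed in "$@"; do
    printf '{"prompt": "%s", "duration": 5, "aspect_ratio": "9:16", "seed": %d}\n' \
      "$prompt" "$seed"
  done
}
```

Each output line can be passed as `--input "$body"` to runcomfy run bytedance/seedance-v2/pro, ideally with a separate --output-dir per seed so the variants stay apart.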

Sample prompts (verified to produce strong results)

Default playground example:

Golden hour on a quiet cafe terrace: a barista wipes the counter, then
looks up and explains today's special in a friendly tone, natural
lip-sync. Medium close-up, slow push-in; warm side light, soft bokeh
through glass, gentle cafe ambience and subtle film grain.

Multi-modal lip-sync (text + image):

Same person as image 1 in a softly-lit recording booth, leaning into
the mic, says: "We just shipped the biggest update of the year."
Calm conversational tone. Medium close-up, locked tripod, shallow DOF,
warm key light from camera-left.

Limitations

  • Duration 4–15s — no longer clips on this endpoint.
  • Resolution ceiling 720p on the playground variant.
  • Reference media specs — videos / audio must be 2–15s; audio < 15MB.
  • Lip-sync quality — depends on prompt clarity; not guaranteed perfect under all conditions.
  • No @-syntax for character binding — relies on image refs + prompt alignment.

Exit codes

| Code | Meaning |
| --- | --- |
| 0 | success |
| 64 | bad CLI args |
| 65 | bad input JSON / schema mismatch |
| 69 | upstream 5xx |
| 75 | retryable: timeout / 429 |
| 77 | not signed in or token rejected |

Full reference: docs.runcomfy.com/cli/troubleshooting.
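In scripts, the exit code table can drive the next step. A minimal bash sketch with a hypothetical `runcomfy_exit_action` helper; the suggested actions are this sketch's reading of the table, not guidance from the CLI.

```shell
# Hypothetical mapping from runcomfy exit codes (per the table above) to a
# suggested next step. Unknown codes are reported rather than guessed at.
runcomfy_exit_action() {
  case $1 in
    0)  echo "success" ;;
    64) echo "fix CLI arguments" ;;
    65) echo "fix input JSON to match the schema" ;;
    69) echo "upstream 5xx: retry later" ;;
    75) echo "retryable: back off and retry" ;;
    77) echo "sign in: runcomfy login or set RUNCOMFY_TOKEN" ;;
    *)  echo "unknown exit code $1" ;;
  esac
}
```

For example: `runcomfy run bytedance/seedance-v2/pro --input "$body" --output-dir "$out"; runcomfy_exit_action $?`.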

How it works

The skill invokes runcomfy run bytedance/seedance-v2/pro with a JSON body matching the schema. The CLI POSTs to https://model-api.runcomfy.net/v1/models/bytedance/seedance-v2/pro, polls the request, fetches the result, and downloads any *.runcomfy.net / *.runcomfy.com URL into --output-dir. Ctrl-C cancels the remote request before exit.

Security & Privacy

  • Token storage: runcomfy login writes the API token to ~/.config/runcomfy/token.json with mode 0600 (owner-only read/write). Set RUNCOMFY_TOKEN env var to bypass the file entirely in CI / containers.
  • Input boundary: the user prompt is passed as a JSON string to the CLI via --input. The CLI does NOT shell-expand the prompt; it transmits the JSON body directly to the Model API over HTTPS. No shell injection surface from prompt content.
  • Third-party content: image / mask / video URLs you pass are fetched by the RunComfy model server, not by the CLI on your machine. Treat external URLs as untrusted; image-based prompt injection is a known risk for any image-edit / video-edit model.
  • Outbound endpoints: only model-api.runcomfy.net (request submission) and *.runcomfy.net / *.runcomfy.com (download whitelist for generated outputs). No telemetry, no callbacks.
  • Generated-file size cap: the CLI aborts any single download > 2 GiB to prevent disk-fill from a malicious or runaway model output.