PluginBench
Skill
Fail
Audit score 45

music-to-video

heygen-com/hyperframes

Turn music tracks into beat-synced videos with typography, templates, and optional media—no narration needed.

What is music-to-video?

music-to-video analyzes a music track once and orchestrates a beat-synced HyperFrames video pipeline. Use it when you have a music file (or video to extract audio from) and want a complete video driven by the beat grid, with typography and templates as the foundation and optional user images/videos cut in on the same beat grid.

  • Analyze music once with a canonical beat-grid analyzer to extract timing, energy phases, onsets, and key moments
  • Lay out frames at real musical changes (drops, surges, rolls, silence stretches) snapping to audiomap anchors
  • Build a complete per-frame plan with templates, typography, motion, and optional media placement
  • Dispatch one sub-agent per frame to compose each frame as an HTML file
  • Assemble frames into a final video composition and render to MP4
  • Support any genre (lyric video, slideshow, kinetic promo) through per-frame choices without branching logic

How to install music-to-video

npx skills add https://github.com/heygen-com/hyperframes --skill music-to-video
Prerequisites
  • Python 3 with librosa, numpy, and soundfile installed (for beat analysis)
  • A music track (audio file or video to extract audio from)
  • Optional: user-supplied images or videos to weave into frames

How to use music-to-video

  1. Run Step 0 (Setup): Initialize the project, establish the music source (user file or generated), and stage any user media
  2. Run Step 1 (Analyze): Execute analyze-beatgrid.py to produce audiomap.json with timing, energy, and beat data
  3. Run Step 2 (Skeleton): Read the audiomap and lay out frames at musical boundaries, setting span_sec, pacing, mood, and feel for each
  4. Run Step 3 (Plan): Fill in the complete STORYBOARD.md with templates, typography, motion, and media placement; get user approval
  5. Run Step 4 (Build): Dispatch one frame-worker sub-agent per frame to compose each frame as an HTML file
  6. Run Step 5 (Assemble): Combine all frame compositions into a single index.html
  7. Run Step 6 (Render): Render the final composition to video.mp4

Use cases

Good for
  • Create a beat-synced music video with user-supplied images or video clips cut to the beat grid
  • Generate a kinetic typography promo or slideshow driven entirely by a music track
  • Build a lyric video or visual loop where the beat grid determines all frame cuts and transitions
  • Produce a complete video from just a music file and optional media—zero assets required if using templates alone
  • Turn a calm or rhythmic track into a synchronized visual composition with phrase-based or beat-based pacing
Who it's for
  • Music producers and artists wanting quick beat-synced video content
  • Marketing teams creating kinetic promos or visual loops
  • Content creators building lyric videos or music visualizers
  • Developers using HyperFrames to automate music-driven video generation

music-to-video FAQ

Do I need to supply images or videos?

No. Typography and templates are the floor—a complete video needs zero assets. Any images or videos you supply are optional and will be cut into frames on the same beat grid.

How does the analyzer decide the beat grid?

analyze-beatgrid.py reads the track once and extracts energy phases, onsets, rolls, and silences. Its bpm and beats_sec are reliable only on genuinely rhythmic music; on calm music, the grid is a metronome imposed by the tool, so you pace by phrases and energy instead.

What genres can this produce?

Any genre (lyric video, slideshow, kinetic promo) emerges from per-frame template and motion choices. The pipeline does not branch on genre—the same workflow handles all of them.

How many frames will my video have?

Frame count is determined by real musical changes (drops, surges, rolls, silence stretches), not by beat count. A fast track does not blow up the frame count because extra density goes inside groups within a frame.

Can I use a video file as the music source?

Yes. If you provide a video, the skill will extract the audio and use it as bgm.mp3. If you provide only a music file, it uses that directly.

Full instructions (SKILL.md)

Source of truth, from heygen-com/hyperframes.


name: music-to-video
description: "Use when the user has a music track (an audio file, or a video to pull audio from) and wants a beat-synced HyperFrames video, calm to hard-hitting. The music drives everything: one analyzer reads it once, the orchestrator lays out the frames and fills a per-frame plan, and one sub-agent builds each frame. Typography and templates are the floor (a complete video needs zero assets), but any images or videos the user supplies are cut into the frames on the same beat grid (beat-cut / ken-burns). The genre (lyric video, slideshow, kinetic promo) falls out of the per-frame choices; the pipeline never branches on it."

music-to-video — one music-grounded, beat-synced video workflow

Use this skill to turn a music track into a beat-synced HyperFrames video. You analyze the track once, lay out the frames, fill in a per-frame plan, and build each frame as a composition. The input is a music track plus optional user images or videos — there is no narration and no website capture. Typography and templates are the floor (a complete video needs zero assets); any media the user supplies is cut in on the same beat grid.

You are the orchestrator. Work in videos/<project>/. Run the steps in order and pass each Gate before moving on. Two steps need the user: Step 3 (plan approval) and Step 6 (render approval). Do every step yourself except Step 4, where you dispatch one sub-agent per frame. Keep design and motion rules out of this file — they live in references/ and the frame-worker sub-agent.

SKILL_DIR = this skill directory. PROJECT_DIR = videos/<project-name>/.

Workflow: Step 0 setup → hyperframes.json + assets/bgm.mp3; Step 1 analyze → audiomap.json; Step 2 skeleton → STORYBOARD.md (frames, groups TBD); Step 3 plan → complete STORYBOARD.md + frame.md; Step 4 build → compositions/frames/NN-*.html; Step 5 assemble → index.html; Step 6 render → renders/video.mp4.

Two ideas that shape everything

  • One analyzer, and you trust it. analyze-beatgrid.py is the only beat analyzer — never re-measure beats with another tool or by ear. Its energy / density / rolls / onsets / silences are always reliable. Its bpm and beats_sec are reliable only when the music is genuinely rhythmic; on calm music the grid is a metronome the tracker imposed, so pace by phrases and energy instead and never hard-cut to it. Deciding which case you're in is each frame's pacing (Step 2).
  • One frame = one file; groups live inside. Step 2 cuts the track into frames, and each frame becomes one composition file compositions/frames/NN-<frame_id>.html, built by one frame-worker. A frame can subdivide into groups (each a template or a motion-primitives combo). Extra density goes inside a group, so frame count tracks distinct treatments, not beats — a fast track does not blow up the number of sub-agents.

Step 0: Setup, BGM, and inputs

Goal: Establish the music source, create the HyperFrames project, and note any user-supplied media.

The music is the spine: establish one track before anything else. This skill is tuned for fast, high-energy BGM, where a strong beat grid drives the cuts (calm tracks work, but pace by phrase rather than beat). If the user gave you audio (a music file, or a video to pull the audio from), use it.

If not, generate one: choose the mood from the user's description (e.g. "driving synthwave", "trap beat", "upbeat corporate") and produce a track via /hyperframes-media (references/bgm.md: HeyGen retrieval when credentialed, else local Lyria / MusicGen; ElevenLabs or another generator also works).

Before generating, run npx hyperframes auth status and relay its output verbatim (don't paraphrase or rewrite it). It shows whether BGM comes from HeyGen or local MusicGen and, if not signed in, how to sign in. If not signed in, STOP and wait for the user to choose (sign in, or continue offline with local MusicGen) before generating the track; don't write keys into a per-repo .env. (In autonomous mode, note the status and continue offline.) See /hyperframes-media → Preflight for the canonical guidance.

Either way the track lands at assets/bgm.mp3. Stage any user-supplied images or videos so frames can weave them in on the beat grid; otherwise typography carries the whole video.

Initialize only if hyperframes.json is missing. Name <project> from the brief in kebab-case, such as midnight-drive-loop — never a timestamp. init checks the installed skills against the latest on GitHub and updates the global set if any are out of date.
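The kebab-case naming rule can be illustrated with a small helper. This is only a sketch: the function name to_kebab_case is hypothetical and not part of the skill, and it does not enforce the "never a timestamp" rule, which remains a judgment call.

```python
import re


def to_kebab_case(brief: str) -> str:
    """Lowercase the brief and join its alphanumeric runs with hyphens."""
    # Hypothetical helper: every run of letters or digits becomes one word.
    words = re.findall(r"[a-z0-9]+", brief.lower())
    return "-".join(words)


print(to_kebab_case("Midnight Drive Loop"))  # midnight-drive-loop
```

The result is what would stand in for <project> in videos/<project>.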

npx hyperframes init "videos/<project>" --non-interactive --example=blank
mkdir -p "$PROJECT_DIR/assets" "$PROJECT_DIR/renders"
cp "<user-music>" "$PROJECT_DIR/assets/bgm.mp3"   # extract from a video first if needed
# only if the user gave you images/videos:
node <SKILL_DIR>/scripts/stage-assets.mjs --from <dir> --hyperframes "$PROJECT_DIR" --into public
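The comment "extract from a video first if needed" does not name a tool. One possible approach, assuming ffmpeg is installed and on PATH (an assumption, not something the skill specifies), is to build an ffmpeg command that drops the video stream and encodes the audio as MP3. The helper name below is hypothetical.

```python
from pathlib import Path


def ffmpeg_extract_audio_cmd(video: str, project_dir: str) -> list[str]:
    """Build (but do not run) an ffmpeg command that writes assets/bgm.mp3."""
    out = Path(project_dir) / "assets" / "bgm.mp3"
    return [
        "ffmpeg",
        "-y",                       # overwrite an existing bgm.mp3
        "-i", video,                # input video supplied by the user
        "-vn",                      # drop the video stream, keep audio only
        "-codec:a", "libmp3lame",   # encode the audio as MP3
        "-q:a", "2",                # high-quality variable bitrate
        out.as_posix(),
    ]


cmd = ffmpeg_extract_audio_cmd("clip.mp4", "videos/midnight-drive-loop")
```

Passing cmd to subprocess.run(cmd, check=True) would perform the extraction; the sketch stops at building the argument list so it can be inspected first.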

The brand (font + palette) is chosen at Step 3, not here. Don't pick a genre or a track type up front — assets are just an optional ingredient, and the genre emerges from the per-frame choices.

Gate: hyperframes.json + assets/bgm.mp3 exist; aspect / length / fps and (if any) the asset inventory are noted.


Step 1: Analyze the music

Goal: Produce the one canonical timing analysis the whole video is built on.

analyze-beatgrid.py is the only beat analyzer — never re-measure beats with another tool or by ear. It reads the track once and writes audiomap.json: energy phases (level / density / feel), onsets + onset_rate, rolls, silences, hard_stops, key_moments, phrases, tempo / grid, and audio.duration_sec. It's deterministic — the same file always gives the same map. Most fields are reliable on any music; bpm and beats_sec are reliable only when the music is genuinely rhythmic, and judging that is the call you make at Step 2.

Prerequisites: Python 3 with librosa, numpy, and soundfile available. If import fails, install them into the active Python environment before running the analyzer:

python3 -m pip install librosa numpy soundfile
python3 <SKILL_DIR>/scripts/analyze-beatgrid.py "$PROJECT_DIR/assets/bgm.mp3" \
  -o "$PROJECT_DIR/audiomap.json" --print
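To see whether the install line is needed at all, the three prerequisites can be probed without importing them. A minimal sketch, assuming only the standard library; missing_modules is a hypothetical name, not part of the skill.

```python
import importlib.util

# The three packages the analyzer's prerequisites name.
REQUIRED = ("librosa", "numpy", "soundfile")


def missing_modules(names=REQUIRED):
    """Return the names that cannot be found in the active Python environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


print(missing_modules())
```

An empty list means the pip install step can be skipped.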

Gate: audiomap.json exists; audio.duration_sec is known.
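The second half of this gate can be checked mechanically once audiomap.json is loaded. The sketch below assumes audio.duration_sec is a nested key (an "audio" object holding "duration_sec"), which matches the dotted path used in this document but is not confirmed against the analyzer's output. The function name is hypothetical.

```python
def audiomap_duration(audiomap: dict) -> float:
    """Return audio.duration_sec, or raise if it is missing or not positive."""
    duration = audiomap.get("audio", {}).get("duration_sec")
    is_number = isinstance(duration, (int, float)) and not isinstance(duration, bool)
    if not is_number or duration <= 0:
        raise ValueError("audio.duration_sec is missing or not a positive number")
    return float(duration)


# Stand-in for json.load() on PROJECT_DIR/audiomap.json.
print(audiomap_duration({"audio": {"duration_sec": 42.5}}))  # 42.5
```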


Step 2: Frame skeleton (structure only)

Goal: Read the music and lay out the frames — the skeleton of STORYBOARD.md.

Read references/frame-skeleton.md. Turn audiomap.json into the skeleton of STORYBOARD.md yourself; there is no intermediate JSON. Cut the track into frames at real musical changes (hard_stops, SURGE / DROP key_moments, the edges of a roll, a stretch with no onsets, a big energy jump), snapping every boundary to an audiomap anchor. For each frame set span_sec, pacing (your verdict on whether to trust Step 1's grid: beat_cut when the grid is real, phrase_flow when it's a metronome imposed on calm music), mood, and a one-line feel (the plain music situation Step 3 matches a template against). Only classify and lay out here: leave every frame's ### Groups as TBD (Step 3) and the frontmatter style blank, with no templates, copy, color, or fonts. Expect ~1–6 frames.
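Snapping a proposed boundary to an audiomap anchor amounts to a nearest-neighbor lookup. A minimal sketch, assuming the anchor times (hard stops, key moments, roll edges and so on) have already been collected into one sorted list of seconds; how they are gathered from audiomap.json is not shown, and snap_to_anchor is a hypothetical name.

```python
from bisect import bisect_left


def snap_to_anchor(t: float, anchors: list[float]) -> float:
    """Return the anchor time closest to t; anchors must be sorted ascending."""
    if not anchors:
        raise ValueError("no anchors to snap to")
    i = bisect_left(anchors, t)
    # Only the neighbors on either side of t can be the closest anchor.
    candidates = anchors[max(i - 1, 0): i + 1]
    return min(candidates, key=lambda a: abs(a - t))


anchors = [0.0, 7.5, 15.2, 30.0]
print(snap_to_anchor(14.9, anchors))  # 15.2
```

On an exact tie the earlier anchor wins, which is an arbitrary choice in this sketch.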

Gate: frames tile the track (the first starts at 0, the last ends at audio.duration_sec); each carries span_sec + pacing + mood + feel; every ### Groups is TBD; no content anywhere.
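The tiling condition in this gate can be sketched as a check over the frames' span_sec values. This is an illustration only: the authoritative check is validate-plan.mjs at Step 3, and the function name and the tolerance value are assumptions.

```python
def check_tiling(spans, duration_sec, tol=1e-6):
    """Return a list of problems; an empty list means the frames tile the track."""
    if not spans:
        return ["no frames"]
    problems = []
    if abs(spans[0][0]) > tol:
        problems.append(f"first frame starts at {spans[0][0]}, not 0")
    for start, end in spans:
        if end <= start:
            problems.append(f"frame [{start}, {end}] has no positive length")
    # Each frame must begin exactly where the previous one ended.
    for (_, prev_end), (next_start, _) in zip(spans, spans[1:]):
        if abs(prev_end - next_start) > tol:
            problems.append(f"gap or overlap between {prev_end} and {next_start}")
    if abs(spans[-1][1] - duration_sec) > tol:
        problems.append(f"last frame ends at {spans[-1][1]}, not {duration_sec}")
    return problems


print(check_tiling([(0.0, 12.0), (12.0, 30.0)], 30.0))  # []
```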


Step 3: Fill the plan (user-gated)

Goal: Turn the skeleton into an approved, complete STORYBOARD.md.

Read references/planning.md, storyboard-format.md, template-catalog.md, motion-primitive-catalog.md, and montage.md (only if the user supplied assets). Editing the same file in place, do two things:

  1. Pick the brand. Choose one preset from ../hyperframes-creative/frame-presets/ using the table in ../hyperframes-creative/references/design-spec.md (match the track's mood; only its fonts and colors matter — templates own composition). Copy it into frame.md unmodified and fill the frontmatter style (font + a ≤4–6 swatch palette) from it.
  2. Fill every frame. Decide its groups and give each a treatment: a matched template from the catalog (with bound params and real audiomap anchors), a free-compose from the primitive catalog, or an asset treatment that obeys pacing. Write the copy. You own WHAT (template / primitives + content + anchors); the frame-worker owns HOW — never write millisecond tweens into the storyboard.
node <SKILL_DIR>/scripts/validate-plan.mjs --storyboard "$PROJECT_DIR/STORYBOARD.md" \
  --audiomap "$PROJECT_DIR/audiomap.json" --templates <SKILL_DIR>/references/templates

Fix every hard error (duration mismatch, frames not tiling the track, a missing src); warnings are best-effort. Then show the user a frame-by-frame summary and iterate until they approve.

Gate: frame.md is a verbatim preset copy; validate-plan.mjs exits 0; the user approved the plan.


Step 4: Build frames from the plan

Goal: Build every frame as a self-contained composition file.

Create compositions/frames/. Read sub-agents/frame-worker.md and ../hyperframes-core/references/subagent-dispatch.md. Dispatch one frame-worker per frame, in parallel where possible (otherwise in waves). Each worker gets exactly one frame and this context:

PROJECT_DIR: <abs path>
frame_id: <NN-frame_id>              # = the frame file stem, e.g. 02-f2; the composition id
Your block: the `## Frame N — <frame_id>` block in PROJECT_DIR/STORYBOARD.md
audiomap: PROJECT_DIR/audiomap.json
frame.md: PROJECT_DIR/frame.md
Materials: for each group, <SKILL_DIR>/references/templates/<id>/index.html (templates) and
           <SKILL_DIR>/references/motion-primitives/<id>/ (free); staged assets/ (asset groups)
Contracts: ../hyperframes-core/references/sub-compositions.md + determinism-rules.md
Canvas: <w>×<h>   Pacing: <beat_cut|phrase_flow>
Write to: PROJECT_DIR/compositions/frames/<frame_id>.html

The worker forks the cited materials, converts every anchor to frame-local seconds (local_t = track_t − span_sec[0]), gates its groups with 0ms cuts, and writes one seek-safe frame file. The worker never runs the hyperframes CLI — those commands operate on the assembled project, which doesn't exist yet, so they'd report on the wrong files. The worker just writes to the contract and stops; you verify after assembly (Step 6). As each worker returns, you can confirm its file landed on disk.
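The anchor conversion local_t = track_t − span_sec[0] is a plain offset. A minimal sketch, assuming span_sec is a [start, end] pair in track seconds; the range check is an added safeguard, not something the skill specifies, and to_frame_local is a hypothetical name.

```python
def to_frame_local(track_t: float, span_sec) -> float:
    """Convert a track-time anchor to seconds measured from the frame's start."""
    start, end = span_sec
    if not start <= track_t <= end:
        raise ValueError(f"anchor {track_t} lies outside the frame span {span_sec}")
    return track_t - start


# A frame spanning 12.0 s to 30.0 s of the track, with two anchors inside it.
print([to_frame_local(t, (12.0, 30.0)) for t in (12.0, 15.5)])  # [0.0, 3.5]
```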

Gate: every frame has its compositions/frames/NN-*.html on disk.
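This gate can be checked by listing the frame stems whose file is absent or empty. A sketch only: missing_frame_files is a hypothetical helper, and treating a zero-byte file as missing is an assumption about what a blank frame file looks like.

```python
from pathlib import Path


def missing_frame_files(project_dir, frame_stems):
    """Return the stems (e.g. "02-f2") with no non-empty file on disk."""
    frames_dir = Path(project_dir) / "compositions" / "frames"
    missing = []
    for stem in frame_stems:
        path = frames_dir / f"{stem}.html"
        # A zero-byte file is treated as missing so it can be re-dispatched.
        if not path.is_file() or path.stat().st_size == 0:
            missing.append(stem)
    return missing
```

Any stem it returns points at a worker to re-dispatch before moving on to Step 5.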


Step 5: Assemble

Goal: Wire the built frames + BGM into the playable index.html.

assemble-index.mjs is deterministic — no subagent, no judgment. It references each frame file at its cumulative data-start, mounts assets/bgm.mp3 on track 11, and hard-cuts frame → frame (frames tile the track with no gaps, so there is no transition injector).
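The idea of a cumulative data-start can be sketched as a running sum of frame durations. This illustrates the arithmetic only and is not the implementation of assemble-index.mjs; cumulative_starts is a hypothetical name.

```python
from itertools import accumulate


def cumulative_starts(durations):
    """Return each frame's start time, given frame durations in play order."""
    # accumulate(..., initial=0.0) yields 0, d0, d0+d1, ...; the last value is
    # the total duration, which is not a start time, so it is dropped.
    return list(accumulate(durations, initial=0.0))[:-1]


print(cumulative_starts([12.0, 18.0, 10.0]))  # [0.0, 12.0, 30.0]
```

Because frames tile the track with no gaps, each cumulative start coincides with that frame's span_sec[0].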

node <SKILL_DIR>/scripts/assemble-index.mjs --storyboard "$PROJECT_DIR/STORYBOARD.md" \
  --hyperframes "$PROJECT_DIR" --audiomap "$PROJECT_DIR/audiomap.json"

Fix any error it reports: a missing or blank frame file means that worker wrote a partial file; re-dispatch it (Step 4) and re-assemble.

Gate: index.html exists; total duration == audiomap.audio.duration_sec.


Step 6: Verify and render

Goal: Verify the assembled video, get user approval, and render the final MP4.

Run the CLI on the assembled project — that's the correct unit (the per-frame workers couldn't run it). lint checks structure, validate runs headless Chrome (catching JS errors and missing assets), inspect snapshots frames.

( cd "$PROJECT_DIR" && npx hyperframes lint . && npx hyperframes validate . && npx hyperframes inspect . )

Inspect at t=0, each frame start, the strongest DROP / SURGE, every hard_stops[].t, and the final frame. On failure, make the cheapest safe fix yourself: edit the offending compositions/frames/NN-*.html. Never change duration or audio timing to hide a sync issue. Once the gates pass, pause for user review, then render only on approval:

( cd "$PROJECT_DIR" && npx hyperframes render . --skill=music-to-video -q draft -o renders/video.mp4 --fps 30 )

Gate: lint / validate / inspect passed; the user approved; renders/video.mp4 exists with audio, duration == audiomap.audio.duration_sec. The final reply states the MP4 path and duration.
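The inspection points named in Step 6 can be gathered into one sorted list of timestamps. A sketch under stated assumptions: hard_stops is read as a list of objects with a t field (as in hard_stops[].t), the strongest DROP / SURGE time is passed in by the caller because the layout of key_moments is not described here, and the final frame is covered by its start time. The function name is hypothetical.

```python
def inspection_times(audiomap, frame_starts, strongest_moment_t=None):
    """Collect the unique timestamps to inspect, in ascending order."""
    times = {0.0}
    times.update(float(t) for t in frame_starts)
    times.update(float(stop["t"]) for stop in audiomap.get("hard_stops", []))
    if strongest_moment_t is not None:
        times.add(float(strongest_moment_t))
    return sorted(times)


audiomap = {"hard_stops": [{"t": 29.5}]}
print(inspection_times(audiomap, [0.0, 12.0, 30.0], strongest_moment_t=12.0))
# [0.0, 12.0, 29.5, 30.0]
```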


Resume table

| You have | Continue from |
| --- | --- |
| assets/bgm.mp3 only | Step 1 |
| audiomap.json | Step 2 |
| STORYBOARD.md (skeleton) | Step 3 |
| STORYBOARD.md (complete) | Step 4 |
| all frame files | Step 5 |
| index.html | Step 6 |

Quick Reference

Formats: landscape 1920x1080 by default; portrait 1080x1920; square 1080x1080. Set the canvas once in the storyboard frontmatter (canvas: { w, h, fps }).

Scripts under scripts/: analyze-beatgrid.py (the one analyzer), validate-plan.mjs (plan check), assemble-index.mjs (index assembly), stage-assets.mjs (stage user media), lib/storyboard.mjs (vendored parser). Everything else is the hyperframes CLI.

| Read | When |
| --- | --- |
| references/frame-skeleton.md | Step 2: read the music, lay out the frames, set pacing |
| references/planning.md · storyboard-format.md | Step 3: pick the brand, fill each frame, write the plan |
| references/template-catalog.md | Step 3: pick a template per group |
| references/motion-primitive-catalog.md | Step 3/4: L0 recipes for free-compose |
| references/montage.md | Step 3/4: asset treatments (beat-cut / ken-burns) |
| sub-agents/frame-worker.md | Step 4: dispatch + build one frame |
| ../hyperframes-core/references/subagent-dispatch.md | Step 4: dispatch sub-agents safely |
| ../hyperframes-creative/references/design-spec.md | Step 3: pick the preset (the brand) |

Directory layout

music-to-video/
  SKILL.md
  references/   frame-skeleton.md · planning.md · storyboard-format.md
                template-catalog.md · motion-primitive-catalog.md · montage.md
                templates/<id>/          { index.html (+ assets/ · program.json) }  ← L1 catalog impls
                motion-primitives/<id>/  { index.html } (+ ../assets/gsap.min.js shared by recipes) ← L0 catalog impls
  scripts/      analyze-beatgrid.py · assemble-index.mjs · validate-plan.mjs · stage-assets.mjs · lib/storyboard.mjs
  sub-agents/   frame-worker.md   ← the one subagent (one per frame)