hyperframes-media

heygen-com/hyperframes

Audio and media engine for HyperFrames: TTS, background music, sound effects, transcription, captions, and background removal.

What is hyperframes-media?

Unified audio engine (`scripts/audio.mjs`) that generates voiceovers, background music, sound effects, and captions for HyperFrames compositions. Supports multi-provider TTS (HeyGen/ElevenLabs/Kokoro), music retrieval or local generation, sound effect library access, and caption authoring. Degrades gracefully from cloud services to local fallbacks based on credential availability.

Generate voiceovers via HeyGen Starfish TTS, ElevenLabs, or local Kokoro (with native word timestamps from HeyGen)
Retrieve background music from HeyGen audio library or generate locally via Lyria/MusicGen
Retrieve sound effects from HeyGen library or use bundled 21-file fallback library
Transcribe audio with Whisper and extract per-word timing data
Author captions, subtitles, lyrics, and karaoke styling with per-word control
Remove backgrounds from media assets

How to install hyperframes-media

npx skills add https://github.com/heygen-com/hyperframes --skill hyperframes-media

Prerequisites

HeyGen account (optional but recommended; sign in via `npx hyperframes auth login`)
For local TTS fallback: Kokoro-82M (no API key required)
For local BGM generation: Lyria or MusicGen (Python dependencies; optional)
Node.js and npm to run the audio engine script

How to use hyperframes-media

1. Run `npx hyperframes auth status` to check credential status and show sign-in guidance before generating audio
2. Create an `audio_request.json` file with lines (text + optional SFX cues) and BGM mode/query/prompt
3. Execute `node <MEDIA_DIR>/scripts/audio.mjs --request ./audio_request.json --hyperframes . --out ./audio_meta.json` to generate assets
4. If BGM is set to generate mode, run `scripts/wait-bgm.mjs` before assembly to wait for detached generation to complete
5. Use the `--only tts,bgm,sfx` flag to run subsets and merge results into existing output (e.g., TTS+BGM first, then SFX)
6. Reference the output `audio_meta.json` (id-keyed metadata with paths, durations, word timings) in your composition assembly
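The invocations in steps 3 and 5 can also be assembled programmatically. The following is a minimal sketch, assuming a hypothetical helper named `buildEngineArgs` that is not part of the skill; only the flags `--request`, `--hyperframes`, `--out`, and `--only` come from the steps above.

```javascript
// Hypothetical helper (not shipped with the skill): builds the argument list
// for scripts/audio.mjs using only the flags documented above.
const CAPABILITIES = ["tts", "bgm", "sfx"];

function buildEngineArgs({ mediaDir, request, hyperframes = ".", out, only = [] }) {
  const unknown = only.filter((c) => !CAPABILITIES.includes(c));
  if (unknown.length > 0) {
    throw new Error(`Unknown capability in --only: ${unknown.join(", ")}`);
  }
  const args = [
    `${mediaDir}/scripts/audio.mjs`,
    "--request", request,
    "--hyperframes", hyperframes,
    "--out", out,
  ];
  // --only is optional; omit it to run every capability.
  if (only.length > 0) args.push("--only", only.join(","));
  return args;
}

// First pass: TTS + BGM. A later pass with only: ["sfx"] would merge
// sound effects into the same --out file.
const firstPass = buildEngineArgs({
  mediaDir: "<MEDIA_DIR>",
  request: "./audio_request.json",
  out: "./audio_meta.json",
  only: ["tts", "bgm"],
});
console.log(["node", ...firstPass].join(" "));
```

The returned array is shaped so that it could be handed to Node's `child_process.spawn("node", args)` rather than concatenated into a shell string, which avoids quoting problems with paths that contain spaces.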

Use cases

Good for

Create narrated video compositions with precise word-level timing for animation sync
Generate background music matching mood prompts or retrieve licensed tracks from HeyGen library
Add sound effects to scenes using semantic search or fallback to local bundled library
Build subtitle/karaoke tracks with per-word styling and timing information
Produce multi-language voiceovers by switching TTS providers and languages

Who it's for

Video composition developers building HyperFrames projects
Content creators needing automated voiceover and music generation
Developers building interactive or animated video experiences
Teams working with multi-language or multi-voice narration
Workflows requiring precise audio timing and caption synchronization

hyperframes-media FAQ

What happens if I don't have a HeyGen credential?

The engine degrades gracefully: TTS falls back to ElevenLabs (if `$ELEVENLABS_API_KEY` is set), then to local Kokoro; BGM switches from retrieval to local Lyria/MusicGen generation; SFX uses the bundled 21-file library. Always run `npx hyperframes auth status` first to show the user their options and let them decide whether to sign in.

How do I get native word timestamps for captions?

HeyGen TTS returns native word timestamps. ElevenLabs and Kokoro do not, so the engine chains the `transcribe` step automatically for those two providers to extract per-word timing. When calling `scripts/heygen-tts.mjs` standalone, pass `--words narration.words.json` to capture the HeyGen word data; for the other providers, run `transcribe` against the generated audio file.

Can I generate background music without a HeyGen credential?

Yes. Set `bgm.mode` to `generate` in your request, or omit `bgm.mode` so the engine chooses generation automatically when no credential is present. The engine will use local Lyria or MusicGen. BGM generation is spawned detached, so run `scripts/wait-bgm.mjs` before assembling to ensure it completes.

How do I use sound effects from my own library instead of the bundled one?

The engine retrieves SFX from the HeyGen audio library by default (with HeyGen credential) or falls back to the bundled 21-file library. Custom SFX integration is not covered in the core engine; reference `references/sfx.md` for details on the bundled manifest and retrieval behavior.

What is the `--only` flag for?

Use `--only tts,bgm,sfx` to run a subset of capabilities and merge results into an existing `--out` file. For example, generate TTS+BGM early, then add SFX once cues are finalized, without regenerating earlier assets.

Full instructions (SKILL.md)

Source of truth, from heygen-com/hyperframes.

name: hyperframes-media
description: Audio and media assets for HyperFrames compositions, produced by one shared audio engine (`scripts/audio.mjs`): multi-provider TTS (HeyGen / ElevenLabs / Kokoro local), background music + sound effects (HeyGen audio-library retrieval by default, with local Lyria / MusicGen BGM generation and a bundled SFX library as the no-credential fallback), Whisper transcription, background removal, and caption authoring. Use for voiceover / TTS, BGM, SFX / sound effects, transcription, captions / subtitles / lyrics / karaoke / per-word styling, voice + provider selection, and music-mood prompting.

HyperFrames Media

Create the audio and media assets a composition needs — voiceover (TTS), background music + sound effects, transcription, captions, background removal — then consume and animate that data in HTML. For placing assets into compositions, see hyperframes-core.

The audio engine — one source for TTS · BGM · SFX

Workflows do NOT hand-roll audio or vendor a copy. There is one engine — scripts/audio.mjs — that takes a neutral audio_request.json and writes audio_meta.json (plus assets under assets/voice|bgm|sfx):

# <MEDIA_DIR> = this skill's directory
node <MEDIA_DIR>/scripts/audio.mjs --request ./audio_request.json --hyperframes . --out ./audio_meta.json

All three capabilities degrade on ONE switch — whether a HeyGen credential is present (resolved from $HEYGEN_API_KEY / $HYPERFRAMES_API_KEY / ~/.heygen, not the CLI):

| Capability | HeyGen credential present | HeyGen credential absent |
| --- | --- | --- |
| TTS | HeyGen Starfish REST (native word timestamps) | → ElevenLabs → Kokoro (chain `transcribe` for words) |
| BGM | HeyGen music retrieval | Lyria → MusicGen local generation (detached) |
| SFX | HeyGen sound-effects retrieval (min_score 0.4) | bundled 21-file library (`assets/sfx/`) |
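The switch in the table above can be written as a small decision function. This is an illustrative sketch only: it mirrors the table rather than the engine's actual code, and the function name `resolveSources`, its boolean inputs, and the string labels it returns are all hypothetical.

```javascript
// Illustrative only: mirrors the capability table, not the engine's source.
// `heygen` and `elevenlabs` are assumed booleans meaning "a credential resolves".
function resolveSources({ heygen, elevenlabs }) {
  if (heygen) {
    return {
      tts: { provider: "heygen", nativeWords: true },
      bgm: "heygen-retrieval",
      sfx: "heygen-retrieval",
    };
  }
  return {
    // Without native word timestamps, `transcribe` has to supply the words.
    tts: { provider: elevenlabs ? "elevenlabs" : "kokoro", nativeWords: false },
    bgm: "local-generation", // Lyria, then MusicGen, spawned detached
    sfx: "bundled-library", // the 21 files under assets/sfx/
  };
}

console.log(resolveSources({ heygen: false, elevenlabs: true }).tts.provider); // → elevenlabs
```

Note that a single input, the HeyGen credential, moves all three capabilities at once; the ElevenLabs key only affects which TTS fallback is chosen.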

Request (audio_request.json): { provider?, lang?, speed?, lines: [{ id, text, sfx?: [names] }], bgm: { mode?, query?, prompt? } }. id joins each line back to the caller's model (a frame number, a scene id, …). bgm.mode = retrieve | generate | none; omit for auto (retrieve when credentialed, else generate). An explicit retrieve is strict — it skips rather than starting a detached generate (for callers with no wait-bgm step).
Output (audio_meta.json, id-keyed): { tts_provider, voice_id, bgm, bgm_pending, …, voices: [{ id, path, duration_s, words }], sfx: [{ id, name, file, source, offset_s, duration_s, volume }], total_duration_s }.
--only tts,bgm,sfx runs a subset and merges into an existing --out (e.g. TTS+BGM early, SFX once cues exist).
BGM generate is spawned detached (bgm_pending: true) — run scripts/wait-bgm.mjs before assembling.
scripts/heygen-tts.mjs is a single-shot CLI over the same code (one text → wav + words) for when you just need HeyGen TTS without a request file.
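To make the request shape and the `bgm.mode` rules above concrete, here is a hedged sketch. `validateRequest` and `resolveBgmAction` are hypothetical helpers, not functions of the engine; the non-empty `lines` and unique `id` checks are assumptions, and the reading that a strict `retrieve` without a credential results in a skip is an interpretation of the rule above.

```javascript
// Hypothetical validator for the audio_request.json shape described above.
const BGM_MODES = ["retrieve", "generate", "none"];

function validateRequest(req) {
  const errors = [];
  if (!Array.isArray(req.lines) || req.lines.length === 0) {
    errors.push("lines must be a non-empty array"); // assumption
  } else {
    const seen = new Set();
    for (const line of req.lines) {
      if (line.id === undefined || line.id === null) errors.push("line is missing id");
      else if (seen.has(line.id)) errors.push(`duplicate id: ${line.id}`); // assumption
      else seen.add(line.id);
      if (typeof line.text !== "string" || line.text.trim() === "") {
        errors.push(`line ${line.id}: text must be a non-empty string`);
      }
      if (line.sfx !== undefined && !Array.isArray(line.sfx)) {
        errors.push(`line ${line.id}: sfx must be an array of names`);
      }
    }
  }
  const mode = req.bgm && req.bgm.mode;
  if (mode !== undefined && !BGM_MODES.includes(mode)) {
    errors.push(`bgm.mode must be one of ${BGM_MODES.join(", ")}`);
  }
  return errors;
}

// What the documented bgm.mode rules imply for a given credential state.
function resolveBgmAction(mode, hasCredential) {
  if (mode === "none") return "skip";
  if (mode === "generate") return "generate";
  if (mode === "retrieve") return hasCredential ? "retrieve" : "skip"; // strict
  return hasCredential ? "retrieve" : "generate"; // omitted mode = auto
}

// Sample request; the ids, text, and prompt are made-up values.
const request = {
  lines: [
    { id: "scene-1", text: "Welcome to the demo.", sfx: ["glass shatter"] },
    { id: "scene-2", text: "Here is how it works." },
  ],
  bgm: { prompt: "calm ambient piano" }, // mode omitted = auto
};
console.log(validateRequest(request)); // → []
```

Validating before calling the engine keeps malformed requests from failing partway through a run; writing the object out with `JSON.stringify(request, null, 2)` produces the file the engine expects via `--request`.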

Full flag list + the audio_meta.json schema live in the header of scripts/audio.mjs. The references below cover the provider details and edge cases behind each capability.
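On the consuming side, a workflow typically needs to look up assets by `id` and confirm that no detached BGM job is still running. The sketch below assumes the output shape listed above; `summarizeMeta` is a hypothetical helper, the assumption that each `sfx` entry's `id` matches a line `id` is ours, and the sample values are invented for illustration.

```javascript
// Illustrative reader for audio_meta.json; field names follow the output
// shape above, everything else is an assumption.
function summarizeMeta(meta) {
  // One voice entry per request line, keyed by the caller's id.
  const voicesById = new Map((meta.voices || []).map((v) => [v.id, v]));
  // A line may carry several SFX cues, so collect them into arrays.
  const sfxById = new Map();
  for (const cue of meta.sfx || []) {
    if (!sfxById.has(cue.id)) sfxById.set(cue.id, []);
    sfxById.get(cue.id).push(cue);
  }
  return {
    voicesById,
    sfxById,
    // bgm_pending: true means wait-bgm.mjs still has to run before assembly.
    readyToAssemble: meta.bgm_pending !== true,
    totalDuration: meta.total_duration_s,
  };
}

// Made-up sample in the documented shape.
const sampleMeta = {
  bgm_pending: true,
  total_duration_s: 4.2,
  voices: [{ id: "scene-1", path: "assets/voice/scene-1.wav", duration_s: 2.1, words: [] }],
  sfx: [{ id: "scene-1", name: "glass shatter", offset_s: 0.5, duration_s: 1, volume: 0.35 }],
};
console.log(summarizeMeta(sampleMeta).readyToAssemble); // → false
```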

Preflight — show sign-in status before any audio

Always run this before generating voice or BGM, whether inside a full workflow or for a one-off "generate me a BGM/voiceover" request. A missing HeyGen credential is not a reason to silently fall back to local engines: first recommend signing in and let the user decide. Run the shared preflight and relay its output verbatim; don't improvise your own "missing key" prompt, and don't offer to write keys into a per-repo .env:

npx hyperframes auth status

Signed in → it prints the account; proceed.
Not signed in (exit 1 is expected here — "not signed in" is a normal state, not a failure) → it prints registration-first guidance. Recommend signing in: npx hyperframes auth login is browser OAuth — it signs in and creates an account (always available through this repo's CLI). To use an existing HeyGen API key (from app.heygen.com/settings/api), run npx hyperframes auth login --api-key — it saves to the shared ~/.heygen (no per-repo .env). The output also lists the local engines voice/BGM will fall back to and a pip hint when deps are missing. Relay this output as-is — don't paraphrase it into your own wording. Then STOP and wait for the user to choose — sign in, or say "go" / "local" to continue offline — before generating anything. This is a real decision point, not a passing note: don't fold it into another question, and don't proceed past it on your own. (Exception: in autonomous / non-interactive mode, note the status and continue offline.)
npx hyperframes auth status --json returns { configured, recommended_action, offline_engines } for deterministic branching.
If the CLI can't run (not on PATH and npx can't fetch it) → still recommend signing in (npx hyperframes auth login) and STOP for the user's choice — don't treat "no credential" as a silent green light for local generation.
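The branching described above can be sketched against the `--json` output. This is a minimal illustration under stated assumptions: `parseAuthStatus` and `preflightDecision` are hypothetical helpers, `configured` is assumed to be a boolean that is true when a credential resolves, the returned labels are ours, and treating an unrunnable CLI in non-interactive mode as "continue offline" is an assumption the text does not spell out.

```javascript
// Hypothetical helpers around `npx hyperframes auth status --json`.

// Parse the CLI's stdout; returns null when the output is not valid JSON
// (for example, when the CLI could not run at all).
function parseAuthStatus(stdout) {
  try {
    return JSON.parse(stdout);
  } catch (err) {
    return null;
  }
}

// Decide what to do next. Exit code 1 from the CLI is expected when the
// user is not signed in, so the decision keys off the parsed JSON instead.
function preflightDecision(status, { interactive = true } = {}) {
  if (status !== null && status.configured === true) return "proceed";
  // Not signed in, or no usable status: a real decision point for the user,
  // unless running in autonomous / non-interactive mode.
  return interactive ? "stop-and-ask" : "continue-offline";
}

// Sample payload; the shape of offline_engines is an assumption.
const parsed = parseAuthStatus('{"configured": false, "offline_engines": []}');
console.log(preflightDecision(parsed)); // → stop-and-ask
```

Even when the decision is "stop-and-ask", the text shown to the user should be the CLI's own output relayed as-is, not a message generated from these labels.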

Credential resolution, full key priority, and the local-dependency list are in references/requirements.md.

Provider chains (the detail behind the engine)

TTS — first available provider wins (the engine, or npx hyperframes tts "..."):

| Order | Provider | Detected when | Word timestamps |
| --- | --- | --- | --- |
| 1 | HeyGen (Starfish) | `$HEYGEN_API_KEY` / `hyperframes auth login` | Yes, native; pass `--words narration.words.json` to capture |
| 2 | ElevenLabs | `$ELEVENLABS_API_KEY` set | No; chain `transcribe` after |
| 3 | Kokoro-82M (local, 54 voices) | always (no key required) | No; chain `transcribe` after |

The published hyperframes tts CLI is often the local-only build (its --help says "Kokoro-82M", no --provider/--words) and silently falls back to Kokoro even with $HEYGEN_API_KEY set. That is why the engine's HeyGen path is the self-contained scripts/heygen-tts.mjs (REST), NOT the CLI; the CLI is used only for the Kokoro path. See references/tts.md.

BGM & SFX — by default retrieved from the HeyGen audio library (/v3/audio/sounds), same credential as HeyGen TTS, with the no-credential fallback from the switch above:

| Asset | HeyGen `type` | Lands in | Fallback (no credential) |
| --- | --- | --- | --- |
| BGM | `music` | `assets/bgm/track.mp3` (retrieve) · `track.wav` (generate) | Lyria / MusicGen generation |
| SFX | `sound_effects` (min_score 0.4) | `assets/sfx/<slug>.mp3` | bundled 21-file library (`assets/sfx/*` + `manifest.json`) |

See references/bgm.md and references/sfx.md.

Routing

| Task | Read |
| --- | --- |
| The audio engine: request/meta schema, `--only`, the switch | `scripts/audio.mjs` (header comment) |
| `npx hyperframes tts` / `heygen-tts.mjs`: providers, voices, words | `references/tts.md` |
| BGM: HeyGen retrieval + local Lyria / MusicGen generation | `references/bgm.md` |
| SFX: HeyGen retrieval (min_score 0.4) + bundled local library | `references/sfx.md` |
| `npx hyperframes transcribe`: Whisper, model rules, output shape | `references/transcribe.md` |
| `npx hyperframes remove-background`: transparent cutouts | `references/remove-background.md` |
| TTS → transcription → captions (no recorded voiceover) | `references/tts-to-captions.md` |
| Caption authoring: style detection, layout, word grouping, exit | `references/captions/authoring.md` |
| Transcript handling: input formats, quality gates, cleanup, APIs | `references/captions/transcript-handling.md` |
| Caption motion: karaoke, marker effects, audio-reactive | `references/captions/motion.md` |
| Model caches, system dependencies, troubleshooting | `references/requirements.md` |

Non-negotiable rules

One engine, no vendored copies. Produce audio via scripts/audio.mjs (or heygen-tts.mjs for one-shot HeyGen TTS). Don't re-implement TTS/BGM/SFX inside a workflow — write an audio_request.json adapter and call the engine.
"HeyGen available" = a resolvable credential, not the CLI. The whole switch keys off heygenCredential(); the published hyperframes tts may be Kokoro-only, and there is no hyperframes bgm / hyperframes sfx command at all.
Voice IDs are provider-specific. am_michael is Kokoro-only; HeyGen UUIDs don't work on Kokoro. If you pass --voice, also pin --provider to avoid silent provider drift when the user's env changes.
Always pass --model to transcribe. The CLI default small.en silently translates non-English audio. See references/transcribe.md → "Language Rule".
HeyGen returns word timestamps; ElevenLabs / Kokoro do not. The engine chains transcribe automatically for the latter two; standalone, pass --words to HeyGen or run transcribe against the audio file.
Captions consume the flat word-array format with { id, text, start, end }. See references/transcribe.md → "Output Shape".
remove-background --background-output is hole-cut, not inpainted. For "scene without the person", a different tool is needed. See references/remove-background.md → "When NOT the right tool".
BGM/SFX default to HeyGen retrieval; the no-credential fallback is generation (BGM) or the bundled library (SFX). /audio/sounds ranks by a text query — name effects concretely (glass shatter, not dramatic sound); a no-match skips, never blocks the render. SFX sit at volume ~0.35 under voice + BGM. See references/sfx.md / references/bgm.md.
Treat workflow caption HTML as generated output. For preset-backed videos, the reusable skin source lives at .hyperframes/caption-skin.html and the workflow script writes compositions/captions.html; do not edit generated compositions/captions.html to fix the skin. Rebuild via the workflow's captions.mjs, or use that workflow's explicit overrides mechanism when present.
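As an illustration of how the flat word array mentioned in the rules above might be consumed, the sketch below groups words into caption lines. It is not the skill's caption authoring logic: `groupWords`, `maxWords`, and `maxGap` are hypothetical names, and it assumes `start` and `end` are in seconds.

```javascript
// Illustrative grouping of the flat word array ({ id, text, start, end })
// into caption lines. maxWords and maxGap are hypothetical tuning knobs.
function groupWords(words, { maxWords = 5, maxGap = 0.6 } = {}) {
  const groups = [];
  let current = [];
  for (const word of words) {
    const prev = current[current.length - 1];
    // Start a new line after a long pause or once the line is full.
    const gapTooLong = prev !== undefined && word.start - prev.end > maxGap;
    if (current.length > 0 && (current.length >= maxWords || gapTooLong)) {
      groups.push(current);
      current = [];
    }
    current.push(word);
  }
  if (current.length > 0) groups.push(current);
  // Each line keeps its words so per-word styling (karaoke) stays possible.
  return groups.map((g) => ({
    text: g.map((w) => w.text).join(" "),
    start: g[0].start,
    end: g[g.length - 1].end,
    words: g,
  }));
}

// Made-up timings for demonstration.
const demoWords = [
  { id: "w1", text: "Hello", start: 0.0, end: 0.3 },
  { id: "w2", text: "world", start: 0.35, end: 0.6 },
  { id: "w3", text: "again", start: 2.0, end: 2.4 },
];
console.log(groupWords(demoWords).map((line) => line.text)); // → [ 'Hello world', 'again' ]
```

Splitting on pauses as well as on word count tends to keep caption breaks aligned with natural phrasing; the thresholds here are arbitrary starting points.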
