
mmx-cli

minimax-ai/cli

Generate text, images, video, speech, and music via MiniMax AI from the terminal.

What is mmx-cli?

mmx-cli is a command-line interface for the MiniMax AI platform, enabling you to create diverse media content and perform web searches. Use it when you need to generate text completions, images, videos, speech, or music, or manage MiniMax API resources programmatically.

  • Chat with MiniMax language models (default: MiniMax-M2.7) with multi-turn conversation support
  • Generate images with customizable dimensions, aspect ratios, and optional subject references
  • Create videos asynchronously using MiniMax-Hailuo models with polling or webhook callbacks
  • Synthesize speech from text with voice selection, speed/pitch control, and optional subtitles
  • Generate music with lyrics, vocal styles, genre, mood, and instrument specifications
  • Support for agent/CI contexts with non-interactive flags, JSON output, and dry-run mode

How to install mmx-cli

npx skills add minimax-ai/cli --skill mmx-cli
Prerequisites
  • Node.js and npm installed
  • MiniMax API key (sk-xxxxx format) or OAuth login credentials
  • Authentication via `mmx auth login --api-key` or `mmx auth login` (persists to ~/.mmx/)

How to use mmx-cli

  1. Install globally: `npm install -g mmx-cli`
  2. Authenticate: `mmx auth login --api-key sk-xxxxx` or `mmx auth login` for OAuth
  3. Verify auth: `mmx auth status`
  4. Use agent flags in scripts: `--non-interactive --quiet --output json` for reliable automation
  5. Call the desired command (text chat, image generate, video generate, speech synthesize, or music generate) with its required and optional parameters
  6. For video generation, use `--async` to get the task ID immediately, or omit it to poll until completion

Use cases

Good for
  • Generate product descriptions or marketing copy using text chat with system prompts
  • Create multiple variations of promotional images with different aspect ratios and seeds
  • Produce short-form video content asynchronously while polling for completion status
  • Convert long-form articles or scripts into audio narration with custom voice and speed settings
  • Compose background music or theme songs for projects with specified mood and instrumentation
Who it's for
  • Content creators and marketers generating media at scale
  • Developers building AI-powered applications requiring multimodal generation
  • Automation engineers integrating MiniMax capabilities into CI/CD pipelines
  • Teams needing programmatic access to text, image, video, speech, and music generation

mmx-cli FAQ

How do I use mmx-cli in non-interactive/agent contexts?

Always include `--non-interactive --quiet --output json` flags. These prevent prompts, suppress spinners, and return machine-readable JSON. Use `--dry-run` to preview requests without executing.

Can I generate video asynchronously?

Yes. Use `--async` or `--no-wait` to return the task ID immediately. Omit these flags to poll until completion (default interval: 5 seconds, adjustable with `--poll-interval`).

How do I pass multiple messages for multi-turn chat?

Use `--message` multiple times with role prefixes (e.g., `--message "system:You are helpful" --message "user:Hello"`), or load from a JSON file with `--messages-file <path>`.

What audio formats and sample rates does speech synthesis support?

Default format is mp3 with 32000 Hz sample rate and 128000 bps bitrate. Customize with `--format`, `--sample-rate`, and `--bitrate` flags. Subtitles can be saved alongside audio with `--subtitles`.

How do I generate music with lyrics?

Provide `--lyrics` with structured text, or use `--lyrics-optimizer` to auto-generate from the prompt. For instrumental music, use `--instrumental`. Customize with `--vocals`, `--genre`, `--mood`, `--instruments`, and `--tempo`.

Full instructions (SKILL.md)

Source of truth, from minimax-ai/cli.


name: mmx-cli
description: Use mmx to generate text, images, video, speech, and music via the MiniMax AI platform. Use when the user wants to create media content, chat with MiniMax models, perform web search, or manage MiniMax API resources from the terminal.

MiniMax CLI — Agent Skill Guide

Use mmx to generate text, images, video, speech, music, and perform web search via the MiniMax AI platform.

Prerequisites

# Install
npm install -g mmx-cli

# Auth (OAuth persists to ~/.mmx/credentials.json, API key persists to ~/.mmx/config.json)
mmx auth login --api-key sk-xxxxx

# Verify active auth source
mmx auth status

# Or pass per-call
mmx text chat --api-key sk-xxxxx --message "Hello"

Region is auto-detected. Override with --region global or --region cn.


Agent Flags

Always use these flags in non-interactive (agent/CI) contexts:

| Flag | Purpose |
| --- | --- |
| `--non-interactive` | Fail fast on missing args instead of prompting |
| `--quiet` | Suppress spinners/progress; stdout is pure data |
| `--output json` | Machine-readable JSON output |
| `--async` | Return task ID immediately (video generation) |
| `--dry-run` | Preview the API request without executing |
| `--yes` | Skip confirmation prompts |
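
One way to avoid forgetting these flags is a small shell wrapper. The sketch below is an illustration only: `mmx_agent` is a hypothetical helper name, and it assumes the flags are accepted after the subcommand, as in the examples elsewhere in this guide. `--async`, `--dry-run`, and `--yes` are left to the caller because they change what a command does.

```shell
# Hypothetical wrapper (not part of mmx): append the three always-on agent flags.
mmx_agent() {
  mmx "$@" --non-interactive --quiet --output json
}
```

For example, `mmx_agent text chat --message "user:Hello"` expands to `mmx text chat --message "user:Hello" --non-interactive --quiet --output json`.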

Commands

text chat

Chat completion. Default model: MiniMax-M2.7.

mmx text chat --message <text> [flags]

| Flag | Type | Description |
| --- | --- | --- |
| `--message <text>` | string, required, repeatable | Message text. Prefix with role: to set role (e.g. "system:You are helpful", "user:Hello") |
| `--messages-file <path>` | string | JSON file with messages array. Use - for stdin |
| `--system <text>` | string | System prompt |
| `--model <model>` | string | Model ID (default: MiniMax-M2.7) |
| `--max-tokens <n>` | number | Max tokens (default: 4096) |
| `--temperature <n>` | number | Sampling temperature in (0.0, 1.0] |
| `--top-p <n>` | number | Nucleus sampling threshold |
| `--stream` | boolean | Stream tokens (default: on in TTY) |
| `--tool <json-or-path>` | string, repeatable | Tool definition JSON or file path |

# Single message
mmx text chat --message "user:What is MiniMax?" --output json --quiet

# With a system prompt (repeat --message to add further turns)
mmx text chat \
  --system "You are a coding assistant." \
  --message "user:Write fizzbuzz in Python" \
  --output json

# From file
cat conversation.json | mmx text chat --messages-file - --output json

stdout: response text (text mode) or full response object (json mode).
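
Because `--message` is repeatable, a conversation held as a list of "role:text" strings can be turned into flags mechanically. The following is a minimal sketch, assuming POSIX `sh`; `chat_turns` is a hypothetical helper name, not an mmx command.

```shell
# Hypothetical helper (not part of mmx): send each "role:text" argument
# as its own --message flag, in the order given.
chat_turns() {
  n=$#
  while [ "$n" -gt 0 ]; do
    turn=$1
    shift
    # Re-append the turn at the end, preceded by its flag.
    set -- "$@" --message "$turn"
    n=$((n - 1))
  done
  mmx text chat "$@" --output json --quiet
}
```

Calling `chat_turns "system:You are helpful" "user:Hello"` runs `mmx text chat --message "system:You are helpful" --message "user:Hello" --output json --quiet`.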


image generate

Generate images. Model: image-01.

mmx image generate --prompt <text> [flags]

| Flag | Type | Description |
| --- | --- | --- |
| `--prompt <text>` | string, required | Image description |
| `--aspect-ratio <ratio>` | string | e.g. 16:9, 1:1. Ignored if --width and --height are both set |
| `--n <count>` | number | Number of images (default: 1) |
| `--seed <n>` | number | Random seed for reproducible generation |
| `--width <px>` | number | Width in pixels (512–2048, multiple of 8). Requires --height |
| `--height <px>` | number | Height in pixels (512–2048, multiple of 8). Requires --width |
| `--prompt-optimizer` | boolean | Optimize prompt before generation |
| `--aigc-watermark` | boolean | Embed AI-generated content watermark |
| `--subject-ref <params>` | string | Subject reference: type=character,image=path-or-url |
| `--response-format <format>` | string | url (default) or base64. Base64 bypasses CDN download |
| `--out-dir <dir>` | string | Download images to directory |
| `--out-prefix <prefix>` | string | Filename prefix (default: image) |

mmx image generate --prompt "A cat in a spacesuit" --output json --quiet
# stdout: image URLs (one per line in quiet mode)

mmx image generate --prompt "Logo" --n 3 --out-dir ./gen/ --quiet
# stdout: saved file paths (one per line)
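
The `--width` and `--height` constraints (512–2048, multiple of 8) can be checked locally before spending an API call. This is a sketch of such a pre-flight check; `valid_image_dim` is a hypothetical name, and the CLI may well perform its own validation.

```shell
# Hypothetical pre-flight check (not part of mmx) mirroring the documented
# rule: an integer from 512 to 2048 that is a multiple of 8.
valid_image_dim() {
  case $1 in
    ''|0*|*[!0-9]*) return 1 ;;   # empty, leading zero, or non-numeric
  esac
  [ "$1" -ge 512 ] && [ "$1" -le 2048 ] && [ $(($1 % 8)) -eq 0 ]
}
```

A script could guard the call with `valid_image_dim "$W" && valid_image_dim "$H"` and only then pass `--width "$W" --height "$H"`, where `W` and `H` are the caller's own variables.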

video generate

Generate video. Default model: MiniMax-Hailuo-2.3. This is an async task — by default it polls until completion.

mmx video generate --prompt <text> [flags]

| Flag | Type | Description |
| --- | --- | --- |
| `--prompt <text>` | string, required | Video description |
| `--model <model>` | string | MiniMax-Hailuo-2.3 (default) or MiniMax-Hailuo-2.3-Fast |
| `--first-frame <path-or-url>` | string | First frame image |
| `--callback-url <url>` | string | Webhook URL for completion |
| `--download <path>` | string | Save video to specific file |
| `--async` | boolean | Return task ID immediately |
| `--no-wait` | boolean | Same as --async |
| `--poll-interval <seconds>` | number | Polling interval (default: 5) |

# Non-blocking: get task ID
mmx video generate --prompt "A robot." --async --quiet
# stdout: {"taskId":"..."}

# Blocking: wait and get file path
mmx video generate --prompt "Ocean waves." --download ocean.mp4 --quiet
# stdout: ocean.mp4

video task get

Query status of a video generation task.

mmx video task get --task-id <id> [--output json]

video download

Download a completed video by file ID.

mmx video download --file-id <id> [--out <path>]

speech synthesize

Text-to-speech. Default model: speech-2.8-hd. Max 10k chars.

mmx speech synthesize --text <text> [flags]

| Flag | Type | Description |
| --- | --- | --- |
| `--text <text>` | string | Text to synthesize |
| `--text-file <path>` | string | Read text from file. Use - for stdin |
| `--model <model>` | string | speech-2.8-hd (default), speech-2.6, speech-02 |
| `--voice <id>` | string | Voice ID (default: English_expressive_narrator) |
| `--speed <n>` | number | Speed multiplier |
| `--volume <n>` | number | Volume level |
| `--pitch <n>` | number | Pitch adjustment |
| `--format <fmt>` | string | Audio format (default: mp3) |
| `--sample-rate <hz>` | number | Sample rate (default: 32000) |
| `--bitrate <bps>` | number | Bitrate (default: 128000) |
| `--channels <n>` | number | Audio channels (default: 1) |
| `--language <code>` | string | Language boost |
| `--subtitles` | boolean | Download and save subtitles as .srt file (alongside --out audio file). API must support subtitles for the selected model. |
| `--pronunciation <from/to>` | string, repeatable | Custom pronunciation |
| `--sound-effect <effect>` | string | Add sound effect |
| `--out <path>` | string | Save audio to file |
| `--stream` | boolean | Stream raw audio to stdout |

mmx speech synthesize --text "Hello world" --out hello.mp3 --quiet
# stdout: hello.mp3

mmx speech synthesize --text "Hello" --subtitles --out hello.mp3
# saves hello.mp3 + hello.srt (SRT subtitle file)

echo "Breaking news." | mmx speech synthesize --text-file - --out news.mp3
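
Since input is capped at 10k characters, a script can check the length before calling the API. The sketch below assumes "10k" means 10000 and that the shell's `${#var}` count is close enough to how the service counts characters; `speech_text_fits` is a hypothetical name.

```shell
# Hypothetical guard (not part of mmx) for the documented 10k-character cap.
# ASSUMPTION: the cap is 10000 characters as counted by the shell.
speech_text_fits() {
  [ "${#1}" -le 10000 ]
}
```

A caller might run `speech_text_fits "$TEXT" || exit 2` before `mmx speech synthesize --text "$TEXT"`, where `TEXT` is the caller's own variable.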

music generate

Generate music. Responds well to rich, structured descriptions.

Model: music-2.6-free. Usage is unlimited for API key users, rate-limited to 3 requests per minute (RPM).

mmx music generate --prompt <text> [--lyrics <text>] [flags]

| Flag | Type | Description |
| --- | --- | --- |
| `--prompt <text>` | string | Music style description (can be detailed) |
| `--lyrics <text>` | string | Song lyrics with structure tags. Required unless --instrumental or --lyrics-optimizer is used. |
| `--lyrics-file <path>` | string | Read lyrics from file. Use - for stdin |
| `--lyrics-optimizer` | boolean | Auto-generate lyrics from prompt. Cannot be used with --lyrics or --instrumental. |
| `--instrumental` | boolean | Generate instrumental music (no vocals). Cannot be used with --lyrics. |
| `--vocals <text>` | string | Vocal style, e.g. "warm male baritone", "bright female soprano", "duet with harmonies" |
| `--genre <text>` | string | Music genre, e.g. folk, pop, jazz |
| `--mood <text>` | string | Mood or emotion, e.g. warm, melancholic, uplifting |
| `--instruments <text>` | string | Instruments to feature, e.g. "acoustic guitar, piano" |
| `--tempo <text>` | string | Tempo description, e.g. fast, slow, moderate |
| `--bpm <number>` | number | Exact tempo in beats per minute |
| `--key <text>` | string | Musical key, e.g. C major, A minor, G sharp |
| `--avoid <text>` | string | Elements to avoid in the generated music |
| `--use-case <text>` | string | Use case context, e.g. "background music for video", "theme song" |
| `--structure <text>` | string | Song structure, e.g. "verse-chorus-verse-bridge-chorus" |
| `--references <text>` | string | Reference tracks or artists, e.g. "similar to Ed Sheeran" |
| `--extra <text>` | string | Additional fine-grained requirements |
| `--aigc-watermark` | boolean | Embed AI-generated content watermark |
| `--format <fmt>` | string | Audio format (default: mp3) |
| `--sample-rate <hz>` | number | Sample rate (default: 44100) |
| `--bitrate <bps>` | number | Bitrate (default: 256000) |
| `--out <path>` | string | Save audio to file |
| `--stream` | boolean | Stream raw audio to stdout |

At least one of --prompt or --lyrics is required.

# With lyrics
mmx music generate --prompt "Upbeat pop" --lyrics "La la la..." --out song.mp3 --quiet

# Auto-generate lyrics from prompt
mmx music generate --prompt "Upbeat pop about summer" --lyrics-optimizer --out summer.mp3 --quiet

# Instrumental
mmx music generate --prompt "Cinematic orchestral, building tension" --instrumental --out bgm.mp3 --quiet

# Detailed prompt with vocal characteristics
mmx music generate --prompt "Warm morning folk" \
  --vocals "male and female duet, harmonies in chorus" \
  --instruments "acoustic guitar, piano" \
  --bpm 95 \
  --lyrics-file song.txt \
  --out duet.mp3
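
Read together, the rules for `--lyrics`, `--lyrics-optimizer`, and `--instrumental` mean that exactly one lyric source is chosen per call. A minimal sketch of that check follows; `valid_lyric_mode` is a hypothetical name, and treating `--lyrics-file` as equivalent to `--lyrics` is an assumption.

```shell
# Hypothetical check (not part of mmx): exactly one lyric source is selected.
# Arguments are 1 (set) or 0 (unset), in this order:
#   $1 lyrics given (ASSUMPTION: --lyrics or --lyrics-file)
#   $2 --lyrics-optimizer
#   $3 --instrumental
valid_lyric_mode() {
  [ $(($1 + $2 + $3)) -eq 1 ]
}
```

For instance, `valid_lyric_mode 1 0 1` fails because lyrics and `--instrumental` are both set, and `valid_lyric_mode 0 0 0` fails because no source is chosen.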

music cover

Generate a cover version of a song based on reference audio.

Model: music-cover-free. Usage is unlimited for API key users, rate-limited to 3 requests per minute (RPM).

mmx music cover --prompt <text> (--audio <url> | --audio-file <path>) [flags]

| Flag | Type | Description |
| --- | --- | --- |
| `--prompt <text>` | string, required | Target cover style, e.g. "Indie folk, acoustic guitar, warm male vocal" |
| `--audio <url>` | string | URL of reference audio (mp3, wav, flac, etc.; 6s to 6min, max 50MB) |
| `--audio-file <path>` | string | Local reference audio file (auto base64-encoded) |
| `--lyrics <text>` | string | Cover lyrics. If omitted, extracted from reference audio via ASR. |
| `--lyrics-file <path>` | string | Read lyrics from file. Use - for stdin |
| `--seed <number>` | number | Random seed 0–1000000 for reproducible results |
| `--format <fmt>` | string | Audio format: mp3, wav, pcm (default: mp3) |
| `--sample-rate <hz>` | number | Sample rate (default: 44100) |
| `--bitrate <bps>` | number | Bitrate (default: 256000) |
| `--channel <n>` | number | Channels: 1 (mono) or 2 (stereo, default) |
| `--out <path>` | string | Save audio to file |
| `--stream` | boolean | Stream raw audio to stdout |

# Cover from URL
mmx music cover --prompt "Indie folk, acoustic guitar, warm male vocal" \
  --audio https://filecdn.minimax.chat/public/d20eda57-2e36-45bf-9e12-82d9f2e69a86.mp3 --out cover.mp3 --quiet

# Cover from local file with custom lyrics
mmx music cover --prompt "Jazz, piano, slow" \
  --audio-file original.mp3 --lyrics-file lyrics.txt --out jazz_cover.mp3 --quiet

# Reproducible result with seed
mmx music cover --prompt "Pop, upbeat" --audio https://filecdn.minimax.chat/public/d20eda57-2e36-45bf-9e12-82d9f2e69a86.mp3 --seed 42 --out cover.mp3
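
The 50MB cap on reference audio can be checked locally before passing a file to `--audio-file`. The sketch below rests on two assumptions: that the cap, documented for `--audio`, also applies to local files, and that 50MB means 50 × 1024 × 1024 bytes. `audio_size_ok` is a hypothetical name, and the duration limit (6s to 6min) is not checked.

```shell
# Hypothetical pre-flight check (not part of mmx) for a local reference file.
# ASSUMPTION: the limit is 50 * 1024 * 1024 bytes and applies to --audio-file.
audio_size_ok() {
  [ -f "$1" ] || return 1            # missing or non-regular file
  size=$(wc -c < "$1")               # size in bytes
  [ "$((size))" -le $((50 * 1024 * 1024)) ]
}
```

A script might run `audio_size_ok original.mp3 || exit 2` before invoking `mmx music cover --audio-file original.mp3`.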

vision describe

Image understanding via VLM. Provide either --image or --file-id, not both.

mmx vision describe (--image <path-or-url> | --file-id <id>) [flags]

| Flag | Type | Description |
| --- | --- | --- |
| `--image <path-or-url>` | string | Local path or URL (auto base64-encoded) |
| `--file-id <id>` | string | Pre-uploaded file ID (skips base64) |
| `--prompt <text>` | string | Question about the image (default: "Describe the image.") |

mmx vision describe --image photo.jpg --prompt "What breed?" --output json

stdout: description text (text mode) or full response (json mode).


search query

Web search via MiniMax.

mmx search query --q <query>

| Flag | Type | Description |
| --- | --- | --- |
| `--q <query>` | string, required | Search query |

mmx search query --q "MiniMax AI" --output json --quiet

quota show

Display Token Plan usage and remaining quotas.

mmx quota show [--output json]

Tool Schema Export

Export all commands as Anthropic/OpenAI-compatible JSON tool schemas:

# All tool-worthy commands (excludes auth/config/update)
mmx config export-schema

# Single command
mmx config export-schema --command "video generate"

Use this to dynamically register mmx commands as tools in your agent framework.


Exit Codes

| Code | Meaning |
| --- | --- |
| 0 | Success |
| 1 | General error |
| 2 | Usage error (bad flags, missing args) |
| 3 | Authentication error |
| 4 | Quota exceeded |
| 5 | Timeout |
| 10 | Content filter triggered |
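
In a script, these codes can be turned into readable log messages. The helper below is a sketch; `mmx_exit_meaning` is a hypothetical name, and the labels simply restate the table.

```shell
# Hypothetical helper (not part of mmx): map an exit code to its documented meaning.
mmx_exit_meaning() {
  case $1 in
    0)  echo "success" ;;
    1)  echo "general error" ;;
    2)  echo "usage error" ;;
    3)  echo "authentication error" ;;
    4)  echo "quota exceeded" ;;
    5)  echo "timeout" ;;
    10) echo "content filter triggered" ;;
    *)  echo "unknown exit code" ;;
  esac
}
```

A typical use is to capture `$?` immediately after an `mmx` call and log `mmx_exit_meaning "$code"` to stderr before deciding whether to retry or abort.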

Piping Patterns

# stdout is always clean data — safe to pipe
mmx text chat --message "Hi" --output json | jq '.content'

# stderr has progress/spinners — discard if needed
mmx video generate --prompt "Waves" 2>/dev/null

# Chain: generate image → describe it
URL=$(mmx image generate --prompt "A sunset" --quiet)
mmx vision describe --image "$URL" --quiet

# Async video workflow
TASK=$(mmx video generate --prompt "A robot" --async --quiet | jq -r '.taskId')
mmx video task get --task-id "$TASK" --output json
# FILE_ID: file ID of the completed video (video download takes --file-id, not --task-id)
mmx video download --file-id "$FILE_ID" --out robot.mp4
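
When a task is started with `--async`, something has to poll `video task get` later. The loop below is a sketch under a loud assumption: the JSON shape of `video task get` is not documented here, so the `.status` field and its "Success" and "Fail" values are guesses that must be checked against real output. `wait_for_task` is a hypothetical name, and `jq` is assumed to be installed.

```shell
# Hypothetical helper (not part of mmx): poll a video task until it finishes.
# ASSUMPTION: the JSON from `video task get` has a `.status` field that reads
# "Success" or "Fail" when the task is done. Verify against real output.
wait_for_task() {
  task_id=$1
  interval=${2:-5}   # seconds between polls
  tries=${3:-60}     # give up after this many polls
  while [ "$tries" -gt 0 ]; do
    status=$(mmx video task get --task-id "$task_id" --output json --quiet | jq -r '.status')
    case $status in
      Success) return 0 ;;
      Fail)    return 1 ;;
    esac
    tries=$((tries - 1))
    sleep "$interval"
  done
  return 5   # mirrors the documented timeout exit code
}
```

For example, `wait_for_task "$TASK" 10 30` polls every 10 seconds and gives up after 30 attempts.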

Configuration Precedence

CLI flags → environment variables → ~/.mmx/config.json → defaults.

# Persistent config
mmx config set --key region --value cn
mmx config show

# Environment
export MINIMAX_API_KEY=sk-xxxxx
export MINIMAX_REGION=cn
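
The precedence order amounts to "first source that has a value wins". The sketch below only illustrates that rule and is not how mmx implements it; `first_set` and the variable names in the usage note are hypothetical.

```shell
# Illustration only (not part of mmx): print the first non-empty argument.
# Pass candidates in precedence order: flag, environment, config file, default.
first_set() {
  for candidate in "$@"; do
    if [ -n "$candidate" ]; then
      printf '%s\n' "$candidate"
      return 0
    fi
  done
  return 1   # every candidate was empty
}
```

With no `--region` flag, `MINIMAX_REGION=cn` in the environment, and `global` in the config file, `first_set "" "cn" "global"` prints `cn`, matching the order above.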

Default Model Configuration

Set per-modality defaults so you don't need --model every time:

# Set defaults
mmx config set --key default-text-model --value MiniMax-M2.7-highspeed
mmx config set --key default-speech-model --value speech-2.8-hd
mmx config set --key default-video-model --value MiniMax-Hailuo-2.3
mmx config set --key default-music-model --value music-2.6

# Use without --model
mmx text chat --message "Hello"
mmx speech synthesize --text "Hello" --out hello.mp3
mmx video generate --prompt "Ocean waves"
mmx music generate --prompt "Upbeat pop" --instrumental

# --model still overrides per-call
mmx text chat --model MiniMax-M2.7 --message "Hello"

Resolution priority: --model flag > config default > hardcoded fallback.