image-edit
agentspace-so/runcomfy-agent-skills
Intent-routed image editing: picks the right model (batch, text rewrite, precise local, or inpaint) based on what you ask.
What is image-edit?
image-edit is a skill for coding agents that acts as a smart router between four RunComfy image-editing models. When you describe an edit, the skill classifies your intent and calls the appropriate model via the RunComfy CLI: Nano Banana Edit for batch or identity-preserving edits, GPT Image 2 Edit for multilingual in-image text rewrites and multi-reference composition, Flux Kontext Pro for single-shot precise local edits, and Z-Image Turbo Inpaint for mask-driven region replacement or object removal. Each model's prompting patterns are bundled so the agent can produce sharper results without wasting iterations on the wrong model.
- Routes edit requests to one of four RunComfy models based on detected user intent
- Supports batch editing of 1–20 images with locked aspect ratio and resolution via Nano Banana Edit
- Handles multilingual in-image text rewriting (Japanese, Cyrillic, Arabic, etc.) via GPT Image 2 Edit
- Supports multi-reference composition (subject, scene, palette from separate images) via GPT Image 2 Edit
- Enables single-shot precise local edits with high-fidelity preservation via Flux Kontext Pro
- Performs mask-driven object removal and region replacement via Z-Image Turbo Inpaint
How to install image-edit
npx skills add https://github.com/agentspace-so/runcomfy-agent-skills --skill image-edit
- Node.js available to run npx
- RunComfy CLI installed globally: npm i -g @runcomfy/cli
- RunComfy account created at runcomfy.com
- Authenticated via runcomfy login, or RUNCOMFY_TOKEN environment variable set for CI/containers
- Input images hosted at publicly accessible HTTPS URLs
How to use image-edit
1. Install the skill: npx skills add https://github.com/agentspace-so/runcomfy-agent-skills --skill image-edit
2. Ensure the RunComfy CLI is installed (npm i -g @runcomfy/cli) and you are logged in (runcomfy login)
3. Describe your edit to the agent in natural language (e.g., 'swap the background', 'rewrite the headline in Japanese', 'remove the cable on the left')
4. The skill classifies your intent and selects the appropriate model automatically
5. For batch edits, provide 1–20 image URLs and optionally specify aspect_ratio and resolution for consistency
6. For multilingual text rewrites, quote the exact new text and name the script (e.g., 'Japanese kana')
7. For mask-driven inpainting, supply a mask image URL alongside the source image
8. Outputs are saved to the directory specified via --output-dir in the CLI call
Use cases
- Batch-editing a product SKU gallery with consistent aspect ratio and identity preservation
- Rewriting in-image headlines into multiple languages (e.g., Japanese kana, Arabic) without touching the rest of the image
- Compositing a subject from one photo into a scene from another photo
- Removing cables, watermarks, or distractions from images using a mask
- Swapping backgrounds while preserving subject identity across a series of images
- Developers building image-editing pipelines with coding agents like Claude Code or Cursor
- E-commerce teams automating product image variants and localization
- Designers who need precise local edits or background swaps without manual masking tools
- Marketing teams producing multilingual ad creatives from a single source asset
- Anyone using RunComfy who wants agent-driven model selection for image editing tasks
image-edit FAQ
Nano Banana Edit (google/nano-banana-2/edit) is the default. It supports both single and batch edits and is the most flexible option.
Batch editing is supported: Nano Banana Edit accepts 1–20 image URLs per call. Lock aspect_ratio and resolution for consistent batch output. GPT Image 2 Edit accepts up to 10 URLs but treats the first as primary and the rest as auxiliary references.
The skill does not work without credentials: a RunComfy account and either a CLI login session or a RUNCOMFY_TOKEN environment variable are required to make API calls.
GPT Image 2 Edit is selected for multilingual typography. In your prompt, quote the exact new text and name the script explicitly (e.g., 'Japanese kana', 'Cyrillic', 'Arabic right-to-left').
Use Z-Image Turbo Inpaint when you have a precise mask defining the region to remove or replace. It offers tunable strength and edge-consistent results. Nano Banana Edit handles removal via spatial language in the prompt without a mask.
Full instructions (SKILL.md)
Source of truth, from agentspace-so/runcomfy-agent-skills.
name: image-edit
displayName: "Image Edit — Pro Pack on RunComfy"
description: >
Edit images on RunComfy — this skill is a smart router that matches
the user's intent to the right edit model in the RunComfy catalog.
Picks Nano Banana Edit (batch up to 20, identity-preserving default),
OpenAI GPT Image 2 Edit (multilingual in-image text rewrite,
multi-ref composition, layout precision), Flux Kontext Pro
(single-ref high-fidelity local edit), or Z-Image Turbo Inpaint
(mask-driven precise region edit). Bundles each model's documented
prompting patterns so the skill gets sharper edits without burning
iterations on the wrong model. Calls runcomfy run <vendor>/<model>/edit
through the local RunComfy CLI. Triggers on "image edit", "edit image",
"image-to-image", "i2i", "swap background", "remove object",
"rewrite headline", or any explicit ask to edit a single or batch
of images.
homepage: https://www.runcomfy.com
license: MIT
Image Edit — Pro Pack on RunComfy
Image edit, intent-routed. This skill doesn't lock you to one model — it picks the right edit model in the RunComfy catalog based on what the user actually wants: batch identity-preservation, multilingual text rewrite, single-shot precise edit, or mask-driven region replacement.
npx skills add agentspace-so/runcomfy-agent-skills --skill image-edit -g
Pick the right model for the user's intent
| User intent | Model | Why |
|---|---|---|
| Batch edit 1–20 images consistently (SKU gallery, A/B variants) | Nano Banana Edit | Up to 20 input images per call; locked aspect/resolution for series |
| Swap background, preserve subject identity | Nano Banana Edit | Strong identity preservation under "keep X unchanged" prompts |
| Localized object removal / addition with spatial language ("the left object", "upper-right corner") | Nano Banana Edit | Honors directional spatial scope |
| Multilingual / non-Latin in-image text rewrite (Japanese kana, Cyrillic, Arabic) | GPT Image 2 Edit | Strongest in class for multilingual typography |
| Multi-reference composition (subject from img1, scene from img2, palette from img3) | GPT Image 2 Edit | Numbered refs route cues correctly |
| Layout-precise repositioning ("move headline from top-right to bottom-center") | GPT Image 2 Edit | Directional language honored at layout level |
| Identity preservation across translated headline variants | GPT Image 2 Edit | Same source asset → many language variants, identity stable |
| Single-shot precise local edit ("she's now holding an orange umbrella") | Flux Kontext Pro | Single-ref single-instruction, high-fidelity preservation |
| Mask-driven object removal (cables, watermarks, distractions) | Z-Image Turbo Inpaint | Mask-required, strength-tunable, edge-consistent |
| Mask-driven region replacement (full background swap with mask) | Z-Image Turbo Inpaint | High strength + clean mask = clean replacement |
| Default if unspecified | Nano Banana Edit | Most flexible, supports both single and batch |
The agent reads this table, classifies the user's intent, and picks the matching subsection below.
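The routing table can be read as a plain lookup from intent to model ID. A minimal Python sketch, assuming hypothetical intent labels (`batch`, `text_rewrite`, and so on) that are not part of the skill; the model IDs are the ones named in the route sections of this document, and anything unrecognized falls back to the stated default.

```python
# Model IDs as listed in this document's route sections.
NANO_BANANA = "google/nano-banana-2/edit"
GPT_IMAGE_2 = "openai/gpt-image-2/edit"
FLUX_KONTEXT = "blackforestlabs/flux-1-kontext/pro/edit"
Z_IMAGE_INPAINT = "tongyi-mai/z-image/turbo/inpainting"

# Hypothetical intent labels (illustration only) mapped to models.
ROUTES = {
    "batch": NANO_BANANA,
    "background_swap": NANO_BANANA,
    "spatial_edit": NANO_BANANA,
    "text_rewrite": GPT_IMAGE_2,
    "multi_ref": GPT_IMAGE_2,
    "layout_move": GPT_IMAGE_2,
    "precise_local": FLUX_KONTEXT,
    "mask_removal": Z_IMAGE_INPAINT,
    "mask_replace": Z_IMAGE_INPAINT,
}


def pick_model(intent):
    """Return the model ID for an intent label; default to Nano Banana Edit."""
    return ROUTES.get(intent, NANO_BANANA)
```

The real classification step is done by the agent from natural language; this only shows the final table lookup.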
Prerequisites
- RunComfy CLI: npm i -g @runcomfy/cli
- RunComfy account: runcomfy login
- CI / containers: set RUNCOMFY_TOKEN=<token>
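A pre-flight check can catch missing prerequisites before any model call. The sketch below is an assumption-laden illustration: the helper name `missing_prerequisites` is hypothetical, and the token file path is the one this document gives under Security & Privacy. It only checks that the CLI is on PATH and that some credential source exists; it does not verify that the token is valid.

```python
import os
import shutil
from pathlib import Path

# Token location as described in this document (written by `runcomfy login`).
TOKEN_FILE = Path(os.path.expanduser("~/.config/runcomfy/token.json"))


def missing_prerequisites(env=None, token_file=TOKEN_FILE, which=shutil.which):
    """Return a list of problems; an empty list means the checks passed."""
    env = os.environ if env is None else env
    problems = []
    if which("runcomfy") is None:
        problems.append("runcomfy CLI not on PATH (npm i -g @runcomfy/cli)")
    # Either the environment variable or the login token file is enough.
    if not env.get("RUNCOMFY_TOKEN") and not Path(token_file).is_file():
        problems.append("no credentials (runcomfy login or set RUNCOMFY_TOKEN)")
    return problems
```

The `env`, `token_file`, and `which` parameters exist only so the check can be exercised without touching the real environment.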
Route 1: Nano Banana Edit — default for general edit + batch
Model: google/nano-banana-2/edit
Schema
| Field | Type | Required | Default | Notes |
|---|---|---|---|---|
| prompt | string | yes | - | Lead with preservation goals, end with the change. |
| image_urls | array | yes | - | 1–20 publicly-fetchable HTTPS URLs. |
| number_of_images | int | no | 1 | 1–4 outputs per call. |
| aspect_ratio | enum | no | auto | auto follows input; lock for batch consistency. |
| resolution | enum | no | 1K | 0.5K / 1K / 2K / 4K. |
| output_format | enum | no | png | png / jpeg / webp. |
| seed | int | no | - | Reproducibility. |
| enable_web_search | bool | no | false | Web-grounded edits (extra latency). |
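Checking a request body against this schema locally avoids a round trip that would end in a schema-mismatch error. A minimal Python sketch, assuming a hypothetical helper `validate_nano_banana`; it covers only the constraints stated in the table (URL count, HTTPS, output count, resolution and format values) and is not the CLI's own validation.

```python
RESOLUTIONS = {"0.5K", "1K", "2K", "4K"}
OUTPUT_FORMATS = {"png", "jpeg", "webp"}


def validate_nano_banana(payload):
    """Return a list of schema problems; an empty list means the body looks valid."""
    errors = []
    prompt = payload.get("prompt")
    if not isinstance(prompt, str) or not prompt.strip():
        errors.append("prompt: required non-empty string")
    urls = payload.get("image_urls")
    if not isinstance(urls, list) or not 1 <= len(urls) <= 20:
        errors.append("image_urls: required array of 1-20 URLs")
    elif not all(isinstance(u, str) and u.startswith("https://") for u in urls):
        errors.append("image_urls: every entry must be an HTTPS URL")
    n = payload.get("number_of_images", 1)
    # bool is a subclass of int in Python, so exclude it explicitly.
    if isinstance(n, bool) or not isinstance(n, int) or not 1 <= n <= 4:
        errors.append("number_of_images: integer from 1 to 4")
    if payload.get("resolution", "1K") not in RESOLUTIONS:
        errors.append("resolution: one of 0.5K / 1K / 2K / 4K")
    if payload.get("output_format", "png") not in OUTPUT_FORMATS:
        errors.append("output_format: one of png / jpeg / webp")
    return errors
```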
Invoke
runcomfy run google/nano-banana-2/edit \
--input '{
"prompt": "Keep the subject identity, pose, and clothing unchanged. Convert the background into a rainy neon cyberpunk street.",
"image_urls": ["https://.../portrait.jpg"]
}' \
--output-dir <absolute/path>
Batch (lock aspect + resolution):
runcomfy run google/nano-banana-2/edit \
--input '{
"prompt": "Replace the watermark in the bottom-right with the text \"AURA\" in clean white sans-serif. Keep everything else exactly as in the input.",
"image_urls": ["https://.../sku-1.jpg", "https://.../sku-2.jpg", "https://.../sku-3.jpg"],
"aspect_ratio": "1:1",
"resolution": "1K"
}' \
--output-dir <absolute/path>
Prompting tips
- Preservation first: "Keep [identity / pose / brand / framing] unchanged." Then state the change.
- Spatial scope: "background only", "the left object", "upper-right quadrant". Concrete locations are honored.
- Batch consistency: lock aspect_ratio and resolution across the batch.
- Iterate small: split compound edits into multiple shorter passes.
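These tips translate into a small amount of string and dictionary handling. The following Python sketch uses hypothetical helper names (`build_edit_prompt`, `split_passes`, `batch_payload`); the "1:1" and "1K" defaults are just the values from the batch example in this document, not a recommendation from the model vendor.

```python
def build_edit_prompt(keep, change):
    """Preservation goals first, then the single change."""
    kept = ", ".join(keep)
    return f"Keep the {kept} unchanged. {change.strip()}"


def split_passes(keep, changes):
    """One prompt per change, so a compound edit runs as several short passes."""
    return [build_edit_prompt(keep, change) for change in changes]


def batch_payload(prompt, image_urls, aspect_ratio="1:1", resolution="1K"):
    """Body for a batch call with aspect ratio and resolution locked."""
    return {
        "prompt": prompt,
        "image_urls": list(image_urls),
        "aspect_ratio": aspect_ratio,
        "resolution": resolution,
    }
```

Each prompt returned by `split_passes` would be sent as its own call, with the output of one pass uploaded and used as the input of the next.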
Route 2: GPT Image 2 Edit — multilingual text + multi-ref composition
Model: openai/gpt-image-2/edit
Schema
| Field | Type | Required | Default | Notes |
|---|---|---|---|---|
| prompt | string | yes | - | Edit instruction; lead with preservation. |
| images | string[] | yes | - | Up to 10 HTTPS URLs. First is primary; rest are auxiliary. |
| size | enum | no | auto | auto, 1024_1024, 1024_1536, 1536_1024. Only these. |
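A multi-reference call only works if the numbers used in the prompt match the order of the images array. A minimal Python sketch, assuming a hypothetical helper `multi_ref_payload` (not part of the skill or the CLI) that numbers references in array order and enforces the two limits from the table: at most 10 URLs and one of the four size values.

```python
SIZES = {"auto", "1024_1024", "1024_1536", "1536_1024"}


def multi_ref_payload(refs, instruction, size="auto"):
    """refs is an ordered list of (role, url); list position is the image number."""
    if not 1 <= len(refs) <= 10:
        raise ValueError("GPT Image 2 Edit takes 1 to 10 image URLs")
    if size not in SIZES:
        raise ValueError(f"unsupported size: {size}")
    cues = ", ".join(
        f"{role} from image {number}"
        for number, (role, _url) in enumerate(refs, start=1)
    )
    return {
        "prompt": f"Use {cues}. {instruction}",
        "images": [url for _role, url in refs],
        "size": size,
    }
```

Because the first URL is treated as primary, the caller should put the image whose identity must be preserved first in `refs`.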
Invoke
Multilingual text rewrite:
runcomfy run openai/gpt-image-2/edit \
--input '{
"prompt": "Keep the photograph, layout, and brand mark exactly as in the input. Replace only the in-image headline. The new headline reads \"今日のおすすめ\" in bold Japanese kana, same position and font weight.",
"images": ["https://.../poster-en.jpg"]
}' \
--output-dir <absolute/path>
Multi-ref composition:
runcomfy run openai/gpt-image-2/edit \
--input '{
"prompt": "Compose subject from image 1 into the room from image 2. Match the lighting and color palette of image 2. Keep image 1 subject identity unchanged.",
"images": ["https://.../subject.jpg", "https://.../room.jpg"]
}' \
--output-dir <absolute/path>
Prompting tips
- Quote in-image text exactly. Name the script for non-Latin:
"Japanese kana","Cyrillic","Arabic right-to-left". - Number multi-refs:
"subject from image 1, lighting from image 2". - Directional layout language:
"move the headline from top-right to bottom-center","replace the watermark in the bottom-right". size: "auto"preserves input ratio — recommended unless the edit changes framing.
Route 3: Flux Kontext Pro — single-shot precise local edit
Model: blackforestlabs/flux-1-kontext/pro/edit
Schema (minimal)
| Field | Type | Required | Notes |
|---|---|---|---|
| prompt | string | yes | One declarative edit instruction. |
| image | string | yes | Single source image URL. |
| aspect_ratio | enum | no | Pick from supported W:H values. |
| seed | int | no | Reproducibility. |
Single image only — no array. For multi-image flows, use Route 1 (Nano Banana Edit).
Invoke
runcomfy run blackforestlabs/flux-1-kontext/pro/edit \
--input '{
"prompt": "Keep the person'\''s face, pose, and clothing unchanged. Add an orange umbrella in her left hand and a slight smile.",
"image": "https://.../portrait.jpg"
}' \
--output-dir <absolute/path>
Prompting tips
- One declarative instruction. "She is now holding an orange umbrella and smiling" describes the end state in a single sentence.
- Preservation first. Lead with "Keep [unchanged elements]", then state the change.
- Iterate small. Compound edits drift on a single pass; split them into sequential passes.
Route 4: Z-Image Turbo Inpaint — mask-driven precise region edit
Model: tongyi-mai/z-image/turbo/inpainting
Schema
| Field | Type | Required | Notes |
|---|---|---|---|
| prompt | string | yes | What to fill / replace; preservation constraints for the unmasked surround. |
| image | string | yes | Source image URL. |
| mask_image | string | yes | Grayscale mask URL (white = inpaint, black = preserve). |
| strength | float | no | 0.3–0.6 retouching, 0.7–1.0 full replacement. |
| control_scale | float | no | 0.6–0.9 typical. |
| aspect_ratio | enum | no | W:H output ratio. |
| seed | int | no | Reproducibility. |
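The strength guidance in the table can be encoded so an out-of-range value is rejected before the request is sent. This is a sketch under assumptions: the helper name `inpaint_payload` and the two kind labels are hypothetical, the ranges come from the Notes column above, and the defaults 0.5 and 0.9 are simply the values used in this document's two example calls.

```python
# (low, high, default) per edit kind. Ranges follow the schema notes;
# the defaults mirror the example calls and are an assumption, not a CLI rule.
STRENGTH = {
    "retouch": (0.3, 0.6, 0.5),
    "replace": (0.7, 1.0, 0.9),
}


def inpaint_payload(kind, prompt, image, mask_image, strength=None):
    """Build a Z-Image Turbo Inpaint body, validating mask and strength."""
    if not mask_image:
        raise ValueError("mask_image is required for inpainting")
    low, high, default = STRENGTH[kind]
    value = default if strength is None else strength
    if not low <= value <= high:
        raise ValueError(f"strength {value} outside {low}-{high} for {kind}")
    return {
        "prompt": prompt,
        "image": image,
        "mask_image": mask_image,
        "strength": value,
    }
```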
Invoke
Object removal (low strength):
runcomfy run tongyi-mai/z-image/turbo/inpainting \
--input '{
"prompt": "Remove overhead cables; preserve rooflines and sky gradient; thin clean sky.",
"image": "https://.../street.jpg",
"mask_image": "https://.../cables-mask.png",
"strength": 0.5,
"control_scale": 0.8
}' \
--output-dir <absolute/path>
Region replacement (high strength):
runcomfy run tongyi-mai/z-image/turbo/inpainting \
--input '{
"prompt": "Replace busy backdrop with smooth light gray studio paper; mask background only.",
"image": "https://.../product.jpg",
"mask_image": "https://.../bg-mask.png",
"strength": 0.9
}' \
--output-dir <absolute/path>
Prompting tips
- A mask URL is required — grayscale, white = inpaint region, black = preserve. Slight blur on mask edges (1–3px) blends better than sharp binary.
- Strength by intent: 0.3–0.5 for retouching / cleanup, 0.6–0.7 for object replacement with style match, 0.8–1.0 for full-region replacement.
- Name what stays outside the mask in the prompt: "preserve rooflines and sky gradient", "match brick pattern and mortar tone".
- Spatial labels still help even though the mask defines the region: "the left shelf", "upper-right quadrant".
Limitations
- Each route inherits its model's limits. Nano Banana: 1–20 inputs, 1–4 outputs. GPT Image 2 Edit: up to 10 refs, 4 fixed sizes. Flux Kontext: single ref. Z-Image Inpaint: mask required.
- No multi-route blending. This skill picks one model per call.
- Brand-specific overrides: if the user named a specific model, route to the corresponding brand skill (gpt-image-edit, flux-kontext, nano-banana-edit) for fuller treatment.
Exit codes
| code | meaning |
|---|---|
| 0 | success |
| 64 | bad CLI args |
| 65 | bad input JSON / schema mismatch |
| 69 | upstream 5xx |
| 75 | retryable: timeout / 429 |
| 77 | not signed in or token rejected |
Full reference: docs.runcomfy.com/cli/troubleshooting.
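Of the codes in the table, only 75 is marked retryable, so a wrapper can retry on that code alone and surface everything else immediately. A minimal Python sketch, assuming a hypothetical helper `run_with_retry`; the attempt count and backoff delays are illustrative choices, not values documented for the CLI.

```python
import subprocess
import time

EXIT_MEANINGS = {
    0: "success",
    64: "bad CLI args",
    65: "bad input JSON / schema mismatch",
    69: "upstream 5xx",
    75: "retryable: timeout / 429",
    77: "not signed in or token rejected",
}
RETRYABLE = {75}


def run_with_retry(argv, attempts=3, base_delay=2.0,
                   run=subprocess.run, sleep=time.sleep):
    """Run argv, retrying only on exit code 75 with exponential backoff."""
    code = None
    for attempt in range(attempts):
        code = run(argv).returncode
        if code not in RETRYABLE:
            break
        if attempt < attempts - 1:
            sleep(base_delay * 2 ** attempt)  # 2s, 4s, ... between attempts
    return code, EXIT_MEANINGS.get(code, "unknown exit code")
```

The `run` and `sleep` parameters are injection points so the retry logic can be checked without invoking the CLI or waiting.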
How it works
The skill picks one of Nano Banana Edit / GPT Image 2 Edit / Flux Kontext Pro / Z-Image Turbo Inpaint based on user intent and invokes runcomfy run <model_id> with the matching JSON body. The CLI POSTs to the Model API, polls the request, fetches the result, and downloads any .runcomfy.net/.runcomfy.com URL into --output-dir. Ctrl-C cancels the remote request before exit.
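From a script, the same invocation can be assembled as an argument list. The sketch below assumes hypothetical helpers `build_argv` and `run_edit`; it uses only the `runcomfy run <model_id> --input <json> --output-dir <dir>` form shown in this document and passes the JSON body as a single argument, so no shell ever parses the prompt.

```python
import json
import subprocess


def build_argv(model_id, body, output_dir):
    """Argument list for `runcomfy run`; the JSON body is one argv entry."""
    return [
        "runcomfy", "run", model_id,
        "--input", json.dumps(body, ensure_ascii=False),
        "--output-dir", output_dir,
    ]


def run_edit(model_id, body, output_dir):
    """Invoke the CLI and return its exit code."""
    # A list without shell=True means quotes in the prompt need no escaping.
    return subprocess.run(build_argv(model_id, body, output_dir)).returncode
```

`ensure_ascii=False` keeps non-Latin text such as a Japanese headline readable in the argument instead of turning it into escape sequences; both forms are valid JSON.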
Security & Privacy
- Token storage: runcomfy login writes the API token to ~/.config/runcomfy/token.json with mode 0600 (owner-only read/write). Set the RUNCOMFY_TOKEN env var to bypass the file entirely in CI / containers.
- Input boundary: the user prompt is passed as a JSON string to the CLI via --input. The CLI does NOT shell-expand the prompt; it transmits the JSON body directly to the Model API over HTTPS. No shell injection surface from prompt content.
- Third-party content: image / mask / video URLs you pass are fetched by the RunComfy model server, not by the CLI on your machine. Treat external URLs as untrusted; image-based prompt injection is a known risk for any image-edit / video-edit model.
- Outbound endpoints: only model-api.runcomfy.net (request submission) and *.runcomfy.net / *.runcomfy.com (download whitelist for generated outputs). No telemetry, no callbacks.
- Generated-file size cap: the CLI aborts any single download > 2 GiB to prevent disk-fill from a malicious or runaway model output.
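The token-storage claim can be spot-checked on a POSIX system by inspecting the file's permission bits. A minimal Python sketch, assuming a hypothetical helper `token_file_is_private`; it reports whether group and other have no access at all, which is true for mode 0600.

```python
import os
import stat


def token_file_is_private(path):
    """True if group and other have no permission bits set (e.g. mode 0600)."""
    mode = stat.S_IMODE(os.stat(path).st_mode)
    return (mode & (stat.S_IRWXG | stat.S_IRWXO)) == 0
```

For the login token, the path to pass would be the expanded form of ~/.config/runcomfy/token.json.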
Related skills
More from agentspace-so/runcomfy-agent-skills and the wider catalog.
- video-edit
- image-to-video
- nano-banana-2
- nano-banana-edit
- flux-kontext
- wan-2-7