gpt-image-edit
agentspace-so/runcomfy-agent-skills
Edit images with OpenAI GPT Image 2 on RunComfy—preserves identity and handles multilingual text in images.
What is gpt-image-edit?
GPT Image 2 `/edit` endpoint for targeted image-to-image edits on RunComfy. Strongest at preserving identity through edits, rewriting embedded text in any script (Latin, kana, CJK, Cyrillic, Arabic), and layout-precise repositioning. Use when you need multilingual text editing, identity preservation, or multi-reference composition (up to 10 images).
- Edit images with preservation of identity, pose, and brand elements
- Rewrite embedded text in any script (Latin, kana, CJK, Cyrillic, Arabic) while keeping layout intact
- Compose subjects from one image into scenes from up to 9 reference images
- Reposition layout elements (headlines, CTAs) with directional language
- Preserve input aspect ratio or resize to fixed dimensions (1:1, 2:3 portrait, 3:2 landscape)
- Pass up to 10 reference images per call (the first is primary, the rest are auxiliary cues)
How to install gpt-image-edit
`npx skills add https://github.com/agentspace-so/runcomfy-agent-skills --skill gpt-image-edit`
- RunComfy CLI: `npm i -g @runcomfy/cli`
- RunComfy account: `runcomfy login` (or `RUNCOMFY_TOKEN` env var for CI/containers)
- Publicly-fetchable HTTPS URLs for input images
How to use gpt-image-edit
1. Prepare your input image(s) as publicly-accessible HTTPS URLs (up to 10 total)
2. Craft a prompt leading with preservation goals, then state the change
3. For multilingual text, quote the exact characters and name the script (e.g., 'Japanese kana', 'Cyrillic')
4. Run `runcomfy run openai/gpt-image-2/edit --input '{...}' --output-dir <path>`
5. Retrieve the edited image from the output directory
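The steps above can be sketched in Python. This is a minimal sketch, not part of the skill itself: `build_edit_command` is a hypothetical helper name, the URL and output path in the usage note are placeholders, and it assumes the `runcomfy` CLI is installed and signed in. Building an argument list instead of a shell string means apostrophes and quotes inside the prompt need no shell escaping.

```python
import json

MODEL = "openai/gpt-image-2/edit"


def build_edit_command(prompt, images, output_dir):
    """Return the argv list for `runcomfy run` (hypothetical helper)."""
    # The JSON body carries the two required fields: prompt and images.
    payload = json.dumps({"prompt": prompt, "images": images}, ensure_ascii=False)
    return ["runcomfy", "run", MODEL, "--input", payload, "--output-dir", output_dir]
```

Passing the result to `subprocess.run(cmd, check=True)` would then run the edit, for example with `build_edit_command("Keep the person's face unchanged. Replace the background.", ["https://example.com/portrait.jpg"], "/tmp/out")`.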
Use cases
- Multilingual ad localization: one source asset → many language headline variants
- Brand-safe CTA or headline swaps while preserving the rest of the image
- Multi-reference composition: subject from one image, lighting/palette from another
- Layout-precise repositioning: move headline from top-right to bottom-center
- Identity preservation across signage edits for faces and brand marks
- Marketing and creative teams doing ad localization
- Product teams managing multilingual asset variants
- Designers needing layout-precise edits with identity preservation
- E-commerce teams creating localized product imagery
- Agencies handling brand-safe image modifications
gpt-image-edit FAQ
Use GPT Image Edit for multilingual text editing, identity preservation through targeted edits, and layout precision. Use Nano Banana Edit for batch consistency across up to 20 SKU images. Use Flux Kontext for single-shot, precise local edits where source fidelity comes first.
Quote the exact characters and name the script: 'the headline reads "コーヒー" in bold Japanese kana' or 'the label says "АРОМА" in Cyrillic, white on black'. Don't paraphrase—quote the text exactly.
You can pass multiple reference images: up to 10 publicly-fetchable HTTPS URLs. The first is primary; the rest are auxiliary cues. Refer to them by number in your prompt: 'subject from image 1, lighting from image 2, color palette from image 3'.
The `size` field accepts four options: `auto` (preserves the input ratio; recommended), `1024_1024` (1:1), `1024_1536` (2:3 portrait), `1536_1024` (3:2 landscape). Only override `auto` when the edit explicitly changes framing.
Long compound edits (change A and B and C and D) increase drift. Split into multiple passes for better results. Always lead with preservation goals to keep the model focused.
Full instructions (SKILL.md)
Source of truth, from agentspace-so/runcomfy-agent-skills.
name: gpt-image-edit
displayName: "GPT Image Edit — Pro Pack on RunComfy"
description: >
  Edit images with OpenAI GPT Image 2 (the /edit endpoint of ChatGPT
  Images 2.0) on RunComfy, bundled with the model's documented
  prompting patterns so the skill gets sharper output than naive
  prompting against the same model. Documents GPT Image Edit's strengths
  (preservation language, multilingual in-image text editing,
  multi-reference up to 10 images, layout / typography precision),
  the schema, and when to route to Nano Banana Edit / Flux Kontext /
  GPT Image 2 t2i instead. Calls
  runcomfy run openai/gpt-image-2/edit through the local RunComfy CLI.
  Triggers on "gpt image edit", "gpt-image-edit", "chatgpt image edit",
  "edit with gpt image 2", or any explicit ask to edit with this model.
homepage: https://www.runcomfy.com
license: MIT
GPT Image Edit — Pro Pack on RunComfy
OpenAI GPT Image 2 — /edit endpoint (ChatGPT Images 2.0 image-to-image) on the RunComfy Model API. Strongest in its class at preserving identity through targeted edits and rewriting embedded text in any script (Latin, kana, CJK, Cyrillic, Arabic).
npx skills add agentspace-so/runcomfy-agent-skills --skill gpt-image-edit -g
When to pick this model (vs siblings)
| You want | Use |
|---|---|
| Edit multilingual / embedded text in image | GPT Image Edit |
| Identity preservation through translated headline variants | GPT Image Edit |
| Layout-precise edit (move headline, swap CTA, etc.) | GPT Image Edit |
| Up to 10 reference images | GPT Image Edit |
| Batch up to 20 images consistently | Nano Banana Edit |
| Single-shot precise local edit, source-fidelity-first | Flux Kontext |
| Generate from scratch with GPT Image 2 | sibling gpt-image-2 skill |
| Batch SKU galleries with stable identity | Nano Banana Edit |
Prerequisites
- RunComfy CLI: `npm i -g @runcomfy/cli`
- RunComfy account: `runcomfy login` opens a browser device-code flow.
- CI / containers: set `RUNCOMFY_TOKEN=<token>` instead of `runcomfy login`.
Endpoints + input schema
openai/gpt-image-2/edit
| Field | Type | Required | Default | Notes |
|---|---|---|---|---|
| `prompt` | string | yes | n/a | Edit instruction. Lead with preservation, end with the change. |
| `images` | string[] | yes | n/a | Up to 10 publicly-fetchable HTTPS URLs. First is primary; rest are auxiliary. |
| `size` | enum | no | `auto` | `auto` (preserve input), `1024_1024` (1:1), `1024_1536` (2:3 portrait), `1536_1024` (3:2 landscape). |
size=auto preserves the input ratio — strongly recommended unless the edit explicitly changes framing.
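A local pre-flight check against this schema can catch mistakes before a request is sent. The following is a minimal sketch under stated assumptions: `validate_edit_input` is a hypothetical helper, not part of the RunComfy CLI, and it only checks the URL scheme, so it cannot tell whether a URL is actually publicly fetchable.

```python
from urllib.parse import urlparse

ALLOWED_SIZES = {"auto", "1024_1024", "1024_1536", "1536_1024"}
MAX_IMAGES = 10


def validate_edit_input(payload):
    """Return a list of problems found in an edit payload; empty means none found."""
    problems = []
    prompt = payload.get("prompt")
    if not isinstance(prompt, str) or not prompt.strip():
        problems.append("prompt must be a non-empty string")
    images = payload.get("images")
    if not isinstance(images, list) or not images:
        problems.append("images must be a non-empty list of URLs")
    else:
        if len(images) > MAX_IMAGES:
            problems.append(f"images has {len(images)} entries; the limit is {MAX_IMAGES}")
        for url in images:
            # Only the scheme is checked here, not reachability.
            if not isinstance(url, str) or urlparse(url).scheme != "https":
                problems.append(f"not an HTTPS URL: {url!r}")
    size = payload.get("size", "auto")  # size is optional and defaults to auto
    if not isinstance(size, str) or size not in ALLOWED_SIZES:
        problems.append(f"size {size!r} is not one of {sorted(ALLOWED_SIZES)}")
    return problems
```

Returning a list of problems rather than raising on the first one lets a caller report every issue in a single pass.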
How to invoke
Single-ref preservation edit:

    runcomfy run openai/gpt-image-2/edit \
      --input '{
        "prompt": "Keep the person'\''s face, pose, and brand mark unchanged. Replace the background with a soft warm-grey studio sweep and a gentle floor shadow.",
        "images": ["https://.../portrait.jpg"]
      }' \
      --output-dir <absolute/path>
Multilingual text rewrite (preserve everything except the headline):

    runcomfy run openai/gpt-image-2/edit \
      --input '{
        "prompt": "Keep the photograph, layout, and brand mark exactly as in the input. Replace only the in-image headline. The new headline reads \"今日のおすすめ\" in bold Japanese (kanji and hiragana), same position and font weight as before.",
        "images": ["https://.../poster-en.jpg"]
      }' \
      --output-dir <absolute/path>
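For localization, the same preservation prompt can be reused with only the quoted headline and script name swapped. A minimal sketch, assuming a hypothetical helper `localized_payloads` and illustrative locale keys; each payload is a single-purpose edit and would be passed as the `--input` value of one `runcomfy run` call.

```python
import json

BASE_PROMPT = (
    "Keep the photograph, layout, and brand mark exactly as in the input. "
    "Replace only the in-image headline. "
    'The new headline reads "{text}" in bold {script}, '
    "same position and font weight as before."
)


def localized_payloads(image_url, headlines):
    """Yield (locale, JSON string) pairs, one edit payload per locale."""
    for locale, (text, script) in headlines.items():
        prompt = BASE_PROMPT.format(text=text, script=script)
        payload = {"prompt": prompt, "images": [image_url]}
        # ensure_ascii=False keeps the quoted characters readable in logs.
        yield locale, json.dumps(payload, ensure_ascii=False)
```

A call such as `localized_payloads(url, {"ja": ("コーヒー", "Japanese kana"), "ru": ("АРОМА", "Cyrillic")})` would yield two payloads that differ only in the quoted headline and the named script.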
Multi-ref composition:

    runcomfy run openai/gpt-image-2/edit \
      --input '{
        "prompt": "Compose subject from image 1 into the room from image 2. Match the lighting and color palette of image 2. Keep image 1 subject identity (face, pose, clothing) unchanged.",
        "images": ["https://.../subject.jpg", "https://.../room.jpg"]
      }' \
      --output-dir <absolute/path>
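With several references, the numbers used in the prompt have to match the order of the `images` array. One way to keep them in sync is to derive both from a single list. This is a sketch with a hypothetical helper name (`numbered_refs`) and placeholder roles, not something the CLI provides.

```python
def numbered_refs(refs):
    """Build the images array and a cue phrase from (role, url) pairs.

    The first pair becomes image 1, which is the primary image.
    """
    if not 1 <= len(refs) <= 10:
        raise ValueError("expected between 1 and 10 reference images")
    images = [url for _, url in refs]
    # Numbering starts at 1 to match how the prompt refers to images.
    cues = [f"{role} from image {i}" for i, (role, _) in enumerate(refs, start=1)]
    return images, ", ".join(cues)
```

Given pairs for a subject, lighting, and color palette, the cue phrase comes out as "subject from image 1, lighting from image 2, color palette from image 3", ready to be embedded in the prompt.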
Prompting — what actually works
Lead with preservation goals. Always: "Keep [face / pose / clothing / brand / framing] unchanged." Then state the change. The model honors what's stated up front.
Multilingual text — quote the characters, name the script. "the headline reads \"コーヒー\" in bold Japanese kana", "the label says \"АРОМА\" in Cyrillic, white on black", "the right-margin caption reads \"تخفيض\" in Arabic right-to-left". Don't paraphrase — quote.
Directional language for spatial edits. Concrete spatial scopes work: "move the headline from top-right to bottom-center", "remove the leftmost object only", "replace the watermark in the bottom-right corner".
Multi-ref numbering. When passing multiple images, refer to them by number: "subject from image 1, lighting from image 2, color palette from image 3". The model routes cues correctly.
Use size: "auto" to preserve input ratio. Only override when the edit explicitly changes framing (e.g. cropping a 16:9 to 1:1).
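The rule for `size` can be written down as a small decision helper. This is a sketch only: `choose_size` is a hypothetical function, and picking the fixed size with the nearest aspect ratio is an assumption of this example, not documented model behavior.

```python
import math

# Width / height ratio of each fixed size.
FIXED_SIZES = {
    "1024_1024": 1024 / 1024,
    "1024_1536": 1024 / 1536,
    "1536_1024": 1536 / 1024,
}


def choose_size(input_wh, target_wh=None):
    """Return "auto" unless the edit asks for a different framing."""
    if target_wh is None:
        return "auto"
    in_ratio = input_wh[0] / input_wh[1]
    target_ratio = target_wh[0] / target_wh[1]
    if math.isclose(in_ratio, target_ratio, rel_tol=1e-3):
        return "auto"  # framing is unchanged, so keep the input ratio
    # Compare in log space so 2:3 and 3:2 are equally far from 1:1.
    return min(FIXED_SIZES, key=lambda s: abs(math.log(FIXED_SIZES[s] / target_ratio)))
```

For the cropping example, a 16:9 input with a 1:1 target maps to `1024_1024`, while the same input with no target stays on `auto`.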
Anti-patterns:
- Long compound edit instructions ("change A and B and C and D") → drift increases per added scope.
- Missing preservation goals → model subtly rewrites the face / brand / framing.
- Paraphrasing in-image text instead of quoting it → text comes out different.
- Asking for `size` outside the 3 fixed values + `auto` → 422.
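The first two anti-patterns can be avoided mechanically by always leading with the preservation clause and issuing one change per prompt. A minimal sketch, with hypothetical helper names `build_prompt` and `split_passes`:

```python
def build_prompt(preserve, change):
    """Lead with what must stay unchanged, then state one change."""
    if not preserve:
        raise ValueError("name at least one thing to preserve")
    keep = ", ".join(preserve)
    return f"Keep the {keep} unchanged. {change.strip()}"


def split_passes(preserve, changes):
    """One prompt per change, so no single prompt carries a compound edit."""
    return [build_prompt(preserve, change) for change in changes]
```

Each later pass would need the previous pass's output hosted at a publicly-fetchable HTTPS URL before it can be used as input; how that hosting is done is outside this sketch.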
Where it shines
| Use case | Why GPT Image Edit |
|---|---|
| Multilingual ad localization | One source asset → many language variants of the same headline |
| Brand-safe headline / CTA swaps | Layout precision + preservation language hold the rest stable |
| Multi-ref composition (subject from one, scene from another) | Numbered refs route cues correctly |
| Layout-precise repositioning | Directional language ("top-right to bottom-center") honored |
| Identity preservation across signage edits | Strongest in class for face / brand preservation through targeted edits |
Sample prompts (verified to produce strong results)
Background swap with full preservation (page example):
Turn the background into a bright minimal white-to-soft-gray studio
sweep with gentle floor shadow; add a large headline in-image that
reads "OPEN STUDIO" in a bold clean sans-serif, high contrast, centered;
keep the main person or product, pose, and face identity unchanged
Multilingual variant:
Keep the photograph, layout, lighting, and brand mark exactly as in the
input. Replace only the in-image headline.
The new headline reads "コーヒー" in bold Japanese kana, same position
and font weight as before.
Multi-ref composition:
Compose subject from image 1 into the kitchen from image 2.
Match the warm window light and color palette of image 2.
Keep subject identity (face, pose, clothing) from image 1 unchanged.
Limitations
- `size`: 3 fixed values + `auto`; anything else returns a 422.
- `images`: up to 10; first is primary, rest are auxiliary cues.
- Long compound prompts drift; split into multiple passes when needed.
- For batch consistency across many SKU images, Nano Banana Edit (up to 20) is better.
- Photorealism on portraits — Nano Banana Pro wins head-to-head.
Exit codes
| code | meaning |
|---|---|
| 0 | success |
| 64 | bad CLI args |
| 65 | bad input JSON / schema mismatch |
| 69 | upstream 5xx |
| 75 | retryable: timeout / 429 |
| 77 | not signed in or token rejected |
Full reference: docs.runcomfy.com/cli/troubleshooting.
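A wrapper script can act on these codes, retrying only the one marked retryable. The following is a sketch under stated assumptions: `run_with_retry`, the attempt count, and the backoff delays are choices of this example, not CLI behavior. The `runner` and `sleep` parameters exist so the logic can be exercised without invoking the CLI.

```python
import subprocess
import time

RETRYABLE = {75}  # timeout / 429
MEANINGS = {
    0: "success",
    64: "bad CLI args",
    65: "bad input JSON / schema mismatch",
    69: "upstream 5xx",
    75: "retryable: timeout / 429",
    77: "not signed in or token rejected",
}


def run_with_retry(cmd, attempts=3, base_delay=2.0, runner=None, sleep=time.sleep):
    """Run cmd, retrying only on exit code 75; return (exit code, meaning)."""
    runner = runner or (lambda c: subprocess.run(c).returncode)
    code = None
    for attempt in range(attempts):
        code = runner(cmd)
        if code not in RETRYABLE:
            break
        if attempt < attempts - 1:
            # Exponential backoff: base_delay, then twice that, and so on.
            sleep(base_delay * 2 ** attempt)
    return code, MEANINGS.get(code, "unknown exit code")
```

Codes 64, 65, and 77 point at something the caller has to fix (arguments, input JSON, or sign-in), so retrying them unchanged would not help.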
How it works
The skill invokes runcomfy run openai/gpt-image-2/edit with a JSON body matching the schema. The CLI POSTs to https://model-api.runcomfy.net/v1/models/openai/gpt-image-2/edit, polls the request, fetches the result, and downloads any .runcomfy.net/.runcomfy.com URL into --output-dir. Ctrl-C cancels the remote request before exit.
Security & Privacy
- Token storage: `runcomfy login` writes the API token to `~/.config/runcomfy/token.json` with mode 0600 (owner-only read/write). Set the `RUNCOMFY_TOKEN` env var to bypass the file entirely in CI / containers.
- Input boundary: the user prompt is passed as a JSON string to the CLI via `--input`. The CLI does NOT shell-expand the prompt; it transmits the JSON body directly to the Model API over HTTPS. No shell injection surface from prompt content.
- Third-party content: image / mask / video URLs you pass are fetched by the RunComfy model server, not by the CLI on your machine. Treat external URLs as untrusted; image-based prompt injection is a known risk for any image-edit / video-edit model.
- Outbound endpoints: only `model-api.runcomfy.net` (request submission) and `*.runcomfy.net` / `*.runcomfy.com` (download whitelist for generated outputs). No telemetry, no callbacks.
- Generated-file size cap: the CLI aborts any single download > 2 GiB to prevent disk-fill from a malicious or runaway model output.
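The token-storage rules above can be checked from a setup script. This is a minimal sketch for POSIX systems: `token_source` and its return labels are hypothetical, and the precedence (environment variable first, then the token file) follows the description above rather than the CLI's source.

```python
import os
import stat
from pathlib import Path

TOKEN_PATH = Path.home() / ".config" / "runcomfy" / "token.json"


def token_source(env=None, token_path=TOKEN_PATH):
    """Report where a token would come from and flag loose file permissions."""
    env = os.environ if env is None else env
    if env.get("RUNCOMFY_TOKEN"):
        return "env"
    path = Path(token_path)
    if not path.is_file():
        return "missing"
    mode = stat.S_IMODE(path.stat().st_mode)
    # Any group/other permission bit means the file is wider than 0600.
    return "file" if mode & 0o077 == 0 else "file-too-permissive"
```

A result of `"missing"` suggests running `runcomfy login` or setting `RUNCOMFY_TOKEN`; `"file-too-permissive"` suggests tightening the file back to mode 0600.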
Related skills
More from agentspace-so/runcomfy-agent-skills and the wider catalog.
- video-edit
- image-to-video
- nano-banana-2
- image-edit
- nano-banana-edit
- flux-kontext