gpt-image-2
agentspace-so/runcomfy-agent-skills
Generate and edit images with OpenAI GPT Image 2 on RunComfy—best for embedded text, logos, and directive precision.
What is gpt-image-2?
GPT Image 2 (ChatGPT Images 2.0) hosted on RunComfy's Model API delivers reliable text rendering, multilingual typography, and layout precision. Use it when embedded text, brand assets, product mockups, or iterative refinement with stable composition matter more than stylization.
- Generate images from text prompts with three fixed sizes (1:1, 2:3 portrait, 3:2 landscape)
- Edit images with natural-language instructions while preserving composition and identity
- Render embedded text, logos, and multilingual typography reliably
- Accept up to 10 reference images for multi-image guided edits
- Invoke via local RunComfy CLI with async REST polling
How to install gpt-image-2
npx skills add https://github.com/agentspace-so/runcomfy-agent-skills --skill gpt-image-2
- RunComfy CLI: npm i -g @runcomfy/cli
- RunComfy account via runcomfy login (or RUNCOMFY_TOKEN env var for CI)
How to use gpt-image-2
1. Install the RunComfy CLI and authenticate with runcomfy login
2. For text-to-image: call runcomfy run openai/gpt-image-2/text-to-image with a prompt and an optional size (1024_1024, 1024_1536, or 1536_1024)
3. For edits: call runcomfy run openai/gpt-image-2/edit with an edit instruction and up to 10 reference image URLs
4. Specify --output-dir for downloaded results; the CLI polls every 2s until complete
5. Use --output json and --no-wait for pipe-friendly integration
Use cases
- E-commerce product photography with accurate label text and brand-safe lighting
- Localized brand assets—one source image to multiple language variants
- Ad creative with integrated headlines and visual elements
- Signage, posters, and packaging mockups with legible text at scale
- UI mockups and scientific illustrations requiring layout precision
- Product and e-commerce teams
- Marketing and creative directors
- Brand and localization specialists
- UI/UX designers and mockup creators
- Anyone needing text-accurate image generation
gpt-image-2 FAQ
Use GPT Image 2 for embedded text, logos, multilingual typography, and directive precision. Use Flux 2 for heavy stylization and painterly looks, Nano Banana Pro for hyperrealistic portraits, and Seedream 5 for cinematic/aesthetic-first hero shots.
Only three fixed sizes: 1024×1024 (1:1 square), 1024×1536 (2:3 portrait), and 1536×1024 (3:2 landscape). Extreme aspect ratios are auto-resized to the nearest supported size.
Quote the exact text in your prompt and keep it short. For multilingual text, name the script (e.g., 'Japanese kana', 'Cyrillic', 'Arabic right-to-left'). GPT Image 2 is the strongest text-rendering model in its class when you provide literal characters.
Yes, the edit endpoint accepts up to 10 reference image URLs. Number them in your prompt (e.g., 'subject from image 1, lighting from image 2') so the model routes cues correctly.
Change one attribute per iteration (lighting OR background OR pose OR text) and keep the rest of the prompt verbatim. The model holds composition stable when only one knob moves.
Full instructions (SKILL.md)
Source of truth, from agentspace-so/runcomfy-agent-skills.
name: gpt-image-2
displayName: "GPT Image 2 — Pro Pack on RunComfy"
description: >
  Generate and edit images with OpenAI GPT Image 2 (ChatGPT Images 2.0)
  on RunComfy. Documents GPT Image 2's strengths (embedded text, logos,
  multilingual typography, instruction precision), its 3 fixed sizes,
  edit-with-preservation language, and when to route to a sibling
  (Flux 2 / Nano Banana Pro / Seedream) instead. Calls
  runcomfy run openai/gpt-image-2/text-to-image or /edit through the local
  RunComfy CLI. Triggers on "gpt image 2", "gpt-image-2", "ChatGPT
  Images 2", "image 2", or any explicit ask to generate or edit with
  this model.
homepage: https://www.runcomfy.com
license: MIT
GPT Image 2 — Pro Pack on RunComfy
OpenAI GPT Image 2 (ChatGPT Images 2.0) hosted on the RunComfy Model API — no OpenAI key, async REST.
npx skills add agentspace-so/runcomfy-agent-skills --skill gpt-image-2 -g
When to pick this model (vs siblings)
GPT Image 2's distinct strength is directive precision: it follows multi-element prompts, layout cues, and embedded-text instructions more reliably than its peers. Pick it when what's on the canvas matters more than how stylized it looks.
| You want | Use |
|---|---|
| Embedded text, logos, signage, multilingual typography | GPT Image 2 |
| Brand-safe, e-commerce / ad / UI mockup imagery | GPT Image 2 |
| Iterative refinement that holds composition stable | GPT Image 2 |
| Heavy stylization, painterly look | Flux 2 |
| Hyperrealistic portrait | Nano Banana Pro |
| Cinematic / aesthetic-first hero shots | Seedream 5 |
If the user explicitly asked for GPT Image 2 / ChatGPT Image 2 / Image 2, route here regardless — don't second-guess the model choice.
Prerequisites
- RunComfy CLI: `npm i -g @runcomfy/cli`
- RunComfy account: `runcomfy login` opens a browser device-code flow.
- CI / containers: set `RUNCOMFY_TOKEN=<token>` instead of `runcomfy login`.
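A preflight check along the following lines can fail fast before any generation call. This is a minimal sketch, not part of the skill: `check_prerequisites` is a hypothetical helper name, and it only looks at the PATH, the `RUNCOMFY_TOKEN` variable, and the token file location this document gives for `runcomfy login`. It does not verify that the token is still valid.

```python
import os
import shutil
from pathlib import Path
from typing import Callable, Mapping, Optional

# Token file location as documented for `runcomfy login`.
TOKEN_FILE = Path.home() / ".config" / "runcomfy" / "token.json"


def check_prerequisites(
    env: Mapping[str, str] = os.environ,
    which: Callable[[str], Optional[str]] = shutil.which,
    token_file: Path = TOKEN_FILE,
) -> list[str]:
    """Return a list of human-readable problems; an empty list means ready."""
    problems = []
    if which("runcomfy") is None:
        problems.append("runcomfy CLI not found on PATH (npm i -g @runcomfy/cli)")
    # Either the env var (CI / containers) or the login token file is enough.
    if not env.get("RUNCOMFY_TOKEN") and not token_file.exists():
        problems.append("not authenticated (run `runcomfy login` or set RUNCOMFY_TOKEN)")
    return problems
```

The `env`, `which`, and `token_file` parameters exist only so the check can be exercised without touching the real environment.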
Endpoints + input schema
Two endpoints, same model.
openai/gpt-image-2/text-to-image
| Field | Type | Required | Default | Notes |
|---|---|---|---|---|
| `prompt` | string | yes | — | The positive prompt |
| `size` | enum | no | `1024_1024` | `1024_1024` (1:1), `1024_1536` (2:3 portrait), `1536_1024` (3:2 landscape); only these three |
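Because `size` accepts only three values, a caller that starts from arbitrary target dimensions has to pick one of them. The sketch below chooses the supported size with the closest aspect ratio on the client side. `nearest_size` is a hypothetical helper, and the log-ratio comparison is an assumption of this example; the document does not say how the service itself maps unsupported ratios.

```python
import math

# The three sizes the text-to-image endpoint accepts, as (width, height).
SUPPORTED_SIZES = {
    "1024_1024": (1024, 1024),
    "1024_1536": (1024, 1536),
    "1536_1024": (1536, 1024),
}


def nearest_size(width: int, height: int) -> str:
    """Pick the supported size whose aspect ratio is closest to width:height."""
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive")
    target = math.log(width / height)
    # Compare in log space so 2:1 and 1:2 are treated symmetrically.
    return min(
        SUPPORTED_SIZES,
        key=lambda name: abs(
            math.log(SUPPORTED_SIZES[name][0] / SUPPORTED_SIZES[name][1]) - target
        ),
    )


print(nearest_size(1920, 1080))  # 1536_1024
print(nearest_size(1080, 1920))  # 1024_1536
```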
openai/gpt-image-2/edit
| Field | Type | Required | Default | Notes |
|---|---|---|---|---|
| `prompt` | string | yes | — | Natural-language edit instruction |
| `images` | string[] | yes | — | Up to 10 reference image URLs (publicly fetchable HTTPS) |
| `size` | enum | no | `auto` | `auto` (preserve input ratio), or one of the three fixed sizes above |
size=auto on edit preserves the input aspect ratio — strongly recommended unless the edit explicitly changes framing.
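Validating the edit input locally catches schema mistakes before a request is submitted. The following is a sketch under stated assumptions: `build_edit_input` is a hypothetical helper, and the lower bound of one image is inferred from `images` being required rather than stated in the schema.

```python
import json
from urllib.parse import urlparse

EDIT_SIZES = {"auto", "1024_1024", "1024_1536", "1536_1024"}
MAX_REFERENCE_IMAGES = 10


def build_edit_input(prompt: str, images: list[str], size: str = "auto") -> str:
    """Validate the edit fields locally and return the JSON string for --input."""
    if not prompt.strip():
        raise ValueError("prompt is required")
    # ASSUMPTION: at least one image, since the field is marked required.
    if not 1 <= len(images) <= MAX_REFERENCE_IMAGES:
        raise ValueError(f"images must contain 1 to {MAX_REFERENCE_IMAGES} URLs")
    for url in images:
        parsed = urlparse(url)
        if parsed.scheme != "https" or not parsed.netloc:
            raise ValueError(f"not an HTTPS URL: {url}")
    if size not in EDIT_SIZES:
        raise ValueError(f"unsupported size: {size}")
    return json.dumps(
        {"prompt": prompt, "images": images, "size": size}, ensure_ascii=False
    )
```

The check cannot confirm that a URL is publicly fetchable; it only rejects values that are obviously not HTTPS URLs.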
How to invoke
Text-to-image:
runcomfy run openai/gpt-image-2/text-to-image \
--input '{"prompt": "<user prompt>", "size": "1024_1536"}' \
--output-dir <absolute/path>
Edit (single ref):
runcomfy run openai/gpt-image-2/edit \
--input '{
"prompt": "<edit instruction>",
"images": ["https://..."]
}' \
--output-dir <absolute/path>
Edit (multi-ref, up to 10):
runcomfy run openai/gpt-image-2/edit \
--input '{
"prompt": "compose subject from image 1 into the room from image 2; match the lighting of image 2",
"images": ["https://...subject.jpg", "https://...room.jpg"]
}' \
--output-dir <absolute/path>
The CLI submits, polls every 2s until terminal, then downloads any *.runcomfy.net / *.runcomfy.com URL from the result into --output-dir. Stdout is the result JSON. Stderr is progress.
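When the call is made from a script rather than a terminal, the same invocation can be wrapped without a shell. The sketch below assumes only what this section states: the `run` subcommand, the `--input` and `--output-dir` flags, and result JSON on stdout. The function names are hypothetical.

```python
import json
import subprocess
from pathlib import Path


def build_run_command(endpoint: str, payload: dict, output_dir: str) -> list[str]:
    """Build the argv for `runcomfy run` without going through a shell."""
    if endpoint not in ("text-to-image", "edit"):
        raise ValueError(f"unknown endpoint: {endpoint}")
    return [
        "runcomfy", "run", f"openai/gpt-image-2/{endpoint}",
        # json.dumps handles quotes and non-ASCII text inside the prompt.
        "--input", json.dumps(payload, ensure_ascii=False),
        "--output-dir", str(Path(output_dir).resolve()),
    ]


def run(endpoint: str, payload: dict, output_dir: str) -> dict:
    """Run the CLI and parse the result JSON it prints on stdout."""
    # stderr is not captured, so CLI progress stays visible to the user.
    proc = subprocess.run(
        build_run_command(endpoint, payload, output_dir),
        stdout=subprocess.PIPE, text=True, check=True,
    )
    return json.loads(proc.stdout)
```

Passing the arguments as a list keeps prompt text out of shell parsing entirely, and `check=True` surfaces a non-zero exit code as `subprocess.CalledProcessError`.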
For pipe-friendly usage:
runcomfy --output json run openai/gpt-image-2/text-to-image \
--input '{"prompt":"..."}' --no-wait | jq -r .request_id
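The same fire-and-forget submission can be done from Python. This sketch mirrors the flag order of the command above and assumes, as the `jq` filter does, that the JSON printed with `--no-wait` has a top-level `request_id` field. The helper names are hypothetical.

```python
import json
import subprocess


def no_wait_command(endpoint: str, payload: dict) -> list[str]:
    """Build the argv for a submit-only call that prints JSON and returns."""
    return [
        "runcomfy", "--output", "json", "run", f"openai/gpt-image-2/{endpoint}",
        "--input", json.dumps(payload, ensure_ascii=False),
        "--no-wait",
    ]


def parse_request_id(stdout: str) -> str:
    """Extract request_id from the CLI's JSON output."""
    data = json.loads(stdout)
    request_id = data.get("request_id") if isinstance(data, dict) else None
    if not isinstance(request_id, str) or not request_id:
        raise ValueError("no request_id in CLI output")
    return request_id


def submit(endpoint: str, payload: dict) -> str:
    """Submit a request and return its id without waiting for the result."""
    proc = subprocess.run(
        no_wait_command(endpoint, payload),
        stdout=subprocess.PIPE, text=True, check=True,
    )
    return parse_request_id(proc.stdout)
```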
Prompting — what actually works
These are model-specific patterns that empirically improve output quality. Apply to text-to-image and edit alike.
Be explicit on subject + setting + mood. "A close-up of a matte ceramic water bottle on warm linen, soft window light, neutral background" — three concrete directives — beats "nice product photo of a bottle".
Quote embedded text exactly. Keep it short. GPT Image 2 is the strongest text-rendering model in this class, but only when you put the literal characters in quotes. Long blocks of text degrade. For multilingual text, name the script: "Japanese kana", "Cyrillic", "Arabic right-to-left".
Use compositional cues directly. "rule of thirds", "close-up", "aerial view", "centered subject", "shallow depth of field" — these have learned-meaning to the model.
Iterate one attribute at a time. When refining, change one thing per iteration (lighting OR background OR pose OR text) and keep the rest of the prompt verbatim. The model holds composition stable across iterations when only one knob moves.
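One way to enforce the one-change-per-iteration rule is to keep the prompt as named parts and rebuild it in a fixed order. The slot names and the comma-joined format below are illustrative choices for this sketch, not something the model requires.

```python
# Slots follow the "lighting OR background OR pose OR text" wording above;
# "subject" is added so the fixed part of the prompt has a home.
SLOTS = ("subject", "lighting", "background", "pose", "text")


def render_prompt(attrs: dict[str, str]) -> str:
    """Join the slots in a fixed order so unchanged parts stay verbatim."""
    return ", ".join(attrs[slot] for slot in SLOTS if slot in attrs)


def vary_one(attrs: dict[str, str], slot: str, value: str) -> dict[str, str]:
    """Return a copy of attrs with exactly one slot replaced."""
    if slot not in SLOTS:
        raise KeyError(f"unknown slot: {slot}")
    updated = dict(attrs)
    updated[slot] = value
    return updated


base = {
    "subject": "a matte ceramic water bottle on warm linen",
    "lighting": "soft window light",
    "background": "neutral background",
}
second = vary_one(base, "lighting", "hard studio flash")
print(render_prompt(second))
# a matte ceramic water bottle on warm linen, hard studio flash, neutral background
```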
Don't conflict instructions. "no text" + "the word 'AQUA+' on the label" is incoherent — the model will pick one and you don't control which.
Don't pile up styles. "ukiyo-e + watercolor + 8K + cinematic + minimalist" cancels out. Pick one or two style anchors max.
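Several of these rules can be checked mechanically before a prompt is sent. The linter below is a rough sketch: the style keyword list is taken from the example above and is far from complete, the 40-character limit for "short" embedded text is an arbitrary assumption, and only straight double quotes are recognized as embedded text.

```python
import re

# Illustrative style keywords from the example above; extend as needed.
STYLE_ANCHORS = ("ukiyo-e", "watercolor", "8k", "cinematic", "minimalist")
MAX_STYLE_ANCHORS = 2
MAX_QUOTED_CHARS = 40  # ASSUMPTION: arbitrary threshold for "short" text


def lint_prompt(prompt: str) -> list[str]:
    """Return warnings for conflicting text rules, long text, and style pile-ups."""
    warnings = []
    lowered = prompt.lower()
    quoted = re.findall(r'"([^"]+)"', prompt)
    if "no text" in lowered and quoted:
        warnings.append('conflict: "no text" combined with quoted embedded text')
    if any(len(text) > MAX_QUOTED_CHARS for text in quoted):
        warnings.append("embedded text is long; rendering may degrade")
    styles = [style for style in STYLE_ANCHORS if style in lowered]
    if len(styles) > MAX_STYLE_ANCHORS:
        warnings.append("too many style anchors: " + ", ".join(styles))
    return warnings
```

An empty list means none of these heuristics fired; it does not mean the prompt is good.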
For the edit endpoint specifically:
- State preservation goals. "keep the person's pose and face identity unchanged", "keep the brand mark and typography on the package", "keep the overall framing". The model needs to know what NOT to change.
- Use directional language for spatial edits. "Move the headline from top-right to bottom-center", not "reposition the headline".
- Multi-ref: number the images in the prompt — "subject from image 1, lighting and background from image 2" — and the model will route the cues correctly.
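For multi-reference edits, building the prompt and the `images` array from the same list keeps the image numbers in sync with the URL order. This is a minimal sketch with a hypothetical helper name; it assumes image numbers are 1-based and follow array order, which matches the examples in this document.

```python
def compose_multi_ref(refs: list[tuple[str, str]], preserve: str = "") -> dict:
    """Build an edit payload whose prompt numbers each reference image.

    refs is a list of (role, url) pairs in the order the images are sent;
    image numbers in the prompt are 1-based and follow that order.
    """
    if not 1 <= len(refs) <= 10:
        raise ValueError("the edit endpoint takes up to 10 reference images")
    parts = [f"{role} from image {i}" for i, (role, _url) in enumerate(refs, start=1)]
    prompt = ", ".join(parts)
    if preserve:
        # Append the preservation goal so the model knows what not to change.
        prompt += f"; {preserve}"
    return {"prompt": prompt, "images": [url for _role, url in refs]}


payload = compose_multi_ref(
    [
        ("subject", "https://example.com/subject.jpg"),
        ("lighting and background", "https://example.com/room.jpg"),
    ],
    preserve="keep the person's pose and face identity unchanged",
)
print(payload["prompt"])
# subject from image 1, lighting and background from image 2; keep the person's pose and face identity unchanged
```

The `example.com` URLs are placeholders.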
Where it shines
| Use case | Why GPT Image 2 |
|---|---|
| E-commerce product photography | Reliable text on labels, brand-safe lighting, consistent across SKUs |
| High-conversion ads | Headline + visual integration in one pass |
| Brand asset localization | One source asset → many language variants of the same headline |
| Signage, posters, packaging mock-ups | Text rendering accuracy at multiple scales |
| UI mockups, scientific illustrations | Layout precision and label legibility |
Sample prompts (verified to produce strong results)
Text-to-image — product hero:
A minimal hero product still life: a matte ceramic water bottle on warm linen,
soft window light, the word "AQUA+" in clean sans-serif on the label,
subtle rim highlights, e-commerce ready, 8K detail, neutral background
Text-to-image — multilingual signage:
A small Tokyo café storefront at dusk, warm interior glow,
the sign reads "コーヒー" in bold Japanese kana on a wooden plaque,
shallow depth of field, rule of thirds, cinematic
Edit — background swap with preservation:
Turn the background into a bright minimal white-to-soft-gray studio sweep
with gentle floor shadow; add a large headline in-image that reads
"OPEN STUDIO" in a bold clean sans-serif, high contrast, centered;
keep the main person or product, pose, and face identity unchanged
Limitations
- Only 3 fixed sizes on text-to-image (and the same 3 + `auto` on edit). Extreme aspect ratios are auto-resized to the nearest supported one.
- Prompt length is roughly a few thousand tokens. Long blocks of embedded text degrade output.
- Edit's multi-image support is "guidance from up to 10 refs", not ControlNet-style stacks. The first image is treated as the primary; the rest provide auxiliary cues.
- Photorealism on portraits is not its strongest suit — Nano Banana Pro wins that head-to-head.
Exit codes
The runcomfy CLI uses sysexits-style codes:
| code | meaning |
|---|---|
| 0 | success |
| 64 | bad CLI args |
| 65 | bad input JSON / schema mismatch (e.g. size: "2048_2048" would 422) |
| 69 | upstream 5xx |
| 75 | retryable: timeout / 429 |
| 77 | not signed in or token rejected |
Full reference: docs.runcomfy.com/cli/troubleshooting.
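A wrapper script can use these codes to decide whether a retry makes sense. In the sketch below only code 75 is retried, because it is the only one the table marks as retryable. The exponential backoff, the attempt count, and the function name are assumptions of this example, not CLI behavior.

```python
import subprocess
import time

EXIT_MEANINGS = {
    0: "success",
    64: "bad CLI args",
    65: "bad input JSON / schema mismatch",
    69: "upstream 5xx",
    75: "retryable: timeout / 429",
    77: "not signed in or token rejected",
}
RETRYABLE = {75}


def run_with_retry(argv, attempts=3, base_delay_s=2.0,
                   runner=subprocess.run, sleep=time.sleep):
    """Run argv, retrying only on exit code 75 with exponential backoff.

    Returns (exit_code, meaning) for the last attempt.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        code = runner(argv).returncode
        if code not in RETRYABLE or attempt == attempts - 1:
            return code, EXIT_MEANINGS.get(code, "unknown")
        # ASSUMPTION: wait 2s, 4s, 8s, ... between retries.
        sleep(base_delay_s * 2 ** attempt)
```

The `runner` and `sleep` parameters are injectable so the retry logic can be tested without launching the CLI or waiting.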
How it works
- The skill invokes `runcomfy run openai/gpt-image-2/<endpoint>` with a JSON body matching the schema above.
- The CLI POSTs to `https://model-api.runcomfy.net/v1/models/openai/gpt-image-2/<endpoint>` with the user's bearer token.
- The Model API returns a `request_id`; the CLI polls `GET .../requests/<id>/status` every 2 seconds.
- On terminal status, the CLI fetches `GET .../requests/<id>/result` and downloads any URL whose host ends with `.runcomfy.net` or `.runcomfy.com` into `--output-dir`. Other URLs are listed but not fetched.
- `Ctrl-C` while polling sends `POST .../requests/<id>/cancel` so you don't get billed for GPU you stopped.
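The polling step can be pictured as the loop below. It is a sketch of the pattern only, not the CLI's source: the status strings are placeholders because this document does not list the real terminal status names, the poll cap is an assumption, and `get_status` stands in for whatever performs the status request.

```python
import time
from typing import Callable

# ASSUMPTION: placeholder names; the real terminal statuses returned by the
# Model API are not listed in this document.
TERMINAL_STATUSES = frozenset({"completed", "failed", "cancelled"})


def poll_until_terminal(
    get_status: Callable[[], str],
    interval_s: float = 2.0,
    max_polls: int = 300,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Call get_status every interval_s seconds until a terminal status appears."""
    for _ in range(max_polls):
        status = get_status()
        if status in TERMINAL_STATUSES:
            return status
        sleep(interval_s)
    raise TimeoutError(f"no terminal status after {max_polls} polls")
```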
What this skill is not
Not a direct OpenAI API client. Not a capability grant — depends on a working RunComfy account. Not multi-tenant.
Security & Privacy
- Token storage: `runcomfy login` writes the API token to `~/.config/runcomfy/token.json` with mode 0600 (owner-only read/write). Set the `RUNCOMFY_TOKEN` env var to bypass the file entirely in CI / containers.
- Input boundary: the user prompt is passed as a JSON string to the CLI via `--input`. The CLI does NOT shell-expand the prompt; it transmits the JSON body directly to the Model API over HTTPS. No shell injection surface from prompt content.
- Third-party content: image / mask / video URLs you pass are fetched by the RunComfy model server, not by the CLI on your machine. Treat external URLs as untrusted; image-based prompt injection is a known risk for any image-edit / video-edit model.
- Outbound endpoints: only `model-api.runcomfy.net` (request submission) and `*.runcomfy.net` / `*.runcomfy.com` (download whitelist for generated outputs). No telemetry, no callbacks.
- Generated-file size cap: the CLI aborts any single download > 2 GiB to prevent disk-fill from a malicious or runaway model output.
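The download whitelist amounts to a suffix match on the URL's host. The sketch below shows that rule as a standalone predicate; `is_downloadable` is a hypothetical name, and this is an illustration of the documented rule rather than the CLI's own code.

```python
from urllib.parse import urlparse

ALLOWED_SUFFIXES = (".runcomfy.net", ".runcomfy.com")


def is_downloadable(url: str) -> bool:
    """True if the URL's host ends with one of the allowed suffixes."""
    host = (urlparse(url).hostname or "").lower()
    # The leading dot in each suffix rejects look-alikes such as evilruncomfy.com.
    return host.endswith(ALLOWED_SUFFIXES)


print(is_downloadable("https://cdn.runcomfy.net/out/a.png"))       # True
print(is_downloadable("https://runcomfy.net.evil.example/a.png"))  # False
```

Matching on the parsed hostname, rather than on the raw URL string, avoids being fooled by an allowed domain that appears in the path or as a prefix of another domain. With a strict suffix match the bare domain `runcomfy.com` is also rejected, since it does not end with `.runcomfy.com`.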
Related skills
More from agentspace-so/runcomfy-agent-skills and the wider catalog.
video-edit
Intent-routed video editing skill: picks Wan 2.7, Kling 2.6, or Lucy Edit based on what you actually want to do.
image-to-video
Animate still images with the right model for your intent—HappyHorse, Wan, or Seedance on RunComfy.
nano-banana-2
Generate images with Google Nano Banana 2 (Gemini flash-tier) via RunComfy CLI — optimized prompting patterns included.
image-edit
Intent-routed image editing: picks the right model (batch, text rewrite, precise local, or inpaint) based on what you ask.
nano-banana-edit
Edit images with Google Nano Banana 2 on RunComfy — batch up to 20 inputs, preserve identity, swap backgrounds, localize edits.
flux-kontext
Edit images precisely with Flux 1 Kontext Pro via RunComfy CLI — single-reference local edits with strong prompt control