Audit score 70

baoyu-danger-gemini-web

jimliu/baoyu-skills

Generate images and text via reverse-engineered Gemini Web API with vision and multi-turn support.

What is baoyu-danger-gemini-web?

Provides text and image generation through the reverse-engineered Gemini Web API, without the official SDK. Supports vision input via reference images, multi-turn conversations with session persistence, and multiple model variants. Use it when you need Gemini-powered generation as a backend or when users explicitly request Gemini text/image generation.

Generate text using Gemini models (Pro, Flash, Flash-Thinking, 3.1 Pro preview)
Generate images from text prompts with optional reference images
Accept reference images for vision-based analysis and generation
Maintain multi-turn conversations with session persistence
Output results as JSON or raw text/image files

How to install baoyu-danger-gemini-web

npx skills add https://github.com/jimliu/baoyu-skills --skill baoyu-danger-gemini-web

Prerequisites

bun or npx installed
Google account for browser-based authentication (first run opens auth flow)
Chrome, Chromium, Edge, or Chrome Canary/Beta for cookie-based session management

How to use baoyu-danger-gemini-web

1. Run the skill with a text prompt: `${BUN_X} {baseDir}/scripts/main.ts "Your prompt"`
2. For image generation, add the `--image output.png` flag to save the generated image.
3. For vision input, pass `--reference image.png` to analyze the image or use it as a generation reference.
4. For multi-turn conversations, pass `--sessionId session-name` to persist context across calls.
5. On first use, check the consent file; the disclaimer must be accepted before the reverse-engineered API is used.

Use cases

Good for

Generate images on demand when user requests 'create image with Gemini'
Provide vision capabilities by analyzing reference images alongside generation
Build multi-turn chatbots that remember context across separate invocations
Serve as image generation backend for other skills that need it
Perform text generation when official Gemini API is unavailable

Who it's for

Developers needing Gemini image/text generation without official SDK
Agents requiring vision-capable AI backends
Users wanting multi-turn Gemini conversations with session memory
Projects using reverse-engineered APIs as fallback generation sources

baoyu-danger-gemini-web FAQ

What authentication is required?

First run opens a browser for Google authentication. Cookies are cached automatically. Use `--login` to refresh, or set `GEMINI_WEB_CHROME_PROFILE_DIR` to use a dedicated Chrome profile.

Can I use this for image generation only?

Yes. Use `--prompt "description" --image output.png` to generate and save images without text output.

How do multi-turn conversations work?

Pass `--sessionId session-name` with each call. The skill maintains conversation state in `sessions/<id>.json`, allowing context to persist across separate invocations.

What models are available?

gemini-3-pro (default), gemini-3-flash, gemini-3-flash-thinking, and gemini-3.1-pro-preview. Specify with `--model model-name`.

Is this officially supported by Google?

No. This uses a reverse-engineered Gemini Web API. You must accept a disclaimer on first use acknowledging this is unofficial.

Full instructions (SKILL.md)

Source of truth, from jimliu/baoyu-skills.

name: baoyu-danger-gemini-web
description: Generates images and text via reverse-engineered Gemini Web API. Supports text generation, image generation from prompts, reference images for vision input, and multi-turn conversations. Use when other skills need image generation backend, or when user requests "generate image with Gemini", "Gemini text generation", or needs vision-capable AI generation.
version: 1.56.2
metadata:
  openclaw:
    homepage: https://github.com/JimLiu/baoyu-skills#baoyu-danger-gemini-web
    requires:
      anyBins:
        - bun
        - npx

Gemini Web Client

Text/image generation via Gemini Web API. Supports reference images and multi-turn conversations.

User Input Tools

When this skill prompts the user, follow this tool-selection rule (priority order):

Prefer built-in user-input tools exposed by the current agent runtime — e.g., AskUserQuestion, request_user_input, clarify, ask_user, or any equivalent.
Fallback: if no such tool exists, emit a numbered plain-text message and ask the user to reply with the chosen number/answer for each question.
Batching: if the tool supports multiple questions per call, combine all applicable questions into a single call; if only single-question, ask them one at a time in priority order.

Concrete AskUserQuestion references below are examples — substitute the local equivalent in other runtimes.

Script Directory

Important: All scripts are located in the scripts/ subdirectory of this skill.

Agent Execution Instructions:

1. Determine this SKILL.md file's directory path as {baseDir}
2. Script path = {baseDir}/scripts/<script-name>.ts
3. Resolve ${BUN_X} runtime: if bun installed → bun; if npx available → npx -y bun; else suggest installing bun
4. Replace all {baseDir} and ${BUN_X} in this document with actual values
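
The resolution rule above can be sketched as a small pure function. This is only an illustration: `resolveBunX`, `buildCommand`, and the `Availability` type are hypothetical names, and the skill itself may implement the rule differently. Only the bun → npx fallback order and the script path pattern come from the instructions.

```typescript
// Hypothetical helper names; only the fallback order is taken from the text.
type Availability = { bun: boolean; npx: boolean };

// Returns the runtime prefix for ${BUN_X}, or null when neither tool exists.
function resolveBunX(available: Availability): string[] | null {
  if (available.bun) return ["bun"];
  if (available.npx) return ["npx", "-y", "bun"];
  return null; // the agent should suggest installing bun
}

// Expands {baseDir}/scripts/<script-name>.ts into a full argv.
function buildCommand(
  baseDir: string,
  scriptName: string,
  args: string[],
  available: Availability,
): string[] {
  const runner = resolveBunX(available);
  if (runner === null) {
    throw new Error("Neither bun nor npx is available; install bun first.");
  }
  return [...runner, `${baseDir}/scripts/${scriptName}.ts`, ...args];
}

// Example with a placeholder base directory and only npx available.
console.log(
  buildCommand("/path/to/skill", "main", ["Your prompt"], { bun: false, npx: true }).join(" "),
);
```

Detecting whether `bun` or `npx` is actually installed is left to the caller, which keeps the rule itself easy to test.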

Script Reference:

| Script | Purpose |
|--------|---------|
| `scripts/main.ts` | CLI entry point for text/image generation |
| `scripts/gemini-webapi/*` | TypeScript port of `gemini_webapi` (GeminiClient, types, utils) |

Consent Check (REQUIRED)

Before first use, verify user consent for reverse-engineered API usage.

Consent file locations:

macOS: ~/Library/Application Support/baoyu-skills/gemini-web/consent.json
Linux: ~/.local/share/baoyu-skills/gemini-web/consent.json
Windows: %APPDATA%\baoyu-skills\gemini-web\consent.json

Flow:

1. Check if consent file exists with accepted: true and disclaimerVersion: "1.0"
2. If valid consent exists → print warning with acceptedAt date, proceed
3. If no consent → show disclaimer, ask user via AskUserQuestion:
   - "Yes, I accept" → create consent file with ISO timestamp, proceed
   - "No, I decline" → output decline message, stop

Consent file format: {"version":1,"accepted":true,"acceptedAt":"<ISO>","disclaimerVersion":"1.0"}
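
A minimal sketch of the consent check, assuming the file locations and JSON format listed above. The helper names (`consentPath`, `isValidConsent`, `newConsent`) are hypothetical, and the `AppData/Roaming` fallback used when `%APPDATA%` is unset is an assumption rather than documented behavior.

```typescript
import * as path from "node:path";

// Shape taken from the documented consent file format.
interface Consent {
  version: number;
  accepted: boolean;
  acceptedAt: string; // ISO timestamp
  disclaimerVersion: string;
}

// Maps a Node-style platform name to the documented consent file location.
function consentPath(platform: string, home: string, appData?: string): string {
  const tail = ["baoyu-skills", "gemini-web", "consent.json"];
  if (platform === "darwin") {
    return path.posix.join(home, "Library", "Application Support", ...tail);
  }
  if (platform === "win32") {
    // Assumption: fall back to <home>\AppData\Roaming when %APPDATA% is unset.
    const base = appData ?? path.win32.join(home, "AppData", "Roaming");
    return path.win32.join(base, ...tail);
  }
  return path.posix.join(home, ".local", "share", ...tail);
}

// Valid consent means accepted: true and disclaimerVersion: "1.0".
function isValidConsent(value: unknown): boolean {
  if (typeof value !== "object" || value === null) return false;
  const c = value as Record<string, unknown>;
  return c.accepted === true && c.disclaimerVersion === "1.0";
}

// Record written after the user answers "Yes, I accept".
function newConsent(now: Date): Consent {
  return {
    version: 1,
    accepted: true,
    acceptedAt: now.toISOString(),
    disclaimerVersion: "1.0",
  };
}

console.log(consentPath("linux", "/home/alice"));
console.log(JSON.stringify(newConsent(new Date())));
```

Reading and writing the file, printing the warning, and asking the user are left out; they depend on the agent runtime.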

Preferences (EXTEND.md)

Check EXTEND.md in priority order — the first one found wins:

| Priority | Path | Scope |
|----------|------|-------|
| 1 | `.baoyu-skills/baoyu-danger-gemini-web/EXTEND.md` | Project |
| 2 | `${XDG_CONFIG_HOME:-$HOME/.config}/baoyu-skills/baoyu-danger-gemini-web/EXTEND.md` | XDG |
| 3 | `$HOME/.baoyu-skills/baoyu-danger-gemini-web/EXTEND.md` | User home |

If none found, use defaults.
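
The lookup order can be sketched as follows. The function names are hypothetical, and treating the project-scoped path as relative to the current working directory is an assumption; the three locations and the "first one found wins" rule come from the table.

```typescript
import * as fs from "node:fs";
import * as os from "node:os";
import * as path from "node:path";

const SKILL = "baoyu-danger-gemini-web";

// Returns the three documented locations, highest priority first.
function extendCandidates(projectDir: string, home: string, xdgConfigHome?: string): string[] {
  // Mirrors ${XDG_CONFIG_HOME:-$HOME/.config}: unset or empty falls back.
  const xdg = xdgConfigHome !== undefined && xdgConfigHome !== ""
    ? xdgConfigHome
    : path.join(home, ".config");
  return [
    path.join(projectDir, ".baoyu-skills", SKILL, "EXTEND.md"),
    path.join(xdg, "baoyu-skills", SKILL, "EXTEND.md"),
    path.join(home, ".baoyu-skills", SKILL, "EXTEND.md"),
  ];
}

// First existing path wins; undefined means "use defaults".
function findExtend(candidates: string[], exists: (p: string) => boolean): string | undefined {
  return candidates.find((p) => exists(p));
}

const found = findExtend(
  extendCandidates(process.cwd(), os.homedir(), process.env.XDG_CONFIG_HOME),
  fs.existsSync,
);
console.log(found ?? "no EXTEND.md found, using defaults");
```

Passing the existence check as a parameter keeps the priority logic testable without touching the file system.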

EXTEND.md supports: default model, proxy settings, custom data directory.

Usage

# Text generation
${BUN_X} {baseDir}/scripts/main.ts "Your prompt"
${BUN_X} {baseDir}/scripts/main.ts --prompt "Your prompt" --model gemini-3-flash

# Image generation
${BUN_X} {baseDir}/scripts/main.ts --prompt "A cute cat" --image cat.png
${BUN_X} {baseDir}/scripts/main.ts --promptfiles system.md content.md --image out.png

# Vision input (reference images)
${BUN_X} {baseDir}/scripts/main.ts --prompt "Describe this" --reference image.png
${BUN_X} {baseDir}/scripts/main.ts --prompt "Create variation" --reference a.png --image out.png

# Multi-turn conversation
${BUN_X} {baseDir}/scripts/main.ts "Remember: 42" --sessionId session-abc
${BUN_X} {baseDir}/scripts/main.ts "What number?" --sessionId session-abc

# JSON output
${BUN_X} {baseDir}/scripts/main.ts "Hello" --json

Options

| Option | Description |
|--------|-------------|
| `--prompt`, `-p` | Prompt text |
| `--promptfiles` | Read prompt from files (concatenated) |
| `--model`, `-m` | Model: gemini-3-pro (default), gemini-3-flash, gemini-3-flash-thinking, gemini-3.1-pro-preview |
| `--image [path]` | Generate image (default: generated.png) |
| `--reference`, `--ref` | Reference images for vision input |
| `--sessionId` | Session ID for multi-turn conversation |
| `--list-sessions` | List saved sessions |
| `--json` | Output as JSON |
| `--login` | Refresh cookies, then exit |
| `--cookie-path` | Custom cookie file path |
| `--profile-dir` | Chrome profile directory |
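
When another skill calls this one as a backend, a typed wrapper over the flags can help avoid typos. The sketch below is illustrative only: `GeminiWebOptions` and `toArgs` are hypothetical names, and passing several reference images after a single `--reference` flag is an assumption modeled on the `--promptfiles` usage, not confirmed behavior.

```typescript
// Hypothetical typed view of a subset of the documented flags.
interface GeminiWebOptions {
  prompt?: string;
  promptFiles?: string[];
  model?:
    | "gemini-3-pro"
    | "gemini-3-flash"
    | "gemini-3-flash-thinking"
    | "gemini-3.1-pro-preview";
  image?: string | true; // true → bare --image, which defaults to generated.png
  references?: string[];
  sessionId?: string;
  json?: boolean;
}

// Converts the options object into CLI arguments for scripts/main.ts.
function toArgs(o: GeminiWebOptions): string[] {
  const args: string[] = [];
  if (o.prompt !== undefined) args.push("--prompt", o.prompt);
  if (o.promptFiles && o.promptFiles.length > 0) args.push("--promptfiles", ...o.promptFiles);
  if (o.model) args.push("--model", o.model);
  // Assumption: multiple reference images may follow one --reference flag.
  if (o.references && o.references.length > 0) args.push("--reference", ...o.references);
  if (o.image === true) args.push("--image");
  else if (o.image) args.push("--image", o.image);
  if (o.sessionId) args.push("--sessionId", o.sessionId);
  if (o.json) args.push("--json");
  return args;
}

console.log(toArgs({ prompt: "A cute cat", image: "cat.png", model: "gemini-3-flash" }).join(" "));
```

The resulting array can be appended to the resolved runtime and script path and handed to `child_process.spawn` without shell quoting.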

Models

| Model | Description |
|-------|-------------|
| `gemini-3-pro` | Default, latest 3.0 Pro |
| `gemini-3-flash` | Fast, lightweight 3.0 Flash |
| `gemini-3-flash-thinking` | 3.0 Flash with thinking |
| `gemini-3.1-pro-preview` | 3.1 Pro preview (empty header, auto-routed) |

Authentication

First run opens browser for Google auth. Cookies cached automatically.

When no explicit profile dir is set, cookie refresh may reuse an already-running local Chrome/Chromium debugging session tied to a standard user-data dir. Set --profile-dir or GEMINI_WEB_CHROME_PROFILE_DIR to force a dedicated profile and skip existing-session reuse. This is a best-effort CDP session reuse path, not the Chrome DevTools MCP prompt-based --autoConnect flow described in Chrome's official docs.

Supported browsers (auto-detected): Chrome, Chrome Canary/Beta, Chromium, Edge.

Force refresh: --login flag. Override browser: GEMINI_WEB_CHROME_PATH env var.

Environment Variables

| Variable | Description |
|----------|-------------|
| `GEMINI_WEB_DATA_DIR` | Data directory |
| `GEMINI_WEB_COOKIE_PATH` | Cookie file path |
| `GEMINI_WEB_CHROME_PROFILE_DIR` | Chrome profile directory |
| `GEMINI_WEB_CHROME_PATH` | Chrome executable path |
| `HTTP_PROXY`, `HTTPS_PROXY` | Proxy for Google access (set inline with command) |
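
"Set inline with command" means the proxy applies to a single invocation rather than the whole shell session. When the script is launched from code, one way to get the same effect is to build a per-call environment. In this sketch `withProxy` is a hypothetical helper and the proxy URL is a placeholder.

```typescript
type Env = Record<string, string | undefined>;

// Returns a copy of the base environment with both proxy variables set,
// leaving the caller's own environment untouched.
function withProxy(base: Env, proxyUrl: string): Env {
  return { ...base, HTTP_PROXY: proxyUrl, HTTPS_PROXY: proxyUrl };
}

// Placeholder proxy address; substitute your own.
const childEnv = withProxy({ PATH: "/usr/bin" }, "http://127.0.0.1:7890");
console.log(childEnv.HTTPS_PROXY);
```

The returned object can be passed as the `env` option of Node's `child_process.spawn` so that only the child process sees the proxy.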

Sessions

Session files are stored in the data directory as sessions/<id>.json.

Contains: id, metadata (Gemini chat state), messages array, timestamps.
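
For tooling that inspects saved sessions, the layout might be modeled roughly as below. Only `id`, `metadata`, and `messages` are named in the description; the timestamp field names, the helper names, and the ID check are assumptions added for illustration.

```typescript
import * as path from "node:path";

// Rough model of sessions/<id>.json.
interface SessionFile {
  id: string;
  metadata: unknown; // Gemini chat state; internal shape not documented here
  messages: unknown[];
  createdAt?: string; // hypothetical field name for a timestamp
  updatedAt?: string; // hypothetical field name for a timestamp
}

// Builds <dataDir>/sessions/<id>.json. Rejecting path separators is an
// extra safety check in this sketch, not documented behavior.
function sessionPath(dataDir: string, id: string): string {
  if (id === "" || id.includes("/") || id.includes("\\")) {
    throw new Error(`Invalid session id: ${id}`);
  }
  return path.join(dataDir, "sessions", `${id}.json`);
}

// Loose structural check for parsed JSON.
function looksLikeSession(value: unknown): boolean {
  if (typeof value !== "object" || value === null) return false;
  const s = value as Record<string, unknown>;
  return typeof s.id === "string" && "metadata" in s && Array.isArray(s.messages);
}

const example: SessionFile = { id: "session-abc", metadata: null, messages: [] };
console.log(sessionPath("/tmp/gemini-web", example.id), looksLikeSession(example));
```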

Extension Support

Custom configurations via EXTEND.md. See Preferences section for paths and supported options.
