Audit score 70

baoyu-danger-gemini-web

jimliu/baoyu-skills

Generate images and text via reverse-engineered Gemini Web API with vision and multi-turn support.

What is baoyu-danger-gemini-web?

Provides text and image generation through the Gemini Web API without the official SDK. Supports vision input via reference images, multi-turn conversations with session persistence, and multiple model variants. Use it when you need Gemini-powered generation as a backend or when users explicitly request Gemini text or image generation.

  • Generate text using Gemini models (Pro, Flash, Flash-Thinking, 3.1 Pro preview)
  • Generate images from text prompts with optional reference images
  • Accept reference images for vision-based analysis and generation
  • Maintain multi-turn conversations with session persistence
  • Output results as JSON or raw text/image files

How to install baoyu-danger-gemini-web

npx skills add https://github.com/jimliu/baoyu-skills --skill baoyu-danger-gemini-web
Prerequisites
  • bun or npx installed
  • Google account for browser-based authentication (first run opens auth flow)
  • Chrome, Chromium, Edge, or Chrome Canary/Beta for cookie-based session management
Claude Code
Cursor
Windsurf
Cline

How to use baoyu-danger-gemini-web

  1. Run the skill with a text prompt: `${BUN_X} {baseDir}/scripts/main.ts "Your prompt"`
  2. For image generation, add the `--image output.png` flag to save the generated image
  3. For vision input, pass `--reference image.png` to analyze an image or use it as a generation reference
  4. For multi-turn conversations, use `--sessionId session-name` to persist context across calls
  5. On first use, check the consent file and accept the disclaimer to proceed with the reverse-engineered API

Use cases

Good for
  • Generate images on demand when user requests 'create image with Gemini'
  • Provide vision capabilities by analyzing reference images alongside generation
  • Build multi-turn chatbots that remember context across separate invocations
  • Serve as image generation backend for other skills that need it
  • Perform text generation when official Gemini API is unavailable
Who it's for
  • Developers needing Gemini image/text generation without official SDK
  • Agents requiring vision-capable AI backends
  • Users wanting multi-turn Gemini conversations with session memory
  • Projects using reverse-engineered APIs as fallback generation sources

baoyu-danger-gemini-web FAQ

What authentication is required?

First run opens a browser for Google authentication. Cookies are cached automatically. Use `--login` to refresh, or set `GEMINI_WEB_CHROME_PROFILE_DIR` to use a dedicated Chrome profile.

Can I use this for image generation only?

Yes. Use `--prompt "description" --image output.png` to generate and save images without text output.

How do multi-turn conversations work?

Pass `--sessionId session-name` with each call. The skill maintains conversation state in `sessions/<id>.json`, allowing context to persist across separate invocations.

What models are available?

gemini-3-pro (default), gemini-3-flash, gemini-3-flash-thinking, and gemini-3.1-pro-preview. Specify with `--model model-name`.

Is this officially supported by Google?

No. This uses a reverse-engineered Gemini Web API. You must accept a disclaimer on first use acknowledging this is unofficial.

Full instructions (SKILL.md)

Source of truth, from jimliu/baoyu-skills.


name: baoyu-danger-gemini-web
description: Generates images and text via reverse-engineered Gemini Web API. Supports text generation, image generation from prompts, reference images for vision input, and multi-turn conversations. Use when other skills need image generation backend, or when user requests "generate image with Gemini", "Gemini text generation", or needs vision-capable AI generation.
version: 1.56.2
metadata:
  openclaw:
    homepage: https://github.com/JimLiu/baoyu-skills#baoyu-danger-gemini-web
    requires:
      anyBins:
        - bun
        - npx

Gemini Web Client

Text/image generation via Gemini Web API. Supports reference images and multi-turn conversations.

User Input Tools

When this skill prompts the user, follow this tool-selection rule (priority order):

  1. Prefer built-in user-input tools exposed by the current agent runtime — e.g., AskUserQuestion, request_user_input, clarify, ask_user, or any equivalent.
  2. Fallback: if no such tool exists, emit a numbered plain-text message and ask the user to reply with the chosen number/answer for each question.
  3. Batching: if the tool supports multiple questions per call, combine all applicable questions into a single call; if only single-question, ask them one at a time in priority order.

Concrete AskUserQuestion references below are examples — substitute the local equivalent in other runtimes.

Script Directory

Important: All scripts are located in the scripts/ subdirectory of this skill.

Agent Execution Instructions:

  1. Determine this SKILL.md file's directory path as {baseDir}
  2. Script path = {baseDir}/scripts/<script-name>.ts
  3. Resolve ${BUN_X} runtime: if bun installed → bun; if npx available → npx -y bun; else suggest installing bun
  4. Replace all {baseDir} and ${BUN_X} in this document with actual values

Script Reference:

| Script | Purpose |
|--------|---------|
| `scripts/main.ts` | CLI entry point for text/image generation |
| `scripts/gemini-webapi/*` | TypeScript port of `gemini_webapi` (GeminiClient, types, utils) |

Consent Check (REQUIRED)

Before first use, verify user consent for reverse-engineered API usage.

Consent file locations:

  • macOS: ~/Library/Application Support/baoyu-skills/gemini-web/consent.json
  • Linux: ~/.local/share/baoyu-skills/gemini-web/consent.json
  • Windows: %APPDATA%\baoyu-skills\gemini-web\consent.json

Flow:

  1. Check if consent file exists with accepted: true and disclaimerVersion: "1.0"
  2. If valid consent exists → print warning with acceptedAt date, proceed
  3. If no consent → show disclaimer, ask user via AskUserQuestion:
    • "Yes, I accept" → create consent file with ISO timestamp, proceed
    • "No, I decline" → output decline message, stop
  4. Consent file format: {"version":1,"accepted":true,"acceptedAt":"<ISO>","disclaimerVersion":"1.0"}
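
The validity check in step 1 and the file written in step 3 can be sketched as follows. This is a minimal sketch under the stated file format, not the skill's actual implementation: `Consent`, `isValidConsent`, and `newConsent` are hypothetical names, and reading or writing the file at the platform-specific path is left out.

```typescript
// Shape of consent.json, taken from the consent file format above.
interface Consent {
  version: number;
  accepted: boolean;
  acceptedAt: string; // ISO timestamp
  disclaimerVersion: string;
}

// Hypothetical check: true only when the parsed file records acceptance
// of disclaimer version "1.0".
function isValidConsent(value: unknown): boolean {
  if (typeof value !== "object" || value === null) return false;
  const c = value as Partial<Consent>;
  return c.accepted === true && c.disclaimerVersion === "1.0";
}

// Hypothetical builder for the record written after "Yes, I accept".
function newConsent(now: Date): Consent {
  return {
    version: 1,
    accepted: true,
    acceptedAt: now.toISOString(),
    disclaimerVersion: "1.0",
  };
}
```

A missing file, unparsable JSON, or a different `disclaimerVersion` should all be treated as "no consent" and lead back to the disclaimer prompt.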

Preferences (EXTEND.md)

Check EXTEND.md in priority order — the first one found wins:

| Priority | Path | Scope |
|----------|------|-------|
| 1 | `.baoyu-skills/baoyu-danger-gemini-web/EXTEND.md` | Project |
| 2 | `${XDG_CONFIG_HOME:-$HOME/.config}/baoyu-skills/baoyu-danger-gemini-web/EXTEND.md` | XDG |
| 3 | `$HOME/.baoyu-skills/baoyu-danger-gemini-web/EXTEND.md` | User home |

If none found, use defaults.
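
The first-found-wins lookup can be sketched like this. It is an illustration only: `extendCandidates` and `findExtend` are hypothetical names, the project path is assumed to be relative to a project root passed in by the caller, and the `exists` callback stands in for a real file-system check.

```typescript
// Hypothetical helper: candidate EXTEND.md paths in priority order
// (project, XDG, user home). An unset or empty XDG_CONFIG_HOME falls back
// to $HOME/.config, mirroring the ${XDG_CONFIG_HOME:-$HOME/.config} form.
function extendCandidates(projectDir: string, home: string, xdgConfigHome?: string): string[] {
  const skill = "baoyu-skills/baoyu-danger-gemini-web/EXTEND.md";
  const xdg = xdgConfigHome || `${home}/.config`;
  return [
    `${projectDir}/.${skill}`, // 1: project
    `${xdg}/${skill}`, // 2: XDG
    `${home}/.${skill}`, // 3: user home
  ];
}

// Hypothetical helper: returns the first existing candidate,
// or null when none exists (use defaults).
function findExtend(candidates: string[], exists: (path: string) => boolean): string | null {
  for (const candidate of candidates) {
    if (exists(candidate)) return candidate;
  }
  return null;
}
```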

EXTEND.md supports: default model, proxy settings, and a custom data directory.

Usage

# Text generation
${BUN_X} {baseDir}/scripts/main.ts "Your prompt"
${BUN_X} {baseDir}/scripts/main.ts --prompt "Your prompt" --model gemini-3-flash

# Image generation
${BUN_X} {baseDir}/scripts/main.ts --prompt "A cute cat" --image cat.png
${BUN_X} {baseDir}/scripts/main.ts --promptfiles system.md content.md --image out.png

# Vision input (reference images)
${BUN_X} {baseDir}/scripts/main.ts --prompt "Describe this" --reference image.png
${BUN_X} {baseDir}/scripts/main.ts --prompt "Create variation" --reference a.png --image out.png

# Multi-turn conversation
${BUN_X} {baseDir}/scripts/main.ts "Remember: 42" --sessionId session-abc
${BUN_X} {baseDir}/scripts/main.ts "What number?" --sessionId session-abc

# JSON output
${BUN_X} {baseDir}/scripts/main.ts "Hello" --json

Options

| Option | Description |
|--------|-------------|
| `--prompt`, `-p` | Prompt text |
| `--promptfiles` | Read prompt from files (concatenated) |
| `--model`, `-m` | Model: gemini-3-pro (default), gemini-3-flash, gemini-3-flash-thinking, gemini-3.1-pro-preview |
| `--image [path]` | Generate image (default: generated.png) |
| `--reference`, `--ref` | Reference images for vision input |
| `--sessionId` | Session ID for multi-turn conversation |
| `--list-sessions` | List saved sessions |
| `--json` | Output as JSON |
| `--login` | Refresh cookies, then exit |
| `--cookie-path` | Custom cookie file path |
| `--profile-dir` | Chrome profile directory |
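
When another skill calls this one as a backend, it may help to assemble the argument list from a typed options object rather than concatenating strings. The sketch below uses only flags listed above. `GenerateOptions` and `buildArgs` are hypothetical names, and for simplicity it passes a single reference image, as in the Usage examples.

```typescript
// Hypothetical options object covering a subset of the documented flags.
interface GenerateOptions {
  prompt: string;
  model?: string; // e.g. "gemini-3-flash"
  reference?: string; // path to one reference image
  image?: string; // output path for the generated image
  sessionId?: string;
  json?: boolean;
}

// Builds the arguments that follow `${BUN_X} {baseDir}/scripts/main.ts`.
// Returning an array avoids shell-quoting problems with prompts that
// contain spaces or quotes.
function buildArgs(opts: GenerateOptions): string[] {
  const args = ["--prompt", opts.prompt];
  if (opts.model) args.push("--model", opts.model);
  if (opts.reference) args.push("--reference", opts.reference);
  if (opts.image) args.push("--image", opts.image);
  if (opts.sessionId) args.push("--sessionId", opts.sessionId);
  if (opts.json) args.push("--json");
  return args;
}
```

For example, `buildArgs({ prompt: "A cute cat", image: "cat.png" })` yields `["--prompt", "A cute cat", "--image", "cat.png"]`, matching the image-generation command in Usage.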

Models

| Model | Description |
|-------|-------------|
| `gemini-3-pro` | Default, latest 3.0 Pro |
| `gemini-3-flash` | Fast, lightweight 3.0 Flash |
| `gemini-3-flash-thinking` | 3.0 Flash with thinking |
| `gemini-3.1-pro-preview` | 3.1 Pro preview (empty header, auto-routed) |

Authentication

The first run opens a browser for Google authentication. Cookies are then cached automatically.

When no explicit profile dir is set, cookie refresh may reuse an already-running local Chrome/Chromium debugging session tied to a standard user-data dir. Set --profile-dir or GEMINI_WEB_CHROME_PROFILE_DIR to force a dedicated profile and skip existing-session reuse. This is a best-effort CDP session reuse path, not the Chrome DevTools MCP prompt-based --autoConnect flow described in Chrome's official docs.

Supported browsers (auto-detected): Chrome, Chrome Canary/Beta, Chromium, Edge.

Force refresh: --login flag. Override browser: GEMINI_WEB_CHROME_PATH env var.

Environment Variables

| Variable | Description |
|----------|-------------|
| `GEMINI_WEB_DATA_DIR` | Data directory |
| `GEMINI_WEB_COOKIE_PATH` | Cookie file path |
| `GEMINI_WEB_CHROME_PROFILE_DIR` | Chrome profile directory |
| `GEMINI_WEB_CHROME_PATH` | Chrome executable path |
| `HTTP_PROXY`, `HTTPS_PROXY` | Proxy for Google access (set inline with command) |

Sessions

Session files are stored in the data directory under sessions/<id>.json.

Each file contains: id, metadata (Gemini chat state), a messages array, and timestamps.
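
A rough sketch of how a caller might locate and sanity-check a session file is shown below. Only the `id`, `metadata`, and `messages` field names and the `sessions/<id>.json` layout come from the description above. `SessionFile`, `sessionPath`, and `looksLikeSession` are hypothetical, the inner shapes are treated as opaque, and the timestamp fields are omitted because their names are not stated.

```typescript
// Assumed outline of a session file; inner shapes are deliberately opaque.
interface SessionFile {
  id: string;
  metadata: unknown; // Gemini chat state
  messages: unknown[];
}

// Hypothetical helper: path of a session file inside the data directory.
function sessionPath(dataDir: string, sessionId: string): string {
  return `${dataDir}/sessions/${sessionId}.json`;
}

// Hypothetical check: does a parsed JSON value have the documented outline?
function looksLikeSession(value: unknown): boolean {
  if (typeof value !== "object" || value === null) return false;
  const s = value as Partial<SessionFile>;
  return typeof s.id === "string" && "metadata" in value && Array.isArray(s.messages);
}
```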

Extension Support

Custom configurations via EXTEND.md. See Preferences section for paths and supported options.