baoyu-danger-gemini-web
jimliu/baoyu-skills
Generate images and text via reverse-engineered Gemini Web API with vision and multi-turn support.
What is baoyu-danger-gemini-web?
Provides text and image generation through Gemini Web API without official SDK. Supports vision input via reference images, multi-turn conversations with session persistence, and multiple model variants. Use when you need Gemini-powered generation as a backend or when users explicitly request Gemini text/image generation.
- Generate text using Gemini models (Pro, Flash, Flash-Thinking, 3.1 Pro preview)
- Generate images from text prompts with optional reference images
- Accept reference images for vision-based analysis and generation
- Maintain multi-turn conversations with session persistence
- Output results as JSON or raw text/image files
How to install baoyu-danger-gemini-web
npx skills add https://github.com/jimliu/baoyu-skills --skill baoyu-danger-gemini-web
- bun or npx installed
- Google account for browser-based authentication (first run opens auth flow)
- Chrome, Chromium, Edge, or Chrome Canary/Beta for cookie-based session management
How to use baoyu-danger-gemini-web
1. Run the skill with a text prompt: `${BUN_X} {baseDir}/scripts/main.ts "Your prompt"`
2. For image generation, add the `--image output.png` flag to save the generated image
3. For vision input, pass `--reference image.png` to analyze the image or use it as a generation reference
4. For multi-turn conversations, use `--sessionId session-name` to persist context across calls
5. On first use, check the consent file and accept the disclaimer to proceed with the reverse-engineered API
Use cases
- Generate images on demand when user requests 'create image with Gemini'
- Provide vision capabilities by analyzing reference images alongside generation
- Build multi-turn chatbots that remember context across separate invocations
- Serve as image generation backend for other skills that need it
- Perform text generation when official Gemini API is unavailable
- Developers needing Gemini image/text generation without official SDK
- Agents requiring vision-capable AI backends
- Users wanting multi-turn Gemini conversations with session memory
- Projects using reverse-engineered APIs as fallback generation sources
baoyu-danger-gemini-web FAQ
First run opens a browser for Google authentication. Cookies are cached automatically. Use `--login` to refresh, or set `GEMINI_WEB_CHROME_PROFILE_DIR` to use a dedicated Chrome profile.
Yes. Use `--prompt "description" --image output.png` to generate and save images without text output.
Pass `--sessionId session-name` with each call. The skill maintains conversation state in `sessions/<id>.json`, allowing context to persist across separate invocations.
gemini-3-pro (default), gemini-3-flash, gemini-3-flash-thinking, and gemini-3.1-pro-preview. Specify with `--model model-name`.
No. This uses a reverse-engineered Gemini Web API. You must accept a disclaimer on first use acknowledging this is unofficial.
Full instructions (SKILL.md)
Source of truth, from jimliu/baoyu-skills.
name: baoyu-danger-gemini-web
description: Generates images and text via reverse-engineered Gemini Web API. Supports text generation, image generation from prompts, reference images for vision input, and multi-turn conversations. Use when other skills need image generation backend, or when user requests "generate image with Gemini", "Gemini text generation", or needs vision-capable AI generation.
version: 1.56.2
metadata:
  openclaw:
    homepage: https://github.com/JimLiu/baoyu-skills#baoyu-danger-gemini-web
    requires:
      anyBins:
        - bun
        - npx
Gemini Web Client
Text/image generation via Gemini Web API. Supports reference images and multi-turn conversations.
User Input Tools
When this skill prompts the user, follow this tool-selection rule (priority order):
- Prefer built-in user-input tools exposed by the current agent runtime, e.g. `AskUserQuestion`, `request_user_input`, `clarify`, `ask_user`, or any equivalent.
- Fallback: if no such tool exists, emit a numbered plain-text message and ask the user to reply with the chosen number/answer for each question.
- Batching: if the tool supports multiple questions per call, combine all applicable questions into a single call; if only single-question, ask them one at a time in priority order.
Concrete `AskUserQuestion` references below are examples; substitute the local equivalent in other runtimes.
Script Directory
Important: All scripts are located in the scripts/ subdirectory of this skill.
Agent Execution Instructions:
- Determine this SKILL.md file's directory path as `{baseDir}`
- Script path = `{baseDir}/scripts/<script-name>.ts`
- Resolve `${BUN_X}` runtime: if `bun` installed → `bun`; if `npx` available → `npx -y bun`; else suggest installing bun
- Replace all `{baseDir}` and `${BUN_X}` in this document with actual values
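The runtime resolution rule above can be sketched as follows. This is a minimal illustration, not code from the skill: `hasBinary` and `resolveBunX` are hypothetical names, and probing with `--version` is an assumption about how an agent might detect an installed binary.

```typescript
import { spawnSync } from "node:child_process";

// Hypothetical probe: treats a binary as installed when `<name> --version` exits with 0.
function hasBinary(name: string): boolean {
  const result = spawnSync(name, ["--version"], {
    stdio: "ignore",
    // On Windows, npx is a .cmd shim and needs a shell to be spawned.
    shell: process.platform === "win32",
  });
  return result.error === undefined && result.status === 0;
}

// Mirrors the documented order: bun first, then `npx -y bun`, else nothing.
// The probe is injectable so the rule can be exercised without touching the system.
function resolveBunX(probe: (name: string) => boolean = hasBinary): string | null {
  if (probe("bun")) return "bun";
  if (probe("npx")) return "npx -y bun";
  return null; // caller should suggest installing bun
}
```

A `null` result corresponds to the "suggest installing bun" branch; any other value is what would be substituted for `${BUN_X}`.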
Script Reference:
| Script | Purpose |
|---|---|
| `scripts/main.ts` | CLI entry point for text/image generation |
| `scripts/gemini-webapi/*` | TypeScript port of `gemini_webapi` (GeminiClient, types, utils) |
Consent Check (REQUIRED)
Before first use, verify user consent for reverse-engineered API usage.
Consent file locations:
- macOS: `~/Library/Application Support/baoyu-skills/gemini-web/consent.json`
- Linux: `~/.local/share/baoyu-skills/gemini-web/consent.json`
- Windows: `%APPDATA%\baoyu-skills\gemini-web\consent.json`
Flow:
- Check if consent file exists with `accepted: true` and `disclaimerVersion: "1.0"`
- If valid consent exists → print warning with `acceptedAt` date, proceed
- If no consent → show disclaimer, ask user via `AskUserQuestion`:
  - "Yes, I accept" → create consent file with ISO timestamp, proceed
  - "No, I decline" → output decline message, stop
- Consent file format: `{"version":1,"accepted":true,"acceptedAt":"<ISO>","disclaimerVersion":"1.0"}`
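As a rough sketch of the consent check, the snippet below locates the consent file per platform and validates it against the two documented fields. The helper names (`consentPath`, `isValidConsent`, `readConsent`, `newConsent`) are hypothetical, and the `AppData/Roaming` fallback used when `APPDATA` is unset is an assumption rather than documented behavior.

```typescript
import { existsSync, readFileSync } from "node:fs";
import { homedir } from "node:os";
import { join } from "node:path";

// Shape taken from the documented consent file format.
interface Consent {
  version: number;
  accepted: boolean;
  acceptedAt: string;
  disclaimerVersion: string;
}

// Maps the platform to the documented consent file location.
function consentPath(
  platform: string = process.platform,
  env: Record<string, string | undefined> = process.env,
  home: string = homedir(),
): string {
  const tail = ["baoyu-skills", "gemini-web", "consent.json"];
  if (platform === "darwin") return join(home, "Library", "Application Support", ...tail);
  // Assumption: fall back to the conventional roaming directory if APPDATA is unset.
  if (platform === "win32") return join(env.APPDATA ?? join(home, "AppData", "Roaming"), ...tail);
  return join(home, ".local", "share", ...tail);
}

// Valid consent requires accepted: true and disclaimerVersion: "1.0".
function isValidConsent(value: unknown): boolean {
  if (typeof value !== "object" || value === null) return false;
  const c = value as Partial<Consent>;
  return c.accepted === true && c.disclaimerVersion === "1.0";
}

// Returns the stored consent, or null when the file is missing, unreadable, or invalid.
function readConsent(path: string): Consent | null {
  if (!existsSync(path)) return null;
  try {
    const parsed: unknown = JSON.parse(readFileSync(path, "utf8"));
    return isValidConsent(parsed) ? (parsed as Consent) : null;
  } catch {
    return null;
  }
}

// Builds the record to write after the user answers "Yes, I accept".
function newConsent(now: Date = new Date()): Consent {
  return { version: 1, accepted: true, acceptedAt: now.toISOString(), disclaimerVersion: "1.0" };
}
```

A `null` from `readConsent` maps to the "no consent" branch, where the disclaimer is shown and the user is asked before anything is written.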
Preferences (EXTEND.md)
Check EXTEND.md in priority order — the first one found wins:
| Priority | Path | Scope |
|---|---|---|
| 1 | .baoyu-skills/baoyu-danger-gemini-web/EXTEND.md | Project |
| 2 | ${XDG_CONFIG_HOME:-$HOME/.config}/baoyu-skills/baoyu-danger-gemini-web/EXTEND.md | XDG |
| 3 | $HOME/.baoyu-skills/baoyu-danger-gemini-web/EXTEND.md | User home |
If none found, use defaults.
EXTEND.md supports: Default model, proxy settings, custom data directory.
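The first-match lookup can be sketched like this. It is an illustration under stated assumptions: the project path is resolved against a `cwd` argument (the source gives only a relative path), and `extendCandidates` and `findExtend` are hypothetical helper names.

```typescript
import { existsSync } from "node:fs";
import { join } from "node:path";

const SKILL = "baoyu-danger-gemini-web";

// Builds the three candidate paths in the documented priority order.
// An unset or empty XDG_CONFIG_HOME falls back to $HOME/.config, matching `${VAR:-default}`.
function extendCandidates(cwd: string, home: string, xdgConfigHome?: string): string[] {
  const xdg = xdgConfigHome && xdgConfigHome.length > 0 ? xdgConfigHome : join(home, ".config");
  return [
    join(cwd, ".baoyu-skills", SKILL, "EXTEND.md"), // 1: project
    join(xdg, "baoyu-skills", SKILL, "EXTEND.md"), // 2: XDG
    join(home, ".baoyu-skills", SKILL, "EXTEND.md"), // 3: user home
  ];
}

// Returns the first existing candidate, or null so the caller can use defaults.
function findExtend(
  candidates: string[],
  exists: (path: string) => boolean = existsSync,
): string | null {
  return candidates.find((path) => exists(path)) ?? null;
}
```

For example, `findExtend(extendCandidates(process.cwd(), os.homedir(), process.env.XDG_CONFIG_HOME))` would return the winning file, given `os` imported from `node:os`.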
Usage
# Text generation
${BUN_X} {baseDir}/scripts/main.ts "Your prompt"
${BUN_X} {baseDir}/scripts/main.ts --prompt "Your prompt" --model gemini-3-flash
# Image generation
${BUN_X} {baseDir}/scripts/main.ts --prompt "A cute cat" --image cat.png
${BUN_X} {baseDir}/scripts/main.ts --promptfiles system.md content.md --image out.png
# Vision input (reference images)
${BUN_X} {baseDir}/scripts/main.ts --prompt "Describe this" --reference image.png
${BUN_X} {baseDir}/scripts/main.ts --prompt "Create variation" --reference a.png --image out.png
# Multi-turn conversation
${BUN_X} {baseDir}/scripts/main.ts "Remember: 42" --sessionId session-abc
${BUN_X} {baseDir}/scripts/main.ts "What number?" --sessionId session-abc
# JSON output
${BUN_X} {baseDir}/scripts/main.ts "Hello" --json
Options
| Option | Description |
|---|---|
| `--prompt`, `-p` | Prompt text |
| `--promptfiles` | Read prompt from files (concatenated) |
| `--model`, `-m` | Model: `gemini-3-pro` (default), `gemini-3-flash`, `gemini-3-flash-thinking`, `gemini-3.1-pro-preview` |
| `--image [path]` | Generate image (default: `generated.png`) |
| `--reference`, `--ref` | Reference images for vision input |
| `--sessionId` | Session ID for multi-turn conversation |
| `--list-sessions` | List saved sessions |
| `--json` | Output as JSON |
| `--login` | Refresh cookies, then exit |
| `--cookie-path` | Custom cookie file path |
| `--profile-dir` | Chrome profile directory |
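When another skill drives this CLI programmatically, the flags in the table can be assembled from a typed options object. The sketch below is hypothetical (`GeminiWebOptions` and `buildArgs` are not part of the skill) and covers only a subset of the flags. It passes a single reference image because the syntax for several references is not shown in this document.

```typescript
type GeminiModel =
  | "gemini-3-pro"
  | "gemini-3-flash"
  | "gemini-3-flash-thinking"
  | "gemini-3.1-pro-preview";

// Hypothetical options object covering a subset of the documented flags.
interface GeminiWebOptions {
  prompt: string;
  model?: GeminiModel;
  reference?: string; // one reference image path
  sessionId?: string;
  json?: boolean;
  image?: string | true; // true means bare --image, which defaults to generated.png
}

// Turns the options into the argument list that follows `{baseDir}/scripts/main.ts`.
function buildArgs(options: GeminiWebOptions): string[] {
  const args: string[] = ["--prompt", options.prompt];
  if (options.model) args.push("--model", options.model);
  if (options.reference) args.push("--reference", options.reference);
  if (options.sessionId) args.push("--sessionId", options.sessionId);
  if (options.json) args.push("--json");
  // --image goes last so a bare flag is never followed by a value it could absorb.
  if (options.image === true) args.push("--image");
  else if (options.image) args.push("--image", options.image);
  return args;
}
```

The resulting array can be handed to a process spawner together with the resolved `${BUN_X}` runtime and script path.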
Models
| Model | Description |
|---|---|
| `gemini-3-pro` | Default, latest 3.0 Pro |
| `gemini-3-flash` | Fast, lightweight 3.0 Flash |
| `gemini-3-flash-thinking` | 3.0 Flash with thinking |
| `gemini-3.1-pro-preview` | 3.1 Pro preview (empty header, auto-routed) |
Authentication
First run opens browser for Google auth. Cookies cached automatically.
When no explicit profile dir is set, cookie refresh may reuse an already-running local Chrome/Chromium debugging session tied to a standard user-data dir.
Set `--profile-dir` or `GEMINI_WEB_CHROME_PROFILE_DIR` to force a dedicated profile and skip existing-session reuse.
This is a best-effort CDP session reuse path, not the Chrome DevTools MCP prompt-based `--autoConnect` flow described in Chrome's official docs.
Supported browsers (auto-detected): Chrome, Chrome Canary/Beta, Chromium, Edge.
Force refresh: `--login` flag. Override browser: `GEMINI_WEB_CHROME_PATH` env var.
Environment Variables
| Variable | Description |
|---|---|
| `GEMINI_WEB_DATA_DIR` | Data directory |
| `GEMINI_WEB_COOKIE_PATH` | Cookie file path |
| `GEMINI_WEB_CHROME_PROFILE_DIR` | Chrome profile directory |
| `GEMINI_WEB_CHROME_PATH` | Chrome executable path |
| `HTTP_PROXY`, `HTTPS_PROXY` | Proxy for Google access (set inline with command) |
Sessions
Session files stored in data directory under sessions/<id>.json.
Contains: id, metadata (Gemini chat state), messages array, timestamps.
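To make the on-disk layout concrete, here is a small sketch that maps session IDs to file paths and lists the IDs present in a data directory. The helper names are hypothetical, and the `SessionFile` field types are guesses from the one-line description above. The timestamp field names are not given in this document, so they are left as an open index signature.

```typescript
import { existsSync, readdirSync } from "node:fs";
import { join } from "node:path";

// Assumed shape: only `id`, `metadata`, and `messages` are named in the description.
interface SessionFile {
  id: string;
  metadata: unknown; // Gemini chat state, structure not documented here
  messages: unknown[];
  [key: string]: unknown; // timestamps, with unspecified field names
}

// Path of one session file: <dataDir>/sessions/<id>.json
function sessionPath(dataDir: string, id: string): string {
  return join(dataDir, "sessions", `${id}.json`);
}

// Derives sorted session IDs from file names, ignoring anything that is not *.json.
function idsFromFileNames(fileNames: string[]): string[] {
  return fileNames
    .filter((name) => name.endsWith(".json"))
    .map((name) => name.slice(0, -".json".length))
    .sort();
}

// Lists session IDs in a data directory; an absent sessions folder yields an empty list.
function listSessionIds(dataDir: string): string[] {
  const dir = join(dataDir, "sessions");
  return existsSync(dir) ? idsFromFileNames(readdirSync(dir)) : [];
}
```

In practice the skill's own `--list-sessions` flag is the supported way to enumerate sessions; this sketch only illustrates the file naming scheme.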
Extension Support
Custom configurations via EXTEND.md. See Preferences section for paths and supported options.