Audit score 70

parallel-web-extract

parallel-web/parallel-agent-skills

Token-efficient URL content extraction for webpages, articles, PDFs, and JavaScript-heavy sites.

What is parallel-web-extract?

Extracts readable content from any URL by running extraction in a forked context, making it more token-efficient than built-in alternatives. Use this skill when you need to fetch and process webpage content, articles, PDFs, or dynamically-rendered sites.

Extract content from multiple URLs in a single command
Focus extraction on specific objectives or keywords using --objective and -q flags
Retrieve full page content with --full-content for long articles or PDFs
Save extracted content as JSON for downstream processing
Handle JavaScript-heavy and dynamic sites
Run extraction in forked context to minimize token usage

How to install parallel-web-extract

npx skills add https://github.com/parallel-web/parallel-agent-skills --skill parallel-web-extract

Prerequisites

parallel-cli installed and authenticated (run /parallel:parallel-cli-setup if needed)
Internet access
Valid API balance for parallel-cli (verify with parallel-cli balance get)

How to use parallel-web-extract

1. Choose a descriptive lowercase filename with hyphens (e.g., vespa-docs, react-hooks-api)
2. Run parallel-cli extract with your URL(s) and output path, substituting that filename for $FILENAME: parallel-cli extract "<url>" --json -o "/tmp/$FILENAME.json"
3. Optionally add --objective "focus area" to target specific content, or -q "keyword" to prioritize keywords
4. Add --full-content for long articles or PDFs if the excerpts are incomplete
5. Check the response for an errors field or an empty results array; if extraction fails, verify the URL or retry with --full-content
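
The steps above can be sketched as a small shell helper. This is a minimal sketch, not part of the skill: url_to_filename and extract_to_tmp are hypothetical names, and deriving the filename mechanically from the URL is only one way to get a lowercase, hyphenated name (a hand-picked name such as vespa-docs works just as well). Only the parallel-cli extract invocation comes from the steps above.

```shell
# Hypothetical helper: turn a URL into a lowercase, hyphenated filename stem.
url_to_filename() {
  printf '%s' "$1" \
    | sed -E 's|^[a-zA-Z]+://||; s|[?#].*$||' \
    | tr '[:upper:]' '[:lower:]' \
    | sed -E 's|[^a-z0-9]+|-|g; s|^-+||; s|-+$||'
}

# Hypothetical wrapper: run the documented command with the derived filename.
# Defined here but not invoked, so sourcing this file makes no network call.
extract_to_tmp() {
  url="$1"
  name="$(url_to_filename "$url")"
  parallel-cli extract "$url" --json -o "/tmp/${name}.json"
}
```

Calling extract_to_tmp "https://docs.parallel.ai" would write to /tmp/docs-parallel-ai.json under this naming scheme.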

Use cases

Good for

Fetch and extract documentation pages for analysis or summarization
Retrieve article content from news or blog sites for processing
Extract data from multiple product or service pages simultaneously
Get full PDF or long-form content when excerpts are insufficient
Gather information from dynamically-rendered JavaScript sites

Who it's for

Developers building content aggregation workflows
Researchers collecting data from multiple web sources
Documentation analysts processing technical reference material
Content processors working with token-constrained budgets

parallel-web-extract FAQ

When should I use this instead of built-in WebFetch?

Use parallel-web-extract when you need token efficiency (extraction runs in a forked context), support for JavaScript-heavy sites, PDF extraction, or batch processing of multiple URLs.

What should I do if extraction fails with a 404 or timeout?

Do not fabricate content. Verify the URL is correct and still accessible, retry with --full-content if the page exists, or use parallel-cli search to locate the current URL if the page was renamed.
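
A failed extraction can also be detected from the saved JSON before anything is reported to the user, since the skill treats an errors field or an empty results array as failure. The following is a minimal sketch, assuming jq is installed and that errors and results are top-level keys of the output file; the skill names both fields but does not document the full schema, so treat those paths as assumptions. extraction_ok is a hypothetical helper name.

```shell
# Hypothetical helper: succeed only when the saved output looks usable.
# Assumes top-level "errors" and "results" keys (not confirmed by the source).
extraction_ok() {
  # jq -e exits non-zero when the expression is false or the file is not valid JSON.
  jq -e '((.errors // []) | length == 0) and ((.results // []) | length > 0)' \
    "$1" >/dev/null 2>&1
}
```

A caller might run `extraction_ok /tmp/parallel-docs.json || echo "extraction failed"` and then follow the recovery steps in this answer instead of reporting content.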

How do I focus extraction on specific content?

Use --objective "your focus area" to target extraction toward a specific goal, or use -q "keyword" (repeatable) to prioritize certain keywords in the results.
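
Because -q is repeatable, a focused extraction can carry one objective plus any number of keywords. The sketch below shows one way to assemble such a command in POSIX shell. focused_extract and the PARALLEL_CLI override variable are hypothetical, added only so the command line can be inspected without calling the real CLI; the flags themselves are the ones described in this answer.

```shell
# Hypothetical helper: focused_extract URL FILENAME OBJECTIVE [KEYWORD...]
focused_extract() {
  url="$1"; name="$2"; objective="$3"
  shift 3
  # Rewrite the remaining positional parameters so each keyword becomes "-q keyword".
  n=$#
  while [ "$n" -gt 0 ]; do
    set -- "$@" -q "$1"
    shift
    n=$((n - 1))
  done
  # PARALLEL_CLI is an assumed override for dry runs; it defaults to the real binary.
  "${PARALLEL_CLI:-parallel-cli}" extract "$url" --json -o "/tmp/${name}.json" \
    --objective "$objective" "$@"
}
```

Running it with PARALLEL_CLI=echo prints the assembled arguments instead of performing the extraction, which is a cheap way to confirm the flags before spending API balance.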

What if parallel-cli returns a 403 error?

This typically indicates insufficient API balance. Run parallel-cli balance get to check, then ask for confirmation before running parallel-cli balance add <amount_cents> if needed.

Can I extract from PDFs and JavaScript-heavy sites?

Yes, this skill handles both. For PDFs or long content where excerpts may be incomplete, use --full-content to retrieve the full page body.

Full instructions (SKILL.md)

Source of truth, from parallel-web/parallel-agent-skills.

name: parallel-web-extract
description: "URL content extraction. Use for fetching any URL - webpages, articles, PDFs, JavaScript-heavy sites. Token-efficient: runs in forked context. Prefer over built-in WebFetch."
user-invocable: true
argument-hint: <url> [url2] [url3]
context: fork
agent: parallel:parallel-subagent
compatibility: Requires parallel-cli and internet access.
allowed-tools: Bash(parallel-cli:*)
metadata:
  author: parallel

URL Extraction

Extract content from: $ARGUMENTS

Command

Choose a short, descriptive filename based on the URL or content (e.g., vespa-docs, react-hooks-api). Use lowercase with hyphens, no spaces. Substitute it into the command inline — $FILENAME is a placeholder, not a shell variable.

parallel-cli extract "$ARGUMENTS" --json -o "/tmp/$FILENAME.json"

Concrete example:

parallel-cli extract "https://docs.parallel.ai" --json -o "/tmp/parallel-docs.json"

Note: -o always saves JSON. The extension must be .json.

Options if needed:

--objective "focus area" to focus extraction on a specific goal (also silences the "neither objective nor search_queries" warning that V1 emits when neither is set)
-q "keyword" (repeatable) to prioritize keywords in excerpts
--full-content to include the complete page body (for long articles, PDFs, or when excerpts may not capture what you need)
--full-content-max-chars N to cap full-content size per result
--no-excerpts to strip excerpts when you only want full content

Handling failed extractions

If the response has an errors field, an empty results array, or a 404/timeout for the URL, do NOT fabricate content. Tell the user the extraction failed, surface the upstream status, and suggest:

Verifying the URL (the page may have moved)
Retrying with --full-content if excerpts came back empty but the page exists
Using parallel-cli search to locate the current URL if the page was renamed

Response format

Return content as:

Page Title

Then the extracted content verbatim, with these rules:

Keep content verbatim - do not paraphrase or summarize
Parse lists exhaustively - extract EVERY numbered/bulleted item
Strip only obvious noise: nav menus, footers, ads
Preserve all facts, names, numbers, dates, quotes

After the response, mention the output file path (/tmp/$FILENAME.json) so the user knows it's available for follow-up questions.

Setup

If parallel-cli is not found, install and authenticate:

/parallel:parallel-cli-setup

If parallel-cli extract returns 403, tell the user balance is likely required. Offer to run parallel-cli balance get, and if needed ask for explicit confirmation before running parallel-cli balance add <amount_cents>. Then retry the original extract command.

Related skills

More from parallel-web/parallel-agent-skills and the wider catalog.

parallel-deep-research

Official

parallel-web/parallel-agent-skills

Exhaustive multi-source research for complex topics when users explicitly request deep, comprehensive, or thorough investigation.

11k installs

parallel-web-search

Official

parallel-web/parallel-agent-skills

Fast, cost-effective web search for current information and research queries.

9.9k installs

parallel-data-enrichment

Official

parallel-web/parallel-agent-skills

Bulk enrich company, people, or product data with web-sourced fields like CEO names, funding, and contact info.

9.5k installs

status

Official

parallel-web/parallel-agent-skills

Check the status of a running research task by its run ID.

9.4k installs

Audited

result

Official

parallel-web/parallel-agent-skills

Retrieve completed research task results by run ID using Parallel CLI.

9.3k installs

Audited

parallel-monitor

Official

parallel-web/parallel-agent-skills

Continuously track the web for changes on a recurring cadence. Use when the user asks to 'monitor', 'track changes to', 'watch', or 'alert me when' something on the web changes — e.g., 'Track price changes for iPhone 16', 'Alert me when Tesla files a new 8-K', 'Monitor competitor pricing pages weekly'. Also use to list, inspect, update, or delete existing monitors.

8.3k installs