Audit score 70

firecrawl-crawl

firecrawl/cli

Bulk extract content from entire websites or site sections with depth and path filtering.

What is firecrawl-crawl?

Crawls a website following links up to specified depth or page limits, extracting content from multiple pages. Use this when you need to extract all pages from a docs section, bulk-scrape a site, or gather content across many linked pages on the same domain.

  • Crawl websites following links up to configurable depth limits
  • Filter crawls by path patterns (include/exclude specific URL paths)
  • Limit crawl scope by maximum page count or depth
  • Control concurrency and request delays for rate limiting
  • Output extracted content to JSON files
  • Check status of running crawl jobs asynchronously

How to install firecrawl-crawl

npx skills add firecrawl/cli --skill firecrawl-crawl
Prerequisites
  • Firecrawl CLI installed and configured
  • Valid Firecrawl API key set in environment
  • Sufficient Firecrawl credits for the crawl scope
Claude Code
Cursor
Windsurf
Cline

How to use firecrawl-crawl

  1. Run firecrawl crawl with a starting URL and the desired scope options
  2. Use --include-paths to limit the crawl to specific URL patterns (e.g., /docs)
  3. Set --limit or --max-depth to control crawl size
  4. Add the --wait and --progress flags to block until the crawl completes
  5. Specify --output to save results to a JSON file
  6. Optionally use --pretty to format the output for readability
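The steps above can be folded into a single command. The sketch below assembles that command in a small POSIX shell function and prints it instead of running it, so the scope can be reviewed first. The function name build_crawl_cmd, its argument order, the example URL, and the output path are illustrative assumptions; the flags are the ones listed on this page.

```shell
# build_crawl_cmd URL PATH_FILTER LIMIT OUTFILE
# Prints a firecrawl crawl command that covers steps 1-6 above.
# The function name and argument order are illustrative, not part of the CLI.
build_crawl_cmd() {
  url="$1"; paths="$2"; limit="$3"; outfile="$4"
  printf 'firecrawl crawl "%s" --include-paths %s --limit %s --wait --progress --pretty -o %s\n' \
    "$url" "$paths" "$limit" "$outfile"
}

# Hypothetical values; replace them with your own site and scope.
build_crawl_cmd "https://example.com" /docs 50 .firecrawl/crawl.json
```

Copy the printed line into a terminal once the scope looks right; nothing is sent to Firecrawl until then.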

Use cases

Good for
  • Extract all pages from a documentation section (e.g., /docs/)
  • Bulk scrape an entire website section following internal links
  • Gather content from multiple related pages on the same domain
  • Discover and extract all pages matching specific URL patterns
  • Create a snapshot of website content for analysis or backup
Who it's for
  • Developers building content extraction pipelines
  • Documentation researchers needing bulk page extraction
  • Data engineers scraping structured website content
  • AI agents automating multi-page content gathering

firecrawl-crawl FAQ

Should I use --wait or let the crawl run asynchronously?

Use --wait when you need results immediately. Without it, the command returns a job ID that you can poll later with firecrawl crawl <job-id>.
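For the asynchronous route, a polling loop might look like the sketch below. It repeatedly runs a status command and stops when the output contains a marker word. The helper name poll_crawl is hypothetical, and the default marker "completed" is an assumption about what the status output contains, not something this page confirms; pass a different marker if the real output differs.

```shell
# poll_crawl STATUS_CMD [MARKER] [MAX_TRIES] [PAUSE_SECONDS]
# Runs STATUS_CMD until its output contains MARKER or the tries run out.
# ASSUMPTION: a finished job can be recognized by MARKER (default "completed")
# appearing in the status output; check the real output before relying on it.
poll_crawl() {
  status_cmd="$1"
  marker="${2:-completed}"
  max_tries="${3:-30}"
  pause="${4:-10}"
  i=1
  while [ "$i" -le "$max_tries" ]; do
    out=$(sh -c "$status_cmd" 2>&1)
    case "$out" in
      *"$marker"*) printf '%s\n' "$out"; return 0 ;;
    esac
    i=$((i + 1))
    sleep "$pause"
  done
  echo "crawl not finished after $max_tries checks" >&2
  return 1
}

# Example call (not executed here); YOUR_JOB_ID stands for the ID the async crawl returned:
# poll_crawl 'firecrawl crawl YOUR_JOB_ID' completed 30 10
```

Passing the status command as a string keeps the loop independent of the CLI, which also makes it easy to try out with a stand-in command such as echo.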

How do I avoid crawling the entire site when I only need one section?

Use --include-paths to scope the crawl to specific URL patterns, e.g., --include-paths /docs limits crawling to paths under /docs/.
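As an illustration, the sketch below previews a crawl that stays under /docs while skipping one subtree. The URL, the /docs/archive path, and the output file name are made up for the example, and how the two filters interact when combined is an assumption worth verifying on a small --limit first.

```shell
url="https://example.com"   # hypothetical site

# Keep the crawl under /docs but skip one subtree (/docs/archive is a made-up path).
set -- firecrawl crawl "$url" \
  --include-paths /docs \
  --exclude-paths /docs/archive \
  --limit 50 --wait -o .firecrawl/docs.json

preview="$*"
echo "$preview"   # preview only; nothing is crawled yet
# "$@"            # uncomment to actually start the crawl (uses credits)
```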

What's the difference between crawl, scrape, and map?

scrape extracts a single page; map discovers all URLs on a site; crawl follows links and extracts multiple pages up to depth/limit. Use map first to understand site structure before crawling.

How do I control crawl speed and resource usage?

Use --delay to add millisecond pauses between requests, --max-concurrency to limit parallel workers, and check firecrawl credit-usage before large crawls.
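To get a feel for what a delay costs, the sketch below does the arithmetic for a hypothetical crawl. It assumes requests go out one at a time with the delay between them; with several workers the wall-clock time may differ. The helper name estimate_pause_seconds and the numbers are illustrative only.

```shell
# estimate_pause_seconds PAGES DELAY_MS
# Rough estimate of the time spent waiting between requests,
# ASSUMING requests are issued one after another with DELAY_MS between them.
estimate_pause_seconds() {
  echo $(( $1 * $2 / 1000 ))
}

estimate_pause_seconds 200 500   # 200 pages at 500 ms each; prints 100

# Commands to run once the estimate looks acceptable (shown, not executed):
echo 'firecrawl credit-usage'
echo 'firecrawl crawl "<url>" --limit 200 --delay 500 --max-concurrency 2 --wait -o .firecrawl/crawl.json'
```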

Can I resume or check status of a crawl in progress?

Yes, run firecrawl crawl <job-id> with the job ID returned from an async crawl to check status and retrieve results.

Full instructions (SKILL.md)

Source of truth, from firecrawl/cli.


---
name: firecrawl-crawl
description: |
  Bulk extract content from an entire website or site section. Use this skill when the user wants to crawl a site, extract all pages from a docs section, bulk-scrape multiple pages following links, or says "crawl", "get all the pages", "extract everything under /docs", "bulk extract", or needs content from many pages on the same site. Handles depth limits, path filtering, and concurrent extraction.
allowed-tools:
  - Bash(firecrawl *)
  - Bash(npx firecrawl *)
---

firecrawl crawl

Bulk extract content from a website. Crawls pages following links up to a depth/limit.

When to use

  • You need content from many pages on a site (e.g., all /docs/)
  • You want to extract an entire site section
  • Step 4 in the workflow escalation pattern: search → scrape → map → crawl → interact

Quick start

# Crawl a docs section
firecrawl crawl "<url>" --include-paths /docs --limit 50 --wait -o .firecrawl/crawl.json

# Full crawl with depth limit
firecrawl crawl "<url>" --max-depth 3 --wait --progress -o .firecrawl/crawl.json

# Check status of a running crawl
firecrawl crawl <job-id>

Options

| Option | Description |
| --- | --- |
| --wait | Wait for crawl to complete before returning |
| --progress | Show progress while waiting |
| --limit <n> | Max pages to crawl |
| --max-depth <n> | Max link depth to follow |
| --include-paths <paths> | Only crawl URLs matching these paths |
| --exclude-paths <paths> | Skip URLs matching these paths |
| --delay <ms> | Delay between requests |
| --max-concurrency <n> | Max parallel crawl workers |
| --pretty | Pretty print JSON output |
| -o, --output <path> | Output file path |

Tips

  • Always use --wait when you need the results immediately. Without it, crawl returns a job ID for async polling.
  • Use --include-paths to scope the crawl — don't crawl an entire site when you only need one section.
  • Crawl consumes credits per page. Check firecrawl credit-usage before large crawls.

See also