firecrawl-crawl
firecrawl/cli
Bulk extract content from entire websites or site sections with depth and path filtering.
What is firecrawl-crawl?
Crawls a website following links up to specified depth or page limits, extracting content from multiple pages. Use this when you need to extract all pages from a docs section, bulk-scrape a site, or gather content across many linked pages on the same domain.
- Crawl websites following links up to configurable depth limits
- Filter crawls by path patterns (include/exclude specific URL paths)
- Limit crawl scope by maximum page count or depth
- Control concurrency and request delays for rate limiting
- Output extracted content to JSON files
- Check status of running crawl jobs asynchronously
How to install firecrawl-crawl
npx skills add null --skill firecrawl-crawl
- Firecrawl CLI installed and configured
- Valid Firecrawl API key set in environment
- Sufficient Firecrawl credits for the crawl scope
How to use firecrawl-crawl
1. Run firecrawl crawl with a starting URL and the scope options you need.
2. Use --include-paths to limit the crawl to specific URL patterns (e.g., /docs).
3. Set --limit or --max-depth to control crawl size.
4. Add --wait to block until the crawl completes, and --progress to show progress while waiting.
5. Specify --output to save results to a JSON file.
6. Optionally use --pretty to format the JSON output for readability.
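The steps above can be sketched as one small shell function. This is an illustration, not part of the Firecrawl CLI: `crawl_section` is a hypothetical helper name, the page limit of 50 is borrowed from the quick-start example, and the `FIRECRAWL` variable is an assumption added so the command can be swapped for `npx firecrawl` or for `echo` (a dry run that prints the arguments without spending credits). It relies on POSIX sh or bash word splitting.

```shell
# Hypothetical helper; only the flags themselves come from the steps above.
# FIRECRAWL is an assumed override: "npx firecrawl", or "echo" for a dry run.
crawl_section() {
  # $1 = starting URL, $2 = path to include, $3 = output file
  ${FIRECRAWL:-firecrawl} crawl "$1" \
    --include-paths "$2" \
    --limit 50 \
    --wait --progress \
    --output "$3" \
    --pretty
}
```

A call might look like `crawl_section "<url>" /docs .firecrawl/crawl.json`; running it first with `FIRECRAWL=echo` shows the exact command that would be issued.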
Use cases
- Extract all pages from a documentation section (e.g., /docs/)
- Bulk scrape an entire website section following internal links
- Gather content from multiple related pages on the same domain
- Discover and extract all pages matching specific URL patterns
- Create a snapshot of website content for analysis or backup
- Developers building content extraction pipelines
- Documentation researchers needing bulk page extraction
- Data engineers scraping structured website content
- AI agents automating multi-page content gathering
firecrawl-crawl FAQ
Use --wait when you need results immediately. Without it, the command returns a job ID that you can poll later with firecrawl crawl <job-id>.
Use --include-paths to scope the crawl to specific URL patterns, e.g., --include-paths /docs limits crawling to paths under /docs/.
scrape extracts a single page; map discovers all URLs on a site; crawl follows links and extracts multiple pages up to depth/limit. Use map first to understand site structure before crawling.
Use --delay to add millisecond pauses between requests, --max-concurrency to limit parallel workers, and check firecrawl credit-usage before large crawls.
Yes, run firecrawl crawl <job-id> with the job ID returned from an async crawl to check status and retrieve results.
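The asynchronous workflow described in these answers could be wrapped roughly as follows. Both function names are hypothetical, and `FIRECRAWL` is an assumed override (for example `npx firecrawl`, or `echo` for a dry run). How the job ID appears in the command's output is not described here, so the sketch leaves copying the ID to the user rather than parsing it.

```shell
# Hypothetical helpers for the async workflow; not part of the Firecrawl CLI.
start_crawl() {
  # $1 = starting URL, $2 = page limit.
  # No --wait here, so the CLI returns a job ID instead of blocking.
  ${FIRECRAWL:-firecrawl} crawl "$1" --limit "$2"
}

check_crawl() {
  # $1 = job ID returned by the earlier crawl
  ${FIRECRAWL:-firecrawl} crawl "$1"
}
```

Typical use would be `start_crawl "<url>" 50`, then `check_crawl <job-id>` once the job has had time to run.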
Full instructions (SKILL.md)
Source of truth, from firecrawl/cli.
name: firecrawl-crawl
description: |
  Bulk extract content from an entire website or site section. Use this skill when the user wants to crawl a site, extract all pages from a docs section, bulk-scrape multiple pages following links, or says "crawl", "get all the pages", "extract everything under /docs", "bulk extract", or needs content from many pages on the same site. Handles depth limits, path filtering, and concurrent extraction.
allowed-tools:
  - Bash(firecrawl *)
  - Bash(npx firecrawl *)
firecrawl crawl
Bulk extract content from a website. Crawls pages following links up to a depth/limit.
When to use
- You need content from many pages on a site (e.g., all /docs/)
- You want to extract an entire site section
- Step 4 in the workflow escalation pattern: search → scrape → map → crawl → interact
Quick start
# Crawl a docs section
firecrawl crawl "<url>" --include-paths /docs --limit 50 --wait -o .firecrawl/crawl.json
# Full crawl with depth limit
firecrawl crawl "<url>" --max-depth 3 --wait --progress -o .firecrawl/crawl.json
# Check status of a running crawl
firecrawl crawl <job-id>
Options
| Option | Description |
|---|---|
| --wait | Wait for crawl to complete before returning |
| --progress | Show progress while waiting |
| --limit <n> | Max pages to crawl |
| --max-depth <n> | Max link depth to follow |
| --include-paths <paths> | Only crawl URLs matching these paths |
| --exclude-paths <paths> | Skip URLs matching these paths |
| --delay <ms> | Delay between requests |
| --max-concurrency <n> | Max parallel crawl workers |
| --pretty | Pretty print JSON output |
| -o, --output <path> | Output file path |
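
As an illustration of how these options combine, the sketch below caps link depth, skips one path, and throttles requests. `crawl_throttled` is a hypothetical helper name, the values (depth 3, 500 ms delay, 2 workers) are arbitrary choices rather than recommendations, and `FIRECRAWL` is an assumed override so the command can be replaced with `npx firecrawl` or with `echo` for a dry run.

```shell
# Hypothetical wrapper; only the flags come from the options table above.
crawl_throttled() {
  # $1 = starting URL, $2 = path to exclude, $3 = output file
  ${FIRECRAWL:-firecrawl} crawl "$1" \
    --max-depth 3 \
    --exclude-paths "$2" \
    --delay 500 \
    --max-concurrency 2 \
    --wait \
    -o "$3"
}
```

For example, `crawl_throttled "<url>" /blog .firecrawl/crawl.json` would crawl up to three links deep while skipping URLs that match /blog.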
Tips
- Always use --wait when you need the results immediately. Without it, crawl returns a job ID for async polling.
- Use --include-paths to scope the crawl; don't crawl an entire site when you only need one section.
- Crawl consumes credits per page. Check firecrawl credit-usage before large crawls.
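One way to make the credit check a habit is to run it in the same helper as the crawl. The sketch below is an assumption-laden illustration: `crawl_after_credit_check` is a hypothetical name, `FIRECRAWL` is an assumed override (`npx firecrawl`, or `echo` for a dry run), and the output of `firecrawl credit-usage` is only displayed, not parsed, because its format is not described here.

```shell
# Hypothetical helper: show credit usage, then run a scoped, bounded crawl.
crawl_after_credit_check() {
  # $1 = starting URL, $2 = path to include, $3 = page limit
  # Stop early if the credit check itself fails (e.g. missing API key).
  ${FIRECRAWL:-firecrawl} credit-usage || return 1
  ${FIRECRAWL:-firecrawl} crawl "$1" --include-paths "$2" --limit "$3" --wait
}
```

Keeping both `--include-paths` and `--limit` in the call bounds how many pages, and therefore how many credits, a single run can consume.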
See also
- firecrawl-scrape — scrape individual pages
- firecrawl-map — discover URLs before deciding to crawl
- firecrawl-download — download site to local files (uses map + scrape)
Related skills
More from firecrawl/cli and the wider catalog.
firecrawl
Search, scrape, and interact with the web via Firecrawl CLI—real-time content extraction and monitoring.
firecrawl-scrape
Extract clean markdown from any URL, including JavaScript-rendered pages.
firecrawl-search
Web search with full page content extraction—find articles, research topics, and discover sources beyond snippets.
firecrawl-agent
AI-powered autonomous data extraction from complex websites, returning structured JSON.
firecrawl-map
Discover and list all URLs on a website with optional search filtering.
firecrawl-download
Download entire websites as local markdown, screenshots, or multiple formats organized in directories.