firecrawl-parse
firecrawl/cli
Convert local files (PDF, DOCX, XLSX, HTML, etc.) to clean markdown with optional AI summaries.
What is firecrawl-parse?
Extracts and converts local documents into well-formatted markdown saved to disk. Use this when you have a file on your computer that needs parsing, summarizing, or content extraction—ideal for PDFs, Word docs, spreadsheets, and HTML files.
- Parse local files (PDF, DOCX, DOC, ODT, RTF, XLSX, XLS, HTML) into clean markdown
- Generate AI-powered summaries of document content
- Answer specific questions about parsed file content
- Save output to disk to avoid context bloat
- Support for files up to 50 MB
How to install firecrawl-parse
`npx skills add null --skill firecrawl-parse`
Prerequisites:
- Firecrawl CLI installed (via `npx skills add null --skill firecrawl-parse`)
- A local file path to parse
- `.firecrawl/` directory created (recommended: `mkdir -p .firecrawl`)
- Add `.firecrawl/` to `.gitignore` to avoid committing large parsed files
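The last two prerequisites can be handled in one re-runnable step. This is a minimal POSIX shell sketch, assuming it is run from the project root; `setup_firecrawl_dir` is a hypothetical helper name, not a Firecrawl CLI command.

```bash
# Hypothetical helper (not part of the Firecrawl CLI): create the output
# directory and ignore it in git. Safe to run repeatedly.
setup_firecrawl_dir() {
  mkdir -p .firecrawl
  # -x matches whole lines, -F treats the pattern as a literal string;
  # errors are silenced for the case where .gitignore does not exist yet.
  if ! grep -qxF '.firecrawl/' .gitignore 2>/dev/null; then
    # If .gitignore is non-empty and lacks a final newline, add one first
    # so the new entry does not get glued onto the previous line.
    if [ -s .gitignore ] && [ -n "$(tail -c 1 .gitignore)" ]; then
      printf '\n' >> .gitignore
    fi
    printf '%s\n' '.firecrawl/' >> .gitignore
  fi
}
```

Running `setup_firecrawl_dir` a second time leaves `.gitignore` unchanged, because the `grep -x` check finds the existing entry.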
How to use firecrawl-parse
1. Create the output directory: `mkdir -p .firecrawl`
2. Run the parse command with the `-o` flag to save to disk: `firecrawl parse ./document.pdf -o .firecrawl/output.md`
3. For an AI summary, add the `-S` flag: `firecrawl parse ./document.pdf -S -o .firecrawl/summary.md`
4. To answer a question, use the `-Q` flag: `firecrawl parse ./document.pdf -Q "Your question here" -o .firecrawl/qa.md`
5. Read the output incrementally (e.g., with `head`, `grep`, or `rg`) rather than loading the entire file into context.
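Step 5 might look like this in practice: find line numbers with `grep -n` (or `rg -n`), then print only a window of lines around a match. `show_window` is a hypothetical helper built on standard `sed`; it is not part of the Firecrawl CLI.

```bash
# Hypothetical helper: print COUNT lines of FILE starting at line START,
# so a large parsed document can be read one slice at a time.
show_window() {
  # $1 = file, $2 = first line, $3 = number of lines
  sed -n "$2,$(($2 + $3 - 1))p" "$1"
}

# Example session on the file from step 2:
#   head -n 40 .firecrawl/output.md           # skim the opening
#   grep -n "payment" .firecrawl/output.md    # locate matches by line number
#   show_window .firecrawl/output.md 120 30   # read lines 120-149
```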
Use cases
- User uploads a PDF and asks 'what does this say?'—parse it and summarize
- Extract text from a DOCX contract and answer 'what are the payment terms?'
- Convert an XLSX spreadsheet to markdown for analysis
- Parse an HTML document and save structured output for downstream processing
- Batch process multiple documents and check credit usage
- Developers processing user-uploaded documents
- Analysts extracting insights from PDFs and reports
- Teams automating document workflows
- Anyone needing to convert local files to structured markdown
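For the batch use case, a loop over a folder is enough. The sketch below is one possible approach under stated assumptions, not an official workflow: `batch_parse` is a hypothetical helper, it only looks at files directly inside the given directory, and it names each output `.firecrawl/<original filename>.md`.

```bash
# Hypothetical batch helper (not part of the Firecrawl CLI): parse every
# supported file directly inside DIR into .firecrawl/<filename>.md.
batch_parse() {
  mkdir -p .firecrawl
  for f in "$1"/*; do
    [ -f "$f" ] || continue                 # skip subdirectories and an unmatched glob
    name=$(basename "$f")
    # Compare extensions case-insensitively so REPORT.PDF is also picked up.
    lower=$(printf '%s' "$name" | tr '[:upper:]' '[:lower:]')
    case $lower in
      *.pdf|*.docx|*.doc|*.odt|*.rtf|*.xlsx|*.xls|*.html|*.htm|*.xhtml) ;;
      *) continue ;;                        # not a supported format
    esac
    # Keep the full filename so report.pdf and report.docx don't collide.
    firecrawl parse "$f" -o ".firecrawl/$name.md" || echo "failed: $f" >&2
  done
}
```

Running `firecrawl credit-usage` before and after `batch_parse ./docs` shows roughly what the batch cost.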
firecrawl-parse FAQ
When should I use parse instead of scrape?
Use `parse` for local files on disk (PDF, DOCX, etc.). Use `scrape` for URLs and web content.
Which file formats are supported?
PDF, DOCX, DOC, ODT, RTF, XLSX, XLS, HTML, HTM, and XHTML.
Why save the output to disk?
Parsed documents can be hundreds of KB. Saving them to disk prevents bloating your context window.
How many credits does parsing use?
Approximately 1 credit per PDF page; HTML files cost a flat 1 credit. Check your balance with `firecrawl credit-usage`.
Is there a file size limit?
Maximum 50 MB per file.
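The size limit can be checked locally before uploading. This is a sketch: `fits_parse_limit` and `MAX_PARSE_BYTES` are hypothetical names, and reading "50 MB" as 50,000,000 bytes is an assumption (the stricter of the decimal and binary readings).

```bash
# Hypothetical pre-flight check (not a Firecrawl CLI feature).
# Assumption: "50 MB" is treated as 50,000,000 bytes, the stricter reading.
MAX_PARSE_BYTES=50000000

fits_parse_limit() {
  [ -f "$1" ] || return 1                   # missing file: fail the check
  # `wc -c < file` prints the byte count on both GNU and BSD systems;
  # tr strips the leading spaces that BSD wc adds.
  size=$(wc -c < "$1" | tr -d ' ')
  [ "$size" -le "$MAX_PARSE_BYTES" ]
}
```

For example: `fits_parse_limit ./report.pdf && firecrawl parse ./report.pdf -o .firecrawl/report.md`.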
Full instructions (SKILL.md)
Source of truth, from firecrawl/cli.
---
name: firecrawl-parse
description: |
  Efficiently extract and convert the contents of any local file—such as PDF, DOCX, DOC, ODT, RTF, XLSX, XLS, or HTML—into clean, well-formatted markdown saved to disk. Use this skill whenever the user requests to parse, read, or extract information from a file on their computer, including phrases like “parse this PDF”, “convert this document”, “read this file”, “extract text from”, or when a local file path (not a URL) is provided. This skill offers advanced options like generating AI-powered summaries and answering questions based on the file's content. Prefer this tool over scrape when handling local files to deliver precise, structured outputs for downstream tasks.
allowed-tools:
  - Bash(firecrawl *)
  - Bash(npx firecrawl *)
---
firecrawl parse
Turn a local document into clean markdown on disk. Supports PDF, DOCX, DOC, ODT, RTF, XLSX, XLS, HTML/HTM/XHTML.
When to use
- You have a file on disk (not a URL) and want its text as markdown
- User drops a PDF/DOCX and asks what it says, or to summarize it
- Use `scrape` instead when the source is a URL
Quick start
Always save to `.firecrawl/` with `-o`: parsed docs can be hundreds of KB and blow up context if streamed to stdout. Add `.firecrawl/` to `.gitignore`.
```bash
mkdir -p .firecrawl

# File → markdown
firecrawl parse ./paper.pdf -o .firecrawl/paper.md

# AI summary
firecrawl parse ./paper.pdf -S -o .firecrawl/paper-summary.md

# Ask a question about the doc
firecrawl parse ./paper.pdf -Q "What are the main conclusions?" \
  -o .firecrawl/paper-qa.md
```
Then use `head`, `grep`, `rg`, etc., or read the file incrementally; don't load the whole thing at once.
Options
| Option | Description |
|---|---|
| `-S, --summary` | AI-generated summary |
| `-Q, --query <prompt>` | Ask a question about the parsed content |
| `-o, --output <path>` | Output file path (always use this) |
| `-f, --format <fmt>` | `markdown` (default), `html`, `summary` |
| `--timeout <ms>` | Timeout for the parse job, in milliseconds |
| `--timing` | Show request duration |
Tips
- Quote paths with spaces: `firecrawl parse "./My Doc.pdf" -o .firecrawl/mydoc.md`.
- Max upload size: 50 MB per file.
- Credits: ~1 per PDF page; HTML costs a flat 1 credit.
- Check `.firecrawl/` before re-parsing the same file.
- To check your credit balance (recommended for batch processing and similar workflows), use the `firecrawl credit-usage` command.
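The "check `.firecrawl/` first" tip can be folded into a small wrapper so repeated requests reuse the saved markdown instead of spending credits again. `parse_once` is a hypothetical name, not a CLI feature; it treats any non-empty output file as current, so it will not notice if the source document has changed since it was parsed.

```bash
# Hypothetical wrapper: parse SRC to DEST only if DEST is missing or empty,
# so repeated runs reuse the saved markdown instead of re-spending credits.
# Quoting "$src" and "$dest" keeps paths with spaces intact.
parse_once() {
  src=$1 dest=$2
  if [ -s "$dest" ]; then
    echo "cached: $dest" >&2
    return 0
  fi
  mkdir -p "$(dirname "$dest")"
  firecrawl parse "$src" -o "$dest"
}
```

Usage: `parse_once "./My Doc.pdf" .firecrawl/mydoc.md`. Delete the output file to force a fresh parse.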
See also
- firecrawl-scrape — same idea for URLs
Related skills
More from firecrawl/cli and the wider catalog.
firecrawl
Search, scrape, and interact with the web via Firecrawl CLI—real-time content extraction and monitoring.
firecrawl-scrape
Extract clean markdown from any URL, including JavaScript-rendered pages.
firecrawl-search
Web search with full page content extraction—find articles, research topics, and discover sources beyond snippets.
firecrawl-crawl
Bulk extract content from entire websites or site sections with depth and path filtering.
firecrawl-agent
AI-powered autonomous data extraction from complex websites, returning structured JSON.
firecrawl-map
Discover and list all URLs on a website with optional search filtering.