firecrawl-knowledge-ingest
firecrawl/firecrawl-workflows
Ingest authenticated docs portals and knowledge bases using Firecrawl's browser automation.
What is firecrawl-knowledge-ingest?
Extracts content from public or login-gated documentation sites, help centers, and knowledge bases using Firecrawl's browser to handle JavaScript rendering, pagination, and authentication. Use when standard crawlers fail due to dynamic content or access restrictions.
- Navigate and scrape JS-heavy documentation portals with browser automation
- Handle login-gated and authenticated knowledge bases
- Follow pagination, load-more controls, and sidebar navigation automatically
- Extract structured content as JSON or markdown with metadata (title, section, date, author, tags)
- Preserve code examples, tables, and formatting while stripping navigation chrome
How to install firecrawl-knowledge-ingest
npx skills add https://github.com/firecrawl/firecrawl-workflows --skill firecrawl-knowledge-ingest
- Firecrawl API key (from https://www.firecrawl.dev)
- Portal URL and any required authentication credentials
- Node.js environment with npx
How to use firecrawl-knowledge-ingest
1. Install the skill via npx skills add
2. Provide the documentation portal URL and specify if authentication is required
3. Indicate desired output format (JSON, markdown, or merged file)
4. Set maximum page limit if needed to control scope
5. Run the workflow to navigate, extract, and structure the content
6. Review the generated report with sections, article counts, and any failed pages
Use cases
- Extract all articles from a SaaS product's help center requiring login credentials
- Ingest a paginated support knowledge base into a searchable JSON database
- Scrape JavaScript-rendered API documentation that fails with standard crawlers
- Collect structured content from multi-section docs portals with sidebar navigation
- Build a local markdown archive of a dynamic documentation site
Who it's for
- Documentation engineers building searchable knowledge bases
- Support teams archiving help center content
- AI engineers preparing training data from gated documentation
- Product teams migrating docs between platforms
- Developers integrating external docs into internal systems
firecrawl-knowledge-ingest FAQ
The skill respects authentication boundaries and can handle login-gated portals. Provide credentials or auth details during the onboarding interview.
The skill automatically follows pagination controls, next links, load-more buttons, and sidebar navigation to extract all pages.
Supported output formats are JSON (structured with metadata), markdown (readable format), or a merged file combining all articles.
The skill preserves code blocks, tables, and formatting while stripping navigation chrome, headers, and footers.
The final report tracks failed or restricted pages separately, so you can see extraction progress and any access issues.
Full instructions (SKILL.md)
Source of truth, from firecrawl/firecrawl-workflows.
name: firecrawl-knowledge-ingest
description: Ingest public or authenticated knowledge bases and docs portals with Firecrawl browser. Use for JS-heavy docs, login-gated portals, paginated help centers, support knowledge bases, or structured JSON/markdown extraction from documentation sites.
license: ISC
metadata:
  author: firecrawl
  version: "0.1.0"
  homepage: https://www.firecrawl.dev
  source: https://github.com/firecrawl/firecrawl-workflows
inputs:
  - name: FIRECRAWL_API_KEY
    description: Firecrawl API key for hosted Firecrawl requests.
    required: true
Firecrawl Knowledge Ingest
Use this when a docs portal needs browser navigation, auth, pagination, or JS rendering.
Onboarding Interview
Infer the portal URL, output format, auth needs, and page limit from context. If the portal is clear, proceed immediately.
Ask at most three concise questions, and only if blocked, such as the portal URL, whether authentication is required, or the desired output format.
Firecrawl Collection Plan
Use Firecrawl browser to:
- open the portal and inspect navigation
- identify sections, categories, sidebar links, and article URLs
- follow sidebar navigation, next links, pagination, load-more controls, or search
- scrape article content as markdown
- extract metadata such as title, section, last updated date, author, and tags
Try Firecrawl map as a supplement for public URLs, but use browser navigation for auth-gated or JS-heavy content.
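The plan above amounts to a bounded traversal of the portal's navigation with failure tracking. Here is a minimal Python sketch of that loop, assuming a hypothetical `fetch_page` callable that stands in for whatever Firecrawl browser step returns a page's markdown and its outgoing article links. Neither `fetch_page` nor the `Page` type is part of Firecrawl; they exist only for illustration.

```python
from __future__ import annotations

from collections import deque
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Page:
    """Hypothetical result of visiting one URL (not a Firecrawl type)."""
    url: str
    markdown: str
    links: list[str] = field(default_factory=list)


def collect(start_url: str,
            fetch_page: Callable[[str], Page],
            max_pages: int = 50) -> tuple[list[Page], dict[str, str]]:
    """Breadth-first walk from start_url, stopping after max_pages successes.

    Returns the extracted pages plus a {url: error} map of failed pages.
    """
    queue = deque([start_url])
    seen = {start_url}
    pages: list[Page] = []
    failed: dict[str, str] = {}

    while queue and len(pages) < max_pages:
        url = queue.popleft()
        try:
            page = fetch_page(url)
        except Exception as exc:  # record the failure and keep going
            failed[url] = str(exc)
            continue
        pages.append(page)
        # Queue sidebar/next/pagination links that have not been visited yet.
        for link in page.links:
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return pages, failed
```

Keeping failures in their own map, rather than aborting on the first error, is what lets the final report list restricted pages separately from extracted ones.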
Final Deliverable
# Knowledge Ingest: [Portal]
## Summary
[Pages extracted, sections covered, limitations]
## Output
[JSON/markdown/merged file path or content]
## Sections
[Section names and article counts]
## Failed Or Restricted Pages
[Any access/loading issues]
## Sources
[URLs extracted]
## Rerun Inputs
workflow: firecrawl-knowledge-ingest
url: [portal url]
format: [json/markdown/merged]
max_pages: [number]
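To make the template concrete, the following Python sketch fills it from already-collected results. The function name `render_report`, its parameters, and the exact wording of the summary line are illustrative assumptions, not something the skill defines.

```python
from __future__ import annotations


def render_report(portal: str, url: str, fmt: str, max_pages: int,
                  output_path: str,
                  sections: dict[str, list[str]],
                  failed: dict[str, str]) -> str:
    """Fill the deliverable template.

    sections maps section name -> article URLs; failed maps URL -> reason.
    """
    total = sum(len(urls) for urls in sections.values())
    lines = [
        f"# Knowledge Ingest: {portal}",
        "## Summary",
        f"Pages extracted: {total}. Sections covered: {len(sections)}. "
        f"Failed or restricted: {len(failed)}.",
        "## Output",
        output_path,
        "## Sections",
        *[f"- {name}: {len(urls)} articles" for name, urls in sections.items()],
        "## Failed Or Restricted Pages",
        *([f"- {u}: {why}" for u, why in failed.items()] or ["None"]),
        "## Sources",
        *[f"- {u}" for urls in sections.values() for u in urls],
        "## Rerun Inputs",
        "workflow: firecrawl-knowledge-ingest",
        f"url: {url}",
        f"format: {fmt}",
        f"max_pages: {max_pages}",
    ]
    return "\n".join(lines)
```

Echoing the rerun inputs at the end of the report means the same run can be repeated later without reconstructing the original request.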
JSON Shape
Use source, url, extractedAt, totalArticles, and sections[] with article title, url, section, content, and metadata.
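One possible reading of that shape, sketched in Python. The top-level keys and the article fields come from the line above; the per-section keys `name` and `articles`, and the metadata keys `lastUpdated`, `author`, and `tags`, are assumed names chosen for illustration.

```python
from __future__ import annotations

import json
from datetime import datetime, timezone


def build_document(source: str, url: str, articles: list[dict]) -> dict:
    """Group flat article records by their 'section' field."""
    sections: dict[str, list[dict]] = {}
    for article in articles:
        sections.setdefault(article["section"], []).append(article)
    return {
        "source": source,
        "url": url,
        "extractedAt": datetime.now(timezone.utc).isoformat(),
        "totalArticles": len(articles),
        # "name" and "articles" are assumed key names for each section entry.
        "sections": [
            {"name": name, "articles": items}
            for name, items in sections.items()
        ],
    }


articles = [
    {
        "title": "Getting started",
        "url": "https://docs.example.com/start",
        "section": "Guides",
        "content": "# Getting started\n\nExample body.",
        # Assumed metadata keys; unknown values are left empty, not guessed.
        "metadata": {"lastUpdated": None, "author": None, "tags": []},
    },
]
doc = build_document("Example Docs", "https://docs.example.com", articles)
print(json.dumps(doc, indent=2))
```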
Quality Bar
- Preserve code examples, tables, and formatting.
- Strip nav chrome, headers, and footers.
- Track extraction progress and page failures.
- Respect authentication boundaries.
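The first two points pull in opposite directions: chrome has to go, but nothing inside a code example may be touched. Below is a rough Python sketch of one way to reconcile them on scraped markdown, assuming the chrome can be recognized by exact line matches. The `CHROME_LINES` set is hypothetical, and real portals will need their own patterns.

```python
from __future__ import annotations

FENCE = "`" * 3  # markdown code-fence marker

# Hypothetical examples of navigation chrome; adjust per portal.
CHROME_LINES = {"Skip to content", "Was this page helpful?", "Edit this page"}


def strip_chrome(markdown: str, chrome: set[str] = CHROME_LINES) -> str:
    """Drop known chrome lines, leaving fenced code blocks untouched."""
    kept = []
    in_fence = False
    for line in markdown.splitlines():
        if line.lstrip().startswith(FENCE):
            # Entering or leaving a code block; always keep the fence line.
            in_fence = not in_fence
            kept.append(line)
            continue
        if not in_fence and line.strip() in chrome:
            continue  # chrome outside code: remove
        kept.append(line)
    return "\n".join(kept)
```

Tracking fence state first means a code sample that happens to contain the text "Edit this page" survives intact.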