firecrawl-knowledge-ingest

firecrawl/firecrawl-workflows

Ingest authenticated docs portals and knowledge bases using Firecrawl's browser automation.

What is firecrawl-knowledge-ingest?

Extracts content from public or login-gated documentation sites, help centers, and knowledge bases using Firecrawl's browser to handle JavaScript rendering, pagination, and authentication. Use when standard crawlers fail due to dynamic content or access restrictions.

Navigate and scrape JS-heavy documentation portals with browser automation
Handle login-gated and authenticated knowledge bases
Follow pagination, load-more controls, and sidebar navigation automatically
Extract structured content as JSON or markdown with metadata (title, section, date, author, tags)
Preserve code examples, tables, and formatting while stripping navigation chrome

How to install firecrawl-knowledge-ingest

npx skills add https://github.com/firecrawl/firecrawl-workflows --skill firecrawl-knowledge-ingest

Prerequisites

Firecrawl API key (from https://www.firecrawl.dev)
Portal URL and any required authentication credentials
Node.js environment with npx

How to use firecrawl-knowledge-ingest

1. Install the skill via npx skills add
2. Provide the documentation portal URL and specify if authentication is required
3. Indicate desired output format (JSON, markdown, or merged file)
4. Set maximum page limit if needed to control scope
5. Run the workflow to navigate, extract, and structure the content
6. Review the generated report with sections, article counts, and any failed pages

Use cases

Good for

Extract all articles from a SaaS product's help center requiring login credentials
Ingest a paginated support knowledge base into a searchable JSON database
Scrape JavaScript-rendered API documentation that fails with standard crawlers
Collect structured content from multi-section docs portals with sidebar navigation
Build a local markdown archive of a dynamic documentation site

Who it's for

Documentation engineers building searchable knowledge bases
Support teams archiving help center content
AI engineers preparing training data from gated documentation
Product teams migrating docs between platforms
Developers integrating external docs into internal systems

firecrawl-knowledge-ingest FAQ

What authentication methods are supported?

The skill can handle login-gated portals while respecting authentication boundaries. Provide credentials or auth details if asked during the onboarding interview.

Can it handle paginated content?

Yes, it automatically follows pagination controls, next links, load-more buttons, and sidebar navigation to extract all pages.

What output formats are available?

JSON (structured with metadata), markdown (readable format), or merged file combining all articles.

Does it preserve code examples and tables?

Yes, the skill preserves code blocks, tables, and formatting while stripping navigation chrome, headers, and footers.

What if some pages fail to load?

The final report tracks failed or restricted pages separately, so you can see extraction progress and any access issues.

Full instructions (SKILL.md)

Source of truth, from firecrawl/firecrawl-workflows.

name: firecrawl-knowledge-ingest
description: Ingest public or authenticated knowledge bases and docs portals with Firecrawl browser. Use for JS-heavy docs, login-gated portals, paginated help centers, support knowledge bases, or structured JSON/markdown extraction from documentation sites.
license: ISC
metadata:
  author: firecrawl
  version: "0.1.0"
  homepage: https://www.firecrawl.dev
  source: https://github.com/firecrawl/firecrawl-workflows
inputs:
  - name: FIRECRAWL_API_KEY
    description: Firecrawl API key for hosted Firecrawl requests.
    required: true

Firecrawl Knowledge Ingest

Use this when a docs portal needs browser navigation, auth, pagination, or JS rendering.

Onboarding Interview

Infer the portal URL, output format, auth needs, and page limit from context. If the portal is clear, proceed immediately.

Ask at most three concise questions, and only if blocked, such as the portal URL, whether authentication is required, or the desired output format.

Firecrawl Collection Plan

Use Firecrawl browser to:

open the portal and inspect navigation
identify sections, categories, sidebar links, and article URLs
follow sidebar navigation, next links, pagination, load-more controls, or search
scrape article content as markdown
extract metadata such as title, section, last updated date, author, and tags

Try Firecrawl map as a supplement for public URLs, but use browser navigation for auth-gated or JS-heavy content.
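The collection plan above amounts to a bounded walk over the portal's links with failure tracking. The following is a minimal sketch of that control flow only, not the skill's actual implementation. The fetch_page callable is a hypothetical stand-in for whatever performs the Firecrawl browser navigation and scrape; it is not a Firecrawl API, and the returned dictionary keys are likewise illustrative.

```python
from collections import deque
from typing import Callable

# Hypothetical result of visiting one page: (markdown, outgoing article links).
# This is an illustrative type, not something Firecrawl defines.
PageResult = tuple[str, list[str]]


def collect_articles(
    start_url: str,
    fetch_page: Callable[[str], PageResult],
    max_pages: int = 50,
) -> dict:
    """Breadth-first walk from start_url, visiting at most max_pages URLs.

    fetch_page is an assumed callable that returns the page's markdown and
    the links discovered on it, and raises on load or access failures.
    """
    queue = deque([start_url])
    seen = {start_url}
    articles = []
    failed = []

    # Both successes and failures count toward the page budget.
    while queue and len(articles) + len(failed) < max_pages:
        url = queue.popleft()
        try:
            markdown, links = fetch_page(url)
        except Exception as exc:
            # Record the failure and keep going instead of aborting the run.
            failed.append({"url": url, "reason": str(exc)})
            continue
        articles.append({"url": url, "content": markdown})
        for link in links:
            if link not in seen:
                seen.add(link)
                queue.append(link)

    return {"articles": articles, "failed": failed}
```

Passing the fetcher in as a parameter keeps the traversal testable with a fake site, and the separate failed list feeds the "Failed Or Restricted Pages" part of the final report.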

Final Deliverable

# Knowledge Ingest: [Portal]

## Summary
[Pages extracted, sections covered, limitations]

## Output
[JSON/markdown/merged file path or content]

## Sections
[Section names and article counts]

## Failed Or Restricted Pages
[Any access/loading issues]

## Sources
[URLs extracted]

## Rerun Inputs
workflow: firecrawl-knowledge-ingest
url: [portal url]
format: [json/markdown/merged]
max_pages: [number]
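To reuse the Rerun Inputs from a previous report, the key/value lines can be read back into a validated dictionary. This is a rough sketch under stated assumptions: the function name parse_rerun_inputs is hypothetical, the allowed formats are taken from the template above, and max_pages is treated as optional because the page limit is only set when needed.

```python
ALLOWED_FORMATS = {"json", "markdown", "merged"}


def parse_rerun_inputs(text: str) -> dict:
    """Parse 'key: value' lines from a Rerun Inputs block and validate them."""
    inputs = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and markdown headings such as "## Rerun Inputs".
        if not line or line.startswith("#"):
            continue
        # Split on the first colon only, so URLs keep their "https://" part.
        key, sep, value = line.partition(":")
        if not sep:
            raise ValueError(f"expected 'key: value', got {line!r}")
        inputs[key.strip()] = value.strip()

    if inputs.get("format") not in ALLOWED_FORMATS:
        raise ValueError(f"format must be one of {sorted(ALLOWED_FORMATS)}")

    if "max_pages" in inputs:
        max_pages = int(inputs["max_pages"])
        if max_pages <= 0:
            raise ValueError("max_pages must be a positive integer")
        inputs["max_pages"] = max_pages

    return inputs
```

Validating the format and page limit up front means a malformed rerun fails before any pages are fetched.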

JSON Shape

Use source, url, extractedAt, totalArticles, and sections[] with article title, url, section, content, and metadata.
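One possible reading of this shape is sketched below. The top-level keys and the per-article keys come from the description above. Everything else is an assumption made for illustration: that source holds a portal name, that each sections[] entry is an object with "name" and "articles" keys, and that extractedAt is an ISO 8601 UTC timestamp. The function name build_payload is hypothetical.

```python
import json
from datetime import datetime, timezone


def build_payload(source: str, url: str, articles: list[dict]) -> dict:
    """Group flat article records by section into the described JSON shape."""
    sections: dict[str, list[dict]] = {}
    for article in articles:
        sections.setdefault(article["section"], []).append(
            {
                "title": article["title"],
                "url": article["url"],
                "section": article["section"],
                "content": article["content"],
                # Metadata (date, author, tags) is optional per article.
                "metadata": article.get("metadata", {}),
            }
        )
    return {
        "source": source,
        "url": url,
        # Assumed format: ISO 8601 timestamp in UTC.
        "extractedAt": datetime.now(timezone.utc).isoformat(),
        "totalArticles": len(articles),
        # "name" and "articles" are assumed key names for each section entry.
        "sections": [
            {"name": name, "articles": items} for name, items in sections.items()
        ],
    }


if __name__ == "__main__":
    demo = build_payload(
        "Example Help Center",
        "https://help.example.com",
        [
            {
                "title": "Reset your password",
                "url": "https://help.example.com/account/reset",
                "section": "Account",
                "content": "# Reset your password\n...",
                "metadata": {"tags": ["account"]},
            }
        ],
    )
    print(json.dumps(demo, indent=2))
```

Keeping section on each article as well as on the grouping makes individual article records usable on their own, for example when loading them into a search index.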

Quality Bar

Preserve code examples, tables, and formatting.
Strip nav chrome, headers, and footers.
Track extraction progress and page failures.
Respect authentication boundaries.
