Audit score 70

firecrawl-knowledge-ingest

firecrawl/firecrawl-workflows

Ingest authenticated docs portals and knowledge bases using Firecrawl's browser automation.

What is firecrawl-knowledge-ingest?

Extracts content from public or login-gated documentation sites, help centers, and knowledge bases using Firecrawl's browser to handle JavaScript rendering, pagination, and authentication. Use when standard crawlers fail due to dynamic content or access restrictions.

  • Navigate and scrape JS-heavy documentation portals with browser automation
  • Handle login-gated and authenticated knowledge bases
  • Follow pagination, load-more controls, and sidebar navigation automatically
  • Extract structured content as JSON or markdown with metadata (title, section, date, author, tags)
  • Preserve code examples, tables, and formatting while stripping navigation chrome

How to install firecrawl-knowledge-ingest

npx skills add https://github.com/firecrawl/firecrawl-workflows --skill firecrawl-knowledge-ingest
Prerequisites
  • Firecrawl API key (from https://www.firecrawl.dev)
  • Portal URL and any required authentication credentials
  • Node.js environment with npx
Supported agents
  • Claude Code
  • Cursor
  • Windsurf
  • Cline

How to use firecrawl-knowledge-ingest

  1. Install the skill via npx skills add
  2. Provide the documentation portal URL and specify if authentication is required
  3. Indicate desired output format (JSON, markdown, or merged file)
  4. Set maximum page limit if needed to control scope
  5. Run the workflow to navigate, extract, and structure the content
  6. Review the generated report with sections, article counts, and any failed pages

Use cases

Good for
  • Extract all articles from a SaaS product's help center requiring login credentials
  • Ingest a paginated support knowledge base into a searchable JSON database
  • Scrape JavaScript-rendered API documentation that fails with standard crawlers
  • Collect structured content from multi-section docs portals with sidebar navigation
  • Build a local markdown archive of a dynamic documentation site
Who it's for
  • Documentation engineers building searchable knowledge bases
  • Support teams archiving help center content
  • AI engineers preparing training data from gated documentation
  • Product teams migrating docs between platforms
  • Developers integrating external docs into internal systems

firecrawl-knowledge-ingest FAQ

What authentication methods are supported?

The skill does not list specific authentication methods. It handles login-gated portals through Firecrawl's browser and respects authentication boundaries. Provide credentials or auth details during the onboarding interview.

Can it handle paginated content?

Yes, it automatically follows pagination controls, next links, load-more buttons, and sidebar navigation to reach every page, up to any page limit you set.

What output formats are available?

Three: JSON (structured with metadata), markdown (readable format), or a single merged file combining all articles.

Does it preserve code examples and tables?

Yes, the skill preserves code blocks, tables, and formatting while stripping navigation chrome, headers, and footers.

What if some pages fail to load?

The final report tracks failed or restricted pages separately, so you can see extraction progress and any access issues.

Full instructions (SKILL.md)

Source of truth, from firecrawl/firecrawl-workflows.


name: firecrawl-knowledge-ingest
description: Ingest public or authenticated knowledge bases and docs portals with Firecrawl browser. Use for JS-heavy docs, login-gated portals, paginated help centers, support knowledge bases, or structured JSON/markdown extraction from documentation sites.
license: ISC
metadata:
  author: firecrawl
  version: "0.1.0"
  homepage: https://www.firecrawl.dev
  source: https://github.com/firecrawl/firecrawl-workflows
inputs:
  - name: FIRECRAWL_API_KEY
    description: Firecrawl API key for hosted Firecrawl requests.
    required: true

Firecrawl Knowledge Ingest

Use this when a docs portal needs browser navigation, auth, pagination, or JS rendering.

Onboarding Interview

Infer the portal URL, output format, auth needs, and page limit from context. If the portal is clear, proceed immediately.

Ask at most three concise questions, and only if blocked, such as the portal URL, whether authentication is required, or the desired output format.
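
The interview rule above can be sketched as a small gate. This is only an illustration: the function name, the `known` dictionary, its keys, and the question wording are all hypothetical and not part of the skill.

```python
from __future__ import annotations


def onboarding_questions(known: dict) -> list[str]:
    """Return the questions to ask before a run (never more than three).

    `known` holds whatever was inferred from context. Its keys
    ("url", "auth_required", "format") are hypothetical names.
    """
    if known.get("url"):
        # The portal is clear, so proceed immediately without asking anything.
        return []

    # Blocked on the portal URL: ask for it, plus any other missing inputs.
    questions = ["Which portal URL should be ingested?"]
    if "auth_required" not in known:
        questions.append("Does the portal require authentication?")
    if "format" not in known:
        questions.append("Which output format: json, markdown, or merged?")
    return questions[:3]
```

For instance, `onboarding_questions({"url": "https://docs.example.com"})` returns an empty list, so the run would start without an interview.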

Firecrawl Collection Plan

Use Firecrawl browser to:

  • open the portal and inspect navigation
  • identify sections, categories, sidebar links, and article URLs
  • follow sidebar navigation, next links, pagination, load-more controls, or search
  • scrape article content as markdown
  • extract metadata such as title, section, last updated date, author, and tags

Try Firecrawl map as a supplement for public URLs, but use browser navigation for auth-gated or JS-heavy content.
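
The traversal in the plan above amounts to a link-following walk with a page cap and failure tracking. The sketch below shows that control flow only. It does not call Firecrawl: `fetch_page` is a hypothetical stand-in for whatever browser call returns a page's markdown and its outgoing links, and the `site` dictionary is made-up test data.

```python
from __future__ import annotations

from collections import deque
from typing import Callable, Optional


def collect_articles(
    start_url: str,
    fetch_page: Callable[[str], tuple],
    max_pages: Optional[int] = None,
) -> tuple:
    """Walk a portal breadth-first, returning (articles, failed).

    fetch_page(url) is a hypothetical callable that returns
    (markdown, next_urls) and raises an exception when a page cannot load.
    """
    queue = deque([start_url])
    seen = {start_url}
    articles, failed = [], []

    # Stop when there is nothing left to visit or the page cap is reached.
    while queue and (max_pages is None or len(articles) < max_pages):
        url = queue.popleft()
        try:
            markdown, next_urls = fetch_page(url)
        except Exception as exc:
            # Record the failure and keep going instead of aborting the run.
            failed.append({"url": url, "error": str(exc)})
            continue
        articles.append({"url": url, "content": markdown})
        for link in next_urls:
            if link not in seen:  # avoid revisiting sidebar or pagination loops
                seen.add(link)
                queue.append(link)
    return articles, failed


# Made-up portal: the home page links to /a and /b, and /b fails to load.
site = {
    "https://docs.example.com/": (
        "# Home",
        ["https://docs.example.com/a", "https://docs.example.com/b"],
    ),
    "https://docs.example.com/a": ("# A", ["https://docs.example.com/b"]),
}


def fake_fetch(url: str) -> tuple:
    if url not in site:
        raise RuntimeError("page failed to load")
    return site[url]


articles, failed = collect_articles("https://docs.example.com/", fake_fetch, max_pages=10)
```

Keeping failures in a separate list, rather than raising, is what lets the final report show failed or restricted pages alongside the pages that were extracted.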

Final Deliverable

# Knowledge Ingest: [Portal]

## Summary
[Pages extracted, sections covered, limitations]

## Output
[JSON/markdown/merged file path or content]

## Sections
[Section names and article counts]

## Failed Or Restricted Pages
[Any access/loading issues]

## Sources
[URLs extracted]

## Rerun Inputs
workflow: firecrawl-knowledge-ingest
url: [portal url]
format: [json/markdown/merged]
max_pages: [number]
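
As a rough illustration, the template above could be filled in programmatically. The function and parameter names below are hypothetical, and the "name: count" line format under Sections is an assumption, since the template only says to list section names and article counts.

```python
from __future__ import annotations


def render_report(
    portal: str,
    summary: str,
    output: str,
    sections: dict,
    failed: list,
    sources: list,
    rerun: dict,
) -> str:
    """Render the final deliverable using the headings from the template."""
    lines = [f"# Knowledge Ingest: {portal}", ""]
    lines += ["## Summary", summary, ""]
    lines += ["## Output", output, ""]
    # Assumed format: one "section name: article count" line per section.
    lines += ["## Sections"] + [f"{name}: {count}" for name, count in sections.items()] + [""]
    lines += ["## Failed Or Restricted Pages"] + (failed or ["None"]) + [""]
    lines += ["## Sources"] + sources + [""]
    lines += ["## Rerun Inputs"] + [f"{key}: {value}" for key, value in rerun.items()]
    return "\n".join(lines)


report = render_report(
    portal="Example Docs",
    summary="2 pages extracted, 1 section covered",
    output="out/example-docs.json",
    sections={"Getting Started": 2},
    failed=[],
    sources=["https://docs.example.com/", "https://docs.example.com/a"],
    rerun={
        "workflow": "firecrawl-knowledge-ingest",
        "url": "https://docs.example.com/",
        "format": "json",
        "max_pages": 50,
    },
)
```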

JSON Shape

Use the top-level fields source, url, extractedAt, and totalArticles, plus sections[], where each article carries title, url, section, content, and metadata.
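
One possible reading of this shape is sketched below. The field names source, url, extractedAt, totalArticles, sections, title, section, content, and metadata come from the text. The "name" and "articles" keys inside each section entry, the ISO 8601 timestamp, and the `build_result` helper itself are assumptions made for illustration.

```python
from __future__ import annotations

import json
from datetime import datetime, timezone


def build_result(source: str, url: str, articles: list) -> dict:
    """Group flat article dicts by section into the documented top-level shape."""
    grouped: dict = {}
    for article in articles:
        grouped.setdefault(article["section"], []).append(
            {
                "title": article["title"],
                "url": article["url"],
                "section": article["section"],
                "content": article["content"],
                "metadata": article.get("metadata", {}),
            }
        )
    return {
        "source": source,
        "url": url,
        # Assumption: an ISO 8601 UTC timestamp; the skill does not fix a format.
        "extractedAt": datetime.now(timezone.utc).isoformat(),
        "totalArticles": len(articles),
        # Assumption: each sections[] entry is {"name": ..., "articles": [...]}.
        "sections": [
            {"name": name, "articles": items} for name, items in grouped.items()
        ],
    }


result = build_result(
    source="Example Docs",
    url="https://docs.example.com/",
    articles=[
        {
            "title": "Install",
            "url": "https://docs.example.com/install",
            "section": "Getting Started",
            "content": "# Install",
            "metadata": {"author": "docs team", "tags": ["setup"]},
        },
        {
            "title": "Auth",
            "url": "https://docs.example.com/auth",
            "section": "API",
            "content": "# Auth",
        },
    ],
)
print(json.dumps(result, indent=2))
```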

Quality Bar

  • Preserve code examples, tables, and formatting.
  • Strip nav chrome, headers, and footers.
  • Track extraction progress and page failures.
  • Respect authentication boundaries.