firecrawl-company-directories
firecrawl/firecrawl-workflows
Extract structured company lists from directories like YC, Crunchbase, and Product Hunt using Firecrawl.
What is firecrawl-company-directories?
This skill scrapes company directories and converts them into structured JSON, CSV, or CRM-ready lists. Use it when you need to build company databases, research startup landscapes, or populate prospect lists from public directories.
- Extracts company data (name, description, industry, stage, location, funding, team size) from directories
- Handles pagination, infinite scroll, and dynamic filtering using Firecrawl browser when needed
- Outputs structured JSON, CSV, or markdown tables ready for CRM import or analysis
- Deduplicates results and tracks extraction progress
- Supports YC, Crunchbase, Product Hunt, G2, and custom directory URLs
How to install firecrawl-company-directories
npx skills add https://github.com/firecrawl/firecrawl-workflows --skill firecrawl-company-directories
- Firecrawl API key (required)
- Directory URL or name (YC, Crunchbase, Product Hunt, G2, or custom)
How to use firecrawl-company-directories
1. Provide the directory URL or name and specify any filters (industry, stage, location, etc.).
2. Indicate the desired result count and output format (JSON, CSV, or markdown).
3. The skill infers the extraction strategy: Firecrawl browser for dynamic or paginated directories, scrape/map for static listings.
4. The skill reviews the extracted fields (name, description, industry, stage, location, funding, tags, URLs).
5. The skill delivers the final output with a summary, company table, sources used, and rerun inputs for future updates.
Use cases
- Build a prospect list from Y Combinator or Crunchbase filtered by industry and funding stage
- Export Product Hunt or G2 category listings into a CSV for sales outreach
- Create a research dataset of startups in a specific vertical with founding dates and locations
- Populate a CRM with company information from a custom startup directory
- Monitor and extract updates from category-based directories on a recurring basis
Who is it for
- Sales and business development teams building prospect lists
- Researchers and analysts studying startup ecosystems or market segments
- Product managers tracking competitors in specific categories
- Founders researching the competitive landscape
- Data teams automating company database updates
firecrawl-company-directories FAQ
Supported sources: YC, Crunchbase, Product Hunt, G2 categories, startup directories, and any custom company directory URL. The skill adapts to the structure of each source.
Pagination and infinite scroll: Firecrawl browser handles pagination, infinite scroll, and dynamic filtering when a directory needs them. Static listings use the faster scrape/map methods.
Blocked or gated content: the skill notes rate limits, login walls, and CAPTCHA blocks in the output. Login walls and CAPTCHAs may prevent extraction of gated content.
Filtering: specify filters during setup (e.g., 'Series A startups in AI/ML based in US'). The skill applies them during extraction.
Output formats: JSON (structured with metadata), CSV (spreadsheet-ready), and markdown tables. Choose based on your downstream use (CRM import, analysis, etc.).
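As an illustration of the CSV option, the sketch below flattens company records into spreadsheet-ready text using only the Python standard library. The function name `to_csv`, the default column order, and the choice to join tags with "; " are assumptions made for this example, not behavior documented by the skill.

```python
import csv
import io
from typing import Dict, List, Sequence

# Hypothetical default column order, based on the fields the skill lists.
DEFAULT_COLUMNS = ("name", "description", "industry", "stage", "location",
                   "funding", "tags", "profileUrl", "websiteUrl")


def to_csv(companies: List[Dict], columns: Sequence[str] = DEFAULT_COLUMNS) -> str:
    """Render company records as CSV text; missing fields stay blank."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=list(columns),
                            extrasaction="ignore", lineterminator="\n")
    writer.writeheader()
    for company in companies:
        row = {col: company.get(col, "") for col in columns}
        # Assumption: tags arrive as a list and are flattened into one cell.
        row["tags"] = "; ".join(company.get("tags") or [])
        writer.writerow(row)
    return buf.getvalue()


if __name__ == "__main__":
    sample = [{"name": "Example Co", "location": "Austin, TX", "tags": ["ai", "b2b"]}]
    print(to_csv(sample, columns=("name", "location", "tags")))
```

The `csv` module quotes cells that contain commas, so values such as "Austin, TX" survive a round trip through a spreadsheet or CRM importer.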
Full instructions (SKILL.md)
Source of truth, from firecrawl/firecrawl-workflows.
name: firecrawl-company-directories
description: Extract structured company lists from directories with Firecrawl. Use for scraping YC, Crunchbase, Product Hunt, G2, startup directories, category directories, or custom company databases into JSON, CSV, CRM-ready lists, or research tables.
license: ISC
metadata:
  author: firecrawl
  version: "0.1.0"
  homepage: https://www.firecrawl.dev
  source: https://github.com/firecrawl/firecrawl-workflows
inputs:
  - name: FIRECRAWL_API_KEY
    description: Firecrawl API key for hosted Firecrawl requests.
    required: true
Firecrawl Company Directories
Use this to turn startup or company directories into structured lists.
Onboarding Interview
Infer the directory, filters, result count, and output format from context. If the source is clear, proceed immediately.
Ask at most 1-3 concise questions only if blocked, such as the directory URL/name, required filters, or target result count.
Firecrawl Collection Plan
Use Firecrawl browser when the directory needs filters, pagination, infinite scroll, or profile clicks. Use scrape/map when listings are public and static.
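The rule above can be sketched as a small decision function. This is only an illustration: `DirectoryTraits` and `choose_strategy` are hypothetical names, not part of the skill or of any Firecrawl SDK, and the returned strings are labels for the two approaches rather than API identifiers.

```python
from dataclasses import dataclass


@dataclass
class DirectoryTraits:
    """Hypothetical description of what a directory requires to be read."""
    needs_filters: bool = False
    paginated: bool = False
    infinite_scroll: bool = False
    needs_profile_clicks: bool = False


def choose_strategy(traits: DirectoryTraits) -> str:
    """Pick the browser for interactive directories, scrape/map otherwise."""
    interactive = (
        traits.needs_filters
        or traits.paginated
        or traits.infinite_scroll
        or traits.needs_profile_clicks
    )
    return "browser" if interactive else "scrape/map"


if __name__ == "__main__":
    print(choose_strategy(DirectoryTraits(infinite_scroll=True)))
    print(choose_strategy(DirectoryTraits()))
```

Any single interactive requirement is enough to select the browser; scrape/map is used only when the listing is fully public and static.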
Suggested sources include YC companies, Crunchbase, Product Hunt, G2 categories, or any custom directory URL.
Extraction Fields
Capture fields that are visible:
- name
- description
- industry/category
- stage/founded/location/team size/funding when visible
- tags
- directory profile URL
- company website URL
Leave unavailable fields blank. Do not infer.
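A minimal sketch of the "leave blank, do not infer" rule follows. It assumes raw records arrive as plain dictionaries keyed by the field names from the JSON Shape section; `normalize_company` is a hypothetical helper, and representing a blank as an empty string (or an empty list for tags) is an assumption of this example.

```python
from typing import Dict

# Field names taken from the JSON Shape section of this skill.
FIELDS = ("name", "url", "description", "industry", "stage", "founded",
          "location", "teamSize", "funding", "tags", "profileUrl", "websiteUrl")


def normalize_company(raw: Dict) -> Dict:
    """Copy only visible values; anything missing stays blank, never guessed."""
    record = {}
    for field_name in FIELDS:
        value = raw.get(field_name)
        if field_name == "tags":
            # Keep non-empty string tags only; a missing list becomes [].
            record[field_name] = [t.strip() for t in (value or [])
                                  if isinstance(t, str) and t.strip()]
        else:
            record[field_name] = "" if value is None else str(value).strip()
    return record


if __name__ == "__main__":
    print(normalize_company({"name": " Example Co ", "tags": ["ai", " "]}))
```

Every record ends up with the same keys, so downstream CSV or CRM imports do not need to handle missing columns.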
Final Deliverable
# Company Directory Export: [Source]
## Summary
[Filters, count extracted, limitations]
## Companies
[Table or link to JSON/CSV]
## Sources
[Directory pages and profiles used]
## Rerun Inputs
workflow: firecrawl-company-directories
directory: [source]
filters: [criteria]
max_results: [number]
output: [json/csv/markdown]
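One way to fill in the template above is sketched below. `render_report` is a hypothetical helper; the four table columns and the wording of the summary line are assumptions for illustration, since the template leaves both open.

```python
from typing import Dict, List


def render_report(source: str, filters: str, companies: List[Dict],
                  pages: List[str], max_results: int,
                  limitations: str = "none noted") -> str:
    """Fill the deliverable template with a markdown company table."""
    columns = ("name", "industry", "location", "websiteUrl")
    lines = [
        f"# Company Directory Export: {source}",
        "## Summary",
        f"Filters: {filters}. Extracted: {len(companies)}. Limitations: {limitations}.",
        "## Companies",
        "| Name | Industry | Location | Website |",
        "| --- | --- | --- | --- |",
    ]
    for company in companies:
        # Escape pipes so a cell value cannot break the table layout.
        cells = [str(company.get(col, "")).replace("|", "\\|") for col in columns]
        lines.append("| " + " | ".join(cells) + " |")
    lines.append("## Sources")
    lines.extend(f"- {page}" for page in pages)
    lines += [
        "## Rerun Inputs",
        "workflow: firecrawl-company-directories",
        f"directory: {source}",
        f"filters: {filters}",
        f"max_results: {max_results}",
        "output: markdown",
    ]
    return "\n".join(lines)


if __name__ == "__main__":
    print(render_report("YC", "industry=AI",
                        [{"name": "Example Co", "industry": "AI"}],
                        ["https://example.com/directory?page=1"], 50))
```

Keeping the rerun inputs in the report means a later run can reproduce the same export from the report alone.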
JSON Shape
Use source, filters, extractedAt, totalResults, and companies[] with name, url, description, industry, stage, founded, location, teamSize, funding, tags, profileUrl, and websiteUrl.
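The shape described above could be assembled as in the following sketch. The key names come from the description; the value types are assumptions (filters as a dictionary, `extractedAt` as an ISO 8601 UTC timestamp, `tags` as a list of strings), and `build_export` is a hypothetical helper.

```python
import json
from datetime import datetime, timezone
from typing import Dict, List, Optional

COMPANY_KEYS = ("name", "url", "description", "industry", "stage", "founded",
                "location", "teamSize", "funding", "tags", "profileUrl", "websiteUrl")


def build_export(source: str, filters: Dict, companies: List[Dict],
                 extracted_at: Optional[str] = None) -> Dict:
    """Wrap company records in the top-level export envelope."""
    rows = []
    for company in companies:
        # Missing keys become blank strings so every row has the same shape.
        row = {key: company.get(key, "") for key in COMPANY_KEYS}
        row["tags"] = list(company.get("tags") or [])
        rows.append(row)
    return {
        "source": source,
        "filters": filters,
        "extractedAt": extracted_at or datetime.now(timezone.utc).isoformat(),
        "totalResults": len(rows),
        "companies": rows,
    }


if __name__ == "__main__":
    export = build_export("YC", {"industry": "AI"},
                          [{"name": "Example Co", "tags": ["ai"]}])
    print(json.dumps(export, indent=2))
```

Deriving `totalResults` from the list length keeps the count and the `companies[]` array from drifting apart.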
Quality Bar
- Deduplicate companies.
- Track pagination progress.
- Note rate limits, login walls, or CAPTCHA blocks.
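The checks above can be sketched as follows. The matching rule is an assumption of this example, not something the skill specifies: two records count as the same company when their website hosts match (ignoring a leading "www."), with a fallback to the lowercased name when no website is available. `company_key`, `dedupe`, and `RunLog` are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional
from urllib.parse import urlparse


def company_key(company: Dict) -> Optional[str]:
    """Build a comparison key from the website host, else the name."""
    host = urlparse(company.get("websiteUrl") or "").netloc.lower()
    if host.startswith("www."):
        host = host[4:]
    if host:
        return "host:" + host
    name = (company.get("name") or "").strip().lower()
    return "name:" + name if name else None


def dedupe(companies: List[Dict]) -> List[Dict]:
    """Keep the first record for each key; records with no key are kept."""
    seen = set()
    unique = []
    for company in companies:
        key = company_key(company)
        if key is not None and key in seen:
            continue
        if key is not None:
            seen.add(key)
        unique.append(company)
    return unique


@dataclass
class RunLog:
    """Tracks pagination progress and any blocks met along the way."""
    pages_done: int = 0
    limitations: List[str] = field(default_factory=list)


if __name__ == "__main__":
    log = RunLog()
    log.pages_done += 1
    log.limitations.append("page 2: CAPTCHA block")
    rows = [{"name": "Acme", "websiteUrl": "https://www.acme.com"},
            {"name": "ACME Inc", "websiteUrl": "https://acme.com/about"}]
    print(len(dedupe(rows)), log)
```

The `limitations` list is the natural input for the Summary section of the deliverable, where rate limits, login walls, and CAPTCHA blocks are reported.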