Firecrawl Website Crawler
A website crawler built on Firecrawl's Crawl API. It renders JavaScript-heavy pages, rate-limits its own requests, works around common anti-bot measures, and returns each page as clean markdown.
Features
- Superior JS Rendering - Handles complex JavaScript-heavy websites
- Anti-Bot Bypass - Built-in techniques to avoid blocking
- Smart Rate Limiting - Automatic throttling to prevent IP bans
- Clean Markdown Output - Get beautifully formatted content
- Subdomain Crawling - Optionally include subdomains
- URL Pattern Filtering - Include/exclude specific URL patterns
- Screenshot Capture - Optional visual snapshots of pages
- Geo-Targeting - Crawl from specific countries
- Demo Mode - Test without an API key using sample data
Use Cases
- Content Migration - Extract all content for website migrations
- SEO Audits - Crawl sites for technical SEO analysis
- Research & Analysis - Gather content for competitive research
- Data Extraction - Collect structured data from websites
- Archival - Create markdown backups of website content
- Training Data - Gather content for AI/ML training datasets
Input
| Field | Type | Description | Default |
|---|---|---|---|
| url | string | Website URL to crawl | Required |
| maxPages | number | Maximum pages to crawl | 100 |
| maxDepth | number | Maximum crawl depth | 5 |
| includeSubdomains | boolean | Include subdomains | false |
| excludePatterns | array | URL patterns to exclude | - |
| includePatterns | array | Only include matching URLs | - |
| outputFormat | string | Output format: markdown, html, text, or links | markdown |
| includeScreenshots | boolean | Capture page screenshots | false |
| waitForSelector | string | CSS selector to wait for | - |
| firecrawlApiKey | string | Your Firecrawl API key (not needed when demoMode is true) | - |
| demoMode | boolean | Run with sample data | false |
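As a rough illustration of how the fields above fit together, the sketch below merges caller overrides onto the documented defaults and runs a few sanity checks. The helper name `build_input` and its validation rules are assumptions made for this example; they are not part of the actor.

```python
from urllib.parse import urlparse

# Defaults copied from the input table above.
DEFAULTS = {
    "maxPages": 100,
    "maxDepth": 5,
    "includeSubdomains": False,
    "outputFormat": "markdown",
    "includeScreenshots": False,
    "demoMode": False,
}
# Optional fields that have no documented default.
OPTIONAL_FIELDS = {"excludePatterns", "includePatterns", "waitForSelector", "firecrawlApiKey"}
OUTPUT_FORMATS = {"markdown", "html", "text", "links"}


def build_input(url: str, **overrides) -> dict:
    """Hypothetical helper: apply documented defaults and reject obvious mistakes."""
    unknown = set(overrides) - set(DEFAULTS) - OPTIONAL_FIELDS
    if unknown:
        raise ValueError(f"unknown input fields: {sorted(unknown)}")

    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        raise ValueError(f"url must be an absolute http(s) URL: {url!r}")

    run_input = {"url": url, **DEFAULTS, **overrides}
    if run_input["outputFormat"] not in OUTPUT_FORMATS:
        raise ValueError(f"unsupported outputFormat: {run_input['outputFormat']!r}")
    return run_input


example_input = build_input("https://example.com", maxPages=50)
```

Catching a misspelled field name or a relative URL locally is cheaper than discovering it after a crawl has already been started and billed.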
Output
{
  "url": "https://example.com/page",
  "title": "Page Title",
  "description": "Meta description of the page",
  "markdown": "# Page Title\n\nFull markdown content...",
  "wordCount": 450,
  "statusCode": 200,
  "crawledAt": "2024-01-15T10:30:00Z"
}
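A minimal sketch of reading one such record in Python follows. The `CrawledPage` class is hypothetical, and it relies only on the field names shown in the sample record; other output formats may carry different content fields.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class CrawledPage:
    """Hypothetical typed view over one output record."""
    url: str
    title: str
    word_count: int
    status_code: int
    crawled_at: datetime

    @classmethod
    def from_record(cls, record: dict) -> "CrawledPage":
        return cls(
            url=record["url"],
            title=record.get("title", ""),
            word_count=int(record.get("wordCount", 0)),
            status_code=int(record["statusCode"]),
            # Older Python versions reject a trailing "Z", so spell out the UTC offset.
            crawled_at=datetime.fromisoformat(record["crawledAt"].replace("Z", "+00:00")),
        )


record = {
    "url": "https://example.com/page",
    "title": "Page Title",
    "description": "Meta description of the page",
    "markdown": "# Page Title\n\nFull markdown content...",
    "wordCount": 450,
    "statusCode": 200,
    "crawledAt": "2024-01-15T10:30:00Z",
}
page = CrawledPage.from_record(record)
```

Parsing `crawledAt` into a timezone-aware `datetime` makes it straightforward to sort pages or compare crawl runs.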
Output Formats
| Format | Description |
|---|---|
| markdown | Clean, formatted markdown with headers and links |
| html | Raw HTML content |
| text | Plain text with markdown stripped |
| links | Only extracted links from each page |
Pricing
| Event | Description | Price |
|---|---|---|
| Crawl Started | Charged when a website crawl is initiated | $0.02 |
| Pages Crawled (per 10) | Charged per 10 pages successfully crawled | $0.01 |
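To get a feel for what a run costs, the two prices above can be combined into a small estimator. This is a sketch only: the function name is hypothetical, and it assumes a partial block of 10 pages is billed as a full block, which the pricing table does not state.

```python
CRAWL_STARTED_CENTS = 2  # $0.02 per crawl started, from the pricing table
PER_TEN_PAGES_CENTS = 1  # $0.01 per 10 pages crawled, from the pricing table


def estimate_cost_usd(pages_crawled: int) -> float:
    """Hypothetical estimate of the actor charges for one crawl, in US dollars."""
    if pages_crawled < 0:
        raise ValueError("pages_crawled must be non-negative")
    # ASSUMPTION: a partial block of 10 pages is rounded up to a full block.
    blocks = (pages_crawled + 9) // 10
    # Work in whole cents so the arithmetic stays exact until the final division.
    return (CRAWL_STARTED_CENTS + blocks * PER_TEN_PAGES_CENTS) / 100


basic_crawl_cost = estimate_cost_usd(50)   # 2 + 5 * 1 = 7 cents
deep_crawl_cost = estimate_cost_usd(500)   # 2 + 50 * 1 = 52 cents
```

Under these assumptions a 50-page crawl comes to $0.07 and a 500-page crawl to $0.52.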
Examples
Basic Crawl
{
  "url": "https://example.com",
  "maxPages": 50,
  "outputFormat": "markdown"
}
Deep Crawl with Filtering
{
  "url": "https://example.com",
  "maxPages": 500,
  "maxDepth": 10,
  "includeSubdomains": true,
  "excludePatterns": ["/admin/*", "/login/*"],
  "includePatterns": ["/blog/*", "/docs/*"]
}
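One plausible reading of how the include and exclude patterns in this example interact is sketched below. It assumes glob-style matching against the URL path and that an exclude match always wins; the actor's actual matching rules are not documented here, so treat `should_crawl` as a hypothetical way to preview a pattern set rather than a description of the crawler's behavior.

```python
from fnmatch import fnmatchcase
from urllib.parse import urlparse


def should_crawl(url, include_patterns=None, exclude_patterns=None) -> bool:
    """Hypothetical filter: glob patterns matched against the URL path."""
    path = urlparse(url).path or "/"
    # ASSUMPTION: an exclude match takes precedence over any include match.
    if any(fnmatchcase(path, pattern) for pattern in (exclude_patterns or [])):
        return False
    # ASSUMPTION: when include patterns are given, only matching paths are kept.
    if include_patterns:
        return any(fnmatchcase(path, pattern) for pattern in include_patterns)
    return True


include = ["/blog/*", "/docs/*"]
exclude = ["/admin/*", "/login/*"]

blog_ok = should_crawl("https://example.com/blog/post-1", include, exclude)
admin_ok = should_crawl("https://example.com/admin/users", include, exclude)
pricing_ok = should_crawl("https://example.com/pricing", include, exclude)
```

Under this interpretation the blog post is crawled, the admin page is excluded, and `/pricing` is skipped because it matches no include pattern. Note that a pattern such as `/blog/*` would not match the bare path `/blog` in this sketch.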
JS-Heavy Site with Screenshots
{
  "url": "https://spa-example.com",
  "waitForSelector": ".content-loaded",
  "includeScreenshots": true,
  "maxPages": 100
}
Related Actors
- Firecrawl Site Mapper - Fast URL discovery (lighter weight)
- Firecrawl Competitive Intelligence - Targeted competitor analysis
Built by John Rippy
Keywords
firecrawl, website crawler, web scraper, javascript rendering, anti-bot, markdown extraction, content migration, seo audit, deep crawl, subdomain crawl