AI Extraction Agent
"Extract Anything from Any Website with Natural Language" by John Rippy | johnrippy.link
🏆 2025 Zapier Automation Hero of the Year — Project Phoenix: A 95-step AI sales pipeline cutting development time by 50%. Read more →
---
Stop Paying for Expensive Web Scraping APIs
You're currently paying for Firecrawl ($16+/mo), Diffbot ($299/mo), or Apify scrapers per use, or building custom scrapers for every website. What if you could just describe what you want?
The AI Extraction Agent uses Claude AI + Playwright to autonomously extract structured data from any website based on natural language objectives:
- No code required - Just describe what you want in plain English
- No Firecrawl dependency - Uses Playwright for scraping (you control the cost)
- Autonomous crawling - Follows links to find relevant content
- Intelligent extraction - Claude AI understands context and extracts clean data
- Schema support - Optionally provide JSON schema for structured output
- BYOK - Bring your own Anthropic API key
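The actor's internals are not shown in this README, so the following is only an illustrative sketch of how objective-driven crawling could work: candidate links are ranked by how many objective keywords appear in their URL or anchor text. The names `tokenize` and `rankLinks`, and the scoring rule itself, are assumptions made for illustration, not the agent's actual logic.

```javascript
// Hypothetical sketch: rank links by keyword overlap with the objective.
// This is NOT the actor's real algorithm, only an illustration of the idea.

// Split text into lowercase words of 4+ letters (a crude keyword filter).
function tokenize(text) {
  return text.toLowerCase().match(/[a-z]{4,}/g) || [];
}

// Score each link by how many distinct objective keywords
// appear in its URL or anchor text, then sort best-first.
function rankLinks(objective, links) {
  const keywords = new Set(tokenize(objective));
  return links
    .map((link) => {
      const haystack = new Set(tokenize(`${link.href} ${link.text}`));
      let score = 0;
      for (const word of keywords) {
        if (haystack.has(word)) score += 1;
      }
      return { ...link, score };
    })
    .sort((a, b) => b.score - a.score);
}

// Made-up links for demonstration.
const ranked = rankLinks('Find all pricing plans and their features', [
  { href: 'https://example.com/about', text: 'About us' },
  { href: 'https://example.com/pricing', text: 'Pricing and plans' },
  { href: 'https://example.com/blog', text: 'Blog' },
]);
console.log(ranked[0].href);
```

In this sketch the pricing link ranks first because both "pricing" and "plans" appear in it. The agent itself may well rely on Claude's judgment rather than keyword matching.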
---
How It Works
---
Use Cases
1. Competitive Pricing Intelligence
{
  "url": "https://competitor.com",
  "objective": "Find all pricing plans and list their names, monthly costs, annual discounts, and included features"
}
2. Lead Enrichment
{
  "url": "https://company.com",
  "objective": "Extract the leadership team with their names, titles, and LinkedIn profiles"
}
3. Product Research
{
  "url": "https://store.com/products",
  "objective": "Get all products with name, price, description, SKU, and availability status"
}
4. Content Aggregation
{
  "url": "https://blog.company.com",
  "objective": "Extract all blog posts with title, author, date, and summary"
}
5. Job Listings
{
  "url": "https://company.com/careers",
  "objective": "Find all open positions with title, department, location, and requirements"
}
---
Quick Start Examples
Example 1: Basic Extraction
{
  "url": "https://example-saas.com",
  "objective": "Find all pricing plans and list their names, prices, and features",
  "anthropicApiKey": "sk-ant-..."
}
Returns:
{
  "success": true,
  "url": "https://example-saas.com",
  "objective": "Find all pricing plans...",
  "data": {
    "plans": [
      {
        "name": "Starter",
        "price": 29,
        "billingCycle": "monthly",
        "features": ["5 users", "10GB storage", "Email support"]
      },
      {
        "name": "Professional",
        "price": 79,
        "billingCycle": "monthly",
        "features": ["25 users", "100GB storage", "Priority support", "API access"]
      }
    ]
  },
  "pagesScraped": 3,
  "pagesVisited": ["https://example-saas.com", "https://example-saas.com/pricing"],
  "extractedAt": "2024-12-23T10:30:00.000Z"
}
Example 2: With Schema (Structured Output)
{
  "url": "https://company.com/team",
  "objective": "Extract the leadership team information",
  "schema": {
    "type": "object",
    "properties": {
      "team": {
        "type": "array",
        "items": {
          "type": "object",
          "properties": {
            "name": { "type": "string" },
            "title": { "type": "string" },
            "linkedin": { "type": "string" }
          }
        }
      }
    }
  },
  "anthropicApiKey": "sk-ant-..."
}
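When you pass a schema, it can be worth checking the returned `data` against it before using it downstream. Below is a minimal sketch of such a check, assuming only the small subset of JSON Schema used in the example above (`type`, `properties`, `items`). The helper name `matchesSchema` and the sample records are hypothetical; a full JSON Schema validator library is the better choice for real workloads.

```javascript
// Minimal shape check for a small subset of JSON Schema:
// "type" (object, array, string, number, boolean), "properties", and "items".
// Hypothetical helper, not part of the actor.
function matchesSchema(value, schema) {
  switch (schema.type) {
    case 'object':
      if (value === null || typeof value !== 'object' || Array.isArray(value)) return false;
      // Only properties that are present are checked; none are treated as required.
      return Object.entries(schema.properties || {}).every(
        ([key, sub]) => !(key in value) || matchesSchema(value[key], sub)
      );
    case 'array':
      return (
        Array.isArray(value) &&
        value.every((item) => !schema.items || matchesSchema(item, schema.items))
      );
    case 'string':
    case 'number':
    case 'boolean':
      return typeof value === schema.type;
    default:
      return true; // Unknown or missing type: accept rather than guess.
  }
}

// Same schema as in the example input above.
const teamSchema = {
  type: 'object',
  properties: {
    team: {
      type: 'array',
      items: {
        type: 'object',
        properties: {
          name: { type: 'string' },
          title: { type: 'string' },
          linkedin: { type: 'string' },
        },
      },
    },
  },
};

// Made-up records for demonstration.
const good = { team: [{ name: 'Ada Example', title: 'CEO', linkedin: 'https://linkedin.com/in/ada-example' }] };
const bad = { team: [{ name: 42 }] };

console.log(matchesSchema(good, teamSchema)); // true
console.log(matchesSchema(bad, teamSchema)); // false
```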
Example 3: Demo Mode (No API Key Required)
{
  "demoMode": true,
  "objective": "Find the pricing information"
}
---
Input Parameters
*Required when not in demo mode
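A small pre-flight check can catch missing fields before you pay for a run. The sketch below assumes that `url` and `anthropicApiKey` are the fields required outside demo mode, and that `objective` is always required, because that is how the Quick Start examples are written. The function `validateInput` is a hypothetical helper, not part of the actor.

```javascript
// Hypothetical pre-flight check. Which fields are required is an assumption
// based on the Quick Start examples: objective always, url and
// anthropicApiKey only when demoMode is off.
function validateInput(input) {
  const errors = [];
  if (!input.objective) errors.push('objective is required');
  if (!input.demoMode) {
    if (!input.url) errors.push('url is required when demoMode is off');
    if (!input.anthropicApiKey) errors.push('anthropicApiKey is required when demoMode is off');
  }
  return errors;
}

// Demo mode: no url or key needed.
console.log(validateInput({ demoMode: true, objective: 'Find the pricing information' }));

// Normal mode with missing fields: two errors are reported.
console.log(validateInput({ objective: 'Find the pricing information' }));
```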
---
Output Format
{
  "success": true,
  "url": "https://example.com",
  "objective": "Find pricing plans",
  "data": {
    "plans": [
      {
        "name": "Starter",
        "price": 29,
        "features": ["Feature 1", "Feature 2"]
      }
    ]
  },
  "pagesScraped": 3,
  "pagesVisited": [
    "https://example.com",
    "https://example.com/pricing"
  ],
  "extractedAt": "2024-12-23T10:30:00.000Z"
}
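As a rough illustration of consuming this output, the sketch below flattens the `plans` array into one row per plan, for example to load into a spreadsheet. It assumes the item has the shape shown above. The keys under `data` depend on your objective and schema, so `plans` is only an example, and `plansToRows` is a hypothetical helper.

```javascript
// Flatten the "plans" array from a result item into one row per plan.
// Assumes the output shape shown above; returns [] for failed or empty results.
function plansToRows(item) {
  if (!item || item.success !== true) return [];
  const plans = (item.data && item.data.plans) || [];
  return plans.map((plan) => ({
    source: item.url,
    name: plan.name,
    price: plan.price,
    featureCount: Array.isArray(plan.features) ? plan.features.length : 0,
  }));
}

// Sample item copied from the output format above (trimmed).
const sampleItem = {
  success: true,
  url: 'https://example.com',
  data: {
    plans: [{ name: 'Starter', price: 29, features: ['Feature 1', 'Feature 2'] }],
  },
};

const rows = plansToRows(sampleItem);
console.log(rows);
```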
---
Pricing
Apify Compute
- Standard Playwright actor pricing
- ~$0.25-0.50 per run (depends on pages scraped)
Anthropic API (BYOK)
- Claude API usage: ~$0.003-0.015 per extraction
- Depends on page content size
- Uses claude-sonnet-4-20250514 for best results
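To turn these ranges into a budget, here is a small arithmetic sketch. It assumes one extraction per run, which this README does not confirm, and simply adds the Apify and Anthropic ranges quoted above. `estimateCost` is a hypothetical helper.

```javascript
// Per-run cost ranges in USD, taken from the figures quoted above.
const APIFY_PER_RUN = { low: 0.25, high: 0.5 };
const ANTHROPIC_PER_EXTRACTION = { low: 0.003, high: 0.015 };

// Round to whole cents to avoid floating-point noise in the totals.
function roundToCents(amount) {
  return Math.round(amount * 100) / 100;
}

// Estimate the total cost range for a number of runs.
// Assumption: each run performs exactly one extraction.
function estimateCost(runs) {
  return {
    low: roundToCents(runs * (APIFY_PER_RUN.low + ANTHROPIC_PER_EXTRACTION.low)),
    high: roundToCents(runs * (APIFY_PER_RUN.high + ANTHROPIC_PER_EXTRACTION.high)),
  };
}

const estimate = estimateCost(100);
console.log(`100 runs: $${estimate.low.toFixed(2)} to $${estimate.high.toFixed(2)}`);
```

Under these assumptions, Apify compute dominates the total, and the Claude API share is a few percent at most.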
Cost Comparison
No monthly subscription. Pay per use.
---
API Integration
Using the Apify API
// Requires the apify-client package; run as an ES module (uses top-level await).
import { ApifyClient } from 'apify-client';

const client = new ApifyClient({ token: 'YOUR_APIFY_TOKEN' });

// Start the actor and wait for the run to finish.
const run = await client.actor('localhowl/ai-extraction-agent').call({
  url: 'https://competitor.com/pricing',
  objective: 'Extract all pricing plans with features and costs',
  maxPages: 5,
  anthropicApiKey: 'sk-ant-...'
});

// Read the extraction result from the run's default dataset.
const { items } = await client.dataset(run.defaultDatasetId).listItems();
console.log(items[0].data);
Using cURL
curl -X POST "https://api.apify.com/v2/acts/localhowl~ai-extraction-agent/runs?token=YOUR_APIFY_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://example.com",
    "objective": "Find the company contact information",
    "anthropicApiKey": "sk-ant-..."
  }'
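If you would rather call the same endpoint from JavaScript without the client library, the sketch below builds the equivalent request. It only constructs the URL and options and does not send anything. `buildRunRequest` is a hypothetical helper; the endpoint and payload are the ones from the cURL example.

```javascript
// Build the same request the cURL example sends, without sending it.
function buildRunRequest(apifyToken, input) {
  const url = new URL('https://api.apify.com/v2/acts/localhowl~ai-extraction-agent/runs');
  url.searchParams.set('token', apifyToken);
  return {
    url: url.toString(),
    options: {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify(input),
    },
  };
}

const request = buildRunRequest('YOUR_APIFY_TOKEN', {
  url: 'https://example.com',
  objective: 'Find the company contact information',
  anthropicApiKey: 'sk-ant-...',
});

// To send it (Node 18+ or a browser):
//   const response = await fetch(request.url, request.options);
console.log(request.url);
```

As in the cURL example, this starts a run. The extracted data is then read from the run's dataset, as in the `apify-client` example.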
---
Why Choose This Over Firecrawl?
Best for: Users who want full control over costs and extraction logic, or who already have an Anthropic API key.
---
Perfect For
Sales Teams
- Extract competitor pricing for battlecards
- Gather prospect company information
- Build targeted lead lists
Product Managers
- Competitive feature analysis
- Market research
- Pricing strategy research
Marketing Teams
- Content research and aggregation
- Competitor blog analysis
- Social proof collection
Developers
- API endpoint discovery
- Documentation extraction
- Data migration preparation
---
Limitations
- JavaScript-heavy SPAs: May require higher maxPages for full content discovery
- Rate Limiting: Respects robots.txt and includes built-in delays
- Content Length: Very large pages are truncated at 50,000 characters
- Authentication: Cannot access login-protected content
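To make the content-length limit concrete, here is a minimal sketch of what a 50,000-character cutoff implies for a long page. The limit is the one stated above. How the actor actually truncates (where it cuts and what it keeps) is not documented here, so `truncateContent` is purely illustrative.

```javascript
// Illustrative only: the limit comes from the limitation listed above;
// the actor's real truncation logic is not shown in this README.
const MAX_CONTENT_CHARS = 50000;

// Keep the first `limit` characters and report whether anything was cut.
function truncateContent(text, limit = MAX_CONTENT_CHARS) {
  const truncated = text.length > limit;
  return { text: truncated ? text.slice(0, limit) : text, truncated };
}

// A made-up 60,000-character page loses its last 10,000 characters.
const page = 'x'.repeat(60000);
const result = truncateContent(page);
console.log(result.text.length, result.truncated);
```

If the data you need sits far down a very long page, pointing `url` at a more specific page may help keep it within the limit.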
---
Support
For issues or feature requests, contact support@localhowl.com
---
Built by John Rippy | johnrippy.link
---
Keywords
ai web scraper, natural language extraction, claude ai scraper, autonomous web agent, web data extraction, playwright scraper, ai data extraction, structured data extraction, website scraper, competitor analysis, pricing intelligence, lead enrichment, firecrawl alternative, no-code scraper, ai powered scraper