---
name: apify-ultimate-scraper
description: "Universal AI-powered web scraper for any platform. Scrape data from Instagram, Facebook, TikTok, YouTube, Google Maps, Google Search, Google Trends, Booking.com, and TripAdvisor. Use for lead gener..."
---
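
# Universal Web Scraper

AI-driven data extraction using 55+ Actors across Instagram, Facebook, TikTok, YouTube, Google Maps, Google Search, Google Trends, Booking.com, and TripAdvisor. This skill selects the best Actor for the task and falls back to searching the Apify Store when none of the listed Actors fits.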

## Prerequisites

(No need to check these upfront.)

- `.env` file with `APIFY_TOKEN`
- Node.js 20.6+ (for native `--env-file` support)
- `mcpc` CLI tool: `npm install -g @apify/mcpc`
- `jq` (used below to parse `mcpc` JSON output)
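
These don't need to be checked upfront, but when a run fails and it isn't obvious which prerequisite is missing, a quick diagnostic can help. The Bash functions below are a hypothetical sketch (the names `has_apify_token`, `node_version_ok`, and `check_prereqs` are illustrative, not part of the skill), assuming `.env` sits in the current directory:

```bash
# Hypothetical diagnostics for the prerequisites above (not part of the skill).

# Succeeds if FILE (default .env) has a non-empty APIFY_TOKEN= assignment.
has_apify_token() {
  local file="${1:-.env}"
  [[ -f "$file" ]] &&
    grep -Eq '^[[:space:]]*(export[[:space:]]+)?APIFY_TOKEN=.+' "$file"
}

# Succeeds if VERSION (e.g. "v20.11.1") is at least 20.6, the first
# Node.js release that supports --env-file.
node_version_ok() {
  local version="${1#v}" major minor
  IFS=. read -r major minor _ <<< "$version"
  (( major > 20 || (major == 20 && minor >= 6) ))
}

# Prints one line per missing prerequisite; returns non-zero if any is missing.
check_prereqs() {
  local status=0 tool
  has_apify_token .env || { echo "APIFY_TOKEN not found in .env" >&2; status=1; }
  if command -v node >/dev/null 2>&1; then
    node_version_ok "$(node --version)" ||
      { echo "Node.js 20.6+ required (found $(node --version))" >&2; status=1; }
  else
    echo "node not found" >&2; status=1
  fi
  for tool in mcpc jq; do
    command -v "$tool" >/dev/null 2>&1 || { echo "$tool not found" >&2; status=1; }
  done
  return "$status"
}
```

Running `check_prereqs` prints one line per missing prerequisite and returns a non-zero status if any check fails.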

## Workflow

Copy this checklist and track progress:

```
Task Progress:
- [ ] Step 1: Understand user goal and select Actor
- [ ] Step 2: Fetch Actor schema via mcpc
- [ ] Step 3: Ask user preferences (format, filename)
- [ ] Step 4: Run the scraper script
- [ ] Step 5: Summarize results and offer follow-ups
```

### Step 1: Understand User Goal and Select Actor

First, understand what the user wants to achieve. Then select the best Actor from the options below.

#### Instagram Actors (12)

| Actor ID | Best For |
|----------|----------|
| `apify/instagram-profile-scraper` | Profile data, follower counts, bio info |
| `apify/instagram-post-scraper` | Individual post details, engagement metrics |
| `apify/instagram-comment-scraper` | Comment extraction, sentiment analysis |
| `apify/instagram-hashtag-scraper` | Hashtag content, trending topics |
| `apify/instagram-hashtag-stats` | Hashtag performance metrics |
| `apify/instagram-reel-scraper` | Reels content and metrics |
| `apify/instagram-search-scraper` | Search users, places, hashtags |
| `apify/instagram-tagged-scraper` | Posts tagged with specific accounts |
| `apify/instagram-followers-count-scraper` | Follower count tracking |
| `apify/instagram-scraper` | Comprehensive Instagram data |
| `apify/instagram-api-scraper` | API-based Instagram access |
| `apify/export-instagram-comments-posts` | Bulk comment/post export |

#### Facebook Actors (14)

| Actor ID | Best For |
|----------|----------|
| `apify/facebook-pages-scraper` | Page data, metrics, contact info |
| `apify/facebook-page-contact-information` | Emails, phones, addresses from pages |
| `apify/facebook-posts-scraper` | Post content and engagement |
| `apify/facebook-comments-scraper` | Comment extraction |
| `apify/facebook-likes-scraper` | Reaction analysis |
| `apify/facebook-reviews-scraper` | Page reviews |
| `apify/facebook-groups-scraper` | Group content and members |
| `apify/facebook-events-scraper` | Event data |
| `apify/facebook-ads-scraper` | Ad creative and targeting |
| `apify/facebook-search-scraper` | Search results |
| `apify/facebook-reels-scraper` | Reels content |
| `apify/facebook-photos-scraper` | Photo extraction |
| `apify/facebook-marketplace-scraper` | Marketplace listings |
| `apify/facebook-followers-following-scraper` | Follower/following lists |

#### TikTok Actors (14)

| Actor ID | Best For |
|----------|----------|
| `clockworks/tiktok-scraper` | Comprehensive TikTok data |
| `clockworks/free-tiktok-scraper` | Free TikTok extraction |
| `clockworks/tiktok-profile-scraper` | Profile data |
| `clockworks/tiktok-video-scraper` | Video details and metrics |
| `clockworks/tiktok-comments-scraper` | Comment extraction |
| `clockworks/tiktok-followers-scraper` | Follower lists |
| `clockworks/tiktok-user-search-scraper` | Find users by keywords |
| `clockworks/tiktok-hashtag-scraper` | Hashtag content |
| `clockworks/tiktok-sound-scraper` | Trending sounds |
| `clockworks/tiktok-ads-scraper` | Ad content |
| `clockworks/tiktok-discover-scraper` | Discover page content |
| `clockworks/tiktok-explore-scraper` | Explore content |
| `clockworks/tiktok-trends-scraper` | Trending content |
| `clockworks/tiktok-live-scraper` | Live stream data |

#### YouTube Actors (5)

| Actor ID | Best For |
|----------|----------|
| `streamers/youtube-scraper` | Video data and metrics |
| `streamers/youtube-channel-scraper` | Channel information |
| `streamers/youtube-comments-scraper` | Comment extraction |
| `streamers/youtube-shorts-scraper` | Shorts content |
| `streamers/youtube-video-scraper-by-hashtag` | Videos by hashtag |

#### Google Maps Actors (4)

| Actor ID | Best For |
|----------|----------|
| `compass/crawler-google-places` | Business listings, ratings, contact info |
| `compass/google-maps-extractor` | Detailed business data |
| `compass/Google-Maps-Reviews-Scraper` | Review extraction |
| `poidata/google-maps-email-extractor` | Email discovery from listings |

#### Other Actors (6)

| Actor ID | Best For |
|----------|----------|
| `apify/google-search-scraper` | Google search results |
| `apify/google-trends-scraper` | Google Trends data |
| `voyager/booking-scraper` | Booking.com hotel data |
| `voyager/booking-reviews-scraper` | Booking.com reviews |
| `maxcopell/tripadvisor-reviews` | TripAdvisor reviews |
| `vdrmota/contact-info-scraper` | Contact enrichment from URLs |

---

#### Actor Selection by Use Case

| Use Case | Primary Actors |
|----------|---------------|
| **Lead Generation** | `compass/crawler-google-places`, `poidata/google-maps-email-extractor`, `vdrmota/contact-info-scraper` |
| **Influencer Discovery** | `apify/instagram-profile-scraper`, `clockworks/tiktok-profile-scraper`, `streamers/youtube-channel-scraper` |
| **Brand Monitoring** | `apify/instagram-tagged-scraper`, `apify/instagram-hashtag-scraper`, `compass/Google-Maps-Reviews-Scraper` |
| **Competitor Analysis** | `apify/facebook-pages-scraper`, `apify/facebook-ads-scraper`, `apify/instagram-profile-scraper` |
| **Content Analytics** | `apify/instagram-post-scraper`, `clockworks/tiktok-scraper`, `streamers/youtube-scraper` |
| **Trend Research** | `apify/google-trends-scraper`, `clockworks/tiktok-trends-scraper`, `apify/instagram-hashtag-stats` |
| **Review Analysis** | `compass/Google-Maps-Reviews-Scraper`, `voyager/booking-reviews-scraper`, `maxcopell/tripadvisor-reviews` |
| **Audience Analysis** | `apify/instagram-followers-count-scraper`, `clockworks/tiktok-followers-scraper`, `apify/facebook-followers-following-scraper` |
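
If a script needs the same mapping, the use-case table above could be encoded as a simple lookup. This is a hypothetical Bash sketch (`primary_actors` is an illustrative name, not part of the skill); it prints the listed Actors in table order, so the first line is a reasonable default:

```bash
# Hypothetical lookup mirroring the "Actor Selection by Use Case" table.
# Prints the primary Actors for a use case, one per line; fails if unknown.
primary_actors() {
  local use_case
  use_case="$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')"
  case "$use_case" in
    "lead generation")
      printf '%s\n' compass/crawler-google-places poidata/google-maps-email-extractor \
        vdrmota/contact-info-scraper ;;
    "influencer discovery")
      printf '%s\n' apify/instagram-profile-scraper clockworks/tiktok-profile-scraper \
        streamers/youtube-channel-scraper ;;
    "brand monitoring")
      printf '%s\n' apify/instagram-tagged-scraper apify/instagram-hashtag-scraper \
        compass/Google-Maps-Reviews-Scraper ;;
    "competitor analysis")
      printf '%s\n' apify/facebook-pages-scraper apify/facebook-ads-scraper \
        apify/instagram-profile-scraper ;;
    "content analytics")
      printf '%s\n' apify/instagram-post-scraper clockworks/tiktok-scraper \
        streamers/youtube-scraper ;;
    "trend research")
      printf '%s\n' apify/google-trends-scraper clockworks/tiktok-trends-scraper \
        apify/instagram-hashtag-stats ;;
    "review analysis")
      printf '%s\n' compass/Google-Maps-Reviews-Scraper voyager/booking-reviews-scraper \
        maxcopell/tripadvisor-reviews ;;
    "audience analysis")
      printf '%s\n' apify/instagram-followers-count-scraper clockworks/tiktok-followers-scraper \
        apify/facebook-followers-following-scraper ;;
    *) return 1 ;;
  esac
}
```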

---

#### Multi-Actor Workflows

For complex tasks, chain Actors by passing the output of the first run into the input of the second:

| Workflow | Step 1 | Step 2 |
|----------|--------|--------|
| **Lead enrichment** | `compass/crawler-google-places` → | `vdrmota/contact-info-scraper` |
| **Influencer vetting** | `apify/instagram-profile-scraper` → | `apify/instagram-comment-scraper` |
| **Competitor deep-dive** | `apify/facebook-pages-scraper` → | `apify/facebook-posts-scraper` |
| **Local business analysis** | `compass/crawler-google-places` → | `compass/Google-Maps-Reviews-Scraper` |
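
Chaining usually means turning the first run's output into the second Actor's input. The hypothetical helper below builds a JSON input from a list of URLs. It assumes the downstream Actor (for example `vdrmota/contact-info-scraper`) accepts a `startUrls` array of `{"url": ...}` objects, which is a common Apify convention but should be confirmed against the schema from Step 2:

```bash
# Hypothetical helper: reads URLs (one per line) on stdin and prints
# {"startUrls":[{"url":"..."},...]}. Assumes the target Actor's input uses a
# `startUrls` array of {url} objects; confirm against the Step 2 schema.
urls_to_start_urls_json() {
  local url first=1
  printf '{"startUrls":['
  while IFS= read -r url || [[ -n "$url" ]]; do
    url=${url%$'\r'}                    # tolerate CRLF input
    [[ -z "$url" ]] && continue         # skip blank lines
    case "$url" in
      *'"'* | *'\'*)                    # valid URLs percent-encode these
        echo "skipping URL with a quote or backslash: $url" >&2
        continue ;;
    esac
    (( first )) || printf ','
    printf '{"url":"%s"}' "$url"
    first=0
  done
  printf ']}\n'
}
```

For instance, if the Google Places export is a JSON array whose items carry a `website` field (an assumption about that Actor's output), `jq -r '.[].website // empty' places.json | urls_to_start_urls_json` would produce a value for `--input`.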

#### Can't Find a Suitable Actor?

If none of the Actors above match the user's request, search the Apify Store directly:

```bash
export $(grep APIFY_TOKEN .env | xargs) && mcpc --json mcp.apify.com --header "Authorization: Bearer $APIFY_TOKEN" tools-call search-actors keywords:="SEARCH_KEYWORDS" limit:=10 offset:=0 category:="" | jq -r '.content[0].text'
```

Replace `SEARCH_KEYWORDS` with 1-3 simple terms (e.g., "LinkedIn profiles", "Amazon products", "Twitter").
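
The `export $(grep APIFY_TOKEN .env | xargs)` pattern works for a simple `.env`, but it exports every matching line and relies on word splitting. A slightly more defensive alternative is sketched below; `load_apify_token` is a hypothetical helper, and it assumes a `KEY=value` style `.env` file:

```bash
# Hypothetical alternative to `export $(grep APIFY_TOKEN .env | xargs)`:
# reads only the first APIFY_TOKEN= line of FILE (default .env), strips
# optional surrounding quotes, and exports the value without word splitting.
load_apify_token() {
  local file="${1:-.env}" line value
  line="$(grep -m 1 -E '^[[:space:]]*(export[[:space:]]+)?APIFY_TOKEN=' "$file")" || {
    echo "APIFY_TOKEN not found in $file" >&2
    return 1
  }
  value=${line#*APIFY_TOKEN=}
  value=${value%$'\r'}                 # tolerate CRLF line endings
  value=${value%\"}; value=${value#\"} # strip double quotes
  value=${value%\'}; value=${value#\'} # strip single quotes
  export APIFY_TOKEN="$value"
}
```

After `load_apify_token` succeeds, the `mcpc` commands in this skill can drop the `export ... &&` prefix, since `APIFY_TOKEN` is already in the environment.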

### Step 2: Fetch Actor Schema

Fetch the Actor's input schema and details dynamically using mcpc:

```bash
export $(grep APIFY_TOKEN .env | xargs) && mcpc --json mcp.apify.com --header "Authorization: Bearer $APIFY_TOKEN" tools-call fetch-actor-details actor:="ACTOR_ID" | jq -r ".content"
```

Replace `ACTOR_ID` with the selected Actor (e.g., `compass/crawler-google-places`).

This returns:
- Actor description and README
- Required and optional input parameters
- Output fields (if available)
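
A malformed Actor ID only surfaces as an error after the `mcpc` call. The sketch below wraps the Step 2 command in a hypothetical `fetch_actor_details` function with a local format check first. It assumes Actor IDs take the `owner/actor-name` form used throughout this skill, containing only letters, digits, `.`, `_`, and `-`, and that `APIFY_TOKEN` is already exported:

```bash
# Hypothetical pre-flight check: Actor IDs in this skill look like
# "owner/actor-name". Assumes only letters, digits, ".", "_" and "-".
is_valid_actor_id() {
  [[ "$1" =~ ^[A-Za-z0-9._-]+/[A-Za-z0-9._-]+$ ]]
}

# Wraps the Step 2 command; expects APIFY_TOKEN to be exported already.
fetch_actor_details() {
  local actor_id="$1"
  if ! is_valid_actor_id "$actor_id"; then
    echo "Not an owner/actor-name Actor ID: $actor_id" >&2
    return 1
  fi
  mcpc --json mcp.apify.com --header "Authorization: Bearer $APIFY_TOKEN" \
    tools-call fetch-actor-details actor:="$actor_id" | jq -r ".content"
}
```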

### Step 3: Ask User Preferences

Before running, ask:

1. **Output format**:
   - **Quick answer** - Display the top few results in chat (no file saved)
   - **CSV** - Full export with all fields
   - **JSON** - Full export in JSON format
2. **Number of results**: Suggest a limit suited to the use case

### Step 4: Run the Script

**Quick answer (display in chat, no file):**

```bash
node --env-file=.env "${CLAUDE_PLUGIN_ROOT}/reference/scripts/run_actor.js" \
  --actor "ACTOR_ID" \
  --input 'JSON_INPUT'
```

**CSV:**

```bash
node --env-file=.env "${CLAUDE_PLUGIN_ROOT}/reference/scripts/run_actor.js" \
  --actor "ACTOR_ID" \
  --input 'JSON_INPUT' \
  --output YYYY-MM-DD_OUTPUT_FILE.csv \
  --format csv
```

**JSON:**

```bash
node --env-file=.env "${CLAUDE_PLUGIN_ROOT}/reference/scripts/run_actor.js" \
  --actor "ACTOR_ID" \
  --input 'JSON_INPUT' \
  --output YYYY-MM-DD_OUTPUT_FILE.json \
  --format json
```
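
The three variants differ only in their trailing flags, so they can be assembled by one function. The sketch below is hypothetical (`build_run_args` and the `RUN_ARGS` array are illustrative names) and uses only the `--actor`, `--input`, `--output`, and `--format` flags shown above, generating the `YYYY-MM-DD_` filename prefix with `date`:

```bash
# Hypothetical helper: fills the global RUN_ARGS array with run_actor.js
# arguments for one of the three output modes above.
# Usage: build_run_args ACTOR_ID JSON_INPUT FORMAT [NAME]
#   FORMAT: quick | csv | json    NAME: file stem for exports (default "results")
build_run_args() {
  local actor="$1" input="$2" format="$3" name="${4:-results}"
  RUN_ARGS=(--actor "$actor" --input "$input")
  case "$format" in
    quick) ;;   # display in chat, no file
    csv | json)
      RUN_ARGS+=(--output "$(date +%Y-%m-%d)_${name}.${format}" --format "$format") ;;
    *)
      echo "unknown format: $format (expected quick, csv or json)" >&2
      return 1 ;;
  esac
}

# Example (the input fields for compass/crawler-google-places are assumptions;
# check them against the schema from Step 2):
#   build_run_args compass/crawler-google-places \
#     '{"searchStringsArray":["coffee shop"],"locationQuery":"Prague","maxCrawledPlacesPerSearch":20}' \
#     csv coffee-shops-prague
#   node --env-file=.env "${CLAUDE_PLUGIN_ROOT}/reference/scripts/run_actor.js" "${RUN_ARGS[@]}"
```

Keeping the arguments in an array preserves the single-quoted JSON input as one argument, which is easy to break when the command is built as a plain string.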

### Step 5: Summarize Results and Offer Follow-ups

After completion, report:
- Number of results found
- File location and name (for CSV/JSON exports)
- Key fields available
- **Suggested follow-up workflows** based on results:

| If User Got | Suggest Next |
|-------------|--------------|
| Business listings | Enrich with `vdrmota/contact-info-scraper` or fetch reviews with `compass/Google-Maps-Reviews-Scraper` |
| Influencer profiles | Analyze engagement with comment scrapers |
| Competitor pages | Deep-dive with post/ad scrapers |
| Trend data | Validate with platform-specific hashtag scrapers |
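
For CSV exports, the result count and available fields for the summary can be read straight from the file. A minimal Bash sketch, assuming a header row, one record per line, and column names without commas (`summarize_csv` is a hypothetical name):

```bash
# Hypothetical Step 5 helper: prints the result count and column names of a
# CSV export. Assumes a header row, one record per line, and no commas inside
# column names; fields with embedded newlines would inflate the count.
summarize_csv() {
  local file="$1" rows fields
  [[ -f "$file" ]] || { echo "no such file: $file" >&2; return 1; }
  # awk's NR also counts a final line without a trailing newline.
  rows=$(awk 'END { print (NR > 0 ? NR - 1 : 0) }' "$file")
  fields=$(head -n 1 "$file" | tr -d '"\r' | sed 's/,/, /g')
  echo "Results: $rows"
  echo "Fields: $fields"
}
```

For JSON exports, `jq 'length' FILE.json` gives the result count, assuming the file holds a single top-level array.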

## Error Handling

- `APIFY_TOKEN not found` - Ask the user to create `.env` with `APIFY_TOKEN=your_token`
- `mcpc not found` - Ask the user to install it with `npm install -g @apify/mcpc`
- `Actor not found` - Check the Actor ID spelling
- `Run FAILED` - Ask the user to open the Apify Console link shown in the error output
- `Timeout` - Reduce the input size or increase `--timeout`
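
When errors are relayed programmatically, the list above maps naturally onto a substring match. A hypothetical Bash sketch (`hint_for_error` is an illustrative name, and the exact wording of real error messages may differ from these patterns):

```bash
# Hypothetical helper: maps an error message to the suggested fix from the
# Error Handling list. Matching is by substring, first match wins.
hint_for_error() {
  case "$1" in
    *"APIFY_TOKEN not found"*)
      echo "Create .env containing APIFY_TOKEN=your_token" ;;
    *"mcpc not found"* | *"mcpc: command not found"*)
      echo "Install mcpc: npm install -g @apify/mcpc" ;;
    *"Actor not found"*)
      echo "Check the Actor ID spelling (owner/actor-name)" ;;
    *"Run FAILED"*)
      echo "Open the Apify Console link printed in the error output" ;;
    *[Tt]imeout*)
      echo "Reduce the input size or increase --timeout" ;;
    *)
      echo "No known fix; show the full error to the user"
      return 1 ;;
  esac
}
```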