# Python Actorization

## Install the Apify SDK

```bash
pip install apify
```

## Wrap Main Function with Actor Context Manager

```python
import asyncio
from apify import Actor

async def main() -> None:
    async with Actor:
        # ============================================
        # Your existing code goes here
        # ============================================

        # Example: Get input from Apify Console or API
        actor_input = await Actor.get_input()
        print(f'Input: {actor_input}')

        # Example: Your crawler or processing logic
        # crawler = PlaywrightCrawler(...)
        # await crawler.run([actor_input.get('startUrl')])

        # Example: Push results to dataset
        # await Actor.push_data({'result': 'data'})

        # ============================================
        # End of your code
        # ============================================

if __name__ == '__main__':
    asyncio.run(main())
```

## Key Points

- `async with Actor:` handles both initialization and cleanup
- It automatically manages platform event listeners and graceful shutdown
- Local execution remains unchanged; the SDK automatically detects the environment
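
The init-and-cleanup behavior described in the first bullet can be illustrated with a plain-Python stand-in. This is only a sketch: `ToyActor` is a hypothetical class, not part of the Apify SDK. It records the order in which an async context manager runs setup, body, and teardown, including the case where the body raises.

```python
import asyncio


class ToyActor:
    """Hypothetical stand-in for an async context manager; NOT the Apify SDK."""

    def __init__(self) -> None:
        self.events = []

    async def __aenter__(self) -> "ToyActor":
        # Setup runs before the body of the `async with` block.
        self.events.append('init')
        return self

    async def __aexit__(self, exc_type, exc, tb) -> bool:
        # Teardown runs whether the body finished normally or raised.
        self.events.append('cleanup' if exc_type is None else 'cleanup-after-error')
        return False  # do not swallow the exception


async def run(fail: bool) -> list:
    actor = ToyActor()
    try:
        async with actor:
            actor.events.append('body')
            if fail:
                raise RuntimeError('boom')
    except RuntimeError:
        pass
    return actor.events


ok_events = asyncio.run(run(fail=False))
err_events = asyncio.run(run(fail=True))
print(ok_events)
print(err_events)
```

In both runs the teardown step executes, which is the property that makes wrapping existing code in a context manager safer than calling setup and teardown by hand.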

## Crawlee Python Projects

```python
import asyncio
from apify import Actor
# Import path as used in older Crawlee releases; newer releases may expose
# PlaywrightCrawler from crawlee.crawlers instead.
from crawlee.playwright_crawler import PlaywrightCrawler

async def main() -> None:
    async with Actor:
        # Get input, falling back to defaults for missing fields
        actor_input = await Actor.get_input() or {}
        start_url = actor_input.get('startUrl', 'https://example.com')
        max_items = actor_input.get('maxItems', 100)

        item_count = 0

        async def request_handler(context):
            nonlocal item_count
            # Stop storing results once the item limit is reached
            if item_count >= max_items:
                return

            title = await context.page.title()
            await context.push_data({'url': context.request.url, 'title': title})
            item_count += 1

        crawler = PlaywrightCrawler(request_handler=request_handler)
        await crawler.run([start_url])

if __name__ == '__main__':
    asyncio.run(main())
```
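
The example above reads `startUrl` and `maxItems` with defaults but does not check their values. A minimal sketch of validating them before the crawler starts might look like the following. `parse_input` is a hypothetical helper, and the rules it enforces (an absolute http(s) URL, a positive integer limit) are assumptions rather than anything the SDK requires.

```python
from typing import Optional, Tuple
from urllib.parse import urlparse


def parse_input(raw: Optional[dict]) -> Tuple[str, int]:
    """Hypothetical helper: return (start_url, max_items) or raise ValueError."""
    data = raw or {}
    start_url = data.get('startUrl', 'https://example.com')
    max_items = data.get('maxItems', 100)

    # Assumed rule: only absolute http(s) URLs are accepted.
    parsed = urlparse(start_url) if isinstance(start_url, str) else None
    if parsed is None or parsed.scheme not in ('http', 'https') or not parsed.netloc:
        raise ValueError(f'startUrl must be an absolute http(s) URL, got {start_url!r}')

    # bool is a subclass of int, so reject it explicitly.
    if isinstance(max_items, bool) or not isinstance(max_items, int) or max_items < 1:
        raise ValueError(f'maxItems must be a positive integer, got {max_items!r}')

    return start_url, max_items
```

Inside the Actor, this could be called as `start_url, max_items = parse_input(await Actor.get_input())`, so that a bad input fails early with a readable message.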

## Batch Processing Scripts

```python
import asyncio
from apify import Actor

def process_item(item):
    # Placeholder: replace with your existing per-item logic
    return {'item': item}

async def main() -> None:
    async with Actor:
        actor_input = await Actor.get_input() or {}
        items = actor_input.get('items', [])

        # Process each input item and store its result in the dataset
        for item in items:
            result = process_item(item)
            await Actor.push_data(result)

if __name__ == '__main__':
    asyncio.run(main())
```
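
If the per-item work is slow or can fail, the loop above can be made more forgiving. The sketch below uses only the standard library and makes two assumptions: `process_item` is a synchronous function (the version here is hypothetical), and a failed item should be recorded rather than abort the batch. Results are collected in a list where an Actor would call `Actor.push_data`.

```python
import asyncio
from typing import Any, Dict, List


def process_item(item: Any) -> Dict[str, Any]:
    """Hypothetical per-item logic; stands in for your existing code."""
    if not isinstance(item, str):
        raise TypeError(f'expected str, got {type(item).__name__}')
    return {'item': item, 'length': len(item)}


async def process_batch(items: List[Any]) -> List[Dict[str, Any]]:
    results = []
    for item in items:
        try:
            # Run the synchronous function in a worker thread so the
            # event loop stays responsive (requires Python 3.9+).
            result = await asyncio.to_thread(process_item, item)
        except Exception as exc:
            # Record the failure and continue with the next item.
            result = {'item': repr(item), 'error': str(exc)}
        results.append(result)  # in an Actor: await Actor.push_data(result)
    return results


batch_results = asyncio.run(process_batch(['ab', 42, 'xyz']))
print(batch_results)
```

Items are still processed one at a time, so result order matches input order.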