antigravity-skills-reference/skills/apify-actorization/references/schemas-and-output.md
Ahmed Rehan 2f55f046b9 feat: add 12 official Apify agent-skills for web scraping & data extraction (#165)
2026-03-01 10:02:50 +01:00

Schemas and Output Configuration

Input Schema

Map your application's inputs to .actor/input_schema.json. Validate against the JSON Schema from the @apify/json_schemas npm package (input.schema.json).

{
    "title": "My Actor Input",
    "type": "object",
    "schemaVersion": 1,
    "properties": {
        "startUrl": {
            "title": "Start URL",
            "type": "string",
            "description": "The URL to start processing from",
            "editor": "textfield",
            "prefill": "https://example.com"
        },
        "maxItems": {
            "title": "Max Items",
            "type": "integer",
            "description": "Maximum number of items to process",
            "default": 100,
            "minimum": 1
        }
    },
    "required": ["startUrl"]
}
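
A minimal sketch of how Actor code might normalize input that follows the schema above. It assumes the raw object comes from `await Actor.getInput()`; `resolveInput` is a hypothetical helper, not part of the Apify SDK, and it simply mirrors the schema's `required`, `default`, and `minimum` settings.

```javascript
// Hypothetical helper: normalizes raw Actor input to match the schema above.
// In a real Actor, rawInput would come from `await Actor.getInput()`.
function resolveInput(rawInput) {
    const input = rawInput ?? {};

    // startUrl is listed under "required" in the schema.
    if (typeof input.startUrl !== 'string' || input.startUrl.length === 0) {
        throw new Error('Input field "startUrl" is required');
    }

    // Mirror the schema's default (100) and minimum (1) for maxItems.
    const maxItems = input.maxItems ?? 100;
    if (!Number.isInteger(maxItems) || maxItems < 1) {
        throw new Error('Input field "maxItems" must be an integer >= 1');
    }

    return { startUrl: input.startUrl, maxItems };
}
```

Keeping these checks in one function means the rest of the code can rely on a complete, typed input object even when the Actor runs somewhere that does not apply schema defaults.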

Mapping Guidelines

  • Command-line arguments → input schema properties
  • Environment variables → input schema properties, or Actor environment variables (environmentVariables) in actor.json
  • Config files → input schema with object/array types
  • Flatten deeply nested structures for better UX
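
The last guideline can be sketched as follows. This is one possible approach, not a prescribed one: `flattenConfig` is a hypothetical helper that turns a nested config object into flat camelCase keys, which could then become individual input schema properties. Arrays are left as they are, since the input schema supports array types.

```javascript
// Hypothetical helper: flattens a nested config object into camelCase keys,
// e.g. { proxy: { host: 'a' } } -> { proxyHost: 'a' }.
function flattenConfig(config, prefix = '') {
    const flat = {};
    for (const [key, value] of Object.entries(config)) {
        // Join the parent prefix and the child key in camelCase.
        const name = prefix ? prefix + key[0].toUpperCase() + key.slice(1) : key;
        const isPlainObject = value !== null && typeof value === 'object' && !Array.isArray(value);
        if (isPlainObject) {
            Object.assign(flat, flattenConfig(value, name));
        } else {
            flat[name] = value;
        }
    }
    return flat;
}

// Example config (illustrative values only).
const nested = { proxy: { host: 'proxy.example.com', port: 8000 }, retries: 3, tags: ['a', 'b'] };
const flat = flattenConfig(nested);
// flat has the keys: proxyHost, proxyPort, retries, tags
```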

Output Schema

Define output structure in .actor/output_schema.json. Validate against the JSON Schema from the @apify/json_schemas npm package (output.schema.json).

For Table-Like Data (Multiple Items)

  • Use Actor.pushData() (JS) or Actor.push_data() (Python)
  • Each item becomes a row in the dataset

For Single Files or Blobs

  • Use key-value store: Actor.setValue() / Actor.set_value()
  • Get the public URL and include it in the dataset:
// Store the file in the default key-value store
await Actor.setValue('report.pdf', pdfBuffer, { contentType: 'application/pdf' });

// Build the record's API URL from the store ID
const store = await Actor.openKeyValueStore();
const publicUrl = `https://api.apify.com/v2/key-value-stores/${store.id}/records/report.pdf`;

// Include URL in dataset output
await Actor.pushData({ reportUrl: publicUrl });

For Multiple Files with a Common Prefix (Collections)

// Store multiple files with a prefix
for (const [name, data] of files) {
    await Actor.setValue(`screenshots/${name}`, data, { contentType: 'image/png' });
}
// Files are accessible at: .../records/screenshots%2F{name}
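
The `%2F` in the URL above comes from percent-encoding the `/` in the record key. A small sketch of building such URLs, assuming the same URL pattern as the snippets above; `buildRecordUrl` is a hypothetical helper and `STORE_ID` is a placeholder, not a real store ID.

```javascript
// Hypothetical helper: builds the API URL of a key-value store record.
// The URL pattern is the one used in the snippets above.
function buildRecordUrl(storeId, key) {
    // encodeURIComponent turns "/" into "%2F" so the key stays one path segment.
    return `https://api.apify.com/v2/key-value-stores/${storeId}/records/${encodeURIComponent(key)}`;
}

const url = buildRecordUrl('STORE_ID', 'screenshots/home.png');
// url === 'https://api.apify.com/v2/key-value-stores/STORE_ID/records/screenshots%2Fhome.png'
```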

Actor Configuration (actor.json)

Configure .actor/actor.json. Validate against the JSON Schema from the @apify/json_schemas npm package (actor.schema.json).

{
    "actorSpecification": 1,
    "name": "my-actor",
    "title": "My Actor",
    "description": "Brief description of what the actor does",
    "version": "1.0.0",
    "meta": {
        "templateId": "ts_empty",
        "generatedBy": "Claude Code with Claude Opus 4.5"
    },
    "input": "./input_schema.json",
    "dockerfile": "../Dockerfile"
}

Important: Fill in the generatedBy property with the tool/model used.

State Management

Request Queue - For Pausable Task Processing

The request queue works for any task processing, not just web scraping. Use a dummy URL with custom uniqueKey and userData for non-URL tasks:

import { Actor } from 'apify';
import { BasicCrawler } from 'crawlee';

const requestQueue = await Actor.openRequestQueue();

// Add tasks to the queue (works for any processing, not just URLs)
const itemId = 123;
await requestQueue.addRequest({
    url: 'https://placeholder.local',  // Dummy URL for non-scraping tasks
    uniqueKey: `task-${itemId}`,       // Unique identifier for deduplication
    userData: { itemId, action: 'process' },  // Your custom task data
});

// Process tasks from the queue (with Crawlee)
const crawler = new BasicCrawler({
    requestQueue,
    requestHandler: async ({ request }) => {
        const { itemId, action } = request.userData;
        // Process your task using userData
        await processTask(itemId, action);
    },
});
await crawler.run();

// Or manually consume without Crawlee (use instead of the crawler, not after it):
let request;
while ((request = await requestQueue.fetchNextRequest())) {
    const { itemId, action } = request.userData;
    await processTask(itemId, action);
    await requestQueue.markRequestHandled(request);
}
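
The manual loop above stops at the first task that throws. A hedged sketch of one way to add retries: `drainQueue` is a hypothetical helper, and `queue` is assumed to expose `fetchNextRequest`, `markRequestHandled`, and `reclaimRequest`, as the SDK's request queue does. The retry limit and the give-up policy are illustrative choices, not SDK behavior.

```javascript
// Hypothetical helper: drains a queue, retrying failed tasks a few times.
async function drainQueue(queue, handleTask, maxRetries = 2) {
    let request;
    while ((request = await queue.fetchNextRequest())) {
        try {
            await handleTask(request.userData);
            await queue.markRequestHandled(request);
        } catch (err) {
            request.retryCount = (request.retryCount ?? 0) + 1;
            if (request.retryCount > maxRetries) {
                // Give up so the loop cannot spin on a permanently failing task.
                await queue.markRequestHandled(request);
            } else {
                // Return the request to the queue for another attempt.
                await queue.reclaimRequest(request);
            }
        }
    }
}
```

A real Actor would probably also log or store the tasks that exhausted their retries, so failures are visible in the run output.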

Key-Value Store - For Checkpoint State

// Save state
await Actor.setValue('STATE', { processedCount: 100 });

// Restore state on restart
const state = (await Actor.getValue('STATE')) ?? { processedCount: 0 };
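
A sketch of how the save and restore calls above might fit into a resumable loop. `processWithCheckpoint` and `createMemoryStore` are hypothetical names. The `store` argument is assumed to be any object with async `getValue` and `setValue` methods, and the in-memory store exists only so the example runs without the SDK.

```javascript
// Hypothetical sketch: resume processing from a saved checkpoint.
async function processWithCheckpoint(items, store, handleItem) {
    const state = (await store.getValue('STATE')) ?? { processedCount: 0 };

    for (let i = state.processedCount; i < items.length; i++) {
        await handleItem(items[i]);
        state.processedCount = i + 1;
        // Persist after every item so a restart skips finished work.
        await store.setValue('STATE', state);
    }
    return state;
}

// In-memory stand-in for the key-value store, used only for this demo.
function createMemoryStore() {
    const data = new Map();
    return {
        getValue: async (key) => data.get(key) ?? null,
        // Copy the value so later mutations do not change the stored state.
        setValue: async (key, value) => { data.set(key, JSON.parse(JSON.stringify(value))); },
    };
}
```

Saving after every item is the simplest policy. For large workloads, saving every N items or on a timer would reduce writes at the cost of repeating a little work after a restart.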