Files
antigravity-skills-reference/skills/apify-actor-development/references/dataset-schema.md
Ahmed Rehan 2f55f046b9 feat: add 12 official Apify agent-skills for web scraping & data extraction (#165)
* feat: add 12 official Apify skills for web scraping and data extraction

Add the complete Apify agent-skills collection as official vendor skills,
bringing the total skill count from 954 to 966.

New skills:
- apify-actor-development: Develop, debug, and deploy Apify Actors
- apify-actorization: Convert existing projects into Apify Actors
- apify-audience-analysis: Audience demographics across social platforms
- apify-brand-reputation-monitoring: Track reviews, ratings, and sentiment
- apify-competitor-intelligence: Analyze competitor strategies and pricing
- apify-content-analytics: Track engagement metrics and campaign ROI
- apify-ecommerce: E-commerce data scraping for pricing intelligence
- apify-influencer-discovery: Find and evaluate influencers
- apify-lead-generation: B2B/B2C lead generation from multiple platforms
- apify-market-research: Market conditions and geographic opportunities
- apify-trend-analysis: Discover emerging trends across platforms
- apify-ultimate-scraper: Universal AI-powered web scraper

Existing skill fixes:
- design-orchestration: Add missing description, fix markdown list spacing
- multi-agent-brainstorming: Add missing description, fix markdown list spacing

Registry and documentation updates:
- Update skill count to 966+ across README.md, README.vi.md
- Add Apify to official sources in SOURCES.md and all README variants
- Register new skills in catalog.json, skills_index.json, bundles.json, aliases.json
- Update CATALOG.md category counts (data-ai: 152, infrastructure: 95)

Validation script improvements:
- Raise description length limit from 200 to 1024 characters
- Add empty description validation check
- Apply PEP 8 formatting (line length, spacing, trailing whitespace)

* refactor: truncate skill descriptions in SKILL.md files and revert  description length validation to 200 characters.

* feat: Add `apify-ultimate-scraper` to data-ai and move `apify-lead-generation` from business to general categories.
2026-03-01 10:02:50 +01:00

6.1 KiB

Dataset Schema Reference

The dataset schema defines how your Actor's output data is structured, transformed, and displayed in the Output tab in the Apify Console.

Examples

JavaScript and TypeScript

Consider an example Actor that calls Actor.pushData() to store data into dataset:

import { Actor } from 'apify';
// Initialize the JavaScript SDK
await Actor.init();

/**
 * Actor code
 */
await Actor.pushData({
    numericField: 10,
    pictureUrl: 'https://www.google.com/images/branding/googlelogo/2x/googlelogo_color_92x30dp.png',
    linkUrl: 'https://google.com',
    textField: 'Google',
    booleanField: true,
    dateField: new Date(),
    arrayField: ['#hello', '#world'],
    objectField: {},
});

// Exit successfully
await Actor.exit();

Python

Consider an example Actor that calls Actor.push_data() to store data into dataset:

# Dataset push example (Python)
import asyncio
from datetime import datetime
from apify import Actor

async def main():
    await Actor.init()

    # Actor code
    await Actor.push_data({
        'numericField': 10,
        'pictureUrl': 'https://www.google.com/images/branding/googlelogo/2x/googlelogo_color_92x30dp.png',
        'linkUrl': 'https://google.com',
        'textField': 'Google',
        'booleanField': True,
        'dateField': datetime.now().isoformat(),
        'arrayField': ['#hello', '#world'],
        'objectField': {},
    })

    # Exit successfully
    await Actor.exit()

if __name__ == '__main__':
    asyncio.run(main())

Configuration

To set up the Actor's output tab UI, reference a dataset schema file in .actor/actor.json:

{
    "actorSpecification": 1,
    "name": "book-library-scraper",
    "title": "Book Library Scraper",
    "version": "1.0.0",
    "storages": {
        "dataset": "./dataset_schema.json"
    }
}

Then create the dataset schema in .actor/dataset_schema.json:

{
    "actorSpecification": 1,
    "fields": {},
    "views": {
        "overview": {
            "title": "Overview",
            "transformation": {
                "fields": [
                    "pictureUrl",
                    "linkUrl",
                    "textField",
                    "booleanField",
                    "arrayField",
                    "objectField",
                    "dateField",
                    "numericField"
                ]
            },
            "display": {
                "component": "table",
                "properties": {
                    "pictureUrl": {
                        "label": "Image",
                        "format": "image"
                    },
                    "linkUrl": {
                        "label": "Link",
                        "format": "link"
                    },
                    "textField": {
                        "label": "Text",
                        "format": "text"
                    },
                    "booleanField": {
                        "label": "Boolean",
                        "format": "boolean"
                    },
                    "arrayField": {
                        "label": "Array",
                        "format": "array"
                    },
                    "objectField": {
                        "label": "Object",
                        "format": "object"
                    },
                    "dateField": {
                        "label": "Date",
                        "format": "date"
                    },
                    "numericField": {
                        "label": "Number",
                        "format": "number"
                    }
                }
            }
        }
    }
}

Structure

{
    "actorSpecification": 1,
    "fields": {},
    "views": {
        "<VIEW_NAME>": {
            "title": "string (required)",
            "description": "string (optional)",
            "transformation": {
                "fields": ["string (required)"],
                "unwind": ["string (optional)"],
                "flatten": ["string (optional)"],
                "omit": ["string (optional)"],
                "limit": "integer (optional)",
                "desc": "boolean (optional)"
            },
            "display": {
                "component": "table (required)",
                "properties": {
                    "<FIELD_NAME>": {
                        "label": "string (optional)",
                        "format": "text|number|date|link|boolean|image|array|object (optional)"
                    }
                }
            }
        }
    }
}

Properties

Dataset Schema Properties

  • actorSpecification (integer, required) - Specifies the version of dataset schema structure document (currently only version 1)
  • fields (JSONSchema object, required) - Schema of one dataset object (use JsonSchema Draft 2020-12 or compatible)
  • views (DatasetView object, required) - Object with API and UI views description

DatasetView Properties

  • title (string, required) - Visible in UI Output tab and API
  • description (string, optional) - Only available in API response
  • transformation (ViewTransformation object, required) - Data transformation applied when loading from Dataset API
  • display (ViewDisplay object, required) - Output tab UI visualization definition

ViewTransformation Properties

  • fields (string[], required) - Fields to present in output (order matches column order)
  • unwind (string[], optional) - Deconstructs nested children into parent object
  • flatten (string[], optional) - Transforms nested object into flat structure
  • omit (string[], optional) - Removes specified fields from output
  • limit (integer, optional) - Maximum number of results (default: all)
  • desc (boolean, optional) - Sort order (true = newest first)

ViewDisplay Properties

  • component (string, required) - Only table is available
  • properties (Object, optional) - Keys matching transformation.fields with ViewDisplayProperty values

ViewDisplayProperty Properties

  • label (string, optional) - Table column header
  • format (string, optional) - One of: text, number, date, link, boolean, image, array, object