Dataset Schema Reference
The dataset schema defines how your Actor's output data is structured, transformed, and displayed in the Output tab in the Apify Console.
Examples
JavaScript and TypeScript
Consider an example Actor that calls Actor.pushData() to store data in the dataset:
import { Actor } from 'apify';
// Initialize the JavaScript SDK
await Actor.init();
/**
 * Actor code
 */
await Actor.pushData({
    numericField: 10,
    pictureUrl: 'https://www.google.com/images/branding/googlelogo/2x/googlelogo_color_92x30dp.png',
    linkUrl: 'https://google.com',
    textField: 'Google',
    booleanField: true,
    dateField: new Date(),
    arrayField: ['#hello', '#world'],
    objectField: {},
});
// Exit successfully
await Actor.exit();
Python
Consider an example Actor that calls Actor.push_data() to store data in the dataset:
# Dataset push example (Python)
import asyncio
from datetime import datetime
from apify import Actor

async def main():
    await Actor.init()
    # Actor code
    await Actor.push_data({
        'numericField': 10,
        'pictureUrl': 'https://www.google.com/images/branding/googlelogo/2x/googlelogo_color_92x30dp.png',
        'linkUrl': 'https://google.com',
        'textField': 'Google',
        'booleanField': True,
        'dateField': datetime.now().isoformat(),
        'arrayField': ['#hello', '#world'],
        'objectField': {},
    })
    # Exit successfully
    await Actor.exit()

if __name__ == '__main__':
    asyncio.run(main())
Configuration
To set up the Actor's Output tab UI, reference a dataset schema file in .actor/actor.json:
{
    "actorSpecification": 1,
    "name": "book-library-scraper",
    "title": "Book Library Scraper",
    "version": "1.0.0",
    "storages": {
        "dataset": "./dataset_schema.json"
    }
}
Then create the dataset schema in .actor/dataset_schema.json:
{
    "actorSpecification": 1,
    "fields": {},
    "views": {
        "overview": {
            "title": "Overview",
            "transformation": {
                "fields": [
                    "pictureUrl",
                    "linkUrl",
                    "textField",
                    "booleanField",
                    "arrayField",
                    "objectField",
                    "dateField",
                    "numericField"
                ]
            },
            "display": {
                "component": "table",
                "properties": {
                    "pictureUrl": {
                        "label": "Image",
                        "format": "image"
                    },
                    "linkUrl": {
                        "label": "Link",
                        "format": "link"
                    },
                    "textField": {
                        "label": "Text",
                        "format": "text"
                    },
                    "booleanField": {
                        "label": "Boolean",
                        "format": "boolean"
                    },
                    "arrayField": {
                        "label": "Array",
                        "format": "array"
                    },
                    "objectField": {
                        "label": "Object",
                        "format": "object"
                    },
                    "dateField": {
                        "label": "Date",
                        "format": "date"
                    },
                    "numericField": {
                        "label": "Number",
                        "format": "number"
                    }
                }
            }
        }
    }
}
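Because the keys under display.properties are meant to match the entries in transformation.fields, a quick local check can catch a column that is styled but never selected. The following is a minimal sketch of such a check; find_unlisted_display_keys is a hypothetical helper written for this illustration and is not part of the Apify SDK or CLI, and the inline schema is a shortened variant of the one above with a deliberate mismatch.

```python
import json


# Hypothetical helper for illustration only; not an Apify SDK function.
def find_unlisted_display_keys(schema: dict) -> dict:
    """Return, per view, the display property keys missing from transformation.fields."""
    problems = {}
    for view_name, view in schema.get("views", {}).items():
        fields = set(view.get("transformation", {}).get("fields", []))
        properties = view.get("display", {}).get("properties", {})
        missing = [key for key in properties if key not in fields]
        if missing:
            problems[view_name] = missing
    return problems


# Shortened schema: "textField" has display settings but is not listed in "fields".
SCHEMA_TEXT = """
{
  "actorSpecification": 1,
  "fields": {},
  "views": {
    "overview": {
      "title": "Overview",
      "transformation": {"fields": ["pictureUrl", "linkUrl"]},
      "display": {
        "component": "table",
        "properties": {
          "pictureUrl": {"label": "Image", "format": "image"},
          "linkUrl": {"label": "Link", "format": "link"},
          "textField": {"label": "Text", "format": "text"}
        }
      }
    }
  }
}
"""

schema = json.loads(SCHEMA_TEXT)
print(find_unlisted_display_keys(schema))  # {'overview': ['textField']}
```

An empty result means every styled column is also selected by the view's transformation.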
Structure
{
    "actorSpecification": 1,
    "fields": {},
    "views": {
        "<VIEW_NAME>": {
            "title": "string (required)",
            "description": "string (optional)",
            "transformation": {
                "fields": ["string (required)"],
                "unwind": ["string (optional)"],
                "flatten": ["string (optional)"],
                "omit": ["string (optional)"],
                "limit": "integer (optional)",
                "desc": "boolean (optional)"
            },
            "display": {
                "component": "table (required)",
                "properties": {
                    "<FIELD_NAME>": {
                        "label": "string (optional)",
                        "format": "text|number|date|link|boolean|image|array|object (optional)"
                    }
                }
            }
        }
    }
}
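The structure above is regular enough to generate from a small column description. The sketch below builds a single-view schema from a mapping of field name to (label, format) and rejects formats outside the listed set. The function name build_dataset_schema and its parameters are assumptions made for this example, not an Apify API.

```python
import json

# Formats listed in the structure above.
ALLOWED_FORMATS = {"text", "number", "date", "link", "boolean", "image", "array", "object"}


# Hypothetical helper for illustration only; not an Apify SDK function.
def build_dataset_schema(columns: dict, view_name: str = "overview", title: str = "Overview") -> dict:
    """Build a one-view dataset schema from {field: (label, format)}.

    Column order follows the insertion order of `columns`.
    """
    properties = {}
    for field, (label, fmt) in columns.items():
        if fmt not in ALLOWED_FORMATS:
            raise ValueError(f"Unsupported format {fmt!r} for field {field!r}")
        properties[field] = {"label": label, "format": fmt}
    return {
        "actorSpecification": 1,
        "fields": {},
        "views": {
            view_name: {
                "title": title,
                "transformation": {"fields": list(columns)},
                "display": {"component": "table", "properties": properties},
            }
        },
    }


generated = build_dataset_schema({
    "pictureUrl": ("Image", "image"),
    "linkUrl": ("Link", "link"),
    "numericField": ("Number", "number"),
})
print(json.dumps(generated, indent=4))
```

Deriving transformation.fields and display.properties from one mapping keeps the two in sync by construction.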
Properties
Dataset Schema Properties
actorSpecification (integer, required) - Specifies the version of the dataset schema structure document (currently only version 1)
fields (JSONSchema object, required) - Schema of one dataset object (use JSON Schema Draft 2020-12 or compatible)
views (DatasetView object, required) - Object with API and UI views description
DatasetView Properties
title (string, required) - Visible in UI Output tab and API
description (string, optional) - Only available in API response
transformation (ViewTransformation object, required) - Data transformation applied when loading from Dataset API
display (ViewDisplay object, required) - Output tab UI visualization definition
ViewTransformation Properties
fields (string[], required) - Fields to present in output (order matches column order)
unwind (string[], optional) - Deconstructs nested children into parent object
flatten (string[], optional) - Transforms nested object into flat structure
omit (string[], optional) - Removes specified fields from output
limit (integer, optional) - Maximum number of results (default: all)
desc (boolean, optional) - Sort order (true = newest first)
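To build intuition for how these options combine, the sketch below approximates fields, flatten, omit, limit, and desc on a plain list of dictionaries. It is a local illustration, not the platform's implementation. The order in which options are applied, the dot-separated names for flattened keys, and the assumption that items are stored oldest first are all assumptions of this example. unwind is left out. apply_view and flatten_item are hypothetical names.

```python
# Local approximation for illustration only; the real transformation runs in the Dataset API.
def flatten_item(item: dict, flatten) -> dict:
    """Lift one level of nested keys into 'parent.child' keys (assumed naming)."""
    flat = {}
    for key, value in item.items():
        if key in flatten and isinstance(value, dict):
            for sub_key, sub_value in value.items():
                flat[f"{key}.{sub_key}"] = sub_value
        else:
            flat[key] = value
    return flat


def apply_view(items, fields, flatten=(), omit=(), limit=None, desc=False):
    """Apply a simplified view transformation to a list of items."""
    # Assumes `items` is ordered oldest first, so desc=True yields newest first.
    rows = list(reversed(items)) if desc else list(items)
    if limit is not None:
        rows = rows[:limit]
    result = []
    for item in rows:
        flat = flatten_item(item, flatten)
        # Keep only requested fields, in the requested order, minus omitted ones.
        result.append({key: flat[key] for key in fields if key in flat and key not in omit})
    return result


items = [
    {"title": "A", "price": {"amount": 10, "currency": "USD"}, "debug": 1},
    {"title": "B", "price": {"amount": 20, "currency": "EUR"}, "debug": 2},
    {"title": "C", "price": {"amount": 30, "currency": "USD"}, "debug": 3},
]

rows = apply_view(
    items,
    fields=["title", "price.amount", "debug"],
    flatten=["price"],
    omit=["debug"],
    limit=2,
    desc=True,
)
print(rows)  # [{'title': 'C', 'price.amount': 30}, {'title': 'B', 'price.amount': 20}]
```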
ViewDisplay Properties
component (string, required) - Only table is available
properties (Object, optional) - Keys matching transformation.fields with ViewDisplayProperty values
ViewDisplayProperty Properties
label (string, optional) - Table column header
format (string, optional) - One of: text, number, date, link, boolean, image, array, object