Dataset Schema Reference
The dataset schema defines how your Actor's output data is structured, transformed, and displayed in the Output tab in the Apify Console.
Examples
JavaScript and TypeScript
Consider an example Actor that calls Actor.pushData() to store data in a dataset:
import { Actor } from 'apify';

// Initialize the JavaScript SDK
await Actor.init();

/**
 * Actor code
 */
await Actor.pushData({
    numericField: 10,
    pictureUrl: 'https://www.google.com/images/branding/googlelogo/2x/googlelogo_color_92x30dp.png',
    linkUrl: 'https://google.com',
    textField: 'Google',
    booleanField: true,
    dateField: new Date(),
    arrayField: ['#hello', '#world'],
    objectField: {},
});

// Exit successfully
await Actor.exit();
Python
Consider an example Actor that calls Actor.push_data() to store data in a dataset:
# Dataset push example (Python)
import asyncio
from datetime import datetime

from apify import Actor


async def main():
    await Actor.init()

    # Actor code
    await Actor.push_data({
        'numericField': 10,
        'pictureUrl': 'https://www.google.com/images/branding/googlelogo/2x/googlelogo_color_92x30dp.png',
        'linkUrl': 'https://google.com',
        'textField': 'Google',
        'booleanField': True,
        'dateField': datetime.now().isoformat(),
        'arrayField': ['#hello', '#world'],
        'objectField': {},
    })

    # Exit successfully
    await Actor.exit()


if __name__ == '__main__':
    asyncio.run(main())
Configuration
To set up the Actor's output tab UI, reference a dataset schema file in .actor/actor.json:
{
    "actorSpecification": 1,
    "name": "book-library-scraper",
    "title": "Book Library Scraper",
    "version": "1.0.0",
    "storages": {
        "dataset": "./dataset_schema.json"
    }
}
Then create the dataset schema in .actor/dataset_schema.json:
{
    "actorSpecification": 1,
    "fields": {},
    "views": {
        "overview": {
            "title": "Overview",
            "transformation": {
                "fields": [
                    "pictureUrl",
                    "linkUrl",
                    "textField",
                    "booleanField",
                    "arrayField",
                    "objectField",
                    "dateField",
                    "numericField"
                ]
            },
            "display": {
                "component": "table",
                "properties": {
                    "pictureUrl": {
                        "label": "Image",
                        "format": "image"
                    },
                    "linkUrl": {
                        "label": "Link",
                        "format": "link"
                    },
                    "textField": {
                        "label": "Text",
                        "format": "text"
                    },
                    "booleanField": {
                        "label": "Boolean",
                        "format": "boolean"
                    },
                    "arrayField": {
                        "label": "Array",
                        "format": "array"
                    },
                    "objectField": {
                        "label": "Object",
                        "format": "object"
                    },
                    "dateField": {
                        "label": "Date",
                        "format": "date"
                    },
                    "numericField": {
                        "label": "Number",
                        "format": "number"
                    }
                }
            }
        }
    }
}
Structure
{
    "actorSpecification": 1,
    "fields": {},
    "views": {
        "<VIEW_NAME>": {
            "title": "string (required)",
            "description": "string (optional)",
            "transformation": {
                "fields": ["string (required)"],
                "unwind": ["string (optional)"],
                "flatten": ["string (optional)"],
                "omit": ["string (optional)"],
                "limit": "integer (optional)",
                "desc": "boolean (optional)"
            },
            "display": {
                "component": "table (required)",
                "properties": {
                    "<FIELD_NAME>": {
                        "label": "string (optional)",
                        "format": "text|number|date|link|boolean|image|array|object (optional)"
                    }
                }
            }
        }
    }
}
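The required keys in the structure above can be checked before deploying an Actor. The following is a minimal sketch, not an official validator: the function name check_dataset_schema and its messages are hypothetical, and it only checks the presence of the keys marked as required, not their types.

```python
from typing import Any


def check_dataset_schema(schema: dict[str, Any]) -> list[str]:
    """Return a list of problems found; an empty list means none were found.

    Hypothetical helper: it only mirrors the "required" markers from the
    structure above and is not part of any Apify SDK.
    """
    problems: list[str] = []

    # Top-level keys marked as required.
    for key in ("actorSpecification", "fields", "views"):
        if key not in schema:
            problems.append(f"missing top-level key: {key}")

    # Each view needs a title, transformation.fields, and a table display.
    for view_name, view in schema.get("views", {}).items():
        if "title" not in view:
            problems.append(f"{view_name}: missing title")
        transformation = view.get("transformation")
        if not isinstance(transformation, dict) or "fields" not in transformation:
            problems.append(f"{view_name}: missing transformation.fields")
        display = view.get("display")
        if not isinstance(display, dict):
            problems.append(f"{view_name}: missing display")
        elif display.get("component") != "table":
            problems.append(f"{view_name}: display.component must be 'table'")
    return problems


minimal_schema = {
    "actorSpecification": 1,
    "fields": {},
    "views": {
        "overview": {
            "title": "Overview",
            "transformation": {"fields": ["textField"]},
            "display": {"component": "table"},
        }
    },
}
broken_schema = {"actorSpecification": 1, "views": {"broken": {}}}

print(check_dataset_schema(minimal_schema))
print(check_dataset_schema(broken_schema))
```

Returning a list of messages instead of raising on the first problem makes it easier to report every issue in one pass.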
Properties
Dataset Schema Properties
actorSpecification (integer, required) - Specifies the version of the dataset schema structure document (currently only version 1 is available)
fields (JSONSchema object, required) - Schema of one dataset object (use JSON Schema Draft 2020-12 or a compatible version)
views (DatasetView object, required) - Object describing the API and UI views
DatasetView Properties
title (string, required) - Visible in the UI Output tab and in the API
description (string, optional) - Only available in the API response
transformation (ViewTransformation object, required) - Data transformation applied when loading data from the Dataset API
display (ViewDisplay object, required) - Output tab UI visualization definition
ViewTransformation Properties
fields (string[], required) - Fields to present in the output (their order matches the column order)
unwind (string[], optional) - Deconstructs nested children into the parent object
flatten (string[], optional) - Transforms a nested object into a flat structure
omit (string[], optional) - Removes the specified fields from the output
limit (integer, optional) - Maximum number of results (default: all)
desc (boolean, optional) - Sort order (true = newest first)
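To build intuition for these options, the sketch below approximates some of them locally on a list of items. It is an illustration under stated assumptions, not the Dataset API's implementation: apply_view and flatten_field are hypothetical names, the order in which the steps run is assumed, flattened keys are assumed to use dot notation (objectField.a), and unwind is not covered.

```python
from typing import Any

Item = dict[str, Any]


def flatten_field(item: Item, field: str) -> Item:
    """Replace a nested dict under `field` with dot-notation keys (assumed naming)."""
    value = item.get(field)
    if not isinstance(value, dict):
        return dict(item)
    flat = {key: val for key, val in item.items() if key != field}
    for child_key, child_value in value.items():
        flat[f"{field}.{child_key}"] = child_value
    return flat


def apply_view(items: list[Item], transformation: dict[str, Any]) -> list[Item]:
    """Approximate desc, limit, flatten, fields, and omit on in-memory items."""
    # desc: items are assumed to be stored oldest first, so reverse them.
    ordered = list(reversed(items)) if transformation.get("desc") else list(items)

    # limit: keep at most `limit` items after ordering.
    limit = transformation.get("limit")
    if limit is not None:
        ordered = ordered[:limit]

    omitted = set(transformation.get("omit", []))
    rows: list[Item] = []
    for item in ordered:
        for field in transformation.get("flatten", []):
            item = flatten_field(item, field)
        # fields: keep listed fields in the listed order, minus omitted ones.
        rows.append({
            name: item[name]
            for name in transformation["fields"]
            if name in item and name not in omitted
        })
    return rows


items = [
    {"textField": "first", "numericField": 1, "objectField": {"a": 1}},
    {"textField": "second", "numericField": 2, "objectField": {"a": 2}},
    {"textField": "third", "numericField": 3, "objectField": {"a": 3}},
]
transformation = {
    "fields": ["textField", "objectField.a", "numericField"],
    "flatten": ["objectField"],
    "omit": ["numericField"],
    "limit": 2,
    "desc": True,
}

rows = apply_view(items, transformation)
print(rows)
```

With these inputs, the two newest items come back first, objectField is replaced by objectField.a, and numericField is dropped.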
ViewDisplay Properties
component (string, required) - Only table is available
properties (Object, optional) - Keys matching transformation.fields with ViewDisplayProperty values
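Because the keys of properties are expected to match transformation.fields, a typo in either place is easy to miss. The following sketch lists display keys that have no matching field. The helper name unlisted_display_keys is hypothetical, and what the Console does with unmatched keys is not stated here, so treat the result as a lint warning only.

```python
from typing import Any


def unlisted_display_keys(view: dict[str, Any]) -> list[str]:
    """Return display.properties keys that are absent from transformation.fields."""
    listed = set(view["transformation"]["fields"])
    properties = view.get("display", {}).get("properties", {})
    return [key for key in properties if key not in listed]


view = {
    "title": "Overview",
    "transformation": {"fields": ["pictureUrl", "linkUrl"]},
    "display": {
        "component": "table",
        "properties": {
            "pictureUrl": {"label": "Image", "format": "image"},
            "linkUrl": {"label": "Link", "format": "link"},
            # Not listed in transformation.fields, so it is reported.
            "textField": {"label": "Text", "format": "text"},
        },
    },
}

print(unlisted_display_keys(view))
```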
ViewDisplayProperty Properties
label (string, optional) - Table column header
format (string, optional) - One of: text, number, date, link, boolean, image, array, object
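When a dataset has many fields, a first draft of the properties block can be generated from a sample item. The sketch below is a heuristic: guess_format and build_display_properties are hypothetical helpers, the URL and image-extension checks are assumptions, and date strings are not detected, so the result should be reviewed by hand.

```python
from typing import Any

IMAGE_SUFFIXES = (".png", ".jpg", ".jpeg", ".gif", ".webp")


def guess_format(value: Any) -> str:
    """Guess one of the documented format values from a Python value."""
    # bool must be checked before int because bool is a subclass of int.
    if isinstance(value, bool):
        return "boolean"
    if isinstance(value, (int, float)):
        return "number"
    if isinstance(value, list):
        return "array"
    if isinstance(value, dict):
        return "object"
    if isinstance(value, str) and value.startswith(("http://", "https://")):
        # Assumption: a URL ending in a common image extension is an image.
        if value.lower().endswith(IMAGE_SUFFIXES):
            return "image"
        return "link"
    return "text"


def build_display_properties(item: dict[str, Any]) -> dict[str, dict[str, str]]:
    """Build a draft display.properties mapping, using field names as labels."""
    return {
        name: {"label": name, "format": guess_format(value)}
        for name, value in item.items()
    }


sample_item = {
    "numericField": 10,
    "pictureUrl": "https://www.google.com/images/branding/googlelogo/2x/googlelogo_color_92x30dp.png",
    "linkUrl": "https://google.com",
    "textField": "Google",
    "booleanField": True,
    "arrayField": ["#hello", "#world"],
    "objectField": {},
}

draft = build_display_properties(sample_item)
for name, prop in draft.items():
    print(name, prop["format"])
```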