Files
antigravity-skills-reference/skills/apify-actor-development/references/dataset-schema.md
Ahmed Rehan 2f55f046b9 feat: add 12 official Apify agent-skills for web scraping & data extraction (#165)
* feat: add 12 official Apify skills for web scraping and data extraction

Add the complete Apify agent-skills collection as official vendor skills,
bringing the total skill count from 954 to 966.

New skills:
- apify-actor-development: Develop, debug, and deploy Apify Actors
- apify-actorization: Convert existing projects into Apify Actors
- apify-audience-analysis: Audience demographics across social platforms
- apify-brand-reputation-monitoring: Track reviews, ratings, and sentiment
- apify-competitor-intelligence: Analyze competitor strategies and pricing
- apify-content-analytics: Track engagement metrics and campaign ROI
- apify-ecommerce: E-commerce data scraping for pricing intelligence
- apify-influencer-discovery: Find and evaluate influencers
- apify-lead-generation: B2B/B2C lead generation from multiple platforms
- apify-market-research: Market conditions and geographic opportunities
- apify-trend-analysis: Discover emerging trends across platforms
- apify-ultimate-scraper: Universal AI-powered web scraper

Existing skill fixes:
- design-orchestration: Add missing description, fix markdown list spacing
- multi-agent-brainstorming: Add missing description, fix markdown list spacing

Registry and documentation updates:
- Update skill count to 966+ across README.md, README.vi.md
- Add Apify to official sources in SOURCES.md and all README variants
- Register new skills in catalog.json, skills_index.json, bundles.json, aliases.json
- Update CATALOG.md category counts (data-ai: 152, infrastructure: 95)

Validation script improvements:
- Raise description length limit from 200 to 1024 characters
- Add empty description validation check
- Apply PEP 8 formatting (line length, spacing, trailing whitespace)

* refactor: truncate skill descriptions in SKILL.md files and revert  description length validation to 200 characters.

* feat: Add `apify-ultimate-scraper` to data-ai and move `apify-lead-generation` from business to general categories.
2026-03-01 10:02:50 +01:00

210 lines
6.1 KiB
Markdown

# Dataset Schema Reference
The dataset schema defines how your Actor's output data is structured, transformed, and displayed in the Output tab in the Apify Console.
## Examples
### JavaScript and TypeScript
Consider an example Actor that calls `Actor.pushData()` to store data into dataset:
```javascript
import { Actor } from 'apify';
// Initialize the JavaScript SDK
await Actor.init();
/**
* Actor code
*/
await Actor.pushData({
numericField: 10,
pictureUrl: 'https://www.google.com/images/branding/googlelogo/2x/googlelogo_color_92x30dp.png',
linkUrl: 'https://google.com',
textField: 'Google',
booleanField: true,
dateField: new Date(),
arrayField: ['#hello', '#world'],
objectField: {},
});
// Exit successfully
await Actor.exit();
```
### Python
Consider an example Actor that calls `Actor.push_data()` to store data into dataset:
```python
# Dataset push example (Python)
import asyncio
from datetime import datetime
from apify import Actor
async def main():
await Actor.init()
# Actor code
await Actor.push_data({
'numericField': 10,
'pictureUrl': 'https://www.google.com/images/branding/googlelogo/2x/googlelogo_color_92x30dp.png',
'linkUrl': 'https://google.com',
'textField': 'Google',
'booleanField': True,
'dateField': datetime.now().isoformat(),
'arrayField': ['#hello', '#world'],
'objectField': {},
})
# Exit successfully
await Actor.exit()
if __name__ == '__main__':
asyncio.run(main())
```
## Configuration
To set up the Actor's output tab UI, reference a dataset schema file in `.actor/actor.json`:
```json
{
"actorSpecification": 1,
"name": "book-library-scraper",
"title": "Book Library Scraper",
"version": "1.0.0",
"storages": {
"dataset": "./dataset_schema.json"
}
}
```
Then create the dataset schema in `.actor/dataset_schema.json`:
```json
{
"actorSpecification": 1,
"fields": {},
"views": {
"overview": {
"title": "Overview",
"transformation": {
"fields": [
"pictureUrl",
"linkUrl",
"textField",
"booleanField",
"arrayField",
"objectField",
"dateField",
"numericField"
]
},
"display": {
"component": "table",
"properties": {
"pictureUrl": {
"label": "Image",
"format": "image"
},
"linkUrl": {
"label": "Link",
"format": "link"
},
"textField": {
"label": "Text",
"format": "text"
},
"booleanField": {
"label": "Boolean",
"format": "boolean"
},
"arrayField": {
"label": "Array",
"format": "array"
},
"objectField": {
"label": "Object",
"format": "object"
},
"dateField": {
"label": "Date",
"format": "date"
},
"numericField": {
"label": "Number",
"format": "number"
}
}
}
}
}
}
```
## Structure
```json
{
"actorSpecification": 1,
"fields": {},
"views": {
"<VIEW_NAME>": {
"title": "string (required)",
"description": "string (optional)",
"transformation": {
"fields": ["string (required)"],
"unwind": ["string (optional)"],
"flatten": ["string (optional)"],
"omit": ["string (optional)"],
"limit": "integer (optional)",
"desc": "boolean (optional)"
},
"display": {
"component": "table (required)",
"properties": {
"<FIELD_NAME>": {
"label": "string (optional)",
"format": "text|number|date|link|boolean|image|array|object (optional)"
}
}
}
}
}
}
```
## Properties
### Dataset Schema Properties
- `actorSpecification` (integer, required) - Specifies the version of dataset schema structure document (currently only version 1)
- `fields` (JSONSchema object, required) - Schema of one dataset object (use JsonSchema Draft 2020-12 or compatible)
- `views` (DatasetView object, required) - Object with API and UI views description
### DatasetView Properties
- `title` (string, required) - Visible in UI Output tab and API
- `description` (string, optional) - Only available in API response
- `transformation` (ViewTransformation object, required) - Data transformation applied when loading from Dataset API
- `display` (ViewDisplay object, required) - Output tab UI visualization definition
### ViewTransformation Properties
- `fields` (string[], required) - Fields to present in output (order matches column order)
- `unwind` (string[], optional) - Deconstructs nested children into parent object
- `flatten` (string[], optional) - Transforms nested object into flat structure
- `omit` (string[], optional) - Removes specified fields from output
- `limit` (integer, optional) - Maximum number of results (default: all)
- `desc` (boolean, optional) - Sort order (true = newest first)
### ViewDisplay Properties
- `component` (string, required) - Only `table` is available
- `properties` (Object, optional) - Keys matching `transformation.fields` with ViewDisplayProperty values
### ViewDisplayProperty Properties
- `label` (string, optional) - Table column header
- `format` (string, optional) - One of: `text`, `number`, `date`, `link`, `boolean`, `image`, `array`, `object`