Schemas and Output Configuration
Input Schema
Map your application's inputs to .actor/input_schema.json. Validate against the JSON Schema from the @apify/json_schemas npm package (input.schema.json).
{
"title": "My Actor Input",
"type": "object",
"schemaVersion": 1,
"properties": {
"startUrl": {
"title": "Start URL",
"type": "string",
"description": "The URL to start processing from",
"editor": "textfield",
"prefill": "https://example.com"
},
"maxItems": {
"title": "Max Items",
"type": "integer",
"description": "Maximum number of items to process",
"default": 100,
"minimum": 1
}
},
"required": ["startUrl"]
}
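When running the Actor logic locally or in unit tests, it can help to mimic how `default` and `required` behave for the schema above. The following is a minimal sketch, assuming a hypothetical helper named `resolveInput`. It is not part of the Apify SDK and does not perform full JSON Schema validation.

```javascript
// Hypothetical local helper (not an Apify SDK function): fills in schema
// defaults and checks that required properties are present.
function resolveInput(schema, input) {
    const resolved = { ...input };

    // Apply "default" for any property the caller did not provide.
    for (const [name, prop] of Object.entries(schema.properties ?? {})) {
        if (resolved[name] === undefined && prop.default !== undefined) {
            resolved[name] = prop.default;
        }
    }

    // Fail early when a required property is still missing.
    const missing = (schema.required ?? []).filter((name) => resolved[name] === undefined);
    if (missing.length > 0) {
        throw new Error(`Missing required input: ${missing.join(', ')}`);
    }
    return resolved;
}

// Trimmed-down copy of the schema shown above.
const schema = {
    properties: {
        startUrl: { type: 'string' },
        maxItems: { type: 'integer', default: 100, minimum: 1 },
    },
    required: ['startUrl'],
};

console.log(resolveInput(schema, { startUrl: 'https://example.com' }));
// { startUrl: 'https://example.com', maxItems: 100 }
```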
Mapping Guidelines
- Command-line arguments → input schema properties
- Environment variables → input schema or Actor env vars in actor.json
- Config files → input schema with object/array types
- Flatten deeply nested structures for better UX
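The flattening guideline can be illustrated with a short sketch. It assumes a hypothetical helper, `flattenConfig`, that joins nested keys in camelCase so each leaf value can become one input schema property. The naming convention is an illustrative choice, not an Apify requirement.

```javascript
// Hypothetical helper: flattens a nested config object into single-level keys.
// Arrays are kept as-is, since they map naturally to array-typed properties.
function flattenConfig(config, prefix = '') {
    const flat = {};
    for (const [key, value] of Object.entries(config)) {
        // Join nested keys in camelCase, e.g. proxy.url -> proxyUrl.
        const name = prefix ? prefix + key.charAt(0).toUpperCase() + key.slice(1) : key;
        const isPlainObject = value !== null && typeof value === 'object' && !Array.isArray(value);
        if (isPlainObject) {
            Object.assign(flat, flattenConfig(value, name));
        } else {
            flat[name] = value;
        }
    }
    return flat;
}

const flat = flattenConfig({
    proxy: { url: 'http://localhost:8000', retries: 3 },
    tags: ['a', 'b'],
});
console.log(Object.keys(flat));
// [ 'proxyUrl', 'proxyRetries', 'tags' ]
```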
Output Schema
Define output structure in .actor/output_schema.json. Validate against the JSON Schema from the @apify/json_schemas npm package (output.schema.json).
For Table-Like Data (Multiple Items)
- Use Actor.pushData() (JS) or Actor.push_data() (Python)
- Each item becomes a row in the dataset
For Single Files or Blobs
- Use key-value store: Actor.setValue() / Actor.set_value()
- Get the public URL and include it in the dataset:
// Store the file in the default key-value store
await Actor.setValue('report.pdf', pdfBuffer, { contentType: 'application/pdf' });
// Build the record URL from the store ID
const store = await Actor.openKeyValueStore();
const publicUrl = `https://api.apify.com/v2/key-value-stores/${store.id}/records/report.pdf`;
// Include URL in dataset output
await Actor.pushData({ reportUrl: publicUrl });
For Multiple Files with a Common Prefix (Collections)
// Store multiple files with a prefix
for (const [name, data] of files) {
await Actor.setValue(`screenshots/${name}`, data, { contentType: 'image/png' });
}
// Files are accessible at: .../records/screenshots%2F{name}
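Because a prefixed key contains a slash, it has to be URL-encoded when it appears in a record URL. A minimal sketch, assuming a hypothetical helper `buildRecordUrl` that follows the same URL pattern as the single-file example above; `STORE_ID` is a placeholder.

```javascript
// Hypothetical helper: builds a record URL for a key in a key-value store.
// encodeURIComponent turns "/" into "%2F", so the key stays a single path segment.
function buildRecordUrl(storeId, key) {
    return `https://api.apify.com/v2/key-value-stores/${storeId}/records/${encodeURIComponent(key)}`;
}

console.log(buildRecordUrl('STORE_ID', 'screenshots/home.png'));
// https://api.apify.com/v2/key-value-stores/STORE_ID/records/screenshots%2Fhome.png
```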
Actor Configuration (actor.json)
Configure .actor/actor.json. Validate against the JSON Schema from the @apify/json_schemas npm package (actor.schema.json).
{
"actorSpecification": 1,
"name": "my-actor",
"title": "My Actor",
"description": "Brief description of what the actor does",
"version": "1.0.0",
"meta": {
"templateId": "ts_empty",
"generatedBy": "Claude Code with Claude Opus 4.5"
},
"input": "./input_schema.json",
"dockerfile": "../Dockerfile"
}
Important: Fill in the generatedBy property with the tool/model used.
State Management
Request Queue - For Pausable Task Processing
The request queue works for any task processing, not just web scraping. Use a dummy URL with custom uniqueKey and userData for non-URL tasks:
import { Actor } from 'apify';
import { BasicCrawler } from 'crawlee';

const requestQueue = await Actor.openRequestQueue();
// Add tasks to the queue (works for any processing, not just URLs)
await requestQueue.addRequest({
url: 'https://placeholder.local', // Dummy URL for non-scraping tasks
uniqueKey: `task-${taskId}`, // Unique identifier for deduplication
userData: { itemId: 123, action: 'process' }, // Your custom task data
});
// Process tasks from the queue (with Crawlee)
const crawler = new BasicCrawler({
requestQueue,
requestHandler: async ({ request }) => {
const { itemId, action } = request.userData;
// Process your task using userData
await processTask(itemId, action);
},
});
await crawler.run();
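// Or manually consume without Crawlee:
let request;
while ((request = await requestQueue.fetchNextRequest())) {
// Same processTask(itemId, action) call as in the crawler handler
const { itemId, action } = request.userData;
await processTask(itemId, action);
// Mark the request as handled so it is not fetched again
await requestQueue.markRequestHandled(request);
}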
Key-Value Store - For Checkpoint State
// Save state
await Actor.setValue('STATE', { processedCount: 100 });
// Restore state on restart
const state = await Actor.getValue('STATE') || { processedCount: 0 };
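The save and restore calls above can be combined into a resumable loop. The sketch below is illustrative only: `createMemoryStore` is an in-memory stand-in so the example runs without the platform, and `processWithCheckpoint` is a hypothetical helper. In an Actor, `Actor.getValue` and `Actor.setValue` would take the place of the stand-in store.

```javascript
// In-memory stand-in for the key-value store (assumption for this sketch).
function createMemoryStore() {
    const data = new Map();
    return {
        getValue: async (key) => (data.has(key) ? data.get(key) : null),
        // Store a copy so later mutations of the caller's object are not shared.
        setValue: async (key, value) => { data.set(key, JSON.parse(JSON.stringify(value))); },
    };
}

// Hypothetical helper: processes items in order and saves progress after each
// one, so a restarted run continues from the first unprocessed item.
async function processWithCheckpoint(items, store, handleItem) {
    const state = (await store.getValue('STATE')) || { processedCount: 0 };
    for (let i = state.processedCount; i < items.length; i++) {
        await handleItem(items[i]);
        state.processedCount = i + 1;
        await store.setValue('STATE', state);
    }
    return state;
}
```

Saving after every item keeps the checkpoint exact at the cost of one write per item. For large workloads, saving every N items is a common trade-off, with the caveat that up to N items may be reprocessed after a restart.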