---
title: /ar:setup — Create New Experiment — Agent Skill for Codex & OpenClaw
description: Set up a new autoresearch experiment interactively. Collects domain, target file, eval command, metric, direction, and evaluator. Agent skill for Claude Code, Codex CLI, Gemini CLI, OpenClaw.
---
# /ar:setup — Create New Experiment
:material-rocket-launch: Engineering - POWERFUL
:material-identifier: `setup`
:material-github: Source
Install:

```bash
claude /plugin install engineering-advanced-skills
```
Set up a new autoresearch experiment with all required configuration.
## Usage

```bash
/ar:setup                    # Interactive mode
/ar:setup engineering api-speed src/api.py "pytest bench.py" p50_ms lower
/ar:setup --list             # Show existing experiments
/ar:setup --list-evaluators  # Show available evaluators
```
## What It Does

### If arguments provided

Pass them directly to the setup script:

```bash
python {skill_path}/scripts/setup_experiment.py \
  --domain {domain} --name {name} \
  --target {target} --eval "{eval_cmd}" \
  --metric {metric} --direction {direction} \
  [--evaluator {evaluator}] [--scope {scope}]
```
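
For orientation, here is a minimal sketch of the flag surface `setup_experiment.py` would expose to accept that call. The flag names come from the command above; the `choices`, defaults, and help text are assumptions, not the script's documented interface.

```python
# Hedged sketch of the CLI surface implied by the call above.
# Flag names mirror the command; choices and defaults are assumptions.
import argparse

def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="Set up an autoresearch experiment")
    parser.add_argument("--domain", required=True)    # engineering, marketing, content, ...
    parser.add_argument("--name", required=True)      # e.g. api-speed
    parser.add_argument("--target", required=True)    # file to optimize
    parser.add_argument("--eval", dest="eval_cmd", required=True)  # measurement command
    parser.add_argument("--metric", required=True)    # e.g. p50_ms
    parser.add_argument("--direction", choices=("lower", "higher"), required=True)
    parser.add_argument("--evaluator")                # optional built-in evaluator
    parser.add_argument("--scope", choices=("project", "user"), default="project")
    return parser

if __name__ == "__main__":
    print(build_parser().parse_args())
```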
### If no arguments (interactive mode)
Collect each parameter one at a time:
- Domain — Ask: "What domain? (engineering, marketing, content, prompts, custom)"
- Name — Ask: "Experiment name? (e.g., api-speed, blog-titles)"
- Target file — Ask: "Which file to optimize?" Verify it exists.
- Eval command — Ask: "How to measure it? (e.g., pytest bench.py, python evaluate.py)"
- Metric — Ask: "What metric does the eval output? (e.g., p50_ms, ctr_score)"
- Direction — Ask: "Is lower or higher better?"
- Evaluator (optional) — Show built-in evaluators. Ask: "Use a built-in evaluator, or your own?"
- Scope — Ask: "Store in project (.autoresearch/) or user (~/.autoresearch/)?"
Then run `setup_experiment.py` with the collected parameters, as sketched below.
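
A minimal sketch of that interactive flow, assuming plain `input()` prompts and a subprocess hand-off to `setup_experiment.py`. The questions are taken from the list above; the script path and everything else is illustrative.

```python
# Illustrative interactive collection followed by a hand-off to
# setup_experiment.py. Prompts come from the list above; the rest is assumed.
import pathlib
import subprocess
import sys

def ask(question: str) -> str:
    return input(f"{question} ").strip()

domain = ask("What domain? (engineering, marketing, content, prompts, custom)")
name = ask("Experiment name? (e.g., api-speed, blog-titles)")

target = ask("Which file to optimize?")
if not pathlib.Path(target).is_file():  # verify the target exists before continuing
    sys.exit(f"Target file not found: {target}")

eval_cmd = ask("How to measure it? (e.g., pytest bench.py, python evaluate.py)")
metric = ask("What metric does the eval output? (e.g., p50_ms, ctr_score)")
direction = ask("Is lower or higher better?")
evaluator = ask("Use a built-in evaluator, or your own? (leave blank to skip)")
scope = ask("Store in project (.autoresearch/) or user (~/.autoresearch/)?")

cmd = [sys.executable, "scripts/setup_experiment.py",  # assumed path inside the skill
       "--domain", domain, "--name", name,
       "--target", target, "--eval", eval_cmd,
       "--metric", metric, "--direction", direction,
       "--scope", scope]
if evaluator:
    cmd += ["--evaluator", evaluator]
subprocess.run(cmd, check=True)
```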
## Listing

```bash
# Show existing experiments
python {skill_path}/scripts/setup_experiment.py --list

# Show available evaluators
python {skill_path}/scripts/setup_experiment.py --list-evaluators
```
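
Given the two storage scopes offered during setup, `--list` plausibly just scans both directories. The sketch below assumes a `<scope>/<domain>/<name>/` layout on disk; the actual format of `.autoresearch/` is not documented here.

```python
# Hedged sketch of what --list might do: scan the project and user scopes.
# The <scope>/<domain>/<name>/ directory layout is an assumption.
import pathlib

def list_experiments() -> list[str]:
    found = []
    for scope in (pathlib.Path(".autoresearch"), pathlib.Path.home() / ".autoresearch"):
        if not scope.is_dir():
            continue
        for domain_dir in sorted(p for p in scope.iterdir() if p.is_dir()):
            for exp_dir in sorted(p for p in domain_dir.iterdir() if p.is_dir()):
                found.append(f"{domain_dir.name}/{exp_dir.name}  [{scope}]")
    return found

if __name__ == "__main__":
    print("\n".join(list_experiments()) or "No experiments found.")
```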
## Built-in Evaluators

| Name | Metric | Use Case |
|---|---|---|
| `benchmark_speed` | `p50_ms` (lower) | Function/API execution time |
| `benchmark_size` | `size_bytes` (lower) | File, bundle, Docker image size |
| `test_pass_rate` | `pass_rate` (higher) | Test suite pass percentage |
| `build_speed` | `build_seconds` (lower) | Build/compile/Docker build time |
| `memory_usage` | `peak_mb` (lower) | Peak memory during execution |
| `llm_judge_content` | `ctr_score` (higher) | Headlines, titles, descriptions |
| `llm_judge_prompt` | `quality_score` (higher) | System prompts, agent instructions |
| `llm_judge_copy` | `engagement_score` (higher) | Social posts, ad copy, emails |
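
The table maps directly onto a small registry. As a sketch, the built-ins could be declared like this; the names, metrics, and directions come from the table, while the dataclass shape is illustrative.

```python
# The table above expressed as data: evaluator name -> (metric, direction).
# Names, metrics, and directions are from the table; the structure is assumed.
from dataclasses import dataclass

@dataclass(frozen=True)
class Evaluator:
    metric: str
    direction: str  # "lower" or "higher"

BUILTIN_EVALUATORS = {
    "benchmark_speed":   Evaluator("p50_ms", "lower"),
    "benchmark_size":    Evaluator("size_bytes", "lower"),
    "test_pass_rate":    Evaluator("pass_rate", "higher"),
    "build_speed":       Evaluator("build_seconds", "lower"),
    "memory_usage":      Evaluator("peak_mb", "lower"),
    "llm_judge_content": Evaluator("ctr_score", "higher"),
    "llm_judge_prompt":  Evaluator("quality_score", "higher"),
    "llm_judge_copy":    Evaluator("engagement_score", "higher"),
}
```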
## After Setup

Report to the user:

- Experiment path and branch name
- Whether the eval command worked and the baseline metric
- Suggest: "Run `/ar:run {domain}/{name}` to start iterating, or `/ar:loop {domain}/{name}` for autonomous mode."
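
One way the baseline check could work, assuming the eval command prints the metric to stdout as `name: value` (the output format is an assumption; the real script may parse it differently):

```python
# Hedged sketch of a baseline check: run the eval command once and pull the
# metric from stdout. Assumes output lines like "p50_ms: 41.2".
import re
import shlex
import subprocess

def baseline(eval_cmd: str, metric: str) -> float | None:
    result = subprocess.run(shlex.split(eval_cmd), capture_output=True, text=True)
    if result.returncode != 0:
        return None  # eval command failed; report that to the user
    match = re.search(rf"{re.escape(metric)}\s*[:=]\s*([-\d.]+)", result.stdout)
    return float(match.group(1)) if match else None
```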