---
title: "/ar:setup — Create New Experiment"
description: "/ar:setup — Create New Experiment - Claude Code skill from the Engineering - POWERFUL domain."
---

# /ar:setup — Create New Experiment

<div class="page-meta" markdown>
<span class="meta-badge">:material-rocket-launch: Engineering - POWERFUL</span>
<span class="meta-badge">:material-identifier: `setup`</span>
<span class="meta-badge">:material-github: <a href="https://github.com/alirezarezvani/claude-skills/tree/main/engineering/autoresearch-agent/skills/setup/SKILL.md">Source</a></span>
</div>

<div class="install-banner" markdown>
<span class="install-label">Install:</span> <code>claude /plugin install engineering-advanced-skills</code>
</div>

Set up a new autoresearch experiment with all required configuration.

## Usage

```
/ar:setup                                    # Interactive mode
/ar:setup engineering api-speed src/api.py "pytest bench.py" p50_ms lower
/ar:setup --list                             # Show existing experiments
/ar:setup --list-evaluators                  # Show available evaluators
```

## What It Does

### If arguments provided

Pass them directly to the setup script:

```bash
python {skill_path}/scripts/setup_experiment.py \
  --domain {domain} --name {name} \
  --target {target} --eval "{eval_cmd}" \
  --metric {metric} --direction {direction} \
  [--evaluator {evaluator}] [--scope {scope}]
```
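
For example, the positional form from the Usage section maps onto these flags in order (same values as the Usage example above, shown only to illustrate the mapping):

```bash
# Flag form of: /ar:setup engineering api-speed src/api.py "pytest bench.py" p50_ms lower
python {skill_path}/scripts/setup_experiment.py \
  --domain engineering --name api-speed \
  --target src/api.py --eval "pytest bench.py" \
  --metric p50_ms --direction lower
```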

### If no arguments (interactive mode)

Collect each parameter one at a time:

1. **Domain** — Ask: "What domain? (engineering, marketing, content, prompts, custom)"
2. **Name** — Ask: "Experiment name? (e.g., api-speed, blog-titles)"
3. **Target file** — Ask: "Which file to optimize?" Verify it exists.
4. **Eval command** — Ask: "How to measure it? (e.g., pytest bench.py, python evaluate.py)"
5. **Metric** — Ask: "What metric does the eval output? (e.g., p50_ms, ctr_score)"
6. **Direction** — Ask: "Is lower or higher better?"
7. **Evaluator** (optional) — Show built-in evaluators. Ask: "Use a built-in evaluator, or your own?"
8. **Scope** — Ask: "Store in project (.autoresearch/) or user (~/.autoresearch/)?"

Then run `setup_experiment.py` with the collected parameters.
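
As a sketch, if the answers were the marketing examples above (domain marketing, name blog-titles, metric ctr_score, higher is better, built-in `llm_judge_content` evaluator), the assembled call might look like this (the target file, eval command, and the literal `project` scope value are hypothetical, not confirmed by these docs):

```bash
# Hypothetical assembled command; titles.md, evaluate.py, and the
# "project" scope value are illustrative placeholders.
python {skill_path}/scripts/setup_experiment.py \
  --domain marketing --name blog-titles \
  --target titles.md --eval "python evaluate.py" \
  --metric ctr_score --direction higher \
  --evaluator llm_judge_content --scope project
```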

### Listing

```bash
# Show existing experiments
python {skill_path}/scripts/setup_experiment.py --list

# Show available evaluators
python {skill_path}/scripts/setup_experiment.py --list-evaluators
```

## Built-in Evaluators

| Name | Metric | Use Case |
|------|--------|----------|
| `benchmark_speed` | `p50_ms` (lower) | Function/API execution time |
| `benchmark_size` | `size_bytes` (lower) | File, bundle, Docker image size |
| `test_pass_rate` | `pass_rate` (higher) | Test suite pass percentage |
| `build_speed` | `build_seconds` (lower) | Build/compile/Docker build time |
| `memory_usage` | `peak_mb` (lower) | Peak memory during execution |
| `llm_judge_content` | `ctr_score` (higher) | Headlines, titles, descriptions |
| `llm_judge_prompt` | `quality_score` (higher) | System prompts, agent instructions |
| `llm_judge_copy` | `engagement_score` (higher) | Social posts, ad copy, emails |
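
For instance, to track Docker image size with the built-in `benchmark_size` evaluator (metric and direction come from the table; the experiment name, target, and eval command here are hypothetical):

```bash
# Hypothetical values except benchmark_size / size_bytes / lower,
# which come from the evaluators table above.
python {skill_path}/scripts/setup_experiment.py \
  --domain engineering --name image-size \
  --target Dockerfile --eval "docker build ." \
  --metric size_bytes --direction lower \
  --evaluator benchmark_size
```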

## After Setup

Report to the user:

- Experiment path and branch name
- Whether the eval command ran successfully, and the baseline metric it produced
- Suggest: "Run `/ar:run {domain}/{name}` to start iterating, or `/ar:loop {domain}/{name}` for autonomous mode."