---
title: "/ar:run — Single Experiment Iteration"
description: "/ar:run — Single Experiment Iteration - Claude Code skill from the Engineering - POWERFUL domain."
---

# /ar:run — Single Experiment Iteration

<div class="page-meta" markdown>
<span class="meta-badge">:material-rocket-launch: Engineering - POWERFUL</span>
<span class="meta-badge">:material-identifier: `run`</span>
<span class="meta-badge">:material-github: <a href="https://github.com/alirezarezvani/claude-skills/tree/main/engineering/autoresearch-agent/skills/run/SKILL.md">Source</a></span>
</div>

<div class="install-banner" markdown>
<span class="install-label">Install:</span> <code>claude /plugin install engineering-advanced-skills</code>
</div>

Run exactly ONE experiment iteration: review the history, decide on a change, edit, commit, and evaluate.

## Usage

```
/ar:run engineering/api-speed   # Run one iteration
/ar:run                         # List experiments, let user pick
```

## What It Does

### Step 1: Resolve experiment

If no experiment is specified, run `python {skill_path}/scripts/setup_experiment.py --list` and ask the user to pick one.

### Step 2: Load context

```bash
# Read experiment config
cat .autoresearch/{domain}/{name}/config.cfg

# Read strategy and constraints
cat .autoresearch/{domain}/{name}/program.md

# Read experiment history
cat .autoresearch/{domain}/{name}/results.tsv

# Check out the experiment branch
git checkout autoresearch/{domain}/{name}
```

### Step 3: Decide what to try

Review results.tsv:

- What changes were kept? What pattern do they share?
- What was discarded? Avoid repeating those approaches.
- What crashed? Understand why.
- How many runs so far? (Escalate the strategy accordingly.)

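To ground these questions, here is a purely hypothetical results.tsv (space-aligned here for readability; the real columns come from the skill's format spec, so treat every name and value as illustrative):

```
run  status   metric  change
1    KEEP     142.0   cache parsed config
2    DISCARD  151.3   batch DB queries
3    CRASH    -       switch worker pool to asyncio
```
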
**Strategy escalation:**

- Runs 1-5: Low-hanging fruit (obvious improvements)
- Runs 6-15: Systematic exploration (vary one parameter)
- Runs 16-30: Structural changes (algorithm swaps)
- Runs 30+: Radical experiments (completely different approaches)

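A minimal shell sketch of this escalation, assuming results.tsv carries a one-line header (drop the `- 1` if it does not):

```bash
# Hypothetical helper: map the current run count to a strategy tier.
runs=$(( $(wc -l < .autoresearch/{domain}/{name}/results.tsv) - 1 ))
if   [ "$runs" -le 5 ];  then echo "tier: low-hanging fruit"
elif [ "$runs" -le 15 ]; then echo "tier: systematic exploration"
elif [ "$runs" -le 30 ]; then echo "tier: structural changes"
else                          echo "tier: radical experiments"
fi
```
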
### Step 4: Make ONE change

Edit only the target file specified in config.cfg. Change one thing. Keep it simple.

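If you need to confirm the target before editing, here is a hypothetical one-liner, assuming a simple `target=...` key (the real schema is whatever setup_experiment.py wrote):

```bash
# Hypothetical: read the target file path out of config.cfg
target=$(grep '^target' .autoresearch/{domain}/{name}/config.cfg | cut -d= -f2- | xargs)
echo "Editing only: $target"
```
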
### Step 5: Commit and evaluate

```bash
git add {target}
git commit -m "experiment: {short description of what changed}"

python {skill_path}/scripts/run_experiment.py \
  --experiment {domain}/{name} --single
```

### Step 6: Report result

Read the script output. Tell the user:

- **KEEP**: "Improvement! {metric}: {value} ({delta} from previous best)"
- **DISCARD**: "No improvement. {metric}: {value} vs best {best}. Reverted."
- **CRASH**: "Evaluation failed: {reason}. Reverted."

### Step 7: Self-improvement check

After every 10th experiment (check the results.tsv line count), update the Strategy section of program.md with the patterns learned.

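One way to express that check, again assuming a one-line header in results.tsv:

```bash
# Trigger a strategy review on every 10th completed run.
runs=$(( $(wc -l < .autoresearch/{domain}/{name}/results.tsv) - 1 ))
if [ "$runs" -gt 0 ] && [ $(( runs % 10 )) -eq 0 ]; then
  echo "Run $runs: update the Strategy section of program.md"
fi
```
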
## Rules

- ONE change per iteration. Don't change 5 things at once.
- NEVER modify the evaluator (evaluate.py). It's ground truth.
- Simplicity wins. Equal performance with simpler code is an improvement.
- No new dependencies.