---
title: "/ar:loop — Autonomous Experiment Loop"
description: "/ar:loop — Autonomous Experiment Loop - Claude Code skill from the Engineering - POWERFUL domain."
---

# /ar:loop — Autonomous Experiment Loop

<div class="page-meta" markdown>
<span class="meta-badge">:material-rocket-launch: Engineering - POWERFUL</span>
<span class="meta-badge">:material-identifier: `loop`</span>
<span class="meta-badge">:material-github: <a href="https://github.com/alirezarezvani/claude-skills/tree/main/engineering/autoresearch-agent/skills/loop/SKILL.md">Source</a></span>
</div>

<div class="install-banner" markdown>
<span class="install-label">Install:</span> <code>claude /plugin install engineering-advanced-skills</code>
</div>

Start a recurring experiment loop that runs at a user-selected interval.

## Usage

```
/ar:loop engineering/api-speed             # Start loop (prompts for interval)
/ar:loop engineering/api-speed 10m         # Every 10 minutes
/ar:loop engineering/api-speed 1h          # Every hour
/ar:loop engineering/api-speed daily       # Daily at ~9am
/ar:loop engineering/api-speed weekly      # Weekly on Monday ~9am
/ar:loop engineering/api-speed monthly     # Monthly on 1st ~9am
/ar:loop stop engineering/api-speed        # Stop an active loop
```
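The argument shapes above can be told apart with a few lines of parsing. This is a minimal sketch, not the skill's actual implementation: `LoopArgs` and `parse_loop_args` are hypothetical names, and it assumes the arguments arrive as a list of whitespace-separated tokens.

```python
from typing import List, NamedTuple, Optional


class LoopArgs(NamedTuple):
    """Hypothetical container for parsed /ar:loop arguments."""
    action: str                 # "start" or "stop"
    experiment: Optional[str]   # e.g. "engineering/api-speed", or None if omitted
    interval: Optional[str]     # e.g. "10m", or None if omitted


def parse_loop_args(argv: List[str]) -> LoopArgs:
    """Split the tokens after `/ar:loop` into action, experiment, and interval."""
    if argv and argv[0] == "stop":
        # Form: /ar:loop stop {experiment}
        return LoopArgs("stop", argv[1] if len(argv) > 1 else None, None)
    # Form: /ar:loop [{experiment} [{interval}]]
    experiment = argv[0] if argv else None
    interval = argv[1] if len(argv) > 1 else None
    return LoopArgs("start", experiment, interval)
```

A missing experiment or interval comes back as `None`, which corresponds to the prompting behavior described in Steps 1 and 2.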

## What It Does

### Step 1: Resolve experiment

If no experiment is specified, list the available experiments and let the user pick one.

### Step 2: Select interval

If no interval is provided as an argument, present these options:

```
Select loop interval:
  1. Every 10 minutes  (rapid — stay and watch)
  2. Every hour         (background — check back later)
  3. Daily at ~9am      (overnight experiments)
  4. Weekly on Monday   (long-running experiments)
  5. Monthly on 1st     (slow experiments)
```

Map the selection to a cron expression:

| Interval | Cron Expression | Shorthand |
|----------|-----------------|-----------|
| 10 minutes | `*/10 * * * *` | `10m` |
| 1 hour | `7 * * * *` | `1h` |
| Daily | `57 8 * * *` | `daily` |
| Weekly | `57 8 * * 1` | `weekly` |
| Monthly | `57 8 1 * *` | `monthly` |
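The table can be expressed as a simple lookup. The sketch below is illustrative only: `CRON_BY_SHORTHAND` and `resolve_interval` are hypothetical names that do not come from the skill's scripts, and the cron expressions are copied verbatim from the table.

```python
# Hypothetical lookup; the names are illustrative, the values come from the table.
CRON_BY_SHORTHAND = {
    "10m": "*/10 * * * *",    # every 10 minutes
    "1h": "7 * * * *",        # hourly, at minute 7
    "daily": "57 8 * * *",    # every day at 08:57 (~9am)
    "weekly": "57 8 * * 1",   # Mondays at 08:57
    "monthly": "57 8 1 * *",  # 1st of each month at 08:57
}


def resolve_interval(shorthand: str) -> str:
    """Return the cron expression for a shorthand, or raise ValueError."""
    key = shorthand.strip().lower()
    try:
        return CRON_BY_SHORTHAND[key]
    except KeyError:
        valid = ", ".join(CRON_BY_SHORTHAND)
        raise ValueError(
            f"Unknown interval {shorthand!r}; expected one of: {valid}"
        ) from None
```

Rejecting unknown shorthands early keeps a typo such as `1hr` from reaching the job scheduler.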

### Step 3: Create the recurring job

Use `CronCreate` with this prompt (fill in the experiment details):

```
You are running autoresearch experiment "{domain}/{name}".

1. Read .autoresearch/{domain}/{name}/config.cfg for: target, evaluate_cmd, metric, metric_direction
2. Read .autoresearch/{domain}/{name}/program.md for strategy and constraints
3. Read .autoresearch/{domain}/{name}/results.tsv for experiment history
4. Run: git checkout autoresearch/{domain}/{name}

Then do exactly ONE iteration:
- Review results.tsv: what worked, what failed, what hasn't been tried
- Edit the target file with ONE change (strategy escalation based on run count)
- Commit: git add {target} && git commit -m "experiment: {description}"
- Evaluate: python {skill_path}/scripts/run_experiment.py --experiment {domain}/{name} --single
- Read the output (KEEP/DISCARD/CRASH)

Rules:
- ONE change per experiment
- NEVER modify the evaluator
- If 5 consecutive crashes in results.tsv, delete this cron job (CronDelete) and alert
- After every 10 experiments, update Strategy section of program.md

Current best metric: {read from results.tsv or "no baseline yet"}
Total experiments so far: {count from results.tsv}
```
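The "5 consecutive crashes" rule in the prompt amounts to counting `CRASH` rows at the tail of the history. The sketch below shows one way to do that. It rests on an assumption this page does not confirm: that `results.tsv` is tab-separated with a header row and a column named `status` holding `KEEP`, `DISCARD`, or `CRASH`. The function names are hypothetical.

```python
import csv
import io

CRASH_LIMIT = 5  # threshold taken from the rule in the prompt


def trailing_crashes(tsv_text: str, status_column: str = "status") -> int:
    """Count CRASH rows at the end of the results history.

    ASSUMPTION: the file has a header row and a `status` column.
    Adjust `status_column` to match the real results.tsv format.
    """
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    statuses = [(row.get(status_column) or "").strip().upper() for row in reader]
    count = 0
    for status in reversed(statuses):
        if status != "CRASH":
            break  # the most recent non-crash ends the streak
        count += 1
    return count


def should_stop_loop(tsv_text: str) -> bool:
    """True when the crash streak has reached the limit."""
    return trailing_crashes(tsv_text) >= CRASH_LIMIT
```

Only the trailing streak matters here: a single `KEEP` or `DISCARD` after earlier crashes resets the count to zero.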

### Step 4: Store loop metadata

Write to `.autoresearch/{domain}/{name}/loop.json`:

```json
{
  "cron_id": "{id from CronCreate}",
  "interval": "{user selection}",
  "started": "{ISO timestamp}",
  "experiment": "{domain}/{name}"
}
```
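Writing this file could look like the following. This is a sketch under stated assumptions: `write_loop_metadata` is a hypothetical helper, the experiment directory is assumed to exist already (created during setup), and the timestamp is assumed to be UTC with an explicit offset.

```python
import json
from datetime import datetime, timezone
from pathlib import Path


def write_loop_metadata(root: Path, domain: str, name: str,
                        cron_id: str, interval: str) -> Path:
    """Write loop.json for an experiment and return its path."""
    exp_dir = root / ".autoresearch" / domain / name
    if not exp_dir.is_dir():
        # ASSUMPTION: setup created this directory; refuse to invent it here.
        raise FileNotFoundError(f"Experiment directory not found: {exp_dir}")
    metadata = {
        "cron_id": cron_id,
        "interval": interval,
        # ISO 8601 timestamp in UTC, e.g. 2025-01-01T09:00:00+00:00
        "started": datetime.now(timezone.utc).isoformat(timespec="seconds"),
        "experiment": f"{domain}/{name}",
    }
    path = exp_dir / "loop.json"
    path.write_text(json.dumps(metadata, indent=2) + "\n", encoding="utf-8")
    return path
```

Storing the timestamp with an offset makes later elapsed-time comparisons unambiguous across machines and time zones.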

### Step 5: Confirm to user

```
Loop started for {domain}/{name}
Interval: {interval description}
Cron ID: {id}
Auto-expires: 3 days (CronCreate limit)

To check progress: /ar:status
To stop the loop: /ar:loop stop {domain}/{name}

Note: Recurring jobs auto-expire after 3 days.
Run /ar:loop again to restart after expiry.
```

## Stopping a Loop

When the user runs `/ar:loop stop {experiment}`:

1. Read `.autoresearch/{domain}/{name}/loop.json` to get the cron ID
2. Call `CronDelete` with that ID
3. Delete `loop.json`
4. Confirm: "Loop stopped for {experiment}. {n} experiments completed."
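The four steps can be sketched as a single function. `CronDelete` is an agent tool rather than a Python function, so this hypothetical `stop_loop` takes it as a callable. The experiment count assumes `results.tsv` has one header line followed by one line per experiment, which this page does not specify.

```python
import json
from pathlib import Path
from typing import Callable


def count_experiments(results_file: Path) -> int:
    """Count data rows. ASSUMPTION: the first non-empty line is a header."""
    if not results_file.exists():
        return 0
    text = results_file.read_text(encoding="utf-8")
    lines = [ln for ln in text.splitlines() if ln.strip()]
    return max(0, len(lines) - 1)


def stop_loop(exp_dir: Path, cron_delete: Callable[[str], None]) -> str:
    """Stop the loop recorded in exp_dir/loop.json and return the confirmation."""
    loop_file = exp_dir / "loop.json"
    if not loop_file.exists():
        raise FileNotFoundError(f"No active loop recorded in {loop_file}")
    metadata = json.loads(loop_file.read_text(encoding="utf-8"))  # step 1
    cron_delete(metadata["cron_id"])                              # step 2
    loop_file.unlink()                                            # step 3
    n = count_experiments(exp_dir / "results.tsv")
    return f"Loop stopped for {metadata['experiment']}. {n} experiments completed."  # step 4
```

Deleting the cron job before removing `loop.json` means a failed `CronDelete` leaves the ID on disk, so the stop can be retried.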

## Important Limitations

- **3-day auto-expiry**: CronCreate jobs expire after 3 days. For longer experiments, the user must re-run `/ar:loop` to restart. Results persist, so the new loop picks up where the old one left off.
- **One loop per experiment**: Don't start multiple loops for the same experiment.
- **Concurrent experiments**: Multiple experiments can loop simultaneously only if they are on different git branches. This holds by default, because each experiment gets its own branch, `autoresearch/{domain}/{name}`.
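
The 3-day limit makes it possible to estimate when a loop will need restarting. The sketch below assumes expiry is counted from the `started` timestamp in `loop.json` and that the timestamp carries a UTC offset. Neither detail is confirmed here, and `loop_expiry` and `is_expired` are hypothetical names.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

CRON_TTL = timedelta(days=3)  # auto-expiry window stated in the limitations


def loop_expiry(started_iso: str) -> datetime:
    """Return the estimated expiry time for a loop started at `started_iso`.

    ASSUMPTION: `started_iso` includes an offset, e.g. 2025-01-01T09:00:00+00:00.
    """
    return datetime.fromisoformat(started_iso) + CRON_TTL


def is_expired(started_iso: str, now: Optional[datetime] = None) -> bool:
    """True once the estimated expiry time has passed."""
    current = now or datetime.now(timezone.utc)
    return current >= loop_expiry(started_iso)
```

A status view could use `is_expired` to suggest re-running `/ar:loop` when a recorded loop is likely no longer active.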