firefrost-gaming/claude-skills-reference

Files

Reza Rezvani 7911cf957a feat(autoresearch-agent): fix critical bugs, package as plugin with 5 slash commands

**Bug fixes (run_experiment.py):**
- Fix broken revert logic: was saving HEAD as pre_commit (no-op revert),
  now uses git reset --hard HEAD~1 for correct rollback
- Remove broken --loop mode (agent IS the loop, script handles one iteration)
- Fix shell injection: all git commands use subprocess list form
- Replace shell tail with Python file read

**Bug fixes (other scripts):**
- setup_experiment.py: fix shell injection in git branch creation,
  remove dead --skip-baseline flag, fix evaluator docstring parsing
- log_results.py: fix 6 falsy-zero bugs (baseline=0 treated as None),
  add domain_filter to CSV/markdown export, move import time to top
- evaluators: add FileNotFoundError handling, fix output format mismatch
  in llm_judge_copy, add peak_kb on macOS, add ValueError handling

**Plugin packaging (NEW):**
- plugin.json, settings.json, CLAUDE.md for plugin registry
- 5 slash commands: /ar:setup, /ar:run, /ar:loop, /ar:status, /ar:resume
- /ar:loop supports user-selected intervals (10m, 1h, daily, weekly, monthly)
- experiment-runner agent for autonomous loop iterations
- Registered in marketplace.json as plugin #20

**SKILL.md rewrite:**
- Replace ambiguous "Loop Protocol" with clear "Agent Protocol"
- Add results.tsv format spec, strategy escalation, self-improvement
- Replace "NEVER STOP" with resumable stopping logic

**Docs & sync:**
- Codex (157 skills), Gemini (229 items), convert.sh all pick up the skill
- 6 new MkDocs pages, mkdocs.yml nav updated
- Counts updated: 17 agents, 22 slash commands

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

2026-03-13 14:38:59 +01:00

4.5 KiB

Raw Blame History

title, description

title	description
/ar:loop — Autonomous Experiment Loop	/ar:loop — Autonomous Experiment Loop - Claude Code skill from the Engineering - POWERFUL domain.

/ar:loop — Autonomous Experiment Loop

:material-rocket-launch: Engineering - POWERFUL :material-identifier: `loop` :material-github: Source

Install: claude /plugin install engineering-advanced-skills

Start a recurring experiment loop that runs at a user-selected interval.

Usage

/ar:loop engineering/api-speed             # Start loop (prompts for interval)
/ar:loop engineering/api-speed 10m         # Every 10 minutes
/ar:loop engineering/api-speed 1h          # Every hour
/ar:loop engineering/api-speed daily       # Daily at ~9am
/ar:loop engineering/api-speed weekly      # Weekly on Monday ~9am
/ar:loop engineering/api-speed monthly     # Monthly on 1st ~9am
/ar:loop stop engineering/api-speed        # Stop an active loop

What It Does

Step 1: Resolve experiment

If no experiment specified, list experiments and let user pick.

Step 2: Select interval

If interval not provided as argument, present options:

Select loop interval:
  1. Every 10 minutes  (rapid — stay and watch)
  2. Every hour         (background — check back later)
  3. Daily at ~9am      (overnight experiments)
  4. Weekly on Monday   (long-running experiments)
  5. Monthly on 1st     (slow experiments)

Map to cron expressions:

Interval	Cron Expression	Shorthand
10 minutes	`/10 * * *`	`10m`
1 hour	`7 * * * *`	`1h`
Daily	`57 8 * * *`	`daily`
Weekly	`57 8 * * 1`	`weekly`
Monthly	`57 8 1 * *`	`monthly`

Step 3: Create the recurring job

Use CronCreate with this prompt (fill in the experiment details):

You are running autoresearch experiment "{domain}/{name}".

1. Read .autoresearch/{domain}/{name}/config.cfg for: target, evaluate_cmd, metric, metric_direction
2. Read .autoresearch/{domain}/{name}/program.md for strategy and constraints
3. Read .autoresearch/{domain}/{name}/results.tsv for experiment history
4. Run: git checkout autoresearch/{domain}/{name}

Then do exactly ONE iteration:
- Review results.tsv: what worked, what failed, what hasn't been tried
- Edit the target file with ONE change (strategy escalation based on run count)
- Commit: git add {target} && git commit -m "experiment: {description}"
- Evaluate: python {skill_path}/scripts/run_experiment.py --experiment {domain}/{name} --single
- Read the output (KEEP/DISCARD/CRASH)

Rules:
- ONE change per experiment
- NEVER modify the evaluator
- If 5 consecutive crashes in results.tsv, delete this cron job (CronDelete) and alert
- After every 10 experiments, update Strategy section of program.md

Current best metric: {read from results.tsv or "no baseline yet"}
Total experiments so far: {count from results.tsv}

Step 4: Store loop metadata

Write to .autoresearch/{domain}/{name}/loop.json:

{
  "cron_id": "{id from CronCreate}",
  "interval": "{user selection}",
  "started": "{ISO timestamp}",
  "experiment": "{domain}/{name}"
}

Step 5: Confirm to user

Loop started for {domain}/{name}
  Interval: {interval description}
  Cron ID: {id}
  Auto-expires: 3 days (CronCreate limit)

  To check progress: /ar:status
  To stop the loop:  /ar:loop stop {domain}/{name}

  Note: Recurring jobs auto-expire after 3 days.
  Run /ar:loop again to restart after expiry.

Stopping a Loop

When user runs /ar:loop stop {experiment}:

Read .autoresearch/{domain}/{name}/loop.json to get the cron ID
Call CronDelete with that ID
Delete loop.json
Confirm: "Loop stopped for {experiment}. {n} experiments completed."

Important Limitations

3-day auto-expiry: CronCreate jobs expire after 3 days. For longer experiments, the user must re-run /ar:loop to restart. Results persist — the new loop picks up where the old one left off.
One loop per experiment: Don't start multiple loops for the same experiment.
Concurrent experiments: Multiple experiments can loop simultaneously ONLY if they're on different git branches (which they are by default — each experiment gets autoresearch/{domain}/{name}).

4.5 KiB Raw Blame History