# Autoresearch Agent — Claude Code Instructions
This plugin runs autonomous experiment loops that optimize any file by a measurable metric.
## Commands

Use the `/ar:` namespace for all commands:
- `/ar:setup` — Set up a new experiment interactively
- `/ar:run` — Run a single experiment iteration
- `/ar:loop` — Start an autonomous loop with a user-selected interval
- `/ar:status` — Show the dashboard and results
- `/ar:resume` — Resume a paused experiment
## How it works
You (the AI agent) are the experiment loop. The scripts handle evaluation and git rollback.
- You edit the target file with ONE change
- You commit it
- You call `run_experiment.py --single` — it evaluates and prints KEEP/DISCARD/CRASH
- You repeat
Results persist in results.tsv and git log. Sessions can be resumed.
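The per-iteration bookkeeping can be sketched as a small append to results.tsv. The column layout below is an illustration only; the plugin's actual TSV format may differ:

```python
import csv
import datetime
import pathlib

def record_result(tsv_path, change, metric, baseline, outcome):
    """Append one experiment row to the results TSV.

    The column names here are a guess for illustration, not the
    plugin's actual spec.
    """
    is_new = not pathlib.Path(tsv_path).exists()
    with open(tsv_path, "a", newline="") as f:
        writer = csv.writer(f, delimiter="\t")
        if is_new:
            writer.writerow(["date", "change", "metric", "baseline", "outcome"])
        writer.writerow([datetime.date.today().isoformat(),
                         change, f"{metric:.4f}", f"{baseline:.4f}", outcome])

record_result("results.tsv", "cache API responses", 1.23, 1.41, "KEEP")
```

Because every row lands in the same TSV and every kept change is a commit, both the metric history and the code history survive across sessions.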
## When to use each command
### Starting fresh

`/ar:setup`
Creates the experiment directory, config, program.md, results.tsv, and git branch.
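The scaffold that setup produces might look like the sketch below. The file contents are placeholders, and the real command is interactive and also creates the git branch, which is not shown:

```python
import pathlib

def scaffold_experiment(exp_dir):
    """Create the files /ar:setup is described as producing.

    Sketch only: config and TSV contents here are placeholders,
    and branch creation is omitted.
    """
    d = pathlib.Path(exp_dir)
    d.mkdir(parents=True, exist_ok=True)
    (d / "config.json").write_text('{"metric": "runtime_ms"}\n')  # placeholder config
    (d / "program.md").write_text("# Experiment notes\n")
    (d / "results.tsv").write_text("date\tchange\tmetric\tbaseline\toutcome\n")
    return d

scaffold_experiment("engineering/api-speed")
```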
### Running one iteration at a time

`/ar:run engineering/api-speed`
Read history, make one change, evaluate, report result.
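Reading the history before choosing the next change can be sketched like this; the `change` column name and the sample rows are assumptions for illustration:

```python
import csv

# Sample history; the real file is written by the experiment scripts.
with open("sample_results.tsv", "w") as f:
    f.write("date\tchange\tmetric\toutcome\n"
            "2025-01-01\tadd index\t12.0\tKEEP\n"
            "2025-01-02\tbatch writes\t11.5\tDISCARD\n")

def tried_changes(tsv_path):
    """Changes already attempted, so the next iteration doesn't
    repeat an experiment. The 'change' column name is an assumption."""
    with open(tsv_path, newline="") as f:
        return {row["change"] for row in csv.DictReader(f, delimiter="\t")}

already = tried_changes("sample_results.tsv")
```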
### Autonomous background loop

`/ar:loop engineering/api-speed`
Prompts for interval (10min, 1h, daily, weekly, monthly), then creates a recurring job.
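One plausible way to realize these intervals is a cron schedule. The mapping below is purely illustrative, since the document doesn't specify the job mechanism:

```python
# Hypothetical mapping of the user-facing intervals to cron expressions;
# how the plugin actually registers the recurring job isn't specified here.
CRON_BY_INTERVAL = {
    "10min":   "*/10 * * * *",
    "1h":      "0 * * * *",
    "daily":   "0 9 * * *",   # time of day chosen arbitrarily
    "weekly":  "0 9 * * 1",   # Mondays
    "monthly": "0 9 1 * *",   # 1st of each month
}

def cron_for(interval):
    if interval not in CRON_BY_INTERVAL:
        raise ValueError(f"unknown interval: {interval!r}")
    return CRON_BY_INTERVAL[interval]
```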
### Checking progress

`/ar:status`
Shows the dashboard across all experiments with metrics and trends.
### Resuming after context limit or break

`/ar:resume engineering/api-speed`
Reads results history, checks out the branch, and continues where you left off.
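A sketch of what "continues where you left off" might compute from the results history (column names and sample data are assumed, not the plugin's spec):

```python
import csv

# Sample history for the demo; the real file is maintained by the scripts.
with open("history.tsv", "w") as f:
    f.write("change\tmetric\toutcome\n"
            "add index\t12.0\tKEEP\n"
            "batch writes\t11.5\tKEEP\n"
            "inline cache\t13.2\tDISCARD\n")

def resume_state(tsv_path):
    """Return (iterations run, metric of the last kept change).
    Column names are assumed for illustration."""
    rows = list(csv.DictReader(open(tsv_path, newline=""), delimiter="\t"))
    kept = [float(r["metric"]) for r in rows if r["outcome"] == "KEEP"]
    return len(rows), (kept[-1] if kept else None)

iterations, current = resume_state("history.tsv")
```

Since discarded changes are rolled back by git, the last kept metric is also the state of the working tree on the experiment branch.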
## Agents

- `experiment-runner`: Spawned for each loop iteration. It reads the config and results history, decides what to try, edits the target file, commits, and evaluates.
## Key principle
One change per experiment. Measure everything. Compound improvements.
The agent never modifies the evaluator. The evaluator is ground truth.