# Autoresearch Agent — Claude Code Instructions
This plugin runs autonomous experiment loops that optimize any file by a measurable metric.
## Commands

Use the `/ar:` namespace for all commands:
- `/ar:setup` — Set up a new experiment interactively
- `/ar:run` — Run a single experiment iteration
- `/ar:loop` — Start an autonomous loop with a user-selected interval
- `/ar:status` — Show the dashboard and results
- `/ar:resume` — Resume a paused experiment
## How it works
You (the AI agent) are the experiment loop. The scripts handle evaluation and git rollback.
- You edit the target file with ONE change
- You commit it
- You call `run_experiment.py --single` — it evaluates and prints KEEP/DISCARD/CRASH
- You repeat
Results persist in results.tsv and git log. Sessions can be resumed.
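The per-iteration bookkeeping can be sketched as a small append to results.tsv. The column layout below is an illustration only; the plugin's actual TSV format may differ:

```python
import csv
import datetime
import pathlib

def record_result(tsv_path, change, metric, baseline, outcome):
    """Append one experiment row to the results TSV.

    The column names here are a guess for illustration, not the
    plugin's actual spec.
    """
    is_new = not pathlib.Path(tsv_path).exists()
    with open(tsv_path, "a", newline="") as f:
        writer = csv.writer(f, delimiter="\t")
        if is_new:
            writer.writerow(["date", "change", "metric", "baseline", "outcome"])
        writer.writerow([datetime.date.today().isoformat(),
                         change, f"{metric:.4f}", f"{baseline:.4f}", outcome])

record_result("results.tsv", "cache API responses", 1.23, 1.41, "KEEP")
```

Because every row lands in the same TSV and every kept change is a commit, both the metric history and the code history survive across sessions.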
## When to use each command
### Starting fresh

`/ar:setup`
Creates the experiment directory, config, program.md, results.tsv, and git branch.
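The scaffold that setup produces might look like the sketch below. The file contents are placeholders, and the real command is interactive and also creates the git branch, which is not shown:

```python
import pathlib

def scaffold_experiment(exp_dir):
    """Create the files /ar:setup is described as producing.

    Sketch only: config and TSV contents here are placeholders,
    and branch creation is omitted.
    """
    d = pathlib.Path(exp_dir)
    d.mkdir(parents=True, exist_ok=True)
    (d / "config.json").write_text('{"metric": "runtime_ms"}\n')  # placeholder config
    (d / "program.md").write_text("# Experiment notes\n")
    (d / "results.tsv").write_text("date\tchange\tmetric\tbaseline\toutcome\n")
    return d

scaffold_experiment("engineering/api-speed")
```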
### Running one iteration at a time

`/ar:run engineering/api-speed`
Read history, make one change, evaluate, report result.
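Reading the history before choosing the next change can be sketched like this; the `change` column name and the sample rows are assumptions for illustration:

```python
import csv

# Sample history; the real file is written by the experiment scripts.
with open("sample_results.tsv", "w") as f:
    f.write("date\tchange\tmetric\toutcome\n"
            "2025-01-01\tadd index\t12.0\tKEEP\n"
            "2025-01-02\tbatch writes\t11.5\tDISCARD\n")

def tried_changes(tsv_path):
    """Changes already attempted, so the next iteration doesn't
    repeat an experiment. The 'change' column name is an assumption."""
    with open(tsv_path, newline="") as f:
        return {row["change"] for row in csv.DictReader(f, delimiter="\t")}

already = tried_changes("sample_results.tsv")
```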
### Autonomous background loop

`/ar:loop engineering/api-speed`
Prompts for interval (10min, 1h, daily, weekly, monthly), then creates a recurring job.
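One plausible way to realize these intervals is a cron schedule. The mapping below is purely illustrative, since the document doesn't specify the job mechanism:

```python
# Hypothetical mapping of the user-facing intervals to cron expressions;
# how the plugin actually registers the recurring job isn't specified here.
CRON_BY_INTERVAL = {
    "10min":   "*/10 * * * *",
    "1h":      "0 * * * *",
    "daily":   "0 9 * * *",   # time of day chosen arbitrarily
    "weekly":  "0 9 * * 1",   # Mondays
    "monthly": "0 9 1 * *",   # 1st of each month
}

def cron_for(interval):
    if interval not in CRON_BY_INTERVAL:
        raise ValueError(f"unknown interval: {interval!r}")
    return CRON_BY_INTERVAL[interval]
```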
### Checking progress

`/ar:status`
Shows the dashboard across all experiments with metrics and trends.
### Resuming after context limit or break

`/ar:resume engineering/api-speed`
Reads results history, checks out the branch, and continues where you left off.
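A sketch of what "continues where you left off" might compute from the results history (column names and sample data are assumed, not the plugin's spec):

```python
import csv

# Sample history for the demo; the real file is maintained by the scripts.
with open("history.tsv", "w") as f:
    f.write("change\tmetric\toutcome\n"
            "add index\t12.0\tKEEP\n"
            "batch writes\t11.5\tKEEP\n"
            "inline cache\t13.2\tDISCARD\n")

def resume_state(tsv_path):
    """Return (iterations run, metric of the last kept change).
    Column names are assumed for illustration."""
    rows = list(csv.DictReader(open(tsv_path, newline=""), delimiter="\t"))
    kept = [float(r["metric"]) for r in rows if r["outcome"] == "KEEP"]
    return len(rows), (kept[-1] if kept else None)

iterations, current = resume_state("history.tsv")
```

Since discarded changes are rolled back by git, the last kept metric is also the state of the working tree on the experiment branch.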
## Agents

- `experiment-runner`: Spawned for each loop iteration. It reads the config and results history, decides what to try, edits the target file, commits, and evaluates.
## Key principle
One change per experiment. Measure everything. Compound improvements.
The agent never modifies the evaluator. The evaluator is ground truth.