claude-skills-reference/engineering/autoresearch-agent/scripts/setup_experiment.py at c834d71a4434e5d0488b250e8ffeb76dec774702

firefrost-gaming/claude-skills-reference

Leo a799d8bdb8 feat: add autoresearch-agent — autonomous experiment loop for ML, prompt, code & skill optimization

Inspired by Karpathy's autoresearch. The agent modifies a target file, runs a
fixed evaluation, keeps improvements (git commit), discards failures (git reset),
and loops indefinitely — no human in the loop.
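
The keep-or-discard cycle described above can be sketched roughly as follows. This is a minimal illustration, not the code in the repository's scripts: `experiment_loop`, `propose`, `evaluate`, `keep_change`, and `discard_change` are hypothetical names, the loop is bounded by `iterations` for demonstration (the commit describes an unbounded loop), and the exact git commands are assumptions.

```python
import subprocess
from typing import Callable, List, Tuple


def git(*args: str) -> None:
    """Run a git command in the current directory, raising on failure."""
    subprocess.run(["git", *args], check=True)


def keep_change(message: str) -> None:
    # Assumed "keep" step: commit the modified tracked files.
    git("commit", "-am", message)


def discard_change() -> None:
    # Assumed "discard" step: reset tracked files back to the last commit.
    git("reset", "--hard", "HEAD")


def experiment_loop(
    propose: Callable[[int], None],   # hypothetical: edits the target file
    evaluate: Callable[[], float],    # hypothetical: runs the fixed evaluation
    best: float,                      # baseline metric before the loop starts
    iterations: int,
    lower_is_better: bool = True,
    keep: Callable[[str], None] = keep_change,
    discard: Callable[[], None] = discard_change,
) -> Tuple[float, List[Tuple[int, float, bool]]]:
    """Try `iterations` changes, keeping only those that beat the best metric."""
    history: List[Tuple[int, float, bool]] = []
    for i in range(iterations):
        propose(i)
        score = evaluate()
        improved = score < best if lower_is_better else score > best
        if improved:
            best = score
            keep(f"experiment {i}: metric {score}")
        else:
            discard()
        history.append((i, score, improved))
    return best, history
```

Passing `keep` and `discard` as parameters lets the loop be exercised without a git repository, for example by substituting functions that only record which branch was taken.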

Includes:
- SKILL.md with setup wizard, 4 domain configs, experiment loop protocol
- 3 stdlib-only Python scripts (setup, run, log — 687 lines)
- Reference docs: experiment domains guide, program.md templates

Domains: ML training (val_bpb), prompt engineering (eval_score),
code performance (p50_ms), agent skill optimization (pass_rate).
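
One way to represent the four domains and their metrics is a small lookup table with a comparison helper. The metric names come from the commit message; the domain keys, the `DOMAIN_METRICS` and `is_improvement` names, and the optimization directions (lower is better for `val_bpb` and `p50_ms`, higher is better for `eval_score` and `pass_rate`) are assumptions made for this sketch.

```python
from typing import Dict, Tuple

# Domain key -> (metric name, direction). Directions are assumed, not
# taken from the repository.
DOMAIN_METRICS: Dict[str, Tuple[str, str]] = {
    "ml_training": ("val_bpb", "lower"),
    "prompt_engineering": ("eval_score", "higher"),
    "code_performance": ("p50_ms", "lower"),
    "skill_optimization": ("pass_rate", "higher"),
}


def is_improvement(domain: str, new: float, best: float) -> bool:
    """Return True if `new` strictly beats `best` for the domain's metric."""
    _, direction = DOMAIN_METRICS[domain]
    return new < best if direction == "lower" else new > best
```

A strict comparison means an equal score counts as no improvement, so the change would be discarded rather than committed.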

Cherry-picked from feat/autoresearch-agent and rebased onto dev.
Fixes: timeout inconsistency (2x→2.5x), results.tsv tracking clarity,
zero-metric edge case, installation section aligned with multi-tool support.
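
The commit does not say what the zero-metric fix was. As a purely illustrative sketch of how a metric of exactly zero can cause trouble, the hypothetical helpers below distinguish "metric is 0.0" from "metric not found" and avoid dividing by a zero baseline. The `name: value` output format is also an assumption.

```python
import re
from typing import Optional


def parse_metric(output: str, name: str) -> Optional[float]:
    """Find a line such as 'pass_rate: 0.0' and return its value, or None."""
    pattern = (
        rf"^{re.escape(name)}\s*[:=]\s*"
        r"([-+]?\d+(?:\.\d+)?(?:[eE][-+]?\d+)?)\s*$"
    )
    match = re.search(pattern, output, re.MULTILINE)
    return float(match.group(1)) if match else None


def relative_change(new: float, baseline: float) -> Optional[float]:
    """Return (new - baseline) / |baseline|, or None when baseline is zero."""
    if baseline == 0:
        return None  # a relative change is undefined against a zero baseline
    return (new - baseline) / abs(baseline)
```

Callers should test the result with `is None` rather than truthiness, because `0.0` is a valid metric value that evaluates as false in Python.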

2026-03-13 07:21:44 +01:00

8.0 KiB
