claude-skills-reference/engineering/autoresearch-agent/evaluators/memory_usage.py
Reza Rezvani 7911cf957a feat(autoresearch-agent): fix critical bugs, package as plugin with 5 slash commands
**Bug fixes (run_experiment.py):**
- Fix broken revert logic: was saving HEAD as pre_commit (no-op revert),
  now uses git reset --hard HEAD~1 for correct rollback
- Remove broken --loop mode (agent IS the loop, script handles one iteration)
- Fix shell injection: all git commands use subprocess list form
- Replace shell tail with Python file read
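The run_experiment.py fixes above can be sketched roughly as follows. This is a minimal illustration, not the code from the commit: `git_argv`, `revert_last_commit`, and `tail_lines` are hypothetical names, and the only detail taken from the commit message is the `git reset --hard HEAD~1` rollback, the list-form subprocess calls, and reading the log tail in Python.

```python
import subprocess
from pathlib import Path


def git_argv(*args: str) -> list[str]:
    """Build a git command as an argument list (hypothetical helper).

    With list form and no shell, each argument reaches git verbatim,
    so shell metacharacters in a branch name are never interpreted.
    """
    return ["git", *args]


def revert_last_commit(repo_dir: str) -> None:
    """Roll back the most recent commit, mirroring `git reset --hard HEAD~1`."""
    subprocess.run(git_argv("reset", "--hard", "HEAD~1"), cwd=repo_dir, check=True)


def tail_lines(path: str, n: int = 20) -> list[str]:
    """Return the last n lines of a text file without shelling out to `tail`."""
    lines = Path(path).read_text(errors="replace").splitlines()
    return lines[-n:] if n > 0 else []
```

Reading the whole file is fine for short experiment logs; a very large log would call for seeking from the end instead.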

**Bug fixes (other scripts):**
- setup_experiment.py: fix shell injection in git branch creation,
  remove dead --skip-baseline flag, fix evaluator docstring parsing
- log_results.py: fix 6 falsy-zero bugs (baseline=0 treated as None),
  add domain_filter to CSV/markdown export, move import time to top
- evaluators: add FileNotFoundError handling, fix output format mismatch
  in llm_judge_copy, add peak_kb on macOS, add ValueError handling
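The falsy-zero class of bug mentioned for log_results.py can be illustrated with a small sketch. The function names here are hypothetical and not taken from the script; the point is only that a truthiness check drops a legitimate value of 0, while an explicit `is not None` check keeps it.

```python
from typing import Optional


def format_metric_buggy(value: Optional[float]) -> str:
    # Truthiness check: 0 and 0.0 are falsy, so a real baseline of 0
    # is reported as missing.
    return f"{value:.2f}" if value else "n/a"


def format_metric(value: Optional[float]) -> str:
    # Explicit None check: only a genuinely missing value becomes "n/a".
    return f"{value:.2f}" if value is not None else "n/a"
```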

**Plugin packaging (NEW):**
- plugin.json, settings.json, CLAUDE.md for plugin registry
- 5 slash commands: /ar:setup, /ar:run, /ar:loop, /ar:status, /ar:resume
- /ar:loop supports user-selected intervals (10m, 1h, daily, weekly, monthly)
- experiment-runner agent for autonomous loop iterations
- Registered in marketplace.json as plugin #20

**SKILL.md rewrite:**
- Replace ambiguous "Loop Protocol" with clear "Agent Protocol"
- Add results.tsv format spec, strategy escalation, self-improvement
- Replace "NEVER STOP" with resumable stopping logic

**Docs & sync:**
- Codex (157 skills), Gemini (229 items), convert.sh all pick up the skill
- 6 new MkDocs pages, mkdocs.yml nav updated
- Counts updated: 17 agents, 22 slash commands

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-13 14:38:59 +01:00


#!/usr/bin/env python3
"""Measure peak memory usage of a command.
DO NOT MODIFY after experiment starts — this is the fixed evaluator."""
import platform
import subprocess
import sys

# --- CONFIGURE THESE ---
COMMAND = "python src/module.py"  # Command to measure
# --- END CONFIG ---

system = platform.system()

if system == "Linux":
    # Use /usr/bin/time for peak RSS
    result = subprocess.run(
        f"/usr/bin/time -v {COMMAND}",
        shell=True, capture_output=True, text=True, timeout=300
    )
    output = result.stderr
    for line in output.splitlines():
        if "Maximum resident set size" in line:
            kb = int(line.split(":")[-1].strip())
            mb = kb / 1024
            print(f"peak_mb: {mb:.1f}")
            print(f"peak_kb: {kb}")
            sys.exit(0)
    print("Could not parse memory from /usr/bin/time output", file=sys.stderr)
    sys.exit(1)
elif system == "Darwin":
    # macOS: use /usr/bin/time -l
    result = subprocess.run(
        f"/usr/bin/time -l {COMMAND}",
        shell=True, capture_output=True, text=True, timeout=300
    )
    output = result.stderr
    for line in output.splitlines():
        if "maximum resident set size" in line.lower():
            # macOS reports in bytes
            val = int(line.strip().split()[0])
            kb = val / 1024
            mb = val / (1024 * 1024)
            print(f"peak_mb: {mb:.1f}")
            print(f"peak_kb: {int(kb)}")
            sys.exit(0)
    print("Could not parse memory from time output", file=sys.stderr)
    sys.exit(1)
else:
    print(f"Unsupported platform: {system}. Use Linux or macOS.", file=sys.stderr)
    sys.exit(1)
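The two parsing branches above could be factored into one testable function, which also gives a place for the ValueError handling the commit message mentions. This is a sketch under assumptions, not part of the evaluator: `parse_peak_kb` is a hypothetical name, and the sample lines in the comments only illustrate the usual GNU `time -v` and macOS `time -l` layouts.

```python
from typing import Optional


def parse_peak_kb(time_stderr: str, system: str) -> Optional[int]:
    """Extract peak RSS in kilobytes from /usr/bin/time stderr (hypothetical helper).

    Returns None when the line is missing, malformed, or the platform is unknown.
    """
    for line in time_stderr.splitlines():
        if "maximum resident set size" not in line.lower():
            continue
        try:
            if system == "Linux":
                # e.g. "Maximum resident set size (kbytes): 10240"
                return int(line.split(":")[-1].strip())
            if system == "Darwin":
                # e.g. "  10485760  maximum resident set size" (bytes)
                return int(line.split()[0]) // 1024
        except ValueError:
            # The numeric field was not an integer.
            return None
        return None  # Unsupported platform string.
    return None
```

Keeping the parser separate from the `subprocess.run` call means it can be checked against captured output without launching the measured command.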