claude-skills-reference/engineering/autoresearch-agent/evaluators/benchmark_size.py
Leo 12591282da refactor: autoresearch-agent v2.0 — multi-experiment, multi-domain, real-world evaluators
Major rewrite based on deep study of Karpathy's autoresearch repo.

Architecture changes:
- Multi-experiment support: .autoresearch/{domain}/{name}/ structure
- Domain categories: engineering, marketing, content, prompts, custom
- Project-level (git-tracked, shareable) or user-level (~/.autoresearch/) scope
- User chooses scope during setup, not installation
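
The directory layout above can be sketched as a small path helper. This is only an illustration: the function name `experiment_dir`, the `scope` values `"project"` and `"user"`, and the `project_root` parameter are assumptions, not names taken from the actual scripts.

```python
from pathlib import Path
from typing import Optional


def experiment_dir(domain: str, name: str, scope: str = "project",
                   project_root: Optional[Path] = None) -> Path:
    """Return the directory for one experiment (hypothetical helper)."""
    if scope == "project":
        # Project-level: lives inside the repository, so it can be git-tracked
        base = (project_root or Path.cwd()) / ".autoresearch"
    elif scope == "user":
        # User-level: lives under the home directory
        base = Path.home() / ".autoresearch"
    else:
        raise ValueError(f"unknown scope: {scope!r}")
    return base / domain / name
```

For example, `experiment_dir("engineering", "bundle-size", "project", Path("/repo"))` resolves to `/repo/.autoresearch/engineering/bundle-size`.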

New evaluators (8 ready-to-use):
- Free: benchmark_speed, benchmark_size, test_pass_rate, build_speed, memory_usage
- LLM judge (uses existing subscription): llm_judge_content, llm_judge_prompt, llm_judge_copy
- LLM judges call user's CLI tool (claude/codex/gemini) — no extra API keys needed

Script improvements:
- setup_experiment.py: --domain, --scope, --evaluator, --list, --list-evaluators
- run_experiment.py: --experiment domain/name, --resume, --loop, --single
- log_results.py: --dashboard, --domain, --format csv|markdown|terminal, --output
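
As a rough sketch of how the `run_experiment.py` flags listed above might be parsed with `argparse`: the flag names come from the list, but their types, the mutual exclusion of `--loop` and `--single`, and the helper `split_experiment` are assumptions rather than the script's actual behavior.

```python
import argparse
from typing import Tuple


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="run_experiment.py")
    parser.add_argument("--experiment", required=True, metavar="DOMAIN/NAME")
    parser.add_argument("--resume", action="store_true")
    # Assumption: looping and single-iteration modes exclude each other
    mode = parser.add_mutually_exclusive_group()
    mode.add_argument("--loop", action="store_true")
    mode.add_argument("--single", action="store_true")
    return parser


def split_experiment(value: str) -> Tuple[str, str]:
    """Split 'domain/name' into its two parts (hypothetical helper)."""
    domain, sep, name = value.partition("/")
    if not sep or not domain or not name:
        raise ValueError(f"expected DOMAIN/NAME, got {value!r}")
    return domain, name
```

A call such as `build_parser().parse_args(["--experiment", "engineering/bundle-size", "--loop"])` would then yield an `experiment` string that `split_experiment` separates into `("engineering", "bundle-size")`.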

Results export:
- Terminal (default), CSV, and Markdown formats
- Per-experiment, per-domain, or cross-experiment dashboard view
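
The three export formats could be produced from the same list of result rows. The sketch below is a minimal illustration, assuming each row is a dict with identical keys; the function name `render` and the column names in the example are hypothetical and do not reflect the real `log_results.py`.

```python
import csv
import io


def render(rows, fmt="terminal"):
    """Render result rows as 'terminal', 'csv', or 'markdown' text."""
    if not rows:
        return ""
    headers = list(rows[0].keys())
    if fmt == "csv":
        buf = io.StringIO()
        writer = csv.DictWriter(buf, fieldnames=headers, lineterminator="\n")
        writer.writeheader()
        writer.writerows(rows)
        return buf.getvalue()
    if fmt == "markdown":
        lines = ["| " + " | ".join(headers) + " |",
                 "| " + " | ".join("---" for _ in headers) + " |"]
        lines += ["| " + " | ".join(str(r[h]) for h in headers) + " |"
                  for r in rows]
        return "\n".join(lines) + "\n"
    if fmt == "terminal":
        # Pad each column to its widest cell so values line up
        widths = [max(len(h), *(len(str(r[h])) for r in rows)) for h in headers]
        table = [headers] + [[str(r[h]) for h in headers] for r in rows]
        return "\n".join(
            "  ".join(c.ljust(w) for c, w in zip(cells, widths)).rstrip()
            for cells in table
        ) + "\n"
    raise ValueError(f"unknown format: {fmt!r}")
```

Keeping the rows format-agnostic means a per-experiment view and a cross-experiment dashboard can share the same rendering code and differ only in which rows they collect.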

SKILL.md rewritten:
- Clear activation triggers (when the skill should activate)
- Practical examples for each domain
- Evaluator documentation with cost transparency
- Simplified loop protocol matching Karpathy's original philosophy
2026-03-13 08:22:29 +01:00

57 lines
1.7 KiB
Python

#!/usr/bin/env python3
"""Measure file, bundle, or Docker image size.
DO NOT MODIFY after experiment starts — this is the fixed evaluator."""
import os
import subprocess
import sys
# --- CONFIGURE ONE OF THESE ---
# Option 1: File size
TARGET_FILE = "dist/main.js"
# Option 2: Directory size (uncomment to use)
# TARGET_DIR = "dist/"
# Option 3: Docker image (uncomment to use)
# DOCKER_IMAGE = "myapp:latest"
# DOCKER_BUILD_CMD = "docker build -t myapp:latest ."
# Option 4: Build first, then measure (uncomment to use)
# BUILD_CMD = "npm run build"
# --- END CONFIG ---
# Build if needed
if "BUILD_CMD" in globals():
    # Run the optional build step and abort on failure so a stale artifact is never measured
    result = subprocess.run(BUILD_CMD, shell=True, capture_output=True)
    if result.returncode != 0:
        print(f"Build failed: {result.stderr.decode()[:200]}", file=sys.stderr)
        sys.exit(1)
# Measure
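if "DOCKER_IMAGE" in globals():
    # Optionally rebuild the image so the measurement reflects the current code
    if "DOCKER_BUILD_CMD" in globals():
        build = subprocess.run(DOCKER_BUILD_CMD, shell=True, capture_output=True, text=True)
        if build.returncode != 0:
            print(f"Docker build failed: {build.stderr[:200]}", file=sys.stderr)
            sys.exit(1)
    # An argument list (no shell) avoids quoting problems with the Go template
    result = subprocess.run(
        ["docker", "image", "inspect", DOCKER_IMAGE, "--format", "{{.Size}}"],
        capture_output=True, text=True
    )
    if result.returncode != 0:
        print(f"Docker inspect failed: {result.stderr[:200]}", file=sys.stderr)
        sys.exit(1)
    size_bytes = int(result.stdout.strip())
elif "TARGET_DIR" in globals():
    # A missing directory would otherwise be reported as 0 bytes
    if not os.path.isdir(TARGET_DIR):
        print(f"Target not found: {TARGET_DIR}", file=sys.stderr)
        sys.exit(1)
    size_bytes = sum(
        os.path.getsize(os.path.join(dp, f))
        for dp, _, fns in os.walk(TARGET_DIR) for f in fns
    )
elif os.path.exists(TARGET_FILE):
    size_bytes = os.path.getsize(TARGET_FILE)
else:
    print(f"Target not found: {TARGET_FILE}", file=sys.stderr)
    sys.exit(1)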
size_kb = size_bytes / 1024
size_mb = size_bytes / (1024 * 1024)
print(f"size_bytes: {size_bytes}")
print(f"size_kb: {size_kb:.1f}")
print(f"size_mb: {size_mb:.2f}")
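
The evaluator reports its metrics as `key: value` lines on stdout. A caller could read them back along the following lines; this is a sketch under the assumption that every metric value is numeric, and `parse_metrics` is a hypothetical helper, not part of the repository.

```python
def parse_metrics(stdout: str) -> dict:
    """Parse 'key: value' lines into a dict of floats, skipping other lines."""
    metrics = {}
    for line in stdout.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue  # not a metric line
        try:
            metrics[key.strip()] = float(value)
        except ValueError:
            continue  # value is not numeric; ignore it
    return metrics
```

Given the three lines printed by this evaluator, the result would contain `size_bytes`, `size_kb`, and `size_mb`, so an experiment loop could compare `size_bytes` between runs and treat a lower value as an improvement.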