Loki Mode - Claude Code Skill
Multi-agent autonomous startup system for Claude Code. Takes a PRD to a fully deployed, revenue-generating product with zero human intervention.
Quick Start
# Launch Claude Code with autonomous permissions
claude --dangerously-skip-permissions
# Then invoke:
# "Loki Mode" or "Loki Mode with PRD at path/to/prd"
Project Structure
SKILL.md # Main skill definition (read this first)
references/ # Detailed documentation (loaded progressively)
  openai-patterns.md # OpenAI Agents SDK: guardrails, tripwires, handoffs
  lab-research-patterns.md # DeepMind + Anthropic: Constitutional AI, debate
  production-patterns.md # HN 2025: What actually works in production
  advanced-patterns.md # 2025 research patterns (MAR, Iter-VF, GoalAct)
  tool-orchestration.md # ToolOrchestra-inspired efficiency & rewards
  memory-system.md # Episodic/semantic memory architecture
  quality-control.md # Code review, anti-sycophancy, guardrails
  agent-types.md # 37 specialized agent definitions
  sdlc-phases.md # Full SDLC workflow
  task-queue.md # Queue system, circuit breakers
  spec-driven-dev.md # OpenAPI-first development
  architecture.md # Directory structure, state schemas
  core-workflow.md # RARV cycle, autonomy rules
  claude-best-practices.md # Boris Cherny patterns
  deployment.md # Cloud deployment instructions
  business-ops.md # Business operation workflows
  mcp-integration.md # MCP server capabilities
autonomy/ # Runtime state and constitution
benchmarks/ # SWE-bench and HumanEval benchmarks
Key Concepts
RARV Cycle
Every iteration follows: Reason -> Act -> Reflect -> Verify
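As a rough illustration of the cycle, the sketch below runs four phase handlers in the stated order and threads a state dictionary through them. The handler interface, the `State` shape, and the name `run_rarv_iteration` are assumptions made for this example; the actual autonomy rules live in references/core-workflow.md.

```python
from typing import Callable, Dict, List, Tuple

# Phase order taken from the cycle described above.
RARV_PHASES = ("reason", "act", "reflect", "verify")

# Hypothetical types: each handler receives the current state and returns the next one.
State = Dict[str, object]
Handler = Callable[[State], State]


def run_rarv_iteration(handlers: Dict[str, Handler], state: State) -> Tuple[State, List[str]]:
    """Run one iteration: every phase executes exactly once, in RARV order."""
    trace: List[str] = []
    for phase in RARV_PHASES:
        state = handlers[phase](state)  # a missing handler raises KeyError
        trace.append(phase)
    return state, trace


if __name__ == "__main__":
    # Placeholder handlers that only mark their phase as done.
    demo = {phase: (lambda s, p=phase: {**s, p: True}) for phase in RARV_PHASES}
    final_state, order = run_rarv_iteration(demo, {})
    print(" -> ".join(order))
```

Keeping the order in a single tuple means a phase cannot be skipped or reordered by accident when handlers are swapped out.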
Model Selection
- Opus: Planning and architecture ONLY (system design, high-level decisions)
- Sonnet: Development and functional testing (implementation, integration tests)
- Haiku: Unit tests, monitoring, and simple tasks - use extensively for parallelization
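One way to make these rules mechanical is a lookup table from task type to model tier. The task-type keys and the `select_model` helper below are hypothetical names chosen for illustration, and the returned strings are tier labels, not real model identifiers.

```python
# Tier assignments follow the list above; the task-type keys are assumed names.
MODEL_BY_TASK = {
    "planning": "opus",
    "architecture": "opus",
    "development": "sonnet",
    "functional_testing": "sonnet",
    "unit_tests": "haiku",
    "monitoring": "haiku",
    "simple": "haiku",
}


def select_model(task_type: str) -> str:
    """Return the model tier for a task type, rejecting unknown types."""
    try:
        return MODEL_BY_TASK[task_type]
    except KeyError:
        raise ValueError(f"unknown task type: {task_type!r}") from None


if __name__ == "__main__":
    for task in ("architecture", "development", "unit_tests"):
        print(task, "->", select_model(task))
```

Failing on unknown task types, instead of falling back to a default tier, keeps Opus from being selected for work outside planning and architecture.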
Quality Gates
- Static analysis (CodeQL, ESLint)
- 3-reviewer parallel system (blind review)
- Anti-sycophancy checks (devil's advocate on unanimous approval)
- Severity-based blocking (Critical/High/Medium = BLOCK)
- Test coverage gates (>80% unit, 100% pass)
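The blocking rules above could be combined into a single decision function along the following lines. This is a minimal sketch: the `Review` and `GateDecision` types are invented for the example, severities other than Critical/High/Medium are assumed to be non-blocking, and static analysis is left out. See references/quality-control.md for the real rules.

```python
from dataclasses import dataclass, field
from typing import List

BLOCKING_SEVERITIES = {"critical", "high", "medium"}
MIN_UNIT_COVERAGE = 80.0    # unit coverage must be strictly above this
REQUIRED_PASS_RATE = 100.0  # every test must pass


@dataclass
class Review:
    """Hypothetical result from one blind reviewer."""
    approved: bool
    severities: List[str] = field(default_factory=list)


@dataclass
class GateDecision:
    passed: bool
    needs_devils_advocate: bool
    reasons: List[str]


def evaluate_gate(reviews: List[Review], unit_coverage: float, pass_rate: float) -> GateDecision:
    reasons: List[str] = []
    if any(s.lower() in BLOCKING_SEVERITIES for r in reviews for s in r.severities):
        reasons.append("blocking severity finding")
    if unit_coverage <= MIN_UNIT_COVERAGE:
        reasons.append("unit coverage not above 80%")
    if pass_rate < REQUIRED_PASS_RATE:
        reasons.append("test pass rate below 100%")
    # Unanimous approval triggers an extra devil's-advocate pass.
    unanimous = bool(reviews) and all(r.approved for r in reviews)
    return GateDecision(passed=not reasons, needs_devils_advocate=unanimous, reasons=reasons)


if __name__ == "__main__":
    reviews = [Review(True), Review(True), Review(False, ["Medium"])]
    print(evaluate_gate(reviews, unit_coverage=85.0, pass_rate=100.0))
```

The devil's-advocate flag is reported separately from `passed`, so a unanimous approval is re-examined without being treated as a failure.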
Memory System
- Episodic: Specific interaction traces (.loki/memory/episodic/)
- Semantic: Generalized patterns (.loki/memory/semantic/)
- Procedural: Learned skills (.loki/memory/skills/)
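A small helper can keep the three directories in one place. The directory names come from the list above; the `memory_path` function and the entry file name are hypothetical, since the on-disk entry format is described in references/memory-system.md and not here.

```python
from pathlib import PurePosixPath

# Directories as listed above; note that procedural memory is stored under skills/.
MEMORY_DIRS = {
    "episodic": PurePosixPath(".loki/memory/episodic"),
    "semantic": PurePosixPath(".loki/memory/semantic"),
    "procedural": PurePosixPath(".loki/memory/skills"),
}


def memory_path(kind: str, entry_name: str) -> PurePosixPath:
    """Build the path for a memory entry; entry_name is an assumed file name."""
    if kind not in MEMORY_DIRS:
        raise ValueError(f"unknown memory kind: {kind!r}")
    return MEMORY_DIRS[kind] / entry_name


if __name__ == "__main__":
    print(memory_path("procedural", "example-entry.json"))
```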
Metrics System (ToolOrchestra-inspired)
- Efficiency: Task cost tracking (.loki/metrics/efficiency/)
- Rewards: Outcome/efficiency/preference signals (.loki/metrics/rewards/)
Development Guidelines
When Modifying SKILL.md
- Keep under 500 lines (currently ~370)
- Reference detailed docs in references/ instead of inlining
- Update version in header AND footer
- Update CHANGELOG.md with new version entry
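The line limit is easy to check before committing. The sketch below treats "under 500" as strictly fewer than 500 lines; the function names are illustrative and this check is not an existing script in the repository.

```python
SKILL_MAX_LINES = 500  # limit stated in the guideline above


def count_lines(text: str) -> int:
    """Count lines the way a text editor would, ignoring a trailing newline."""
    return len(text.splitlines())


def within_line_limit(text: str, limit: int = SKILL_MAX_LINES) -> bool:
    """True when the text has strictly fewer lines than the limit."""
    return count_lines(text) < limit


if __name__ == "__main__":
    # Assumes the script is run from the repository root, next to SKILL.md.
    with open("SKILL.md", encoding="utf-8") as handle:
        content = handle.read()
    print(count_lines(content), "lines; within limit:", within_line_limit(content))
```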
Version Numbering
Follows semantic versioning: MAJOR.MINOR.PATCH
- Current: v2.35.0
- MINOR bump for new features
- PATCH bump for fixes
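For reference, the bump rules can be written as a short function. The `bump_version` helper is a sketch made up for this example and not part of the repository; it follows standard semantic versioning, where a MINOR bump resets PATCH and a MAJOR bump resets both.

```python
import re

_VERSION_RE = re.compile(r"^v?(\d+)\.(\d+)\.(\d+)$")


def bump_version(version: str, part: str) -> str:
    """Bump a MAJOR.MINOR.PATCH version, preserving an optional leading 'v'."""
    match = _VERSION_RE.match(version)
    if match is None:
        raise ValueError(f"not a MAJOR.MINOR.PATCH version: {version!r}")
    major, minor, patch = (int(group) for group in match.groups())
    if part == "major":
        major, minor, patch = major + 1, 0, 0
    elif part == "minor":
        minor, patch = minor + 1, 0
    elif part == "patch":
        patch += 1
    else:
        raise ValueError(f"unknown version part: {part!r}")
    prefix = "v" if version.startswith("v") else ""
    return f"{prefix}{major}.{minor}.{patch}"


if __name__ == "__main__":
    print(bump_version("v2.35.0", "minor"))  # new feature
    print(bump_version("v2.35.0", "patch"))  # fix
```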
Code Style
- No emojis in code or documentation
- Clear, concise comments only when necessary
- Follow existing patterns in codebase
Testing
# Run benchmarks
./benchmarks/run-benchmarks.sh humaneval --execute --loki
./benchmarks/run-benchmarks.sh swebench --execute --loki
Research Foundation
Built on 2025 research from three major AI labs:
OpenAI:
- Agents SDK (guardrails, tripwires, handoffs, tracing)
- AGENTS.md / Agentic AI Foundation (AAIF) standards
Google DeepMind:
- SIMA 2 (self-improvement, hierarchical reasoning)
- Gemini Robotics (VLA models, planning)
- Dreamer 4 (world model training)
- Scalable Oversight via Debate
Anthropic:
- Constitutional AI (principles-based self-critique)
- Alignment Faking Detection (sleeper agent probes)
- Claude Code Best Practices (Explore-Plan-Code)
Academic:
- CONSENSAGENT (anti-sycophancy)
- GoalAct (hierarchical planning)
- A-Mem/MIRIX (memory systems)
- Multi-Agent Reflexion (MAR)
- NVIDIA ToolOrchestra (efficiency metrics)
See references/openai-patterns.md, references/lab-research-patterns.md, and references/advanced-patterns.md.