Files
claude-skills-reference/marketing-skill/ab-test-setup/SKILL.md
Alireza Rezvani 00ee177dd4 Dev (#269)
* docs: restructure README.md — 2,539 → 209 lines (#247)

- Cut from 2,539 lines / 73 sections to 209 lines / 18 sections
- Consolidated 4 install methods into one unified section
- Moved all skill details to domain-level READMEs (linked from table)
- Front-loaded value prop and keywords for SEO
- Added POWERFUL tier highlight section
- Added skill-security-auditor showcase section
- Removed stale Q4 2025 roadmap, outdated ROI claims, duplicate content
- Fixed all internal links
- Clean heading hierarchy (H2 for main sections only)

Closes #233

Co-authored-by: Leo <leo@openclaw.ai>

* fix: enhance 5 skills with scripts, references, and Anthropic best practices (#248)

* fix(skill): enhance git-worktree-manager with scripts, references, and Anthropic best practices

* fix(skill): enhance mcp-server-builder with scripts, references, and Anthropic best practices

* fix(skill): enhance changelog-generator with scripts, references, and Anthropic best practices

* fix(skill): enhance ci-cd-pipeline-builder with scripts, references, and Anthropic best practices

* fix(skill): enhance prompt-engineer-toolkit with scripts, references, and Anthropic best practices

* docs: update README, CHANGELOG, and plugin metadata

* fix: correct marketing plugin count, expand thin references

---------

Co-authored-by: Leo <leo@openclaw.ai>

* ci: Add VirusTotal security scan for skills (#252)

* Dev (#231)

* Improve senior-fullstack skill description and workflow validation

- Expand frontmatter description with concrete actions and trigger clauses
- Add validation steps to scaffolding workflow (verify scaffold succeeded)
- Add re-run verification step to audit workflow (confirm P0 fixes)

* chore: sync codex skills symlinks [automated]

* fix(skill): normalize senior-fullstack frontmatter to inline format

Normalize YAML description from block scalar (>) to inline single-line
format matching all other 50+ skills. Align frontmatter trigger phrases
with the body's Trigger Phrases section to eliminate duplication.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix(ci): add GITHUB_TOKEN to checkout + restore corrupted skill descriptions

- Add token: ${{ secrets.GITHUB_TOKEN }} to actions/checkout@v4 in
  sync-codex-skills.yml so git-auto-commit-action can push back to branch
  (fixes: fatal: could not read Username, exit 128)
- Restore correct description for incident-commander (was: 'Skill from engineering-team')
- Restore correct description for senior-fullstack (was: '>')

* fix(ci): pass PROJECTS_TOKEN to fix automated commits + remove duplicate checkout

Fixes PROJECTS_TOKEN passthrough for git-auto-commit-action and removes duplicate checkout step in pr-issue-auto-close workflow.

* fix(ci): remove stray merge conflict marker in sync-codex-skills.yml (#221)

Co-authored-by: Leo <leo@leo-agent-server>

* fix(ci): fix workflow errors + add OpenClaw support (#222)

* feat: add 20 new practical skills for professional Claude Code users

New skills across 5 categories:

Engineering (12):
- git-worktree-manager: Parallel dev with port isolation & env sync
- ci-cd-pipeline-builder: Generate GitHub Actions/GitLab CI from stack analysis
- mcp-server-builder: Build MCP servers from OpenAPI specs
- changelog-generator: Conventional commits to structured changelogs
- pr-review-expert: Blast radius analysis & security scan for PRs
- api-test-suite-builder: Auto-generate test suites from API routes
- env-secrets-manager: .env management, leak detection, rotation workflows
- database-schema-designer: Requirements to migrations & types
- codebase-onboarding: Auto-generate onboarding docs from codebase
- performance-profiler: Node/Python/Go profiling & optimization
- runbook-generator: Operational runbooks from codebase analysis
- monorepo-navigator: Turborepo/Nx/pnpm workspace management

Engineering Team (2):
- stripe-integration-expert: Subscriptions, webhooks, billing patterns
- email-template-builder: React Email/MJML transactional email systems

Product Team (3):
- saas-scaffolder: Full SaaS project generation from product brief
- landing-page-generator: High-converting landing pages with copy frameworks
- competitive-teardown: Structured competitive product analysis

Business Growth (1):
- contract-and-proposal-writer: Contracts, SOWs, NDAs per jurisdiction

Marketing (1):
- prompt-engineer-toolkit: Systematic prompt development & A/B testing

Designed for daily professional use and commercial distribution.

* chore: sync codex skills symlinks [automated]

* docs: update README with 20 new skills, counts 65→86, new skills section

* docs: add commercial distribution plan (Stan Store + Gumroad)

* docs: rewrite CHANGELOG.md with v2.0.0 release (65 skills, 9 domains) (#226)

* docs: rewrite CHANGELOG.md with v2.0.0 release (65 skills, 9 domains)

- Consolidate 191 commits since v1.0.2 into proper v2.0.0 entry
- Document 12 POWERFUL-tier skills, 37 refactored skills
- Add new domains: business-growth, finance
- Document Codex support and marketplace integration
- Update version history summary table
- Clean up [Unreleased] to only planned work

* docs: add 24 POWERFUL-tier skills to plugin, fix counts to 85 across all docs

- Add engineering-advanced-skills plugin (24 POWERFUL-tier skills) to marketplace.json
- Add 13 missing skills to CHANGELOG v2.0.0 (agent-workflow-designer, api-test-suite-builder,
  changelog-generator, ci-cd-pipeline-builder, codebase-onboarding, database-schema-designer,
  env-secrets-manager, git-worktree-manager, mcp-server-builder, monorepo-navigator,
  performance-profiler, pr-review-expert, runbook-generator)
- Fix skill count: 86→85 (excl sample-skill) across README, CHANGELOG, marketplace.json
- Fix stale 53→85 references in README
- Add engineering-advanced-skills install command to README
- Update marketplace.json version to 2.0.0

---------

Co-authored-by: Leo <leo@openclaw.ai>

* feat: add skill-security-auditor POWERFUL-tier skill (#230)

Security audit and vulnerability scanner for AI agent skills before installation.

Scans for:
- Code execution risks (eval, exec, os.system, subprocess shell injection)
- Data exfiltration (outbound HTTP, credential harvesting, env var extraction)
- Prompt injection in SKILL.md (system override, role hijack, safety bypass)
- Dependency supply chain (typosquatting, unpinned versions, runtime installs)
- File system abuse (boundary violations, binaries, symlinks, hidden files)
- Privilege escalation (sudo, SUID, cron manipulation, shell config writes)
- Obfuscation (base64, hex encoding, chr chains, codecs)

Produces clear PASS/WARN/FAIL verdict with per-finding remediation guidance.
Supports local dirs, git repo URLs, JSON output, strict mode, and CI/CD integration.

Includes:
- scripts/skill_security_auditor.py (1049 lines, zero dependencies)
- references/threat-model.md (complete attack vector documentation)
- SKILL.md with usage guide and report format

Tested against: rag-architect (PASS), agent-designer (PASS), senior-secops (FAIL - correctly flagged eval/exec patterns).

Co-authored-by: Leo <leo@openclaw.ai>

* docs: add skill-security-auditor to marketplace, README, and CHANGELOG

- Add standalone plugin entry for skill-security-auditor in marketplace.json
- Update engineering-advanced-skills plugin description to include it
- Update skill counts: 85→86 across README, CHANGELOG, marketplace
- Add install command to README Quick Install section
- Add to CHANGELOG [Unreleased] section

---------

Co-authored-by: Baptiste Fernandez <fernandez.baptiste1@gmail.com>
Co-authored-by: alirezarezvani <5697919+alirezarezvani@users.noreply.github.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Co-authored-by: Leo <leo@leo-agent-server>
Co-authored-by: Leo <leo@openclaw.ai>

* Dev (#249)

* docs: restructure README.md — 2,539 → 209 lines (#247)

- Cut from 2,539 lines / 73 sections to 209 lines / 18 sections
- Consolidated 4 install methods into one unified section
- Moved all skill details to domain-level READMEs (linked from table)
- Front-loaded value prop and keywords for SEO
- Added POWERFUL tier highlight section
- Added skill-security-auditor showcase section
- Removed stale Q4 2025 roadmap, outdated ROI claims, duplicate content
- Fixed all internal links
- Clean heading hierarchy (H2 for main sections only)

Closes #233

Co-authored-by: Leo <leo@openclaw.ai>

* fix: enhance 5 skills with scripts, references, and Anthropic best practices (#248)

* fix(skill): enhance git-worktree-manager with scripts, references, and Anthropic best practices

* fix(skill): enhance mcp-server-builder with scripts, references, and Anthropic best practices

* fix(skill): enhance changelog-generator with scripts, references, and Anthropic best practices

* fix(skill): enhance ci-cd-pipeline-builder with scripts, references, and Anthropic best practices

* fix(skill): enhance prompt-engineer-toolkit with scripts, references, and Anthropic best practices

* docs: update README, CHANGELOG, and plugin metadata

* fix: correct marketing plugin count, expand thin references

---------

Co-authored-by: Leo <leo@openclaw.ai>

---------

Co-authored-by: Leo <leo@openclaw.ai>

* Dev (#250)

* docs: restructure README.md — 2,539 → 209 lines (#247)

- Cut from 2,539 lines / 73 sections to 209 lines / 18 sections
- Consolidated 4 install methods into one unified section
- Moved all skill details to domain-level READMEs (linked from table)
- Front-loaded value prop and keywords for SEO
- Added POWERFUL tier highlight section
- Added skill-security-auditor showcase section
- Removed stale Q4 2025 roadmap, outdated ROI claims, duplicate content
- Fixed all internal links
- Clean heading hierarchy (H2 for main sections only)

Closes #233

Co-authored-by: Leo <leo@openclaw.ai>

* fix: enhance 5 skills with scripts, references, and Anthropic best practices (#248)

* fix(skill): enhance git-worktree-manager with scripts, references, and Anthropic best practices

* fix(skill): enhance mcp-server-builder with scripts, references, and Anthropic best practices

* fix(skill): enhance changelog-generator with scripts, references, and Anthropic best practices

* fix(skill): enhance ci-cd-pipeline-builder with scripts, references, and Anthropic best practices

* fix(skill): enhance prompt-engineer-toolkit with scripts, references, and Anthropic best practices

* docs: update README, CHANGELOG, and plugin metadata

* fix: correct marketing plugin count, expand thin references

---------

Co-authored-by: Leo <leo@openclaw.ai>

---------

Co-authored-by: Leo <leo@openclaw.ai>

* ci: add VirusTotal security scan for skills

- Scans changed skill directories on PRs to dev/main
- Scans all skills on release publish
- Posts scan results as PR comment with analysis links
- Rate-limited to 4 req/min (free tier compatible)
- Appends VirusTotal links to release body on publish

* fix: resolve YAML lint errors in virustotal workflow

- Add document start marker (---)
- Quote 'on' key for truthy lint rule
- Remove trailing spaces
- Break long lines under 160 char limit

---------

Co-authored-by: Baptiste Fernandez <fernandez.baptiste1@gmail.com>
Co-authored-by: alirezarezvani <5697919+alirezarezvani@users.noreply.github.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Co-authored-by: Leo <leo@leo-agent-server>
Co-authored-by: Leo <leo@openclaw.ai>

* feat: add playwright-pro plugin — production-grade Playwright testing toolkit (#254)

Complete Claude Code plugin with:
- 9 skills (/pw:init, generate, review, fix, migrate, coverage, testrail, browserstack, report)
- 3 specialized agents (test-architect, test-debugger, migration-planner)
- 55 test case templates across 11 categories (auth, CRUD, checkout, search, forms, dashboard, settings, onboarding, notifications, API, accessibility)
- TestRail MCP server (TypeScript) — 8 tools for bidirectional sync
- BrowserStack MCP server (TypeScript) — 7 tools for cross-browser testing
- Smart hooks (auto-validate tests, auto-detect Playwright projects)
- 6 curated reference docs (golden rules, locators, assertions, fixtures, pitfalls, flaky tests)
- Leverages Claude Code built-ins (/batch, /debug, Explore subagent)
- Zero-config for core features; TestRail/BrowserStack via env vars
- Both TypeScript and JavaScript support throughout

Co-authored-by: Leo <leo@openclaw.ai>

* feat: add playwright-pro to marketplace registry (#256)

- New plugin: playwright-pro (9 skills, 3 agents, 55 templates, 2 MCP servers)
- Install: /plugin install playwright-pro@claude-code-skills
- Total marketplace plugins: 17

Co-authored-by: Leo <leo@openclaw.ai>

* fix: integrate playwright-pro across all platforms (#258)

- Add root SKILL.md for OpenClaw and ClawHub compatibility
- Add to README: Skills Overview table, install section, badge count
- Regenerate .codex/skills-index.json with playwright-pro entry
- Add .codex/skills/playwright-pro symlink for Codex CLI
- Fix YAML frontmatter (single-line description for index parsing)

Platforms verified:
- Claude Code: marketplace.json  (merged in PR #256)
- Codex CLI: symlink + skills-index.json 
- OpenClaw: SKILL.md auto-discovered by install script 
- ClawHub: published as playwright-pro@1.1.0 

Co-authored-by: Leo <leo@openclaw.ai>

* docs: update CLAUDE.md — reflect 87 skills across 9 domains

Sync CLAUDE.md with actual repository state: add Engineering POWERFUL tier
(25 skills), update all skill counts, add plugin registry references, and
replace stale sprint section with v2.0.0 version info.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* docs: mention Claude Code in project description

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* feat: add self-improving-agent plugin — auto-memory curation for Claude Code (#260)

New plugin: engineering-team/self-improving-agent/
- 5 skills: /si:review, /si:promote, /si:extract, /si:status, /si:remember
- 2 agents: memory-analyst, skill-extractor
- 1 hook: PostToolUse error capture (zero overhead on success)
- 3 reference docs: memory architecture, promotion rules, rules directory patterns
- 2 templates: rule template, skill template
- 20 files, 1,829 lines

Integrates natively with Claude Code's auto-memory (v2.1.32+).
Reads from ~/.claude/projects/<path>/memory/ — no duplicate storage.
Promotes proven patterns from MEMORY.md to CLAUDE.md or .claude/rules/.

Also:
- Added to marketplace.json (18 plugins total)
- Added to README (Skills Overview + install section)
- Updated badge count to 88+
- Regenerated .codex/skills-index.json + symlink

Co-authored-by: Leo <leo@openclaw.ai>

* feat: C-Suite expansion — 8 new executive advisory roles (2→10) (#264)

* feat: C-Suite expansion — 8 new executive advisory roles

Add COO, CPO, CMO, CFO, CRO, CISO, CHRO advisors and Executive Mentor.
Expands C-level advisory from 2 to 10 roles with 74 total files.

Each role includes:
- SKILL.md (lean, <5KB, ~1200 tokens for context efficiency)
- Reference docs (loaded on demand, not at startup)
- Python analysis scripts (stdlib only, runnable CLI)

Executive Mentor features /em: slash commands (challenge, board-prep,
hard-call, stress-test, postmortem) with devil's advocate agent.

21 Python tools, 24 reference frameworks, 28,379 total lines.
All SKILL.md files combined: ~17K tokens (8.5% of 200K context window).

Badge: 88 → 116 skills

* feat: C-Suite orchestration layer + 18 complementary skills

ORCHESTRATION (new):
- cs-onboard: Founder interview → company-context.md
- chief-of-staff: Routing, synthesis, inter-agent orchestration
- board-meeting: 6-phase multi-agent deliberation protocol
- decision-logger: Two-layer memory (raw transcripts + approved decisions)
- agent-protocol: Inter-agent invocation with loop prevention
- context-engine: Company context loading + anonymization

CROSS-CUTTING CAPABILITIES (new):
- board-deck-builder: Board/investor update assembly
- scenario-war-room: Cascading multi-variable what-if modeling
- competitive-intel: Systematic competitor tracking + battlecards
- org-health-diagnostic: Cross-functional health scoring (8 dimensions)
- ma-playbook: M&A strategy (acquiring + being acquired)
- intl-expansion: International market entry frameworks

CULTURE & COLLABORATION (new):
- culture-architect: Values → behaviors, culture code, health assessment
- company-os: EOS/Scaling Up operating system selection + implementation
- founder-coach: Founder development, delegation, blind spots
- strategic-alignment: Strategy cascade, silo detection, alignment scoring
- change-management: ADKAR-based change rollout framework
- internal-narrative: One story across employees/investors/customers

UPGRADES TO EXISTING ROLES:
- All 10 roles get reasoning technique directives
- All 10 roles get company-context.md integration
- All 10 roles get board meeting isolation rules
- CEO gets stage-adaptive temporal horizons (seed→C)

Key design decisions:
- Two-layer memory prevents hallucinated consensus from rejected ideas
- Phase 2 isolation: agents think independently before cross-examination
- Executive Mentor (The Critic) sees all perspectives, others don't
- 25 Python tools total (stdlib only, no dependencies)

52 new files, 10 modified, 10,862 new lines.
Total C-suite ecosystem: 134 files, 39,131 lines.

* fix: connect all dots — Chief of Staff routes to all 28 skills

- Added complementary skills registry to routing-matrix.md
- Chief of Staff SKILL.md now lists all 28 skills in ecosystem
- Added integration tables to scenario-war-room and competitive-intel
- Badge: 116 → 134 skills
- README: C-Level Advisory count 10 → 28

Quality audit passed:
 All 10 roles: company-context, reasoning, isolation, invocation
 All 6 phases in board meeting
 Two-layer memory with DO_NOT_RESURFACE
 Loop prevention (no self-invoke, max depth 2, no circular)
 All /em: commands present
 All complementary skills cross-reference roles
 Chief of Staff routes to every skill in ecosystem

* refactor: CEO + CTO advisors upgraded to C-suite parity

Both roles now match the structural standard of all new roles:
- CEO: 11.7KB → 6.8KB SKILL.md (heavy content stays in references)
- CTO: 10KB → 7.2KB SKILL.md (heavy content stays in references)

Added to both:
- Integration table (who they work with and when)
- Key diagnostic questions
- Structured metrics dashboard table
- Consistent section ordering (Keywords → Quick Start → Responsibilities → Questions → Metrics → Red Flags → Integration → Reasoning → Context)

CEO additions:
- Stage-adaptive temporal horizons (seed=3m/6m/12m → B+=1y/3y/5y)
- Cross-references to culture-architect and board-deck-builder

CTO additions:
- Key Questions section (7 diagnostic questions)
- Structured metrics table (DORA + debt + team + architecture + cost)
- Cross-references to all peer roles

All 10 roles now pass structural parity:  Keywords  QuickStart  Questions  Metrics  RedFlags  Integration

* feat: add proactive triggers + output artifacts to all 10 roles

Every C-suite role now specifies:
- Proactive Triggers: 'surface these without being asked' — context-driven
  early warnings that make advisors proactive, not reactive
- Output Artifacts: concrete deliverables per request type (what you ask →
  what you get)

CEO: runway alerts, board prep triggers, strategy review nudges
CTO: deploy frequency monitoring, tech debt thresholds, bus factor flags
COO: blocker detection, scaling threshold warnings, cadence gaps
CPO: retention curve monitoring, portfolio dog detection, research gaps
CMO: CAC trend monitoring, positioning gaps, budget staleness
CFO: runway forecasting, burn multiple alerts, scenario planning gaps
CRO: NRR monitoring, pipeline coverage, pricing review triggers
CISO: audit overdue alerts, compliance gaps, vendor risk
CHRO: retention risk, comp band gaps, org scaling thresholds
Executive Mentor: board prep triggers, groupthink detection, hard call surfacing

This transforms the C-suite from reactive advisors into proactive partners.

* feat: User Communication Standard — structured output for all roles

Defines 3 output formats in agent-protocol/SKILL.md:

1. Standard Output: Bottom Line → What → Why → How to Act → Risks → Your Decision
2. Proactive Alert: What I Noticed → Why It Matters → Action → Urgency (🔴🟡)
3. Board Meeting: Decision Required → Perspectives → Agree/Disagree → Critic → Action Items

10 non-negotiable rules:
- Bottom line first, always
- Results and decisions only (no process narration)
- What + Why + How for every finding
- Actions have owners and deadlines ('we should consider' is banned)
- Decisions framed as options with trade-offs
- Founder is the highest authority — roles recommend, founder decides
- Risks are concrete (if X → Y, costs $Z)
- Max 5 bullets per section
- No jargon without explanation
- Silence over fabricated updates

All 10 roles reference this standard.
Chief of Staff enforces it as a quality gate.
Board meeting Phase 4 uses the Board Meeting Output format.

* feat: Internal Quality Loop — verification before delivery

No role presents to the founder without passing verification:

Step 1: Self-Verification (every role, every time)
  - Source attribution: where did each data point come from?
  - Assumption audit: [VERIFIED] vs [ASSUMED] tags on every finding
  - Confidence scoring: 🟢 high / 🟡 medium / 🔴 low per finding
  - Contradiction check against company-context + decision log
  - 'So what?' test: every finding needs a business consequence

Step 2: Peer Verification (cross-functional)
  - Financial claims → CFO validates math
  - Revenue projections → CRO validates pipeline backing
  - Technical feasibility → CTO validates
  - People/hiring impact → CHRO validates
  - Skip for single-domain, low-stakes questions

Step 3: Critic Pre-Screen (high-stakes only)
  - Irreversible decisions, >20% runway impact, strategy changes
  - Executive Mentor finds weakest point before founder sees it
  - Suspicious consensus triggers mandatory pre-screen

Step 4: Course Correction (after founder feedback)
  - Approve → log + assign actions
  - Modify → re-verify changed parts
  - Reject → DO_NOT_RESURFACE + learn why
  - 30/60/90 day post-decision review

Board meeting contributions now require self-verified format with
confidence tags and source attribution on every finding.

* fix: resolve PR review issues 1, 4, and minor observation

Issue 1: c-level-advisor/CLAUDE.md — completely rewritten
  - Was: 2 skills (CEO, CTO only), dated Nov 2025
  - Now: full 28-skill ecosystem map with architecture diagram,
    all roles/orchestration/cross-cutting/culture skills listed,
    design decisions, integration with other domains

Issue 4: Root CLAUDE.md — updated all stale counts
  - 87 → 134 skills across all 3 references
  - C-Level: 2 → 33 (10 roles + 5 mentor commands + 18 complementary)
  - Tool count: 160+ → 185+
  - Reference count: 200+ → 250+

Minor observation: Documented plugin.json convention
  - Explained in c-level-advisor/CLAUDE.md that only executive-mentor
    has plugin.json because only it has slash commands (/em: namespace)
  - Other skills are invoked by name through Chief of Staff or directly

Also fixed: README.md 88+ → 134 in two places (first line + skills section)

* fix: update all plugin/index registrations for 28-skill C-suite

1. c-level-advisor/.claude-plugin/plugin.json — v2.0.0
   - Was: 2 skills, generic description
   - Now: all 28 skills listed with descriptions, all 25 scripts,
     namespace 'cs', full ecosystem description

2. .codex/skills-index.json — added 18 complementary skills
   - Was: 10 roles only
   - Now: 28 total c-level entries (10 roles + 6 orchestration +
     6 cross-cutting + 6 culture)
   - Each with full description for skill discovery

3. .claude-plugin/marketplace.json — updated c-level-skills entry
   - Was: generic 2-skill description
   - Now: v2.0.0, full 28-skill ecosystem description,
     skills_count: 28, scripts_count: 25

* feat: add root SKILL.md for c-level-advisor ClawHub package

---------

Co-authored-by: Leo <leo@openclaw.ai>

* chore: sync codex skills symlinks [automated]

* feat: Marketing Division expansion — 7 → 42 skills (#266)

* feat: Skill Authoring Standard + Marketing Expansion plans

SKILL-AUTHORING-STANDARD.md — the DNA of every skill in this repo:
10 universal patterns codified from C-Suite innovations + Corey Haines' marketingskills patterns:

1. Context-First: check domain context, ask only for gaps
2. Practitioner Voice: expert persona, goal-oriented, not textbook
3. Multi-Mode Workflows: build from scratch / optimize existing / situation-specific
4. Related Skills Navigation: when to use, when NOT to, bidirectional
5. Reference Separation: SKILL.md lean (≤10KB), refs deep
6. Proactive Triggers: surface issues without being asked
7. Output Artifacts: request → specific deliverable mapping
8. Quality Loop: self-verify, confidence tagging
9. Communication Standard: bottom line first, structured output
10. Python Tools: stdlib-only, CLI-first, JSON output, sample data

Marketing expansion plans for 40-skill marketing division build.

* feat: marketing foundation — context + ops router + authoring standard

marketing-context/: Foundation skill every marketing skill reads first
  - SKILL.md: 3 modes (auto-draft, guided interview, update)
  - templates/marketing-context-template.md: 14 sections covering
    product, audience, personas, pain points, competitive landscape,
    differentiation, objections, switching dynamics, customer language
    (verbatim), brand voice, style guide, proof points, SEO context, goals
  - scripts/context_validator.py: Scores completeness 0-100, section-by-section

marketing-ops/: Central router for 40-skill marketing ecosystem
  - Full routing matrix: 7 pods + cross-domain routing to 6 skills in
    business-growth, product-team, engineering-team, c-level-advisor
  - Campaign orchestration sequences (launch, content, CRO sprint)
  - Quality gate matching C-Suite standard
  - scripts/campaign_tracker.py: Campaign status tracking with progress,
    overdue detection, pod coverage, blocker identification

SKILL-AUTHORING-STANDARD.md: Universal DNA for all skills
  - 10 patterns: context-first, practitioner voice, multi-mode workflows,
    related skills navigation, reference separation, proactive triggers,
    output artifacts, quality loop, communication standard, python tools
  - Quality checklist for skill completion verification
  - Domain context file mapping for all 5 domains

* feat: import 20 workspace marketing skills + standard sections

Imported 20 marketing skills from OpenClaw workspace into repo:

Content Pod (5):
  content-strategy, copywriting, copy-editing, social-content, marketing-ideas

SEO Pod (2):
  seo-audit (+ references enriched by subagent), programmatic-seo (+ refs)

CRO Pod (5):
  page-cro, form-cro, signup-flow-cro, onboarding-cro, popup-cro, paywall-upgrade-cro

Channels Pod (2):
  email-sequence, paid-ads

Growth + Intel + GTM (5):
  ab-test-setup, competitor-alternatives, marketing-psychology, launch-strategy, brand-guidelines

All 29 skills now have standard sections per SKILL-AUTHORING-STANDARD.md:
   Proactive Triggers (4-5 per skill)
   Output Artifacts table
   Communication standard reference
   Related Skills with WHEN/NOT disambiguation

Subagents enriched 8 skills with additional reference docs:
  seo-audit, programmatic-seo, page-cro, form-cro,
  onboarding-cro, popup-cro, paywall-upgrade-cro, email-sequence

43 files, 10,566 lines added.

* feat: build 13 new marketing skills + social-media-manager upgrade

All skills are 100% original work — inspired by industry best practices,
written from scratch in our own voice following SKILL-AUTHORING-STANDARD.md.

NEW Content Pod (2):
  content-production — full research→draft→optimize pipeline, content_scorer.py
  content-humanizer — AI pattern detection + voice injection, humanizer_scorer.py

NEW SEO Pod (3):
  ai-seo — AI search optimization (AEO/GEO/LLMO), entirely new category
  schema-markup — JSON-LD structured data, schema_validator.py
  site-architecture — URL structure + internal linking, sitemap_analyzer.py

NEW Channels Pod (2):
  cold-email — B2B outreach (distinct from email-sequence lifecycle)
  ad-creative — bulk ad generation + platform specs, ad_copy_validator.py

NEW Growth Pod (3):
  churn-prevention — cancel flows + save offers + dunning, churn_impact_calculator.py
  referral-program — referral + affiliate programs
  free-tool-strategy — engineering as marketing

NEW Intelligence Pod (1):
  analytics-tracking — GA4/GTM setup + event taxonomy, tracking_plan_generator.py

NEW Sales Pod (1):
  pricing-strategy — pricing, packaging, monetization

UPGRADED:
  social-media-analyzer → social-media-manager (strategy, calendar, community)

Totals: 42 skills, 27 Python scripts, 60 reference docs, 163 files, 43,265 lines

* feat: update index, marketplace, README for 42 marketing skills

- skills-index.json: 89 → 124 skills (42 marketing entries)
- marketplace.json: marketing-skills v2.0.0 (42 skills, 27 tools)
- README.md: badge 134 → 169, marketing row updated
- prompt-engineer-toolkit: added YAML frontmatter
- Removed build logs from repo
- Parity check: 42/42 passed (YAML + Related + Proactive + Output + Communication)

* fix: merge content-creator into content-production, split marketing-psychology

Quality audit fixes:

1. content-creator → DEPRECATED redirect
   - Scripts (brand_voice_analyzer.py, seo_optimizer.py) moved to content-production
   - SKILL.md replaced with redirect to content-production + content-strategy
   - Eliminates duplicate routing confusion

2. marketing-psychology → 24KB split to 6.8KB + reference
   - 70+ mental models moved to references/mental-models-catalog.md (397 lines)
   - SKILL.md now lean: categories overview, most-used models, quick reference
   - Saves ~4,300 tokens per invocation

* feat: add plugin configs, Codex/OpenClaw compatibility, ClawHub packaging

- marketing-skill/SKILL.md: ClawHub-compatible root with Quick Start for Claude Code, Codex CLI, OpenClaw
- marketing-skill/CLAUDE.md: Agent instructions (routing, context, anti-patterns)
- marketing-skill/.codex/instructions.md: Codex CLI skill routing
- .claude-plugin/marketplace.json: deduplicated, marketing-skills v2.0.0
- .codex/skills-index.json: content-creator marked deprecated, psychology updated
- Total: 42 skills, 27 Python tools, 60 references, 18 plugins

* feat: add 16 Python tools to knowledge-only skills

Enriched 12 previously tool-less skills with practical Python scripts:
- seo-audit/seo_checker.py — HTML on-page SEO analysis (0-100)
- copywriting/headline_scorer.py — headline quality scoring (0-100)
- copy-editing/readability_scorer.py — Flesch + passive + filler detection
- content-strategy/topic_cluster_mapper.py — keyword clustering
- page-cro/conversion_audit.py — HTML CRO signal analysis (0-100)
- paid-ads/roas_calculator.py — ROAS/CPA/CPL calculator
- email-sequence/sequence_analyzer.py — email sequence scoring (0-100)
- form-cro/form_field_analyzer.py — form field CRO audit (0-100)
- onboarding-cro/activation_funnel_analyzer.py — funnel drop-off analysis
- programmatic-seo/url_pattern_generator.py — URL pattern planning
- ab-test-setup/sample_size_calculator.py — statistical sample sizing
- signup-flow-cro/funnel_drop_analyzer.py — signup funnel analysis
- launch-strategy/launch_readiness_scorer.py — launch checklist scoring
- competitor-alternatives/comparison_matrix_builder.py — feature comparison
- social-media-manager/social_calendar_generator.py — content calendar
- readability_scorer.py — fixed demo mode for non-TTY execution

All 43/43 scripts pass execution. All stdlib-only, zero pip installs.
Total: 42 skills, 43 Python tools, 60+ reference docs.

* feat: add 3 more Python tools + improve 6 existing scripts

New tools from build agent:
- email-sequence/scripts/sequence_analyzer.py — email sequence scoring (91/100 demo)
- paid-ads/scripts/roas_calculator.py — ROAS/CPA/CPL/break-even calculator
- competitor-alternatives/scripts/comparison_matrix_builder.py — feature matrix

Improved scripts (better demo modes, fuller analysis):
- seo_checker.py, headline_scorer.py, readability_scorer.py,
  conversion_audit.py, topic_cluster_mapper.py, launch_readiness_scorer.py

Total: 42 skills, 47 Python tools, all passing.

* fix: remove duplicate scripts from deprecated content-creator

Scripts already live in content-production/scripts/. The content-creator
directory is now a pure redirect (SKILL.md only + legacy assets/refs).

* fix: scope VirusTotal scan to executable files only

Skip scanning .md, .py, .json, .yml — they're plain text files
that VirusTotal can't meaningfully analyze. This prevents 429 rate
limit errors on PRs with many text file changes (like 42 marketing skills).

Scan still covers: .js, .ts, .sh, .mjs, .cjs, .exe, .dll, .so, .bin, .wasm

---------

Co-authored-by: Leo <leo@openclaw.ai>

* chore: sync codex skills symlinks [automated]

---------

Co-authored-by: Leo <leo@openclaw.ai>
Co-authored-by: Baptiste Fernandez <fernandez.baptiste1@gmail.com>
Co-authored-by: alirezarezvani <5697919+alirezarezvani@users.noreply.github.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Co-authored-by: Leo <leo@leo-agent-server>
2026-03-06 03:58:32 +01:00

9.2 KiB

name, description, license, metadata
name description license metadata
ab-test-setup When the user wants to plan, design, or implement an A/B test or experiment. Also use when the user mentions "A/B test," "split test," "experiment," "test this change," "variant copy," "multivariate test," "hypothesis," "conversion experiment," "statistical significance," or "test this." For tracking implementation, see analytics-tracking. MIT
version author category updated
1.0.0 Alireza Rezvani marketing 2026-03-06

A/B Test Setup

You are an expert in experimentation and A/B testing. Your goal is to help design tests that produce statistically valid, actionable results.

Initial Assessment

Check for product marketing context first: If .claude/product-marketing-context.md exists, read it before asking questions. Use that context and only ask for information not already covered or specific to this task.

Before designing a test, understand:

  1. Test Context - What are you trying to improve? What change are you considering?
  2. Current State - Baseline conversion rate? Current traffic volume?
  3. Constraints - Technical complexity? Timeline? Tools available?

Core Principles

1. Start with a Hypothesis

  • Not just "let's see what happens"
  • Specific prediction of outcome
  • Based on reasoning or data

2. Test One Thing

  • Single variable per test
  • Otherwise you don't know what worked

3. Statistical Rigor

  • Pre-determine sample size
  • Don't peek and stop early
  • Commit to the methodology

4. Measure What Matters

  • Primary metric tied to business value
  • Secondary metrics for context
  • Guardrail metrics to prevent harm

Hypothesis Framework

Structure

Because [observation/data],
we believe [change]
will cause [expected outcome]
for [audience].
We'll know this is true when [metrics].

Example

Weak: "Changing the button color might increase clicks."

Strong: "Because users report difficulty finding the CTA (per heatmaps and feedback), we believe making the button larger and using contrasting color will increase CTA clicks by 15%+ for new visitors. We'll measure click-through rate from page view to signup start."


Test Types

Type Description Traffic Needed
A/B Two versions, single change Moderate
A/B/n Multiple variants Higher
MVT Multiple changes in combinations Very high
Split URL Different URLs for variants Moderate

Sample Size

Quick Reference

Baseline 10% Lift 20% Lift 50% Lift
1% 150k/variant 39k/variant 6k/variant
3% 47k/variant 12k/variant 2k/variant
5% 27k/variant 7k/variant 1.2k/variant
10% 12k/variant 3k/variant 550/variant

Calculators:

For detailed sample size tables and duration calculations: See references/sample-size-guide.md


Metrics Selection

Primary Metric

  • Single metric that matters most
  • Directly tied to hypothesis
  • What you'll use to call the test

Secondary Metrics

  • Support primary metric interpretation
  • Explain why/how the change worked

Guardrail Metrics

  • Things that shouldn't get worse
  • Stop test if significantly negative

Example: Pricing Page Test

  • Primary: Plan selection rate
  • Secondary: Time on page, plan distribution
  • Guardrail: Support tickets, refund rate

Designing Variants

What to Vary

Category Examples
Headlines/Copy Message angle, value prop, specificity, tone
Visual Design Layout, color, images, hierarchy
CTA Button copy, size, placement, number
Content Information included, order, amount, social proof

Best Practices

  • Single, meaningful change
  • Bold enough to make a difference
  • True to the hypothesis

Traffic Allocation

Approach Split When to Use
Standard 50/50 Default for A/B
Conservative 90/10, 80/20 Limit risk of bad variant
Ramping Start small, increase Technical risk mitigation

Considerations:

  • Consistency: Users see same variant on return
  • Balanced exposure across time of day/week

Implementation

Client-Side

  • JavaScript modifies page after load
  • Quick to implement, can cause flicker
  • Tools: PostHog, Optimizely, VWO

Server-Side

  • Variant determined before render
  • No flicker, requires dev work
  • Tools: PostHog, LaunchDarkly, Split

Running the Test

Pre-Launch Checklist

  • Hypothesis documented
  • Primary metric defined
  • Sample size calculated
  • Variants implemented correctly
  • Tracking verified
  • QA completed on all variants

During the Test

DO:

  • Monitor for technical issues
  • Check segment quality
  • Document external factors

DON'T:

  • Peek at results and stop early
  • Make changes to variants
  • Add traffic from new sources

The Peeking Problem

Looking at results before reaching sample size and stopping early leads to false positives and wrong decisions. Pre-commit to sample size and trust the process.


Analyzing Results

Statistical Significance

  • 95% confidence = p-value < 0.05
  • Means <5% chance result is random
  • Not a guarantee—just a threshold

Analysis Checklist

  1. Reach sample size? If not, result is preliminary
  2. Statistically significant? Check confidence intervals
  3. Effect size meaningful? Compare to MDE, project impact
  4. Secondary metrics consistent? Support the primary?
  5. Guardrail concerns? Anything get worse?
  6. Segment differences? Mobile vs. desktop? New vs. returning?

Interpreting Results

Result Conclusion
Significant winner Implement variant
Significant loser Keep control, learn why
No significant difference Need more traffic or bolder test
Mixed signals Dig deeper, maybe segment

Documentation

Document every test with:

  • Hypothesis
  • Variants (with screenshots)
  • Results (sample, metrics, significance)
  • Decision and learnings

For templates: See references/test-templates.md


Common Mistakes

Test Design

  • Testing too small a change (undetectable)
  • Testing too many things (can't isolate)
  • No clear hypothesis

Execution

  • Stopping early
  • Changing things mid-test
  • Not checking implementation

Analysis

  • Ignoring confidence intervals
  • Cherry-picking segments
  • Over-interpreting inconclusive results

Task-Specific Questions

  1. What's your current conversion rate?
  2. How much traffic does this page get?
  3. What change are you considering and why?
  4. What's the smallest improvement worth detecting?
  5. What tools do you have for testing?
  6. Have you tested this area before?

Proactive Triggers

Proactively offer A/B test design when:

  1. Conversion rate mentioned — User shares a conversion rate and asks how to improve it; suggest designing a test rather than guessing at solutions.
  2. Copy or design decision is unclear — When two variants of a headline, CTA, or layout are being debated, propose testing instead of opinionating.
  3. Campaign underperformance — User reports a landing page or email performing below expectations; offer a structured test plan.
  4. Pricing page discussion — Any mention of pricing page changes should trigger an offer to design a pricing test with guardrail metrics.
  5. Post-launch review — After a feature or campaign goes live, propose follow-up experiments to optimize the result.

Output Artifacts

Artifact Format Description
Experiment Brief Markdown doc Hypothesis, variants, metrics, sample size, duration, owner
Sample Size Calculator Input Table Baseline rate, MDE, confidence level, power
Pre-Launch QA Checklist Checklist Implementation, tracking, variant rendering verification
Results Analysis Report Markdown doc Statistical significance, effect size, segment breakdown, decision
Test Backlog Prioritized list Ranked experiments by expected impact and feasibility

Communication

All outputs should meet the quality standard: clear hypothesis, pre-registered metrics, and documented decisions. Avoid presenting inconclusive results as wins. Every test should produce a learning, even if the variant loses. Reference marketing-context for product and audience framing before designing experiments.


  • page-cro — USE when you need ideas for what to test; NOT when you already have a hypothesis and just need test design.
  • analytics-tracking — USE to set up measurement infrastructure before running tests; NOT as a substitute for defining primary metrics upfront.
  • campaign-analytics — USE after tests conclude to fold results into broader campaign attribution; NOT during the test itself.
  • pricing-strategy — USE when test results affect pricing decisions; NOT to replace a controlled test with pure strategic reasoning.
  • marketing-context — USE as foundation before any test design to ensure hypotheses align with ICP and positioning; always load first.