claude-skills-reference/marketing-skill/content-strategy/scripts/topic_cluster_mapper.py
Alireza Rezvani 52321c86bc feat: Marketing Division expansion — 7 → 42 skills (#266)
* feat: Skill Authoring Standard + Marketing Expansion plans

SKILL-AUTHORING-STANDARD.md — the DNA of every skill in this repo:
10 universal patterns codified from C-Suite innovations + Corey Haines' marketingskills patterns:

1. Context-First: check domain context, ask only for gaps
2. Practitioner Voice: expert persona, goal-oriented, not textbook
3. Multi-Mode Workflows: build from scratch / optimize existing / situation-specific
4. Related Skills Navigation: when to use, when NOT to, bidirectional
5. Reference Separation: SKILL.md lean (≤10KB), refs deep
6. Proactive Triggers: surface issues without being asked
7. Output Artifacts: request → specific deliverable mapping
8. Quality Loop: self-verify, confidence tagging
9. Communication Standard: bottom line first, structured output
10. Python Tools: stdlib-only, CLI-first, JSON output, sample data

Marketing expansion plans for 40-skill marketing division build.

* feat: marketing foundation — context + ops router + authoring standard

marketing-context/: Foundation skill every marketing skill reads first
  - SKILL.md: 3 modes (auto-draft, guided interview, update)
  - templates/marketing-context-template.md: 14 sections covering
    product, audience, personas, pain points, competitive landscape,
    differentiation, objections, switching dynamics, customer language
    (verbatim), brand voice, style guide, proof points, SEO context, goals
  - scripts/context_validator.py: Scores completeness 0-100, section-by-section

marketing-ops/: Central router for 40-skill marketing ecosystem
  - Full routing matrix: 7 pods + cross-domain routing to 6 skills in
    business-growth, product-team, engineering-team, c-level-advisor
  - Campaign orchestration sequences (launch, content, CRO sprint)
  - Quality gate matching C-Suite standard
  - scripts/campaign_tracker.py: Campaign status tracking with progress,
    overdue detection, pod coverage, blocker identification

SKILL-AUTHORING-STANDARD.md: Universal DNA for all skills
  - 10 patterns: context-first, practitioner voice, multi-mode workflows,
    related skills navigation, reference separation, proactive triggers,
    output artifacts, quality loop, communication standard, python tools
  - Quality checklist for skill completion verification
  - Domain context file mapping for all 5 domains

* feat: import 20 workspace marketing skills + standard sections

Imported 20 marketing skills from OpenClaw workspace into repo:

Content Pod (5):
  content-strategy, copywriting, copy-editing, social-content, marketing-ideas

SEO Pod (2):
  seo-audit (+ references enriched by subagent), programmatic-seo (+ refs)

CRO Pod (6):
  page-cro, form-cro, signup-flow-cro, onboarding-cro, popup-cro, paywall-upgrade-cro

Channels Pod (2):
  email-sequence, paid-ads

Growth + Intel + GTM (5):
  ab-test-setup, competitor-alternatives, marketing-psychology, launch-strategy, brand-guidelines

All 29 skills now have standard sections per SKILL-AUTHORING-STANDARD.md:
   Proactive Triggers (4-5 per skill)
   Output Artifacts table
   Communication standard reference
   Related Skills with WHEN/NOT disambiguation

Subagents enriched 8 skills with additional reference docs:
  seo-audit, programmatic-seo, page-cro, form-cro,
  onboarding-cro, popup-cro, paywall-upgrade-cro, email-sequence

43 files, 10,566 lines added.

* feat: build 13 new marketing skills + social-media-manager upgrade

All skills are 100% original work — inspired by industry best practices,
written from scratch in our own voice following SKILL-AUTHORING-STANDARD.md.

NEW Content Pod (2):
  content-production — full research→draft→optimize pipeline, content_scorer.py
  content-humanizer — AI pattern detection + voice injection, humanizer_scorer.py

NEW SEO Pod (3):
  ai-seo — AI search optimization (AEO/GEO/LLMO), entirely new category
  schema-markup — JSON-LD structured data, schema_validator.py
  site-architecture — URL structure + internal linking, sitemap_analyzer.py

NEW Channels Pod (2):
  cold-email — B2B outreach (distinct from email-sequence lifecycle)
  ad-creative — bulk ad generation + platform specs, ad_copy_validator.py

NEW Growth Pod (3):
  churn-prevention — cancel flows + save offers + dunning, churn_impact_calculator.py
  referral-program — referral + affiliate programs
  free-tool-strategy — engineering as marketing

NEW Intelligence Pod (1):
  analytics-tracking — GA4/GTM setup + event taxonomy, tracking_plan_generator.py

NEW Sales Pod (1):
  pricing-strategy — pricing, packaging, monetization

UPGRADED:
  social-media-analyzer → social-media-manager (strategy, calendar, community)

Totals: 42 skills, 27 Python scripts, 60 reference docs, 163 files, 43,265 lines

* feat: update index, marketplace, README for 42 marketing skills

- skills-index.json: 89 → 124 skills (42 marketing entries)
- marketplace.json: marketing-skills v2.0.0 (42 skills, 27 tools)
- README.md: badge 134 → 169, marketing row updated
- prompt-engineer-toolkit: added YAML frontmatter
- Removed build logs from repo
- Parity check: 42/42 passed (YAML + Related + Proactive + Output + Communication)

* fix: merge content-creator into content-production, split marketing-psychology

Quality audit fixes:

1. content-creator → DEPRECATED redirect
   - Scripts (brand_voice_analyzer.py, seo_optimizer.py) moved to content-production
   - SKILL.md replaced with redirect to content-production + content-strategy
   - Eliminates duplicate routing confusion

2. marketing-psychology → 24KB split to 6.8KB + reference
   - 70+ mental models moved to references/mental-models-catalog.md (397 lines)
   - SKILL.md now lean: categories overview, most-used models, quick reference
   - Saves ~4,300 tokens per invocation
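The ~4,300 figure is consistent with a rough rule of about 4 bytes per token. That ratio, and treating 1KB as 1,000 bytes, are assumptions made here for illustration; the commit does not state how the number was derived.

```python
# Rough token-savings estimate for the SKILL.md split (24KB -> 6.8KB).
# BYTES_PER_TOKEN is an assumed average for English text, not a measured value.
BYTES_PER_TOKEN = 4

before_bytes = 24_000  # 24KB, assuming 1KB = 1,000 bytes
after_bytes = 6_800    # 6.8KB

saved_bytes = before_bytes - after_bytes
saved_tokens = saved_bytes // BYTES_PER_TOKEN

print(saved_tokens)  # 4300
```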

* feat: add plugin configs, Codex/OpenClaw compatibility, ClawHub packaging

- marketing-skill/SKILL.md: ClawHub-compatible root with Quick Start for Claude Code, Codex CLI, OpenClaw
- marketing-skill/CLAUDE.md: Agent instructions (routing, context, anti-patterns)
- marketing-skill/.codex/instructions.md: Codex CLI skill routing
- .claude-plugin/marketplace.json: deduplicated, marketing-skills v2.0.0
- .codex/skills-index.json: content-creator marked deprecated, psychology updated
- Total: 42 skills, 27 Python tools, 60 references, 18 plugins

* feat: add 16 Python tools to knowledge-only skills

Enriched 12 previously tool-less skills with practical Python scripts:
- seo-audit/seo_checker.py — HTML on-page SEO analysis (0-100)
- copywriting/headline_scorer.py — headline quality scoring (0-100)
- copy-editing/readability_scorer.py — Flesch + passive + filler detection
- content-strategy/topic_cluster_mapper.py — keyword clustering
- page-cro/conversion_audit.py — HTML CRO signal analysis (0-100)
- paid-ads/roas_calculator.py — ROAS/CPA/CPL calculator
- email-sequence/sequence_analyzer.py — email sequence scoring (0-100)
- form-cro/form_field_analyzer.py — form field CRO audit (0-100)
- onboarding-cro/activation_funnel_analyzer.py — funnel drop-off analysis
- programmatic-seo/url_pattern_generator.py — URL pattern planning
- ab-test-setup/sample_size_calculator.py — statistical sample sizing
- signup-flow-cro/funnel_drop_analyzer.py — signup funnel analysis
- launch-strategy/launch_readiness_scorer.py — launch checklist scoring
- competitor-alternatives/comparison_matrix_builder.py — feature comparison
- social-media-manager/social_calendar_generator.py — content calendar
- readability_scorer.py — fixed demo mode for non-TTY execution

All 43/43 scripts pass execution. All stdlib-only, zero pip installs.
Total: 42 skills, 43 Python tools, 60+ reference docs.

* feat: add 3 more Python tools + improve 6 existing scripts

New tools from build agent:
- email-sequence/scripts/sequence_analyzer.py — email sequence scoring (91/100 demo)
- paid-ads/scripts/roas_calculator.py — ROAS/CPA/CPL/break-even calculator
- competitor-alternatives/scripts/comparison_matrix_builder.py — feature matrix

Improved scripts (better demo modes, fuller analysis):
- seo_checker.py, headline_scorer.py, readability_scorer.py,
  conversion_audit.py, topic_cluster_mapper.py, launch_readiness_scorer.py

Total: 42 skills, 47 Python tools, all passing.

* fix: remove duplicate scripts from deprecated content-creator

Scripts already live in content-production/scripts/. The content-creator
directory is now a pure redirect (SKILL.md only + legacy assets/refs).

* fix: scope VirusTotal scan to executable files only

Skip scanning .md, .py, .json, .yml — they're plain text files
that VirusTotal can't meaningfully analyze. This prevents 429 rate
limit errors on PRs with many text file changes (like 42 marketing skills).

Scan still covers: .js, .ts, .sh, .mjs, .cjs, .exe, .dll, .so, .bin, .wasm
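
The scoping rule above amounts to an extension allow-list. A minimal sketch, assuming the check is a plain suffix match; the names `SCANNED_EXTENSIONS` and `should_scan` and the sample paths are hypothetical and are not taken from the actual workflow:

```python
from pathlib import PurePosixPath

# Extensions the commit message lists as still scanned.
SCANNED_EXTENSIONS = {
    ".js", ".ts", ".sh", ".mjs", ".cjs",
    ".exe", ".dll", ".so", ".bin", ".wasm",
}


def should_scan(path: str) -> bool:
    """Return True if the file's extension is on the scan allow-list."""
    return PurePosixPath(path).suffix.lower() in SCANNED_EXTENSIONS


# Hypothetical changed-file list from a PR.
changed = ["marketing-skill/SKILL.md", "scripts/install.sh", "tools/seo_checker.py"]
to_scan = [p for p in changed if should_scan(p)]
print(to_scan)
```

With a filter like this, a PR that only touches .md, .py, .json, or .yml files submits nothing to the scanner.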

---------

Co-authored-by: Leo <leo@openclaw.ai>
2026-03-06 03:56:16 +01:00

#!/usr/bin/env python3
"""
topic_cluster_mapper.py — Groups keywords/topics into content clusters

Usage:
    python3 topic_cluster_mapper.py --file keywords.txt
    python3 topic_cluster_mapper.py --json
    python3 topic_cluster_mapper.py          # demo mode (20 marketing topics)
"""
import argparse
import json
import re
import sys
from collections import defaultdict

# ---------------------------------------------------------------------------
# Simple stemmer (no nltk)
# ---------------------------------------------------------------------------
STOP_WORDS = {
    "a", "an", "the", "and", "or", "but", "in", "on", "at", "to", "for",
    "of", "with", "by", "from", "is", "are", "was", "were", "be", "been",
    "how", "what", "why", "when", "where", "who", "which", "that", "this",
    "it", "its", "do", "does", "your", "our", "my", "their", "we", "you",
    "get", "make", "use", "using", "used", "can", "will", "should", "best",
}


def simple_stem(word: str) -> str:
    """Very simple suffix-stripping stemmer."""
    w = word.lower()
    if len(w) <= 3:
        return w
    # Order matters: longer suffixes are tried first and the first match wins.
    # (Duplicate entries from the original list are dropped; behavior is unchanged.)
    suffixes = [
        "ization", "isation", "ational", "fulness", "ousness", "iveness",
        "ingness", "ations", "nesses", "ators", "ation",
        "ating", "alism", "ality", "alize", "alise", "ator",
        "ness", "ment", "less", "tion", "sion", "ing", "ers",
        "ies", "ied", "ily", "ful", "ous", "ive", "ize", "ise", "est",
        "ed", "er", "ly", "al", "ic", "s",
    ]
    for sfx in suffixes:
        # Only strip when at least 3 characters of the word remain.
        if w.endswith(sfx) and len(w) - len(sfx) >= 3:
            return w[: -len(sfx)]
    return w


def extract_stems(topic: str) -> set:
    """Return the stems of a topic, skipping stop words and words of 2 letters or fewer."""
    words = re.findall(r"\b[a-zA-Z]+\b", topic.lower())
    return {simple_stem(w) for w in words if w not in STOP_WORDS and len(w) > 2}


# ---------------------------------------------------------------------------
# Clustering
# ---------------------------------------------------------------------------
def compute_similarity(stems_a: set, stems_b: set) -> float:
    """Jaccard similarity between two stem sets."""
    if not stems_a or not stems_b:
        return 0.0
    intersection = stems_a & stems_b
    union = stems_a | stems_b
    return len(intersection) / len(union)


def build_clusters(topics: list, threshold: float = 0.15) -> list:
    """
    Greedy clustering: assign each topic to the first cluster it's
    similar-enough to; else start a new cluster.
    """
    # Pre-compute stems
    topic_stems = {t: extract_stems(t) for t in topics}
    clusters = []  # list of {"pillar": str, "topics": [str], "stems": set}
    for topic in topics:
        t_stems = topic_stems[topic]
        best_cluster = None
        best_score = 0.0
        for cluster in clusters:
            sim = compute_similarity(t_stems, cluster["stems"])
            if sim > best_score:
                best_score = sim
                best_cluster = cluster
        if best_cluster and best_score >= threshold:
            best_cluster["topics"].append(topic)
            best_cluster["stems"] |= t_stems  # grow cluster centroid
        else:
            clusters.append({
                "pillar": topic,
                "topics": [topic],
                "stems": set(t_stems),
            })
    # Identify best pillar: topic with most shared stems to others in cluster
    for cluster in clusters:
        if len(cluster["topics"]) == 1:
            continue
        all_stems = [topic_stems[t] for t in cluster["topics"]]
        best_topic = cluster["topics"][0]
        best_conn = 0
        for i, topic in enumerate(cluster["topics"]):
            conn = sum(
                len(topic_stems[topic] & topic_stems[other])
                for j, other in enumerate(cluster["topics"]) if i != j
            )
            if conn > best_conn:
                best_conn = conn
                best_topic = topic
        cluster["pillar"] = best_topic
    return clusters


def build_output(topics: list, clusters: list) -> dict:
    cluster_output = []
    for i, c in enumerate(clusters, 1):
        supporting = [t for t in c["topics"] if t != c["pillar"]]
        cluster_output.append({
            "cluster_id": i,
            "pillar_topic": c["pillar"],
            "size": len(c["topics"]),
            "supporting_topics": supporting,
            "suggested_url_slug": re.sub(r"[^a-z0-9]+", "-", c["pillar"].lower()).strip("-"),
        })
    # Sort by cluster size desc
    cluster_output.sort(key=lambda x: -x["size"])
    return {
        "total_topics": len(topics),
        "total_clusters": len(clusters),
        "clusters": cluster_output,
        "recommendations": _make_recommendations(cluster_output),
    }


def _make_recommendations(clusters: list) -> list:
    recs = []
    large = [c for c in clusters if c["size"] >= 3]
    singletons = [c for c in clusters if c["size"] == 1]
    if large:
        recs.append(f"Create {len(large)} pillar page(s) for clusters with 3+ topics")
    if singletons:
        recs.append(
            f"{len(singletons)} singleton topic(s) — consider merging or expanding to form mini-clusters"
        )
    if clusters:
        biggest = clusters[0]
        recs.append(
            f"Highest-priority cluster: '{biggest['pillar_topic']}' "
            f"({biggest['size']} related topics) — start content here"
        )
    return recs


# ---------------------------------------------------------------------------
# Demo topics
# ---------------------------------------------------------------------------
DEMO_TOPICS = [
    "email marketing strategy",
    "email subject line tips",
    "email open rate optimization",
    "email automation workflows",
    "SEO keyword research",
    "on-page SEO optimization",
    "SEO content strategy",
    "technical SEO audit",
    "social media marketing",
    "social media content calendar",
    "Instagram marketing tips",
    "LinkedIn marketing for B2B",
    "content marketing ROI",
    "content strategy planning",
    "blog content ideas",
    "landing page conversion rate",
    "conversion rate optimization",
    "A/B testing landing pages",
    "paid ads budget allocation",
    "Google Ads campaign setup",
]


# ---------------------------------------------------------------------------
# Main
# ---------------------------------------------------------------------------
def main():
    parser = argparse.ArgumentParser(
        description="Topic cluster mapper — groups keywords into content clusters."
    )
    parser.add_argument("--file", help="Text file with one topic/keyword per line")
    parser.add_argument("--threshold", type=float, default=0.15,
                        help="Similarity threshold for clustering (default: 0.15)")
    parser.add_argument("--json", action="store_true", help="Output as JSON")
    args = parser.parse_args()
    if args.file:
        with open(args.file, "r", encoding="utf-8") as f:
            topics = [line.strip() for line in f if line.strip()]
    else:
        topics = DEMO_TOPICS
        if not args.json:
            print("No input provided — running in demo mode with 20 marketing topics.\n")
    if not topics:
        print("No topics found.", file=sys.stderr)
        sys.exit(1)
    clusters = build_clusters(topics, threshold=args.threshold)
    output = build_output(topics, clusters)
    if args.json:
        print(json.dumps(output, indent=2))
        return
    print("=" * 62)
    print(f" TOPIC CLUSTER MAP {output['total_topics']} topics → {output['total_clusters']} clusters")
    print("=" * 62)
    for cluster in output["clusters"]:
        print(f"\n Cluster {cluster['cluster_id']} ({cluster['size']} topics)")
        print(f" ┌─ PILLAR: {cluster['pillar_topic']}")
        print(f" │ Slug: /{cluster['suggested_url_slug']}")
        for st in cluster["supporting_topics"]:
            print(f" └─ Supporting: {st}")
    print("\n" + "=" * 62)
    print(" RECOMMENDATIONS")
    print("=" * 62)
    for rec in output["recommendations"]:
        print(f"{rec}")
    print()


if __name__ == "__main__":
    main()
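
As a worked illustration of the Jaccard test that build_clusters applies against the 0.15 default threshold, the sketch below compares small stem sets directly. The helper name `jaccard` and the hand-written stem sets are assumptions for illustration; the sketch does not import or call the script.

```python
def jaccard(a: set, b: set) -> float:
    """Jaccard similarity |A & B| / |A | B|; 0.0 if either set is empty."""
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


THRESHOLD = 0.15  # same value as the script's default --threshold

# Hand-written stem sets (assumed for illustration).
email_strategy = {"email", "market", "strategy"}
email_subject = {"email", "subject", "line", "tip"}
seo_audit = {"technical", "seo", "audit"}

sim_same = jaccard(email_strategy, email_subject)  # 1 shared stem out of 6 distinct
sim_diff = jaccard(email_strategy, seo_audit)      # no shared stems

print(round(sim_same, 3), sim_same >= THRESHOLD)   # 0.167 True
print(round(sim_diff, 3), sim_diff >= THRESHOLD)   # 0.0 False
```

One design consequence worth noting: because each cluster's stem set grows by union as topics join, the denominator of later comparisons grows too, so a large cluster can become harder to join unless the new topic shares several stems with it.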