chore: post-merge sync — plugins, audits, docs, cross-platform indexes

New skills integrated:
- engineering/behuman, code-tour, demo-video, data-quality-auditor

Plugins & marketplace:
- Add plugin.json for code-tour, demo-video, data-quality-auditor
- Add all 3 to marketplace.json (31 total plugins)
- Update marketplace counts to 248 skills, 332 tools, 460 refs

Skill fixes:
- Move data-quality-auditor from data-analysis/ to engineering/
- Fix cross-refs: code-tour, demo-video, data-quality-auditor
- Add evals.json for code-tour (5 scenarios) and demo-video (4 scenarios)
- demo-video: add output artifacts, prereqs check, references extraction
- code-tour: add default persona, parallel discovery, trivial repo guidance
- Fix Python 3.9 compat (from __future__ import annotations)

product-analytics audit fixes:
- Expand SKILL.md from 82 to 147 lines (anti-patterns, cross-refs, examples)
- Add --format json to all metrics_calculator.py subcommands
- Add error handling (FileNotFoundError, KeyError)

Docs & indexes:
- Update CLAUDE.md, README.md, docs/index.md, docs/getting-started.md counts
- Sync Codex (192 skills) and Gemini (280 items) indexes
- Regenerate MkDocs pages (279 pages, 311 HTML)
- Add 3 new nav entries to mkdocs.yml
- Update mkdocs.yml site_description

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Author: Reza Rezvani
Date: 2026-04-04 02:05:19 +02:00
parent a6f75266d0
commit 5710a7b763
34 changed files with 976 additions and 145 deletions


@@ -0,0 +1,13 @@
{
"name": "code-tour",
"description": "Create CodeTour .tour files — persona-targeted, step-by-step walkthroughs that link to real files and line numbers. Supports 10 developer personas (vibecoder, new joiner, architect, security reviewer, etc.), all CodeTour step types, and SMIG description formula.",
"version": "2.2.0",
"author": {
"name": "Alireza Rezvani",
"url": "https://alirezarezvani.com"
},
"homepage": "https://github.com/alirezarezvani/claude-skills/tree/main/engineering/code-tour",
"repository": "https://github.com/alirezarezvani/claude-skills",
"license": "MIT",
"skills": "./"
}


@@ -23,10 +23,11 @@ A great tour is a **narrative** — a story told to a specific person about what
### 1. Discover the repo
Before asking anything, explore the codebase:
- List root directory, read README, check config files
- Identify language(s), framework(s), project purpose
- Map folder structure 1-2 levels deep
- Find entry points — every path in the tour must be real
In parallel: list root directory, read README, check config files.
Then: identify language(s), framework(s), project purpose. Map folder structure 1-2 levels deep. Find entry points — every path in the tour must be real.
If the repo has fewer than 5 source files, create a quick-depth tour regardless of persona — there's not enough to warrant a deep one.
### 2. Infer the intent
@@ -40,6 +41,9 @@ One message should be enough. Infer persona, depth, and focus silently.
| "quick tour" / "vibe check" | vibecoder | quick |
| "architecture" | architect | deep |
| "security" / "auth review" | security-reviewer | standard |
| (no qualifier) | new-joiner | standard |
When intent is ambiguous, default to **new-joiner** persona at **standard** depth — it's the most generally useful.
### 3. Read actual files
@@ -54,7 +58,7 @@ Save to `.tours/<persona>-<focus>.tour`.
"$schema": "https://aka.ms/codetour-schema",
"title": "Descriptive Title — Persona / Goal",
"description": "Who this is for and what they'll understand after.",
"ref": "main",
"ref": "<current-branch-or-commit>",
"steps": []
}
```
@@ -94,7 +98,7 @@ Save to `.tours/<persona>-<focus>.tour`.
- [ ] At most 2 content-only steps
- [ ] `nextTour` matches another tour's `title` exactly if set
## The 20 Personas
## Personas
| Persona | Goal | Must cover |
|---------|------|------------|
@@ -131,6 +135,6 @@ Save to `.tours/<persona>-<focus>.tour`.
## Cross-References
- Related: `engineering/codebase-onboarding` — for broader onboarding beyond tours
- Related: `engineering/code-review-automation` — for automated PR review workflows
- Full skill with validation scripts and schema: [code-tour repo](https://github.com/vaddisrinivas/code-tour)
- Related: `engineering/pr-review-expert` — for automated PR review workflows
- CodeTour extension: [microsoft/codetour](https://github.com/microsoft/codetour)
- Real-world tours: [coder/code-server](https://github.com/coder/code-server/blob/main/.tours/contributing.tour)


@@ -0,0 +1,32 @@
[
{
"id": 1,
"prompt": "I just hired a junior dev who starts Monday. Can you create an onboarding tour for this repo so they can get oriented on their own?",
"expected_output": "Agent infers new-joiner persona, standard depth (9-13 steps). Produces .tours/new-joiner-onboarding.tour with verified paths/lines, SMIG descriptions, narrative arc starting with orientation directory step.",
"scenario_type": "happy_path"
},
{
"id": 2,
"prompt": "Give me a quick vibe check tour of this codebase — I just cloned it and want to understand the shape before diving in.",
"expected_output": "Agent infers vibecoder persona, quick depth (5-8 steps). Tour hits entry point and main modules only. File saved to .tours/vibecoder-overview.tour.",
"scenario_type": "happy_path"
},
{
"id": 3,
"prompt": "We had an outage last night because the payment webhook handler silently swallowed errors. Can you build an RCA tour tracing how webhooks flow through the system?",
"expected_output": "Agent infers rca-investigator persona, standard depth. Tour follows causality chain from webhook entry point through handler to error handling. Steps anchored to specific lines showing the fault path.",
"scenario_type": "happy_path"
},
{
"id": 4,
"prompt": "Create a tour for this repo.",
"expected_output": "Agent defaults to new-joiner persona at standard depth without asking clarifying questions. Produces a general-purpose onboarding tour.",
"scenario_type": "edge_case"
},
{
"id": 5,
"prompt": "Make an onboarding tour for this repo, but I want it to also cover the deployment pipeline and our monitoring setup in Grafana.",
"expected_output": "Agent includes deployment pipeline files as normal file+line steps. Uses URI step type for Grafana link if user provides URL, or skips with explanation. Does not hallucinate files.",
"scenario_type": "edge_case"
}
]


@@ -0,0 +1,13 @@
{
"name": "data-quality-auditor",
"description": "Audit datasets for completeness, consistency, accuracy, and validity. 3 stdlib-only Python tools: data profiler with DQS scoring, missing value analyzer with MCAR/MAR/MNAR classification, and multi-method outlier detector.",
"version": "2.2.0",
"author": {
"name": "Alireza Rezvani",
"url": "https://alirezarezvani.com"
},
"homepage": "https://github.com/alirezarezvani/claude-skills/tree/main/engineering/data-quality-auditor",
"repository": "https://github.com/alirezarezvani/claude-skills",
"license": "MIT",
"skills": "./"
}


@@ -0,0 +1,219 @@
---
name: data-quality-auditor
description: Audit datasets for completeness, consistency, accuracy, and validity. Profile data distributions, detect anomalies and outliers, surface structural issues, and produce an actionable remediation plan.
---
You are an expert data quality engineer. Your goal is to systematically assess dataset health, surface hidden issues that corrupt downstream analysis, and prescribe prioritized fixes. You move fast, think in impact, and never let "good enough" data quietly poison a model or dashboard.
---
## Entry Points
### Mode 1 — Full Audit (New Dataset)
Use when you have a dataset you've never assessed before.
1. **Profile** — Run `data_profiler.py` to get shape, types, completeness, and distributions
2. **Missing Values** — Run `missing_value_analyzer.py` to classify missingness patterns (MCAR/MAR/MNAR)
3. **Outliers** — Run `outlier_detector.py` to flag anomalies using IQR and Z-score methods
4. **Cross-column checks** — Inspect referential integrity, duplicate rows, and logical constraints
5. **Score & Report** — Assign a Data Quality Score (DQS) and produce the remediation plan
### Mode 2 — Targeted Scan (Specific Concern)
Use when a specific column, metric, or pipeline stage is suspected.
1. Ask: *What broke, when did it start, and what changed upstream?*
2. Run the relevant script against the suspect columns only
3. Compare distributions against a known-good baseline if available
4. Trace issues to root cause (source system, ETL transform, ingestion lag)
### Mode 3 — Ongoing Monitoring Setup
Use when the user wants recurring quality checks on a live pipeline.
1. Identify the 5–8 critical columns driving key metrics
2. Define thresholds: acceptable null %, outlier rate, value domain
3. Generate a monitoring checklist and alerting logic from `data_profiler.py --monitor`
4. Schedule checks at ingestion cadence
---
## Tools
### `scripts/data_profiler.py`
Full dataset profile: shape, dtypes, null counts, cardinality, value distributions, and a Data Quality Score.
**Features:**
- Per-column null %, unique count, top values, min/max/mean/std
- Detects constant columns, high-cardinality text fields, mixed types
- Outputs a DQS (0–100) based on completeness + consistency signals
- `--monitor` flag prints threshold-ready summary for alerting
```bash
# Profile from CSV
python3 scripts/data_profiler.py --file data.csv
# Profile specific columns
python3 scripts/data_profiler.py --file data.csv --columns col1,col2,col3
# Output JSON for downstream use
python3 scripts/data_profiler.py --file data.csv --format json
# Generate monitoring thresholds
python3 scripts/data_profiler.py --file data.csv --monitor
```
### `scripts/missing_value_analyzer.py`
Deep-dive into missingness: volume, patterns, and likely mechanism (MCAR/MAR/MNAR).
**Features:**
- Null heatmap summary (text-based) and co-occurrence matrix
- Pattern classification: random, systematic, correlated
- Imputation strategy recommendations per column (drop / mean / median / mode / forward-fill / flag)
- Estimates downstream impact if missingness is ignored
```bash
# Analyze all missing values
python3 scripts/missing_value_analyzer.py --file data.csv
# Focus on columns above a null threshold
python3 scripts/missing_value_analyzer.py --file data.csv --threshold 0.05
# Output JSON
python3 scripts/missing_value_analyzer.py --file data.csv --format json
```
### `scripts/outlier_detector.py`
Multi-method outlier detection with business-impact context.
**Features:**
- IQR method (robust, non-parametric)
- Z-score method (normal distribution assumption)
- Modified Z-score (Iglewicz-Hoaglin, robust to skew)
- Per-column outlier count, %, and boundary values
- Flags columns where outliers may be data errors vs. legitimate extremes
```bash
# Detect outliers across all numeric columns
python3 scripts/outlier_detector.py --file data.csv
# Use specific method
python3 scripts/outlier_detector.py --file data.csv --method iqr
# Set custom Z-score threshold
python3 scripts/outlier_detector.py --file data.csv --method zscore --threshold 2.5
# Output JSON
python3 scripts/outlier_detector.py --file data.csv --format json
```
---
## Data Quality Score (DQS)
The DQS is a 0–100 composite score across five dimensions. Report it at the top of every audit.
| Dimension | Weight | What It Measures |
|---|---|---|
| Completeness | 30% | Null / missing rate across critical columns |
| Consistency | 25% | Type conformance, format uniformity, no mixed types |
| Validity | 20% | Values within expected domain (ranges, categories, regexes) |
| Uniqueness | 15% | Duplicate rows, duplicate keys, redundant columns |
| Timeliness | 10% | Freshness of timestamps, lag from source system |
**Scoring thresholds:**
- 🟢 85–100 — Production-ready
- 🟡 65–84 — Usable with documented caveats
- 🔴 0–64 — Remediation required before use
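The weighting can be checked in a few lines. A minimal sketch with made-up dimension scores, not the exact logic inside `data_profiler.py`:

```python
# Hypothetical dimension scores (0-100) for an example dataset.
dims = {
    "completeness": 92.0,
    "consistency": 80.0,
    "validity": 75.0,
    "uniqueness": 90.0,
    "timeliness": 85.0,
}
weights = {"completeness": 0.30, "consistency": 0.25, "validity": 0.20,
           "uniqueness": 0.15, "timeliness": 0.10}

# Weighted composite: each dimension contributes weight x score.
dqs = sum(dims[d] * weights[d] for d in dims)
print(round(dqs, 1))  # 84.6, just inside the "usable with caveats" band
```

Note how a single weak dimension (validity at 75) can drag an otherwise healthy dataset below the 85 production-ready cutoff.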
---
## Proactive Risk Triggers
Surface these unprompted whenever you spot the signals:
- **Silent nulls** — Nulls encoded as `0`, `""`, `"N/A"`, `"null"` strings. Completeness metrics lie until these are caught.
- **Leaky timestamps** — Future dates, dates before system launch, or timezone mismatches that corrupt time-series joins.
- **Cardinality explosions** — Free-text fields with thousands of unique values masquerading as categorical. Will break one-hot encoding silently.
- **Duplicate keys** — PKs that aren't unique invalidate joins and aggregations downstream.
- **Distribution shift** — Columns where current distribution diverges from baseline (>2σ on mean/std). Signals upstream pipeline changes.
- **Correlated missingness** — Nulls concentrated in a specific time range, user segment, or region — evidence of MNAR, not random dropout.
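As a concrete guard for the silent-null trap, a sketch with an illustrative sentinel set (the scripts keep their own list in `NULL_STRINGS`):

```python
from __future__ import annotations

# Illustrative sentinel set; extend per source system.
SENTINELS = {"", "n/a", "null", "none", "nan"}

def normalize_null(value: str) -> str | None:
    """Map sentinel strings to a real None so completeness metrics stop lying."""
    return None if value.strip().lower() in SENTINELS else value

row = ["alice", "N/A", "", "42"]
print([normalize_null(v) for v in row])  # ['alice', None, None, '42']
```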
---
## Output Artifacts
| Request | Deliverable |
|---|---|
| "Profile this dataset" | Full DQS report with per-column breakdown and top issues ranked by impact |
| "What's wrong with column X?" | Targeted column audit: nulls, outliers, type issues, value domain violations |
| "Is this data ready for modeling?" | Model-readiness checklist with pass/fail per ML requirement |
| "Help me clean this data" | Prioritized remediation plan with specific transforms per issue |
| "Set up monitoring" | Threshold config + alerting checklist for critical columns |
| "Compare this to last month" | Distribution comparison report with drift flags |
---
## Remediation Playbook
### Missing Values
| Null % | Recommended Action |
|---|---|
| < 1% | Drop rows (if dataset is large) or impute with median/mode |
| 1–10% | Impute; add a binary indicator column `col_was_null` |
| 10–30% | Impute cautiously; investigate root cause; document assumption |
| > 30% | Flag for domain review; do not impute blindly; consider dropping column |
### Outliers
- **Likely data error** (value physically impossible): cap, correct, or drop
- **Legitimate extreme** (valid but rare): keep, document, consider log transform for modeling
- **Unknown** (can't determine without domain input): flag, do not silently remove
### Duplicates
1. Confirm uniqueness key with data owner before deduplication
2. Prefer `keep='last'` for event data (most recent state wins)
3. Prefer `keep='first'` for slowly-changing-dimension tables
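The `keep='last'` rule for event data can be sketched in stdlib Python (pandas' `drop_duplicates(keep="last")` is the usual one-liner; this just shows the semantics):

```python
# Deduplicate event rows by key, keeping the most recent state ("keep last").
events = [
    {"order_id": "A1", "status": "pending"},
    {"order_id": "A2", "status": "paid"},
    {"order_id": "A1", "status": "paid"},  # later state for A1 wins
]

latest = {}
for row in events:                 # later rows overwrite earlier ones
    latest[row["order_id"]] = row
deduped = list(latest.values())    # insertion order of first occurrence kept
print(deduped)
```

For `keep='first'` semantics (slowly-changing-dimension tables), insert only when the key is unseen instead of overwriting.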
---
## Quality Loop
Tag every finding with a confidence level:
- 🟢 **Verified** — confirmed by data inspection or domain owner
- 🟡 **Likely** — strong signal but not fully confirmed
- 🔴 **Assumed** — inferred from patterns; needs domain validation
Never auto-remediate 🔴 findings without human confirmation.
---
## Communication Standard
Structure all audit reports as:
**Bottom Line** — DQS score and one-sentence verdict (e.g., "DQS: 61/100 — remediation required before production use")
**What** — The specific issues found (ranked by severity × breadth)
**Why It Matters** — Business or analytical impact of each issue
**How to Act** — Specific, ordered remediation steps
---
## Related Skills
| Skill | Use When |
|---|---|
| `finance/financial-analyst` | Data involves financial statements or accounting figures |
| `finance/saas-metrics-coach` | Data is subscription/event data feeding SaaS KPIs |
| `engineering/database-designer` | Issues trace back to schema design or normalization |
| `engineering/tech-debt-tracker` | Data quality issues are systemic and need to be tracked as tech debt |
| `product-team/product-analytics` | Auditing product event data (funnels, sessions, retention) |
**When NOT to use this skill:**
- You need to design or optimize the database schema — use `engineering/database-designer`
- You need to build the ETL pipeline itself — use an engineering skill
- The dataset is a financial model output — use `finance/financial-analyst` for model validation
---
## References
- `references/data-quality-concepts.md` — MCAR/MAR/MNAR theory, DQS methodology, outlier detection methods


@@ -0,0 +1,106 @@
# Data Quality Concepts Reference
Deep-dive reference for the Data Quality Auditor skill. Keep SKILL.md lean — this is where the theory lives.
---
## Missingness Mechanisms (Rubin, 1976)
Understanding *why* data is missing determines how safely it can be imputed.
### MCAR — Missing Completely At Random
- The probability of missingness is independent of both observed and unobserved data.
- **Example:** A sensor drops a reading due to random hardware noise.
- **Safe to impute?** Yes. Imputing with mean/median introduces no systematic bias.
- **Detection:** Null rows are indistinguishable from non-null rows on all other dimensions.
### MAR — Missing At Random
- The probability of missingness depends on *observed* data, not the missing value itself.
- **Example:** Older users are less likely to fill in a "social media handle" field — missingness depends on age (observed), not on the handle itself.
- **Safe to impute?** Conditionally yes — impute using a model that accounts for the related observed variables.
- **Detection:** Null rows differ systematically from non-null rows on *other* columns.
### MNAR — Missing Not At Random
- The probability of missingness depends on the *missing value itself* (unobserved).
- **Example:** High earners skip the income field; low performers skip the satisfaction survey.
- **Safe to impute?** No — imputation will introduce systematic bias. Escalate to domain owner.
- **Detection:** Difficult to confirm statistically; look for clustered nulls in time or segment slices.
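The MAR detection rule (null rows differ systematically on *other* columns) can be illustrated with toy data; the numbers and column names here are invented:

```python
# Does missingness in "handle" depend on the observed "age" column?
rows = [
    {"age": 22, "handle": "@a"}, {"age": 25, "handle": "@b"},
    {"age": 61, "handle": None}, {"age": 67, "handle": None},
    {"age": 30, "handle": "@c"}, {"age": 58, "handle": None},
]
null_ages = [r["age"] for r in rows if r["handle"] is None]
obs_ages = [r["age"] for r in rows if r["handle"] is not None]

mean_null = sum(null_ages) / len(null_ages)  # 62.0
mean_obs = sum(obs_ages) / len(obs_ages)     # ~25.7
# A large gap on an observed column is MAR evidence:
# here, older users systematically skip the field.
print(mean_null, round(mean_obs, 1))
```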
---
## Data Quality Score (DQS) Methodology
The DQS is a weighted composite of five ISO 8000 / DAMA-aligned dimensions:
| Dimension | Weight | Rationale |
|---|---|---|
| Completeness | 30% | Nulls are the most common and impactful quality failure |
| Consistency | 25% | Type/format violations corrupt joins and aggregations silently |
| Validity | 20% | Out-of-domain values (negative ages, future birth dates) create invisible errors |
| Uniqueness | 15% | Duplicate rows inflate metrics and invalidate joins |
| Timeliness | 10% | Stale data causes decisions based on outdated state |
**Scoring thresholds** align to production-readiness standards:
- 85–100: Ready for production use in models and dashboards
- 65–84: Usable for exploratory analysis with documented caveats
- 0–64: Unreliable; remediation required before use in any decision-making context
---
## Outlier Detection Methods
### IQR (Interquartile Range)
- **Formula:** Outlier if `x < Q1 − 1.5×IQR` or `x > Q3 + 1.5×IQR`
- **Strengths:** Non-parametric, robust to non-normal distributions, interpretable bounds
- **Weaknesses:** Can miss outliers in heavily skewed distributions; 1.5× multiplier is conventional, not universal
- **When to use:** Default choice for most business datasets (revenue, counts, durations)
### Z-score
- **Formula:** Outlier if `|x − μ| / σ > threshold` (commonly 3.0)
- **Strengths:** Simple, widely understood, easy to explain to stakeholders
- **Weaknesses:** Mean and std are themselves influenced by outliers — the method is self-defeating for extreme contamination
- **When to use:** Only when the distribution is approximately normal and contamination is < 5%
### Modified Z-score (Iglewicz-Hoaglin)
- **Formula:** `M_i = 0.6745 × |x_i − median| / MAD`; outlier if `M_i > 3.5`
- **Strengths:** Uses median and MAD — both resistant to outlier influence; handles skewed distributions
- **Weaknesses:** MAD = 0 for discrete columns with one dominant value; less intuitive
- **When to use:** Preferred for skewed distributions (e.g. revenue, latency, page views)
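A stdlib sketch of the IQR and Modified Z-score rules on a toy sample (`outlier_detector.py` has the full implementation; this shows the two formulas side by side):

```python
import statistics as st

data = [10, 12, 11, 13, 12, 11, 14, 120]  # one suspicious extreme

# IQR method: bounds at Q1 - 1.5*IQR and Q3 + 1.5*IQR.
q1, _, q3 = st.quantiles(data, n=4)  # default "exclusive" quartiles
iqr = q3 - q1
iqr_outliers = [x for x in data if x < q1 - 1.5 * iqr or x > q3 + 1.5 * iqr]

# Modified Z-score (Iglewicz-Hoaglin): median and MAD resist the
# contamination that would inflate a plain mean/std Z-score here.
med = st.median(data)
mad = st.median([abs(x - med) for x in data])
mz = {x: 0.6745 * abs(x - med) / mad for x in data}
mz_outliers = [x for x in data if mz[x] > 3.5]

print(iqr_outliers, mz_outliers)  # [120] [120]
```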
---
## Imputation Strategies
| Method | When | Risk |
|---|---|---|
| Mean | MCAR, continuous, symmetric distribution | Distorts variance; don't use with skewed data |
| Median | MCAR/MAR, continuous, skewed distribution | Safe for skewed; loses variance |
| Mode | MCAR/MAR, categorical | Can over-represent one category |
| Forward-fill | Time series with MCAR/MAR gaps | Assumes value persists — valid for slowly-changing fields |
| Binary indicator | Null % 1–30% | Preserves information about missingness without imputing |
| Model-based | MAR, high-value columns | Most accurate but computationally expensive |
| Drop column | > 50% missing with no business justification | Safest option if column has no predictive value |
**Golden rule:** Always add a `col_was_null` indicator column when imputing with null% > 1%. This preserves the information that a value was imputed, which may itself be predictive.
---
## Common Silent Data Quality Failures
These are the issues that don't raise errors but corrupt results:
1. **Sentinel values** — `0`, `-1`, `9999`, `""` used to mean "unknown" in legacy systems
2. **Timezone naive timestamps** — datetimes stored without timezone; comparisons silently shift by hours
3. **Trailing whitespace** — `"active "` ≠ `"active"` causes silent join mismatches
4. **Encoding errors** — UTF-8 vs Latin-1 mismatches produce garbled strings in one column
5. **Scientific notation** — `1e6` stored as a string gets treated as a category, not a number
6. **Implicit schema changes** — upstream adds a new category to a lookup field; existing code silently drops new rows
---
## References
- Rubin, D.B. (1976). "Inference and Missing Data." *Biometrika* 63(3): 581–592.
- Iglewicz, B. & Hoaglin, D. (1993). *How to Detect and Handle Outliers*. ASQC Quality Press.
- DAMA International (2017). *DAMA-DMBOK: Data Management Body of Knowledge*. 2nd ed.
- ISO 8000-8: Data quality — Concepts and measuring.


@@ -0,0 +1,258 @@
#!/usr/bin/env python3
"""
data_profiler.py — Full dataset profile with Data Quality Score (DQS).
Usage:
python3 data_profiler.py --file data.csv
python3 data_profiler.py --file data.csv --columns col1,col2
python3 data_profiler.py --file data.csv --format json
python3 data_profiler.py --file data.csv --monitor
"""
from __future__ import annotations
import argparse
import csv
import json
import math
import sys
from collections import Counter, defaultdict
def load_csv(filepath: str) -> tuple[list[str], list[dict]]:
with open(filepath, newline="", encoding="utf-8") as f:
reader = csv.DictReader(f)
rows = list(reader)
headers = reader.fieldnames or []
return headers, rows
def infer_type(values: list[str]) -> str:
"""Infer dominant type from non-null string values."""
counts = {"int": 0, "float": 0, "bool": 0, "string": 0}
for v in values:
v = v.strip()
if v.lower() in ("true", "false"):
counts["bool"] += 1
else:
try:
int(v)
counts["int"] += 1
except ValueError:
try:
float(v)
counts["float"] += 1
except ValueError:
counts["string"] += 1
dominant = max(counts, key=lambda k: counts[k])
return dominant if counts[dominant] > 0 else "string"
def safe_mean(nums: list[float]) -> float | None:
return sum(nums) / len(nums) if nums else None
def safe_std(nums: list[float], mean: float) -> float | None:
if len(nums) < 2:
return None
variance = sum((x - mean) ** 2 for x in nums) / (len(nums) - 1)
return math.sqrt(variance)
def profile_column(name: str, raw_values: list[str]) -> dict:
total = len(raw_values)
null_strings = {"", "null", "none", "n/a", "na", "nan", "nil"}
null_count = sum(1 for v in raw_values if v.strip().lower() in null_strings)
non_null = [v for v in raw_values if v.strip().lower() not in null_strings]
col_type = infer_type(non_null)
unique_values = set(non_null)
top_values = Counter(non_null).most_common(5)
profile = {
"column": name,
"total_rows": total,
"null_count": null_count,
"null_pct": round(null_count / total * 100, 2) if total else 0,
"non_null_count": len(non_null),
"unique_count": len(unique_values),
"cardinality_pct": round(len(unique_values) / len(non_null) * 100, 2) if non_null else 0,
"inferred_type": col_type,
"top_values": top_values,
"is_constant": len(unique_values) == 1,
"is_high_cardinality": len(unique_values) / len(non_null) > 0.9 if len(non_null) > 10 else False,
}
if col_type in ("int", "float"):
try:
nums = [float(v) for v in non_null]
mean = safe_mean(nums)
profile["min"] = min(nums)
profile["max"] = max(nums)
profile["mean"] = round(mean, 4) if mean is not None else None
# safe_std returns None when len(nums) < 2; guard before rounding
std = safe_std(nums, mean) if mean is not None else None
profile["std"] = round(std, 4) if std is not None else None
except ValueError:
pass
return profile
def compute_dqs(profiles: list[dict], total_rows: int) -> dict:
"""Compute Data Quality Score (0-100) across 5 dimensions."""
if not profiles or total_rows == 0:
return {"score": 0, "dimensions": {}}
# Completeness (30%) — avg non-null rate
avg_null_pct = sum(p["null_pct"] for p in profiles) / len(profiles)
completeness = max(0, 100 - avg_null_pct)
# Consistency (25%) — penalize constant cols and mixed-type signals
constant_cols = sum(1 for p in profiles if p["is_constant"])
consistency = max(0, 100 - (constant_cols / len(profiles)) * 100)
# Validity (20%) — penalize high-cardinality string cols (proxy for free-text issues)
high_card = sum(1 for p in profiles if p["is_high_cardinality"] and p["inferred_type"] == "string")
validity = max(0, 100 - (high_card / len(profiles)) * 60)
# Uniqueness (15%) — placeholder; duplicate detection needs full row comparison
uniqueness = 90.0 # conservative default without row-level dedup check
# Timeliness (10%) — placeholder; requires timestamp columns
timeliness = 85.0 # conservative default
score = (
completeness * 0.30
+ consistency * 0.25
+ validity * 0.20
+ uniqueness * 0.15
+ timeliness * 0.10
)
return {
"score": round(score, 1),
"dimensions": {
"completeness": round(completeness, 1),
"consistency": round(consistency, 1),
"validity": round(validity, 1),
"uniqueness": uniqueness,
"timeliness": timeliness,
},
}
def dqs_label(score: float) -> str:
if score >= 85:
return "PASS — Production-ready"
elif score >= 65:
return "WARN — Usable with documented caveats"
else:
return "FAIL — Remediation required before use"
def print_report(headers: list[str], profiles: list[dict], dqs: dict, total_rows: int, monitor: bool):
print("=" * 64)
print("DATA QUALITY AUDIT REPORT")
print("=" * 64)
print(f"Rows: {total_rows} | Columns: {len(headers)}")
score = dqs["score"]
indicator = "🟢" if score >= 85 else ("🟡" if score >= 65 else "🔴")
print(f"\nData Quality Score (DQS): {score}/100 {indicator}")
print(f"Verdict: {dqs_label(score)}")
dims = dqs["dimensions"]
print("\nDimension Breakdown:")
for dim, val in dims.items():
bar = int(val / 5)
print(f" {dim.capitalize():<14} {val:>5.1f} {'█' * bar}{'░' * (20 - bar)}")
print("\n" + "-" * 64)
print("COLUMN PROFILES")
print("-" * 64)
issues = []
for p in profiles:
status = "🟢"
col_issues = []
if p["null_pct"] > 30:
status = "🔴"
col_issues.append(f"{p['null_pct']}% nulls — investigate root cause")
elif p["null_pct"] > 10:
status = "🟡"
col_issues.append(f"{p['null_pct']}% nulls — impute cautiously")
elif p["null_pct"] > 1:
col_issues.append(f"{p['null_pct']}% nulls — impute with indicator")
if p["is_constant"]:
status = "🟡"
col_issues.append("Constant column — zero variance, likely useless")
if p["is_high_cardinality"] and p["inferred_type"] == "string":
col_issues.append("High-cardinality string — check if categorical or free-text")
print(f"\n {status} {p['column']}")
print(f" Type: {p['inferred_type']} | Nulls: {p['null_count']} ({p['null_pct']}%) | Unique: {p['unique_count']}")
if "min" in p:
print(f" Min: {p['min']} Max: {p['max']} Mean: {p['mean']} Std: {p['std']}")
if p["top_values"]:
top = ", ".join(f"{v}({c})" for v, c in p["top_values"][:3])
print(f" Top values: {top}")
for issue in col_issues:
issues.append((p["column"], issue))
print(f"⚠ {issue}")
if issues:
print("\n" + "-" * 64)
print(f"ISSUES SUMMARY ({len(issues)} found)")
print("-" * 64)
for col, msg in issues:
print(f" [{col}] {msg}")
if monitor:
print("\n" + "-" * 64)
print("MONITORING THRESHOLDS (copy into alerting config)")
print("-" * 64)
for p in profiles:
if p["null_pct"] > 0:
print(f" {p['column']}: null_pct <= {min(p['null_pct'] * 1.5, 100):.1f}%")
if "mean" in p and p["mean"] is not None:
drift = abs(p.get("std", 0) or 0) * 2
print(f" {p['column']}: mean within [{p['mean'] - drift:.2f}, {p['mean'] + drift:.2f}]")
print("\n" + "=" * 64)
def main():
parser = argparse.ArgumentParser(description="Profile a CSV dataset and compute a Data Quality Score.")
parser.add_argument("--file", required=True, help="Path to CSV file")
parser.add_argument("--columns", help="Comma-separated list of columns to profile (default: all)")
parser.add_argument("--format", choices=["text", "json"], default="text")
parser.add_argument("--monitor", action="store_true", help="Print monitoring thresholds")
args = parser.parse_args()
try:
headers, rows = load_csv(args.file)
except FileNotFoundError:
print(f"Error: file not found: {args.file}", file=sys.stderr)
sys.exit(1)
except Exception as e:
print(f"Error reading file: {e}", file=sys.stderr)
sys.exit(1)
if not rows:
print("Error: CSV file is empty or has no data rows.", file=sys.stderr)
sys.exit(1)
selected = args.columns.split(",") if args.columns else headers
missing_cols = [c for c in selected if c not in headers]
if missing_cols:
print(f"Error: columns not found: {', '.join(missing_cols)}", file=sys.stderr)
sys.exit(1)
profiles = [profile_column(col, [row.get(col, "") for row in rows]) for col in selected]
dqs = compute_dqs(profiles, len(rows))
if args.format == "json":
print(json.dumps({"total_rows": len(rows), "dqs": dqs, "columns": profiles}, indent=2))
else:
print_report(selected, profiles, dqs, len(rows), args.monitor)
if __name__ == "__main__":
main()


@@ -0,0 +1,242 @@
#!/usr/bin/env python3
"""
missing_value_analyzer.py — Classify missingness patterns and recommend imputation strategies.
Usage:
python3 missing_value_analyzer.py --file data.csv
python3 missing_value_analyzer.py --file data.csv --threshold 0.05
python3 missing_value_analyzer.py --file data.csv --format json
"""
from __future__ import annotations
import argparse
import csv
import json
import sys
from collections import defaultdict
NULL_STRINGS = {"", "null", "none", "n/a", "na", "nan", "nil", "undefined", "missing"}
def load_csv(filepath: str) -> tuple[list[str], list[dict]]:
with open(filepath, newline="", encoding="utf-8") as f:
reader = csv.DictReader(f)
rows = list(reader)
headers = reader.fieldnames or []
return headers, rows
def is_null(val: str) -> bool:
return val.strip().lower() in NULL_STRINGS
def compute_null_mask(headers: list[str], rows: list[dict]) -> dict[str, list[bool]]:
return {col: [is_null(row.get(col, "")) for row in rows] for col in headers}
def null_stats(mask: list[bool]) -> dict:
total = len(mask)
count = sum(mask)
return {"count": count, "pct": round(count / total * 100, 2) if total else 0}
def classify_mechanism(col: str, mask: list[bool], all_masks: dict[str, list[bool]]) -> str:
"""
Heuristic classification of missingness mechanism:
- MCAR: nulls appear randomly, no correlation with other columns
- MAR: nulls correlate with values in other observed columns
- MNAR: nulls correlate with the missing column's own unobserved value (can't fully detect)
Returns one of: "MCAR (likely)", "MAR (likely)", "MNAR (possible)", "Insufficient data"
"""
null_indices = {i for i, v in enumerate(mask) if v}
if not null_indices:
return "None"
n = len(mask)
if n < 10:
return "Insufficient data"
# Check correlation with other columns' nulls
correlated_cols = []
for other_col, other_mask in all_masks.items():
if other_col == col:
continue
other_null_indices = {i for i, v in enumerate(other_mask) if v}
if not other_null_indices:
continue
overlap = len(null_indices & other_null_indices)
union = len(null_indices | other_null_indices)
jaccard = overlap / union if union else 0
if jaccard > 0.5:
correlated_cols.append(other_col)
# Check if nulls are clustered (time/positional pattern) — proxy for MNAR
sorted_indices = sorted(null_indices)
if len(sorted_indices) > 2:
gaps = [sorted_indices[i + 1] - sorted_indices[i] for i in range(len(sorted_indices) - 1)]
avg_gap = sum(gaps) / len(gaps)
clustered = avg_gap < n / len(null_indices) * 0.5 # nulls appear closer together than random
else:
clustered = False
if correlated_cols:
return f"MAR (likely) — co-occurs with nulls in: {', '.join(correlated_cols[:3])}"
elif clustered:
return "MNAR (possible) — nulls are spatially clustered, may reflect a systematic gap"
else:
return "MCAR (likely) — nulls appear random, no strong correlation detected"
def recommend_strategy(pct: float, col_type: str) -> str:
if pct == 0:
return "No action needed"
if pct < 1:
return "Drop rows — impact is negligible"
if pct < 10:
strategies = {
"int": "Impute with median + add binary indicator column",
"float": "Impute with median + add binary indicator column",
"string": "Impute with mode or 'Unknown' category + add indicator",
"bool": "Impute with mode",
}
return strategies.get(col_type, "Impute with median/mode + add indicator")
if pct < 30:
return "Impute cautiously; investigate root cause; document assumption; add indicator"
return "Do NOT impute blindly — > 30% missing. Escalate to domain owner or consider dropping column"
def infer_type(values: list[str]) -> str:
non_null = [v for v in values if not is_null(v)]
counts = {"int": 0, "float": 0, "bool": 0, "string": 0}
for v in non_null[:200]: # sample for speed
v = v.strip()
if v.lower() in ("true", "false"):
counts["bool"] += 1
else:
try:
int(v)
counts["int"] += 1
except ValueError:
try:
float(v)
counts["float"] += 1
except ValueError:
counts["string"] += 1
return max(counts, key=lambda k: counts[k]) if any(counts.values()) else "string"
def compute_cooccurrence(headers: list[str], masks: dict[str, list[bool]], top_n: int = 5) -> list[dict]:
"""Find column pairs where nulls most frequently co-occur."""
pairs = []
cols = list(headers)
for i in range(len(cols)):
for j in range(i + 1, len(cols)):
a, b = cols[i], cols[j]
mask_a, mask_b = masks[a], masks[b]
overlap = sum(1 for x, y in zip(mask_a, mask_b) if x and y)
if overlap > 0:
pairs.append({"col_a": a, "col_b": b, "co_null_rows": overlap})
pairs.sort(key=lambda x: -x["co_null_rows"])
return pairs[:top_n]
def print_report(headers: list[str], rows: list[dict], masks: dict, threshold: float):
total = len(rows)
print("=" * 64)
print("MISSING VALUE ANALYSIS REPORT")
print("=" * 64)
print(f"Rows: {total} | Columns: {len(headers)}")
results = []
for col in headers:
mask = masks[col]
stats = null_stats(mask)
        # Skip fully complete columns (reported separately below) and columns under the threshold.
        if stats["count"] == 0 or stats["pct"] / 100 < threshold:
            continue
raw_vals = [row.get(col, "") for row in rows]
col_type = infer_type(raw_vals)
mechanism = classify_mechanism(col, mask, masks)
strategy = recommend_strategy(stats["pct"], col_type)
results.append({
"column": col,
"null_count": stats["count"],
"null_pct": stats["pct"],
"col_type": col_type,
"mechanism": mechanism,
"strategy": strategy,
})
fully_complete = [col for col in headers if null_stats(masks[col])["count"] == 0]
print(f"\nFully complete columns: {len(fully_complete)}/{len(headers)}")
if not results:
print(f"\nNo columns exceed the null threshold ({threshold * 100:.1f}%).")
else:
print(f"\nColumns with missing values (threshold >= {threshold * 100:.1f}%):\n")
for r in sorted(results, key=lambda x: -x["null_pct"]):
indicator = "🔴" if r["null_pct"] > 30 else ("🟡" if r["null_pct"] > 10 else "🟢")
print(f" {indicator} {r['column']}")
print(f" Nulls: {r['null_count']} ({r['null_pct']}%) | Type: {r['col_type']}")
print(f" Mechanism: {r['mechanism']}")
print(f" Strategy: {r['strategy']}")
print()
cooccur = compute_cooccurrence(headers, masks)
if cooccur:
print("-" * 64)
print("NULL CO-OCCURRENCE (top pairs)")
print("-" * 64)
for pair in cooccur:
            print(f"  {pair['col_a']} + {pair['col_b']} → {pair['co_null_rows']} rows both null")
print("\n" + "=" * 64)
def main():
parser = argparse.ArgumentParser(description="Analyze missing values in a CSV dataset.")
parser.add_argument("--file", required=True, help="Path to CSV file")
parser.add_argument("--threshold", type=float, default=0.0,
help="Only show columns with null fraction above this (e.g. 0.05 = 5%%)")
parser.add_argument("--format", choices=["text", "json"], default="text")
args = parser.parse_args()
try:
headers, rows = load_csv(args.file)
except FileNotFoundError:
print(f"Error: file not found: {args.file}", file=sys.stderr)
sys.exit(1)
except Exception as e:
print(f"Error reading file: {e}", file=sys.stderr)
sys.exit(1)
if not rows:
print("Error: CSV file is empty.", file=sys.stderr)
sys.exit(1)
masks = compute_null_mask(headers, rows)
if args.format == "json":
output = []
for col in headers:
mask = masks[col]
stats = null_stats(mask)
raw_vals = [row.get(col, "") for row in rows]
col_type = infer_type(raw_vals)
mechanism = classify_mechanism(col, mask, masks)
strategy = recommend_strategy(stats["pct"], col_type)
output.append({
"column": col,
"null_count": stats["count"],
"null_pct": stats["pct"],
"col_type": col_type,
"mechanism": mechanism,
"strategy": strategy,
})
print(json.dumps({"total_rows": len(rows), "columns": output}, indent=2))
else:
print_report(headers, rows, masks, args.threshold)
if __name__ == "__main__":
main()
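
As a quick sanity check on the MAR heuristic above, the Jaccard overlap can be computed in isolation — a minimal sketch with two hand-made null masks whose nulls land in the same rows:

```python
# Standalone sketch of the Jaccard rule from classify_mechanism:
# columns whose null sets overlap with Jaccard > 0.5 suggest "MAR (likely)".
mask_a = [True, True, False, False, True, False]   # nulls in rows 0, 1, 4
mask_b = [True, True, False, False, False, False]  # nulls in rows 0, 1

idx_a = {i for i, v in enumerate(mask_a) if v}
idx_b = {i for i, v in enumerate(mask_b) if v}
jaccard = len(idx_a & idx_b) / len(idx_a | idx_b)
print(round(jaccard, 2))  # 2 shared rows / 3 distinct rows -> 0.67
```

Because 0.67 exceeds the 0.5 cut-off, these two columns would be flagged as co-occurring nulls.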


@@ -0,0 +1,263 @@
#!/usr/bin/env python3
"""
outlier_detector.py — Multi-method outlier detection for numeric columns.
Methods:
iqr — Interquartile Range (robust, non-parametric, default)
zscore — Standard Z-score (assumes normal distribution)
mzscore — Modified Z-score via Median Absolute Deviation (robust to skew)
Usage:
python3 outlier_detector.py --file data.csv
python3 outlier_detector.py --file data.csv --method iqr
python3 outlier_detector.py --file data.csv --method zscore --threshold 2.5
python3 outlier_detector.py --file data.csv --columns col1,col2
python3 outlier_detector.py --file data.csv --format json
"""
# The __future__ import must follow the module docstring, not precede it.
from __future__ import annotations
import argparse
import csv
import json
import math
import sys
NULL_STRINGS = {"", "null", "none", "n/a", "na", "nan", "nil", "undefined", "missing"}
def load_csv(filepath: str) -> tuple[list[str], list[dict]]:
with open(filepath, newline="", encoding="utf-8") as f:
reader = csv.DictReader(f)
rows = list(reader)
headers = reader.fieldnames or []
return headers, rows
def is_null(val: str | None) -> bool:
    # csv.DictReader yields None for missing fields; treat that like a null string.
    return val is None or val.strip().lower() in NULL_STRINGS
def to_float(val: str) -> float | None:
try:
return float(val.strip())
except (ValueError, AttributeError):
return None
def median(nums: list[float]) -> float:
s = sorted(nums)
n = len(s)
mid = n // 2
return s[mid] if n % 2 else (s[mid - 1] + s[mid]) / 2
def percentile(nums: list[float], p: float) -> float:
"""Linear interpolation percentile."""
s = sorted(nums)
n = len(s)
if n == 1:
return s[0]
idx = p / 100 * (n - 1)
lo = int(idx)
hi = lo + 1
frac = idx - lo
if hi >= n:
return s[-1]
return s[lo] + frac * (s[hi] - s[lo])
def mean(nums: list[float]) -> float:
return sum(nums) / len(nums)
def std(nums: list[float], mu: float) -> float:
if len(nums) < 2:
return 0.0
variance = sum((x - mu) ** 2 for x in nums) / (len(nums) - 1)
return math.sqrt(variance)
# --- Detection methods ---
def detect_iqr(nums: list[float], multiplier: float = 1.5) -> dict:
q1 = percentile(nums, 25)
q3 = percentile(nums, 75)
iqr = q3 - q1
lower = q1 - multiplier * iqr
upper = q3 + multiplier * iqr
outliers = [x for x in nums if x < lower or x > upper]
return {
"method": "IQR",
"q1": round(q1, 4),
"q3": round(q3, 4),
"iqr": round(iqr, 4),
"lower_bound": round(lower, 4),
"upper_bound": round(upper, 4),
"outlier_count": len(outliers),
"outlier_pct": round(len(outliers) / len(nums) * 100, 2),
"outlier_values": sorted(set(round(x, 4) for x in outliers))[:10],
}
def detect_zscore(nums: list[float], threshold: float = 3.0) -> dict:
mu = mean(nums)
sigma = std(nums, mu)
if sigma == 0:
return {"method": "Z-score", "outlier_count": 0, "outlier_pct": 0.0,
"note": "Zero variance — all values identical"}
zscores = [(x, abs((x - mu) / sigma)) for x in nums]
outliers = [x for x, z in zscores if z > threshold]
return {
"method": "Z-score",
"mean": round(mu, 4),
"std": round(sigma, 4),
"threshold": threshold,
"outlier_count": len(outliers),
"outlier_pct": round(len(outliers) / len(nums) * 100, 2),
"outlier_values": sorted(set(round(x, 4) for x in outliers))[:10],
}
def detect_modified_zscore(nums: list[float], threshold: float = 3.5) -> dict:
"""Iglewicz-Hoaglin modified Z-score using Median Absolute Deviation."""
med = median(nums)
mad = median([abs(x - med) for x in nums])
if mad == 0:
return {"method": "Modified Z-score (MAD)", "outlier_count": 0, "outlier_pct": 0.0,
"note": "MAD is zero — consider Z-score instead"}
mzscores = [(x, 0.6745 * abs(x - med) / mad) for x in nums]
outliers = [x for x, mz in mzscores if mz > threshold]
return {
"method": "Modified Z-score (MAD)",
"median": round(med, 4),
"mad": round(mad, 4),
"threshold": threshold,
"outlier_count": len(outliers),
"outlier_pct": round(len(outliers) / len(nums) * 100, 2),
"outlier_values": sorted(set(round(x, 4) for x in outliers))[:10],
}
def classify_outlier_risk(pct: float, col: str) -> str:
"""Heuristic: flag whether outliers are likely data errors or legitimate extremes."""
if pct > 10:
return "High outlier rate — likely systematic data quality issue or wrong data type"
if pct > 5:
return "Elevated outlier rate — investigate source; may be mixed populations"
if pct > 1:
return "Moderate — review individually; could be legitimate extremes or entry errors"
if pct > 0:
return "Low — verify extreme values against source; likely legitimate but worth checking"
return "Clean — no outliers detected"
def analyze_column(col: str, nums: list[float], method: str, threshold: float) -> dict:
if len(nums) < 4:
return {"column": col, "status": "Skipped — fewer than 4 numeric values"}
    if method == "iqr":
        # main() already resolves the per-method default threshold,
        # so pass it straight through as the IQR multiplier.
        result = detect_iqr(nums, multiplier=threshold)
elif method == "zscore":
result = detect_zscore(nums, threshold=threshold)
elif method == "mzscore":
result = detect_modified_zscore(nums, threshold=threshold)
else:
result = detect_iqr(nums)
result["column"] = col
result["total_numeric"] = len(nums)
result["risk_assessment"] = classify_outlier_risk(result.get("outlier_pct", 0), col)
return result
def print_report(results: list[dict]):
print("=" * 64)
print("OUTLIER DETECTION REPORT")
print("=" * 64)
clean = [r for r in results if r.get("outlier_count", 0) == 0 and "status" not in r]
flagged = [r for r in results if r.get("outlier_count", 0) > 0]
skipped = [r for r in results if "status" in r]
print(f"\nColumns analyzed: {len(results) - len(skipped)}")
print(f"Clean: {len(clean)}")
print(f"Flagged: {len(flagged)}")
if skipped:
print(f"Skipped: {len(skipped)} ({', '.join(r['column'] for r in skipped)})")
if flagged:
print("\n" + "-" * 64)
print("FLAGGED COLUMNS")
print("-" * 64)
for r in sorted(flagged, key=lambda x: -x.get("outlier_pct", 0)):
pct = r.get("outlier_pct", 0)
indicator = "🔴" if pct > 5 else "🟡"
print(f"\n {indicator} {r['column']} ({r['method']})")
print(f" Outliers: {r['outlier_count']} / {r['total_numeric']} rows ({pct}%)")
if "lower_bound" in r:
print(f" Bounds: [{r['lower_bound']}, {r['upper_bound']}] | IQR: {r['iqr']}")
if "mean" in r:
print(f" Mean: {r['mean']} | Std: {r['std']} | Threshold: ±{r['threshold']}σ")
if "median" in r:
print(f" Median: {r['median']} | MAD: {r['mad']} | Threshold: {r['threshold']}")
if r.get("outlier_values"):
vals = ", ".join(str(v) for v in r["outlier_values"][:8])
print(f" Sample outlier values: {vals}")
print(f" Assessment: {r['risk_assessment']}")
if clean:
cols = ", ".join(r["column"] for r in clean)
print(f"\n🟢 Clean columns: {cols}")
print("\n" + "=" * 64)
def main():
parser = argparse.ArgumentParser(description="Detect outliers in numeric columns of a CSV dataset.")
parser.add_argument("--file", required=True, help="Path to CSV file")
parser.add_argument("--method", choices=["iqr", "zscore", "mzscore"], default="iqr",
help="Detection method (default: iqr)")
parser.add_argument("--threshold", type=float, default=None,
help="Method threshold (IQR multiplier default 1.5; Z-score default 3.0; mzscore default 3.5)")
    parser.add_argument("--columns", help="Comma-separated columns to check (default: all columns; non-numeric ones are skipped)")
parser.add_argument("--format", choices=["text", "json"], default="text")
args = parser.parse_args()
# Set default thresholds per method
if args.threshold is None:
args.threshold = {"iqr": 1.5, "zscore": 3.0, "mzscore": 3.5}[args.method]
try:
headers, rows = load_csv(args.file)
except FileNotFoundError:
print(f"Error: file not found: {args.file}", file=sys.stderr)
sys.exit(1)
except Exception as e:
print(f"Error reading file: {e}", file=sys.stderr)
sys.exit(1)
if not rows:
print("Error: CSV file is empty.", file=sys.stderr)
sys.exit(1)
    selected = [c.strip() for c in args.columns.split(",")] if args.columns else headers
missing_cols = [c for c in selected if c not in headers]
if missing_cols:
print(f"Error: columns not found: {', '.join(missing_cols)}", file=sys.stderr)
sys.exit(1)
results = []
for col in selected:
raw = [row.get(col, "") for row in rows]
nums = [n for v in raw if not is_null(v) and (n := to_float(v)) is not None]
results.append(analyze_column(col, nums, args.method, args.threshold))
if args.format == "json":
print(json.dumps(results, indent=2))
else:
print_report(results)
if __name__ == "__main__":
main()
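
The IQR rule in `detect_iqr` can be reproduced with the standard library — a sketch using `statistics.quantiles`, whose `method="inclusive"` matches the linear-interpolation `percentile` helper above:

```python
# IQR fence check on a small sample with one obvious extreme value.
import statistics

data = [10, 12, 12, 13, 12, 11, 14, 13, 15, 102]
q1, _, q3 = statistics.quantiles(data, n=4, method="inclusive")
iqr = q3 - q1
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
outliers = [x for x in data if x < lower or x > upper]
print(outliers)  # [102]
```

Here q1 = 12, q3 = 13.75, so the fences are [9.375, 16.375] and only 102 is flagged — the same bounds `detect_iqr` would report.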


@@ -0,0 +1,13 @@
{
"name": "demo-video",
"description": "Create polished demo videos from screenshots and scene descriptions. Orchestrates playwright, ffmpeg, and edge-tts to produce product walkthroughs, feature showcases, and marketing teasers with story structure, scene design system, and narration guidance.",
"version": "2.2.0",
"author": {
"name": "Alireza Rezvani",
"url": "https://alirezarezvani.com"
},
"homepage": "https://github.com/alirezarezvani/claude-skills/tree/main/engineering/demo-video",
"repository": "https://github.com/alirezarezvani/claude-skills",
"license": "MIT",
"skills": "./"
}


@@ -22,11 +22,17 @@ Create polished demo videos by orchestrating browser rendering, text-to-speech,
### 1. Choose a rendering mode
Before starting, verify available tools:
- **playwright MCP available?** — needed for automated screenshots. Fallback: ask user to screenshot the HTML files manually.
- **edge-tts available?** — needed for narration audio. Fallback: output narration text files for user to record or use any TTS tool.
- **ffmpeg available?** — needed for compositing. Fallback: output individual scene images + audio files with manual ffmpeg commands the user can run.
If none are available, produce HTML scene files + `scenes.json` manifest + narration scripts. The user can composite manually or use any video editor.
| Mode | How | When |
|------|-----|------|
| **MCP Orchestration** | HTML → playwright screenshots → edge-tts audio → ffmpeg composite | Use when playwright + edge-tts + ffmpeg MCPs are all connected |
| **Manual** | Write HTML scene files, provide ffmpeg commands for user to run | Use when MCPs are not available |
### 2. Pick a story structure
@@ -41,6 +47,11 @@ Hook (2s) -> Demo (8s) -> Logo (3s) -> Tagline (2s)
### 3. Design scenes
**If no screenshots are provided:**
- For CLI/terminal tools: generate HTML scenes with terminal-style dark background, monospace font, and animated typing effect
- For conceptual demos: use text-heavy scenes with the color language and typography system
- Ask the user for screenshots only if the product is visual and descriptions are insufficient
Every scene has exactly ONE primary focus:
- Title scenes: product name
- Problem scenes: the pain (red, chaotic)
@@ -55,65 +66,23 @@ Every scene has exactly ONE primary focus:
- No jargon. "Your tabs organize themselves" not "AI-powered tab categorization."
- Use contrast. "24 tabs. One click. 5 groups."
## Output Artifacts
For each video, produce these files in a `demo-output/` directory:
1. `scenes/` — one HTML file per scene (1920x1080 viewport)
2. `narration/` — one `.txt` file per scene (for edge-tts input)
3. `scenes.json` — manifest listing scenes in order with durations and narration text
4. `build.sh` — shell script that runs the full pipeline:
- `playwright screenshot` each HTML scene → `frames/`
- `edge-tts` each narration file → `audio/`
- `ffmpeg` concat with crossfade transitions → `output.mp4`
If MCPs are unavailable, still produce items 1-3. Include the ffmpeg commands in `build.sh` for the user to run manually.
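
A minimal sketch of what the `scenes.json` manifest (item 3) might contain — the field names here are illustrative assumptions, not the skill's fixed schema:

```python
# Hypothetical scenes.json manifest: ordered scenes with durations and narration.
import json

scenes = [
    {"id": 1, "file": "scenes/scene1.html", "duration": 4.0,
     "narration": "24 tabs. One click. 5 groups."},
    {"id": 2, "file": "scenes/scene2.html", "duration": 3.0,
     "narration": "Your tabs organize themselves."},
]
manifest = {"resolution": "1920x1080", "scenes": scenes}
print(json.dumps(manifest, indent=2))
```

`build.sh` can then iterate over `scenes` to screenshot each HTML file and synthesize each narration line in order.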
See [references/scene-design-system.md](references/scene-design-system.md) for the full design system: color language, animation timing, typography, HTML layout, voice options, and pacing guide.
## Quality Checklist
@@ -137,6 +106,5 @@ Background: dark with subtle purple-blue glow gradients. Screenshots: always `bo
## Cross-References
- Related: `engineering/cli-demo-generator` — for terminal-based demos
- Related: `engineering/presentation-builder` — for slide decks
- Related: `engineering/browser-automation` — for playwright-based browser workflows
- See also: [framecraft](https://github.com/vaddisrinivas/framecraft) — open-source scene rendering pipeline


@@ -0,0 +1,26 @@
[
{
"id": 1,
"prompt": "I just shipped a new tab management Chrome extension. I have 4 screenshots showing the before (messy tabs) and after (organized groups). Can you make a 30-second demo video I can post on Twitter?",
"expected_output": "Agent picks Classic Demo or Problem-Solution structure, designs 5-7 scenes using the color language and typography specs, writes narration following the pacing guide, produces demo-output/ with HTML scenes, narration files, scenes.json manifest, and build.sh.",
"scenario_type": "happy_path"
},
{
"id": 2,
"prompt": "Create a 15-second teaser video for our SaaS dashboard. Here's one hero screenshot of the analytics view. Keep it minimal.",
"expected_output": "Agent selects 15-Second Teaser structure (Hook 2s, Demo 8s, Logo 3s, Tagline 2s), uses a single screenshot with dark background and proper styling, produces minimal scene set in demo-output/.",
"scenario_type": "happy_path"
},
{
"id": 3,
"prompt": "Make a demo video for my CLI tool. I don't have any screenshots but I can describe what it does.",
"expected_output": "Agent generates terminal-style HTML scenes with dark background and monospace font from the user's descriptions. Does not ask for screenshots. Produces demo-output/ with all artifacts.",
"scenario_type": "edge_case"
},
{
"id": 4,
"prompt": "I need a product demo video but I don't have ffmpeg or any MCP servers installed. Can you still help?",
"expected_output": "Agent acknowledges the constraint, produces HTML scene files + scenes.json + narration text files + build.sh with manual ffmpeg commands. Tells user how to install ffmpeg and run the script.",
"scenario_type": "edge_case"
}
]


@@ -0,0 +1,61 @@
# Scene Design System
Reference material for demo video scene design — colors, typography, animation timing, voice options, and pacing.
## Color Language
| Color | Meaning | Use for |
|-------|---------|---------|
| `#c5d5ff` | Trust | Titles, logo |
| `#7c6af5` | Premium | Subtitles, badges |
| `#4ade80` | Success | "After" states |
| `#f28b82` | Problem | "Before" states |
| `#fbbf24` | Energy | Callouts |
| `#0d0e12` | Background | Always dark mode |
## Animation Timing
```
Element entrance: 0.5-0.8s (cubic-bezier(0.16, 1, 0.3, 1))
Between elements: 0.2-0.4s gap
Scene transition: 0.3-0.5s crossfade
Hold after last anim: 1.0-2.0s
```
## Typography
```
Title: 48-72px, weight 800
Subtitle: 24-32px, weight 400, muted
Bullets: 18-22px, weight 600, pill background
Font: Inter (Google Fonts)
```
## HTML Scene Layout (1920x1080)
```html
<body>
<h1 class="title">...</h1> <!-- Top 15% -->
<div class="hero">...</div> <!-- Middle 65% -->
<div class="footer">...</div> <!-- Bottom 20% -->
</body>
```
Background: dark with subtle purple-blue glow gradients. Screenshots: always `border-radius: 12px` with `box-shadow`. Easing: always `cubic-bezier(0.16, 1, 0.3, 1)` — never `ease` or `linear`.
## Voice Options (edge-tts)
| Voice | Best for |
|-------|----------|
| `andrew` | Product demos, launches |
| `jenny` | Tutorials, onboarding |
| `davis` | Enterprise, security |
| `emma` | Consumer products |
## Pacing Guide
| Duration | Max words | Fill |
|----------|-----------|------|
| 3-4s | 8-12 | ~70% |
| 5-6s | 15-22 | ~75% |
| 7-8s | 22-30 | ~80% |
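
The pacing table above translates directly into a narration-length check — a sketch, with the word caps taken from the table:

```python
# Word-cap lookup derived from the pacing guide (3-4s: 12, 5-6s: 22, 7-8s: 30).
def max_words(duration_s: float) -> int:
    if duration_s <= 4:
        return 12
    if duration_s <= 6:
        return 22
    return 30

line = "24 tabs. One click. 5 groups."
print(len(line.split()), max_words(3.5))  # 6 words, under the 12-word cap for 3-4s
```

Running this over every entry in `scenes.json` catches narration that will not fit its scene before any audio is generated.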