From 1a5c4059bde610252d1973c4f97002f1368bb6b9 Mon Sep 17 00:00:00 2001
From: Gizzant <frankmiller@gmail.com>
Date: Mon, 16 Mar 2026 10:56:50 -0400
Subject: [PATCH] Update SKILL.md to final version (#305)

* Update SKILL.md to final version

* fix: restore analyze-project frontmatter

---------

Co-authored-by: sck_0 <samujackson1337@gmail.com>
---
 skills/analyze-project/SKILL.md | 456 +++++++++++++-------------------
 1 file changed, 186 insertions(+), 270 deletions(-)

diff --git a/skills/analyze-project/SKILL.md b/skills/analyze-project/SKILL.md
index dcf7031d..20e8daeb 100644
--- a/skills/analyze-project/SKILL.md
+++ b/skills/analyze-project/SKILL.md
@@ -7,99 +7,109 @@ tags: [analysis, diagnostics, meta, root-cause, project-health, session-review]
 
 # /analyze-project — Root Cause Analyst Workflow
 
-Analyze AI-assisted coding sessions in `brain/` and produce a diagnostic report that explains not just **what happened**, but **why it happened**, **who/what caused it**, and **what should change next time**.
+Analyze AI-assisted coding sessions in `~/.gemini/antigravity/brain/` and produce a report that explains not just **what happened**, but **why it happened**, **who/what caused it**, and **what should change next time**.
 
-This workflow is not a simple metrics dashboard.
-It is a forensic analysis workflow for AI coding sessions.
-
----
-
-## Primary Objective
+## Goal
 
 For each session, determine:
 
 1. What changed from the initial ask to the final executed work
-2. Whether the change was caused primarily by:
-   - the user/spec
-   - the agent
-   - the codebase/repo
-   - testing/verification
+2. Whether the main cause was:
+   - user/spec
+   - agent
+   - repo/codebase
+   - validation/testing
    - legitimate task complexity
-3. Whether the original prompt was sufficient for the actual job
-4. Which subsystems or files repeatedly correlate with struggle
-5. What concrete changes would most improve future sessions
+3. Whether the opening prompt was sufficient
+4. Which files/subsystems repeatedly correlate with struggle
+5. What changes would most improve future sessions
 
----
+## Global Rules
 
-## Core Principles
-
-- Treat `.resolved.N` counts as **signals of iteration intensity**, not proof of failure
-- Do not label struggle based on counts alone; classify the **shape** of rework
-- Separate **human-added scope** from **necessary discovered scope**
+- Treat `.resolved.N` counts as **iteration signals**, not proof of failure
+- Separate **human-added scope**, **necessary discovered scope**, and **agent-introduced scope**
 - Separate **agent error** from **repo friction**
-- Every diagnosis must include **evidence**
-- Every recommendation must map to a specific observed pattern
-- Use confidence levels:
-  - **High** = directly supported by artifact contents or timestamps
-  - **Medium** = supported by multiple indirect signals
+- Every diagnosis must include **evidence** and **confidence**
+- Confidence levels:
+  - **High** = direct artifact/timestamp evidence
+  - **Medium** = multiple supporting signals
   - **Low** = plausible inference, not directly proven
+- Evidence precedence:
+  - artifact contents > timestamps > metadata summaries > inference
+- If evidence is weak, say so
 
 ---
 
-## Step 1: Discovery — Find Relevant Conversations
+## Step 0.5: Session Intent Classification
 
-1. Read the conversation summaries available in the system context.
-2. List all subdirectories in:
-   `~/.gemini/antigravity/brain/
-3. Build a **Conversation Index** by cross-referencing summaries with UUID folders.
-4. Record for each conversation:
+Classify the primary session intent from objective + artifacts:
+
+- `DELIVERY`
+- `DEBUGGING`
+- `REFACTOR`
+- `RESEARCH`
+- `EXPLORATION`
+- `AUDIT_ANALYSIS`
+
+Record:
+- `session_intent`
+- `session_intent_confidence`
+
+Use intent to contextualize severity and rework shape.
+Do not judge exploratory or research sessions by the same standards as narrow delivery sessions.
+
+---
+
+## Step 1: Discover Conversations
+
+1. Read available conversation summaries from system context
+2. List conversation folders in the user’s Antigravity `brain/` directory
+3. Build a conversation index with:
    - `conversation_id`
    - `title`
    - `objective`
    - `created`
    - `last_modified`
-5. If the user supplied a keyword/path, filter on that. Otherwise analyze all workspace conversations.
+4. If the user supplied a keyword/path, filter to matching conversations; otherwise analyze all
 
-> Output: indexed list of conversations to analyze.
+Output: indexed list of conversations to analyze.
 
 ---
 
-## Step 2: Artifact Extraction — Build Session Evidence
+## Step 2: Extract Session Evidence
 
-For each conversation, read all structured artifacts that exist.
+For each conversation, read if present:
 
-### 2a. Core Artifacts
+### Core artifacts
 - `task.md`
 - `implementation_plan.md`
 - `walkthrough.md`
 
-### 2b. Metadata
+### Metadata
 - `*.metadata.json`
 
-### 2c. Version Snapshots
+### Version snapshots
 - `task.md.resolved.0 ... N`
 - `implementation_plan.md.resolved.0 ... N`
 - `walkthrough.md.resolved.0 ... N`
 
-### 2d. Additional Signals
+### Additional signals
 - other `.md` artifacts
-- report/evaluation files
 - timestamps across artifact updates
-- file/folder names mentioned in plans and walkthroughs
-- repeated subsystem references
-- explicit testing/validation language
-- explicit non-goals or constraints, if present
+- file/folder/subsystem names mentioned in plans/walkthroughs
+- validation/testing language
+- explicit acceptance criteria, constraints, non-goals, and file targets
 
-### 2e. Record Per Conversation
+Record per conversation:
 
-#### Presence / Lifecycle
+#### Lifecycle
 - `has_task`
 - `has_plan`
 - `has_walkthrough`
 - `is_completed`
-- `is_abandoned_candidate` = has task but no walkthrough
+- `is_abandoned_candidate` = task exists but no walkthrough
 
-#### Revision / Change Volume
+#### Revision / change volume
 - `task_versions`
 - `plan_versions`
 - `walkthrough_versions`
@@ -117,7 +127,7 @@ For each conversation, read all structured artifacts that exist.
 - `completed_at`
 - `duration_minutes`
 
-#### Content / Quality Signals
+#### Content / quality
 - `objective_text`
 - `initial_plan_summary`
 - `final_plan_summary`
@@ -134,81 +144,64 @@ For each conversation, read all structured artifacts that exist.
 
 ---
 
-## Step 3: Prompt Sufficiency Analysis
+## Step 3: Prompt Sufficiency
 
-For each conversation, score the opening objective/request on a 0–2 scale for each dimension:
+Score the opening request on a 0–2 scale for:
 
-- **Clarity** — is the ask understandable?
-- **Boundedness** — are scope limits defined?
-- **Testability** — are success conditions or acceptance criteria defined?
-- **Architectural specificity** — are files/modules/systems identified?
-- **Constraint awareness** — are non-goals, constraints, or environment details included?
-- **Dependency awareness** — does the prompt acknowledge affected systems or hidden coupling?
+- **Clarity**
+- **Boundedness**
+- **Testability**
+- **Architectural specificity**
+- **Constraint awareness**
+- **Dependency awareness**
 
 Create:
 - `prompt_sufficiency_score`
 - `prompt_sufficiency_band` = High / Medium / Low
 
-Then note which missing ingredients likely contributed to later friction.
+Then note which missing prompt ingredients likely contributed to later friction.
 
-Important:
-Do not assume a low-detail prompt is bad by default.
-Short prompts can still be good if the task is narrow and the repo context is obvious.
+Do not punish short prompts by default; a narrow, obvious task can still have high sufficiency.
 
 ---
 
 ## Step 4: Scope Change Classification
 
-Do not treat all scope growth as the same.
+Classify scope change into:
 
-For each conversation, classify scope delta into:
+- **Human-added scope** — new asks beyond the original task
+- **Necessary discovered scope** — work required to complete the original task correctly
+- **Agent-introduced scope** — likely unnecessary work introduced by the agent
 
-### 4a. Human-Added Scope
-New items clearly introduced beyond the initial ask.
-Examples:
-- optional enhancements
-- follow-on refactors
-- “while we are here” additions
-- cosmetic or adjacent work added later
-
-### 4b. Necessary Discovered Scope
-Work that was not in the opening ask but appears required to complete it correctly.
-Examples:
-- dependency fixes
-- required validation work
-- hidden integration tasks
-- migration fallout
-- coupled module updates
-
-### 4c. Agent-Introduced Scope
-Work that appears not requested and not necessary, likely introduced by agent overreach.
-
-For each conversation record:
+Record:
 - `scope_change_type_primary`
 - `scope_change_type_secondary` (optional)
 - `scope_change_confidence`
-- evidence for classification
+- evidence
+
+Keep one short example in mind for calibration:
+- Human-added: “also refactor nearby code while you’re here”
+- Necessary discovered: hidden dependency must be fixed for original task to work
+- Agent-introduced: extra cleanup or redesign not requested and not required
 
 ---
 
-## Step 5: Rework Shape Analysis
+## Step 5: Rework Shape
 
-Do not just count revisions. Determine the **shape** of session rework.
+Classify each session into one primary pattern:
 
-Classify each conversation into one of these patterns:
-
-- **Clean execution** — little change, smooth completion
-- **Early replan then stable finish** — plan changed early, then execution converged
-- **Progressive scope expansion** — work kept growing throughout the session
-- **Reopen/reclose churn** — repeated task adjustments/backtracking
-- **Late-stage verification churn** — implementation mostly done, but testing/validation caused loops
-- **Abandoned mid-flight** — work started but did not reach walkthrough
-- **Exploratory / research session** — iterations are high but expected due to problem discovery
+- **Clean execution**
+- **Early replan then stable finish**
+- **Progressive scope expansion**
+- **Reopen/reclose churn**
+- **Late-stage verification churn**
+- **Abandoned mid-flight**
+- **Exploratory / research session**
 
 Record:
 - `rework_shape`
 - `rework_shape_confidence`
-- supporting evidence
+- evidence
 
 ---
 
@@ -216,8 +209,8 @@ Record:
 
 For every non-clean session, assign:
 
-### 6a. Primary Root Cause
-Choose one:
+### Primary root cause
+One of:
 - `SPEC_AMBIGUITY`
 - `HUMAN_SCOPE_CHANGE`
 - `REPO_FRAGILITY`
@@ -225,46 +218,58 @@ Choose one:
 - `VERIFICATION_CHURN`
 - `LEGITIMATE_TASK_COMPLEXITY`
 
-### 6b. Secondary Root Cause
-Optional if a second factor materially contributed.
+### Secondary root cause
+Optional if materially relevant
 
-### 6c. Evidence Requirements
-Every root cause assignment must include:
-- evidence from artifacts or metadata
-- why competing causes were rejected
-- confidence level
+### Root-cause guidance
+- **SPEC_AMBIGUITY**: opening ask lacked boundaries, targets, criteria, or constraints
+- **HUMAN_SCOPE_CHANGE**: scope expanded because the user broadened the task
+- **REPO_FRAGILITY**: hidden coupling, brittle files, unclear architecture, or environment issues forced extra work
+- **AGENT_ARCHITECTURAL_ERROR**: wrong files, wrong assumptions, wrong approach, hallucinated structure
+- **VERIFICATION_CHURN**: implementation mostly worked, but testing/validation caused loops
+- **LEGITIMATE_TASK_COMPLEXITY**: revisions were expected for the difficulty and not clearly avoidable
 
-### 6d. Root Cause Heuristics
+Every root-cause assignment must include:
+- evidence
+- why stronger alternative causes were rejected
+- confidence
 
-#### SPEC_AMBIGUITY
-Use when the opening ask lacked boundaries, targets, criteria, or constraints, and the plan had to invent them.
+---
 
-#### HUMAN_SCOPE_CHANGE
-Use when the task set expanded due to new asks, broadened goals, or post-hoc additions.
+## Step 6.5: Session Severity Scoring (0–100)
 
-#### REPO_FRAGILITY
-Use when hidden coupling, unclear architecture, brittle files, or environmental issues forced extra work.
+Assign each session a severity score to prioritize attention.
 
-#### AGENT_ARCHITECTURAL_ERROR
-Use when the agent chose the wrong approach, wrong files, wrong assumptions, or hallucinated structure.
+Components (sum, clamp 0–100):
+- **Completion failure**: 0–25 (`abandoned = 25`)
+- **Replanning intensity**: 0–15
+- **Scope instability**: 0–15
+- **Rework shape severity**: 0–15
+- **Prompt sufficiency deficit**: 0–10 (`low = 10`)
+- **Root cause impact**: 0–10 (`REPO_FRAGILITY` / `AGENT_ARCHITECTURAL_ERROR` highest)
+- **Hotspot recurrence**: 0–10
 
-#### VERIFICATION_CHURN
-Use when implementation mostly succeeded but tests, validation, QA, or fixes created repeated loops.
+Bands:
+- **0–19 Low**
+- **20–39 Moderate**
+- **40–59 Significant**
+- **60–79 High**
+- **80–100 Critical**
 
-#### LEGITIMATE_TASK_COMPLEXITY
-Use when revisions were reasonable given the difficulty and do not strongly indicate avoidable failure.
+Record:
+- `session_severity_score`
+- `severity_band`
+- `severity_drivers` = top 2–4 contributors
+- `severity_confidence`
+
+Use severity as a prioritization signal, not a verdict. Always explain the drivers.
+Contextualize severity using session intent so research/exploration sessions are not over-penalized.
 
 ---
 
 ## Step 7: Subsystem / File Clustering
 
-Across all conversations, cluster repeated struggle by subsystem, folder, or file mentions.
-
-Examples:
-- `frontend/auth/*`
-- `db.py`
-- `ui.py`
-- `video_pipeline/*`
+Across all conversations, cluster repeated struggle by file, folder, or subsystem.
 
 For each cluster, calculate:
 - number of conversations touching it
@@ -272,18 +277,15 @@ For each cluster, calculate:
 - completion rate
 - abandonment rate
 - common root causes
+- average severity
 
-Output the top recurring friction zones.
-
-Goal:
-Identify whether struggle is prompt-driven, agent-driven, or concentrated in specific repo areas.
+Goal: identify whether friction is mostly prompt-driven, agent-driven, or concentrated in specific repo areas.
 
 ---
 
-## Step 8: Comparative Cohort Analysis
-
-Compare these cohorts:
+## Step 8: Comparative Cohorts
 
+Compare:
 - first-shot successes vs re-planned sessions
 - completed vs abandoned
 - high prompt sufficiency vs low prompt sufficiency
@@ -296,8 +298,7 @@ For each comparison, identify:
 - which prompt traits correlate with smoother execution
 - which repo traits correlate with repeated struggle
 
-Do not merely restate averages.
-Extract causal-looking patterns cautiously and label them as inference where appropriate.
+Do not just restate averages; extract cautious evidence-backed patterns.
 
 ---
 
@@ -305,38 +306,29 @@ Extract causal-looking patterns cautiously and label them as inference where app
 
 Generate 3–7 findings that are not simple metric restatements.
 
-Good examples:
-- “Most replans happen in sessions with weak file targeting, not weak acceptance criteria.”
-- “Scope growth usually begins after the first successful implementation, suggesting post-success human expansion.”
-- “Auth-related sessions cluster around repo fragility rather than agent hallucination.”
-- “Abandoned work is strongly associated with missing validation criteria.”
-
-Bad examples:
-- “Some sessions had many revisions.”
-- “Some sessions were longer than others.”
-
 Each finding must include:
 - observation
 - why it matters
 - evidence
 - confidence
 
+Examples of strong findings:
+- replans cluster around weak file targeting rather than weak acceptance criteria
+- scope growth often begins after initial success, suggesting post-success human expansion
+- auth-related struggle is driven more by repo fragility than agent hallucination
+
 ---
 
 ## Step 10: Report Generation
 
-Create `session_analysis_report.md` in the current conversation’s brain folder.
-
-Use this structure:
+Create `session_analysis_report.md` with this structure:
 
 # 📊 Session Analysis Report — [Project Name]
 
-**Generated**: [timestamp]
-**Conversations Analyzed**: [N]
+**Generated**: [timestamp]  
+**Conversations Analyzed**: [N]  
 **Date Range**: [earliest] → [latest]
 
----
-
 ## Executive Summary
 
 | Metric | Value | Rating |
@@ -346,91 +338,61 @@ Use this structure:
 | Avg Scope Growth | X% | 🟢/🟡/🔴 |
 | Replan Rate | X% | 🟢/🟡/🔴 |
 | Median Duration | Xm | — |
-| Avg Revision Intensity | X | 🟢/🟡/🔴 |
+| Avg Session Severity | X | 🟢/🟡/🔴 |
+| High-Severity Sessions | X / N | 🟢/🟡/🔴 |
 
-Then include a short narrative summary:
-- what is going well
-- what is breaking down
-- whether the main issue is prompt quality, repo fragility, or workflow discipline
+Thresholds:
+- First-shot: 🟢 >70 / 🟡 40–70 / 🔴 <40
+- Scope growth: 🟢 <15 / 🟡 15–40 / 🔴 >40
+- Replan rate: 🟢 <20 / 🟡 20–50 / 🔴 >50
 
----
+Avg severity guidance:
+- 🟢 <25
+- 🟡 25–50
+- 🔴 >50
+
+Note: avg severity is an aggregate health signal, not the same as per-session severity bands.
+
+Then add a short narrative summary of what is going well, what is breaking down, and whether the main issue is prompt quality, repo fragility, workflow discipline, or validation churn.
 
 ## Root Cause Breakdown
 
 | Root Cause | Count | % | Notes |
 |:---|:---|:---|:---|
-| Spec Ambiguity | X | X% | ... |
-| Human Scope Change | X | X% | ... |
-| Repo Fragility | X | X% | ... |
-| Agent Architectural Error | X | X% | ... |
-| Verification Churn | X | X% | ... |
-| Legitimate Task Complexity | X | X% | ... |
-
----
 
 ## Prompt Sufficiency Analysis
-
 - common traits of high-sufficiency prompts
 - common missing inputs in low-sufficiency prompts
 - which missing prompt ingredients correlate most with replanning or abandonment
 
----
-
 ## Scope Change Analysis
-
 Separate:
 - Human-added scope
 - Necessary discovered scope
 - Agent-introduced scope
 
-Show top offenders in each category.
-
----
-
 ## Rework Shape Analysis
-
-Summarize how sessions tend to fail:
-- early replan then recover
-- progressive scope expansion
-- late verification churn
-- abandonments
-- reopen/reclose cycles
-
----
+Summarize the main failure patterns across sessions.
 
 ## Friction Hotspots
-
-Cluster repeated struggle by subsystem/file/domain.
-Show which areas correlate with:
-- replanning
-- abandonment
-- verification churn
-- agent architectural mistakes
-
----
+Show the files/folders/subsystems most associated with replanning, abandonment, verification churn, and high severity.
 
 ## First-Shot Successes
-
-List the cleanest sessions and extract what made them work:
-- scope boundaries
-- acceptance criteria
-- file targeting
-- validation clarity
-- narrowness of change surface
-
----
+List the cleanest sessions and extract what made them work.
 
 ## Non-Obvious Findings
+List 3–7 evidence-backed findings with confidence.
 
-List 3–7 high-value findings with evidence and confidence.
-
----
+## Severity Triage
+List the highest-severity sessions and say whether the best intervention is:
+- prompt improvement
+- scope discipline
+- targeted skill/workflow
+- repo refactor / architecture cleanup
+- validation/test harness improvement
 
 ## Recommendations
-
-Each recommendation must use this format:
-
-### Recommendation [N]
+For each recommendation, use:
 - **Observed pattern**
 - **Likely cause**
 - **Evidence**
@@ -438,79 +400,33 @@ Each recommendation must use this format:
 - **Expected benefit**
 - **Confidence**
 
-Recommendations must be specific, not generic.
-
----
-
 ## Per-Conversation Breakdown
 
-| # | Title | Duration | Scope Δ | Plan Revs | Task Revs | Root Cause | Rework Shape | Complete? |
-|:---|:---|:---|:---|:---|:---|:---|:---|:---|
-
-Add short notes only where meaningful.
+| # | Title | Intent | Duration | Scope Δ | Plan Revs | Task Revs | Root Cause | Rework Shape | Severity | Complete? |
+|:---|:---|:---|:---|:---|:---|:---|:---|:---|:---|:---|
 
 ---
 
-## Step 11: Auto-Optimize — Improve Future Sessions
+## Step 11: Optional Post-Analysis Improvements
 
-### 11a. Update Project Health State
-# Example path (update to your actual location):
-# `~/.gemini/antigravity/.agent/skills/project-health-state/SKILL.md`
+If appropriate, also:
+- update any local project-health or memory artifact (if present) with recurring failure modes and fragile subsystems
+- generate `prompt_improvement_tips.md` from high-sufficiency / first-shot-success sessions
+- suggest missing skills or workflows when the same subsystem or task sequence repeatedly causes struggle
 
-Update:
-- session analysis metrics
-- recurring fragile files/subsystems
-- recurring failure modes
-- last updated timestamp
-
-### 11b. Generate Prompt Improvement Guidance
-Create `prompt_improvement_tips.md`
-
-Do not give generic advice.
-Instead extract:
-- traits of high-sufficiency prompts
-- examples of effective scope boundaries
-- examples of good acceptance criteria
-- examples of useful file targeting
-- common missing details that led to replans
-
-### 11c. Suggest Missing Skills / Workflows
-If multiple struggle sessions cluster around the same subsystem or repeated sequence, recommend:
-- a targeted skill
-- a repeatable workflow
-- a reusable prompt template
-- a repo note / architecture map
-
-Only recommend workflows when the pattern appears repeatedly.
+Only recommend workflows/skills when the pattern appears repeatedly.
 
 ---
 
 ## Final Output Standard
 
 The workflow must produce:
-1. A metrics summary
-2. A root-cause diagnosis
-3. A subsystem/friction map
-4. A prompt-sufficiency assessment
-5. Evidence-backed recommendations
-6. Non-obvious findings
+1. metrics summary
+2. root-cause diagnosis
+3. prompt-sufficiency assessment
+4. subsystem/friction map
+5. severity triage and prioritization
+6. evidence-backed recommendations
+7. non-obvious findings
 
-If evidence is weak, say so.
-Do not overclaim.
 Prefer explicit uncertainty over fake precision.
-
-
-
-
-
-
-
-
-**How to invoke this skill**  
-Just say any of these in a new conversation:
-- “Run analyze-project on the workspace”
-- “Do a full session analysis report”
-- “Root cause my recent brain/ sessions”
-- “Update project health state”
-
-The agent will automatically discover and use the skill.