Update skill docs and resources

This commit is contained in:
daymade
2026-02-23 16:16:58 +08:00
parent c1cfacaf76
commit 72d879e609
15 changed files with 1430 additions and 89 deletions

# Analysis Dimensions
Detailed definitions for each audit dimension. Agents should use these as exploration guides.
## Dimension 1: Frontend Navigation & Information Density
**Goal**: Quantify cognitive load for a new user.
**Key questions**:
1. How many top-level components does App.tsx mount simultaneously?
2. How many tabs/sections exist in each sidebar panel?
3. Which features have multiple entry points (duplicate navigation)?
4. What is the total count of interactive elements on first screen?
5. Are there panels/drawers that overlap in functionality?
**Exploration targets**:
- Main app entry (App.tsx or equivalent)
- Left sidebar / navigation components
- Right sidebar / inspector panels
- Floating panels, drawers, modals
- Settings / configuration panels
- Control center / dashboard panels
**Output format**:
```
| Component | Location | Interactive Elements | Overlaps With |
|-----------|----------|---------------------|----------------|
```
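One way to get a first-pass count of interactive elements is a grep over the entry component. This is a rough sketch only; the file path and the element list are illustrative (a sample `App.tsx` is created inline so the command runs standalone), and real audits should account for conditional rendering.

```shell
# Sample entry component inlined for illustration; point at the real App.tsx in practice.
cat > /tmp/App.tsx <<'EOF'
<button onClick={openPanel}>Open</button>
<Tab label="Files" />
<button onClick={closePanel}>Close</button>
EOF
# Count lines containing common interactive JSX elements (crude lower bound).
grep -c -E '<(button|Tab|input|select)' /tmp/App.tsx
```

The count is per-line, so stacked elements on one line undercount; treat it as a starting estimate, not the audit result.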
## Dimension 2: User Journey & Empty State
**Goal**: Evaluate time-to-first-value for a new user.
**Key questions**:
1. What does a user see when they have no data/sessions/projects?
2. How many steps from launch to first successful action?
3. Is there an onboarding flow? How many steps?
4. How many clickable elements compete for attention in the empty state?
5. Are high-frequency actions visually prioritized over low-frequency ones?
**Exploration targets**:
- Empty state components
- Onboarding dialogs/wizards
- Prompt input area and surrounding controls
- Quick start templates / suggested actions
- Mobile-specific navigation and input
**Output format**:
```
Step N: [Action] → [What user sees] → [Next possible actions: count]
```
## Dimension 3: Backend API Surface
**Goal**: Identify API bloat, inconsistency, and unused endpoints.
**Key questions**:
1. How many total API endpoints exist?
2. Which endpoints have no corresponding frontend call?
3. Are error responses consistent across all endpoints?
4. Is authentication applied consistently?
5. Are there duplicate endpoints serving similar purposes?
**Exploration targets**:
- Router files (API route definitions)
- Frontend API client / fetch calls
- Error handling middleware
- Authentication middleware
- API documentation / OpenAPI spec
**Output format**:
```
| Method | Path | Purpose | Has Frontend Consumer | Auth Required |
|--------|------|---------|----------------------|---------------|
```
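A quick endpoint inventory can seed the table above. The sketch below assumes an Express-style router; the route file and handler names are made up, and the sample is inlined so the command runs standalone.

```shell
# Sample router file inlined for illustration; scan real router files in practice.
cat > /tmp/routes.ts <<'EOF'
router.get('/api/sessions', listSessions)
router.post('/api/sessions', createSession)
router.get('/api/health', health)
EOF
# Extract method + path pairs; cross-reference the result against frontend fetch calls.
grep -oE "router\.(get|post|put|delete)\('[^']+'" /tmp/routes.ts | sort
```

Diffing this list against the paths found in the frontend API client is a cheap way to surface endpoints with no consumer.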
## Dimension 4: Architecture & Module Structure
**Goal**: Identify coupling, duplication, and dead code.
**Key questions**:
1. Which modules have circular dependencies?
2. Where is the same pattern duplicated across 3+ files?
3. Which modules have unclear single responsibility?
4. Are there unused exports or dead code paths?
5. How deep is the import chain for core operations?
**Exploration targets**:
- Module `__init__.py` / `index.ts` files
- Import graphs (who imports whom)
- Utility files and shared helpers
- Configuration and factory patterns
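For circular dependencies, dedicated tools (e.g. `madge --circular` for TS/JS) are more reliable, but a crude import inventory already exposes back-references. The two-file cycle below is a fabricated minimal example, inlined so the commands run standalone.

```shell
# Two sample modules that import each other (a -> b and b -> a: a cycle).
mkdir -p /tmp/mods
printf 'import { b } from "./b"\n' > /tmp/mods/a.ts
printf 'import { a } from "./a"\n' > /tmp/mods/b.ts
# List which module each file imports; mutual references suggest a circular dependency.
grep -H 'from' /tmp/mods/a.ts /tmp/mods/b.ts
```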
## Dimension 5: Documentation & Config Consistency
**Goal**: Find gaps between claims and reality.
**Key questions**:
1. Does README list features that don't exist in code?
2. Are config file defaults consistent with code defaults?
3. Is there documentation for removed/renamed features?
4. Which modules have zero test coverage?
5. Are there TODO/FIXME/HACK comments in production code?
**Exploration targets**:
- README.md, CLAUDE.md, CONTRIBUTING.md
- Config files (YAML, JSON, .env)
- Test directories (coverage gaps)
- Source code comments (TODO/FIXME/HACK)
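The TODO/FIXME/HACK sweep is mechanical. A minimal sketch, with a sample file inlined so it runs standalone; in practice point the grep at the source tree:

```shell
# Sample source file inlined for illustration.
cat > /tmp/sample.py <<'EOF'
# TODO: handle timeout
x = 1
# FIXME: race condition here
EOF
# Report marker comments with file and line number (use -r on a real src/ directory).
grep -n -E 'TODO|FIXME|HACK' /tmp/sample.py
```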

View File

@@ -0,0 +1,82 @@
# Codex CLI Integration Patterns
How to use OpenAI Codex CLI for cross-model parallel analysis.
## Basic Invocation
```bash
codex -m o4-mini \
-c model_reasoning_effort="high" \
--full-auto \
"Your analysis prompt here"
```
## Flag Reference
| Flag | Purpose | Values |
|------|---------|--------|
| `-m` | Model selection | `o4-mini` (fast), `gpt-5.3-codex-spark` (deep) |
| `-c model_reasoning_effort` | Reasoning depth | `low`, `medium`, `high`, `xhigh` |
| `-c model_reasoning_summary_format` | Summary format | `experimental` (structured output) |
| `--full-auto` | Skip all approval prompts | (no value) |
| `--dangerously-bypass-approvals-and-sandbox` | Legacy full-auto flag | (no value, older versions) |
## Recommended Configurations
### Fast Scan (quick validation)
```bash
codex -m o4-mini \
-c model_reasoning_effort="medium" \
--full-auto \
"prompt"
```
### Deep Analysis (thorough investigation)
```bash
codex -m o4-mini \
-c model_reasoning_effort="xhigh" \
-c model_reasoning_summary_format="experimental" \
--full-auto \
"prompt"
```
## Parallel Execution Pattern
Launch multiple Codex analyses in the background using the Bash tool with `run_in_background: true`:
```bash
# Dimension 1: Frontend
codex -m o4-mini -c model_reasoning_effort="high" --full-auto \
"Analyze frontend navigation: count interactive elements, find duplicate entry points, assess cognitive load for new users. Give file paths and counts."
# Dimension 2: User Journey
codex -m o4-mini -c model_reasoning_effort="high" --full-auto \
"Analyze new user experience: what does empty state show? How many steps to first action? Count clickable elements competing for attention. Give file paths."
# Dimension 3: Backend API
codex -m o4-mini -c model_reasoning_effort="high" --full-auto \
"List all API endpoints. Identify unused endpoints with no frontend consumer. Check error handling consistency. Give router file paths."
```
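Outside the Bash tool, plain shell job control achieves the same fan-out. In the sketch below `codex` is stubbed with a function so the control flow runs standalone; replace the stub with real `codex` invocations and prompts.

```shell
# Stub standing in for the real codex CLI (assumption: replace with actual calls).
codex() { echo "analysis result for: $*"; }
# Run each dimension concurrently, capturing output to a per-dimension log.
codex "frontend prompt" > /tmp/dim1.log 2>&1 &
codex "journey prompt"  > /tmp/dim2.log 2>&1 &
codex "api prompt"      > /tmp/dim3.log 2>&1 &
wait  # block until all background analyses finish
cat /tmp/dim1.log /tmp/dim2.log /tmp/dim3.log
```

Writing each run to its own log keeps the unstructured outputs separable for the synthesis step.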
## Output Handling
Codex writes its results to stdout. When run in the background:
1. Launch with the Bash tool's `run_in_background: true`
2. Retrieve the output with `TaskOutput` when the run completes
3. Parse the text output for findings
## Cross-Model Value
The primary value of Codex in this workflow is **independent perspective**:
- Different training data may surface different patterns
- Different reasoning approach may catch what Claude misses
- Agreement across models = high confidence
- Disagreement = worth investigating manually
## Limitations
- Codex CLI must be installed and configured (`codex` command available)
- Requires OpenAI API key configured
- No MCP server access (only filesystem tools)
- Output is unstructured text (needs parsing)
- Rate limits apply per OpenAI account

# Synthesis Methodology
How to weight, merge, and validate findings from multiple parallel agents.
## Multi-Agent Synthesis Framework
### Step 1: Collect Raw Findings
Wait for all agents to complete. For each agent, extract:
- **Quantitative data**: counts, measurements, lists
- **Qualitative assessments**: good/bad/unclear judgments
- **Evidence**: file paths, line numbers, code snippets
### Step 2: Cross-Validation Matrix
Create a matrix comparing findings across agents:
```
| Finding | Agent A | Agent B | Codex | Confidence |
|---------|---------|---------|-------|------------|
| "57 interactive elements on first screen" | 57 | 54 | 61 | HIGH (3/3 agree on magnitude) |
| "Skills has 3 entry points" | 3 | 3 | 2 | HIGH (2/3 exact match) |
| "Risk pages should be removed" | Yes | - | No | LOW (disagreement, investigate) |
```
**Confidence levels**:
- **HIGH**: 2+ agents agree (exact or same magnitude)
- **MEDIUM**: 1 agent found, others didn't look
- **LOW**: Agents disagree — requires manual investigation
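Agreement tallies for the matrix can be mechanized once findings are logged one per line as `finding,agent`. The file name and sample rows below are fabricated for illustration; the data is inlined so the pipeline runs standalone.

```shell
# Sample findings log: one "finding,agent" row per report (hypothetical data).
cat > /tmp/findings.csv <<'EOF'
57-elements,agentA
57-elements,agentB
57-elements,codex
3-entry-points,agentA
EOF
# Count how many agents reported each finding; 2+ maps to HIGH, 1 to MEDIUM.
cut -d, -f1 /tmp/findings.csv | sort | uniq -c | sort -rn
```

This only counts co-occurrence; disagreements on the *value* of a finding still need the manual resolution steps below.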
### Step 3: Disagreement Resolution
When agents disagree:
1. Check if they analyzed different files/scopes
2. Check if one agent missed context (e.g., conditional rendering)
3. If genuine disagreement, note both perspectives in report
4. Codex-only findings are "different model perspective" — valuable but need validation
### Step 4: Priority Assignment
**P0 (Critical)**: Issues that prevent a new user from completing basic tasks
- Examples: broken onboarding, missing error messages, dead navigation links
**P1 (High)**: Issues that significantly increase cognitive load or confusion
- Examples: duplicate entry points, information overload, unclear primary action
**P2 (Medium)**: Issues worth addressing but not blocking launch
- Examples: unused API endpoints, minor inconsistencies, missing edge case handling
### Step 5: Report Generation
Structure the report for actionability:
1. **Executive Summary** (2-3 sentences, the "so what")
2. **Quantified Metrics** (hard numbers, no adjectives)
3. **P0 Issues** (with specific file:line references)
4. **P1 Issues** (with suggested fixes)
5. **P2 Issues** (backlog items)
6. **Cross-Model Insights** (findings unique to one model)
7. **Competitive Position** (if compare scope was used)
## Weighting Rules
- Quantitative findings (counts, measurements) > Qualitative judgments
- Code-evidenced findings > Assumption-based findings
- Multi-agent agreement > Single-agent finding
- User-facing issues > Internal code quality issues
- Findings with clear fix path > Vague "should improve" suggestions