docs: translate daily-news-report skill and update registry files
@@ -200,7 +200,7 @@ The repository is organized into several key areas of expertise:
| **crewai** | ⚪ | Expert in CrewAI - the leading role-based multi-agent framework used by 60% of Fortune 500 companies. Covers agent design with roles and goals, task definition, crew orchestration, process types (sequential, hierarchical, parallel), memory systems, and flows for complex workflows. Essential for building collaborative AI agent teams. Use when: crewai, multi-agent team, agent roles, crew of agents, role-based agents. | `skills/crewai` |
| **Cross-Site Scripting and HTML Injection Testing** | ⚪ | This skill should be used when the user asks to "test for XSS vulnerabilities", "perform cross-site scripting attacks", "identify HTML injection flaws", "exploit client-side injection vulnerabilities", "steal cookies via XSS", or "bypass content security policies". It provides comprehensive techniques for detecting, exploiting, and understanding XSS and HTML injection attack vectors in web applications. | `skills/xss-html-injection` |
| **d3-viz** | ⚪ | Creating interactive data visualisations using d3.js. This skill should be used when creating custom charts, graphs, network diagrams, geographic visualisations, or any complex SVG-based data visualisation that requires fine-grained control over visual elements, transitions, or interactions. Use this for bespoke visualisations beyond standard charting libraries, whether in React, Vue, Svelte, vanilla JavaScript, or any other environment. | `skills/claude-d3js-skill` |
| **daily-news-report** | ⚪ | Scrapes content based on a preset URL list, filters high-quality technical information, and generates daily Markdown reports. | `skills/daily-news-report` |
| **database-design** | ⚪ | Database design principles and decision-making. Schema design, indexing strategy, ORM selection, serverless databases. | `skills/database-design` |
| **deployment-procedures** | ⚪ | Production deployment principles and decision-making. Safe deployment workflows, rollback strategies, and verification. Teaches thinking, not scripts. | `skills/deployment-procedures` |
| **design-orchestration** | ⚪ | Orchestrates design workflows by routing work through brainstorming, multi-agent review, and execution readiness in the correct order. Prevents premature implementation, skipped validation, and unreviewed high-risk designs. | `skills/design-orchestration` |

@@ -1,7 +1,7 @@
---
name: daily-news-report
description: Scrapes content based on a preset URL list, filters high-quality technical information, and generates daily Markdown reports.
argument-hint: [optional: date]
disable-model-invocation: false
user-invocable: true
allowed-tools: Task, WebFetch, Read, Write, Bash(mkdir*), Bash(date*), Bash(ls*), mcp__chrome-devtools__*

@@ -9,93 +9,93 @@ allowed-tools: Task, WebFetch, Read, Write, Bash(mkdir*), Bash(date*), Bash(ls*)

# Daily News Report v3.0

> **Architecture Upgrade**: Main Agent Orchestration + SubAgent Execution + Browser Scraping + Smart Caching

## Core Architecture

```
┌───────────────────────────────────────────────────────────────────────┐
│                       Main Agent (Orchestrator)                       │
│    Role: Scheduling, Monitoring, Evaluation, Decision, Aggregation    │
├───────────────────────────────────────────────────────────────────────┤
│                                                                       │
│ ┌─────────────┐   ┌─────────────┐   ┌─────────────┐   ┌─────────────┐ │
│ │   1. Init   │ → │ 2. Dispatch │ → │ 3. Monitor  │ → │ 4. Evaluate │ │
│ │ Read Config │   │ Assign Tasks│   │ Collect Res │   │ Filter/Sort │ │
│ └─────────────┘   └─────────────┘   └─────────────┘   └─────────────┘ │
│        │                 │                 │                 │        │
│        ▼                 ▼                 ▼                 ▼        │
│ ┌─────────────┐   ┌─────────────┐   ┌─────────────┐   ┌─────────────┐ │
│ │ 5. Decision │ ← │ Enough 20?  │   │ 6. Generate │ → │ 7. Update   │ │
│ │ Cont/Stop   │   │     Y/N     │   │ Report File │   │ Cache Stats │ │
│ └─────────────┘   └─────────────┘   └─────────────┘   └─────────────┘ │
│                                                                       │
└───────────────────────────────────────────────────────────────────────┘
                   ↓ Dispatch              ↑ Return Results
┌───────────────────────────────────────────────────────────────────────┐
│                       SubAgent Execution Layer                        │
├───────────────────────────────────────────────────────────────────────┤
│                                                                       │
│     ┌─────────────┐        ┌─────────────┐        ┌─────────────┐     │
│     │  Worker A   │        │  Worker B   │        │   Browser   │     │
│     │ (WebFetch)  │        │ (WebFetch)  │        │ (Headless)  │     │
│     │ Tier1 Batch │        │ Tier2 Batch │        │  JS Render  │     │
│     └─────────────┘        └─────────────┘        └─────────────┘     │
│            ↓                      ↓                      ↓            │
│    ┌─────────────────────────────────────────────────────────────┐    │
│    │                  Structured Result Return                   │    │
│    │   { status, data: [...], errors: [...], metadata: {...} }   │    │
│    └─────────────────────────────────────────────────────────────┘    │
│                                                                       │
└───────────────────────────────────────────────────────────────────────┘
```

## Configuration Files

This skill uses the following configuration files:

| File | Purpose |
|------|---------|
| `sources.json` | Source configuration, priorities, scrape methods |
| `cache.json` | Cached data, historical stats, deduplication fingerprints |

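As a rough sketch of how the orchestrator might read these files, the snippet below assumes one plausible shape for a `sources.json` entry; the tiers, scrape methods, priorities, and disabled flag are all referenced elsewhere in this skill, but the exact field names here are illustrative assumptions, not a documented schema.

```python
import json
from pathlib import Path

# Hypothetical sources.json entry. The tier, scrape method, priority, and
# disabled flag appear elsewhere in this skill; the field names themselves
# are assumptions for illustration.
EXAMPLE_SOURCE = {
    "id": "hackernews",
    "url": "https://news.ycombinator.com",
    "tier": 1,               # Tier1 sources are scraped first
    "method": "webfetch",    # or "browser" for JS-rendered pages
    "priority": 5,           # breaks ties when sorting equal scores
    "disabled": False,       # flipped to True after persistent failures
}

def load_config(skill_dir: Path) -> tuple[list[dict], dict]:
    """Read sources.json and cache.json, tolerating a missing cache file."""
    sources = json.loads((skill_dir / "sources.json").read_text())
    cache_path = skill_dir / "cache.json"
    cache = json.loads(cache_path.read_text()) if cache_path.exists() else {}
    return sources, cache
```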
## Execution Process Details

### Phase 1: Initialization

```yaml
Steps:
  1. Determine the date (user argument or current date)
  2. Read sources.json for source configurations
  3. Read cache.json for historical data
  4. Create the output directory NewsReport/
  5. Check whether a partial report already exists for today (append mode)
```

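As a minimal sketch, Phase 1 reduces to a few lines of Python; `date_arg` stands in for the optional date argument from the frontmatter, and the helper name is hypothetical.

```python
from datetime import date
from pathlib import Path

def init_run(date_arg: str | None, out_dir: str = "NewsReport") -> tuple[str, Path, bool]:
    """Resolve the report date, ensure the output directory, and detect append mode."""
    report_date = date_arg or date.today().isoformat()  # step 1: user argument or today
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)              # step 4: create NewsReport/
    report_path = out / f"{report_date}-news-report.md"
    append_mode = report_path.exists()                  # step 5: partial report present?
    return report_date, report_path, append_mode
```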
### Phase 2: Dispatch SubAgents

**Strategy**: Parallel dispatch, batched execution, early stopping

```yaml
Wave 1 (Parallel):
  - Worker A: Tier1 Batch A (HN, HuggingFace Papers)
  - Worker B: Tier1 Batch B (OneUsefulThing, Paul Graham)

Wait for results → Evaluate count

If < 15 high-quality items:
  Wave 2 (Parallel):
    - Worker C: Tier2 Batch A (James Clear, FS Blog)
    - Worker D: Tier2 Batch B (HackerNoon, Scott Young)

If still < 20 items:
  Wave 3 (Browser):
    - Browser Worker: ProductHunt, Latent Space (requires JS rendering)
```

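Read as control flow, the wave strategy looks like the sketch below; `dispatch_wave` is a hypothetical stand-in for launching one wave's SubAgents in parallel, and treating `quality_score >= 4` as "high-quality" is an assumed threshold.

```python
def run_waves(waves: list[list[str]], dispatch_wave) -> list[dict]:
    """Dispatch waves in order with the early-stop checks from the YAML above.

    waves[0]: Tier1 batches, waves[1]: Tier2 batches, waves[2]: browser sources.
    """
    items: list[dict] = []
    for i, wave in enumerate(waves):
        items += dispatch_wave(wave)  # runs this wave's workers in parallel
        high_quality = [it for it in items if it["quality_score"] >= 4]
        if i == 0 and len(high_quality) >= 15:
            break  # Wave 2 only runs if fewer than 15 high-quality items
        if i == 1 and len(items) >= 20:
            break  # Wave 3 only runs if still fewer than 20 items
    return items
```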
### Phase 3: SubAgent Task Format

Each SubAgent receives a task in the following format:

```yaml
task: fetch_and_extract
@@ -109,127 +109,127 @@ sources:

output_schema:
  items:
    - source_id: string       # source identifier
      title: string           # title
      summary: string         # 2-4 sentence summary
      key_points: string[]    # max 3 key points
      url: string             # original URL
      keywords: string[]      # keywords
      quality_score: 1-5      # quality score

constraints:
  filter: "Cutting-edge Tech/Deep Tech/Productivity/Practical Info"
  exclude: "General Science/Marketing Fluff/Overly Academic/Job Posts"
  max_items_per_source: 10
  skip_on_error: true

return_format: JSON
```

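On the consuming side, the schema above can be pinned down as types; the sketch below uses `TypedDict`, which is a choice made for illustration rather than anything this skill prescribes.

```python
from typing import Literal, TypedDict

class NewsItem(TypedDict):
    source_id: str          # source identifier
    title: str
    summary: str            # 2-4 sentence summary
    key_points: list[str]   # max 3 key points
    url: str                # original URL
    keywords: list[str]
    quality_score: int      # 1-5

class WorkerResult(TypedDict):
    """Shape of the structured result each SubAgent returns."""
    status: Literal["success", "partial", "failed"]
    data: list[NewsItem]
    errors: list[str]
    metadata: dict          # e.g. {"processed": 2, "failed": 0}
```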
### Phase 4: Main Agent Monitoring & Feedback

Main Agent responsibilities:

```yaml
Monitoring:
  - Check SubAgent return status (success/partial/failed)
  - Count collected items
  - Record the success rate per source

Feedback Loop:
  - If a SubAgent fails, decide whether to retry or skip
  - If a source fails persistently, mark it as disabled
  - Dynamically adjust source selection for subsequent batches

Decision:
  - Items >= 25 AND high-quality >= 20 → stop scraping
  - Items < 15 → continue with the next batch
  - All batches done but < 20 → generate with available content (quality over quantity)
```

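These rules collapse into a small decision function; a sketch follows, noting that the middle ground (15-24 items with batches still remaining) is not pinned down by the YAML above, so continuing in that case is an assumption.

```python
def next_action(items: int, high_quality: int, batches_remaining: int) -> str:
    """Apply the Main Agent's stopping rules from the Decision block above."""
    if items >= 25 and high_quality >= 20:
        return "stop"        # enough material collected
    if batches_remaining == 0:
        return "generate"    # quality over quantity: ship what we have
    if items < 15:
        return "continue"    # dispatch the next batch
    return "continue"        # assumed: keep going until a threshold is met
```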
### Phase 5: Evaluation & Filtering

```yaml
Deduplication:
  - Exact URL match
  - Title similarity (>80% is considered a duplicate)
  - Check cache.json to avoid duplicating historical items

Score Calibration:
  - Unify scoring standards across SubAgents
  - Adjust weights based on source credibility
  - Bonus points for manually curated high-quality sources

Sorting:
  - Descending by quality_score
  - Break ties by source priority
  - Take the Top 20
```

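A sketch of deduplication plus ranking: `difflib.SequenceMatcher` implements the >80% title-similarity check here, which is one reasonable metric rather than the skill's mandated one, and `seen_urls` would be seeded from cache.json.

```python
from difflib import SequenceMatcher

def dedup_and_rank(items: list[dict], seen_urls: set[str], top_n: int = 20) -> list[dict]:
    """Drop exact-URL and near-duplicate-title items, then rank the rest."""
    kept: list[dict] = []
    for item in items:
        if item["url"] in seen_urls:  # exact URL match (incl. history from cache.json)
            continue
        title = item["title"].lower()
        if any(SequenceMatcher(None, title, k["title"].lower()).ratio() > 0.8
               for k in kept):        # >80% title similarity counts as duplicate
            continue
        seen_urls.add(item["url"])
        kept.append(item)
    # Descending quality_score; equal scores break ties by source priority.
    kept.sort(key=lambda it: (it["quality_score"], it.get("priority", 0)), reverse=True)
    return kept[:top_n]               # take the Top 20
```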
### Phase 6: Browser Scraping (MCP Chrome DevTools)

For pages that require JS rendering, use a headless browser:

```yaml
Process:
  1. Call mcp__chrome-devtools__new_page to open the page
  2. Call mcp__chrome-devtools__wait_for to wait for the content to load
  3. Call mcp__chrome-devtools__take_snapshot to capture the page structure
  4. Parse the snapshot to extract the required content
  5. Call mcp__chrome-devtools__close_page to close the page

Applicable Scenarios:
  - ProductHunt (403 on WebFetch)
  - Latent Space (Substack, JS rendering)
  - Other SPA applications
```

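A sketch of the browser path, routed through a hypothetical `mcp_call` dispatcher so as not to guess at the real tool signatures; only the tool names come from this skill, and the argument shapes are assumptions.

```python
def scrape_with_browser(url: str, mcp_call) -> str:
    """Headless-browser fetch for JS-rendered pages, following the steps above.

    `mcp_call(tool_name, args)` is a hypothetical dispatcher; argument shapes
    are assumptions, only the tool names come from this skill.
    """
    page = mcp_call("mcp__chrome-devtools__new_page", {"url": url})             # 1. open
    mcp_call("mcp__chrome-devtools__wait_for", {"page": page})                  # 2. wait for load
    snapshot = mcp_call("mcp__chrome-devtools__take_snapshot", {"page": page})  # 3. page structure
    mcp_call("mcp__chrome-devtools__close_page", {"page": page})                # 5. close
    return snapshot  # 4. parsing/extraction happens on this snapshot
```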
### Phase 7: Generate Report

```yaml
Output:
  - Directory: NewsReport/
  - Filename: YYYY-MM-DD-news-report.md
  - Format: standard Markdown

Content Structure:
  - Title + date
  - Statistical summary (source count, items collected)
  - 20 high-quality items (per the template)
  - Generation info (version, timestamp)
```

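A sketch of assembling and writing the report; `render_item` is a hypothetical per-item formatter matching the output template later in this document.

```python
from pathlib import Path

def write_report(report_path: Path, report_date: str, items: list[dict],
                 source_count: int, render_item) -> None:
    """Assemble the Markdown report: title, stats summary, items, generation info."""
    lines = [
        f"# Daily News Report ({report_date})",
        "",
        f"> Curated from {source_count} sources today; {len(items)} high-quality items included",
        "> Version: v3.0",
        "",
        "---",
    ]
    for i, item in enumerate(items, start=1):
        lines.append(render_item(i, item))  # one templated block per item
        lines.append("---")
    report_path.write_text("\n".join(lines) + "\n", encoding="utf-8")
```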
### Phase 8: Update Cache

```yaml
Update cache.json:
  - last_run: record info about this run
  - source_stats: update per-source statistics
  - url_cache: add processed URLs
  - content_hashes: add content fingerprints
  - article_history: record the included articles
```

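A sketch of the cache write, with SHA-256 over the normalized title plus URL standing in for the content fingerprint; the hashing scheme and field layouts are assumptions built from the field names above.

```python
import hashlib
import json
from datetime import datetime, timezone
from pathlib import Path

def fingerprint(item: dict) -> str:
    """Assumed fingerprint scheme: SHA-256 over normalized title + URL."""
    raw = f"{item['title'].strip().lower()}|{item['url']}"
    return hashlib.sha256(raw.encode("utf-8")).hexdigest()

def update_cache(cache_path: Path, cache: dict, items: list[dict]) -> None:
    """Persist run info, processed URLs, fingerprints, and article history."""
    cache["last_run"] = {"at": datetime.now(timezone.utc).isoformat(),
                         "items": len(items)}
    cache.setdefault("url_cache", []).extend(it["url"] for it in items)
    cache.setdefault("content_hashes", []).extend(fingerprint(it) for it in items)
    cache.setdefault("article_history", []).extend(
        {"title": it["title"], "url": it["url"]} for it in items)
    # source_stats updates (success rate per source) omitted for brevity.
    cache_path.write_text(json.dumps(cache, ensure_ascii=False, indent=2))
```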
## SubAgent Call Examples

### Using the general-purpose Agent

Since custom agents require a session restart to be discovered, use general-purpose and inject the worker prompt:

```
Task Call:
  subagent_type: general-purpose
  model: haiku
  prompt: |
    You are a stateless execution unit. Do only the assigned task and return structured JSON.

    Task: Scrape the following URLs and extract content

    URLs:
    - https://news.ycombinator.com (extract Top 10)
    - https://huggingface.co/papers (extract top-voted papers)

    Output Format:
    {
      "status": "success" | "partial" | "failed",
      "data": [
@@ -247,17 +247,17 @@ Task 调用:
      "metadata": { "processed": 2, "failed": 0 }
    }

    Filter Criteria:
    - Keep: Cutting-edge Tech/Deep Tech/Productivity/Practical Info
    - Exclude: General Science/Marketing Fluff/Overly Academic/Job Posts

    Return JSON directly, no explanation.
```

### Using the worker Agent (Requires a session restart)

```
Task Call:
  subagent_type: worker
  prompt: |
    task: fetch_and_extract
@@ -274,37 +274,36 @@ Task 调用:
      - keywords: string[]
      - quality_score: 1-5
    constraints:
      filter: Cutting-edge Tech/Deep Tech/Productivity/Practical Info
      exclude: General Science/Marketing Fluff/Overly Academic
```

## Output Template

```markdown
# Daily News Report (YYYY-MM-DD)

> Curated from N sources today; 20 high-quality items included
> Generation Time: X min | Version: v3.0
>
> **Warning**: Sub-agent 'worker' not detected. Running in generic mode (serial execution). Performance may be degraded.

---

## 1. Title

- **Summary**: 2-4 line overview
- **Key Points**:
  1. Point one
  2. Point two
  3. Point three
- **Source**: [Link](URL)
- **Keywords**: `keyword1` `keyword2` `keyword3`
- **Score**: ⭐⭐⭐⭐⭐ (5/5)

---

## 2. Title
...

---
@@ -313,45 +312,45 @@ Task 调用:
*Sources: HN, HuggingFace, OneUsefulThing, ...*
```

## Constraints & Principles

1. **Quality over Quantity**: Low-quality content does not enter the report.
2. **Early Stopping**: Stop scraping once 20 high-quality items are reached.
3. **Parallel First**: SubAgents in the same batch execute in parallel.
4. **Fault Tolerance**: The failure of a single source does not affect the overall process.
5. **Cache Reuse**: Avoid re-scraping the same content.
6. **Main Agent Control**: All decisions are made by the Main Agent.
7. **Fallback Awareness**: Detect sub-agent availability and degrade gracefully when unavailable.

## Expected Performance

| Scenario | Expected Time | Notes |
|----------|---------------|-------|
| Optimal | ~2 min | Tier1 is sufficient; no browser needed |
| Normal | ~3-4 min | Tier2 supplement required |
| Browser needed | ~5-6 min | Includes JS-rendered pages |

## Error Handling

| Error Type | Handling |
|------------|----------|
| SubAgent timeout | Log the error, continue with the next |
| Source 403/404 | Mark as disabled, update sources.json |
| Extraction failed | Return raw content; the Main Agent decides |
| Browser crash | Skip the source, log the failure |

## Compatibility & Fallback

To ensure usability across different Agent environments, the following checks must be performed:

1. **Environment Check**:
   - During Phase 1 initialization, attempt to detect whether the `worker` sub-agent exists.
   - If it does not exist (or the plugin is not installed), automatically switch to **Serial Execution Mode**.

2. **Serial Execution Mode** (see the sketch after this list):
   - Do not use parallel dispatch.
   - The Main Agent executes each source's scraping task sequentially.
   - Slower, but basic functionality remains available.

3. **User Alert**:
   - MUST include a clear warning in the header (blockquote section) of the generated report indicating that it is running in degraded mode.
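A sketch of that fallback flow; `worker_agent_available`, `dispatch_parallel`, and `fetch_source` are hypothetical stand-ins for the actual agent plumbing.

```python
def collect_items(sources: list[dict], worker_agent_available: bool,
                  dispatch_parallel, fetch_source) -> tuple[list[dict], bool]:
    """Use parallel SubAgent waves when available; otherwise fall back to serial."""
    if worker_agent_available:
        return dispatch_parallel(sources), False
    # Serial Execution Mode: one source at a time, slower but functional.
    items: list[dict] = []
    for source in sources:
        items.extend(fetch_source(source))
    degraded = True  # the report header must carry the degraded-mode warning
    return items, degraded
```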