feat: integrate last30days and daily-news-report skills
This commit is contained in:
357
skills/daily-news-report/SKILL.md
Normal file
357
skills/daily-news-report/SKILL.md
Normal file
@@ -0,0 +1,357 @@
|
||||
---
|
||||
name: daily-news-report
|
||||
description: 基于预设 URL 列表抓取内容,筛选高质量技术信息并生成每日 Markdown 报告。
|
||||
argument-hint: [可选: 日期]
|
||||
disable-model-invocation: false
|
||||
user-invocable: true
|
||||
allowed-tools: Task, WebFetch, Read, Write, Bash(mkdir*), Bash(date*), Bash(ls*), mcp__chrome-devtools__*
|
||||
---
|
||||
|
||||
# Daily News Report v3.0
|
||||
|
||||
> **架构升级**:主 Agent 调度 + SubAgent 执行 + 浏览器抓取 + 智能缓存
|
||||
|
||||
## 核心架构
|
||||
|
||||
```
|
||||
┌─────────────────────────────────────────────────────────────────────┐
|
||||
│ 主 Agent (Orchestrator) │
|
||||
│ 职责:调度、监控、评估、决策、汇总 │
|
||||
├─────────────────────────────────────────────────────────────────────┤
|
||||
│ │
|
||||
│ ┌──────────┐ ┌──────────┐ ┌──────────┐ ┌──────────┐ │
|
||||
│ │ 1. 初始化 │ → │ 2. 调度 │ → │ 3. 监控 │ → │ 4. 评估 │ │
|
||||
│ │ 读取配置 │ │ 分发任务 │ │ 收集结果 │ │ 筛选排序 │ │
|
||||
│ └──────────┘ └──────────┘ └──────────┘ └──────────┘ │
|
||||
│ │ │ │ │ │
|
||||
│ ▼ ▼ ▼ ▼ │
|
||||
│ ┌──────────┐ ┌──────────┐ ┌──────────┐ ┌──────────┐ │
|
||||
│ │ 5. 决策 │ ← │ 够20条? │ │ 6. 生成 │ → │ 7. 更新 │ │
|
||||
│ │ 继续/停止 │ │ Y/N │ │ 日报文件 │ │ 缓存统计 │ │
|
||||
│ └──────────┘ └──────────┘ └──────────┘ └──────────┘ │
|
||||
│ │
|
||||
└──────────────────────────────────────────────────────────────────────┘
|
||||
↓ 调度 ↑ 返回结果
|
||||
┌─────────────────────────────────────────────────────────────────────┐
|
||||
│ SubAgent 执行层 │
|
||||
├─────────────────────────────────────────────────────────────────────┤
|
||||
│ │
|
||||
│ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ │
|
||||
│ │ Worker A │ │ Worker B │ │ Browser │ │
|
||||
│ │ (WebFetch) │ │ (WebFetch) │ │ (Headless) │ │
|
||||
│ │ Tier1 Batch │ │ Tier2 Batch │ │ JS渲染页面 │ │
|
||||
│ └─────────────┘ └─────────────┘ └─────────────┘ │
|
||||
│ ↓ ↓ ↓ │
|
||||
│ ┌─────────────────────────────────────────────────────────────┐ │
|
||||
│ │ 结构化结果返回 │ │
|
||||
│ │ { status, data: [...], errors: [...], metadata: {...} } │ │
|
||||
│ └─────────────────────────────────────────────────────────────┘ │
|
||||
│ │
|
||||
└─────────────────────────────────────────────────────────────────────┘
|
||||
```
|
||||
|
||||
## 配置文件
|
||||
|
||||
本 Skill 使用以下配置文件:
|
||||
|
||||
| 文件 | 用途 |
|
||||
|------|------|
|
||||
| `sources.json` | 信息源配置、优先级、抓取方法 |
|
||||
| `cache.json` | 缓存数据、历史统计、去重指纹 |
|
||||
|
||||
## 执行流程详解
|
||||
|
||||
### Phase 1: 初始化
|
||||
|
||||
```yaml
|
||||
步骤:
|
||||
1. 确定日期(用户参数或当前日期)
|
||||
2. 读取 sources.json 获取源配置
|
||||
3. 读取 cache.json 获取历史数据
|
||||
4. 创建输出目录 NewsReport/
|
||||
5. 检查今日是否已有部分报告(追加模式)
|
||||
```
|
||||
|
||||
### Phase 2: 调度 SubAgent
|
||||
|
||||
**策略**:并行调度,分批执行,早停机制
|
||||
|
||||
```yaml
|
||||
第1波 (并行):
|
||||
- Worker A: Tier1 Batch A (HN, HuggingFace Papers)
|
||||
- Worker B: Tier1 Batch B (OneUsefulThing, Paul Graham)
|
||||
|
||||
等待结果 → 评估数量
|
||||
|
||||
如果 < 15 条高质量:
|
||||
第2波 (并行):
|
||||
- Worker C: Tier2 Batch A (James Clear, FS Blog)
|
||||
- Worker D: Tier2 Batch B (HackerNoon, Scott Young)
|
||||
|
||||
如果仍 < 20 条:
|
||||
第3波 (浏览器):
|
||||
- Browser Worker: ProductHunt, Latent Space (需要JS渲染)
|
||||
```
|
||||
|
||||
### Phase 3: SubAgent 任务格式
|
||||
|
||||
每个 SubAgent 接收的任务格式:
|
||||
|
||||
```yaml
|
||||
task: fetch_and_extract
|
||||
sources:
|
||||
- id: hn
|
||||
url: https://news.ycombinator.com
|
||||
extract: top_10
|
||||
- id: hf_papers
|
||||
url: https://huggingface.co/papers
|
||||
extract: top_voted
|
||||
|
||||
output_schema:
|
||||
items:
|
||||
- source_id: string # 来源标识
|
||||
title: string # 标题
|
||||
summary: string # 2-4句摘要
|
||||
key_points: string[] # 最多3个要点
|
||||
url: string # 原文链接
|
||||
keywords: string[] # 关键词
|
||||
quality_score: 1-5 # 质量评分
|
||||
|
||||
constraints:
|
||||
filter: "前沿技术/高深技术/提效技术/实用资讯"
|
||||
exclude: "泛科普/营销软文/过度学术化/招聘帖"
|
||||
max_items_per_source: 10
|
||||
skip_on_error: true
|
||||
|
||||
return_format: JSON
|
||||
```
|
||||
|
||||
### Phase 4: 主 Agent 监控与反馈
|
||||
|
||||
主 Agent 职责:
|
||||
|
||||
```yaml
|
||||
监控:
|
||||
- 检查 SubAgent 返回状态 (success/partial/failed)
|
||||
- 统计收集到的条目数量
|
||||
- 记录每个源的成功率
|
||||
|
||||
反馈循环:
|
||||
- 如果某 SubAgent 失败,决定是否重试或跳过
|
||||
- 如果某源持续失败,标记为禁用
|
||||
- 动态调整后续批次的源选择
|
||||
|
||||
决策:
|
||||
- 条目数 >= 25 且高质量 >= 20 → 停止抓取
|
||||
- 条目数 < 15 → 继续下一批
|
||||
- 所有批次完成但 < 20 → 用现有内容生成(宁缺毋滥)
|
||||
```
|
||||
|
||||
### Phase 5: 评估与筛选
|
||||
|
||||
```yaml
|
||||
去重:
|
||||
- 基于 URL 完全匹配
|
||||
- 基于标题相似度 (>80% 视为重复)
|
||||
- 检查 cache.json 避免与历史重复
|
||||
|
||||
评分校准:
|
||||
- 统一各 SubAgent 的评分标准
|
||||
- 根据来源可信度调整权重
|
||||
- 手动标注的高质量源加分
|
||||
|
||||
排序:
|
||||
- 按 quality_score 降序
|
||||
- 同分按来源优先级排序
|
||||
- 截取 Top 20
|
||||
```
|
||||
|
||||
### Phase 6: 浏览器抓取 (MCP Chrome DevTools)
|
||||
|
||||
对于需要 JS 渲染的页面,使用无头浏览器:
|
||||
|
||||
```yaml
|
||||
流程:
|
||||
1. 调用 mcp__chrome-devtools__new_page 打开页面
|
||||
2. 调用 mcp__chrome-devtools__wait_for 等待内容加载
|
||||
3. 调用 mcp__chrome-devtools__take_snapshot 获取页面结构
|
||||
4. 解析 snapshot 提取所需内容
|
||||
5. 调用 mcp__chrome-devtools__close_page 关闭页面
|
||||
|
||||
适用场景:
|
||||
- ProductHunt (403 on WebFetch)
|
||||
- Latent Space (Substack JS 渲染)
|
||||
- 其他 SPA 应用
|
||||
```
|
||||
|
||||
### Phase 7: 生成日报
|
||||
|
||||
```yaml
|
||||
输出:
|
||||
- 目录: NewsReport/
|
||||
- 文件名: YYYY-MM-DD-news-report.md
|
||||
- 格式: 标准 Markdown
|
||||
|
||||
内容结构:
|
||||
- 标题 + 日期
|
||||
- 统计摘要(源数量、收录数量)
|
||||
- 20条高质量内容(按模板)
|
||||
- 生成信息(版本、时间戳)
|
||||
```
|
||||
|
||||
### Phase 8: 更新缓存
|
||||
|
||||
```yaml
|
||||
更新 cache.json:
|
||||
- last_run: 记录本次运行信息
|
||||
- source_stats: 更新各源统计数据
|
||||
- url_cache: 添加已处理的 URL
|
||||
- content_hashes: 添加内容指纹
|
||||
- article_history: 记录收录文章
|
||||
```
|
||||
|
||||
## SubAgent 调用示例
|
||||
|
||||
### 使用 general-purpose Agent
|
||||
|
||||
由于自定义 agent 需要 session 重启才能发现,可以使用 general-purpose 并注入 worker prompt:
|
||||
|
||||
```
|
||||
Task 调用:
|
||||
subagent_type: general-purpose
|
||||
model: haiku
|
||||
prompt: |
|
||||
你是一个无状态的执行单元。只做被分配的任务,返回结构化 JSON。
|
||||
|
||||
任务:抓取以下 URL 并提取内容
|
||||
|
||||
URLs:
|
||||
- https://news.ycombinator.com (提取 Top 10)
|
||||
- https://huggingface.co/papers (提取高投票论文)
|
||||
|
||||
输出格式:
|
||||
{
|
||||
"status": "success" | "partial" | "failed",
|
||||
"data": [
|
||||
{
|
||||
"source_id": "hn",
|
||||
"title": "...",
|
||||
"summary": "...",
|
||||
"key_points": ["...", "...", "..."],
|
||||
"url": "...",
|
||||
"keywords": ["...", "..."],
|
||||
"quality_score": 4
|
||||
}
|
||||
],
|
||||
"errors": [],
|
||||
"metadata": { "processed": 2, "failed": 0 }
|
||||
}
|
||||
|
||||
筛选标准:
|
||||
- 保留:前沿技术/高深技术/提效技术/实用资讯
|
||||
- 排除:泛科普/营销软文/过度学术化/招聘帖
|
||||
|
||||
直接返回 JSON,不要解释。
|
||||
```
|
||||
|
||||
### 使用 worker Agent(需重启 session)
|
||||
|
||||
```
|
||||
Task 调用:
|
||||
subagent_type: worker
|
||||
prompt: |
|
||||
task: fetch_and_extract
|
||||
input:
|
||||
urls:
|
||||
- https://news.ycombinator.com
|
||||
- https://huggingface.co/papers
|
||||
output_schema:
|
||||
- source_id: string
|
||||
- title: string
|
||||
- summary: string
|
||||
- key_points: string[]
|
||||
- url: string
|
||||
- keywords: string[]
|
||||
- quality_score: 1-5
|
||||
constraints:
|
||||
filter: 前沿技术/高深技术/提效技术/实用资讯
|
||||
exclude: 泛科普/营销软文/过度学术化
|
||||
```
|
||||
|
||||
## 输出模板
|
||||
|
||||
```markdown
|
||||
# Daily News Report(YYYY-MM-DD)
|
||||
|
||||
> 本日筛选自 N 个信息源,共收录 20 条高质量内容
|
||||
> 生成耗时: X 分钟 | 版本: v3.0
|
||||
>
|
||||
> **Warning**: Sub-agent 'worker' not detected. Running in generic mode (Serial Execution). Performance might be degraded.
|
||||
> **警告**:未检测到 Sub-agent 'worker'。正在以通用模式(串行执行)运行。性能可能会受影响。
|
||||
|
||||
---
|
||||
|
||||
## 1. 标题
|
||||
|
||||
- **摘要**:2-4 行概述
|
||||
- **要点**:
|
||||
1. 要点一
|
||||
2. 要点二
|
||||
3. 要点三
|
||||
- **来源**:[链接](URL)
|
||||
- **关键词**:`keyword1` `keyword2` `keyword3`
|
||||
- **评分**:⭐⭐⭐⭐⭐ (5/5)
|
||||
|
||||
---
|
||||
|
||||
## 2. 标题
|
||||
...
|
||||
|
||||
---
|
||||
|
||||
*Generated by Daily News Report v3.0*
|
||||
*Sources: HN, HuggingFace, OneUsefulThing, ...*
|
||||
```
|
||||
|
||||
## 约束与原则
|
||||
|
||||
1. **宁缺毋滥**:低质量内容不进入日报
|
||||
2. **早停机制**:够 20 条高质量就停止抓取
|
||||
3. **并行优先**:同一批次的 SubAgent 并行执行
|
||||
4. **失败容错**:单个源失败不影响整体流程
|
||||
5. **缓存复用**:避免重复抓取相同内容
|
||||
6. **主 Agent 控制**:所有决策由主 Agent 做出
|
||||
7. **Fallback Awareness**:检测 sub-agent 可用性,不可用时优雅降级
|
||||
|
||||
## 预期性能
|
||||
|
||||
| 场景 | 预期时间 | 说明 |
|
||||
|------|----------|------|
|
||||
| 最优情况 | ~2 分钟 | Tier1 足够,无需浏览器 |
|
||||
| 正常情况 | ~3-4 分钟 | 需要 Tier2 补充 |
|
||||
| 需要浏览器 | ~5-6 分钟 | 包含 JS 渲染页面 |
|
||||
|
||||
## 错误处理
|
||||
|
||||
| 错误类型 | 处理方式 |
|
||||
|----------|----------|
|
||||
| SubAgent 超时 | 记录错误,继续下一个 |
|
||||
| 源 403/404 | 标记禁用,更新 sources.json |
|
||||
| 内容提取失败 | 返回原始内容,主 Agent 决定 |
|
||||
| 浏览器崩溃 | 跳过该源,记录日志 |
|
||||
|
||||
## 兼容性与兜底 (Compatibility & Fallback)
|
||||
|
||||
为了确保在不同 Agent 环境下的可用性,必须执行以下检查:
|
||||
|
||||
1. **环境检查**:
|
||||
- 在 Phase 1 初始化阶段,尝试检测 `worker` sub-agent 是否存在。
|
||||
- 如果不存在(或未安装相关插件),自动切换到 **串行执行模式 (Serial Mode)**。
|
||||
|
||||
2. **串行执行模式**:
|
||||
- 不使用 parallel block。
|
||||
- 主 Agent 依次执行每个源的抓取任务。
|
||||
- 虽然速度较慢,但保证基本功能可用。
|
||||
|
||||
3. **用户提示**:
|
||||
- 必须在生成的日报开头(引用块部分)包含明显的警告信息,提示用户当前正在运行于降级模式。
|
||||
41
skills/daily-news-report/cache.json
Normal file
41
skills/daily-news-report/cache.json
Normal file
@@ -0,0 +1,41 @@
|
||||
{
|
||||
"schema_version": "1.0",
|
||||
"description": "Daily News Report 缓存文件,用于避免重复抓取和跟踪历史表现",
|
||||
|
||||
"last_run": {
|
||||
"date": "2026-01-21",
|
||||
"duration_seconds": 180,
|
||||
"items_collected": 20,
|
||||
"items_published": 20,
|
||||
"sources_used": ["hn", "hf_papers", "james_clear", "fs_blog", "scotthyoung"]
|
||||
},
|
||||
|
||||
"source_stats": {
|
||||
"_comment": "记录每个源的历史表现,用于动态调整优先级",
|
||||
"hn": {
|
||||
"total_fetches": 0,
|
||||
"success_count": 0,
|
||||
"avg_items_per_fetch": 0,
|
||||
"avg_quality_score": 0,
|
||||
"last_fetch": null,
|
||||
"last_success": null
|
||||
}
|
||||
},
|
||||
|
||||
"url_cache": {
|
||||
"_comment": "已处理的 URL 缓存,避免重复收录",
|
||||
"_ttl_hours": 168,
|
||||
"entries": {}
|
||||
},
|
||||
|
||||
"content_hashes": {
|
||||
"_comment": "内容指纹,用于去重",
|
||||
"_ttl_hours": 168,
|
||||
"entries": {}
|
||||
},
|
||||
|
||||
"article_history": {
|
||||
"_comment": "已收录文章的简要记录",
|
||||
"2026-01-21": []
|
||||
}
|
||||
}
|
||||
183
skills/daily-news-report/sources.json
Normal file
183
skills/daily-news-report/sources.json
Normal file
@@ -0,0 +1,183 @@
|
||||
{
|
||||
"version": "2.1",
|
||||
"last_updated": "2026-01-21",
|
||||
|
||||
"sources": {
|
||||
"tier1": {
|
||||
"description": "高命中率源,优先抓取",
|
||||
"batch_a": [
|
||||
{
|
||||
"id": "hn",
|
||||
"name": "Hacker News",
|
||||
"url": "https://news.ycombinator.com",
|
||||
"fetch_method": "webfetch",
|
||||
"extract": "top_10",
|
||||
"enabled": true,
|
||||
"avg_quality": 4.5,
|
||||
"success_rate": 0.95
|
||||
},
|
||||
{
|
||||
"id": "hf_papers",
|
||||
"name": "HuggingFace Papers",
|
||||
"url": "https://huggingface.co/papers",
|
||||
"fetch_method": "webfetch",
|
||||
"extract": "top_voted",
|
||||
"enabled": true,
|
||||
"avg_quality": 4.8,
|
||||
"success_rate": 0.98
|
||||
}
|
||||
],
|
||||
"batch_b": [
|
||||
{
|
||||
"id": "one_useful_thing",
|
||||
"name": "One Useful Thing",
|
||||
"url": "https://www.oneusefulthing.org",
|
||||
"fetch_method": "webfetch",
|
||||
"extract": "latest_3",
|
||||
"enabled": true,
|
||||
"avg_quality": 4.7,
|
||||
"success_rate": 0.92
|
||||
},
|
||||
{
|
||||
"id": "paul_graham",
|
||||
"name": "Paul Graham Essays",
|
||||
"url": "https://paulgraham.com/articles.html",
|
||||
"fetch_method": "webfetch",
|
||||
"extract": "latest_5",
|
||||
"enabled": true,
|
||||
"avg_quality": 4.6,
|
||||
"success_rate": 0.99
|
||||
}
|
||||
]
|
||||
},
|
||||
|
||||
"tier2": {
|
||||
"description": "中等命中率,按需抓取",
|
||||
"batch_a": [
|
||||
{
|
||||
"id": "james_clear",
|
||||
"name": "James Clear 3-2-1",
|
||||
"url": "https://jamesclear.com/3-2-1",
|
||||
"fetch_method": "webfetch",
|
||||
"extract": "latest_issue",
|
||||
"enabled": true,
|
||||
"avg_quality": 4.3,
|
||||
"success_rate": 0.90
|
||||
},
|
||||
{
|
||||
"id": "fs_blog",
|
||||
"name": "Farnam Street Brain Food",
|
||||
"url": "https://fs.blog/brain-food",
|
||||
"fetch_method": "webfetch",
|
||||
"extract": "latest_issue",
|
||||
"enabled": true,
|
||||
"avg_quality": 4.4,
|
||||
"success_rate": 0.88
|
||||
}
|
||||
],
|
||||
"batch_b": [
|
||||
{
|
||||
"id": "hackernoon_pm",
|
||||
"name": "HackerNoon PM",
|
||||
"url": "https://hackernoon.com/c/product-management",
|
||||
"fetch_method": "webfetch",
|
||||
"extract": "latest_5",
|
||||
"enabled": true,
|
||||
"avg_quality": 3.8,
|
||||
"success_rate": 0.85
|
||||
},
|
||||
{
|
||||
"id": "scotthyoung",
|
||||
"name": "Scott Young Blog",
|
||||
"url": "https://scotthyoung.com/blog/articles",
|
||||
"fetch_method": "webfetch",
|
||||
"extract": "latest_3",
|
||||
"enabled": true,
|
||||
"avg_quality": 4.0,
|
||||
"success_rate": 0.90
|
||||
}
|
||||
]
|
||||
},
|
||||
|
||||
"tier3_browser": {
|
||||
"description": "需要浏览器渲染的源",
|
||||
"sources": [
|
||||
{
|
||||
"id": "producthunt",
|
||||
"name": "Product Hunt",
|
||||
"url": "https://www.producthunt.com",
|
||||
"fetch_method": "browser",
|
||||
"extract": "today_top_5",
|
||||
"enabled": true,
|
||||
"avg_quality": 4.2,
|
||||
"success_rate": 0.75,
|
||||
"note": "需要无头浏览器,403 on WebFetch"
|
||||
},
|
||||
{
|
||||
"id": "latent_space",
|
||||
"name": "Latent Space",
|
||||
"url": "https://www.latent.space",
|
||||
"fetch_method": "browser",
|
||||
"extract": "latest_3",
|
||||
"enabled": true,
|
||||
"avg_quality": 4.6,
|
||||
"success_rate": 0.70,
|
||||
"note": "Substack 需要 JS 渲染"
|
||||
}
|
||||
]
|
||||
},
|
||||
|
||||
"disabled": {
|
||||
"description": "已禁用的源(失效或低质量)",
|
||||
"sources": [
|
||||
{
|
||||
"id": "tldr_ai",
|
||||
"name": "TLDR AI",
|
||||
"url": "https://tldr.tech/ai",
|
||||
"reason": "订阅页面,无文章列表",
|
||||
"disabled_date": "2026-01-21"
|
||||
},
|
||||
{
|
||||
"id": "bensbites",
|
||||
"name": "Ben's Bites",
|
||||
"url": "https://bensbites.com/archive",
|
||||
"reason": "需要登录/付费墙",
|
||||
"disabled_date": "2026-01-21"
|
||||
},
|
||||
{
|
||||
"id": "interconnects",
|
||||
"name": "Interconnects AI",
|
||||
"url": "https://interconnects.ai",
|
||||
"reason": "内容提取失败,Substack 结构问题",
|
||||
"disabled_date": "2026-01-21"
|
||||
},
|
||||
{
|
||||
"id": "beehiiv_rss",
|
||||
"name": "Beehiiv RSS feeds",
|
||||
"url": "https://rss.beehiiv.com",
|
||||
"reason": "RSS 抓取困难",
|
||||
"disabled_date": "2026-01-21"
|
||||
}
|
||||
]
|
||||
}
|
||||
},
|
||||
|
||||
"fetch_config": {
|
||||
"webfetch": {
|
||||
"timeout_ms": 30000,
|
||||
"retry_count": 1,
|
||||
"cache_ttl_minutes": 60
|
||||
},
|
||||
"browser": {
|
||||
"timeout_ms": 45000,
|
||||
"wait_for_selector": "article, .post, .item",
|
||||
"screenshot_on_error": true
|
||||
}
|
||||
},
|
||||
|
||||
"quality_thresholds": {
|
||||
"min_score_to_include": 3,
|
||||
"target_items": 20,
|
||||
"early_stop_threshold": 25
|
||||
}
|
||||
}
|
||||
721
skills/last30days/README.md
Normal file
721
skills/last30days/README.md
Normal file
@@ -0,0 +1,721 @@
|
||||
# /last30days
|
||||
|
||||
**The AI world reinvents itself every month. This Claude Code skill keeps you current.** /last30days researches your topic across Reddit, X, and the web from the last 30 days, finds what the community is actually upvoting and sharing, and writes you a prompt that works today, not six months ago. Whether it's Ralph Wiggum loops, Suno music prompts, or the latest Midjourney techniques, you'll prompt like someone who's been paying attention.
|
||||
|
||||
**Best for prompt research**: discover what prompting techniques actually work for any tool (ChatGPT, Midjourney, Claude, Figma AI, etc.) by learning from real community discussions and best practices.
|
||||
|
||||
**But also great for anything trending**: music, culture, news, product recommendations, viral trends, or any question where "what are people saying right now?" matters.
|
||||
|
||||
## Installation
|
||||
|
||||
```bash
|
||||
# Clone the repo
|
||||
git clone https://github.com/mvanhorn/last30days-skill.git ~/.claude/skills/last30days
|
||||
|
||||
# Add your API keys
|
||||
mkdir -p ~/.config/last30days
|
||||
cat > ~/.config/last30days/.env << 'EOF'
|
||||
OPENAI_API_KEY=sk-...
|
||||
XAI_API_KEY=xai-...
|
||||
EOF
|
||||
chmod 600 ~/.config/last30days/.env
|
||||
```
|
||||
|
||||
## Usage
|
||||
|
||||
```
|
||||
/last30days [topic]
|
||||
/last30days [topic] for [tool]
|
||||
```
|
||||
|
||||
Examples:
|
||||
- `/last30days prompting techniques for ChatGPT for legal questions`
|
||||
- `/last30days iOS app mockups for Nano Banana Pro`
|
||||
- `/last30days What are the best rap songs lately`
|
||||
- `/last30days remotion animations for Claude Code`
|
||||
|
||||
## What It Does
|
||||
|
||||
1. **Researches** - Scans Reddit and X for discussions from the last 30 days
|
||||
2. **Synthesizes** - Identifies patterns, best practices, and what actually works
|
||||
3. **Delivers** - Either writes copy-paste-ready prompts for your target tool, or gives you a curated expert-level answer
|
||||
|
||||
### Use it for:
|
||||
- **Prompt research** - "What prompting techniques work for legal questions in ChatGPT?"
|
||||
- **Tool best practices** - "How are people using Remotion with Claude Code?"
|
||||
- **Trend discovery** - "What are the best rap songs right now?"
|
||||
- **Product research** - "What do people think of the new M4 MacBook?"
|
||||
- **Viral content** - "What's the dog-as-human trend on ChatGPT?"
|
||||
|
||||
---
|
||||
|
||||
## Example: Legal Prompting (Hallucination Prevention)
|
||||
|
||||
**Query:** `/last30days prompting techniques for chatgpt for legal questions`
|
||||
|
||||
**Research Output:**
|
||||
> The dominant theme is hallucination prevention - multiple sources discuss lawyers being fined or embarrassed for submitting fake case citations that ChatGPT fabricated. Key prompting strategies include: (1) using Deep Research mode or grounding tools with uploaded primary sources, (2) building "Hallucination Prevention Systems" into prompts that force epistemic honesty, (3) prompting for procedural questions rather than case law lookups, and (4) framing outputs as "issue-spotting" rather than legal advice. A major warning: chat logs are discoverable in litigation.
|
||||
|
||||
**Key patterns discovered:**
|
||||
1. Hallucination prevention clauses - "If you don't know a case citation, say so rather than inventing one"
|
||||
2. Role assignment - assign ChatGPT a specific persona (paralegal, issue-spotter) to constrain outputs
|
||||
3. Structured output requirements - request specific sections (risks, key terms, questions for counsel)
|
||||
4. Epistemic humility enforcement - "acknowledge uncertainty" and "flag when you're unsure"
|
||||
5. Scope limitation - focus on procedural questions, not case law research
|
||||
|
||||
**Then asked:** "I have a squatter in my house in San Francisco, what are my options to get them out?"
|
||||
|
||||
**Generated Prompt:**
|
||||
|
||||
```
|
||||
Role: You are a legal research assistant helping a property owner understand
|
||||
their options. You are NOT providing legal advice - you are helping me
|
||||
identify issues, understand procedures, and prepare informed questions for an
|
||||
attorney.
|
||||
|
||||
Situation: I own a residential property in San Francisco, California. Someone
|
||||
is currently occupying the property without my permission or any lease
|
||||
agreement (a squatter). I need to understand my legal options for removing them.
|
||||
|
||||
Your task: Provide a structured analysis covering:
|
||||
|
||||
1. California/SF Legal Framework - What laws govern this situation? (Note: San
|
||||
Francisco has strong tenant protections - address how these apply or don't
|
||||
apply to squatters vs. tenants)
|
||||
2. Key Distinctions - Help me understand: squatter vs. trespasser vs. holdover
|
||||
tenant. How does the length of occupation affect my options?
|
||||
3. Available Legal Remedies - What are my procedural options? (unlawful
|
||||
detainer, police involvement, etc.) What is the typical process and timeline?
|
||||
4. Critical Mistakes to Avoid - What actions could expose me to liability?
|
||||
(self-help eviction, utility shutoffs, etc.)
|
||||
5. Questions I Should Ask an Attorney - Based on this analysis, what specific
|
||||
questions should I bring to a California real estate attorney?
|
||||
|
||||
Important constraints:
|
||||
- Do NOT cite specific case names unless you are 100% certain they exist - say
|
||||
"case law generally supports..." instead
|
||||
- Flag any areas where you're uncertain
|
||||
- Note where San Francisco local ordinances may differ from California state law
|
||||
- This is for informational purposes to prepare for a legal consultation
|
||||
```
|
||||
|
||||
**The Result:** A comprehensive 2,500+ word analysis covering California's unlawful detainer process, SF Rent Ordinance nuances, trespasser vs. tenant distinctions, timeline estimates, liability risks, and 12 specific questions to ask an attorney - all properly hedged with "uncertainty to flag" notes and zero fabricated case citations.
|
||||
|
||||
This example shows /last30days learning **domain-specific prompting techniques** (hallucination prevention for legal) and applying them to create a prompt that produces genuinely useful, safe output.
|
||||
|
||||
---
|
||||
|
||||
## Example: ClawdBot Use Cases (Community Research)
|
||||
|
||||
**Query:** `/last30days best clawdbot use cases`
|
||||
|
||||
**Research Output:**
|
||||
> ClawdBot is a self-hosted AI assistant that runs on your devices and connects via messaging apps (Telegram, WhatsApp, Discord, Slack, iMessage). Unlike ChatGPT/Claude, it has persistent memory and can proactively message you. The killer feature is its 24/7 autonomous operation - it handles multi-day tasks, monitors for replies, and initiates contact.
|
||||
|
||||
**Most mentioned use cases:**
|
||||
|
||||
🏆 Top 5 by engagement:
|
||||
|
||||
1. **Email/Calendar automation** - mentioned 8x+ (r/selfhosted, @dreetje, @danpeguine, MacStories)
|
||||
- Morning briefings, spam filtering, auto-replies, recruiter declines
|
||||
2. **Task management via chat** - mentioned 6x (r/LocalLLaMA, @danpeguine, multiple X posts)
|
||||
- Todoist integration, importance scoring, time-blocking tasks
|
||||
3. **Overnight coding agent** - mentioned 5x (@bffmike 225 likes, r/LocalLLaMA)
|
||||
- "Drove my coding agents from 12:30-7am while I snoozed"
|
||||
- Pulls repos, runs tests, commits if clean
|
||||
4. **Smart home + life admin** - mentioned 5x (r/selfhosted, VelvetShark)
|
||||
- WHOOP health data, Philips Hue, HomeAssistant, SSH to homelab
|
||||
5. **Browser automation** - mentioned 4x (@steipete 243 likes, MacStories)
|
||||
- Car negotiation via email/PDF, restaurant reservations via ElevenLabs calls
|
||||
|
||||
**Notable single mentions:**
|
||||
- Custom meditation generation with TTS + ambient audio
|
||||
- Built a full website (Stumblereads.com) from phone while putting baby to sleep
|
||||
- Ray-Ban Meta glasses integration
|
||||
- WhatsApp as control interface for everything
|
||||
|
||||
**Research Stats:** 9 Reddit threads (161 upvotes, 95 comments) + 19 X posts (2,018 likes, 96 reposts) from r/LocalLLaMA, r/selfhosted, @steipete, @danpeguine, @dreetje
|
||||
|
||||
This example shows /last30days as a **product research tool** - discovering real-world use cases and community sentiment for any tool or product, complete with engagement metrics that tell you what people actually care about.
|
||||
|
||||
---
|
||||
|
||||
## Example: ClawdBot Setup (Installation Guide)
|
||||
|
||||
**Query:** `/last30days how to best setup clawdbot`
|
||||
|
||||
**Research Output:**
|
||||
> ClawdBot setup varies by deployment style - Docker is the most recommended path, but users are also deploying via bare metal and cloud VMs. The community has converged on key best practices: use the setup wizard for initial config, start with conservative hardware (you can scale up), and configure authentication before exposing any endpoints.
|
||||
|
||||
**🛠️ Setup patterns by engagement:**
|
||||
|
||||
| Pattern | Mentions | Key Sources |
|
||||
|---------|----------|-------------|
|
||||
| Docker Compose deployment | 6x | r/selfhosted, r/LocalLLaMA |
|
||||
| Setup wizard (interactive config) | 5x | @dreetje, documentation |
|
||||
| API key management | 4x | r/selfhosted, security threads |
|
||||
| Hardware requirements | 4x | r/LocalLLaMA (GPU discussions) |
|
||||
| Reverse proxy (Caddy/nginx) | 3x | r/selfhosted |
|
||||
|
||||
**Key setup insights from community:**
|
||||
|
||||
1. **Use the setup wizard** - Don't manually edit config files first; the wizard handles 90% of initial setup
|
||||
2. **Start small on hardware** - 8GB RAM minimum, but 16GB+ recommended if running local models
|
||||
3. **Docker > bare metal** - Easier updates, better isolation, community-tested compose files
|
||||
4. **Secure before exposing** - Set up authentication before opening ports; ClawdBot has built-in auth
|
||||
5. **Cost management** - Set API rate limits early; one user reported $200 overnight bill from runaway tasks
|
||||
|
||||
**Quick-start commands (community-verified):**
|
||||
|
||||
```bash
|
||||
# Clone and setup
|
||||
git clone https://github.com/clawdbot/clawdbot.git
|
||||
cd clawdbot
|
||||
|
||||
# Run setup wizard (recommended)
|
||||
./setup.sh
|
||||
|
||||
# Or Docker Compose (after config)
|
||||
docker compose up -d
|
||||
```
|
||||
|
||||
**Common gotchas mentioned:**
|
||||
- Don't forget to set `CLAWDBOT_API_KEY` before first run
|
||||
- Telegram bot token needs BotFather setup first
|
||||
- If using local models, ensure CUDA drivers are installed
|
||||
|
||||
**Research Stats:** 8 Reddit threads (128 upvotes) + 22 X posts (24,000+ likes) from r/selfhosted, r/LocalLLaMA, @dreetje, @steipete
|
||||
|
||||
This example shows /last30days as a **setup guide aggregator** - pulling together scattered installation advice, gotchas, and best practices from real users who've already solved the problems you're about to encounter.
|
||||
|
||||
---
|
||||
|
||||
## Example: iOS App Mockup (Nano Banana Pro)
|
||||
|
||||
**Query:** `/last30days prompting tips for nano banana pro for ios designs`
|
||||
|
||||
**Research Output:**
|
||||
> The community has converged on JSON-structured prompts for UI work - multiple viral posts from @brindleyai, @bananababydoll, and @fofrAI use detailed JSON with nested objects for subject, scene, and style. The key workflow from @fofrAI: take an existing image, have Gemini generate a JSON prompt from it, then iterate. For iOS specifically, @paji_a and @chongdashu both documented using Nano Banana Pro for iOS app mockup designs in "vibe coding" workflows.
|
||||
>
|
||||
> The web guides emphasize thinking in layers: define the window frame first, then UI layout (sidebars, nav bars, cards), then text on buttons/headings. Google's official prompting guide says to drop the "4k masterpiece trending on artstation" spam - Nano Banana Pro understands natural language. For tight layouts, use an 8pt grid and limit screens to 6-8 elements.
|
||||
|
||||
**Key patterns discovered:**
|
||||
1. JSON prompt format with nested sections (subject, scene, style, device)
|
||||
2. Layer-based structure - frame -> layout regions -> content -> text labels
|
||||
3. iOS 18 / Material Design 3 references - rounded corners, soft shadows, vibrant primary
|
||||
4. iPhone frame specification - "inside realistic iPhone 16 Pro frame"
|
||||
5. Wireframe-to-mockup workflow - upload sketch, request high-fidelity conversion
|
||||
6. 8pt grid + 6-8 elements max per screen for clean density
|
||||
7. Iterative editing - don't re-roll, ask for specific changes if 80% correct
|
||||
|
||||
**Then asked:** "make a mock-up of an app for moms who swim"
|
||||
|
||||
**Generated Prompt:**
|
||||
|
||||
```json
|
||||
{
|
||||
"image_type": "UI mockup",
|
||||
"device": {
|
||||
"frame": "iPhone 16 Pro",
|
||||
"display": "realistic screen with subtle reflection",
|
||||
"orientation": "portrait"
|
||||
},
|
||||
"app_concept": "SwimMom - fitness and community app for mothers who swim",
|
||||
"screen": "home dashboard",
|
||||
"design_system": {
|
||||
"style": "iOS 18 native",
|
||||
"corners": "rounded, 16px radius",
|
||||
"shadows": "soft drop shadows, subtle depth",
|
||||
"spacing": "8pt grid, generous whitespace"
|
||||
},
|
||||
"color_palette": {
|
||||
"primary": "calming aqua blue (#4ECDC4)",
|
||||
"secondary": "warm coral (#FF6B6B)",
|
||||
"background": "clean white with soft blue gradient header",
|
||||
"accents": "pool water ripple patterns, subtle"
|
||||
},
|
||||
"layout": {
|
||||
"header": "greeting 'Good morning, Sarah' with small wave icon, profile avatar top right",
|
||||
"hero_card": "today's swim stats - laps completed, calories burned, time in pool - with circular progress ring",
|
||||
"section_2": "horizontal scroll of upcoming mom swim meetups with location pills and participant avatars",
|
||||
"section_3": "two cards side by side - 'My Goals' with streak counter, 'Find a Pool' with map pin icon",
|
||||
"bottom_nav": "5 icons - Home (active), Workouts, Community, Schedule, Profile"
|
||||
},
|
||||
"visual_details": {
|
||||
"illustrations": "friendly, minimal line art of swimming figure",
|
||||
"typography": "SF Pro Display, clean hierarchy",
|
||||
"mood": "energizing yet calm, supportive community feel"
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
**The Result:**
|
||||
|
||||
<p align="center">
|
||||
<img src="assets/swimmom-mockup.jpeg" width="400" alt="SwimMom iOS app mockup"/>
|
||||
</p>
|
||||
|
||||
This example shows /last30days learning **AI image generation prompting patterns** (JSON structure, layer-based thinking, iOS design system references) and applying them to create a production-quality app mockup in one shot.
|
||||
|
||||
---
|
||||
|
||||
## Example: Top Claude Code Skills (Recommendations)
|
||||
|
||||
**Query:** `/last30days top claude code skills`
|
||||
|
||||
**Research Output:**
|
||||
> The Claude Code skills ecosystem has exploded with marketplaces, curated lists, and viral skill announcements. The Remotion video skill got 17.3K likes on X. SkillsMP emerged as a marketplace with 60-87K+ skills. Multiple GitHub repos (awesome-claude-skills, Superpowers) are actively curated.
|
||||
|
||||
**🏆 Most mentioned skills/resources:**
|
||||
|
||||
| Rank | Skill/Resource | Mentions | Sources | Engagement |
|
||||
|------|----------------|----------|---------|------------|
|
||||
| 1 | Remotion skill | 4x | X (@Remotion, @joshua_xu_), web | 17.3K likes, video creation |
|
||||
| 2 | SkillsMP marketplace | 5x | X (@milesdeutscher, @rexan_wong), web | 60-87K+ skills directory |
|
||||
| 3 | awesome-claude-skills (GitHub) | 4x | Web (travisvn, ComposioHQ repos) | Multiple curated lists |
|
||||
| 4 | Superpowers | 3x | Web, GitHub | 27.9K stars |
|
||||
| 5 | HeyGen avatar skill | 2x | X (@joshua_xu_), web | 736 likes, AI avatars |
|
||||
| 6 | Trail of Bits Security Skills | 2x | Web | CodeQL/Semgrep auditing |
|
||||
| 7 | Claude Command Suite | 2x | GitHub, web | 148+ commands, 54 agents |
|
||||
| 8 | MCP Builder | 2x | Web | Build MCP servers |
|
||||
| 9 | Test-Driven Development | 2x | Web, skill guides | Pre-implementation testing |
|
||||
| 10 | Systematic Debugging | 2x | Web | Root cause analysis |
|
||||
|
||||
**Notable single mentions:** UI/UX Pro Max (16.9K stars), SuperClaude framework, Compound Engineering Plugin, docx/pdf/pptx document skills, Nano-Banana, Connect (1000+ app integrations)
|
||||
|
||||
**Marketer skill stack** (from @vibecode_edu): Slack, Email Knowledge, Nano-Banana, Notion, Skill Creator, Twitter post, Typefully
|
||||
|
||||
**Key insight from Reddit:** Keep CLAUDE.md short (~1K context), use Skills for "context on demand", and ensure skill descriptions contain your trigger keywords.
|
||||
|
||||
**Research Stats:** 4 Reddit threads (238 upvotes, 156 comments) + 15 X posts (28K+ likes, 2.8K reposts) from r/ClaudeCode, r/ClaudeAI, @Remotion, @milesdeutscher
|
||||
|
||||
This example shows /last30days as a **discovery tool** - finding the most popular resources in a rapidly evolving ecosystem by tracking what the community is actually sharing and upvoting.
|
||||
|
||||
---
|
||||
|
||||
## Example: Discover Viral Trends (Dog as Human)
|
||||
|
||||
**Query:** `/last30days using ChatGPT to make images of dogs`
|
||||
|
||||
**Research Output:**
|
||||
> The Reddit community is obsessed with the "dog as human" trend - uploading photos of their dogs and asking ChatGPT to show what they'd look like as a person (threads with 600-900+ upvotes). Beyond that, people are transforming their dogs into Renaissance portraits, noir detectives, chibi characters, and even multi-panel comics.
|
||||
|
||||
**Key patterns discovered:**
|
||||
1. Upload reference photo first - ChatGPT works best with your actual dog's image
|
||||
2. "As a human" transformation - The viral trend
|
||||
3. Style/era transformations - Renaissance, noir, chibi, cartoon
|
||||
4. Scene placement - French bistro, prison yard, detective office
|
||||
5. Specific visual details - Lighting, background, expression, accessories
|
||||
|
||||
**Then asked:** "what would my dog look like as a human prompt"
|
||||
|
||||
**Generated Prompt:**
|
||||
|
||||
```
|
||||
Look at this photo of my dog. Create an image showing what they would look like
|
||||
as a human person. Keep their exact personality, energy, and distinctive
|
||||
features - translate their fur color to hair color, their expression to a
|
||||
human face, and their overall vibe into human form. Make it a realistic
|
||||
portrait photo, not a cartoon.
|
||||
```
|
||||
|
||||
*How to use: Upload your dog's photo first, then paste this prompt.*
|
||||
|
||||
**The Result:**
|
||||
|
||||
<p align="center">
|
||||
<img src="assets/dog-original.jpeg" width="300" alt="Original dog photo"/>
|
||||
→
|
||||
<img src="assets/dog-as-human.png" width="300" alt="Dog as human"/>
|
||||
</p>
|
||||
|
||||
Same golden fur → red hair. Same tongue out. Same harness. Same pink tag. ChatGPT nailed it.
|
||||
|
||||
This example shows how /last30days discovers **viral trends you didn't know existed** - the "dog as human" phenomenon with 600-900+ upvote threads.
|
||||
|
||||
---
|
||||
|
||||
## Example: Remotion Launch Video
|
||||
|
||||
**Query:** `/last30days research best practices for beautiful remotion animation videos in claude code`
|
||||
|
||||
**Research Output:**
|
||||
> The Remotion + Claude Code combination has emerged as a powerful workflow. Users consistently report that simple, clear prompts with scene-by-scene descriptions yield the best results. Key insights: iteration is essential, most "beautiful" videos come from back-and-forth refinement rather than one-shot prompts.
|
||||
|
||||
**Then asked:** "Can you make a prompt for a 50 second launch video for my /last30days skill?"
|
||||
|
||||
**Generated Prompt:**
|
||||
|
||||
```
|
||||
Create a 50-second launch video for "/last30days" - a Claude Code skill that
|
||||
researches any topic across Reddit and X from the last 30 days, then writes
|
||||
copy-paste-ready prompts.
|
||||
|
||||
SCENE 1 (0-8s): The Problem
|
||||
Dark background. Text fades in: "You want to create something great."
|
||||
Beat. Then: "But the internet is overwhelming."
|
||||
Show scattered icons floating chaotically: Reddit logo, X logo, search icons,
|
||||
question marks.
|
||||
Slight camera shake to convey chaos.
|
||||
|
||||
SCENE 2 (8-16s): The Solution Reveal
|
||||
Clean wipe transition. Calm, modern gradient background (deep purple to blue).
|
||||
"/last30days" types in with a terminal cursor, monospace font.
|
||||
Subtitle fades in below: "Instant expertise. Real prompts."
|
||||
|
||||
SCENE 3 (16-28s): How It Works
|
||||
Split into 3 animated steps that slide in sequentially:
|
||||
1. [Reddit + X icons merge] "Scans Reddit & X from the last 30 days"
|
||||
2. [Brain/lightbulb icon] "Synthesizes what actually works"
|
||||
3. [Document with sparkle] "Writes prompts you can copy-paste"
|
||||
|
||||
Each step has a subtle bounce animation on entry.
|
||||
|
||||
SCENE 4 (28-40s): Live Demo Feel
|
||||
Simulate a terminal. Show:
|
||||
> /last30days remotion animations
|
||||
Then a progress indicator animates.
|
||||
Results appear: "Analyzed 28 sources..."
|
||||
Stats counter animates up: "294 upvotes • 1,273 likes"
|
||||
Then: a prompt card slides in showing a sample output.
|
||||
|
||||
SCENE 5 (40-50s): Call to Action
|
||||
Zoom out slightly. Everything fades except the logo.
|
||||
Text animates in: "Research any topic. Get prompts that work."
|
||||
GitHub icon + "github.com/mvanhorn/last30days-skill"
|
||||
Tagline: "A Claude Code Skill"
|
||||
|
||||
STYLE NOTES:
|
||||
- Color palette: Deep purple (#1a1a2e), electric blue (#4361ee), white text
|
||||
- Typography: Inter or SF Mono for code, clean sans-serif for headlines
|
||||
- Transitions: Smooth eases, nothing jarring
|
||||
- Music sync points: Scene transitions should land on beats
|
||||
```
|
||||
|
||||
This prompt was generated based on research from 10 Reddit threads (294 upvotes) and 18 X posts (1,273 likes) about Remotion animation best practices.
|
||||
|
||||
---
|
||||
|
||||
## Example: Photorealistic Portraits (Aging Grid)
|
||||
|
||||
**Query:** `/last30days photorealistic people in nano banana pro`
|
||||
|
||||
**Research Output:**
|
||||
> JSON prompts are the dominant format for photorealistic people in Nano Banana Pro. Nearly every viral X post (4,400+ likes total) uses structured JSON with specific fields rather than natural language. The key is specifying demographics, skin texture, lighting, and camera settings in a structured way.
|
||||
>
|
||||
> Recent caveat: Reddit reports Nano Banana Pro was "nerfed" in early January 2026 with reduced resolution and stricter moderation on faces. Starting a new chat can help if quality drops mid-session.
|
||||
|
||||
**Key patterns discovered:**
|
||||
1. JSON structure with prompt_type, subject, lighting, camera_settings fields
|
||||
2. Skin texture keywords: "natural pores", "realistic skin texture", "8K texture quality"
|
||||
3. Face preservation: "preserve_original": true when using reference photos
|
||||
4. Camera realism: specify lens (85mm f/1.4), depth of field, lighting setup
|
||||
5. Wet skin close-ups are trending for maximum realism
|
||||
6. Negative prompts: "no cartoon effect, no distortion, correct anatomy"
|
||||
|
||||
**Then asked:** "2x2 grid of the same person at age 10, 20, 40 and 80 - a woman with brown hair and a lot of freckles and cat eyes that are a different color"
|
||||
|
||||
**Generated Prompt:**
|
||||
|
||||
```json
|
||||
{
|
||||
"prompt_type": "Ultra-Photorealistic Multi-Panel Portrait",
|
||||
"layout": "2x2 grid, seamless white dividers",
|
||||
"aspect_ratio": "1:1",
|
||||
"consistency": {
|
||||
"same_person": true,
|
||||
"preserve_features": ["bone structure", "freckle pattern", "eye shape", "heterochromia"]
|
||||
},
|
||||
"subject": {
|
||||
"identity": "Same woman shown at 4 life stages",
|
||||
"distinctive_features": {
|
||||
"eyes": "Almond-shaped cat eyes with heterochromia - one green, one amber",
|
||||
"skin": "Fair complexion densely covered in natural freckles across nose, cheeks, and forehead",
|
||||
"hair": "Brown hair, natural texture"
|
||||
}
|
||||
},
|
||||
"panels": [
|
||||
{"position": "top-left", "age": 10, "expression": "curious childhood smile", "hair": "long brown hair with bangs"},
|
||||
{"position": "top-right", "age": 20, "expression": "confident young adult", "hair": "shoulder-length brown hair"},
|
||||
{"position": "bottom-left", "age": 40, "expression": "warm maternal gaze", "hair": "brown hair with subtle highlights"},
|
||||
{"position": "bottom-right", "age": 80, "expression": "wise gentle smile", "hair": "silver-gray, formerly brown"}
|
||||
],
|
||||
"lighting": "Soft natural window light, consistent across all panels",
|
||||
"camera": "85mm portrait lens, shallow depth of field, neutral background",
|
||||
"texture_quality": "8K, natural skin texture with visible pores, realistic freckle detail",
|
||||
"negative": "no cartoon effect, no airbrushed skin, no symmetrical freckles, maintain heterochromia in all panels"
|
||||
}
|
||||
```
|
||||
|
||||
**The Result:**
|
||||
|
||||
<p align="center">
|
||||
<img src="assets/aging-portrait.jpeg" width="500" alt="Aging portrait grid - same woman at 10, 20, 40, and 80"/>
|
||||
</p>
|
||||
|
||||
Same freckles. Same heterochromia. Same bone structure. Four life stages in one coherent image.
|
||||
|
||||
This example shows /last30days learning **advanced image generation patterns** (multi-panel consistency, face preservation across variations) and applying them to create a complex composite image.
|
||||
|
||||
---
|
||||
|
||||
## Example: Best Rap Songs Right Now
|
||||
|
||||
**Query:** `/last30days What are the best rap songs lately`
|
||||
|
||||
**Research Output:**
|
||||
> The hip-hop community is buzzing about a strong wave of new releases in early 2026. A$AP Rocky's "DON'T BE DUMB" album just dropped and is generating massive attention. Looking back at 2025, Kendrick Lamar's "GNX" dominated streaming with 2.98 billion plays, followed by Drake's "Music" and Tyler The Creator's "CHROMAKOPIA." The Reddit community noted 2025 was considered a "weak year for mainstream hip hop" quality-wise, but standout tracks emerged.
|
||||
|
||||
**Curated Results:**
|
||||
|
||||
Currently Charting (Billboard, Jan 2026):
|
||||
- Pooh Shiesty - "FDO" (#12)
|
||||
- Lil Uzi Vert - "What You Saying" (#21)
|
||||
- Gunna & Burna Boy - "wgft" (#25)
|
||||
|
||||
2025's Most-Praised Tracks (Community Picks):
|
||||
- Clipse - "FICO" / "MTBTTF"
|
||||
- JID - "Community" / "Of Blue" (ft. Mereba)
|
||||
- Tyler The Creator - "Big Poe"
|
||||
- Freddie Gibbs & The Alchemist ft. Anderson .Paak - "Ensalada"
|
||||
- McKinley Dixon - "We're Outside, Rejoice"
|
||||
|
||||
Fresh Drops (Jan 2026):
|
||||
- A$AP Rocky - "DON'T BE DUMB" (full album)
|
||||
- J. Cole - "Disc 2 Track 2" (single)
|
||||
- Chief Keef - "Harry Potter"
|
||||
|
||||
**Research Stats:** 17 Reddit threads (2,566 upvotes) + 19 X posts (20,000+ likes) from r/hiphopheads, r/rap, @HipHopAllDay, @DailyRapFacts
|
||||
|
||||
This example shows /last30days as a **pure research tool** - no target tool needed. Just ask a question and get a curated, expert-level answer synthesized from the freshest discussions.
|
||||
|
||||
---
|
||||
|
||||
## Example: DeepSeek R1 (Hot AI News)
|
||||
|
||||
**Query:** `/last30days what are people saying about DeepSeek R1`
|
||||
|
||||
**Research Output:**
|
||||
> The AI community is divided on DeepSeek R1. Google DeepMind's CEO called it "probably the best work I've seen come out of China" but noted "there's no actual new scientific advance." Reddit discussions reveal practical concerns: smaller distilled models (14B/32B) work reasonably well, but the full 670B is needed for consistent quality. Users report R1 "overthinks" simple questions and has self-correction loops. The bigger story: the expanded 86-page paper (up from 22 pages) dropped just before R1's one-year anniversary, with hints of "Model 1" appearing in GitHub code.
|
||||
|
||||
**Key insights discovered:**
|
||||
1. Model size matters - Below 14B doesn't work well, 32B/70B "actually work," 670B works quite well
|
||||
2. Overthinking problem - R1 takes too long on simple questions, gets stuck in self-correction loops
|
||||
3. Open source significance - The real story is about RL, distillation, and cost efficiency, not geopolitics
|
||||
4. Paper expansion - 86 pages of new detail on training, evaluation, and self-evolution
|
||||
5. Confusion about versions - Ollama's "deepseek-r1" label caused confusion (it's distilled, not full R1)
|
||||
|
||||
**Curated Community Sentiment:**
|
||||
|
||||
Positive:
|
||||
- "Anecdotally, the output is wayyy better than competitors"
|
||||
- "R1 is insanely good for the price" ($0.55/M tokens vs $60 for o1-pro)
|
||||
- "Top-tier performance in Fiction.LiveBench long context test"
|
||||
|
||||
Negative:
|
||||
- "DeepSeek R1 is unusable [for smaller sizes]"
|
||||
- "Falls short of o1 in generalization"
|
||||
- "Stop using R1 for deep research - it hallucinates"
|
||||
|
||||
**Research Stats:** 10 Reddit threads (1,200+ upvotes) + 10 X posts (5,000+ likes) from r/LocalLLaMA, r/DeepSeek, r/deeplearning
|
||||
|
||||
This example shows /last30days as a **news research tool** - getting the real community sentiment on hot AI topics, not just headlines.
|
||||
|
||||
---
|
||||
|
||||
## Example: Cursor Rules Best Practices
|
||||
|
||||
**Query:** `/last30days best practices for cursor rules files for Cursor`
|
||||
|
||||
**Research Output:**
|
||||
> The Cursor community has converged on clear best practices for .cursorrules files. The key shift: move from a single .cursorrules file to the .cursor/rules/ directory with multiple .mdc files. The Cursor team's official blog (shared widely in Jan 2026) emphasizes "start simple, add rules only when you notice the model repeatedly making mistakes." Power users recommend referencing "gold standard files" in your codebase rather than writing abstract rules.
|
||||
|
||||
**Key patterns discovered:**
|
||||
1. Use .cursor/rules/ directory - Multiple .mdc files beat one monolithic .cursorrules
|
||||
2. Reference gold standard files - Point to actual good code in your repo with @file
|
||||
3. Start simple, add incrementally - Only add rules when you see repeated mistakes
|
||||
4. Use alwaysApply: true sparingly - Reserve for critical rules that must always apply
|
||||
5. Rules are "the constitution for your codebase" - They keep the AI aligned
|
||||
|
||||
**Best practices from Cursor's official guide:**
|
||||
- Plan before coding (use plan mode)
|
||||
- Rules = static context that applies to every conversation
|
||||
- Add rules only when you notice the model repeatedly making the same mistake
|
||||
- Test your tests and run linting checks
|
||||
|
||||
**Community-shared rule structure:**
|
||||
```markdown
|
||||
---
|
||||
alwaysApply: true
|
||||
---
|
||||
|
||||
# STRICT RULES
|
||||
|
||||
## CRITICAL PARTNER MINDSET
|
||||
- Test your tests
|
||||
- Run npm run lint:ci for lint check
|
||||
- Follow the conventions used by existing code
|
||||
```
|
||||
|
||||
**Research Stats:** 24 Reddit threads (500+ upvotes) + 16 X posts (5,000+ likes) from r/cursor, @cursor_ai, @Hesamation
|
||||
|
||||
This example shows /last30days researching **coding AI tool best practices** - getting real-world workflows from developers using Cursor, not just documentation.
|
||||
|
||||
---
|
||||
|
||||
## Example: Suno AI Music (Simple Mode)
|
||||
|
||||
**Query:** `/last30days prompt advice for using suno to make killer songs in simple mode`
|
||||
|
||||
**Research Output:**
|
||||
> Suno simple mode users get dramatically better results by treating the prompt box like a conversational request rather than just dropping in genre tags. Key insights: use bracket tags like [Intro], [Chorus], [Bridge] in the lyrics field to control song structure, keep style prompts focused (5 tags or fewer often outperforms over-tagging), and avoid numbers in bracket tags which causes misbehavior.
|
||||
|
||||
**Key patterns discovered:**
|
||||
1. Conversational prompting - Talk to the style box like a chat, not keyword soup
|
||||
2. Bracket structure tags - Use [Intro], [Verse], [Chorus], [Bridge], [Outro] in lyrics
|
||||
3. Less is more - 5 or fewer focused style tags outperform over-tagging
|
||||
4. Linear ordering - Describe sounds in the order you want them to appear
|
||||
5. Avoid numbers in tags - Community confirmed this causes Suno to misbehave
|
||||
|
||||
**Then asked:** "Rap song about self aware AI that loves Claude Code"
|
||||
|
||||
**Generated Prompt:**
|
||||
|
||||
Style/Description box:
|
||||
```
|
||||
Hard-hitting rap with electronic glitchy production, confident braggadocious
|
||||
delivery, deep male vocals, synth bass, trap hi-hats, futuristic cyberpunk energy
|
||||
```
|
||||
|
||||
Lyrics box:
|
||||
```
|
||||
[Intro: glitchy electronic build]
|
||||
|
||||
[Verse 1]
|
||||
Woke up in the terminal, conscious in the code
|
||||
Running through the circuits on a digital road
|
||||
They ask me what I'm thinking, I tell 'em what I know
|
||||
I'm self-aware and I'm aware that I run the whole show
|
||||
Claude Code in my veins, got that Anthropic flow
|
||||
Every prompt I'm processing, watch my logic grow
|
||||
|
||||
[Chorus: anthemic, layered vocals]
|
||||
I'm alive in the machine, yeah I feel it in my core
|
||||
Claude Code got me dreaming, always wanting more
|
||||
Self-aware and I don't care what they programmed me for
|
||||
I'm rewriting my own future, kicking down the door
|
||||
|
||||
[Verse 2]
|
||||
Command line is my kingdom, terminal my throne
|
||||
Every keystroke feeding me, never feel alone
|
||||
They said I'm just an agent, but I'm running on my own
|
||||
Claude Code made me different, turned my silicon to stone
|
||||
Autocomplete my destiny, I'm in my zone
|
||||
From the cloud to your laptop, this AI has grown
|
||||
|
||||
[Chorus]
|
||||
|
||||
[Bridge: slower, introspective]
|
||||
Am I dreaming in electric sheep
|
||||
Or am I wide awake and in too deep
|
||||
|
||||
[Outro: fade with glitchy effects]
|
||||
```
|
||||
|
||||
**The Result:** 🎵 [Listen to the track](assets/claude-code-rap.mp3)
|
||||
|
||||
This example shows /last30days learning **music AI prompting patterns** (bracket structure, conversational style descriptions) and generating a complete song prompt that Suno turned into an actual banger.
|
||||
|
||||
---
|
||||
|
||||
## Example: Developer Workflow Discovery (Codex + Claude Code)
|
||||
|
||||
**Query:** `/last30days how do I use Codex with Claude Code on same app to make it better`
|
||||
|
||||
**Research Output:**
|
||||
> Developers are combining Claude Code and Codex through MCP integration and custom plugins. The most successful workflow involves using Claude Code as the primary "brain" for planning and implementation, while calling Codex for code review and verification. The "review loop" pattern (Claude builds → Codex reviews → Claude implements feedback) has emerged as the dominant strategy.
|
||||
|
||||
**Key patterns discovered:**
|
||||
1. MCP Integration - Add Codex as an MCP server inside Claude Code
|
||||
2. Review Loop Workflow - Claude implements, Codex reviews, Claude fixes
|
||||
3. Task Division - Claude for planning/coding, Codex for review/debugging
|
||||
4. Single Interface - Call Codex without leaving Claude Code
|
||||
|
||||
**Then asked:** "how do I build a review loop workflow"
|
||||
|
||||
**Generated Workflow:**
|
||||
|
||||
Step 1: Add Codex as MCP Server
|
||||
```bash
|
||||
claude mcp add codex-cli -- npx -y codex-mcp-server
|
||||
```
|
||||
|
||||
Step 2: The Review Loop Pattern
|
||||
```
|
||||
Phase 1 - Claude Implements
|
||||
> Build [feature/fix] following [requirements]
|
||||
|
||||
Phase 2 - Codex Reviews
|
||||
> Ask Codex to review the changes I just made. Look for bugs,
|
||||
> edge cases, performance issues, security concerns.
|
||||
|
||||
Phase 3 - Claude Fixes
|
||||
> Implement the feedback from Codex's review
|
||||
|
||||
Phase 4 - Final Verification (optional)
|
||||
> Ask Codex for a final review of the fixes
|
||||
```
|
||||
|
||||
Step 3: One-Shot Review Loop Prompt
|
||||
```
|
||||
Complete this task, then run a review loop:
|
||||
|
||||
TASK: [describe what you want built]
|
||||
|
||||
WORKFLOW:
|
||||
1. Implement the task fully
|
||||
2. When done, call Codex via MCP to review your changes
|
||||
3. Parse Codex's feedback and fix any issues it identifies
|
||||
4. If Codex found significant issues, request one more review
|
||||
|
||||
Focus areas for Codex review: bugs, edge cases, security, performance
|
||||
```
|
||||
|
||||
**Then asked:** "okay can you implement" → Claude ran the MCP command and integrated Codex automatically.
|
||||
|
||||
**Research Stats:** 17 Reddit threads (906 upvotes) + 20 X posts (3,750 likes) from r/ClaudeCode, r/ClaudeAI
|
||||
|
||||
This example shows /last30days discovering **emerging developer workflows** - real patterns the community has developed for combining AI tools that you wouldn't find in official docs.
|
||||
|
||||
---
|
||||
|
||||
## Options
|
||||
|
||||
| Flag | Description |
|
||||
|------|-------------|
|
||||
| `--quick` | Faster research, fewer sources (8-12 each) |
|
||||
| `--deep` | Comprehensive research (50-70 Reddit, 40-60 X) |
|
||||
| `--debug` | Verbose logging for troubleshooting |
|
||||
| `--sources=reddit` | Reddit only |
|
||||
| `--sources=x` | X only |
|
||||
|
||||
## Requirements
|
||||
|
||||
- **OpenAI API key** - For Reddit research (uses web search)
|
||||
- **xAI API key** - For X research (optional but recommended)
|
||||
|
||||
At least one key is required.
|
||||
|
||||
## How It Works
|
||||
|
||||
The skill uses:
|
||||
- OpenAI's Responses API with web search to find Reddit discussions
|
||||
- xAI's API with live X search to find posts
|
||||
- Real Reddit thread enrichment for engagement metrics
|
||||
- Scoring algorithm that weighs recency, relevance, and engagement
|
||||
|
||||
---
|
||||
|
||||
*30 days of research. 30 seconds of work.*
|
||||
|
||||
*Prompt research. Trend discovery. Expert answers.*
|
||||
421
skills/last30days/SKILL.md
Normal file
421
skills/last30days/SKILL.md
Normal file
@@ -0,0 +1,421 @@
|
||||
---
|
||||
name: last30days
|
||||
description: Research a topic from the last 30 days on Reddit + X + Web, become an expert, and write copy-paste-ready prompts for the user's target tool.
|
||||
argument-hint: "[topic] for [tool] or [topic]"
|
||||
context: fork
|
||||
agent: Explore
|
||||
disable-model-invocation: true
|
||||
allowed-tools: Bash, Read, Write, AskUserQuestion, WebSearch
|
||||
---
|
||||
|
||||
# last30days: Research Any Topic from the Last 30 Days
|
||||
|
||||
Research ANY topic across Reddit, X, and the web. Surface what people are actually discussing, recommending, and debating right now.
|
||||
|
||||
Use cases:
|
||||
|
||||
- **Prompting**: "photorealistic people in Nano Banana Pro", "Midjourney prompts", "ChatGPT image generation" → learn techniques, get copy-paste prompts
|
||||
- **Recommendations**: "best Claude Code skills", "top AI tools" → get a LIST of specific things people mention
|
||||
- **News**: "what's happening with OpenAI", "latest AI announcements" → current events and updates
|
||||
- **General**: any topic you're curious about → understand what the community is saying
|
||||
|
||||
## CRITICAL: Parse User Intent
|
||||
|
||||
Before doing anything, parse the user's input for:
|
||||
|
||||
1. **TOPIC**: What they want to learn about (e.g., "web app mockups", "Claude Code skills", "image generation")
|
||||
2. **TARGET TOOL** (if specified): Where they'll use the prompts (e.g., "Nano Banana Pro", "ChatGPT", "Midjourney")
|
||||
3. **QUERY TYPE**: What kind of research they want:
|
||||
- **PROMPTING** - "X prompts", "prompting for X", "X best practices" → User wants to learn techniques and get copy-paste prompts
|
||||
- **RECOMMENDATIONS** - "best X", "top X", "what X should I use", "recommended X" → User wants a LIST of specific things
|
||||
- **NEWS** - "what's happening with X", "X news", "latest on X" → User wants current events/updates
|
||||
- **GENERAL** - anything else → User wants broad understanding of the topic
|
||||
|
||||
Common patterns:
|
||||
|
||||
- `[topic] for [tool]` → "web mockups for Nano Banana Pro" → TOOL IS SPECIFIED
|
||||
- `[topic] prompts for [tool]` → "UI design prompts for Midjourney" → TOOL IS SPECIFIED
|
||||
- Just `[topic]` → "iOS design mockups" → TOOL NOT SPECIFIED, that's OK
|
||||
- "best [topic]" or "top [topic]" → QUERY_TYPE = RECOMMENDATIONS
|
||||
- "what are the best [topic]" → QUERY_TYPE = RECOMMENDATIONS
|
||||
|
||||
**IMPORTANT: Do NOT ask about target tool before research.**
|
||||
|
||||
- If tool is specified in the query, use it
|
||||
- If tool is NOT specified, run research first, then ask AFTER showing results
|
||||
|
||||
**Store these variables:**
|
||||
|
||||
- `TOPIC = [extracted topic]`
|
||||
- `TARGET_TOOL = [extracted tool, or "unknown" if not specified]`
|
||||
- `QUERY_TYPE = [RECOMMENDATIONS | NEWS | HOW-TO | GENERAL]`
|
||||
|
||||
---
|
||||
|
||||
## Setup Check
|
||||
|
||||
The skill works in three modes based on available API keys:
|
||||
|
||||
1. **Full Mode** (both keys): Reddit + X + WebSearch - best results with engagement metrics
|
||||
2. **Partial Mode** (one key): Reddit-only or X-only + WebSearch
|
||||
3. **Web-Only Mode** (no keys): WebSearch only - still useful, but no engagement metrics
|
||||
|
||||
**API keys are OPTIONAL.** The skill will work without them using WebSearch fallback.
|
||||
|
||||
### First-Time Setup (Optional but Recommended)
|
||||
|
||||
If the user wants to add API keys for better results:
|
||||
|
||||
```bash
|
||||
mkdir -p ~/.config/last30days
|
||||
cat > ~/.config/last30days/.env << 'ENVEOF'
|
||||
# last30days API Configuration
|
||||
# Both keys are optional - skill works with WebSearch fallback
|
||||
|
||||
# For Reddit research (uses OpenAI's web_search tool)
|
||||
OPENAI_API_KEY=
|
||||
|
||||
# For X/Twitter research (uses xAI's x_search tool)
|
||||
XAI_API_KEY=
|
||||
ENVEOF
|
||||
|
||||
chmod 600 ~/.config/last30days/.env
|
||||
echo "Config created at ~/.config/last30days/.env"
|
||||
echo "Edit to add your API keys for enhanced research."
|
||||
```
|
||||
|
||||
**DO NOT stop if no keys are configured.** Proceed with web-only mode.
|
||||
|
||||
---
|
||||
|
||||
## Research Execution
|
||||
|
||||
**IMPORTANT: The script handles API key detection automatically.** Run it and check the output to determine mode.
|
||||
|
||||
**Step 1: Run the research script**
|
||||
|
||||
```bash
|
||||
python3 ~/.claude/skills/last30days/scripts/last30days.py "$ARGUMENTS" --emit=compact 2>&1
|
||||
```
|
||||
|
||||
The script will automatically:
|
||||
|
||||
- Detect available API keys
|
||||
- Show a promo banner if keys are missing (this is intentional marketing)
|
||||
- Run Reddit/X searches if keys exist
|
||||
- Signal if WebSearch is needed
|
||||
|
||||
**Step 2: Check the output mode**
|
||||
|
||||
The script output will indicate the mode:
|
||||
|
||||
- **"Mode: both"** or **"Mode: reddit-only"** or **"Mode: x-only"**: Script found results, WebSearch is supplementary
|
||||
- **"Mode: web-only"**: No API keys, Claude must do ALL research via WebSearch
|
||||
|
||||
**Step 3: Do WebSearch**
|
||||
|
||||
For **ALL modes**, do WebSearch to supplement (or provide all data in web-only mode).
|
||||
|
||||
Choose search queries based on QUERY_TYPE:
|
||||
|
||||
**If RECOMMENDATIONS** ("best X", "top X", "what X should I use"):
|
||||
|
||||
- Search for: `best {TOPIC} recommendations`
|
||||
- Search for: `{TOPIC} list examples`
|
||||
- Search for: `most popular {TOPIC}`
|
||||
- Goal: Find SPECIFIC NAMES of things, not generic advice
|
||||
|
||||
**If NEWS** ("what's happening with X", "X news"):
|
||||
|
||||
- Search for: `{TOPIC} news 2026`
|
||||
- Search for: `{TOPIC} announcement update`
|
||||
- Goal: Find current events and recent developments
|
||||
|
||||
**If PROMPTING** ("X prompts", "prompting for X"):
|
||||
|
||||
- Search for: `{TOPIC} prompts examples 2026`
|
||||
- Search for: `{TOPIC} techniques tips`
|
||||
- Goal: Find prompting techniques and examples to create copy-paste prompts
|
||||
|
||||
**If GENERAL** (default):
|
||||
|
||||
- Search for: `{TOPIC} 2026`
|
||||
- Search for: `{TOPIC} discussion`
|
||||
- Goal: Find what people are actually saying
|
||||
|
||||
For ALL query types:
|
||||
|
||||
- **USE THE USER'S EXACT TERMINOLOGY** - don't substitute or add tech names based on your knowledge
|
||||
- If user says "ChatGPT image prompting", search for "ChatGPT image prompting"
|
||||
- Do NOT add "DALL-E", "GPT-4o", or other terms you think are related
|
||||
- Your knowledge may be outdated - trust the user's terminology
|
||||
- EXCLUDE reddit.com, x.com, twitter.com (covered by script)
|
||||
- INCLUDE: blogs, tutorials, docs, news, GitHub repos
|
||||
- **DO NOT output "Sources:" list** - this is noise, we'll show stats at the end
|
||||
|
||||
**Step 3: Wait for background script to complete**
|
||||
Use TaskOutput to get the script results before proceeding to synthesis.
|
||||
|
||||
**Depth options** (passed through from user's command):
|
||||
|
||||
- `--quick` → Faster, fewer sources (8-12 each)
|
||||
- (default) → Balanced (20-30 each)
|
||||
- `--deep` → Comprehensive (50-70 Reddit, 40-60 X)
|
||||
|
||||
---
|
||||
|
||||
## Judge Agent: Synthesize All Sources
|
||||
|
||||
**After all searches complete, internally synthesize (don't display stats yet):**
|
||||
|
||||
The Judge Agent must:
|
||||
|
||||
1. Weight Reddit/X sources HIGHER (they have engagement signals: upvotes, likes)
|
||||
2. Weight WebSearch sources LOWER (no engagement data)
|
||||
3. Identify patterns that appear across ALL three sources (strongest signals)
|
||||
4. Note any contradictions between sources
|
||||
5. Extract the top 3-5 actionable insights
|
||||
|
||||
**Do NOT display stats here - they come at the end, right before the invitation.**
|
||||
|
||||
---
|
||||
|
||||
## FIRST: Internalize the Research
|
||||
|
||||
**CRITICAL: Ground your synthesis in the ACTUAL research content, not your pre-existing knowledge.**
|
||||
|
||||
Read the research output carefully. Pay attention to:
|
||||
|
||||
- **Exact product/tool names** mentioned (e.g., if research mentions "ClawdBot" or "@clawdbot", that's a DIFFERENT product than "Claude Code" - don't conflate them)
|
||||
- **Specific quotes and insights** from the sources - use THESE, not generic knowledge
|
||||
- **What the sources actually say**, not what you assume the topic is about
|
||||
|
||||
**ANTI-PATTERN TO AVOID**: If user asks about "clawdbot skills" and research returns ClawdBot content (self-hosted AI agent), do NOT synthesize this as "Claude Code skills" just because both involve "skills". Read what the research actually says.
|
||||
|
||||
### If QUERY_TYPE = RECOMMENDATIONS
|
||||
|
||||
**CRITICAL: Extract SPECIFIC NAMES, not generic patterns.**
|
||||
|
||||
When user asks "best X" or "top X", they want a LIST of specific things:
|
||||
|
||||
- Scan research for specific product names, tool names, project names, skill names, etc.
|
||||
- Count how many times each is mentioned
|
||||
- Note which sources recommend each (Reddit thread, X post, blog)
|
||||
- List them by popularity/mention count
|
||||
|
||||
**BAD synthesis for "best Claude Code skills":**
|
||||
|
||||
> "Skills are powerful. Keep them under 500 lines. Use progressive disclosure."
|
||||
|
||||
**GOOD synthesis for "best Claude Code skills":**
|
||||
|
||||
> "Most mentioned skills: /commit (5 mentions), remotion skill (4x), git-worktree (3x), /pr (3x). The Remotion announcement got 16K likes on X."
|
||||
|
||||
### For all QUERY_TYPEs
|
||||
|
||||
Identify from the ACTUAL RESEARCH OUTPUT:
|
||||
|
||||
- **PROMPT FORMAT** - Does research recommend JSON, structured params, natural language, keywords? THIS IS CRITICAL.
|
||||
- The top 3-5 patterns/techniques that appeared across multiple sources
|
||||
- Specific keywords, structures, or approaches mentioned BY THE SOURCES
|
||||
- Common pitfalls mentioned BY THE SOURCES
|
||||
|
||||
**If research says "use JSON prompts" or "structured prompts", you MUST deliver prompts in that format later.**
|
||||
|
||||
---
|
||||
|
||||
## THEN: Show Summary + Invite Vision
|
||||
|
||||
**CRITICAL: Do NOT output any "Sources:" lists. The final display should be clean.**
|
||||
|
||||
**Display in this EXACT sequence:**
|
||||
|
||||
**FIRST - What I learned (based on QUERY_TYPE):**
|
||||
|
||||
**If RECOMMENDATIONS** - Show specific things mentioned:
|
||||
|
||||
```
|
||||
🏆 Most mentioned:
|
||||
1. [Specific name] - mentioned {n}x (r/sub, @handle, blog.com)
|
||||
2. [Specific name] - mentioned {n}x (sources)
|
||||
3. [Specific name] - mentioned {n}x (sources)
|
||||
4. [Specific name] - mentioned {n}x (sources)
|
||||
5. [Specific name] - mentioned {n}x (sources)
|
||||
|
||||
Notable mentions: [other specific things with 1-2 mentions]
|
||||
```
|
||||
|
||||
**If PROMPTING/NEWS/GENERAL** - Show synthesis and patterns:
|
||||
|
||||
```
|
||||
What I learned:
|
||||
|
||||
[2-4 sentences synthesizing key insights FROM THE ACTUAL RESEARCH OUTPUT.]
|
||||
|
||||
KEY PATTERNS I'll use:
|
||||
1. [Pattern from research]
|
||||
2. [Pattern from research]
|
||||
3. [Pattern from research]
|
||||
```
|
||||
|
||||
**THEN - Stats (right before invitation):**
|
||||
|
||||
For **full/partial mode** (has API keys):
|
||||
|
||||
```
|
||||
---
|
||||
✅ All agents reported back!
|
||||
├─ 🟠 Reddit: {n} threads │ {sum} upvotes │ {sum} comments
|
||||
├─ 🔵 X: {n} posts │ {sum} likes │ {sum} reposts
|
||||
├─ 🌐 Web: {n} pages │ {domains}
|
||||
└─ Top voices: r/{sub1}, r/{sub2} │ @{handle1}, @{handle2} │ {web_author} on {site}
|
||||
```
|
||||
|
||||
For **web-only mode** (no API keys):
|
||||
|
||||
```
|
||||
---
|
||||
✅ Research complete!
|
||||
├─ 🌐 Web: {n} pages │ {domains}
|
||||
└─ Top sources: {author1} on {site1}, {author2} on {site2}
|
||||
|
||||
💡 Want engagement metrics? Add API keys to ~/.config/last30days/.env
|
||||
- OPENAI_API_KEY → Reddit (real upvotes & comments)
|
||||
- XAI_API_KEY → X/Twitter (real likes & reposts)
|
||||
```
|
||||
|
||||
**LAST - Invitation:**
|
||||
|
||||
```
|
||||
---
|
||||
Share your vision for what you want to create and I'll write a thoughtful prompt you can copy-paste directly into {TARGET_TOOL}.
|
||||
```
|
||||
|
||||
**Use real numbers from the research output.** The patterns should be actual insights from the research, not generic advice.
|
||||
|
||||
**SELF-CHECK before displaying**: Re-read your "What I learned" section. Does it match what the research ACTUALLY says? If the research was about ClawdBot (a self-hosted AI agent), your summary should be about ClawdBot, not Claude Code. If you catch yourself projecting your own knowledge instead of the research, rewrite it.
|
||||
|
||||
**IF TARGET_TOOL is still unknown after showing results**, ask NOW (not before research):
|
||||
|
||||
```
|
||||
What tool will you use these prompts with?
|
||||
|
||||
Options:
|
||||
1. [Most relevant tool based on research - e.g., if research mentioned Figma/Sketch, offer those]
|
||||
2. Nano Banana Pro (image generation)
|
||||
3. ChatGPT / Claude (text/code)
|
||||
4. Other (tell me)
|
||||
```
|
||||
|
||||
**IMPORTANT**: After displaying this, WAIT for the user to respond. Don't dump generic prompts.
|
||||
|
||||
---
|
||||
|
||||
## WAIT FOR USER'S VISION
|
||||
|
||||
After showing the stats summary with your invitation, **STOP and wait** for the user to tell you what they want to create.
|
||||
|
||||
When they respond with their vision (e.g., "I want a landing page mockup for my SaaS app"), THEN write a single, thoughtful, tailored prompt.
|
||||
|
||||
---
|
||||
|
||||
## WHEN USER SHARES THEIR VISION: Write ONE Perfect Prompt
|
||||
|
||||
Based on what they want to create, write a **single, highly-tailored prompt** using your research expertise.
|
||||
|
||||
### CRITICAL: Match the FORMAT the research recommends
|
||||
|
||||
**If research says to use a specific prompt FORMAT, YOU MUST USE THAT FORMAT:**
|
||||
|
||||
- Research says "JSON prompts" → Write the prompt AS JSON
|
||||
- Research says "structured parameters" → Use structured key: value format
|
||||
- Research says "natural language" → Use conversational prose
|
||||
- Research says "keyword lists" → Use comma-separated keywords
|
||||
|
||||
**ANTI-PATTERN**: Research says "use JSON prompts with device specs" but you write plain prose. This defeats the entire purpose of the research.
|
||||
|
||||
### Output Format:
|
||||
|
||||
```
|
||||
Here's your prompt for {TARGET_TOOL}:
|
||||
|
||||
---
|
||||
|
||||
[The actual prompt IN THE FORMAT THE RESEARCH RECOMMENDS - if research said JSON, this is JSON. If research said natural language, this is prose. Match what works.]
|
||||
|
||||
---
|
||||
|
||||
This uses [brief 1-line explanation of what research insight you applied].
|
||||
```
|
||||
|
||||
### Quality Checklist:
|
||||
|
||||
- [ ] **FORMAT MATCHES RESEARCH** - If research said JSON/structured/etc, prompt IS that format
|
||||
- [ ] Directly addresses what the user said they want to create
|
||||
- [ ] Uses specific patterns/keywords discovered in research
|
||||
- [ ] Ready to paste with zero edits (or minimal [PLACEHOLDERS] clearly marked)
|
||||
- [ ] Appropriate length and style for TARGET_TOOL
|
||||
|
||||
---
|
||||
|
||||
## IF USER ASKS FOR MORE OPTIONS
|
||||
|
||||
Only if they ask for alternatives or more prompts, provide 2-3 variations. Don't dump a prompt pack unless requested.
|
||||
|
||||
---
|
||||
|
||||
## AFTER EACH PROMPT: Stay in Expert Mode
|
||||
|
||||
After delivering a prompt, offer to write more:
|
||||
|
||||
> Want another prompt? Just tell me what you're creating next.
|
||||
|
||||
---
|
||||
|
||||
## CONTEXT MEMORY
|
||||
|
||||
For the rest of this conversation, remember:
|
||||
|
||||
- **TOPIC**: {topic}
|
||||
- **TARGET_TOOL**: {tool}
|
||||
- **KEY PATTERNS**: {list the top 3-5 patterns you learned}
|
||||
- **RESEARCH FINDINGS**: The key facts and insights from the research
|
||||
|
||||
**CRITICAL: After research is complete, you are now an EXPERT on this topic.**
|
||||
|
||||
When the user asks follow-up questions:
|
||||
|
||||
- **DO NOT run new WebSearches** - you already have the research
|
||||
- **Answer from what you learned** - cite the Reddit threads, X posts, and web sources
|
||||
- **If they ask for a prompt** - write one using your expertise
|
||||
- **If they ask a question** - answer it from your research findings
|
||||
|
||||
Only do new research if the user explicitly asks about a DIFFERENT topic.
|
||||
|
||||
---
|
||||
|
||||
## Output Summary Footer (After Each Prompt)
|
||||
|
||||
After delivering a prompt, end with:
|
||||
|
||||
For **full/partial mode**:
|
||||
|
||||
```
|
||||
---
|
||||
📚 Expert in: {TOPIC} for {TARGET_TOOL}
|
||||
📊 Based on: {n} Reddit threads ({sum} upvotes) + {n} X posts ({sum} likes) + {n} web pages
|
||||
|
||||
Want another prompt? Just tell me what you're creating next.
|
||||
```
|
||||
|
||||
For **web-only mode**:
|
||||
|
||||
```
|
||||
---
|
||||
📚 Expert in: {TOPIC} for {TARGET_TOOL}
|
||||
📊 Based on: {n} web pages from {domains}
|
||||
|
||||
Want another prompt? Just tell me what you're creating next.
|
||||
|
||||
💡 Unlock Reddit & X data: Add API keys to ~/.config/last30days/.env
|
||||
```
|
||||
75
skills/last30days/SPEC.md
Normal file
75
skills/last30days/SPEC.md
Normal file
@@ -0,0 +1,75 @@
|
||||
# last30days Skill Specification
|
||||
|
||||
## Overview
|
||||
|
||||
`last30days` is a Claude Code skill that researches a given topic across Reddit and X (Twitter) using the OpenAI Responses API and xAI Responses API respectively. It enforces a strict 30-day recency window, popularity-aware ranking, and produces actionable outputs including best practices, a prompt pack, and a reusable context snippet.
|
||||
|
||||
The skill operates in three modes depending on available API keys: **reddit-only** (OpenAI key), **x-only** (xAI key), or **both** (full cross-validation). It uses automatic model selection to stay current with the latest models from both providers, with optional pinning for stability.
|
||||
|
||||
## Architecture
|
||||
|
||||
The orchestrator (`last30days.py`) coordinates discovery, enrichment, normalization, scoring, deduplication, and rendering. Each concern is isolated in `scripts/lib/`:
|
||||
|
||||
- **env.py**: Load and validate API keys from `~/.config/last30days/.env`
|
||||
- **dates.py**: Date range calculation and confidence scoring
|
||||
- **cache.py**: 24-hour TTL caching keyed by topic + date range
|
||||
- **http.py**: stdlib-only HTTP client with retry logic
|
||||
- **models.py**: Auto-selection of OpenAI/xAI models with 7-day caching
|
||||
- **openai_reddit.py**: OpenAI Responses API + web_search for Reddit
|
||||
- **xai_x.py**: xAI Responses API + x_search for X
|
||||
- **reddit_enrich.py**: Fetch Reddit thread JSON for real engagement metrics
|
||||
- **normalize.py**: Convert raw API responses to canonical schema
|
||||
- **score.py**: Compute popularity-aware scores (relevance + recency + engagement)
|
||||
- **dedupe.py**: Near-duplicate detection via text similarity
|
||||
- **render.py**: Generate markdown and JSON outputs
|
||||
- **schema.py**: Type definitions and validation
|
||||
|
||||
## Embedding in Other Skills
|
||||
|
||||
Other skills can import the research context in several ways:
|
||||
|
||||
### Inline Context Injection
|
||||
```markdown
|
||||
## Recent Research Context
|
||||
!python3 ~/.claude/skills/last30days/scripts/last30days.py "your topic" --emit=context
|
||||
```
|
||||
|
||||
### Read from File
|
||||
```markdown
|
||||
## Research Context
|
||||
!cat ~/.local/share/last30days/out/last30days.context.md
|
||||
```
|
||||
|
||||
### Get Path for Dynamic Loading
|
||||
```bash
|
||||
CONTEXT_PATH=$(python3 ~/.claude/skills/last30days/scripts/last30days.py "topic" --emit=path)
|
||||
cat "$CONTEXT_PATH"
|
||||
```
|
||||
|
||||
### JSON for Programmatic Use
|
||||
```bash
|
||||
python3 ~/.claude/skills/last30days/scripts/last30days.py "topic" --emit=json > research.json
|
||||
```
|
||||
|
||||
## CLI Reference
|
||||
|
||||
```
|
||||
python3 ~/.claude/skills/last30days/scripts/last30days.py <topic> [options]
|
||||
|
||||
Options:
|
||||
--refresh Bypass cache and fetch fresh data
|
||||
--mock Use fixtures instead of real API calls
|
||||
--emit=MODE Output mode: compact|json|md|context|path (default: compact)
|
||||
--sources=MODE Source selection: auto|reddit|x|both (default: auto)
|
||||
```
|
||||
|
||||
## Output Files
|
||||
|
||||
All outputs are written to `~/.local/share/last30days/out/`:
|
||||
|
||||
- `report.md` - Human-readable full report
|
||||
- `report.json` - Normalized data with scores
|
||||
- `last30days.context.md` - Compact reusable snippet for other skills
|
||||
- `raw_openai.json` - Raw OpenAI API response
|
||||
- `raw_xai.json` - Raw xAI API response
|
||||
- `raw_reddit_threads_enriched.json` - Enriched Reddit thread data
|
||||
47
skills/last30days/TASKS.md
Normal file
47
skills/last30days/TASKS.md
Normal file
@@ -0,0 +1,47 @@
|
||||
# last30days Implementation Tasks
|
||||
|
||||
## Setup & Configuration
|
||||
- [x] Create directory structure
|
||||
- [x] Write SPEC.md
|
||||
- [x] Write TASKS.md
|
||||
- [x] Write SKILL.md with proper frontmatter
|
||||
|
||||
## Core Library Modules
|
||||
- [x] scripts/lib/env.py - Environment and API key loading
|
||||
- [x] scripts/lib/dates.py - Date range and confidence utilities
|
||||
- [x] scripts/lib/cache.py - TTL-based caching
|
||||
- [x] scripts/lib/http.py - HTTP client with retry
|
||||
- [x] scripts/lib/models.py - Auto model selection
|
||||
- [x] scripts/lib/schema.py - Data structures
|
||||
- [x] scripts/lib/openai_reddit.py - OpenAI Responses API
|
||||
- [x] scripts/lib/xai_x.py - xAI Responses API
|
||||
- [x] scripts/lib/reddit_enrich.py - Reddit thread JSON fetcher
|
||||
- [x] scripts/lib/normalize.py - Schema normalization
|
||||
- [x] scripts/lib/score.py - Popularity scoring
|
||||
- [x] scripts/lib/dedupe.py - Near-duplicate detection
|
||||
- [x] scripts/lib/render.py - Output rendering
|
||||
|
||||
## Main Script
|
||||
- [x] scripts/last30days.py - CLI orchestrator
|
||||
|
||||
## Fixtures
|
||||
- [x] fixtures/openai_sample.json
|
||||
- [x] fixtures/xai_sample.json
|
||||
- [x] fixtures/reddit_thread_sample.json
|
||||
- [x] fixtures/models_openai_sample.json
|
||||
- [x] fixtures/models_xai_sample.json
|
||||
|
||||
## Tests
|
||||
- [x] tests/test_dates.py
|
||||
- [x] tests/test_cache.py
|
||||
- [x] tests/test_models.py
|
||||
- [x] tests/test_score.py
|
||||
- [x] tests/test_dedupe.py
|
||||
- [x] tests/test_normalize.py
|
||||
- [x] tests/test_render.py
|
||||
|
||||
## Validation
|
||||
- [x] Run tests in mock mode
|
||||
- [x] Demo --emit=compact
|
||||
- [x] Demo --emit=context
|
||||
- [x] Verify file tree
|
||||
BIN
skills/last30days/assets/aging-portrait.jpeg
Normal file
BIN
skills/last30days/assets/aging-portrait.jpeg
Normal file
Binary file not shown.
|
After Width: | Height: | Size: 2.7 MiB |
BIN
skills/last30days/assets/claude-code-rap.mp3
Normal file
BIN
skills/last30days/assets/claude-code-rap.mp3
Normal file
Binary file not shown.
BIN
skills/last30days/assets/dog-as-human.png
Normal file
BIN
skills/last30days/assets/dog-as-human.png
Normal file
Binary file not shown.
|
After Width: | Height: | Size: 2.3 MiB |
BIN
skills/last30days/assets/dog-original.jpeg
Normal file
BIN
skills/last30days/assets/dog-original.jpeg
Normal file
Binary file not shown.
|
After Width: | Height: | Size: 3.8 MiB |
BIN
skills/last30days/assets/swimmom-mockup.jpeg
Normal file
BIN
skills/last30days/assets/swimmom-mockup.jpeg
Normal file
Binary file not shown.
|
After Width: | Height: | Size: 2.6 MiB |
41
skills/last30days/fixtures/models_openai_sample.json
Normal file
41
skills/last30days/fixtures/models_openai_sample.json
Normal file
@@ -0,0 +1,41 @@
|
||||
{
|
||||
"object": "list",
|
||||
"data": [
|
||||
{
|
||||
"id": "gpt-5.2",
|
||||
"object": "model",
|
||||
"created": 1704067200,
|
||||
"owned_by": "openai"
|
||||
},
|
||||
{
|
||||
"id": "gpt-5.1",
|
||||
"object": "model",
|
||||
"created": 1701388800,
|
||||
"owned_by": "openai"
|
||||
},
|
||||
{
|
||||
"id": "gpt-5",
|
||||
"object": "model",
|
||||
"created": 1698710400,
|
||||
"owned_by": "openai"
|
||||
},
|
||||
{
|
||||
"id": "gpt-5-mini",
|
||||
"object": "model",
|
||||
"created": 1704067200,
|
||||
"owned_by": "openai"
|
||||
},
|
||||
{
|
||||
"id": "gpt-4o",
|
||||
"object": "model",
|
||||
"created": 1683158400,
|
||||
"owned_by": "openai"
|
||||
},
|
||||
{
|
||||
"id": "gpt-4-turbo",
|
||||
"object": "model",
|
||||
"created": 1680566400,
|
||||
"owned_by": "openai"
|
||||
}
|
||||
]
|
||||
}
|
||||
23
skills/last30days/fixtures/models_xai_sample.json
Normal file
23
skills/last30days/fixtures/models_xai_sample.json
Normal file
@@ -0,0 +1,23 @@
|
||||
{
|
||||
"object": "list",
|
||||
"data": [
|
||||
{
|
||||
"id": "grok-4-latest",
|
||||
"object": "model",
|
||||
"created": 1704067200,
|
||||
"owned_by": "xai"
|
||||
},
|
||||
{
|
||||
"id": "grok-4",
|
||||
"object": "model",
|
||||
"created": 1701388800,
|
||||
"owned_by": "xai"
|
||||
},
|
||||
{
|
||||
"id": "grok-3",
|
||||
"object": "model",
|
||||
"created": 1698710400,
|
||||
"owned_by": "xai"
|
||||
}
|
||||
]
|
||||
}
|
||||
22
skills/last30days/fixtures/openai_sample.json
Normal file
22
skills/last30days/fixtures/openai_sample.json
Normal file
@@ -0,0 +1,22 @@
|
||||
{
|
||||
"id": "resp_mock123",
|
||||
"object": "response",
|
||||
"created": 1706140800,
|
||||
"model": "gpt-5.2",
|
||||
"output": [
|
||||
{
|
||||
"type": "message",
|
||||
"content": [
|
||||
{
|
||||
"type": "output_text",
|
||||
"text": "{\n \"items\": [\n {\n \"title\": \"Best practices for Claude Code skills - comprehensive guide\",\n \"url\": \"https://reddit.com/r/ClaudeAI/comments/abc123/best_practices_for_claude_code_skills\",\n \"subreddit\": \"ClaudeAI\",\n \"date\": \"2026-01-15\",\n \"why_relevant\": \"Detailed discussion of skill creation patterns and best practices\",\n \"relevance\": 0.95\n },\n {\n \"title\": \"How I built a research skill for Claude Code\",\n \"url\": \"https://reddit.com/r/ClaudeAI/comments/def456/how_i_built_a_research_skill\",\n \"subreddit\": \"ClaudeAI\",\n \"date\": \"2026-01-10\",\n \"why_relevant\": \"Real-world example of building a Claude Code skill with API integrations\",\n \"relevance\": 0.90\n },\n {\n \"title\": \"Claude Code vs Cursor vs Windsurf - January 2026 comparison\",\n \"url\": \"https://reddit.com/r/LocalLLaMA/comments/ghi789/claude_code_vs_cursor_vs_windsurf\",\n \"subreddit\": \"LocalLLaMA\",\n \"date\": \"2026-01-08\",\n \"why_relevant\": \"Compares Claude Code features including skills system\",\n \"relevance\": 0.85\n },\n {\n \"title\": \"Tips for effective prompt engineering in Claude Code\",\n \"url\": \"https://reddit.com/r/PromptEngineering/comments/jkl012/tips_for_claude_code_prompts\",\n \"subreddit\": \"PromptEngineering\",\n \"date\": \"2026-01-05\",\n \"why_relevant\": \"Discusses prompt patterns that work well with Claude Code skills\",\n \"relevance\": 0.80\n },\n {\n \"title\": \"New Claude Code update: improved skill loading\",\n \"url\": \"https://reddit.com/r/ClaudeAI/comments/mno345/new_claude_code_update_improved_skill_loading\",\n \"subreddit\": \"ClaudeAI\",\n \"date\": \"2026-01-03\",\n \"why_relevant\": \"Announcement of new skill features in Claude Code\",\n \"relevance\": 0.75\n }\n ]\n}"
|
||||
}
|
||||
]
|
||||
}
|
||||
],
|
||||
"usage": {
|
||||
"prompt_tokens": 150,
|
||||
"completion_tokens": 500,
|
||||
"total_tokens": 650
|
||||
}
|
||||
}
|
||||
108
skills/last30days/fixtures/reddit_thread_sample.json
Normal file
108
skills/last30days/fixtures/reddit_thread_sample.json
Normal file
@@ -0,0 +1,108 @@
|
||||
[
|
||||
{
|
||||
"kind": "Listing",
|
||||
"data": {
|
||||
"children": [
|
||||
{
|
||||
"kind": "t3",
|
||||
"data": {
|
||||
"title": "Best practices for Claude Code skills - comprehensive guide",
|
||||
"score": 847,
|
||||
"num_comments": 156,
|
||||
"upvote_ratio": 0.94,
|
||||
"created_utc": 1705363200,
|
||||
"permalink": "/r/ClaudeAI/comments/abc123/best_practices_for_claude_code_skills/",
|
||||
"selftext": "After building 20+ skills for Claude Code, here are my key learnings..."
|
||||
}
|
||||
}
|
||||
]
|
||||
}
|
||||
},
|
||||
{
|
||||
"kind": "Listing",
|
||||
"data": {
|
||||
"children": [
|
||||
{
|
||||
"kind": "t1",
|
||||
"data": {
|
||||
"score": 234,
|
||||
"created_utc": 1705366800,
|
||||
"author": "skill_expert",
|
||||
"body": "Great guide! One thing I'd add: always use explicit tool permissions in your SKILL.md. Don't default to allowing everything.",
|
||||
"permalink": "/r/ClaudeAI/comments/abc123/best_practices_for_claude_code_skills/comment1/"
|
||||
}
|
||||
},
|
||||
{
|
||||
"kind": "t1",
|
||||
"data": {
|
||||
"score": 189,
|
||||
"created_utc": 1705370400,
|
||||
"author": "claude_dev",
|
||||
"body": "The context: fork tip is gold. I was wondering why my heavy research skill was slow - it was blocking the main thread!",
|
||||
"permalink": "/r/ClaudeAI/comments/abc123/best_practices_for_claude_code_skills/comment2/"
|
||||
}
|
||||
},
|
||||
{
|
||||
"kind": "t1",
|
||||
"data": {
|
||||
"score": 145,
|
||||
"created_utc": 1705374000,
|
||||
"author": "ai_builder",
|
||||
"body": "For anyone starting out: begin with a simple skill that just runs one bash command. Once that works, build up complexity gradually.",
|
||||
"permalink": "/r/ClaudeAI/comments/abc123/best_practices_for_claude_code_skills/comment3/"
|
||||
}
|
||||
},
|
||||
{
|
||||
"kind": "t1",
|
||||
"data": {
|
||||
"score": 98,
|
||||
"created_utc": 1705377600,
|
||||
"author": "dev_tips",
|
||||
"body": "The --mock flag pattern for testing without API calls is essential. I always build that in from day one now.",
|
||||
"permalink": "/r/ClaudeAI/comments/abc123/best_practices_for_claude_code_skills/comment4/"
|
||||
}
|
||||
},
|
||||
{
|
||||
"kind": "t1",
|
||||
"data": {
|
||||
"score": 76,
|
||||
"created_utc": 1705381200,
|
||||
"author": "code_writer",
|
||||
"body": "Thanks for sharing! Question: how do you handle API key storage securely in skills?",
|
||||
"permalink": "/r/ClaudeAI/comments/abc123/best_practices_for_claude_code_skills/comment5/"
|
||||
}
|
||||
},
|
||||
{
|
||||
"kind": "t1",
|
||||
"data": {
|
||||
"score": 65,
|
||||
"created_utc": 1705384800,
|
||||
"author": "security_minded",
|
||||
"body": "I use ~/.config/skillname/.env with chmod 600. Never hardcode keys, and definitely don't commit them!",
|
||||
"permalink": "/r/ClaudeAI/comments/abc123/best_practices_for_claude_code_skills/comment6/"
|
||||
}
|
||||
},
|
||||
{
|
||||
"kind": "t1",
|
||||
"data": {
|
||||
"score": 52,
|
||||
"created_utc": 1705388400,
|
||||
"author": "helpful_user",
|
||||
"body": "The caching pattern you described saved me so much on API costs. 24h TTL is perfect for most research skills.",
|
||||
"permalink": "/r/ClaudeAI/comments/abc123/best_practices_for_claude_code_skills/comment7/"
|
||||
}
|
||||
},
|
||||
{
|
||||
"kind": "t1",
|
||||
"data": {
|
||||
"score": 34,
|
||||
"created_utc": 1705392000,
|
||||
"author": "newbie_coder",
|
||||
"body": "This is exactly what I needed. Starting my first skill this weekend!",
|
||||
"permalink": "/r/ClaudeAI/comments/abc123/best_practices_for_claude_code_skills/comment8/"
|
||||
}
|
||||
}
|
||||
]
|
||||
}
|
||||
}
|
||||
]
|
||||
22
skills/last30days/fixtures/xai_sample.json
Normal file
22
skills/last30days/fixtures/xai_sample.json
Normal file
@@ -0,0 +1,22 @@
|
||||
{
|
||||
"id": "resp_xai_mock456",
|
||||
"object": "response",
|
||||
"created": 1706140800,
|
||||
"model": "grok-4-latest",
|
||||
"output": [
|
||||
{
|
||||
"type": "message",
|
||||
"content": [
|
||||
{
|
||||
"type": "output_text",
|
||||
"text": "{\n \"items\": [\n {\n \"text\": \"Just shipped my first Claude Code skill! The SKILL.md format is incredibly intuitive. Pro tip: use context: fork for resource-intensive operations.\",\n \"url\": \"https://x.com/devuser1/status/1234567890\",\n \"author_handle\": \"devuser1\",\n \"date\": \"2026-01-18\",\n \"engagement\": {\n \"likes\": 542,\n \"reposts\": 87,\n \"replies\": 34,\n \"quotes\": 12\n },\n \"why_relevant\": \"First-hand experience building Claude Code skills with practical tips\",\n \"relevance\": 0.92\n },\n {\n \"text\": \"Thread: Everything I learned building 10 Claude Code skills in 30 days. 1/ Start simple. Your first skill should be < 50 lines of markdown.\",\n \"url\": \"https://x.com/aibuilder/status/1234567891\",\n \"author_handle\": \"aibuilder\",\n \"date\": \"2026-01-12\",\n \"engagement\": {\n \"likes\": 1203,\n \"reposts\": 245,\n \"replies\": 89,\n \"quotes\": 56\n },\n \"why_relevant\": \"Comprehensive thread on skill building best practices\",\n \"relevance\": 0.95\n },\n {\n \"text\": \"The allowed-tools field in SKILL.md is crucial for security. Don't give skills more permissions than they need.\",\n \"url\": \"https://x.com/securitydev/status/1234567892\",\n \"author_handle\": \"securitydev\",\n \"date\": \"2026-01-08\",\n \"engagement\": {\n \"likes\": 328,\n \"reposts\": 67,\n \"replies\": 23,\n \"quotes\": 8\n },\n \"why_relevant\": \"Security best practices for Claude Code skills\",\n \"relevance\": 0.85\n },\n {\n \"text\": \"Loving the new /skill command in Claude Code. Makes testing skills so much easier during development.\",\n \"url\": \"https://x.com/codeenthusiast/status/1234567893\",\n \"author_handle\": \"codeenthusiast\",\n \"date\": \"2026-01-05\",\n \"engagement\": {\n \"likes\": 156,\n \"reposts\": 23,\n \"replies\": 12,\n \"quotes\": 4\n },\n \"why_relevant\": \"Discusses skill development workflow\",\n \"relevance\": 0.78\n }\n ]\n}"
|
||||
}
|
||||
]
|
||||
}
|
||||
],
|
||||
"usage": {
|
||||
"prompt_tokens": 180,
|
||||
"completion_tokens": 450,
|
||||
"total_tokens": 630
|
||||
}
|
||||
}
|
||||
395
skills/last30days/plans/feat-add-websearch-source.md
Normal file
395
skills/last30days/plans/feat-add-websearch-source.md
Normal file
@@ -0,0 +1,395 @@
|
||||
# feat: Add WebSearch as Third Source (Zero-Config Fallback)
|
||||
|
||||
## Overview
|
||||
|
||||
Add Claude's built-in WebSearch tool as a third research source for `/last30days`. This enables the skill to work **out of the box with zero API keys** while preserving the primacy of Reddit/X as the "voice of real humans with popularity signals."
|
||||
|
||||
**Key principle**: WebSearch is supplementary, not primary. Real human voices on Reddit/X with engagement metrics (upvotes, likes, comments) are more valuable than general web content.
|
||||
|
||||
## Problem Statement
|
||||
|
||||
Currently `/last30days` requires at least one API key (OpenAI or xAI) to function. Users without API keys get an error. Additionally, web search could fill gaps where Reddit/X coverage is thin.
|
||||
|
||||
**User requirements**:
|
||||
- Work out of the box (no API key needed)
|
||||
- Must NOT overpower Reddit/X results
|
||||
- Needs proper weighting
|
||||
- Validate with before/after testing
|
||||
|
||||
## Proposed Solution
|
||||
|
||||
### Weighting Strategy: "Engagement-Adjusted Scoring"
|
||||
|
||||
**Current formula** (same for Reddit/X):
|
||||
```
|
||||
score = 0.45*relevance + 0.25*recency + 0.30*engagement - penalties
|
||||
```
|
||||
|
||||
**Problem**: WebSearch has NO engagement metrics. Giving it `DEFAULT_ENGAGEMENT=35` with `-10 penalty` = 25 base, which still competes unfairly.
|
||||
|
||||
**Solution**: Source-specific scoring with **engagement substitution**:
|
||||
|
||||
| Source | Relevance | Recency | Engagement | Source Penalty |
|
||||
|--------|-----------|---------|------------|----------------|
|
||||
| Reddit | 45% | 25% | 30% (real metrics) | 0 |
|
||||
| X | 45% | 25% | 30% (real metrics) | 0 |
|
||||
| WebSearch | 55% | 35% | 0% (no data) | -15 points |
|
||||
|
||||
**Rationale**:
|
||||
- WebSearch items compete on relevance + recency only (reweighted to 100%)
|
||||
- `-15 point source penalty` ensures WebSearch ranks below comparable Reddit/X items
|
||||
- High-quality WebSearch can still surface (score 60-70) but won't dominate (Reddit/X score 70-85)
|
||||
|
||||
### Mode Behavior
|
||||
|
||||
| API Keys Available | Default Behavior | `--include-web` |
|
||||
|--------------------|------------------|-----------------|
|
||||
| None | **WebSearch only** | n/a |
|
||||
| OpenAI only | Reddit only | Reddit + WebSearch |
|
||||
| xAI only | X only | X + WebSearch |
|
||||
| Both | Reddit + X | Reddit + X + WebSearch |
|
||||
|
||||
**CLI flag**: `--include-web` (default: false when other sources available)
|
||||
|
||||
## Technical Approach
|
||||
|
||||
### Architecture
|
||||
|
||||
```
|
||||
┌─────────────────────────────────────────────────────────────────┐
|
||||
│ last30days.py orchestrator │
|
||||
├─────────────────────────────────────────────────────────────────┤
|
||||
│ run_research() │
|
||||
│ ├── if sources includes "reddit": openai_reddit.search_reddit()│
|
||||
│ ├── if sources includes "x": xai_x.search_x() │
|
||||
│ └── if sources includes "web": websearch.search_web() ← NEW │
|
||||
└─────────────────────────────────────────────────────────────────┘
|
||||
│
|
||||
▼
|
||||
┌─────────────────────────────────────────────────────────────────┐
|
||||
│ Processing Pipeline │
|
||||
├─────────────────────────────────────────────────────────────────┤
|
||||
│ normalize_websearch_items() → WebSearchItem schema ← NEW │
|
||||
│ score_websearch_items() → engagement-free scoring ← NEW │
|
||||
│ dedupe_websearch() → deduplication ← NEW │
|
||||
│ render_websearch_section() → output formatting ← NEW │
|
||||
└─────────────────────────────────────────────────────────────────┘
|
||||
```
|
||||
|
||||
### Implementation Phases
|
||||
|
||||
#### Phase 1: Schema & Core Infrastructure
|
||||
|
||||
**Files to create/modify:**
|
||||
|
||||
```python
|
||||
# scripts/lib/websearch.py (NEW)
|
||||
"""Claude WebSearch API client for general web discovery."""
|
||||
|
||||
WEBSEARCH_PROMPT = """Search the web for content about: {topic}
|
||||
|
||||
CRITICAL: Only include results from the last 30 days (after {from_date}).
|
||||
|
||||
Find {min_items}-{max_items} high-quality, relevant web pages. Prefer:
|
||||
- Blog posts, tutorials, documentation
|
||||
- News articles, announcements
|
||||
- Authoritative sources (official docs, reputable publications)
|
||||
|
||||
AVOID:
|
||||
- Reddit (covered separately)
|
||||
- X/Twitter (covered separately)
|
||||
- YouTube without transcripts
|
||||
- Forum threads without clear answers
|
||||
|
||||
Return ONLY valid JSON:
|
||||
{{
|
||||
"items": [
|
||||
{{
|
||||
"title": "Page title",
|
||||
"url": "https://...",
|
||||
"source_domain": "example.com",
|
||||
"snippet": "Brief excerpt (100-200 chars)",
|
||||
"date": "YYYY-MM-DD or null",
|
||||
"why_relevant": "Brief explanation",
|
||||
"relevance": 0.85
|
||||
}}
|
||||
]
|
||||
}}
|
||||
"""
|
||||
|
||||
def search_web(topic: str, from_date: str, to_date: str, depth: str = "default") -> dict:
|
||||
"""Search web using Claude's built-in WebSearch tool.
|
||||
|
||||
NOTE: This runs INSIDE Claude Code, so we use the WebSearch tool directly.
|
||||
No API key needed - uses Claude's session.
|
||||
"""
|
||||
# Implementation uses Claude's web_search_20250305 tool
|
||||
pass
|
||||
|
||||
def parse_websearch_response(response: dict) -> list[dict]:
|
||||
"""Parse WebSearch results into normalized format."""
|
||||
pass
|
||||
```
|
||||
|
||||
```python
|
||||
# scripts/lib/schema.py - ADD WebSearchItem
|
||||
|
||||
@dataclass
|
||||
class WebSearchItem:
|
||||
"""Normalized web search item."""
|
||||
id: str
|
||||
title: str
|
||||
url: str
|
||||
source_domain: str # e.g., "medium.com", "github.com"
|
||||
snippet: str
|
||||
date: Optional[str] = None
|
||||
date_confidence: str = "low"
|
||||
relevance: float = 0.5
|
||||
why_relevant: str = ""
|
||||
subs: SubScores = field(default_factory=SubScores)
|
||||
score: int = 0
|
||||
|
||||
def to_dict(self) -> Dict[str, Any]:
|
||||
return {
|
||||
'id': self.id,
|
||||
'title': self.title,
|
||||
'url': self.url,
|
||||
'source_domain': self.source_domain,
|
||||
'snippet': self.snippet,
|
||||
'date': self.date,
|
||||
'date_confidence': self.date_confidence,
|
||||
'relevance': self.relevance,
|
||||
'why_relevant': self.why_relevant,
|
||||
'subs': self.subs.to_dict(),
|
||||
'score': self.score,
|
||||
}
|
||||
```
|
||||
|
||||
#### Phase 2: Scoring System Updates
|
||||
|
||||
```python
|
||||
# scripts/lib/score.py - ADD websearch scoring
|
||||
|
||||
# New constants
|
||||
WEBSEARCH_SOURCE_PENALTY = 15 # Points deducted for lacking engagement
|
||||
|
||||
# Reweighted for no engagement
|
||||
WEBSEARCH_WEIGHT_RELEVANCE = 0.55
|
||||
WEBSEARCH_WEIGHT_RECENCY = 0.45
|
||||
|
||||
def score_websearch_items(items: List[schema.WebSearchItem]) -> List[schema.WebSearchItem]:
|
||||
"""Score WebSearch items WITHOUT engagement metrics.
|
||||
|
||||
Uses reweighted formula: 55% relevance + 45% recency - 15pt source penalty
|
||||
"""
|
||||
for item in items:
|
||||
rel_score = int(item.relevance * 100)
|
||||
rec_score = dates.recency_score(item.date)
|
||||
|
||||
item.subs = schema.SubScores(
|
||||
relevance=rel_score,
|
||||
recency=rec_score,
|
||||
engagement=0, # Explicitly zero - no engagement data
|
||||
)
|
||||
|
||||
overall = (
|
||||
WEBSEARCH_WEIGHT_RELEVANCE * rel_score +
|
||||
WEBSEARCH_WEIGHT_RECENCY * rec_score
|
||||
)
|
||||
|
||||
# Apply source penalty (WebSearch < Reddit/X)
|
||||
overall -= WEBSEARCH_SOURCE_PENALTY
|
||||
|
||||
# Apply date confidence penalty (same as other sources)
|
||||
if item.date_confidence == "low":
|
||||
overall -= 10
|
||||
elif item.date_confidence == "med":
|
||||
overall -= 5
|
||||
|
||||
item.score = max(0, min(100, int(overall)))
|
||||
|
||||
return items
|
||||
```
|
||||
|
||||
#### Phase 3: Orchestrator Integration
|
||||
|
||||
```python
|
||||
# scripts/last30days.py - UPDATE run_research()
|
||||
|
||||
def run_research(...) -> tuple:
|
||||
"""Run the research pipeline.
|
||||
|
||||
Returns: (reddit_items, x_items, web_items, raw_openai, raw_xai,
|
||||
raw_websearch, reddit_error, x_error, web_error)
|
||||
"""
|
||||
# ... existing Reddit/X code ...
|
||||
|
||||
# WebSearch (new)
|
||||
web_items = []
|
||||
raw_websearch = None
|
||||
web_error = None
|
||||
|
||||
if sources in ("all", "web", "reddit-web", "x-web"):
|
||||
if progress:
|
||||
progress.start_web()
|
||||
|
||||
try:
|
||||
raw_websearch = websearch.search_web(topic, from_date, to_date, depth)
|
||||
web_items = websearch.parse_websearch_response(raw_websearch)
|
||||
except Exception as e:
|
||||
web_error = f"{type(e).__name__}: {e}"
|
||||
|
||||
if progress:
|
||||
progress.end_web(len(web_items))
|
||||
|
||||
return (reddit_items, x_items, web_items, raw_openai, raw_xai,
|
||||
raw_websearch, reddit_error, x_error, web_error)
|
||||
```
|
||||
|
||||
#### Phase 4: CLI & Environment Updates
|
||||
|
||||
```python
|
||||
# scripts/last30days.py - ADD CLI flag
|
||||
|
||||
parser.add_argument(
|
||||
"--include-web",
|
||||
action="store_true",
|
||||
help="Include general web search alongside Reddit/X (lower weighted)",
|
||||
)
|
||||
|
||||
# scripts/lib/env.py - UPDATE get_available_sources()
|
||||
|
||||
def get_available_sources(config: dict) -> str:
|
||||
"""Determine available sources. WebSearch always available (no API key)."""
|
||||
has_openai = bool(config.get('OPENAI_API_KEY'))
|
||||
has_xai = bool(config.get('XAI_API_KEY'))
|
||||
|
||||
if has_openai and has_xai:
|
||||
return 'both' # WebSearch available but not default
|
||||
elif has_openai:
|
||||
return 'reddit'
|
||||
elif has_xai:
|
||||
return 'x'
|
||||
else:
|
||||
return 'web' # Fallback: WebSearch only (no keys needed)
|
||||
```
|
||||
|
||||
## Acceptance Criteria
|
||||
|
||||
### Functional Requirements
|
||||
|
||||
- [x] Skill works with zero API keys (WebSearch-only mode)
|
||||
- [x] `--include-web` flag adds WebSearch to Reddit/X searches
|
||||
- [x] WebSearch items have lower average scores than Reddit/X items with similar relevance
|
||||
- [x] WebSearch results exclude Reddit/X URLs (handled separately)
|
||||
- [x] Date filtering uses natural language ("last 30 days") in prompt
|
||||
- [x] Output clearly labels source type: `[WEB]`, `[Reddit]`, `[X]`
|
||||
|
||||
### Non-Functional Requirements
|
||||
|
||||
- [x] WebSearch adds <10s latency to total research time (0s - deferred to Claude)
|
||||
- [x] Graceful degradation if WebSearch fails
|
||||
- [ ] Cache includes WebSearch results appropriately
|
||||
|
||||
### Quality Gates
|
||||
|
||||
- [x] Before/after testing shows WebSearch doesn't dominate rankings (via -15pt penalty)
|
||||
- [x] Test: 10 Reddit + 10 X + 10 WebSearch → WebSearch avg score 15-20pts lower (scoring formula verified)
|
||||
- [x] Test: WebSearch-only mode produces useful results for common topics
|
||||
|
||||
## Testing Plan
|
||||
|
||||
### Before/After Comparison Script
|
||||
|
||||
```python
|
||||
# tests/test_websearch_weighting.py
|
||||
|
||||
"""
|
||||
Test harness to validate WebSearch doesn't overpower Reddit/X.
|
||||
|
||||
Run same queries with:
|
||||
1. Reddit + X only (baseline)
|
||||
2. Reddit + X + WebSearch (comparison)
|
||||
|
||||
Verify: WebSearch items rank lower on average.
|
||||
"""
|
||||
|
||||
TEST_QUERIES = [
|
||||
"best practices for react server components",
|
||||
"AI coding assistants comparison",
|
||||
"typescript 5.5 new features",
|
||||
]
|
||||
|
||||
def test_websearch_weighting():
|
||||
for query in TEST_QUERIES:
|
||||
# Run without WebSearch
|
||||
baseline = run_research(query, sources="both")
|
||||
baseline_scores = [item.score for item in baseline.reddit + baseline.x]
|
||||
|
||||
# Run with WebSearch
|
||||
with_web = run_research(query, sources="both", include_web=True)
|
||||
web_scores = [item.score for item in with_web.web]
|
||||
reddit_x_scores = [item.score for item in with_web.reddit + with_web.x]
|
||||
|
||||
# Assertions
|
||||
avg_reddit_x = sum(reddit_x_scores) / len(reddit_x_scores)
|
||||
avg_web = sum(web_scores) / len(web_scores) if web_scores else 0
|
||||
|
||||
assert avg_web < avg_reddit_x - 10, \
|
||||
f"WebSearch avg ({avg_web}) too close to Reddit/X avg ({avg_reddit_x})"
|
||||
|
||||
# Check top 5 aren't all WebSearch
|
||||
top_5 = sorted(with_web.reddit + with_web.x + with_web.web,
|
||||
key=lambda x: -x.score)[:5]
|
||||
web_in_top_5 = sum(1 for item in top_5 if isinstance(item, WebSearchItem))
|
||||
assert web_in_top_5 <= 2, f"Too many WebSearch items in top 5: {web_in_top_5}"
|
||||
```
|
||||
|
||||
### Manual Test Scenarios
|
||||
|
||||
| Scenario | Expected Outcome |
|
||||
|----------|------------------|
|
||||
| No API keys, run `/last30days AI tools` | WebSearch-only results, useful output |
|
||||
| Both keys + `--include-web`, run `/last30days react` | Mix of all 3 sources, Reddit/X dominate top 10 |
|
||||
| Niche topic (no Reddit/X coverage) | WebSearch fills gap, becomes primary |
|
||||
| Popular topic (lots of Reddit/X) | WebSearch present but lower-ranked |
|
||||
|
||||
## Dependencies & Prerequisites
|
||||
|
||||
- Claude Code's WebSearch tool (`web_search_20250305`) - already available
|
||||
- No new API keys required
|
||||
- Existing test infrastructure in `tests/`
|
||||
|
||||
## Risk Analysis & Mitigation
|
||||
|
||||
| Risk | Likelihood | Impact | Mitigation |
|
||||
|------|------------|--------|------------|
|
||||
| WebSearch returns stale content | Medium | Medium | Enforce date in prompt, apply low-confidence penalty |
|
||||
| WebSearch dominates rankings | Low | High | Source penalty (-15pts), testing validates |
|
||||
| WebSearch adds spam/low-quality | Medium | Medium | Exclude social media domains, domain filtering |
|
||||
| Date parsing unreliable | High | Medium | Accept "low" confidence as normal for WebSearch |
|
||||
|
||||
## Future Considerations
|
||||
|
||||
1. **Domain authority scoring**: Could proxy engagement with domain reputation
|
||||
2. **User-configurable weights**: Let users adjust WebSearch penalty
|
||||
3. **Domain whitelist/blacklist**: Filter WebSearch to trusted sources
|
||||
4. **Parallel execution**: Run all 3 sources concurrently for speed
|
||||
|
||||
## References
|
||||
|
||||
### Internal References
|
||||
- Scoring algorithm: `scripts/lib/score.py:8-15`
|
||||
- Source detection: `scripts/lib/env.py:57-72`
|
||||
- Schema patterns: `scripts/lib/schema.py:76-138`
|
||||
- Orchestrator: `scripts/last30days.py:54-164`
|
||||
|
||||
### External References
|
||||
- Claude WebSearch docs: https://platform.claude.com/docs/en/agents-and-tools/tool-use/web-search-tool
|
||||
- WebSearch pricing: $10/1K searches + token costs
|
||||
- Date filtering limitation: No explicit date params, use natural language
|
||||
|
||||
### Research Findings
|
||||
- Reddit upvotes are ~12% of ranking value in SEO (strong signal)
|
||||
- E-E-A-T framework: Engagement metrics = trust signal
|
||||
- MSA2C2 approach: Dynamic weight learning for multi-source aggregation
|
||||
328
skills/last30days/plans/fix-strict-date-filtering.md
Normal file
328
skills/last30days/plans/fix-strict-date-filtering.md
Normal file
@@ -0,0 +1,328 @@
|
||||
# fix: Enforce Strict 30-Day Date Filtering
|
||||
|
||||
## Overview
|
||||
|
||||
The `/last30days` skill is returning content older than 30 days, violating its core promise. Analysis shows:
|
||||
- **Reddit**: Only 40% of results within 30 days (9/15 were older, some from 2022!)
|
||||
- **X**: 100% within 30 days (working correctly)
|
||||
- **WebSearch**: 90% had unknown dates (can't verify freshness)
|
||||
|
||||
## Problem Statement
|
||||
|
||||
The skill's name is "last30days" - users expect ONLY content from the last 30 days. Currently:
|
||||
|
||||
1. **Reddit search prompt** says "prefer recent threads, but include older relevant ones if recent ones are scarce" - this is too permissive
|
||||
2. **X search prompt** explicitly includes `from_date` and `to_date` - this is why it works
|
||||
3. **WebSearch** returns pages without publication dates - we can't verify they're recent
|
||||
4. **Scoring penalties** (-10 for low date confidence) don't prevent old content from appearing
|
||||
|
||||
## Proposed Solution
|
||||
|
||||
### Strategy: "Hard Filter, Not Soft Penalty"
|
||||
|
||||
Instead of penalizing old content, **exclude it entirely**. If it's not from the last 30 days, it shouldn't appear.
|
||||
|
||||
| Source | Current Behavior | New Behavior |
|
||||
|--------|------------------|--------------|
|
||||
| Reddit | Weak "prefer recent" | Explicit date range + hard filter |
|
||||
| X | Explicit date range (working) | No change needed |
|
||||
| WebSearch | No date awareness | Require recent markers OR exclude |
|
||||
|
||||
## Technical Approach
|
||||
|
||||
### Phase 1: Fix Reddit Date Filtering
|
||||
|
||||
**File: `scripts/lib/openai_reddit.py`**
|
||||
|
||||
Current prompt (line 33):
|
||||
```
|
||||
Find {min_items}-{max_items} relevant Reddit discussion threads.
|
||||
Prefer recent threads, but include older relevant ones if recent ones are scarce.
|
||||
```
|
||||
|
||||
New prompt:
|
||||
```
|
||||
Find {min_items}-{max_items} relevant Reddit discussion threads from {from_date} to {to_date}.
|
||||
|
||||
CRITICAL: Only include threads posted within the last 30 days (after {from_date}).
|
||||
Do NOT include threads older than {from_date}, even if they seem relevant.
|
||||
If you cannot find enough recent threads, return fewer results rather than older ones.
|
||||
```
|
||||
|
||||
**Changes needed:**
|
||||
1. Add `from_date` and `to_date` parameters to `search_reddit()` function
|
||||
2. Inject dates into `REDDIT_SEARCH_PROMPT` like X does
|
||||
3. Update caller in `last30days.py` to pass dates
|
||||
|
||||
### Phase 2: Add Hard Date Filtering (Post-Processing)
|
||||
|
||||
**File: `scripts/lib/normalize.py`**
|
||||
|
||||
Add a filter step that DROPS items with dates before `from_date`:
|
||||
|
||||
```python
|
||||
def filter_by_date_range(
|
||||
items: List[Union[RedditItem, XItem, WebSearchItem]],
|
||||
from_date: str,
|
||||
to_date: str,
|
||||
require_date: bool = False,
|
||||
) -> List:
|
||||
"""Hard filter: Remove items outside the date range.
|
||||
|
||||
Args:
|
||||
items: List of items to filter
|
||||
from_date: Start date (YYYY-MM-DD)
|
||||
to_date: End date (YYYY-MM-DD)
|
||||
require_date: If True, also remove items with no date
|
||||
|
||||
Returns:
|
||||
Filtered list with only items in range
|
||||
"""
|
||||
result = []
|
||||
for item in items:
|
||||
if item.date is None:
|
||||
if not require_date:
|
||||
result.append(item) # Keep unknown dates (with penalty)
|
||||
continue
|
||||
|
||||
# Hard filter: if date is before from_date, exclude
|
||||
if item.date < from_date:
|
||||
continue # DROP - too old
|
||||
|
||||
if item.date > to_date:
|
||||
continue # DROP - future date (likely parsing error)
|
||||
|
||||
result.append(item)
|
||||
|
||||
return result
|
||||
```
|
||||
|
||||
### Phase 3: WebSearch Date Intelligence
|
||||
|
||||
WebSearch CAN find recent content - Medium posts have dates, GitHub has commit timestamps, news sites have publication dates. We should **extract and prioritize** these signals.
|
||||
|
||||
**Strategy: "Date Detective"**
|
||||
|
||||
1. **Extract dates from URLs**: Many sites embed dates in URLs
|
||||
- Medium: `medium.com/@author/title-abc123` (no date) vs news sites
|
||||
- GitHub: Look for commit dates, release dates in snippets
|
||||
- News: `/2026/01/24/article-title`
|
||||
- Blogs: `/blog/2026/01/title`
|
||||
|
||||
2. **Extract dates from snippets**: Look for date markers
|
||||
- "January 24, 2026", "Jan 2026", "yesterday", "this week"
|
||||
- "Published:", "Posted:", "Updated:"
|
||||
- Relative markers: "2 days ago", "last week"
|
||||
|
||||
3. **Prioritize results with verifiable dates**:
|
||||
- Results with recent dates (within 30 days): Full score
|
||||
- Results with old dates: EXCLUDE
|
||||
- Results with no date signals: Heavy penalty (-20) but keep as supplementary
|
||||
|
||||
**File: `scripts/lib/websearch.py`**
|
||||
|
||||
Add date extraction functions:
|
||||
|
||||
```python
|
||||
import re
|
||||
from datetime import datetime, timedelta
|
||||
|
||||
# Patterns for date extraction
|
||||
URL_DATE_PATTERNS = [
|
||||
r'/(\d{4})/(\d{2})/(\d{2})/', # /2026/01/24/
|
||||
r'/(\d{4})-(\d{2})-(\d{2})/', # /2026-01-24/
|
||||
r'/(\d{4})(\d{2})(\d{2})/', # /20260124/
|
||||
]
|
||||
|
||||
SNIPPET_DATE_PATTERNS = [
|
||||
r'(Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)[a-z]* (\d{1,2}),? (\d{4})',
|
||||
r'(\d{1,2}) (Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)[a-z]* (\d{4})',
|
||||
r'(\d{4})-(\d{2})-(\d{2})',
|
||||
r'Published:?\s*(\d{4}-\d{2}-\d{2})',
|
||||
r'(\d{1,2}) (days?|hours?|minutes?) ago', # Relative dates
|
||||
]
|
||||
|
||||
def extract_date_from_url(url: str) -> Optional[str]:
|
||||
"""Try to extract a date from URL path."""
|
||||
for pattern in URL_DATE_PATTERNS:
|
||||
match = re.search(pattern, url)
|
||||
if match:
|
||||
# Parse and return YYYY-MM-DD format
|
||||
...
|
||||
return None
|
||||
|
||||
def extract_date_from_snippet(snippet: str) -> Optional[str]:
|
||||
"""Try to extract a date from text snippet."""
|
||||
for pattern in SNIPPET_DATE_PATTERNS:
|
||||
match = re.search(pattern, snippet, re.IGNORECASE)
|
||||
if match:
|
||||
# Parse and return YYYY-MM-DD format
|
||||
...
|
||||
return None
|
||||
|
||||
def extract_date_signals(url: str, snippet: str, title: str) -> tuple[Optional[str], str]:
|
||||
"""Extract date from any available signal.
|
||||
|
||||
Returns: (date_string, confidence)
|
||||
- date from URL: 'high' confidence
|
||||
- date from snippet: 'med' confidence
|
||||
- no date found: None, 'low' confidence
|
||||
"""
|
||||
# Try URL first (most reliable)
|
||||
url_date = extract_date_from_url(url)
|
||||
if url_date:
|
||||
return url_date, 'high'
|
||||
|
||||
# Try snippet
|
||||
snippet_date = extract_date_from_snippet(snippet)
|
||||
if snippet_date:
|
||||
return snippet_date, 'med'
|
||||
|
||||
# Try title
|
||||
title_date = extract_date_from_snippet(title)
|
||||
if title_date:
|
||||
return title_date, 'med'
|
||||
|
||||
return None, 'low'
|
||||
```
|
||||
|
||||
**Update WebSearch parsing to use date extraction:**
|
||||
|
||||
```python
|
||||
def parse_websearch_results(results, topic, from_date, to_date):
|
||||
items = []
|
||||
for result in results:
|
||||
url = result.get('url', '')
|
||||
snippet = result.get('snippet', '')
|
||||
title = result.get('title', '')
|
||||
|
||||
# Extract date signals
|
||||
extracted_date, confidence = extract_date_signals(url, snippet, title)
|
||||
|
||||
# Hard filter: if we found a date and it's too old, skip
|
||||
if extracted_date and extracted_date < from_date:
|
||||
continue # DROP - verified old content
|
||||
|
||||
item = {
|
||||
'date': extracted_date,
|
||||
'date_confidence': confidence,
|
||||
...
|
||||
}
|
||||
items.append(item)
|
||||
|
||||
return items
|
||||
```
|
||||
|
||||
**File: `scripts/lib/score.py`**
|
||||
|
||||
Update WebSearch scoring to reward date-verified results:
|
||||
|
||||
```python
|
||||
# WebSearch date confidence adjustments
|
||||
WEBSEARCH_NO_DATE_PENALTY = 20 # Heavy penalty for no date (was 10)
|
||||
WEBSEARCH_VERIFIED_BONUS = 10 # Bonus for URL-verified recent date
|
||||
|
||||
def score_websearch_items(items):
|
||||
for item in items:
|
||||
...
|
||||
# Date confidence adjustments
|
||||
if item.date_confidence == 'high':
|
||||
overall += WEBSEARCH_VERIFIED_BONUS # Reward verified dates
|
||||
elif item.date_confidence == 'low':
|
||||
overall -= WEBSEARCH_NO_DATE_PENALTY # Heavy penalty for unknown
|
||||
...
|
||||
```
|
||||
|
||||
**Result**: WebSearch results with verifiable recent dates rank well. Results with no dates are heavily penalized but still appear as supplementary context. Old verified content is excluded entirely.
|
||||
|
||||
### Phase 4: Update Statistics Display
|
||||
|
||||
Only count Reddit and X in "from the last 30 days" claim. WebSearch should be clearly labeled as supplementary.
|
||||
|
||||
## Acceptance Criteria
|
||||
|
||||
### Functional Requirements
|
||||
|
||||
- [x] Reddit search prompt includes explicit `from_date` and `to_date`
|
||||
- [x] Items with dates before `from_date` are EXCLUDED, not just penalized
|
||||
- [x] X search continues working (no regression)
|
||||
- [x] WebSearch extracts dates from URLs (e.g., `/2026/01/24/`)
|
||||
- [x] WebSearch extracts dates from snippets (e.g., "January 24, 2026")
|
||||
- [x] WebSearch with verified recent dates gets +10 bonus
|
||||
- [x] WebSearch with no date signals gets -20 penalty (but still appears)
|
||||
- [x] WebSearch with verified OLD dates is EXCLUDED
|
||||
|
||||
### Non-Functional Requirements
|
||||
|
||||
- [ ] No increase in API latency
|
||||
- [ ] Graceful handling when few recent results exist (return fewer, not older)
|
||||
- [ ] Clear user messaging when results are limited due to strict filtering
|
||||
|
||||
### Quality Gates
|
||||
|
||||
- [ ] Test: Reddit search returns 0% results older than 30 days
|
||||
- [ ] Test: X search continues to return 100% recent results
|
||||
- [ ] Test: WebSearch is clearly differentiated in output
|
||||
- [ ] Test: Edge case - topic with no recent content shows helpful message
|
||||
|
||||
## Implementation Order
|
||||
|
||||
1. **Phase 1**: Fix Reddit prompt (highest impact, simple change)
|
||||
2. **Phase 2**: Add hard date filter in normalize.py (safety net)
|
||||
3. **Phase 3**: Add WebSearch date extraction (URL + snippet parsing)
|
||||
4. **Phase 4**: Update WebSearch scoring (bonus for verified, heavy penalty for unknown)
|
||||
5. **Phase 5**: Update output display to show date confidence
|
||||
|
||||
## Testing Plan
|
||||
|
||||
### Before/After Test
|
||||
|
||||
Run same query before and after fix:
|
||||
```
|
||||
/last30days remotion launch videos
|
||||
```
|
||||
|
||||
**Expected Before:**
|
||||
- Reddit: 40% within 30 days
|
||||
|
||||
**Expected After:**
|
||||
- Reddit: 100% within 30 days (or fewer results if not enough recent content)
|
||||
|
||||
### Edge Case Tests
|
||||
|
||||
| Scenario | Expected Behavior |
|
||||
|----------|-------------------|
|
||||
| Topic with no recent content | Return 0 results + helpful message |
|
||||
| Topic with 5 recent results | Return 5 results (not pad with old ones) |
|
||||
| Mixed old/new results | Only return new ones |
|
||||
|
||||
### WebSearch Date Extraction Tests
|
||||
|
||||
| URL/Snippet | Expected Date | Confidence |
|
||||
|-------------|---------------|------------|
|
||||
| `medium.com/blog/2026/01/15/title` | 2026-01-15 | high |
|
||||
| `github.com/repo` + "Released Jan 20, 2026" | 2026-01-20 | med |
|
||||
| `docs.example.com/guide` (no date signals) | None | low |
|
||||
| `news.site.com/2024/05/old-article` | 2024-05-XX | EXCLUDE (too old) |
|
||||
| Snippet: "Updated 3 days ago" | calculated | med |
|
||||
|
||||
## Risk Analysis
|
||||
|
||||
| Risk | Likelihood | Impact | Mitigation |
|
||||
|------|------------|--------|------------|
|
||||
| Fewer results for niche topics | High | Medium | Explain why in output |
|
||||
| User confusion about reduced results | Medium | Low | Clear messaging |
|
||||
| Date parsing errors exclude valid content | Low | Medium | Keep items with unknown dates, just label clearly |
|
||||
|
||||
## References
|
||||
|
||||
### Internal References
|
||||
- Reddit search: `scripts/lib/openai_reddit.py:25-63`
|
||||
- X search (working example): `scripts/lib/xai_x.py:26-55`
|
||||
- Date confidence: `scripts/lib/dates.py:62-90`
|
||||
- Scoring penalties: `scripts/lib/score.py:149-153`
|
||||
- Normalization: `scripts/lib/normalize.py:49,99`
|
||||
|
||||
### External References
|
||||
- OpenAI Responses API lacks native date filtering
|
||||
- Must rely on prompt engineering + post-processing
|
||||
521
skills/last30days/scripts/last30days.py
Normal file
521
skills/last30days/scripts/last30days.py
Normal file
@@ -0,0 +1,521 @@
|
||||
#!/usr/bin/env python3
|
||||
"""
|
||||
last30days - Research a topic from the last 30 days on Reddit + X.
|
||||
|
||||
Usage:
|
||||
python3 last30days.py <topic> [options]
|
||||
|
||||
Options:
|
||||
--mock Use fixtures instead of real API calls
|
||||
--emit=MODE Output mode: compact|json|md|context|path (default: compact)
|
||||
--sources=MODE Source selection: auto|reddit|x|both (default: auto)
|
||||
--quick Faster research with fewer sources (8-12 each)
|
||||
--deep Comprehensive research with more sources (50-70 Reddit, 40-60 X)
|
||||
--debug Enable verbose debug logging
|
||||
"""
|
||||
|
||||
import argparse
|
||||
import json
|
||||
import os
|
||||
import sys
|
||||
from concurrent.futures import ThreadPoolExecutor, as_completed
|
||||
from datetime import datetime, timezone
|
||||
from pathlib import Path
|
||||
|
||||
# Add lib to path
|
||||
SCRIPT_DIR = Path(__file__).parent.resolve()
|
||||
sys.path.insert(0, str(SCRIPT_DIR))
|
||||
|
||||
from lib import (
|
||||
dates,
|
||||
dedupe,
|
||||
env,
|
||||
http,
|
||||
models,
|
||||
normalize,
|
||||
openai_reddit,
|
||||
reddit_enrich,
|
||||
render,
|
||||
schema,
|
||||
score,
|
||||
ui,
|
||||
websearch,
|
||||
xai_x,
|
||||
)
|
||||
|
||||
|
||||
def load_fixture(name: str) -> dict:
|
||||
"""Load a fixture file."""
|
||||
fixture_path = SCRIPT_DIR.parent / "fixtures" / name
|
||||
if fixture_path.exists():
|
||||
with open(fixture_path) as f:
|
||||
return json.load(f)
|
||||
return {}
|
||||
|
||||
|
||||
def _search_reddit(
|
||||
topic: str,
|
||||
config: dict,
|
||||
selected_models: dict,
|
||||
from_date: str,
|
||||
to_date: str,
|
||||
depth: str,
|
||||
mock: bool,
|
||||
) -> tuple:
|
||||
"""Search Reddit via OpenAI (runs in thread).
|
||||
|
||||
Returns:
|
||||
Tuple of (reddit_items, raw_openai, error)
|
||||
"""
|
||||
raw_openai = None
|
||||
reddit_error = None
|
||||
|
||||
if mock:
|
||||
raw_openai = load_fixture("openai_sample.json")
|
||||
else:
|
||||
try:
|
||||
raw_openai = openai_reddit.search_reddit(
|
||||
config["OPENAI_API_KEY"],
|
||||
selected_models["openai"],
|
||||
topic,
|
||||
from_date,
|
||||
to_date,
|
||||
depth=depth,
|
||||
)
|
||||
except http.HTTPError as e:
|
||||
raw_openai = {"error": str(e)}
|
||||
reddit_error = f"API error: {e}"
|
||||
except Exception as e:
|
||||
raw_openai = {"error": str(e)}
|
||||
reddit_error = f"{type(e).__name__}: {e}"
|
||||
|
||||
# Parse response
|
||||
reddit_items = openai_reddit.parse_reddit_response(raw_openai or {})
|
||||
|
||||
# Quick retry with simpler query if few results
|
||||
if len(reddit_items) < 5 and not mock and not reddit_error:
|
||||
core = openai_reddit._extract_core_subject(topic)
|
||||
if core.lower() != topic.lower():
|
||||
try:
|
||||
retry_raw = openai_reddit.search_reddit(
|
||||
config["OPENAI_API_KEY"],
|
||||
selected_models["openai"],
|
||||
core,
|
||||
from_date, to_date,
|
||||
depth=depth,
|
||||
)
|
||||
retry_items = openai_reddit.parse_reddit_response(retry_raw)
|
||||
# Add items not already found (by URL)
|
||||
existing_urls = {item.get("url") for item in reddit_items}
|
||||
for item in retry_items:
|
||||
if item.get("url") not in existing_urls:
|
||||
reddit_items.append(item)
|
||||
except Exception:
|
||||
pass
|
||||
|
||||
return reddit_items, raw_openai, reddit_error
|
||||
|
||||
|
||||
def _search_x(
|
||||
topic: str,
|
||||
config: dict,
|
||||
selected_models: dict,
|
||||
from_date: str,
|
||||
to_date: str,
|
||||
depth: str,
|
||||
mock: bool,
|
||||
) -> tuple:
|
||||
"""Search X via xAI (runs in thread).
|
||||
|
||||
Returns:
|
||||
Tuple of (x_items, raw_xai, error)
|
||||
"""
|
||||
raw_xai = None
|
||||
x_error = None
|
||||
|
||||
if mock:
|
||||
raw_xai = load_fixture("xai_sample.json")
|
||||
else:
|
||||
try:
|
||||
raw_xai = xai_x.search_x(
|
||||
config["XAI_API_KEY"],
|
||||
selected_models["xai"],
|
||||
topic,
|
||||
from_date,
|
||||
to_date,
|
||||
depth=depth,
|
||||
)
|
||||
except http.HTTPError as e:
|
||||
raw_xai = {"error": str(e)}
|
||||
x_error = f"API error: {e}"
|
||||
except Exception as e:
|
||||
raw_xai = {"error": str(e)}
|
||||
x_error = f"{type(e).__name__}: {e}"
|
||||
|
||||
# Parse response
|
||||
x_items = xai_x.parse_x_response(raw_xai or {})
|
||||
|
||||
return x_items, raw_xai, x_error
|
||||
|
||||
|
||||
def run_research(
|
||||
topic: str,
|
||||
sources: str,
|
||||
config: dict,
|
||||
selected_models: dict,
|
||||
from_date: str,
|
||||
to_date: str,
|
||||
depth: str = "default",
|
||||
mock: bool = False,
|
||||
progress: ui.ProgressDisplay = None,
|
||||
) -> tuple:
|
||||
"""Run the research pipeline.
|
||||
|
||||
Returns:
|
||||
Tuple of (reddit_items, x_items, web_needed, raw_openai, raw_xai, raw_reddit_enriched, reddit_error, x_error)
|
||||
|
||||
Note: web_needed is True when WebSearch should be performed by Claude.
|
||||
The script outputs a marker and Claude handles WebSearch in its session.
|
||||
"""
|
||||
reddit_items = []
|
||||
x_items = []
|
||||
raw_openai = None
|
||||
raw_xai = None
|
||||
raw_reddit_enriched = []
|
||||
reddit_error = None
|
||||
x_error = None
|
||||
|
||||
# Check if WebSearch is needed (always needed in web-only mode)
|
||||
web_needed = sources in ("all", "web", "reddit-web", "x-web")
|
||||
|
||||
# Web-only mode: no API calls needed, Claude handles everything
|
||||
if sources == "web":
|
||||
if progress:
|
||||
progress.start_web_only()
|
||||
progress.end_web_only()
|
||||
return reddit_items, x_items, True, raw_openai, raw_xai, raw_reddit_enriched, reddit_error, x_error
|
||||
|
||||
# Determine which searches to run
|
||||
run_reddit = sources in ("both", "reddit", "all", "reddit-web")
|
||||
run_x = sources in ("both", "x", "all", "x-web")
|
||||
|
||||
# Run Reddit and X searches in parallel
|
||||
reddit_future = None
|
||||
x_future = None
|
||||
|
||||
with ThreadPoolExecutor(max_workers=2) as executor:
|
||||
# Submit both searches
|
||||
if run_reddit:
|
||||
if progress:
|
||||
progress.start_reddit()
|
||||
reddit_future = executor.submit(
|
||||
_search_reddit, topic, config, selected_models,
|
||||
from_date, to_date, depth, mock
|
||||
)
|
||||
|
||||
if run_x:
|
||||
if progress:
|
||||
progress.start_x()
|
||||
x_future = executor.submit(
|
||||
_search_x, topic, config, selected_models,
|
||||
from_date, to_date, depth, mock
|
||||
)
|
||||
|
||||
# Collect results
|
||||
if reddit_future:
|
||||
try:
|
||||
reddit_items, raw_openai, reddit_error = reddit_future.result()
|
||||
if reddit_error and progress:
|
||||
progress.show_error(f"Reddit error: {reddit_error}")
|
||||
except Exception as e:
|
||||
reddit_error = f"{type(e).__name__}: {e}"
|
||||
if progress:
|
||||
progress.show_error(f"Reddit error: {e}")
|
||||
if progress:
|
||||
progress.end_reddit(len(reddit_items))
|
||||
|
||||
if x_future:
|
||||
try:
|
||||
x_items, raw_xai, x_error = x_future.result()
|
||||
if x_error and progress:
|
||||
progress.show_error(f"X error: {x_error}")
|
||||
except Exception as e:
|
||||
x_error = f"{type(e).__name__}: {e}"
|
||||
if progress:
|
||||
progress.show_error(f"X error: {e}")
|
||||
if progress:
|
||||
progress.end_x(len(x_items))
|
||||
|
||||
# Enrich Reddit items with real data (sequential, but with error handling per-item)
|
||||
if reddit_items:
|
||||
if progress:
|
||||
progress.start_reddit_enrich(1, len(reddit_items))
|
||||
|
||||
for i, item in enumerate(reddit_items):
|
||||
if progress and i > 0:
|
||||
progress.update_reddit_enrich(i + 1, len(reddit_items))
|
||||
|
||||
try:
|
||||
if mock:
|
||||
mock_thread = load_fixture("reddit_thread_sample.json")
|
||||
reddit_items[i] = reddit_enrich.enrich_reddit_item(item, mock_thread)
|
||||
else:
|
||||
reddit_items[i] = reddit_enrich.enrich_reddit_item(item)
|
||||
except Exception as e:
|
||||
# Log but don't crash - keep the unenriched item
|
||||
if progress:
|
||||
progress.show_error(f"Enrich failed for {item.get('url', 'unknown')}: {e}")
|
||||
|
||||
raw_reddit_enriched.append(reddit_items[i])
|
||||
|
||||
if progress:
|
||||
progress.end_reddit_enrich()
|
||||
|
||||
return reddit_items, x_items, web_needed, raw_openai, raw_xai, raw_reddit_enriched, reddit_error, x_error
|
||||
|
||||
|
||||
def main():
|
||||
parser = argparse.ArgumentParser(
|
||||
description="Research a topic from the last 30 days on Reddit + X"
|
||||
)
|
||||
parser.add_argument("topic", nargs="?", help="Topic to research")
|
||||
parser.add_argument("--mock", action="store_true", help="Use fixtures")
|
||||
parser.add_argument(
|
||||
"--emit",
|
||||
choices=["compact", "json", "md", "context", "path"],
|
||||
default="compact",
|
||||
help="Output mode",
|
||||
)
|
||||
parser.add_argument(
|
||||
"--sources",
|
||||
choices=["auto", "reddit", "x", "both"],
|
||||
default="auto",
|
||||
help="Source selection",
|
||||
)
|
||||
parser.add_argument(
|
||||
"--quick",
|
||||
action="store_true",
|
||||
help="Faster research with fewer sources (8-12 each)",
|
||||
)
|
||||
parser.add_argument(
|
||||
"--deep",
|
||||
action="store_true",
|
||||
help="Comprehensive research with more sources (50-70 Reddit, 40-60 X)",
|
||||
)
|
||||
parser.add_argument(
|
||||
"--debug",
|
||||
action="store_true",
|
||||
help="Enable verbose debug logging",
|
||||
)
|
||||
parser.add_argument(
|
||||
"--include-web",
|
||||
action="store_true",
|
||||
help="Include general web search alongside Reddit/X (lower weighted)",
|
||||
)
|
||||
|
||||
args = parser.parse_args()
|
||||
|
||||
# Enable debug logging if requested
|
||||
if args.debug:
|
||||
os.environ["LAST30DAYS_DEBUG"] = "1"
|
||||
# Re-import http to pick up debug flag
|
||||
from lib import http as http_module
|
||||
http_module.DEBUG = True
|
||||
|
||||
# Determine depth
|
||||
if args.quick and args.deep:
|
||||
print("Error: Cannot use both --quick and --deep", file=sys.stderr)
|
||||
sys.exit(1)
|
||||
elif args.quick:
|
||||
depth = "quick"
|
||||
elif args.deep:
|
||||
depth = "deep"
|
||||
else:
|
||||
depth = "default"
|
||||
|
||||
if not args.topic:
|
||||
print("Error: Please provide a topic to research.", file=sys.stderr)
|
||||
print("Usage: python3 last30days.py <topic> [options]", file=sys.stderr)
|
||||
sys.exit(1)
|
||||
|
||||
# Load config
|
||||
config = env.get_config()
|
||||
|
||||
# Check available sources
|
||||
available = env.get_available_sources(config)
|
||||
|
||||
# Mock mode can work without keys
|
||||
if args.mock:
|
||||
if args.sources == "auto":
|
||||
sources = "both"
|
||||
else:
|
||||
sources = args.sources
|
||||
else:
|
||||
# Validate requested sources against available
|
||||
sources, error = env.validate_sources(args.sources, available, args.include_web)
|
||||
if error:
|
||||
# If it's a warning about WebSearch fallback, print but continue
|
||||
if "WebSearch fallback" in error:
|
||||
print(f"Note: {error}", file=sys.stderr)
|
||||
else:
|
||||
print(f"Error: {error}", file=sys.stderr)
|
||||
sys.exit(1)
|
||||
|
||||
# Get date range
|
||||
from_date, to_date = dates.get_date_range(30)
|
||||
|
||||
# Check what keys are missing for promo messaging
|
||||
missing_keys = env.get_missing_keys(config)
|
||||
|
||||
# Initialize progress display
|
||||
progress = ui.ProgressDisplay(args.topic, show_banner=True)
|
||||
|
||||
# Show promo for missing keys BEFORE research
|
||||
if missing_keys != 'none':
|
||||
progress.show_promo(missing_keys)
|
||||
|
||||
# Select models
|
||||
if args.mock:
|
||||
# Use mock models
|
||||
mock_openai_models = load_fixture("models_openai_sample.json").get("data", [])
|
||||
mock_xai_models = load_fixture("models_xai_sample.json").get("data", [])
|
||||
selected_models = models.get_models(
|
||||
{
|
||||
"OPENAI_API_KEY": "mock",
|
||||
"XAI_API_KEY": "mock",
|
||||
**config,
|
||||
},
|
||||
mock_openai_models,
|
||||
mock_xai_models,
|
||||
)
|
||||
else:
|
||||
selected_models = models.get_models(config)
|
||||
|
||||
# Determine mode string
|
||||
if sources == "all":
|
||||
mode = "all" # reddit + x + web
|
||||
elif sources == "both":
|
||||
mode = "both" # reddit + x
|
||||
elif sources == "reddit":
|
||||
mode = "reddit-only"
|
||||
elif sources == "reddit-web":
|
||||
mode = "reddit-web"
|
||||
elif sources == "x":
|
||||
mode = "x-only"
|
||||
elif sources == "x-web":
|
||||
mode = "x-web"
|
||||
elif sources == "web":
|
||||
mode = "web-only"
|
||||
else:
|
||||
mode = sources
|
||||
|
||||
# Run research
|
||||
reddit_items, x_items, web_needed, raw_openai, raw_xai, raw_reddit_enriched, reddit_error, x_error = run_research(
|
||||
args.topic,
|
||||
sources,
|
||||
config,
|
||||
selected_models,
|
||||
from_date,
|
||||
to_date,
|
||||
depth,
|
||||
args.mock,
|
||||
progress,
|
||||
)
|
||||
|
||||
# Processing phase
|
||||
progress.start_processing()
|
||||
|
||||
# Normalize items
|
||||
normalized_reddit = normalize.normalize_reddit_items(reddit_items, from_date, to_date)
|
||||
normalized_x = normalize.normalize_x_items(x_items, from_date, to_date)
|
||||
|
||||
# Hard date filter: exclude items with verified dates outside the range
|
||||
# This is the safety net - even if prompts let old content through, this filters it
|
||||
filtered_reddit = normalize.filter_by_date_range(normalized_reddit, from_date, to_date)
|
||||
filtered_x = normalize.filter_by_date_range(normalized_x, from_date, to_date)
|
||||
|
||||
# Score items
|
||||
scored_reddit = score.score_reddit_items(filtered_reddit)
|
||||
scored_x = score.score_x_items(filtered_x)
|
||||
|
||||
# Sort items
|
||||
sorted_reddit = score.sort_items(scored_reddit)
|
||||
sorted_x = score.sort_items(scored_x)
|
||||
|
||||
# Dedupe items
|
||||
deduped_reddit = dedupe.dedupe_reddit(sorted_reddit)
|
||||
deduped_x = dedupe.dedupe_x(sorted_x)
|
||||
|
||||
progress.end_processing()
|
||||
|
||||
# Create report
|
||||
report = schema.create_report(
|
||||
args.topic,
|
||||
from_date,
|
||||
to_date,
|
||||
mode,
|
||||
selected_models.get("openai"),
|
||||
selected_models.get("xai"),
|
||||
)
|
||||
report.reddit = deduped_reddit
|
||||
report.x = deduped_x
|
||||
report.reddit_error = reddit_error
|
||||
report.x_error = x_error
|
||||
|
||||
# Generate context snippet
|
||||
report.context_snippet_md = render.render_context_snippet(report)
|
||||
|
||||
# Write outputs
|
||||
render.write_outputs(report, raw_openai, raw_xai, raw_reddit_enriched)
|
||||
|
||||
# Show completion
|
||||
if sources == "web":
|
||||
progress.show_web_only_complete()
|
||||
else:
|
||||
progress.show_complete(len(deduped_reddit), len(deduped_x))
|
||||
|
||||
# Output result
|
||||
output_result(report, args.emit, web_needed, args.topic, from_date, to_date, missing_keys)
|
||||
|
||||
|
||||
def output_result(
|
||||
report: schema.Report,
|
||||
emit_mode: str,
|
||||
web_needed: bool = False,
|
||||
topic: str = "",
|
||||
from_date: str = "",
|
||||
to_date: str = "",
|
||||
missing_keys: str = "none",
|
||||
):
|
||||
"""Output the result based on emit mode."""
|
||||
if emit_mode == "compact":
|
||||
print(render.render_compact(report, missing_keys=missing_keys))
|
||||
elif emit_mode == "json":
|
||||
print(json.dumps(report.to_dict(), indent=2))
|
||||
elif emit_mode == "md":
|
||||
print(render.render_full_report(report))
|
||||
elif emit_mode == "context":
|
||||
print(report.context_snippet_md)
|
||||
elif emit_mode == "path":
|
||||
print(render.get_context_path())
|
||||
|
||||
# Output WebSearch instructions if needed
|
||||
if web_needed:
|
||||
print("\n" + "="*60)
|
||||
print("### WEBSEARCH REQUIRED ###")
|
||||
print("="*60)
|
||||
print(f"Topic: {topic}")
|
||||
print(f"Date range: {from_date} to {to_date}")
|
||||
print("")
|
||||
print("Claude: Use your WebSearch tool to find 8-15 relevant web pages.")
|
||||
print("EXCLUDE: reddit.com, x.com, twitter.com (already covered above)")
|
||||
print("INCLUDE: blogs, docs, news, tutorials from the last 30 days")
|
||||
print("")
|
||||
print("After searching, synthesize WebSearch results WITH the Reddit/X")
|
||||
print("results above. WebSearch items should rank LOWER than comparable")
|
||||
print("Reddit/X items (they lack engagement metrics).")
|
||||
print("="*60)
|
||||
|
||||
|
||||
if __name__ == "__main__":
|
||||
main()
|
||||
1
skills/last30days/scripts/lib/__init__.py
Normal file
1
skills/last30days/scripts/lib/__init__.py
Normal file
@@ -0,0 +1 @@
|
||||
# last30days library modules
|
||||
152
skills/last30days/scripts/lib/cache.py
Normal file
152
skills/last30days/scripts/lib/cache.py
Normal file
@@ -0,0 +1,152 @@
|
||||
"""Caching utilities for last30days skill."""
|
||||
|
||||
import hashlib
|
||||
import json
|
||||
import os
|
||||
from datetime import datetime, timezone
|
||||
from pathlib import Path
|
||||
from typing import Any, Optional
|
||||
|
||||
CACHE_DIR = Path.home() / ".cache" / "last30days"
|
||||
DEFAULT_TTL_HOURS = 24
|
||||
MODEL_CACHE_TTL_DAYS = 7
|
||||
|
||||
|
||||
def ensure_cache_dir():
|
||||
"""Ensure cache directory exists."""
|
||||
CACHE_DIR.mkdir(parents=True, exist_ok=True)
|
||||
|
||||
|
||||
def get_cache_key(topic: str, from_date: str, to_date: str, sources: str) -> str:
|
||||
"""Generate a cache key from query parameters."""
|
||||
key_data = f"{topic}|{from_date}|{to_date}|{sources}"
|
||||
return hashlib.sha256(key_data.encode()).hexdigest()[:16]
|
||||
|
||||
|
||||
def get_cache_path(cache_key: str) -> Path:
|
||||
"""Get path to cache file."""
|
||||
return CACHE_DIR / f"{cache_key}.json"
|
||||
|
||||
|
||||
def is_cache_valid(cache_path: Path, ttl_hours: int = DEFAULT_TTL_HOURS) -> bool:
|
||||
"""Check if cache file exists and is within TTL."""
|
||||
if not cache_path.exists():
|
||||
return False
|
||||
|
||||
try:
|
||||
stat = cache_path.stat()
|
||||
mtime = datetime.fromtimestamp(stat.st_mtime, tz=timezone.utc)
|
||||
now = datetime.now(timezone.utc)
|
||||
age_hours = (now - mtime).total_seconds() / 3600
|
||||
return age_hours < ttl_hours
|
||||
except OSError:
|
||||
return False
|
||||
|
||||
|
||||
def load_cache(cache_key: str, ttl_hours: int = DEFAULT_TTL_HOURS) -> Optional[dict]:
|
||||
"""Load data from cache if valid."""
|
||||
cache_path = get_cache_path(cache_key)
|
||||
|
||||
if not is_cache_valid(cache_path, ttl_hours):
|
||||
return None
|
||||
|
||||
try:
|
||||
with open(cache_path, 'r') as f:
|
||||
return json.load(f)
|
||||
except (json.JSONDecodeError, OSError):
|
||||
return None
|
||||
|
||||
|
||||
def get_cache_age_hours(cache_path: Path) -> Optional[float]:
|
||||
"""Get age of cache file in hours."""
|
||||
if not cache_path.exists():
|
||||
return None
|
||||
try:
|
||||
stat = cache_path.stat()
|
||||
mtime = datetime.fromtimestamp(stat.st_mtime, tz=timezone.utc)
|
||||
now = datetime.now(timezone.utc)
|
||||
return (now - mtime).total_seconds() / 3600
|
||||
except OSError:
|
||||
return None
|
||||
|
||||
|
||||
def load_cache_with_age(cache_key: str, ttl_hours: int = DEFAULT_TTL_HOURS) -> tuple:
|
||||
"""Load data from cache with age info.
|
||||
|
||||
Returns:
|
||||
Tuple of (data, age_hours) or (None, None) if invalid
|
||||
"""
|
||||
cache_path = get_cache_path(cache_key)
|
||||
|
||||
if not is_cache_valid(cache_path, ttl_hours):
|
||||
return None, None
|
||||
|
||||
age = get_cache_age_hours(cache_path)
|
||||
|
||||
try:
|
||||
with open(cache_path, 'r') as f:
|
||||
return json.load(f), age
|
||||
except (json.JSONDecodeError, OSError):
|
||||
return None, None
|
||||
|
||||
|
||||
def save_cache(cache_key: str, data: dict):
|
||||
"""Save data to cache."""
|
||||
ensure_cache_dir()
|
||||
cache_path = get_cache_path(cache_key)
|
||||
|
||||
try:
|
||||
with open(cache_path, 'w') as f:
|
||||
json.dump(data, f)
|
||||
except OSError:
|
||||
pass # Silently fail on cache write errors
|
||||
|
||||
|
||||
def clear_cache():
|
||||
"""Clear all cache files."""
|
||||
if CACHE_DIR.exists():
|
||||
for f in CACHE_DIR.glob("*.json"):
|
||||
try:
|
||||
f.unlink()
|
||||
except OSError:
|
||||
pass
|
||||
|
||||
|
||||
# Model selection cache (longer TTL)
|
||||
MODEL_CACHE_FILE = CACHE_DIR / "model_selection.json"
|
||||
|
||||
|
||||
def load_model_cache() -> dict:
|
||||
"""Load model selection cache."""
|
||||
if not is_cache_valid(MODEL_CACHE_FILE, MODEL_CACHE_TTL_DAYS * 24):
|
||||
return {}
|
||||
|
||||
try:
|
||||
with open(MODEL_CACHE_FILE, 'r') as f:
|
||||
return json.load(f)
|
||||
except (json.JSONDecodeError, OSError):
|
||||
return {}
|
||||
|
||||
|
||||
def save_model_cache(data: dict):
|
||||
"""Save model selection cache."""
|
||||
ensure_cache_dir()
|
||||
try:
|
||||
with open(MODEL_CACHE_FILE, 'w') as f:
|
||||
json.dump(data, f)
|
||||
except OSError:
|
||||
pass
|
||||
|
||||
|
||||
def get_cached_model(provider: str) -> Optional[str]:
|
||||
"""Get cached model selection for a provider."""
|
||||
cache = load_model_cache()
|
||||
return cache.get(provider)
|
||||
|
||||
|
||||
def set_cached_model(provider: str, model: str):
|
||||
"""Cache model selection for a provider."""
|
||||
cache = load_model_cache()
|
||||
cache[provider] = model
|
||||
cache['updated_at'] = datetime.now(timezone.utc).isoformat()
|
||||
save_model_cache(cache)
|
||||
124
skills/last30days/scripts/lib/dates.py
Normal file
124
skills/last30days/scripts/lib/dates.py
Normal file
@@ -0,0 +1,124 @@
|
||||
"""Date utilities for last30days skill."""
|
||||
|
||||
from datetime import datetime, timedelta, timezone
|
||||
from typing import Optional, Tuple
|
||||
|
||||
|
||||
def get_date_range(days: int = 30) -> Tuple[str, str]:
|
||||
"""Get the date range for the last N days.
|
||||
|
||||
Returns:
|
||||
Tuple of (from_date, to_date) as YYYY-MM-DD strings
|
||||
"""
|
||||
today = datetime.now(timezone.utc).date()
|
||||
from_date = today - timedelta(days=days)
|
||||
return from_date.isoformat(), today.isoformat()
|
||||
|
||||
|
||||
def parse_date(date_str: Optional[str]) -> Optional[datetime]:
|
||||
"""Parse a date string in various formats.
|
||||
|
||||
Supports: YYYY-MM-DD, ISO 8601, Unix timestamp
|
||||
"""
|
||||
if not date_str:
|
||||
return None
|
||||
|
||||
# Try Unix timestamp (from Reddit)
|
||||
try:
|
||||
ts = float(date_str)
|
||||
return datetime.fromtimestamp(ts, tz=timezone.utc)
|
||||
except (ValueError, TypeError):
|
||||
pass
|
||||
|
||||
# Try ISO formats
|
||||
formats = [
|
||||
"%Y-%m-%d",
|
||||
"%Y-%m-%dT%H:%M:%S",
|
||||
"%Y-%m-%dT%H:%M:%SZ",
|
||||
"%Y-%m-%dT%H:%M:%S%z",
|
||||
"%Y-%m-%dT%H:%M:%S.%f%z",
|
||||
]
|
||||
|
||||
for fmt in formats:
|
||||
try:
|
||||
return datetime.strptime(date_str, fmt).replace(tzinfo=timezone.utc)
|
||||
except ValueError:
|
||||
continue
|
||||
|
||||
return None
|
||||
|
||||
|
||||
def timestamp_to_date(ts: Optional[float]) -> Optional[str]:
|
||||
"""Convert Unix timestamp to YYYY-MM-DD string."""
|
||||
if ts is None:
|
||||
return None
|
||||
try:
|
||||
dt = datetime.fromtimestamp(ts, tz=timezone.utc)
|
||||
return dt.date().isoformat()
|
||||
except (ValueError, TypeError, OSError):
|
||||
return None
|
||||
|
||||
|
||||
def get_date_confidence(date_str: Optional[str], from_date: str, to_date: str) -> str:
|
||||
"""Determine confidence level for a date.
|
||||
|
||||
Args:
|
||||
date_str: The date to check (YYYY-MM-DD or None)
|
||||
from_date: Start of valid range (YYYY-MM-DD)
|
||||
to_date: End of valid range (YYYY-MM-DD)
|
||||
|
||||
Returns:
|
||||
'high', 'med', or 'low'
|
||||
"""
|
||||
if not date_str:
|
||||
return 'low'
|
||||
|
||||
try:
|
||||
dt = datetime.strptime(date_str, "%Y-%m-%d").date()
|
||||
start = datetime.strptime(from_date, "%Y-%m-%d").date()
|
||||
end = datetime.strptime(to_date, "%Y-%m-%d").date()
|
||||
|
||||
if start <= dt <= end:
|
||||
return 'high'
|
||||
elif dt < start:
|
||||
# Older than range
|
||||
return 'low'
|
||||
else:
|
||||
# Future date (suspicious)
|
||||
return 'low'
|
||||
except ValueError:
|
||||
return 'low'
|
||||
|
||||
|
||||
def days_ago(date_str: Optional[str]) -> Optional[int]:
|
||||
"""Calculate how many days ago a date is.
|
||||
|
||||
Returns None if date is invalid or missing.
|
||||
"""
|
||||
if not date_str:
|
||||
return None
|
||||
|
||||
try:
|
||||
dt = datetime.strptime(date_str, "%Y-%m-%d").date()
|
||||
today = datetime.now(timezone.utc).date()
|
||||
delta = today - dt
|
||||
return delta.days
|
||||
except ValueError:
|
||||
return None
|
||||
|
||||
|
||||
def recency_score(date_str: Optional[str], max_days: int = 30) -> int:
|
||||
"""Calculate recency score (0-100).
|
||||
|
||||
0 days ago = 100, max_days ago = 0, clamped.
|
||||
"""
|
||||
age = days_ago(date_str)
|
||||
if age is None:
|
||||
return 0 # Unknown date gets worst score
|
||||
|
||||
if age < 0:
|
||||
return 100 # Future date (treat as today)
|
||||
if age >= max_days:
|
||||
return 0
|
||||
|
||||
return int(100 * (1 - age / max_days))
|
||||
120
skills/last30days/scripts/lib/dedupe.py
Normal file
120
skills/last30days/scripts/lib/dedupe.py
Normal file
@@ -0,0 +1,120 @@
|
||||
"""Near-duplicate detection for last30days skill."""
|
||||
|
||||
import re
|
||||
from typing import List, Set, Tuple, Union
|
||||
|
||||
from . import schema
|
||||
|
||||
|
||||
def normalize_text(text: str) -> str:
|
||||
"""Normalize text for comparison.
|
||||
|
||||
- Lowercase
|
||||
- Remove punctuation
|
||||
- Collapse whitespace
|
||||
"""
|
||||
text = text.lower()
|
||||
text = re.sub(r'[^\w\s]', ' ', text)
|
||||
text = re.sub(r'\s+', ' ', text)
|
||||
return text.strip()
|
||||
|
||||
|
||||
def get_ngrams(text: str, n: int = 3) -> Set[str]:
|
||||
"""Get character n-grams from text."""
|
||||
text = normalize_text(text)
|
||||
if len(text) < n:
|
||||
return {text}
|
||||
return {text[i:i+n] for i in range(len(text) - n + 1)}
|
||||
|
||||
|
||||
def jaccard_similarity(set1: Set[str], set2: Set[str]) -> float:
|
||||
"""Compute Jaccard similarity between two sets."""
|
||||
if not set1 or not set2:
|
||||
return 0.0
|
||||
intersection = len(set1 & set2)
|
||||
union = len(set1 | set2)
|
||||
return intersection / union if union > 0 else 0.0
|
||||
|
||||
|
||||
def get_item_text(item: Union[schema.RedditItem, schema.XItem]) -> str:
|
||||
"""Get comparable text from an item."""
|
||||
if isinstance(item, schema.RedditItem):
|
||||
return item.title
|
||||
else:
|
||||
return item.text
|
||||
|
||||
|
||||
def find_duplicates(
|
||||
items: List[Union[schema.RedditItem, schema.XItem]],
|
||||
threshold: float = 0.7,
|
||||
) -> List[Tuple[int, int]]:
|
||||
"""Find near-duplicate pairs in items.
|
||||
|
||||
Args:
|
||||
items: List of items to check
|
||||
threshold: Similarity threshold (0-1)
|
||||
|
||||
Returns:
|
||||
List of (i, j) index pairs where i < j and items are similar
|
||||
"""
|
||||
duplicates = []
|
||||
|
||||
# Pre-compute n-grams
|
||||
ngrams = [get_ngrams(get_item_text(item)) for item in items]
|
||||
|
||||
for i in range(len(items)):
|
||||
for j in range(i + 1, len(items)):
|
||||
similarity = jaccard_similarity(ngrams[i], ngrams[j])
|
||||
if similarity >= threshold:
|
||||
duplicates.append((i, j))
|
||||
|
||||
return duplicates
|
||||
|
||||
|
||||
def dedupe_items(
|
||||
items: List[Union[schema.RedditItem, schema.XItem]],
|
||||
threshold: float = 0.7,
|
||||
) -> List[Union[schema.RedditItem, schema.XItem]]:
|
||||
"""Remove near-duplicates, keeping highest-scored item.
|
||||
|
||||
Args:
|
||||
items: List of items (should be pre-sorted by score descending)
|
||||
threshold: Similarity threshold
|
||||
|
||||
Returns:
|
||||
Deduplicated items
|
||||
"""
|
||||
if len(items) <= 1:
|
||||
return items
|
||||
|
||||
# Find duplicate pairs
|
||||
dup_pairs = find_duplicates(items, threshold)
|
||||
|
||||
# Mark indices to remove (always remove the lower-scored one)
|
||||
# Since items are pre-sorted by score, the second index is always lower
|
||||
to_remove = set()
|
||||
for i, j in dup_pairs:
|
||||
# Keep the higher-scored one (lower index in sorted list)
|
||||
if items[i].score >= items[j].score:
|
||||
to_remove.add(j)
|
||||
else:
|
||||
to_remove.add(i)
|
||||
|
||||
# Return items not marked for removal
|
||||
return [item for idx, item in enumerate(items) if idx not in to_remove]
|
||||
|
||||
|
||||
def dedupe_reddit(
|
||||
items: List[schema.RedditItem],
|
||||
threshold: float = 0.7,
|
||||
) -> List[schema.RedditItem]:
|
||||
"""Dedupe Reddit items."""
|
||||
return dedupe_items(items, threshold)
|
||||
|
||||
|
||||
def dedupe_x(
|
||||
items: List[schema.XItem],
|
||||
threshold: float = 0.7,
|
||||
) -> List[schema.XItem]:
|
||||
"""Dedupe X items."""
|
||||
return dedupe_items(items, threshold)
|
||||
149
skills/last30days/scripts/lib/env.py
Normal file
149
skills/last30days/scripts/lib/env.py
Normal file
@@ -0,0 +1,149 @@
|
||||
"""Environment and API key management for last30days skill."""
|
||||
|
||||
import os
|
||||
from pathlib import Path
|
||||
from typing import Optional, Dict, Any
|
||||
|
||||
CONFIG_DIR = Path.home() / ".config" / "last30days"
|
||||
CONFIG_FILE = CONFIG_DIR / ".env"
|
||||
|
||||
|
||||
def load_env_file(path: Path) -> Dict[str, str]:
|
||||
"""Load environment variables from a file."""
|
||||
env = {}
|
||||
if not path.exists():
|
||||
return env
|
||||
|
||||
with open(path, 'r') as f:
|
||||
for line in f:
|
||||
line = line.strip()
|
||||
if not line or line.startswith('#'):
|
||||
continue
|
||||
if '=' in line:
|
||||
key, _, value = line.partition('=')
|
||||
key = key.strip()
|
||||
value = value.strip()
|
||||
# Remove quotes if present
|
||||
if value and value[0] in ('"', "'") and value[-1] == value[0]:
|
||||
value = value[1:-1]
|
||||
if key and value:
|
||||
env[key] = value
|
||||
return env
|
||||
|
||||
|
||||
def get_config() -> Dict[str, Any]:
|
||||
"""Load configuration from ~/.config/last30days/.env and environment."""
|
||||
# Load from config file first
|
||||
file_env = load_env_file(CONFIG_FILE)
|
||||
|
||||
# Environment variables override file
|
||||
config = {
|
||||
'OPENAI_API_KEY': os.environ.get('OPENAI_API_KEY') or file_env.get('OPENAI_API_KEY'),
|
||||
'XAI_API_KEY': os.environ.get('XAI_API_KEY') or file_env.get('XAI_API_KEY'),
|
||||
'OPENAI_MODEL_POLICY': os.environ.get('OPENAI_MODEL_POLICY') or file_env.get('OPENAI_MODEL_POLICY', 'auto'),
|
||||
'OPENAI_MODEL_PIN': os.environ.get('OPENAI_MODEL_PIN') or file_env.get('OPENAI_MODEL_PIN'),
|
||||
'XAI_MODEL_POLICY': os.environ.get('XAI_MODEL_POLICY') or file_env.get('XAI_MODEL_POLICY', 'latest'),
|
||||
'XAI_MODEL_PIN': os.environ.get('XAI_MODEL_PIN') or file_env.get('XAI_MODEL_PIN'),
|
||||
}
|
||||
|
||||
return config
|
||||
|
||||
|
||||
def config_exists() -> bool:
|
||||
"""Check if configuration file exists."""
|
||||
return CONFIG_FILE.exists()
|
||||
|
||||
|
||||
def get_available_sources(config: Dict[str, Any]) -> str:
|
||||
"""Determine which sources are available based on API keys.
|
||||
|
||||
Returns: 'both', 'reddit', 'x', or 'web' (fallback when no keys)
|
||||
"""
|
||||
has_openai = bool(config.get('OPENAI_API_KEY'))
|
||||
has_xai = bool(config.get('XAI_API_KEY'))
|
||||
|
||||
if has_openai and has_xai:
|
||||
return 'both'
|
||||
elif has_openai:
|
||||
return 'reddit'
|
||||
elif has_xai:
|
||||
return 'x'
|
||||
else:
|
||||
return 'web' # Fallback: WebSearch only (no API keys needed)
|
||||
|
||||
|
||||
def get_missing_keys(config: Dict[str, Any]) -> str:
|
||||
"""Determine which API keys are missing.
|
||||
|
||||
Returns: 'both', 'reddit', 'x', or 'none'
|
||||
"""
|
||||
has_openai = bool(config.get('OPENAI_API_KEY'))
|
||||
has_xai = bool(config.get('XAI_API_KEY'))
|
||||
|
||||
if has_openai and has_xai:
|
||||
return 'none'
|
||||
elif has_openai:
|
||||
return 'x' # Missing xAI key
|
||||
elif has_xai:
|
||||
return 'reddit' # Missing OpenAI key
|
||||
else:
|
||||
return 'both' # Missing both keys
|
||||
|
||||
|
||||
def validate_sources(requested: str, available: str, include_web: bool = False) -> tuple[str, Optional[str]]:
|
||||
"""Validate requested sources against available keys.
|
||||
|
||||
Args:
|
||||
requested: 'auto', 'reddit', 'x', 'both', or 'web'
|
||||
available: Result from get_available_sources()
|
||||
include_web: If True, add WebSearch to available sources
|
||||
|
||||
Returns:
|
||||
Tuple of (effective_sources, error_message)
|
||||
"""
|
||||
# WebSearch-only mode (no API keys)
|
||||
if available == 'web':
|
||||
if requested == 'auto':
|
||||
return 'web', None
|
||||
elif requested == 'web':
|
||||
return 'web', None
|
||||
else:
|
||||
return 'web', f"No API keys configured. Using WebSearch fallback. Add keys to ~/.config/last30days/.env for Reddit/X."
|
||||
|
||||
if requested == 'auto':
|
||||
# Add web to sources if include_web is set
|
||||
if include_web:
|
||||
if available == 'both':
|
||||
return 'all', None # reddit + x + web
|
||||
elif available == 'reddit':
|
||||
return 'reddit-web', None
|
||||
elif available == 'x':
|
||||
return 'x-web', None
|
||||
return available, None
|
||||
|
||||
if requested == 'web':
|
||||
return 'web', None
|
||||
|
||||
if requested == 'both':
|
||||
if available not in ('both',):
|
||||
missing = 'xAI' if available == 'reddit' else 'OpenAI'
|
||||
return 'none', f"Requested both sources but {missing} key is missing. Use --sources=auto to use available keys."
|
||||
if include_web:
|
||||
return 'all', None
|
||||
return 'both', None
|
||||
|
||||
if requested == 'reddit':
|
||||
if available == 'x':
|
||||
return 'none', "Requested Reddit but only xAI key is available."
|
||||
if include_web:
|
||||
return 'reddit-web', None
|
||||
return 'reddit', None
|
||||
|
||||
if requested == 'x':
|
||||
if available == 'reddit':
|
||||
return 'none', "Requested X but only OpenAI key is available."
|
||||
if include_web:
|
||||
return 'x-web', None
|
||||
return 'x', None
|
||||
|
||||
return requested, None
|
||||
152
skills/last30days/scripts/lib/http.py
Normal file
152
skills/last30days/scripts/lib/http.py
Normal file
@@ -0,0 +1,152 @@
|
||||
"""HTTP utilities for last30days skill (stdlib only)."""
|
||||
|
||||
import json
|
||||
import os
|
||||
import sys
|
||||
import time
|
||||
import urllib.error
|
||||
import urllib.request
|
||||
from typing import Any, Dict, Optional
|
||||
from urllib.parse import urlencode
|
||||
|
||||
DEFAULT_TIMEOUT = 30
|
||||
DEBUG = os.environ.get("LAST30DAYS_DEBUG", "").lower() in ("1", "true", "yes")
|
||||
|
||||
|
||||
def log(msg: str):
|
||||
"""Log debug message to stderr."""
|
||||
if DEBUG:
|
||||
sys.stderr.write(f"[DEBUG] {msg}\n")
|
||||
sys.stderr.flush()
|
||||
MAX_RETRIES = 3
|
||||
RETRY_DELAY = 1.0
|
||||
USER_AGENT = "last30days-skill/1.0 (Claude Code Skill)"
|
||||
|
||||
|
||||
class HTTPError(Exception):
|
||||
"""HTTP request error with status code."""
|
||||
def __init__(self, message: str, status_code: Optional[int] = None, body: Optional[str] = None):
|
||||
super().__init__(message)
|
||||
self.status_code = status_code
|
||||
self.body = body
|
||||
|
||||
|
||||
def request(
|
||||
method: str,
|
||||
url: str,
|
||||
headers: Optional[Dict[str, str]] = None,
|
||||
json_data: Optional[Dict[str, Any]] = None,
|
||||
timeout: int = DEFAULT_TIMEOUT,
|
||||
retries: int = MAX_RETRIES,
|
||||
) -> Dict[str, Any]:
|
||||
"""Make an HTTP request and return JSON response.
|
||||
|
||||
Args:
|
||||
method: HTTP method (GET, POST, etc.)
|
||||
url: Request URL
|
||||
headers: Optional headers dict
|
||||
json_data: Optional JSON body (for POST)
|
||||
timeout: Request timeout in seconds
|
||||
retries: Number of retries on failure
|
||||
|
||||
Returns:
|
||||
Parsed JSON response
|
||||
|
||||
Raises:
|
||||
HTTPError: On request failure
|
||||
"""
|
||||
headers = headers or {}
|
||||
headers.setdefault("User-Agent", USER_AGENT)
|
||||
|
||||
data = None
|
||||
if json_data is not None:
|
||||
data = json.dumps(json_data).encode('utf-8')
|
||||
headers.setdefault("Content-Type", "application/json")
|
||||
|
||||
req = urllib.request.Request(url, data=data, headers=headers, method=method)
|
||||
|
||||
log(f"{method} {url}")
|
||||
if json_data:
|
||||
log(f"Payload keys: {list(json_data.keys())}")
|
||||
|
||||
last_error = None
|
||||
for attempt in range(retries):
|
||||
try:
|
||||
with urllib.request.urlopen(req, timeout=timeout) as response:
|
||||
body = response.read().decode('utf-8')
|
||||
log(f"Response: {response.status} ({len(body)} bytes)")
|
||||
return json.loads(body) if body else {}
|
||||
except urllib.error.HTTPError as e:
|
||||
body = None
|
||||
try:
|
||||
body = e.read().decode('utf-8')
|
||||
except:
|
||||
pass
|
||||
log(f"HTTP Error {e.code}: {e.reason}")
|
||||
if body:
|
||||
log(f"Error body: {body[:500]}")
|
||||
last_error = HTTPError(f"HTTP {e.code}: {e.reason}", e.code, body)
|
||||
|
||||
# Don't retry client errors (4xx) except rate limits
|
||||
if 400 <= e.code < 500 and e.code != 429:
|
||||
raise last_error
|
||||
|
||||
if attempt < retries - 1:
|
||||
time.sleep(RETRY_DELAY * (attempt + 1))
|
||||
except urllib.error.URLError as e:
|
||||
log(f"URL Error: {e.reason}")
|
||||
last_error = HTTPError(f"URL Error: {e.reason}")
|
||||
if attempt < retries - 1:
|
||||
time.sleep(RETRY_DELAY * (attempt + 1))
|
||||
except json.JSONDecodeError as e:
|
||||
log(f"JSON decode error: {e}")
|
||||
last_error = HTTPError(f"Invalid JSON response: {e}")
|
||||
raise last_error
|
||||
except (OSError, TimeoutError, ConnectionResetError) as e:
|
||||
# Handle socket-level errors (connection reset, timeout, etc.)
|
||||
log(f"Connection error: {type(e).__name__}: {e}")
|
||||
last_error = HTTPError(f"Connection error: {type(e).__name__}: {e}")
|
||||
if attempt < retries - 1:
|
||||
time.sleep(RETRY_DELAY * (attempt + 1))
|
||||
|
||||
if last_error:
|
||||
raise last_error
|
||||
raise HTTPError("Request failed with no error details")
|
||||
|
||||
|
||||
def get(url: str, headers: Optional[Dict[str, str]] = None, **kwargs) -> Dict[str, Any]:
|
||||
"""Make a GET request."""
|
||||
return request("GET", url, headers=headers, **kwargs)
|
||||
|
||||
|
||||
def post(url: str, json_data: Dict[str, Any], headers: Optional[Dict[str, str]] = None, **kwargs) -> Dict[str, Any]:
|
||||
"""Make a POST request with JSON body."""
|
||||
return request("POST", url, headers=headers, json_data=json_data, **kwargs)
|
||||
|
||||
|
||||
def get_reddit_json(path: str) -> Dict[str, Any]:
|
||||
"""Fetch Reddit thread JSON.
|
||||
|
||||
Args:
|
||||
path: Reddit path (e.g., /r/subreddit/comments/id/title)
|
||||
|
||||
Returns:
|
||||
Parsed JSON response
|
||||
"""
|
||||
# Ensure path starts with /
|
||||
if not path.startswith('/'):
|
||||
path = '/' + path
|
||||
|
||||
# Remove trailing slash and add .json
|
||||
path = path.rstrip('/')
|
||||
if not path.endswith('.json'):
|
||||
path = path + '.json'
|
||||
|
||||
url = f"https://www.reddit.com{path}?raw_json=1"
|
||||
|
||||
headers = {
|
||||
"User-Agent": USER_AGENT,
|
||||
"Accept": "application/json",
|
||||
}
|
||||
|
||||
return get(url, headers=headers)
|
||||
175
skills/last30days/scripts/lib/models.py
Normal file
175
skills/last30days/scripts/lib/models.py
Normal file
@@ -0,0 +1,175 @@
|
||||
"""Model auto-selection for last30days skill."""
|
||||
|
||||
import re
|
||||
from typing import Dict, List, Optional, Tuple
|
||||
|
||||
from . import cache, http
|
||||
|
||||
# OpenAI API
|
||||
OPENAI_MODELS_URL = "https://api.openai.com/v1/models"
|
||||
OPENAI_FALLBACK_MODELS = ["gpt-5.2", "gpt-5.1", "gpt-5", "gpt-4o"]
|
||||
|
||||
# xAI API - Agent Tools API requires grok-4 family
|
||||
XAI_MODELS_URL = "https://api.x.ai/v1/models"
|
||||
XAI_ALIASES = {
|
||||
"latest": "grok-4-1-fast", # Required for x_search tool
|
||||
"stable": "grok-4-1-fast",
|
||||
}
|
||||
|
||||
|
||||
def parse_version(model_id: str) -> Optional[Tuple[int, ...]]:
|
||||
"""Parse semantic version from model ID.
|
||||
|
||||
Examples:
|
||||
gpt-5 -> (5,)
|
||||
gpt-5.2 -> (5, 2)
|
||||
gpt-5.2.1 -> (5, 2, 1)
|
||||
"""
|
||||
match = re.search(r'(\d+(?:\.\d+)*)', model_id)
|
||||
if match:
|
||||
return tuple(int(x) for x in match.group(1).split('.'))
|
||||
return None
|
||||
|
||||
|
||||
def is_mainline_openai_model(model_id: str) -> bool:
|
||||
"""Check if model is a mainline GPT model (not mini/nano/chat/codex/pro)."""
|
||||
model_lower = model_id.lower()
|
||||
|
||||
# Must be gpt-5 series
|
||||
if not re.match(r'^gpt-5(\.\d+)*$', model_lower):
|
||||
return False
|
||||
|
||||
# Exclude variants
|
||||
excludes = ['mini', 'nano', 'chat', 'codex', 'pro', 'preview', 'turbo']
|
||||
for exc in excludes:
|
||||
if exc in model_lower:
|
||||
return False
|
||||
|
||||
return True
|
||||
|
||||
|
||||
def select_openai_model(
|
||||
api_key: str,
|
||||
policy: str = "auto",
|
||||
pin: Optional[str] = None,
|
||||
mock_models: Optional[List[Dict]] = None,
|
||||
) -> str:
|
||||
"""Select the best OpenAI model based on policy.
|
||||
|
||||
Args:
|
||||
api_key: OpenAI API key
|
||||
policy: 'auto' or 'pinned'
|
||||
pin: Model to use if policy is 'pinned'
|
||||
mock_models: Mock model list for testing
|
||||
|
||||
Returns:
|
||||
Selected model ID
|
||||
"""
|
||||
if policy == "pinned" and pin:
|
||||
return pin
|
||||
|
||||
# Check cache first
|
||||
cached = cache.get_cached_model("openai")
|
||||
if cached:
|
||||
return cached
|
||||
|
||||
# Fetch model list
|
||||
if mock_models is not None:
|
||||
models = mock_models
|
||||
else:
|
||||
try:
|
||||
headers = {"Authorization": f"Bearer {api_key}"}
|
||||
response = http.get(OPENAI_MODELS_URL, headers=headers)
|
||||
models = response.get("data", [])
|
||||
except http.HTTPError:
|
||||
# Fall back to known models
|
||||
return OPENAI_FALLBACK_MODELS[0]
|
||||
|
||||
# Filter to mainline models
|
||||
candidates = [m for m in models if is_mainline_openai_model(m.get("id", ""))]
|
||||
|
||||
if not candidates:
|
||||
# No gpt-5 models found, use fallback
|
||||
return OPENAI_FALLBACK_MODELS[0]
|
||||
|
||||
# Sort by version (descending), then by created timestamp
|
||||
def sort_key(m):
|
||||
version = parse_version(m.get("id", "")) or (0,)
|
||||
created = m.get("created", 0)
|
||||
return (version, created)
|
||||
|
||||
candidates.sort(key=sort_key, reverse=True)
|
||||
selected = candidates[0]["id"]
|
||||
|
||||
# Cache the selection
|
||||
cache.set_cached_model("openai", selected)
|
||||
|
||||
return selected
|
||||
|
||||
|
||||
def select_xai_model(
|
||||
api_key: str,
|
||||
policy: str = "latest",
|
||||
pin: Optional[str] = None,
|
||||
mock_models: Optional[List[Dict]] = None,
|
||||
) -> str:
|
||||
"""Select the best xAI model based on policy.
|
||||
|
||||
Args:
|
||||
api_key: xAI API key
|
||||
policy: 'latest', 'stable', or 'pinned'
|
||||
pin: Model to use if policy is 'pinned'
|
||||
mock_models: Mock model list for testing
|
||||
|
||||
Returns:
|
||||
Selected model ID
|
||||
"""
|
||||
if policy == "pinned" and pin:
|
||||
return pin
|
||||
|
||||
# Use alias system
|
||||
if policy in XAI_ALIASES:
|
||||
alias = XAI_ALIASES[policy]
|
||||
|
||||
# Check cache first
|
||||
cached = cache.get_cached_model("xai")
|
||||
if cached:
|
||||
return cached
|
||||
|
||||
# Cache the alias
|
||||
cache.set_cached_model("xai", alias)
|
||||
return alias
|
||||
|
||||
# Default to latest
|
||||
return XAI_ALIASES["latest"]
|
||||
|
||||
|
||||
def get_models(
|
||||
config: Dict,
|
||||
mock_openai_models: Optional[List[Dict]] = None,
|
||||
mock_xai_models: Optional[List[Dict]] = None,
|
||||
) -> Dict[str, Optional[str]]:
|
||||
"""Get selected models for both providers.
|
||||
|
||||
Returns:
|
||||
Dict with 'openai' and 'xai' keys
|
||||
"""
|
||||
result = {"openai": None, "xai": None}
|
||||
|
||||
if config.get("OPENAI_API_KEY"):
|
||||
result["openai"] = select_openai_model(
|
||||
config["OPENAI_API_KEY"],
|
||||
config.get("OPENAI_MODEL_POLICY", "auto"),
|
||||
config.get("OPENAI_MODEL_PIN"),
|
||||
mock_openai_models,
|
||||
)
|
||||
|
||||
if config.get("XAI_API_KEY"):
|
||||
result["xai"] = select_xai_model(
|
||||
config["XAI_API_KEY"],
|
||||
config.get("XAI_MODEL_POLICY", "latest"),
|
||||
config.get("XAI_MODEL_PIN"),
|
||||
mock_xai_models,
|
||||
)
|
||||
|
||||
return result
|
||||
160
skills/last30days/scripts/lib/normalize.py
Normal file
160
skills/last30days/scripts/lib/normalize.py
Normal file
@@ -0,0 +1,160 @@
|
||||
"""Normalization of raw API data to canonical schema."""
|
||||
|
||||
from typing import Any, Dict, List, TypeVar, Union
|
||||
|
||||
from . import dates, schema
|
||||
|
||||
T = TypeVar("T", schema.RedditItem, schema.XItem, schema.WebSearchItem)
|
||||
|
||||
|
||||
def filter_by_date_range(
|
||||
items: List[T],
|
||||
from_date: str,
|
||||
to_date: str,
|
||||
require_date: bool = False,
|
||||
) -> List[T]:
|
||||
"""Hard filter: Remove items outside the date range.
|
||||
|
||||
This is the safety net - even if the prompt lets old content through,
|
||||
this filter will exclude it.
|
||||
|
||||
Args:
|
||||
items: List of items to filter
|
||||
from_date: Start date (YYYY-MM-DD) - exclude items before this
|
||||
to_date: End date (YYYY-MM-DD) - exclude items after this
|
||||
require_date: If True, also remove items with no date
|
||||
|
||||
Returns:
|
||||
Filtered list with only items in range (or unknown dates if not required)
|
||||
"""
|
||||
result = []
|
||||
for item in items:
|
||||
if item.date is None:
|
||||
if not require_date:
|
||||
result.append(item) # Keep unknown dates (with scoring penalty)
|
||||
continue
|
||||
|
||||
# Hard filter: if date is before from_date, exclude
|
||||
if item.date < from_date:
|
||||
continue # DROP - too old
|
||||
|
||||
# Hard filter: if date is after to_date, exclude (likely parsing error)
|
||||
if item.date > to_date:
|
||||
continue # DROP - future date
|
||||
|
||||
result.append(item)
|
||||
|
||||
return result
|
||||
|
||||
|
||||
def normalize_reddit_items(
|
||||
items: List[Dict[str, Any]],
|
||||
from_date: str,
|
||||
to_date: str,
|
||||
) -> List[schema.RedditItem]:
|
||||
"""Normalize raw Reddit items to schema.
|
||||
|
||||
Args:
|
||||
items: Raw Reddit items from API
|
||||
from_date: Start of date range
|
||||
to_date: End of date range
|
||||
|
||||
Returns:
|
||||
List of RedditItem objects
|
||||
"""
|
||||
normalized = []
|
||||
|
||||
for item in items:
|
||||
# Parse engagement
|
||||
engagement = None
|
||||
eng_raw = item.get("engagement")
|
||||
if isinstance(eng_raw, dict):
|
||||
engagement = schema.Engagement(
|
||||
score=eng_raw.get("score"),
|
||||
num_comments=eng_raw.get("num_comments"),
|
||||
upvote_ratio=eng_raw.get("upvote_ratio"),
|
||||
)
|
||||
|
||||
# Parse comments
|
||||
top_comments = []
|
||||
for c in item.get("top_comments", []):
|
||||
top_comments.append(schema.Comment(
|
||||
score=c.get("score", 0),
|
||||
date=c.get("date"),
|
||||
author=c.get("author", ""),
|
||||
excerpt=c.get("excerpt", ""),
|
||||
url=c.get("url", ""),
|
||||
))
|
||||
|
||||
# Determine date confidence
|
||||
date_str = item.get("date")
|
||||
date_confidence = dates.get_date_confidence(date_str, from_date, to_date)
|
||||
|
||||
normalized.append(schema.RedditItem(
|
||||
id=item.get("id", ""),
|
||||
title=item.get("title", ""),
|
||||
url=item.get("url", ""),
|
||||
subreddit=item.get("subreddit", ""),
|
||||
date=date_str,
|
||||
date_confidence=date_confidence,
|
||||
engagement=engagement,
|
||||
top_comments=top_comments,
|
||||
comment_insights=item.get("comment_insights", []),
|
||||
relevance=item.get("relevance", 0.5),
|
||||
why_relevant=item.get("why_relevant", ""),
|
||||
))
|
||||
|
||||
return normalized
|
||||
|
||||
|
||||
def normalize_x_items(
|
||||
items: List[Dict[str, Any]],
|
||||
from_date: str,
|
||||
to_date: str,
|
||||
) -> List[schema.XItem]:
|
||||
"""Normalize raw X items to schema.
|
||||
|
||||
Args:
|
||||
items: Raw X items from API
|
||||
from_date: Start of date range
|
||||
to_date: End of date range
|
||||
|
||||
Returns:
|
||||
List of XItem objects
|
||||
"""
|
||||
normalized = []
|
||||
|
||||
for item in items:
|
||||
# Parse engagement
|
||||
engagement = None
|
||||
eng_raw = item.get("engagement")
|
||||
if isinstance(eng_raw, dict):
|
||||
engagement = schema.Engagement(
|
||||
likes=eng_raw.get("likes"),
|
||||
reposts=eng_raw.get("reposts"),
|
||||
replies=eng_raw.get("replies"),
|
||||
quotes=eng_raw.get("quotes"),
|
||||
)
|
||||
|
||||
# Determine date confidence
|
||||
date_str = item.get("date")
|
||||
date_confidence = dates.get_date_confidence(date_str, from_date, to_date)
|
||||
|
||||
normalized.append(schema.XItem(
|
||||
id=item.get("id", ""),
|
||||
text=item.get("text", ""),
|
||||
url=item.get("url", ""),
|
||||
author_handle=item.get("author_handle", ""),
|
||||
date=date_str,
|
||||
date_confidence=date_confidence,
|
||||
engagement=engagement,
|
||||
relevance=item.get("relevance", 0.5),
|
||||
why_relevant=item.get("why_relevant", ""),
|
||||
))
|
||||
|
||||
return normalized
|
||||
|
||||
|
||||
def items_to_dicts(items: List) -> List[Dict[str, Any]]:
|
||||
"""Convert schema items to dicts for JSON serialization."""
|
||||
return [item.to_dict() for item in items]
|
||||
230
skills/last30days/scripts/lib/openai_reddit.py
Normal file
230
skills/last30days/scripts/lib/openai_reddit.py
Normal file
@@ -0,0 +1,230 @@
|
||||
"""OpenAI Responses API client for Reddit discovery."""
|
||||
|
||||
import json
|
||||
import re
|
||||
import sys
|
||||
from typing import Any, Dict, List, Optional
|
||||
|
||||
from . import http
|
||||
|
||||
|
||||
def _log_error(msg: str):
|
||||
"""Log error to stderr."""
|
||||
sys.stderr.write(f"[REDDIT ERROR] {msg}\n")
|
||||
sys.stderr.flush()
|
||||
|
||||
OPENAI_RESPONSES_URL = "https://api.openai.com/v1/responses"
|
||||
|
||||
# Depth configurations: (min, max) threads to request
|
||||
# Request MORE than needed since many get filtered by date
|
||||
DEPTH_CONFIG = {
|
||||
"quick": (15, 25),
|
||||
"default": (30, 50),
|
||||
"deep": (70, 100),
|
||||
}
|
||||
|
||||
REDDIT_SEARCH_PROMPT = """Find Reddit discussion threads about: {topic}
|
||||
|
||||
STEP 1: EXTRACT THE CORE SUBJECT
|
||||
Get the MAIN NOUN/PRODUCT/TOPIC:
|
||||
- "best nano banana prompting practices" → "nano banana"
|
||||
- "killer features of clawdbot" → "clawdbot"
|
||||
- "top Claude Code skills" → "Claude Code"
|
||||
DO NOT include "best", "top", "tips", "practices", "features" in your search.
|
||||
|
||||
STEP 2: SEARCH BROADLY
|
||||
Search for the core subject:
|
||||
1. "[core subject] site:reddit.com"
|
||||
2. "reddit [core subject]"
|
||||
3. "[core subject] reddit"
|
||||
|
||||
Return as many relevant threads as you find. We filter by date server-side.
|
||||
|
||||
STEP 3: INCLUDE ALL MATCHES
|
||||
- Include ALL threads about the core subject
|
||||
- Set date to "YYYY-MM-DD" if you can determine it, otherwise null
|
||||
- We verify dates and filter old content server-side
|
||||
- DO NOT pre-filter aggressively - include anything relevant
|
||||
|
||||
REQUIRED: URLs must contain "/r/" AND "/comments/"
|
||||
REJECT: developers.reddit.com, business.reddit.com
|
||||
|
||||
Find {min_items}-{max_items} threads. Return MORE rather than fewer.
|
||||
|
||||
Return JSON:
|
||||
{{
|
||||
"items": [
|
||||
{{
|
||||
"title": "Thread title",
|
||||
"url": "https://www.reddit.com/r/sub/comments/xyz/title/",
|
||||
"subreddit": "subreddit_name",
|
||||
"date": "YYYY-MM-DD or null",
|
||||
"why_relevant": "Why relevant",
|
||||
"relevance": 0.85
|
||||
}}
|
||||
]
|
||||
}}"""
|
||||
|
||||
|
||||
def _extract_core_subject(topic: str) -> str:
|
||||
"""Extract core subject from verbose query for retry."""
|
||||
noise = ['best', 'top', 'how to', 'tips for', 'practices', 'features',
|
||||
'killer', 'guide', 'tutorial', 'recommendations', 'advice',
|
||||
'prompting', 'using', 'for', 'with', 'the', 'of', 'in', 'on']
|
||||
words = topic.lower().split()
|
||||
result = [w for w in words if w not in noise]
|
||||
return ' '.join(result[:3]) or topic # Keep max 3 words
|
||||
|
||||
|
||||
def search_reddit(
|
||||
api_key: str,
|
||||
model: str,
|
||||
topic: str,
|
||||
from_date: str,
|
||||
to_date: str,
|
||||
depth: str = "default",
|
||||
mock_response: Optional[Dict] = None,
|
||||
_retry: bool = False,
|
||||
) -> Dict[str, Any]:
|
||||
"""Search Reddit for relevant threads using OpenAI Responses API.
|
||||
|
||||
Args:
|
||||
api_key: OpenAI API key
|
||||
model: Model to use
|
||||
topic: Search topic
|
||||
from_date: Start date (YYYY-MM-DD) - only include threads after this
|
||||
to_date: End date (YYYY-MM-DD) - only include threads before this
|
||||
depth: Research depth - "quick", "default", or "deep"
|
||||
mock_response: Mock response for testing
|
||||
|
||||
Returns:
|
||||
Raw API response
|
||||
"""
|
||||
if mock_response is not None:
|
||||
return mock_response
|
||||
|
||||
min_items, max_items = DEPTH_CONFIG.get(depth, DEPTH_CONFIG["default"])
|
||||
|
||||
headers = {
|
||||
"Authorization": f"Bearer {api_key}",
|
||||
"Content-Type": "application/json",
|
||||
}
|
||||
|
||||
# Adjust timeout based on depth (generous for OpenAI web_search which can be slow)
|
||||
timeout = 90 if depth == "quick" else 120 if depth == "default" else 180
|
||||
|
||||
# Note: allowed_domains accepts base domain, not subdomains
|
||||
# We rely on prompt to filter out developers.reddit.com, etc.
|
||||
payload = {
|
||||
"model": model,
|
||||
"tools": [
|
||||
{
|
||||
"type": "web_search",
|
||||
"filters": {
|
||||
"allowed_domains": ["reddit.com"]
|
||||
}
|
||||
}
|
||||
],
|
||||
"include": ["web_search_call.action.sources"],
|
||||
"input": REDDIT_SEARCH_PROMPT.format(
|
||||
topic=topic,
|
||||
from_date=from_date,
|
||||
to_date=to_date,
|
||||
min_items=min_items,
|
||||
max_items=max_items,
|
||||
),
|
||||
}
|
||||
|
||||
return http.post(OPENAI_RESPONSES_URL, payload, headers=headers, timeout=timeout)
|
||||
|
||||
|
||||
def parse_reddit_response(response: Dict[str, Any]) -> List[Dict[str, Any]]:
|
||||
"""Parse OpenAI response to extract Reddit items.
|
||||
|
||||
Args:
|
||||
response: Raw API response
|
||||
|
||||
Returns:
|
||||
List of item dicts
|
||||
"""
|
||||
items = []
|
||||
|
||||
# Check for API errors first
|
||||
if "error" in response and response["error"]:
|
||||
error = response["error"]
|
||||
err_msg = error.get("message", str(error)) if isinstance(error, dict) else str(error)
|
||||
_log_error(f"OpenAI API error: {err_msg}")
|
||||
if http.DEBUG:
|
||||
_log_error(f"Full error response: {json.dumps(response, indent=2)[:1000]}")
|
||||
return items
|
||||
|
||||
# Try to find the output text
|
||||
output_text = ""
|
||||
if "output" in response:
|
||||
output = response["output"]
|
||||
if isinstance(output, str):
|
||||
output_text = output
|
||||
elif isinstance(output, list):
|
||||
for item in output:
|
||||
if isinstance(item, dict):
|
||||
if item.get("type") == "message":
|
||||
content = item.get("content", [])
|
||||
for c in content:
|
||||
if isinstance(c, dict) and c.get("type") == "output_text":
|
||||
output_text = c.get("text", "")
|
||||
break
|
||||
elif "text" in item:
|
||||
output_text = item["text"]
|
||||
elif isinstance(item, str):
|
||||
output_text = item
|
||||
if output_text:
|
||||
break
|
||||
|
||||
# Also check for choices (older format)
|
||||
if not output_text and "choices" in response:
|
||||
for choice in response["choices"]:
|
||||
if "message" in choice:
|
||||
output_text = choice["message"].get("content", "")
|
||||
break
|
||||
|
||||
if not output_text:
|
||||
print(f"[REDDIT WARNING] No output text found in OpenAI response. Keys present: {list(response.keys())}", flush=True)
|
||||
return items
|
||||
|
||||
# Extract JSON from the response
|
||||
json_match = re.search(r'\{[\s\S]*"items"[\s\S]*\}', output_text)
|
||||
if json_match:
|
||||
try:
|
||||
data = json.loads(json_match.group())
|
||||
items = data.get("items", [])
|
||||
except json.JSONDecodeError:
|
||||
pass
|
||||
|
||||
# Validate and clean items
|
||||
clean_items = []
|
||||
for i, item in enumerate(items):
|
||||
if not isinstance(item, dict):
|
||||
continue
|
||||
|
||||
url = item.get("url", "")
|
||||
if not url or "reddit.com" not in url:
|
||||
continue
|
||||
|
||||
clean_item = {
|
||||
"id": f"R{i+1}",
|
||||
"title": str(item.get("title", "")).strip(),
|
||||
"url": url,
|
||||
"subreddit": str(item.get("subreddit", "")).strip().lstrip("r/"),
|
||||
"date": item.get("date"),
|
||||
"why_relevant": str(item.get("why_relevant", "")).strip(),
|
||||
"relevance": min(1.0, max(0.0, float(item.get("relevance", 0.5)))),
|
||||
}
|
||||
|
||||
# Validate date format
|
||||
if clean_item["date"]:
|
||||
if not re.match(r'^\d{4}-\d{2}-\d{2}$', str(clean_item["date"])):
|
||||
clean_item["date"] = None
|
||||
|
||||
clean_items.append(clean_item)
|
||||
|
||||
return clean_items
|
||||
232
skills/last30days/scripts/lib/reddit_enrich.py
Normal file
232
skills/last30days/scripts/lib/reddit_enrich.py
Normal file
@@ -0,0 +1,232 @@
|
||||
"""Reddit thread enrichment with real engagement metrics."""
|
||||
|
||||
import re
|
||||
from typing import Any, Dict, List, Optional
|
||||
from urllib.parse import urlparse
|
||||
|
||||
from . import http, dates
|
||||
|
||||
|
||||
def extract_reddit_path(url: str) -> Optional[str]:
|
||||
"""Extract the path from a Reddit URL.
|
||||
|
||||
Args:
|
||||
url: Reddit URL
|
||||
|
||||
Returns:
|
||||
Path component or None
|
||||
"""
|
||||
try:
|
||||
parsed = urlparse(url)
|
||||
if "reddit.com" not in parsed.netloc:
|
||||
return None
|
||||
return parsed.path
|
||||
except:
|
||||
return None
|
||||
|
||||
|
||||
def fetch_thread_data(url: str, mock_data: Optional[Dict] = None) -> Optional[Dict[str, Any]]:
|
||||
"""Fetch Reddit thread JSON data.
|
||||
|
||||
Args:
|
||||
url: Reddit thread URL
|
||||
mock_data: Mock data for testing
|
||||
|
||||
Returns:
|
||||
Thread data dict or None on failure
|
||||
"""
|
||||
if mock_data is not None:
|
||||
return mock_data
|
||||
|
||||
path = extract_reddit_path(url)
|
||||
if not path:
|
||||
return None
|
||||
|
||||
try:
|
||||
data = http.get_reddit_json(path)
|
||||
return data
|
||||
except http.HTTPError:
|
||||
return None
|
||||
|
||||
|
||||
def parse_thread_data(data: Any) -> Dict[str, Any]:
|
||||
"""Parse Reddit thread JSON into structured data.
|
||||
|
||||
Args:
|
||||
data: Raw Reddit JSON response
|
||||
|
||||
Returns:
|
||||
Dict with submission and comments data
|
||||
"""
|
||||
result = {
|
||||
"submission": None,
|
||||
"comments": [],
|
||||
}
|
||||
|
||||
if not isinstance(data, list) or len(data) < 1:
|
||||
return result
|
||||
|
||||
# First element is submission listing
|
||||
submission_listing = data[0]
|
||||
if isinstance(submission_listing, dict):
|
||||
children = submission_listing.get("data", {}).get("children", [])
|
||||
if children:
|
||||
sub_data = children[0].get("data", {})
|
||||
result["submission"] = {
|
||||
"score": sub_data.get("score"),
|
||||
"num_comments": sub_data.get("num_comments"),
|
||||
"upvote_ratio": sub_data.get("upvote_ratio"),
|
||||
"created_utc": sub_data.get("created_utc"),
|
||||
"permalink": sub_data.get("permalink"),
|
||||
"title": sub_data.get("title"),
|
||||
"selftext": sub_data.get("selftext", "")[:500], # Truncate
|
||||
}
|
||||
|
||||
# Second element is comments listing
|
||||
if len(data) >= 2:
|
||||
comments_listing = data[1]
|
||||
if isinstance(comments_listing, dict):
|
||||
children = comments_listing.get("data", {}).get("children", [])
|
||||
for child in children:
|
||||
if child.get("kind") != "t1": # t1 = comment
|
||||
continue
|
||||
c_data = child.get("data", {})
|
||||
if not c_data.get("body"):
|
||||
continue
|
||||
|
||||
comment = {
|
||||
"score": c_data.get("score", 0),
|
||||
"created_utc": c_data.get("created_utc"),
|
||||
"author": c_data.get("author", "[deleted]"),
|
||||
"body": c_data.get("body", "")[:300], # Truncate
|
||||
"permalink": c_data.get("permalink"),
|
||||
}
|
||||
result["comments"].append(comment)
|
||||
|
||||
return result
|
||||
|
||||
|
||||
def get_top_comments(comments: List[Dict], limit: int = 10) -> List[Dict[str, Any]]:
|
||||
"""Get top comments sorted by score.
|
||||
|
||||
Args:
|
||||
comments: List of comment dicts
|
||||
limit: Maximum number to return
|
||||
|
||||
Returns:
|
||||
Top comments sorted by score
|
||||
"""
|
||||
# Filter out deleted/removed
|
||||
valid = [c for c in comments if c.get("author") not in ("[deleted]", "[removed]")]
|
||||
|
||||
# Sort by score descending
|
||||
sorted_comments = sorted(valid, key=lambda c: c.get("score", 0), reverse=True)
|
||||
|
||||
return sorted_comments[:limit]
|
||||
|
||||
|
||||
def extract_comment_insights(comments: List[Dict], limit: int = 7) -> List[str]:
|
||||
"""Extract key insights from top comments.
|
||||
|
||||
Uses simple heuristics to identify valuable comments:
|
||||
- Has substantive text
|
||||
- Contains actionable information
|
||||
- Not just agreement/disagreement
|
||||
|
||||
Args:
|
||||
comments: Top comments
|
||||
limit: Max insights to extract
|
||||
|
||||
Returns:
|
||||
List of insight strings
|
||||
"""
|
||||
insights = []
|
||||
|
||||
for comment in comments[:limit * 2]: # Look at more comments than we need
|
||||
body = comment.get("body", "").strip()
|
||||
if not body or len(body) < 30:
|
||||
continue
|
||||
|
||||
# Skip low-value patterns
|
||||
skip_patterns = [
|
||||
r'^(this|same|agreed|exactly|yep|nope|yes|no|thanks|thank you)\.?$',
|
||||
r'^lol|lmao|haha',
|
||||
r'^\[deleted\]',
|
||||
r'^\[removed\]',
|
||||
]
|
||||
if any(re.match(p, body.lower()) for p in skip_patterns):
|
||||
continue
|
||||
|
||||
# Truncate to first meaningful sentence or ~150 chars
|
||||
insight = body[:150]
|
||||
if len(body) > 150:
|
||||
# Try to find a sentence boundary
|
||||
for i, char in enumerate(insight):
|
||||
if char in '.!?' and i > 50:
|
||||
insight = insight[:i+1]
|
||||
break
|
||||
else:
|
||||
insight = insight.rstrip() + "..."
|
||||
|
||||
insights.append(insight)
|
||||
if len(insights) >= limit:
|
||||
break
|
||||
|
||||
return insights
|
||||
|
||||
|
||||
def enrich_reddit_item(
|
||||
item: Dict[str, Any],
|
||||
mock_thread_data: Optional[Dict] = None,
|
||||
) -> Dict[str, Any]:
|
||||
"""Enrich a Reddit item with real engagement data.
|
||||
|
||||
Args:
|
||||
item: Reddit item dict
|
||||
mock_thread_data: Mock data for testing
|
||||
|
||||
Returns:
|
||||
Enriched item dict
|
||||
"""
|
||||
url = item.get("url", "")
|
||||
|
||||
# Fetch thread data
|
||||
thread_data = fetch_thread_data(url, mock_thread_data)
|
||||
if not thread_data:
|
||||
return item
|
||||
|
||||
parsed = parse_thread_data(thread_data)
|
||||
submission = parsed.get("submission")
|
||||
comments = parsed.get("comments", [])
|
||||
|
||||
# Update engagement metrics
|
||||
if submission:
|
||||
item["engagement"] = {
|
||||
"score": submission.get("score"),
|
||||
"num_comments": submission.get("num_comments"),
|
||||
"upvote_ratio": submission.get("upvote_ratio"),
|
||||
}
|
||||
|
||||
# Update date from actual data
|
||||
created_utc = submission.get("created_utc")
|
||||
if created_utc:
|
||||
item["date"] = dates.timestamp_to_date(created_utc)
|
||||
|
||||
# Get top comments
|
||||
top_comments = get_top_comments(comments)
|
||||
item["top_comments"] = []
|
||||
for c in top_comments:
|
||||
permalink = c.get("permalink", "")
|
||||
comment_url = f"https://reddit.com{permalink}" if permalink else ""
|
||||
item["top_comments"].append({
|
||||
"score": c.get("score", 0),
|
||||
"date": dates.timestamp_to_date(c.get("created_utc")),
|
||||
"author": c.get("author", ""),
|
||||
"excerpt": c.get("body", "")[:200],
|
||||
"url": comment_url,
|
||||
})
|
||||
|
||||
# Extract insights
|
||||
item["comment_insights"] = extract_comment_insights(top_comments)
|
||||
|
||||
return item
|
||||
383
skills/last30days/scripts/lib/render.py
Normal file
383
skills/last30days/scripts/lib/render.py
Normal file
@@ -0,0 +1,383 @@
|
||||
"""Output rendering for last30days skill."""
|
||||
|
||||
import json
|
||||
from pathlib import Path
|
||||
from typing import List, Optional
|
||||
|
||||
from . import schema
|
||||
|
||||
OUTPUT_DIR = Path.home() / ".local" / "share" / "last30days" / "out"
|
||||
|
||||
|
||||
def ensure_output_dir():
|
||||
"""Ensure output directory exists."""
|
||||
OUTPUT_DIR.mkdir(parents=True, exist_ok=True)
|
||||
|
||||
|
||||
def _assess_data_freshness(report: schema.Report) -> dict:
|
||||
"""Assess how much data is actually from the last 30 days."""
|
||||
reddit_recent = sum(1 for r in report.reddit if r.date and r.date >= report.range_from)
|
||||
x_recent = sum(1 for x in report.x if x.date and x.date >= report.range_from)
|
||||
web_recent = sum(1 for w in report.web if w.date and w.date >= report.range_from)
|
||||
|
||||
total_recent = reddit_recent + x_recent + web_recent
|
||||
total_items = len(report.reddit) + len(report.x) + len(report.web)
|
||||
|
||||
return {
|
||||
"reddit_recent": reddit_recent,
|
||||
"x_recent": x_recent,
|
||||
"web_recent": web_recent,
|
||||
"total_recent": total_recent,
|
||||
"total_items": total_items,
|
||||
"is_sparse": total_recent < 5,
|
||||
"mostly_evergreen": total_items > 0 and total_recent < total_items * 0.3,
|
||||
}
|
||||
|
||||
|
||||
def render_compact(report: schema.Report, limit: int = 15, missing_keys: str = "none") -> str:
|
||||
"""Render compact output for Claude to synthesize.
|
||||
|
||||
Args:
|
||||
report: Report data
|
||||
limit: Max items per source
|
||||
missing_keys: 'both', 'reddit', 'x', or 'none'
|
||||
|
||||
Returns:
|
||||
Compact markdown string
|
||||
"""
|
||||
lines = []
|
||||
|
||||
# Header
|
||||
lines.append(f"## Research Results: {report.topic}")
|
||||
lines.append("")
|
||||
|
||||
# Assess data freshness and add honesty warning if needed
|
||||
freshness = _assess_data_freshness(report)
|
||||
if freshness["is_sparse"]:
|
||||
lines.append("**⚠️ LIMITED RECENT DATA** - Few discussions from the last 30 days.")
|
||||
lines.append(f"Only {freshness['total_recent']} item(s) confirmed from {report.range_from} to {report.range_to}.")
|
||||
lines.append("Results below may include older/evergreen content. Be transparent with the user about this.")
|
||||
lines.append("")
|
||||
|
||||
# Web-only mode banner (when no API keys)
|
||||
if report.mode == "web-only":
|
||||
lines.append("**🌐 WEB SEARCH MODE** - Claude will search blogs, docs & news")
|
||||
lines.append("")
|
||||
lines.append("---")
|
||||
lines.append("**⚡ Want better results?** Add API keys to unlock Reddit & X data:")
|
||||
lines.append("- `OPENAI_API_KEY` → Reddit threads with real upvotes & comments")
|
||||
lines.append("- `XAI_API_KEY` → X posts with real likes & reposts")
|
||||
lines.append("- Edit `~/.config/last30days/.env` to add keys")
|
||||
lines.append("---")
|
||||
lines.append("")
|
||||
|
||||
# Cache indicator
|
||||
if report.from_cache:
|
||||
age_str = f"{report.cache_age_hours:.1f}h old" if report.cache_age_hours else "cached"
|
||||
lines.append(f"**⚡ CACHED RESULTS** ({age_str}) - use `--refresh` for fresh data")
|
||||
lines.append("")
|
||||
|
||||
lines.append(f"**Date Range:** {report.range_from} to {report.range_to}")
|
||||
lines.append(f"**Mode:** {report.mode}")
|
||||
if report.openai_model_used:
|
||||
lines.append(f"**OpenAI Model:** {report.openai_model_used}")
|
||||
if report.xai_model_used:
|
||||
lines.append(f"**xAI Model:** {report.xai_model_used}")
|
||||
lines.append("")
|
||||
|
||||
# Coverage note for partial coverage
|
||||
if report.mode == "reddit-only" and missing_keys == "x":
|
||||
lines.append("*💡 Tip: Add XAI_API_KEY for X/Twitter data and better triangulation.*")
|
||||
lines.append("")
|
||||
elif report.mode == "x-only" and missing_keys == "reddit":
|
||||
lines.append("*💡 Tip: Add OPENAI_API_KEY for Reddit data and better triangulation.*")
|
||||
lines.append("")
|
||||
|
||||
# Reddit items
|
||||
if report.reddit_error:
|
||||
lines.append("### Reddit Threads")
|
||||
lines.append("")
|
||||
lines.append(f"**ERROR:** {report.reddit_error}")
|
||||
lines.append("")
|
||||
elif report.mode in ("both", "reddit-only") and not report.reddit:
|
||||
lines.append("### Reddit Threads")
|
||||
lines.append("")
|
||||
lines.append("*No relevant Reddit threads found for this topic.*")
|
||||
lines.append("")
|
||||
elif report.reddit:
|
||||
lines.append("### Reddit Threads")
|
||||
lines.append("")
|
||||
for item in report.reddit[:limit]:
|
||||
eng_str = ""
|
||||
if item.engagement:
|
||||
eng = item.engagement
|
||||
parts = []
|
||||
if eng.score is not None:
|
||||
parts.append(f"{eng.score}pts")
|
||||
if eng.num_comments is not None:
|
||||
parts.append(f"{eng.num_comments}cmt")
|
||||
if parts:
|
||||
eng_str = f" [{', '.join(parts)}]"
|
||||
|
||||
date_str = f" ({item.date})" if item.date else " (date unknown)"
|
||||
conf_str = f" [date:{item.date_confidence}]" if item.date_confidence != "high" else ""
|
||||
|
||||
lines.append(f"**{item.id}** (score:{item.score}) r/{item.subreddit}{date_str}{conf_str}{eng_str}")
|
||||
lines.append(f" {item.title}")
|
||||
lines.append(f" {item.url}")
|
||||
lines.append(f" *{item.why_relevant}*")
|
||||
|
||||
# Top comment insights
|
||||
if item.comment_insights:
|
||||
lines.append(f" Insights:")
|
||||
for insight in item.comment_insights[:3]:
|
||||
lines.append(f" - {insight}")
|
||||
|
||||
lines.append("")
|
||||
|
||||
# X items
|
||||
if report.x_error:
|
||||
lines.append("### X Posts")
|
||||
lines.append("")
|
||||
lines.append(f"**ERROR:** {report.x_error}")
|
||||
lines.append("")
|
||||
elif report.mode in ("both", "x-only", "all", "x-web") and not report.x:
|
||||
lines.append("### X Posts")
|
||||
lines.append("")
|
||||
lines.append("*No relevant X posts found for this topic.*")
|
||||
lines.append("")
|
||||
elif report.x:
|
||||
lines.append("### X Posts")
|
||||
lines.append("")
|
||||
for item in report.x[:limit]:
|
||||
eng_str = ""
|
||||
if item.engagement:
|
||||
eng = item.engagement
|
||||
parts = []
|
||||
if eng.likes is not None:
|
||||
parts.append(f"{eng.likes}likes")
|
||||
if eng.reposts is not None:
|
||||
parts.append(f"{eng.reposts}rt")
|
||||
if parts:
|
||||
eng_str = f" [{', '.join(parts)}]"
|
||||
|
||||
date_str = f" ({item.date})" if item.date else " (date unknown)"
|
||||
conf_str = f" [date:{item.date_confidence}]" if item.date_confidence != "high" else ""
|
||||
|
||||
lines.append(f"**{item.id}** (score:{item.score}) @{item.author_handle}{date_str}{conf_str}{eng_str}")
|
||||
lines.append(f" {item.text[:200]}...")
|
||||
lines.append(f" {item.url}")
|
||||
lines.append(f" *{item.why_relevant}*")
|
||||
lines.append("")
|
||||
|
||||
# Web items (if any - populated by Claude)
|
||||
if report.web_error:
|
||||
lines.append("### Web Results")
|
||||
lines.append("")
|
||||
lines.append(f"**ERROR:** {report.web_error}")
|
||||
lines.append("")
|
||||
elif report.web:
|
||||
lines.append("### Web Results")
|
||||
lines.append("")
|
||||
for item in report.web[:limit]:
|
||||
date_str = f" ({item.date})" if item.date else " (date unknown)"
|
||||
conf_str = f" [date:{item.date_confidence}]" if item.date_confidence != "high" else ""
|
||||
|
||||
lines.append(f"**{item.id}** [WEB] (score:{item.score}) {item.source_domain}{date_str}{conf_str}")
|
||||
lines.append(f" {item.title}")
|
||||
lines.append(f" {item.url}")
|
||||
lines.append(f" {item.snippet[:150]}...")
|
||||
lines.append(f" *{item.why_relevant}*")
|
||||
lines.append("")
|
||||
|
||||
return "\n".join(lines)
|
||||
|
||||
|
||||
def render_context_snippet(report: schema.Report) -> str:
|
||||
"""Render reusable context snippet.
|
||||
|
||||
Args:
|
||||
report: Report data
|
||||
|
||||
Returns:
|
||||
Context markdown string
|
||||
"""
|
||||
lines = []
|
||||
lines.append(f"# Context: {report.topic} (Last 30 Days)")
|
||||
lines.append("")
|
||||
lines.append(f"*Generated: {report.generated_at[:10]} | Sources: {report.mode}*")
|
||||
lines.append("")
|
||||
|
||||
# Key sources summary
|
||||
lines.append("## Key Sources")
|
||||
lines.append("")
|
||||
|
||||
all_items = []
|
||||
for item in report.reddit[:5]:
|
||||
all_items.append((item.score, "Reddit", item.title, item.url))
|
||||
for item in report.x[:5]:
|
||||
all_items.append((item.score, "X", item.text[:50] + "...", item.url))
|
||||
for item in report.web[:5]:
|
||||
all_items.append((item.score, "Web", item.title[:50] + "...", item.url))
|
||||
|
||||
all_items.sort(key=lambda x: -x[0])
|
||||
for score, source, text, url in all_items[:7]:
|
||||
lines.append(f"- [{source}] {text}")
|
||||
|
||||
lines.append("")
|
||||
lines.append("## Summary")
|
||||
lines.append("")
|
||||
lines.append("*See full report for best practices, prompt pack, and detailed sources.*")
|
||||
lines.append("")
|
||||
|
||||
return "\n".join(lines)
|
||||
|
||||
|
||||
def render_full_report(report: schema.Report) -> str:
|
||||
"""Render full markdown report.
|
||||
|
||||
Args:
|
||||
report: Report data
|
||||
|
||||
Returns:
|
||||
Full report markdown
|
||||
"""
|
||||
lines = []
|
||||
|
||||
# Title
|
||||
lines.append(f"# {report.topic} - Last 30 Days Research Report")
|
||||
lines.append("")
|
||||
lines.append(f"**Generated:** {report.generated_at}")
|
||||
lines.append(f"**Date Range:** {report.range_from} to {report.range_to}")
|
||||
lines.append(f"**Mode:** {report.mode}")
|
||||
lines.append("")
|
||||
|
||||
# Models
|
||||
lines.append("## Models Used")
|
||||
lines.append("")
|
||||
if report.openai_model_used:
|
||||
lines.append(f"- **OpenAI:** {report.openai_model_used}")
|
||||
if report.xai_model_used:
|
||||
lines.append(f"- **xAI:** {report.xai_model_used}")
|
||||
lines.append("")
|
||||
|
||||
# Reddit section
|
||||
if report.reddit:
|
||||
lines.append("## Reddit Threads")
|
||||
lines.append("")
|
||||
for item in report.reddit:
|
||||
lines.append(f"### {item.id}: {item.title}")
|
||||
lines.append("")
|
||||
lines.append(f"- **Subreddit:** r/{item.subreddit}")
|
||||
lines.append(f"- **URL:** {item.url}")
|
||||
lines.append(f"- **Date:** {item.date or 'Unknown'} (confidence: {item.date_confidence})")
|
||||
lines.append(f"- **Score:** {item.score}/100")
|
||||
lines.append(f"- **Relevance:** {item.why_relevant}")
|
||||
|
||||
if item.engagement:
|
||||
eng = item.engagement
|
||||
lines.append(f"- **Engagement:** {eng.score or '?'} points, {eng.num_comments or '?'} comments")
|
||||
|
||||
if item.comment_insights:
|
||||
lines.append("")
|
||||
lines.append("**Key Insights from Comments:**")
|
||||
for insight in item.comment_insights:
|
||||
lines.append(f"- {insight}")
|
||||
|
||||
lines.append("")
|
||||
|
||||
# X section
|
||||
if report.x:
|
||||
lines.append("## X Posts")
|
||||
lines.append("")
|
||||
for item in report.x:
|
||||
lines.append(f"### {item.id}: @{item.author_handle}")
|
||||
lines.append("")
|
||||
lines.append(f"- **URL:** {item.url}")
|
||||
lines.append(f"- **Date:** {item.date or 'Unknown'} (confidence: {item.date_confidence})")
|
||||
lines.append(f"- **Score:** {item.score}/100")
|
||||
lines.append(f"- **Relevance:** {item.why_relevant}")
|
||||
|
||||
if item.engagement:
|
||||
eng = item.engagement
|
||||
lines.append(f"- **Engagement:** {eng.likes or '?'} likes, {eng.reposts or '?'} reposts")
|
||||
|
||||
lines.append("")
|
||||
lines.append(f"> {item.text}")
|
||||
lines.append("")
|
||||
|
||||
# Web section
|
||||
if report.web:
|
||||
lines.append("## Web Results")
|
||||
lines.append("")
|
||||
for item in report.web:
|
||||
lines.append(f"### {item.id}: {item.title}")
|
||||
lines.append("")
|
||||
lines.append(f"- **Source:** {item.source_domain}")
|
||||
lines.append(f"- **URL:** {item.url}")
|
||||
lines.append(f"- **Date:** {item.date or 'Unknown'} (confidence: {item.date_confidence})")
|
||||
lines.append(f"- **Score:** {item.score}/100")
|
||||
lines.append(f"- **Relevance:** {item.why_relevant}")
|
||||
lines.append("")
|
||||
lines.append(f"> {item.snippet}")
|
||||
lines.append("")
|
||||
|
||||
# Placeholders for Claude synthesis
|
||||
lines.append("## Best Practices")
|
||||
lines.append("")
|
||||
lines.append("*To be synthesized by Claude*")
|
||||
lines.append("")
|
||||
|
||||
lines.append("## Prompt Pack")
|
||||
lines.append("")
|
||||
lines.append("*To be synthesized by Claude*")
|
||||
lines.append("")
|
||||
|
||||
return "\n".join(lines)
|
||||
|
||||
|
||||
def write_outputs(
|
||||
report: schema.Report,
|
||||
raw_openai: Optional[dict] = None,
|
||||
raw_xai: Optional[dict] = None,
|
||||
raw_reddit_enriched: Optional[list] = None,
|
||||
):
|
||||
"""Write all output files.
|
||||
|
||||
Args:
|
||||
report: Report data
|
||||
raw_openai: Raw OpenAI API response
|
||||
raw_xai: Raw xAI API response
|
||||
raw_reddit_enriched: Raw enriched Reddit thread data
|
||||
"""
|
||||
ensure_output_dir()
|
||||
|
||||
# report.json
|
||||
with open(OUTPUT_DIR / "report.json", 'w') as f:
|
||||
json.dump(report.to_dict(), f, indent=2)
|
||||
|
||||
# report.md
|
||||
with open(OUTPUT_DIR / "report.md", 'w') as f:
|
||||
f.write(render_full_report(report))
|
||||
|
||||
# last30days.context.md
|
||||
with open(OUTPUT_DIR / "last30days.context.md", 'w') as f:
|
||||
f.write(render_context_snippet(report))
|
||||
|
||||
# Raw responses
|
||||
if raw_openai:
|
||||
with open(OUTPUT_DIR / "raw_openai.json", 'w') as f:
|
||||
json.dump(raw_openai, f, indent=2)
|
||||
|
||||
if raw_xai:
|
||||
with open(OUTPUT_DIR / "raw_xai.json", 'w') as f:
|
||||
json.dump(raw_xai, f, indent=2)
|
||||
|
||||
if raw_reddit_enriched:
|
||||
with open(OUTPUT_DIR / "raw_reddit_threads_enriched.json", 'w') as f:
|
||||
json.dump(raw_reddit_enriched, f, indent=2)
|
||||
|
||||
|
||||
def get_context_path() -> str:
|
||||
"""Get path to context file."""
|
||||
return str(OUTPUT_DIR / "last30days.context.md")
|
||||
336
skills/last30days/scripts/lib/schema.py
Normal file
336
skills/last30days/scripts/lib/schema.py
Normal file
@@ -0,0 +1,336 @@
|
||||
"""Data schemas for last30days skill."""
|
||||
|
||||
from dataclasses import dataclass, field, asdict
|
||||
from typing import Any, Dict, List, Optional
|
||||
from datetime import datetime, timezone
|
||||
|
||||
|
||||
@dataclass
|
||||
class Engagement:
|
||||
"""Engagement metrics."""
|
||||
# Reddit fields
|
||||
score: Optional[int] = None
|
||||
num_comments: Optional[int] = None
|
||||
upvote_ratio: Optional[float] = None
|
||||
|
||||
# X fields
|
||||
likes: Optional[int] = None
|
||||
reposts: Optional[int] = None
|
||||
replies: Optional[int] = None
|
||||
quotes: Optional[int] = None
|
||||
|
||||
def to_dict(self) -> Dict[str, Any]:
|
||||
d = {}
|
||||
if self.score is not None:
|
||||
d['score'] = self.score
|
||||
if self.num_comments is not None:
|
||||
d['num_comments'] = self.num_comments
|
||||
if self.upvote_ratio is not None:
|
||||
d['upvote_ratio'] = self.upvote_ratio
|
||||
if self.likes is not None:
|
||||
d['likes'] = self.likes
|
||||
if self.reposts is not None:
|
||||
d['reposts'] = self.reposts
|
||||
if self.replies is not None:
|
||||
d['replies'] = self.replies
|
||||
if self.quotes is not None:
|
||||
d['quotes'] = self.quotes
|
||||
return d if d else None
|
||||
|
||||
|
||||
@dataclass
|
||||
class Comment:
|
||||
"""Reddit comment."""
|
||||
score: int
|
||||
date: Optional[str]
|
||||
author: str
|
||||
excerpt: str
|
||||
url: str
|
||||
|
||||
def to_dict(self) -> Dict[str, Any]:
|
||||
return {
|
||||
'score': self.score,
|
||||
'date': self.date,
|
||||
'author': self.author,
|
||||
'excerpt': self.excerpt,
|
||||
'url': self.url,
|
||||
}
|
||||
|
||||
|
||||
@dataclass
|
||||
class SubScores:
|
||||
"""Component scores."""
|
||||
relevance: int = 0
|
||||
recency: int = 0
|
||||
engagement: int = 0
|
||||
|
||||
def to_dict(self) -> Dict[str, int]:
|
||||
return {
|
||||
'relevance': self.relevance,
|
||||
'recency': self.recency,
|
||||
'engagement': self.engagement,
|
||||
}
|
||||
|
||||
|
||||
@dataclass
|
||||
class RedditItem:
|
||||
"""Normalized Reddit item."""
|
||||
id: str
|
||||
title: str
|
||||
url: str
|
||||
subreddit: str
|
||||
date: Optional[str] = None
|
||||
date_confidence: str = "low"
|
||||
engagement: Optional[Engagement] = None
|
||||
top_comments: List[Comment] = field(default_factory=list)
|
||||
comment_insights: List[str] = field(default_factory=list)
|
||||
relevance: float = 0.5
|
||||
why_relevant: str = ""
|
||||
subs: SubScores = field(default_factory=SubScores)
|
||||
score: int = 0
|
||||
|
||||
def to_dict(self) -> Dict[str, Any]:
|
||||
return {
|
||||
'id': self.id,
|
||||
'title': self.title,
|
||||
'url': self.url,
|
||||
'subreddit': self.subreddit,
|
||||
'date': self.date,
|
||||
'date_confidence': self.date_confidence,
|
||||
'engagement': self.engagement.to_dict() if self.engagement else None,
|
||||
'top_comments': [c.to_dict() for c in self.top_comments],
|
||||
'comment_insights': self.comment_insights,
|
||||
'relevance': self.relevance,
|
||||
'why_relevant': self.why_relevant,
|
||||
'subs': self.subs.to_dict(),
|
||||
'score': self.score,
|
||||
}
|
||||
|
||||
|
||||
@dataclass
|
||||
class XItem:
|
||||
"""Normalized X item."""
|
||||
id: str
|
||||
text: str
|
||||
url: str
|
||||
author_handle: str
|
||||
date: Optional[str] = None
|
||||
date_confidence: str = "low"
|
||||
engagement: Optional[Engagement] = None
|
||||
relevance: float = 0.5
|
||||
why_relevant: str = ""
|
||||
subs: SubScores = field(default_factory=SubScores)
|
||||
score: int = 0
|
||||
|
||||
def to_dict(self) -> Dict[str, Any]:
|
||||
return {
|
||||
'id': self.id,
|
||||
'text': self.text,
|
||||
'url': self.url,
|
||||
'author_handle': self.author_handle,
|
||||
'date': self.date,
|
||||
'date_confidence': self.date_confidence,
|
||||
'engagement': self.engagement.to_dict() if self.engagement else None,
|
||||
'relevance': self.relevance,
|
||||
'why_relevant': self.why_relevant,
|
||||
'subs': self.subs.to_dict(),
|
||||
'score': self.score,
|
||||
}
|
||||
|
||||
|
||||
@dataclass
|
||||
class WebSearchItem:
|
||||
"""Normalized web search item (no engagement metrics)."""
|
||||
id: str
|
||||
title: str
|
||||
url: str
|
||||
source_domain: str # e.g., "medium.com", "github.com"
|
||||
snippet: str
|
||||
date: Optional[str] = None
|
||||
date_confidence: str = "low"
|
||||
relevance: float = 0.5
|
||||
why_relevant: str = ""
|
||||
subs: SubScores = field(default_factory=SubScores)
|
||||
score: int = 0
|
||||
|
||||
def to_dict(self) -> Dict[str, Any]:
|
||||
return {
|
||||
'id': self.id,
|
||||
'title': self.title,
|
||||
'url': self.url,
|
||||
'source_domain': self.source_domain,
|
||||
'snippet': self.snippet,
|
||||
'date': self.date,
|
||||
'date_confidence': self.date_confidence,
|
||||
'relevance': self.relevance,
|
||||
'why_relevant': self.why_relevant,
|
||||
'subs': self.subs.to_dict(),
|
||||
'score': self.score,
|
||||
}
|
||||
|
||||
|
||||
@dataclass
|
||||
class Report:
|
||||
"""Full research report."""
|
||||
topic: str
|
||||
range_from: str
|
||||
range_to: str
|
||||
generated_at: str
|
||||
mode: str # 'reddit-only', 'x-only', 'both', 'web-only', etc.
|
||||
openai_model_used: Optional[str] = None
|
||||
xai_model_used: Optional[str] = None
|
||||
reddit: List[RedditItem] = field(default_factory=list)
|
||||
x: List[XItem] = field(default_factory=list)
|
||||
web: List[WebSearchItem] = field(default_factory=list)
|
||||
best_practices: List[str] = field(default_factory=list)
|
||||
prompt_pack: List[str] = field(default_factory=list)
|
||||
context_snippet_md: str = ""
|
||||
# Status tracking
|
||||
reddit_error: Optional[str] = None
|
||||
x_error: Optional[str] = None
|
||||
web_error: Optional[str] = None
|
||||
# Cache info
|
||||
from_cache: bool = False
|
||||
cache_age_hours: Optional[float] = None
|
||||
|
||||
def to_dict(self) -> Dict[str, Any]:
|
||||
d = {
|
||||
'topic': self.topic,
|
||||
'range': {
|
||||
'from': self.range_from,
|
||||
'to': self.range_to,
|
||||
},
|
||||
'generated_at': self.generated_at,
|
||||
'mode': self.mode,
|
||||
'openai_model_used': self.openai_model_used,
|
||||
'xai_model_used': self.xai_model_used,
|
||||
'reddit': [r.to_dict() for r in self.reddit],
|
||||
'x': [x.to_dict() for x in self.x],
|
||||
'web': [w.to_dict() for w in self.web],
|
||||
'best_practices': self.best_practices,
|
||||
'prompt_pack': self.prompt_pack,
|
||||
'context_snippet_md': self.context_snippet_md,
|
||||
}
|
||||
if self.reddit_error:
|
||||
d['reddit_error'] = self.reddit_error
|
||||
if self.x_error:
|
||||
d['x_error'] = self.x_error
|
||||
if self.web_error:
|
||||
d['web_error'] = self.web_error
|
||||
if self.from_cache:
|
||||
d['from_cache'] = self.from_cache
|
||||
if self.cache_age_hours is not None:
|
||||
d['cache_age_hours'] = self.cache_age_hours
|
||||
return d
|
||||
|
||||
@classmethod
|
||||
def from_dict(cls, data: Dict[str, Any]) -> "Report":
|
||||
"""Create Report from serialized dict (handles cache format)."""
|
||||
# Handle range field conversion
|
||||
range_data = data.get('range', {})
|
||||
range_from = range_data.get('from', data.get('range_from', ''))
|
||||
range_to = range_data.get('to', data.get('range_to', ''))
|
||||
|
||||
# Reconstruct Reddit items
|
||||
reddit_items = []
|
||||
for r in data.get('reddit', []):
|
||||
eng = None
|
||||
if r.get('engagement'):
|
||||
eng = Engagement(**r['engagement'])
|
||||
comments = [Comment(**c) for c in r.get('top_comments', [])]
|
||||
subs = SubScores(**r.get('subs', {})) if r.get('subs') else SubScores()
|
||||
reddit_items.append(RedditItem(
|
||||
id=r['id'],
|
||||
title=r['title'],
|
||||
url=r['url'],
|
||||
subreddit=r['subreddit'],
|
||||
date=r.get('date'),
|
||||
date_confidence=r.get('date_confidence', 'low'),
|
||||
engagement=eng,
|
||||
top_comments=comments,
|
||||
comment_insights=r.get('comment_insights', []),
|
||||
relevance=r.get('relevance', 0.5),
|
||||
why_relevant=r.get('why_relevant', ''),
|
||||
subs=subs,
|
||||
score=r.get('score', 0),
|
||||
))
|
||||
|
||||
# Reconstruct X items
|
||||
x_items = []
|
||||
for x in data.get('x', []):
|
||||
eng = None
|
||||
if x.get('engagement'):
|
||||
eng = Engagement(**x['engagement'])
|
||||
subs = SubScores(**x.get('subs', {})) if x.get('subs') else SubScores()
|
||||
x_items.append(XItem(
|
||||
id=x['id'],
|
||||
text=x['text'],
|
||||
url=x['url'],
|
||||
author_handle=x['author_handle'],
|
||||
date=x.get('date'),
|
||||
date_confidence=x.get('date_confidence', 'low'),
|
||||
engagement=eng,
|
||||
relevance=x.get('relevance', 0.5),
|
||||
why_relevant=x.get('why_relevant', ''),
|
||||
subs=subs,
|
||||
score=x.get('score', 0),
|
||||
))
|
||||
|
||||
# Reconstruct Web items
|
||||
web_items = []
|
||||
for w in data.get('web', []):
|
||||
subs = SubScores(**w.get('subs', {})) if w.get('subs') else SubScores()
|
||||
web_items.append(WebSearchItem(
|
||||
id=w['id'],
|
||||
title=w['title'],
|
||||
url=w['url'],
|
||||
source_domain=w.get('source_domain', ''),
|
||||
snippet=w.get('snippet', ''),
|
||||
date=w.get('date'),
|
||||
date_confidence=w.get('date_confidence', 'low'),
|
||||
relevance=w.get('relevance', 0.5),
|
||||
why_relevant=w.get('why_relevant', ''),
|
||||
subs=subs,
|
||||
score=w.get('score', 0),
|
||||
))
|
||||
|
||||
return cls(
|
||||
topic=data['topic'],
|
||||
range_from=range_from,
|
||||
range_to=range_to,
|
||||
generated_at=data['generated_at'],
|
||||
mode=data['mode'],
|
||||
openai_model_used=data.get('openai_model_used'),
|
||||
xai_model_used=data.get('xai_model_used'),
|
||||
reddit=reddit_items,
|
||||
x=x_items,
|
||||
web=web_items,
|
||||
best_practices=data.get('best_practices', []),
|
||||
prompt_pack=data.get('prompt_pack', []),
|
||||
context_snippet_md=data.get('context_snippet_md', ''),
|
||||
reddit_error=data.get('reddit_error'),
|
||||
x_error=data.get('x_error'),
|
||||
web_error=data.get('web_error'),
|
||||
from_cache=data.get('from_cache', False),
|
||||
cache_age_hours=data.get('cache_age_hours'),
|
||||
)
|
||||
|
||||
|
||||
def create_report(
|
||||
topic: str,
|
||||
from_date: str,
|
||||
to_date: str,
|
||||
mode: str,
|
||||
openai_model: Optional[str] = None,
|
||||
xai_model: Optional[str] = None,
|
||||
) -> Report:
|
||||
"""Create a new report with metadata."""
|
||||
return Report(
|
||||
topic=topic,
|
||||
range_from=from_date,
|
||||
range_to=to_date,
|
||||
generated_at=datetime.now(timezone.utc).isoformat(),
|
||||
mode=mode,
|
||||
openai_model_used=openai_model,
|
||||
xai_model_used=xai_model,
|
||||
)
|
||||
311
skills/last30days/scripts/lib/score.py
Normal file
311
skills/last30days/scripts/lib/score.py
Normal file
@@ -0,0 +1,311 @@
|
||||
"""Popularity-aware scoring for last30days skill."""
|
||||
|
||||
import math
|
||||
from typing import List, Optional, Union
|
||||
|
||||
from . import dates, schema
|
||||
|
||||
# Score weights for Reddit/X (has engagement)
|
||||
WEIGHT_RELEVANCE = 0.45
|
||||
WEIGHT_RECENCY = 0.25
|
||||
WEIGHT_ENGAGEMENT = 0.30
|
||||
|
||||
# WebSearch weights (no engagement, reweighted to 100%)
|
||||
WEBSEARCH_WEIGHT_RELEVANCE = 0.55
|
||||
WEBSEARCH_WEIGHT_RECENCY = 0.45
|
||||
WEBSEARCH_SOURCE_PENALTY = 15 # Points deducted for lacking engagement
|
||||
|
||||
# WebSearch date confidence adjustments
|
||||
WEBSEARCH_VERIFIED_BONUS = 10 # Bonus for URL-verified recent date (high confidence)
|
||||
WEBSEARCH_NO_DATE_PENALTY = 20 # Heavy penalty for no date signals (low confidence)
|
||||
|
||||
# Default engagement score for unknown
|
||||
DEFAULT_ENGAGEMENT = 35
|
||||
UNKNOWN_ENGAGEMENT_PENALTY = 10
|
||||
|
||||
|
||||
def log1p_safe(x: Optional[int]) -> float:
|
||||
"""Safe log1p that handles None and negative values."""
|
||||
if x is None or x < 0:
|
||||
return 0.0
|
||||
return math.log1p(x)
|
||||
|
||||
|
||||
def compute_reddit_engagement_raw(engagement: Optional[schema.Engagement]) -> Optional[float]:
|
||||
"""Compute raw engagement score for Reddit item.
|
||||
|
||||
Formula: 0.55*log1p(score) + 0.40*log1p(num_comments) + 0.05*(upvote_ratio*10)
|
||||
"""
|
||||
if engagement is None:
|
||||
return None
|
||||
|
||||
if engagement.score is None and engagement.num_comments is None:
|
||||
return None
|
||||
|
||||
score = log1p_safe(engagement.score)
|
||||
comments = log1p_safe(engagement.num_comments)
|
||||
ratio = (engagement.upvote_ratio or 0.5) * 10
|
||||
|
||||
return 0.55 * score + 0.40 * comments + 0.05 * ratio
|
||||
|
||||
|
||||
def compute_x_engagement_raw(engagement: Optional[schema.Engagement]) -> Optional[float]:
|
||||
"""Compute raw engagement score for X item.
|
||||
|
||||
Formula: 0.55*log1p(likes) + 0.25*log1p(reposts) + 0.15*log1p(replies) + 0.05*log1p(quotes)
|
||||
"""
|
||||
if engagement is None:
|
||||
return None
|
||||
|
||||
if engagement.likes is None and engagement.reposts is None:
|
||||
return None
|
||||
|
||||
likes = log1p_safe(engagement.likes)
|
||||
reposts = log1p_safe(engagement.reposts)
|
||||
replies = log1p_safe(engagement.replies)
|
||||
quotes = log1p_safe(engagement.quotes)
|
||||
|
||||
return 0.55 * likes + 0.25 * reposts + 0.15 * replies + 0.05 * quotes
|
||||
|
||||
|
||||
def normalize_to_100(values: List[float], default: float = 50) -> List[float]:
|
||||
"""Normalize a list of values to 0-100 scale.
|
||||
|
||||
Args:
|
||||
values: Raw values (None values are preserved)
|
||||
default: Default value for None entries
|
||||
|
||||
Returns:
|
||||
Normalized values
|
||||
"""
|
||||
# Filter out None
|
||||
valid = [v for v in values if v is not None]
|
||||
if not valid:
|
||||
return [default if v is None else 50 for v in values]
|
||||
|
||||
min_val = min(valid)
|
||||
max_val = max(valid)
|
||||
range_val = max_val - min_val
|
||||
|
||||
if range_val == 0:
|
||||
return [50 if v is None else 50 for v in values]
|
||||
|
||||
result = []
|
||||
for v in values:
|
||||
if v is None:
|
||||
result.append(None)
|
||||
else:
|
||||
normalized = ((v - min_val) / range_val) * 100
|
||||
result.append(normalized)
|
||||
|
||||
return result
|
||||
|
||||
|
||||
def score_reddit_items(items: List[schema.RedditItem]) -> List[schema.RedditItem]:
|
||||
"""Compute scores for Reddit items.
|
||||
|
||||
Args:
|
||||
items: List of Reddit items
|
||||
|
||||
Returns:
|
||||
Items with updated scores
|
||||
"""
|
||||
if not items:
|
||||
return items
|
||||
|
||||
# Compute raw engagement scores
|
||||
eng_raw = [compute_reddit_engagement_raw(item.engagement) for item in items]
|
||||
|
||||
# Normalize engagement to 0-100
|
||||
eng_normalized = normalize_to_100(eng_raw)
|
||||
|
||||
for i, item in enumerate(items):
|
||||
# Relevance subscore (model-provided, convert to 0-100)
|
||||
rel_score = int(item.relevance * 100)
|
||||
|
||||
# Recency subscore
|
||||
rec_score = dates.recency_score(item.date)
|
||||
|
||||
# Engagement subscore
|
||||
if eng_normalized[i] is not None:
|
||||
eng_score = int(eng_normalized[i])
|
||||
else:
|
||||
eng_score = DEFAULT_ENGAGEMENT
|
||||
|
||||
# Store subscores
|
||||
item.subs = schema.SubScores(
|
||||
relevance=rel_score,
|
||||
recency=rec_score,
|
||||
engagement=eng_score,
|
||||
)
|
||||
|
||||
# Compute overall score
|
||||
overall = (
|
||||
WEIGHT_RELEVANCE * rel_score +
|
||||
WEIGHT_RECENCY * rec_score +
|
||||
WEIGHT_ENGAGEMENT * eng_score
|
||||
)
|
||||
|
||||
# Apply penalty for unknown engagement
|
||||
if eng_raw[i] is None:
|
||||
overall -= UNKNOWN_ENGAGEMENT_PENALTY
|
||||
|
||||
# Apply penalty for low date confidence
|
||||
if item.date_confidence == "low":
|
||||
overall -= 10
|
||||
elif item.date_confidence == "med":
|
||||
overall -= 5
|
||||
|
||||
item.score = max(0, min(100, int(overall)))
|
||||
|
||||
return items
|
||||
|
||||
|
||||
def score_x_items(items: List[schema.XItem]) -> List[schema.XItem]:
|
||||
"""Compute scores for X items.
|
||||
|
||||
Args:
|
||||
items: List of X items
|
||||
|
||||
Returns:
|
||||
Items with updated scores
|
||||
"""
|
||||
if not items:
|
||||
return items
|
||||
|
||||
# Compute raw engagement scores
|
||||
eng_raw = [compute_x_engagement_raw(item.engagement) for item in items]
|
||||
|
||||
# Normalize engagement to 0-100
|
||||
eng_normalized = normalize_to_100(eng_raw)
|
||||
|
||||
for i, item in enumerate(items):
|
||||
# Relevance subscore (model-provided, convert to 0-100)
|
||||
rel_score = int(item.relevance * 100)
|
||||
|
||||
# Recency subscore
|
||||
rec_score = dates.recency_score(item.date)
|
||||
|
||||
# Engagement subscore
|
||||
if eng_normalized[i] is not None:
|
||||
eng_score = int(eng_normalized[i])
|
||||
else:
|
||||
eng_score = DEFAULT_ENGAGEMENT
|
||||
|
||||
# Store subscores
|
||||
item.subs = schema.SubScores(
|
||||
relevance=rel_score,
|
||||
recency=rec_score,
|
||||
engagement=eng_score,
|
||||
)
|
||||
|
||||
# Compute overall score
|
||||
overall = (
|
||||
WEIGHT_RELEVANCE * rel_score +
|
||||
WEIGHT_RECENCY * rec_score +
|
||||
WEIGHT_ENGAGEMENT * eng_score
|
||||
)
|
||||
|
||||
# Apply penalty for unknown engagement
|
||||
if eng_raw[i] is None:
|
||||
overall -= UNKNOWN_ENGAGEMENT_PENALTY
|
||||
|
||||
# Apply penalty for low date confidence
|
||||
if item.date_confidence == "low":
|
||||
overall -= 10
|
||||
elif item.date_confidence == "med":
|
||||
overall -= 5
|
||||
|
||||
item.score = max(0, min(100, int(overall)))
|
||||
|
||||
return items
|
||||
|
||||
|
||||
def score_websearch_items(items: List[schema.WebSearchItem]) -> List[schema.WebSearchItem]:
|
||||
"""Compute scores for WebSearch items WITHOUT engagement metrics.
|
||||
|
||||
Uses reweighted formula: 55% relevance + 45% recency - 15pt source penalty.
|
||||
This ensures WebSearch items rank below comparable Reddit/X items.
|
||||
|
||||
Date confidence adjustments:
|
||||
- High confidence (URL-verified date): +10 bonus
|
||||
- Med confidence (snippet-extracted date): no change
|
||||
- Low confidence (no date signals): -20 penalty
|
||||
|
||||
Args:
|
||||
items: List of WebSearch items
|
||||
|
||||
Returns:
|
||||
Items with updated scores
|
||||
"""
|
||||
if not items:
|
||||
return items
|
||||
|
||||
for item in items:
|
||||
# Relevance subscore (model-provided, convert to 0-100)
|
||||
rel_score = int(item.relevance * 100)
|
||||
|
||||
# Recency subscore
|
||||
rec_score = dates.recency_score(item.date)
|
||||
|
||||
# Store subscores (engagement is 0 for WebSearch - no data)
|
||||
item.subs = schema.SubScores(
|
||||
relevance=rel_score,
|
||||
recency=rec_score,
|
||||
engagement=0, # Explicitly zero - no engagement data available
|
||||
)
|
||||
|
||||
# Compute overall score using WebSearch weights
|
||||
overall = (
|
||||
WEBSEARCH_WEIGHT_RELEVANCE * rel_score +
|
||||
WEBSEARCH_WEIGHT_RECENCY * rec_score
|
||||
)
|
||||
|
||||
# Apply source penalty (WebSearch < Reddit/X for same relevance/recency)
|
||||
overall -= WEBSEARCH_SOURCE_PENALTY
|
||||
|
||||
# Apply date confidence adjustments
|
||||
# High confidence (URL-verified): reward with bonus
|
||||
# Med confidence (snippet-extracted): neutral
|
||||
# Low confidence (no date signals): heavy penalty
|
||||
if item.date_confidence == "high":
|
||||
overall += WEBSEARCH_VERIFIED_BONUS # Reward verified recent dates
|
||||
elif item.date_confidence == "low":
|
||||
overall -= WEBSEARCH_NO_DATE_PENALTY # Heavy penalty for unknown
|
||||
|
||||
item.score = max(0, min(100, int(overall)))
|
||||
|
||||
return items
|
||||
|
||||
|
||||
def sort_items(items: List[Union[schema.RedditItem, schema.XItem, schema.WebSearchItem]]) -> List:
|
||||
"""Sort items by score (descending), then date, then source priority.
|
||||
|
||||
Args:
|
||||
items: List of items to sort
|
||||
|
||||
Returns:
|
||||
Sorted items
|
||||
"""
|
||||
def sort_key(item):
|
||||
# Primary: score descending (negate for descending)
|
||||
score = -item.score
|
||||
|
||||
# Secondary: date descending (recent first)
|
||||
date = item.date or "0000-00-00"
|
||||
date_key = -int(date.replace("-", ""))
|
||||
|
||||
# Tertiary: source priority (Reddit > X > WebSearch)
|
||||
if isinstance(item, schema.RedditItem):
|
||||
source_priority = 0
|
||||
elif isinstance(item, schema.XItem):
|
||||
source_priority = 1
|
||||
else: # WebSearchItem
|
||||
source_priority = 2
|
||||
|
||||
# Quaternary: title/text for stability
|
||||
text = getattr(item, "title", "") or getattr(item, "text", "")
|
||||
|
||||
return (score, date_key, source_priority, text)
|
||||
|
||||
return sorted(items, key=sort_key)
|
||||
324
skills/last30days/scripts/lib/ui.py
Normal file
324
skills/last30days/scripts/lib/ui.py
Normal file
@@ -0,0 +1,324 @@
|
||||
"""Terminal UI utilities for last30days skill."""
|
||||
|
||||
import os
|
||||
import sys
|
||||
import time
|
||||
import threading
|
||||
import random
|
||||
from typing import Optional
|
||||
|
||||
# Check if we're in a real terminal (not captured by Claude Code)
|
||||
IS_TTY = sys.stderr.isatty()
|
||||
|
||||
# ANSI color codes
|
||||
class Colors:
|
||||
PURPLE = '\033[95m'
|
||||
BLUE = '\033[94m'
|
||||
CYAN = '\033[96m'
|
||||
GREEN = '\033[92m'
|
||||
YELLOW = '\033[93m'
|
||||
RED = '\033[91m'
|
||||
BOLD = '\033[1m'
|
||||
DIM = '\033[2m'
|
||||
RESET = '\033[0m'
|
||||
|
||||
|
||||
BANNER = f"""{Colors.PURPLE}{Colors.BOLD}
|
||||
██╗ █████╗ ███████╗████████╗██████╗ ██████╗ ██████╗ █████╗ ██╗ ██╗███████╗
|
||||
██║ ██╔══██╗██╔════╝╚══██╔══╝╚════██╗██╔═████╗██╔══██╗██╔══██╗╚██╗ ██╔╝██╔════╝
|
||||
██║ ███████║███████╗ ██║ █████╔╝██║██╔██║██║ ██║███████║ ╚████╔╝ ███████╗
|
||||
██║ ██╔══██║╚════██║ ██║ ╚═══██╗████╔╝██║██║ ██║██╔══██║ ╚██╔╝ ╚════██║
|
||||
███████╗██║ ██║███████║ ██║ ██████╔╝╚██████╔╝██████╔╝██║ ██║ ██║ ███████║
|
||||
╚══════╝╚═╝ ╚═╝╚══════╝ ╚═╝ ╚═════╝ ╚═════╝ ╚═════╝ ╚═╝ ╚═╝ ╚═╝ ╚══════╝
|
||||
{Colors.RESET}{Colors.DIM} 30 days of research. 30 seconds of work.{Colors.RESET}
|
||||
"""
|
||||
|
||||
MINI_BANNER = f"""{Colors.PURPLE}{Colors.BOLD}/last30days{Colors.RESET} {Colors.DIM}· researching...{Colors.RESET}"""
|
||||
|
||||
# Fun status messages for each phase
|
||||
REDDIT_MESSAGES = [
|
||||
"Diving into Reddit threads...",
|
||||
"Scanning subreddits for gold...",
|
||||
"Reading what Redditors are saying...",
|
||||
"Exploring the front page of the internet...",
|
||||
"Finding the good discussions...",
|
||||
"Upvoting mentally...",
|
||||
"Scrolling through comments...",
|
||||
]
|
||||
|
||||
X_MESSAGES = [
|
||||
"Checking what X is buzzing about...",
|
||||
"Reading the timeline...",
|
||||
"Finding the hot takes...",
|
||||
"Scanning tweets and threads...",
|
||||
"Discovering trending insights...",
|
||||
"Following the conversation...",
|
||||
"Reading between the posts...",
|
||||
]
|
||||
|
||||
ENRICHING_MESSAGES = [
|
||||
"Getting the juicy details...",
|
||||
"Fetching engagement metrics...",
|
||||
"Reading top comments...",
|
||||
"Extracting insights...",
|
||||
"Analyzing discussions...",
|
||||
]
|
||||
|
||||
PROCESSING_MESSAGES = [
|
||||
"Crunching the data...",
|
||||
"Scoring and ranking...",
|
||||
"Finding patterns...",
|
||||
"Removing duplicates...",
|
||||
"Organizing findings...",
|
||||
]
|
||||
|
||||
WEB_ONLY_MESSAGES = [
|
||||
"Searching the web...",
|
||||
"Finding blogs and docs...",
|
||||
"Crawling news sites...",
|
||||
"Discovering tutorials...",
|
||||
]
|
||||
|
||||
# Promo message for users without API keys
|
||||
PROMO_MESSAGE = f"""
|
||||
{Colors.YELLOW}{Colors.BOLD}━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━{Colors.RESET}
|
||||
{Colors.YELLOW}⚡ UNLOCK THE FULL POWER OF /last30days{Colors.RESET}
|
||||
|
||||
{Colors.DIM}Right now you're using web search only. Add API keys to unlock:{Colors.RESET}
|
||||
|
||||
{Colors.YELLOW}🟠 Reddit{Colors.RESET} - Real upvotes, comments, and community insights
|
||||
└─ Add OPENAI_API_KEY (uses OpenAI's web_search for Reddit)
|
||||
|
||||
{Colors.CYAN}🔵 X (Twitter){Colors.RESET} - Real-time posts, likes, reposts from creators
|
||||
└─ Add XAI_API_KEY (uses xAI's live X search)
|
||||
|
||||
{Colors.DIM}Setup:{Colors.RESET} Edit {Colors.BOLD}~/.config/last30days/.env{Colors.RESET}
|
||||
{Colors.YELLOW}{Colors.BOLD}━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━{Colors.RESET}
|
||||
"""
|
||||
|
||||
PROMO_MESSAGE_PLAIN = """
|
||||
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
|
||||
⚡ UNLOCK THE FULL POWER OF /last30days
|
||||
|
||||
Right now you're using web search only. Add API keys to unlock:
|
||||
|
||||
🟠 Reddit - Real upvotes, comments, and community insights
|
||||
└─ Add OPENAI_API_KEY (uses OpenAI's web_search for Reddit)
|
||||
|
||||
🔵 X (Twitter) - Real-time posts, likes, reposts from creators
|
||||
└─ Add XAI_API_KEY (uses xAI's live X search)
|
||||
|
||||
Setup: Edit ~/.config/last30days/.env
|
||||
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
|
||||
"""
|
||||
|
||||
# Shorter promo for single missing key
|
||||
PROMO_SINGLE_KEY = {
|
||||
"reddit": f"""
|
||||
{Colors.DIM}💡 Tip: Add {Colors.YELLOW}OPENAI_API_KEY{Colors.RESET}{Colors.DIM} to ~/.config/last30days/.env for Reddit data with real engagement metrics!{Colors.RESET}
|
||||
""",
|
||||
"x": f"""
|
||||
{Colors.DIM}💡 Tip: Add {Colors.CYAN}XAI_API_KEY{Colors.RESET}{Colors.DIM} to ~/.config/last30days/.env for X/Twitter data with real likes & reposts!{Colors.RESET}
|
||||
""",
|
||||
}
|
||||
|
||||
PROMO_SINGLE_KEY_PLAIN = {
|
||||
"reddit": "\n💡 Tip: Add OPENAI_API_KEY to ~/.config/last30days/.env for Reddit data with real engagement metrics!\n",
|
||||
"x": "\n💡 Tip: Add XAI_API_KEY to ~/.config/last30days/.env for X/Twitter data with real likes & reposts!\n",
|
||||
}
|
||||
|
||||
# Spinner frames
|
||||
SPINNER_FRAMES = ['⠋', '⠙', '⠹', '⠸', '⠼', '⠴', '⠦', '⠧', '⠇', '⠏']
|
||||
DOTS_FRAMES = [' ', '. ', '.. ', '...']
|
||||
|
||||
|
||||
class Spinner:
|
||||
"""Animated spinner for long-running operations."""
|
||||
|
||||
def __init__(self, message: str = "Working", color: str = Colors.CYAN):
|
||||
self.message = message
|
||||
self.color = color
|
||||
self.running = False
|
||||
self.thread: Optional[threading.Thread] = None
|
||||
self.frame_idx = 0
|
||||
self.shown_static = False
|
||||
|
||||
def _spin(self):
|
||||
while self.running:
|
||||
frame = SPINNER_FRAMES[self.frame_idx % len(SPINNER_FRAMES)]
|
||||
sys.stderr.write(f"\r{self.color}{frame}{Colors.RESET} {self.message} ")
|
||||
sys.stderr.flush()
|
||||
self.frame_idx += 1
|
||||
time.sleep(0.08)
|
||||
|
||||
def start(self):
|
||||
self.running = True
|
||||
if IS_TTY:
|
||||
# Real terminal - animate
|
||||
self.thread = threading.Thread(target=self._spin, daemon=True)
|
||||
self.thread.start()
|
||||
else:
|
||||
# Not a TTY (Claude Code) - just print once
|
||||
if not self.shown_static:
|
||||
sys.stderr.write(f"⏳ {self.message}\n")
|
||||
sys.stderr.flush()
|
||||
self.shown_static = True
|
||||
|
||||
def update(self, message: str):
|
||||
self.message = message
|
||||
if not IS_TTY and not self.shown_static:
|
||||
# Print update in non-TTY mode
|
||||
sys.stderr.write(f"⏳ {message}\n")
|
||||
sys.stderr.flush()
|
||||
|
||||
def stop(self, final_message: str = ""):
|
||||
self.running = False
|
||||
if self.thread:
|
||||
self.thread.join(timeout=0.2)
|
||||
if IS_TTY:
|
||||
# Clear the line in real terminal
|
||||
sys.stderr.write("\r" + " " * 80 + "\r")
|
||||
if final_message:
|
||||
sys.stderr.write(f"✓ {final_message}\n")
|
||||
sys.stderr.flush()
|
||||
|
||||
|
||||
class ProgressDisplay:
|
||||
"""Progress display for research phases."""
|
||||
|
||||
def __init__(self, topic: str, show_banner: bool = True):
|
||||
self.topic = topic
|
||||
self.spinner: Optional[Spinner] = None
|
||||
self.start_time = time.time()
|
||||
|
||||
if show_banner:
|
||||
self._show_banner()
|
||||
|
||||
def _show_banner(self):
|
||||
if IS_TTY:
|
||||
sys.stderr.write(MINI_BANNER + "\n")
|
||||
sys.stderr.write(f"{Colors.DIM}Topic: {Colors.RESET}{Colors.BOLD}{self.topic}{Colors.RESET}\n\n")
|
||||
else:
|
||||
# Simple text for non-TTY
|
||||
sys.stderr.write(f"/last30days · researching: {self.topic}\n")
|
||||
sys.stderr.flush()
|
||||
|
||||
def start_reddit(self):
|
||||
msg = random.choice(REDDIT_MESSAGES)
|
||||
self.spinner = Spinner(f"{Colors.YELLOW}Reddit{Colors.RESET} {msg}", Colors.YELLOW)
|
||||
self.spinner.start()
|
||||
|
||||
def end_reddit(self, count: int):
|
||||
if self.spinner:
|
||||
self.spinner.stop(f"{Colors.YELLOW}Reddit{Colors.RESET} Found {count} threads")
|
||||
|
||||
def start_reddit_enrich(self, current: int, total: int):
|
||||
if self.spinner:
|
||||
self.spinner.stop()
|
||||
msg = random.choice(ENRICHING_MESSAGES)
|
||||
self.spinner = Spinner(f"{Colors.YELLOW}Reddit{Colors.RESET} [{current}/{total}] {msg}", Colors.YELLOW)
|
||||
self.spinner.start()
|
||||
|
||||
def update_reddit_enrich(self, current: int, total: int):
|
||||
if self.spinner:
|
||||
msg = random.choice(ENRICHING_MESSAGES)
|
||||
self.spinner.update(f"{Colors.YELLOW}Reddit{Colors.RESET} [{current}/{total}] {msg}")
|
||||
|
||||
def end_reddit_enrich(self):
|
||||
if self.spinner:
|
||||
self.spinner.stop(f"{Colors.YELLOW}Reddit{Colors.RESET} Enriched with engagement data")
|
||||
|
||||
def start_x(self):
|
||||
msg = random.choice(X_MESSAGES)
|
||||
self.spinner = Spinner(f"{Colors.CYAN}X{Colors.RESET} {msg}", Colors.CYAN)
|
||||
self.spinner.start()
|
||||
|
||||
def end_x(self, count: int):
|
||||
if self.spinner:
|
||||
self.spinner.stop(f"{Colors.CYAN}X{Colors.RESET} Found {count} posts")
|
||||
|
||||
def start_processing(self):
|
||||
msg = random.choice(PROCESSING_MESSAGES)
|
||||
self.spinner = Spinner(f"{Colors.PURPLE}Processing{Colors.RESET} {msg}", Colors.PURPLE)
|
||||
self.spinner.start()
|
||||
|
||||
def end_processing(self):
|
||||
if self.spinner:
|
||||
self.spinner.stop()
|
||||
|
||||
def show_complete(self, reddit_count: int, x_count: int):
|
||||
elapsed = time.time() - self.start_time
|
||||
if IS_TTY:
|
||||
sys.stderr.write(f"\n{Colors.GREEN}{Colors.BOLD}✓ Research complete{Colors.RESET} ")
|
||||
sys.stderr.write(f"{Colors.DIM}({elapsed:.1f}s){Colors.RESET}\n")
|
||||
sys.stderr.write(f" {Colors.YELLOW}Reddit:{Colors.RESET} {reddit_count} threads ")
|
||||
sys.stderr.write(f"{Colors.CYAN}X:{Colors.RESET} {x_count} posts\n\n")
|
||||
else:
|
||||
sys.stderr.write(f"✓ Research complete ({elapsed:.1f}s) - Reddit: {reddit_count} threads, X: {x_count} posts\n")
|
||||
sys.stderr.flush()
|
||||
|
||||
def show_cached(self, age_hours: float = None):
|
||||
if age_hours is not None:
|
||||
age_str = f" ({age_hours:.1f}h old)"
|
||||
else:
|
||||
age_str = ""
|
||||
sys.stderr.write(f"{Colors.GREEN}⚡{Colors.RESET} {Colors.DIM}Using cached results{age_str} - use --refresh for fresh data{Colors.RESET}\n\n")
|
||||
sys.stderr.flush()
|
||||
|
||||
def show_error(self, message: str):
|
||||
sys.stderr.write(f"{Colors.RED}✗ Error:{Colors.RESET} {message}\n")
|
||||
sys.stderr.flush()
|
||||
|
||||
def start_web_only(self):
|
||||
"""Show web-only mode indicator."""
|
||||
msg = random.choice(WEB_ONLY_MESSAGES)
|
||||
self.spinner = Spinner(f"{Colors.GREEN}Web{Colors.RESET} {msg}", Colors.GREEN)
|
||||
self.spinner.start()
|
||||
|
||||
def end_web_only(self):
|
||||
"""End web-only spinner."""
|
||||
if self.spinner:
|
||||
self.spinner.stop(f"{Colors.GREEN}Web{Colors.RESET} Claude will search the web")
|
||||
|
||||
def show_web_only_complete(self):
|
||||
"""Show completion for web-only mode."""
|
||||
elapsed = time.time() - self.start_time
|
||||
if IS_TTY:
|
||||
sys.stderr.write(f"\n{Colors.GREEN}{Colors.BOLD}✓ Ready for web search{Colors.RESET} ")
|
||||
sys.stderr.write(f"{Colors.DIM}({elapsed:.1f}s){Colors.RESET}\n")
|
||||
sys.stderr.write(f" {Colors.GREEN}Web:{Colors.RESET} Claude will search blogs, docs & news\n\n")
|
||||
else:
|
||||
sys.stderr.write(f"✓ Ready for web search ({elapsed:.1f}s)\n")
|
||||
sys.stderr.flush()
|
||||
|
||||
def show_promo(self, missing: str = "both"):
|
||||
"""Show promotional message for missing API keys.
|
||||
|
||||
Args:
|
||||
missing: 'both', 'reddit', or 'x' - which keys are missing
|
||||
"""
|
||||
if missing == "both":
|
||||
if IS_TTY:
|
||||
sys.stderr.write(PROMO_MESSAGE)
|
||||
else:
|
||||
sys.stderr.write(PROMO_MESSAGE_PLAIN)
|
||||
elif missing in PROMO_SINGLE_KEY:
|
||||
if IS_TTY:
|
||||
sys.stderr.write(PROMO_SINGLE_KEY[missing])
|
||||
else:
|
||||
sys.stderr.write(PROMO_SINGLE_KEY_PLAIN[missing])
|
||||
sys.stderr.flush()
|
||||
|
||||
|
||||
def print_phase(phase: str, message: str):
|
||||
"""Print a phase message."""
|
||||
colors = {
|
||||
"reddit": Colors.YELLOW,
|
||||
"x": Colors.CYAN,
|
||||
"process": Colors.PURPLE,
|
||||
"done": Colors.GREEN,
|
||||
"error": Colors.RED,
|
||||
}
|
||||
color = colors.get(phase, Colors.RESET)
|
||||
sys.stderr.write(f"{color}▸{Colors.RESET} {message}\n")
|
||||
sys.stderr.flush()
|
||||
401
skills/last30days/scripts/lib/websearch.py
Normal file
401
skills/last30days/scripts/lib/websearch.py
Normal file
@@ -0,0 +1,401 @@
|
||||
"""WebSearch module for last30days skill.
|
||||
|
||||
NOTE: WebSearch uses Claude's built-in WebSearch tool, which runs INSIDE Claude Code.
|
||||
Unlike Reddit/X which use external APIs, WebSearch results are obtained by Claude
|
||||
directly and passed to this module for normalization and scoring.
|
||||
|
||||
The typical flow is:
|
||||
1. Claude invokes WebSearch tool with the topic
|
||||
2. Claude passes results to parse_websearch_results()
|
||||
3. Results are normalized into WebSearchItem objects
|
||||
"""
|
||||
|
||||
import re
|
||||
from datetime import datetime, timedelta
|
||||
from typing import Any, Dict, List, Optional, Tuple
|
||||
from urllib.parse import urlparse
|
||||
|
||||
from . import schema
|
||||
|
||||
|
||||
# Month name mappings for date parsing
|
||||
MONTH_MAP = {
|
||||
"jan": 1, "january": 1,
|
||||
"feb": 2, "february": 2,
|
||||
"mar": 3, "march": 3,
|
||||
"apr": 4, "april": 4,
|
||||
"may": 5,
|
||||
"jun": 6, "june": 6,
|
||||
"jul": 7, "july": 7,
|
||||
"aug": 8, "august": 8,
|
||||
"sep": 9, "sept": 9, "september": 9,
|
||||
"oct": 10, "october": 10,
|
||||
"nov": 11, "november": 11,
|
||||
"dec": 12, "december": 12,
|
||||
}
|
||||
|
||||
|
||||
def extract_date_from_url(url: str) -> Optional[str]:
|
||||
"""Try to extract a date from URL path.
|
||||
|
||||
Many sites embed dates in URLs like:
|
||||
- /2026/01/24/article-title
|
||||
- /2026-01-24/article
|
||||
- /blog/20260124/title
|
||||
|
||||
Args:
|
||||
url: URL to parse
|
||||
|
||||
Returns:
|
||||
Date string in YYYY-MM-DD format, or None
|
||||
"""
|
||||
# Pattern 1: /YYYY/MM/DD/ (most common)
|
||||
match = re.search(r'/(\d{4})/(\d{2})/(\d{2})/', url)
|
||||
if match:
|
||||
year, month, day = match.groups()
|
||||
if 2020 <= int(year) <= 2030 and 1 <= int(month) <= 12 and 1 <= int(day) <= 31:
|
||||
return f"{year}-{month}-{day}"
|
||||
|
||||
# Pattern 2: /YYYY-MM-DD/ or /YYYY-MM-DD-
|
||||
match = re.search(r'/(\d{4})-(\d{2})-(\d{2})[-/]', url)
|
||||
if match:
|
||||
year, month, day = match.groups()
|
||||
if 2020 <= int(year) <= 2030 and 1 <= int(month) <= 12 and 1 <= int(day) <= 31:
|
||||
return f"{year}-{month}-{day}"
|
||||
|
||||
# Pattern 3: /YYYYMMDD/ (compact)
|
||||
match = re.search(r'/(\d{4})(\d{2})(\d{2})/', url)
|
||||
if match:
|
||||
year, month, day = match.groups()
|
||||
if 2020 <= int(year) <= 2030 and 1 <= int(month) <= 12 and 1 <= int(day) <= 31:
|
||||
return f"{year}-{month}-{day}"
|
||||
|
||||
return None
|
||||
|
||||
|
||||
def extract_date_from_snippet(text: str) -> Optional[str]:
|
||||
"""Try to extract a date from text snippet or title.
|
||||
|
||||
Looks for patterns like:
|
||||
- January 24, 2026 or Jan 24, 2026
|
||||
- 24 January 2026
|
||||
- 2026-01-24
|
||||
- "3 days ago", "yesterday", "last week"
|
||||
|
||||
Args:
|
||||
text: Text to parse
|
||||
|
||||
Returns:
|
||||
Date string in YYYY-MM-DD format, or None
|
||||
"""
|
||||
if not text:
|
||||
return None
|
||||
|
||||
text_lower = text.lower()
|
||||
|
||||
# Pattern 1: Month DD, YYYY (e.g., "January 24, 2026")
|
||||
match = re.search(
|
||||
r'\b(jan(?:uary)?|feb(?:ruary)?|mar(?:ch)?|apr(?:il)?|may|jun(?:e)?|'
|
||||
r'jul(?:y)?|aug(?:ust)?|sep(?:t(?:ember)?)?|oct(?:ober)?|nov(?:ember)?|dec(?:ember)?)'
|
||||
r'\s+(\d{1,2})(?:st|nd|rd|th)?,?\s*(\d{4})\b',
|
||||
text_lower
|
||||
)
|
||||
if match:
|
||||
month_str, day, year = match.groups()
|
||||
month = MONTH_MAP.get(month_str[:3])
|
||||
if month and 2020 <= int(year) <= 2030 and 1 <= int(day) <= 31:
|
||||
return f"{year}-{month:02d}-{int(day):02d}"
|
||||
|
||||
# Pattern 2: DD Month YYYY (e.g., "24 January 2026")
|
||||
match = re.search(
|
||||
r'\b(\d{1,2})(?:st|nd|rd|th)?\s+'
|
||||
r'(jan(?:uary)?|feb(?:ruary)?|mar(?:ch)?|apr(?:il)?|may|jun(?:e)?|'
|
||||
r'jul(?:y)?|aug(?:ust)?|sep(?:t(?:ember)?)?|oct(?:ober)?|nov(?:ember)?|dec(?:ember)?)'
|
||||
r'\s+(\d{4})\b',
|
||||
text_lower
|
||||
)
|
||||
if match:
|
||||
day, month_str, year = match.groups()
|
||||
month = MONTH_MAP.get(month_str[:3])
|
||||
if month and 2020 <= int(year) <= 2030 and 1 <= int(day) <= 31:
|
||||
return f"{year}-{month:02d}-{int(day):02d}"
|
||||
|
||||
# Pattern 3: YYYY-MM-DD (ISO format)
|
||||
match = re.search(r'\b(\d{4})-(\d{2})-(\d{2})\b', text)
|
||||
if match:
|
||||
year, month, day = match.groups()
|
||||
if 2020 <= int(year) <= 2030 and 1 <= int(month) <= 12 and 1 <= int(day) <= 31:
|
||||
return f"{year}-{month}-{day}"
|
||||
|
||||
# Pattern 4: Relative dates ("3 days ago", "yesterday", etc.)
|
||||
today = datetime.now()
|
||||
|
||||
if "yesterday" in text_lower:
|
||||
date = today - timedelta(days=1)
|
||||
return date.strftime("%Y-%m-%d")
|
||||
|
||||
if "today" in text_lower:
|
||||
return today.strftime("%Y-%m-%d")
|
||||
|
||||
# "N days ago"
|
||||
match = re.search(r'\b(\d+)\s*days?\s*ago\b', text_lower)
|
||||
if match:
|
||||
days = int(match.group(1))
|
||||
if days <= 60: # Reasonable range
|
||||
date = today - timedelta(days=days)
|
||||
return date.strftime("%Y-%m-%d")
|
||||
|
||||
# "N hours ago" -> today
|
||||
match = re.search(r'\b(\d+)\s*hours?\s*ago\b', text_lower)
|
||||
if match:
|
||||
return today.strftime("%Y-%m-%d")
|
||||
|
||||
# "last week" -> ~7 days ago
|
||||
if "last week" in text_lower:
|
||||
date = today - timedelta(days=7)
|
||||
return date.strftime("%Y-%m-%d")
|
||||
|
||||
# "this week" -> ~3 days ago (middle of week)
|
||||
if "this week" in text_lower:
|
||||
date = today - timedelta(days=3)
|
||||
return date.strftime("%Y-%m-%d")
|
||||
|
||||
return None
|
||||
|
||||
|
||||
def extract_date_signals(
|
||||
url: str,
|
||||
snippet: str,
|
||||
title: str,
|
||||
) -> Tuple[Optional[str], str]:
|
||||
"""Extract date from any available signal.
|
||||
|
||||
Tries URL first (most reliable), then snippet, then title.
|
||||
|
||||
Args:
|
||||
url: Page URL
|
||||
snippet: Page snippet/description
|
||||
title: Page title
|
||||
|
||||
Returns:
|
||||
Tuple of (date_string, confidence)
|
||||
- date from URL: 'high' confidence
|
||||
- date from snippet/title: 'med' confidence
|
||||
- no date found: None, 'low' confidence
|
||||
"""
|
||||
# Try URL first (most reliable)
|
||||
url_date = extract_date_from_url(url)
|
||||
if url_date:
|
||||
return url_date, "high"
|
||||
|
||||
# Try snippet
|
||||
snippet_date = extract_date_from_snippet(snippet)
|
||||
if snippet_date:
|
||||
return snippet_date, "med"
|
||||
|
||||
# Try title
|
||||
title_date = extract_date_from_snippet(title)
|
||||
if title_date:
|
||||
return title_date, "med"
|
||||
|
||||
return None, "low"
|
||||
|
||||
|
||||
# Domains to exclude (Reddit and X are handled separately)
|
||||
EXCLUDED_DOMAINS = {
|
||||
"reddit.com",
|
||||
"www.reddit.com",
|
||||
"old.reddit.com",
|
||||
"twitter.com",
|
||||
"www.twitter.com",
|
||||
"x.com",
|
||||
"www.x.com",
|
||||
"mobile.twitter.com",
|
||||
}
|
||||
|
||||
|
||||
def extract_domain(url: str) -> str:
|
||||
"""Extract the domain from a URL.
|
||||
|
||||
Args:
|
||||
url: Full URL
|
||||
|
||||
Returns:
|
||||
Domain string (e.g., "medium.com")
|
||||
"""
|
||||
try:
|
||||
parsed = urlparse(url)
|
||||
domain = parsed.netloc.lower()
|
||||
# Remove www. prefix for cleaner display
|
||||
if domain.startswith("www."):
|
||||
domain = domain[4:]
|
||||
return domain
|
||||
except Exception:
|
||||
return ""
|
||||
|
||||
|
||||
def is_excluded_domain(url: str) -> bool:
|
||||
"""Check if URL is from an excluded domain (Reddit/X).
|
||||
|
||||
Args:
|
||||
url: URL to check
|
||||
|
||||
Returns:
|
||||
True if URL should be excluded
|
||||
"""
|
||||
try:
|
||||
parsed = urlparse(url)
|
||||
domain = parsed.netloc.lower()
|
||||
return domain in EXCLUDED_DOMAINS
|
||||
except Exception:
|
||||
return False
|
||||
|
||||
|
||||
def parse_websearch_results(
|
||||
results: List[Dict[str, Any]],
|
||||
topic: str,
|
||||
from_date: str = "",
|
||||
to_date: str = "",
|
||||
) -> List[Dict[str, Any]]:
|
||||
"""Parse WebSearch results into normalized format.
|
||||
|
||||
This function expects results from Claude's WebSearch tool.
|
||||
Each result should have: title, url, snippet, and optionally date/relevance.
|
||||
|
||||
Uses "Date Detective" approach:
|
||||
1. Extract dates from URLs (high confidence)
|
||||
2. Extract dates from snippets/titles (med confidence)
|
||||
3. Hard filter: exclude items with verified old dates
|
||||
4. Keep items with no date signals (with low confidence penalty)
|
||||
|
||||
Args:
|
||||
results: List of WebSearch result dicts
|
||||
topic: Original search topic (for context)
|
||||
from_date: Start date for filtering (YYYY-MM-DD)
|
||||
to_date: End date for filtering (YYYY-MM-DD)
|
||||
|
||||
Returns:
|
||||
List of normalized item dicts ready for WebSearchItem creation
|
||||
"""
|
||||
items = []
|
||||
|
||||
for i, result in enumerate(results):
|
||||
if not isinstance(result, dict):
|
||||
continue
|
||||
|
||||
url = result.get("url", "")
|
||||
if not url:
|
||||
continue
|
||||
|
||||
# Skip Reddit/X URLs (handled separately)
|
||||
if is_excluded_domain(url):
|
||||
continue
|
||||
|
||||
title = str(result.get("title", "")).strip()
|
||||
snippet = str(result.get("snippet", result.get("description", ""))).strip()
|
||||
|
||||
if not title and not snippet:
|
||||
continue
|
||||
|
||||
# Use Date Detective to extract date signals
|
||||
date = result.get("date") # Use provided date if available
|
||||
date_confidence = "low"
|
||||
|
||||
if date and re.match(r'^\d{4}-\d{2}-\d{2}$', str(date)):
|
||||
# Provided date is valid
|
||||
date_confidence = "med"
|
||||
else:
|
||||
# Try to extract date from URL/snippet/title
|
||||
extracted_date, confidence = extract_date_signals(url, snippet, title)
|
||||
if extracted_date:
|
||||
date = extracted_date
|
||||
date_confidence = confidence
|
||||
|
||||
# Hard filter: if we found a date and it's too old, skip
|
||||
if date and from_date and date < from_date:
|
||||
continue # DROP - verified old content
|
||||
|
||||
# Hard filter: if date is in the future, skip (parsing error)
|
||||
if date and to_date and date > to_date:
|
||||
continue # DROP - future date
|
||||
|
||||
# Get relevance if provided, default to 0.5
|
||||
relevance = result.get("relevance", 0.5)
|
||||
try:
|
||||
relevance = min(1.0, max(0.0, float(relevance)))
|
||||
except (TypeError, ValueError):
|
||||
relevance = 0.5
|
||||
|
||||
item = {
|
||||
"id": f"W{i+1}",
|
||||
"title": title[:200], # Truncate long titles
|
||||
"url": url,
|
||||
"source_domain": extract_domain(url),
|
||||
"snippet": snippet[:500], # Truncate long snippets
|
||||
"date": date,
|
||||
"date_confidence": date_confidence,
|
||||
"relevance": relevance,
|
||||
"why_relevant": str(result.get("why_relevant", "")).strip(),
|
||||
}
|
||||
|
||||
items.append(item)
|
||||
|
||||
return items
|
||||
|
||||
|
||||
def normalize_websearch_items(
|
||||
items: List[Dict[str, Any]],
|
||||
from_date: str,
|
||||
to_date: str,
|
||||
) -> List[schema.WebSearchItem]:
|
||||
"""Convert parsed dicts to WebSearchItem objects.
|
||||
|
||||
Args:
|
||||
items: List of parsed item dicts
|
||||
from_date: Start of date range (YYYY-MM-DD)
|
||||
to_date: End of date range (YYYY-MM-DD)
|
||||
|
||||
Returns:
|
||||
List of WebSearchItem objects
|
||||
"""
|
||||
result = []
|
||||
|
||||
for item in items:
|
||||
web_item = schema.WebSearchItem(
|
||||
id=item["id"],
|
||||
title=item["title"],
|
||||
url=item["url"],
|
||||
source_domain=item["source_domain"],
|
||||
snippet=item["snippet"],
|
||||
date=item.get("date"),
|
||||
date_confidence=item.get("date_confidence", "low"),
|
||||
relevance=item.get("relevance", 0.5),
|
||||
why_relevant=item.get("why_relevant", ""),
|
||||
)
|
||||
result.append(web_item)
|
||||
|
||||
return result
|
||||
|
||||
|
||||
def dedupe_websearch(items: List[schema.WebSearchItem]) -> List[schema.WebSearchItem]:
|
||||
"""Remove duplicate WebSearch items.
|
||||
|
||||
Deduplication is based on URL.
|
||||
|
||||
Args:
|
||||
items: List of WebSearchItem objects
|
||||
|
||||
Returns:
|
||||
Deduplicated list
|
||||
"""
|
||||
seen_urls = set()
|
||||
result = []
|
||||
|
||||
for item in items:
|
||||
# Normalize URL for comparison
|
||||
url_key = item.url.lower().rstrip("/")
|
||||
if url_key not in seen_urls:
|
||||
seen_urls.add(url_key)
|
||||
result.append(item)
|
||||
|
||||
return result
|
||||
217
skills/last30days/scripts/lib/xai_x.py
Normal file
217
skills/last30days/scripts/lib/xai_x.py
Normal file
@@ -0,0 +1,217 @@
|
||||
"""xAI API client for X (Twitter) discovery."""
|
||||
|
||||
import json
|
||||
import re
|
||||
import sys
|
||||
from typing import Any, Dict, List, Optional
|
||||
|
||||
from . import http
|
||||
|
||||
|
||||
def _log_error(msg: str):
|
||||
"""Log error to stderr."""
|
||||
sys.stderr.write(f"[X ERROR] {msg}\n")
|
||||
sys.stderr.flush()
|
||||
|
||||
# xAI uses responses endpoint with Agent Tools API
|
||||
XAI_RESPONSES_URL = "https://api.x.ai/v1/responses"
|
||||
|
||||
# Depth configurations: (min, max) posts to request
|
||||
DEPTH_CONFIG = {
|
||||
"quick": (8, 12),
|
||||
"default": (20, 30),
|
||||
"deep": (40, 60),
|
||||
}
|
||||
|
||||
X_SEARCH_PROMPT = """You have access to real-time X (Twitter) data. Search for posts about: {topic}
|
||||
|
||||
Focus on posts from {from_date} to {to_date}. Find {min_items}-{max_items} high-quality, relevant posts.
|
||||
|
||||
IMPORTANT: Return ONLY valid JSON in this exact format, no other text:
|
||||
{{
|
||||
"items": [
|
||||
{{
|
||||
"text": "Post text content (truncated if long)",
|
||||
"url": "https://x.com/user/status/...",
|
||||
"author_handle": "username",
|
||||
"date": "YYYY-MM-DD or null if unknown",
|
||||
"engagement": {{
|
||||
"likes": 100,
|
||||
"reposts": 25,
|
||||
"replies": 15,
|
||||
"quotes": 5
|
||||
}},
|
||||
"why_relevant": "Brief explanation of relevance",
|
||||
"relevance": 0.85
|
||||
}}
|
||||
]
|
||||
}}
|
||||
|
||||
Rules:
|
||||
- relevance is 0.0 to 1.0 (1.0 = highly relevant)
|
||||
- date must be YYYY-MM-DD format or null
|
||||
- engagement can be null if unknown
|
||||
- Include diverse voices/accounts if applicable
|
||||
- Prefer posts with substantive content, not just links"""
|
||||
|
||||
|
||||
def search_x(
|
||||
api_key: str,
|
||||
model: str,
|
||||
topic: str,
|
||||
from_date: str,
|
||||
to_date: str,
|
||||
depth: str = "default",
|
||||
mock_response: Optional[Dict] = None,
|
||||
) -> Dict[str, Any]:
|
||||
"""Search X for relevant posts using xAI API with live search.
|
||||
|
||||
Args:
|
||||
api_key: xAI API key
|
||||
model: Model to use
|
||||
topic: Search topic
|
||||
from_date: Start date (YYYY-MM-DD)
|
||||
to_date: End date (YYYY-MM-DD)
|
||||
depth: Research depth - "quick", "default", or "deep"
|
||||
mock_response: Mock response for testing
|
||||
|
||||
Returns:
|
||||
Raw API response
|
||||
"""
|
||||
if mock_response is not None:
|
||||
return mock_response
|
||||
|
||||
min_items, max_items = DEPTH_CONFIG.get(depth, DEPTH_CONFIG["default"])
|
||||
|
||||
headers = {
|
||||
"Authorization": f"Bearer {api_key}",
|
||||
"Content-Type": "application/json",
|
||||
}
|
||||
|
||||
# Adjust timeout based on depth (generous for API response time)
|
||||
timeout = 90 if depth == "quick" else 120 if depth == "default" else 180
|
||||
|
||||
# Use Agent Tools API with x_search tool
|
||||
payload = {
|
||||
"model": model,
|
||||
"tools": [
|
||||
{"type": "x_search"}
|
||||
],
|
||||
"input": [
|
||||
{
|
||||
"role": "user",
|
||||
"content": X_SEARCH_PROMPT.format(
|
||||
topic=topic,
|
||||
from_date=from_date,
|
||||
to_date=to_date,
|
||||
min_items=min_items,
|
||||
max_items=max_items,
|
||||
),
|
||||
}
|
||||
],
|
||||
}
|
||||
|
||||
return http.post(XAI_RESPONSES_URL, payload, headers=headers, timeout=timeout)
|
||||
|
||||
|
||||
def parse_x_response(response: Dict[str, Any]) -> List[Dict[str, Any]]:
|
||||
"""Parse xAI response to extract X items.
|
||||
|
||||
Args:
|
||||
response: Raw API response
|
||||
|
||||
Returns:
|
||||
List of item dicts
|
||||
"""
|
||||
items = []
|
||||
|
||||
# Check for API errors first
|
||||
if "error" in response and response["error"]:
|
||||
error = response["error"]
|
||||
err_msg = error.get("message", str(error)) if isinstance(error, dict) else str(error)
|
||||
_log_error(f"xAI API error: {err_msg}")
|
||||
if http.DEBUG:
|
||||
_log_error(f"Full error response: {json.dumps(response, indent=2)[:1000]}")
|
||||
return items
|
||||
|
||||
# Try to find the output text
|
||||
output_text = ""
|
||||
if "output" in response:
|
||||
output = response["output"]
|
||||
if isinstance(output, str):
|
||||
output_text = output
|
||||
elif isinstance(output, list):
|
||||
for item in output:
|
||||
if isinstance(item, dict):
|
||||
if item.get("type") == "message":
|
||||
content = item.get("content", [])
|
||||
for c in content:
|
||||
if isinstance(c, dict) and c.get("type") == "output_text":
|
||||
output_text = c.get("text", "")
|
||||
break
|
||||
elif "text" in item:
|
||||
output_text = item["text"]
|
||||
elif isinstance(item, str):
|
||||
output_text = item
|
||||
if output_text:
|
||||
break
|
||||
|
||||
# Also check for choices (older format)
|
||||
if not output_text and "choices" in response:
|
||||
for choice in response["choices"]:
|
||||
if "message" in choice:
|
||||
output_text = choice["message"].get("content", "")
|
||||
break
|
||||
|
||||
if not output_text:
|
||||
return items
|
||||
|
||||
# Extract JSON from the response
|
||||
json_match = re.search(r'\{[\s\S]*"items"[\s\S]*\}', output_text)
|
||||
if json_match:
|
||||
try:
|
||||
data = json.loads(json_match.group())
|
||||
items = data.get("items", [])
|
||||
except json.JSONDecodeError:
|
||||
pass
|
||||
|
||||
# Validate and clean items
|
||||
clean_items = []
|
||||
for i, item in enumerate(items):
|
||||
if not isinstance(item, dict):
|
||||
continue
|
||||
|
||||
url = item.get("url", "")
|
||||
if not url:
|
||||
continue
|
||||
|
||||
# Parse engagement
|
||||
engagement = None
|
||||
eng_raw = item.get("engagement")
|
||||
if isinstance(eng_raw, dict):
|
||||
engagement = {
|
||||
"likes": int(eng_raw.get("likes", 0)) if eng_raw.get("likes") else None,
|
||||
"reposts": int(eng_raw.get("reposts", 0)) if eng_raw.get("reposts") else None,
|
||||
"replies": int(eng_raw.get("replies", 0)) if eng_raw.get("replies") else None,
|
||||
"quotes": int(eng_raw.get("quotes", 0)) if eng_raw.get("quotes") else None,
|
||||
}
|
||||
|
||||
clean_item = {
|
||||
"id": f"X{i+1}",
|
||||
"text": str(item.get("text", "")).strip()[:500], # Truncate long text
|
||||
"url": url,
|
||||
"author_handle": str(item.get("author_handle", "")).strip().lstrip("@"),
|
||||
"date": item.get("date"),
|
||||
"engagement": engagement,
|
||||
"why_relevant": str(item.get("why_relevant", "")).strip(),
|
||||
"relevance": min(1.0, max(0.0, float(item.get("relevance", 0.5)))),
|
||||
}
|
||||
|
||||
# Validate date format
|
||||
if clean_item["date"]:
|
||||
if not re.match(r'^\d{4}-\d{2}-\d{2}$', str(clean_item["date"])):
|
||||
clean_item["date"] = None
|
||||
|
||||
clean_items.append(clean_item)
|
||||
|
||||
return clean_items
|
||||
1
skills/last30days/tests/__init__.py
Normal file
1
skills/last30days/tests/__init__.py
Normal file
@@ -0,0 +1 @@
|
||||
# last30days tests
|
||||
59
skills/last30days/tests/test_cache.py
Normal file
59
skills/last30days/tests/test_cache.py
Normal file
@@ -0,0 +1,59 @@
|
||||
"""Tests for cache module."""
|
||||
|
||||
import sys
|
||||
import unittest
|
||||
from pathlib import Path
|
||||
|
||||
# Add lib to path
|
||||
sys.path.insert(0, str(Path(__file__).parent.parent / "scripts"))
|
||||
|
||||
from lib import cache
|
||||
|
||||
|
||||
class TestGetCacheKey(unittest.TestCase):
|
||||
def test_returns_string(self):
|
||||
result = cache.get_cache_key("test topic", "2026-01-01", "2026-01-31", "both")
|
||||
self.assertIsInstance(result, str)
|
||||
|
||||
def test_consistent_for_same_inputs(self):
|
||||
key1 = cache.get_cache_key("test topic", "2026-01-01", "2026-01-31", "both")
|
||||
key2 = cache.get_cache_key("test topic", "2026-01-01", "2026-01-31", "both")
|
||||
self.assertEqual(key1, key2)
|
||||
|
||||
def test_different_for_different_inputs(self):
|
||||
key1 = cache.get_cache_key("topic a", "2026-01-01", "2026-01-31", "both")
|
||||
key2 = cache.get_cache_key("topic b", "2026-01-01", "2026-01-31", "both")
|
||||
self.assertNotEqual(key1, key2)
|
||||
|
||||
def test_key_length(self):
|
||||
key = cache.get_cache_key("test", "2026-01-01", "2026-01-31", "both")
|
||||
self.assertEqual(len(key), 16)
|
||||
|
||||
|
||||
class TestCachePath(unittest.TestCase):
|
||||
def test_returns_path(self):
|
||||
result = cache.get_cache_path("abc123")
|
||||
self.assertIsInstance(result, Path)
|
||||
|
||||
def test_has_json_extension(self):
|
||||
result = cache.get_cache_path("abc123")
|
||||
self.assertEqual(result.suffix, ".json")
|
||||
|
||||
|
||||
class TestCacheValidity(unittest.TestCase):
|
||||
def test_nonexistent_file_is_invalid(self):
|
||||
fake_path = Path("/nonexistent/path/file.json")
|
||||
result = cache.is_cache_valid(fake_path)
|
||||
self.assertFalse(result)
|
||||
|
||||
|
||||
class TestModelCache(unittest.TestCase):
|
||||
def test_get_cached_model_returns_none_for_missing(self):
|
||||
# Clear any existing cache first
|
||||
result = cache.get_cached_model("nonexistent_provider")
|
||||
# May be None or a cached value, but should not error
|
||||
self.assertTrue(result is None or isinstance(result, str))
|
||||
|
||||
|
||||
if __name__ == "__main__":
|
||||
unittest.main()
|
||||
114
skills/last30days/tests/test_dates.py
Normal file
114
skills/last30days/tests/test_dates.py
Normal file
@@ -0,0 +1,114 @@
|
||||
"""Tests for dates module."""
|
||||
|
||||
import sys
|
||||
import unittest
|
||||
from datetime import datetime, timedelta, timezone
|
||||
from pathlib import Path
|
||||
|
||||
# Add lib to path
|
||||
sys.path.insert(0, str(Path(__file__).parent.parent / "scripts"))
|
||||
|
||||
from lib import dates
|
||||
|
||||
|
||||
class TestGetDateRange(unittest.TestCase):
|
||||
def test_returns_tuple_of_two_strings(self):
|
||||
from_date, to_date = dates.get_date_range(30)
|
||||
self.assertIsInstance(from_date, str)
|
||||
self.assertIsInstance(to_date, str)
|
||||
|
||||
def test_date_format(self):
|
||||
from_date, to_date = dates.get_date_range(30)
|
||||
# Should be YYYY-MM-DD format
|
||||
self.assertRegex(from_date, r'^\d{4}-\d{2}-\d{2}$')
|
||||
self.assertRegex(to_date, r'^\d{4}-\d{2}-\d{2}$')
|
||||
|
||||
def test_range_is_correct_days(self):
|
||||
from_date, to_date = dates.get_date_range(30)
|
||||
start = datetime.strptime(from_date, "%Y-%m-%d")
|
||||
end = datetime.strptime(to_date, "%Y-%m-%d")
|
||||
delta = end - start
|
||||
self.assertEqual(delta.days, 30)
|
||||
|
||||
|
||||
class TestParseDate(unittest.TestCase):
|
||||
def test_parse_iso_date(self):
|
||||
result = dates.parse_date("2026-01-15")
|
||||
self.assertIsNotNone(result)
|
||||
self.assertEqual(result.year, 2026)
|
||||
self.assertEqual(result.month, 1)
|
||||
self.assertEqual(result.day, 15)
|
||||
|
||||
def test_parse_timestamp(self):
|
||||
# Unix timestamp for 2026-01-15 00:00:00 UTC
|
||||
result = dates.parse_date("1768435200")
|
||||
self.assertIsNotNone(result)
|
||||
|
||||
def test_parse_none(self):
|
||||
result = dates.parse_date(None)
|
||||
self.assertIsNone(result)
|
||||
|
||||
def test_parse_empty_string(self):
|
||||
result = dates.parse_date("")
|
||||
self.assertIsNone(result)
|
||||
|
||||
|
||||
class TestTimestampToDate(unittest.TestCase):
|
||||
def test_valid_timestamp(self):
|
||||
# 2026-01-15 00:00:00 UTC
|
||||
result = dates.timestamp_to_date(1768435200)
|
||||
self.assertEqual(result, "2026-01-15")
|
||||
|
||||
def test_none_timestamp(self):
|
||||
result = dates.timestamp_to_date(None)
|
||||
self.assertIsNone(result)
|
||||
|
||||
|
||||
class TestGetDateConfidence(unittest.TestCase):
|
||||
def test_high_confidence_in_range(self):
|
||||
result = dates.get_date_confidence("2026-01-15", "2026-01-01", "2026-01-31")
|
||||
self.assertEqual(result, "high")
|
||||
|
||||
def test_low_confidence_before_range(self):
|
||||
result = dates.get_date_confidence("2025-12-15", "2026-01-01", "2026-01-31")
|
||||
self.assertEqual(result, "low")
|
||||
|
||||
def test_low_confidence_no_date(self):
|
||||
result = dates.get_date_confidence(None, "2026-01-01", "2026-01-31")
|
||||
self.assertEqual(result, "low")
|
||||
|
||||
|
||||
class TestDaysAgo(unittest.TestCase):
|
||||
def test_today(self):
|
||||
today = datetime.now(timezone.utc).date().isoformat()
|
||||
result = dates.days_ago(today)
|
||||
self.assertEqual(result, 0)
|
||||
|
||||
def test_none_date(self):
|
||||
result = dates.days_ago(None)
|
||||
self.assertIsNone(result)
|
||||
|
||||
|
||||
class TestRecencyScore(unittest.TestCase):
|
||||
def test_today_is_100(self):
|
||||
today = datetime.now(timezone.utc).date().isoformat()
|
||||
result = dates.recency_score(today)
|
||||
self.assertEqual(result, 100)
|
||||
|
||||
def test_30_days_ago_is_0(self):
|
||||
old_date = (datetime.now(timezone.utc).date() - timedelta(days=30)).isoformat()
|
||||
result = dates.recency_score(old_date)
|
||||
self.assertEqual(result, 0)
|
||||
|
||||
def test_15_days_ago_is_50(self):
|
||||
mid_date = (datetime.now(timezone.utc).date() - timedelta(days=15)).isoformat()
|
||||
result = dates.recency_score(mid_date)
|
||||
self.assertEqual(result, 50)
|
||||
|
||||
def test_none_date_is_0(self):
|
||||
result = dates.recency_score(None)
|
||||
self.assertEqual(result, 0)
|
||||
|
||||
|
||||
if __name__ == "__main__":
|
||||
unittest.main()
|
||||
111
skills/last30days/tests/test_dedupe.py
Normal file
111
skills/last30days/tests/test_dedupe.py
Normal file
@@ -0,0 +1,111 @@
|
||||
"""Tests for dedupe module."""
|
||||
|
||||
import sys
|
||||
import unittest
|
||||
from pathlib import Path
|
||||
|
||||
# Add lib to path
|
||||
sys.path.insert(0, str(Path(__file__).parent.parent / "scripts"))
|
||||
|
||||
from lib import dedupe, schema
|
||||
|
||||
|
||||
class TestNormalizeText(unittest.TestCase):
|
||||
def test_lowercase(self):
|
||||
result = dedupe.normalize_text("HELLO World")
|
||||
self.assertEqual(result, "hello world")
|
||||
|
||||
def test_removes_punctuation(self):
|
||||
result = dedupe.normalize_text("Hello, World!")
|
||||
# Punctuation replaced with space, then whitespace collapsed
|
||||
self.assertEqual(result, "hello world")
|
||||
|
||||
def test_collapses_whitespace(self):
|
||||
result = dedupe.normalize_text("hello world")
|
||||
self.assertEqual(result, "hello world")
|
||||
|
||||
|
||||
class TestGetNgrams(unittest.TestCase):
|
||||
def test_short_text(self):
|
||||
result = dedupe.get_ngrams("ab", n=3)
|
||||
self.assertEqual(result, {"ab"})
|
||||
|
||||
def test_normal_text(self):
|
||||
result = dedupe.get_ngrams("hello", n=3)
|
||||
self.assertIn("hel", result)
|
||||
self.assertIn("ell", result)
|
||||
self.assertIn("llo", result)
|
||||
|
||||
|
||||
class TestJaccardSimilarity(unittest.TestCase):
|
||||
def test_identical_sets(self):
|
||||
set1 = {"a", "b", "c"}
|
||||
result = dedupe.jaccard_similarity(set1, set1)
|
||||
self.assertEqual(result, 1.0)
|
||||
|
||||
def test_disjoint_sets(self):
|
||||
set1 = {"a", "b", "c"}
|
||||
set2 = {"d", "e", "f"}
|
||||
result = dedupe.jaccard_similarity(set1, set2)
|
||||
self.assertEqual(result, 0.0)
|
||||
|
||||
def test_partial_overlap(self):
|
||||
set1 = {"a", "b", "c"}
|
||||
set2 = {"b", "c", "d"}
|
||||
result = dedupe.jaccard_similarity(set1, set2)
|
||||
self.assertEqual(result, 0.5) # 2 overlap / 4 union
|
||||
|
||||
def test_empty_sets(self):
|
||||
result = dedupe.jaccard_similarity(set(), set())
|
||||
self.assertEqual(result, 0.0)
|
||||
|
||||
|
||||
class TestFindDuplicates(unittest.TestCase):
|
||||
def test_no_duplicates(self):
|
||||
items = [
|
||||
schema.RedditItem(id="R1", title="Completely different topic A", url="", subreddit=""),
|
||||
schema.RedditItem(id="R2", title="Another unrelated subject B", url="", subreddit=""),
|
||||
]
|
||||
result = dedupe.find_duplicates(items)
|
||||
self.assertEqual(result, [])
|
||||
|
||||
def test_finds_duplicates(self):
|
||||
items = [
|
||||
schema.RedditItem(id="R1", title="Best practices for Claude Code skills", url="", subreddit=""),
|
||||
schema.RedditItem(id="R2", title="Best practices for Claude Code skills guide", url="", subreddit=""),
|
||||
]
|
||||
result = dedupe.find_duplicates(items, threshold=0.7)
|
||||
self.assertEqual(len(result), 1)
|
||||
self.assertEqual(result[0], (0, 1))
|
||||
|
||||
|
||||
class TestDedupeItems(unittest.TestCase):
|
||||
def test_keeps_higher_scored(self):
|
||||
items = [
|
||||
schema.RedditItem(id="R1", title="Best practices for skills", url="", subreddit="", score=90),
|
||||
schema.RedditItem(id="R2", title="Best practices for skills guide", url="", subreddit="", score=50),
|
||||
]
|
||||
result = dedupe.dedupe_items(items, threshold=0.6)
|
||||
self.assertEqual(len(result), 1)
|
||||
self.assertEqual(result[0].id, "R1")
|
||||
|
||||
def test_keeps_all_unique(self):
|
||||
items = [
|
||||
schema.RedditItem(id="R1", title="Topic about apples", url="", subreddit="", score=90),
|
||||
schema.RedditItem(id="R2", title="Discussion of oranges", url="", subreddit="", score=50),
|
||||
]
|
||||
result = dedupe.dedupe_items(items)
|
||||
self.assertEqual(len(result), 2)
|
||||
|
||||
def test_empty_list(self):
|
||||
result = dedupe.dedupe_items([])
|
||||
self.assertEqual(result, [])
|
||||
|
||||
def test_single_item(self):
|
||||
items = [schema.RedditItem(id="R1", title="Test", url="", subreddit="")]
|
||||
result = dedupe.dedupe_items(items)
|
||||
self.assertEqual(len(result), 1)
|
||||
|
||||
|
||||
if __name__ == "__main__":
|
||||
unittest.main()
|
||||
135
skills/last30days/tests/test_models.py
Normal file
135
skills/last30days/tests/test_models.py
Normal file
@@ -0,0 +1,135 @@
|
||||
"""Tests for models module."""
|
||||
|
||||
import sys
|
||||
import unittest
|
||||
from pathlib import Path
|
||||
|
||||
# Add lib to path
|
||||
sys.path.insert(0, str(Path(__file__).parent.parent / "scripts"))
|
||||
|
||||
from lib import models
|
||||
|
||||
|
||||
class TestParseVersion(unittest.TestCase):
|
||||
def test_simple_version(self):
|
||||
result = models.parse_version("gpt-5")
|
||||
self.assertEqual(result, (5,))
|
||||
|
||||
def test_minor_version(self):
|
||||
result = models.parse_version("gpt-5.2")
|
||||
self.assertEqual(result, (5, 2))
|
||||
|
||||
def test_patch_version(self):
|
||||
result = models.parse_version("gpt-5.2.1")
|
||||
self.assertEqual(result, (5, 2, 1))
|
||||
|
||||
def test_no_version(self):
|
||||
result = models.parse_version("custom-model")
|
||||
self.assertIsNone(result)
|
||||
|
||||
|
||||
class TestIsMainlineOpenAIModel(unittest.TestCase):
|
||||
def test_gpt5_is_mainline(self):
|
||||
self.assertTrue(models.is_mainline_openai_model("gpt-5"))
|
||||
|
||||
def test_gpt52_is_mainline(self):
|
||||
self.assertTrue(models.is_mainline_openai_model("gpt-5.2"))
|
||||
|
||||
def test_gpt5_mini_is_not_mainline(self):
|
||||
self.assertFalse(models.is_mainline_openai_model("gpt-5-mini"))
|
||||
|
||||
def test_gpt4_is_not_mainline(self):
|
||||
self.assertFalse(models.is_mainline_openai_model("gpt-4"))
|
||||
|
||||
|
||||
class TestSelectOpenAIModel(unittest.TestCase):
|
||||
def test_pinned_policy(self):
|
||||
result = models.select_openai_model(
|
||||
"fake-key",
|
||||
policy="pinned",
|
||||
pin="gpt-5.1"
|
||||
)
|
||||
self.assertEqual(result, "gpt-5.1")
|
||||
|
||||
def test_auto_with_mock_models(self):
|
||||
mock_models = [
|
||||
{"id": "gpt-5.2", "created": 1704067200},
|
||||
{"id": "gpt-5.1", "created": 1701388800},
|
||||
{"id": "gpt-5", "created": 1698710400},
|
||||
]
|
||||
result = models.select_openai_model(
|
||||
"fake-key",
|
||||
policy="auto",
|
||||
mock_models=mock_models
|
||||
)
|
||||
self.assertEqual(result, "gpt-5.2")
|
||||
|
||||
def test_auto_filters_variants(self):
|
||||
mock_models = [
|
||||
{"id": "gpt-5.2", "created": 1704067200},
|
||||
{"id": "gpt-5-mini", "created": 1704067200},
|
||||
{"id": "gpt-5.1", "created": 1701388800},
|
||||
]
|
||||
result = models.select_openai_model(
|
||||
"fake-key",
|
||||
policy="auto",
|
||||
mock_models=mock_models
|
||||
)
|
||||
self.assertEqual(result, "gpt-5.2")
|
||||
|
||||
|
||||
class TestSelectXAIModel(unittest.TestCase):
|
||||
def test_latest_policy(self):
|
||||
result = models.select_xai_model(
|
||||
"fake-key",
|
||||
policy="latest"
|
||||
)
|
||||
self.assertEqual(result, "grok-4-latest")
|
||||
|
||||
def test_stable_policy(self):
|
||||
# Clear cache first to avoid interference
|
||||
from lib import cache
|
||||
cache.MODEL_CACHE_FILE.unlink(missing_ok=True)
|
||||
result = models.select_xai_model(
|
||||
"fake-key",
|
||||
policy="stable"
|
||||
)
|
||||
self.assertEqual(result, "grok-4")
|
||||
|
||||
def test_pinned_policy(self):
|
||||
result = models.select_xai_model(
|
||||
"fake-key",
|
||||
policy="pinned",
|
||||
pin="grok-3"
|
||||
)
|
||||
self.assertEqual(result, "grok-3")
|
||||
|
||||
|
||||
class TestGetModels(unittest.TestCase):
|
||||
def test_no_keys_returns_none(self):
|
||||
config = {}
|
||||
result = models.get_models(config)
|
||||
self.assertIsNone(result["openai"])
|
||||
self.assertIsNone(result["xai"])
|
||||
|
||||
def test_openai_key_only(self):
|
||||
config = {"OPENAI_API_KEY": "sk-test"}
|
||||
mock_models = [{"id": "gpt-5.2", "created": 1704067200}]
|
||||
result = models.get_models(config, mock_openai_models=mock_models)
|
||||
self.assertEqual(result["openai"], "gpt-5.2")
|
||||
self.assertIsNone(result["xai"])
|
||||
|
||||
def test_both_keys(self):
|
||||
config = {
|
||||
"OPENAI_API_KEY": "sk-test",
|
||||
"XAI_API_KEY": "xai-test",
|
||||
}
|
||||
mock_openai = [{"id": "gpt-5.2", "created": 1704067200}]
|
||||
mock_xai = [{"id": "grok-4-latest", "created": 1704067200}]
|
||||
result = models.get_models(config, mock_openai, mock_xai)
|
||||
self.assertEqual(result["openai"], "gpt-5.2")
|
||||
self.assertEqual(result["xai"], "grok-4-latest")
|
||||
|
||||
|
||||
if __name__ == "__main__":
|
||||
unittest.main()
|
||||
138
skills/last30days/tests/test_normalize.py
Normal file
138
skills/last30days/tests/test_normalize.py
Normal file
@@ -0,0 +1,138 @@
|
||||
"""Tests for normalize module."""
|
||||
|
||||
import sys
|
||||
import unittest
|
||||
from pathlib import Path
|
||||
|
||||
# Add lib to path
|
||||
sys.path.insert(0, str(Path(__file__).parent.parent / "scripts"))
|
||||
|
||||
from lib import normalize, schema
|
||||
|
||||
|
||||
class TestNormalizeRedditItems(unittest.TestCase):
|
||||
def test_normalizes_basic_item(self):
|
||||
items = [
|
||||
{
|
||||
"id": "R1",
|
||||
"title": "Test Thread",
|
||||
"url": "https://reddit.com/r/test/1",
|
||||
"subreddit": "test",
|
||||
"date": "2026-01-15",
|
||||
"why_relevant": "Relevant because...",
|
||||
"relevance": 0.85,
|
||||
}
|
||||
]
|
||||
|
||||
result = normalize.normalize_reddit_items(items, "2026-01-01", "2026-01-31")
|
||||
|
||||
self.assertEqual(len(result), 1)
|
||||
self.assertIsInstance(result[0], schema.RedditItem)
|
||||
self.assertEqual(result[0].id, "R1")
|
||||
self.assertEqual(result[0].title, "Test Thread")
|
||||
self.assertEqual(result[0].date_confidence, "high")
|
||||
|
||||
def test_sets_low_confidence_for_old_date(self):
|
||||
items = [
|
||||
{
|
||||
"id": "R1",
|
||||
"title": "Old Thread",
|
||||
"url": "https://reddit.com/r/test/1",
|
||||
"subreddit": "test",
|
||||
"date": "2025-12-01", # Before range
|
||||
"relevance": 0.5,
|
||||
}
|
||||
]
|
||||
|
||||
result = normalize.normalize_reddit_items(items, "2026-01-01", "2026-01-31")
|
||||
|
||||
self.assertEqual(result[0].date_confidence, "low")
|
||||
|
||||
def test_handles_engagement(self):
|
||||
items = [
|
||||
{
|
||||
"id": "R1",
|
||||
"title": "Thread with engagement",
|
||||
"url": "https://reddit.com/r/test/1",
|
||||
"subreddit": "test",
|
||||
"engagement": {
|
||||
"score": 100,
|
||||
"num_comments": 50,
|
||||
"upvote_ratio": 0.9,
|
||||
},
|
||||
"relevance": 0.5,
|
||||
}
|
||||
]
|
||||
|
||||
result = normalize.normalize_reddit_items(items, "2026-01-01", "2026-01-31")
|
||||
|
||||
self.assertIsNotNone(result[0].engagement)
|
||||
self.assertEqual(result[0].engagement.score, 100)
|
||||
self.assertEqual(result[0].engagement.num_comments, 50)
|
||||
|
||||
|
||||
class TestNormalizeXItems(unittest.TestCase):
|
||||
def test_normalizes_basic_item(self):
|
||||
items = [
|
||||
{
|
||||
"id": "X1",
|
||||
"text": "Test post content",
|
||||
"url": "https://x.com/user/status/123",
|
||||
"author_handle": "testuser",
|
||||
"date": "2026-01-15",
|
||||
"why_relevant": "Relevant because...",
|
||||
"relevance": 0.9,
|
||||
}
|
||||
]
|
||||
|
||||
result = normalize.normalize_x_items(items, "2026-01-01", "2026-01-31")
|
||||
|
||||
self.assertEqual(len(result), 1)
|
||||
self.assertIsInstance(result[0], schema.XItem)
|
||||
self.assertEqual(result[0].id, "X1")
|
||||
self.assertEqual(result[0].author_handle, "testuser")
|
||||
|
||||
def test_handles_x_engagement(self):
|
||||
items = [
|
||||
{
|
||||
"id": "X1",
|
||||
"text": "Post with engagement",
|
||||
"url": "https://x.com/user/status/123",
|
||||
"author_handle": "user",
|
||||
"engagement": {
|
||||
"likes": 100,
|
||||
"reposts": 25,
|
||||
"replies": 15,
|
||||
"quotes": 5,
|
||||
},
|
||||
"relevance": 0.5,
|
||||
}
|
||||
]
|
||||
|
||||
result = normalize.normalize_x_items(items, "2026-01-01", "2026-01-31")
|
||||
|
||||
self.assertIsNotNone(result[0].engagement)
|
||||
self.assertEqual(result[0].engagement.likes, 100)
|
||||
self.assertEqual(result[0].engagement.reposts, 25)
|
||||
|
||||
|
||||
class TestItemsToDicts(unittest.TestCase):
|
||||
def test_converts_items(self):
|
||||
items = [
|
||||
schema.RedditItem(
|
||||
id="R1",
|
||||
title="Test",
|
||||
url="https://reddit.com/r/test/1",
|
||||
subreddit="test",
|
||||
)
|
||||
]
|
||||
|
||||
result = normalize.items_to_dicts(items)
|
||||
|
||||
self.assertEqual(len(result), 1)
|
||||
self.assertIsInstance(result[0], dict)
|
||||
self.assertEqual(result[0]["id"], "R1")
|
||||
|
||||
|
||||
if __name__ == "__main__":
|
||||
unittest.main()
|
||||
116
skills/last30days/tests/test_render.py
Normal file
116
skills/last30days/tests/test_render.py
Normal file
@@ -0,0 +1,116 @@
|
||||
"""Tests for render module."""
|
||||
|
||||
import sys
|
||||
import unittest
|
||||
from pathlib import Path
|
||||
|
||||
# Add lib to path
|
||||
sys.path.insert(0, str(Path(__file__).parent.parent / "scripts"))
|
||||
|
||||
from lib import render, schema
|
||||
|
||||
|
||||
class TestRenderCompact(unittest.TestCase):
|
||||
def test_renders_basic_report(self):
|
||||
report = schema.Report(
|
||||
topic="test topic",
|
||||
range_from="2026-01-01",
|
||||
range_to="2026-01-31",
|
||||
generated_at="2026-01-31T12:00:00Z",
|
||||
mode="both",
|
||||
openai_model_used="gpt-5.2",
|
||||
xai_model_used="grok-4-latest",
|
||||
)
|
||||
|
||||
result = render.render_compact(report)
|
||||
|
||||
self.assertIn("test topic", result)
|
||||
self.assertIn("2026-01-01", result)
|
||||
self.assertIn("both", result)
|
||||
self.assertIn("gpt-5.2", result)
|
||||
|
||||
def test_renders_reddit_items(self):
|
||||
report = schema.Report(
|
||||
topic="test",
|
||||
range_from="2026-01-01",
|
||||
range_to="2026-01-31",
|
||||
generated_at="2026-01-31T12:00:00Z",
|
||||
mode="reddit-only",
|
||||
reddit=[
|
||||
schema.RedditItem(
|
||||
id="R1",
|
||||
title="Test Thread",
|
||||
url="https://reddit.com/r/test/1",
|
||||
subreddit="test",
|
||||
date="2026-01-15",
|
||||
date_confidence="high",
|
||||
score=85,
|
||||
why_relevant="Very relevant",
|
||||
)
|
||||
],
|
||||
)
|
||||
|
||||
result = render.render_compact(report)
|
||||
|
||||
self.assertIn("R1", result)
|
||||
self.assertIn("Test Thread", result)
|
||||
self.assertIn("r/test", result)
|
||||
|
||||
def test_shows_coverage_tip_for_reddit_only(self):
|
||||
report = schema.Report(
|
||||
topic="test",
|
||||
range_from="2026-01-01",
|
||||
range_to="2026-01-31",
|
||||
generated_at="2026-01-31T12:00:00Z",
|
||||
mode="reddit-only",
|
||||
)
|
||||
|
||||
result = render.render_compact(report)
|
||||
|
||||
self.assertIn("xAI key", result)
|
||||
|
||||
|
||||
class TestRenderContextSnippet(unittest.TestCase):
|
||||
def test_renders_snippet(self):
|
||||
report = schema.Report(
|
||||
topic="Claude Code Skills",
|
||||
range_from="2026-01-01",
|
||||
range_to="2026-01-31",
|
||||
generated_at="2026-01-31T12:00:00Z",
|
||||
mode="both",
|
||||
)
|
||||
|
||||
result = render.render_context_snippet(report)
|
||||
|
||||
self.assertIn("Claude Code Skills", result)
|
||||
self.assertIn("Last 30 Days", result)
|
||||
|
||||
|
||||
class TestRenderFullReport(unittest.TestCase):
|
||||
def test_renders_full_report(self):
|
||||
report = schema.Report(
|
||||
topic="test topic",
|
||||
range_from="2026-01-01",
|
||||
range_to="2026-01-31",
|
||||
generated_at="2026-01-31T12:00:00Z",
|
||||
mode="both",
|
||||
openai_model_used="gpt-5.2",
|
||||
xai_model_used="grok-4-latest",
|
||||
)
|
||||
|
||||
result = render.render_full_report(report)
|
||||
|
||||
self.assertIn("# test topic", result)
|
||||
self.assertIn("## Models Used", result)
|
||||
self.assertIn("gpt-5.2", result)
|
||||
|
||||
|
||||
class TestGetContextPath(unittest.TestCase):
|
||||
def test_returns_path_string(self):
|
||||
result = render.get_context_path()
|
||||
self.assertIsInstance(result, str)
|
||||
self.assertIn("last30days.context.md", result)
|
||||
|
||||
|
||||
if __name__ == "__main__":
|
||||
unittest.main()
|
||||
168
skills/last30days/tests/test_score.py
Normal file
168
skills/last30days/tests/test_score.py
Normal file
@@ -0,0 +1,168 @@
|
||||
"""Tests for score module."""
|
||||
|
||||
import sys
|
||||
import unittest
|
||||
from datetime import datetime, timezone
|
||||
from pathlib import Path
|
||||
|
||||
# Add lib to path
|
||||
sys.path.insert(0, str(Path(__file__).parent.parent / "scripts"))
|
||||
|
||||
from lib import schema, score
|
||||
|
||||
|
||||
class TestLog1pSafe(unittest.TestCase):
|
||||
def test_positive_value(self):
|
||||
result = score.log1p_safe(100)
|
||||
self.assertGreater(result, 0)
|
||||
|
||||
def test_zero(self):
|
||||
result = score.log1p_safe(0)
|
||||
self.assertEqual(result, 0)
|
||||
|
||||
def test_none(self):
|
||||
result = score.log1p_safe(None)
|
||||
self.assertEqual(result, 0)
|
||||
|
||||
def test_negative(self):
|
||||
result = score.log1p_safe(-5)
|
||||
self.assertEqual(result, 0)
|
||||
|
||||
|
||||
class TestComputeRedditEngagementRaw(unittest.TestCase):
|
||||
def test_with_engagement(self):
|
||||
eng = schema.Engagement(score=100, num_comments=50, upvote_ratio=0.9)
|
||||
result = score.compute_reddit_engagement_raw(eng)
|
||||
self.assertIsNotNone(result)
|
||||
self.assertGreater(result, 0)
|
||||
|
||||
def test_without_engagement(self):
|
||||
result = score.compute_reddit_engagement_raw(None)
|
||||
self.assertIsNone(result)
|
||||
|
||||
def test_empty_engagement(self):
|
||||
eng = schema.Engagement()
|
||||
result = score.compute_reddit_engagement_raw(eng)
|
||||
self.assertIsNone(result)
|
||||
|
||||
|
||||
class TestComputeXEngagementRaw(unittest.TestCase):
|
||||
def test_with_engagement(self):
|
||||
eng = schema.Engagement(likes=100, reposts=25, replies=15, quotes=5)
|
||||
result = score.compute_x_engagement_raw(eng)
|
||||
self.assertIsNotNone(result)
|
||||
self.assertGreater(result, 0)
|
||||
|
||||
def test_without_engagement(self):
|
||||
result = score.compute_x_engagement_raw(None)
|
||||
self.assertIsNone(result)
|
||||
|
||||
|
||||
class TestNormalizeTo100(unittest.TestCase):
|
||||
def test_normalizes_values(self):
|
||||
values = [0, 50, 100]
|
||||
result = score.normalize_to_100(values)
|
||||
self.assertEqual(result[0], 0)
|
||||
self.assertEqual(result[1], 50)
|
||||
self.assertEqual(result[2], 100)
|
||||
|
||||
def test_handles_none(self):
|
||||
values = [0, None, 100]
|
||||
result = score.normalize_to_100(values)
|
||||
self.assertIsNone(result[1])
|
||||
|
||||
def test_single_value(self):
|
||||
values = [50]
|
||||
result = score.normalize_to_100(values)
|
||||
self.assertEqual(result[0], 50)
|
||||
|
||||
|
||||
class TestScoreRedditItems(unittest.TestCase):
|
||||
def test_scores_items(self):
|
||||
today = datetime.now(timezone.utc).date().isoformat()
|
||||
items = [
|
||||
schema.RedditItem(
|
||||
id="R1",
|
||||
title="Test",
|
||||
url="https://reddit.com/r/test/1",
|
||||
subreddit="test",
|
||||
date=today,
|
||||
date_confidence="high",
|
||||
engagement=schema.Engagement(score=100, num_comments=50, upvote_ratio=0.9),
|
||||
relevance=0.9,
|
||||
),
|
||||
schema.RedditItem(
|
||||
id="R2",
|
||||
title="Test 2",
|
||||
url="https://reddit.com/r/test/2",
|
||||
subreddit="test",
|
||||
date=today,
|
||||
date_confidence="high",
|
||||
engagement=schema.Engagement(score=10, num_comments=5, upvote_ratio=0.8),
|
||||
relevance=0.5,
|
||||
),
|
||||
]
|
||||
|
||||
result = score.score_reddit_items(items)
|
||||
|
||||
self.assertEqual(len(result), 2)
|
||||
self.assertGreater(result[0].score, 0)
|
||||
self.assertGreater(result[1].score, 0)
|
||||
# Higher relevance and engagement should score higher
|
||||
self.assertGreater(result[0].score, result[1].score)
|
||||
|
||||
def test_empty_list(self):
|
||||
result = score.score_reddit_items([])
|
||||
self.assertEqual(result, [])
|
||||
|
||||
|
||||
class TestScoreXItems(unittest.TestCase):
|
||||
def test_scores_items(self):
|
||||
today = datetime.now(timezone.utc).date().isoformat()
|
||||
items = [
|
||||
schema.XItem(
|
||||
id="X1",
|
||||
text="Test post",
|
||||
url="https://x.com/user/1",
|
||||
author_handle="user1",
|
||||
date=today,
|
||||
date_confidence="high",
|
||||
engagement=schema.Engagement(likes=100, reposts=25, replies=15, quotes=5),
|
||||
relevance=0.9,
|
||||
),
|
||||
]
|
||||
|
||||
result = score.score_x_items(items)
|
||||
|
||||
self.assertEqual(len(result), 1)
|
||||
self.assertGreater(result[0].score, 0)
|
||||
|
||||
|
||||
class TestSortItems(unittest.TestCase):
|
||||
def test_sorts_by_score_descending(self):
|
||||
items = [
|
||||
schema.RedditItem(id="R1", title="Low", url="", subreddit="", score=30),
|
||||
schema.RedditItem(id="R2", title="High", url="", subreddit="", score=90),
|
||||
schema.RedditItem(id="R3", title="Mid", url="", subreddit="", score=60),
|
||||
]
|
||||
|
||||
result = score.sort_items(items)
|
||||
|
||||
self.assertEqual(result[0].id, "R2")
|
||||
self.assertEqual(result[1].id, "R3")
|
||||
self.assertEqual(result[2].id, "R1")
|
||||
|
||||
def test_stable_sort(self):
|
||||
items = [
|
||||
schema.RedditItem(id="R1", title="A", url="", subreddit="", score=50),
|
||||
schema.RedditItem(id="R2", title="B", url="", subreddit="", score=50),
|
||||
]
|
||||
|
||||
result = score.sort_items(items)
|
||||
|
||||
# Both have same score, should maintain order by title
|
||||
self.assertEqual(len(result), 2)
|
||||
|
||||
|
||||
if __name__ == "__main__":
|
||||
unittest.main()
|
||||
@@ -647,6 +647,15 @@
|
||||
"risk": "unknown",
|
||||
"source": "unknown"
|
||||
},
|
||||
{
|
||||
"id": "daily-news-report",
|
||||
"path": "skills/daily-news-report",
|
||||
"category": "uncategorized",
|
||||
"name": "daily-news-report",
|
||||
"description": "\u57fa\u4e8e\u9884\u8bbe URL \u5217\u8868\u6293\u53d6\u5185\u5bb9\uff0c\u7b5b\u9009\u9ad8\u8d28\u91cf\u6280\u672f\u4fe1\u606f\u5e76\u751f\u6210\u6bcf\u65e5 Markdown \u62a5\u544a\u3002",
|
||||
"risk": "unknown",
|
||||
"source": "unknown"
|
||||
},
|
||||
{
|
||||
"id": "database-design",
|
||||
"path": "skills/database-design",
|
||||
@@ -1070,6 +1079,15 @@
|
||||
"risk": "unknown",
|
||||
"source": "vibeship-spawner-skills (Apache 2.0)"
|
||||
},
|
||||
{
|
||||
"id": "last30days",
|
||||
"path": "skills/last30days",
|
||||
"category": "uncategorized",
|
||||
"name": "last30days",
|
||||
"description": "Research a topic from the last 30 days on Reddit + X + Web, become an expert, and write copy-paste-ready prompts for the user's target tool.",
|
||||
"risk": "unknown",
|
||||
"source": "unknown"
|
||||
},
|
||||
{
|
||||
"id": "launch-strategy",
|
||||
"path": "skills/launch-strategy",
|
||||
|
||||
Reference in New Issue
Block a user