feat(twitter-reader): add fetch_article.py for X Articles with images

- Use twitter-cli for structured metadata (likes, retweets, bookmarks) - Use Jina API for content with images - Auto-download all images to attachments/ - Generate Markdown with YAML frontmatter and local image references - Security scan passed Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-06 16:31:33 +08:00
parent 673980639b
commit 22ec9f0d59
3 changed files with 377 additions and 46 deletions
--- a/twitter-reader/SKILL.md
+++ b/twitter-reader/SKILL.md
@@ -1,72 +1,156 @@
 ---
 name: twitter-reader
-description: Fetch Twitter/X post content by URL using jina.ai API to bypass JavaScript restrictions. Use when Claude needs to retrieve tweet content including author, timestamp, post text, images, and thread replies. Supports individual posts or batch fetching from x.com or twitter.com URLs.
+description: Fetch Twitter/X post content including long-form Articles with full images and metadata. Use when Claude needs to retrieve tweet/article content, author info, engagement metrics, and embedded media. Supports individual posts and X Articles (long-form content). Automatically downloads all images to local attachments folder and generates complete Markdown with proper image references. Preferred over Jina for X Articles with images.
 ---

 # Twitter Reader

-Fetch Twitter/X post content without needing JavaScript or authentication.
+Fetch Twitter/X post and article content with full media support.
+
+## Quick Start (Recommended)
+
+For X Articles with images, use the new fetch_article.py script:
+
+```bash
+uv run --with pyyaml python scripts/fetch_article.py <article_url> [output_dir]
+```
+
+Example:
+```bash
+uv run --with pyyaml python scripts/fetch_article.py \
+  https://x.com/HiTw93/status/2040047268221608281 \
+  ./Clippings
+```
+
+This will:
+- Fetch structured data via `twitter-cli` (likes, retweets, bookmarks)
+- Fetch content with images via `jina.ai` API
+- Download all images to `attachments/YYYY-MM-DD-AUTHOR-TITLE/`
+- Generate complete Markdown with embedded image references
+- Include YAML frontmatter with metadata
+
+### Example Output
+
+```
+Fetching: https://x.com/HiTw93/status/2040047268221608281
+--------------------------------------------------
+Getting metadata...
+Title: 你不知道的大模型训练：原理、路径与新实践
+Author: Tw93
+Likes: 1648
+
+Getting content and images...
+Images: 15
+
+Downloading 15 images...
+  ✓ 01-image.jpg
+  ✓ 02-image.jpg
+  ...
+
+✓ Saved: ./Clippings/2026-04-03-文章标题.md
+✓ Images: ./Clippings/attachments/2026-04-03-HiTw93-.../ (15 downloaded)
+```
+
+## Alternative: Jina API (Text-only)
+
+For simple text-only fetching without authentication:
+
+```bash
+# Single tweet
+curl "https://r.jina.ai/https://x.com/USER/status/TWEET_ID" \
+  -H "Authorization: Bearer ${JINA_API_KEY}"
+
+# Batch fetching
+scripts/fetch_tweets.sh url1 url2 url3
+```
+
+## Features
+
+### Full Article Mode (fetch_article.py)
+- ✅ Structured metadata (author, date, engagement metrics)
+- ✅ Automatic image download (all embedded media)
+- ✅ Complete Markdown with local image references
+- ✅ YAML frontmatter for PKM systems
+- ✅ Handles X Articles (long-form content)
+
+### Simple Mode (Jina API)
+- Text-only content
+- No authentication required beyond Jina API key
+- Good for quick text extraction

 ## Prerequisites

-You need a Jina API key to use this skill:
-
-1. Visit https://jina.ai/ to sign up (free tier available)
-2. Get your API key from the dashboard
-3. Set the environment variable:
+### For Full Article Mode
+- `uv` (Python package manager)
+- No additional setup (twitter-cli auto-installed)

+### For Simple Mode (Jina)
 ```bash
 export JINA_API_KEY="your_api_key_here"
+# Get from https://jina.ai/
 ```

-## Quick Start
+## Output Structure

-For a single tweet, use curl directly:
-
-```bash
-curl "https://r.jina.ai/https://x.com/USER/status/TWEET_ID" \
-  -H "Authorization: Bearer ${JINA_API_KEY}"
 ```
-
-For multiple tweets, use the bundled script:
-
-```bash
-scripts/fetch_tweets.sh url1 url2 url3
+output_dir/
+├── YYYY-MM-DD-article-title.md       # Main Markdown file
+└── attachments/
+    └── YYYY-MM-DD-author-title/
+        ├── 01-image.jpg
+        ├── 02-image.jpg
+        └── ...
 ```

 ## What Gets Returned

+### Full Article Mode
+- **YAML Frontmatter**: source, author, date, likes, retweets, bookmarks
+- **Markdown Content**: Full article text with local image references
+- **Attachments**: All downloaded images in dedicated folder
+
+### Simple Mode
 - **Title**: Post author and content preview
 - **URL Source**: Original tweet link
 - **Published Time**: GMT timestamp
- **Markdown Content**: Full post text with media descriptions
-
-## Bundled Scripts
-
-### fetch_tweet.py
-
-Python script for fetching individual tweets.
-
-```bash
-python scripts/fetch_tweet.py https://x.com/user/status/123 output.md
-```
-
-### fetch_tweets.sh
-
-Bash script for batch fetching multiple tweets.
-
-```bash
-scripts/fetch_tweets.sh \
-  "https://x.com/user/status/123" \
-  "https://x.com/user/status/456"
-```
+- **Markdown Content**: Text with remote media URLs

 ## URL Formats Supported

- `https://x.com/USER/status/ID`
- `https://twitter.com/USER/status/ID`
- `https://x.com/...` (redirects work automatically)
+- `https://x.com/USER/status/ID` (posts)
+- `https://x.com/USER/article/ID` (long-form articles)
+- `https://twitter.com/USER/status/ID` (legacy)

-## Environment Variables
+## Scripts

- `JINA_API_KEY`: Required. Your Jina.ai API key for accessing the reader API
+### fetch_article.py
+Full-featured article fetcher with image download:
+```bash
+uv run --with pyyaml python scripts/fetch_article.py <url> [output_dir]
+```
+
+### fetch_tweet.py
+Simple text-only fetcher using Jina API:
+```bash
+python scripts/fetch_tweet.py <tweet_url> [output_file]
+```
+
+### fetch_tweets.sh
+Batch fetch multiple tweets (Jina API):
+```bash
+scripts/fetch_tweets.sh <url1> <url2> ...
+```
+
+## Migration from Jina API
+
+Old workflow:
+```bash
+curl "https://r.jina.ai/https://x.com/..."
+# Manual image extraction and download
+```
+
+New workflow:
+```bash
+uv run --with pyyaml python scripts/fetch_article.py <url>
+# Automatic image download, complete Markdown
+```