feat(twitter-reader): add fetch_article.py for X Articles with images

- Use twitter-cli for structured metadata (likes, retweets, bookmarks)
- Use Jina API for content with images
- Auto-download all images to attachments/
- Generate Markdown with YAML frontmatter and local image references
- Security scan passed

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
daymade
2026-04-06 16:31:33 +08:00
parent 673980639b
commit 22ec9f0d59
3 changed files with 377 additions and 46 deletions


@@ -1,72 +1,156 @@
---
name: twitter-reader
description: Fetch Twitter/X post content, including long-form X Articles with full images and metadata. Use when Claude needs to retrieve tweet or article content, author info, engagement metrics, and embedded media. Supports individual posts and X Articles (long-form content). Automatically downloads all images to a local attachments folder and generates complete Markdown with local image references. Preferred over the text-only Jina workflow for X Articles with images.
---
# Twitter Reader
Fetch Twitter/X post and article content with full media support.
## Quick Start (Recommended)
For X Articles with images, use `scripts/fetch_article.py`:
```bash
uv run --with pyyaml python scripts/fetch_article.py <article_url> [output_dir]
```
Example:
```bash
uv run --with pyyaml python scripts/fetch_article.py \
https://x.com/HiTw93/status/2040047268221608281 \
./Clippings
```
This will:
- Fetch structured data via `twitter-cli` (likes, retweets, bookmarks)
- Fetch the article body with images via the Jina API
- Download all images to `attachments/YYYY-MM-DD-author-title/`
- Generate complete Markdown with embedded image references
- Include YAML frontmatter with metadata
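The last two steps (frontmatter plus Markdown body) can be illustrated with a minimal standard-library sketch. `build_markdown` is a hypothetical helper, not code taken from `fetch_article.py`. The documented command pulls in PyYAML; this sketch instead quotes each value with `json.dumps`, whose output is also a valid YAML scalar, so it runs without extra dependencies.

```python
import json


def build_markdown(meta: dict, body: str) -> str:
    """Return `body` prefixed with a YAML frontmatter block built from `meta`.

    Hypothetical helper: values are serialized with json.dumps; JSON strings
    and numbers are also valid YAML scalars, so PyYAML is not needed here.
    """
    lines = ["---"]
    for key, value in meta.items():
        lines.append(f"{key}: {json.dumps(value, ensure_ascii=False)}")
    lines += ["---", "", body.rstrip("\n")]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    meta = {
        "source": "https://x.com/HiTw93/status/2040047268221608281",
        "author": "Tw93",
        "likes": 1648,
    }
    # The image path below is a made-up example of a local reference.
    print(build_markdown(meta, "![](attachments/example/01-image.jpg)"))
```

Numbers such as `likes` stay unquoted, so tools that read the frontmatter see them as integers rather than strings.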
### Example Output
```
Fetching: https://x.com/HiTw93/status/2040047268221608281
--------------------------------------------------
Getting metadata...
Title: 你不知道的大模型训练:原理、路径与新实践
Author: Tw93
Likes: 1648
Getting content and images...
Images: 15
Downloading 15 images...
✓ 01-image.jpg
✓ 02-image.jpg
...
✓ Saved: ./Clippings/2026-04-03-article-title.md
✓ Images: ./Clippings/attachments/2026-04-03-HiTw93-.../ (15 downloaded)
```
## Alternative: Jina API (Text-only)
For quick text-only fetching without an X account (only a Jina API key is required):
```bash
# Single tweet
curl "https://r.jina.ai/https://x.com/USER/status/TWEET_ID" \
-H "Authorization: Bearer ${JINA_API_KEY}"
# Batch fetching
scripts/fetch_tweets.sh url1 url2 url3
```
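The same single-tweet request can be issued from Python when the reader output feeds further processing. This is a sketch using only `urllib`, assuming the endpoint and `Authorization: Bearer` header exactly as in the curl command above; `jina_request` and `fetch_text` are hypothetical names and are not part of the bundled scripts.

```python
import os
import urllib.request
from typing import Optional

READER_PREFIX = "https://r.jina.ai/"  # same prefix as the curl example


def jina_request(tweet_url: str, api_key: Optional[str] = None) -> urllib.request.Request:
    """Build the GET request: reader prefix + tweet URL, with a bearer token."""
    key = api_key or os.environ.get("JINA_API_KEY")
    if not key:
        raise RuntimeError("JINA_API_KEY is not set (get a key from https://jina.ai/)")
    return urllib.request.Request(
        READER_PREFIX + tweet_url,
        headers={"Authorization": f"Bearer {key}"},
    )


def fetch_text(tweet_url: str, timeout: float = 30) -> str:
    """Send the request and return the reader's response body as text."""
    with urllib.request.urlopen(jina_request(tweet_url), timeout=timeout) as resp:
        return resp.read().decode("utf-8")
```

Calling `fetch_text("https://x.com/USER/status/TWEET_ID")` returns the response body that the curl command would print.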
## Features
### Full Article Mode (fetch_article.py)
- ✅ Structured metadata (author, date, engagement metrics)
- ✅ Automatic image download (all embedded media)
- ✅ Complete Markdown with local image references
- ✅ YAML frontmatter for PKM systems
- ✅ Handles X Articles (long-form content)
### Simple Mode (Jina API)
- Text content only (images remain remote URLs)
- No authentication required beyond Jina API key
- Good for quick text extraction
## Prerequisites
### For Full Article Mode
- `uv` (Python package manager)
- No additional setup (twitter-cli auto-installed)
### For Simple Mode (Jina)
Sign up at https://jina.ai/ (free tier available), copy your API key from the dashboard, and set the environment variable:
```bash
export JINA_API_KEY="your_api_key_here"
# Get from https://jina.ai/
```
## Output Structure
```
output_dir/
├── YYYY-MM-DD-article-title.md # Main Markdown file
└── attachments/
└── YYYY-MM-DD-author-title/
├── 01-image.jpg
├── 02-image.jpg
└── ...
```
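The naming in this tree (a `YYYY-MM-DD` prefix, a title slug, and zero-padded image numbers) can be reproduced with a small helper. This is a sketch under assumptions: the slug rule below (keep Unicode word characters such as CJK, collapse everything else to `-`) is a guess, not necessarily what the script does, and `slugify`/`output_paths` are hypothetical names.

```python
import re
from datetime import date
from pathlib import Path


def slugify(text: str, max_len: int = 60) -> str:
    """Keep Unicode word characters (letters, digits, CJK); collapse the rest to '-'."""
    slug = re.sub(r"\W+", "-", text).strip("-_")
    return slug[:max_len].rstrip("-")


def output_paths(out_dir, published: date, author: str, title: str,
                 n_images: int, ext: str = ".jpg"):
    """Return (markdown_path, image_paths) following the layout above."""
    stamp = published.isoformat()  # YYYY-MM-DD
    root = Path(out_dir)
    md_path = root / f"{stamp}-{slugify(title)}.md"
    img_dir = root / "attachments" / f"{stamp}-{slugify(author)}-{slugify(title)}"
    images = [img_dir / f"{i:02d}-image{ext}" for i in range(1, n_images + 1)]
    return md_path, images
```

For example, `output_paths("Clippings", date(2026, 4, 3), "HiTw93", "Hello, World!", 2)` yields `Clippings/2026-04-03-Hello-World.md` plus `01-image.jpg` and `02-image.jpg` under `Clippings/attachments/2026-04-03-HiTw93-Hello-World/`.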
## What Gets Returned
### Full Article Mode
- **YAML Frontmatter**: source, author, date, likes, retweets, bookmarks
- **Markdown Content**: Full article text with local image references
- **Attachments**: All downloaded images in dedicated folder
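Turning remote images into local references comes down to two passes over the Markdown: collect image URLs in order (which fixes the `01-`, `02-` numbering), then swap each downloaded URL for its local path. A regex-based sketch, assuming images appear in standard `![alt](url)` syntax; it does not handle URLs containing parentheses, and both function names are hypothetical.

```python
import re

# Standard Markdown image syntax: ![alt](url) or ![alt](url "title")
IMAGE_RE = re.compile(r'!\[([^\]]*)\]\((\S+?)(?:\s+"[^"]*")?\)')


def image_urls(markdown: str) -> list:
    """Image URLs in order of first appearance, duplicates removed."""
    return list(dict.fromkeys(url for _alt, url in IMAGE_RE.findall(markdown)))


def localize_images(markdown: str, local_paths: dict) -> str:
    """Point downloaded images at local files; leave other images untouched."""
    def swap(m):
        alt, url = m.group(1), m.group(2)
        if url not in local_paths:
            return m.group(0)  # not downloaded: keep the original text
        return f"![{alt}]({local_paths[url]})"  # note: drops any title text
    return IMAGE_RE.sub(swap, markdown)
```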
### Simple Mode
- **Title**: Post author and content preview
- **URL Source**: Original tweet link
- **Published Time**: GMT timestamp
- **Markdown Content**: Text with remote media URLs
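When the simple-mode output is post-processed, the labeled fields can be split off the body. A sketch assuming the reader response begins with `Title:`, `URL Source:`, and `Published Time:` lines followed by a `Markdown Content:` marker, matching the field names listed above; `parse_reader_response` is a hypothetical helper, and it splits only at the first `Markdown Content:` occurrence.

```python
def parse_reader_response(text: str) -> dict:
    """Split reader output into its labeled header fields and the Markdown body."""
    head, marker, body = text.partition("Markdown Content:")
    fields = {}
    for line in head.splitlines():
        key, colon, value = line.partition(":")  # split at the first colon only
        if colon and key.strip():
            fields[key.strip()] = value.strip()
    fields["Markdown Content"] = body.strip() if marker else ""
    return fields
```

Splitting each header line at its first colon keeps titles that themselves contain colons intact.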
## URL Formats Supported
- `https://x.com/USER/status/ID` (posts)
- `https://x.com/USER/article/ID` (long-form articles)
- `https://twitter.com/USER/status/ID` (legacy)
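A script that accepts these URLs might classify them up front. The sketch below is a hypothetical validator, not the bundled scripts' logic: it accepts either host for both path kinds (slightly looser than the list above), treats the ID as an opaque path segment, and rewriting legacy `twitter.com` links to `x.com` is this sketch's own choice.

```python
import re
from typing import NamedTuple, Optional


class XUrl(NamedTuple):
    user: str
    kind: str     # "status" (post) or "article" (long-form)
    post_id: str  # kept as an opaque string


# x.com or twitter.com (optional www.), then /USER/status|article/ID
URL_RE = re.compile(
    r"^https?://(?:www\.)?(?:x|twitter)\.com/([A-Za-z0-9_]+)/(status|article)/([^/?#]+)"
)


def parse_x_url(url: str) -> Optional[XUrl]:
    """Return the URL's parts, or None if it is not a supported shape."""
    m = URL_RE.match(url.strip())
    return XUrl(*m.groups()) if m else None


def canonical(url: str) -> Optional[str]:
    """Rewrite legacy twitter.com links to x.com and drop query strings."""
    p = parse_x_url(url)
    return f"https://x.com/{p.user}/{p.kind}/{p.post_id}" if p else None
```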
## Scripts
### fetch_article.py
Full-featured article fetcher with image download:
```bash
uv run --with pyyaml python scripts/fetch_article.py <url> [output_dir]
```
### fetch_tweet.py
Simple text-only fetcher using Jina API:
```bash
python scripts/fetch_tweet.py <tweet_url> [output_file]
```
### fetch_tweets.sh
Batch fetch multiple tweets (Jina API):
```bash
scripts/fetch_tweets.sh <url1> <url2> ...
```
## Migration from Jina API
Old workflow:
```bash
curl "https://r.jina.ai/https://x.com/..."
# Manual image extraction and download
```
New workflow:
```bash
uv run --with pyyaml python scripts/fetch_article.py <url>
# Automatic image download, complete Markdown
```
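The "automatic image download" in the new workflow amounts to fetching each image URL in article order and saving it under a numbered name. A standard-library sketch; `download_images` is hypothetical, and falling back to `.jpg` when the URL path has no extension is an assumption. A failed image is reported and skipped rather than aborting the article, and it still consumes its number so the remaining files keep their article order.

```python
import shutil
import urllib.parse
import urllib.request
from pathlib import Path


def download_images(urls, dest_dir, timeout: float = 30) -> dict:
    """Save each URL as dest_dir/01-image.<ext>, 02-image.<ext>, ...

    Returns {url: local Path} for the images that downloaded successfully.
    """
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    saved = {}
    for i, url in enumerate(urls, start=1):
        # Assumption: default to .jpg when the URL path carries no extension.
        ext = Path(urllib.parse.urlparse(url).path).suffix or ".jpg"
        target = dest / f"{i:02d}-image{ext}"
        try:
            with urllib.request.urlopen(url, timeout=timeout) as resp, open(target, "wb") as fh:
                shutil.copyfileobj(resp, fh)
        except OSError as exc:  # URLError, HTTPError and timeouts are all OSErrors
            print(f"  failed {url}: {exc}")
            continue
        print(f"  saved {target.name}")
        saved[url] = target
    return saved
```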