feat: add douban-skill + enhance skill-creator with development methodology

New skill: douban-skill
- Full export of Douban (豆瓣) book/movie/music/game collections via Frodo API
- RSS incremental sync for daily updates
- Python stdlib only, zero dependencies, cross-platform (macOS/Windows/Linux)
- Documented 7 failed approaches (PoW anti-scraping) and why Frodo API is the only working solution
- Pre-flight user validation, KeyboardInterrupt handling, pagination bug fix

skill-creator enhancements:
- Add development methodology reference (8-phase process with prior art research,
  counter review, and real failure case studies)
- Sync upstream changes: improve_description.py now uses `claude -p` instead of
  Anthropic SDK (no ANTHROPIC_API_KEY needed), remove stale "extended thinking" ref
- Add "Updating an existing skill" guidance to Claude.ai and Cowork sections
- Restore test case heuristic guidance for objective vs subjective skills

README updates:
- Document fork advantages vs upstream with quality comparison table (65 vs 42)
- Bilingual (EN + ZH-CN) with consistent content

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Author: daymade
Date: 2026-04-04 12:36:51 +08:00
Parent: cafabd753b
Commit: 28cd6bd813
11 changed files with 1186 additions and 73 deletions

douban-skill/SKILL.md

@@ -0,0 +1,131 @@
---
name: douban-skill
description: >
  Export and sync Douban (豆瓣) book/movie/music/game collections to local CSV files via Frodo API.
  Supports full export (all history) and RSS incremental sync (recent items).
  Use when the user wants to export Douban reading/watching/listening/gaming history,
  back up their Douban data, set up incremental sync, or mentions 豆瓣/douban collections.
  Triggers on: 豆瓣, douban, 读书记录, 观影记录, 书影音, 导出豆瓣, export, backup, sync, collection.
---
# Douban Collection Export
Export Douban user collections (books, movies, music, games) to CSV files.
Douban has no official data export; the official API shut down in 2018.
## What This Skill Can Do
- Full export of all book/movie/music/game collections via Frodo API
- RSS incremental sync for daily updates (last ~10 items)
- CSV output with UTF-8 BOM (Excel-compatible), cross-platform (macOS/Windows/Linux)
- No login, no cookies, no browser required
- Pre-flight user ID validation (fail fast on wrong ID)
## What This Skill Cannot Do
- Cannot export reviews (长评), notes (读书笔记), or broadcasts (广播)
- Cannot filter by single category in one run (exports all 4 types together)
- Cannot access private profiles (returns 0 items silently)
## Why Frodo API (Do NOT Use Web Scraping)
Douban uses PoW (Proof of Work) challenges on web pages, blocking all HTTP scraping.
We tested 7 approaches — only the Frodo API works. **Do NOT attempt** web scraping,
`browser_cookie3`+`requests`, `curl` with cookies, or Jina Reader.
See [references/troubleshooting.md](references/troubleshooting.md) for the complete
failure log of all 7 tested approaches and why each failed.
## Security & Privacy
The API key and HMAC secret in the script are Douban's **public mobile app credentials**,
extracted from the APK. They are shared by all Douban app users and do not identify you.
No personal credentials are used or stored. Data is fetched only from `frodo.douban.com`.
## Full Export (Primary Method)
```bash
DOUBAN_USER=<user_id> python3 scripts/douban-frodo-export.py
```
**Finding the user ID:** Profile URL `douban.com/people/<ID>/` — the ID is after `/people/`.
If the user provides a full URL, the script auto-extracts the ID.
**Environment variables:**
- `DOUBAN_USER` (required): Douban user ID (alphanumeric or numeric, or full profile URL)
- `DOUBAN_OUTPUT_DIR` (optional): Override output directory
**Default output** (auto-detected per platform):
- macOS: `~/Downloads/douban-sync/<user_id>/`
- Windows: `%USERPROFILE%\Downloads\douban-sync\<user_id>\`
- Linux: `~/Downloads/douban-sync/<user_id>/`
**Dependencies:** Python 3.6+ standard library only (works with `python3` or `uv run`).
**Example console output:**
```
Douban Export for user: your_douban_id
Output directory: /Users/you/Downloads/douban-sync/your_douban_id
=== 读过 (book) ===
Total: 639
Fetched 0-50 (50/639)
Fetched 50-100 (100/639)
...
Fetched 597-639 (639/639)
Collected: 639
=== 在读 (book) ===
Total: 75
...
--- Writing CSV files ---
书.csv: 996 rows
影视.csv: 238 rows
音乐.csv: 0 rows
游戏.csv: 0 rows
Done! 1234 total items exported to /Users/you/Downloads/douban-sync/your_douban_id
```
## RSS Incremental Sync (Complementary)
```bash
DOUBAN_USER=<user_id> node scripts/douban-rss-sync.mjs
```
RSS returns only the latest ~10 items (no pagination). Use Full Export first, then RSS for daily updates.
## Output Format
Four CSV files per user:
```
Downloads/douban-sync/<user_id>/
├── 书.csv (读过 + 在读 + 想读)
├── 影视.csv (看过 + 在看 + 想看)
├── 音乐.csv (听过 + 在听 + 想听)
└── 游戏.csv (玩过 + 在玩 + 想玩)
```
Columns: `title, url, date, rating, status, comment`
- `rating`: ★ to ★★★★★ (empty if unrated)
- `date`: YYYY-MM-DD (when the user marked it)
- Safe to run multiple times (overwrites with fresh data)
- Row counts may be slightly below Douban's displayed count due to delisted items
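A minimal stdlib sketch for reading an exported file back (the `utf-8-sig` codec transparently strips the BOM the CSV files carry for Excel compatibility; the path in the usage comment is illustrative):

```python
import csv

def load_rows(path):
    """Read an exported Douban CSV back into a list of dicts.

    'utf-8-sig' strips the UTF-8 BOM that the exporter writes
    for Excel compatibility; plain 'utf-8' would leave it glued
    to the first header name.
    """
    with open(path, newline='', encoding='utf-8-sig') as f:
        return list(csv.DictReader(f))

# Usage (path is illustrative):
#   rows = load_rows('书.csv')
#   rated = [r for r in rows if r['rating']]
```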
## Workflow
1. Ask for Douban user ID (from profile URL, or accept full URL)
2. Run: `DOUBAN_USER=<id> python3 scripts/douban-frodo-export.py`
3. Verify: row counts in the console output should match the totals reported for each category; cross-check with `wc -l <output_dir>/*.csv` (each file has one extra header line)
4. (Optional) Set up RSS sync for daily incremental updates
## Troubleshooting
See [references/troubleshooting.md](references/troubleshooting.md) for:
- Frodo API auth details (HMAC-SHA1 signature computation)
- Common errors (code 996 signature error, rate limits, pagination quirks)
- Complete failure log of all 7 tested approaches with root causes
- Alternative approaches (豆伴 extension, Tampermonkey script, browser console)
- API endpoint reference with response format

douban-skill/references/troubleshooting.md

@@ -0,0 +1,265 @@
# Troubleshooting & Technical Reference
## How Frodo API Auth Works
The Frodo API is Douban's mobile app backend at `frodo.douban.com`. It uses HMAC-SHA1 signature
authentication instead of the PoW challenges used on web pages.
**Signature computation:**
1. Build raw string: `GET` + `&` + URL-encoded(**path only**) + `&` + timestamp
2. HMAC-SHA1 with secret key `bf7dddc7c9cfe6f7`
3. Base64-encode the result → this is `_sig`
**Critical:** Sign only the URL **path** (e.g., `/api/v2/user/xxx/interests`), never the
full URL with query parameters. This was our first signature error — code 996.
**Required query parameters:**
- `apiKey`: `0dad551ec0f84ed02907ff5c42e8ec70` (Douban mobile app's public API key)
- `_ts`: Unix timestamp in **seconds** (string)
- `_sig`: The computed HMAC-SHA1 signature
- `os_rom`: `android`
**Required headers:**
- `User-Agent`: Must look like a Douban Android app client string
**Python implementation:**
```python
import hmac, hashlib, base64, urllib.parse

def compute_signature(url_path, timestamp):
    raw = '&'.join(['GET', urllib.parse.quote(url_path, safe=''), timestamp])
    sig = hmac.new(b'bf7dddc7c9cfe6f7', raw.encode(), hashlib.sha1)
    return base64.b64encode(sig.digest()).decode()
```
## Common Errors
### Signature Error (code 996)
```json
{"msg": "invalid_request_996", "code": 996}
```
**Cause:** The `_sig` parameter doesn't match the expected value.
**Debug checklist:**
1. Are you signing only the **path**, not the full URL with query params?
2. Does `_ts` in the signature match `_ts` in the query params exactly?
3. Is `_ts` a string of Unix seconds (not milliseconds)?
4. Are you using `urllib.parse.quote(path, safe='')` (encoding `/` as `%2F`)?
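As a quick sanity check for item 1 on the list, the snippet below (using the public secret documented above, with an illustrative path and a fixed timestamp) shows that including the query string in the signed string produces a different `_sig` than signing the path alone:

```python
import hmac, hashlib, base64, urllib.parse

def sign(path, ts, secret=b'bf7dddc7c9cfe6f7'):
    """Compute the Frodo `_sig` for a GET request (path-only signing)."""
    raw = '&'.join(['GET', urllib.parse.quote(path, safe=''), ts])
    return base64.b64encode(hmac.new(secret, raw.encode(), hashlib.sha1).digest()).decode()

path = '/api/v2/user/example/interests'  # illustrative user ID
ts = '1700000000'                        # fixed timestamp for reproducibility
good = sign(path, ts)                # correct: sign the path only
bad = sign(path + '?type=book', ts)  # wrong: query string included -> code 996
```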
### Pagination Returns Fewer Items Than Expected
Some pages return fewer than the requested `count` (e.g., 48 instead of 50). This happens
when items have been delisted from Douban's catalog but still count toward the total.
**This was our biggest silent bug.** The first version of the export script used
`len(page_items) < count_per_page` as the stop condition. Result: only 499 out of 639
books were exported, with no error message. The fix:
```python
# WRONG: stops early when a page has fewer items due to delisted content
if len(interests) < count_per_page:
    break

# CORRECT: check against the total count reported by the API
if len(all_items) >= total:
    break
start += len(interests)  # advance by actual count, not page_size
```
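The difference between the two stop conditions can be replayed offline. This hypothetical `collect` helper simulates a 100-item collection whose first page comes back short:

```python
def collect(pages, total, stop_on_short_page):
    """Replay the pagination loop offline. `pages` are the API's per-page
    item lists; a page may be short when delisted entries still count
    toward the reported `total`."""
    items = []
    for page in pages:
        items.extend(page)
        if stop_on_short_page and len(page) < 50:
            break  # WRONG: a short page ends the loop with items missing
        if len(items) >= total:
            break  # CORRECT: compare against the API-reported total
    return items

# Hypothetical 100-item collection; the first page is short by 2 delisted items.
pages = [['x'] * 48, ['x'] * 50, ['x'] * 2]
```

With the wrong condition the loop stops after the first page (48 items, no error); with the correct one it fetches all 100.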
### Rating Scale Confusion
The Frodo API returns **two different ratings** per item:
| Field | Scale | Meaning |
|-------|-------|---------|
| `interest.rating` | `{value: 1-5, max: 5}` | **User's personal rating** |
| `subject.rating` | `{value: 0-10, max: 10}` | Douban community average |
Our first version divided all values by 2, which halved the user's rating (2 stars → 1 star).
The fix: check `max` field to determine scale.
```python
# Correct conversion
if max_val <= 5:
    stars = int(val)      # value is already 1-5
else:
    stars = int(val / 2)  # value is 2-10, convert to 1-5
```
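The same scale check as a standalone helper (names are illustrative, mirroring the fix above):

```python
def to_stars(rating):
    """Scale-aware conversion of a Frodo rating dict to a 1-5 star count.

    `interest.rating` uses max=5 (user's own rating); `subject.rating`
    and some older entries use max=10, which must be halved.
    """
    val, max_val = rating.get('value', 0), rating.get('max', 5)
    if not val:
        return 0
    return int(val) if max_val <= 5 else int(val / 2)
```

Note that a user's 2-star rating stays 2 stars; dividing every value by 2 was the original bug.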
### HTTP 403 / Rate Limiting
The Frodo API is generally tolerant, but excessive requests may trigger rate limiting.
**Tested intervals:**
- 1.5s between pages + 2s between categories: 1234 items exported without issues
- 0s (no delay): Not tested, not recommended
If you hit 403, increase delays to 3s/5s and retry after a few minutes.
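The export script's retry schedule (5s base, doubling per retry) can be sketched as:

```python
def backoff_delays(max_retries=3, base=5):
    """Retry wait times used by the export script: base * 2**(n-1) seconds,
    i.e. 5s, 10s, 20s for the default three retries."""
    return [base * 2 ** (n - 1) for n in range(1, max_retries + 1)]
```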
## Detailed Failure Log: All 7 Tested Approaches
### Approach 1: `requests` + `browser_cookie3` (Python)
**What we tried:** Extract Chrome cookies via `browser_cookie3`, use `requests` with those cookies.
**What happened:**
1. First request succeeded — we saw "639 books" in the page title
2. Subsequent requests returned "禁止访问" (Forbidden) page
3. The HTML contained no items despite HTTP 200 status
**Root cause:** Douban's PoW challenge. The first request sometimes passes (cached/grace period),
but subsequent requests trigger the PoW redirect to `sec.douban.com`. Python `requests` cannot
execute the SHA-512 proof-of-work JavaScript.
### Approach 2: `curl` with browser cookies
**What we tried:** Export cookies from Chrome, use `curl` with full browser headers (User-Agent,
Accept, Referer, Accept-Language).
**What happened:** HTTP 302 redirect to `https://www.douban.com/misc/sorry?original-url=...`
**Root cause:** Same PoW issue. Even with `NO_PROXY` set to bypass local proxy, the IP was
already rate-limited from approach 1's requests.
### Approach 3: Jina Reader (`r.jina.ai`)
**What we tried:** `curl -s "https://r.jina.ai/https://book.douban.com/people/<user_id>/collect"`
**What happened:** HTTP 200 but content was "403 Forbidden" — Jina's server got blocked.
**Root cause:** Jina's scraping infrastructure also cannot solve Douban's PoW challenges.
### Approach 4: Chrome DevTools MCP (Playwright browser)
**What we tried:** Navigate to Douban pages in the Playwright browser via Chrome DevTools MCP.
Injected cookies via `document.cookie` in evaluate_script.
**What happened:**
1. `mcp__chrome-devtools__navigate_page` → page title was "403 Forbidden"
2. After cookie injection → still redirected to `/misc/sorry`
**Root cause:** The Chrome DevTools MCP connects to a Playwright browser instance, not the
user's actual Chrome. Even after injecting cookies, the IP was already banned from earlier
requests. Also, HttpOnly cookies (like `dbcl2`) can't be set via `document.cookie`.
### Approach 5: `opencli douban marks`
**What we tried:** `opencli douban marks --uid <user_id> --status all --limit 0 -f csv`
**What happened:** **Partial success** — exported 24 movie records successfully.
**Limitation:** `opencli douban` only implements `marks` (movies). No book/music/game support.
The `opencli generate` and `opencli cascade` commands failed to discover APIs for
`book.douban.com` because Douban books use server-rendered HTML with no discoverable API.
### Approach 6: Agent Reach
**What we tried:** Installed `agent-reach` (17-platform CLI tool). Checked for Douban support.
**What happened:** Agent Reach has no Douban channel. Its web reader (Jina) also gets 403.
### Approach 7: Node.js HTTP scraper (from douban-sync-skill)
**What we tried:** The `douban-scraper.mjs` from the cosformula/douban-sync-skill.
**Status:** User rejected the command before it ran. Based on the prior failures it would hit
the same PoW blocking: the script uses `fetch()` with a fake User-Agent, the same pattern
that approaches 1-3 already proved does not work.
## Alternative Approaches (Not Blocked)
These approaches work but have different tradeoffs compared to the Frodo API:
### 豆伴 (Tofu) Chrome Extension (605 stars)
- GitHub: `doufen-org/tofu`
- Uses Douban's **Rexxar API** (`m.douban.com/rexxar/api/v2/user/{uid}/interests`)
- Most comprehensive: backs up books, movies, music, games, reviews, notes, photos, etc.
- **Current status (April 2026):** Mainline v0.12.x is broken due to MV3 migration + anti-scraping.
PR #121 (v0.13.0) fixes both issues but is not yet merged.
- **Risk:** Makes many API calls as logged-in user → may trigger account lockout
### Tampermonkey Userscript (bambooom/douban-backup, 162 stars)
- Greasemonkey/Tampermonkey: `https://greasyfork.org/en/scripts/420999`
- Runs inside real browser → inherits PoW-solved session
- Adds "export" button on collection pages → auto-paginates → downloads CSV
- Suitable for one-time manual export
### Browser Console Script (built into old skill)
- Paste `fetch()`-based extraction script into browser DevTools console
- Zero blocking risk (same-origin request from authenticated session)
- Most manual approach: the user must paste the script into the console and copy the results from the clipboard
## API Endpoint Reference
### User Interests (Collections)
```
GET https://frodo.douban.com/api/v2/user/{user_id}/interests
    ?type={book|movie|music|game}
    &status={done|doing|mark}
    &start={offset}
    &count={page_size, max 50}
    &apiKey=0dad551ec0f84ed02907ff5c42e8ec70
    &_ts={unix_timestamp_seconds}
    &_sig={hmac_sha1_signature}
    &os_rom=android
```
**Response:**
```json
{
  "count": 50,
  "start": 0,
  "total": 639,
  "interests": [
    {
      "comment": "短评文本",
      "rating": {"value": 4, "max": 5, "star_count": 4.0},
      "create_time": "2026-03-21 18:23:10",
      "status": "done",
      "id": 4799352304,
      "subject": {
        "id": "36116375",
        "title": "书名",
        "url": "https://book.douban.com/subject/36116375/",
        "rating": {"value": 7.8, "max": 10, "count": 14}
      }
    }
  ]
}
```
**Important distinctions:**
- `interest.rating` = user's personal rating (max 5)
- `subject.rating` = Douban community average (max 10)
- `interest.create_time` = when the user marked it (not the item's publish date)
- `status`: `done` = 读过/看过/听过/玩过, `doing` = 在读/在看/在听/在玩, `mark` = 想读/想看/想听/想玩
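Pulling the two ratings apart from the sample response above (a sketch with field access only, no error handling):

```python
# A trimmed-down interest object, matching the sample response above.
interest = {
    "rating": {"value": 4, "max": 5, "star_count": 4.0},  # user's own rating
    "status": "done",
    "subject": {
        "title": "书名",
        "rating": {"value": 7.8, "max": 10, "count": 14},  # community average
    },
}

user_stars = interest["rating"]["value"]                # out of 5
community_avg = interest["subject"]["rating"]["value"]  # out of 10
```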
### Other Known Frodo Endpoints (Not Used by This Skill)
| Endpoint | Returns |
|----------|---------|
| `/api/v2/book/{id}` | Book detail |
| `/api/v2/movie/{id}` | Movie detail |
| `/api/v2/group/{id}/topics` | Group discussion topics |
| `/api/v2/group/topic/{id}` | Single topic with comments |
| `/api/v2/subject_collection/{type}/items` | Douban curated lists |
### Mouban Proxy Service (Third-Party)
`mouban.mythsman.com` is a Go service that pre-crawls Douban data. If a user has been indexed,
it returns data instantly without hitting Douban directly. Endpoints:
| Endpoint | Returns |
|----------|---------|
| `GET /guest/check_user?id={douban_id}` | User profile + counts |
| `GET /guest/user_book?id={id}&action={wish\|do\|collect}` | Book entries |
| `GET /guest/user_movie?id={id}&action=...` | Movie entries |
**Caveat:** Data freshness depends on when the service last crawled the user. First request
for a new user triggers a background crawl (takes minutes to hours). Third-party dependency.

douban-skill/scripts/douban-frodo-export.py

@@ -0,0 +1,329 @@
#!/usr/bin/env python3
"""
Douban Collection Full Export via Frodo API (Mobile App Backend)
Exports all book/movie/music/game collections to CSV files.
No login or cookies required — uses HMAC-SHA1 signature auth.
The API key and HMAC secret are Douban's mobile app credentials, extracted from
the public APK. They are the same for all users and do not identify you. No
personal credentials are used or stored. Data is fetched only from frodo.douban.com.
Usage:
    DOUBAN_USER=<user_id> python3 douban-frodo-export.py
    DOUBAN_USER=<user_id> DOUBAN_OUTPUT_DIR=/custom/path python3 douban-frodo-export.py

Environment:
    DOUBAN_USER (required): Douban user ID from profile URL
    DOUBAN_OUTPUT_DIR (optional): Override output directory
"""
import hmac
import hashlib
import base64
import csv
import json
import os
import platform
import re
import socket
import sys
import time
import urllib.parse
import urllib.request
import urllib.error
# --- Frodo API Auth ---
# Public credentials from the Douban Android APK, shared by all app users.
API_KEY = '0dad551ec0f84ed02907ff5c42e8ec70'
HMAC_SECRET = b'bf7dddc7c9cfe6f7'
BASE_URL = 'https://frodo.douban.com'
USER_AGENT = (
    'api-client/1 com.douban.frodo/7.22.0.beta9(231) Android/23 '
    'product/Mate40 vendor/HUAWEI model/Mate40 brand/HUAWEI '
    'rom/android network/wifi platform/AndroidPad'
)
# --- Rate Limiting ---
# 1.5s between pages, 2s between categories. Tested with 1200+ items.
PAGE_DELAY = 1.5
CATEGORY_DELAY = 2.0
ITEMS_PER_PAGE = 50
MAX_PAGES_SAFETY = 500  # Guard against infinite pagination loops
# --- Category Definitions ---
CATEGORIES = [
    ('book', 'done', '读过', '书.csv'),
    ('book', 'doing', '在读', '书.csv'),
    ('book', 'mark', '想读', '书.csv'),
    ('movie', 'done', '看过', '影视.csv'),
    ('movie', 'doing', '在看', '影视.csv'),
    ('movie', 'mark', '想看', '影视.csv'),
    ('music', 'done', '听过', '音乐.csv'),
    ('music', 'doing', '在听', '音乐.csv'),
    ('music', 'mark', '想听', '音乐.csv'),
    ('game', 'done', '玩过', '游戏.csv'),
    ('game', 'doing', '在玩', '游戏.csv'),
    ('game', 'mark', '想玩', '游戏.csv'),
]
URL_PREFIX = {
    'book': 'https://book.douban.com/subject/',
    'movie': 'https://movie.douban.com/subject/',
    'music': 'https://music.douban.com/subject/',
    'game': 'https://www.douban.com/game/',
}
CSV_FIELDS = ['title', 'url', 'date', 'rating', 'status', 'comment']
def get_download_dir():
    """Get the platform-appropriate Downloads directory."""
    system = platform.system()
    if system == 'Darwin':
        return os.path.expanduser('~/Downloads')
    elif system == 'Windows':
        return os.path.join(os.environ.get('USERPROFILE', os.path.expanduser('~')), 'Downloads')
    else:
        return os.path.expanduser('~/Downloads')

def get_output_dir(user_id):
    """Determine output directory from env or platform default."""
    base = os.environ.get('DOUBAN_OUTPUT_DIR')
    if not base:
        base = os.path.join(get_download_dir(), 'douban-sync')
    return os.path.join(base, user_id)

def compute_signature(url_path, timestamp):
    """Compute Frodo API HMAC-SHA1 signature.

    Signs: METHOD & url_encoded_path & timestamp (path only, no query params).
    """
    raw = '&'.join(['GET', urllib.parse.quote(url_path, safe=''), timestamp])
    sig = hmac.new(HMAC_SECRET, raw.encode(), hashlib.sha1)
    return base64.b64encode(sig.digest()).decode()
def fetch_json(url, params):
    """Make an authenticated GET request to the Frodo API.

    Returns (data_dict, status_code). Catches HTTP errors, network errors,
    and timeouts; all return a synthetic error dict so the caller can retry.
    """
    query = urllib.parse.urlencode(params)
    full_url = f'{url}?{query}'
    req = urllib.request.Request(full_url, headers={'User-Agent': USER_AGENT})
    try:
        with urllib.request.urlopen(req, timeout=15) as resp:
            return json.loads(resp.read().decode('utf-8')), resp.status
    except urllib.error.HTTPError as e:
        body = e.read().decode('utf-8', errors='replace')[:200]
        return {'error': body, 'code': e.code}, e.code
    except urllib.error.URLError as e:
        return {'error': f'Network error: {e.reason}'}, 0
    except socket.timeout:
        return {'error': 'Request timed out'}, 0
    except json.JSONDecodeError as e:
        return {'error': f'Invalid JSON response: {e}'}, 0
def preflight_check(user_id):
    """Verify user exists by fetching one page of book interests.

    Returns True if the user has any data, False if the user ID appears invalid.
    Prints a warning and continues if the check itself fails (network issue).
    """
    api_path = f'/api/v2/user/{user_id}/interests'
    ts = str(int(time.time()))
    sig = compute_signature(api_path, ts)
    params = {
        'type': 'book', 'status': 'done', 'start': 0, 'count': 1,
        'apiKey': API_KEY, '_ts': ts, '_sig': sig, 'os_rom': 'android',
    }
    data, code = fetch_json(f'{BASE_URL}{api_path}', params)
    if code == 0:
        print('Warning: Could not verify user ID (network issue). Proceeding anyway.')
        return True
    if code != 200:
        print(f'Error: API returned HTTP {code} for user "{user_id}".')
        print('  Check that the user ID is correct (from douban.com/people/<ID>/).')
        return False
    total = data.get('total', -1)
    if total == -1:
        print('Warning: Unexpected API response. Proceeding anyway.')
        return True
    return True
def fetch_all_interests(user_id, type_name, status):
    """Fetch all items for a given type+status combination.

    Paginates through the API, checking against the reported total
    (not page size) to handle pages with fewer items due to delisted content.
    """
    api_path = f'/api/v2/user/{user_id}/interests'
    all_items = []
    start = 0
    total = None
    retries = 0
    max_retries = 3
    page_count = 0
    while page_count < MAX_PAGES_SAFETY:
        page_count += 1
        ts = str(int(time.time()))
        sig = compute_signature(api_path, ts)
        params = {
            'type': type_name, 'status': status,
            'start': start, 'count': ITEMS_PER_PAGE,
            'apiKey': API_KEY, '_ts': ts, '_sig': sig, 'os_rom': 'android',
        }
        data, status_code = fetch_json(f'{BASE_URL}{api_path}', params)
        if status_code != 200:
            retries += 1
            if retries > max_retries:
                print(f'  Error: HTTP {status_code} after {max_retries} retries, stopping.')
                print('  See references/troubleshooting.md for common errors.')
                break
            delay = 5 * (2 ** (retries - 1))
            print(f'  HTTP {status_code}, retry {retries}/{max_retries}, waiting {delay}s...')
            time.sleep(delay)
            continue
        retries = 0
        if total is None:
            total = data.get('total', 0)
            if total == 0:
                return []
            print(f'  Total: {total}')
        interests = data.get('interests', [])
        if not interests:
            break
        all_items.extend(interests)
        print(f'  Fetched {start}-{start + len(interests)} ({len(all_items)}/{total})')
        if len(all_items) >= total:
            break
        start += len(interests)
        time.sleep(PAGE_DELAY)
    return all_items
def extract_rating(interest):
    """Convert Frodo API rating to star string.

    Frodo returns {value: N, max: 5} where N is 1-5.
    Some older entries may use max=10 scale (value 2-10).
    API values are typically integers; round() handles any edge cases.
    """
    r = interest.get('rating')
    if not r or not isinstance(r, dict):
        return ''
    val = r.get('value', 0)
    max_val = r.get('max', 5)
    if not val:
        return ''
    stars = round(val) if max_val <= 5 else round(val / 2)
    return '★' * max(0, min(5, stars))
def interest_to_row(interest, type_name, status_cn):
    """Convert a single Frodo API interest object to a CSV row dict."""
    subject = interest.get('subject', {})
    sid = subject.get('id', '')
    prefix = URL_PREFIX.get(type_name, 'https://www.douban.com/subject/')
    url = f'{prefix}{sid}/' if sid else subject.get('url', '')
    date_raw = interest.get('create_time', '') or ''
    date = date_raw[:10] if re.match(r'\d{4}-\d{2}-\d{2}', date_raw) else ''
    return {
        'title': subject.get('title', ''),
        'url': url,
        'date': date,
        'rating': extract_rating(interest),
        'status': status_cn,
        'comment': interest.get('comment', ''),
    }

def write_csv(filepath, rows):
    """Write rows to a CSV file with UTF-8 BOM for Excel compatibility."""
    with open(filepath, 'w', newline='', encoding='utf-8-sig') as f:
        writer = csv.DictWriter(f, fieldnames=CSV_FIELDS)
        writer.writeheader()
        writer.writerows(rows)
def main():
    user_id = os.environ.get('DOUBAN_USER', '').strip()
    if not user_id:
        print('Error: DOUBAN_USER environment variable is required')
        print('Usage: DOUBAN_USER=<your_douban_id> python3 douban-frodo-export.py')
        sys.exit(1)
    # Extract ID from URL if user pasted a full profile URL
    url_match = re.search(r'douban\.com/people/([A-Za-z0-9._-]+)', user_id)
    if url_match:
        user_id = url_match.group(1)
    if not re.match(r'^[A-Za-z0-9._-]+$', user_id):
        print(f'Error: DOUBAN_USER contains invalid characters: {user_id}')
        sys.exit(1)
    output_dir = get_output_dir(user_id)
    os.makedirs(output_dir, exist_ok=True)
    print(f'Douban Export for user: {user_id}')
    print(f'Output directory: {output_dir}\n')
    # Pre-flight: verify user ID is valid before spending time on full export
    if not preflight_check(user_id):
        sys.exit(1)
    # Collect data grouped by output file, then write all at the end.
    file_data = {}
    grand_total = 0
    for type_name, status, status_cn, outfile in CATEGORIES:
        print(f'=== {status_cn} ({type_name}) ===')
        items = fetch_all_interests(user_id, type_name, status)
        if outfile not in file_data:
            file_data[outfile] = []
        for item in items:
            file_data[outfile].append(interest_to_row(item, type_name, status_cn))
        count = len(items)
        grand_total += count
        if count > 0:
            print(f'  Collected: {count}\n')
        else:
            print('  (empty)\n')
        time.sleep(CATEGORY_DELAY)
    # Write CSV files
    print('--- Writing CSV files ---')
    for filename, rows in file_data.items():
        filepath = os.path.join(output_dir, filename)
        write_csv(filepath, rows)
        print(f'  {filename}: {len(rows)} rows')
    print(f'\nDone! {grand_total} total items exported to {output_dir}')
if __name__ == '__main__':
    try:
        main()
    except KeyboardInterrupt:
        print('\n\nExport interrupted by user.')
        sys.exit(130)

douban-skill/scripts/douban-rss-sync.mjs

@@ -0,0 +1,190 @@
#!/usr/bin/env node
/**
* Douban RSS → CSV incremental sync
*
* Pulls the public RSS feed, parses new entries, appends to CSV files.
* No login required. Returns only the ~10 most recent items.
* Best used for daily sync after a full Frodo API export.
*
* Usage:
* DOUBAN_USER=<user_id> node douban-rss-sync.mjs
*
* Environment:
* DOUBAN_USER (required): Douban user ID
* DOUBAN_OUTPUT_DIR (optional): Override output directory
*/
import https from 'node:https';
import http from 'node:http';
import fs from 'node:fs';
import path from 'node:path';
import os from 'node:os';
let DOUBAN_USER = process.env.DOUBAN_USER;
if (!DOUBAN_USER) { console.error('Error: DOUBAN_USER env var is required'); process.exit(1); }
// Extract ID from full URL if provided (e.g., https://www.douban.com/people/foo/)
const urlMatch = DOUBAN_USER.match(/douban\.com\/people\/([A-Za-z0-9._-]+)/);
if (urlMatch) DOUBAN_USER = urlMatch[1];
if (!/^[A-Za-z0-9._-]+$/.test(DOUBAN_USER)) { console.error('Error: DOUBAN_USER contains invalid characters'); process.exit(1); }
function getDownloadDir() {
  if (process.platform === 'win32') {
    return path.join(process.env.USERPROFILE || os.homedir(), 'Downloads');
  }
  return path.join(os.homedir(), 'Downloads');
}
const BASE_DIR = process.env.DOUBAN_OUTPUT_DIR || path.join(getDownloadDir(), 'douban-sync');
const DOUBAN_OUTPUT_DIR = path.join(BASE_DIR, DOUBAN_USER);
const STATE_FILE = path.join(DOUBAN_OUTPUT_DIR, '.douban-rss-state.json');
const RSS_URL = `https://www.douban.com/feed/people/${DOUBAN_USER}/interests`;
const CATEGORY_MAP = [
  { pattern: /^读过/, file: '书.csv', status: '读过' },
  { pattern: /^(?:在读|最近在读)/, file: '书.csv', status: '在读' },
  { pattern: /^想读/, file: '书.csv', status: '想读' },
  { pattern: /^看过/, file: '影视.csv', status: '看过' },
  { pattern: /^(?:在看|最近在看)/, file: '影视.csv', status: '在看' },
  { pattern: /^想看/, file: '影视.csv', status: '想看' },
  { pattern: /^听过/, file: '音乐.csv', status: '听过' },
  { pattern: /^(?:在听|最近在听)/, file: '音乐.csv', status: '在听' },
  { pattern: /^想听/, file: '音乐.csv', status: '想听' },
  { pattern: /^玩过/, file: '游戏.csv', status: '玩过' },
  { pattern: /^(?:在玩|最近在玩)/, file: '游戏.csv', status: '在玩' },
  { pattern: /^想玩/, file: '游戏.csv', status: '想玩' },
];
const CSV_HEADER = '\ufefftitle,url,date,rating,status,comment\n';
const RATING_MAP = { '力荐': '★★★★★', '推荐': '★★★★', '还行': '★★★', '较差': '★★', '很差': '★' };
function httpGet(url, redirects = 0) {
  if (redirects > 5) return Promise.reject(new Error('Too many redirects'));
  return new Promise((resolve, reject) => {
    const mod = url.startsWith('https') ? https : http;
    const req = mod.get(url, { headers: { 'User-Agent': 'Mozilla/5.0' }, timeout: 15000 }, res => {
      if (res.statusCode >= 300 && res.statusCode < 400 && res.headers.location) {
        return httpGet(new URL(res.headers.location, url).href, redirects + 1).then(resolve, reject);
      }
      if (res.statusCode >= 400) return reject(new Error(`HTTP ${res.statusCode} for ${url}`));
      let data = '';
      res.on('data', c => data += c);
      res.on('end', () => resolve(data));
    });
    req.on('error', reject);
    req.on('timeout', () => { req.destroy(); reject(new Error('Request timeout')); });
  });
}
function csvEscape(str) {
  if (!str) return '';
  if (str.includes(',') || str.includes('"') || str.includes('\n') || str.includes('\r')) {
    return '"' + str.replace(/"/g, '""') + '"';
  }
  return str;
}
function parseItems(xml) {
  const items = [];
  const itemRegex = /<item>([\s\S]*?)<\/item>/g;
  let match;
  while ((match = itemRegex.exec(xml)) !== null) {
    const block = match[1];
    const get = tag => {
      const m = block.match(new RegExp(`<${tag}[^>]*>(?:<!\\[CDATA\\[)?([\\s\\S]*?)(?:\\]\\]>)?<\\/${tag}>`));
      return m ? m[1].trim() : '';
    };
    const title = get('title');
    const link = get('link');
    const guid = get('guid');
    const pubDate = get('pubDate');
    const desc = get('description');
    const ratingMatch = desc.match(/推荐:\s*(力荐|推荐|还行|较差|很差)/);
    const rating = ratingMatch ? RATING_MAP[ratingMatch[1]] || '' : '';
    const commentMatch = desc.match(/短评:\s*([^<]+)/);
    const comment = commentMatch ? commentMatch[1].trim() : '';
    items.push({ title, link, guid, pubDate, rating, comment });
  }
  return items;
}
function loadState() {
  try { return JSON.parse(fs.readFileSync(STATE_FILE, 'utf8')); }
  catch { return { lastSyncGuids: [] }; }
}

function saveState(state) { fs.writeFileSync(STATE_FILE, JSON.stringify(state, null, 2)); }

function extractName(title) {
  for (const { pattern } of CATEGORY_MAP) {
    if (pattern.test(title)) return title.replace(pattern, '');
  }
  return title;
}

function isAlreadyInFile(filePath, link) {
  try {
    const content = fs.readFileSync(filePath, 'utf8');
    // Exact URL match as CSV field — avoid false positives from substring matches
    // (e.g., /subject/1234/ matching /subject/12345/)
    return content.includes(',' + link + ',') ||
      content.includes(',' + link + '\n') ||
      content.includes(',' + link + '\r');
  } catch { return false; }
}

function formatDate(pubDateStr) {
  try {
    const direct = pubDateStr.match(/(\d{4}-\d{2}-\d{2})/);
    if (direct) return direct[1];
    const d = new Date(pubDateStr);
    const cst = new Date(d.getTime() + 8 * 3600000);
    return cst.toISOString().split('T')[0];
  } catch { return ''; }
}

function ensureCsvFile(filePath) {
  if (!fs.existsSync(filePath)) {
    fs.mkdirSync(path.dirname(filePath), { recursive: true });
    fs.writeFileSync(filePath, CSV_HEADER);
  }
}

function appendToCsv(filePath, entry, status) {
  ensureCsvFile(filePath);
  const name = extractName(entry.title);
  const date = formatDate(entry.pubDate);
  const line = [csvEscape(name), csvEscape(entry.link), csvEscape(date),
    csvEscape(entry.rating), csvEscape(status), csvEscape(entry.comment)].join(',') + '\n';
  fs.appendFileSync(filePath, line);
}
async function main() {
  console.log(`Douban RSS Sync for user: ${DOUBAN_USER}`);
  console.log(`Output: ${DOUBAN_OUTPUT_DIR}\n`);
  console.log('Fetching RSS feed...');
  const xml = await httpGet(RSS_URL);
  const items = parseItems(xml);
  console.log(`Found ${items.length} items in feed`);
  const state = loadState();
  const knownGuids = new Set(state.lastSyncGuids || []);
  let newCount = 0;
  for (const item of items) {
    if (knownGuids.has(item.guid)) continue;
    const cat = CATEGORY_MAP.find(c => c.pattern.test(item.title));
    if (!cat) { console.log(`  Skip (unknown category): ${item.title}`); continue; }
    const filePath = path.join(DOUBAN_OUTPUT_DIR, cat.file);
    if (isAlreadyInFile(filePath, item.link)) { console.log(`  Skip (exists): ${item.title}`); continue; }
    console.log(`  + ${item.title} → ${cat.file}`);
    appendToCsv(filePath, item, cat.status);
    newCount++;
  }
  state.lastSyncGuids = items.map(i => i.guid);
  state.lastSync = new Date().toISOString();
  saveState(state);
  console.log(`\nDone. ${newCount} new entries added.`);
}

main().catch(err => { console.error('Error:', err.message); process.exit(1); });