feat: add douban-skill + enhance skill-creator with development methodology
New skill: douban-skill
- Full export of Douban (豆瓣) book/movie/music/game collections via Frodo API
- RSS incremental sync for daily updates
- Python stdlib only, zero dependencies, cross-platform (macOS/Windows/Linux)
- Documented 7 failed approaches (PoW anti-scraping) and why the Frodo API is the only working solution
- Pre-flight user validation, KeyboardInterrupt handling, pagination bug fix

skill-creator enhancements:
- Add development methodology reference (8-phase process with prior art research, counter review, and real failure case studies)
- Sync upstream changes: improve_description.py now uses `claude -p` instead of the Anthropic SDK (no ANTHROPIC_API_KEY needed); remove stale "extended thinking" ref
- Add "Updating an existing skill" guidance to Claude.ai and Cowork sections
- Restore test case heuristic guidance for objective vs subjective skills

README updates:
- Document fork advantages vs upstream with quality comparison table (65 vs 42)
- Bilingual (EN + ZH-CN) with consistent content

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
douban-skill/SKILL.md (new file, 131 lines)
---
name: douban-skill
description: >
  Export and sync Douban (豆瓣) book/movie/music/game collections to local CSV files via Frodo API.
  Supports full export (all history) and RSS incremental sync (recent items).
  Use when the user wants to export Douban reading/watching/listening/gaming history,
  back up their Douban data, set up incremental sync, or mentions 豆瓣/douban collections.
  Triggers on: 豆瓣, douban, 读书记录, 观影记录, 书影音, 导出豆瓣, export, backup, sync, collection.
---

# Douban Collection Export

Export Douban user collections (books, movies, music, games) to CSV files.
Douban has no official data export; the official API shut down in 2018.

## What This Skill Can Do

- Full export of all book/movie/music/game collections via the Frodo API
- RSS incremental sync for daily updates (last ~10 items)
- CSV output with UTF-8 BOM (Excel-compatible), cross-platform (macOS/Windows/Linux)
- No login, no cookies, no browser required
- Pre-flight user ID validation (fail fast on a wrong ID)

## What This Skill Cannot Do

- Cannot export reviews (长评), notes (读书笔记), or broadcasts (广播)
- Cannot filter by a single category in one run (exports all 4 types together)
- Cannot access private profiles (returns 0 items silently)

## Why Frodo API (Do NOT Use Web Scraping)

Douban uses PoW (Proof of Work) challenges on web pages, blocking all HTTP scraping.
We tested 7 approaches — only the Frodo API works. **Do NOT attempt** web scraping,
`browser_cookie3`+`requests`, `curl` with cookies, or Jina Reader.

See [references/troubleshooting.md](references/troubleshooting.md) for the complete
failure log of all 7 tested approaches and why each failed.

## Security & Privacy

The API key and HMAC secret in the script are Douban's **public mobile app credentials**,
extracted from the APK. They are shared by all Douban app users and do not identify you.
No personal credentials are used or stored. Data is fetched only from `frodo.douban.com`.

## Full Export (Primary Method)

```bash
DOUBAN_USER=<user_id> python3 scripts/douban-frodo-export.py
```

**Finding the user ID:** Profile URL `douban.com/people/<ID>/` — the ID is the segment after `/people/`.
If the user provides a full URL, the script auto-extracts the ID.
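The auto-extraction is a simple regex over whatever value the user pastes. A minimal sketch (the helper name `extract_user_id` and the sample ID `ahbei` are illustrative; the pattern mirrors the one in the export script):

```python
import re

def extract_user_id(value: str) -> str:
    """Return the Douban user ID, accepting either a bare ID or a profile URL."""
    m = re.search(r'douban\.com/people/([A-Za-z0-9._-]+)', value)
    return m.group(1) if m else value.strip()

print(extract_user_id('https://www.douban.com/people/ahbei/'))  # → ahbei
print(extract_user_id('ahbei'))                                 # → ahbei
```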
**Environment variables:**
- `DOUBAN_USER` (required): Douban user ID (alphanumeric or numeric, or full profile URL)
- `DOUBAN_OUTPUT_DIR` (optional): Override output directory

**Default output** (auto-detected per platform):
- macOS: `~/Downloads/douban-sync/<user_id>/`
- Windows: `%USERPROFILE%\Downloads\douban-sync\<user_id>\`
- Linux: `~/Downloads/douban-sync/<user_id>/`

**Dependencies:** Python 3.6+ standard library only (works with `python3` or `uv run`).

**Example console output:**
```
Douban Export for user: your_douban_id
Output directory: /Users/you/Downloads/douban-sync/your_douban_id

=== 读过 (book) ===
  Total: 639
  Fetched 0-50 (50/639)
  Fetched 50-100 (100/639)
  ...
  Fetched 597-639 (639/639)
  Collected: 639

=== 在读 (book) ===
  Total: 75
  ...

--- Writing CSV files ---
  书.csv: 996 rows
  影视.csv: 238 rows
  音乐.csv: 0 rows
  游戏.csv: 0 rows

Done! 1234 total items exported to /Users/you/Downloads/douban-sync/your_douban_id
```

## RSS Incremental Sync (Complementary)

```bash
DOUBAN_USER=<user_id> node scripts/douban-rss-sync.mjs
```

RSS returns only the latest ~10 items (no pagination). Use Full Export first, then RSS for daily updates.
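RSS entries carry the status in the item title (读过, 想看, and so on). A minimal sketch of that classification, abridged to books and movies (the prefixes match the sync script's category table; the `classify` helper is illustrative):

```python
import re

# (prefix pattern, output CSV file, status label) — abridged from the sync script's table
CATEGORY_MAP = [
    (r'^读过', '书.csv', '读过'), (r'^想读', '书.csv', '想读'),
    (r'^看过', '影视.csv', '看过'), (r'^想看', '影视.csv', '想看'),
]

def classify(title: str):
    """Map an RSS item title to (csv_file, status), or None if unrecognized."""
    for pattern, file, status in CATEGORY_MAP:
        if re.match(pattern, title):
            return file, status
    return None

print(classify('读过三体'))  # → ('书.csv', '读过')
```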
## Output Format

Four CSV files per user:

```
Downloads/douban-sync/<user_id>/
├── 书.csv (读过 + 在读 + 想读)
├── 影视.csv (看过 + 在看 + 想看)
├── 音乐.csv (听过 + 在听 + 想听)
└── 游戏.csv (玩过 + 在玩 + 想玩)
```

Columns: `title, url, date, rating, status, comment`
- `rating`: ★ to ★★★★★ (empty if unrated)
- `date`: YYYY-MM-DD (when the user marked the item)
- Safe to run multiple times (overwrites with fresh data)
- Row counts may be slightly below Douban's displayed count due to delisted items
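Writing with the `utf-8-sig` codec prepends the UTF-8 BOM, which is what makes Excel detect the encoding and render Chinese text correctly. A self-contained sketch of that output path (not the script itself):

```python
import csv
import os
import tempfile

FIELDS = ['title', 'url', 'date', 'rating', 'status', 'comment']

def write_rows(filepath, rows):
    # encoding='utf-8-sig' prepends the BOM (EF BB BF) so Excel detects UTF-8
    with open(filepath, 'w', newline='', encoding='utf-8-sig') as f:
        writer = csv.DictWriter(f, fieldnames=FIELDS)
        writer.writeheader()
        writer.writerows(rows)

path = os.path.join(tempfile.mkdtemp(), '书.csv')
write_rows(path, [{'title': '三体', 'url': '', 'date': '2026-01-01',
                   'rating': '★★★★★', 'status': '读过', 'comment': ''}])
with open(path, 'rb') as f:
    print(f.read(3) == b'\xef\xbb\xbf')  # → True
```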
## Workflow

1. Ask for the Douban user ID (from the profile URL, or accept a full URL)
2. Run: `DOUBAN_USER=<id> python3 scripts/douban-frodo-export.py`
3. Verify: row counts in the console output should match; check with `wc -l <output_dir>/*.csv`
4. (Optional) Set up RSS sync for daily incremental updates

## Troubleshooting

See [references/troubleshooting.md](references/troubleshooting.md) for:
- Frodo API auth details (HMAC-SHA1 signature computation)
- Common errors (code 996 signature error, rate limits, pagination quirks)
- Complete failure log of all 7 tested approaches with root causes
- Alternative approaches (豆伴 extension, Tampermonkey script, browser console)
- API endpoint reference with response format
douban-skill/references/troubleshooting.md (new file, 265 lines)
# Troubleshooting & Technical Reference

## How Frodo API Auth Works

The Frodo API is Douban's mobile app backend at `frodo.douban.com`. It uses HMAC-SHA1 signature
authentication instead of the PoW challenges used on web pages.

**Signature computation:**
1. Build the raw string: `GET` + `&` + URL-encoded(**path only**) + `&` + timestamp
2. HMAC-SHA1 with secret key `bf7dddc7c9cfe6f7`
3. Base64-encode the result → this is `_sig`

**Critical:** Sign only the URL **path** (e.g., `/api/v2/user/xxx/interests`), never the
full URL with query parameters. This was our first signature error — code 996.

**Required query parameters:**
- `apiKey`: `0dad551ec0f84ed02907ff5c42e8ec70` (the Douban mobile app's public API key)
- `_ts`: Unix timestamp in **seconds** (string)
- `_sig`: The computed HMAC-SHA1 signature
- `os_rom`: `android`

**Required headers:**
- `User-Agent`: Must look like a Douban Android app client string

**Python implementation:**
```python
import hmac, hashlib, base64, urllib.parse

def compute_signature(url_path, timestamp):
    raw = '&'.join(['GET', urllib.parse.quote(url_path, safe=''), timestamp])
    sig = hmac.new(b'bf7dddc7c9cfe6f7', raw.encode(), hashlib.sha1)
    return base64.b64encode(sig.digest()).decode()
```
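Putting the pieces together, the signature, timestamp, API key, and `os_rom` combine into a fully signed request URL. An offline sketch (the `signed_url` helper and the placeholder user ID `ahbei` are ours, not part of the skill):

```python
import base64
import hashlib
import hmac
import time
import urllib.parse

API_KEY = '0dad551ec0f84ed02907ff5c42e8ec70'

def compute_signature(url_path, timestamp):
    raw = '&'.join(['GET', urllib.parse.quote(url_path, safe=''), timestamp])
    sig = hmac.new(b'bf7dddc7c9cfe6f7', raw.encode(), hashlib.sha1)
    return base64.b64encode(sig.digest()).decode()

def signed_url(user_id, **query):
    """Build a fully signed Frodo interests URL (does not perform the request)."""
    path = f'/api/v2/user/{user_id}/interests'
    ts = str(int(time.time()))  # Unix seconds, not milliseconds
    query.update(apiKey=API_KEY, _ts=ts, _sig=compute_signature(path, ts), os_rom='android')
    return f'https://frodo.douban.com{path}?{urllib.parse.urlencode(query)}'

url = signed_url('ahbei', type='book', status='done', start=0, count=50)
print('_sig=' in url and '_ts=' in url)  # → True
```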
## Common Errors

### Signature Error (code 996)

```json
{"msg": "invalid_request_996", "code": 996}
```

**Cause:** The `_sig` parameter doesn't match the expected value.

**Debug checklist:**
1. Are you signing only the **path**, not the full URL with query params?
2. Does `_ts` in the signature match `_ts` in the query params exactly?
3. Is `_ts` a string of Unix seconds (not milliseconds)?
4. Are you using `urllib.parse.quote(path, safe='')` (encoding `/` as `%2F`)?

### Pagination Returns Fewer Items Than Expected

Some pages return fewer than the requested `count` (e.g., 48 instead of 50). This happens
when items have been delisted from Douban's catalog but still count toward the total.

**This was our biggest silent bug.** The first version of the export script used
`len(page_items) < count_per_page` as the stop condition. Result: only 499 out of 639
books were exported, with no error message. The fix:

```python
# WRONG: stops early when a page has fewer items due to delisted content
if len(interests) < count_per_page:
    break

# CORRECT: check against the total count reported by the API
if len(all_items) >= total:
    break
start += len(interests)  # advance by actual count, not page_size
```
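The fix in context: a self-contained sketch of the pagination loop against a stubbed API whose every page comes back short (as delisted items cause), showing the total-based stop condition still collects everything. The stub names (`fetch_page`, `fetch_all`) are illustrative:

```python
def fetch_page(start, count):
    """Stub API: a total of 7 items, but every page returns one item fewer than `count`."""
    items = list(range(7))[start:start + count]
    return {'total': 7, 'interests': items[:max(0, count - 1)]}

def fetch_all(count_per_page=3):
    all_items, start, total = [], 0, None
    while True:
        data = fetch_page(start, count_per_page)
        if total is None:
            total = data['total']
        interests = data['interests']
        if not interests:
            break
        all_items.extend(interests)
        if len(all_items) >= total:  # compare against the API's total, not page size
            break
        start += len(interests)      # advance by actual count, not page size
    return all_items

print(len(fetch_all()))  # → 7
```

With the page-size stop condition, the same stub would terminate on the first short page and silently lose items.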
### Rating Scale Confusion

The Frodo API returns **two different ratings** per item:

| Field | Scale | Meaning |
|-------|-------|---------|
| `interest.rating` | `{value: 1-5, max: 5}` | **User's personal rating** |
| `subject.rating` | `{value: 0-10, max: 10}` | Douban community average |

Our first version divided all values by 2, which halved the user's rating (2 stars → 1 star).
The fix: check the `max` field to determine the scale.

```python
# Correct conversion
if max_val <= 5:
    stars = int(val)      # value is already 1-5
else:
    stars = int(val / 2)  # value is 2-10, convert to 1-5
```
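Wrapped into a complete helper (clamped to the 1-5 star range, empty string when unrated; the `to_stars` name is ours, the logic mirrors the export script's conversion):

```python
def to_stars(rating):
    """Convert a Frodo rating dict to a star string; '' if unrated."""
    if not rating or not rating.get('value'):
        return ''
    val, max_val = rating['value'], rating.get('max', 5)
    stars = round(val) if max_val <= 5 else round(val / 2)
    return '★' * max(0, min(5, stars))

print(to_stars({'value': 4, 'max': 5}))     # → ★★★★
print(to_stars({'value': 7.8, 'max': 10}))  # → ★★★★
print(repr(to_stars(None)))                 # → ''
```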
### HTTP 403 / Rate Limiting

The Frodo API is generally tolerant, but excessive requests may trigger rate limiting.

**Tested intervals:**
- 1.5s between pages + 2s between categories: 1234 items exported without issues
- 0s (no delay): Not tested, not recommended

If you hit 403, increase the delays to 3s/5s and retry after a few minutes.

## Detailed Failure Log: All 7 Tested Approaches

### Approach 1: `requests` + `browser_cookie3` (Python)

**What we tried:** Extract Chrome cookies via `browser_cookie3`, then use `requests` with those cookies.

**What happened:**
1. The first request succeeded — we saw "639 books" in the page title
2. Subsequent requests returned a "禁止访问" (Forbidden) page
3. The HTML contained no items despite an HTTP 200 status

**Root cause:** Douban's PoW challenge. The first request sometimes passes (cached/grace period),
but subsequent requests trigger the PoW redirect to `sec.douban.com`. Python `requests` cannot
execute the SHA-512 proof-of-work JavaScript.

### Approach 2: `curl` with browser cookies

**What we tried:** Export cookies from Chrome, use `curl` with full browser headers (User-Agent,
Accept, Referer, Accept-Language).

**What happened:** HTTP 302 redirect to `https://www.douban.com/misc/sorry?original-url=...`

**Root cause:** Same PoW issue. Even with `NO_PROXY` set to bypass the local proxy, the IP was
already rate-limited from approach 1's requests.

### Approach 3: Jina Reader (`r.jina.ai`)

**What we tried:** `curl -s "https://r.jina.ai/https://book.douban.com/people/<user_id>/collect"`

**What happened:** HTTP 200, but the content was "403 Forbidden" — Jina's server got blocked.

**Root cause:** Jina's scraping infrastructure also cannot solve Douban's PoW challenges.

### Approach 4: Chrome DevTools MCP (Playwright browser)

**What we tried:** Navigate to Douban pages in the Playwright browser via Chrome DevTools MCP.
Injected cookies via `document.cookie` in evaluate_script.

**What happened:**
1. `mcp__chrome-devtools__navigate_page` → page title was "403 Forbidden"
2. After cookie injection → still redirected to `/misc/sorry`

**Root cause:** The Chrome DevTools MCP connects to a Playwright browser instance, not the
user's actual Chrome. Even after injecting cookies, the IP was already banned from earlier
requests. Also, HttpOnly cookies (like `dbcl2`) can't be set via `document.cookie`.

### Approach 5: `opencli douban marks`

**What we tried:** `opencli douban marks --uid <user_id> --status all --limit 0 -f csv`

**What happened:** **Partial success** — exported 24 movie records successfully.

**Limitation:** `opencli douban` only implements `marks` (movies). No book/music/game support.
The `opencli generate` and `opencli cascade` commands failed to discover APIs for
`book.douban.com` because Douban books use server-rendered HTML with no discoverable API.

### Approach 6: Agent Reach

**What we tried:** Installed `agent-reach` (a 17-platform CLI tool). Checked for Douban support.

**What happened:** Agent Reach has no Douban channel. Its web reader (Jina) also gets 403.

### Approach 7: Node.js HTTP scraper (from douban-sync-skill)

**What we tried:** The `douban-scraper.mjs` from cosformula/douban-sync-skill.

**Status:** The user rejected the command before it ran — based on the prior failures, it would hit
the same PoW blocking. The script uses `fetch()` with a fake User-Agent, which is exactly
what approaches 1-3 proved doesn't work.

## Alternative Approaches (Not Blocked)

These approaches work but have different tradeoffs compared to the Frodo API:

### 豆伴 (Tofu) Chrome Extension (605 stars)

- GitHub: `doufen-org/tofu`
- Uses Douban's **Rexxar API** (`m.douban.com/rexxar/api/v2/user/{uid}/interests`)
- Most comprehensive: backs up books, movies, music, games, reviews, notes, photos, etc.
- **Current status (April 2026):** Mainline v0.12.x is broken due to the MV3 migration + anti-scraping.
  PR #121 (v0.13.0) fixes both issues but is not yet merged.
- **Risk:** Makes many API calls as a logged-in user → may trigger account lockout

### Tampermonkey Userscript (bambooom/douban-backup, 162 stars)

- Greasemonkey/Tampermonkey: `https://greasyfork.org/en/scripts/420999`
- Runs inside a real browser → inherits the PoW-solved session
- Adds an "export" button on collection pages → auto-paginates → downloads CSV
- Suitable for a one-time manual export

### Browser Console Script (built into the old skill)

- Paste a `fetch()`-based extraction script into the browser DevTools console
- Zero blocking risk (same-origin request from an authenticated session)
- Most manual approach — the user must paste the script and copy from the clipboard

## API Endpoint Reference

### User Interests (Collections)

```
GET https://frodo.douban.com/api/v2/user/{user_id}/interests
    ?type={book|movie|music|game}
    &status={done|doing|mark}
    &start={offset}
    &count={page_size, max 50}
    &apiKey=0dad551ec0f84ed02907ff5c42e8ec70
    &_ts={unix_timestamp_seconds}
    &_sig={hmac_sha1_signature}
    &os_rom=android
```

**Response:**
```json
{
  "count": 50,
  "start": 0,
  "total": 639,
  "interests": [
    {
      "comment": "短评文本",
      "rating": {"value": 4, "max": 5, "star_count": 4.0},
      "create_time": "2026-03-21 18:23:10",
      "status": "done",
      "id": 4799352304,
      "subject": {
        "id": "36116375",
        "title": "书名",
        "url": "https://book.douban.com/subject/36116375/",
        "rating": {"value": 7.8, "max": 10, "count": 14}
      }
    }
  ]
}
```

**Important distinctions:**
- `interest.rating` = the user's personal rating (max 5)
- `subject.rating` = the Douban community average (max 10)
- `interest.create_time` = when the user marked the item (not the item's publish date)
- `status`: `done` = 读过/看过/听过/玩过, `doing` = 在读/在看/在听/在玩, `mark` = 想读/想看/想听/想玩

### Other Known Frodo Endpoints (Not Used by This Skill)

| Endpoint | Returns |
|----------|---------|
| `/api/v2/book/{id}` | Book detail |
| `/api/v2/movie/{id}` | Movie detail |
| `/api/v2/group/{id}/topics` | Group discussion topics |
| `/api/v2/group/topic/{id}` | Single topic with comments |
| `/api/v2/subject_collection/{type}/items` | Douban curated lists |

### Mouban Proxy Service (Third-Party)

`mouban.mythsman.com` is a Go service that pre-crawls Douban data. If a user has been indexed,
it returns data instantly without hitting Douban directly. Endpoints:

| Endpoint | Returns |
|----------|---------|
| `GET /guest/check_user?id={douban_id}` | User profile + counts |
| `GET /guest/user_book?id={id}&action={wish\|do\|collect}` | Book entries |
| `GET /guest/user_movie?id={id}&action=...` | Movie entries |

**Caveat:** Data freshness depends on when the service last crawled the user. The first request
for a new user triggers a background crawl (which takes minutes to hours). Third-party dependency.
douban-skill/scripts/douban-frodo-export.py (new file, 329 lines)
#!/usr/bin/env python3
"""
Douban Collection Full Export via Frodo API (Mobile App Backend)

Exports all book/movie/music/game collections to CSV files.
No login or cookies required — uses HMAC-SHA1 signature auth.

The API key and HMAC secret are Douban's mobile app credentials, extracted from
the public APK. They are the same for all users and do not identify you. No
personal credentials are used or stored. Data is fetched only from frodo.douban.com.

Usage:
    DOUBAN_USER=<user_id> python3 douban-frodo-export.py
    DOUBAN_USER=<user_id> DOUBAN_OUTPUT_DIR=/custom/path python3 douban-frodo-export.py

Environment:
    DOUBAN_USER (required): Douban user ID from profile URL
    DOUBAN_OUTPUT_DIR (optional): Override output directory
"""

import hmac
import hashlib
import base64
import csv
import json
import os
import platform
import re
import socket
import sys
import time
import urllib.parse
import urllib.request
import urllib.error

# --- Frodo API Auth ---
# Public credentials from the Douban Android APK, shared by all app users.
API_KEY = '0dad551ec0f84ed02907ff5c42e8ec70'
HMAC_SECRET = b'bf7dddc7c9cfe6f7'
BASE_URL = 'https://frodo.douban.com'
USER_AGENT = (
    'api-client/1 com.douban.frodo/7.22.0.beta9(231) Android/23 '
    'product/Mate40 vendor/HUAWEI model/Mate40 brand/HUAWEI '
    'rom/android network/wifi platform/AndroidPad'
)

# --- Rate Limiting ---
# 1.5s between pages, 2s between categories. Tested with 1200+ items.
PAGE_DELAY = 1.5
CATEGORY_DELAY = 2.0
ITEMS_PER_PAGE = 50
MAX_PAGES_SAFETY = 500  # Guard against infinite pagination loops

# --- Category Definitions ---
CATEGORIES = [
    ('book', 'done', '读过', '书.csv'),
    ('book', 'doing', '在读', '书.csv'),
    ('book', 'mark', '想读', '书.csv'),
    ('movie', 'done', '看过', '影视.csv'),
    ('movie', 'doing', '在看', '影视.csv'),
    ('movie', 'mark', '想看', '影视.csv'),
    ('music', 'done', '听过', '音乐.csv'),
    ('music', 'doing', '在听', '音乐.csv'),
    ('music', 'mark', '想听', '音乐.csv'),
    ('game', 'done', '玩过', '游戏.csv'),
    ('game', 'doing', '在玩', '游戏.csv'),
    ('game', 'mark', '想玩', '游戏.csv'),
]

URL_PREFIX = {
    'book': 'https://book.douban.com/subject/',
    'movie': 'https://movie.douban.com/subject/',
    'music': 'https://music.douban.com/subject/',
    'game': 'https://www.douban.com/game/',
}

CSV_FIELDS = ['title', 'url', 'date', 'rating', 'status', 'comment']


def get_download_dir():
    """Get the platform-appropriate Downloads directory."""
    system = platform.system()
    if system == 'Darwin':
        return os.path.expanduser('~/Downloads')
    elif system == 'Windows':
        return os.path.join(os.environ.get('USERPROFILE', os.path.expanduser('~')), 'Downloads')
    else:
        return os.path.expanduser('~/Downloads')


def get_output_dir(user_id):
    """Determine output directory from env or platform default."""
    base = os.environ.get('DOUBAN_OUTPUT_DIR')
    if not base:
        base = os.path.join(get_download_dir(), 'douban-sync')
    return os.path.join(base, user_id)


def compute_signature(url_path, timestamp):
    """Compute Frodo API HMAC-SHA1 signature.

    Signs: METHOD & url_encoded_path & timestamp (path only, no query params).
    """
    raw = '&'.join(['GET', urllib.parse.quote(url_path, safe=''), timestamp])
    sig = hmac.new(HMAC_SECRET, raw.encode(), hashlib.sha1)
    return base64.b64encode(sig.digest()).decode()


def fetch_json(url, params):
    """Make an authenticated GET request to the Frodo API.

    Returns (data_dict, status_code). Catches HTTP errors, network errors,
    and timeouts — all return a synthetic error dict so the caller can retry.
    """
    query = urllib.parse.urlencode(params)
    full_url = f'{url}?{query}'
    req = urllib.request.Request(full_url, headers={'User-Agent': USER_AGENT})
    try:
        with urllib.request.urlopen(req, timeout=15) as resp:
            return json.loads(resp.read().decode('utf-8')), resp.status
    except urllib.error.HTTPError as e:
        body = e.read().decode('utf-8', errors='replace')[:200]
        return {'error': body, 'code': e.code}, e.code
    except urllib.error.URLError as e:
        return {'error': f'Network error: {e.reason}'}, 0
    except socket.timeout:
        return {'error': 'Request timed out'}, 0
    except json.JSONDecodeError as e:
        return {'error': f'Invalid JSON response: {e}'}, 0


def preflight_check(user_id):
    """Verify the user exists by fetching one page of book interests.

    Returns True if the user has any data, False if the user ID appears invalid.
    Prints a warning and continues if the check itself fails (network issue).
    """
    api_path = f'/api/v2/user/{user_id}/interests'
    ts = str(int(time.time()))
    sig = compute_signature(api_path, ts)
    params = {
        'type': 'book', 'status': 'done', 'start': 0, 'count': 1,
        'apiKey': API_KEY, '_ts': ts, '_sig': sig, 'os_rom': 'android',
    }
    data, code = fetch_json(f'{BASE_URL}{api_path}', params)
    if code == 0:
        print('Warning: Could not verify user ID (network issue). Proceeding anyway.')
        return True
    if code != 200:
        print(f'Error: API returned HTTP {code} for user "{user_id}".')
        print('  Check that the user ID is correct (from douban.com/people/<ID>/).')
        return False
    total = data.get('total', -1)
    if total == -1:
        print('Warning: Unexpected API response. Proceeding anyway.')
        return True
    return True


def fetch_all_interests(user_id, type_name, status):
    """Fetch all items for a given type+status combination.

    Paginates through the API, checking against the reported total
    (not page size) to handle pages with fewer items due to delisted content.
    """
    api_path = f'/api/v2/user/{user_id}/interests'
    all_items = []
    start = 0
    total = None
    retries = 0
    max_retries = 3
    page_count = 0

    while page_count < MAX_PAGES_SAFETY:
        page_count += 1
        ts = str(int(time.time()))
        sig = compute_signature(api_path, ts)
        params = {
            'type': type_name, 'status': status,
            'start': start, 'count': ITEMS_PER_PAGE,
            'apiKey': API_KEY, '_ts': ts, '_sig': sig, 'os_rom': 'android',
        }

        data, status_code = fetch_json(f'{BASE_URL}{api_path}', params)

        if status_code != 200:
            retries += 1
            if retries > max_retries:
                print(f'  Error: HTTP {status_code} after {max_retries} retries, stopping.')
                print('  See references/troubleshooting.md for common errors.')
                break
            delay = 5 * (2 ** (retries - 1))
            print(f'  HTTP {status_code}, retry {retries}/{max_retries}, waiting {delay}s...')
            time.sleep(delay)
            continue

        retries = 0

        if total is None:
            total = data.get('total', 0)
            if total == 0:
                return []
            print(f'  Total: {total}')

        interests = data.get('interests', [])
        if not interests:
            break

        all_items.extend(interests)
        print(f'  Fetched {start}-{start + len(interests)} ({len(all_items)}/{total})')

        if len(all_items) >= total:
            break
        start += len(interests)  # advance by actual count, not page size
        time.sleep(PAGE_DELAY)

    return all_items


def extract_rating(interest):
    """Convert Frodo API rating to a star string.

    Frodo returns {value: N, max: 5} where N is 1-5.
    Some older entries may use the max=10 scale (value 2-10).
    API values are typically integers; round() handles any edge cases.
    """
    r = interest.get('rating')
    if not r or not isinstance(r, dict):
        return ''
    val = r.get('value', 0)
    max_val = r.get('max', 5)
    if not val:
        return ''
    stars = round(val) if max_val <= 5 else round(val / 2)
    return '★' * max(0, min(5, stars))


def interest_to_row(interest, type_name, status_cn):
    """Convert a single Frodo API interest object to a CSV row dict."""
    subject = interest.get('subject', {})
    sid = subject.get('id', '')
    prefix = URL_PREFIX.get(type_name, 'https://www.douban.com/subject/')
    url = f'{prefix}{sid}/' if sid else subject.get('url', '')

    date_raw = interest.get('create_time', '') or ''
    date = date_raw[:10] if re.match(r'\d{4}-\d{2}-\d{2}', date_raw) else ''

    return {
        'title': subject.get('title', ''),
        'url': url,
        'date': date,
        'rating': extract_rating(interest),
        'status': status_cn,
        'comment': interest.get('comment', ''),
    }


def write_csv(filepath, rows):
    """Write rows to a CSV file with UTF-8 BOM for Excel compatibility."""
    with open(filepath, 'w', newline='', encoding='utf-8-sig') as f:
        writer = csv.DictWriter(f, fieldnames=CSV_FIELDS)
        writer.writeheader()
        writer.writerows(rows)


def main():
    user_id = os.environ.get('DOUBAN_USER', '').strip()
    if not user_id:
        print('Error: DOUBAN_USER environment variable is required')
        print('Usage: DOUBAN_USER=<your_douban_id> python3 douban-frodo-export.py')
        sys.exit(1)

    # Extract the ID if the user pasted a full profile URL
    url_match = re.search(r'douban\.com/people/([A-Za-z0-9._-]+)', user_id)
    if url_match:
        user_id = url_match.group(1)

    if not re.match(r'^[A-Za-z0-9._-]+$', user_id):
        print(f'Error: DOUBAN_USER contains invalid characters: {user_id}')
        sys.exit(1)

    output_dir = get_output_dir(user_id)
    os.makedirs(output_dir, exist_ok=True)
    print(f'Douban Export for user: {user_id}')
    print(f'Output directory: {output_dir}\n')

    # Pre-flight: verify the user ID is valid before spending time on a full export
    if not preflight_check(user_id):
        sys.exit(1)

    # Collect data grouped by output file, then write everything at the end.
    file_data = {}
    grand_total = 0

    for type_name, status, status_cn, outfile in CATEGORIES:
        print(f'=== {status_cn} ({type_name}) ===')
        items = fetch_all_interests(user_id, type_name, status)

        if outfile not in file_data:
            file_data[outfile] = []

        for item in items:
            file_data[outfile].append(interest_to_row(item, type_name, status_cn))

        count = len(items)
        grand_total += count
        if count > 0:
            print(f'  Collected: {count}\n')
        else:
            print('  (empty)\n')

        time.sleep(CATEGORY_DELAY)

    # Write CSV files
    print('--- Writing CSV files ---')
    for filename, rows in file_data.items():
        filepath = os.path.join(output_dir, filename)
        write_csv(filepath, rows)
        print(f'  {filename}: {len(rows)} rows')

    print(f'\nDone! {grand_total} total items exported to {output_dir}')


if __name__ == '__main__':
    try:
        main()
    except KeyboardInterrupt:
        print('\n\nExport interrupted by user.')
        sys.exit(130)
douban-skill/scripts/douban-rss-sync.mjs (new file, 190 lines)
#!/usr/bin/env node
/**
 * Douban RSS → CSV incremental sync
 *
 * Pulls the public RSS feed, parses new entries, appends to CSV files.
 * No login required. Returns only the ~10 most recent items.
 * Best used for daily sync after a full Frodo API export.
 *
 * Usage:
 *   DOUBAN_USER=<user_id> node douban-rss-sync.mjs
 *
 * Environment:
 *   DOUBAN_USER (required): Douban user ID
 *   DOUBAN_OUTPUT_DIR (optional): Override output directory
 */

import https from 'node:https';
import http from 'node:http';
import fs from 'node:fs';
import path from 'node:path';
import os from 'node:os';

let DOUBAN_USER = process.env.DOUBAN_USER;
if (!DOUBAN_USER) { console.error('Error: DOUBAN_USER env var is required'); process.exit(1); }
// Extract ID from full URL if provided (e.g., https://www.douban.com/people/foo/)
const urlMatch = DOUBAN_USER.match(/douban\.com\/people\/([A-Za-z0-9._-]+)/);
if (urlMatch) DOUBAN_USER = urlMatch[1];
if (!/^[A-Za-z0-9._-]+$/.test(DOUBAN_USER)) { console.error('Error: DOUBAN_USER contains invalid characters'); process.exit(1); }

function getDownloadDir() {
  if (process.platform === 'win32') {
    return path.join(process.env.USERPROFILE || os.homedir(), 'Downloads');
  }
  return path.join(os.homedir(), 'Downloads');
}

const BASE_DIR = process.env.DOUBAN_OUTPUT_DIR || path.join(getDownloadDir(), 'douban-sync');
const DOUBAN_OUTPUT_DIR = path.join(BASE_DIR, DOUBAN_USER);
const STATE_FILE = path.join(DOUBAN_OUTPUT_DIR, '.douban-rss-state.json');
const RSS_URL = `https://www.douban.com/feed/people/${DOUBAN_USER}/interests`;

const CATEGORY_MAP = [
  { pattern: /^读过/, file: '书.csv', status: '读过' },
  { pattern: /^(?:在读|最近在读)/, file: '书.csv', status: '在读' },
  { pattern: /^想读/, file: '书.csv', status: '想读' },
  { pattern: /^看过/, file: '影视.csv', status: '看过' },
  { pattern: /^(?:在看|最近在看)/, file: '影视.csv', status: '在看' },
  { pattern: /^想看/, file: '影视.csv', status: '想看' },
  { pattern: /^听过/, file: '音乐.csv', status: '听过' },
  { pattern: /^(?:在听|最近在听)/, file: '音乐.csv', status: '在听' },
  { pattern: /^想听/, file: '音乐.csv', status: '想听' },
  { pattern: /^玩过/, file: '游戏.csv', status: '玩过' },
  { pattern: /^(?:在玩|最近在玩)/, file: '游戏.csv', status: '在玩' },
  { pattern: /^想玩/, file: '游戏.csv', status: '想玩' },
];

const CSV_HEADER = '\ufefftitle,url,date,rating,status,comment\n';
const RATING_MAP = { '力荐': '★★★★★', '推荐': '★★★★', '还行': '★★★', '较差': '★★', '很差': '★' };

function httpGet(url, redirects = 0) {
  if (redirects > 5) return Promise.reject(new Error('Too many redirects'));
  return new Promise((resolve, reject) => {
    const mod = url.startsWith('https') ? https : http;
    const req = mod.get(url, { headers: { 'User-Agent': 'Mozilla/5.0' }, timeout: 15000 }, res => {
      if (res.statusCode >= 300 && res.statusCode < 400 && res.headers.location) {
        return httpGet(new URL(res.headers.location, url).href, redirects + 1).then(resolve, reject);
      }
      if (res.statusCode >= 400) return reject(new Error(`HTTP ${res.statusCode} for ${url}`));
      let data = '';
      res.on('data', c => data += c);
      res.on('end', () => resolve(data));
    });
    req.on('error', reject);
    req.on('timeout', () => { req.destroy(); reject(new Error('Request timeout')); });
  });
}

function csvEscape(str) {
  if (!str) return '';
  if (str.includes(',') || str.includes('"') || str.includes('\n') || str.includes('\r')) {
    return '"' + str.replace(/"/g, '""') + '"';
  }
  return str;
}

function parseItems(xml) {
  const items = [];
  const itemRegex = /<item>([\s\S]*?)<\/item>/g;
  let match;
  while ((match = itemRegex.exec(xml)) !== null) {
    const block = match[1];
    const get = tag => {
      const m = block.match(new RegExp(`<${tag}[^>]*>(?:<!\\[CDATA\\[)?([\\s\\S]*?)(?:\\]\\]>)?<\\/${tag}>`));
      return m ? m[1].trim() : '';
    };
    const title = get('title');
    const link = get('link');
    const guid = get('guid');
    const pubDate = get('pubDate');
    const desc = get('description');
    const ratingMatch = desc.match(/推荐:\s*(力荐|推荐|还行|较差|很差)/);
    const rating = ratingMatch ? RATING_MAP[ratingMatch[1]] || '' : '';
    const commentMatch = desc.match(/短评:\s*([^<]+)/);
    const comment = commentMatch ? commentMatch[1].trim() : '';
    items.push({ title, link, guid, pubDate, rating, comment });
  }
  return items;
}

function loadState() {
  try { return JSON.parse(fs.readFileSync(STATE_FILE, 'utf8')); }
  catch { return { lastSyncGuids: [] }; }
}

function saveState(state) { fs.writeFileSync(STATE_FILE, JSON.stringify(state, null, 2)); }

function extractName(title) {
  for (const { pattern } of CATEGORY_MAP) {
    if (pattern.test(title)) return title.replace(pattern, '');
  }
  return title;
}

function isAlreadyInFile(filePath, link) {
  try {
    const content = fs.readFileSync(filePath, 'utf8');
    // Exact URL match as CSV field — avoid false positives from substring matches
    // (e.g., /subject/1234/ matching /subject/12345/)
    return content.includes(',' + link + ',') ||
           content.includes(',' + link + '\n') ||
           content.includes(',' + link + '\r');
  } catch { return false; }
}

function formatDate(pubDateStr) {
  try {
    const direct = pubDateStr.match(/(\d{4}-\d{2}-\d{2})/);
    if (direct) return direct[1];
    const d = new Date(pubDateStr);
    const cst = new Date(d.getTime() + 8 * 3600000);
    return cst.toISOString().split('T')[0];
  } catch { return ''; }
}

function ensureCsvFile(filePath) {
  if (!fs.existsSync(filePath)) {
    fs.mkdirSync(path.dirname(filePath), { recursive: true });
    fs.writeFileSync(filePath, CSV_HEADER);
  }
}

function appendToCsv(filePath, entry, status) {
  ensureCsvFile(filePath);
  const name = extractName(entry.title);
  const date = formatDate(entry.pubDate);
  const line = [csvEscape(name), csvEscape(entry.link), csvEscape(date),
    csvEscape(entry.rating), csvEscape(status), csvEscape(entry.comment)].join(',') + '\n';
  fs.appendFileSync(filePath, line);
}

async function main() {
  console.log(`Douban RSS Sync for user: ${DOUBAN_USER}`);
  console.log(`Output: ${DOUBAN_OUTPUT_DIR}\n`);
  console.log('Fetching RSS feed...');
  const xml = await httpGet(RSS_URL);
  const items = parseItems(xml);
  console.log(`Found ${items.length} items in feed`);

  const state = loadState();
  const knownGuids = new Set(state.lastSyncGuids || []);
  let newCount = 0;

  for (const item of items) {
    if (knownGuids.has(item.guid)) continue;
    const cat = CATEGORY_MAP.find(c => c.pattern.test(item.title));
    if (!cat) { console.log(`  Skip (unknown category): ${item.title}`); continue; }
    const filePath = path.join(DOUBAN_OUTPUT_DIR, cat.file);
    if (isAlreadyInFile(filePath, item.link)) { console.log(`  Skip (exists): ${item.title}`); continue; }
    console.log(`  + ${item.title} → ${cat.file}`);
    appendToCsv(filePath, item, cat.status);
    newCount++;
  }

  state.lastSyncGuids = items.map(i => i.guid);
  state.lastSync = new Date().toISOString();
  saveState(state);
  console.log(`\nDone. ${newCount} new entries added.`);
}

main().catch(err => { console.error('Error:', err.message); process.exit(1); });
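The sync script is intended to run daily after an initial full export, so it pairs naturally with a scheduler. A minimal crontab sketch is below; the user ID `1234567`, the script path, and the log location are placeholders, not values from this repository:

```shell
# Example crontab entry: run the Douban RSS sync daily at 08:00 local time.
# Replace the placeholder user ID and script path with your own values.
0 8 * * * DOUBAN_USER=1234567 node /path/to/douban-rss-sync.mjs >> "$HOME/douban-sync.log" 2>&1
```

Because the script keeps a GUID state file and also checks for existing URLs in the CSVs, re-running it more often than the feed updates is harmless.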