Update capture-screen skill docs and versioning

This commit is contained in:
daymade
2026-04-04 14:23:37 +08:00
parent 0715ffb4bd
commit 5538258771
4 changed files with 294 additions and 8 deletions

View File

@@ -844,7 +844,7 @@
"description": "Programmatic screenshot capture on macOS. Get window IDs via Swift CGWindowListCopyWindowInfo, capture specific windows with screencapture -l, and control application windows via AppleScript. Supports multi-shot workflows for capturing different sections of the same window. Use when taking automated screenshots, capturing application windows, or creating visual documentation", "description": "Programmatic screenshot capture on macOS. Get window IDs via Swift CGWindowListCopyWindowInfo, capture specific windows with screencapture -l, and control application windows via AppleScript. Supports multi-shot workflows for capturing different sections of the same window. Use when taking automated screenshots, capturing application windows, or creating visual documentation",
"source": "./", "source": "./",
"strict": false, "strict": false,
"version": "1.0.0", "version": "1.0.1",
"category": "utilities", "category": "utilities",
"keywords": [ "keywords": [
"screenshot", "screenshot",
@@ -947,4 +947,4 @@
] ]
} }
] ]
} }

View File

@@ -227,6 +227,65 @@ These methods were tested and confirmed to fail on macOS:
| Python `import Quartz` (PyObjC) | `ModuleNotFoundError` | PyObjC not installed in system Python; don't attempt to install it — use Swift instead | | Python `import Quartz` (PyObjC) | `ModuleNotFoundError` | PyObjC not installed in system Python; don't attempt to install it — use Swift instead |
| `osascript` window id | Wrong format | Returns AppleScript window index, not CGWindowID needed by `screencapture -l` | | `osascript` window id | Wrong format | Returns AppleScript window index, not CGWindowID needed by `screencapture -l` |
## Permission Troubleshooting
`swift scripts/get_window_id.swift` reads on-screen windows via CoreGraphics, so it needs Screen Recording permission on macOS.
Use this order:
1. Confirm trigger
2. Confirm target identity
3. Add/enable exact app in Settings
If the command fails with `ERROR: Failed to enumerate windows`, do this:
```bash
open "x-apple.systempreferences:com.apple.preference.security?Privacy_ScreenCapture"
```
Or print the same checklist directly from the script:
```bash
swift scripts/get_window_id.swift --permission-hint screen
swift scripts/get_window_id.swift --permission-hint microphone
```
Then:
1. In Privacy & Security → Screen Recording, enable the target app.
2. If your app is missing from the list:
- Ensure you granted permission to the real app bundle (not `swift` / terminal helpers).
- For CLI tools, build/run as a packaged `.app` during permission verification.
- Click `+` and add the `.app` manually from `/Applications`.
3. Re-run the command after restarting the app.
4. If this is a CLI workflow, also check whether the launcher is a helper binary:
- In most cases the entry shown in TCC is the helper process (`swift`, `Terminal`, `iTerm`, etc.), not the business app.
- Permission still works after helper-level grant, but it is not ideal for final UX.
For mic-access-related prompts, use the same pattern with the microphone pane:
```bash
open "x-apple.systempreferences:com.apple.preference.security?Privacy_Microphone"
```
The same rule still applies: the system can only show permissions for a concrete `.app` bundle. If the request is made by a helper binary, the settings list can be misleading or empty for your product app.
### Quick Check Template
```text
1) Error: permission denied
2) Open target pane
3) Verify identity shown by OS = identity you granted
4) If not matched, use the script-reported candidate identities and grant the launcher process
5) Reopen/restart and verify
```
For production apps, avoid requesting permissions via `swift`/`python` entry points; always route permission checks in the packaged app process so users only see one target.
If you maintain another macOS permission-related flow, reuse this standardized triage template:
- [permission-triage-template.md](references/permission-triage-template.md)
## Supported Applications ## Supported Applications
| Application | Window ID | AppleScript Control | Notes | | Application | Window ID | AppleScript Control | Notes |

View File

@@ -0,0 +1,42 @@
# macOS 权限排障模板Screen Recording / 麦克风)
## 排障目标
- 在系统设置里找不到目标应用
- 权限拒绝但设置项看起来已打开
- 通过终端/脚本入口触发时,用户不知道该给谁授权
## 标准排查顺序(必须按序执行)
1. 确认触发点
- 明确是哪个权限被拒绝Screen Recording / 麦克风)。
2. 确认 TCC 实体
- 不是脚本文件名。
- 先确认“当前触发进程”与“最终应用体”是否一致。
- 关注脚本输出里的候选身份列表invoker/runtime并逐项核验。
3. 确认设置面板
- 直接跳转到对应隐私面板
- 允许该进程/应用
- 重启进程后复验
## 通用动作模板
```bash
# Screen Recording
open "x-apple.systempreferences:com.apple.preference.security?Privacy_ScreenCapture"
# Microphone
open "x-apple.systempreferences:com.apple.preference.security?Privacy_Microphone"
```
## 不在列表时处理
- 优先确认请求来自真实 .app Bundle签名、打包
- 如果当前为 CLI/脚本入口先给宿主进程授权Terminal/iTerm/swift/python
- 在设置面板点击 `+` 手工添加目标 `.app`
- 变更后退出并重启应用,重新测试
## 验收标准(用户侧)
- 用户能看到一条明确的“应授权对象”
- 错误提示中有“找不到对象时下一步该做什么”
- 无需反复猜测在设置里要点击什么

View File

@@ -4,9 +4,11 @@
// Enumerate on-screen windows and print their Window IDs. // Enumerate on-screen windows and print their Window IDs.
// //
// Usage: // Usage:
// swift get_window_id.swift # List all windows // swift scripts/get_window_id.swift # List all windows
// swift get_window_id.swift Excel # Filter by keyword // swift scripts/get_window_id.swift Excel # Filter by keyword
// swift get_window_id.swift "Chrome" # Filter by app name // swift scripts/get_window_id.swift "Chrome" # Filter by app name
// swift scripts/get_window_id.swift --permission-hint screen # Print Screen Recording triage
// swift scripts/get_window_id.swift --permission-hint microphone # Print Microphone triage
// //
// Output format: // Output format:
// WID=12345 | App=Microsoft Excel | Title=workbook.xlsx // WID=12345 | App=Microsoft Excel | Title=workbook.xlsx
@@ -15,10 +17,192 @@
// //
import CoreGraphics import CoreGraphics
import Foundation
let keyword = CommandLine.arguments.count > 1 enum PermissionKind: String {
? CommandLine.arguments[1] case screen
: "" case microphone
}
let invocationPath = (CommandLine.arguments.first ?? "")
let invokerName = URL(fileURLWithPath: invocationPath).lastPathComponent
let runtimeProcessName = ProcessInfo.processInfo.processName
let invokerIsBundle = invocationPath.hasSuffix(".app") || invocationPath.contains(".app/")
let scriptPath: String? = invocationPath.hasSuffix(".swift") ? invocationPath : nil
let helperBinaries: Set<String> = [
"swift",
"swift-frontend",
"python",
"python3",
"node",
"uv",
"npm",
"bun",
"pnpm",
"yarn",
"bash",
"zsh",
"sh",
"osascript",
"Terminal",
"iTerm2",
"iTerm"
]
let invokerCandidates: [String] = {
var candidates = [String]()
var seen = Set<String>()
func append(_ value: String) {
guard !value.isEmpty, !seen.contains(value) else { return }
seen.insert(value)
candidates.append(value)
}
if let scriptPath = scriptPath {
append(scriptPath)
}
if !invokerName.isEmpty {
append(invokerName)
}
if !runtimeProcessName.isEmpty && runtimeProcessName != invokerName {
append(runtimeProcessName)
}
return candidates
}()
let args = Array(CommandLine.arguments.dropFirst())
var permissionHintTarget: PermissionKind?
var keyword = ""
var expectPermissionTarget = false
func printUsage() {
fputs("Usage:\n", stderr)
fputs(" swift scripts/get_window_id.swift [keyword]\n", stderr)
fputs(" swift scripts/get_window_id.swift --permission-hint [screen|microphone]\n", stderr)
fputs("\n", stderr)
fputs("Options:\n", stderr)
fputs(" --permission-hint [screen|microphone] Print permission triage instructions\n", stderr)
fputs(" -h, --help Show this help\n", stderr)
fputs("\n", stderr)
fputs("Examples:\n", stderr)
fputs(" swift scripts/get_window_id.swift Excel\n", stderr)
fputs(" swift scripts/get_window_id.swift --permission-hint screen\n", stderr)
fputs(" swift scripts/get_window_id.swift --permission-hint microphone\n", stderr)
}
for arg in args {
if expectPermissionTarget {
if let kind = PermissionKind(rawValue: arg.lowercased()) {
permissionHintTarget = kind
expectPermissionTarget = false
continue
}
fputs("Unknown permission target: \(arg)\n", stderr)
printUsage()
exit(2)
}
if arg == "-h" || arg == "--help" {
printUsage()
exit(0)
} else if arg == "--permission-hint" {
permissionHintTarget = .screen
expectPermissionTarget = true
} else if arg.hasPrefix("--permission-hint=") {
let target = String(arg.dropFirst("--permission-hint=".count)).lowercased()
guard let kind = PermissionKind(rawValue: target) else {
fputs("Unknown permission target: \(target)\n", stderr)
printUsage()
exit(2)
}
permissionHintTarget = kind
} else if arg == "--permission-hint-screen" {
permissionHintTarget = .screen
} else if arg == "--permission-hint-microphone" || arg == "--permission-hint-mic" {
permissionHintTarget = .microphone
} else if arg.hasPrefix("-") {
fputs("Unknown option: \(arg)\n", stderr)
printUsage()
exit(2)
} else if keyword.isEmpty {
keyword = arg
}
}
if expectPermissionTarget {
fputs("Missing permission hint target after --permission-hint.\n", stderr)
printUsage()
exit(2)
}
if let kind = permissionHintTarget {
switch kind {
case .screen:
fputs("Screen Recording permission required.\n", stderr)
printCommonPermissionHint(
pane: "Privacy_ScreenCapture",
label: "Screen Recording"
)
printPermissionContextHint()
case .microphone:
fputs("Microphone permission required.\n", stderr)
printCommonPermissionHint(
pane: "Privacy_Microphone",
label: "Microphone"
)
printPermissionContextHint()
}
exit(0)
}
func printCommonPermissionHint(pane: String, label: String, missing: Bool = true) {
let openCommand = "x-apple.systempreferences:com.apple.preference.security?\(pane)"
fputs("Troubleshooting:\n", stderr)
fputs(" - Open Settings: `open \"\(openCommand)\"`\n", stderr)
fputs(" - In Privacy & Security → \(label), enable the target application.\n", stderr)
if missing {
fputs(" - If the target app is not in the list:\n", stderr)
fputs(" - Granting happens by real .app bundle, not by helper/terminal scripts.\n", stderr)
fputs(" - For CLI workflows, grant to the host app you launch from (Terminal, iTerm, iTerm2, Swift, etc.) if no dedicated .app exists yet.\n", stderr)
fputs(" - Click `+` and add the actual `.app` from `/Applications`.\n", stderr)
}
fputs(" - If permission status does not refresh, quit/reopen terminal/app and retry.\n", stderr)
if helperBinaries.contains(invokerName) || helperBinaries.contains(runtimeProcessName) {
fputs(" - Current launcher is a helper/runtime process (`\(runtimeProcessName)`) -> OS may show this entry instead of the tool name.\n", stderr)
} else if invokerIsBundle {
fputs(" - The launcher path looks like a bundled app, which is the preferred state for permissions.\n", stderr)
}
if let scriptPath = scriptPath {
fputs(" - Script entry: `\(scriptPath)`\n", stderr)
}
}
func printPermissionContextHint() {
fputs(" - Invoker path: `\(invocationPath)`\n", stderr)
fputs(" - Runtime process: `\(runtimeProcessName)`\n", stderr)
if !invokerCandidates.isEmpty {
fputs(" - Candidate identities in System Settings:\n", stderr)
for identity in invokerCandidates {
fputs(" - \(identity)\n", stderr)
}
}
if invokerIsBundle {
fputs(" This looks like a bundled app path, so the setting should match the app identity.\n", stderr)
} else {
fputs(" If this is not your final app binary, permissions can be inconsistent.\n", stderr)
}
fputs(" - Recommended for production: keep permission requests inside your signed `.app` process.\n", stderr)
if let scriptPath = scriptPath {
fputs(" - Script entry currently used: `\(scriptPath)`.\n", stderr)
}
}
func printScreenRecordingPermissionHint() {
fputs("Screen Recording permission required.\n", stderr)
fputs("Troubleshooting:\n", stderr)
printCommonPermissionHint(pane: "Privacy_ScreenCapture", label: "Screen Recording")
printPermissionContextHint()
}
guard let windowList = CGWindowListCopyWindowInfo( guard let windowList = CGWindowListCopyWindowInfo(
.optionOnScreenOnly, kCGNullWindowID .optionOnScreenOnly, kCGNullWindowID
@@ -27,6 +211,7 @@ guard let windowList = CGWindowListCopyWindowInfo(
fputs("Possible causes:\n", stderr) fputs("Possible causes:\n", stderr)
fputs(" - No applications with visible windows are running\n", stderr) fputs(" - No applications with visible windows are running\n", stderr)
fputs(" - Screen Recording permission not granted (System Settings → Privacy & Security → Screen Recording)\n", stderr) fputs(" - Screen Recording permission not granted (System Settings → Privacy & Security → Screen Recording)\n", stderr)
printScreenRecordingPermissionHint()
exit(1) exit(1)
} }