⬆️ feat: upgrade tunnel-doctor to v1.2.0 with Layer 4 SSH ProxyCommand diagnostics

Add fourth conflict layer: SSH ProxyCommand double tunneling causing
intermittent git push/pull failures when Shadowrocket TUN is active.

Structural improvements per skill best practices:
- Eliminate content duplication between SKILL.md and reference
- Rename proxy_fixes.md → proxy_conflict_reference.md for clarity
- Trim SKILL.md from 534 to 487 lines (under 500 limit)
- Shorten YAML description from 910 to 661 characters
- Fix "apply all four" listing 5 items (separate anti-pattern)
- Clarify Layer 4's relationship to Tailscale theme

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
daymade
2026-02-17 19:57:08 +08:00
parent 1a5c8d7931
commit 830fc8f90f
7 changed files with 185 additions and 28 deletions


@@ -1,6 +1,6 @@
---
name: tunnel-doctor
description: Diagnoses and fixes conflicts between Tailscale and proxy/VPN tools (Shadowrocket, Clash, Surge) on macOS. Covers three conflict layers - (1) route hijacking (proxy TUN overrides Tailscale routes), (2) HTTP proxy env var interception (http_proxy/NO_PROXY misconfiguration), and (3) system proxy bypass (browser goes through VPN proxy, DIRECT rule can't reach Tailscale utun). Includes SOP for remote development via SSH tunnels with proxy-safe Makefile patterns. Use when Tailscale ping works but SSH/HTTP times out, when browser returns 503 but curl works, when setting up Tailscale SSH to WSL instances, or when bootstrapping remote dev environments over Tailscale.
description: Diagnoses and fixes conflicts between Tailscale and proxy/VPN tools (Shadowrocket, Clash, Surge) on macOS. Covers four conflict layers - (1) route hijacking, (2) HTTP proxy env var interception, (3) system proxy bypass, and (4) SSH ProxyCommand double tunneling causing git push/pull failures. Includes SOP for remote development via SSH tunnels with proxy-safe Makefile patterns. Use when Tailscale ping works but SSH/HTTP times out, when browser returns 503 but curl works, when git push fails with "failed to begin relaying via HTTP", when setting up Tailscale SSH to WSL instances, or when bootstrapping remote dev environments over Tailscale.
allowed-tools: Read, Grep, Edit, Bash
---
@@ -8,15 +8,16 @@ allowed-tools: Read, Grep, Edit, Bash
Diagnose and fix conflicts when Tailscale coexists with proxy/VPN tools on macOS, with specific guidance for SSH access to WSL instances.
## Three Conflict Layers
## Four Conflict Layers
Tailscale + proxy tools can conflict at three independent layers. Each has different symptoms:
Proxy/VPN tools on macOS create conflicts at four independent layers. Layers 1-3 affect Tailscale connectivity; Layer 4 affects SSH git operations (same proxy environment, different target):
| Layer | What breaks | What still works | Root cause |
|-------|-------------|------------------|------------|
| 1. Route table | Everything (SSH, curl, browser) | `tailscale ping` | `tun-excluded-routes` adds `en0` route overriding Tailscale utun |
| 2. HTTP env vars | `curl`, Python requests, Node.js fetch | SSH, browser | `http_proxy` set without `NO_PROXY` for Tailscale |
| 3. System proxy (browser) | Browser only (HTTP 503) | SSH, `curl` (both with/without proxy) | Browser uses VPN system proxy; DIRECT rule routes via Wi-Fi, not Tailscale utun |
| 4. SSH ProxyCommand double tunnel | `git push/pull` (intermittent) | `ssh -T` (small data) | `connect -H` creates HTTP CONNECT tunnel redundant with Shadowrocket TUN; landing proxy drops large/long-lived transfers |
## Diagnostic Workflow
@@ -29,6 +30,7 @@ Determine which scenario applies:
- **Tailscale ping works, SSH/TCP times out** → Route conflict (Step 2B)
- **Remote dev server auth redirects to `localhost` → browser can't follow** → SSH tunnel needed (Step 2D)
- **`make status` / scripts curl to localhost fail with proxy** → localhost proxy interception (Step 2E)
- **`git push/pull` fails with `FATAL: failed to begin relaying via HTTP`** → SSH double tunnel (Step 2F)
- **SSH connects but `operation not permitted`** → Tailscale SSH config issue (Step 4)
- **SSH connects but `be-child ssh` exits code 1** → WSL snap sandbox issue (Step 5)
@@ -36,6 +38,7 @@ Determine which scenario applies:
- SSH does NOT use `http_proxy`/`NO_PROXY` env vars. If SSH works but HTTP doesn't → Layer 2.
- `curl` uses `http_proxy` env var, NOT the system proxy. Browser uses system proxy (set by VPN). If `curl` works but browser doesn't → Layer 3.
- If `tailscale ping` works but regular `ping` doesn't → Layer 1 (route table corrupted).
- If `ssh -T git@github.com` works but `git push` fails intermittently → Layer 4 (double tunnel).
### Step 2A: Fix HTTP Proxy Environment Variables
@@ -66,7 +69,7 @@ export NO_PROXY=localhost,127.0.0.1,.ts.net,100.64.0.0/10,192.168.*,10.*,172.16.
**Two layers complement each other**: `.ts.net` handles domain-based access, `100.64.0.0/10` handles direct IP access.
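A quick way to confirm that both entries are present is to test the variable directly. The function below is a minimal sketch; the name `no_proxy_covers_tailscale` is hypothetical, and it only checks for the two literal entries named above. It does not check whether a given tool honors CIDR notation in `NO_PROXY`.

```bash
# Succeeds only if the comma-separated list in $1 contains both the
# domain entry (.ts.net) and the CIDR entry (100.64.0.0/10).
no_proxy_covers_tailscale() {
  case ",$1," in *,.ts.net,*) ;; *) return 1 ;; esac
  case ",$1," in *,100.64.0.0/10,*) ;; *) return 1 ;; esac
}

# Usage:
#   no_proxy_covers_tailscale "$NO_PROXY" && echo "NO_PROXY covers Tailscale"
```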
**NO_PROXY syntax pitfalls** — see [references/proxy_fixes.md](references/proxy_fixes.md) for the compatibility matrix.
**NO_PROXY syntax pitfalls** — see [references/proxy_conflict_reference.md](references/proxy_conflict_reference.md) for the compatibility matrix.
Verify the fix:
@@ -196,9 +199,37 @@ Alternatively, set `no_proxy` globally in `~/.zshrc`:
export no_proxy=localhost,127.0.0.1
```
### Step 2F: Fix SSH ProxyCommand Double Tunnel (git push/pull failures)
**Symptom**: `ssh -T git@github.com` succeeds consistently, but `git push` or `git pull` fails intermittently with:
```
FATAL: failed to begin relaying via HTTP.
Connection closed by UNKNOWN port 65535
```
Small operations (auth, fetch metadata) work; large data transfers fail.
**Root cause**: When Shadowrocket TUN is active, it already routes all TCP traffic through its VPN tunnel. If SSH config also uses `ProxyCommand connect -H`, data flows through two proxy layers — the landing proxy drops large/long-lived HTTP CONNECT connections.
**Diagnosis**:
```bash
# 1. Confirm Shadowrocket TUN is active
ifconfig | grep '^utun'
# 2. Check SSH config for ProxyCommand
grep -A5 'Host github.com' ~/.ssh/config
# 3. Confirm: removing ProxyCommand fixes push
GIT_SSH_COMMAND="ssh -o ProxyCommand=none" git push origin main
```
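Because `grep -A5` only shows the text of one `Host` block, it can miss a `ProxyCommand` inherited from a wildcard block or an included file. As a hedged alternative, `ssh -G <host>` prints the effective configuration, which can then be classified. The function name `classify_proxycommand` is hypothetical, and the sketch assumes the proxy helper is invoked as `connect ... -H ...`, as in this document.

```bash
# Read `ssh -G <host>` output (or raw ssh_config text) on stdin.
# Prints "double-tunnel-risk" if a ProxyCommand using `connect -H` is set,
# otherwise prints "ok".
classify_proxycommand() {
  awk '
    tolower($1) == "proxycommand" {
      if ($0 ~ /connect/ && $0 ~ / -H /) found = 1
    }
    END { print (found ? "double-tunnel-risk" : "ok") }
  '
}

# Usage (the -G flag needs a reasonably recent OpenSSH):
#   ssh -G github.com | classify_proxycommand
```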
**Fix** — remove ProxyCommand and switch to `ssh.github.com:443`. See [references/proxy_conflict_reference.md § SSH ProxyCommand and Git Operations](references/proxy_conflict_reference.md) for the full SSH config, why port 443 helps, and fallback options when VPN is off.
### Step 3: Fix Proxy Tool Configuration
Identify the proxy tool and apply the appropriate fix. See [references/proxy_fixes.md](references/proxy_fixes.md) for detailed instructions per tool.
Identify the proxy tool and apply the appropriate fix. See [references/proxy_conflict_reference.md](references/proxy_conflict_reference.md) for detailed instructions per tool.
**Key principle**: Do NOT use `tun-excluded-routes` to exclude `100.64.0.0/10`. This causes the proxy to add a `→ en0` route that overrides Tailscale. Instead, let the traffic enter the proxy TUN and use a DIRECT rule to pass it through.
@@ -374,7 +405,7 @@ Each `-L` flag is independent. If one port is already bound locally, `ExitOnForw
### 4. SSH Non-Login Shell Setup
SSH non-login shells don't load `~/.zshrc`, so nvm/Homebrew tools and proxy env vars are unavailable. Prefix all remote commands with `source ~/.zshrc 2>/dev/null;`. See [references/proxy_fixes.md § SSH Non-Login Shell Pitfall](references/proxy_fixes.md) for details and examples.
SSH non-login shells don't load `~/.zshrc`, so nvm/Homebrew tools and proxy env vars are unavailable. Prefix all remote commands with `source ~/.zshrc 2>/dev/null;`. See [references/proxy_conflict_reference.md § SSH Non-Login Shell Pitfall](references/proxy_conflict_reference.md) for details and examples.
For Makefile targets that run remote commands:
@@ -453,4 +484,4 @@ Before starting remote development, verify:
## References
- [references/proxy_fixes.md](references/proxy_fixes.md) — Detailed fix instructions for Shadowrocket, Clash, and Surge
- [references/proxy_conflict_reference.md](references/proxy_conflict_reference.md) — Per-tool configuration (Shadowrocket, Clash, Surge), NO_PROXY syntax, SSH ProxyCommand, and conflict architecture


@@ -178,17 +178,120 @@ Check MagicDNS status:
tailscale dns status
```
## SSH ProxyCommand and Git Operations
### The Problem
Many developers in China configure SSH with `ProxyCommand connect -H 127.0.0.1:<port>` to tunnel SSH through their HTTP proxy. This works fine for interactive SSH and small operations. But when Shadowrocket (or Clash/Surge) runs in TUN mode, this creates a **double tunnel**:
1. `connect -H` creates an HTTP CONNECT tunnel to the local proxy port
2. Shadowrocket TUN captures the same traffic at the system level
The landing proxy sees a long-lived HTTP CONNECT connection and may drop it during large data transfers (`git push`, `git clone` of large repos).
### Data Flow Comparison
```
Double tunnel (broken):
SSH → connect -H (HTTP CONNECT tunnel) → Shadowrocket local port 1082
→ Shadowrocket TUN → landing proxy → GitHub
Single tunnel (correct):
SSH → system network stack → Shadowrocket TUN → landing proxy → GitHub
```
The extra HTTP CONNECT hop adds nothing once TUN is already routing the traffic. The landing proxy (落地代理, the exit node that forwards traffic to its destination) may apply aggressive timeouts or buffer limits to the long-lived HTTP CONNECT connection and drop it during large transfers.
### Detecting TUN Mode
```bash
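# A utun interface with an IPv4 address (other than Tailscale's 100.x one) suggests a VPN TUN is active.
# macOS may also list system-owned utun interfaces that carry only IPv6 link-local addresses.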
ifconfig | grep '^utun'
```
If Shadowrocket/Clash/Surge TUN is active, `ProxyCommand connect -H` is redundant.
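To separate Tailscale's interface from a proxy tool's TUN, the `ifconfig` output can be filtered for `utun` interfaces that carry an IPv4 address. This is a rough sketch under two assumptions: Tailscale addresses fall in `100.64.0.0/10`, and any other IPv4-bearing `utun` belongs to a VPN/proxy tool. The function name `list_utun_ipv4` is hypothetical.

```bash
# Read macOS `ifconfig` output on stdin. For every utun interface that has an
# IPv4 address, print: <interface> <address> <tag>
# where tag is "tailscale" for 100.64.0.0/10 and "other" for anything else.
list_utun_ipv4() {
  awk '
    /^[a-z0-9]+:/ { iface = $1; sub(/:$/, "", iface) }
    iface ~ /^utun/ && $1 == "inet" {
      split($2, o, ".")
      tag = (o[1] == 100 && o[2] >= 64 && o[2] <= 127) ? "tailscale" : "other"
      print iface, $2, tag
    }
  '
}

# Usage:
#   ifconfig | list_utun_ipv4
```

A line tagged `other` is a hint, not proof, that a proxy TUN is up; confirm in the proxy tool's own UI before editing `~/.ssh/config`.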
### The Fix — SSH over Port 443 without ProxyCommand
```bash
# 1. Add ssh.github.com host key
ssh-keyscan -p 443 ssh.github.com >> ~/.ssh/known_hosts
# 2. Update ~/.ssh/config
```
```
Host github.com
    HostName ssh.github.com
    Port 443
    User git
    # No ProxyCommand: Shadowrocket TUN handles routing at the system level.
    # Landing proxies often allow longer timeouts on port 443 than on port 22.
    ServerAliveInterval 60
    ServerAliveCountMax 3
    IdentityFile ~/.ssh/id_ed25519
```
### Why Port 443
HTTP proxies (and landing proxies) tend to treat port 443 traffic more leniently than port 22 traffic:
- **Longer connection timeouts**: HTTPS connections are expected to be long-lived (WebSocket, streaming, large file downloads)
- **Larger buffer limits**: Proxies often allocate more resources for 443 traffic
- **Less protocol inspection**: Port 22 may trigger deep packet inspection on some proxies; 443 is usually passed through as if it were opaque TLS
GitHub officially supports SSH on port 443 via `ssh.github.com` — it's the same service, same authentication, different port.
### Fallback When VPN Is Off
Without Shadowrocket TUN, SSH can't reach GitHub directly from China. Options:
1. **Keep the old config as a comment** and uncomment `ProxyCommand` manually when needed (the config below shows this)
2. **Use a `Match` directive** to apply `ProxyCommand` conditionally (advanced)
```
Host github.com
    HostName ssh.github.com
    Port 443
    User git
    ServerAliveInterval 60
    ServerAliveCountMax 3
    IdentityFile ~/.ssh/id_ed25519
    # Uncomment when Shadowrocket is off:
    # ProxyCommand /opt/homebrew/bin/connect -H 127.0.0.1:1082 %h %p
```
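A further possibility, not one of the options listed above, is to leave `~/.ssh/config` untouched and apply the proxy for a single git invocation through `GIT_SSH_COMMAND`. The sketch below assumes the `connect` path and port from the commented-out `ProxyCommand` line; the function names `proxy_ssh_command` and `git_via_proxy` are hypothetical.

```bash
# Build an ssh command line that tunnels through the local HTTP proxy.
# $1 = path to connect (default copied from the config above)
# $2 = proxy host:port  (default copied from the config above)
proxy_ssh_command() {
  printf 'ssh -o ProxyCommand="%s -H %s %%h %%p"' \
    "${1:-/opt/homebrew/bin/connect}" "${2:-127.0.0.1:1082}"
}

# Run one git command with the ProxyCommand applied to that invocation only.
git_via_proxy() {
  GIT_SSH_COMMAND="$(proxy_ssh_command)" git "$@"
}

# Usage when the VPN TUN is off:
#   git_via_proxy push origin main
```

Git passes `GIT_SSH_COMMAND` to the shell, so the embedded quotes around the `ProxyCommand` value are interpreted there, and ssh expands `%h` and `%p` itself.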
### Verification
```bash
# Auth test
ssh -T git@github.com
# → Hi username! You've successfully authenticated...
# Verbose — confirm ssh.github.com:443
ssh -v -T git@github.com 2>&1 | grep 'Connecting to'
# → Connecting to ssh.github.com [20.205.243.160] port 443.
# Large transfer test
cd /path/to/repo && git push origin main
```
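The verbose check can be made scriptable by matching the `Connecting to` line instead of reading it by eye. This is a small sketch; the function name `connected_via_443` is hypothetical, and it assumes the `ssh -v` line format shown in the sample output above.

```bash
# Read `ssh -v` output on stdin; succeed only if ssh connected to
# ssh.github.com on port 443 (any resolved address is accepted).
connected_via_443() {
  grep -Eq 'Connecting to ssh\.github\.com \[[^]]+\] port 443\.'
}

# Usage:
#   ssh -v -T git@github.com 2>&1 | connected_via_443 && echo "using port 443"
```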
### Performance Trade-off
Connection setup is noticeably slower (~6s vs ~2s) because TUN routing adds network hops compared with a direct HTTP CONNECT tunnel. Data transfer speed is unchanged, since it is limited by bandwidth rather than by connection setup.
## General Principles
### Three Conflict Layers
### Four Conflict Layers
Proxy tools and Tailscale can conflict at three independent layers on macOS:
Proxy tools create conflicts at four independent layers on macOS. Layers 1-3 affect Tailscale connectivity; Layer 4 affects SSH git operations through the same proxy infrastructure:
| Layer | Setting | What it controls | Symptom when wrong |
|-------|---------|------------------|--------------------|
| 1. Route table | `tun-excluded-routes` | OS-level IP routing | Everything broken (SSH, curl, browser). `tailscale ping` works but `ping` doesn't |
| 2. HTTP env vars | `http_proxy` / `NO_PROXY` | CLI tools (curl, wget, Python, Node.js) | `curl` times out, SSH works, browser works |
| 3. System proxy | `skip-proxy` | Browser and system HTTP clients | Browser 503, `curl` works (both with/without proxy), SSH works |
| 4. SSH ProxyCommand | `ProxyCommand connect -H` | SSH git operations (push/pull/clone) | `ssh -T` works, `git push` fails intermittently with `failed to begin relaying via HTTP` |
**Each layer is independent.** A fix at one layer doesn't help the others. You may need fixes at multiple layers simultaneously.
@@ -218,12 +321,14 @@ Adding `100.64.0.0/10` to `skip-proxy` makes the system bypass the proxy entirel
### The Correct Approach
For full Tailscale compatibility with proxy tools, apply all three:
For full Tailscale compatibility with proxy tools, apply all four fixes:
1. **`[Rule]`**: `IP-CIDR,100.64.0.0/10,DIRECT` — handles TUN-level traffic
2. **`skip-proxy`**: Add `100.64.0.0/10` — fixes browser access
3. **`NO_PROXY` env var**: Add `100.64.0.0/10,.ts.net` — fixes CLI HTTP tools
4. **`tun-excluded-routes`**: Do NOT add `100.64.0.0/10` — this breaks everything
4. **SSH `~/.ssh/config`**: Remove `ProxyCommand`, use `ssh.github.com:443` — fixes git push/pull
**Critical anti-pattern**: Do NOT add `100.64.0.0/10` to `tun-excluded-routes` — this breaks everything (see "Why tun-excluded-routes Breaks Tailscale" above).
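The first two fixes and the anti-pattern can be checked mechanically against an exported proxy config. The sketch below assumes a Surge/Shadowrocket-style file with `key = value` lines and comma-separated `[Rule]` entries; the function name `check_proxy_conf` is hypothetical, and Clash's YAML format would need a different parser.

```bash
# Read a proxy config on stdin and report whether 100.64.0.0/10 appears in
# the DIRECT rule, in skip-proxy, and (anti-pattern) in tun-excluded-routes.
check_proxy_conf() {
  awk '
    /^[ \t]*tun-excluded-routes[ \t]*=/ && /100\.64\.0\.0\/10/ { bad = 1 }
    /^[ \t]*skip-proxy[ \t]*=/ && /100\.64\.0\.0\/10/ { skip = 1 }
    /^[ \t]*IP-CIDR,[ \t]*100\.64\.0\.0\/10,[ \t]*DIRECT/ { rule = 1 }
    END {
      print "rule:" (rule ? "ok" : "missing")
      print "skip-proxy:" (skip ? "ok" : "missing")
      print "tun-excluded-routes:" (bad ? "ANTI-PATTERN" : "ok")
    }
  '
}

# Usage (the file path is a placeholder for your exported config):
#   check_proxy_conf < /path/to/exported.conf
```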
### Quick Verification