155 lines
5.6 KiB
Markdown
Executable File
155 lines
5.6 KiB
Markdown
Executable File
# Authentication Architecture
|
|
|
|
## Overview
|
|
|
|
This skill uses a **hybrid authentication approach** that combines the best of both worlds:
|
|
|
|
1. **Persistent Browser Profile** (`user_data_dir`) for consistent browser fingerprinting
|
|
2. **Manual Cookie Injection** from `state.json` for reliable session cookie persistence
|
|
|
|
## Why This Approach?
|
|
|
|
### The Problem
|
|
|
|
Playwright/Patchright has a known bug ([#36139](https://github.com/microsoft/playwright/issues/36139)) where **session cookies** (cookies without an `Expires` attribute) do not persist correctly when using `launch_persistent_context()` with `user_data_dir`.
|
|
|
|
**What happens:**
|
|
- ✅ Persistent cookies (with `Expires` date) → Saved correctly to browser profile
|
|
- ❌ Session cookies (without `Expires`) → **Lost after browser restarts**
|
|
|
|
**Impact:**
|
|
- Some Google auth cookies are session cookies
|
|
- Users experience random authentication failures
|
|
- "Works on my machine" syndrome (depends on which cookies Google uses)
|
|
|
|
### TypeScript vs Python
|
|
|
|
The **MCP Server** (TypeScript) can work around this by passing `storage_state` as a parameter:
|
|
|
|
```typescript
|
|
// TypeScript - works!
|
|
const context = await chromium.launchPersistentContext(userDataDir, {
|
|
storageState: "state.json", // ← Loads cookies including session cookies
|
|
channel: "chrome"
|
|
});
|
|
```
|
|
|
|
But **Python's Playwright API doesn't support this** ([#14949](https://github.com/microsoft/playwright/issues/14949)):
|
|
|
|
```python
|
|
# Python - NOT SUPPORTED!
|
|
context = playwright.chromium.launch_persistent_context(
|
|
user_data_dir=profile_dir,
|
|
storage_state="state.json", # ← Parameter not available in Python!
|
|
channel="chrome"
|
|
)
|
|
```
|
|
|
|
## Our Solution: Hybrid Approach
|
|
|
|
We use a **two-phase authentication system**:
|
|
|
|
### Phase 1: Setup (`auth_manager.py setup`)
|
|
|
|
1. Launch persistent context with `user_data_dir`
|
|
2. User logs in manually
|
|
3. **Save state to TWO places:**
|
|
- Browser profile directory (automatic, for fingerprint + persistent cookies)
|
|
- `state.json` file (explicit save, for session cookies)
|
|
|
|
```python
|
|
context = playwright.chromium.launch_persistent_context(
|
|
user_data_dir="browser_profile/",
|
|
channel="chrome"
|
|
)
|
|
# User logs in...
|
|
context.storage_state(path="state.json") # Save all cookies
|
|
```
|
|
|
|
### Phase 2: Runtime (`ask_question.py`)
|
|
|
|
1. Launch persistent context with `user_data_dir` (loads fingerprint + persistent cookies)
|
|
2. **Manually inject cookies** from `state.json` (adds session cookies)
|
|
|
|
```python
|
|
# Step 1: Launch with browser profile
|
|
context = playwright.chromium.launch_persistent_context(
|
|
user_data_dir="browser_profile/",
|
|
channel="chrome"
|
|
)
|
|
|
|
# Step 2: Manually inject cookies from state.json
|
|
with open("state.json", 'r') as f:
|
|
state = json.load(f)
|
|
context.add_cookies(state['cookies']) # ← Workaround for session cookies!
|
|
```
|
|
|
|
## Benefits
|
|
|
|
| Feature | Our Approach | Pure `user_data_dir` | Pure `storage_state` |
|
|
|---------|--------------|----------------------|----------------------|
|
|
| **Browser Fingerprint Consistency** | ✅ Same across restarts | ✅ Same | ❌ Changes each time |
|
|
| **Session Cookie Persistence** | ✅ Manual injection | ❌ Lost (bug) | ✅ Native support |
|
|
| **Persistent Cookie Persistence** | ✅ Automatic | ✅ Automatic | ✅ Native support |
|
|
| **Google Trust** | ✅ High (same browser) | ✅ High | ❌ Low (new browser) |
|
|
| **Cross-platform Reliability** | ✅ Chrome required | ⚠️ Chromium issues | ✅ Portable |
|
|
| **Cache Performance** | ✅ Keeps cache | ✅ Keeps cache | ❌ No cache |
|
|
|
|
## File Structure
|
|
|
|
```
|
|
~/.claude/skills/notebooklm/data/
|
|
├── auth_info.json # Metadata about authentication
|
|
├── browser_state/
|
|
│ ├── state.json # Cookies + localStorage (for manual injection)
|
|
│ └── browser_profile/ # Chrome user profile (for fingerprint + cache)
|
|
│ ├── Default/
|
|
│ │ ├── Cookies # Persistent cookies only (session cookies missing!)
|
|
│ │ ├── Local Storage/
|
|
│ │ └── Cache/
|
|
│ └── ...
|
|
```
|
|
|
|
## Why `state.json` is Critical
|
|
|
|
Even though we use `user_data_dir`, we **still need `state.json`** because:
|
|
|
|
1. **Session cookies** are not saved to the browser profile (Playwright bug)
|
|
2. **Manual injection** is the only reliable way to load session cookies
|
|
3. **Validation** - we can check if cookies are expired before launching
|
|
|
|
## Code References
|
|
|
|
**Setup:** `scripts/auth_manager.py:94-120`
|
|
- Lines 100-113: Launch persistent context with `channel="chrome"`
|
|
- Line 167: Save to `state.json` via `context.storage_state()`
|
|
|
|
**Runtime:** `scripts/ask_question.py:77-118`
|
|
- Lines 86-99: Launch persistent context
|
|
- Lines 101-118: Manual cookie injection workaround
|
|
|
|
**Validation:** `scripts/auth_manager.py:236-298`
|
|
- Lines 262-275: Launch persistent context
|
|
- Lines 277-287: Manual cookie injection for validation
|
|
|
|
## Related Issues
|
|
|
|
- [microsoft/playwright#36139](https://github.com/microsoft/playwright/issues/36139) - Session cookies not persisting
|
|
- [microsoft/playwright#14949](https://github.com/microsoft/playwright/issues/14949) - Storage state with persistent context
|
|
- [StackOverflow Question](https://stackoverflow.com/questions/79641481/) - Session cookie persistence issue
|
|
|
|
## Future Improvements
|
|
|
|
If Playwright adds support for `storage_state` parameter in Python's `launch_persistent_context()`, we can simplify to:
|
|
|
|
```python
|
|
# Future (when Python API supports it):
|
|
context = playwright.chromium.launch_persistent_context(
|
|
user_data_dir="browser_profile/",
|
|
storage_state="state.json", # ← Would handle everything automatically!
|
|
channel="chrome"
|
|
)
|
|
```
|
|
|
|
Until then, our hybrid approach is the most reliable solution.
|