Disaster #2 from Feb 23-24 session:
- n8n core nodes broken (registry corruption)
- PHP workaround operational (sync_codex.php)
- Factory reset procedure documented
- Added Task #34 for scheduled recovery

Decision: Defer reset until next maintenance window
Workaround: PHP script handles Codex sync successfully
Co-documented with Gemini's post-mortem analysis.
# n8n Node Registry Corruption (v2.x)

**Problem:** The n8n UI is accessible, but core nodes (HTTP Request, Execute Command) fail with "Node not found" or "Registry Error".

**Incident Date:** February 23-24, 2026

**Affected System:** TX1 Dallas n8n instance (firefrost-codex-n8n-1)

**Status:** BYPASSED via PHP workaround, factory reset pending

---
## Symptoms

- ✅ n8n web interface loads normally at https://n8n.firefrostgaming.com
- ✅ Existing workflows visible in UI
- ❌ Cannot execute workflows using core nodes
- ❌ "Node not found" errors for `n8n-nodes-base` package nodes
- ❌ HTTP Request node: Registry error
- ❌ Execute Command node: Registry error

**These are INTERNAL nodes bundled with n8n itself; they should always be available.**

---
## Root Cause

**Corrupted Node Registry during v2.x migration**

The internal node registry (`n8n-nodes-base` package) became desynchronized from the workflow engine. This can happen when:

1. A partial n8n version update leaves incompatible data in the volume
2. The Docker volume mounted at `/home/node/.n8n` is corrupted
3. The container image version does not match the persisted configuration

**Key indicator:** Core nodes from the `n8n-nodes-base` package are "invisible" to the execution engine despite being bundled with n8n.
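
A quick way to test the "bundled but invisible" theory is to check whether the node files actually ship in the image. The sketch below assumes each core node lives in its own directory (`HttpRequest/`, `ExecuteCommand/`) under `n8n-nodes-base/dist/nodes`; that layout, the container path in the usage comment, and the function name `check_core_nodes` are assumptions to verify against the running image. If both directories are present, the files are in the image and the fault more likely sits in the persisted state under `./volumes/n8n`.

```bash
#!/usr/bin/env bash
# Sketch: report whether core node directories exist under a given
# n8n-nodes-base "dist/nodes" directory. The layout is an assumption.
set -euo pipefail

check_core_nodes() {
  local nodes_dir=$1
  local missing=0
  local node
  for node in HttpRequest ExecuteCommand; do
    if [[ -d "$nodes_dir/$node" ]]; then
      echo "present: $node"
    else
      echo "MISSING: $node"
      missing=1
    fi
  done
  # 0 when every core node directory exists, 1 otherwise
  return "$missing"
}

# Example (the path inside the container is a hypothetical guess):
#   docker cp firefrost-codex-n8n-1:/usr/local/lib/node_modules/n8n/node_modules/n8n-nodes-base/dist/nodes /tmp/n8n-nodes
#   check_core_nodes /tmp/n8n-nodes
```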

---

## Failed Resolution Attempts

### Attempt 1: Container Recreation
```bash
# Recreate the container from a freshly pulled image
docker-compose down
docker-compose pull n8n
docker-compose up -d
```
**Result:** ❌ Failed - corruption persists in volume

### Attempt 2: Image Force Pull
```bash
docker-compose down
# Remove the cached image so the next "up" has to pull it again
docker rmi n8nio/n8n:1.121.0
docker-compose up -d
```
**Result:** ❌ Failed - volume data still corrupted

**Why these failed:** The corruption is in the VOLUME (`./volumes/n8n`), not the container image.

---
## Temporary Workaround: PHP Direct Sync

**Created:** `sync_codex.php` on TX1 host OS (PHP 8.3 CLI)

**Purpose:** Bypass n8n entirely for Codex Git → Dify sync

**How it works:**
```
TX1 Host (PHP) → Git Pull → Process Files → Dify API (127.0.0.1:5001)
```

**Advantages:**
- No dependency on n8n registry
- Direct Docker bridge access to Dify API
- Simpler debugging (single script vs workflow nodes)
- Can run via cron for scheduled execution
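
Once the sync runs from cron, overlapping runs become possible if one sync outlasts the interval. A minimal wrapper, assuming util-linux `flock` is available on the TX1 host: with `-n`, `flock` exits with status 1 instead of waiting when another run holds the lock. The script path, lock file, function names, and crontab line are hypothetical placeholders, not the actual locations on TX1.

```bash
#!/usr/bin/env bash
# Sketch of a cron-safe wrapper for the PHP sync. All paths are hypothetical.
set -euo pipefail

SYNC_SCRIPT=${SYNC_SCRIPT:-/opt/firefrost-codex/sync_codex.php}  # hypothetical path
LOCK_FILE=${LOCK_FILE:-/tmp/sync_codex.lock}                      # hypothetical path

# Run a command only if nobody else holds the lock file.
# flock -n fails immediately (status 1) instead of waiting for the lock.
run_locked() {
  local lock_file=$1
  shift
  flock -n "$lock_file" "$@"
}

run_sync() {
  run_locked "$LOCK_FILE" php "$SYNC_SCRIPT"
}

# Hypothetical crontab entry (every 30 minutes, output appended to a log):
#   */30 * * * * /usr/local/bin/run_codex_sync.sh run >> /var/log/codex_sync.log 2>&1
if [[ "${1:-}" == "run" ]]; then
  run_sync
fi
```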

**Disadvantages:**
- No Discord notifications (yet)
- No visual workflow editor
- Harder for non-technical users to modify

**Status:** ✅ OPERATIONAL - Successfully synced 361 documents to Dify

---
## Permanent Fix: n8n Factory Reset

**⚠️ THIS IS DESTRUCTIVE - BACKUP WORKFLOWS FIRST ⚠️**

### Prerequisites

1. **Export ALL workflows to JSON:**
```bash
# Via n8n UI:
# Settings → Workflows → Export All
# Save to: /opt/firefrost-codex/backups/n8n-workflows-YYYY-MM-DD.json

# Or via the public API (needs an API key; responses are paginated,
# so a non-null "nextCursor" in the output means more workflows remain):
curl -fsS "https://n8n.firefrostgaming.com/api/v1/workflows?limit=250" \
  -H "X-N8N-API-KEY: your_api_key" > n8n-workflows-backup.json
```
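
Another option is the n8n CLI inside the container, whose `export:workflow --all --output` command writes every workflow to a single file without pagination. A sketch, assuming the CLI still starts in the degraded container (it must run before Step 1 stops it) and that `/tmp` inside the container is writable; the function name is illustrative and the backup directory reuses the path given above. The file is written to `/tmp` rather than `/home/node/.n8n` because that directory is the volume wiped in Step 3.

```bash
#!/usr/bin/env bash
# Sketch: export every workflow with the n8n CLI and copy the file to the host.
set -euo pipefail

N8N_CONTAINER=${N8N_CONTAINER:-firefrost-codex-n8n-1}
BACKUP_DIR=${BACKUP_DIR:-/opt/firefrost-codex/backups}

export_workflows() {
  local stamp dest tmp_in_container
  stamp=$(date +%Y-%m-%d)
  dest="$BACKUP_DIR/n8n-workflows-$stamp.json"
  # Stay outside /home/node/.n8n: that directory is the volume wiped in Step 3.
  tmp_in_container="/tmp/n8n-workflows-$stamp.json"

  mkdir -p "$BACKUP_DIR"
  docker exec "$N8N_CONTAINER" n8n export:workflow --all --output="$tmp_in_container"
  docker cp "$N8N_CONTAINER:$tmp_in_container" "$dest"
  # Print the host path of the backup for the operator (or a calling script)
  echo "$dest"
}
```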

2. **Backup credentials (if any):**
```bash
# Settings → Credentials → Export
# Save separately - these are sensitive
```
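
The CLI can export credentials as well: `n8n export:credentials --all --decrypted` writes every secret in plain text, so the sketch below restricts the host copy to mode 600 and deletes the temporary file inside the container. Decrypting credentials depends on the instance's encryption key, which n8n stores in the `config` file under `/home/node/.n8n` unless `N8N_ENCRYPTION_KEY` is set in the environment; the Step 2 volume copy keeps that file. The container name comes from this document; the function name and temporary path are illustrative.

```bash
#!/usr/bin/env bash
# Sketch: export decrypted credentials with the n8n CLI.
# The output holds secrets in plain text; keep it off shared storage.
set -euo pipefail

N8N_CONTAINER=${N8N_CONTAINER:-firefrost-codex-n8n-1}

export_credentials() {
  local dest=$1
  local tmp_in_container=/tmp/n8n-credentials.json
  docker exec "$N8N_CONTAINER" n8n export:credentials --all --decrypted --output="$tmp_in_container"
  docker cp "$N8N_CONTAINER:$tmp_in_container" "$dest"
  # Do not leave a plain-text copy inside the container.
  docker exec "$N8N_CONTAINER" rm -f "$tmp_in_container"
  # Owner read/write only on the host copy.
  chmod 600 "$dest"
}

# Usage:
#   export_credentials /root/n8n-credentials-decrypted.json
```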

3. **Document current configuration:**
   - Webhook URLs
   - Environment variables
   - Execution settings
   - Timezone settings

### Reset Procedure

**Step 1: Stop n8n**
```bash
cd /opt/firefrost-codex
docker-compose stop n8n
```

**Step 2: Backup existing volume (safety net)**
```bash
# -a preserves ownership and permissions (the n8n container runs as the "node" user)
sudo cp -a ./volumes/n8n ./volumes/n8n.backup.$(date +%Y%m%d)
```
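
Before Step 3 deletes anything, the copy can be checked against the original. A minimal sketch using `diff -r`, assuming the date-stamped backup name from Step 2 and a shell that can read both trees (`sudo` cannot call a shell function directly, so run it from a root shell if needed); `verify_backup` is an illustrative name. A non-zero exit means Step 3 should not run.

```bash
#!/usr/bin/env bash
# Sketch: compare the live volume with its backup before wiping anything.
set -euo pipefail

verify_backup() {
  local src=$1 dst=$2
  if [[ ! -d "$dst" ]]; then
    echo "backup directory $dst does not exist" >&2
    return 1
  fi
  # diff -r exits 0 only when both trees hold the same files with the same content
  if diff -r -q "$src" "$dst" >/dev/null; then
    echo "backup verified: $dst"
  else
    echo "backup differs from $src; do NOT wipe the volume" >&2
    return 1
  fi
}

# Usage (from /opt/firefrost-codex, as a user that can read the volume):
#   verify_backup ./volumes/n8n "./volumes/n8n.backup.$(date +%Y%m%d)"
```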

**Step 3: Wipe corrupted volume**
```bash
# Removes everything inside the volume, including hidden files a bare * glob would skip
sudo find ./volumes/n8n -mindepth 1 -delete
```
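
Because the wipe is irreversible, a guarded variant can refuse to run unless a non-empty backup directory exists. This is a sketch with deliberately simple checks (it does not compare contents), and `wipe_volume` is an illustrative name; it deletes everything inside the volume, hidden files included, while keeping the directory itself as the mount point.

```bash
#!/usr/bin/env bash
# Sketch: wipe the n8n volume only when a non-empty backup is present.
set -euo pipefail

wipe_volume() {
  local volume=$1 backup=$2
  # Refuse obviously dangerous targets such as an empty string or "/".
  if [[ -z "$volume" || "$volume" == "/" ]]; then
    echo "refusing to wipe '$volume'" >&2
    return 1
  fi
  if [[ ! -d "$backup" ]] || [[ -z "$(ls -A "$backup")" ]]; then
    echo "backup '$backup' is missing or empty; aborting" >&2
    return 1
  fi
  # -mindepth 1 keeps the directory itself but removes everything inside it
  find "$volume" -mindepth 1 -delete
}

# Usage (from /opt/firefrost-codex, in a root shell):
#   wipe_volume ./volumes/n8n "./volumes/n8n.backup.$(date +%Y%m%d)"
```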

**Step 4: Recreate container**
```bash
docker-compose up -d n8n
```

**Step 5: Wait for initialization (~2 minutes)**
```bash
# Watch logs
docker-compose logs -f n8n

# Look for: "Editor is now accessible via: https://n8n.firefrostgaming.com"
```
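
The wait can also be scripted. A sketch, assuming the container keeps the name `firefrost-codex-n8n-1` and that the startup message matches the text shown above; it reads `docker logs --since` from a timestamp taken just before Step 4, so the same message from the pre-reset run cannot cause a false positive. The function name and the 180-second default are illustrative.

```bash
#!/usr/bin/env bash
# Sketch: poll the container logs until n8n reports the editor is reachable.
set -euo pipefail

N8N_CONTAINER=${N8N_CONTAINER:-firefrost-codex-n8n-1}

# $1: timestamp to read logs from (RFC 3339, captured before Step 4)
# $2: timeout in seconds, $3: polling interval in seconds
wait_for_editor() {
  local since=$1 timeout=${2:-180} interval=${3:-5} waited=0
  local logs
  while :; do
    # Capture the logs first; a pipe into grep -q could trip pipefail via SIGPIPE.
    logs=$(docker logs --since "$since" "$N8N_CONTAINER" 2>&1 || true)
    if [[ "$logs" == *"Editor is now accessible via"* ]]; then
      echo "n8n editor is up"
      return 0
    fi
    if (( waited >= timeout )); then
      echo "n8n editor not ready after ${timeout}s" >&2
      return 1
    fi
    sleep "$interval"
    waited=$(( waited + interval ))
  done
}

# Usage:
#   start=$(date -u +%Y-%m-%dT%H:%M:%SZ)
#   docker-compose up -d n8n
#   wait_for_editor "$start"
```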

**Step 6: Initial setup**
- Visit https://n8n.firefrostgaming.com
- Create owner account (use same credentials as before)
- Configure timezone and settings

**Step 7: Import workflows**
- Settings → Workflows → Import from File
- Select backup JSON
- Verify all nodes load correctly
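
The CLI has an import counterpart, `n8n import:workflow --input`. A sketch, assuming the backup is a single JSON file produced by `n8n export:workflow --all --output` (a raw public API response wraps workflows in a `data` field and is not assumed to import as-is) and the same container name; the function name and in-container path are illustrative. The file is copied into the container first because the host backup directory is presumably not mounted there.

```bash
#!/usr/bin/env bash
# Sketch: import a CLI workflow export into the freshly reset instance.
set -euo pipefail

N8N_CONTAINER=${N8N_CONTAINER:-firefrost-codex-n8n-1}

import_workflows() {
  local backup_file=$1
  local in_container=/tmp/n8n-workflows-import.json
  # -s: the file must exist and be non-empty
  if [[ ! -s "$backup_file" ]]; then
    echo "backup file '$backup_file' is missing or empty" >&2
    return 1
  fi
  docker cp "$backup_file" "$N8N_CONTAINER:$in_container"
  docker exec "$N8N_CONTAINER" n8n import:workflow --input="$in_container"
}

# Usage:
#   import_workflows /opt/firefrost-codex/backups/n8n-workflows-YYYY-MM-DD.json
```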

**Step 8: Test core nodes**
- Create new workflow
- Add HTTP Request node → Should work
- Add Execute Command node → Should work
- Test execution → Should succeed

**Step 9: Restore credentials**
- Settings → Credentials → Import
- Re-enter any API keys/secrets

**Step 10: Verify automation**
- Test Git sync workflow manually
- Verify Discord notifications
- Check scheduled executions

---
## Prevention

**To avoid this in the future:**

1. **Pin n8n version in docker-compose.yml:**
```yaml
services:
  n8n:
    image: n8nio/n8n:1.121.0  # Specific version, not :latest
```

2. **Backup workflows regularly:**
```bash
# Add to cron: weekly workflow export (Sundays at 02:00)
# Inside a crontab, % must be escaped as \% or cron treats it as a newline
0 2 * * 0 curl -fsS -H "X-N8N-API-KEY: your_api_key" "https://n8n.firefrostgaming.com/api/v1/workflows?limit=250" > /backups/n8n-workflows-$(date +\%Y\%m\%d).json
```
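
Moving the export into a small script keeps the crontab line short and avoids `%` escaping entirely. A sketch under these assumptions: the API key arrives through an environment variable named `N8N_API_KEY` (a name chosen for this script, not something n8n reads), only the first page of up to 250 workflows is fetched, and exports more than 28 days old are pruned; the retention period, function names, and script path are illustrative.

```bash
#!/usr/bin/env bash
# Sketch: weekly n8n workflow backup with simple retention.
set -euo pipefail

N8N_URL=${N8N_URL:-https://n8n.firefrostgaming.com}
BACKUP_DIR=${BACKUP_DIR:-/backups}
RETENTION_DAYS=${RETENTION_DAYS:-28}

backup_workflows() {
  local dest
  dest="$BACKUP_DIR/n8n-workflows-$(date +%Y%m%d).json"
  # Download to a temp name first so a failed request never leaves a truncated .json
  curl -fsS -H "X-N8N-API-KEY: ${N8N_API_KEY:?set N8N_API_KEY}" \
    "$N8N_URL/api/v1/workflows?limit=250" > "$dest.tmp"
  mv "$dest.tmp" "$dest"
}

prune_backups() {
  # Delete exports modified more than RETENTION_DAYS days ago (find rounds to whole days)
  find "$BACKUP_DIR" -maxdepth 1 -name 'n8n-workflows-*.json' -mtime +"$RETENTION_DAYS" -delete
}

# Hypothetical crontab entry (script path is a placeholder):
#   0 2 * * 0 N8N_API_KEY=... /usr/local/bin/n8n_backup.sh run
if [[ "${1:-}" == "run" ]]; then
  backup_workflows
  prune_backups
fi
```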

3. **Test updates on staging first:**
   - Don't upgrade n8n in production without testing
   - Check release notes for breaking changes

4. **Monitor n8n health:**
   - Add n8n health check to Uptime Kuma
   - Alert if workflow executions fail

---
## Current Status (February 24, 2026)

**n8n Service:**
- ⚠️ DEGRADED - UI accessible, core nodes broken
- 📋 FACTORY RESET PENDING - Scheduled for next maintenance window

**Codex Git Sync:**
- ✅ OPERATIONAL - Using PHP workaround (`sync_codex.php`)
- ✅ 361 documents syncing successfully
- ⏱️ Manual execution (cron scheduling pending)

**Next Steps:**
1. Add n8n factory reset to tasks.md
2. Schedule maintenance window for reset
3. Consider migrating to PHP permanently if it proves simpler to maintain

---
## Related Documentation

- **Phase 5 Deployment:** `docs/tasks/firefrost-codex/`
- **PHP Workaround:** (To be documented if kept long-term)
- **n8n Workflows:** Backup stored at `/opt/firefrost-codex/backups/` (when created)

---

**Incident Timeline:**

- **Feb 23, 9:00 PM:** n8n workflow failure discovered during Phase 5 deployment
- **Feb 23, 9:30 PM:** Diagnosis: Node registry corruption
- **Feb 23, 10:00 PM:** Pivot to PHP workaround (Gemini + Michael collaboration)
- **Feb 24, 12:00 AM:** PHP script operational, 361 documents synced
- **Feb 24, 9:00 AM:** Dify-Qdrant issue resolved (separate incident)
- **Feb 24, 9:30 AM:** Decision to defer n8n reset until next session

---
**Created:** February 24, 2026

**Created By:** Chronicler #26 (from Gemini's post-mortem)

**Resolution Status:** DEFERRED - Workaround operational

**Factory Reset:** TBD (next maintenance window)

💙🔥❄️

**"Sometimes the best fix is the one that waits until you have the energy to do it right."**