diff --git a/docs/core/tasks.md b/docs/core/tasks.md
index 6d71757..ff1de4e 100644
--- a/docs/core/tasks.md
+++ b/docs/core/tasks.md
@@ -846,3 +846,43 @@ Small improvements to Whitelist Manager:
 **Impact:** Cosmetic only - does not affect functionality
 
 ---
+
+---
+
+### 34. n8n Factory Reset - Node Registry Recovery
+**Time:** 2-3 hours
+**Status:** DEFERRED
+**Priority:** Tier 2 - Major Infrastructure
+**Documentation:** `docs/troubleshooting/n8n-node-registry-corruption.md`
+
+Reset the n8n instance on TX1 to resolve a corrupted node registry that prevents workflow execution.
+
+**Problem:** Core nodes (HTTP Request, Execute Command) fail with "Node not found" errors.
+
+**Current Workaround:** PHP script (`sync_codex.php`) handles Codex Git sync directly.
+
+**Reset Procedure:**
+1. Export all workflows to JSON backup
+2. Back up credentials and settings
+3. Stop n8n container
+4. Back up existing volume to `.backup` folder
+5. Wipe `./volumes/n8n/*` directory
+6. Recreate container (fresh initialization)
+7. Re-import workflows and credentials
+8. Test core node functionality
+9. Restore scheduled executions
+
+**Prerequisites:**
+- Workflow JSON exports backed up
+- Credentials documented
+- Maintenance window scheduled (low-traffic time)
+
+**Post-Reset:**
+- Verify Git sync workflow works
+- Test Discord notifications
+- Re-enable hourly scheduling
+- Monitor for 24 hours
+
+**Alternative:** Keep PHP workaround permanently if simpler/more reliable.
+
+---
diff --git a/docs/troubleshooting/n8n-node-registry-corruption.md b/docs/troubleshooting/n8n-node-registry-corruption.md
new file mode 100644
index 0000000..1611681
--- /dev/null
+++ b/docs/troubleshooting/n8n-node-registry-corruption.md
@@ -0,0 +1,243 @@
+# n8n Node Registry Corruption (v2.x)
+
+**Problem:** n8n UI is accessible, but core nodes (HTTP Request, Execute Command) fail with "Node not found" or "Registry Error"
+
+**Incident Date:** February 23-24, 2026
+**Affected System:** TX1 Dallas n8n instance (firefrost-codex-n8n-1)
+**Status:** BYPASSED via PHP workaround, factory reset pending
+
+---
+
+## Symptoms
+
+- ✅ n8n web interface loads normally at https://n8n.firefrostgaming.com
+- ✅ Existing workflows visible in UI
+- ❌ Cannot execute workflows using core nodes
+- ❌ "Node not found" errors for `n8n-nodes-base` package nodes
+- ❌ HTTP Request node: Registry error
+- ❌ Execute Command node: Registry error
+
+**These are INTERNAL nodes that should always be available.**
+
+---
+
+## Root Cause
+
+**Corrupted Node Registry during v2.x migration**
+
+The internal node registry (`n8n-nodes-base` package) became desynchronized from the workflow engine. This typically follows one of:
+
+1. A partial n8n version update that leaves incompatible volume data behind
+2. Docker volume corruption in the `/home/node/.n8n` directory
+3. A version mismatch between the container image and the persisted configuration
+
+**Key indicator:** Core nodes from the `n8n-nodes-base` package are "invisible" to the execution engine despite being bundled with n8n.
+
+---
+
+## Failed Resolution Attempts
+
+### Attempt 1: Container Recreation
+```bash
+docker-compose down
+docker-compose pull n8n
+docker-compose up -d
+```
+**Result:** ❌ Failed - corruption persists in volume
+
+### Attempt 2: Image Force Pull
+```bash
+docker-compose down
+docker rmi n8nio/n8n:1.121.0
+docker-compose up -d
+```
+**Result:** ❌ Failed - volume data still corrupted
+
+**Why these failed:** The corruption is in the VOLUME (`./volumes/n8n`), not the container image.
+
+---
+
+## Temporary Workaround: PHP Direct Sync
+
+**Created:** `sync_codex.php` on TX1 host OS (PHP 8.3 CLI)
+
+**Purpose:** Bypass n8n entirely for Codex Git → Dify sync
+
+**How it works:**
+```
+TX1 Host (PHP) → Git Pull → Process Files → Dify API (127.0.0.1:5001)
+```
+
+**Advantages:**
+- No dependency on n8n registry
+- Direct Docker bridge access to Dify API
+- Simpler debugging (single script vs workflow nodes)
+- Can run via cron for scheduled execution
+
+**Disadvantages:**
+- No Discord notifications (yet)
+- No visual workflow editor
+- Harder for non-technical users to modify
+
+**Status:** ✅ OPERATIONAL - Successfully synced 361 documents to Dify
+
+---
+
+## Permanent Fix: n8n Factory Reset
+
+**⚠️ THIS IS DESTRUCTIVE - BACK UP WORKFLOWS FIRST ⚠️**
+
+### Prerequisites
+
+1. **Export ALL workflows to JSON:**
+```bash
+# Via n8n UI:
+# Settings → Workflows → Export All
+# Save to: /opt/firefrost-codex/backups/n8n-workflows-YYYY-MM-DD.json
+
+# Or via API (responses are paginated; follow nextCursor if present):
+curl -X GET https://n8n.firefrostgaming.com/api/v1/workflows \
+  -H "X-N8N-API-KEY: your_api_key" > n8n-workflows-backup.json
+```
+
+2. **Back up credentials (if any):**
+```bash
+# Settings → Credentials → Export
+# Save separately - these are sensitive
+```
+
+3. **Document current configuration:**
+- Webhook URLs
+- Environment variables
+- Execution settings
+- Timezone settings
+
+### Reset Procedure
+
+**Step 1: Stop n8n**
+```bash
+cd /opt/firefrost-codex
+docker-compose stop n8n
+```
+
+**Step 2: Back up existing volume (safety net)**
+```bash
+sudo cp -a ./volumes/n8n ./volumes/n8n.backup.$(date +%Y%m%d)  # -a preserves ownership and permissions
+```
+
+**Step 3: Wipe corrupted volume**
+```bash
+sudo find ./volumes/n8n -mindepth 1 -delete  # unlike rm -rf ./volumes/n8n/*, this also removes dotfiles
+```
+
+**Step 4: Recreate container**
+```bash
+docker-compose up -d n8n
+```
+
+**Step 5: Wait for initialization (~2 minutes)**
+```bash
+# Watch logs
+docker-compose logs -f n8n
+
+# Look for: "Editor is now accessible via: https://n8n.firefrostgaming.com"
+```
+
+**Step 6: Initial setup**
+- Visit https://n8n.firefrostgaming.com
+- Create owner account (use same credentials as before)
+- Configure timezone and settings
+
+**Step 7: Import workflows**
+- Settings → Workflows → Import from File
+- Select backup JSON
+- Verify all nodes load correctly
+
+**Step 8: Test core nodes**
+- Create new workflow
+- Add HTTP Request node → Should work
+- Add Execute Command node → Should work
+- Test execution → Should succeed
+
+**Step 9: Restore credentials**
+- Settings → Credentials → Import
+- Re-enter any API keys/secrets
+
+**Step 10: Verify automation**
+- Test Git sync workflow manually
+- Verify Discord notifications
+- Check scheduled executions
+
+---
+
+## Prevention
+
+**To avoid this in the future:**
+
+1. **Pin n8n version in docker-compose.yml:**
+```yaml
+n8n:
+  image: n8nio/n8n:1.121.0  # Specific version, not :latest
+```
+
+2. **Back up workflows regularly:**
+```bash
+# Add to cron: Weekly workflow export (% must be escaped in crontab; the API needs the key header)
+0 2 * * 0 curl -H "X-N8N-API-KEY: your_api_key" https://n8n.firefrostgaming.com/api/v1/workflows > /backups/n8n-workflows-$(date +\%Y\%m\%d).json
+```
+
+3. **Test updates on staging first:**
+- Don't upgrade n8n in production without testing
+- Check release notes for breaking changes
+
+4. **Monitor n8n health:**
+- Add n8n health check to Uptime Kuma
+- Alert if workflow executions fail
+
+---
+
+## Current Status (February 24, 2026)
+
+**n8n Service:**
+- ⚠️ DEGRADED - UI accessible, core nodes broken
+- 📋 FACTORY RESET PENDING - Scheduled for next maintenance window
+
+**Codex Git Sync:**
+- ✅ OPERATIONAL - Using PHP workaround (`sync_codex.php`)
+- ✅ 361 documents syncing successfully
+- ⏱️ Manual execution (cron scheduling pending)
+
+**Next Steps:**
+1. Add n8n factory reset to tasks.md
+2. Schedule maintenance window for reset
+3. Consider migrating to PHP permanently if simpler
+
+---
+
+## Related Documentation
+
+- **Phase 5 Deployment:** `docs/tasks/firefrost-codex/`
+- **PHP Workaround:** (To be documented if kept long-term)
+- **n8n Workflows:** Backup stored at `/opt/firefrost-codex/backups/` (when created)
+
+---
+
+**Incident Timeline:**
+
+- **Feb 23, 9:00 PM:** n8n workflow failure discovered during Phase 5 deployment
+- **Feb 23, 9:30 PM:** Diagnosis: Node registry corruption
+- **Feb 23, 10:00 PM:** Pivot to PHP workaround (Gemini + Michael collaboration)
+- **Feb 24, 12:00 AM:** PHP script operational, 361 documents synced
+- **Feb 24, 9:00 AM:** Dify-Qdrant issue resolved (separate incident)
+- **Feb 24, 9:30 AM:** Decision to defer n8n reset until next session
+
+---
+
+**Created:** February 24, 2026
+**Created By:** Chronicler #26 (from Gemini's post-mortem)
+**Resolution Status:** DEFERRED - Workaround operational
+**Factory Reset:** Scheduled TBD
+
+💙🔥❄️
+
+**"Sometimes the best fix is the one that waits until you have the energy to do it right."**
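The factory reset prerequisites above call for a workflow export before the volume is wiped. As a minimal sketch, assuming the export is a single file whose path you pass in (the `check_backup` helper name and the example path in the usage comment are illustrative, not part of the existing tooling), a pre-flight check could refuse to continue when the file is missing, empty, or obviously not JSON:

```bash
#!/usr/bin/env bash
# Hypothetical pre-flight helper: sanity-check a workflow export before wiping the volume.
# This is a crude check, not a JSON validator: it only looks at existence, size,
# and the first byte of the file.
check_backup() {
  local file="$1"

  # -s is true only if the file exists and is larger than zero bytes.
  if [ ! -s "$file" ]; then
    echo "backup missing or empty: $file" >&2
    return 1
  fi

  # A JSON export should start with an object or array delimiter.
  case "$(head -c 1 "$file")" in
    '{'|'[') return 0 ;;
    *)
      echo "backup does not look like JSON: $file" >&2
      return 1
      ;;
  esac
}

# Example usage (path is an assumption based on the backup directory named in this doc):
#   check_backup /opt/firefrost-codex/backups/n8n-workflows-backup.json && echo "safe to proceed"
```

The check catches an empty redirect or an HTML error page, but a JSON-formatted error response would still pass, so opening the export and confirming the expected workflows are present remains worthwhile before running Step 3.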