The Chronicler a792d384fb docs: Add Phase 4 deployment status - Dify fully operational
- Comprehensive status document covering Phases 0-4 completion
- All 10+ sequential configuration issues documented with solutions
- Critical configuration reference for future troubleshooting
- Lessons learned from 6-hour deployment session
- Ready for Phase 5-11 execution

Phase 4 achievements:
- Plugin system deployed (daemon, sandbox, ssrf_proxy)
- Ollama integration complete (5 models configured)
- Gemini provider added for heavy lifting
- Dify Issue #603 timeout bug solved
- All CORS/CSRF authentication working
- System defaults configured

Deployed by: The Diagnostician (Chronicler #23)
2026-02-23 04:03:07 +00:00

Firefrost Knowledge Engine - Complete Deployment

Task ID: FFG-TASK-009-MIGRATION
Priority: CRITICAL
Status: READY FOR EXECUTION
Estimated Time: 10-15 hours (spread across multiple sessions)
Created: February 22, 2026
Created By: The Chronicler #21
Last Updated: February 22, 2026


🎯 EXECUTIVE SUMMARY

What: Replace AnythingLLM with complete "Firefrost Knowledge Engine" (Dify + n8n + Qdrant + Ollama)
Why: AnythingLLM returns incorrect information (it retrieves old archived docs instead of the current ones)
Who Needs It: Meg (all repos) and Holly (Pokerole only) are waiting to start their work
When: Deploy ASAP - partners are blocked
Where: TX1 Dallas (38.68.14.26)
Cost: $0/month (self-hosted)


🚨 CRITICAL CONTEXT

This is NOT a simple migration. This is building a complete autonomous AI assistant system that enables Meg and Holly to work 24/7 without waking Michael.

Key Requirements:

  • Meg needs access to ALL repositories
  • Holly needs access to POKEROLE repositories ONLY
  • Both need ability to UPDATE documents via AI
  • Michael needs approval control via Discord (one-click merge)
  • System must self-heal common failures (80% target)
  • Must work at 3 AM when Michael is asleep

Current State:

  • AnythingLLM deployed on TX1 (Phase 1 complete)
  • 319 documents synced
  • Retrieval quality POOR (returns archived docs instead of current)
  • No RBAC (everyone sees everything)
  • No write-back capability

Target State:

  • Dify + n8n + Qdrant + Ollama on TX1
  • Proper RBAC (Meg sees all, Holly sees Pokerole only)
  • Git write-back via ai-proposals branch
  • Discord approval workflow with buttons
  • Self-healing for 80% of failures
  • Comprehensive monitoring and alerts
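
The RBAC target above (Meg sees all, Holly sees Pokerole only) can be illustrated with a small deny-by-default check. This is only a sketch: the `ACCESS` table, the lowercase user keys, and the repository names are assumptions for illustration, not the actual Dify workspace configuration.

```python
# Hypothetical access table: user -> set of repositories they may query.
# "*" is an illustrative wildcard meaning "all repositories".
ALL_REPOS = "*"
ACCESS = {
    "meg": {ALL_REPOS},
    "holly": {"pokerole"},
}


def can_read(user: str, repo: str) -> bool:
    """Return True only if the user is explicitly allowed to read the repo."""
    allowed = ACCESS.get(user.lower(), set())  # unknown users get no access
    return ALL_REPOS in allowed or repo.lower() in allowed


def visible_documents(user, documents):
    """Keep only the (repo, path) pairs the user is allowed to see."""
    return [(repo, path) for repo, path in documents if can_read(user, repo)]
```

Unknown users fall through to an empty set, so a missing entry blocks access rather than granting it.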

📋 ARCHITECTURE OVERVIEW

┌─────────────────────────────────────────────────────────────┐
│                    FIREFROST KNOWLEDGE ENGINE                │
└─────────────────────────────────────────────────────────────┘

External Access:
  ├─ https://codex.firefrostgaming.com (Meg/Holly/Michael)
  └─ https://n8n.firefrostgaming.com (Michael only, Discord webhooks)

Nginx (TX1 Host - Ports 80/443):
  ├─ SSL/TLS with Let's Encrypt
  ├─ Rate limiting (10 req/s standard, 30 req/s webhooks)
  ├─ Reverse proxy to Docker services
  └─ Security headers (HSTS, X-Frame-Options, etc.)

Docker Stack (127.0.0.1 localhost only):
  ├─ Dify Web (port 3000) - User interface
  ├─ Dify API (internal) - RAG engine
  ├─ Dify Worker (internal) - Background processing
  ├─ n8n (port 5678) - Automation & Git workflows
  ├─ Qdrant (port 6333) - Vector database
  ├─ PostgreSQL (internal) - Dify data storage
  └─ Redis (internal) - Cache & queues

External Services:
  ├─ Ollama (TX1 host:11434) - LLM inference
  ├─ Gitea (git.firefrostgaming.com) - Git repository
  ├─ Discord Webhooks - Notifications & approvals
  └─ Uptime Kuma - Health monitoring

Data Flow - Query:
  User → Nginx → Dify Web → Dify API → Qdrant (vector search)
       → Ollama (LLM inference) → Response → User

Data Flow - Update:
  User → "Update doc X" → Dify calls n8n webhook
       → n8n validates (protected files? valid markdown?)
       → Git commit to ai-proposals branch
       → Discord notification with Approve/Reject buttons
       → Michael clicks Approve
       → n8n merges to main, pushes, re-indexes Dify
       → User notified "Your change is live"

Data Flow - Git Sync:
  Cron (hourly) → n8n pulls from Gitea
                → Filters out /archive/* directories
                → Adds metadata (status: current/archived)
                → Sends to Dify for indexing
                → Qdrant stores vectors
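
The archive filtering and status metadata in the Git Sync flow could look roughly like the sketch below. It assumes a file counts as archived when any directory in its path is named exactly `archive`; the function names and the record shape are hypothetical, and the real n8n workflow may differ.

```python
from pathlib import PurePosixPath


def classify(path: str) -> str:
    """Return 'archived' if any directory component is named 'archive'."""
    directories = PurePosixPath(path).parts[:-1]  # drop the file name itself
    return "archived" if "archive" in directories else "current"


def sync_records(paths, include_archived=False):
    """Attach status metadata and, by default, skip archived files."""
    records = []
    for path in paths:
        status = classify(path)
        if status == "archived" and not include_archived:
            continue  # archived docs are not sent for indexing
        records.append({"path": path, "status": status})
    return records
```

Matching on whole path components means a file such as `docs/archive-notes.md` is still treated as current.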

📚 DOCUMENT INDEX

Read these documents IN ORDER before deployment:

  1. PREREQUISITES.md - Pre-flight checklist (DNS, SSH keys, backups)
  2. DEPLOYMENT-PLAN.md - Step-by-step execution (every command)
  3. CONFIGURATION-FILES.md - All config files with exact content
  4. RECOVERY.md - Backup automation and disaster recovery
  5. VERIFICATION.md - Testing procedures (how to know it worked)
  6. TROUBLESHOOTING.md - Common issues and solutions

Supporting files:

  • docker-compose.yml - Complete Docker stack definition
  • .env.example - All environment variables with explanations
  • nginx-config.conf - Complete Nginx reverse proxy configuration
  • n8n-workflows/ - All workflow JSON exports
  • discord-webhooks/ - All Discord notification templates
  • backup-script.sh - Automated daily backup script

⏱️ TIME ESTIMATES

Phase 1: Preparation (1-2 hours)

  • DNS configuration and propagation
  • SSL certificate generation
  • SSH key setup for Git access
  • Backup current AnythingLLM state
  • Stop and remove AnythingLLM

Phase 2: Infrastructure Deployment (2-3 hours)

  • Install Nginx on TX1 host
  • Deploy Docker Compose stack
  • Configure Dify (admin account, workspaces, Ollama)
  • Verify services are healthy

Phase 3: Automation Setup (3-4 hours)

  • Import n8n workflows
  • Configure Discord webhooks
  • Test Git sync workflow
  • Test write-back validation
  • Configure Uptime Kuma monitoring

Phase 4: User Onboarding (1-2 hours)

  • Create Meg and Holly accounts
  • Configure workspace permissions
  • Test RBAC (Meg sees all, Holly sees Pokerole only)
  • Train on update workflow
  • Test one-click approval from Discord

Phase 5: Testing & Verification (2-3 hours)

  • Query accuracy testing (current vs archived docs)
  • Update workflow testing (protected files, validation)
  • Discord approval testing (buttons work, Michael-only)
  • Failure simulation (Dify crash, Git unreachable)
  • Self-healing verification

Total: 10-15 hours (the five phase ranges above sum to 9-14 hours)

Recommended approach: Execute in 2-3 sessions with breaks


🛡️ SAFETY MECHANISMS

The ai-proposals Branch Strategy:

  • All AI updates commit to ai-proposals branch (NOT main)
  • Michael reviews via Discord notification with Approve/Reject buttons
  • Only approved changes merge to main
  • Failed merges fall back to manual intervention
  • Git tags created before each merge (rollback points)
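
As a rough sketch of the approval step, the function below builds the git commands for tagging a rollback point and merging `ai-proposals` into `main`. It only returns the commands as argument lists and does not run them. The function name, the `origin` remote, the `--no-ff` merge style, and the choice to tag `main` before merging are assumptions; only the branch names and the `backup-before-ai-<commit_hash>` tag pattern come from this document.

```python
import re


def approval_plan(commit_hash: str, remote: str = "origin"):
    """Return the git commands (as argv lists) for one approved proposal."""
    # Accept only abbreviated or full hex hashes so the tag name stays safe.
    if not re.fullmatch(r"[0-9a-f]{7,40}", commit_hash):
        raise ValueError(f"not a commit hash: {commit_hash!r}")
    tag = f"backup-before-ai-{commit_hash}"
    return [
        ["git", "checkout", "main"],
        ["git", "tag", tag],  # rollback point on main before merging
        ["git", "merge", "--no-ff", "ai-proposals"],
        ["git", "push", remote, "main", tag],
    ]
```

Each list can be passed to `subprocess.run(cmd, check=True)`, so a failed merge raises and can fall back to manual intervention.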

Protected Files:

  • /security/* - Infrastructure configs (READ-ONLY for AI)
  • /infra/* - Server configurations (READ-ONLY for AI)
  • /backups/* - Backup scripts (READ-ONLY for AI)
  • .env - Secrets (READ-ONLY for AI)
  • docker-compose.yml - Stack definition (READ-ONLY for AI)
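
A minimal sketch of the protected-file check, assuming the three directories are matched at the repository root and that `.env` and `docker-compose.yml` are matched by file name in any directory. The function name and the fail-closed handling of empty paths and `..` segments are illustrative choices, not the deployed n8n logic.

```python
from pathlib import PurePosixPath

PROTECTED_DIRS = ("security", "infra", "backups")
PROTECTED_FILES = (".env", "docker-compose.yml")


def is_protected(path: str) -> bool:
    """Return True if the AI must treat this repository path as read-only."""
    # Strip leading slashes so "/security/x" and "security/x" match alike.
    parts = PurePosixPath(path.lstrip("/")).parts
    if not parts or ".." in parts:
        return True  # fail closed on empty or suspicious paths
    if parts[0] in PROTECTED_DIRS:
        return True
    return parts[-1] in PROTECTED_FILES
```

For example, `is_protected("/infra/tx1.md")` is true while `is_protected("docs/tasks/recruitment.md")` is false.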

Validation Checks:

  • File path exists
  • Content is valid Markdown (not empty, has structure)
  • File is not in protected directories
  • User has permission for that repository
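
The four checks above can be combined into one function that reports every failure at once. This is a sketch under stated assumptions: "has structure" is read as "contains at least one `#` heading", and `known_paths`, `is_protected`, and `user_can_write` are hypothetical inputs supplied by the caller.

```python
def validate_update(path, content, known_paths, is_protected, user_can_write):
    """Return a list of error strings; an empty list means the update may proceed."""
    errors = []
    if path not in known_paths:
        errors.append("file path does not exist")
    text = content.strip()
    if not text:
        errors.append("content is empty")
    elif not any(line.lstrip().startswith("#") for line in text.splitlines()):
        # Assumed structure rule: at least one Markdown heading.
        errors.append("content has no Markdown heading")
    if is_protected(path):
        errors.append("file is protected")
    if not user_can_write:
        errors.append("user lacks permission for this repository")
    return errors
```

Returning all errors together, rather than stopping at the first, lets the user fix a rejected update in one pass.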

Rollback Capability:

  • Git tags: backup-before-ai-<commit_hash>
  • Vector DB: Delete + re-sync from Git (minutes)
  • Full system: 15-minute restore from backup

🚨 CRITICAL SUCCESS FACTORS

MUST BE TRUE before marking this complete:

  1. Meg can ask questions about ANY Firefrost repository
  2. Holly can ask questions about POKEROLE repository ONLY
  3. Holly CANNOT see Firefrost infrastructure docs
  4. Meg can update docs via AI, commits to ai-proposals
  5. Michael receives Discord notification with Approve/Reject buttons
  6. Clicking Approve merges to main and re-indexes
  7. Clicking Reject keeps change in branch for review
  8. Protected files cannot be modified by AI
  9. Current docs are returned (NOT archived docs)
  10. System self-heals from Dify crash (Docker restart)
  11. Failed Git commits queue and retry automatically
  12. Daily backups run and transfer to Command Center
  13. Michael can restore entire system in 15 minutes

If ANY of these is false, deployment is NOT complete.


📊 SUCCESS METRICS

Query Accuracy:

  • "What are current Tier 0 tasks?" → Returns "Whitelist Manager, NC1 Cleanup, Staff Recruitment" (NOT "Initial Server Setup")
  • "What servers does Firefrost operate?" → Returns current 6 servers with correct IPs
  • "What was accomplished in the last Codex session?" → Returns Deployer's work

Update Workflow:

  • Meg updates recruitment doc → Commits to ai-proposals → Discord notification → Michael approves → Live in <2 minutes
  • Holly tries to update infrastructure doc → BLOCKED with clear error message

Self-Healing:

  • Dify crashes → Docker restarts within 60 seconds → Users see <1 minute downtime
  • Git unreachable → Updates queue → Retry every 5 minutes → Auto-process when Git returns
  • Qdrant corrupts → Re-index from Git completes in <10 minutes
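
The "Git unreachable" behaviour (queue updates, retry every 5 minutes, process automatically when Git returns) might be sketched as follows. The `UpdateQueue` class and the `commit` callback are hypothetical, and the caller passes in the current time in seconds so the logic can be exercised without waiting.

```python
from collections import deque

RETRY_INTERVAL_SECONDS = 5 * 60


class UpdateQueue:
    """Hold updates while Git is unreachable and retry them in order."""

    def __init__(self, commit):
        self._commit = commit  # callable(update) -> bool, True on success
        self._pending = deque()
        self._next_attempt = 0.0

    def submit(self, update, now):
        self._pending.append(update)
        self.process(now)

    def process(self, now):
        """Commit pending updates; on failure, wait 5 minutes before retrying."""
        if now < self._next_attempt:
            return  # still inside the back-off window
        while self._pending:
            if not self._commit(self._pending[0]):
                self._next_attempt = now + RETRY_INTERVAL_SECONDS
                return
            self._pending.popleft()

    def __len__(self):
        return len(self._pending)
```

A scheduler would call `process(time.time())` periodically; updates stay in submission order, so later edits never overtake earlier ones.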

Resource Usage:

  • RAM: <10GB under load (fits comfortably in 222GB available)
  • Disk: <15GB for complete system
  • CPU: <20% average (leaves headroom for game servers)

⚠️ RISKS AND MITIGATIONS

Risk 1: Port conflicts with game servers

  • Mitigation: Pre-deployment port check verified 80/443 free
  • Status: CLEAR (verified February 22, 2026)

Risk 2: DNS propagation delay

  • Mitigation: Configure DNS FIRST, wait for propagation before SSL
  • Fallback: Use IP address temporarily if needed

Risk 3: SSL certificate failure

  • Mitigation: Detailed Certbot instructions with error handling
  • Fallback: Self-signed cert for testing, proper cert later

Risk 4: Meg/Holly confused by new interface

  • Mitigation: Clear user guide, training session before launch
  • Fallback: Michael processes updates manually until they're comfortable

Risk 5: Git merge conflicts from AI

  • Mitigation: ai-proposals branch, manual review required
  • Fallback: Discord alert, Michael resolves manually

Risk 6: Overwhelming Discord notifications

  • Mitigation: Two channels (#codex-alerts for info, #system-critical for urgent)
  • Fallback: Adjust rate limits in n8n if too noisy

🔄 ROLLBACK PLAN

If deployment fails catastrophically:

  1. Stop new Docker stack: docker-compose down
  2. Restore AnythingLLM from backup (if still needed)
  3. Restore DNS to previous state
  4. Notify Meg/Holly of rollback

Total rollback time: <10 minutes

Rollback triggers:

  • Unable to get SSL certificates after 3 attempts
  • Docker stack won't start after 30 minutes of debugging
  • Dify UI inaccessible after deployment
  • Data corruption detected
  • Michael determines the risk is too high

📞 SUPPORT AND ESCALATION

If you get stuck:

  1. Check TROUBLESHOOTING.md for common issues
  2. Review relevant Gemini responses in session transcript
  3. Check Docker logs: docker-compose logs -f <service>
  4. Check Nginx logs: sudo tail -f /var/log/nginx/error.log
  5. If all else fails: Rollback and regroup

No external support needed - we built this ourselves.


📝 COMPLETION CHECKLIST

Before marking this task COMPLETE:

  • All 13 critical success factors verified
  • Query accuracy tests pass
  • Update workflow tests pass
  • RBAC tests pass (Meg sees all, Holly sees Pokerole only)
  • Discord approval workflow tested
  • Self-healing verified (simulated Dify crash)
  • Backup automation running
  • Test backup restore completed successfully
  • Meg and Holly trained and comfortable
  • Documentation updated in operations manual
  • AnythingLLM fully removed from TX1
  • Michael can sleep peacefully at night 💤

🎓 LESSONS FOR FUTURE CHRONICLERS

What we learned building this:

  1. Tool choice matters more than configuration - AnythingLLM couldn't handle 319 files with archives; Dify can
  2. RBAC is non-negotiable - Meg and Holly need different access levels
  3. Self-healing is essential - Solo operator can't wake up for every issue
  4. Git is the source of truth - Vector DB can always be rebuilt from Git
  5. Discord buttons are powerful - One-click approval from phone = accessibility win
  6. Architecture from Gemini + Partnership from Claude - External research + internal execution

For the next major infrastructure project:

  • Research thoroughly BEFORE building (ask Gemini the hard questions)
  • Get COMPLETE specifications before starting (don't build incrementally)
  • Test on separate system first if possible
  • Build rollback before building forward
  • Document for "future you when you're exhausted at 3 AM"

Fire + Frost + Foundation = Where Love Builds Legacy 💙🔥❄️

Built by: The Chronicler #21
For: Meg, Holly, and children not yet born
With guidance from: Gemini (architecture) + The Deployer (foundation)


Ready to execute? Read PREREQUISITES.md next.