firefrost-gaming/firefrost-operations-manual

README.md

Firefrost Knowledge Engine - Complete Deployment

Task ID: FFG-TASK-009-MIGRATION
Priority: CRITICAL
Status: READY FOR EXECUTION
Estimated Time: 10-15 hours (spread across multiple sessions)
Created: February 22, 2026
Created By: The Chronicler #21
Last Updated: February 22, 2026

🎯 EXECUTIVE SUMMARY

What: Replace AnythingLLM with complete "Firefrost Knowledge Engine" (Dify + n8n + Qdrant + Ollama)
Why: AnythingLLM returns incorrect information (searches old archived docs instead of current)
Who Needs It: Meg (all repos) and Holly (Pokerole only) are waiting to start their work
When: Deploy ASAP - partners are blocked
Where: TX1 Dallas (38.68.14.26)
Cost: $0/month (self-hosted)

🚨 CRITICAL CONTEXT

This is NOT a simple migration. This is building a complete autonomous AI assistant system that enables Meg and Holly to work 24/7 without waking Michael.

Key Requirements:

Meg needs access to ALL repositories
Holly needs access to POKEROLE repositories ONLY
Both need ability to UPDATE documents via AI
Michael needs approval control via Discord (one-click merge)
System must self-heal common failures (80% target)
Must work at 3 AM when Michael is asleep

Current State:

AnythingLLM deployed on TX1 (Phase 1 complete)
319 documents synced
Retrieval quality POOR (returns archived docs instead of current)
No RBAC (everyone sees everything)
No write-back capability

Target State:

Dify + n8n + Qdrant + Ollama on TX1
Proper RBAC (Meg sees all, Holly sees Pokerole only)
Git write-back via ai-proposals branch
Discord approval workflow with buttons
Self-healing for 80% of failures
Comprehensive monitoring and alerts
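
The access rule in the target state can be sketched as a small lookup. This is a minimal illustration only: the `ACCESS` table, the `can_read` helper, and the lowercase repository names are hypothetical, and in the deployed system the rule is expected to live in Dify workspace permissions rather than in custom code.

```python
# Hypothetical access table; "*" means every repository.
ACCESS = {
    "meg": {"*"},
    "holly": {"pokerole"},
}


def can_read(user: str, repo: str) -> bool:
    """Return True if `user` may query documents from `repo`."""
    allowed = ACCESS.get(user.lower(), set())  # unknown users get no access
    return "*" in allowed or repo.lower() in allowed
```

With this table, `can_read("holly", "firefrost-operations-manual")` is false while the same call for Meg is true, which matches the "Meg sees all, Holly sees Pokerole only" requirement.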

📋 ARCHITECTURE OVERVIEW

┌─────────────────────────────────────────────────────────────┐
│                    FIREFROST KNOWLEDGE ENGINE                │
└─────────────────────────────────────────────────────────────┘

External Access:
  ├─ https://codex.firefrostgaming.com (Meg/Holly/Michael)
  └─ https://n8n.firefrostgaming.com (Michael only, Discord webhooks)

Nginx (TX1 Host - Ports 80/443):
  ├─ SSL/TLS with Let's Encrypt
  ├─ Rate limiting (10 req/s standard, 30 req/s webhooks)
  ├─ Reverse proxy to Docker services
  └─ Security headers (HSTS, X-Frame-Options, etc.)

Docker Stack (127.0.0.1 localhost only):
  ├─ Dify Web (port 3000) - User interface
  ├─ Dify API (internal) - RAG engine
  ├─ Dify Worker (internal) - Background processing
  ├─ n8n (port 5678) - Automation & Git workflows
  ├─ Qdrant (port 6333) - Vector database
  ├─ PostgreSQL (internal) - Dify data storage
  └─ Redis (internal) - Cache & queues

External Services:
  ├─ Ollama (TX1 host:11434) - LLM inference
  ├─ Gitea (git.firefrostgaming.com) - Git repository
  ├─ Discord Webhooks - Notifications & approvals
  └─ Uptime Kuma - Health monitoring

Data Flow - Query:
  User → Nginx → Dify Web → Dify API → Qdrant (vector search)
       → Ollama (LLM inference) → Response → User

Data Flow - Update:
  User → "Update doc X" → Dify calls n8n webhook
       → n8n validates (protected files? valid markdown?)
       → Git commit to ai-proposals branch
       → Discord notification with Approve/Reject buttons
       → Michael clicks Approve
       → n8n merges to main, pushes, re-indexes Dify
       → User notified "Your change is live"

Data Flow - Git Sync:
  Cron (hourly) → n8n pulls from Gitea
                → Filters out /archive/* directories
                → Adds metadata (status: current/archived)
                → Sends to Dify for indexing
                → Qdrant stores vectors
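
The archive filtering and metadata steps of the Git sync flow could look roughly like the sketch below. It assumes that a document counts as archived when any directory in its path is named `archive`; the function names and the dictionary shape are hypothetical, and the real logic lives in the n8n workflow.

```python
from pathlib import PurePosixPath


def classify(path: str) -> str:
    """Return "archived" if any directory component is named "archive"."""
    directories = PurePosixPath(path).parts[:-1]  # drop the file name itself
    return "archived" if "archive" in directories else "current"


def tag_documents(paths):
    """Attach a status to every path, mirroring the metadata step."""
    return [{"path": p, "metadata": {"status": classify(p)}} for p in paths]


def select_for_indexing(documents):
    """Keep only current documents, mirroring the archive filter step."""
    return [d for d in documents if d["metadata"]["status"] == "current"]
```

Note that a file called `docs/archive.md` stays current under this rule, because only directory names are inspected.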

📚 DOCUMENT INDEX

Read these documents IN ORDER before deployment:

PREREQUISITES.md - Pre-flight checklist (DNS, SSH keys, backups)
DEPLOYMENT-PLAN-PART-1.md and DEPLOYMENT-PLAN-PART-2.md - Step-by-step execution (every command)
CONFIGURATION-FILES.md - All config files with exact content
RECOVERY.md - Backup automation and disaster recovery
VERIFICATION.md - Testing procedures (how to know it worked)
TROUBLESHOOTING.md - Common issues and solutions

Supporting files:

docker-compose.yml - Complete Docker stack definition
.env.example - All environment variables with explanations
nginx-config.conf - Complete Nginx reverse proxy configuration
n8n-workflows/ - All workflow JSON exports
discord-webhooks/ - All Discord notification templates
backup-script.sh - Automated daily backup script

⏱️ TIME ESTIMATES

Phase 1: Preparation (1-2 hours)

DNS configuration and propagation
SSL certificate generation
SSH key setup for Git access
Backup current AnythingLLM state
Stop and remove AnythingLLM

Phase 2: Infrastructure Deployment (2-3 hours)

Install Nginx on TX1 host
Deploy Docker Compose stack
Configure Dify (admin account, workspaces, Ollama)
Verify services are healthy

Phase 3: Automation Setup (3-4 hours)

Import n8n workflows
Configure Discord webhooks
Test Git sync workflow
Test write-back validation
Configure Uptime Kuma monitoring

Phase 4: User Onboarding (1-2 hours)

Create Meg and Holly accounts
Configure workspace permissions
Test RBAC (Meg sees all, Holly sees Pokerole only)
Train on update workflow
Test one-click approval from Discord

Phase 5: Testing & Verification (2-3 hours)

Query accuracy testing (current vs archived docs)
Update workflow testing (protected files, validation)
Discord approval testing (buttons work, Michael-only)
Failure simulation (Dify crash, Git unreachable)
Self-healing verification

Total: 10-15 hours (the phase estimates above sum to 9-14 hours)

Recommended approach: Execute in 2-3 sessions with breaks

🛡️ SAFETY MECHANISMS

The ai-proposals Branch Strategy:

All AI updates commit to ai-proposals branch (NOT main)
Michael reviews via Discord notification with Approve/Reject buttons
Only approved changes merge to main
Failed merges fall back to manual intervention
Git tags created before each merge (rollback points)
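
The branch strategy above can be sketched as a pure function that turns a Discord button click into an ordered list of git steps. Everything here is illustrative: `APPROVER`, `plan_decision`, and the choice of which commit hash goes into the tag name are assumptions, and the re-indexing step that follows a merge is left out.

```python
from typing import List

APPROVER = "michael"  # assumption: a single approver identified by name


def plan_decision(clicked_by: str, action: str, commit_hash: str) -> List[str]:
    """Return the git steps for a button click; an empty list means no change to main."""
    if clicked_by.lower() != APPROVER:
        return []  # clicks from anyone else are ignored
    if action == "reject":
        return []  # the change stays on ai-proposals for review
    if action != "approve":
        raise ValueError(f"unknown action: {action!r}")
    tag = f"backup-before-ai-{commit_hash}"  # rollback point, created before merging
    return [
        "git checkout main",
        f"git tag {tag}",
        "git merge --no-ff ai-proposals",
        "git push origin main --tags",
    ]
```

Returning the steps instead of running them keeps the decision logic easy to test; a failed merge would still fall back to manual intervention as described above.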

Protected Files:

/security/* - Infrastructure configs (READ-ONLY for AI)
/infra/* - Server configurations (READ-ONLY for AI)
/backups/* - Backup scripts (READ-ONLY for AI)
.env - Secrets (READ-ONLY for AI)
docker-compose.yml - Stack definition (READ-ONLY for AI)

Validation Checks:

File path exists
Content is valid Markdown (not empty, has structure)
File is not a protected path (see Protected Files above)
User has permission for that repository
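
A minimal sketch of these four checks follows. It assumes "has structure" means at least one Markdown heading, treats the protected directories as top-level, and takes the existing paths and the permission result as inputs; all names (`validate_update`, `is_protected`, the error strings) are hypothetical rather than taken from the n8n workflow.

```python
from pathlib import PurePosixPath

PROTECTED_DIRS = ("security", "infra", "backups")
PROTECTED_FILES = (".env", "docker-compose.yml")


def is_protected(path: str) -> bool:
    """True for files under a protected top-level directory or with a protected name."""
    p = PurePosixPath(path.lstrip("/"))
    in_protected_dir = len(p.parts) > 1 and p.parts[0] in PROTECTED_DIRS
    return in_protected_dir or p.name in PROTECTED_FILES


def validate_update(path, content, existing_paths, user_may_write):
    """Return a list of errors; an empty list means the update may proceed."""
    errors = []
    if path.lstrip("/") not in existing_paths:
        errors.append("file path does not exist")
    if is_protected(path):
        errors.append("path is protected (read-only for AI)")
    if not content.strip():
        errors.append("content is empty")
    elif not any(line.startswith("#") for line in content.splitlines()):
        # Assumption: at least one heading counts as "has structure".
        errors.append("content has no Markdown heading")
    if not user_may_write:
        errors.append("user lacks permission for this repository")
    return errors
```

Collecting every error, rather than stopping at the first, lets the user see all problems with a rejected update in a single message.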

Rollback Capability:

Git tags: backup-before-ai-<commit_hash>
Vector DB: Delete + re-sync from Git (minutes)
Full system: 15-minute restore from backup

🚨 CRITICAL SUCCESS FACTORS

MUST BE TRUE before marking this complete:

✅ Meg can ask questions about ANY Firefrost repository
✅ Holly can ask questions about POKEROLE repository ONLY
✅ Holly CANNOT see Firefrost infrastructure docs
✅ Meg can update docs via AI, commits to ai-proposals
✅ Michael receives Discord notification with Approve/Reject buttons
✅ Clicking Approve merges to main and re-indexes
✅ Clicking Reject keeps change in branch for review
✅ Protected files cannot be modified by AI
✅ Current docs are returned (NOT archived docs)
✅ System self-heals from Dify crash (Docker restart)
✅ Failed Git commits queue and retry automatically
✅ Daily backups run and transfer to Command Center
✅ Michael can restore entire system in 15 minutes

If ANY of these is false, deployment is NOT complete.

📊 SUCCESS METRICS

Query Accuracy:

"What are current Tier 0 tasks?" → Returns "Whitelist Manager, NC1 Cleanup, Staff Recruitment" (NOT "Initial Server Setup")
"What servers does Firefrost operate?" → Returns current 6 servers with correct IPs
"What was accomplished in last Codex session?" → Returns Deployer's work

Update Workflow:

Meg updates recruitment doc → Commits to ai-proposals → Discord notification → Michael approves → Live in <2 minutes
Holly tries to update infrastructure doc → BLOCKED with clear error message

Self-Healing:

Dify crashes → Docker restarts within 60 seconds → Users see <1 minute downtime
Git unreachable → Updates queue → Retry every 5 minutes → Auto-process when Git returns
Qdrant corrupts → Re-index from Git completes in <10 minutes
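
The "queue and retry every 5 minutes" behaviour for an unreachable Git server can be sketched as follows. This is an illustration under assumptions: `RetryQueue` and `commit_fn` are hypothetical names, the caller supplies the current time in seconds, and the real queue is expected to be implemented in n8n.

```python
from collections import deque

RETRY_INTERVAL_SECONDS = 5 * 60


class RetryQueue:
    """Hold updates while Git is unreachable and retry them in order."""

    def __init__(self, commit_fn):
        self._commit = commit_fn  # callable(update) -> bool, True on success
        self._pending = deque()
        self._next_attempt = 0.0

    def submit(self, update, now):
        """Queue an update and try to process the queue immediately."""
        self._pending.append(update)
        self.process(now)

    def process(self, now):
        """Commit queued updates in order; on failure, wait one interval."""
        if now < self._next_attempt:
            return  # still inside the back-off window
        while self._pending:
            if self._commit(self._pending[0]):
                self._pending.popleft()
            else:
                self._next_attempt = now + RETRY_INTERVAL_SECONDS
                return

    def __len__(self):
        return len(self._pending)
```

Calling `process` from a periodic trigger drains the queue automatically once Git returns, and updates are committed in the order they were submitted.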

Resource Usage:

RAM: <10GB under load (fits comfortably in 222GB available)
Disk: <15GB for complete system
CPU: <20% average (leaves headroom for game servers)

⚠️ RISKS AND MITIGATIONS

Risk 1: Port conflicts with game servers

Mitigation: Pre-deployment port check confirmed ports 80/443 are free
Status: CLEAR (verified February 22, 2026)

Risk 2: DNS propagation delay

Mitigation: Configure DNS FIRST, wait for propagation before SSL
Fallback: Use IP address temporarily if needed

Risk 3: SSL certificate failure

Mitigation: Detailed Certbot instructions with error handling
Fallback: Self-signed cert for testing, proper cert later

Risk 4: Meg/Holly confused by new interface

Mitigation: Clear user guide, training session before launch
Fallback: Michael processes updates manually until they're comfortable

Risk 5: Git merge conflicts from AI

Mitigation: ai-proposals branch, manual review required
Fallback: Discord alert, Michael resolves manually

Risk 6: Overwhelming Discord notifications

Mitigation: Two channels (#codex-alerts for info, #system-critical for urgent)
Fallback: Adjust rate limits in n8n if too noisy
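
The two-channel split and the rate-limit fallback for Risk 6 could be sketched like this. The severity labels, `channel_for`, and `InfoThrottle` are hypothetical; the actual routing and limits are configured in n8n.

```python
URGENT_SEVERITIES = {"critical", "urgent"}  # assumption: labels used by the workflows


def channel_for(severity: str) -> str:
    """Route urgent messages to #system-critical and everything else to #codex-alerts."""
    return "#system-critical" if severity.lower() in URGENT_SEVERITIES else "#codex-alerts"


class InfoThrottle:
    """Allow one informational message per `min_gap` seconds; urgent ones always pass."""

    def __init__(self, min_gap: float):
        self.min_gap = min_gap
        self._last_info = None

    def allow(self, severity: str, now: float) -> bool:
        if channel_for(severity) == "#system-critical":
            return True  # never suppress urgent alerts
        if self._last_info is not None and now - self._last_info < self.min_gap:
            return False
        self._last_info = now
        return True
```

Raising `min_gap` is the sketch's equivalent of adjusting rate limits when the informational channel gets too noisy.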

🔄 ROLLBACK PLAN

If deployment fails catastrophically:

Stop new Docker stack: docker-compose down
Restore AnythingLLM from backup (if still needed)
Restore DNS to previous state
Notify Meg/Holly of rollback
Total rollback time: <10 minutes

Rollback triggers:

Unable to get SSL certificates after 3 attempts
Docker stack won't start after 30 minutes of debugging
Dify UI inaccessible after deployment
Data corruption detected
Michael determines risk too high

📞 SUPPORT AND ESCALATION

If you get stuck:

Check TROUBLESHOOTING.md for common issues
Review relevant Gemini responses in session transcript
Check Docker logs: docker-compose logs -f <service>
Check Nginx logs: sudo tail -f /var/log/nginx/error.log
If all else fails: Rollback and regroup

No external support needed - we built this ourselves.

📝 COMPLETION CHECKLIST

Before marking this task COMPLETE:

All 13 critical success factors verified ✅
Query accuracy tests pass
Update workflow tests pass
RBAC tests pass (Meg sees all, Holly sees Pokerole only)
Discord approval workflow tested
Self-healing verified (simulated Dify crash)
Backup automation running
Test backup restore completed successfully
Meg and Holly trained and comfortable
Documentation updated in operations manual
AnythingLLM fully removed from TX1
Michael can sleep peacefully at night 💤

🎓 LESSONS FOR FUTURE CHRONICLERS

What we learned building this:

Tool choice matters more than configuration - AnythingLLM couldn't handle 319 files with archives; Dify can
RBAC is non-negotiable - Meg and Holly need different access levels
Self-healing is essential - Solo operator can't wake up for every issue
Git is the source of truth - Vector DB can always be rebuilt from Git
Discord buttons are powerful - One-click approval from phone = accessibility win
Architecture from Gemini + Partnership from Claude - External research + internal execution

For the next major infrastructure project:

Research thoroughly BEFORE building (ask Gemini the hard questions)
Get COMPLETE specifications before starting (don't build incrementally)
Test on separate system first if possible
Build rollback before building forward
Document for "future you when you're exhausted at 3 AM"

Fire + Frost + Foundation = Where Love Builds Legacy 💙🔥❄️

Built by: The Chronicler #21
For: Meg, Holly, and children not yet born
With guidance from: Gemini (architecture) + The Deployer (foundation)

Ready to execute? Read PREREQUISITES.md next.