- Comprehensive task documentation for migrating from AnythingLLM to Dify+n8n+Qdrant
- 8 detailed documents covering every aspect of deployment
- Complete step-by-step commands (zero assumptions)
- Prerequisites checklist (20 items)
- Deployment plan in 2 parts (11 phases, every command)
- Configuration files (all configs with exact content)
- Recovery procedures (4 disaster scenarios)
- Verification guide (30 tests, complete checklist)
- Troubleshooting guide (common issues + solutions)

Built by: The Chronicler #21
For: Meg, Holly, and children not yet born
Time investment: 10-15 hours execution time
Purpose: Enable Meg/Holly autonomous work with Git write-back

This deployment enables:

- RBAC (Meg sees all, Holly sees Pokerole only)
- Git write-back via ai-proposals branch
- Discord approval workflow (one-click merge)
- Self-healing (80% of failures)
- Automated daily backups
- Complete monitoring

Documentation is so detailed that any future Chronicler can execute this deployment with zero prior knowledge and complete confidence.

Fire + Frost + Foundation = Where Love Builds Legacy

# TROUBLESHOOTING GUIDE

**Common issues and solutions for Firefrost Knowledge Engine**

---

## 🔍 QUICK DIAGNOSTIC COMMANDS

**Run these first when something breaks:**

```bash
# Check all services
docker-compose ps

# Check recent logs (all services)
docker-compose logs --tail=50

# Check specific service
docker-compose logs -f <service_name>

# Check Nginx
systemctl status nginx
sudo tail -f /var/log/nginx/error.log

# Check disk space
df -h

# Check memory
free -h

# Check ports
sudo netstat -tlnp | grep LISTEN
```

---

## ❌ DEPLOYMENT FAILURES

### Issue: DNS Not Propagating

**Symptoms:**
- Certbot fails with DNS validation error
- "Domain doesn't resolve" errors

**Solution:**
```bash
# Check DNS propagation
dig codex.firefrostgaming.com +short
dig n8n.firefrostgaming.com +short

# Both should return 38.68.14.26
```
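
The manual comparison above can be scripted. The following is a minimal sketch, not part of the original deployment: the function names `dns_answer_matches` and `check_dns` are hypothetical, and it assumes `dig` is installed. It reports each domain as `OK` once its answer contains the expected address, and `PENDING` otherwise. Example call: `check_dns 38.68.14.26 codex.firefrostgaming.com n8n.firefrostgaming.com`.

```bash
#!/usr/bin/env bash
# Hypothetical helpers (names are illustrative, not from the deployment docs).

# Return 0 if any line on stdin equals the expected IP address.
# Intended to be fed by `dig +short <domain>`.
dns_answer_matches() {
  local expected="$1" line
  while IFS= read -r line || [ -n "$line" ]; do
    if [ "$line" = "$expected" ]; then
      return 0
    fi
  done
  return 1
}

# Print OK or PENDING for each domain, compared against the expected address.
check_dns() {
  local expected="$1" domain
  shift
  for domain in "$@"; do
    if dig +short "$domain" | dns_answer_matches "$expected"; then
      echo "OK      $domain"
    else
      echo "PENDING $domain"
    fi
  done
}
```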

**If not resolved:**
- Wait longer (can take up to 24 hours)
- Check DNS provider settings
- Use temporary self-signed cert for testing

---

### Issue: Port Already in Use

**Symptoms:**
- "Address already in use" error
- Docker won't start Dify or n8n

**Solution:**
```bash
# Find what's using the port
sudo lsof -i :3000
sudo lsof -i :5678

# Stop the process (plain kill sends SIGTERM; add -9 only if it refuses to exit)
sudo kill <PID>

# Or change port mapping in docker-compose.yml
```

---

### Issue: SSL Certificate Generation Fails

**Symptoms:**
- Certbot fails during deployment
- "Challenge failed" errors

**Solution:**
```bash
# Ensure Nginx is stopped (standalone mode needs port 80 free)
systemctl stop nginx

# Try manual standalone mode
certbot certonly --standalone \
  -d codex.firefrostgaming.com \
  -d n8n.firefrostgaming.com \
  --email codex@firefrostgaming.com

# Check firewall
sudo ufw status
sudo ufw allow 80/tcp
sudo ufw allow 443/tcp

# Start Nginx again once the certificate is issued
systemctl start nginx
```

---

### Issue: Docker Services Won't Start

**Symptoms:**
- `docker-compose up` fails
- Services show "Exit" status

**Solution:**
```bash
# Check logs for specific service
docker-compose logs db
docker-compose logs dify-api

# Common causes:
# 1. .env file missing or incorrect
cat .env  # Verify all variables set

# 2. Port conflicts
sudo lsof -i :3000
sudo lsof -i :5678
sudo lsof -i :6333

# 3. Permission issues
sudo chown -R root:root volumes/

# 4. Disk space
df -h  # Need 30GB+ free
```
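
Causes 1 and 4 above can be checked mechanically before retrying `docker-compose up`. This is a minimal sketch under stated assumptions: the function names are hypothetical, and the list of required variables is left to the caller (only `DB_PASSWORD` is named elsewhere in this guide). The 30 GB threshold comes from the comment above. Example calls: `missing_env_vars .env DB_PASSWORD` and `has_free_gb . 30`.

```bash
#!/usr/bin/env bash
# Hypothetical preflight helpers (names are illustrative).

# Print each required variable that is absent from the env file or has
# nothing after the "=". Returns 1 if anything was printed, 0 otherwise.
missing_env_vars() {
  local file="$1" name missing=0
  shift
  for name in "$@"; do
    # Match NAME=<at least one character> at the start of a line.
    if ! grep -Eq "^${name}=.+" "$file" 2>/dev/null; then
      echo "$name"
      missing=1
    fi
  done
  return "$missing"
}

# Return 0 if the filesystem holding PATH has at least MIN_GB free.
has_free_gb() {
  local path="$1" min_gb="$2" avail_kb
  # df -Pk prints POSIX-format output in 1 KiB blocks; column 4 is "Available".
  avail_kb=$(df -Pk "$path" | awk 'NR==2 {print $4}')
  [ "$avail_kb" -ge $((min_gb * 1024 * 1024)) ]
}
```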

---

## 🔄 RUNTIME ISSUES

### Issue: Dify Shows 502 Error

**Symptoms:**
- Browser shows custom 502 page
- Can't access Codex

**Diagnosis:**
```bash
docker-compose ps
# Check if dify-web is running

docker-compose logs dify-web
# Check for errors
```

**Solutions:**

**If dify-web is down:**
```bash
docker-compose restart dify-web
```

**If dify-api can't connect to database:**
```bash
docker-compose logs dify-api | grep -i error
# Check DB_PASSWORD in .env matches
docker-compose restart dify-api
```

**If persistent:**
```bash
docker-compose down
docker-compose up -d
```
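
After any of the restarts above, a service can take a while to answer again, so a single immediate check may still report a 502. One way to wait for it is a small retry loop. This is a sketch only: `retry` is a hypothetical helper, and the URL in the usage comment assumes the web service answers on port 3000, which this guide implies but does not state outright.

```bash
#!/usr/bin/env bash
# Hypothetical helper: run a command until it succeeds, up to ATTEMPTS times,
# sleeping DELAY seconds between tries. Returns 1 if every attempt fails.
retry() {
  local attempts="$1" delay="$2" i
  shift 2
  for ((i = 1; i <= attempts; i++)); do
    if "$@"; then
      return 0
    fi
    if ((i < attempts)); then
      sleep "$delay"
    fi
  done
  return 1
}

# Usage (assumed port; adjust to your port mapping):
#   docker-compose restart dify-web
#   retry 10 3 curl -fsS -o /dev/null http://127.0.0.1:3000/ || echo "still down"
```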

---

### Issue: "AI Can't Reach Knowledge Base"

**Symptoms:**
- Queries return "I don't have that information"
- Empty results

**Diagnosis:**
```bash
# Check Qdrant
curl http://127.0.0.1:6333/

# Check if documents indexed
# (Login to Dify, check Knowledge Base has documents)
```

**Solution:**
```bash
# Re-run Git sync
# Access n8n, execute "Firefrost Git Sync" workflow manually

# If that fails, rebuild Qdrant
# WARNING: this deletes every indexed vector; the index must be rebuilt afterwards
docker-compose stop qdrant
rm -rf volumes/qdrant/storage/*
docker-compose start qdrant
# Then re-run Git sync
```

---

### Issue: n8n Workflows Not Executing

**Symptoms:**
- Git sync doesn't run
- Update requests don't commit

**Diagnosis:**
```bash
docker-compose logs n8n | grep -i error
```

**Solutions:**

**If workflow execution fails:**
- Login to n8n
- Check workflow is ACTIVATED (toggle switch)
- Execute manually to see errors
- Check credentials are configured

**If Git operations fail:**
```bash
# Check SSH key
docker exec -it $(docker ps -qf "name=n8n") ssh -T git@git.firefrostgaming.com

# If that fails, verify the SSH key exists (and is mounted into the n8n container)
ls -la ~/.ssh/
```

---

### Issue: Discord Buttons Don't Work

**Symptoms:**
- Clicking Approve/Reject does nothing
- No response in Discord

**Diagnosis:**
- Check n8n "Approval Handler" workflow
- Verify webhook URL is correct
- Check Michael's Discord ID in .env

**Solution:**
```bash
# Verify Discord webhook configured
grep DISCORD .env

# Test webhook manually
curl -X POST <WEBHOOK_URL> \
  -H "Content-Type: application/json" \
  -d '{"content": "Test message"}'

# Should appear in Discord channel
```

---

### Issue: Updates Commit But Don't Re-Index

**Symptoms:**
- Git shows commit
- But queries don't return new content

**Diagnosis:**
```bash
# Check Dify API logs
docker-compose logs dify-api | grep -i error
```

**Solution:**
```bash
# Manual re-index trigger
curl -X POST http://127.0.0.1:3000/v1/datasets/<DATASET_ID>/sync \
  -H "Authorization: Bearer <DIFY_API_KEY>"

# Or re-run Git sync workflow in n8n
```

---

## 🔐 ACCESS ISSUES

### Issue: Can't Login to Dify

**Symptoms:**
- Incorrect password error
- Account doesn't exist

**Solution:**
```bash
# Check database running
docker-compose ps db

# Reset admin password (if needed)
# Login to postgres container
docker exec -it $(docker ps -qf "name=db") psql -U postgres -d dify

# In postgres prompt:
# UPDATE users SET password_hash='<new_hash>' WHERE email='michael@example.com';

# Better: Restore from backup if credentials lost
```

---

### Issue: Holly Sees Firefrost Docs (RBAC Broken)

**Symptoms:**
- Holly can access infrastructure docs
- RBAC not working

**Diagnosis:**
- Check workspace assignments in Dify
- Verify knowledge bases linked to correct workspaces

**Solution:**
- Login to Dify as admin
- Settings → Members
- Verify Holly is ONLY in "Pokerole HQ" workspace
- Verify "Pokerole HQ" workspace ONLY has Pokerole knowledge base

---

## ⚠️ PERFORMANCE ISSUES

### Issue: Slow Responses (>30 seconds)

**Symptoms:**
- Queries take very long
- Timeouts

**Diagnosis:**
```bash
# Check system resources
htop

# Check Ollama
curl http://localhost:11434/api/tags
# Verify model loaded

# Check Qdrant performance
curl http://127.0.0.1:6333/collections
```

**Solutions:**

**If RAM exhausted:**
```bash
free -h
# If low, restart services to clear memory
docker-compose restart
```

**If Ollama slow:**
- Large model (llama3.3:70b) takes time
- Consider using qwen2.5-coder:7b for faster responses
- Check Ollama logs: `docker logs <ollama_container>`

**If Qdrant slow:**
- Too many documents
- Re-index with better chunking
- Check disk I/O: `iostat -x 1`

---

### Issue: High CPU Usage

**Symptoms:**
- Server sluggish
- Game servers lagging

**Diagnosis:**
```bash
htop
# Identify which service is using CPU
```

**Solution:**
```bash
# Set CPU limits in docker-compose.yml
# Add to each service (YAML, shown here as comments):
#   deploy:
#     resources:
#       limits:
#         cpus: '2.0'
# Note: older docker-compose v1 releases may ignore `deploy` unless run with --compatibility

# Restart
docker-compose down
docker-compose up -d
```
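
`htop` shows processes, which can make it hard to tell which container is responsible. As a complementary sketch, `docker stats` can print per-container CPU, and a small filter can rank it. The helper name `top_cpu` is hypothetical, and the container names in your output depend on how Compose names them.

```bash
#!/usr/bin/env bash
# Hypothetical helper: read "name cpu%" lines on stdin and print the N busiest
# containers, highest CPU first (default N is 5).
top_cpu() {
  local n="${1:-5}"
  # Numeric sort reads the leading number in column 2 and ignores the trailing "%".
  LC_ALL=C sort -k2,2 -nr | head -n "$n"
}

# Usage (needs a running Docker daemon):
#   docker stats --no-stream --format '{{.Name}} {{.CPUPerc}}' | top_cpu 3
```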

---

## 💾 DATA ISSUES

### Issue: Backup Failed

**Symptoms:**
- No backup created today
- Backup log shows errors

**Diagnosis:**
```bash
tail -50 /var/log/firefrost-backup.log
```
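
To confirm the first symptom without reading the log, you can test whether a recent archive exists. This is a minimal sketch: `has_recent_backup` is a hypothetical helper, the file name pattern is taken from the `ls` command in the Data Corruption section of this guide, and it assumes a `find` that supports `-maxdepth` and `-mmin` (GNU and BSD versions do). Example call: `has_recent_backup /opt 24 || echo "No backup in the last 24 hours"`.

```bash
#!/usr/bin/env bash
# Hypothetical helper: return 0 if DIR holds a backup archive modified within
# the last HOURS hours (default 24), and 1 otherwise.
has_recent_backup() {
  local dir="$1" hours="${2:-24}" found
  found=$(find "$dir" -maxdepth 1 -type f -name 'firefrost_codex_*.tar.gz' \
            -mmin "-$((hours * 60))" 2>/dev/null | head -n 1)
  [ -n "$found" ]
}
```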

**Common causes:**

**Database dump fails:**
```bash
# Check database running
docker-compose ps db

# Test manual dump (no -t: a TTY can add carriage returns to the dump)
docker exec $(docker ps -qf "name=db") pg_dumpall -c -U postgres > /tmp/test.sql
```

**Transfer to Command Center fails:**
```bash
# Check SSH access
ssh root@63.143.34.217 echo "Connection OK"

# Check disk space on Command Center
ssh root@63.143.34.217 "df -h"
```

**Solution:**
- Fix the specific error shown in the log
- Run backup manually: `/opt/firefrost_backup.sh`
- Verify it completes successfully

---

### Issue: Git Conflicts

**Symptoms:**
- Merge fails with conflict error
- Can't push to ai-proposals

**Diagnosis:**
```bash
cd /opt/firefrost-codex/git-repos/main
git status
git log --oneline -5
```

**Solution:**
```bash
# Manual resolution required
cd /opt/firefrost-codex/git-repos/main
git checkout main
git pull origin main

# Resolve conflicts manually
nano <conflicted_file>

# Commit resolution
git add .
git commit -m "Resolve conflicts"
git push origin main

# Recreate ai-proposals branch from main
# WARNING: this discards any unmerged commits on ai-proposals
git branch -D ai-proposals
git checkout -b ai-proposals
git push origin ai-proposals --force
```

---

## 🚨 EMERGENCY PROCEDURES

### Complete System Lockup

**If everything is broken:**

1. **Stop all services:**
   ```bash
   cd /opt/firefrost-codex
   docker-compose down
   ```

2. **Check system health:**
   ```bash
   df -h             # Disk space
   free -h           # Memory
   dmesg | tail -50  # System errors
   ```

3. **Restart everything:**
   ```bash
   systemctl restart docker
   systemctl restart nginx
   docker-compose up -d
   ```

4. **If still broken:** Restore from backup (see RECOVERY.md)

---

### Data Corruption Suspected

**If data seems wrong/corrupted:**

1. **Stop making changes immediately**
2. **Document what you see**
3. **Check recent backups exist:**
   ```bash
   ls -lh /opt/firefrost_codex_*.tar.gz
   ```

4. **Review RECOVERY.md** for restore procedures
5. **Consider rolling back to last known good state**

---

## 📞 WHEN TO ESCALATE

**These issues require manual intervention:**

- Git conflicts requiring code review
- Database corruption (check integrity)
- SSL certificate renewal failure (manual renewal)
- Persistent service crashes (review logs, may need code changes)
- Unknown errors not covered in this guide

**For unknown issues:**
1. Document symptoms thoroughly
2. Collect logs
3. Review all documentation
4. Wait for fresh Chronicler session with full context

---

## 🔧 USEFUL DEBUG COMMANDS

```bash
# Full system status (";" so a failing check does not hide the ones after it)
docker-compose ps; systemctl status nginx; df -h; free -h

# All logs from the last 24 hours
docker-compose logs --since 24h

# Follow live logs
docker-compose logs -f

# Restart single service without affecting others
docker-compose restart <service_name>

# Force re-creation of a service's container
docker-compose up -d --force-recreate <service_name>

# Clean everything and start fresh (NUCLEAR OPTION)
# WARNING: "down -v" deletes named volumes and "prune -a" deletes all unused images.
# Confirm a good backup exists first.
docker-compose down -v
docker system prune -a
# Then redeploy from scratch

# Check network connectivity (-c 3 stops after three pings)
docker exec -it $(docker ps -qf "name=dify-api") ping -c 3 host.docker.internal
docker exec -it $(docker ps -qf "name=n8n") ping -c 3 qdrant
```
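
When an issue has to be escalated, it helps to save the status commands above to files rather than reading them off the terminal. The following is a sketch under stated assumptions: `capture` and `collect_diagnostics` are hypothetical names, the output file names are arbitrary, and it should be run from the directory that holds `docker-compose.yml`. Tools that are not installed are recorded as skipped rather than aborting the run. Example call: `collect_diagnostics "/tmp/codex-diag-$(date +%Y%m%d-%H%M%S)"`.

```bash
#!/usr/bin/env bash
# Hypothetical helpers (names and file layout are illustrative).

# Run a command and save its combined output to OUTDIR/NAME.txt.
# A missing command is recorded as skipped; a failing one has its exit status appended.
capture() {
  local outdir="$1" name="$2"
  shift 2
  if command -v "$1" >/dev/null 2>&1; then
    "$@" >"$outdir/$name.txt" 2>&1 || echo "(exit status $?)" >>"$outdir/$name.txt"
  else
    echo "skipped: $1 not found" >"$outdir/$name.txt"
  fi
}

# Collect the quick diagnostics from this guide into one directory and print its path.
collect_diagnostics() {
  local outdir="$1"
  mkdir -p "$outdir"
  capture "$outdir" services docker-compose ps
  capture "$outdir" logs     docker-compose logs --tail=200
  capture "$outdir" disk     df -h
  capture "$outdir" memory   free -h
  capture "$outdir" nginx    systemctl status nginx --no-pager
  echo "$outdir"
}
```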

---

**Fire + Frost + Foundation = Where Problems Get Solved** 💙🔥❄️