firefrost-operations-manual/docs/tasks/firefrost-codex-migration-to-open-webui/TROUBLESHOOTING.md
The Chronicler #21 2e953ce312 feat: Complete Firefrost Knowledge Engine deployment plan
- Comprehensive task documentation for migrating from AnythingLLM to Dify+n8n+Qdrant
- 8 detailed documents covering every aspect of deployment
- Complete step-by-step commands (zero assumptions)
- Prerequisites checklist (20 items)
- Deployment plan in 2 parts (11 phases, every command)
- Configuration files (all configs with exact content)
- Recovery procedures (4 disaster scenarios)
- Verification guide (30 tests, complete checklist)
- Troubleshooting guide (common issues + solutions)

Built by: The Chronicler #21
For: Meg, Holly, and children not yet born
Time investment: 10-15 hours execution time
Purpose: Enable Meg/Holly autonomous work with Git write-back

This deployment enables:
- RBAC (Meg sees all, Holly sees Pokerole only)
- Git write-back via ai-proposals branch
- Discord approval workflow (one-click merge)
- Self-healing (80% of failures)
- Automated daily backups
- Complete monitoring

Documentation is so detailed that any future Chronicler can execute
this deployment with zero prior knowledge and complete confidence.

Fire + Frost + Foundation = Where Love Builds Legacy
2026-02-22 09:55:13 +00:00


# TROUBLESHOOTING GUIDE
**Common issues and solutions for Firefrost Knowledge Engine**
---
## 🔍 QUICK DIAGNOSTIC COMMANDS
**Run these first when something breaks:**
```bash
# Check all services
docker-compose ps
# Check recent logs (all services)
docker-compose logs --tail=50
# Check specific service
docker-compose logs -f <service_name>
# Check Nginx
systemctl status nginx
sudo tail -f /var/log/nginx/error.log
# Check disk space
df -h
# Check memory
free -h
# Check ports
sudo netstat -tlnp | grep LISTEN
```
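
The disk check above can be turned into a pass/fail test. This is a minimal sketch, not part of the deployment scripts: the helper name `full_mounts` and the 90% default threshold are assumptions. It reads `df -P` output on stdin and prints every mount at or above the threshold, so `df -P | full_mounts 85` prints nothing when all disks are healthy.

```bash
# Hypothetical helper: list mounts at or above a usage threshold (percent).
# Reads POSIX-format `df -P` output on stdin; the threshold defaults to 90.
full_mounts() {
  local threshold="${1:-90}"
  awk -v t="$threshold" '
    NR > 1 {                      # skip the header row
      pct = $5
      sub(/%/, "", pct)           # "95%" -> "95"
      if (pct + 0 >= t) print $6, pct "%"
    }'
}
```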
---
## ❌ DEPLOYMENT FAILURES
### Issue: DNS Not Propagating
**Symptoms:**
- Certbot fails with DNS validation error
- "Domain doesn't resolve" errors
**Solution:**
```bash
# Check DNS propagation
dig codex.firefrostgaming.com +short
dig n8n.firefrostgaming.com +short
# Both should return 38.68.14.26
```
**If not resolved:**
- Wait longer (can take up to 24 hours)
- Check DNS provider settings
- Use temporary self-signed cert for testing
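
While waiting on propagation, the comparison against the expected IP can be scripted. A minimal sketch, assuming a hypothetical helper named `dns_status`: it reads `dig +short` output on stdin and reports whether any returned record equals the expected address. Example: `dig codex.firefrostgaming.com +short | dns_status 38.68.14.26`.

```bash
# Hypothetical helper: report whether resolver output contains the expected IP.
# Reads `dig +short` output on stdin (one record per line).
dns_status() {
  local expected="$1"
  # -F: literal match, -x: whole line, -q: quiet (exit status only)
  if grep -Fxq -- "$expected"; then
    echo "resolved"
  else
    echo "pending"
  fi
}
```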
---
### Issue: Port Already in Use
**Symptoms:**
- "Address already in use" error
- Docker won't start Dify or n8n
**Solution:**
```bash
# Find what's using the port
sudo lsof -i :3000
sudo lsof -i :5678
# Stop the process gracefully; escalate to `sudo kill -9 <PID>` only if it will not exit
sudo kill <PID>
# Or change port mapping in docker-compose.yml
```
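
Freeing a port is safer when the process gets a chance to shut down cleanly first. The sketch below assumes a hypothetical helper named `stop_pid` and a 5-second default grace period: it sends SIGTERM, waits, and sends SIGKILL only if the process is still alive. Run it as root for processes you do not own, for example `for pid in $(sudo lsof -t -i :3000); do stop_pid "$pid"; done`.

```bash
# Hypothetical helper: ask a process to exit, and force-kill it only if it
# is still alive after a grace period (seconds, default 5).
stop_pid() {
  local pid="$1" grace="${2:-5}" i=0
  kill -TERM "$pid" 2>/dev/null || { echo "cannot signal $pid" >&2; return 1; }
  while [ "$i" -lt "$grace" ]; do
    kill -0 "$pid" 2>/dev/null || return 0   # process has exited
    sleep 1
    i=$((i + 1))
  done
  kill -KILL "$pid" 2>/dev/null || true      # last resort
}
```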
---
### Issue: SSL Certificate Generation Fails
**Symptoms:**
- Certbot fails during deployment
- "Challenge failed" errors
**Solution:**
```bash
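# Stop Nginx so certbot's standalone server can bind port 80
sudo systemctl stop nginx
# Try manual standalone mode
sudo certbot certonly --standalone \
  -d codex.firefrostgaming.com \
  -d n8n.firefrostgaming.com \
  --email codex@firefrostgaming.com
# Start Nginx again once certbot finishes
sudo systemctl start nginx
# If the challenge still fails, make sure the firewall allows HTTP/HTTPS
sudo ufw status
sudo ufw allow 80/tcp
sudo ufw allow 443/tcp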
```
---
### Issue: Docker Services Won't Start
**Symptoms:**
- `docker-compose up` fails
- Services show "Exit" status
**Solution:**
```bash
# Check logs for specific service
docker-compose logs db
docker-compose logs dify-api
# Common causes:
# 1. .env file missing or incorrect
cat .env # Verify all variables set
# 2. Port conflicts
sudo lsof -i :3000
sudo lsof -i :5678
sudo lsof -i :6333
# 3. Permission issues (inspect ownership first; some containers expect a non-root owner)
ls -ln volumes/
sudo chown -R root:root volumes/
# 4. Disk space
df -h # Need 30GB+ free
```
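
Reading `.env` by eye makes it easy to miss an empty value. A minimal sketch, assuming a hypothetical helper named `missing_env_vars`: it prints each required variable that is absent, commented out, or empty. The list of required names is yours to supply; `DB_PASSWORD` is the only one this guide names, so `missing_env_vars .env DB_PASSWORD` is the smallest useful call.

```bash
# Hypothetical helper: print each required variable that is absent or empty
# in an env file. Usage: missing_env_vars <env_file> <NAME>...
missing_env_vars() {
  local env_file="$1" name
  shift
  for name in "$@"; do
    # Set means: an uncommented NAME=value line with a non-empty value.
    grep -Eq "^${name}=.+" "$env_file" || echo "$name"
  done
}
```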
---
## 🔄 RUNTIME ISSUES
### Issue: Dify Shows 502 Error
**Symptoms:**
- Browser shows custom 502 page
- Can't access Codex
**Diagnosis:**
```bash
docker-compose ps
# Check if dify-web is running
docker-compose logs dify-web
# Check for errors
```
**Solutions:**
**If dify-web is down:**
```bash
docker-compose restart dify-web
```
**If dify-api can't connect to database:**
```bash
docker-compose logs dify-api | grep -i error
# Check that DB_PASSWORD in .env matches the password the db container was initialized with
docker-compose restart dify-api
```
**If persistent:**
```bash
docker-compose down
docker-compose up -d
```
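
After a restart, services can take a while to answer, so a single probe may report a false failure. A minimal sketch, assuming a hypothetical helper named `retry`: it reruns any command until it succeeds or the attempts run out. Example, using the port this guide already checks: `retry 10 3 curl -fsS -o /dev/null http://127.0.0.1:3000/`.

```bash
# Hypothetical helper: rerun a command until it succeeds.
# Usage: retry <attempts> <delay_seconds> <command> [args...]
retry() {
  local attempts="$1" delay="$2" n=1
  shift 2
  while [ "$n" -le "$attempts" ]; do
    "$@" && return 0
    # Sleep between attempts, but not after the final one.
    [ "$n" -lt "$attempts" ] && sleep "$delay"
    n=$((n + 1))
  done
  return 1
}
```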
---
### Issue: "AI Can't Reach Knowledge Base"
**Symptoms:**
- Queries return "I don't have that information"
- Empty results
**Diagnosis:**
```bash
# Check Qdrant
curl http://127.0.0.1:6333/
# Check if documents indexed
# (Login to Dify, check Knowledge Base has documents)
```
**Solution:**
```bash
# Re-run Git sync
# Access n8n, execute "Firefrost Git Sync" workflow manually
# If that fails, rebuild Qdrant (WARNING: this deletes every stored vector)
docker-compose stop qdrant
rm -rf volumes/qdrant/storage/*
docker-compose start qdrant
# Then re-run Git sync
```
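
If disk space allows, moving the Qdrant storage aside is safer than deleting it, because the old index can be put back if the re-sync fails. A minimal sketch, assuming a hypothetical helper named `archive_dir` and a `.bak.<timestamp>` naming scheme: stop Qdrant, run `archive_dir volumes/qdrant/storage`, then start Qdrant and re-run Git sync. The recreated directory gets default ownership, which may need adjusting for your container.

```bash
# Hypothetical helper: move a directory aside under a timestamped name and
# recreate it empty. Prints the archive path on success.
archive_dir() {
  local src="${1%/}" dest     # strip a trailing slash, if any
  dest="${src}.bak.$(date +%Y%m%d-%H%M%S)"
  mv -- "$src" "$dest" && mkdir -- "$src" && echo "$dest"
}
```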
---
### Issue: n8n Workflows Not Executing
**Symptoms:**
- Git sync doesn't run
- Update requests don't commit
**Diagnosis:**
```bash
docker-compose logs n8n | grep -i error
```
**Solutions:**
**If workflow execution fails:**
- Login to n8n
- Check workflow is ACTIVATED (toggle switch)
- Execute manually to see errors
- Check credentials are configured
**If Git operations fail:**
```bash
# Check SSH key
docker exec -it $(docker ps -qf "name=n8n") ssh -T git@git.firefrostgaming.com
# If it fails, verify the SSH key exists on the host and is mounted into the n8n container
ls -la ~/.ssh/
```
---
### Issue: Discord Buttons Don't Work
**Symptoms:**
- Clicking Approve/Reject does nothing
- No response in Discord
**Diagnosis:**
- Check n8n "Approval Handler" workflow
- Verify webhook URL is correct
- Check Michael's Discord ID in .env
**Solution:**
```bash
# Verify Discord webhook configured
grep DISCORD .env
# Test webhook manually
curl -X POST <WEBHOOK_URL> \
  -H "Content-Type: application/json" \
  -d '{"content": "Test message"}'
# Should appear in Discord channel
```
---
### Issue: Updates Commit But Don't Re-Index
**Symptoms:**
- Git shows commit
- But queries don't return new content
**Diagnosis:**
```bash
# Check Dify API logs
docker-compose logs dify-api | grep -i error
```
**Solution:**
```bash
# Manual re-index trigger (confirm the endpoint path against the API reference for your Dify version)
curl -X POST http://127.0.0.1:3000/v1/datasets/<DATASET_ID>/sync \
  -H "Authorization: Bearer <DIFY_API_KEY>"
# Or re-run Git sync workflow in n8n
```
---
## 🔐 ACCESS ISSUES
### Issue: Can't Login to Dify
**Symptoms:**
- Incorrect password error
- Account doesn't exist
**Solution:**
```bash
# Check database running
docker-compose ps db
# Reset admin password (if needed)
# Login to postgres container
docker exec -it $(docker ps -qf "name=db") psql -U postgres -d dify
# In postgres prompt (run \dt first to confirm the table and column names for your Dify version):
# UPDATE users SET password_hash='<new_hash>' WHERE email='michael@example.com';
# Better: Restore from backup if credentials lost
```
---
### Issue: Holly Sees Firefrost Docs (RBAC Broken)
**Symptoms:**
- Holly can access infrastructure docs
- RBAC not working
**Diagnosis:**
- Check workspace assignments in Dify
- Verify knowledge bases linked to correct workspaces
**Solution:**
- Login to Dify as admin
- Settings → Members
- Verify Holly is ONLY in "Pokerole HQ" workspace
- Verify "Pokerole HQ" workspace ONLY has Pokerole knowledge base
---
## ⚠️ PERFORMANCE ISSUES
### Issue: Slow Responses (>30 seconds)
**Symptoms:**
- Queries take very long
- Timeouts
**Diagnosis:**
```bash
# Check system resources
htop
# Check Ollama
curl http://localhost:11434/api/tags
# Verify model loaded
# Check Qdrant performance
curl http://127.0.0.1:6333/collections
```
**Solutions:**
**If RAM exhausted:**
```bash
free -h
# If low, restart services to clear memory
docker-compose restart
```
**If Ollama slow:**
- Large model (llama3.3:70b) takes time
- Consider using qwen2.5-coder:7b for faster responses
- Check Ollama logs: `docker logs <ollama_container>`
**If Qdrant slow:**
- Too many documents
- Re-index with better chunking
- Check disk I/O: `iostat -x 1`
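
To decide whether RAM is actually exhausted, read the `available` column instead of `free`, since cache counts as used but is reclaimable. A minimal sketch, assuming a hypothetical helper named `mem_available_mb` and a `free` version that prints an `available` column as the seventh field of the `Mem:` row. Example: `free -m | mem_available_mb`.

```bash
# Hypothetical helper: print available memory in MB.
# Reads `free -m` output on stdin; assumes "available" is column 7 of "Mem:".
mem_available_mb() {
  awk '/^Mem:/ { print $7 }'
}
```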
---
### Issue: High CPU Usage
**Symptoms:**
- Server sluggish
- Game servers lagging
**Diagnosis:**
```bash
htop
# Identify which service using CPU
```
**Solution:**
```bash
# Set CPU limits in docker-compose.yml by adding this to each service:
#   deploy:
#     resources:
#       limits:
#         cpus: '2.0'
# Restart (docker-compose v1 may ignore `deploy` limits unless run with --compatibility)
docker-compose down
docker-compose up -d
```
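
Before setting limits, it helps to know which container is the heavy one. A minimal sketch, assuming a hypothetical helper named `top_cpu`: it sorts `name percent` lines by CPU, highest first. Feed it from Docker with `docker stats --no-stream --format '{{.Name}} {{.CPUPerc}}' | top_cpu 3`.

```bash
# Hypothetical helper: show the N busiest containers (default 3).
# Reads "name cpu%" lines on stdin and sorts numerically on the second field.
top_cpu() {
  local count="${1:-3}"
  # LC_ALL=C keeps "." as the decimal point regardless of system locale.
  LC_ALL=C sort -t ' ' -k 2,2 -nr | head -n "$count"
}
```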
---
## 💾 DATA ISSUES
### Issue: Backup Failed
**Symptoms:**
- No backup created today
- Backup log shows errors
**Diagnosis:**
```bash
tail -50 /var/log/firefrost-backup.log
```
**Common causes:**
**Database dump fails:**
```bash
# Check database running
docker-compose ps db
# Test manual dump (no -t: a pseudo-TTY can corrupt the dump with carriage returns)
docker exec $(docker ps -qf "name=db") pg_dumpall -c -U postgres > /tmp/test.sql
```
**Transfer to Command Center fails:**
```bash
# Check SSH access
ssh root@63.143.34.217 echo "Connection OK"
# Check disk space on Command Center
ssh root@63.143.34.217 "df -h"
```
**Solution:**
- Fix specific error in log
- Run backup manually: `/opt/firefrost_backup.sh`
- Verify completes successfully
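
The "no backup created today" symptom can be checked without reading the log. A minimal sketch, assuming a hypothetical helper named `fresh_backups` and that archives follow the `firefrost_codex_*.tar.gz` naming used elsewhere in this guide: it lists archives modified within the last N hours. Example: `[ -n "$(fresh_backups /opt)" ] || echo "No backup in the last 24 hours"`.

```bash
# Hypothetical helper: list backup archives newer than a maximum age.
# Usage: fresh_backups <dir> [max_age_hours]   (default: 24)
fresh_backups() {
  local dir="$1" max_age_hours="${2:-24}"
  find "$dir" -maxdepth 1 -type f -name 'firefrost_codex_*.tar.gz' \
    -mmin "-$((max_age_hours * 60))"
}
```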
---
### Issue: Git Conflicts
**Symptoms:**
- Merge fails with conflict error
- Can't push to ai-proposals
**Diagnosis:**
```bash
cd /opt/firefrost-codex/git-repos/main
git status
git log --oneline -5
```
**Solution:**
```bash
# Manual resolution required
cd /opt/firefrost-codex/git-repos/main
git checkout main
git pull origin main
# Resolve conflicts manually
nano <conflicted_file>
# Commit resolution
git add <conflicted_file>
git commit -m "Resolve conflicts"
git push origin main
# Recreate ai-proposals branch
git branch -D ai-proposals
git checkout -b ai-proposals
git push origin ai-proposals --force
```
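
While a merge is stopped on conflicts, Git can list exactly which files still need editing. A minimal sketch, assuming a hypothetical wrapper named `conflicted_files` around `git diff --diff-filter=U`. Example: `conflicted_files /opt/firefrost-codex/git-repos/main`.

```bash
# Hypothetical helper: list files with unresolved merge conflicts.
# Usage: conflicted_files [repo_path]   (default: current directory)
conflicted_files() {
  # --diff-filter=U keeps only unmerged paths.
  git -C "${1:-.}" diff --name-only --diff-filter=U
}
```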
---
## 🚨 EMERGENCY PROCEDURES
### Complete System Lockup
**If everything is broken:**
1. **Stop all services:**
```bash
cd /opt/firefrost-codex
docker-compose down
```
2. **Check system health:**
```bash
df -h # Disk space
free -h # Memory
dmesg | tail -50 # System errors
```
3. **Restart everything:**
```bash
systemctl restart docker
systemctl restart nginx
docker-compose up -d
```
4. **If still broken:** Restore from backup (see RECOVERY.md)
---
### Data Corruption Suspected
**If data seems wrong/corrupted:**
1. **Stop making changes immediately**
2. **Document what you see**
3. **Check recent backups exist:**
```bash
ls -lh /opt/firefrost_codex_*.tar.gz
```
4. **Review RECOVERY.md** for restore procedures
5. **Consider rolling back to last known good state**
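
Confirming that backups exist is not the same as confirming they are readable. A minimal sketch, assuming a hypothetical helper named `verify_backup`: it asks `tar` to list a gzip archive and reports whether that succeeds. A readable archive does not prove the contents restore cleanly, so treat this as a first filter only. Example: `for f in /opt/firefrost_codex_*.tar.gz; do verify_backup "$f"; done`.

```bash
# Hypothetical helper: report whether a .tar.gz archive can be read end to end.
verify_backup() {
  local archive="$1"
  if tar -tzf "$archive" >/dev/null 2>&1; then
    echo "OK $archive"
  else
    echo "CORRUPT $archive"
  fi
}
```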
---
## 📞 WHEN TO ESCALATE
**These issues require manual intervention:**
- Git conflicts requiring code review
- Database corruption (check integrity)
- SSL certificate renewal failure (manual renewal)
- Persistent service crashes (review logs, may need code changes)
- Unknown errors not covered in this guide
**For unknown issues:**
1. Document symptoms thoroughly
2. Collect logs
3. Review all documentation
4. Wait for fresh Chronicler session with full context
---
## 🔧 USEFUL DEBUG COMMANDS
```bash
# Full system status
docker-compose ps && systemctl status nginx && df -h && free -h
# All logs from the last 24 hours
docker-compose logs --since 24h
# Follow live logs
docker-compose logs -f
# Restart single service without affecting others
docker-compose restart <service_name>
# Force-recreate a service's containers (does not rebuild images)
docker-compose up -d --force-recreate <service_name>
# Clean everything and start fresh (NUCLEAR OPTION: -v deletes named volumes and their data; back up first)
docker-compose down -v
docker system prune -a
# Then redeploy from scratch
# Check network connectivity
docker exec -it $(docker ps -qf "name=dify-api") ping -c 3 host.docker.internal
docker exec -it $(docker ps -qf "name=n8n") ping -c 3 qdrant
```
---
**Fire + Frost + Foundation = Where Problems Get Solved** 💙🔥❄️