# Firefrost Arbiter - Troubleshooting Guide

**Last Updated:** March 30, 2026

**Prepared by:** Claude (Chronicler #49) + Gemini AI

---
## 🔍 Quick Diagnostics
### Check Service Status
```bash
sudo systemctl status arbiter
```
### View Recent Logs
```bash
sudo journalctl -u arbiter -n 50
```
### Follow Live Logs
```bash
sudo journalctl -u arbiter -f
```
### Check Health Endpoint
```bash
curl https://discord-bot.firefrostgaming.com/health
```
---
## 🚨 Common Issues & Solutions
### 1. "Invalid redirect URI" in Discord OAuth
**Symptom:** When you click a linking URL or the admin login, Discord shows an "Invalid Redirect URI" error.

**Cause:** The redirect URI derived from your `.env` file doesn't exactly match one registered in the Discord Developer Portal.

**Solution:**

1. Check the `.env` file:

   ```bash
   grep APP_URL .env
   ```

   It should show `APP_URL=https://discord-bot.firefrostgaming.com` (no trailing slash).
2. Go to Discord Developer Portal → OAuth2 → General.
3. Verify that these exact URIs are registered:
   - `https://discord-bot.firefrostgaming.com/auth/callback`
   - `https://discord-bot.firefrostgaming.com/admin/callback`
4. **Important:** Check for:
   - Trailing slashes (don't include them)
   - `http` vs `https` mismatch
   - `www` vs non-www
   - Typos in the domain
5. If you changed the URI, wait 5-10 minutes for the change to propagate on Discord's side.
6. Restart the application:

   ```bash
   sudo systemctl restart arbiter
   ```
---
### 2. "Bot missing permissions" when assigning roles
**Symptom:** Logs show "Failed to assign role" or "Missing Permissions" when the bot tries to assign Discord roles.

**Cause:** Either the bot wasn't invited with the correct permissions, or the bot's role sits below the roles it's trying to assign.

**Solution:**

**Check 1: Bot Has "Manage Roles" Permission**

1. Go to Discord Server → Settings → Roles.
2. Find the bot's role (usually named after the bot).
3. Verify that the "Manage Roles" permission is enabled.
4. If it isn't, enable it.

**Check 2: Role Hierarchy (Most Common Issue)**

1. Go to Discord Server → Settings → Roles.
2. Find the bot's role in the list.
3. **Drag it ABOVE all subscription tier roles.**
4. The bot can only assign roles that are below its own role.

Example of a correct hierarchy:

```
1. Owner (you)
2. Admin
3. [Bot Role] ← MUST BE HERE
4. Sovereign
5. Fire Legend
6. Frost Legend
... (all other subscriber roles)
```
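
The hierarchy rule reduces to a position comparison. A member (bots included) can only assign roles positioned strictly below its own highest role, and a higher `position` means higher in the list. The sketch below models that rule with plain objects so it can run offline. The role names and positions are hypothetical. Inside the running bot, discord.js exposes the equivalent check as `role.editable`.

```javascript
// Models Discord's rule: a member (bots included) can only assign roles that sit
// strictly below its own highest role. Higher `position` = higher in the list.
function rolesBotCannotAssign(botRoles, targetRoles) {
  const botTop = Math.max(0, ...botRoles.map((role) => role.position));
  return targetRoles
    .filter((role) => role.position >= botTop)
    .map((role) => role.name);
}

// Hypothetical positions: bot dragged above the tier roles vs. left below them.
const tiers = [
  { name: 'Sovereign', position: 4 },
  { name: 'Fire Legend', position: 3 },
];
console.log(rolesBotCannotAssign([{ name: 'Arbiter', position: 5 }], tiers)); // []
console.log(rolesBotCannotAssign([{ name: 'Arbiter', position: 2 }], tiers)); // [ 'Sovereign', 'Fire Legend' ]
```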

**Check 3: Re-invite Bot with Correct Permissions**

If the role hierarchy is correct but role assignment still fails:

1. Go to Discord Developer Portal → OAuth2 → URL Generator.
2. Select scope: `bot`.
3. Select permissions: `Manage Roles` (minimum).
4. Copy the generated URL.
5. Visit the URL and re-authorize the bot (this updates its permissions).
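
The URL Generator encodes the selected permissions as a bitfield in the `permissions` query parameter, where `Manage Roles` is bit 28 (`268435456`). The sketch below builds the same invite URL, which is handy for scripting or for double-checking what the generator produced. The function name is illustrative, and the client ID shown is a placeholder for your application's ID.

```javascript
// "Manage Roles" permission bit (1 << 28). BigInt keeps higher permission bits exact.
const MANAGE_ROLES = 1n << 28n;

// Builds a bot invite URL equivalent to the Developer Portal's URL Generator.
function buildBotInviteUrl(clientId, permissions = MANAGE_ROLES) {
  const url = new URL('https://discord.com/oauth2/authorize');
  url.searchParams.set('client_id', clientId);
  url.searchParams.set('scope', 'bot');
  url.searchParams.set('permissions', permissions.toString());
  return url.toString();
}

// Placeholder client ID; use your application's ID from the Developer Portal.
console.log(buildBotInviteUrl('123456789012345678'));
```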

**Test:**

```bash
# Check if bot can see roles
sudo journalctl -u arbiter -n 100 | grep "Role ID"
```
---
### 3. "Session not persisting" across requests
**Symptom:** The admin panel logs you out immediately after login, or every page reload requires re-authentication.

**Cause:** Session cookies aren't being saved, usually because of the reverse proxy configuration.

**Solution:**

**Check 1: Express Trust Proxy Setting**

Verify in `src/index.js`:

```javascript
app.set('trust proxy', 1);
```

This line MUST appear before the session middleware. Behind Nginx, Express only treats a request as HTTPS (and so only sends a `secure` cookie) when it trusts the proxy's `X-Forwarded-Proto` header.

**Check 2: Nginx Proxy Headers**

Edit the Nginx config:

```bash
sudo nano /etc/nginx/sites-available/arbiter
```

Verify these headers exist in the `location /` block:

```nginx
proxy_set_header X-Real-IP $remote_addr;
proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
proxy_set_header X-Forwarded-Proto $scheme;
```

**Check 3: Cookie Settings for Development**

If testing on `http://localhost`, update the session cookie options in `src/index.js`:

```javascript
cookie: {
  secure: process.env.NODE_ENV === 'production', // false for localhost
  httpOnly: true,                                 // not readable from client-side JS
  maxAge: 1000 * 60 * 60 * 24 * 7                 // 7 days, in milliseconds
}
```

**Check 4: SESSION_SECRET is Set**

```bash
grep SESSION_SECRET .env
```

It should show a 64-character hex string.

**Restart after changes:**

```bash
sudo systemctl restart arbiter
sudo systemctl reload nginx
```
---
### 4. "Ghost API 401 error"
**Symptom:** Logs show "Ghost API 401 Unauthorized" when the app tries to search users or update members.

**Cause:** An invalid or incorrectly formatted Admin API key.

**Solution:**

**Check 1: API Key Format**

```bash
grep CMS_ADMIN_KEY .env
```

It should be in the format `key_id:secret` (with the colon). Both parts of a key issued by Ghost are hexadecimal strings.

Example (placeholder values):

```
CMS_ADMIN_KEY=<hex_key_id>:<hex_secret>
```

**Check 2: Integration Still Exists**

1. Go to Ghost Admin → Settings → Integrations.
2. Find the "Firefrost Arbiter" integration.
3. Verify that it hasn't been deleted or disabled.
4. If it's missing, create a new integration and update `.env`.

**Check 3: Ghost URL is Correct**

```bash
grep CMS_URL .env
```

It should match your Ghost installation URL exactly (no trailing slash).

**Check 4: Test the API Key Manually**

Ghost's Admin API does not accept the raw `key_id:secret` string in the `Authorization` header. It expects a short-lived JWT (valid for at most 5 minutes) signed with the key's secret, so sending the raw key returns 401 even when the key is valid. With a freshly generated token in `$TOKEN`:

```bash
curl -H "Authorization: Ghost $TOKEN" \
  "https://firefrostgaming.com/ghost/api/admin/members/"
```

This should return JSON with the member list. If it still returns 401, the key is invalid.

**After fixing:**

```bash
sudo systemctl restart arbiter
```
---
### 5. "Database locked" errors
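**Symptom:** Logs show "SQLITE_BUSY: database is locked" when multiple webhooks arrive at the same time.

**Cause:** SQLite allows only one writer at a time, and a write that can't get the lock within the busy timeout fails. If the app uses a single `better-sqlite3` connection, its own writes are already serialized (the library is synchronous). In that case the competing lock usually comes from another connection to `linking.db`, such as a backup job or an open `sqlite3` shell, that is busy when the webhooks arrive.

**Solution:**

**Option 1: Increase the Busy Timeout (Recommended)**

Edit `src/database.js`:

```javascript
const Database = require('better-sqlite3');

// better-sqlite3 already waits 5000 ms by default; raise it to wait longer.
const db = new Database('linking.db', { timeout: 10000 });
```

This makes SQLite wait up to 10 seconds for a lock to clear before throwing `SQLITE_BUSY`.

**Option 2: Enable WAL Mode (Write-Ahead Logging)**

Edit `src/database.js` and add this after the database is created:

```javascript
db.pragma('journal_mode = WAL');
```

In WAL mode, readers don't block the writer and the writer doesn't block readers, although writes still happen one at a time. The setting persists in the database file, and SQLite keeps `linking.db-wal` and `linking.db-shm` files alongside it.

**Option 3: Retry Logic (For Critical Operations)**

In `src/routes/webhook.js`, wrap database operations inside the `async` route handler (so `await` is allowed):

```javascript
let retries = 3; // total attempts
while (retries > 0) {
  try {
    stmt.run(token, customer_email, tier, subscription_id);
    break; // success
  } catch (error) {
    if (error.code === 'SQLITE_BUSY' && retries > 1) {
      retries--;
      // Back off briefly before the next attempt
      await new Promise((resolve) => setTimeout(resolve, 100));
    } else {
      throw error; // not a lock error, or out of attempts
    }
  }
}
```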
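
If several routes need the same protection, the retry loop can be factored into a reusable helper. The sketch below (the name `withBusyRetry` is hypothetical) retries only on `SQLITE_BUSY` and doubles the wait after each failed attempt, which gives a longer-held lock more time to clear. It would be called as `await withBusyRetry(() => stmt.run(token, customer_email, tier, subscription_id));`.

```javascript
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Hypothetical helper: run a database call, retrying with exponential backoff
// only when SQLite reports SQLITE_BUSY.
async function withBusyRetry(operation, { attempts = 3, baseDelayMs = 100 } = {}) {
  for (let attempt = 1; ; attempt += 1) {
    try {
      // `await` also covers operations that return a promise
      return await operation();
    } catch (error) {
      // Give up on non-lock errors, or once every attempt has been used
      if (error.code !== 'SQLITE_BUSY' || attempt >= attempts) throw error;
      // Wait baseDelayMs, then 2x, 4x, ... before the next attempt
      await sleep(baseDelayMs * 2 ** (attempt - 1));
    }
  }
}
```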
**After changes:**
```bash
sudo systemctl restart arbiter
```
---
### 6. "Email not sending"
**Symptom:** The webhook is processed successfully, but the subscriber never receives the linking email.

**Cause:** An SMTP connection issue, a firewall blocking port 587, or incorrect credentials.

**Solution:**

**Check 1: SMTP Credentials**

```bash
grep SMTP .env
```

Verify:

- `SMTP_HOST=38.68.14.188`
- `SMTP_USER=noreply@firefrostgaming.com`
- `SMTP_PASS=<correct password>`

**Check 2: Port 587 is Open**

From Command Center:

```bash
telnet 38.68.14.188 587
```

It should connect and print a `220` greeting from the mail server. If the connection times out or is refused, allow the port in the firewall on the Billing VPS (the mail server, not Command Center):

```bash
sudo ufw allow 587/tcp
```
**Check 3: Test SMTP Manually**

Run this from the application directory so that `require('nodemailer')` resolves. Replace the placeholder password and recipient first:

```bash
node -e "
const nodemailer = require('nodemailer');
const t = nodemailer.createTransport({
  host: '38.68.14.188',
  port: 587,
  secure: false, // port 587 upgrades via STARTTLS rather than implicit TLS
  auth: { user: 'noreply@firefrostgaming.com', pass: 'YOUR_PASSWORD' }
});
t.sendMail({
  from: 'noreply@firefrostgaming.com',
  to: 'your_email@example.com',
  subject: 'Test',
  text: 'Testing SMTP'
}).then(() => console.log('Sent!')).catch(console.error);
"
```
**Check 4: Mailcow Logs**

SSH to the Billing VPS:

```bash
ssh root@38.68.14.188
docker logs -f mailcowdockerized_postfix-mailcow_1 2>&1 | grep noreply
```

With Docker Compose v2, the container is named `mailcowdockerized-postfix-mailcow-1` instead. Look for errors or rejections.

**Check 5: Spam Folder**

Check whether the email landed in the spam/junk folder.

**Check 6: DKIM/SPF Records**

Verify that the DNS records are set up correctly. This should already be done, but it's worth checking if delivery is failing.

---
### 7. "Webhook signature verification failed"
**Symptom:** Paymenter sends a webhook, but the application logs "Invalid webhook signature" and returns 401.

**Cause:** `WEBHOOK_SECRET` in `.env` doesn't match the secret configured in Paymenter.

**Solution:**

**Check 1: Secrets Match**

```bash
grep WEBHOOK_SECRET .env
```

Compare it to the Paymenter webhook configuration:

1. Paymenter Admin → System → Webhooks
2. Find the Arbiter webhook.
3. Check the secret field.

They must match exactly.

**Check 2: Header Name**

Verify that Paymenter sends the signature in the `x-signature` header. Edit `src/middleware/verifyWebhook.js` if needed:

```javascript
// Node lowercases incoming header names; use whichever header Paymenter actually sends
const signature = req.headers['x-signature']; // e.g. 'x-paymenter-signature'
```

**Check 3: Signature Algorithm**

Verify that Paymenter uses HMAC SHA256. If it uses something else, update `src/middleware/verifyWebhook.js`:

```javascript
const crypto = require('crypto');

const expectedSignature = crypto
  .createHmac('sha256', secret) // or 'sha1', 'md5', etc.
  .update(payload)
  .digest('hex');
```

**Check 4: Payload Format**

The signature covers the exact bytes Paymenter sent, so re-serializing the parsed body (for example `JSON.stringify(req.body)`) can produce a different string and a mismatched signature. Add debug logging:

```javascript
console.log('Received signature:', signature);
console.log('Payload:', payload);
console.log('Expected signature:', expectedSignature);
```
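
Checks 3 and 4 come together in the verification itself: compute the HMAC over the raw request bytes and compare in constant time. The minimal sketch below assumes three things: Paymenter sends a hex-encoded HMAC-SHA256 of the raw body (unconfirmed, so adjust per Checks 2 and 3), the raw body is captured with `express.json({ verify: (req, res, buf) => { req.rawBody = buf; } })`, and `isValidSignature` and `req.rawBody` are illustrative names. The middleware would call it as `isValidSignature(req.rawBody, req.headers['x-signature'], process.env.WEBHOOK_SECRET)`.

```javascript
const crypto = require('crypto');

// Compares a hex HMAC-SHA256 signature against one computed over the raw body.
function isValidSignature(rawBody, receivedSignature, secret) {
  if (typeof receivedSignature !== 'string' || !secret) return false;
  const expected = crypto.createHmac('sha256', secret).update(rawBody).digest('hex');
  const a = Buffer.from(expected);
  const b = Buffer.from(receivedSignature);
  // timingSafeEqual throws if lengths differ, so check that first
  return a.length === b.length && crypto.timingSafeEqual(a, b);
}
```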

**Temporary Bypass (Testing Only):**

To test without signature verification (NOT for production; restore the middleware as soon as testing is done):

```javascript
// In src/routes/webhook.js, temporarily comment out:
// router.post('/billing', verifyBillingWebhook, validateBillingPayload, async (req, res) => {
router.post('/billing', validateBillingPayload, async (req, res) => {
```
**After fixing:**
```bash
sudo systemctl restart arbiter
```
---
## 🔥 Emergency Procedures
### Application Won't Start
**Symptom:** `systemctl status arbiter` shows a "failed" status.

**Diagnosis:**

```bash
sudo journalctl -u arbiter -n 100
```

Look for:

- Missing `.env` file
- Syntax errors in code
- Missing dependencies
- Port 3500 already in use

**Solutions:**

**Port in use:**

Identify the process first, and try a normal `kill` before resorting to `kill -9`:

```bash
sudo lsof -i :3500
sudo kill <PID>        # use kill -9 <PID> only if it doesn't exit
sudo systemctl start arbiter
```
**Missing dependencies:**
```bash
cd /home/architect/arbiter
npm install
sudo systemctl restart arbiter
```
**Syntax errors:**

Fix the reported file and line number, then:
```bash
sudo systemctl restart arbiter
```
---
### Database Corruption
**Symptom:** The application crashes with a "database disk image is malformed" error.

**Solution:**
```bash
# Stop application
sudo systemctl stop arbiter
# Check database integrity
sqlite3 linking.db "PRAGMA integrity_check;"
```
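**If the check prints anything other than `ok`:**

```bash
# Restore from backup (see DEPLOYMENT.md Phase 5)
mv linking.db linking.db.corrupt
# If WAL mode is enabled, move the stale WAL/SHM files aside too
mv linking.db-wal linking.db-wal.corrupt 2>/dev/null
mv linking.db-shm linking.db-shm.corrupt 2>/dev/null
cp /home/architect/backups/arbiter/linking_YYYYMMDD_HHMMSS.db linking.db
# Restart application
sudo systemctl start arbiter
```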
---
### All Webhooks Suddenly Failing
**Symptom:** Every webhook returns a 500 error, but the application is running.

**Check 1: Disk Space**
```bash
df -h
```
If `/` is at 100%, clear space:
```bash
# Clean old logs
sudo journalctl --vacuum-time=7d
# Clean old backups
find /home/architect/backups/arbiter -type f -mtime +7 -delete
```
**Check 2: Memory Usage**
```bash
free -h
```
If out of memory:
```bash
sudo systemctl restart arbiter
```
**Check 3: Discord Bot Disconnected**
```bash
curl http://localhost:3500/health
```
If `discord: "down"`:
```bash
sudo systemctl restart arbiter
```
---
## 📊 Performance Issues
### Slow Response Times
**Check 1: Database Size**
```bash
ls -lh linking.db sessions.db
```
If >100MB, consider cleanup:
```bash
sqlite3 linking.db "DELETE FROM link_tokens WHERE used = 1 AND created_at < datetime('now', '-30 days');"
sqlite3 linking.db "VACUUM;"
```
**Check 2: High CPU Usage**
```bash
top
```
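If the `node` process uses more than 80% CPU consistently, check for:

- Infinite loops in code
- Too many concurrent webhooks
- Discord API rate limiting (the bot repeatedly trying to reconnect)

**Check 3: Rate Limiting Too Strict**

If users report frequent "Too many requests" errors, first confirm that `app.set('trust proxy', 1)` is present in `src/index.js`. The limiter keys on the client IP by default, and without trust proxy every request appears to come from Nginx, so all users share a single limit. If the setting is present, raise the limit in `src/index.js`: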
```javascript
const apiLimiter = rateLimit({
windowMs: 15 * 60 * 1000,
max: 200, // Increase from 100
// ...
});
```
---
## 🔐 Security Concerns
### Suspicious Database Entries
**Check for unusual tokens:**
```bash
sqlite3 linking.db "SELECT email, tier, created_at FROM link_tokens WHERE used = 0 ORDER BY created_at DESC LIMIT 20;"
```
**Check audit log for unauthorized actions:**
```bash
sqlite3 linking.db "SELECT * FROM audit_logs ORDER BY timestamp DESC LIMIT 20;"
```
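**If compromised:**

1. Change all secrets in `.env`. If you change `WEBHOOK_SECRET`, update it in Paymenter too, or every webhook will fail verification.
2. Rotate the Discord bot token.
3. Regenerate the Ghost Admin API key.
4. Clear all unused tokens:

   ```bash
   sqlite3 linking.db "DELETE FROM link_tokens WHERE used = 0;"
   ```

5. Force all admins to re-authenticate. Stop the application first so it isn't holding the old session store open, then start it again so it loads the new secrets:

   ```bash
   sudo systemctl stop arbiter
   rm sessions.db
   sudo systemctl start arbiter
   ```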
---
## 📞 Getting Help
**Before asking for help, collect:**
1. Service status:
```bash
sudo systemctl status arbiter > /tmp/arbiter-status.txt
```
2. Recent logs:
```bash
sudo journalctl -u arbiter -n 200 > /tmp/arbiter-logs.txt
```
3. Configuration (sanitized):
```bash
cat .env | sed 's/=.*/=REDACTED/' > /tmp/arbiter-config.txt
```
4. Health check output:
```bash
curl https://discord-bot.firefrostgaming.com/health > /tmp/arbiter-health.txt
```
5. Database stats:
```bash
sqlite3 linking.db "SELECT COUNT(*) FROM link_tokens;" > /tmp/arbiter-db-stats.txt
sqlite3 linking.db "SELECT COUNT(*) FROM audit_logs;" >> /tmp/arbiter-db-stats.txt
```
**Share these files (remove any actual secrets first) when requesting support.**

---
## 🛠️ Tools & Commands Reference
### Restart Everything
```bash
sudo systemctl restart arbiter
sudo systemctl reload nginx
```
### View All Environment Variables
```bash
cat .env
```
### Check Which Process is Using Port 3500
```bash
sudo lsof -i :3500
```
### Test Database Connection
```bash
sqlite3 linking.db "SELECT 1;"
```
### Force Regenerate Sessions Database
```bash
sudo systemctl stop arbiter
rm sessions.db
sudo systemctl start arbiter
```
### Manually Clean Up Old Tokens
```bash
sqlite3 linking.db "DELETE FROM link_tokens WHERE created_at < datetime('now', '-1 day');"
```
### Export Audit Logs to CSV
```bash
sqlite3 -header -csv linking.db "SELECT * FROM audit_logs ORDER BY timestamp DESC;" > audit_export.csv
```
---
**🔥❄️ When in doubt, check the logs first. Most issues reveal themselves there. 💙**