Troubleshooting

When something breaks in your homelab, systematic diagnosis beats random reboots. This guide walks through common issues and shows how to read the logs to find the real problem.
Systemd Services
Check Service Status

```shell
sudo systemctl status docker
```

The output shows whether the service is running, when it started, and the most recent log lines.
View Full Logs
Section titled “View Full Logs”# Last 50 linessudo journalctl -u docker -n 50
# Follow logs in real-timesudo journalctl -u docker -f
# Logs since last bootsudo journalctl -u docker -b
# Logs for the last hoursudo journalctl -u docker --since "1 hour ago"Restart a Service
```shell
sudo systemctl restart docker
```

Wait a few seconds, then check status:

```shell
sudo systemctl status docker
```

Docker Container Issues
Container Won’t Start

Check the logs:

```shell
docker logs container-name
```

Common causes and fixes:
| Issue | Check | Fix |
|---|---|---|
| Port in use | `sudo lsof -i :8080` | Change the port or stop the conflicting service |
| Missing image | `docker images` | Run `docker pull image-name` |
| Permission denied | `docker logs container` | Fix volume ownership with `chown` |
| Out of memory | `docker stats` | Raise the container’s memory limit or reduce its workload |
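
The port check in the first row of the table can be scripted. The sketch below assumes `ss` from iproute2 is installed and that its `-tln` output keeps the local address in the fourth column. `port_in_use` is a hypothetical helper name used only for illustration; it is not part of Docker or any tool mentioned here.

```shell
# port_in_use PORT: reads `ss -tln` style output on stdin and succeeds (exit 0)
# if a socket is listening on PORT. Hypothetical helper for illustration.
port_in_use() {
  awk -v suffix=":$1" '
    # Column 4 is "Local Address:Port"; compare its tail to ":PORT" literally
    # so that port 80 does not match 8080.
    $1 == "LISTEN" && length($4) >= length(suffix) &&
      substr($4, length($4) - length(suffix) + 1) == suffix { found = 1 }
    END { exit (found ? 0 : 1) }
  '
}

# Usage: ss -tln | port_in_use 8080 && echo "port 8080 is taken"
```

Reading from stdin keeps the parsing separate from the `ss` call, so the same function works on saved output from another machine.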
Container Exits Immediately
```shell
docker compose logs app
```

Look for error messages in the output. Common patterns:

```text
Error: Cannot find config file at /app/config.yaml
Error: Database connection refused
Error: Port 8080 already in use
```

Network Connectivity Issues
Test if containers can reach each other:

```shell
docker compose exec app ping db
```

If this fails:

```shell
# Check networks
docker network ls

# Inspect the network
docker network inspect app-network

# Check if containers are on the network
docker inspect app | grep -A 20 NetworkSettings
```

Network Diagnostics
Check Network Interfaces

```shell
ip addr show
```

Look for your server’s IP address and interface status.
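
To spot interfaces that are down without reading the full output, the brief format can be filtered. This is a minimal sketch, assuming the `-brief` flag of iproute2's `ip` is available on your system; `interfaces_not_up` is a hypothetical helper name.

```shell
# interfaces_not_up: reads `ip -brief addr show` output on stdin and prints the
# name and state of every interface whose state is not UP.
# The loopback interface usually reports UNKNOWN, so it is skipped by name.
interfaces_not_up() {
  awk '$1 != "lo" && $2 != "UP" { print $1, $2 }'
}

# Usage: ip -brief addr show | interfaces_not_up
```

An empty result means every non-loopback interface reports `UP`.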
Test Connectivity
Section titled “Test Connectivity”# Ping a remote hostping -c 4 8.8.8.8
# Check DNS resolutionnslookup google.comdig google.com
# Trace route to a hosttraceroute google.comCheck Open Ports
Section titled “Check Open Ports”# Show all listening portssudo ss -tlnp
# Check specific portsudo lsof -i :8080Firewall Rules
Section titled “Firewall Rules”# Check UFW statussudo ufw status verbose
# List all rulessudo ufw show added
# Test if port is opensudo ufw allow 8080/tcpDisk Space Issues
Check Disk Usage

```shell
# Overall disk usage
df -h

# Directory size
du -sh /srv/apps/*

# Find files larger than 1 GiB
find /srv -type f -size +1G
```

Clean Up Docker
Section titled “Clean Up Docker”# Show disk usagedocker system df
# Remove unused imagesdocker image prune -a
# Remove unused volumesdocker volume prune
# Remove everything unuseddocker system prune -aMemory and CPU Issues
Check Resource Usage

```shell
# System-wide
free -h
top

# Docker containers
docker stats
```

Identify Resource Hogs
Section titled “Identify Resource Hogs”# Find processes using most CPUps aux --sort=-%cpu | head -10
# Find processes using most memoryps aux --sort=-%mem | head -10Limit Container Resources
Edit compose.yaml:

```yaml
services:
  app:
    image: myapp:latest
    deploy:
      resources:
        limits:
          cpus: '1'
          memory: 512M
```

SSH and Remote Access
Can’t Connect via SSH

```shell
# Check SSH is running
sudo systemctl status ssh

# Check SSH is listening
sudo ss -tlnp | grep ssh

# Try verbose connection
ssh -v user@server
```

Permission Denied
Section titled “Permission Denied”# Check key permissionsls -la ~/.ssh/
# Should be:# -rw------- id_ed25519# -rw-r--r-- id_ed25519.pub# drwx------ .ssh
# Fix permissionschmod 700 ~/.sshchmod 600 ~/.ssh/id_ed25519chmod 644 ~/.ssh/id_ed25519.pubLocked Out
If you can’t connect via SSH:
- Use console access or recovery mode
- Check `/etc/ssh/sshd_config` for errors
- Verify `~/.ssh/authorized_keys` has your public key
- Check file permissions (see above)
- Restart SSH: `sudo systemctl restart ssh`
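
Two of the checks in this list can be run from the console before restarting SSH. The sketch below assumes GNU `stat` (for `-c %a`) and the config test mode of OpenSSH (`sshd -t`); `check_mode` is a hypothetical helper, and the `600` mode for `authorized_keys` is the conventional value rather than something this guide specifies.

```shell
# check_mode PATH EXPECTED: succeeds if PATH has exactly the octal mode EXPECTED.
# Assumes GNU coreutils stat; BSD/macOS stat uses different flags.
check_mode() {
  actual=$(stat -c %a "$1") || return 2
  if [ "$actual" != "$2" ]; then
    echo "$1: mode is $actual, expected $2" >&2
    return 1
  fi
}

# Usage, from the console session:
#   sudo sshd -t                           # prints nothing if sshd_config parses cleanly
#   check_mode ~/.ssh 700
#   check_mode ~/.ssh/authorized_keys 600  # conventional mode, an assumption here
```

Running `sshd -t` before the restart matters most when you are locked out: a restart with a broken config can leave the daemon stopped.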
Database Issues
PostgreSQL Won’t Connect

```shell
# Check if container is running
docker compose ps

# Check logs
docker compose logs db

# Try connecting from another container
docker compose exec app psql -h db -U postgres -c "SELECT 1"
```

Database Corruption
Section titled “Database Corruption”# Check database integritydocker compose exec db pg_checksum -D /var/lib/postgresql/data
# Rebuild indexesdocker compose exec db psql -U postgres -d myapp -c "REINDEX DATABASE myapp"Backup and Restore
Section titled “Backup and Restore”# Backupdocker compose exec db pg_dump -U postgres myapp > backup.sql
# Restoredocker compose exec -T db psql -U postgres myapp < backup.sqlPerformance Tuning
Slow Docker Commands

```shell
# Check Docker daemon
docker info

# Restart Docker
sudo systemctl restart docker

# Check disk I/O (iostat is part of the sysstat package)
iostat -x 1 5
```

Slow Network
Section titled “Slow Network”# Check bandwidthiperf3 -c remote-host
# Monitor networkiftop
# Check for packet lossping -c 100 remote-host | grep lossIncident Response Template
When something breaks, follow this template:

```text
Time: [when did it happen?]
Service: [what broke?]
Impact: [what's not working?]

Symptoms:
- [what did you observe?]
- [what error messages?]

Diagnosis:
- Checked: [what did you look at?]
- Found: [what was wrong?]

Fix:
- Action: [what did you do?]
- Result: [did it work?]

Prevention:
- Root cause: [why did this happen?]
- Next steps: [how to prevent this?]
```

Common Error Messages
| Error | Cause | Fix |
|---|---|---|
| `Connection refused` | Service not running or wrong port | Check service status, verify port |
| `Permission denied` | File permissions or user access | Fix ownership with `chown` or `chmod` |
| `Out of memory` | Container needs more RAM | Increase memory limit or reduce workload |
| `Disk full` | No space left on device | Clean up old files or add storage |
| `DNS resolution failed` | Network or DNS issue | Check `/etc/resolv.conf`, test DNS |
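
The table can also be expressed as a small lookup, which is handy when scanning log lines one at a time. This is a rough sketch: `suggest_fix` is a hypothetical helper, and it only knows the five messages listed in the table, matched case-insensitively.

```shell
# suggest_fix MESSAGE: prints the suggested fix from the table for a log
# message, or a fallback line if nothing matches.
suggest_fix() {
  # Lowercase the message so matching is case-insensitive.
  msg=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
  case "$msg" in
    *"connection refused"*)    echo "Check service status, verify port" ;;
    *"permission denied"*)     echo "Fix ownership with chown or chmod" ;;
    *"out of memory"*)         echo "Increase memory limit or reduce workload" ;;
    *"disk full"*|*"no space left on device"*)
                               echo "Clean up old files or add storage" ;;
    *"dns resolution failed"*) echo "Check /etc/resolv.conf, test DNS" ;;
    *)                         echo "No match; read the surrounding log lines" ;;
  esac
}
```

For example, `suggest_fix "Error: Database connection refused"` prints `Check service status, verify port`.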
Getting Help
When you’re stuck:
- Check the logs — Most issues are documented there
- Search the error message — Someone has probably hit this before
- Simplify the test — Isolate the problem to one component
- Document what you tried — This helps others help you
See the Ubuntu Server First Steps for basic setup issues.