
Troubleshooting

When something breaks in your homelab, systematic diagnosis beats random reboots. This guide walks through common issues and how to read the logs to find the real problem.

Start by checking whether the service is running:
sudo systemctl status docker

The output shows whether the service is running, when it started, and its most recent log lines.

Then read its logs with journalctl:
# Last 50 lines
sudo journalctl -u docker -n 50
# Follow logs in real-time
sudo journalctl -u docker -f
# Logs since last boot
sudo journalctl -u docker -b
# Logs for the last hour
sudo journalctl -u docker --since "1 hour ago"
To restart the service:
sudo systemctl restart docker

Wait a few seconds, then check status:

sudo systemctl status docker
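Instead of guessing how long "a few seconds" is, you can poll the unit until it comes up. A minimal sketch, assuming a systemd host; `wait_active` is a hypothetical helper name, and it relies on `systemctl is-active --quiet` returning 0 only when the unit is active:

```shell
#!/usr/bin/env bash
# Poll a systemd unit until it reports "active", or give up after a timeout.
# Usage: wait_active UNIT [SECONDS]
wait_active() {
  local unit=$1 timeout=${2:-30} waited=0
  until systemctl is-active --quiet "$unit"; do
    if [ "$waited" -ge "$timeout" ]; then
      echo "$unit not active after ${timeout}s" >&2
      return 1
    fi
    sleep 1
    waited=$((waited + 1))
  done
  echo "$unit is active"
}
```

Usage: `sudo systemctl restart docker && wait_active docker 30`.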

If a container won't start, check its logs:

docker logs container-name

Common causes and fixes:

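Issue             | Check                 | Fix
----------------- | --------------------- | ---------------------------------------------------------
Port in use       | sudo lsof -i :8080    | Change the port or stop the conflicting service
Missing image     | docker images         | Run docker pull image-name
Permission denied | docker logs container | Fix volume ownership with chown
Out of memory     | docker stats          | Raise the container's memory limit or reduce its workload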
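For "Permission denied", compare the owner of the host directory with the user the container runs as. A sketch, assuming GNU coreutils `stat` and a bind-mounted directory; the helper name and the UID 1000 in the usage line are illustrative, so check your image's documentation (or `docker inspect -f '{{.Config.User}}' container-name`) for the real user:

```shell
#!/usr/bin/env bash
# Succeed if PATH is owned by WANT, given as "UID" or "UID:GID".
# Usage: owner_matches PATH WANT
owner_matches() {
  local path=$1 want=$2 have
  have=$(stat -c '%u:%g' "$path") || return 2   # numeric owner, e.g. 1000:1000
  case "$want" in
    *:*) [ "$have" = "$want" ] ;;                # compare UID and GID
    *)   [ "${have%%:*}" = "$want" ] ;;          # compare UID only
  esac
}
```

Usage: `owner_matches ./data 1000 || sudo chown -R 1000:1000 ./data`.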
For Compose services, read the logs of a single service:
docker compose logs app

Look for error messages in the output. Common patterns:

Error: Cannot find config file at /app/config.yaml
Error: Database connection refused
Error: Port 8080 already in use
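Long logs bury these lines. A small filter that keeps only error-like lines, with their line numbers, is one way to find them; the keyword list below is an assumption based on the patterns above, so extend it for your apps:

```shell
#!/usr/bin/env bash
# Read log lines on stdin and print only those that look like errors,
# prefixed with their line number so you can find them in the full log.
log_errors() {
  grep -n -i -E 'error|fatal|panic|refused|denied|in use'
}
```

Usage: `docker compose logs --no-color app | log_errors`.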

Test whether containers can reach each other (this needs ping installed in the app image):

docker compose exec app ping db

If this fails:

# Check networks
docker network ls
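# Inspect the network (Compose usually prefixes the name with the project name; copy it from docker network ls)
docker network inspect app-network
# Check which networks the app container is attached to
docker inspect -f '{{json .NetworkSettings.Networks}}' "$(docker compose ps -q app)"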
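Containers can only reach each other by service name if they share a network. A sketch that compares their networks using Docker's Go-template `--format`/`-f` output; `networks_of` and `shared_networks` are hypothetical helpers, and with Compose you may need the full container name from `docker compose ps` (for example `myproject-app-1`):

```shell
#!/usr/bin/env bash
# Print the names of the networks a container is attached to, one per line.
networks_of() {
  docker inspect -f '{{range $name, $net := .NetworkSettings.Networks}}{{$name}}{{"\n"}}{{end}}' "$1"
}

# Print the networks two containers have in common; fail if there are none.
shared_networks() {
  local first
  first=$(networks_of "$1") || return 2
  [ -n "$first" ] || return 1
  # -F fixed strings, -x whole-line match, one pattern per network name
  networks_of "$2" | grep -Fx -e "$first"
}
```

If `shared_networks app db` prints nothing, attach both services to the same network in compose.yaml.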
On the host, check the network interfaces:
ip addr show

Look for your server’s IP address and interface status.

Then test outbound connectivity and DNS:
# Ping a remote host
ping -c 4 8.8.8.8
# Check DNS resolution
nslookup google.com
dig google.com
# Trace route to a host
traceroute google.com
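These checks are easiest to read in order: if an IP address answers but a hostname doesn't resolve, the problem is DNS rather than the network. A sketch of that ordering, assuming Linux iputils `ping` (`-W` is the reply timeout in seconds) and `getent hosts`, which resolves through the system resolver; the defaults mirror the commands above and `diagnose_net` is a hypothetical name:

```shell
#!/usr/bin/env bash
# Check raw connectivity first, then name resolution, and report which layer fails.
# Usage: diagnose_net [IP] [HOSTNAME]
diagnose_net() {
  local ip=${1:-8.8.8.8} host=${2:-google.com}
  if ! ping -c 1 -W 2 "$ip" >/dev/null 2>&1; then
    echo "no-route: cannot reach $ip"
    return 1
  fi
  if ! getent hosts "$host" >/dev/null 2>&1; then
    echo "dns: $ip reachable but $host does not resolve"
    return 2
  fi
  echo "ok"
}
```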
Check which ports are listening:
# Show all listening ports
sudo ss -tlnp
# Check specific port
sudo lsof -i :8080
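In scripts, a yes/no answer is often more useful than a table. A sketch that reads `ss -tlnH` output (`-H` drops the header line) and checks whether anything listens on a given port; `listening_on` is a hypothetical helper:

```shell
#!/usr/bin/env bash
# Succeed if some socket is listening on TCP port PORT.
# Reads `ss -tlnH` output on stdin; column 4 is "address:port".
listening_on() {
  awk -v port="$1" '
    { n = split($4, parts, ":"); if (parts[n] == port) found = 1 }
    END { exit !found }'
}
```

Usage: `ss -tlnH | listening_on 8080 && echo "port 8080 is already taken"`.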
Check the firewall:
# Check UFW status
sudo ufw status verbose
# List all rules
sudo ufw show added
# Allow a port through the firewall
sudo ufw allow 8080/tcp
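`ufw status` only shows what the rules say; to confirm a port is actually reachable, test it from another machine. A sketch using bash's built-in `/dev/tcp` redirection (enabled in the bash shipped by Debian and Ubuntu) and coreutils `timeout`; the helper name and the IP address in the usage line are placeholders:

```shell
#!/usr/bin/env bash
# Succeed if a TCP connection to HOST:PORT opens within SECONDS (default 3).
# Host and port are passed as arguments to the inner bash, not spliced into the command.
tcp_open() {
  local host=$1 port=$2 secs=${3:-3}
  timeout "$secs" bash -c ': < "/dev/tcp/$0/$1"' "$host" "$port" 2>/dev/null
}
```

Usage, from a different machine: `tcp_open 192.168.1.10 8080 && echo open || echo "closed or filtered"`.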
When disk space runs low, find what is using it:
# Overall disk usage
df -h
# Directory size
du -sh /srv/apps/*
# Find large files
find /srv -type f -size +1G
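To catch a filling disk before it breaks something, you can flag filesystems over a threshold. A sketch that parses POSIX-format `df -P` output; the 85% default is an arbitrary choice and `full_mounts` is a hypothetical name:

```shell
#!/usr/bin/env bash
# Print mount points whose usage is at or above a percentage (default 85).
# Reads `df -P` output on stdin; mount points containing spaces are not handled.
full_mounts() {
  awk -v limit="${1:-85}" 'NR > 1 {
    use = $5; sub(/%/, "", use)        # "90%" -> "90"
    if (use + 0 >= limit + 0) print $6, $5
  }'
}
```

Usage: `df -P | full_mounts 90`.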
Check how much space Docker uses and reclaim it:
# Show disk usage
docker system df
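# Remove all images not used by any container (not just dangling ones)
docker image prune -a
# Remove unused volumes (deletes their data; Docker 23+ removes only anonymous volumes unless you add --all)
docker volume prune
# Remove all unused containers, networks, images, and build cache (add --volumes to include volumes)
docker system prune -a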
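Plain `docker image prune` removes only dangling (untagged) images, `-a` removes every image not used by a container, and pruning volumes deletes data for good. A sketch that lists what is dangling before you prune, using the `dangling=true` filter that `docker images` and `docker volume ls` both accept; `prune_preview` is a hypothetical helper:

```shell
#!/usr/bin/env bash
# Show dangling images and volumes that are not referenced by any container.
prune_preview() {
  local images volumes
  images=$(docker images -q -f dangling=true)
  volumes=$(docker volume ls -q -f dangling=true)
  echo "dangling images: $(printf '%s' "$images" | grep -c .)"
  echo "unused volumes: $(printf '%s' "$volumes" | grep -c .)"
  # List volume names so you can spot anything you still need
  if [ -n "$volumes" ]; then
    printf '%s\n' "$volumes" | sed 's/^/  /'
  fi
}
```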
For high CPU or memory usage, start with an overview:
# System-wide
free -h
top
# Docker containers
docker stats
Then find the processes using the most resources:
# Find processes using most CPU
ps aux --sort=-%cpu | head -10
# Find processes using most memory
ps aux --sort=-%mem | head -10
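`docker stats` keeps refreshing, which is awkward to sort. A sketch that takes one sample with `--no-stream` and ranks containers by memory percentage (share of the container's limit, or of host memory if it has none); `top_mem_containers` is a hypothetical name:

```shell
#!/usr/bin/env bash
# List the N containers (default 5) with the highest memory percentage.
top_mem_containers() {
  docker stats --no-stream --format '{{.Name}} {{.MemPerc}}' |
    sort -k2,2 -rn |          # numeric sort reads "40.10%" as 40.10
    head -n "${1:-5}"
}
```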

To cap a container's resources, set limits in compose.yaml:

services:
  app:
    image: myapp:latest
    deploy:
      resources:
        limits:
          cpus: '1'
          memory: 512M
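After `docker compose up -d`, confirm the limits took effect. A sketch, assuming Compose applies `cpus` as NanoCpus (the field `docker run --cpus` sets); if your version uses a CPU quota instead, NanoCpus will read 0. `show_limits` is a hypothetical helper:

```shell
#!/usr/bin/env bash
# Show the memory and CPU limits Docker applied to a container.
# HostConfig.Memory is in bytes, HostConfig.NanoCpus in billionths of a CPU; 0 means no limit.
show_limits() {
  docker inspect -f '{{.HostConfig.Memory}} {{.HostConfig.NanoCpus}}' "$1" |
    awk '{
      mem = ($1 == 0) ? "unlimited" : sprintf("%dMiB", $1 / 1048576)
      cpu = ($2 == 0) ? "unlimited" : sprintf("%.2f", $2 / 1e9)
      print "memory=" mem " cpus=" cpu
    }'
}
```

Usage: `show_limits "$(docker compose ps -q app)"`; with the limits above you would expect `memory=512MiB cpus=1.00`.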
If SSH connections fail, check the server side first:
# Check SSH is running
sudo systemctl status ssh
# Check SSH is listening
sudo ss -tlnp | grep ssh
# Try verbose connection
ssh -v user@server
Then check the key permissions on the client:
# Check key permissions
ls -la ~/.ssh/
# Should be:
# -rw------- id_ed25519
# -rw-r--r-- id_ed25519.pub
# drwx------ .ssh
# Fix permissions
chmod 700 ~/.ssh
chmod 600 ~/.ssh/id_ed25519
chmod 644 ~/.ssh/id_ed25519.pub
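A sketch that checks those modes for you, assuming GNU `stat` (`-c '%a'` prints the octal mode) and the ed25519 key names used above; the helper names are hypothetical:

```shell
#!/usr/bin/env bash
# Print a message and fail if PATH's mode is not WANT (e.g. 600).
check_mode() {
  local path=$1 want=$2 have
  have=$(stat -c '%a' "$path") || return 2
  if [ "$have" != "$want" ]; then
    echo "$path: mode $have, expected $want"
    return 1
  fi
}

# Check the SSH directory and ed25519 key files against the modes listed above.
check_ssh_perms() {
  local dir=${1:-$HOME/.ssh} rc=0
  check_mode "$dir" 700 || rc=1
  # Key files are optional; only check the ones that exist
  if [ -e "$dir/id_ed25519" ]; then check_mode "$dir/id_ed25519" 600 || rc=1; fi
  if [ -e "$dir/id_ed25519.pub" ]; then check_mode "$dir/id_ed25519.pub" 644 || rc=1; fi
  return "$rc"
}
```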

If you can’t connect via SSH:

  1. Use console access or recovery mode
  2. Check /etc/ssh/sshd_config for errors (sudo sshd -t validates the file)
  3. Verify ~/.ssh/authorized_keys has your public key
  4. Check file permissions (see above)
  5. Restart SSH: sudo systemctl restart ssh
If an app can't reach its database:
# Check if container is running
docker compose ps
# Check logs
docker compose logs db
# Try connecting from another container (needs the psql client in the app image)
docker compose exec app psql -h db -U postgres -c "SELECT 1"
If you suspect data corruption:
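# Verify data checksums offline (pg_checksums needs the server stopped
# and only works if data checksums are enabled on the cluster)
docker compose stop db
docker compose run --rm -u postgres db pg_checksums --check -D /var/lib/postgresql/data
docker compose start db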
# Rebuild indexes
docker compose exec db psql -U postgres -d myapp -c "REINDEX DATABASE myapp"
To back up and restore a database:
# Backup (-T disables the pseudo-TTY so terminal handling can't alter the dump)
docker compose exec -T db pg_dump -U postgres myapp > backup.sql
# Restore
docker compose exec -T db psql -U postgres myapp < backup.sql
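For regular backups, a timestamped, compressed dump is easier to keep track of. A sketch, assuming the `db` service, `postgres` user, and `myapp` database from the commands above; `backup_db` is a hypothetical helper. `pipefail` makes a failed `pg_dump` fail the whole pipeline instead of leaving a truncated file that looks valid:

```shell
#!/usr/bin/env bash
# Dump a database to DIR/NAME-YYYYmmdd-HHMMSS.sql.gz and print the file path.
# The function body runs in a subshell, so pipefail doesn't leak into your shell.
backup_db() (
  set -o pipefail
  db=${1:-myapp}
  outdir=${2:-.}
  file="$outdir/$db-$(date +%Y%m%d-%H%M%S).sql.gz"
  if docker compose exec -T db pg_dump -U postgres "$db" | gzip > "$file"; then
    echo "$file"
  else
    rm -f "$file"                      # don't keep a partial dump around
    echo "backup of $db failed" >&2
    exit 1
  fi
)
```

To restore one of these files: `gunzip -c FILE | docker compose exec -T db psql -U postgres myapp`.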
If Docker itself feels slow:
# Check Docker daemon
docker info
# Restart Docker
sudo systemctl restart docker
# Check disk I/O (iostat comes from the sysstat package)
iostat -x 1 5
If the network is slow:
# Check bandwidth (start iperf3 -s on remote-host first)
iperf3 -c remote-host
# Monitor traffic per connection (needs root)
sudo iftop
# Check for packet loss
ping -c 100 remote-host | grep loss
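If you want to alert on packet loss rather than eyeball it, extract the number from ping's summary line. A sketch, assuming the Linux iputils summary format ("N% packet loss"); `loss_pct` is a hypothetical name:

```shell
#!/usr/bin/env bash
# Extract the packet-loss percentage from a ping summary read on stdin.
loss_pct() {
  grep -oE '[0-9.]+% packet loss' | cut -d% -f1
}
```

Usage: `ping -c 100 remote-host | loss_pct` prints just the percentage, e.g. `2`.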

When something breaks, follow this template:

Time: [when did it happen?]
Service: [what broke?]
Impact: [what's not working?]
Symptoms:
- [what did you observe?]
- [what error messages?]
Diagnosis:
- Checked: [what did you look at?]
- Found: [what was wrong?]
Fix:
- Action: [what did you do?]
- Result: [did it work?]
Prevention:
- Root cause: [why did this happen?]
- Next steps: [how to prevent this?]
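To make the template easy to use mid-incident, you can stamp out a copy with the time and service already filled in. A sketch; `new_incident` and the `incident-DATE-SERVICE.txt` naming are assumptions, not a required convention:

```shell
#!/usr/bin/env bash
# Create a pre-filled incident note from the template and print its path.
# Usage: new_incident SERVICE [DIR]
new_incident() {
  local service=$1 dir=${2:-.} file
  file="$dir/incident-$(date +%Y-%m-%d)-$service.txt"
  if [ -e "$file" ]; then
    echo "$file already exists" >&2   # never overwrite an existing note
    return 1
  fi
  cat > "$file" <<EOF
Time: $(date '+%Y-%m-%d %H:%M')
Service: $service
Impact: [what's not working?]
Symptoms:
- [what did you observe?]
- [what error messages?]
Diagnosis:
- Checked: [what did you look at?]
- Found: [what was wrong?]
Fix:
- Action: [what did you do?]
- Result: [did it work?]
Prevention:
- Root cause: [why did this happen?]
- Next steps: [how to prevent this?]
EOF
  echo "$file"
}
```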
Error                 | Cause                             | Fix
--------------------- | --------------------------------- | ----------------------------------------
Connection refused    | Service not running or port wrong | Check service status, verify port
Permission denied     | File permissions or user access   | Fix ownership with chown or chmod
Out of memory         | Container needs more RAM          | Increase memory limit or reduce workload
Disk full             | No space left on device           | Clean up old files or add storage
DNS resolution failed | Network or DNS issue              | Check /etc/resolv.conf, test DNS
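The same lookup can be scripted as a first hint when you paste an error. A sketch whose match strings are assumptions about how these errors usually read (for example "no space left on device" or Docker's `OOMKilled` state); it is pattern matching only, so treat the result as a starting point:

```shell
#!/usr/bin/env bash
# Map an error message to the matching row of the error table.
suggest_fix() {
  local msg
  msg=$(printf '%s' "$*" | tr '[:upper:]' '[:lower:]')   # case-insensitive match
  case "$msg" in
    *"connection refused"*) echo "Check service status, verify port" ;;
    *"permission denied"*) echo "Fix ownership with chown or chmod" ;;
    *"no space left on device"*) echo "Clean up old files or add storage" ;;
    *"out of memory"*|*oomkilled*) echo "Increase memory limit or reduce workload" ;;
    *"could not resolve"*|*"name resolution"*) echo "Check /etc/resolv.conf, test DNS" ;;
    *) echo "No match; search the exact error message"; return 1 ;;
  esac
}
```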

When you’re stuck:

  1. Check the logs — Most issues are documented there
  2. Search the error message — Someone has probably hit this before
  3. Simplify the test — Isolate the problem to one component
  4. Document what you tried — This helps others help you

See the Ubuntu Server First Steps guide for basic setup issues.