Watchtower spawning copies of itself

I also had two nodes totally taken out by this. The nodes were so overrun by watchtower instances that they stopped responding. Docker also crashed, so any docker command failed because it could not open the Docker Unix socket.

I was able to recover them manually but it took a long time and my uptime rep is pretty much destroyed for the time being. I also had to yank power a few times so I’m hoping I don’t have corrupted data at this point…

To resolve, I had to:

  • Reboot the node
  • cd to my /var/lib/docker/containers directory
  • grep for “watchtower” in each container subdir in order to find all the container IDs
  • Create a “docker stop” command with every container ID as an argument
  • Reboot the node again to get Docker running (attempts at restarting the service hung the box)
  • As soon as containers start deploying, run the “docker stop” command with all the container IDs from above
  • Repeat the process until I got lucky enough to have containers begin stopping before my system ran out of resources and crashed Docker again
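
For anyone who ends up in the same spot, here is a rough sketch of the grep and “docker stop” steps above, in Python. It is not exactly what I typed at the time. It assumes each container subdir contains a config.v2.json file (that is the layout on the Docker versions I know of, so check yours first), and the function names find_container_ids and build_stop_command are made up for illustration.

```python
import shlex
from pathlib import Path


def find_container_ids(containers_dir, needle="watchtower"):
    """Return the names of container subdirs whose config mentions `needle`.

    Assumption: each subdir is named after the full container ID and
    holds a config.v2.json file describing the container.
    """
    ids = []
    for sub in sorted(Path(containers_dir).iterdir()):
        config = sub / "config.v2.json"
        if not config.is_file():
            continue  # not a container directory, or a different layout
        # A plain substring search, equivalent to grepping the file.
        if needle in config.read_text(errors="ignore"):
            ids.append(sub.name)
    return ids


def build_stop_command(ids):
    """Build one `docker stop` invocation covering every ID."""
    return ["docker", "stop", *ids]


def format_command(argv):
    """Render the command as a shell-safe string, ready to paste."""
    return shlex.join(argv)
```

Running format_command(build_stop_command(find_container_ids("/var/lib/docker/containers"))) as root prints a single line that can be saved before the second reboot and pasted as soon as the containers start deploying. The sketch only builds the command and does not call Docker itself, so it still works while the daemon is down. Since it is a plain substring match, look over the ID list before using it to make sure nothing other than watchtower got caught.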

I’m not sure I trust the use of Storj watchtower any longer…