Just thought I’d bring this to the front. I have 3 nodes running on Synology boxes, and sometime between Monday and Wednesday of this past week something happened on all three boxes. Of course the 2-year-old 10 TB node would be the one to go first, and it was disqualified completely off the network. After I got the emails, I logged in to check it and Docker was completely unresponsive. I rebooted the system, removed the container and recreated it, and everything was working again, but of course the node was already disqualified. There was almost no downtime on the node, so I don’t know what was going on. By then I was in a slight panic, so I checked the other systems, and all were unresponsive. I had to reboot them, but the containers seemed to be working. Looking through the logs I couldn’t find anything because they had been cleared the day before. These systems have always been solid, and I’m unsure whether something happened with Watchtower or Docker. What concerned me most was that all 3 systems had the same issue with Docker being unresponsive. So far the other nodes are fine, with almost no downtime as well.
Nothing to do here but I just thought I’d report this. Never had this happen before and I have several nodes.
Did the Synology NAS update itself? It may have also updated Docker if you had installed it from the Synology app store. It just seems possible that it was some kind of Synology-specific issue.
Can’t add much here other than to say I’m also running nodes on Synology and didn’t have any issues. Did you already update to DSM 7? That update hasn’t been released for my model yet, so we may not have been in exactly the same boat.
That said, I intentionally have automatic DSM updates as well as Docker updates switched off to avoid issues.
I had a similar problem last year. I suddenly could not reach the node dashboard, or even the Synology page. I tried via SSH to check what was blocking it, but even after 2 hours I found nothing and hard rebooted the Syno. And yes, for this downtime I was DQd on one satellite, luckily not the whole node.
If the system was unresponsive, then I think it’s more likely a case of the node responding to the audit but being unable to send the requested data. This can happen on systems that are nearly completely halted. I had a Synology before the one I have now that eventually broke completely, but I remember that when that thing ran out of RAM, it basically ground to a halt and I had to hard reset it to pull it out of that. Which is why I bought one with much more RAM to replace it. So keep an eye on RAM and IO. I think those are the most likely culprits of such stalls.
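For anyone who wants to keep an eye on RAM from a script, here is a minimal sketch. It assumes Python 3 is available on the box and that the system exposes the usual Linux /proc/meminfo file. The function names and the 10% threshold are my own placeholders, not anything from Synology or the node software.

```python
# Sketch only: parses the text of Linux's /proc/meminfo and flags low memory.
# The 10% threshold and the function names are assumptions for illustration.

def parse_meminfo(text):
    """Return a dict mapping /proc/meminfo field names to their numeric values."""
    fields = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        parts = rest.split()
        # Keep only well-formed "Name:   12345 kB" style lines.
        if sep and parts and parts[0].isdigit():
            fields[name.strip()] = int(parts[0])
    return fields


def memory_is_low(text, min_available_fraction=0.10):
    """True when available memory is below the given fraction of MemTotal."""
    fields = parse_meminfo(text)
    total = fields["MemTotal"]
    # MemAvailable is missing on very old kernels; fall back to MemFree.
    available = fields.get("MemAvailable", fields.get("MemFree", 0))
    return available / total < min_available_fraction


def check_local_memory(path="/proc/meminfo"):
    """Read the live file on this machine and report whether memory is low."""
    with open(path) as handle:
        return memory_is_low(handle.read())
```

Running check_local_memory() from a scheduled task and sending yourself a notification when it returns True would at least give a warning before the box stalls. It does not cover IO, which would need a separate check.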
Ya know, I had seen it run low on RAM before. It’s a DS216+ with only 1 GB of RAM, so I’ll bring it up to 4 GB.
That still doesn’t explain why the others were freaking out, though. I was looking through the system logs, and Docker had unexpectedly stopped a few days before but restarted. Also, the Watchtower log on the DS216 (the box that was DQd) reported this yesterday, 8/12 at 1AM: level=info msg="Waiting for running update to be finished…"
The DS416 at 15:26 and the DS918 at 15:27 got the same message. The DS216 started getting the DQ emails at about 11:15PM on the 11th, which was right before that Watchtower log message. I most probably would have lost all 3 nodes had I not rebooted…
My QNAP got infected by ransomware recently, and this also destroyed some nodes. The whole system was unresponsive like you described… Maybe you have a similar security issue?
There have been some reports about new botnets targeting Synology devices specifically. But from what I’ve read, they were all based on brute forcing common admin passwords. I tend to assume people running nodes use secure passwords, but it might be worth a mention anyway. It is curious that multiple devices ran into similar issues at the same time. Did they share passwords? It might be a good idea to change that if so and enable 2FA. Also make sure you don’t expose SSH to the public internet.
They do have the same passwords, and I will change that. However, everything is on way non-standard ports, so I’d think they’d have a problem finding them. Anyway, it was sad to lose such an aged node, but hey, I have several others. Again, I appreciate all your help and suggestions, and I’ll change the passwords to be unique.
Non-standard ports are trivial to get around if you have a botnet attacking, since finding the open port is itself just a brute-force scan. You really need to rely on long random passwords and 2FA. There is also an automatic banning feature built into DSM, which I recommend you use.
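If you want a quick way to get long random passwords that differ per box, something like the sketch below would do. It only uses Python's standard secrets module. The 24-character length, the symbol set, and the function names are my own assumptions, and a password manager's generator works just as well.

```python
import secrets
import string

# Sketch only: the 24-character default and this alphabet are assumptions.
ALPHABET = string.ascii_letters + string.digits + "-_!@#%^&*"


def random_password(length=24):
    """Build a password from a cryptographically secure random source."""
    return "".join(secrets.choice(ALPHABET) for _ in range(length))


def unique_passwords(device_names, length=24):
    """Return one independently generated password per device name."""
    return {name: random_password(length) for name in device_names}
```

Calling unique_passwords(["DS216", "DS416", "DS918"]) gives a separate password for each device, so a leak on one box does not open the others.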