Synology and Docker issues

Hi All,

Just thought I’d bring this to the front. I have 3 nodes running on Synology boxes, and sometime between Monday and Wednesday this week something happened on all three. Of course the 2-year-old 10 TB node was the one to go first; it got disqualified completely off the network. After I got the emails I logged in to check it, and Docker was completely unresponsive. I rebooted the system, removed the container, and recreated it (roughly the sequence below), and everything was working again, but of course the node was already disqualified. There was almost no downtime on the node, so I don’t know what was going on.

Then, in a slight panic, I checked the other systems, and they were all unresponsive too. I had to reboot them, but their containers seemed to be working. Looking through the logs I couldn’t find anything, because they had been cleared the day before. These systems have always been solid, and I’m unsure whether something happened with Watchtower or Docker. What concerned me most was that all 3 systems had the same issue with Docker being unresponsive. So far the other nodes are fine, with almost no downtime either.
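For reference, the remove-and-recreate was roughly the standard sequence from the Storj docs; the wallet, email, address, storage size, container name, and paths below are placeholders rather than my real values:

```
# Stop and remove the old container; the identity and stored data live on the
# bind mounts, not inside the container, so they survive this.
docker stop -t 300 storagenode
docker rm storagenode

# Pull the current image and recreate the container with the same ports and mounts as before.
docker pull storjlabs/storagenode:latest
docker run -d --restart unless-stopped --stop-timeout 300 \
    -p 28967:28967 -p 14002:14002 \
    -e WALLET="0x0000000000000000000000000000000000000000" \
    -e EMAIL="you@example.com" \
    -e ADDRESS="your.ddns.hostname:28967" \
    -e STORAGE="10TB" \
    --mount type=bind,source=/volume1/storj/identity,destination=/app/identity \
    --mount type=bind,source=/volume1/storj/data,destination=/app/config \
    --name storagenode storjlabs/storagenode:latest
```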

Nothing to do here but I just thought I’d report this. Never had this happen before and I have several nodes.


Did the Synology NAS update itself? It may have also updated Docker if you had installed it from the Synology app store. It just seems possible that it was some kind of Synology-specific issue.

Can’t add much here other than to say I’m also running nodes on Synology and didn’t have any issues. Did you already update to DSM7? That update hasn’t been released for my model yet, so we may not have been in exactly the same boat.

I have automatic DSM updates as well as Docker updates switched off intentionally to avoid issues, though.

If you updated to DSM 7.0, that may have caused some issues, because I’ve had nothing but issues with it so far…

I do have auto update turned off but I updated Docker at the end of July. I haven’t installed DSM7.

Definitely odd. Hopefully everything keeps chugging along on the 2 remaining nodes, and I’ll reload the one that got DQd…

I had a similar problem last year. I suddenly could not reach the node dashboard, or even the Synology page. I tried via SSH to check what was blocking it, but even after 2 hours I found nothing and hard-rebooted the Syno. And yes, for that downtime I was DQd on one satellite; luckily not the whole node.

Lucky you… I lost every sat in under 6 hours… I ran the earnings script and there was no reported downtime, but the audits failed simultaneously.

Sounds like you might have had a permission issue.

If the system was unresponsive, then I think it’s more likely a case of the node responding to the audit but being unable to send the requested data in time. This can happen on systems that have nearly ground to a halt. I had a Synology before the one I have now that eventually broke completely, but I remember that when that thing ran out of RAM it basically ground to a halt and I had to hard-reset it to pull it out of that. Which is why I replaced it with one that has much more RAM. So keep an eye on RAM and I/O; I think those are the most likely culprits of such stalls.
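If it helps, something along these lines over SSH gives a quick picture. Exact tool availability differs a bit between DSM versions, and the container name is whatever you used (I’m assuming the usual “storagenode”), so treat this as a rough starting point:

```
# Memory pressure: how much is actually available and whether swap is nearly exhausted
cat /proc/meminfo | grep -E 'MemTotal|MemAvailable|SwapFree'

# Per-container CPU and memory usage (only works while Docker itself is still responsive)
docker stats --no-stream

# The CPU summary line from top shows how much time is being spent waiting on I/O
top -bn1 | head -n 5
```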


Ya know, I had seen it run low on RAM before. It’s a DS216+ with only 1 GB of RAM, so I’ll bring it up to 4 GB.

Still doesn’t say why the others were freaking out, though. I was looking through the system logs and Docker had unexpectedly stopped a few days before, but it restarted. Also, the Watchtower log on the DS216 (the box that got DQd) reported this yesterday, 8/12 at 1 AM: level=info msg="Waiting for running update to be finished…"

The DS416 got the same message at 15:26 and the DS918 at 15:27. The DS216 started getting the DQ emails at about 11:15 PM on the 11th, which was right before that Watchtower log message. I’d most likely have lost all 3 nodes had I not rebooted…
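For reference, I pulled those entries with roughly this; my Watchtower container is just named “watchtower” per the standard setup, so adjust the names and time window if yours differ:

```
# Recent Watchtower output, filtered to update-related lines
docker logs watchtower --since 48h 2>&1 | grep -iE 'waiting|update'

# Same idea for the node itself, to see what it logged around that time
docker logs storagenode --since 48h 2>&1 | tail -n 100
```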

Are those Watchtower messages normal?

My QNAP got infected by ransomware recently, and that destroyed some nodes as well. The whole system was unresponsive, like you described… Maybe you have a similar security issue?

There have been some reports about new botnets co-opting Synology devices specifically, but from what I’ve read they were all based on brute-forcing common admin passwords. I tend to assume people running nodes use secure passwords, but it might be worth a mention anyway. It is curious that multiple devices ran into similar issues at the same time. Did they share passwords? If so, it might be a good idea to change that and enable 2FA. Also make sure you don’t expose SSH to the public internet.

They do have the same passwords, and I will change that. However, everything is on very non-standard ports, so I’d think they’d have a problem finding them. Anyway, it was sad to lose such an aged node, but hey, I have several others. Again, I appreciate all your help and suggestions, and I’ll change the passwords to be unique.

Non-standard ports are trivial to get around if you have a botnet attacking; that’s just brute-forceable. You really need to rely on long, random passwords and 2FA. There is also an automatic banning feature built in, which I recommend you use.
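If you want a quick sanity check over SSH, something like this shows what is actually listening and whether password guessing is hitting the box. Log paths and tool options vary a bit between DSM versions, so treat it as a rough sketch; the auto-block setting itself is in DSM under Control Panel > Security:

```
# Which services are listening, and on which ports
sudo netstat -tlnp 2>/dev/null || netstat -tln

# Count recent failed SSH logins (log location may differ on your DSM version)
sudo grep -ci 'failed password' /var/log/auth.log 2>/dev/null
```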
