I have 2 nodes on a Synology DS216+ with 1GB of RAM; 4TB occupied out of 7TB on each node.
Since yesterday the nodes keep dying, but curiously UptimeRobot still shows them as online. I even got the satellite offline emails.
The NAS itself is working fine and is responsive.
I don’t run the startup filewalker, and I exited Saltlake.
My guess is that the new US1 tests are to blame, because they put too much pressure on the machine.
I haven’t checked the logs yet; I’ll come back with more details when I find the time to look into them.
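When I do, something like this should be enough to pull out just the errors (assuming the default container name; adjust it if yours differs):
docker logs storagenode 2>&1 | grep -E "ERROR|FATAL" | tail -n 50   # container name is an assumption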
I saw test traffic spike about 3.5h ago (as of this post)… but things were otherwise idle for the 16h before that. If test traffic is impacting your nodes, I would only expect it in those last 3.5h.
I had one node (of several) bonk out this morning: dashboard down, got the email about it being offline, but the docker logs didn’t show anything; they just… stopped having new entries a few hours ago.
I crash my nodes often, so I didn’t think much of it, but it may have resembled yours. (I had plenty of RAM, btw.)
My log is full of mumbo jumbo… a huge wall of messages I can’t make anything of.
Please post the beginning of the log entry, starting from the timestamp and ending at the first \n in the message.
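For example, something like this keeps only the part of each error entry before the first literal \n (container name assumed to be storagenode):
docker logs storagenode 2>&1 | grep ERROR | tail -n 20 | sed 's/\\n.*//'   # cuts each line at the first literal \n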
I have the Synology DS218+ and was able to install 16GB of RAM, way over the limit Synology specifies as the maximum. 16GB is the processor’s maximum addressable memory.
The DS216+ has only one memory slot, soldered. So no upgrade options there.
The errors start with a readability timeout set to 2 min… so yeah, the system is overstressed.
I will try trimming the log lines to fit them here, as Alexey suggested, to get an idea of what’s happening.
But as far as I remember, drive timeouts should not crash the node anymore. Strange. Maybe other things are failing too.
What was the filesystem there? Did you enable the badger cache?
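If you’re not sure, the badger cache has to be enabled explicitly; a quick way to check (the config path is just an example and the option name is from memory, so please verify) would be:
grep -i "file-stat-cache" /volume1/storj/config.yaml   # badger is on only if this shows something like: pieces.file-stat-cache: badger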
I do not think so.
If the memory is soldered, you don’t have the Plus version but the regular DS216, which is a system on a chip.
You can install up to 8GB.
And here is a how-to in pictures.
ext4 filesystem, noatime, no RAID, no badger cache, no startup filewalker, version 1.109.x.
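For reference, this is roughly how I’d double-check that setup over SSH (paths are examples and the option name is from memory, so verify against your own config.yaml):
mount | grep volume1                                      # should list ext4 with the noatime option
grep piece-scan-on-startup /volume1/storj/config.yaml     # storage2.piece-scan-on-startup: false disables the startup filewalker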
Nice guide! I really can’t remember why I believed the RAM was soldered.
Maybe I thought it was like the newer models, with one slot soldered and one free.
Now, the hard part… where to find DDR3-compatible memory?
lol, you can get DDR3 SODIMMs really cheap on eBay or your local equivalent. The guide even had an Amazon link, but $18 seems overpriced.
Actually, I may have one in my drawer, but shipping from me to you might exceed the cost of buying one from a regular seller.
This is the last part of the log from when the node crashes; I marked the longest lines with * and did some trimming. I see some errors linked to buffers that seem to start the cascade; there’s a quick check for that sketched right after the log.
2024-08-14T17:00:08Z ERROR nodestats:cache Get stats query failed {"Process": "storagenode", "error": "nodestats: rpc: tcp connector failed: rpc: dial tcp 34.94.153.46:7777: connect: no buffer space available; nodestats: rpc: tcp connector failed: rpc: dial tcp 34.126.92.94:7777: connect: no buffer space available; nodestats: rpc: tcp connector failed: rpc: dial tcp 34.150.199.48:7777: connect: no buffer space available; nodestats: rpc: tcp connector failed: rpc: dial tcp 34.159.134.91:7777: connect: no buffer space available", "errorVerbose": "group:\n...
...
2024-08-14T17:23:37Z ERROR services unexpected shutdown of a runner {"Process": "storagenode", "name": "piecestore:monitor", "error": "piecestore monitor: timed out after 2m0s while verifying readability of storage directory", "errorVerbose": "piecestore monitor: timed out after 2m0s while verifying readability of storage directory\n...
2024-08-14T17:31:13Z WARN services service takes long to shutdown {"Process": "storagenode", "name": "trust"}
2024-08-14T17:31:12Z WARN services service takes long to shutdown {"Process": "storagenode", "name": "collector"}
2024-08-14T17:31:12Z WARN services service takes long to shutdown {"Process": "storagenode", "name": "console:endpoint"}
2024-08-14T17:31:14Z WARN services service takes long to shutdown {"Process": "storagenode", "name": "orders"}
2024-08-14T17:45:14Z WARN services service takes long to shutdown {"Process": "storagenode", "name": "nodestats:cache"}
2024-08-14T17:31:15Z WARN services service takes long to shutdown {"Process": "storagenode", "name": "bandwidth"}
2024-08-14T17:32:31Z WARN servers service takes long to shutdown {"Process": "storagenode", "name": "debug"}
2024-08-14T17:31:43Z WARN services service takes long to shutdown {"Process": "storagenode", "name": "piecestore:cache"}
2024-08-14T17:31:18Z WARN services service takes long to shutdown {"Process": "storagenode", "name": "contact:chore"}
2024-08-14T17:31:15Z WARN services service takes long to shutdown {"Process": "storagenode", "name": "forgetsatellite:chore"}
2024-08-14T17:31:17Z WARN services service takes long to shutdown {"Process": "storagenode", "name": "retain"}
2024-08-14T17:23:39Z ERROR contact:service ping satellite failed {"Process": "storagenode", "Satellite ID": "1wFTAgs9DP5RSnCqKV1eLf6N9wtk4EAtmN5DpSxcs8EjT69tGE", "attempts": 3, "error": "ping satellite: manager closed: closed: read tcp xxxx:54033->34.94.153.46:7777: read: connection reset by peer"...
2024-08-14T17:46:00Z WARN servers service takes long to shutdown {"Process": "storagenode", "name": "server"}
2024-08-14T17:31:11Z WARN services service takes long to shutdown {"Process": "storagenode", "name": "pieces:trash"}
2024-08-14T17:31:17Z WARN services service takes long to shutdown {"Process": "storagenode", "name": "version:chore"}
2024-08-14T17:31:16Z WARN services service takes long to shutdown {"Process": "storagenode", "name": "gracefulexit:chore"}
2024-08-14T17:56:12Z ERROR contact:service ping satellite failed...
2024-08-14T17:47:39Z ERROR contact:service ping satellite failed...
2024-08-14T18:00:21Z INFO contact:service context cancelled {"Process": "storagenode", "Satellite ID": "121RTSDpyNZVcEU84Ticf2L1ntiuUimbWgfATz21tuvgk3vzoA6"}
2024-08-14T18:00:20Z INFO contact:service context cancelled {"Process": "storagenode", "Satellite ID": "1wFTAgs9DP5RSnCqKV1eLf6N9wtk4EAtmN5DpSxcs8EjT69tGE"}
2024-08-14T18:00:21Z INFO contact:service context cancelled {"Process": "storagenode", "Satellite ID": "12L9ZFwhzVpuEKMUNUqkaTLGzwY9G24tbiigLiXpmZWKwmcNDDs"}
2024-08-14T17:29:00Z ERROR contact:service ping satellite failed...
2024-08-14T18:00:30Z INFO contact:service context cancelled {"Process": "storagenode", "Satellite ID": "12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S"}
2024-08-14T18:01:16Z ERROR gracefulexit:chore error retrieving satellites. {"Process": "storagenode", "error": "satellitesdb: context canceled", "errorVerbose": "satellitesdb: context canceled\n...
2024-08-14T18:01:22Z * INFO services slow shutdown {"Process": "storagenode", "stack": "goroutine 1025\n...
2024-08-14T18:01:16Z * INFO servers slow shutdown {"Process": "storagenode", "stack": "goroutine 1021\n...
2024-08-14T18:25:34Z INFO orders.121RTSDpyNZVcEU84Ticf2L1ntiuUimbWgfATz21tuvgk3vzoA6 sending {"Process": "storagenode", "count": 408}
2024-08-14T18:25:34Z INFO orders.12L9ZFwhzVpuEKMUNUqkaTLGzwY9G24tbiigLiXpmZWKwmcNDDs sending {"Process": "storagenode", "count": 1499}
2024-08-14T18:25:34Z INFO orders.12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S sending {"Process": "storagenode", "count": 11265}
2024-08-14T18:25:40Z INFO orders.12L9ZFwhzVpuEKMUNUqkaTLGzwY9G24tbiigLiXpmZWKwmcNDDs finished {"Process": "storagenode"}
2024-08-14T18:25:40Z INFO orders.12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S finished {"Process": "storagenode"}
2024-08-14T18:25:41Z INFO orders.121RTSDpyNZVcEU84Ticf2L1ntiuUimbWgfATz21tuvgk3vzoA6 finished {"Process": "storagenode"}
2024-08-14T18:25:40Z ERROR orders.12L9ZFwhzVpuEKMUNUqkaTLGzwY9G24tbiigLiXpmZWKwmcNDDs failed to settle orders for satellite {"Process": "storagenode", "satellite ID": "12L9ZFwhzVpuEKMUNUqkaTLGzwY9G24tbiigLiXpmZWKwmcNDDs", "error": "order: failed to start settlement: rpc: tcp connector failed: rpc: dial tcp: lookup eu1.storj.io: operation was canceled", "errorVerbose": "order: failed to start settlement: rpc: tcp connector failed: rpc: dial tcp: lookup eu1.storj.io: operation was canceled\n...
2024-08-14T18:25:41Z ERROR orders.121RTSDpyNZVcEU84Ticf2L1ntiuUimbWgfATz21tuvgk3vzoA6 failed to settle orders for satellite {"Process": "storagenode", "satellite ID": "121RTSDpyNZVcEU84Ticf2L1ntiuUimbWgfATz21tuvgk3vzoA6", "error": "order: failed to start settlement: rpc: tcp connector failed: rpc: dial tcp: lookup ap1.storj.io: operation was canceled", "errorVerbose": "order: failed to start settlement: rpc: tcp connector failed: rpc: dial tcp: lookup ap1.storj.io: operation was canceled\n...
2024-08-14T18:25:40Z ERROR orders.12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S failed to settle orders for satellite {"Process": "storagenode", "satellite ID": "12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S", "error": "order: failed to start settlement: rpc: tcp connector failed: rpc: dial tcp: lookup us1.storj.io: operation was canceled", "errorVerbose": "order: failed to start settlement: rpc: tcp connector failed: rpc: dial tcp: lookup us1.storj.io: operation was canceled\n...
2024-08-14T18:53:56Z INFO Downloading versions. {"Process": "storagenode-updater", "Server Address": "https://version.storj.io"}
2024-08-14T18:53:58Z INFO Current binary version {"Process": "storagenode-updater", "Service": "storagenode", "Version": "v1.109.2"}
2024-08-14T18:53:58Z INFO New version is being rolled out but hasn't made it to this node yet {"Process": "storagenode-updater", "Service": "storagenode"}
2024-08-14T18:53:58Z INFO Current binary version {"Process": "storagenode-updater", "Service": "storagenode-updater", "Version": "v1.109.2"}
2024-08-14T18:53:58Z INFO New version is being rolled out but hasn't made it to this node yet {"Process": "storagenode-updater", "Service": "storagenode-updater"}
2024-08-14 22:46:06,353 WARN received SIGTERM indicating exit request
2024-08-14 22:46:06,384 INFO waiting for storagenode, processes-exit-eventlistener, storagenode-updater to die
2024-08-14T22:46:06Z INFO Got a signal from the OS: "terminated" {"Process": "storagenode-updater"}
....
2024-08-15T09:42:17Z ERROR services unexpected shutdown of a runner {"Process": "storagenode", "name": "piecestore:monitor", "error": "piecestore monitor: timed out after 2m0s while verifying readability of storage directory", "errorVerbose": "piecestore monitor: timed out after 2m0s while verifying readability of storage directory\n...
2024-08-15T09:47:24Z WARN services service takes long to shutdown {"Process": "storagenode", "name": "orders"}
2024-08-15T09:48:22Z WARN services service takes long to shutdown {"Process": "storagenode", "name": "retain"}
2024-08-15T09:47:16Z WARN services service takes long to shutdown {"Process": "storagenode", "name": "piecestore:cache"}
2024-08-15T09:47:10Z WARN services service takes long to shutdown {"Process": "storagenode", "name": "bandwidth"}
2024-08-15T09:47:24Z WARN services service takes long to shutdown {"Process": "storagenode", "name": "gracefulexit:chore"}
2024-08-15T09:48:16Z WARN services service takes long to shutdown {"Process": "storagenode", "name": "nodestats:cache"}
2024-08-15T09:47:10Z WARN services service takes long to shutdown {"Process": "storagenode", "name": "trust"}
2024-08-15T09:47:58Z WARN services service takes long to shutdown {"Process": "storagenode", "name": "collector"}
2024-08-15T09:48:15Z WARN servers service takes long to shutdown {"Process": "storagenode", "name": "debug"}
2024-08-15T09:48:07Z WARN services service takes long to shutdown {"Process": "storagenode", "name": "forgetsatellite:chore"}
2024-08-15T09:47:10Z WARN services service takes long to shutdown {"Process": "storagenode", "name": "pieces:trash"}
2024-08-15T09:48:15Z WARN servers service takes long to shutdown {"Process": "storagenode", "name": "server"}
2024-08-15T09:47:16Z WARN services service takes long to shutdown {"Process": "storagenode", "name": "console:endpoint"}
2024-08-15T09:48:16Z WARN services service takes long to shutdown {"Process": "storagenode", "name": "contact:chore"}
2024-08-15T09:48:22Z WARN services service takes long to shutdown {"Process": "storagenode", "name": "version:chore"}
2024-08-15T10:15:21Z * ERROR gracefulexit:chore error retrieving satellites. {"Process": "storagenode", "error": "satellitesdb: context canceled", "errorVerbose": "satellitesdb: context canceled\n...
2024-08-15T10:15:56Z * INFO servers slow shutdown {"Process": "storagenode", "stack": "goroutine 1034\n\tstorj.io/storj/private/lifecycle.(*Group).logStackTrace.func1:107\n\tsync.(*Once).doSlow:74\n...
2024-08-15T10:16:14Z * INFO services slow shutdown {"Process": "storagenode", "stack": "goroutine 1088\n\tstorj.io/storj/private/lifecycle.(*Group).logStackTrace.func1:107\n\tsync.(*Once).doSlow:74\n...
2024-08-15T10:49:10Z INFO Downloading versions. {"Process": "storagenode-updater", "Server Address": "https://version.storj.io"}
2024-08-15T10:51:32Z ERROR Error retrieving version info. {"Process": "storagenode-updater", "error": "version checker client: Get \"https://version.storj.io\": context deadline exceeded (Client.Timeout exceeded while awaiting headers)", "errorVerbose": "version checker client: Get \"https://version.storj.io\": context deadline exceeded (Client.Timeout exceeded while awaiting headers)\n...
2024-08-15 11:08:27,994 INFO exited: storagenode (terminated by SIGKILL; not expected)
2024-08-15 11:08:29,915 INFO spawned: 'storagenode' with pid 211
2024-08-15 11:08:30,377 WARN received SIGQUIT indicating exit request
2024-08-15 11:08:30,379 INFO waiting for storagenode, processes-exit-eventlistener, storagenode-updater to die
2024-08-15 11:08:31,380 INFO success: storagenode entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2024-08-15 11:08:33,383 INFO waiting for storagenode, processes-exit-eventlistener, storagenode-updater to die
2024-08-15T11:08:33Z INFO Got a signal from the OS: "terminated" {"Process": "storagenode-updater"}
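About those “no buffer space available” errors: a rough way to check whether the box is running out of sockets or memory while the node is in that state (run over SSH; not every tool may be present on DSM) would be something like:
cat /proc/net/sockstat                # sockets in use and TCP memory
netstat -an | grep -c ESTABLISHED     # rough count of open connections
free -m                               # how much of the 1GB is actually left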
If you live in the UK you can try https://www.mrmemory.co.uk
Alternatively, I’m sure eBay can be your friend
It seems you need to increase the timeout. All the other messages are a result of the shutdown process. However, even stopping takes too much time, and the processes get killed.
I increased the intervals to 60 min and timeouts to 10 min.
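Roughly these entries in config.yaml, if I copied the names right (worth verifying against your own file), followed by a container restart:
storage2.monitor.verify-dir-readable-interval: 1h0m0s   # option names from memory, please double-check
storage2.monitor.verify-dir-readable-timeout: 10m0s
storage2.monitor.verify-dir-writable-interval: 1h0m0s
storage2.monitor.verify-dir-writable-timeout: 10m0s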
I bought 8 GB of RAM; let’s hope it helps.
10 minutes is too long; your node may not detect a real problem in time, and the node could get disqualified.
Just upgraded the DS216+ to 8GB of Crucial memory, the model from the guide and the nascompares site. It works!