1GB RAM is not enough anymore

I have 2 nodes on a Syno 216+ with 1GB RAM, 4TB occupied out of 7TB on each node.
Since yesterday, the nodes keep dying, but curiously, UptimeRobot sees them as online. I even got the Satellite offline emails.
The NAS is working fine and is responsive.
I don’t run the startup filewalker. I quit Saltlake.
I guess the new US1 tests are to blame, because they put too much pressure on the machine.
I haven’t checked the logs yet; I will come back with more details when I find the time to look into them.

I saw test traffic spike about 3.5h ago (from when I posted this)… but things were otherwise idle for the 16h before that. If test traffic is impacting your nodes, I would only expect it in those last 3.5h.

I had one node (of several) bonk out this morning. Dashboard down, got the email about it being offline, but docker logs didn’t show anything; they just… stopped having new log entries a few hours earlier.

I crash my nodes often, so I didn’t think much of it, but it may have resembled yours. (I had a bunch of RAM, btw.)
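
With only 1GB of RAM, one thing worth checking in a case like this is whether Docker killed the container for running out of memory. Something along these lines (a rough sketch, assuming the container is named storagenode; adjust to your setup):

# Is the container still running, and was it OOM-killed the last time it stopped?
docker ps -a --filter name=storagenode
docker inspect -f 'status={{.State.Status}} oom-killed={{.State.OOMKilled}} exit-code={{.State.ExitCode}}' storagenode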

My log is full of mumbo jumbo… a huge wall of messages from which I can’t understand anything.

Please post the beginning of the log entry, starting from a timestamp and ending with the first \n in the message.
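
Something like this should do the trimming (a rough sketch; adjust the container name and character count to taste):

# Show the most recent ERROR entries, cut down to the first 200 characters of each line
docker logs storagenode 2>&1 | grep ERROR | cut -c-200 | tail -n 50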

I have the Synology DS218+ and was able to install 16GB of RAM, way over the limit that Synology specified as the maximum. 16GB is the maximum addressable memory specified for the processor.


The DS216+ has only one memory slot, soldered. So no upgrade options there.
The errors start with a readability timeout set to 2 min… so yeah, the system is overstressed.
I will try trimming the log lines to fit them here, as Alexey said, to get an idea of what’s happening.
But as I remember, drive timeouts should not crash the node anymore. Strange. Maybe other things are failing too.


What was the filesystem there? Did you enable the badger cache?

I do not think so.
If the memory is soldered, you don’t have a Plus version but the regular 216 version, which is a system on a chip.
You can install up to 8GB.

And here is a how-to in pictures


ext4 fs, noatime, no RAID, no badger, no startup FW, ver 109.x.
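
In docker run terms that roughly corresponds to the following (a sketch from memory, so double-check the option names against your config.yaml):

# startup piece scan (filewalker) disabled; badger cache left at its default (off)
docker run -d --name storagenode ... storjlabs/storagenode:latest \
  --storage2.piece-scan-on-startup=false
# enabling badger would presumably have been something like --pieces.file-stat-cache=badger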

Nice guide! I really can’t remember why I believed that the RAM was soldered.
Maybe I thought it was like the newer models, with one module soldered and one slot free.
Now, the hard part… where to find DDR3-compatible memory? :thinking:

lol, you can get DDR3 SODIMMs really cheap on eBay or your local sale equivalent. The guide even had an Amazon link, but $18 seems overpriced.

Actually, I may have one in my drawer, but the shipping from me to you might exceed the cost of buying one from a regular seller.

This is the last part of the log from when the node crashes; I marked the longest lines with * and did some trimming. I see some errors linked to buffers that start the cascade.

2024-08-14T17:00:08Z	ERROR	nodestats:cache	Get stats query failed	{"Process": "storagenode", "error": "nodestats: rpc: tcp connector failed: rpc: dial tcp 34.94.153.46:7777: connect: no buffer space available; nodestats: rpc: tcp connector failed: rpc: dial tcp 34.126.92.94:7777: connect: no buffer space available; nodestats: rpc: tcp connector failed: rpc: dial tcp 34.150.199.48:7777: connect: no buffer space available; nodestats: rpc: tcp connector failed: rpc: dial tcp 34.159.134.91:7777: connect: no buffer space available", "errorVerbose": "group:\n...
...
2024-08-14T17:23:37Z	ERROR	services	unexpected shutdown of a runner	{"Process": "storagenode", "name": "piecestore:monitor", "error": "piecestore monitor: timed out after 2m0s while verifying readability of storage directory", "errorVerbose": "piecestore monitor: timed out after 2m0s while verifying readability of storage directory\n...
2024-08-14T17:31:13Z	WARN	services	service takes long to shutdown	{"Process": "storagenode", "name": "trust"}
2024-08-14T17:31:12Z	WARN	services	service takes long to shutdown	{"Process": "storagenode", "name": "collector"}
2024-08-14T17:31:12Z	WARN	services	service takes long to shutdown	{"Process": "storagenode", "name": "console:endpoint"}
2024-08-14T17:31:14Z	WARN	services	service takes long to shutdown	{"Process": "storagenode", "name": "orders"}
2024-08-14T17:45:14Z	WARN	services	service takes long to shutdown	{"Process": "storagenode", "name": "nodestats:cache"}
2024-08-14T17:31:15Z	WARN	services	service takes long to shutdown	{"Process": "storagenode", "name": "bandwidth"}
2024-08-14T17:32:31Z	WARN	servers	service takes long to shutdown	{"Process": "storagenode", "name": "debug"}
2024-08-14T17:31:43Z	WARN	services	service takes long to shutdown	{"Process": "storagenode", "name": "piecestore:cache"}
2024-08-14T17:31:18Z	WARN	services	service takes long to shutdown	{"Process": "storagenode", "name": "contact:chore"}
2024-08-14T17:31:15Z	WARN	services	service takes long to shutdown	{"Process": "storagenode", "name": "forgetsatellite:chore"}
2024-08-14T17:31:17Z	WARN	services	service takes long to shutdown	{"Process": "storagenode", "name": "retain"}
2024-08-14T17:23:39Z	ERROR	contact:service	ping satellite failed 	{"Process": "storagenode", "Satellite ID": "1wFTAgs9DP5RSnCqKV1eLf6N9wtk4EAtmN5DpSxcs8EjT69tGE", "attempts": 3, "error": "ping satellite: manager closed: closed: read tcp xxxx:54033->34.94.153.46:7777: read: connection reset by peer"...
2024-08-14T17:46:00Z	WARN	servers	service takes long to shutdown	{"Process": "storagenode", "name": "server"}
2024-08-14T17:31:11Z	WARN	services	service takes long to shutdown	{"Process": "storagenode", "name": "pieces:trash"}
2024-08-14T17:31:17Z	WARN	services	service takes long to shutdown	{"Process": "storagenode", "name": "version:chore"}
2024-08-14T17:31:16Z	WARN	services	service takes long to shutdown	{"Process": "storagenode", "name": "gracefulexit:chore"}
2024-08-14T17:56:12Z	ERROR	contact:service	ping satellite failed... 	
2024-08-14T17:47:39Z	ERROR	contact:service	ping satellite failed...
2024-08-14T18:00:21Z	INFO	contact:service	context cancelled	{"Process": "storagenode", "Satellite ID": "121RTSDpyNZVcEU84Ticf2L1ntiuUimbWgfATz21tuvgk3vzoA6"}
2024-08-14T18:00:20Z	INFO	contact:service	context cancelled	{"Process": "storagenode", "Satellite ID": "1wFTAgs9DP5RSnCqKV1eLf6N9wtk4EAtmN5DpSxcs8EjT69tGE"}
2024-08-14T18:00:21Z	INFO	contact:service	context cancelled	{"Process": "storagenode", "Satellite ID": "12L9ZFwhzVpuEKMUNUqkaTLGzwY9G24tbiigLiXpmZWKwmcNDDs"}
2024-08-14T17:29:00Z	ERROR	contact:service	ping satellite failed... 	
2024-08-14T18:00:30Z	INFO	contact:service	context cancelled	{"Process": "storagenode", "Satellite ID": "12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S"}
2024-08-14T18:01:16Z	ERROR	gracefulexit:chore	error retrieving satellites.	{"Process": "storagenode", "error": "satellitesdb: context canceled", "errorVerbose": "satellitesdb: context canceled\n...
2024-08-14T18:01:22Z *  INFO	services	slow shutdown	{"Process": "storagenode", "stack": "goroutine 1025\n...
2024-08-14T18:01:16Z *  INFO	servers	slow shutdown	{"Process": "storagenode", "stack": "goroutine 1021\n...
2024-08-14T18:25:34Z	INFO	orders.121RTSDpyNZVcEU84Ticf2L1ntiuUimbWgfATz21tuvgk3vzoA6	sending	{"Process": "storagenode", "count": 408}
2024-08-14T18:25:34Z	INFO	orders.12L9ZFwhzVpuEKMUNUqkaTLGzwY9G24tbiigLiXpmZWKwmcNDDs	sending	{"Process": "storagenode", "count": 1499}
2024-08-14T18:25:34Z	INFO	orders.12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S	sending	{"Process": "storagenode", "count": 11265}
2024-08-14T18:25:40Z	INFO	orders.12L9ZFwhzVpuEKMUNUqkaTLGzwY9G24tbiigLiXpmZWKwmcNDDs	finished	{"Process": "storagenode"}
2024-08-14T18:25:40Z	INFO	orders.12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S	finished	{"Process": "storagenode"}
2024-08-14T18:25:41Z	INFO	orders.121RTSDpyNZVcEU84Ticf2L1ntiuUimbWgfATz21tuvgk3vzoA6	finished	{"Process": "storagenode"}
2024-08-14T18:25:40Z	ERROR	orders.12L9ZFwhzVpuEKMUNUqkaTLGzwY9G24tbiigLiXpmZWKwmcNDDs	failed to settle orders for satellite	{"Process": "storagenode", "satellite ID": "12L9ZFwhzVpuEKMUNUqkaTLGzwY9G24tbiigLiXpmZWKwmcNDDs", "error": "order: failed to start settlement: rpc: tcp connector failed: rpc: dial tcp: lookup eu1.storj.io: operation was canceled", "errorVerbose": "order: failed to start settlement: rpc: tcp connector failed: rpc: dial tcp: lookup eu1.storj.io: operation was canceled\n...
2024-08-14T18:25:41Z	ERROR	orders.121RTSDpyNZVcEU84Ticf2L1ntiuUimbWgfATz21tuvgk3vzoA6	failed to settle orders for satellite	{"Process": "storagenode", "satellite ID": "121RTSDpyNZVcEU84Ticf2L1ntiuUimbWgfATz21tuvgk3vzoA6", "error": "order: failed to start settlement: rpc: tcp connector failed: rpc: dial tcp: lookup ap1.storj.io: operation was canceled", "errorVerbose": "order: failed to start settlement: rpc: tcp connector failed: rpc: dial tcp: lookup ap1.storj.io: operation was canceled\n...
2024-08-14T18:25:40Z	ERROR	orders.12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S	failed to settle orders for satellite	{"Process": "storagenode", "satellite ID": "12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S", "error": "order: failed to start settlement: rpc: tcp connector failed: rpc: dial tcp: lookup us1.storj.io: operation was canceled", "errorVerbose": "order: failed to start settlement: rpc: tcp connector failed: rpc: dial tcp: lookup us1.storj.io: operation was canceled\n...
2024-08-14T18:53:56Z	INFO	Downloading versions.	{"Process": "storagenode-updater", "Server Address": "https://version.storj.io"}
2024-08-14T18:53:58Z	INFO	Current binary version	{"Process": "storagenode-updater", "Service": "storagenode", "Version": "v1.109.2"}
2024-08-14T18:53:58Z	INFO	New version is being rolled out but hasn't made it to this node yet	{"Process": "storagenode-updater", "Service": "storagenode"}
2024-08-14T18:53:58Z	INFO	Current binary version	{"Process": "storagenode-updater", "Service": "storagenode-updater", "Version": "v1.109.2"}
2024-08-14T18:53:58Z	INFO	New version is being rolled out but hasn't made it to this node yet	{"Process": "storagenode-updater", "Service": "storagenode-updater"}
2024-08-14 22:46:06,353 WARN received SIGTERM indicating exit request
2024-08-14 22:46:06,384 INFO waiting for storagenode, processes-exit-eventlistener, storagenode-updater to die
2024-08-14T22:46:06Z	INFO	Got a signal from the OS: "terminated"	{"Process": "storagenode-updater"}
....
2024-08-15T09:42:17Z	ERROR	services	unexpected shutdown of a runner	{"Process": "storagenode", "name": "piecestore:monitor", "error": "piecestore monitor: timed out after 2m0s while verifying readability of storage directory", "errorVerbose": "piecestore monitor: timed out after 2m0s while verifying readability of storage directory\n...
2024-08-15T09:47:24Z	WARN	services	service takes long to shutdown	{"Process": "storagenode", "name": "orders"}
2024-08-15T09:48:22Z	WARN	services	service takes long to shutdown	{"Process": "storagenode", "name": "retain"}
2024-08-15T09:47:16Z	WARN	services	service takes long to shutdown	{"Process": "storagenode", "name": "piecestore:cache"}
2024-08-15T09:47:10Z	WARN	services	service takes long to shutdown	{"Process": "storagenode", "name": "bandwidth"}
2024-08-15T09:47:24Z	WARN	services	service takes long to shutdown	{"Process": "storagenode", "name": "gracefulexit:chore"}
2024-08-15T09:48:16Z	WARN	services	service takes long to shutdown	{"Process": "storagenode", "name": "nodestats:cache"}
2024-08-15T09:47:10Z	WARN	services	service takes long to shutdown	{"Process": "storagenode", "name": "trust"}
2024-08-15T09:47:58Z	WARN	services	service takes long to shutdown	{"Process": "storagenode", "name": "collector"}
2024-08-15T09:48:15Z	WARN	servers	service takes long to shutdown	{"Process": "storagenode", "name": "debug"}
2024-08-15T09:48:07Z	WARN	services	service takes long to shutdown	{"Process": "storagenode", "name": "forgetsatellite:chore"}
2024-08-15T09:47:10Z	WARN	services	service takes long to shutdown	{"Process": "storagenode", "name": "pieces:trash"}
2024-08-15T09:48:15Z	WARN	servers	service takes long to shutdown	{"Process": "storagenode", "name": "server"}
2024-08-15T09:47:16Z	WARN	services	service takes long to shutdown	{"Process": "storagenode", "name": "console:endpoint"}
2024-08-15T09:48:16Z	WARN	services	service takes long to shutdown	{"Process": "storagenode", "name": "contact:chore"}
2024-08-15T09:48:22Z	WARN	services	service takes long to shutdown	{"Process": "storagenode", "name": "version:chore"}
2024-08-15T10:15:21Z *	ERROR	gracefulexit:chore	error retrieving satellites.	{"Process": "storagenode", "error": "satellitesdb: context canceled", "errorVerbose": "satellitesdb: context canceled\n...
2024-08-15T10:15:56Z *	INFO	servers	slow shutdown	{"Process": "storagenode", "stack": "goroutine 1034\n\tstorj.io/storj/private/lifecycle.(*Group).logStackTrace.func1:107\n\tsync.(*Once).doSlow:74\n...
2024-08-15T10:16:14Z *	INFO	services	slow shutdown	{"Process": "storagenode", "stack": "goroutine 1088\n\tstorj.io/storj/private/lifecycle.(*Group).logStackTrace.func1:107\n\tsync.(*Once).doSlow:74\n...
2024-08-15T10:49:10Z 	INFO	Downloading versions.	{"Process": "storagenode-updater", "Server Address": "https://version.storj.io"}
2024-08-15T10:51:32Z	ERROR	Error retrieving version info.	{"Process": "storagenode-updater", "error": "version checker client: Get \"https://version.storj.io\": context deadline exceeded (Client.Timeout exceeded while awaiting headers)", "errorVerbose": "version checker client: Get \"https://version.storj.io\": context deadline exceeded (Client.Timeout exceeded while awaiting headers)\n...
2024-08-15 11:08:27,994 INFO exited: storagenode (terminated by SIGKILL; not expected)
2024-08-15 11:08:29,915 INFO spawned: 'storagenode' with pid 211
2024-08-15 11:08:30,377 WARN received SIGQUIT indicating exit request
2024-08-15 11:08:30,379 INFO waiting for storagenode, processes-exit-eventlistener, storagenode-updater to die
2024-08-15 11:08:31,380 INFO success: storagenode entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2024-08-15 11:08:33,383 INFO waiting for storagenode, processes-exit-eventlistener, storagenode-updater to die
2024-08-15T11:08:33Z	INFO	Got a signal from the OS: "terminated"	{"Process": "storagenode-updater"}

If you live in the UK you can try https://www.mrmemory.co.uk

Alternatively, I’m sure eBay can be your friend :slight_smile:

It seems you need to increase the timeout. All the other messages are the result of the shutdown process. However, even stopping takes too much time, and the processes get killed.

I increased the intervals to 60 min and the timeouts to 10 min.
I bought 8GB of RAM; let’s hope it helps.
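
For reference, the settings I touched should be the storage directory monitor checks, roughly like this on the docker run command line (option names from memory; verify them against config.yaml before copying, and note the writable check has its own pair of options):

--storage2.monitor.verify-dir-readable-interval=1h0m0s \
--storage2.monitor.verify-dir-readable-timeout=10m0s \
--storage2.monitor.verify-dir-writable-interval=1h0m0s \
--storage2.monitor.verify-dir-writable-timeout=10m0s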

10 minutes is too long; your node may not detect a real problem in time and could be disqualified.

Just upgraded the DS216+ to 8GB of Crucial memory, the model from the guide and the nascompares site. It works!
