For a couple of days now, one node has been using a lot of memory, especially swap:
If I shut it down and restart it, it takes only a minute until it's back in this state.
What I did was enable the used-space filewalker and restart the node while it was running. Before that, the node worked normally, like my other nodes. Any ideas?
The node version is 1.114.6; this one includes the save-state feature for the used-space filewalker.
The DBs are on an SSD, the disk is dedicated, and it is NOT an SMR drive.
How can I check which part of the process is using all the RAM/swap?
However, if you end up using swap even for normal node operation, you should not run a node on that hardware: you don't have nearly enough free RAM (in fact, you have a negative amount), so the metadata can't be cached, your node won't be able to keep up, and it will keep getting killed by the OOM killer.
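One quick way to see whether the OOM killer has actually been firing is to check the kernel log. A minimal sketch (it may need root, and the exact wording of the kernel messages varies a bit between kernel versions):

```python
# Minimal sketch: scan the kernel log for OOM-killer activity.
# Assumes `dmesg -T` is available (util-linux); run with sudo if needed.
import subprocess

out = subprocess.run(["dmesg", "-T"], capture_output=True, text=True).stdout
for line in out.splitlines():
    lower = line.lower()
    if "out of memory" in lower or "oom-kill" in lower:
        print(line)
```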
The disk is not slow. It's a TOSHIBA 18 TB SATA3; I have several of them in other nodes with no problems at all.
It takes only 20 minutes after startup for the swap to fill up.
This node was running fine for 11 months. The “problem” started with a configuration change to STORJ_STORAGE2_PIECE_SCAN_ON_STARTUP (changed from false to true with version 1.114.6). Changing it back to false did not solve it. So I would suspect a change introduced with this version (persistence of piece-scan-on-startup was implemented).
Are there any other possibilities to analyze why this storagenode process eats up the swap? Does the debug endpoint give useful information here?
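If the debug endpoint is configured (via debug.addr), it should expose Go's standard pprof handlers, which at least separates what the Go heap is holding from what the OS is swapping out. A minimal sketch; the address below is an assumption about your setup:

```python
# Minimal sketch: dump the Go runtime memory stats from the storagenode
# debug endpoint, assuming it exposes the standard /debug/pprof/ handlers.
import urllib.request

DEBUG_ADDR = "127.0.0.1:5999"  # assumption: whatever you set for debug.addr

url = f"http://{DEBUG_ADDR}/debug/pprof/heap?debug=1"
with urllib.request.urlopen(url, timeout=10) as resp:
    text = resp.read().decode("utf-8", errors="replace")

# The text form of the heap profile ends with '#'-prefixed lines mirroring
# Go's runtime.MemStats (HeapAlloc, HeapInuse, Sys, NumGC, ...).
for line in text.splitlines():
    if line.startswith("#"):
        print(line)
```

Comparing those numbers with what top/htop reports for the process should help narrow down where the memory actually goes.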
BTW: my other nodes show the same RAM usage as yours, @Alexey.
This is really strange. If I restart the node, the process grabs as much RAM as it can get and the CPU is also crunching. After around one minute, you can see this picture:
So the CPU has finished its job and now the swapping kicks in. I would expect that the node is now moving all unused data to swap. And after a couple of hours you can see the picture from the start (low memory usage, but still massive swapping).
Any idea what runs at startup that could cause this kind of symptom? I still suspect the piece-scan-on-startup filewalker…
That seems logical: after the scan, the PC usually keeps the file addresses in memory, and since they are not needed it moves them to swap; after some time, I think, they will be replaced with more relevant information.
If it's a magnetic spinning disk, it's slow. Unless you have plenty of RAM to cache the metadata, what you see will keep happening. If the filewalker takes days, you need more RAM.
A spinning disk can support about 200 IOPS, give or take. Compare that with an SSD, which can sustain 40,000 to 500,000 IOPS.
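To put rough numbers on the "filewalker takes days" point, here's a back-of-the-envelope sketch; the piece count is an assumption, not a measurement of this particular node:

```python
# Back-of-the-envelope: how long a full metadata scan takes if every
# stat() has to hit the disk. Piece count and IOPS figures are assumptions.
pieces = 20_000_000  # assumption: rough piece count for a well-filled 18 TB node
for name, iops in (("HDD", 200), ("SSD", 100_000)):
    seconds = pieces / iops
    print(f"{name}: {seconds / 3600:.1f} h ({seconds / 86400:.1f} days)")
```

And that assumes the filewalker gets the disk to itself; with normal ingress/egress competing for the same 200 IOPS it stretches out much further.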
We’ve been through this conversation multiple times on this forum. No need to go over the same thing again.
Add 16-32 GB of RAM and/or an SSD cache for metadata. This is the only solution.
flo, is the slow node using the same storj release as the others?
I’ve seen nodes occasionally use a lot of RAM (per docker stats).
things that can increase RAM usage:
the disk not keeping up with incoming write requests (I have some NFS-mounted shares, so they can really fall behind)
running the used-space and garbage-collection filewalkers
using the badger file stat cache
how much RAM does docker stats show it using?
what is the load on the hard disk? (I use atop for a glance)
maybe let the node run until it finishes both the used-space filewalkers and the garbage-collection filewalkers (I grep the logs for the words “used” and “retain”, respectively; see the sketch below). For terabytes of data this could take days. Then see if the RAM usage has settled down.
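For that log check, a quick-and-dirty sketch; the log path is an assumption about your setup (adjust it, or dump docker logs to a file first):

```python
# Quick-and-dirty sketch: print filewalker-related log lines so you can see
# whether the used-space scan ("used") and garbage collection ("retain")
# have finished. LOG_PATH is an assumption; point it at your node's log.
LOG_PATH = "/mnt/storj/node.log"

with open(LOG_PATH, errors="replace") as f:
    for line in f:
        lower = line.lower()
        if "used" in lower or "retain" in lower:
            print(line.rstrip())
```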
But mine are on 1.114.6 too and the scan is enabled; lazy mode is on, except on one node (which uses more memory, by the way). That node also has the badger cache enabled, so the scan on startup is much faster for it.
So I suspect it's not the code change, but something that happened with that exact node.
Could you please check this disk for errors?
Apparently I missed something again
Are these the new requirements for running a node: 16-32 GB of RAM? @Alexey, is that correct: you need 32 GB of RAM and an SSD for the node to work?
In my humble opinion, Storj doesn't pay enough to allocate 32 GB of memory to each node, and SSDs on top of that, which degrade anyway.
These are not new requirements, and they are not specific to storage nodes; it was discussed before. It's a physical limitation of hard drive performance: if the combined traffic exceeds 200 IOPS, your HDD will choke and the node will eventually get killed. You need to offload IOPS so that no more than 200 ever hit the disk: move the databases to an SSD, disable sync and access-time updates, and have a lot of RAM so the metadata fits there. How much of that you need to do depends on the current traffic.
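For a sense of where the 16-32 GB figure comes from, here's a rough sketch; both numbers in it are assumptions, and the real per-entry cost varies by filesystem:

```python
# Rough sketch: RAM needed to keep all piece metadata cached so the
# filewalkers rarely have to touch the platter. Both figures are assumptions.
pieces = 20_000_000      # assumption: piece count on a well-filled node
bytes_per_entry = 1024   # assumption: inode + dentry + slack per piece

print(f"~{pieces * bytes_per_entry / 2**30:.0f} GiB just for hot piece metadata")
```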
How is this news to you?
Here we go again. If you don't already have a server with plenty of RAM and good performance, then unless you are OK with laggy performance for the other services you are hosting, you should not be running a node on that hardware: you don't have excess capacity to share, it's barely enough for yourself.
But the problem here is not the storagenode, it's your choking server. Add plenty of RAM and let the filesystem breathe. You will be rewarded with much snappier performance, and, look at that, the storagenode also works fine without any impact on the system.
On your SSD comment:
An SSD is already part of the array, because I like my users and don't want them to suffer through shit performance.
Everything degrades in life. I have never had a single SSD fail yet, and I buy exclusively cheap used enterprise SSDs with almost exhausted endurance. You'll be fine. Just don't buy garbage made for consumers; that's not specific to SSDs.
I think arrogantrabbit is perhaps making his case too strongly.
BUT, we did encounter situations over the summer where a simple hard drive setup couldn't keep up with Storj processing. The symptoms were things like disks filling up, incorrectly reported space, failing trash cleanups, etc., or at best these processing jobs taking days or weeks to run.
The causes of those were:
lots of test ingress and trash deletion
having a node size over a certain amount (around 8 TB, by my estimate).
under current (historically normal) load levels this isn’t really noticeable.
Anyway, the fix to these problems was to speed up hard-disk metadata access by offloading it to an SSD. Ways to do this included:
ZFS L2ARC
ZFS special device
Synology: pin btrfs metadata to SSD
LVM caching or metadata drives
something-something with Windows (PrimoCache? DrivePool?)
Virt usage is still high (13 GB). Currently the used-space filewalker is running.
I already did this once. It did not help.
It's currently high because I'm doing a second run with the used-space filewalker enabled, and I'll let the node idle for another 24 hours afterwards before disabling it again. Hope this helps.