Impact of memory limitation

Hi,

I’m running 4 nodes in a small Debian 11 VM (12 GB RAM) on my NAS. Each of them has a separate EXT4 drive passed through directly into the VM. Since the last update to 1.39.5 my Debian VM has been freezing quite often, and when I check the logs it’s always an out-of-memory issue.

I am now wondering whether I should limit the amount of memory that each Storj node can use, as described for the Raspberry Pi 3 here:

--memory=800m

My question is what impact this might have on my nodes. Will they simply use up to 800 MB and work fine, or will they crash if they need more than 800 MB?

My lxc container with docker in proxmox

The OOM killer will step in when the node tries to use more than the limit, so the node gets killed rather than throttled. This option is used on the Raspberry Pi 3 to prevent the system from freezing because of its small amount of RAM.
However, if your nodes have started to use more RAM, I would recommend checking your drives. The storagenode uses more RAM when the disk cannot keep up with the load.
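
For anyone who wants to try this, here is a minimal sketch of how the limit could be applied to an already running container, and how to check afterwards whether the OOM killer hit it. The container name `storagenode` and the `run` dry-run helper are assumptions for illustration only; `docker update --memory` and the `.State.OOMKilled` field of `docker inspect` are standard docker features. By default the script only prints the commands.

```shell
#!/bin/sh
# Dry-run helper (hypothetical): prints the docker command unless DRY_RUN=0.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "docker $*"
    else
        docker "$@"
    fi
}

# limit_node NAME LIMIT
# Sets the same value for memory and memory+swap, so the container
# cannot use swap on top of the limit.
limit_node() {
    run update --memory "$2" --memory-swap "$2" "$1"
}

# was_oom_killed NAME
# With DRY_RUN=0 this prints "true" if the container was OOM-killed.
was_oom_killed() {
    run inspect --format '{{.State.OOMKilled}}' "$1"
}
```

Usage would be something like `DRY_RUN=0 limit_node storagenode 800m`, and later `DRY_RUN=0 was_oom_killed storagenode` after an unexpected restart.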

Okay thank you for the info.
I think my CPU might be the bottleneck here. The drives should be fine. I will look for a stronger server now :slight_smile:

docker stats
CONTAINER ID   NAME           CPU %     MEM USAGE / LIMIT     MEM %     NET I/O           BLOCK I/O   PIDS
85c6575d8ef7   storagenode5   7.44%     84.87MiB / 24.81GiB   0.33%     5.51GB / 9.86GB   0B / 0B     19
977a4d89e098   storagenode2   4.82%     50.83MiB / 24.81GiB   0.20%     59.4GB / 28.3GB   0B / 0B     29
94501b4da091   watchtower     0.00%     13.26MiB / 24.81GiB   0.05%     14.1kB / 0B       0B / 0B     15

Get-Process | Group-Object -Property ProcessName | Where-Object {$_.Name -like "storagenode*"} | Format-Table Name, @{n='Mem (MiB)';e={'{0:N0}' -f (($_.Group|Measure-Object WorkingSet -Sum).Sum / 1MB)};a='right'} -AutoSize

Name                Mem (MiB)
----                --------
storagenode               57
storagenode-updater       17

Am I seeing correctly that you limit your Storj node container to 150 MB?

It looks similar for me after some time, but when I start the nodes and the filewalker runs, it uses much more memory…

That could be, but then the disk would have to be slow, or use a non-native filesystem, like NTFS under Linux, or btrfs/zfs with little RAM.

I literally did that a week ago, after seeing one of my 16 nodes knock out my home server for several hours by swapping heavily… for the third time in a month. This didn’t happen before, so I blame it on either a software update or maybe an unusual distribution of traffic, because none of the other nodes, even those hosted on the same file system as the swapping one, did so.

The faulty node is the only one with still free space, so it’s the only one that accepts ingress; it’s my newest node, so it has quite fresh data, and not much of it either, ~400 GB.

In my case I settled on 500m though. So far it works.

I wish there were a similar setting that would gracefully restart the node instead of invoking the OOM killer. But I was short on time and didn’t find a better solution quickly enough.

If you are using docker, you can disable the OOM killer for a node, as long as you specify a memory limit.

I use:

--memory=512M --oom-kill-disable

Be aware that you must make sure your host has enough memory to cover your docker containers, i.e. if you are on a Pi 4 with 4 GB, then run a maximum of 6 nodes (at 512 MB a node = 3 GB), leaving 1 GB for the kernel and cache. Disabling the OOM killer means the kernel could get killed, and that isn’t good :stuck_out_tongue:
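
The arithmetic above can be sketched as a tiny helper. The function name `max_nodes` and its argument order are made up for illustration; the numbers are the Pi 4 example from this post (4 GB total, 512 MB per node, 1 GB kept back for kernel and cache).

```shell
#!/bin/sh
# max_nodes TOTAL_MIB PER_NODE_MIB RESERVE_MIB
# How many containers with a hard limit of PER_NODE_MIB fit into the host
# while RESERVE_MIB stays free for the kernel and cache (integer division).
max_nodes() {
    total=$1
    per_node=$2
    reserve=$3
    echo $(( (total - reserve) / per_node ))
}

# Pi 4 with 4 GB, 512 MB per node, 1 GB reserved:
max_nodes 4096 512 1024   # prints 6
```

This only budgets the hard limits. It says nothing about whether a node actually runs well inside 512 MB.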

I would also say that over the last 5-6 storagenode releases I have seen an increase in memory usage. I used to be able to run a node in 256 MB but have needed to increase that to 512 MB. It seems to be linked to the expired-pieces delete job: whenever the node is deleting stuff, memory usage goes really high.

Yeah, I actually prefer to kill a node than the kernel ^^ This box has 2 GB of RAM, running all 16 nodes and some other minor services, which I value more than any single node. And, until recently, I didn’t see much swap usage.

If it turns out that in a few releases nodes will actually require more RAM, I’ll probably give up on Storj.

That’s a bit of overkill. The only thing that helps is that only one of them has free space. On 2 GB I would not run more than 4-6 nodes, especially if the system is used for something else or has a GUI.

However, it depends.
I have three nodes on one system, which is also used as a hypervisor. Two of them have free space, but I have not seen high memory usage:

CONTAINER ID   NAME           CPU %     MEM USAGE / LIMIT     MEM %     NET I/O           BLOCK I/O   PIDS
977a4d89e098   storagenode2   3.68%     52.72MiB / 24.81GiB   0.21%     65.9GB / 32GB     0B / 0B     29

and

Name        Mem (MiB)
----        ---------
storagenode        57

Only the Pi node uses more RAM, for some reason:

CONTAINER ID   NAME          CPU %     MEM USAGE / LIMIT     MEM %     NET I/O           BLOCK I/O   PIDS
63537427dc54   storagenode   0.05%     149.8MiB / 800MiB     18.73%    0B / 0B           0B / 0B     20

I am observing the same. My server always ran the nodes very well, but for the past 1-2 weeks I have frequently run into issues with the nodes needing too many resources…

Wow, how do they survive the filewalker? Are you running on SSDs?

Staged startup/restarts and I manually trigger updates, I’m not using watchtower. Only one node at a time scans the file system, and none of the nodes are bigger than 1TB.
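
A rough sketch of what a staged startup could look like as a script, in case it helps someone. This is not my exact setup: the container names, the fixed delay between starts and the dry-run default are all assumptions for illustration. A fixed delay does not guarantee that the previous node has finished scanning, so pick it generously.

```shell
#!/bin/sh
# staged_start DELAY_SECONDS NAME...
# Starts the given containers one after another, waiting DELAY_SECONDS
# after each start so that their file system scans overlap as little
# as possible.
# With DRY_RUN=1 (the default here) the commands are only printed.
staged_start() {
    delay=$1
    shift
    for node in "$@"; do
        if [ "${DRY_RUN:-1}" = "1" ]; then
            echo "docker start $node"
        else
            docker start "$node"
        fi
        sleep "$delay"
    done
}
```

For example, `DRY_RUN=0 staged_start 3600 storagenode1 storagenode2` would start the second (hypothetically named) node an hour after the first.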

No SSDs, too cheap for that.

Yeah, I know it’s overkill to use a 2 GB box for 16 nodes. From my observations a few months ago, 1 GB would have been enough. Now, not sure…

Small update from my side. I see that my node starts consuming a large amount of memory roughly every four days:

2021-10-01T19:31:18.460Z        INFO    Configuration loaded    {"Location": "/app/config/config.yaml"}
…
2021-10-05T15:47:26.232Z        INFO    Configuration loaded    {"Location": "/app/config/config.yaml"}

And I restarted it manually again a few minutes ago (so around 2021-10-09T08:00:00Z), before it hit the threshold. Yet ~10 hours ago there was no indication of elevated memory use.
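
To put a number on "roughly every four days", the gaps between node starts can be pulled out of the log with something like the sketch below. It assumes each start logs exactly one "Configuration loaded" line with a leading UTC timestamp, as in the two lines quoted above, and that `date` supports `-d` (GNU coreutils or busybox). The function name is made up.

```shell
#!/bin/sh
# restart_intervals: reads node log lines on stdin and prints the number
# of whole hours between consecutive "Configuration loaded" entries.
restart_intervals() {
    prev=""
    grep 'Configuration loaded' | while read -r ts _rest; do
        # "2021-10-01T19:31:18.460Z" -> "2021-10-01 19:31:18"
        clean=$(printf '%s\n' "$ts" | sed -e 's/T/ /' -e 's/\.[0-9]*Z$//' -e 's/Z$//')
        # -u makes date interpret the timestamp as UTC.
        now=$(date -u -d "$clean" +%s 2>/dev/null) || continue
        if [ -n "$prev" ]; then
            echo $(( (now - prev) / 3600 ))
        fi
        prev=$now
    done
}

# Usage (node.log is a hypothetical file name):
#   restart_intervals < node.log
```

Fed with just the two lines quoted above, it prints 92, i.e. a bit under four days between those two starts.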

Do you have any stats, or any correlation with the operations performed by the node?

Not really. If you want to perform some statistical analysis on node logs, I can share them.

Is it a special LXC, or is it just Debian/Ubuntu?