Memory Leak when running Storj Node

Hello, I recently set up a Storj node on Proxmox VE, running Ubuntu Server as a VM. I am seeing that my node slowly uses more and more memory, and the only way to release that allocated memory is to restart the Ubuntu Server VM. I am using Docker to run my node.

OK, just one disclaimer: I have a Windows node, so I don't know if this applies.

But when my node started using way more RAM, it was because my HDD was falling behind and the OS was temporarily storing the incoming data in RAM until the disk could catch up.
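
If the same thing is happening on your Ubuntu VM, one thing you could check (I'm on Windows, so take this as a rough sketch for a standard Linux setup) is how much written data the kernel is still holding in RAM waiting to be flushed to disk:

    # Data buffered in RAM that hasn't hit the disk yet (values in kB)
    grep -E 'Dirty|Writeback' /proc/meminfo

    # Watch it over time; steadily growing numbers suggest the disk can't keep up
    watch -n 5 "grep -E 'Dirty|Writeback' /proc/meminfo"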

Ok so what did you do to fix that?

Well, software-wise I honestly don't know of a fix.

1. I would check if the HDD is OK (see the smartctl commands below).

2. Check if it is running on an older SATA version.

3. Make sure the HDD is CMR, not SMR; SMR has a speed and latency penalty.

But I would wait for other users' input; they may well have other ideas. I'm a Windows user that mostly uses Dell servers, so I'm quite limited in my expertise here.
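
For points 1 and 2, something like smartctl (from the smartmontools package) should tell you. Run it on the Proxmox host against the physical drives, since inside the VM you would only see the virtual disk. Again, I'm mostly a Windows guy, so treat the device name below as an example only:

    # Model, firmware and negotiated SATA link speed
    sudo smartctl -i /dev/sda

    # Full SMART report; watch the reallocated and pending sector counts
    sudo smartctl -a /dev/sda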

So far the only commonly seen cause for memory usage increasing is storage that isn't fast enough. A node operates on a large number of small-ish files, so it needs decent random read/write performance. A raw HDD is fine, but additional software layers may degrade it too much; for example, a lazily-allocated qcow image might be the reason. You can test this hypothesis by running iostat -dmx inside your VM and observing the %util column; if it is >10%, I'd be worried.
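
For example (iostat comes with the sysstat package; the interval is just a suggestion):

    # -d: device report, -m: MB/s, -x: extended stats including %util
    # 10-second samples, 3 reports; the first report averages since boot, so look at the later ones
    iostat -dmx 10 3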

How is your storage configured inside the VM?

What are the models of your HDDs?

BTW, if this VM is dedicated to hosting a node, you should be fine with 2 GiB and a single core. I've got an old server with 2 GiB and two cores operating a total of 12TB in Storj nodes, and the CPU usage rarely crosses 10%. What's more, using two sockets (as opposed to a single socket with two cores) in Proxmox is discouraged unless the guest software really needs it, and Storj nodes don't.
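
If you want to try that, something along these lines on the Proxmox host should do it (101 is a placeholder VM ID, adjust to yours; the VM needs a restart for the change to take effect):

    # One socket, one core, 2 GiB of RAM for a dedicated node VM
    qm set 101 --sockets 1 --cores 1 --memory 2048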

%util for sda is 0.54, sdb 0.07, sdc 0.08

All of them are the same type of HDD.

HDD: Seagate BarraCuda ST5000LM000, 5TB, 128MB cache, SATA 6.0Gb/s. I think it is SMR based on this article: CMR vs SMR drives - what to pick? How to tell? I could not find anything on the manufacturer's website.

storage config: Bus/Device is SATA, Format is qcow2

I am running Proxmox VE on a Dell PowerEdge R810, which has a NUMA architecture. The Proxmox VE docs say that enabling NUMA on a VM "can bring speed improvements as the memory bus is not a bottleneck anymore," and that "If the NUMA option is used, it is recommended to set the number of sockets to the number of nodes of the host system." That's the reason I was running 2 sockets and 1 core.
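
For reference, this is roughly what that config looks like from the Proxmox CLI (101 stands in for my actual VM ID):

    # NUMA enabled, 2 sockets x 1 core, following the docs' advice to match host NUMA nodes
    qm set 101 --numa 1 --sockets 2 --cores 1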


Good.

All large 2.5" drives are SMR. Given you have three of them, there's a chance they can be made to work as an aggregate with decent speeds by spreading I/O over all three of them, leveraging the max-concurrent-requests option, and putting the database files that keep accounting information on non-SMR storage (they're very small). There were some threads on configuring Storj nodes on SMR drives here, but I cannot quickly find any authoritative post on the matter. Community, can you help?
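
Roughly, the two knobs I mean look like this in a Docker setup. I've trimmed the usual port/wallet/address flags, the option names are the ones that appear in the node's config.yaml (double-check them against your version), and /mnt/ssd/storj-dbs is just a placeholder for some non-SMR path:

    # Cap simultaneous transfers so the SMR drives aren't overwhelmed, and keep the
    # small SQLite databases off the SMR drives. The value 10 is only an example.
    docker run -d --name storagenode \
      --mount type=bind,source=/mnt/storj/identity,destination=/app/identity \
      --mount type=bind,source=/mnt/storj/data,destination=/app/config \
      --mount type=bind,source=/mnt/ssd/storj-dbs,destination=/app/dbs \
      storjlabs/storagenode:latest \
      --storage2.max-concurrent-requests=10 \
      --storage2.database-dir=/app/dbs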

Though, I wonder why %util doesn’t show the problem then. Was it measured freshly after a restart?

Interesting! Proxmox used to not handle NUMA properly, i.e. it did not match physical sockets to virtual VM sockets, so the NUMA option basically slowed everything down for no benefit. If they do so now, great!

No, the %util was not measured after a restart. Also, I think I read the %util column incorrectly: sda is not using 0.54% but 54%, sdb 7% and sdc 8%. So you were right.

How fast is your incoming bandwidth? If your drives are on 6 Gb/s SATA links, I find it unlikely that they can't keep up with the incoming traffic unless you are rocking some crazy incoming pipe. And even then, there isn't THAT much data flowing in to be backing you up like this. Not unless things have changed recently and everyone is ingesting terabytes of data hourly. (Unlikely.)

Nodes tend to open a lot of connections, so if your VM is creating buffers to handle all of these, you might be bottoming out your RAM that way. That would be one guess. Can you add more RAM? Perhaps there is a high water mark you are dealing with.
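
One quick way to sanity-check that guess from inside the VM (assuming the container is named storagenode):

    # Socket summary: how many TCP connections are currently open
    ss -s

    # Memory attributed to the node container itself vs. the rest of the VM
    docker stats --no-stream storagenode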

I'm using the 4TB variant of these disks and they're not the fastest drives around; rather slow, especially when you do lots of writes in a short time (80GB+). This is the reason I quit using them in a RAID NAS setup (I mostly store things and read most of the time), but they couldn't keep up with my patience.

They actually do perform well on my Storj nodes, but those are Debian with Docker running on a "boot" disk and the SMR drive connected with raw disk (ext4) access, without any virtualization, LVM, etc. layer in between. My smallest node has 4GB of memory, so there is plenty left for the OS besides Docker.

Running a gigabit connection, I'm not experiencing any problems, no speed capping.
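
If you want to get close to that raw-disk setup on Proxmox instead of a qcow2 image, one option is passing the physical drive straight through to the VM; something like this on the host (the VM ID and the disk ID are placeholders, and /dev/disk/by-id paths are preferable so the mapping survives reboots):

    # Attach the physical drive to VM 101 as a raw SCSI disk, bypassing the qcow2 layer
    qm set 101 -scsi1 /dev/disk/by-id/ata-ST5000LM000-2AN170_EXAMPLESERIAL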

I have 100 Megabyte per second bandwidth. Both incoming and outgoing. My total RAM on my server is 125GiB, so yes I could add more RAM. What do you mean by high watermark? Like I have to have a certain amount of RAM so that my node functions correctly?

Yeah I plan on running different types of nodes from different crypto projects, but just one Storj node lol. I will try the LXC container.

There is an option to run a virtual environment with the raw disk image format. Would that improve performance? Btw thanks everyone for all the replies.
