Possible memory leak, processes not clearing from RAM

I've been running a node for a few months now, and I've seen some strange behaviour on 0.25.1.
I see huge spikes in memory, with one or two processes that usually aren't killed. Not sure if this is normal behaviour?

My normal RAM usage sits at around 4GB; with storj running for more than 24 hours or so, that increases to 12GB. I can't find anything out of the ordinary in the logs.
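
For reference, this is roughly how I've been scanning the logs for anything suspicious (my container is named storagenode):

docker logs storagenode 2>&1 | grep -iE "error|fatal" | tail -n 50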

When I restart the container usage returns to normal. Not sure if anyone else is seeing this?

Edit: Not sure how visible the attachment is; here's an imgur link: Storj 0.25.1 memory - Album on Imgur
Edit 2: I've also tried removing the container and re-adding it, with no luck.

Hello @iops,
Welcome to the forum!

Please show the results of this command:

docker stats 

Thanks, good to be here - looking forward to things to come with Storj!
Output as follows.

CONTAINER ID        NAME                CPU %               MEM USAGE / LIMIT   MEM %               NET I/O             BLOCK I/O           PIDS
4f80fbaeb71b        storagenode         0.92%               2GiB / 15.55GiB     12.87%              12.5GB / 1.58GB     56.3MB / 0B         123

Not sure if the container caches/buffers? It’s definitely spawning a lot of PIDs. I haven’t set any limits for it yet.
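
To see what those PIDs actually are, something like this should list the processes running inside the container:

docker top storagenode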

Here you can see the real resource usage of the storagenode container.

Interesting, thanks. I'll keep an eye on things and see how they go.
I had also misread the htop stats; I wasn't aware it includes cached memory there, so it's not really using that much.
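
In case it helps anyone else, comparing the used column against buff/cache in free shows how much memory is actually consumed versus just held as kernel cache:

free -h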

I've been watching this quite closely. Memory isn't really an issue, but I have noticed a lot of PIDs being spawned that aren't going away.

My current setup runs the Docker container on my Docker host, with an NFS mount to a NAS disk. With storj running I can't seem to run a Plex stream locally, although my LAN throughput doesn't seem to be suffering. Stopping the storj container takes about 10 minutes.

Please avoid such setups; they are not supported. Also, the latency will kill your node's ability to beat competitors in the race for pieces.
It is better to run the storagenode directly on your NAS, if it is able to run Docker.
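
As a rough sketch (the wallet, email, address, storage size, and paths below are placeholders; check the setup documentation for the exact command and image tag), running the node with local bind mounts instead of an NFS share looks something like this:

docker run -d --restart unless-stopped \
    -p 28967:28967 \
    -e WALLET="0xYOUR_WALLET_ADDRESS" \
    -e EMAIL="you@example.com" \
    -e ADDRESS="your.external.address:28967" \
    -e STORAGE="2TB" \
    --mount type=bind,source=/volume1/storj/identity,destination=/app/identity \
    --mount type=bind,source=/volume1/storj/data,destination=/app/config \
    --name storagenode storjlabs/storagenode:latest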

Moving away from NFS has solved this. Is there documentation on this anywhere?
It's also interesting that this has only been an issue since 0.25.1.

In particular, the SQLite documentation states that accessing databases over a network share is not supported, and that doing so can corrupt the database.

SQLite depends on the underlying filesystem to do locking as the documentation says it will. But some filesystems contain bugs in their locking logic such that the locks do not always behave as advertised. This is especially true of network filesystems and NFS in particular. If SQLite is used on a filesystem where the locking primitives contain bugs, and if two or more threads or processes try to access the same database at the same time, then database corruption might result.
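
If the databases have been living on that NFS mount, it's probably worth checking them for corruption while the node is stopped. Assuming sqlite3 is installed, running an integrity check against each .db file in the storage directory (exact filenames vary by node version) looks something like this:

sqlite3 /path/to/storage/bandwidth.db "PRAGMA integrity_check;"

It should print ok for each intact database.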
