Windows GUI node got killed because of OOM (out of memory)

Ruskiem · March 6, 2022, 1:24pm

As you requested, I took a screenshot of the Windows Event Log. It shows the following:

There is a memory resource exhaustion event showing that the Storj V3 node consumed 2056974336 bytes, which is around 2000 MB of memory. (For comparison, it currently uses about 18 MB.)
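For reference, the byte count from the event converts as follows. A trivial sketch showing both binary (1 MiB = 1024² bytes) and decimal (1 MB = 10⁶ bytes) units:

```python
def bytes_to_units(n_bytes: int) -> tuple[float, float]:
    """Return (MiB, MB) for a raw byte count."""
    mib = n_bytes / (1024 ** 2)  # binary mebibytes
    mb = n_bytes / 1_000_000     # decimal megabytes
    return mib, mb


mib, mb = bytes_to_units(2056974336)
print(f"{mib:.1f} MiB / {mb:.1f} MB")  # 1961.7 MiB / 2057.0 MB
```

So "around 2000MB" holds in either unit.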

The next event related to that node service is an error: it terminated unexpectedly.

There was nothing in storagenode.log, just as I described last time. The last line was a download of some piece, that's all.

Edit:
As you can see, the 1st machine had more memory available. In that case the node consumed all available memory, around 3500 MB, before the Storj V3 node service was terminated.

Edit2:

These are all Windows 10 Pro 64-bit. Nowadays some node stops unexpectedly almost every day. They all run the newest version, 1.49 or so.

Edit3:
Another clue, from the 1st machine:
The Windows log first reported memory exhaustion at 01:06,
and the node service terminated unexpectedly at 02:37.

So the node kept running for about an hour and a half even with memory maxed out.
Here's a log from 27.03.2022 up to the moment it happened.
Where can I upload a log? It's a 6 MB RAR (67 MB unpacked). This is the Storj forum; it should have an option for uploading files.

https://transfer.sh/(/tTlm17/storagenode_from_1stMachine.rar).zip

Alexey · March 6, 2022, 1:56pm

This sounds like your storage is unable to keep up. Is it a network drive?

Ruskiem · March 6, 2022, 2:03pm

Also, the screenshot contains 3 images, from 2 different machines.
No, it's a normal HDD. It has worked perfectly for 1.5-2 years, and it was fine during the big stress test from the Storj network some time ago, so in my opinion the configuration is proven. But the memory growth problem also escalated some time ago, then it went away for some months, and now with the newest version I guess it has come back again…

Alexey · March 6, 2022, 2:06pm

I do not think it is related to the new version; it is more likely related to increased repair traffic, which your setup now cannot keep up with. There may also be minor hardware issues, such as oxidized cable contacts.
I would ask again: how are your disks connected to these storagenodes?

Ruskiem · March 6, 2022, 2:14pm

The HDDs are connected to the motherboard via SATA cables. These are normal home PCs.

Alexey · March 6, 2022, 2:16pm

Are they SMR?
How many nodes do you run on each setup?

Ruskiem · March 6, 2022, 2:18pm

No, the HDDs are Ultrastars, so SMR is not an issue. The screenshots I uploaded today are from 2 different physical PCs, and both had this problem today: one at 01:06 and the 2nd a few hours later.

EDIT:
Actually, both happened at around the same time today, between 01:00 and 02:00.

Edit2:

The node dashboard shows no particular peak in egress or ingress (the 1st node gets about 1 GB of repair per day, and the 2nd got 3-4 GB of repair per day over the last 2 days, so there is an overall peak this month). I'm not sure it matters, though, because as I mentioned, I have this problem almost every day with a different node each time. I have a few more nodes on other machines as well, so I don't know whether my hardware is keeping up. The fact is that both nodes grow too big in memory.

1st is ver. 1.49.5
2nd is ver. 1.49.5 as well

I'm trying to think: if the HDD cannot keep up, how can I monitor that?
I will set up some HDD monitoring to diagnose it if you tell me how.
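One way to quantify whether a disk keeps up is its average time per I/O over an interval, which is what Performance Monitor's Avg. Disk sec/Read and Avg. Disk sec/Write counters report (in seconds). A minimal sketch of that calculation, assuming cumulative counters shaped like those returned by the third-party psutil package's `disk_io_counters(perdisk=True)` (`read_count`, `read_time` in milliseconds, and so on). The `DiskSnapshot` class and the snapshot values are hypothetical:

```python
from dataclasses import dataclass


@dataclass
class DiskSnapshot:
    """Cumulative per-disk counters (field names mirror psutil's sdiskio)."""
    read_count: int
    write_count: int
    read_time: int   # total milliseconds spent on reads
    write_time: int  # total milliseconds spent on writes


def avg_latency_ms(before: DiskSnapshot, after: DiskSnapshot) -> tuple[float, float]:
    """Average milliseconds per read and per write between two snapshots."""
    reads = after.read_count - before.read_count
    writes = after.write_count - before.write_count
    read_ms = (after.read_time - before.read_time) / reads if reads else 0.0
    write_ms = (after.write_time - before.write_time) / writes if writes else 0.0
    return read_ms, write_ms


# Hypothetical numbers: 200 reads taking 1800 ms in total, 50 writes taking 400 ms.
before = DiskSnapshot(read_count=1000, write_count=500, read_time=9000, write_time=4000)
after = DiskSnapshot(read_count=1200, write_count=550, read_time=10800, write_time=4400)
print(avg_latency_ms(before, after))  # (9.0, 8.0)
```

Comparing these averages during a memory spike against quiet periods would show whether the disk is the bottleneck.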

Toyoo · March 6, 2022, 2:38pm

Wild guess. Do you, by any chance, host your nodes on parity-based storage space?

Ruskiem · March 6, 2022, 2:40pm

No, no redundancy. I just need to set up some monitoring of the HDDs and see; maybe there are some short bursts, who knows. I need to measure that somehow.

Alexey · March 6, 2022, 3:06pm

In Windows you can use the built-in Performance Monitor. It lets you track metrics over time and save them to disk (please use a different disk from the storagenode's data location).
Then you can check what the load on the disk, memory and processor was. You can add network traffic as well.
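If the collected metrics are saved as a CSV log, the peaks can be pulled out with a short script. A sketch, assuming the PDH-CSV layout where the first column is the timestamp and each remaining column is one counter. The sample counter names and values are invented:

```python
import csv
import io


def counter_peaks(csv_text: str) -> dict[str, float]:
    """Return the maximum value seen for each counter column in a Perfmon CSV log."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    peaks: dict[str, float] = {}
    for row in reader:
        # Column 0 is the timestamp; the rest are counter values.
        for name, raw in zip(header[1:], row[1:]):
            try:
                value = float(raw)
            except ValueError:
                continue  # skip blank (" ") entries for missing samples
            if name not in peaks or value > peaks[name]:
                peaks[name] = value
    return peaks


sample = "\n".join([
    r'"(PDH-CSV 4.0)","\\PC\Memory\% Committed Bytes In Use","\\PC\PhysicalDisk(1 D:)\Avg. Disk sec/Write"',
    r'"03/06/2022 01:05:00.000","62.5","0.004"',
    r'"03/06/2022 01:05:15.000","97.8"," "',
])
for name, value in counter_peaks(sample).items():
    print(f"{name}: {value}")
```

A binary .blg log can be converted to CSV first with `relog log.blg -f csv -o log.csv`.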

Ruskiem · March 6, 2022, 4:25pm

Okay, I have set up these counters:

Avg. Disk sec/Read
Avg. Disk sec/Write

Memory: % Committed Bytes In Use

Network: Bytes Sent/sec
Network: Bytes Received/sec

Processor: % Processor Time

I hope it saves itself somewhere, so that after 24 hours I can read the graph or log… because I clicked around and found no option to save the log, nor where the log is saved.

Alexey · March 6, 2022, 4:30pm

You need to configure a Data Collector Set to store the metrics in a file.
You can use the preconfigured set in Data Collector Sets → System → System Performance as an example (or modify it, but I recommend creating your own).
Once the data collector set is configured, you need to start it. It will write metrics until you stop it. You can close the Performance Monitor console after starting your collector.
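As a command-line alternative to building the Data Collector Set in the GUI, the built-in Windows `logman` tool can create and start a counter collector. A sketch that assembles the command from Python, assuming the collector name, output path and 15-second interval shown (all hypothetical), and adding a per-process counter for storagenode.exe:

```python
import subprocess
import sys

# Hypothetical collector name and output path; put the log on a disk
# other than the one holding the storagenode data.
COLLECTOR = "storj_monitor"
OUTPUT = r"C:\PerfLogs\storj_monitor"

COUNTERS = [
    r"\PhysicalDisk(*)\Avg. Disk sec/Read",
    r"\PhysicalDisk(*)\Avg. Disk sec/Write",
    r"\Memory\% Committed Bytes In Use",
    r"\Network Interface(*)\Bytes Sent/sec",
    r"\Network Interface(*)\Bytes Received/sec",
    r"\Processor(_Total)\% Processor Time",
    # Memory of the storagenode.exe process itself (instance name has no .exe).
    r"\Process(storagenode)\Private Bytes",
]


def logman_create_args(name: str, counters: list[str], output: str,
                       interval: str = "00:00:15") -> list[str]:
    """Build a `logman create counter` command line that logs to CSV."""
    return [
        "logman", "create", "counter", name,
        "-c", *counters,   # counter paths to collect
        "-si", interval,   # sample interval, hh:mm:ss
        "-f", "csv",       # comma-separated output
        "-o", output,      # output file path
    ]


if __name__ == "__main__" and sys.platform == "win32":
    # Usually needs an elevated (administrator) prompt.
    subprocess.run(logman_create_args(COLLECTOR, COUNTERS, OUTPUT), check=True)
    subprocess.run(["logman", "start", COLLECTOR], check=True)
```

`logman stop storj_monitor` ends the collection; the resulting CSV can then be opened in Performance Monitor or a spreadsheet.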

Derkades · March 7, 2022, 6:41pm

Maybe there’s a memory leak in the storagenode app? One of my nodes started using a lot of memory today, too:

CONTAINER ID   NAME                              CPU %     MEM USAGE / LIMIT     MEM %     NET I/O           BLOCK I/O         PIDS
e7dda1d49d79   storj-storagenode2-1              0.93%     41.91MiB / 31.35GiB   0.13%     479MB / 287MB     38.3MB / 0B       24
4a75d6371fcf   storj-storagenode4-1              13.01%    4.733GiB / 31.35GiB   15.10%    521MB / 1.66GB    27.4GB / 367MB    25
ebaa3d0edf4b   storj-storagenode3-1              8.78%     414.2MiB / 31.35GiB   1.29%     527MB / 2.7GB     17GB / 356MB      25

Version 1.49.5
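With several containers, it can help to flag the ones whose memory climbs. A sketch, assuming lines produced by `docker stats --no-stream --format "{{.Name}}\t{{.MemUsage}}"` (name and usage separated by a tab). The sample lines reuse the numbers from the table above, and the 1 GiB threshold is arbitrary:

```python
import re
from collections.abc import Iterable, Iterator

# docker stats prints memory in binary units and network/block I/O in decimal units.
_UNITS = {"B": 1, "KiB": 1024, "MiB": 1024**2, "GiB": 1024**3, "TiB": 1024**4,
          "kB": 1000, "MB": 1000**2, "GB": 1000**3, "TB": 1000**4}


def parse_size(text: str) -> float:
    """Convert a docker stats size such as '4.733GiB' to bytes."""
    match = re.fullmatch(r"([\d.]+)\s*([A-Za-z]+)", text.strip())
    if not match:
        raise ValueError(f"unrecognised size: {text!r}")
    value, unit = match.groups()
    return float(value) * _UNITS[unit]


def heavy_containers(stats_lines: Iterable[str],
                     threshold_bytes: float) -> Iterator[tuple[str, float]]:
    """Yield (name, bytes_used) for containers above the memory threshold."""
    for line in stats_lines:
        name, mem = line.split("\t", 1)
        usage = mem.split("/")[0]  # "4.733GiB / 31.35GiB" -> "4.733GiB "
        bytes_used = parse_size(usage)
        if bytes_used > threshold_bytes:
            yield name, bytes_used


lines = [
    "storj-storagenode2-1\t41.91MiB / 31.35GiB",
    "storj-storagenode4-1\t4.733GiB / 31.35GiB",
    "storj-storagenode3-1\t414.2MiB / 31.35GiB",
]
for name, used in heavy_containers(lines, 1024**3):
    print(name, round(used / 1024**3, 2), "GiB")
```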

Alexey · March 8, 2022, 6:20am

What is the disk load for the storj-storagenode4-1 node?

Ruskiem · March 10, 2022, 4:27pm

Alexey, I think there is some memory leak in the software. I'm observing a node that has a whole 8 TB Ultrastar SATA disk used only for Storj, with an average response time of 2.9-10 ms, so no choking. It has constant egress (more than my other nodes; it's a lucky node), and the storagenode service is currently using 1.3 GB of memory. Windows 10 Pro, v1.49.5. It has no problems other than that. I set the service to restart automatically in Windows Services so I don't have to start it manually every time, but I don't understand this memory growth; I think it shouldn't behave like that.
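One way to tell a steady leak from a one-off spike is to sample the process memory at intervals (for example from a `\Process(storagenode)\Private Bytes` counter) and fit a trend line. A slope that stays clearly positive over days, regardless of traffic, would point more toward a leak, while a flat trend with occasional spikes would point more toward load. A minimal least-squares sketch; the readings are invented:

```python
def growth_rate(samples: list[tuple[float, float]]) -> float:
    """Least-squares slope of memory (MB) over time (hours), i.e. MB per hour."""
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_t = sum(t for t, _ in samples) / n
    mean_m = sum(m for _, m in samples) / n
    cov = sum((t - mean_t) * (m - mean_m) for t, m in samples)
    var = sum((t - mean_t) ** 2 for t, _ in samples)
    if var == 0:
        raise ValueError("samples must span more than one point in time")
    return cov / var


# Hypothetical hourly readings of the storagenode process memory, in MB.
readings = [(0, 300.0), (1, 520.0), (2, 700.0), (3, 910.0), (4, 1100.0)]
print(f"{growth_rate(readings):.0f} MB/hour")  # 199 MB/hour
```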

JDA · March 10, 2022, 10:12pm

Hi,
I don't know if it is related, but my StorageNode (a Windows 2019 VM with only this service running on it) usually uses around 550-600 threads. Since 08.03.2022 that number started rising to 800 and then skyrocketed to more than 1800. storagenode.exe alone uses more than 1000.

Memory is also on the higher side.

Currently running on v1.49.5

EDIT:
OK, I restarted the server and everything is back to normal. It might be on my side; I'll monitor it closely over the next few days.