Ridiculously high load v1.102.3

Edit: I have two storagenode rigs; both went down just after midnight local time (2024-04-27, 05:00 UTC). I thought I had recovered one, and I’ve been having trouble with the other. I had been assuming a dying disk was causing IO problems until I checked my “working” rig and found it posting a load average over 1000.

Is there a way to pin an older version and disable the updater?

Part of my troubleshooting of the high load, before I realized it must be widespread, was enabling the lazy file walker, which led to my original post below…

Original post:

I believe I enabled the lazy filewalker via config.yaml but I still have extremely high load at node startup.

pieces.enable-lazy-filewalker: true

How can I check that it is working as intended? I thought process niceness was how this worked, but all of the storagenode processes have the same nice value: 0.
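As far as I understand it, the lazy mode lowers the subprocess’s IO priority rather than its CPU nice value, and the scan runs as a separate storagenode subprocess. One way to check on a docker node, as a sketch (the container name storagenode and the lazyfilewalker log prefix are assumptions; adjust to your setup):

# Look for lazy filewalker subprocess activity in the node log
docker logs storagenode 2>&1 | grep -i "lazyfilewalker" | tail -n 20

# Check the IO scheduling class of the storagenode processes on the host;
# the lazy subprocess should report a lower IO class rather than a different nice value
for pid in $(pgrep storagenode); do printf '%s: ' "$pid"; ionice -p "$pid"; done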

I don’t know, but I’m seeing this too, and most of my nodes now crash with a 10000-thread-limit error. They crash for a while, then restart, run for a while, and crash again, but yes, the load has gone way up.

1 Like

I’m also noticing high load when the trash-cleanup-filewalker is running (also in lazy mode): a load average of 500+.

2 Likes

If you restart your node a lot, you can disable the file walker from running at startup with the following setting in your config file:
storage2.piece-scan-on-startup: false

Also, check your node’s temp folder. My node got overloaded yesterday, and today I noticed it had filled the temp folder with 16 GB of temp files (all created yesterday). I deleted them…
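A minimal cleanup sketch, assuming the storage directory is mounted at /mnt/storagenode/storage (the path is an assumption; the temp folder sits alongside blobs and trash), and only touching files older than a day so in-flight uploads are left alone:

# Show and delete leftover partial uploads older than one day
find /mnt/storagenode/storage/temp -type f -mtime +1 -print -delete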

Edit: I intended to reply to the original poster, but I mistakenly replied to jarridgraham instead. Oops, oh well…

2 Likes

We have been seeing this issue for years now: Delete temp files?

It is still sad that the temp files don’t get deleted automatically.

Yesterday, 10 of my 15 nodes ran into memory overflow problems caused by heavy hard disk load.
I tried downgrading the version, but the problem was still not solved. Multiple nodes should not have hard disk failures at the same time; each node runs on its own hard disk.
Checking the logs turned up no error messages.
I don’t know how to solve it.

1 Like

Personally, I used the following setting in my config file to limit the number of concurrent transfers:
storage2.max-concurrent-requests: 50
It’s not ideal, but it’s better than having my node crash.
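For docker nodes, the same option can also be passed as a flag after the image name instead of editing config.yaml; a minimal sketch (the container name is a placeholder, and the usual mounts, ports, and environment variables are omitted):

# Flags placed after the image name are passed through to storagenode run
docker run -d --name storagenode \
  storjlabs/storagenode:latest \
  --storage2.max-concurrent-requests=50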

I am seeing something like this here as well; about half of my nodes are crashing. When I trace back through the logs, I find a Go error about reaching the maximum of 10000 threads. They run in Docker containers, so I thought they were isolated to a point, but I guess the thread limit may be a global thing. They all hit the same error at about the same time, then flop around for a bit and take off again.

This fixed it for me. I thought the lazy filewalker should have been enough, but after disabling the startup scan, my nodes are able to start with a reasonable load.

1 Like

This is not a version problem.
A client is probably pushing out a lot of data; it has been maxing out my internet connection since yesterday.

2 Likes

I changed it to the parameters you provided, and it is running normally now.
Thanks.

How did you get it back to normal?

You may try setting it to a higher value for the container: docker run | Docker Docs
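If what is being hit is the container’s PID/thread cap, it can be raised with docker’s --pids-limit flag; a minimal sketch (the container name is a placeholder, and the usual mounts, ports, and environment variables are omitted). Note that Go’s own default cap of 10000 OS threads is enforced by the runtime itself, so this only helps if docker is the limiting factor:

# Raise the container's PID/thread cap (use -1 for unlimited)
docker run -d --name storagenode --pids-limit 16384 \
  storjlabs/storagenode:latest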

I am also seeing this, but I’m affected by the recent trash issues (5+ TB of trash on ~20 TB of storage), so I’d rather not disable the filewalker entirely, since I really do need to clear that trash out. Is there a way to get more granular control over it?

Edit: Actually, I’m not sure if this is the same issue. I tried disabling piece-scan-on-startup, but I still see an abnormally high load average and thread count on several nodes. I even tried lowering the allocated storage to stop ingress.
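For reference, lowering the allocation means reducing this config.yaml setting (assuming the usual option name; the value is only an example, set it below the node’s current used space so ingress stops):

storage.allocated-disk-space: 1.00 TB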

I’ve gotten some stability after disabling the startup piece scan. Most of my nodes are currently on 1.99, which surprised me. Occasionally my rigs still crash. I suspect my crashes happen when nodes upgrade to 1.102. :thinking: Or this has been happening for some time, but watchdog and docker’s restart: unless-stopped prevented me from noticing.

Can confirm extremely high memory usage and process counts on Windows, version 1.102.4.

Normally it was 2 processes per node; here you can see 4-5.
Also 80 GB of memory usage; that’s not what you expect for a single node! :slight_smile:

Have you checked whether the data is coming from the Saltlake satellite?

Mass testing data from saltlake? - Node Operators / troubleshooting - Storj Community Forum (official)

Updates on Test Data
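One way to check where the ingress is coming from, as a sketch, assuming a docker node named storagenode and the default log format with "upload started" lines and a "Satellite ID" field (names are assumptions; adjust to your setup):

# Count uploads per satellite over the last hour
docker logs --since 1h storagenode 2>&1 | grep "upload started" | grep -o '"Satellite ID": "[^"]*"' | sort | uniq -c | sort -rn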

Not a Saltlake issue in my case.

This is completely unrelated. The used-space-filewalker is a separate process, unless you also disabled the lazy mode.