For a couple of days now, one node has been using a lot of memory, especially swap:
If I shut it down and restart it, it takes only a minute until it's back in this state.
What I did was enable the used-space filewalker and restart the node while it was running. Before that, the node worked normally, like my other nodes. Any ideas?
The node version is 1.114.6; this one includes the save-state feature for the used-space filewalker.
The DBs are on an SSD, the disk is dedicated, and it is NOT an SMR drive.
How can I check which part of the process is using all the RAM/swap?
However, if you end up using swap even for normal node operation, you should not run a node on that hardware: you don't have nearly enough free RAM (in fact, you have a negative amount), so the metadata can't be cached, your node won't be able to keep up, and it will keep getting killed by the OOM killer.
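One quick way to see whether the OOM killer has actually been firing is to check the kernel log. A minimal sketch (it may need root, and the exact wording of the kernel messages varies a bit between kernel versions):

```python
# Minimal sketch: scan the kernel log for OOM-killer activity.
# Assumes `dmesg -T` is available (util-linux); run with sudo if needed.
import subprocess

out = subprocess.run(["dmesg", "-T"], capture_output=True, text=True).stdout
for line in out.splitlines():
    lower = line.lower()
    if "out of memory" in lower or "oom-kill" in lower:
        print(line)
```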
The disk is not slow. It's a TOSHIBA 18 TB SATA3; I have several of them in other nodes with no problems at all.
It takes only 20 minutes after startup for the swap to fill up.
This node was running fine for 11 months. The “problem” started with a configuration change to STORJ_STORAGE2_PIECE_SCAN_ON_STARTUP (changed from false to true with version 1.114.6). Changing it back to false did not solve it. So I would suspect a change introduced with this version (persistence of piece-scan-on-startup was implemented).
Are there any other possibilities to analyze why this storagenode process eats up the swap? Does the debug endpoint give useful information here?
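If the debug endpoint is configured (via debug.addr), it should expose Go's standard pprof handlers, which at least separates what the Go heap is holding from what the OS is swapping out. A minimal sketch; the address below is an assumption about your setup:

```python
# Minimal sketch: dump the Go runtime memory stats from the storagenode
# debug endpoint, assuming it exposes the standard /debug/pprof/ handlers.
import urllib.request

DEBUG_ADDR = "127.0.0.1:5999"  # assumption: whatever you set for debug.addr

url = f"http://{DEBUG_ADDR}/debug/pprof/heap?debug=1"
with urllib.request.urlopen(url, timeout=10) as resp:
    text = resp.read().decode("utf-8", errors="replace")

# The text form of the heap profile ends with '#'-prefixed lines mirroring
# Go's runtime.MemStats (HeapAlloc, HeapInuse, Sys, NumGC, ...).
for line in text.splitlines():
    if line.startswith("#"):
        print(line)
```

Comparing those numbers with what top/htop reports for the process should help narrow down where the memory actually goes.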
BTW: my other nodes show the same RAM usage as yours, @Alexey.
This is really strange. If I restart the node, the process grabs as much RAM as it can get and the CPU is also crunching. After around one minute, you can see this picture:
So the CPU has finished its job and now the swapping kicks in. I would expect that the node is now moving all unused data to swap. And after a couple of hours you can see the picture from the start (low memory usage, but still massive swapping).
Any idea what runs at startup that could cause this kind of symptom? I still suspect the piece-scan-on-startup filewalker…
That seems logical: after the scan, the PC usually keeps the file addresses in memory, and since they are not needed it moves them to swap; after some time, I think, they will be replaced with more relevant information.
If it's a magnetic spinning disk, it's slow. Unless you have plenty of RAM to cache the metadata, what you see will keep happening. If the filewalker takes days, you need more RAM.
A spinning disk can support about 200 IOPS, give or take. Compare that with an SSD, which can sustain 40,000 to 500,000 IOPS.
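To put rough numbers on the "filewalker takes days" point, here's a back-of-the-envelope sketch; the piece count is an assumption, not a measurement of this particular node:

```python
# Back-of-the-envelope: how long a full metadata scan takes if every
# stat() has to hit the disk. Piece count and IOPS figures are assumptions.
pieces = 20_000_000  # assumption: rough piece count for a well-filled 18 TB node
for name, iops in (("HDD", 200), ("SSD", 100_000)):
    seconds = pieces / iops
    print(f"{name}: {seconds / 3600:.1f} h ({seconds / 86400:.1f} days)")
```

And that assumes the filewalker gets the disk to itself; with normal ingress/egress competing for the same 200 IOPS it stretches out much further.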
We’ve been through this conversation multiple times on this forum. No need to go over the same thing again.
Add 16-32 GB of RAM and/or an SSD cache for metadata. This is the only solution.
flo, is the slow node using the same storj release as the others?
I’ve seen nodes occasionally use a lot of RAM (per docker stats).
things that can increase RAM usage:
the disk not keeping up with incoming write requests (I have some NFS-mounted shares, so they can really fall behind)
running the used-space and garbage-collection filewalkers
using the badger file stat cache
how much RAM does docker stats show it using?
what is the load on the hard disk? (I use atop for a glance)
maybe let the node run until it finishes both the used-space filewalkers and the garbage-collection filewalkers (I grep the logs for the words “used” and “retain”, respectively; see the sketch below). For terabytes of data this could take days. Then see if the RAM usage has settled down.
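For that log check, a quick-and-dirty sketch; the log path is an assumption about your setup (adjust it, or dump docker logs to a file first):

```python
# Quick-and-dirty sketch: print filewalker-related log lines so you can see
# whether the used-space scan ("used") and garbage collection ("retain")
# have finished. LOG_PATH is an assumption; point it at your node's log.
LOG_PATH = "/mnt/storj/node.log"

with open(LOG_PATH, errors="replace") as f:
    for line in f:
        lower = line.lower()
        if "used" in lower or "retain" in lower:
            print(line.rstrip())
```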
But mine are on 1.114.6 too and the scan is enabled; lazy mode is on, except on one node (which uses more memory, by the way). That node also has the badger cache enabled, so the scan on startup is much faster for it.
So I suspect it's not the code change, but something that happened with that exact node.
Could you please check this disk for errors?
Apparently I missed something again
Are these the new requirements for running a node: 16-32 GB of RAM? @Alexey, is that correct: you need 32 GB of RAM and an SSD for the node to work?
In my humble opinion, Storj doesn't pay enough to allocate 32 GB of memory to each node, and SSDs on top of that, which degrade anyway.
These are not new requirements, and they are not specific to storage nodes; it was discussed before. It's a physical limitation of hard drive performance: if the combined traffic exceeds 200 IOPS, your HDD will choke and the node will eventually get killed. You need to offload IOPS so that no more than 200 ever hit the disk: move the databases to an SSD, disable sync and access-time updates, and have a lot of RAM so the metadata fits there. How much of that you need to do depends on the current traffic.
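For a sense of where the 16-32 GB figure comes from, here's a rough sketch; both numbers in it are assumptions, and the real per-entry cost varies by filesystem:

```python
# Rough sketch: RAM needed to keep all piece metadata cached so the
# filewalkers rarely have to touch the platter. Both figures are assumptions.
pieces = 20_000_000      # assumption: piece count on a well-filled node
bytes_per_entry = 1024   # assumption: inode + dentry + slack per piece

print(f"~{pieces * bytes_per_entry / 2**30:.0f} GiB just for hot piece metadata")
```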
How is this news to you?
Here we go again. If you don't already have a server with plenty of RAM and good performance, then unless you are OK with laggy performance for the other services you are hosting, you should not be running a node on that hardware: you don't have excess capacity to share, it's barely enough for yourself.
But the problem here is not the storagenode, it's your choking server. Add plenty of RAM and let the filesystem breathe. You will be rewarded with much snappier performance, and, look at that, the storagenode also works fine without any impact on the system.
On your SSD comment:
An SSD is already part of the array, because I like my users and don't want them to suffer through shit performance.
Everything degrades in life. I have never had a single SSD fail yet, and I buy exclusively cheap used enterprise SSDs with almost exhausted endurance. You'll be fine. Just don't buy garbage made for consumers; that's not specific to SSDs.
I think arrogantrabbit is perhaps making his case too strongly.
BUT, we did encounter situations over the summer where a simple hard drive setup couldn't keep up with Storj processing. The symptoms were things like disks filling up, incorrectly reported space, failing trash cleanups, etc., or at best these processing jobs taking days or weeks to run.
The causes of those were:
lots of test ingress and trash deletion
having a node size over a certain amount (around 8 TB, by my estimate).
under current (historically normal) load levels this isn’t really noticeable.
Anyway, the fix to these problems was to speed up hard-disk metadata access by offloading it to an SSD. Ways to do this included:
ZFS L2ARC
ZFS special device
Synology: pin btrfs metadata to SSD
LVM caching or metadata drives
something-something with Windows (PrimoCache? DrivePool?)
Virt usage is still high (13 GB). Currently the used-space filewalker is running.
I already did this once. It did not help.
It's currently high because I'm doing a second run with the used-space filewalker enabled, and I'll let the node idle for another 24 hours afterwards before disabling it again. Hope this helps.