How to speed up the used-space-filewalker?

The used-space-filewalker has now been running for 7 days. Not a big surprise, since this is my slowest node when it comes to I/O performance (external USB case). I wonder if it could benefit from moving the databases to an SSD? Or is it mostly I/O limited?

It’s not OR, it’s AND.

Advice is the same as in all previous threads on the same topic: disable atime updates, disable sync writes, increase the amount of available RAM, and use a performant filesystem that can take advantage of it.
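
For example, a minimal sketch of the atime part (the mount point is a placeholder; adjust to your setup, and make the Linux change permanent via /etc/fstab):

    # Linux: remount the storage filesystem without access-time updates
    sudo mount -o remount,noatime /mnt/storagenode

    # Windows/NTFS: disable last-access timestamp updates system-wide
    fsutil behavior set disablelastaccess 1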

Or do nothing, because who cares how long it runs for, if this is a separate pool and does not affect anything else?

2 Likes

The best you may be able to hope for is a resumable filewalker… so it doesn’t restart from scratch any time it gets interrupted.

2 Likes

I cannot change the file system (NTFS), and moving this node (11 TB) would take forever. So the only thing I can change is the database location.

If you can’t increase the RAM, try reducing the I/O on the storage drive by (see the config sketch after this list):

  • moving the databases and logs to some other drive (SSD, USB stick);
  • stopping the ingress by reducing the allocated space below the occupied space.
    After the FW finishes, stop the node, reallocate the correct space and restart the node without the FW.
    You can check my test results in the Tuning the Filewalker thread, in the last posts.
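
As a rough sketch, both changes together in config.yaml (paths and sizes are examples only; stop the node, move the existing .db files to the new directory, then restart):

    # move the databases and the log file off the storage drive
    storage2.database-dir: "/mnt/ssd/storagenode-dbs"
    log.output: "/mnt/ssd/storagenode-logs/node.log"

    # temporarily allocate less than the occupied space to stop ingress
    storage.allocated-disk-space: 5.00 TB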
1 Like

If the node has free space for uploads, stop uploads somehow (cut the node off at the firewall, disconnect the Ethernet cable, whatever) until the file walker finishes. Uploads compete with the file walker. An added benefit is that the resulting estimate of space taken will be more accurate.
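
As one possible sketch of the firewall route on a Linux host (assuming the default node port 28967; remove the rule again afterwards):

    # block incoming traffic to the node port so no new uploads arrive
    sudo ufw deny 28967

    # once the filewalker has finished, allow traffic again
    sudo ufw delete deny 28967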

Your way misses audits and drops the online score. My way doesn’t, but adds the stop-and-modify steps. So they are both OK, depending on what one wants.

1 Like

Depending on what you need the used-space stat for, you can get a pretty good estimate by just using df or Windows Disk Management to get the overall disk usage, then subtracting the size of the non-STORJ files on the disk (and maybe also the STORJ trash). This will be faster, assuming the number of STORJ blobs is far greater than the number of non-STORJ files.
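
A minimal sketch of that estimate on Linux (paths are placeholders for your mount point and whatever non-STORJ data lives on the disk):

    # total bytes used on the whole disk
    df -B1 --output=used /mnt/storagenode

    # bytes to subtract: non-STORJ files, and optionally the trash folder
    du -sb /mnt/storagenode/other-data
    du -sb /mnt/storagenode/storage/trash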

You may also try to disable the lazy mode; it will consume more IOPS, but it will finish faster.
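
If you want to try it, that should be the lazy-filewalker toggle in config.yaml (or the matching command-line flag), followed by a node restart:

    # run the filewalker at normal priority instead of the lazy low-priority mode
    pieces.enable-lazy-filewalker: false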

1 Like

I already did, but after 5 or 6 days without any filewalker message in my log I returned to the lazy one.

The node just did an auto update… So back to zero again… :poop:

1 Like

Pretty much the same here. My node is at 1.97.3, waiting for the 1.99.3 update. The filewalker (I disabled the lazy one) has been running for 5 days straight, without any message whatsoever. The HDD is running non-stop at 100%… In my node UI, paid storage is at 5 TB, however it shows 2.6 TB used and 3.9 TB free space. My HDD is 7.3 TB, but it’s full (less than 50 GB of free space). I don’t know what to do.

So I get that you have an 8 TB HDD, which is 7.3 TiB. In your run command or config you should specify 7 TB as allocated storage. This will stop the ingress, the FW will run faster, and you start freeing some space for overhead. I run my 8 TB nodes this way and have never had a problem. If you overallocate space, your node can end up disqualified.
The FW can take days, depending on your hardware, especially the CPU, RAM and HDD connection.
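
As a sketch of the numbers and the setting (values are examples): 8 TB is roughly 8 × 10^12 / 2^40 ≈ 7.28 TiB, so allocating 7 TB leaves a couple of hundred GB of headroom.

    # config.yaml
    storage.allocated-disk-space: 7.00 TB

    # or, for a docker node, in the run command
    -e STORAGE="7TB"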

1 Like

Allocated space for the SN has always been 6.5 TiB. I have no ingress at the moment, because the node knows the HDD ran out of (actual) space.
I didn’t do anything unusual (didn’t delete databases or perform any other operation, the HDD is intact, no errors in the logs); maybe the filewalker (taking too long to complete) was halted a few times during these 14 months and that could have ruined the database. I have a decent connection, enough memory and CPU. But it would be unfair if my node got disqualified (audit and suspension scores always at 100% and online >99%). A complete FW run may take weeks and I don’t know exactly why.

I don’t think so. Please explain why this should happen.

It’s all over the forum and in the official docs. But it’s a hypothesis, from what I understand. I believe nobody has actually been disqualified for this reason. Maybe someone wiser than me can explain it, but I don’t know if there is real proof of this happening.
Basically, they say the satellite records that you stored a piece, but in reality there was no space left and the piece wasn’t stored. When the sat audits or requests that piece, it finds out you don’t have it and will disqualify you because you lost it. You, I mean your node.
I didn’t dig deep into this, but I always wondered why the satellite would believe that you stored a piece if you didn’t, and you never sent a message to the sat saying “done, it’s stored”.

1 Like

It shouldn’t happen AFAIK.

When the storagenode receives the piece, it saves it to the disk. If that was successful, it signs the piece hash and sends the signed hash back to the client.

This is the contract which says: now I am responsible for storing the data…

The client sends all the signed hashes back to the satellite, and the satellite double-checks the signatures. If they match, the contract is live.

There can be other problems in case of a full disk (like the sqlite database not being writable, so the process fails?). I am not sure if that really happens, I never tested it. DQ is possible, but the mentioned case shouldn’t happen, IMHO.

The satellite stores the metadata only if the storagenode 100% saved the data…

1 Like

So the free-space requirement for overhead is more of an assumption than a fact based on hard proof, tests etc.?
And if you store the DBs on another drive, they are not in danger.

There is a safety buffer of 500 MB free space.

ReportCapacityThreshold memory.Size   `help:"threshold below which to immediately notify satellite of capacity" default:"500MB" hidden:"true"`

If you are below, uploads should stop.

2 Likes

Yes… But it’s better to be safe, right?