How to speed up the used-space-filewalker?

The used-space-filewalker has now been running for 7 days. Not a big surprise, since this is my slowest node when it comes to I/O performance (external USB case). I wonder if it could benefit from moving the databases to an SSD? Or is it mostly I/O limited?

It’s not OR, it’s AND.

Advice is the same as in all previous threads on the same topic: disable atime updates, disable sync writes, increase the amount of available RAM, and use a performant filesystem that can take advantage of it.
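
For example, a minimal sketch of the atime part (the mount point is a placeholder; adjust to your setup, and make the Linux change permanent via /etc/fstab):

    # Linux: remount the storage filesystem without access-time updates
    sudo mount -o remount,noatime /mnt/storagenode

    # Windows/NTFS: disable last-access timestamp updates system-wide
    fsutil behavior set disablelastaccess 1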

Or do nothing, because who cares how long it runs for, if this is a separate pool and does not affect anything else?

2 Likes

The best you may be able to hope for is a resumable filewalker… so it doesn’t restart from scratch any time it gets interrupted.

2 Likes

I cannot change the file system (NTFS), and moving this node (11 TB) would take forever. So the only thing I can change is the database location.

If you can’t increase the RAM, try reducing the I/O on the storage drive by (see the config sketch after this list):

  • moving the databases and logs to some other drive (SSD, USB stick);
  • stopping the ingress by reducing the allocated space below the occupied space.
    After the FW finishes, stop the node, reallocate the correct space and restart the node without the FW.
    You can check my test results in the Tuning the Filewalker thread, in the last posts.
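
As a rough sketch, both changes together in config.yaml (paths and sizes are examples only; stop the node, move the existing .db files to the new directory, then restart):

    # move the databases and the log file off the storage drive
    storage2.database-dir: "/mnt/ssd/storagenode-dbs"
    log.output: "/mnt/ssd/storagenode-logs/node.log"

    # temporarily allocate less than the occupied space to stop ingress
    storage.allocated-disk-space: 5.00 TB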
1 Like

If the node has free space for uploads, stop uploads somehow (cut the node off at the firewall, disconnect the Ethernet cable, whatever) until the file walker finishes. Uploads compete with the file walker. An added benefit is that the resulting estimate of space taken will be more accurate.
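
As one possible sketch of the firewall route on a Linux host (assuming the default node port 28967; remove the rule again afterwards):

    # block incoming traffic to the node port so no new uploads arrive
    sudo ufw deny 28967

    # once the filewalker has finished, allow traffic again
    sudo ufw delete deny 28967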

Your way misses audits and drops the online score. My way doesn’t, but adds the stop-and-modify steps. So they are both OK, depending on what one wants.

1 Like

Depending on what you need the used-space stat for, you can get a pretty good estimate by just using df or Windows Disk Management to get the overall disk usage, then subtracting the size of the non-STORJ files on the disk (and maybe also the STORJ trash). This will be faster, assuming the number of STORJ blobs is far greater than the number of non-STORJ files.
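
A minimal sketch of that estimate on Linux (paths are placeholders for your mount point and whatever non-STORJ data lives on the disk):

    # total bytes used on the whole disk
    df -B1 --output=used /mnt/storagenode

    # bytes to subtract: non-STORJ files, and optionally the trash folder
    du -sb /mnt/storagenode/other-data
    du -sb /mnt/storagenode/storage/trash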

You may also try to disable the lazy mode; it will consume more IOPS, but it will finish faster.
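
If you want to try it, that should be the lazy-filewalker toggle in config.yaml (or the matching command-line flag), followed by a node restart:

    # run the filewalker at normal priority instead of the lazy low-priority mode
    pieces.enable-lazy-filewalker: false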

1 Like

I already did, but after 5 or 6 days without any filewalker message in my log I returned to the lazy one.

The node just did an auto update… So back to zero again… :poop:

1 Like

Pretty much the same here. My node is at 1.97.3, waiting for the 1.99.3 update. The filewalker (I disabled the lazy one) has been running for 5 days straight, without any message whatsoever. The HDD is running non-stop at 100%… In my node UI, paid storage is at 5 TB, however it shows 2.6 TB used and 3.9 TB free space. My HDD is 7.3 TB, but it’s full (less than 50 GB of free space). I don’t know what to do.

So I get that you have an 8 TB HDD, which is 7.3 TiB. In your run command or config you should specify 7 TB as allocated storage. This will stop the ingress, the FW will run faster, and you start freeing some space for overhead. I run my 8 TB nodes this way and have never had a problem. If you overallocate space, your node can end up disqualified.
The FW can take days, depending on your hardware, especially the CPU, RAM and HDD connection.
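
As a sketch of the numbers and the setting (values are examples): 8 TB is roughly 8 × 10^12 / 2^40 ≈ 7.28 TiB, so allocating 7 TB leaves a couple of hundred GB of headroom.

    # config.yaml
    storage.allocated-disk-space: 7.00 TB

    # or, for a docker node, in the run command
    -e STORAGE="7TB"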

1 Like

Allocated space for the SN has always been 6.5 TiB. I have no ingress at the moment, because the node knows the HDD ran out of (actual) space.
I didn’t do anything unusual (didn’t delete databases or perform any other operation, the HDD is intact, no errors in the logs); maybe the filewalker (taking too long to complete) was halted a few times during these 14 months and that could have ruined the database. I have a decent connection, enough memory and CPU. But it would be unfair if my node got disqualified (audit and suspension scores always at 100% and online >99%). A complete FW run may take weeks and I don’t know exactly why.

I don’t think so. Please explain why this should happen.

It’s all over the forum and in the official docs. But it’s a hypothesis, from what I understand. I believe nobody has actually been disqualified for this reason. Maybe someone wiser than me can explain it, but I don’t know if there is real proof of this happening.
Basically, they say the satellite records that you stored a piece, but in reality there was no space left and the piece wasn’t stored. When the sat audits or requests that piece, it finds out you don’t have it and will disqualify you because you lost it. You, I mean your node.
I didn’t dig deep into this, but I always wondered why the satellite would believe that you stored a piece if you didn’t, and you never sent a message to the sat saying “done, it’s stored”.

1 Like

It shouldn’t happen AFAIK.

When the storagenode receives the piece, it saves it to the disk. If that was successful, it signs the piece hash and sends the signed hash back to the client.

This is the contract which says: now I am responsible for storing the data…

The client sends all the signed hashes back to the satellite, and the satellite double-checks the signatures. If they match, the contract is live.

There can be other problems in case of a full disk (like the sqlite database not being writable, so the process fails?). I am not sure if that really happens, I never tested it. DQ is possible, but the mentioned case shouldn’t happen, IMHO.

The satellite stores the metadata only if the storagenode 100% saved the data…

1 Like

So the free-space requirement for overhead is more of an assumption than a fact based on hard proof, tests etc.?
And if you store the DBs on another drive, they are not in danger.

There is a safety buffer of 500 MB free space.

ReportCapacityThreshold memory.Size   `help:"threshold below which to immediately notify satellite of capacity" default:"500MB" hidden:"true"`

If you are below, uploads should stop.

2 Likes

Yes… But it’s better to be safe, right?