How are your nodes getting any ingress if you haven’t run used-space for a while?
You can set a large node size like 100TB and let the node detect when it is near full.
So if I have a 20TB drive and set the node size at 100TB it somehow magically bypasses the space calculations and used space isn’t needed anymore?
Wow, mind-blowing… have you tried it for a long time? Is it safe?
Look guys, what do you have to lose? Yes, practically speaking it’s safe. Besides, consider that this is test data, and there is no impediment to making a new node with a new identity - the only hitch is the holdback. So if you f# it up, try again; the data will flow and you’ll get credit for it, eventually.
Even if you run totally out of space, you could nuke the trash directory manually - do you really think they would hold an audit against you on test data? There really is no vetting on this.
@Julio: My questions are pretty simple, yet they are never answered.
If you have a 20TB drive and set the allocated space to 100TB, with all of the issues we have had with space not getting properly updated (note to reader: this isn’t whining), how is the node getting any ingress, since the available space is only checked on node restart (allocated X, but only Y is available)?
The free space isn’t just a cosmetic item on a dashboard. It’s the free space reported back to the satellite, and ingress stops when the node thinks it is full even though it isn’t. How does a bigger allocation bypass this?
I believe that the node checks the available space on the drive against the allocated space in the config.yaml during startup. If you set it higher than what is actually there, it will throw an error in the log and report back the actual value instead.
The point here is that this check happens only during startup. So you need to restart your node about weekly to “make” available space.
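Roughly what I think that startup check does, as a sketch (this is not the actual storagenode code; the names and numbers are made up to illustrate the “allocated X but Y is available” behaviour):

```go
package main

import "fmt"

const GB = int64(1_000_000_000)
const TB = 1000 * GB

// effectiveAllocation is a sketch of the startup-only check described above,
// not real storagenode code: the node caps the configured allocation at what
// the drive can physically hold (what is already used plus what the OS says
// is still free). Because this runs only at startup, freshly freed space is
// not noticed until the next restart.
func effectiveAllocation(configured, used, osFree int64) int64 {
	physicalLimit := used + osFree
	if configured > physicalLimit {
		// Roughly where the "allocated X but Y is available" log line comes from.
		fmt.Printf("allocated %d GB but only %d GB is available\n",
			configured/GB, physicalLimit/GB)
		return physicalLimit
	}
	return configured
}

func main() {
	// Example: a 20TB drive that is already full, allocation bumped to 25TB.
	// The effective allocation stays pinned at roughly the drive size.
	fmt.Println(effectiveAllocation(25*TB, 20*TB, 100*GB)/GB, "GB effective")
}
```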
The only thing that matters is what the satellite knows you have. The last value on the “average disk space used this month” graph - delayed as it may be - is what it works with. If your allocation minus whatever your node thinks it has (no matter how f#d up that number may be - the used-of-total-disk-space graph) still leaves room, data flows in.
No, you’ve got it backwards. The average has nothing to do with the free space reported to the satellite.
It just goes by what it has: your current stats. If you allocate +1 TB, then the free space stat increases by +1 TB. Really, it’s that simple. I guess I just don’t get where you’re commin’ from; your node only tries to compute used space. And the only way you really know what your true used space should approximately be - aside from the delay of the satellite intermittently reporting what the current verified value really is - is by looking at the latest data point on the “average disk space used this month” graph. And then you can cross your fingers and hope that in about 7 days to two weeks, garbage collection etc. actually carries out the deletes on your node to match - lol.
In a perfect world… used space should almost exactly match the latest data point showing in the avg. disk space graph. Every day… all the time. But don’t expect that, cuz the volume of data being deleted hourly by TTL and the rapid inflow of replacement data can be enormous, and the stats are only updated in that db upon proper completion of the filewalkers etc., and then reflected an hour or two later.
Not that simple. I have a 20TB drive and allocated 20TB to it. The node has filled up the whole drive and is now reporting back to the satellite that it doesn’t have any more free space. I edit config.yaml and set the allocation to 25TB. I restart the node and the allocation is still the same: 20TB. The trash-cleanup didn’t update the space, so a week after a huge delete the node still thinks that it has no free space. Me adding +5TB didn’t increase the free space.
No, the node only tries to compute free space, which is what is reported to the satellite so that it can get ingress. The satellite does not care how much used space I have, other than for cosmetic (pretty graph) reasons.
No, the true used space is what is on the drive, as reported by the OS. The average is just a graph, not used for anything else (no calculations, no payouts, no anything).
No, the stats will not match as long as bugs keep creeping in (note to reader: this is not whining).
Interesting… I have yet to run into this condition.
Deleting the trash directory manually, even though in most cases it will not cause any issues, is not as “fool proof” as you make it out to be.
When a piece is requested from the node and, for any reason, is found in the trash, the node will reply with the piece anyway and indicate that it was found in the trash.
This means that if you delete the trash manually and one of those “accidentally trashed” pieces happens to be the target of an audit (audits are conducted on all satellites, including saltlake), it will result in a failed audit. Sure, it may still be unlikely that you get enough audits against accidentally trashed pieces (which are already rare enough) to be DQ’ed, but the trash is there for a reason.
That’s how it works.
If a node is full and you delete the databases and restart the node, the node will complain that the allocation is less than the minimum (500GB) and will not start. The allocation is used to calculate the free space (allocation - used - trash = free). Since the node doesn’t know how much data it has, it goes with the free space reported by the OS (exactly the same as setting the allocation to 100TB on a 1TB drive - it will not magically grow the drive). There isn’t any free space = the true allocation is very small (i.e. 5GB) = the node complains that the allocation is less than the minimum = the node doesn’t start.
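Written out as a sketch (just the formula and the minimum check from the paragraph above, not the real implementation):

```go
package main

import (
	"errors"
	"fmt"
)

const GB = int64(1_000_000_000)

// The 500GB floor mentioned above.
const minimumAllocation = 500 * GB

// freeSpace applies the formula from the post: allocation - used - trash = free.
func freeSpace(allocation, used, trash int64) int64 {
	return allocation - used - trash
}

// checkAllocation mimics the startup complaint: an (effective) allocation
// below the minimum means the node refuses to start.
func checkAllocation(allocation int64) error {
	if allocation < minimumAllocation {
		return errors.New("allocated space is less than the required minimum (500GB)")
	}
	return nil
}

func main() {
	// Full drive, databases deleted: the node falls back to what the OS
	// reports as free (say 5GB), and that becomes the effective allocation.
	effective := 5 * GB
	fmt.Println("free:", freeSpace(effective, 0, 0)/GB, "GB")
	fmt.Println("startup:", checkAllocation(effective)) // fails the minimum check
}
```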
Guess you missed my point. I’m not concerned about being potentially DQ’d on SLC: it’s test data, real data will be funneled through the US sat., and there’s literally no vetting period now. So if you want more trash test data, just start a new node.
That’s sensible, Balage76.
Have you ever considered having an option like retain.concurrency in the config, in order to limit the max number of simultaneous filewalkers?
Even if I limit retain.concurrency to 1, there are still many FWs (plus the TTL delete) which run simultaneously after startup.
Yes, it is possible to add an SSD cache on some systems, but there are many systems where the best-case scenario is having the databases on the OS SSD and the data on a different HDD.
I know that it has a drawback, but from the HDD’s point of view, I believe it is better to run the tasks one after another than simultaneously.
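For what it’s worth, a limit like that is usually just a semaphore around the jobs. This is only a generic illustration of what a “max simultaneous filewalkers” option would do - it is not taken from the storagenode code, and the job names are made up:

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// runWithLimit runs the given jobs but lets at most `limit` of them execute
// at the same time, using a buffered channel as a counting semaphore.
func runWithLimit(jobs []string, limit int) {
	sem := make(chan struct{}, limit)
	var wg sync.WaitGroup
	for _, name := range jobs {
		wg.Add(1)
		go func(name string) {
			defer wg.Done()
			sem <- struct{}{}        // acquire a slot
			defer func() { <-sem }() // release it when done
			fmt.Println("running:", name)
			time.Sleep(100 * time.Millisecond) // stand-in for a disk scan
		}(name)
	}
	wg.Wait()
}

func main() {
	// Made-up job names; with limit=1 the HDD only ever sees one scan at a time.
	runWithLimit([]string{"used-space", "gc-retain", "trash-cleanup", "ttl-delete"}, 1)
}
```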

So if I have a 20TB drive and set the node size at 100TB it somehow magically bypasses the space calculations and used space isn’t needed anymore?
No, the idea is to forgo having the node accurately compute its used disk space and just let it run without this knowledge. If there is free space (against both how much you allocated and what it sees on the filesystem), the node should detect this and allow ingress; otherwise it stops. Let the node collect garbage and delete data as normal.
For example, as of v1.109.2, deleting pieces from trash does not update the stats in real time, so if the node gets interrupted/restarted during this process you will need to invoke a used-space walker every time this occurs. Depending on the number of pieces on your node and the number of bloom filters we have been seeing recently, it’s easier to just give up on having the node track the space, to avoid running used-space altogether.
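What that boils down to, as a rough sketch (again, not the real implementation, just the decision the node effectively ends up making when you over-allocate; the mount point and headroom figure are made up, and syscall.Statfs is Linux-only):

```go
//go:build linux

package main

import (
	"fmt"
	"syscall"
)

// osFreeBytes asks the filesystem how much space is actually left at path.
// With a huge allocation, this number - not the node's own bookkeeping - is
// what effectively limits ingress.
func osFreeBytes(path string) (int64, error) {
	var st syscall.Statfs_t
	if err := syscall.Statfs(path, &st); err != nil {
		return 0, err
	}
	return int64(st.Bavail) * int64(st.Bsize), nil
}

func main() {
	const headroom = 100 * int64(1_000_000_000) // keep ~100GB spare; made-up number
	free, err := osFreeBytes("/mnt/storagenode") // hypothetical mount point
	if err != nil {
		panic(err)
	}
	if free > headroom {
		fmt.Println("filesystem still has room, ingress can continue")
	} else {
		fmt.Println("filesystem is nearly full, stop accepting ingress")
	}
}
```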

Wow, mind-blowing… have you tried it for a long time? Is it safe?
I do this on ZFS using quotas and it works fine.