Current situation with garbage collection

That’s why I had asked this question here:
Q1 2024 Storj Town Hall Q&A - #3 by jammerdan

No answer. Well, that’s an answer too.

4 Likes

Source?
Also, what kind of fragmentation are we talking about?
Fragmentation on ZFS refers to free space, which I don’t believe applies to ext4.

So let's just look at some recent facts. The community has twice now asked for information on the status of the Terms and Conditions. Those requests for information were completely ignored.
Not a good look in and of itself. It is also now April 10th and we don't even have the status post summarising payments for the month, even though those seem to have concluded several days ago. I am a great believer in actions being louder than words, and frankly on several topics we just can't get any words from Storj. The T&C in particular are a running joke, with Storj staff citing them often but refusing to provide information on the updates/changes.

So, in general I agree with your post! But I still don’t think they care.

I mean it's not 50%, but some 10% for sure, maybe 20%, not so bad (well, okay, I have one node with 4TB average vs. 7TB used, so yeah, ~40% there), but they really took massive action to solve the SNOs' problems, I'm like shocked! The amount of care is most definitely abnormal lately, lol. There will probably be some summary, but these changes have to take effect first; I guess it's still too early.

Also worth noting: the absence of @John here as well as in the last Town Hall. Everything is left to speculation.

Same. No need for marketing blubber. At least not always.

If I compare reported used space vs. reported average on my nodes, then it is closer to 50% than to 10%. Very bad.

Source is me. A ZFS node showed 44% fragmentation within a few months of running. I have EXT4 nodes that show less than 3% and they’ve been running for the past couple of years. Experience (couple of decades in enterprise IT) shows that it’s not going to change any time soon.

Fragmentation here is file fragmentation. If there isn't a contiguous block of empty space for a file, the file gets split up (fragmented). To make it more specific for our use case: if a few hundred segments are written to a disk, they usually land one after the other. If a client deletes some of them, they are removed. When the next batch of segments, significantly smaller this time, comes in, it gets written into the now-empty blocks. Again, the client deletes some. A new client comes along and uploads a few hundred more segments, but this time the segments are bigger. The filesystem will do what filesystems do: try to allocate the files contiguously, see that it can't, and start allocating all over the disk.

As you can see, file fragmentation and empty-space fragmentation are in essence the same thing.
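To make that interleaving concrete, here is a toy first-fit allocator in Go. It is not how ZFS or ext4 actually allocate space (both are far smarter); it only illustrates the write/delete/write pattern described above, where the new, larger segments no longer fit into any single hole and get scattered.

```go
// Toy first-fit allocator illustrating how interleaved writes and deletes of
// different-sized segments lead to fragmented files. This is NOT how ZFS or
// ext4 allocate space; it only models the interleaving pattern described above.
package main

import "fmt"

const diskBlocks = 1000

type disk struct {
	used [diskBlocks]bool
}

// allocate grabs `size` free blocks, preferring one contiguous run (first fit).
// It returns the chosen block indexes and whether the file ended up fragmented.
func (d *disk) allocate(size int) (blocks []int, fragmented bool) {
	// First pass: look for a contiguous run of free blocks.
	run := 0
	for i := 0; i < diskBlocks; i++ {
		if d.used[i] {
			run = 0
			continue
		}
		run++
		if run == size {
			for j := i - size + 1; j <= i; j++ {
				d.used[j] = true
				blocks = append(blocks, j)
			}
			return blocks, false
		}
	}
	// Second pass: no contiguous run exists, so scatter the file across free blocks.
	for i := 0; i < diskBlocks && len(blocks) < size; i++ {
		if !d.used[i] {
			d.used[i] = true
			blocks = append(blocks, i)
		}
	}
	return blocks, true
}

func (d *disk) free(blocks []int) {
	for _, b := range blocks {
		d.used[b] = false
	}
}

func main() {
	var d disk
	var files [][]int

	// 1. Fill most of the disk with small segments.
	for i := 0; i < 90; i++ {
		b, _ := d.allocate(10)
		files = append(files, b)
	}
	// 2. The client deletes every other segment, leaving many 10-block holes.
	for i := 0; i < len(files); i += 2 {
		d.free(files[i])
		files[i] = nil
	}
	// 3. New, larger segments arrive; most holes are too small,
	//    so most new files get scattered across several holes.
	fragmented := 0
	for i := 0; i < 20; i++ {
		if _, frag := d.allocate(25); frag {
			fragmented++
		}
	}
	fmt.Printf("%d of 20 new segments ended up fragmented\n", fragmented)
}
```

Running it reports that 16 of the 20 larger segments ended up fragmented, even though more than half of the toy disk is free.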

1 Like

I understand that some of you need to vent but could you please do that in a different thread. I want to keep this one a bit more positive and focused on the remaining problems.

4 Likes

Well, and I had a 5TB node and fragmentation was 2%. See, this says absolutely nothing. Could be some wrong settings, could be anything. Could also be a non-issue. 40% fragmentation with 50% free space is not the same as 40% fragmentation with 5% free space.

My guess: you used block storage instead of datasets.

No. That is simply not true for ZFS.

It is not. Not even close. A bicycle is not in essence the same as a car just because both have wheels. I could have a 10TB ZFS pool where the 9.9TB of data itself is absolutely 0% fragmented because it was written sequentially, while the 0.1TB of free space is 70% fragmented because I did some DB activity at the end.

2 Likes

Ok, that explains at least some of the unpaid space. I would give it one more try, and if your node still doesn't get a bloom filter we should make a bug report out of it. I wonder if the satellite is maybe unable to send a bloom filter at the maximum size.

I will also take a look at the satellite logs. They are a bit noisy. I am not sure if I will find something useful, but I can at least try.

Yes, these gaps are expected. The reason is server-side copy / server-side move.

Imagine bloom filter creation running on the live database while a customer is doing a server-side move. It would look like this:

1 2 3 4 5   List of segments.
    ^       Bloom filter cursor going over all segments. Has seen 1, 2, 3 but not 4, 5.
5 1 2 3 4   List gets modified by the server-side move. Segment 5 gets a new segment ID that now sits at the beginning of the table.
        ^   Bloom filter cursor at the end. Has seen 1, 2, 3, 4 but not 5. Segment 5 is not part of the bloom filter and gets wiped.
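Here is a small Go sketch of that race, using a plain set as a stand-in for the bloom filter (a set is enough to show the effect). The segment names and the move callback are made up for illustration; this is a simplified model, not the satellite's actual code.

```go
// Toy illustration of the race described above: a "bloom filter" (modelled
// here as a plain set) is built by a cursor walking the segment table in key
// order while a server-side move renames a segment behind the cursor.
package main

import (
	"fmt"
	"sort"
)

func buildFilter(segments map[string]bool, moveDuringScan func(step int)) map[string]bool {
	filter := map[string]bool{}
	// The cursor walks the keys in sorted order, like a cursor over a table.
	keys := make([]string, 0, len(segments))
	for k := range segments {
		keys = append(keys, k)
	}
	sort.Strings(keys)

	for i, k := range keys {
		if segments[k] { // the segment still exists under this key
			filter[k] = true
		}
		if moveDuringScan != nil && i == 2 {
			moveDuringScan(i) // a customer does a server-side move mid-scan
		}
	}
	return filter
}

func main() {
	live := map[string]bool{"seg-1": true, "seg-2": true, "seg-3": true, "seg-4": true, "seg-5": true}

	// Server-side move: seg-5 is re-inserted under a new key that sorts
	// *before* the cursor position, so the cursor never visits it.
	move := func(int) {
		delete(live, "seg-5")
		live["seg-0-moved"] = true
	}

	filter := buildFilter(live, move)
	for k := range live {
		if !filter[k] {
			fmt.Printf("%s exists but is NOT in the filter -> would be garbage collected\n", k)
		}
	}

	// The same scan over a frozen snapshot cannot miss anything,
	// because nothing moves underneath the cursor.
	snapshot := map[string]bool{"seg-1": true, "seg-2": true, "seg-3": true, "seg-4": true, "seg-0-moved": true}
	filter = buildFilter(snapshot, nil)
	fmt.Printf("snapshot scan covers all %d segments: %v\n", len(snapshot), len(filter) == len(snapshot))
}
```

The moved segment exists but was never visited by the cursor, so it is missing from the filter and would be wiped; the scan over the frozen snapshot covers everything.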

That was an emergency, I think somewhere early last year. We had to restore all data from trash and run a special repair worker that was luckily able to recover all segments. Without the restore from trash, this bug could have been game-ending.

Solution for this problem: don't run bloom filter creation on a live database. Run it on a backup. The timestamp you see is the moment the backup was created, and then it takes almost a week until the bloom filter is created and hits your storage node.
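On the node side this is also why the filter's creation timestamp matters: the filter was built from a backup that is several days old, so any piece uploaded after that timestamp cannot possibly be in it and must not be trashed. A hedged sketch of that rule, with made-up type names (Piece, RetainFilter) rather than the real storagenode code:

```go
// Sketch of why the filter's creation timestamp matters on the node side.
// Names and layout are invented for illustration only.
package main

import (
	"fmt"
	"time"
)

type Piece struct {
	ID       string
	Uploaded time.Time
}

type RetainFilter struct {
	CreatedAt time.Time       // when the backup the filter was built from was taken
	keep      map[string]bool // stand-in for the real bloom filter
}

func (f RetainFilter) shouldTrash(p Piece) bool {
	if p.Uploaded.After(f.CreatedAt) {
		// The piece is newer than the snapshot the filter was built from:
		// it can never be in the filter, so skip it instead of trashing it.
		return false
	}
	return !f.keep[p.ID]
}

func main() {
	filter := RetainFilter{
		CreatedAt: time.Now().Add(-6 * 24 * time.Hour), // snapshot taken ~a week ago
		keep:      map[string]bool{"old-piece-kept": true},
	}
	pieces := []Piece{
		{"old-piece-kept", time.Now().Add(-30 * 24 * time.Hour)},
		{"old-piece-deleted-by-customer", time.Now().Add(-30 * 24 * time.Hour)},
		{"fresh-piece", time.Now().Add(-time.Hour)},
	}
	for _, p := range pieces {
		fmt.Printf("%-30s trash=%v\n", p.ID, filter.shouldTrash(p))
	}
}
```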

4 Likes

ZFS is COW (copy-on-write). If a file is changed, that means the entire file needs to be copied over to a second block of space on the disk, then the original file is removed. If there isn't a contiguous block of space on the drive, the file is split up. This is the reason why ~70% utilization is recommended on ZFS. This is also the reason why file fragmentation = free space fragmentation.

EXT4, on the other hand, allocates space based on extents: it reserves a contiguous block of free space and writes files into it. If a file doesn't fit into that extent, a new extent is chosen instead. Oversimplifying, but I'm not going to write a thesis on the subject.
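As a cartoon of the copy-on-write behaviour described above (ZFS really works at the block level and is far more sophisticated than this): every update writes the new copy into a free slot and only then frees the old one, so on a mostly full pool repeated updates scatter the free space.

```go
// Cartoon of copy-on-write: updates never overwrite in place.
// This is not how ZFS actually manages blocks; it only shows the effect.
package main

import "fmt"

// pool: 0 means the slot is free, any other number is the id of the record stored there.
var pool = []int{1, 2, 3, 4, 5, 6, 7, 8, 0, 0}

// cowUpdate writes the new copy of record id into the first free slot,
// then frees the slot the old copy lived in. Assumes a free slot exists.
func cowUpdate(id int) {
	newSlot := -1
	for i, v := range pool {
		if v == 0 {
			newSlot = i
			break
		}
	}
	for i, v := range pool {
		if v == id && i != newSlot {
			pool[i] = 0 // free the old copy
			break
		}
	}
	pool[newSlot] = id
}

func main() {
	fmt.Println("before:", pool)
	for _, id := range []int{2, 5, 7} { // rewrite a few records
		cowUpdate(id)
	}
	fmt.Println("after: ", pool) // the free slots are now scattered across the pool
}
```

Before the updates the two free slots sit together at the end; after three rewrites they are spread across the pool.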

1 Like

“Give it one more try” means wait for the next bloom filter, I assume? Or do I need to do something?

Do you need my node ID for looking at the logs? I can PM it if you want.

1 Like

I did inform the team already, but missing one bloom filter could just be bad luck. Missing 2 in a row would be enough for me to call it a bug and dial in on it.

Yes that would help. I can search for it in the logs.

2 Likes

So some of the fixes that are helping with space discrepancies… are in the newest versions. But until most/all SNOs upgrade to those new versions… there will be some fixes that can’t be turned-on/used (like larger filters)? So… it may take a month or two for node minimum-versions to be increased enough… then still a few weeks of new-bloom-filter-running to trim away the fat?

If I understand things right… SNOs could reasonably expect the space issues to be mostly-solved in about 3 months?

That would be OK by me: I can wait :calendar:

3 Likes

Is there already an implementation for resuming filewalker from the same place after restarting the node?

Yes, starting with version v1.101 that should work. I wasn't able to test it myself; I donated a lot of memory to ZFS read caching and it makes GC super fast.
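For anyone curious what "resuming from the same place" boils down to, here is a minimal sketch of the idea: persist the last fully processed piece-ID prefix and skip everything up to it after a restart. The checkpoint file name and layout are invented for illustration; the real storagenode keeps its progress differently.

```go
// Minimal sketch of a resumable filewalker: remember the last fully
// processed prefix and skip everything up to it on the next run.
package main

import (
	"fmt"
	"os"
	"sort"
	"strings"
)

const checkpointFile = "gc-progress.txt" // hypothetical checkpoint location

func loadCheckpoint() string {
	b, err := os.ReadFile(checkpointFile)
	if err != nil {
		return "" // no checkpoint yet: start from the beginning
	}
	return strings.TrimSpace(string(b))
}

func saveCheckpoint(prefix string) {
	_ = os.WriteFile(checkpointFile, []byte(prefix), 0o644)
}

func main() {
	// Stand-in for the two-character piece directories on disk.
	prefixes := []string{"2a", "4f", "7c", "aa", "d3", "ff"}
	sort.Strings(prefixes)

	done := loadCheckpoint()
	for _, p := range prefixes {
		if p <= done {
			continue // already processed before the restart
		}
		fmt.Println("walking prefix", p) // real code would check every piece here
		saveCheckpoint(p)
	}
}
```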

1 Like

Should it work for the “usual” filewalker or for the lazy filewalker? Or both?

@BrightSilence good news. The developer team has already caught up on it. The maximum bloom filter size should be 4,100,000,000 bytes. The problem is a byte conversion at the end that somehow adds 3 extra bytes, and that is above the limit. So yeah, the bigger nodes don't get any bloom filters at the moment. The developer team is working on a fix.

There is even some chatter about resending the bloom filter that you missed last weekend. So if we are lucky, we can still download it and send it to the storage nodes.
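As an illustration of that failure mode (the limit and the 3 bytes are taken from the post above, everything else is made up): the filter payload is sized right up to the limit, a final conversion adds a few bytes, and a strict size check then rejects the whole message.

```go
// Illustrative sketch, not the real satellite code: a filter sized exactly to
// the limit plus a few bytes of conversion overhead fails the size check, so
// the node never receives a filter.
package main

import "fmt"

const maxBloomFilterBytes = 4_100_000_000 // limit quoted in the post

// wrap stands in for whatever final conversion adds a small fixed overhead
// (the post mentions 3 extra bytes).
func wrap(payload int64) int64 {
	const overhead = 3
	return payload + overhead
}

func main() {
	payload := int64(maxBloomFilterBytes) // filter sized exactly to the limit
	msg := wrap(payload)
	if msg > maxBloomFilterBytes {
		fmt.Printf("message is %d bytes, %d over the limit: filter not sent\n",
			msg, msg-maxBloomFilterBytes)
	}
}
```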

7 Likes

I believe it should work for both. Again, with my storage node I am not able to verify it.

2 Likes

But storj never changes piece files… So I guess the usual ZFS rules don’t apply here.

I have not seen much of a difference between ZFS and NTFS when almost full. I never used ext4, so I cannot compare.