The trash is unpaid?

Depends on the extension (it was recently adjusted). It’s either the full bloom filter, or the full bloom filter (for one satellite) plus some very small metadata (like the checksum).

Theoretically the RAM, but I am not worried. 100MB is probably the right size for a 50-60 TB disk or bigger…

Also: it’s slower to check every piece against the bloom filter.

It’s a bigger challenge on the Satellite side to generate the BF, but there are a couple of options to scale it…

3 Likes

You’d need to generate one of those filters to see how it works out. The current size is still too small. If the satellite can reliably generate one at a larger size, then keep it at that size. Having a bloom scan take 2 days is still the lesser of two evils compared to it never matching a portion of the data it should.

At 6M pieces per TB that still comes out to 34 MB. The current size is still too small. There aren’t any wrong calculations.
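For reference, here is a minimal sketch of the standard bloom filter sizing formula, assuming a ~10% false-positive target and ~6 million pieces per TB (both figures are illustrative assumptions, not confirmed Storj parameters). A node around 10 TB lands in the same ballpark as the 34 MB mentioned above:

import math

def bloom_filter_size_mb(num_pieces: int, false_positive_rate: float = 0.1) -> float:
    # Standard formula: m = -n * ln(p) / (ln 2)^2 bits,
    # where n is the element count and p the target false-positive rate.
    bits = -num_pieces * math.log(false_positive_rate) / (math.log(2) ** 2)
    return bits / 8 / 1024 / 1024

pieces_per_tb = 6_000_000  # assumed average piece count per TB of stored data
for tb in (1, 7, 10, 50):
    size = bloom_filter_size_mb(tb * pieces_per_tb)
    print(f"{tb:>2} TB (~{tb * pieces_per_tb / 1e6:.0f}M pieces): {size:.1f} MB")

With these assumptions it prints roughly 3.4 MB for 1 TB and about 34 MB for 10 TB.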

If your 8TB/10TB figures are real, then there is seriously something wrong with your node, especially if it’s still the same 6 months later.

2 TB of data will not generate 8 TB of trash unless a customer keeps uploading and deleting data without a TTL within a short time frame. And then more people would see such figures.

1 Like

That’s true, but it is not like 1/4 for each satellite. I guess US1 is 70-80% of my overall pieces (real numbers need some time to count). So probably only US1 needs significantly bigger bloom filters.

They are not, which is why they were accompanied by “Example:”.

Tangential noob question: how do I know if a bloom filter is “happening” on my node?

Feeling sharing time: the situation with trash seems to have gotten worse. Maybe because my node sizes are bigger than when I started (largest are about 7TB), but I suspect it’s more about what the satellites are telling my node to do.

And by “worse” I mean:

  • more of my dashboard space is unpaid trash (sometimes over 1TB on a 7TB node)
  • more of my disk space is taken up by files that are not reflected in dashboard at all (sometimes also up to 1TB on a 7TB node)

And this “worse” behavior makes me feel bad: partly because it reduces payouts somewhat, but also because it makes me feel like I’m wasting my time and effort.

This thread already has 172 posts and you argue with a non-existent “example”?

Like the other guy with his 500MB node who claims this is a big problem for him…

Has something changed in the world that I’m not aware of? It used to be that you could use an imaginary example to try and explain something. Are we sticking strictly to “if it’s not peer reviewed and published in an official publication we aren’t talking about it”?

Because if that’s the case, please let me know and I will only respond with “Yes.” and “No.” from now on because people seem to be a lot more interested in commas and grammar than actual substance.

You’ll see log entries about retain starting and finishing. When you receive a bloom filter, it’s saved in the retain folder; after completion, it’s deleted. Bloom filters come once or more per week… when Storj doesn’t halt them.
You can read these in reverse order to get the chronological order. :grin:
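If it helps, a minimal sketch of pulling those entries out of the node log (the log path is just an example; adjust it to your setup):

# Print bloom filter / retain activity from the storagenode log.
# LOG_PATH is an example; point it at your actual log file.
LOG_PATH = "/mnt/storj/storagenode/storagenode.log"

with open(LOG_PATH, encoding="utf-8", errors="replace") as log:
    for line in log:
        # retain lines cover both receiving a bloom filter and the GC pass result
        if "retain" in line.lower():
            print(line.rstrip())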

/mon/ps is your friend: Guide to debug my storage node, uplink, s3 gateway, satellite

Edit: Well, reading the logs might be easier. I am making this a bit too complicated, I guess.

1 Like

For 7TB or smaller nodes, the bloom filter size is not a problem. Maybe the filewalkers are not running like they should?

According to my statistics unpaid data for the small nodes is 5% atm.

I don’t know the proportion of data that is still on the old RS numbers (or any intermediate versions tested). I assume all of these pieces contribute to what is aggregated into the average segment size.

I really hope Storj will publish a blog post on optimizing BF generation, this sounds like a very interesting topic!

Of course it’s a big problem with my $0.19 earnings so far. I’m sure you can guess what the real purpose of a 500 GB node is.

It will be sent, yes. But if these pieces were already deleted by the expiration collector, they will be skipped. However, there could be a race condition and one of these filewalkers could win earlier. I would not expect that to happen though; the expiration collector runs much more often and removes pieces right away.

It is either NOT enabled or NOT working in practice!

Some time ago (last year) I vacuumed orders.db from a few hundred MB (300-700) down to just 0.032 MB on each of my nodes.
That db was actually empty and had not been in real use for a very long time, since orders storage moved from the db to separate files in the /orders/ folder long ago. But it was never vacuumed by Storj until I did it manually (using the sqlite3 utility).

This year the same thing happened with bandwidth.db, after the devs changed the way bandwidth info is stored in that db and about 95% of its records were deleted during the transition.
But the bandwidth.db file size still remained in the 70-90 MB range a few months after that change, until I vacuumed it manually again about 2 weeks ago and it instantly dropped from 70-90 MB to just ~1.5 MB.
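For anyone who wants to do the same, a minimal sketch of a manual vacuum (stop the node first; the db path is just an example). It does the same thing as running VACUUM; in the sqlite3 shell:

import sqlite3

# Run this only while the storagenode is stopped; the path is an example.
DB_PATH = "/mnt/storj/storagenode/storage/bandwidth.db"

con = sqlite3.connect(DB_PATH)
con.execute("VACUUM")  # rewrites the file, releasing free pages (the "dust")
con.close()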

I will try to vacuum a copy of the biggest (>8 GB) piece_expiration.db I have to see how much “dust” it now contains, and will post the result here later…

1 Like

It’s now finished (took about an hour to vacuum it with sqlite3).
piece_expiration.db was squeezed from 8823 MB down to 8435 MB.
So 388 MB of “dust” removed. That doesn’t seem like much (in % terms it’s only ~4%), but I think this is only because the massive deletion of files by the TTL collector has not yet begun on this node.
If I understand correctly, these are the records responsible for this process in the logs:

2024-06-30T16:21:09+03:00	INFO	collector	collect	{"count": 39}
2024-06-30T19:26:44+03:00	INFO	collector	collect	{"count": 5078}
2024-06-30T22:48:37+03:00	INFO	collector	collect	{"count": 19019}
2024-07-01T01:21:14+03:00	INFO	collector	collect	{"count": 66}
2024-07-01T04:22:37+03:00	INFO	collector	collect	{"count": 1047}
2024-07-01T07:21:13+03:00	INFO	collector	collect	{"count": 59}
2024-07-01T10:21:20+03:00	INFO	collector	collect	{"count": 106}
2024-07-01T13:21:41+03:00	INFO	collector	collect	{"count": 210}
2024-07-01T17:26:41+03:00	INFO	collector	collect	{"count": 39409}
2024-07-01T21:41:04+03:00	INFO	collector	collect	{"count": 77809}
2024-07-01T23:02:09+03:00	INFO	collector	collect	{"count": 17178}
2024-07-02T01:25:51+03:00	INFO	collector	collect	{"count": 2213}
2024-07-02T04:23:08+03:00	INFO	collector	collect	{"count": 1177}
2024-07-02T07:22:27+03:00	INFO	collector	collect	{"count": 317}
2024-07-02T10:22:06+03:00	INFO	collector	collect	{"count": 262}
2024-07-02T13:22:17+03:00	INFO	collector	collect	{"count": 329}
2024-07-02T16:22:05+03:00	INFO	collector	collect	{"count": 251}
2024-07-02T19:22:47+03:00	INFO	collector	collect	{"count": 813}
2024-07-02T22:23:05+03:00	INFO	collector	collect	{"count": 526}
2024-07-03T01:23:59+03:00	INFO	collector	collect	{"count": 1223}
2024-07-03T04:21:12+03:00	INFO	collector	collect	{"count": 139}
2024-07-03T07:21:34+03:00	INFO	collector	collect	{"count": 1186}
2024-07-03T10:21:41+03:00	INFO	collector	collect	{"count": 1335}
2024-07-03T13:21:55+03:00	INFO	collector	collect	{"count": 1391}
2024-07-03T16:21:19+03:00	INFO	collector	collect	{"count": 271}
2024-07-03T19:21:25+03:00	INFO	collector	collect	{"count": 264}
2024-07-03T22:23:26+03:00	INFO	collector	collect	{"count": 2287}
2024-07-04T01:21:26+03:00	INFO	collector	collect	{"count": 294}
2024-07-04T04:21:13+03:00	INFO	collector	collect	{"count": 166}
2024-07-04T07:21:15+03:00	INFO	collector	collect	{"count": 108}
2024-07-04T10:22:13+03:00	INFO	collector	collect	{"count": 1056}
2024-07-04T13:21:16+03:00	INFO	collector	collect	{"count": 99}
2024-07-04T16:21:36+03:00	INFO	collector	collect	{"count": 409}
2024-07-04T19:27:59+03:00	INFO	collector	collect	{"count": 5620}
2024-07-04T22:27:48+03:00	INFO	collector	collect	{"count": 6743}
2024-07-05T01:27:02+03:00	INFO	collector	collect	{"count": 4696}
2024-07-05T04:27:55+03:00	INFO	collector	collect	{"count": 6732}
2024-07-05T07:24:43+03:00	INFO	collector	collect	{"count": 3102}
2024-07-05T10:24:52+03:00	INFO	collector	collect	{"count": 2697}
2024-07-05T13:25:30+03:00	INFO	collector	collect	{"count": 3179}
2024-07-05T16:28:36+03:00	INFO	collector	collect	{"count": 7208}
2024-07-05T19:29:53+03:00	INFO	collector	collect	{"count": 7046}
2024-07-05T22:38:57+03:00	INFO	collector	collect	{"count": 16146}
2024-07-06T01:31:34+03:00	INFO	collector	collect	{"count": 9962}

So only about 200k files were deleted by TTL over the last week, while this piece_expiration.db now contains something like 30-40 million records; the 200k deleted last week is less than 1% of that, yet the manual vacuum removed >4%.
So I think this database has never been vacuumed before either. And after the massive TTL deletions expected to happen very soon, it will also be full of “dust”, as happened with orders.db and bandwidth.db before.
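If you want to check how much is left in there, here is a minimal sketch for counting the remaining TTL records; the table and column names (piece_expirations, piece_expiration) are assumptions, so verify them with .schema in the sqlite3 shell first:

import sqlite3

# Count remaining TTL records and how many of them have already expired.
# Table/column names are assumptions; check with ".schema" first.
con = sqlite3.connect("/mnt/storj/storagenode/storage/piece_expiration.db")
total = con.execute("SELECT COUNT(*) FROM piece_expirations").fetchone()[0]
expired = con.execute(
    "SELECT COUNT(*) FROM piece_expirations WHERE piece_expiration < datetime('now')"
).fetchone()[0]
print(f"records: {total}, already expired: {expired}")
con.close()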

It’s just because you’re using a ridiculously small node for the test!

2024-06-20T01:48:40Z    INFO    retain  Moved pieces to trash during retain     {"Process": "storagenode", "cachePath": "config/retain", "Deleted pieces": 7162, "Failed to delete": 0, "Pieces failed to read": 0, "Pieces count": 216616, "Satellite ID": "12L9ZFwhzVpuEKMUNUqkaTLGzwY9G24tbiigLiXpmZWKwmcNDDs", "Duration": "6m54.754044s", "Retain Status": "enabled"}
2024-06-21T15:00:20Z    INFO    retain  Moved pieces to trash during retain     {"Process": "storagenode", "cachePath": "config/retain", "Deleted pieces": 1205, "Failed to delete": 0, "Pieces failed to read": 0, "Pieces count": 210036, "Satellite ID": "12L9ZFwhzVpuEKMUNUqkaTLGzwY9G24tbiigLiXpmZWKwmcNDDs", "Duration": "2m50.3943145s", "Retain Status": "enabled"}
2024-06-22T13:55:14Z    INFO    retain  Moved pieces to trash during retain     {"Process": "storagenode", "cachePath": "config/retain", "Deleted pieces": 230, "Failed to delete": 0, "Pieces failed to read": 0, "Pieces count": 209148, "Satellite ID": "12L9ZFwhzVpuEKMUNUqkaTLGzwY9G24tbiigLiXpmZWKwmcNDDs", "Duration": "1m9.0478738s", "Retain Status": "enabled"}

GC processed a BF for only ~200k files there, and moved just a few thousand of them to trash. Of course it is not surprising that it handled such a small, simple task very quickly.
Yet you present it as if everything were fine and there were no significant performance problems.
Whereas there are HUGE performance issues with all Storj filewalkers, including GC. On large real nodes, which now contain tens of millions of files, one complete garbage collector or used-space filewalker pass takes DAYS (in the worst cases even a few WEEKS), not minutes as you try to show.
It looks something like this for nodes with 30-90 million files (and that is only 5-15 TB of real data from the current Storj network):

2024-06-13T13:49:21+03:00	INFO	retain	Moved pieces to trash during retain	{"cachePath": "D:\\Storj_Data\\Storage Node/retain", "Deleted pieces": 85549, "Failed to delete": 0, "Pieces failed to read": 0, "Pieces count": 2431713, "Satellite ID": "12L9ZFwhzVpuEKMUNUqkaTLGzwY9G24tbiigLiXpmZWKwmcNDDs", "Duration": "1h2m15.6866686s", "Retain Status": "enabled"}
2024-06-13T18:50:59+03:00	INFO	retain	Moved pieces to trash during retain	{"cachePath": "D:\\Storj_Data\\Storage Node/retain", "Deleted pieces": 2802, "Failed to delete": 0, "Pieces failed to read": 0, "Pieces count": 466022, "Satellite ID": "121RTSDpyNZVcEU84Ticf2L1ntiuUimbWgfATz21tuvgk3vzoA6", "Duration": "21m31.4692929s", "Retain Status": "enabled"}
2024-06-14T10:30:12+03:00	INFO	retain	Moved pieces to trash during retain	{"cachePath": "D:\\Storj_Data\\Storage Node/retain", "Deleted pieces": 914, "Failed to delete": 0, "Pieces failed to read": 0, "Pieces count": 464037, "Satellite ID": "121RTSDpyNZVcEU84Ticf2L1ntiuUimbWgfATz21tuvgk3vzoA6", "Duration": "7m33.4334479s", "Retain Status": "enabled"}
2024-06-20T18:31:40+03:00	INFO	retain	Moved pieces to trash during retain	{"cachePath": "D:\\Storj_Data\\Storage Node/retain", "Deleted pieces": 6772007, "Failed to delete": 0, "Pieces failed to read": 0, "Pieces count": 28216438, "Satellite ID": "1wFTAgs9DP5RSnCqKV1eLf6N9wtk4EAtmN5DpSxcs8EjT69tGE", "Duration": "150h44m39.3203172s", "Retain Status": "enabled"}
2024-06-21T15:47:08+03:00	INFO	retain	Moved pieces to trash during retain	{"cachePath": "D:\\Storj_Data\\Storage Node/retain", "Deleted pieces": 47032, "Failed to delete": 0, "Pieces failed to read": 0, "Pieces count": 2471054, "Satellite ID": "12L9ZFwhzVpuEKMUNUqkaTLGzwY9G24tbiigLiXpmZWKwmcNDDs", "Duration": "36h24m47.0112751s", "Retain Status": "enabled"}
2024-06-23T05:59:17+03:00	INFO	retain	Moved pieces to trash during retain	{"cachePath": "D:\\Storj_Data\\Storage Node/retain", "Deleted pieces": 17936, "Failed to delete": 0, "Pieces failed to read": 0, "Pieces count": 2450671, "Satellite ID": "12L9ZFwhzVpuEKMUNUqkaTLGzwY9G24tbiigLiXpmZWKwmcNDDs", "Duration": "28h40m45.1656485s", "Retain Status": "enabled"}
2024-06-27T07:18:13+03:00	INFO	retain	Moved pieces to trash during retain	{"cachePath": "D:\\Storj_Data\\Storage Node/retain", "Deleted pieces": 995449, "Failed to delete": 0, "Pieces failed to read": 0, "Pieces count": 19788591, "Satellite ID": "12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S", "Duration": "168h57m38.5748511s", "Retain Status": "enabled"}

~50 million files processed and ~8 million moved from /blobs/ to /trash/.

Note that processing the satellite with only 460k files/pieces stored took 7 and 21 minutes over two runs, not far from your data for the satellite with 200k files. But for the satellite with 28M files (Saltlake) it was 150 hours!
And for another satellite with almost 20M files (US1) it was 168 hours: one full WEEK for GC to finish just this one BF for one satellite alone!

Because the relationship here is nonlinear: you can’t take the time the garbage collector needed for 200k files and say that, for example, 40 million files will take “only” ~200 times longer (40,000k / 200k = 200), i.e. roughly 15 hours instead of 5 minutes.

Because on small nodes, after a while, all or almost all of the metadata for the stored files, which is very actively accessed by GC (and the other filewalkers and the main storagenode process), ends up in the OS cache in RAM. From that point on, processing speeds up many times over, or even by orders of magnitude, because GC now reads mostly from the RAM cache and not from the slow disk. Whereas on large nodes, the volume of metadata is simply too big to fit in RAM caches of any reasonable size (in the example above, the metadata size on disk is approaching ~100 GB while the disk RAM cache was in the 5-8 GB range), so almost all of that data continues to be read from a slow HDD even with fully “warm” caches. The caches still work, just with a very low hit ratio. As a result, filewalker runs take days instead of the hours a simple linear extrapolation from a small node’s data would suggest.
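To illustrate the effect, a rough back-of-the-envelope model; all latencies and hit ratios below are made-up illustrative numbers, not measurements:

def gc_pass_hours(pieces: int, hit_ratio: float,
                  cached_us: float = 50.0, seek_ms: float = 10.0) -> float:
    # Very rough model: one metadata lookup per piece, served either from the
    # RAM cache (~tens of microseconds) or by an HDD seek (~10 ms).
    seconds = pieces * (hit_ratio * cached_us / 1e6 +
                        (1 - hit_ratio) * seek_ms / 1e3)
    return seconds / 3600

# Small node: metadata fits in RAM, cache hit ratio close to 1.
print(f"200k pieces, 99% cache hits: {gc_pass_hours(200_000, 0.99):.2f} h")
# Large node: metadata is much bigger than the cache, most lookups hit the disk.
print(f"40M pieces, 20% cache hits:  {gc_pass_hours(40_000_000, 0.20):.0f} h")

The second case comes out to roughly 90 hours, i.e. several days, even before any competing customer traffic is added on top.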

P.S.
Here is another GC log from another real large node. It has slightly more files and was also under high load from upload traffic during the GC runs, so GC took even longer to finish due to high contention for HDD IOPS with the main storagenode process serving customer (and mostly Saltlake synthetic) uploads.

2024-06-13T09:54:18+03:00	INFO	retain	Moved pieces to trash during retain	{"cachePath": "C:\\Program Files\\Storj\\Storage Node/retain", "Deleted pieces": 31404, "Failed to delete": 0, "Pieces failed to read": 0, "Pieces count": 400686, "Satellite ID": "121RTSDpyNZVcEU84Ticf2L1ntiuUimbWgfATz21tuvgk3vzoA6", "Duration": "1h26m32.2431335s", "Retain Status": "enabled"}
2024-06-13T13:10:06+03:00	INFO	retain	Moved pieces to trash during retain	{"cachePath": "C:\\Program Files\\Storj\\Storage Node/retain", "Deleted pieces": 95901, "Failed to delete": 0, "Pieces failed to read": 0, "Pieces count": 1610581, "Satellite ID": "12L9ZFwhzVpuEKMUNUqkaTLGzwY9G24tbiigLiXpmZWKwmcNDDs", "Duration": "1h28m0.733484s", "Retain Status": "enabled"}
2024-06-13T19:31:51+03:00	INFO	retain	Moved pieces to trash during retain	{"cachePath": "C:\\Program Files\\Storj\\Storage Node/retain", "Deleted pieces": 3117, "Failed to delete": 0, "Pieces failed to read": 0, "Pieces count": 370551, "Satellite ID": "121RTSDpyNZVcEU84Ticf2L1ntiuUimbWgfATz21tuvgk3vzoA6", "Duration": "2h48m15.3468187s", "Retain Status": "enabled"}
2024-06-14T20:11:09+03:00	INFO	retain	Moved pieces to trash during retain	{"cachePath": "C:\\Program Files\\Storj\\Storage Node/retain", "Deleted pieces": 702, "Failed to delete": 0, "Pieces failed to read": 0, "Pieces count": 368671, "Satellite ID": "121RTSDpyNZVcEU84Ticf2L1ntiuUimbWgfATz21tuvgk3vzoA6", "Duration": "5h50m24.5053326s", "Retain Status": "enabled"}
2024-06-20T16:50:34+03:00	INFO	retain	Moved pieces to trash during retain	{"cachePath": "C:\\Program Files\\Storj\\Storage Node/retain", "Deleted pieces": 44613, "Failed to delete": 0, "Pieces failed to read": 0, "Pieces count": 1627415, "Satellite ID": "12L9ZFwhzVpuEKMUNUqkaTLGzwY9G24tbiigLiXpmZWKwmcNDDs", "Duration": "6h41m16.8723145s", "Retain Status": "enabled"}
2024-06-23T08:48:53+03:00	INFO	retain	Moved pieces to trash during retain	{"cachePath": "C:\\Program Files\\Storj\\Storage Node/retain", "Deleted pieces": 17697, "Failed to delete": 0, "Pieces failed to read": 0, "Pieces count": 1629771, "Satellite ID": "12L9ZFwhzVpuEKMUNUqkaTLGzwY9G24tbiigLiXpmZWKwmcNDDs", "Duration": "31h44m26.8040361s", "Retain Status": "enabled"}
2024-06-25T09:52:00+03:00	INFO	retain	Moved pieces to trash during retain	{"cachePath": "C:\\Program Files\\Storj\\Storage Node/retain", "Deleted pieces": 8439223, "Failed to delete": 1, "Pieces failed to read": 0, "Pieces count": 36878846, "Satellite ID": "1wFTAgs9DP5RSnCqKV1eLf6N9wtk4EAtmN5DpSxcs8EjT69tGE", "Duration": "263h21m57.4555898s", "Retain Status": "enabled"}
2024-06-30T09:39:55+03:00	INFO	retain	Moved pieces to trash during retain	{"cachePath": "C:\\Program Files\\Storj\\Storage Node/retain", "Deleted pieces": 1196785, "Failed to delete": 0, "Pieces failed to read": 0, "Pieces count": 18936780, "Satellite ID": "12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S", "Duration": "248h22m25.9166617s", "Retain Status": "enabled"}

I guess the time has come to add “d” (days) and “w” (weeks) to the Duration output, in addition to “m” and “h”. Months would be overkill. At least for now, but who knows…
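Something like this, as a sketch of what I mean (re-printing a Go-style duration string with weeks and days):

import re

def humanize_go_duration(s: str) -> str:
    # Parse a Go-style duration like "263h21m57.4555898s" into weeks/days/hours/minutes.
    parts = {unit: float(value) for value, unit in re.findall(r"([\d.]+)([hms])", s)}
    total = parts.get("h", 0) * 3600 + parts.get("m", 0) * 60 + parts.get("s", 0)
    weeks, rest = divmod(total, 7 * 86400)
    days, rest = divmod(rest, 86400)
    hours, rest = divmod(rest, 3600)
    minutes = rest // 60
    return f"{int(weeks)}w{int(days)}d{int(hours)}h{int(minutes)}m"

print(humanize_go_duration("263h21m57.4555898s"))  # -> 1w3d23h21m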

6 Likes

Agreed that if you have super-optimised, overkill nodes you can get reasonable cleaning/GC times on larger nodes. But for most ordinary people it does take days to clean things out, and if we want to scale up, trash processes as a whole need to be improved.

2 Likes

It is not that hard. I am having fun with a Pi 5 and wouldn’t call it super optimised. All I am doing is making use of ZFS metadata caching. This is the first garbage collection run for me. The cache needs to get filled first, so I don’t know yet how effective it is. I might have to run it for an additional week to have enough numbers to work with. The point is that what you call super optimized are standard settings that you can apply on your nodes as well.

1 Like