Two weeks working for free in the waste storage business :-(

Hi

My node has been running for about 7 weeks now. I previously posted about the amount of uncollected garbage I have been carrying for weeks - it sat at 40% of my storage for ages. It improved slightly when we finally got a long-awaited bloom filter (from SLC, I think), but for the last 10 days it has not really come down, and my node has been full for over a week now. While I have the capacity to bring a lot more storage online, I'm just not prepared to do that while space is being wasted like this.

2.47 TB of the 8.2 TB allocated is just uncollected garbage - 30%!!!

June 2024 (Version: 14.0.0)                                             [snapshot: 2024-06-28 16:07:17Z]
REPORTED BY     TYPE      METRIC                PRICE                     DISK  BANDWIDTH        PAYOUT
Node            Ingress   Upload                -not paid-                       22.49 TB
Node            Ingress   Upload Repair         -not paid-                       46.53 GB
Node            Egress    Download              $  2.00 / TB (avg)              185.85 GB       $  0.37
Node            Egress    Download Repair       $  2.00 / TB (avg)                5.57 GB       $  0.01
Node            Egress    Download Audit        $  2.00 / TB (avg)                3.35 MB       $  0.00
Node            Storage   Disk Current Total    -not paid-             8.20 TB
Node            Storage              ├ Blobs    -not paid-             8.03 TB
Node            Storage              └ Trash  ┐ -not paid-           165.84 GB
Node+Sat. Calc. Storage   Uncollected Garbage ┤ -not paid-             2.47 TB
Node+Sat. Calc. Storage   Total Unpaid Data <─┘ -not paid-             2.64 TB
Satellite       Storage   Disk Last Report      -not paid-             5.56 TB
Satellite       Storage   Disk Average So Far   -not paid-             2.59 TB

I have raised this before in other threads with no real resolution. I have therefore spent the last 10 days being patient, but it has not really improved.

All file walkers are completing - I force a restart every few days, the free-space filewalkers run pretty quickly on my setup, and all of them log as successful. No database errors, log errors or anything else reported.
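For reference, this is roughly how I'm checking (just a sketch against my own log file and the message strings I see in it - adjust the path and patterns to your setup):

zgrep -E "filewalker" /mnt/BogardePool/StorJ-Node/Config/node-2024-06-28.log | grep -E "completed|finished|failed|ERROR"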

But as far as I can tell, I just don't see any bloom filters…

Is it just me?
CC

How about your TTL DB file? Any chance it might have been lost at some point? This data should be collected without having to wait for garbage collection.
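If it's still there, you can peek inside it with something like this (a rough sketch only - I'm assuming the table inside is called piece_expirations; stop the node first or query a copy, otherwise SQLite may complain that the database is locked):

sqlite3 /path/to/config/storage/piece_expiration.db "SELECT count(*), min(datetime(piece_expiration)), max(datetime(piece_expiration)) FROM piece_expirations;"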

2 Likes

@littleskunk - Ooooo, tell me more… How can I look or check - is it one of these?

-rw-r--r-- 1 apps apps      49152 Jun 28 17:35 /mnt/BogardePool/StorJ-Node/Config/storage/bandwidth.db
-rw-r--r-- 1 apps apps      24576 Jun 25 18:46 /mnt/BogardePool/StorJ-Node/Config/storage/garbage_collection_filewalker_progress.db
-rw-r--r-- 1 apps apps      32768 Jun 28 06:48 /mnt/BogardePool/StorJ-Node/Config/storage/heldamount.db
-rw-r--r-- 1 apps apps      16384 Jun 25 18:46 /mnt/BogardePool/StorJ-Node/Config/storage/info.db
-rw-r--r-- 1 apps apps      24576 Jun 25 18:46 /mnt/BogardePool/StorJ-Node/Config/storage/notifications.db
-rw-r--r-- 1 apps apps      32768 Jun 25 19:14 /mnt/BogardePool/StorJ-Node/Config/storage/orders.db
-rw-r--r-- 1 apps apps 2450997248 Jun 28 17:35 /mnt/BogardePool/StorJ-Node/Config/storage/piece_expiration.db
-rw-r--r-- 1 apps apps      24576 Jun 28 17:23 /mnt/BogardePool/StorJ-Node/Config/storage/piece_spaced_used.db
-rw-r--r-- 1 apps apps      24576 Jun 25 18:46 /mnt/BogardePool/StorJ-Node/Config/storage/pieceinfo.db
-rw-r--r-- 1 apps apps      24576 Jun 25 18:46 /mnt/BogardePool/StorJ-Node/Config/storage/pricing.db
-rw-r--r-- 1 apps apps      24576 Jun 28 14:49 /mnt/BogardePool/StorJ-Node/Config/storage/reputation.db
-rw-r--r-- 1 apps apps      32768 Jun 25 19:16 /mnt/BogardePool/StorJ-Node/Config/storage/satellites.db
-rw-r--r-- 1 apps apps      24576 Jun 25 18:46 /mnt/BogardePool/StorJ-Node/Config/storage/secret.db
-rw-r--r-- 1 apps apps      61440 Jun 28 06:48 /mnt/BogardePool/StorJ-Node/Config/storage/storage_usage.db
-rw-r--r-- 1 apps apps      20480 Jun 25 18:46 /mnt/BogardePool/StorJ-Node/Config/storage/used_serial.db
-rw-r--r-- 1 apps apps      24576 Jun 25 18:46 /mnt/BogardePool/StorJ-Node/Config/storage/used_space_per_prefix.db

CC

Yes, that one. The size looks like it has all the information in it. I didn't check whether the collector that deletes these pieces is working. It should run once per hour, I believe. I am on mobile right now and can't give you a log example.

2 Likes

Log entries look like this:

2024-06-26T17:30:57+02:00       INFO    collector       collect {"count": 1153}
2024-06-26T18:30:56+02:00       INFO    collector       collect {"count": 815}

Search for ‘collector’.

1 Like

Are you sure you are not deleting them by your restarts:

Also:

@pangolin - yes, I see those, and it runs hourly. It's not really bringing the garbage down though - this has to be TTL data, right? Otherwise it would either be credited or in the trash by now - have I understood that right?


zgrep "collector"  /mnt/BogardePool/StorJ-Node/Config/node-2024-06-28.log
2024-06-25T17:14:28Z    INFO    collector       collect {"Process": "storagenode", "count": 580}
2024-06-25T18:14:28Z    INFO    collector       collect {"Process": "storagenode", "count": 621}
2024-06-25T19:14:28Z    INFO    collector       collect {"Process": "storagenode", "count": 611}
2024-06-25T20:14:28Z    INFO    collector       collect {"Process": "storagenode", "count": 634}
2024-06-25T21:14:28Z    INFO    collector       collect {"Process": "storagenode", "count": 581}
2024-06-25T22:14:28Z    INFO    collector       collect {"Process": "storagenode", "count": 558}
2024-06-25T23:14:28Z    INFO    collector       collect {"Process": "storagenode", "count": 518}
2024-06-26T00:14:27Z    INFO    collector       collect {"Process": "storagenode", "count": 477}
2024-06-26T01:14:27Z    INFO    collector       collect {"Process": "storagenode", "count": 469}
2024-06-26T02:14:27Z    INFO    collector       collect {"Process": "storagenode", "count": 371}
2024-06-26T03:14:25Z    INFO    collector       collect {"Process": "storagenode", "count": 17}
2024-06-26T04:14:25Z    INFO    collector       collect {"Process": "storagenode", "count": 5}
2024-06-26T07:14:26Z    INFO    collector       collect {"Process": "storagenode", "count": 199}
2024-06-26T08:14:28Z    INFO    collector       collect {"Process": "storagenode", "count": 425}
2024-06-26T09:14:28Z    INFO    collector       collect {"Process": "storagenode", "count": 459}
2024-06-26T10:14:28Z    INFO    collector       collect {"Process": "storagenode", "count": 483}
2024-06-26T11:14:27Z    INFO    collector       collect {"Process": "storagenode", "count": 391}
2024-06-26T12:14:25Z    INFO    collector       collect {"Process": "storagenode", "count": 1}
2024-06-26T14:14:25Z    INFO    collector       collect {"Process": "storagenode", "count": 1}
2024-06-26T17:14:25Z    INFO    collector       collect {"Process": "storagenode", "count": 2}
2024-06-26T20:14:25Z    INFO    collector       collect {"Process": "storagenode", "count": 6}
2024-06-26T23:14:25Z    INFO    collector       collect {"Process": "storagenode", "count": 9}
2024-06-27T00:14:25Z    INFO    collector       collect {"Process": "storagenode", "count": 2}
2024-06-27T02:14:27Z    INFO    collector       collect {"Process": "storagenode", "count": 72}
2024-06-27T03:14:32Z    INFO    collector       collect {"Process": "storagenode", "count": 414}
2024-06-27T04:14:32Z    INFO    collector       collect {"Process": "storagenode", "count": 376}
2024-06-27T05:14:32Z    INFO    collector       collect {"Process": "storagenode", "count": 361}
2024-06-27T06:14:31Z    INFO    collector       collect {"Process": "storagenode", "count": 370}
2024-06-27T07:14:31Z    INFO    collector       collect {"Process": "storagenode", "count": 356}
2024-06-27T08:14:32Z    INFO    collector       collect {"Process": "storagenode", "count": 410}
2024-06-27T09:14:32Z    INFO    collector       collect {"Process": "storagenode", "count": 413}
2024-06-27T10:14:32Z    INFO    collector       collect {"Process": "storagenode", "count": 409}
2024-06-27T11:14:32Z    INFO    collector       collect {"Process": "storagenode", "count": 441}
2024-06-27T12:14:32Z    INFO    collector       collect {"Process": "storagenode", "count": 430}
2024-06-27T13:14:32Z    INFO    collector       collect {"Process": "storagenode", "count": 457}
2024-06-27T14:14:32Z    INFO    collector       collect {"Process": "storagenode", "count": 469}
2024-06-27T15:14:33Z    INFO    collector       collect {"Process": "storagenode", "count": 503}
2024-06-27T16:14:32Z    INFO    collector       collect {"Process": "storagenode", "count": 535}
2024-06-27T17:14:38Z    INFO    collector       collect {"Process": "storagenode", "count": 640}
2024-06-27T18:14:33Z    INFO    collector       collect {"Process": "storagenode", "count": 625}
2024-06-27T19:14:30Z    INFO    collector       collect {"Process": "storagenode", "count": 604}
2024-06-27T20:14:29Z    INFO    collector       collect {"Process": "storagenode", "count": 598}
2024-06-27T21:14:29Z    INFO    collector       collect {"Process": "storagenode", "count": 575}
2024-06-27T22:14:28Z    INFO    collector       collect {"Process": "storagenode", "count": 502}
2024-06-27T23:14:28Z    INFO    collector       collect {"Process": "storagenode", "count": 420}
2024-06-28T00:14:28Z    INFO    collector       collect {"Process": "storagenode", "count": 400}
2024-06-28T01:14:28Z    INFO    collector       collect {"Process": "storagenode", "count": 434}
2024-06-28T02:14:28Z    INFO    collector       collect {"Process": "storagenode", "count": 417}
2024-06-28T03:14:27Z    INFO    collector       collect {"Process": "storagenode", "count": 345}
2024-06-28T04:14:28Z    INFO    collector       collect {"Process": "storagenode", "count": 390}
2024-06-28T05:14:28Z    INFO    collector       collect {"Process": "storagenode", "count": 401}
2024-06-28T06:14:27Z    INFO    collector       collect {"Process": "storagenode", "count": 368}
2024-06-28T07:14:28Z    INFO    collector       collect {"Process": "storagenode", "count": 407}
2024-06-28T08:14:28Z    INFO    collector       collect {"Process": "storagenode", "count": 373}
2024-06-28T09:14:28Z    INFO    collector       collect {"Process": "storagenode", "count": 442}
2024-06-28T10:14:28Z    INFO    collector       collect {"Process": "storagenode", "count": 423}
2024-06-28T11:14:27Z    INFO    collector       collect {"Process": "storagenode", "count": 298}
2024-06-28T13:14:26Z    INFO    collector       collect {"Process": "storagenode", "count": 168}
2024-06-28T14:14:30Z    INFO    collector       collect {"Process": "storagenode", "count": 570}
2024-06-28T15:14:29Z    INFO    collector       collect {"Process": "storagenode", "count": 527}
2024-06-28T16:14:29Z    INFO    collector       collect {"Process": "storagenode", "count": 592}

Is it possible to set the log level to WARN but still keep some system messages, like the filewalkers? INFO with all the client connections is too much. Or maybe just filter out the downloads and uploads?

2 Likes

I have been keeping a really close eye on the logs - everything looks normal as far as I can tell, I just have all this uncollected garbage that's really not going anywhere.

CC

Yes, TTL data should not go to the trash. However, there was a problem with uploads from SLC some time ago which resulted in TTL deletes not working as they should.
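If you want to see how big the expired-but-still-on-disk backlog is per satellite, something like this should show it (sketch only - table and column names are my assumption about the storagenode schema; stop the node or work on a copy of the DB first):

sqlite3 piece_expiration.db "SELECT hex(satellite_id), count(*) FROM piece_expirations WHERE datetime(piece_expiration) < datetime('now') GROUP BY satellite_id;"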

1 Like
2024-06-28T07:23:57Z    INFO    collector       collect {"Process": "storagenode", "count": 662151}
2024-06-28T10:02:30Z    INFO    collector       collect {"Process": "storagenode", "count": 663088}
2024-06-28T10:19:16Z    INFO    collector       collect {"Process": "storagenode", "count": 663128}
2024-06-28T11:11:13Z    INFO    collector       collect {"Process": "storagenode", "count": 663128}

Something like this should free up a noticeable amount of space, but I don't see that happening on Windows.

@jammerdan

Pretty sure it's not this - the node is sitting doing nothing when I stop it for a few seconds, rotate the log and start it again. Besides, I have only one entry in the log from the last week and it was successful. If it were partial, would I not have a start with no completion?

2024-06-25T17:29:47Z    INFO    lazyfilewalker.gc-filewalker    starting subprocess     {"Process": "storagenode", "satelliteID": "12L9ZFwhzVpuEKMUNUqkaTLGzwY9G24tbiigLiXpmZWKwmcNDDs"}
2024-06-25T17:29:47Z    INFO    lazyfilewalker.gc-filewalker    subprocess started      {"Process": "storagenode", "satelliteID": "12L9ZFwhzVpuEKMUNUqkaTLGzwY9G24tbiigLiXpmZWKwmcNDDs"}
2024-06-25T17:29:48Z    INFO    lazyfilewalker.gc-filewalker.subprocess Database started        {"Process": "storagenode", "satelliteID": "12L9ZFwhzVpuEKMUNUqkaTLGzwY9G24tbiigLiXpmZWKwmcNDDs", "Process": "storagenode"}
2024-06-25T17:29:48Z    INFO    lazyfilewalker.gc-filewalker.subprocess gc-filewalker started   {"Process": "storagenode", "satelliteID": "12L9ZFwhzVpuEKMUNUqkaTLGzwY9G24tbiigLiXpmZWKwmcNDDs", "Process": "storagenode", "createdBefore": "2024-06-21T17:59:59Z", "bloomFilterSize": 79866}
2024-06-25T17:29:52Z    INFO    lazyfilewalker.gc-filewalker.subprocess gc-filewalker completed {"Process": "storagenode", "satelliteID": "12L9ZFwhzVpuEKMUNUqkaTLGzwY9G24tbiigLiXpmZWKwmcNDDs", "Process": "storagenode", "piecesCount": 148058, "trashPiecesCount": 7406, "piecesTrashed": 7406, "piecesSkippedCount": 0}
2024-06-25T17:29:52Z    INFO    lazyfilewalker.gc-filewalker    subprocess finished successfully        {"Process": "storagenode", "satelliteID": "12L9ZFwhzVpuEKMUNUqkaTLGzwY9G24tbiigLiXpmZWKwmcNDDs"}

That may be the case - but that was the advice I was given to force trash collection when it was not working.

I welcome any other ideas. The logs show I'm not receiving bloom filters - well, very few.
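This is roughly how I'm counting them - as far as I understand, each gc-filewalker start line corresponds to a bloom filter being processed (paths and filenames are from my setup):

zgrep "gc-filewalker started" /mnt/BogardePool/StorJ-Node/Config/node-*.log* | grep -o '"satelliteID": "[^"]*"' | sort | uniq -c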

Thanks
CC

I can’t see any reliable schedule in bloom filter creation either.

3 Likes

Yes, of course. Search the thread about custom log level. I made a guide there.
https://forum.storj.io/t/log-custom-level/25839/17?u=snorkel
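For anyone who doesn't want to dig through that thread, the gist is roughly this in config.yaml (a sketch from memory - double-check the exact option name and syntax in the guide for your version):

log.level: info
log.custom-level: piecestore=WARN

That keeps the collector and filewalker messages at INFO while silencing the per-upload/download chatter, which comes from the piecestore logger.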

2 Likes

The Average Disk Space Used This Month is way off because US1 and SLC missed sending a few reports to your node, so your node likely doesn't have a problem with GC.
See:

1 Like

@Alexey, thank you, but I don't think this is an "Average Disk Space" discrepancy discussion. I know you have directed me to that link previously, but as per the title, how long do we store this data for free? It's been a few weeks now, and it's stopping further ingress.

Storj is wasting 30% of my storage, with a very casual attitude and a complete lack of urgency to resolve it. Then, on the flip side, I see threads looking at expansion and asking who plans to scale up.

I have 50-60 TB ready to bring online - this 8 TB node was a test for me - but the prospect of having 30% of those drives, or about 18 TB, effectively wasted is a ridiculous proposition.

What is the ETA to resolve this, please? Are you able to find out?

Many thanks
CC

4 Likes

Please try to understand: every estimation tool relies on the reports provided by the satellites. If they are off (which is known and confirmed), any estimation will be wrong.

If you also have discrepancies between the pie chart on the dashboard and the actual usage, your node is likely suffering from errors related to a filewalker and/or the databases. There are no other known issues as far as I know.
So: the Avg Used Space will perhaps be fixed before the next month (I have only a little hope here, unfortunately, because it's a manual operation and all the devs are busy); however, it will not affect payouts. You will be paid for what your node has submitted to the satellites (it's signed by the customer, by the node and by the satellite), so there is no way to get that wrong. It's cryptographically confirmed and cannot be altered.

For the Avg Used Space discrepancy there is no ETA. It's not important, because it isn't used anywhere except the dashboard… If they have time, they will backfill it, but I know there are a lot of customer features which should be implemented first… Thus, I do not believe it will be fixed in the near future, until this feature is implemented:

1 Like

I’m not sure we’re talking about the same thing, though.

My understanding is that @Climbingkid is seeing a significant amount of uncollected rubbish on the file system (not the dashboard) despite the filewalkers running and completing successfully (supposedly). At the numbers he is seeing, it's a valid concern.

I believe @BrightSilence has also mentioned many times that there seems to be a significant amount of uncollected rubbish on nodes at the moment, which may be related to Bloom Filter issues.

2 Likes

How do you define “rubbish”? How do you know how much rubbish you have on disk? (Just continue that thought and the post from @Alexey will suddenly explain a lot…)

1 Like

My earnings calculator uses the last reported stat from the satellite, so as long as that report is correct, the resulting uncollected garbage figure will be too. The gaps earlier in the month don't impact that. Yet I have several nodes with 5 TB+ of uncollected garbage according to that calculation.
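To make it concrete with the numbers from the first post:

uncollected garbage ≈ blobs on disk - last satellite report = 8.03 TB - 5.56 TB ≈ 2.47 TB
total unpaid data   ≈ uncollected garbage + trash           = 2.47 TB + 0.17 TB ≈ 2.64 TB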

2 Likes