Two weeks working for free in the waste storage business :-(

@Alexey - Just wanted to update here. The situation is now worse, I'm afraid. Of my 8TB node, 7.14TB is now flagged as uncollected garbage - so a whopping ~90% of my node is yet-to-be-categorised trash. The node shows as full and has had no ingress for 3 weeks now.

I can only think that, as a result of the month end, a large amount of test data has expired and is awaiting a bloom filter? It has now become even more important to me that a resolution is forthcoming - any updates? Still not seeing any bloom filters.

Wonder if this has happened to anyone else?

Node            Storage   Disk Current Total    -not paid-             8.19 TB
Node            Storage              ├ Blobs    -not paid-             8.04 TB
Node            Storage              └ Trash  ┐ -not paid-           155.31 GB
Node+Sat. Calc. Storage   Uncollected Garbage ┤ -not paid-             7.14 TB
Node+Sat. Calc. Storage   Total Unpaid Data <─┘ -not paid-             7.29 TB
Satellite       Storage   Disk Last Report      -not paid-           902.11 GB
Satellite       Storage   Disk Average So Far   -not paid-           904.45 GB
Satellite       Storage   Disk Usage Month      $  1.49 / TBm (avg)   24.93 GBm                 $  0.04
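For reference, here is my rough back-of-the-envelope reading of those numbers. I'm assuming the script simply takes the node-reported blobs minus the satellite-reported average as "uncollected garbage" - that's my guess at the relationship, not necessarily its actual formula:

# my assumption of how these figures relate, not the script's actual formula
blobs=8.04      # TB, node-reported blob usage
trash=0.155     # TB, node-reported trash
sat_avg=0.904   # TB, satellite-reported "Disk Average So Far"

awk -v b="$blobs" -v t="$trash" -v s="$sat_avg" 'BEGIN {
  garbage = b - s              # ~7.14 TB uncollected garbage
  unpaid  = garbage + t        # ~7.29 TB total unpaid data
  printf "uncollected garbage: %.2f TB (%.0f%% of blobs)\n", garbage, 100 * garbage / b
  printf "total unpaid data:   %.2f TB\n", unpaid
}'

So roughly 89% of what is physically on the disk is currently not being counted (or paid) by the satellites.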

Thanks
CC

1 Like

hold my beer

July 2024 (Version: 14.0.0)                                             [snapshot: 2024-07-02 16:35:18Z]
REPORTED BY     TYPE      METRIC                PRICE                     DISK  BANDWIDTH        PAYOUT
Node            Ingress   Upload                -not paid-                        3.74 TB
Node            Ingress   Upload Repair         -not paid-                        7.42 GB
Node            Egress    Download              $  2.00 / TB (avg)               40.32 GB       $  0.08
Node            Egress    Download Repair       $  2.00 / TB (avg)              841.90 MB       $  0.00
Node            Egress    Download Audit        $  2.00 / TB (avg)              623.36 KB       $  0.00
Node            Storage   Disk Current Total    -not paid-            15.79 TB
Node            Storage              ├ Blobs    -not paid-            15.71 TB
Node            Storage              └ Trash  ┐ -not paid-            79.83 GB
Node+Sat. Calc. Storage   Uncollected Garbage ┤ -not paid-            13.71 TB
Node+Sat. Calc. Storage   Total Unpaid Data <─┘ -not paid-            13.79 TB
Satellite       Storage   Disk Last Report      -not paid-             1.99 TB
Satellite       Storage   Disk Average So Far   -not paid-             1.99 TB
Satellite       Storage   Disk Usage Month      $  1.49 / TBm (avg)   53.02 GBm                 $  0.08
________________________________________________________________________________________________________+
Total                                                                 53.02 GBm   3.78 TB       $  0.16
Estimated total by end of month                                        1.99 TBm  69.37 TB       $  4.58

13.7TB uncollected garbage out of 15.7TB :rofl:

1 Like

I think some satellites are just not reporting data back. The general stats have the same problem:

Storj Network Statistics - Grafana (storjstats.info)

1 Like

@brainstorm - ok you win! :laughing:

This could be a new sport - can anyone beat BrainStorm’s 13.7TB?

CC

7 Likes

yes, it’s ridiculous. It’s because there is no market and no accountability.

  • zero incentive for Storj to fix this (they are the sole client of SNOs and dictate prices)
  • zero incentive for SNOs to expand capacity (their capacity is wasted and the accounting is whatever the satellites say)

this needs an architectural redesign and solution

5 Likes

update :rofl:

  • uncollected garbage went from 13.7TB to 1.58TB
  • disk average went from 1.5TB (it dropped to that on July 1st, from 14TB) back up to 13.91TB

quite the rollercoaster. :clown_face:
It’s absurd that this accounting isn’t accurate at all times. We know how many bytes are flowing.

3 Likes

It seems that some missing data from the satellites has come in. So that would be the result one could hope for.

1 Like

heh, now I had to reboot for some hardware maintenance, and when it came back up all the databases were gone - an empty database directory!
OK, so I was told it’s just informational anyway, so no worries. And now it’s happily downloading again because it thinks there is 16TB free, even though the filesystem is completely full. Hilarious. What a mess.

1 Like

Perhaps a result of the maintenance?
Did you also lose data?

Databases cannot disappear without intervention from you, the hardware, or the OS. I would suggest checking the system journals to figure out the reason; it’s not normal.

1 Like

I think there was some mess-up with datasets.
I had a backup of all the databases from a few days ago; I put that in and, after a few complaints in the log, it figured it all out. Seems fine now.
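If the used-space numbers ever look off after something like this, my understanding is that leaving the startup piece scan enabled for one restart makes the node re-walk the blobs and rebuild its used-space figures from what is actually on disk. Flag name from memory, so double-check it against your config:

# check whether the startup scan is enabled (path is an example - use your node's config location)
grep piece-scan-on-startup /path/to/storagenode/config.yaml

# for docker nodes it can also be passed on the run command:
#   --storage2.piece-scan-on-startup=true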

Another week - no ingress, and 2.5TB of uncollected garbage, i.e. 30% of the node wasted by Storj. From the logs I have still had no bloom filters.

@Alexey - what is the latest, please, on the wasted node storage and the lack of bloom filters? I don't think there have been any in weeks.

Thanks
CC

@Climbingkid Just curious, can you post a screenshot of the contents of your trash folder? I believe some people had their folder structure corrupted when their nodes upgraded and then unintentionally downgraded. If that has happened, you might have to fix it manually before the trash starts to empty again. Just a theory, but it might be worth checking.
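A minimal check, assuming the current per-satellite / per-date trash layout (adjust the path to your own storage location):

# one folder per satellite, and inside each a date-named folder (YYYY-MM-DD)
# for every garbage collection run; missing or oddly named date folders would
# point at the corruption described above
ls -la /path/to/storagenode/storage/trash/
ls -la /path/to/storagenode/storage/trash/*/

# size per date folder, to see whether older days are actually being emptied
du -s --si /path/to/storagenode/storage/trash/*/*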

1 Like

How much free space does your node actually have (can it access)?

2.33TB on the left vs 4.85TB on the right side of “used space” on the 1st node,
1.77 vs 4.12 on the 2nd node.

So I’m mostly just hosting unpaid garbage? Am I understanding this correctly? :smiley:

edit:
and yes, like 2 months ago the left and right sides were equal, AFAIR.

1 Like

Please use

df --si -T

for Linux/Mac or

Get-PSDrive

for PowerShell.
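For example, on Linux (replace the path with your node's storage location):

# free space as the filesystem sees it
df --si -T /path/to/storagenode/storage

# what the node's data actually occupies on that filesystem
du -s --si /path/to/storagenode/storage/blobs /path/to/storagenode/storage/trash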

@Mark and @Alexey

Allocated filesystem space in ZFS is 8.2TB - with ZFS reporting 7.9TB used.

root@H:~# ls -l  /mnt/Pool/StorJ-Node/Config/storage/trash
total 2
drwx------ 2 apps apps 2 Jun 22 12:09 pmw6tvzmf2jv6giyybmmvl4o2ahqlaldsaeha4yx74n5aaaaaaaa
drwx------ 3 apps apps 3 Jun 29 08:50 qstuylguhrn2ozjv4h2c6xpxykd622gtgurhql2k7k75wqaaaaaa
drwx------ 3 apps apps 3 Jul  5 22:38 ukfu6bhbboxilvt7jrwlqk7y2tapb5d2r2tsmj2sjxvw5qaaaaaa
drwx------ 4 apps apps 4 Jul  6 11:23 v4weeab67sbgvnbwd5z7tweqsqqun7qox2agpbxy44mqqaaaaaaa

du -shc  /mnt/BogardePool/StorJ-Node/Config/storage/trash
123G    /mnt/BogardePool/StorJ-Node/Config/storage/trash
123G    total

Nope, it's not trash - all of that was collected recently according to the dates, and the usage is in line with the GUI.

The issue is still uncollected garbage - the disconnect between used disk space and paid-for disk space, best represented by @BrightSilence's script:

July 2024 (Version: 14.0.0)                                             [snapshot: 2024-07-07 09:01:33Z]
REPORTED BY     TYPE      METRIC                PRICE                     DISK  BANDWIDTH        PAYOUT
Node            Ingress   Upload                -not paid-                      352.82 GB
Node            Ingress   Upload Repair         -not paid-                        4.13 GB
Node            Egress    Download              $  2.00 / TB (avg)               37.84 GB       $  0.08
Node            Egress    Download Repair       $  2.00 / TB (avg)                5.16 GB       $  0.01
Node            Egress    Download Audit        $  2.00 / TB (avg)                1.39 MB       $  0.00
Node            Storage   Disk Current Total    -not paid-             8.20 TB
Node            Storage              ├ Blobs    -not paid-             8.07 TB
Node            Storage              └ Trash  ┐ -not paid-           122.96 GB
Node+Sat. Calc. Storage   Uncollected Garbage ┤ -not paid-             2.49 TB
Node+Sat. Calc. Storage   Total Unpaid Data <─┘ -not paid-             2.61 TB
Satellite       Storage   Disk Last Report      -not paid-             5.58 TB
Satellite       Storage   Disk Average So Far   -not paid-             5.64 TB
Satellite       Storage   Disk Usage Month      $  1.49 / TBm (avg)    1.04 TBm                 $  1.56
________________________________________________________________________________________________________+
Total                                                                  1.04 TBm 399.96 GB       $  1.64
Estimated total by end of month                                        5.64 TBm   1.94 TB       $  9.10

@Alexey - what is the progress, please? Is the current thinking still that it's the lack of bloom filters? If so, when?

Thanks
CC

@Dunc4n1d4h0

It sure does look that way for a lot of SNOs - and it's often sizable, and many are only just realising how big it is. There seems to be a lack of willingness from Storj to resolve it, as it really does not impact them.

Thanks
CC

1 Like

This is frankly getting to ridiculous proportions. @elek, we had a great conversation around bloom filters in early May and things seemed to really be on the right track. But now…

And it seems Saltlake hasn’t sent out bloom filters in literally over a month.

2024-06-06T18:50:52Z    INFO    retain  Prepared to run a Retain request.       {"Process": "storagenode", "cachePath": "config/retain", "Created Before": "2024-06-01T17:59:59Z", "Filter Size": 5397472, "Satellite ID": "1wFTAgs9DP5RSnCqKV1eLf6N9wtk4EAtmN5DpSxcs8EjT69tGE"}
2024-06-07T03:56:33Z    INFO    retain  Moved pieces to trash during retain     {"Process": "storagenode", "cachePath": "config/retain", "Deleted pieces": 1028461, "Failed to delete": 0, "Pieces failed to read": 0, "Pieces count": 26253675, "Satellite ID": "1wFTAgs9DP5RSnCqKV1eLf6N9wtk4EAtmN5DpSxcs8EjT69tGE", "Duration": "9h5m41.135133831s", "Retain Status": "enabled"}
2024-06-12T21:30:48Z    INFO    retain  Prepared to run a Retain request.       {"Process": "storagenode", "cachePath": "config/retain", "Created Before": "2024-06-09T17:59:59Z", "Filter Size": 460606, "Satellite ID": "121RTSDpyNZVcEU84Ticf2L1ntiuUimbWgfATz21tuvgk3vzoA6"}
2024-06-12T22:12:34Z    INFO    retain  Moved pieces to trash during retain     {"Process": "storagenode", "cachePath": "config/retain", "Deleted pieces": 66212, "Failed to delete": 0, "Pieces failed to read": 0, "Pieces count": 848448, "Satellite ID": "121RTSDpyNZVcEU84Ticf2L1ntiuUimbWgfATz21tuvgk3vzoA6", "Duration": "41m46.358769283s", "Retain Status": "enabled"}
2024-06-13T11:27:18Z    INFO    retain  Prepared to run a Retain request.       {"Process": "storagenode", "cachePath": "config/retain", "Created Before": "2024-06-09T17:59:59Z", "Filter Size": 2411035, "Satellite ID": "12L9ZFwhzVpuEKMUNUqkaTLGzwY9G24tbiigLiXpmZWKwmcNDDs"}
2024-06-13T12:34:56Z    INFO    retain  Moved pieces to trash during retain     {"Process": "storagenode", "cachePath": "config/retain", "Deleted pieces": 176833, "Failed to delete": 0, "Pieces failed to read": 0, "Pieces count": 4275794, "Satellite ID": "12L9ZFwhzVpuEKMUNUqkaTLGzwY9G24tbiigLiXpmZWKwmcNDDs", "Duration": "1h7m38.160572627s", "Retain Status": "enabled"}
2024-06-13T13:15:09Z    INFO    retain  Prepared to run a Retain request.       {"Process": "storagenode", "cachePath": "config/retain", "Created Before": "2024-06-09T17:59:59Z", "Filter Size": 460606, "Satellite ID": "121RTSDpyNZVcEU84Ticf2L1ntiuUimbWgfATz21tuvgk3vzoA6"}
2024-06-13T13:27:19Z    INFO    retain  Moved pieces to trash during retain     {"Process": "storagenode", "cachePath": "config/retain", "Deleted pieces": 6783, "Failed to delete": 0, "Pieces failed to read": 0, "Pieces count": 794783, "Satellite ID": "121RTSDpyNZVcEU84Ticf2L1ntiuUimbWgfATz21tuvgk3vzoA6", "Duration": "12m9.928919121s", "Retain Status": "enabled"}
2024-06-14T13:33:43Z    INFO    retain  Prepared to run a Retain request.       {"Process": "storagenode", "cachePath": "config/retain", "Created Before": "2024-06-10T17:59:59Z", "Filter Size": 462200, "Satellite ID": "121RTSDpyNZVcEU84Ticf2L1ntiuUimbWgfATz21tuvgk3vzoA6"}
2024-06-14T13:52:06Z    INFO    retain  Moved pieces to trash during retain     {"Process": "storagenode", "cachePath": "config/retain", "Deleted pieces": 2995, "Failed to delete": 0, "Pieces failed to read": 0, "Pieces count": 794078, "Satellite ID": "121RTSDpyNZVcEU84Ticf2L1ntiuUimbWgfATz21tuvgk3vzoA6", "Duration": "18m22.156305318s", "Retain Status": "enabled"}
2024-06-19T16:39:32Z    INFO    retain  Prepared to run a Retain request.       {"Process": "storagenode", "cachePath": "config/retain", "Created Before": "2024-06-13T17:59:59Z", "Filter Size": 16699654, "Satellite ID": "12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S"}
2024-06-19T22:21:43Z    INFO    retain  Moved pieces to trash during retain     {"Process": "storagenode", "cachePath": "config/retain", "Deleted pieces": 2910495, "Failed to delete": 0, "Pieces failed to read": 0, "Pieces count": 32856337, "Satellite ID": "12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S", "Duration": "5h42m11.749953998s", "Retain Status": "enabled"}
2024-06-20T01:26:55Z    INFO    retain  Prepared to run a Retain request.       {"Process": "storagenode", "cachePath": "config/retain", "Created Before": "2024-06-15T17:59:59Z", "Filter Size": 2481750, "Satellite ID": "12L9ZFwhzVpuEKMUNUqkaTLGzwY9G24tbiigLiXpmZWKwmcNDDs"}
2024-06-20T02:20:25Z    INFO    retain  Moved pieces to trash during retain     {"Process": "storagenode", "cachePath": "config/retain", "Deleted pieces": 145251, "Failed to delete": 0, "Pieces failed to read": 0, "Pieces count": 4342359, "Satellite ID": "12L9ZFwhzVpuEKMUNUqkaTLGzwY9G24tbiigLiXpmZWKwmcNDDs", "Duration": "53m30.467482496s", "Retain Status": "enabled"}
2024-06-21T23:07:58Z    INFO    retain  Prepared to run a Retain request.       {"Process": "storagenode", "cachePath": "config/retain", "Created Before": "2024-06-17T17:59:59Z", "Filter Size": 2475603, "Satellite ID": "12L9ZFwhzVpuEKMUNUqkaTLGzwY9G24tbiigLiXpmZWKwmcNDDs"}
2024-06-22T01:45:43Z    INFO    retain  Moved pieces to trash during retain     {"Process": "storagenode", "cachePath": "config/retain", "Deleted pieces": 60245, "Failed to delete": 0, "Pieces failed to read": 0, "Pieces count": 4207524, "Satellite ID": "12L9ZFwhzVpuEKMUNUqkaTLGzwY9G24tbiigLiXpmZWKwmcNDDs", "Duration": "2h37m45.09634223s", "Retain Status": "enabled"}
2024-06-22T06:46:49Z    INFO    retain  Prepared to run a Retain request.       {"Process": "storagenode", "cachePath": "config/retain", "Created Before": "2024-06-18T17:59:59Z", "Filter Size": 2475603, "Satellite ID": "12L9ZFwhzVpuEKMUNUqkaTLGzwY9G24tbiigLiXpmZWKwmcNDDs"}
2024-06-22T08:41:47Z    INFO    retain  Moved pieces to trash during retain     {"Process": "storagenode", "cachePath": "config/retain", "Deleted pieces": 19681, "Failed to delete": 0, "Pieces failed to read": 0, "Pieces count": 4147278, "Satellite ID": "12L9ZFwhzVpuEKMUNUqkaTLGzwY9G24tbiigLiXpmZWKwmcNDDs", "Duration": "1h54m58.414141336s", "Retain Status": "enabled"}
2024-06-25T10:42:39Z    INFO    retain  Prepared to run a Retain request.       {"Process": "storagenode", "cachePath": "config/retain", "Created Before": "2024-06-19T17:59:59Z", "Filter Size": 16862250, "Satellite ID": "12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S"}
2024-06-25T20:13:42Z    INFO    retain  Prepared to run a Retain request.       {"Process": "storagenode", "cachePath": "config/retain", "Created Before": "2024-06-21T17:59:59Z", "Filter Size": 2452566, "Satellite ID": "12L9ZFwhzVpuEKMUNUqkaTLGzwY9G24tbiigLiXpmZWKwmcNDDs"}
2024-06-26T04:09:55Z    INFO    retain  Moved pieces to trash during retain     {"Process": "storagenode", "cachePath": "config/retain", "Deleted pieces": 47304, "Failed to delete": 0, "Pieces failed to read": 0, "Pieces count": 4158698, "Satellite ID": "12L9ZFwhzVpuEKMUNUqkaTLGzwY9G24tbiigLiXpmZWKwmcNDDs", "Duration": "7h56m13.208577815s", "Retain Status": "enabled"}
2024-06-28T22:46:11Z    INFO    retain  Prepared to run a Retain request.       {"Process": "storagenode", "cachePath": "config/retain", "Created Before": "2024-06-25T17:59:59Z", "Filter Size": 462226, "Satellite ID": "121RTSDpyNZVcEU84Ticf2L1ntiuUimbWgfATz21tuvgk3vzoA6"}
2024-06-28T23:15:09Z    INFO    retain  Moved pieces to trash during retain     {"Process": "storagenode", "cachePath": "config/retain", "Deleted pieces": 38467, "Failed to delete": 0, "Pieces failed to read": 0, "Pieces count": 816432, "Satellite ID": "121RTSDpyNZVcEU84Ticf2L1ntiuUimbWgfATz21tuvgk3vzoA6", "Duration": "28m58.86716795s", "Retain Status": "enabled"}
2024-06-29T11:58:24Z    INFO    retain  Prepared to run a Retain request.       {"Process": "storagenode", "cachePath": "config/retain", "Created Before": "2024-06-25T17:59:59Z", "Filter Size": 2459116, "Satellite ID": "12L9ZFwhzVpuEKMUNUqkaTLGzwY9G24tbiigLiXpmZWKwmcNDDs"}
2024-06-29T13:20:39Z    INFO    retain  Moved pieces to trash during retain     {"Process": "storagenode", "cachePath": "config/retain", "Deleted pieces": 28516, "Failed to delete": 0, "Pieces failed to read": 0, "Pieces count": 4124317, "Satellite ID": "12L9ZFwhzVpuEKMUNUqkaTLGzwY9G24tbiigLiXpmZWKwmcNDDs", "Duration": "1h22m15.486125133s", "Retain Status": "enabled"}
2024-07-05T23:32:20Z    INFO    retain  Prepared to run a Retain request.       {"Process": "storagenode", "cachePath": "config/retain", "Created Before": "2024-06-29T17:59:59Z", "Filter Size": 16222864, "Satellite ID": "12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S"}
2024-07-06T03:51:38Z    INFO    retain  Prepared to run a Retain request.       {"Process": "storagenode", "cachePath": "config/retain", "Created Before": "2024-07-02T17:59:59Z", "Filter Size": 1878640, "Satellite ID": "12L9ZFwhzVpuEKMUNUqkaTLGzwY9G24tbiigLiXpmZWKwmcNDDs"}
2024-07-06T06:38:03Z    INFO    retain  Moved pieces to trash during retain     {"Process": "storagenode", "cachePath": "config/retain", "Deleted pieces": 882895, "Failed to delete": 0, "Pieces failed to read": 0, "Pieces count": 4095795, "Satellite ID": "12L9ZFwhzVpuEKMUNUqkaTLGzwY9G24tbiigLiXpmZWKwmcNDDs", "Duration": "2h46m25.295524371s", "Retain Status": "enabled"}
2024-07-06T17:38:07Z    INFO    retain  Moved pieces to trash during retain     {"Process": "storagenode", "cachePath": "config/retain", "Deleted pieces": 2082729, "Failed to delete": 0, "Pieces failed to read": 0, "Pieces count": 29393536, "Satellite ID": "12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S", "Duration": "18h5m47.2562348s", "Retain Status": "enabled"}
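For anyone who wants to check their own node, something along these lines should show how many bloom filters each satellite has sent and when the quiet one last sent anything (adjust the log source to your setup, e.g. pipe docker logs into it; the satellite ID below is the one that last appears above on 2024-06-06):

# count "Prepared to run a Retain request" entries per satellite
grep 'Prepared to run a Retain request' /path/to/storagenode.log \
  | sed -E 's/.*"Satellite ID": "([^"]+)".*/\1/' \
  | sort | uniq -c

# timestamp of the most recent one from that satellite
grep 'Prepared to run a Retain request' /path/to/storagenode.log \
  | grep 1wFTAgs9DP5RSnCqKV1eLf6N9wtk4EAtmN5DpSxcs8EjT69tGE \
  | tail -n 1 | awk '{print $1}'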

This despite knowing that, due to a testing misconfiguration with random file names, many files were overwritten, causing deletes. In addition, I think it’s fairly well known that failed/canceled uploads often end up placing the file on the node anyway and rely on GC to clean it up. This is probably why we see such wildly different percentages of uncollected garbage across nodes. It simply isn’t acceptable to stop bloom filters during this testing. Storj claims it needs people to add more disk space, but it’s wasting the disk space already available. Please take action on this ASAP and start sending bloom filters for Saltlake again.

As many know, I don’t usually use a harsh tone in my posts, but in my opinion this is uncharacteristically bad and needs to be resolved. There has also been zero communication around bloom filters being paused. (I may have missed it, to be fair, but I didn’t see any.) And asking people to expand when many might not know that half or even more of their stored data is garbage encourages people to make bad decisions and possibly end up buying expensive HDDs, only to figure out later that they didn’t need them at all once this issue has been resolved. It’s a really bad look. (And please don’t give me the “we don’t recommend buying hardware” response. We all know SNOs buy HDDs, so it doesn’t matter what the recommendation is. You have a responsibility not to misinform them.)

@Knowledge @Bryanm , this next part is relevant for you, as you’ve both shown interest in gauging SNOs’ willingness to expand.
In conclusion, I will not expand further until this situation is resolved. I have no way to gauge how much storage I actually need now, despite the testing. I have one last migration running that will add some free space to existing nodes, but after that, I’m done until this is fixed. I’m not going to risk ending up with empty HDDs after this is fixed.

Pinging @littleskunk as well, as this possibly significantly impacts calculations for the testing and might go some way to explaining why nodes appear to get much more data than would be expected on average. In addition, the free space currently shown by nodes significantly underestimates their actual potential.

27 Likes

Welcome! Have you now arrived where I have been for a long time already?
I have been complaining for a long time now that wrong numbers are the worst of the worst.
We have had this, for various reasons, for 1.5 years now. But it is always “low priority”.
This is unacceptable.

Yes, that's the only possible conclusion.

Maybe they will listen to you. There is still hope…

3 Likes