I have trawled the forums and I'm still a little unclear on what triggers garbage collection. With all the test data, 75% of my storage is in limbo, waiting to be moved to trash for over a week.
I thought I read it's every 7 days? Only it's been longer than that and it's still mounting up. Is there anything I can do to trigger it, or to check when it plans to run?
June 2024 (Version: 14.0.0) [snapshot: 2024-06-13 19:03:38Z]
REPORTED BY TYPE METRIC PRICE DISK BANDWIDTH PAYOUT
Node Ingress Upload -not paid- 10.97 TB
Node Ingress Upload Repair -not paid- 4.40 GB
Node Egress Download $ 2.00 / TB (avg) 33.48 GB $ 0.07
Node Egress Download Repair $ 2.00 / TB (avg) 2.72 GB $ 0.01
Node Egress Download Audit $ 2.00 / TB (avg) 210.18 KB $ 0.00
Node Storage Disk Current Total -not paid- 4.96 TB
Node Storage ├ Blobs -not paid- 4.95 TB
Node Storage └ Trash ┐ -not paid- 11.64 GB
Node+Sat. Calc. Storage Uncollected Garbage ┤ -not paid- 3.10 TB
Node+Sat. Calc. Storage Total Unpaid Data <─┘ -not paid- 3.11 TB
Satellite Storage Disk Last Report -not paid- 1.85 TB
Satellite Storage Disk Average So Far -not paid- 810.43 GB
Satellite Storage Disk Usage Month $ 1.49 / TBm (avg) 333.05 GBm $ 0.50
Thanks - based on the ingress, and the known short-lived data from SL, it's believable, as I have been watching it increase.
If I understand this correctly, then I have 75% of my allocated storage taken up with data I am not getting paid for, and it is not being cleared out by GC promptly. Surely this is going to cause issues - if a full drive were in this state, and this was preventing me from taking paying data, that would be a bad situation.
I think if the plan is to move to short-TTL data, then GC has to run more frequently.
This could be the case only when the missed data would be backfilled from the satellite side (US1 and SLC in particular), and when that happens you would still have a discrepancy between the physically used space on the disk (not on the dashboard) and the usage reported by the satellites (Average Disk Used Space This Month).
The discrepancy between the used space on the dashboard (pie chart) and the usage on the disk means that your databases are not updated with the correct usage. This can be fixed by enabling the scan on startup (it's enabled by default) and restarting the node. You then need to wait until the used-space-filewalker has completed successfully for all trusted satellites.
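As a rough sketch, you can check how far the used-space filewalker has got by grepping the node log. The log location varies by setup (for a Docker node, pipe `docker logs storagenode 2>&1` into grep instead); the sample lines below are illustrative, with the message text taken from a real node log posted earlier in this thread.

```shell
# Use a temp file here purely for illustration; point LOG at your real node log.
LOG=$(mktemp)

# Two sample log lines; a real log should show one used-space success per satellite.
printf '%s\n' \
  '2024-06-14T06:55:33Z INFO lazyfilewalker.used-space-filewalker subprocess finished successfully' \
  '2024-06-14T06:55:33Z INFO lazyfilewalker.trash-cleanup-filewalker subprocess finished successfully' \
  > "$LOG"

# Count completed used-space runs - you want one per trusted satellite:
grep -c 'used-space-filewalker.*finished successfully' "$LOG"   # prints 1 here
```

If the count is lower than the number of trusted satellites, the scan is still running (or has failed - grep the same log for ERROR lines).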
You also need to remove data of the untrusted satellites, if you still have it:
There is no such plan. TTL has been here since the alpha. You will always have data both with a TTL and without; they are not mutually exclusive.
For example, if you deleted the databases, the TTL information is lost, so that data will then be collected by GC instead.
@Alexey Thank you, but I am now more confused than before.
I guess I assumed this was uncollected trash, as that is how it's labelled in the earnings calculator script output that I posted above.
This is a new, clean node - about 6 weeks old - so I do not believe I should have anything from untrusted satellites, as I started after this change.
I have restarted the node, and a grep for "filewalker" returned success:
2024-06-14T06:55:33Z INFO lazyfilewalker.trash-cleanup-filewalker subprocess finished successfully
Still, I have 3.3TB of uncollected garbage, or 60% of my storage, according to the earnings calc script, which matches what the dashboard says in terms of used space. The thing is, this used space in the dashboard is misleading, as most of it appears to be unpaid. I worry I am quickly filling up the allocated storage, and in another week the disk will be full - and little of it will attract payment.
All help greatly appreciated here. Anything else I can try or look at?
My bandwidth has been maxed out for days - 12TB in - the dashboard reports 5.21TB used, trash is a few GB. Available space is disappearing fast. The earnings calc reports 3.28TB in uncollected garbage.
No idea what the real story is here. How much storage has Storj actually used? What's worrying is that the payout information on the dashboard suggests 381GB of disk usage for the month - which I know is an average, but the maths just doesn't add up.
June 2024 (Version: 14.0.0) [snapshot: 2024-06-14 14:07:03Z]
REPORTED BY TYPE METRIC PRICE DISK BANDWIDTH PAYOUT
Node Ingress Upload -not paid- 11.80 TB
Node Ingress Upload Repair -not paid- 7.05 GB
Node Egress Download $ 2.00 / TB (avg) 45.16 GB $ 0.09
Node Egress Download Repair $ 2.00 / TB (avg) 2.72 GB $ 0.01
Node Egress Download Audit $ 2.00 / TB (avg) 238.59 KB $ 0.00
Node Storage Disk Current Total -not paid- 5.24 TB
Node Storage ├ Blobs -not paid- 5.23 TB
Node Storage └ Trash ┐ -not paid- 11.64 GB
Node+Sat. Calc. Storage Uncollected Garbage ┤ -not paid- 3.27 TB
Node+Sat. Calc. Storage Total Unpaid Data <─┘ -not paid- 3.28 TB
Satellite Storage Disk Last Report -not paid- 1.95 TB
Satellite Storage Disk Average So Far -not paid- 869.25 GB
Satellite Storage Disk Usage Month $ 1.49 / TBm (avg) 381.94 GBm $ 0.57
________________________________________________________________________________________________________+
Total 381.94 GBm 11.85 TB $ 0.66
Estimated total by end of month 869.25 GBm 26.17 TB $ 1.51
So my questions still stand, I'm afraid - if this is indeed used and expired data, when can I expect it to be freed up again?
What is the current storage metric that attracts payment?
a) Is it the 5.21TB used, as per the dashboard? (I know it probably isn't this, if data has expired.)
b) Is it the 1.95TB as reported in the earnings calc?
c) Is it the 381GB as reported in the payout dashboard?
Or none of the above?
Some bugs, like the average disk used space this month, will be fixed anyway (it's a reporting issue on the satellites, US1 and SLC in particular, see Avg disk space used dropped with 60-70%); some are filewalker related (though perhaps not bugs but errors - check your logs).
At the moment I can only suggest using df --si -T as the point of truth. Or wait until all filewalkers have finished their job for each trusted satellite. There are 4 more, not only the used-space-filewalker, did you know?
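To illustrate the df approach: compare the filesystem-level numbers against the per-directory blobs and trash totals. The paths below are stand-ins created in a temp directory purely so the commands run anywhere; on a real node, point them at your storage mount (e.g. the directory holding storage/blobs and storage/trash).

```shell
# Illustrative layout only - substitute your actual storage location.
STORAGE=$(mktemp -d)
mkdir -p "$STORAGE/blobs" "$STORAGE/trash"

# Filesystem-level view (the "point of truth" referred to above):
df --si -T "$STORAGE"

# Per-directory view: blobs (live pieces) vs trash (awaiting deletion):
du --si -s "$STORAGE/blobs" "$STORAGE/trash"
```

If df shows far more used than blobs + trash account for (after the filewalkers finish), the discrepancy is on the disk itself, not just in the node's databases.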
So I can now see that 3 of the 4 satellites completed; the 4th, SLC, started and is still running - slowly… I can see trash going up slowly as well. I'm surprised it's not faster given the server specs, but that's another thing to explore.
Why has it got so far out of step, though, to the tune of 3.5TB in just a few days? Surely we should not need to restart the node often just to recover space - should we? Is this not a process that should be running regularly, to keep the GC work manageable so it's not taking days?
Update - the used-space filewalker has now completed on all satellites and I have success messages in the logs.
I still have a huge disconnect between the dashboard and the earnings calculator. With the filewalker now finished, can I trust the dashboard? The earnings calc is still saying 2.5TB in uncollected garbage. Nothing aligns. Where does the earnings calc get its info?
Node Storage Disk Current Total -not paid- 7.51 TB
Node Storage ├ Blobs -not paid- 4.99 TB
Node Storage └ Trash ┐ -not paid- 2.52 TB
Node+Sat. Calc. Storage Uncollected Garbage ┤ -not paid- 2.49 TB
Node+Sat. Calc. Storage Total Unpaid Data <─┘ -not paid- 5.00 TB
Satellite Storage Disk Last Report -not paid- 2.50 TB
Satellite Storage Disk Average So Far -not paid- 984.10 GB
Satellite Storage Disk Usage Month $ 1.49 / TBm (avg) 463.80 GBm $ 0.69
Maybe someone can also explain to me, as a newbie, how GC is supposed to work, as it still seems something is broken and the node needs constant attention, which was never the plan.
My understanding from the forums was that the Salt Lake test data was all TTL data; if that's the case, why did it need the used-space walker to move it to trash? Or was the Salt Lake data store-and-delete, in which case there seems to be the potential to disable storage nodes with trash that does not get cleared or removed in a similar time frame to its creation.
I'm just trying to get clear in my mind how this is all supposed to work, and whether there is a problem here.
I guess it's because the lazy mode is enabled and your node is accepting traffic from the customers. So there are three possible solutions:
1. Disable the lazy mode, so all filewalkers run with normal priority. This will likely reduce the success rate for your node, so it will be offered to the customers less often, but that should be ok.
2. Wait until it finishes.
3. Set the allocation below the current usage (this will stop ingress) and restart the node. This will give the lazy filewalkers a higher priority.
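For reference, a sketch of what options 1 and 3 look like as config.yaml entries. The key names here assume current storagenode releases and the values are purely illustrative - verify both against your own config.yaml before editing, and restart the node afterwards for the change to take effect.

```shell
# Write an illustrative config fragment to a temp file (stand-in for config.yaml).
CONFIG=$(mktemp)
cat > "$CONFIG" <<'EOF'
# option 1: disable the lazy (low-priority) filewalker
pieces.enable-lazy-filewalker: false
# option 3: drop the allocation below current usage to stop ingress
storage.allocated-disk-space: 5.00 TB
EOF
cat "$CONFIG"
```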
This is a separate issue, not related to the filewalkers at all - it's a reporting issue from the US1 and SLC satellites; see Avg disk space used dropped with 60-70%.