Avg disk space used dropped by 60-70%

Check if you have these logs:

Hey, I’m following the discussions around test data, garbage collection, etc. I’ve just used the script from @BrightSilence, which I haven’t used for a while:

REPORTED BY        TYPE      METRIC                    PRICE                      DISK   BANDWIDTH    PAYOUT
Node               Ingress   Upload                    -not paid-                          13.41 TB
Node               Ingress   Upload Repair             -not paid-                         122.10 GB
Node               Egress    Download                  $  2.00 / TB (avg)                 390.62 GB   $  0.78
Node               Egress    Download Repair           $  2.00 / TB (avg)                 455.23 GB   $  0.91
Node               Egress    Download Audit            $  2.00 / TB (avg)                  96.65 MB   $  0.00
Node               Storage   Disk Current Total        -not paid-             24.43 TB
Node               Storage      ├ Blobs                -not paid-             24.43 TB
Node               Storage      └ Trash            ┐   -not paid-              0.00  B
Node+Sat. Calc.    Storage   Uncollected Garbage   ┤   -not paid-             11.45 TB
Node+Sat. Calc.    Storage   Total Unpaid Data  <──┘   -not paid-             11.45 TB
Satellite          Storage   Disk Last Report          -not paid-             12.98 TB
Satellite          Storage   Disk Average So Far       -not paid-             11.62 TB
Satellite          Storage   Disk Usage Month          $  1.49 / TBm (avg)   10.23 TBm               $ 15.25
____________________________________________________________________________________________________________+
Total                                                                        10.23 TBm    14.38 TB   $ 16.94
Estimated total by end of month                                              11.62 TBm    15.76 TB   $ 19.17

Do I read this right that my node has 11 TB of uncollected garbage sitting around?
The dashboard shows this:

What should I do then?

Maybe I should add: it is a Synology DS1019+ with 5x 14 TB drives running in SHR with two-drive fault tolerance, plus a 1 TB SSD cache.
The node updated to v1.105.4 a bit less than two days ago.

Unfortunately, yes. Though keep in mind this is an estimate. It’s calculated by comparing the last reported storage usage by the satellite against your local node usage. Sometimes the last reported usage from the satellite is not entirely reliable, though in those cases it has always been lower than real usage, which would only lower the amount of uncollected garbage. This also relies on the file walkers on your node. If you have disabled the filewalker, it may use unreliable local data as well. That said, I have seen big amounts of uncollected garbage on my end as well and this seems to be an issue that has lasted for a while now. I guess I’m glad I added this metric to surface that information.
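Roughly, using the numbers from the script output above (the trash term is zero here, so it doesn’t change the result):

    24.43 TB (local blobs) − 12.98 TB (satellite last report) ≈ 11.45 TB of presumed uncollected garbage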

Thanks. So since the last update my Synology has also been very, very busy… whenever the filewalker is active / going through everything, the drives are at 100% busy. This is one month:


I’ve read about how much it should delete etc., but 11 TB is a bit less than 50% of what I have allocated, and it is just sitting there doing nothing. I have about 5 TB more space I could allocate, but I don’t see a reason to - it might slow things down even more (everything is working fine and responding). What I don’t understand is: why isn’t it ‘faster’ to delete the garbage?

Nothing. Until the team backfills the gaps, your Avg Used Space will be way off.
However, that doesn’t mean you actually have that amount of garbage, because your node doesn’t have the full picture right now.
Please also note that you will be paid for the used space and bandwidth that your node submitted as signed orders to the satellites. Everything is cryptographically confirmed, so all used space and bandwidth will be paid.

Of course, if you have errors related to gc-filewalker or retain, you have a problem with garbage collection. If you do not have any errors, then you do not have a problem with GC.

Thanks Alexey,
I’ve been part of Storj since the beginning, so I am reading along, but I also don’t understand it all. A couple of questions:

  • So you’re saying I don’t have 50% garbage on my disks. How can I find out how much it really is?
  • Or how could I check that nothing is actually broken somehow?
  • I have more space available that I could allocate, but because of the load and time the filewalker and whatnot take, I wasn’t sure if I should just allocate more space. Also, as it looks, it just gets stuffed and not deleted in time :slight_smile: And yeah, I read about TTL etc.
  • In my logfiles I see primarily updater stuff; which command should I use to check the gc-filewalker?

Thanks!

It is not possible. Only the satellites would know, but they don’t/can’t tell your node in a timely manner. It can take weeks until they do. I suspect this might be an architectural scalability issue: as the amount of data stored on the network grows, the satellites struggle more and more to cope with it.

You can only check that nothing is broken on your node’s side. I would start with Filewalker status - #5 by Alexey
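For the log question, a quick sketch (assuming a Docker setup with the default container name storagenode and logs going to the container output; adjust the container name, or grep your log file instead if you redirect logs):

    docker logs storagenode 2>&1 | grep -E "gc-filewalker|retain" | tail -n 20

Successful runs log start/finish lines for those components; errors on gc-filewalker or retain are what Alexey refers to above.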

If all your nodes are full and ingress has stopped, I would allocate more. Otherwise I doubt it will make any difference.

Yes, this is the biggest issue that SNOs keep complaining about in several threads.

Guys, it’s happening again…

And it will likely happen again until this feature is implemented:

However, SLC still might send a report later.

Despite having more used space for the whole month, the “average disk space used this month” is lower… is there any reason? Both nodes are on v1.108.3.

Please check whether all satellites have sent a usage report to your node.
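One way to check is the node’s local dashboard API (a rough sketch, assuming the default dashboard port 14002 and that jq is installed; exact field names can differ between versions):

    curl -s http://localhost:14002/api/sno/satellites | jq '.storageDaily'

Days with no entry or a zero value there usually mean a satellite has not sent its usage report yet, which is what drags the average down.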

the storagenode was not sending orders :thinking:
I added the storage2.orders.sender-interval variable to the config file, restarted, and it sent them all (>5000 orders per satellite).
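For reference, a minimal sketch of the entry I added to config.yaml (the 1h value is what I assume the default to be; adjust as needed and restart the node afterwards):

    # how often accumulated orders are sent to the satellites
    storage2.orders.sender-interval: 1h0m0s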

Thanksss

What was the previous value in your config file?

there was no value in the config file :smiley: it’s the same config file I’ve used for years,
so I guess it used the default value.

Then the restart itself fixed the issue, I guess.

Average usage has been dropping like crazy again since Saturday.
Are we having a satellite issue again?

You may select each satellite to see which one is missing.

I am seeing my overall average dropping almost every minute or so. This doesn’t seem normal.

US… on many nodes.
:partying_face:

Oh… again…