Huge drop of Average disk space used

Hello,

Do you know why I had a huge drop in disk space used this month?


Total disk space used has not changed though

Thank you

Look at the per-satellite breakdown and you will see it's due to the shutdown of the Europe-North test satellite.

See also: Announcement: Storj to shut down europe-north-1 and us2


The removed data should be moved to the trash by the garbage collector and deleted 7 days later.
However, the garbage collector runs only once a week, so it could take up to two weeks (maybe more, since one bloom filter run covers only about 90% of the garbage) to remove this data from your disk.
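As a rough illustration (my arithmetic, not an official figure): if each weekly bloom-filter run catches about 90% of the remaining garbage, the leftover fraction shrinks geometrically with each cycle:

```shell
# Sketch only: assumes each weekly bloom-filter run removes ~90% of the
# remaining garbage, so roughly 0.1^n of it survives n cycles.
for n in 1 2 3; do
  awk -v n="$n" 'BEGIN { printf "after %d cycle(s): %.1f%% of garbage left\n", n, 100 * 0.1 ^ n }'
done
```

So even under this simplified assumption, most of the excess data is gone after two or three weekly cycles, which matches the "up to two weeks, maybe more" estimate above.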


Hello,

It's been almost a month now, and I still have almost 15TB of total used space but only 10TB of average space used. Is that normal?

Thank you

No. Likely either the filewalker didn't finish its calculation or the garbage collector did not remove the unused pieces.
So, if you have errors in the logs related to either of them, you need to fix that issue and restart the node (to force the filewalker to run again), and keep your node online to receive bloom filters from the satellites.

It’s also possible that your node was offline when the test satellites were shut down, and it still has their data.

What errors am I looking for?

In 20 hours of running, I have:

WARN piecestore:monitor Disk space is less than requested. Allocated space is {“bytes”: 15969472489449}

330 lines of:
INFO collector deleted expired piece

26 lines of:
ERROR collector unable to delete piece

and:
ERROR orders cleaning DB archive {“error”: “ordersdb: database is locked”,

This warning means that the node could not determine the used space due to a lack of information in the database; it should be updated during the filewalker run, which starts after a node restart.

This error:

You need to check why the node is unable to delete a piece; it is perhaps related to file permissions.

This error

means that your disk subsystem is slow. A one-time occurrence should not be a problem (the node will retry), but if you see a lot of such errors, it’s advisable to move the databases to an SSD: How to move DB’s to SSD on Docker
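The linked guide essentially points the node at a database directory on the SSD. In config.yaml the relevant option should be `storage2.database-dir` (verify against the guide for your version; the path below is only an example, and the existing *.db files must be moved there while the node is stopped):

```yaml
# config.yaml -- example path; move the existing *.db files there first,
# while the node is stopped, then restart.
storage2.database-dir: /mnt/ssd/storagenode-dbs
```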


What happens if the node was down during the satellites' shutdown? Will the data never be deleted? Do we have to do something manually?


Nothing happens; your node will simply skip that garbage collector cycle and wait for the next one.
The bloom filter is configured to catch up to 90% of excess pieces, so several such cycles are needed. Thus you need to keep your node online to receive the messages with the bloom filters.

You need to keep your node online. Also, check the logs for errors related to the filewalker (search for filewalk) and/or the garbage collector (search for retain).
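A minimal way to do that search (the log lines below are invented for illustration; feed in your real log instead of the here-doc, e.g. `docker logs storagenode 2>&1`):

```shell
# Write a fake log sample, then filter it the same way you would a real one:
cat <<'EOF' > /tmp/node-sample.log
2024-01-02T10:00:00Z INFO retain Prepared to run a Retain request.
2024-01-02T10:05:00Z ERROR pieces failed to lazywalk space used by satellite
2024-01-02T10:06:00Z ERROR retain retain pieces failed
2024-01-02T10:07:00Z INFO lazyfilewalker.used-space-filewalker subprocess started
EOF

# Keep only filewalker / garbage-collector errors:
grep -E "ERROR.*(filewalk|lazywalk|retain)" /tmp/node-sample.log
```

This prints only the two ERROR lines; the INFO lines about retain and the filewalker are normal operation and can be ignored.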

I moved the DBs to the SSD and no longer have the DB archive error (for now).

I have this error about the filewalker:

ERROR pieces failed to lazywalk space used by satellite {“error”: “lazyfilewalker: exit status 1”, “errorVerbose”: “lazyfilewalker: exit status 1\n\tstorj.io/storj/storagenode/pieces/lazyfilewalker.(*process).run:83\n\tstorj.io/storj/storagenode/pieces/lazyfilewalker.(*Supervisor).WalkAndComputeSpaceUsedBySatellite:105\n\tstorj.io/storj/storagenode/pieces.(*Store).SpaceUsedTotalAndBySatellite:707\n\tstorj.io/storj/storagenode/pieces.(*CacheService).Run:57\n\tstorj.io/storj/private/lifecycle.(*Group).Run.func2.1:87\n\truntime/pprof.Do:44\n\tstorj.io/storj/private/lifecycle.(*Group).Run.func2:86\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:75”, “Satellite ID”: “12rfG3sh9NCWiX3ivPjq2HtdLmbqCrvHVEzJubnzFzosMuawymB”}

Do you know what it means?

It failed abruptly due to some unknown error. There could be a FATAL error somewhere later in the logs, or a previous error explaining what went wrong.
Please search for errors around this time.

I do not remember such an error. Do you have it in your logs? Or do you mean this one:

If so, it’s a “database is locked” error. It can happen at any time if the disk is slow, not only during DB archive cleaning.

Yes I meant this one

I have these around that time:
INFO pieces:trash emptying trash started…
INFO lazyfilewalker.used-space-filewalker starting subprocess…
INFO lazyfilewalker.used-space-filewalker subprocess started…
INFO pieces:trash emptying trash started…
INFO lazyfilewalker.used-space-filewalker.subprocess Database started…
INFO lazyfilewalker.used-space-filewalker.subprocess used-space-filewalker started…
INFO lazyfilewalker.used-space-filewalker subprocess exited with status…
ERROR pieces failed to lazywalk space used by satellite…
INFO Interrogate request received.
WARN console:service unable to get Satellite URL…

But no FATAL

Does it have at least one successful attempt?

I don’t think there is:


How is your disk connected? Is it SMR?
It’s so slow (“context canceled” errors on every attempt) that even the lazy filewalker cannot finish its work.

Please try disabling the lazy filewalker; this will enable the normal one (running with normal I/O priority):

# run garbage collection and used-space calculation filewalkers
# as a separate subprocess with lower IO priority (default true)
pieces.enable-lazy-filewalker: false

Save the config and restart the node. If you use Docker, you may provide this flag after the image name: --pieces.enable-lazy-filewalker=false
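For Docker users, a sketch of where that flag goes: after the image name, alongside any other node arguments. Keep the mounts, ports, and environment variables from your existing run command; this is not a complete command, only an illustration of the flag placement:

```shell
# Not a complete run command -- only illustrates where the flag belongs
# (after the image name, with your usual mounts/env options kept as-is).
docker run -d --restart unless-stopped --name storagenode \
    storjlabs/storagenode:latest \
    --pieces.enable-lazy-filewalker=false
```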

I think I have a similar problem.

The disk usage this month never went below 3,85TB:

but the average disk space used is still 3,65TB:

(don’t mind the overusage for now; I changed the allocated total from 4TB to 3TB two days ago when I restarted the node with storage2.piece-scan-on-startup: true to let Mr. Luke Filewalker run, and it took two days to finish)

Also on the payout page the number differs from both of the above:

Which one is correct, then? Am I getting paid this month for at least 3,85TB of average used disk, with only the local data being flawed, or what is going on?


The graph. The payout information for the current month uses local stats from your databases (not what was reported by your node to the satellite).
It will be updated after the payout to reflect what was signed and sent to the satellites.

I think it is SMR, yes. It’s connected via SATA 2.

I disabled the lazy filewalker. I will wait and see.

Thanks

TBm is a different unit for a reason. It stands for TB-month and works similarly to a kilowatt-hour, in that it counts TB stored over a whole month. So if you store 2TB and it’s exactly halfway through the month, it would show 1TBm.
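The arithmetic behind that example (2TB stored for half the month), sketched out:

```shell
# TB-month integrates storage over time, the way kilowatt-hours do for
# power: stored TB multiplied by the fraction of the month it was stored.
awk 'BEGIN { tb = 2.0; month_fraction = 0.5; printf "%.1f TBm\n", tb * month_fraction }'
```

The same logic explains why a node that held at least 3,85TB for the whole month should also show at least 3,85 TBm.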


Exactly.
That is why I wrote on the last day of the month that the usage never went below 3,85TB for the WHOLE month.
Surely that would mean the average AND the kilowatt-hour should not be below 3,85, no?

(although if Alexey is right, the numbers will be updated on payment day. I’ll check and report back when that happens)