Huge drop of Average disk space used

Hello,

Do you know why I had a huge drop in disk space used this month?


Total disk space used has not changed though

Thank you

Look at the per-satellite breakdown and you will see it's due to the shutdown of the Europe-North test satellite.

See also: Announcement: Storj to shut down europe-north-1 and us2


The removed data should be moved to the trash by the garbage collector and deleted 7 days later.
However, the garbage collector runs only once a week, so it could take up to two weeks (maybe more, since one bloom filter run covers only about 90% of the garbage) to remove this data from your disk.
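As a rough illustration (my arithmetic, not an official figure): if each weekly bloom-filter run catches about 90% of the remaining garbage, the leftover fraction shrinks geometrically with each cycle:

```shell
# Sketch only: assumes each weekly bloom-filter run removes ~90% of the
# remaining garbage, so roughly 0.1^n of it survives n cycles.
for n in 1 2 3; do
  awk -v n="$n" 'BEGIN { printf "after %d cycle(s): %.1f%% of garbage left\n", n, 100 * 0.1 ^ n }'
done
```

So even under this simplified assumption, most of the excess data is gone after two or three weekly cycles, which matches the "up to two weeks, maybe more" estimate above.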


Hello,

It's been almost a month now, and I still have almost 15TB of total used space but only 10TB of average space used. Is that normal?

Thank you

No. Likely either the filewalker didn't finish its calculation or the garbage collector did not remove the unused pieces.
So, if you have errors in the logs related to either of them, you need to fix that issue and restart the node (to force the filewalker to run again), and keep your node online to receive bloom filters from the satellites.

It’s also possible that your node was offline when the test satellites were shut down, and it still has their data.

What errors am I looking for?

In 20 hours of running, I have:

WARN piecestore:monitor Disk space is less than requested. Allocated space is {“bytes”: 15969472489449}

330 lines of:
INFO collector deleted expired piece

26 lines of:
ERROR collector unable to delete piece

and:
ERROR orders cleaning DB archive {“error”: “ordersdb: database is locked”,

This warning means that the node could not determine the used space due to a lack of information in the database; it should be updated during the filewalker run, which starts after a node restart.

This error:

You need to check why the node is unable to delete a piece; it is perhaps related to file permissions.

This error

means that your disk subsystem is slow. A one-time occurrence should not be a problem (the node will retry), but if you see a lot of such errors, it’s advisable to move the databases to an SSD: How to move DB’s to SSD on Docker
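The linked guide essentially points the node at a database directory on the SSD. In config.yaml the relevant option should be `storage2.database-dir` (verify against the guide for your version; the path below is only an example, and the existing *.db files must be moved there while the node is stopped):

```yaml
# config.yaml -- example path; move the existing *.db files there first,
# while the node is stopped, then restart.
storage2.database-dir: /mnt/ssd/storagenode-dbs
```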


What happens if the node was down during the satellites' shutdown? Will the data never be deleted? Do we have to do something manually?


Nothing happens; your node will simply skip that garbage collector cycle and wait for the next one.
The bloom filter is configured to catch up to 90% of excess pieces, so several such cycles are needed. Thus you need to keep your node online to receive the messages with the bloom filters.

You need to keep your node online. Also, check the logs for errors related to the filewalker (search for filewalk) and/or the garbage collector (search for retain).
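A minimal way to do that search (the log lines below are invented for illustration; feed in your real log instead of the here-doc, e.g. `docker logs storagenode 2>&1`):

```shell
# Write a fake log sample, then filter it the same way you would a real one:
cat <<'EOF' > /tmp/node-sample.log
2024-01-02T10:00:00Z INFO retain Prepared to run a Retain request.
2024-01-02T10:05:00Z ERROR pieces failed to lazywalk space used by satellite
2024-01-02T10:06:00Z ERROR retain retain pieces failed
2024-01-02T10:07:00Z INFO lazyfilewalker.used-space-filewalker subprocess started
EOF

# Keep only filewalker / garbage-collector errors:
grep -E "ERROR.*(filewalk|lazywalk|retain)" /tmp/node-sample.log
```

This prints only the two ERROR lines; the INFO lines about retain and the filewalker are normal operation and can be ignored.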

I moved the DBs to the SSD and no longer have the DB archive error (for now).

I have this error about the filewalker:

ERROR pieces failed to lazywalk space used by satellite {“error”: “lazyfilewalker: exit status 1”, “errorVerbose”: “lazyfilewalker: exit status 1\n\tstorj.io/storj/storagenode/pieces/lazyfilewalker.(*process).run:83\n\tstorj.io/storj/storagenode/pieces/lazyfilewalker.(*Supervisor).WalkAndComputeSpaceUsedBySatellite:105\n\tstorj.io/storj/storagenode/pieces.(*Store).SpaceUsedTotalAndBySatellite:707\n\tstorj.io/storj/storagenode/pieces.(*CacheService).Run:57\n\tstorj.io/storj/private/lifecycle.(*Group).Run.func2.1:87\n\truntime/pprof.Do:44\n\tstorj.io/storj/private/lifecycle.(*Group).Run.func2:86\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:75”, “Satellite ID”: “12rfG3sh9NCWiX3ivPjq2HtdLmbqCrvHVEzJubnzFzosMuawymB”}

Do you know what it means?

It failed abruptly due to some unknown error. There could be a FATAL error somewhere later in the logs, or a previous error explaining what went wrong.
Please search for errors around this time.

I do not remember such an error. Do you have it in your logs? Or do you mean this one:

If so, it’s a “database is locked” error. It can happen at any time if the disk is slow, not only during DB archive cleaning.

Yes I meant this one

I have these around that time:
INFO pieces:trash emptying trash started…
INFO lazyfilewalker.used-space-filewalker starting subprocess…
INFO lazyfilewalker.used-space-filewalker subprocess started…
INFO pieces:trash emptying trash started…
INFO lazyfilewalker.used-space-filewalker.subprocess Database started…
INFO lazyfilewalker.used-space-filewalker.subprocess used-space-filewalker started…
INFO lazyfilewalker.used-space-filewalker subprocess exited with status…
ERROR pieces failed to lazywalk space used by satellite…
INFO Interrogate request received.
WARN console:service unable to get Satellite URL…

But no FATAL

Does it have at least one successful attempt?

I don’t think there is:


How is your disk connected? Is it SMR?
It’s so slow (“context canceled” errors on every attempt) that even the lazy filewalker cannot finish its work.

Please try disabling the lazy filewalker; this will enable the normal one (running with normal I/O priority):

# run garbage collection and used-space calculation filewalkers
# as a separate subprocess with lower IO priority (default true)
pieces.enable-lazy-filewalker: false

Save the config and restart the node. If you use Docker, you may provide this flag after the image name: --pieces.enable-lazy-filewalker=false
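For Docker users, a sketch of where that flag goes: after the image name, alongside any other node arguments. Keep the mounts, ports, and environment variables from your existing run command; this is not a complete command, only an illustration of the flag placement:

```shell
# Not a complete run command -- only illustrates where the flag belongs
# (after the image name, with your usual mounts/env options kept as-is).
docker run -d --restart unless-stopped --name storagenode \
    storjlabs/storagenode:latest \
    --pieces.enable-lazy-filewalker=false
```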

I think I have a similar problem.

The disk usage this month never went below 3,85TB:

but the average disk space used is still 3,65TB:

(don’t mind the overusage for now; I changed the allocated total from 4TB to 3TB two days ago when I restarted the node with storage2.piece-scan-on-startup: true to let Mr. Luke Filewalker run, and it took two days to finish)

Also on the payout page the number differs from both of the above:

Which one is correct, then? Am I getting paid this month for at least 3,85TB of average used disk, with only the local data being flawed, or what is going on?


The graph. The payout information for the current month uses local stats from your databases (not what was reported by your node to the satellite).
It will be updated after the payout to reflect what was signed and sent to the satellites.

I think it is SMR, yes. It’s connected via SATA 2.

I disabled the lazy filewalker. I will wait and see.

Thanks

TBm is a different unit for a reason. It stands for TB-month and works similarly to a kilowatt-hour, in that it counts TB stored over a whole month. So if you store 2TB and it’s exactly halfway through the month, it would show 1TBm.
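The arithmetic behind that example (2TB stored for half the month), sketched out:

```shell
# TB-month integrates storage over time, the way kilowatt-hours do for
# power: stored TB multiplied by the fraction of the month it was stored.
awk 'BEGIN { tb = 2.0; month_fraction = 0.5; printf "%.1f TBm\n", tb * month_fraction }'
```

The same logic explains why a node that held at least 3,85TB for the whole month should also show at least 3,85 TBm.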


Exactly.
That is why I wrote on the last day of the month that the usage never went below 3,85TB for the WHOLE month.
Surely that would mean the average AND the kilowatt-hour should not be below 3,85, no?

(although if Alexey is right, the numbers will be updated on payment day. I’ll check and report back when that happens)