Yes, I would expect the used-space-filewalker to update these databases with the actual used space, so the discrepancy between the OS usage and the node's usage should become negligible.
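If you want to double-check that on your own node, a quick way to see the OS side of the comparison is to measure the blobs folder directly (the path below is just an example, adjust it to your storage location):

```
# Example path only - point it at your node's storage location.
# This is the on-disk usage the piechart should roughly match
# once the used-space-filewalker has finished.
du -sh /mnt/storagenode/storage/blobs
```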
Hmm, mine is still strange. The used-space filewalkers completed and a garbage retention run completed (none running now), yet my local dashboard still shows 1.5TB used, while the satellites together report 1.09TB used.
(my overall disk space used being higher at 1.8TB was because of some unrelated files I had on the filesystem, so please disregard that)
The difference between the Average Disk Space Used This Month graph and the pie chart has been discussed a lot in this thread:
Nothing has changed since then, and it's unrelated to the current thread about the potential bug in the used-space-filewalker.
> Yes, I would expect the used-space-filewalker to update these databases with the actual used space, so the discrepancy between the OS usage and the node's usage should become negligible.
I did this on a small node I have and it fixed the issue.
I’m also having problems with the used space resetting to ~9.2GB after restarting the node. I’m fairly new here; my node had ~140GB of used space.
My node runs in a Docker container on TrueNAS. The log level is set to debug, and the only errors I see so far are sporadic EOF errors like the one below… (which I guess are normal?)
2024-11-29T20:43:38Z ERROR piecestore upload failed {"Process": "storagenode", "Piece ID": "CHROGAIUAYCN33V5HVDDMVYIS5TZ5PGVZLHDQEBK4XHUV3VHUKIA", "Satellite ID": "12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S", "Action": "PUT", "Remote Address": "xxx.xxx.xxx.xxx:48836", "Size": 65536, "error": "unexpected EOF", "errorVerbose": "unexpected EOF\n\tstorj.io/common/rpc/rpcstatus.Error:98\n\tstorj.io/storj/storagenode/piecestore.(*Endpoint).Upload:584\n\tstorj.io/common/pb.DRPCPiecestoreDescription.Method.func1:294\n\tstorj.io/drpc/drpcmux.(*Mux).HandleRPC:33\n\tstorj.io/common/rpc/rpctracing.(*Handler).HandleRPC:62\n\tstorj.io/common/experiment.(*Handler).HandleRPC:43\n\tstorj.io/drpc/drpcserver.(*Server).handleRPC:166\n\tstorj.io/drpc/drpcserver.(*Server).ServeOne:108\n\tstorj.io/drpc/drpcserver.(*Server).Serve.func2:156\n\tstorj.io/drpc/drpcctx.(*Tracker).track:35"}
Reading the comments, people with database issues should have errors in the logs, right?
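For reference, a grep along these lines should show them if they exist (the container name is an assumption; use whatever yours is called):

```
# "storagenode" is an assumed container name - replace it with yours.
docker logs storagenode 2>&1 | grep -iE "database|filewalker" | grep -i error
```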
Is this just a GUI error, and will the discrepancy fix itself once the files that are actually stored on the node are pulled from it?
EDIT: This actually fixed it for me. Idk why.
Hello @blares-serious0g,
Welcome to the forum!
Yes, this is a usual long-tail cancellation - your node was slower than its competitors. Your node cannot be close to every customer in the world, so it’s normal.
Not always. You may use this article to check all databases:
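As a rough sketch of what such a check looks like (example path; stop the node first and make sure the sqlite3 CLI is installed), each database should answer “ok”:

```
# Sketch only - adjust the path to your storage location and stop the node first.
for db in /mnt/storagenode/storage/*.db; do
  echo "$db: $(sqlite3 "$db" 'PRAGMA integrity_check;')"
done
```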
However, in this case we suspect a bug in the used-space-filewalker, which for some reason doesn’t calculate the usage after a restart in some cases (not all nodes are affected).
The mentioned workaround fixes this behavior, but it’s unknown for how long.
Because it forces the node to re-create this database, the used-space-filewalker is forced to recalculate the actual usage instead of using this cache.
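For reference, the general shape of that kind of workaround is: stop the node, move the affected cache database out of the way so the node re-creates an empty one, then start the node again. The filename below is a placeholder rather than the real database name (follow the workaround mentioned above for that), and the path is an example:

```
# Placeholder filename and example path - follow the mentioned workaround for the real ones.
DB_NAME="replace-with-the-affected-db-filename.db"
docker stop -t 300 storagenode
mv "/mnt/storagenode/storage/$DB_NAME" "/mnt/storagenode/storage/$DB_NAME.bak"
docker start storagenode
```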
Well - it helps only until the next reboot.
Yeah, next reboot broke everything again lol.
All ok with the databases.
Seems the only solution now is to wait for the fix. I don’t remember having this issue with version 1.115.5.
I think it was introduced in 1.115.5. At least I was able to reproduce it on that version in storj-up.
I would also test it on the latest main.
So I guess the only thing for us to do now is just wait?
Same… after a restart the used space is again at 200 GB; after the workaround it’s over 1 TB. After the next restart, 200 GB again… growing with the ingress.
SSD read cache speeds up the scan.
In the earnings script, the satellite+node report has negative values.
pieces.enable-lazy-filewalker: false
storage2.piece-scan-on-startup: true
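For anyone wondering where those two settings go: in a typical setup they are lines in the node’s config.yaml exactly as written above (followed by a restart), or the same keys can be appended as flags after the image name in the docker run command. A sketch of the latter, with everything else from your existing run command elided:

```
# Sketch only - keep your existing mounts, env vars and ports where the ... is.
docker run -d --name storagenode ... storjlabs/storagenode:latest \
  --pieces.enable-lazy-filewalker=false \
  --storage2.piece-scan-on-startup=true
```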
Right now only this workaround may help until the next restart:
A bug is reported here: