Disk usage discrepancy?

Hello :slight_smile:
My node has become very full thanks to the changes made by Storj. That’s great! Unfortunately, I now see a very large difference between the dashboard and the physical hard drive.
The dashboard shows 5.4TB used in total, while the physical hard drive (with no other content on it) shows 11.5TB used. The drive is XFS formatted and has a capacity of 12TB. I have never had such problems on other systems.
Another node with ext4 is very close to the dashboard value.
The ext4 node has a newer version on the same system.

What could be the reason?

System:
Unraid 6.12.10
i3 CPU
8GB RAM
WD 12TB XFS v1.105.4
WD 10TB ext4 v1.107.3

Quick question here: how lazy is the lazy filewalker? I mean, after a node restart it starts all over again, right?
For, let’s say, 8TB of data on disk, how many days without interruption does it take to clean up trash and discrepancies?
After almost a month of uptime, my node correctly shows my 13TB of data, but after a restart it shows only 8TB, and I was curious why it is acting like this.

However, I believe some filewalkers (the trash one, for example, and perhaps the TTL collector) are still working with a normal priority.

That usually means that the collected stats were not written to the databases, so you need to figure out why. I believe you have errors related to the databases (malformed, “database is locked”, “file is not a database”, etc.).
If you also have errors related to the used-space-filewalker, then the databases will not be updated either. As a result, the stats on the dashboard are “reset” to whatever is stored in the databases.


And may I please ask how I can reset the database?

Good morning,

Today a very strange thing happened with the maximum capacity of one of my nodes. After the lazy filewalker finished (10 days later) and the databases were updated, I found that the total disk space shown in the dashboard is higher than what is configured in config.yml (15.00TB) and, strangest of all, even greater than the maximum capacity of the hard drive (~16TB).

(screenshots of the dashboard)

I understand that this is a bug, and if I restart the node and run the lazy filewalker again it should be resolved. But why has the total available disk space changed if I have assigned it 15TB?

I was moved to this topic, and now what? :smiley:
There are so many different problems here. Which one fits my case exactly?

You do not need to. Just enable the scan on startup if you disabled it (it’s enabled by default) and restart the node. Then make sure that you do not have errors related to the databases (search for error and database) or to the filewalkers (error and used-space).
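
If you use a config.yaml, the scan on startup corresponds to the following option (name as used in recent storagenode versions; please verify it against your own config file before relying on it), and it can also be passed as a run flag:

storage2.piece-scan-on-startup: true

# or appended as a flag after the image name in the docker run command:
--storage2.piece-scan-on-startup=true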


You likely checked the multinode dashboard, am I right?
If so, see

Exactly the one linked in my reply.
I would copy the content.

You may check for errors related to the databases and/or filewalkers in your logs:

docker logs storagenode 2>&1 | grep -i error | grep -E "database|used-space" | tail

The suggestion depends on what errors you have. I also believe that the filewalker works noticeably slower on XFS, at least it seems so: Topics tagged xfs.


Hello Alexey, I don’t use the multinode dashboard, I always consult the single-node dashboard. It has been very strange. I’m going to restart the node and let it update the databases again. I will let you know when it has been updated.

Then it’s weird. We reverted the change that caused this kind of behavior for the SND. But the related code still provides, via the API, the value that the node currently decides to use as the allocated space (it chooses the minimum of “allocated”, “used + free (in the allocated)” and “used + free (on the disk)” so that it does not use more space than it should).
However, it takes the “used” value from the databases (because there is no way to ask the OS for the used space within the allocation, unless you provided the whole volume to the node, and the node is not aware of that fact in any way).
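
As a rough sketch of that selection with hypothetical numbers (this only illustrates the “minimum of three” rule described above, it is not the actual node code):

# hypothetical values in GB
allocated=15000              # disk space assigned in config.yaml
used_plus_free_alloc=15500   # "used" (from the databases) + free within the allocation
used_plus_free_disk=16000    # "used" (from the databases) + free on the disk (from the OS)
printf '%s\n' "$allocated" "$used_plus_free_alloc" "$used_plus_free_disk" | sort -n | head -1
# prints 15000: the node advertises the smallest of the three so it never uses more than it should;
# since "used" comes from the databases, a wrong database value skews the last two candidates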

I see “database is locked” and filewalker errors. I have also seen that my uptime is at 92-94% although the node is actually active. There are some random restarts after some hours/days.

2024-07-10T06:29:35+02:00       ERROR   lazyfilewalker.used-space-filewalker    failed to start subprocess      {"Process": "storagenode", "satelliteID": "1wFTAgs9DP5RSnCqKV1eLf6N9wtk4EAtmN5DpSxcs8EjT69tGE", "error": "context canceled"}
2024-07-10T06:29:39+02:00       ERROR   lazyfilewalker.used-space-filewalker    failed to start subprocess      {"Process": "storagenode", "satelliteID": "12rfG3sh9NCWiX3ivPjq2HtdLmbqCrvHVEzJubnzFzosMuawymB", "error": "context canceled"}
2024-07-10T06:29:41+02:00       ERROR   lazyfilewalker.used-space-filewalker    failed to start subprocess      {"Process": "storagenode", "satelliteID": "12tRQrMTWUWwzwGh18i7Fqs67kmdhH9t6aToeiwbo5mfS2rUmo", "error": "context canceled"}
2024-07-10T06:30:36+02:00       ERROR   filewalker      failed to get progress from database    {"Process": "storagenode"}
2024-07-10T06:30:38+02:00       ERROR   filewalker      failed to get progress from database    {"Process": "storagenode"}
2024-07-10T06:46:57+02:00       ERROR   piecestore      upload failed   {"Process": "storagenode", "Piece ID": "VWJMW7YIY3IJY76WIJL7OH35NSXLDZREN2JOIBLR7GI62MUEEXQA", "Satellite ID": "1wFTAgs9DP5RSnCqKV1eLf6N9wtk4EAtmN5DpSxcs8EjT69tGE", "Action": "PUT", "Remote Address": "79.127.201.209:56894", "Size": 197376, "error": "pieceexpirationdb: database is locked", "errorVerbose": "pieceexpirationdb: database is locked\n\tstorj.io/storj/storagenode/storagenodedb.(*pieceExpirationDB).SetExpiration:115\n\tstorj.io/storj/storagenode/pieces.(*Store).SetExpiration:587\n\tstorj.io/storj/storagenode/piecestore.(*Endpoint).Upload.func6:483\n\tstorj.io/storj/storagenode/piecestore.(*Endpoint).Upload:541\n\tstorj.io/common/pb.DRPCPiecestoreDescription.Method.func1:294\n\tstorj.io/drpc/drpcmux.(*Mux).HandleRPC:33\n\tstorj.io/common/rpc/rpctracing.(*Handler).HandleRPC:61\n\tstorj.io/common/experiment.(*Handler).HandleRPC:42\n\tstorj.io/drpc/drpcserver.(*Server).handleRPC:167\n\tstorj.io/drpc/drpcserver.(*Server).ServeOne:109\n\tstorj.io/drpc/drpcserver.(*Server).Serve.func2:157\n\tstorj.io/drpc/drpcctx.(*Tracker).track:35"}
2024-07-10T08:43:26+02:00       ERROR   piecestore      upload failed   {"Process": "storagenode", "Piece ID": "IEU4ZUSOP7MLO5PY4U2IQSWHCZTJDPJ3N6GNRJMV4PNTTG43SPPA", "Satellite ID": "1wFTAgs9DP5RSnCqKV1eLf6N9wtk4EAtmN5DpSxcs8EjT69tGE", "Action": "PUT", "Remote Address": "109.61.92.75:42512", "Size": 197376, "error": "pieceexpirationdb: database is locked", "errorVerbose": "pieceexpirationdb: database is locked\n\tstorj.io/storj/storagenode/storagenodedb.(*pieceExpirationDB).SetExpiration:115\n\tstorj.io/storj/storagenode/pieces.(*Store).SetExpiration:587\n\tstorj.io/storj/storagenode/piecestore.(*Endpoint).Upload.func6:483\n\tstorj.io/storj/storagenode/piecestore.(*Endpoint).Upload:541\n\tstorj.io/common/pb.DRPCPiecestoreDescription.Method.func1:294\n\tstorj.io/drpc/drpcmux.(*Mux).HandleRPC:33\n\tstorj.io/common/rpc/rpctracing.(*Handler).HandleRPC:61\n\tstorj.io/common/experiment.(*Handler).HandleRPC:42\n\tstorj.io/drpc/drpcserver.(*Server).handleRPC:167\n\tstorj.io/drpc/drpcserver.(*Server).ServeOne:109\n\tstorj.io/drpc/drpcserver.(*Server).Serve.func2:157\n\tstorj.io/drpc/drpcctx.(*Tracker).track:35"}
2024-07-10T09:32:12+02:00       ERROR   orders  failed to add bandwidth usage   {"Process": "storagenode", "satellite ID": "12L9ZFwhzVpuEKMUNUqkaTLGzwY9G24tbiigLiXpmZWKwmcNDDs", "action": "GET_REPAIR", "amount": 375448832, "error": "bandwidthdb: database is locked", "errorVerbose": "bandwidthdb: database is locked\n\tstorj.io/storj/storagenode/storagenodedb.(*bandwidthDB).Add:76\n\tstorj.io/storj/storagenode/orders.(*Service).SendOrders.func2:254\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:78"}
2024-07-10T09:32:12+02:00       ERROR   orders  failed to add bandwidth usage   {"Process": "storagenode", "satellite ID": "12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S", "action": "GET", "amount": 146006272, "error": "bandwidthdb: database is locked", "errorVerbose": "bandwidthdb: database is locked\n\tstorj.io/storj/storagenode/storagenodedb.(*bandwidthDB).Add:76\n\tstorj.io/storj/storagenode/orders.(*Service).SendOrders.func2:254\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:78"}
2024-07-10T09:32:12+02:00       ERROR   orders  failed to add bandwidth usage   {"Process": "storagenode", "satellite ID": "121RTSDpyNZVcEU84Ticf2L1ntiuUimbWgfATz21tuvgk3vzoA6", "action": "GET", "amount": 23521792, "error": "bandwidthdb: database is locked", "errorVerbose": "bandwidthdb: database is locked\n\tstorj.io/storj/storagenode/storagenodedb.(*bandwidthDB).Add:76\n\tstorj.io/storj/storagenode/orders.(*Service).SendOrders.func2:254\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:78"}

So, you need to try to optimize your disk subsystem. I do not know much about Unraid, but if it’s possible to add a cache in front of the slow disk, it should improve things.
Or you can at least move the databases to another, less loaded disk; this should help with the database issues and likely also reduce the I/O load on the data disk, so the filewalker may finally get a chance to finish its scans.
If moving the databases does not help with the filewalker, then your only remaining option is to disable the lazy mode to let it work with a normal I/O priority.
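
For reference, the options involved would look roughly like this in config.yaml (option names as used in recent storagenode versions, and the path is only an example; please verify both against your own setup before applying):

# move the databases to another, less loaded disk (example path; it must exist and be accessible to the node):
storage2.database-dir: /mnt/fast-disk/storagenode-db

# run the filewalkers with a normal IO priority instead of the lazy mode:
pieces.enable-lazy-filewalker: false

After changing either option the node needs a restart, and for the database move the node should be stopped first so the existing database files can be copied to the new location.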

You need to search for FATAL and/or Unrecoverable errors in your logs.
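
For example, adapting the log command used earlier in this thread:

docker logs storagenode 2>&1 | grep -E "FATAL|Unrecoverable" | tail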


Hello,

On my TrueNAS SCALE, version Dragonfish-24.04.1.1,

there is a mismatch in the used space. Is this a well-known issue?

Storj App Version:
v1.68.2


For the app installation I used the new ixVolume function, see the screenshot too.


What do you get for:
zpool list -v

Thank you, I will give this a try :slight_smile:
I think the system is just too slow for 2 nodes with the new Storj workload.
It only has 8GB of RAM and a 6th-gen i3.

How would I disable the lazy mode exactly?

The reasons are the same, as explained in this post: