Disk usage discrepancy?

Hello :slight_smile:
My node has become very full thanks to the changes made by Storj. That’s great! Unfortunately, I now see a very large difference between the dashboard and the physical hard drive.
The dashboard shows 5.4TB used in total, while the physical hard drive (with no other content on it) shows 11.5TB used. The drive is XFS formatted and has a capacity of 12TB. I have never had such problems on other systems.
Another node with ext4 is very close to the dashboard value.
The ext4 node has a newer version on the same system.

What could be the reason?

System:
Unraid 6.12.10
i3 CPU
8GB RAM
WD 12TB XFS v1.105.4
WD 10TB ext4 v1.107.3

Quick question here: how lazy is the lazy filewalker? I mean, after a node restart it starts all over again, right?
For, let’s say, 8TB of data on disk, how many days without interruption does it take to clean up trash and discrepancies?
After almost a month of uptime, my node correctly shows my 13TB of data, but after a restart it shows only 8TB, and I was curious why it is acting like this.

However, I believe some filewalkers (the trash one, for example, and perhaps the TTL collector) are still working with a normal priority.

That usually means that the collected stats were not written to the databases, so you need to figure out why. I believe you have errors related to the databases (malformed, “database is locked”, “file is not a database”, etc.).
If you also have errors related to the used-space-filewalker, then the databases will not be updated either. As a result, the stats on the dashboard are “reset” to whatever is stored in the databases.


And may I please ask how I can reset the database?

Good morning,

Today a very strange thing happened with the maximum capacity of one of my nodes. After the lazy filewalker finished (10 days later) and the databases were updated, I found that the total disk space shown in the dashboard is higher than what is configured in config.yml (15.00TB) and, strangest of all, even greater than the maximum capacity of the hard drive (~16TB).

(screenshots of the dashboard)

I understand that this is a bug, and if I restart the node and run the lazy filewalker again it should be resolved. But why has the total available disk space changed if I have assigned it 15TB?

I was moved to this topic, and now what? :smiley:
There are so many different problems here. Which one fits my case exactly?

You do not need to. Just enable the scan on startup if you disabled it (it’s enabled by default) and restart the node. Then make sure that you do not have errors related to the databases (search for error and database) or to the filewalkers (error and used-space).
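
If you use a config.yaml, the scan on startup corresponds to the following option (name as used in recent storagenode versions; please verify it against your own config file before relying on it), and it can also be passed as a run flag:

storage2.piece-scan-on-startup: true

# or appended as a flag after the image name in the docker run command:
--storage2.piece-scan-on-startup=true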


You likely checked the multinode dashboard, am I right?
If so, see

Exactly the one linked in my reply.
I would copy the content.

You may check for errors related to the databases and/or filewalkers in your logs:

docker logs storagenode 2>&1 | grep -i error | grep -E "database|used-space" | tail

The suggestion depends on what errors you have. I also believe that the filewalker works noticeably slower on XFS, at least it seems so: Topics tagged xfs.


Hello Alexey, I don’t use the multinode dashboard, I always consult the single-node dashboard. It has been very strange. I’m going to restart the node and let it update the databases again. I will let you know when it has been updated.

Then it’s weird. We reverted the change that caused this kind of behavior for the SND. But the related code still provides, via the API, the value that the node currently decides to use as the allocated space (it chooses the minimum of “allocated”, “used + free (in the allocated)” and “used + free (on the disk)” so that it does not use more space than it should).
However, it takes the “used” value from the databases (because there is no way to ask the OS for the used space within the allocation, unless you provided the whole volume to the node, and the node is not aware of that fact in any way).
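
As a rough sketch of that selection with hypothetical numbers (this only illustrates the “minimum of three” rule described above, it is not the actual node code):

# hypothetical values in GB
allocated=15000              # disk space assigned in config.yaml
used_plus_free_alloc=15500   # "used" (from the databases) + free within the allocation
used_plus_free_disk=16000    # "used" (from the databases) + free on the disk (from the OS)
printf '%s\n' "$allocated" "$used_plus_free_alloc" "$used_plus_free_disk" | sort -n | head -1
# prints 15000: the node advertises the smallest of the three so it never uses more than it should;
# since "used" comes from the databases, a wrong database value skews the last two candidates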

I see “database is locked” and filewalker errors. I have also seen that my uptime is at 92-94% although the node is actually active. There are some random restarts after some hours/days.

2024-07-10T06:29:35+02:00       ERROR   lazyfilewalker.used-space-filewalker    failed to start subprocess      {"Process": "storagenode", "satelliteID": "1wFTAgs9DP5RSnCqKV1eLf6N9wtk4EAtmN5DpSxcs8EjT69tGE", "error": "context canceled"}
2024-07-10T06:29:39+02:00       ERROR   lazyfilewalker.used-space-filewalker    failed to start subprocess      {"Process": "storagenode", "satelliteID": "12rfG3sh9NCWiX3ivPjq2HtdLmbqCrvHVEzJubnzFzosMuawymB", "error": "context canceled"}
2024-07-10T06:29:41+02:00       ERROR   lazyfilewalker.used-space-filewalker    failed to start subprocess      {"Process": "storagenode", "satelliteID": "12tRQrMTWUWwzwGh18i7Fqs67kmdhH9t6aToeiwbo5mfS2rUmo", "error": "context canceled"}
2024-07-10T06:30:36+02:00       ERROR   filewalker      failed to get progress from database    {"Process": "storagenode"}
2024-07-10T06:30:38+02:00       ERROR   filewalker      failed to get progress from database    {"Process": "storagenode"}
2024-07-10T06:46:57+02:00       ERROR   piecestore      upload failed   {"Process": "storagenode", "Piece ID": "VWJMW7YIY3IJY76WIJL7OH35NSXLDZREN2JOIBLR7GI62MUEEXQA", "Satellite ID": "1wFTAgs9DP5RSnCqKV1eLf6N9wtk4EAtmN5DpSxcs8EjT69tGE", "Action": "PUT", "Remote Address": "79.127.201.209:56894", "Size": 197376, "error": "pieceexpirationdb: database is locked", "errorVerbose": "pieceexpirationdb: database is locked\n\tstorj.io/storj/storagenode/storagenodedb.(*pieceExpirationDB).SetExpiration:115\n\tstorj.io/storj/storagenode/pieces.(*Store).SetExpiration:587\n\tstorj.io/storj/storagenode/piecestore.(*Endpoint).Upload.func6:483\n\tstorj.io/storj/storagenode/piecestore.(*Endpoint).Upload:541\n\tstorj.io/common/pb.DRPCPiecestoreDescription.Method.func1:294\n\tstorj.io/drpc/drpcmux.(*Mux).HandleRPC:33\n\tstorj.io/common/rpc/rpctracing.(*Handler).HandleRPC:61\n\tstorj.io/common/experiment.(*Handler).HandleRPC:42\n\tstorj.io/drpc/drpcserver.(*Server).handleRPC:167\n\tstorj.io/drpc/drpcserver.(*Server).ServeOne:109\n\tstorj.io/drpc/drpcserver.(*Server).Serve.func2:157\n\tstorj.io/drpc/drpcctx.(*Tracker).track:35"}
2024-07-10T08:43:26+02:00       ERROR   piecestore      upload failed   {"Process": "storagenode", "Piece ID": "IEU4ZUSOP7MLO5PY4U2IQSWHCZTJDPJ3N6GNRJMV4PNTTG43SPPA", "Satellite ID": "1wFTAgs9DP5RSnCqKV1eLf6N9wtk4EAtmN5DpSxcs8EjT69tGE", "Action": "PUT", "Remote Address": "109.61.92.75:42512", "Size": 197376, "error": "pieceexpirationdb: database is locked", "errorVerbose": "pieceexpirationdb: database is locked\n\tstorj.io/storj/storagenode/storagenodedb.(*pieceExpirationDB).SetExpiration:115\n\tstorj.io/storj/storagenode/pieces.(*Store).SetExpiration:587\n\tstorj.io/storj/storagenode/piecestore.(*Endpoint).Upload.func6:483\n\tstorj.io/storj/storagenode/piecestore.(*Endpoint).Upload:541\n\tstorj.io/common/pb.DRPCPiecestoreDescription.Method.func1:294\n\tstorj.io/drpc/drpcmux.(*Mux).HandleRPC:33\n\tstorj.io/common/rpc/rpctracing.(*Handler).HandleRPC:61\n\tstorj.io/common/experiment.(*Handler).HandleRPC:42\n\tstorj.io/drpc/drpcserver.(*Server).handleRPC:167\n\tstorj.io/drpc/drpcserver.(*Server).ServeOne:109\n\tstorj.io/drpc/drpcserver.(*Server).Serve.func2:157\n\tstorj.io/drpc/drpcctx.(*Tracker).track:35"}
2024-07-10T09:32:12+02:00       ERROR   orders  failed to add bandwidth usage   {"Process": "storagenode", "satellite ID": "12L9ZFwhzVpuEKMUNUqkaTLGzwY9G24tbiigLiXpmZWKwmcNDDs", "action": "GET_REPAIR", "amount": 375448832, "error": "bandwidthdb: database is locked", "errorVerbose": "bandwidthdb: database is locked\n\tstorj.io/storj/storagenode/storagenodedb.(*bandwidthDB).Add:76\n\tstorj.io/storj/storagenode/orders.(*Service).SendOrders.func2:254\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:78"}
2024-07-10T09:32:12+02:00       ERROR   orders  failed to add bandwidth usage   {"Process": "storagenode", "satellite ID": "12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S", "action": "GET", "amount": 146006272, "error": "bandwidthdb: database is locked", "errorVerbose": "bandwidthdb: database is locked\n\tstorj.io/storj/storagenode/storagenodedb.(*bandwidthDB).Add:76\n\tstorj.io/storj/storagenode/orders.(*Service).SendOrders.func2:254\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:78"}
2024-07-10T09:32:12+02:00       ERROR   orders  failed to add bandwidth usage   {"Process": "storagenode", "satellite ID": "121RTSDpyNZVcEU84Ticf2L1ntiuUimbWgfATz21tuvgk3vzoA6", "action": "GET", "amount": 23521792, "error": "bandwidthdb: database is locked", "errorVerbose": "bandwidthdb: database is locked\n\tstorj.io/storj/storagenode/storagenodedb.(*bandwidthDB).Add:76\n\tstorj.io/storj/storagenode/orders.(*Service).SendOrders.func2:254\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:78"}

So, you need to try to optimize your disk subsystem. I do not know much about Unraid, but if it’s possible to add a cache in front of the slow disk, it should improve things.
Or you can at least move the databases to another, less loaded disk; this should help with the database issues and likely also reduce the I/O load on the data disk, so the filewalker may finally get a chance to finish its scans.
If moving the databases does not help with the filewalker, then your only remaining option is to disable the lazy mode to let it work with a normal I/O priority.
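
For reference, the options involved would look roughly like this in config.yaml (option names as used in recent storagenode versions, and the path is only an example; please verify both against your own setup before applying):

# move the databases to another, less loaded disk (example path; it must exist and be accessible to the node):
storage2.database-dir: /mnt/fast-disk/storagenode-db

# run the filewalkers with a normal IO priority instead of the lazy mode:
pieces.enable-lazy-filewalker: false

After changing either option the node needs a restart, and for the database move the node should be stopped first so the existing database files can be copied to the new location.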

You need to search for FATAL and/or Unrecoverable errors in your logs.
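
For example, adapting the log command used earlier in this thread:

docker logs storagenode 2>&1 | grep -E "FATAL|Unrecoverable" | tail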


Hello,

On my TrueNAS SCALE, version Dragonfish-24.04.1.1,

there is a mismatch in the used space. Is this a well-known issue?

Storj App Version:
v1.68.2


For the app installation I used the new ixVolume function, see the screenshot too.


What do you get for:
zpool list -v

Thank you, I will give this a try :slight_smile:
I think the system is just too slow for 2 nodes with the new Storj workload.
It only has 8GB of RAM and a 6th-gen i3.

How would I disable the lazy mode exactly?

The reasons are the same, as explained in this post: