As the title suggests - is this an ERROR I should ignore or investigate further?
Could you please post the whole error?
Please post it between two lines with three backticks, like this:
```
error line here
```
```
2024-06-06T17:40:53-04:00 ERROR filewalker failed to get progress from database {"error": "gc_filewalker_progress_db: context canceled", "errorVerbose": "gc_filewalker_progress_db: context canceled\n\tstorj.io/storj/storagenode/storagenodedb.(*gcFilewalkerProgressDB).Get:47\n\tstorj.io/storj/storagenode/pieces.(*FileWalker).WalkSatellitePiecesToTrash:154\n\tstorj.io/storj/storagenode/pieces.(*Store).WalkSatellitePiecesToTrash:565\n\tstorj.io/storj/storagenode/retain.(*Service).retainPieces:369\n\tstorj.io/storj/storagenode/retain.(*Service).Run.func2:258\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:78"}
```
I’m also noticing that my filewalker isn’t running after this error, even after restarts
That command is just returning blank - the filewalker just isn't kicking off. So I double-checked, and my nodes' reported usage is wildly inaccurate:
So it claims to be using 6TB, but it's closer to 10TB. I manually set a limit of 5TB to prevent my node from overfilling and to let the filewalkers finish ASAP.
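For reference, this is roughly the kind of check I mean - a minimal sketch, assuming a Windows GUI node with the default log path (adjust the path for your own setup):

```
# Show the most recent used-space filewalker entries, if any
sls "used-space-filewalker" "C:\Program Files\Storj\Storage Node\storagenode.log" | select -Last 10
```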
I have the exact same issue on a different 8TB node (same device).
It seems like the last filewalker to run successfully was on May 30th:
After this my log was rotated, hence there are no new lines, even after a restart.
EDIT:
I found the issue: it seems that if you disable the lazy filewalker, the above command just returns nothing, i.e. the filewalker doesn't start? I re-enabled lazy mode for both problematic drives, and I can see via cat that the filewalkers have started.
So this brings me back to the original question: can the `failed to get progress from database` error be safely ignored? Sorry for the long post.
Yes, it’s a known issue:
Likely not. Did you check the databases?
I checked my logs, but there are no error messages for my DBs - or do you mean I should go through the steps on my DBs anyway?
Okay, got it. I'll let lazy finish to see if it fixes things (it probably will) and then leave lazy off for future runs.
I would suggest doing the reverse. To fix the discrepancy, I would suggest disabling lazy mode, setting the allocation below the current usage, and enabling the scan on startup. Once the stats on the dashboard have updated (you shouldn't see 100% disk activity any more - unfortunately, without log messages I wouldn't be able to suggest anything better), enable lazy mode again and disable the scan (unless you also have third-party usage on the same disk), set the correct allocation, save the config, and restart the node. The relevant options are sketched below.
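For illustration, the config.yaml entries for that first step could look like this (a sketch - the 4 TB allocation is only an example value below your real usage):

```
# step 1: non-lazy scan on startup to rebuild the usage stats
pieces.enable-lazy-filewalker: false
storage2.piece-scan-on-startup: true
storage.allocated-disk-space: 4.00 TB
```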
So after two days of it not finishing, I dug into the logs and realised I'd missed some errors. Every time it tries to start, I see the following:
```
2024-06-18T08:06:18-04:00 INFO lazyfilewalker.used-space-filewalker subprocess exited with status {"satelliteID": "1wFTAgs9DP5RSnCqKV1eLf6N9wtk4EAtmN5DpSxcs8EjT69tGE", "status": 1, "error": "exit status 1"}
2024-06-18T08:06:18-04:00 ERROR pieces failed to lazywalk space used by satellite {"error": "lazyfilewalker: exit status 1", "errorVerbose": "lazyfilewalker: exit status 1\n\tstorj.io/storj/storagenode/pieces/lazyfilewalker.(*process).run:85\n\tstorj.io/storj/storagenode/pieces/lazyfilewalker.(*Supervisor).WalkAndComputeSpaceUsedBySatellite:130\n\tstorj.io/storj/storagenode/pieces.(*Store).SpaceUsedTotalAndBySatellite:707\n\tstorj.io/storj/storagenode/pieces.(*CacheService).Run:58\n\tstorj.io/storj/private/lifecycle.(*Group).Run.func2.1:87\n\truntime/pprof.Do:51\n\tstorj.io/storj/private/lifecycle.(*Group).Run.func2:86\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:78", "Satellite ID": "1wFTAgs9DP5RSnCqKV1eLf6N9wtk4EAtmN5DpSxcs8EjT69tGE"}
2024-06-18T08:06:18-04:00 INFO lazyfilewalker.gc-filewalker subprocess exited with status {"satelliteID": "1wFTAgs9DP5RSnCqKV1eLf6N9wtk4EAtmN5DpSxcs8EjT69tGE", "status": 1, "error": "exit status 1"}
2024-06-18T08:06:18-04:00 ERROR pieces lazyfilewalker failed {"error": "lazyfilewalker: exit status 1", "errorVerbose": "lazyfilewalker: exit status 1\n\tstorj.io/storj/storagenode/pieces/lazyfilewalker.(*process).run:85\n\tstorj.io/storj/storagenode/pieces/lazyfilewalker.(*Supervisor).WalkSatellitePiecesToTrash:160\n\tstorj.io/storj/storagenode/pieces.(*Store).WalkSatellitePiecesToTrash:561\n\tstorj.io/storj/storagenode/retain.(*Service).retainPieces:373\n\tstorj.io/storj/storagenode/retain.(*Service).Run.func2:259\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:78"}
2024-06-18T08:06:18-04:00 ERROR filewalker failed to get progress from database
2024-06-18T08:06:18-04:00 ERROR lazyfilewalker.used-space-filewalker failed to start subprocess {"satelliteID": "121RTSDpyNZVcEU84Ticf2L1ntiuUimbWgfATz21tuvgk3vzoA6", "error": "context canceled"}
2024-06-18T08:06:18-04:00 ERROR pieces failed to lazywalk space used by satellite {"error": "lazyfilewalker: context canceled", "errorVerbose": "lazyfilewalker: context canceled\n\tstorj.io/storj/storagenode/pieces/lazyfilewalker.(*process).run:73\n\tstorj.io/storj/storagenode/pieces/lazyfilewalker.(*Supervisor).WalkAndComputeSpaceUsedBySatellite:130\n\tstorj.io/storj/storagenode/pieces.(*Store).SpaceUsedTotalAndBySatellite:707\n\tstorj.io/storj/storagenode/pieces.(*CacheService).Run:58\n\tstorj.io/storj/private/lifecycle.(*Group).Run.func2.1:87\n\truntime/pprof.Do:51\n\tstorj.io/storj/private/lifecycle.(*Group).Run.func2:86\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:78", "Satellite ID": "121RTSDpyNZVcEU84Ticf2L1ntiuUimbWgfATz21tuvgk3vzoA6"}
2024-06-18T08:06:18-04:00 ERROR retain retain pieces failed {"cachePath": "C:\\Program Files\\Storj2\\Storage Node/retain", "error": "retain: filewalker: context canceled", "errorVerbose": "retain: filewalker: context canceled\n\tstorj.io/storj/storagenode/pieces.(*FileWalker).WalkSatellitePieces:74\n\tstorj.io/storj/storagenode/pieces.(*FileWalker).WalkSatellitePiecesToTrash:181\n\tstorj.io/storj/storagenode/pieces.(*Store).WalkSatellitePiecesToTrash:568\n\tstorj.io/storj/storagenode/retain.(*Service).retainPieces:373\n\tstorj.io/storj/storagenode/retain.(*Service).Run.func2:259\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:78"}
2024-06-18T08:06:18-04:00 ERROR lazyfilewalker.used-space-filewalker failed to start subprocess {"satelliteID": "12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S", "error": "context canceled"}
2024-06-18T08:06:18-04:00 ERROR pieces failed to lazywalk space used by satellite {"error": "lazyfilewalker: context canceled", "errorVerbose": "lazyfilewalker: context canceled\n\tstorj.io/storj/storagenode/pieces/lazyfilewalker.(*process).run:73\n\tstorj.io/storj/storagenode/pieces/lazyfilewalker.(*Supervisor).WalkAndComputeSpaceUsedBySatellite:130\n\tstorj.io/storj/storagenode/pieces.(*Store).SpaceUsedTotalAndBySatellite:707\n\tstorj.io/storj/storagenode/pieces.(*CacheService).Run:58\n\tstorj.io/storj/private/lifecycle.(*Group).Run.func2.1:87\n\truntime/pprof.Do:51\n\tstorj.io/storj/private/lifecycle.(*Group).Run.func2:86\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:78", "Satellite ID": "12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S"}
2024-06-18T08:06:18-04:00 ERROR lazyfilewalker.used-space-filewalker failed to start subprocess {"satelliteID": "12L9ZFwhzVpuEKMUNUqkaTLGzwY9G24tbiigLiXpmZWKwmcNDDs", "error": "context canceled"}
2024-06-18T08:06:18-04:00 ERROR pieces failed to lazywalk space used by satellite {"error": "lazyfilewalker: context canceled", "errorVerbose": "lazyfilewalker: context canceled\n\tstorj.io/storj/storagenode/pieces/lazyfilewalker.(*process).run:73\n\tstorj.io/storj/storagenode/pieces/lazyfilewalker.(*Supervisor).WalkAndComputeSpaceUsedBySatellite:130\n\tstorj.io/storj/storagenode/pieces.(*Store).SpaceUsedTotalAndBySatellite:707\n\tstorj.io/storj/storagenode/pieces.(*CacheService).Run:58\n\tstorj.io/storj/private/lifecycle.(*Group).Run.func2.1:87\n\truntime/pprof.Do:51\n\tstorj.io/storj/private/lifecycle.(*Group).Run.func2:86\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:78", "Satellite ID": "12L9ZFwhzVpuEKMUNUqkaTLGzwY9G24tbiigLiXpmZWKwmcNDDs"}
2024-06-18T08:06:18-04:00 ERROR piecestore:cache error getting current used space: {"error": "filewalker: context canceled; filewalker: context canceled; filewalker: context canceled; filewalker: context canceled", "errorVerbose": "group:\n--- filewalker: context canceled\n\tstorj.io/storj/storagenode/pieces.(*FileWalker).WalkSatellitePieces:74\n\tstorj.io/storj/storagenode/pieces.(*FileWalker).WalkAndComputeSpaceUsedBySatellite:79\n\tstorj.io/storj/storagenode/pieces.(*Store).SpaceUsedTotalAndBySatellite:716\n\tstorj.io/storj/storagenode/pieces.(*CacheService).Run:58\n\tstorj.io/storj/private/lifecycle.(*Group).Run.func2.1:87\n\truntime/pprof.Do:51\n\tstorj.io/storj/private/lifecycle.(*Group).Run.func2:86\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:78\n--- filewalker: context canceled\n\tstorj.io/storj/storagenode/pieces.(*FileWalker).WalkSatellitePieces:74\n\tstorj.io/storj/storagenode/pieces.(*FileWalker).WalkAndComputeSpaceUsedBySatellite:79\n\tstorj.io/storj/storagenode/pieces.(*Store).SpaceUsedTotalAndBySatellite:716\n\tstorj.io/storj/storagenode/pieces.(*CacheService).Run:58\n\tstorj.io/storj/private/lifecycle.(*Group).Run.func2.1:87\n\truntime/pprof.Do:51\n\tstorj.io/storj/private/lifecycle.(*Group).Run.func2:86\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:78\n--- filewalker: context canceled\n\tstorj.io/storj/storagenode/pieces.(*FileWalker).WalkSatellitePieces:74\n\tstorj.io/storj/storagenode/pieces.(*FileWalker).WalkAndComputeSpaceUsedBySatellite:79\n\tstorj.io/storj/storagenode/pieces.(*Store).SpaceUsedTotalAndBySatellite:716\n\tstorj.io/storj/storagenode/pieces.(*CacheService).Run:58\n\tstorj.io/storj/private/lifecycle.(*Group).Run.func2.1:87\n\truntime/pprof.Do:51\n\tstorj.io/storj/private/lifecycle.(*Group).Run.func2:86\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:78\n--- filewalker: context canceled\n\tstorj.io/storj/storagenode/pieces.(*FileWalker).WalkSatellitePieces:74\n\tstorj.io/storj/storagenode/pieces.(*FileWalker).WalkAndComputeSpaceUsedBySatellite:79\n\tstorj.io/storj/storagenode/pieces.(*Store).SpaceUsedTotalAndBySatellite:716\n\tstorj.io/storj/storagenode/pieces.(*CacheService).Run:58\n\tstorj.io/storj/private/lifecycle.(*Group).Run.func2.1:87\n\truntime/pprof.Do:51\n\tstorj.io/storj/private/lifecycle.(*Group).Run.func2:86\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:78"}
```

It seems like something is broken and the filewalker can't start. I will check the DBs in the meantime.
Not sure what I’m doing wrong:
And trying to run it manually I get:
P.S.
While doing this, I realised that none of my DBs are actually on my SSD; instead they're on the same drive as the node data. So now I want to move them out… but I can't find a good Windows guide for it? @Alexey
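For anyone finding this later: the option that controls the database location is storage2.database-dir - a sketch (the SSD path is just an example; stop the node, move the .db files to that folder first, then restart):

```
# config.yaml - the path is an example, use your own SSD folder
storage2.database-dir: D:\storj-dbs
```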
Thanks for that, the DBs got moved and everything is running smoothly. HOWEVER - my 2 big nodes still cannot start the filewalker (so that remains unchanged) (error pasted above).
Checking the DBs individually, all of them seem OK - but I didn't do all of them, as that's crazy tedious.
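To make that less tedious, something like this batch check should work - a sketch assuming sqlite3.exe is on the PATH, the node is stopped, and D:\storj-dbs is your database folder:

```
# Run an integrity check against every .db file; healthy databases print "ok"
Get-ChildItem "D:\storj-dbs\*.db" | ForEach-Object {
    Write-Host $_.Name
    sqlite3 $_.FullName "PRAGMA integrity_check;"
}
```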
`context canceled` usually denotes slower disks. Can you confirm that the disks are not SMR? You could check their model numbers.
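To pull the model numbers on Windows, a query like this should do it (standard WMI, nothing Storj-specific assumed):

```
# List each physical disk's model string to compare against vendor CMR/SMR tables
Get-CimInstance Win32_DiskDrive | Select-Object Model, Size
```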
Maybe a smarter person can double-check for me please?
20TB: ST16000NM001G-2KK103 (I'm 99% sure it's CMR)
8TB: ST8000VN004-2M2101 (Not 100% sure which one it is)
@nerdatwork
I am not sure about this, as all Exos drives are CMR. You can check the list here
As per the above list, the IronWolf series is CMR too
Where did you get the SMR part from?
Google’s generative AI search
Err I don’t trust AI for this stuff at all
You duplicated the command on one line. Try pasting it only once.
My filewalker is not even starting on some nodes with the following options set:
```
pieces.enable-lazy-filewalker: false
storage2.piece-scan-on-startup: true
```
I have over 50TB that's shown as "trash" but has actually been deleted; since the lazy filewalker isn't able to update the trash value and the normal filewalker doesn't start, I'm losing a lot of storage. What should I do? And when will you fix these issues?