Fatal Error on my Node / Timeout after 1 min

I don’t understand something. In normal operation of the storagenode, after the filewalkers have run (or without filewalkers), is the entire filesystem and its metadata cached in RAM? Or only the parts that refer to files which have actually been accessed?

My understanding is that the caching is done by the O/S, regardless of which process accessed the information.
So I would assume (as that is the whole concept of a “cache”) that it will keep in RAM any information that has been accessed at least once already.

My old brain may be playing tricks on me, but I think I remember @arrogantrabbit suggesting running fsck on the node drives before launching the filewalkers, in order to “force” the O/S to load all the metadata into RAM before starting the node software (but I may have misunderstood what he was saying).
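If I remember the idea correctly, it was something along these lines (a rough sketch only; the device and mount point are placeholders, and the fsck pass should be a read-only one run while the node is stopped):

fsck.ext4 -n /dev/sdX                              # read-only check; it reads all the filesystem metadata, which can pre-warm the cache
find /mnt/storagenode/storage -type f -printf ""   # alternative with the filesystem mounted: stat every piece file so inodes/dentries land in the cache

and only then start the node software.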

1 Like

Yes, but if you don’t run the filewalker at start, the node will only access some files, not all of them. My logic says the cache will then only keep the pointers to those files, but I don’t know exactly how the filesystem is laid out and read.
Maybe to find the pointer to a file the OS has to read through the whole filesystem until it finds it, and that makes it cache the entire filesystem and metadata anyway.
Also, what happens when there is not enough RAM? Does it cache it in the swap file? That wouldn’t make sense, because reading a cache from the drive is no faster than reading the filesystem itself.

If you don’t run a filewalker on start, then you may use the fsck suggestion to fill the cache. However, I believe the used-space-filewalker would be more useful, since it not only warms the cache but also updates the databases with the correct usage.
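For completeness, the startup scan is controlled by this option in config.yaml (shown here with what I believe is the default value — please verify against your own file):

# if set to true, all pieces disk usage is recalculated on startup
storage2.piece-scan-on-startup: true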

no, it will be discarded.

This is step n. 4 :wink: .

  • 1 > allocate some free time (or better - get some free time :smiley: )
  • 2 > find the best values for my rig
  • 3 > buy a UPS
  • 4 > use them for permanent acceleration :partying_face:

But the first point will be the most problematic, because for the last 2 weeks I have been working 16-18 h per day (regular work + extra work for extra money) and I’m beginning to feel burned out :frowning: .

Anyway, when I find something usable, I will share it with other SNOs as a starting point for them :wink: .

Regarding the “RAM cache”:
I found something like this. But it looks very dangerous, especially the vm.dirty_expire_centisecs and vm.dirty_writeback_centisecs settings discussed in that topic.
In another topic I found something similar that gives lower performance but looks safer.
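To give an idea of what such tuning looks like, here is a sketch (the values are only placeholders to start experimenting from, not a recommendation — keeping dirty data in RAM longer means more of it is lost on a power cut, which is why the UPS is on my list above):

# /etc/sysctl.d/90-storagenode.conf - illustrative values only
vm.dirty_background_ratio = 10       # % of RAM at which background writeback starts
vm.dirty_ratio = 40                  # % of RAM holding dirty pages before writers are blocked
vm.dirty_expire_centisecs = 3000     # dirty data may age up to 30 s before it must be written back
vm.dirty_writeback_centisecs = 500   # flusher threads wake up every 5 s
# apply with: sudo sysctl --system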

That is the reason why I would like to have more time and energy to do my own investigation into which values will be best for me :wink: .

1 Like

Take your time, and don’t burn yourself out at work(s)! That is very dangerous, and the future you will not be grateful to the present you. Also don’t forget about your family members, they most likely need you too. Not all of life is just work, believe me!

2 Likes

I could fix the problems I had, but I was not able to fix this one.
Is this again related to storage speed …?
I have also moved the database to the local drive, I changed the space the node should use from 4 to 3 TB, and I have 740 GB free out of 5.49 TB.

2024-07-02T17:05:06+02:00       ERROR   pieces  used-space-filewalker failed    {"Satellite ID": "121RTSDpyNZVcEU84Ticf2L1ntiuUimbWgfATz21tuvgk3vzoA6", "Lazy File Walker": true, "error": "lazyfilewalker: context canceled", "errorVerbose": "lazyfilewalker: context canceled\n\tstorj.io/storj/storagenode/pieces/lazyfilewalker.(*process).run:73\n\tstorj.io/storj/storagenode/pieces/lazyfilewalker.(*Supervisor).WalkAndComputeSpaceUsedBySatellite:130\n\tstorj.io/storj/storagenode/pieces.(*Store).SpaceUsedTotalAndBySatellite:711\n\tstorj.io/storj/storagenode/pieces.(*CacheService).Run:58\n\tstorj.io/storj/private/lifecycle.(*Group).Run.func2.1:87\n\truntime/pprof.Do:51\n\tstorj.io/storj/private/lifecycle.(*Group).Run.func2:86\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:78"}
2024-07-02T17:05:06+02:00       INFO    piecestore      upload canceled {"Piece ID": "J5SC37AH3DNFGB5XC3EVDLBOXYFZ4SAAE5M4KNDMQ6MG3UVDSHFA", "Satellite ID": "12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S", "Action": "PUT", "Remote Address": "-", "Size": 65536}
2024-07-02T17:05:06+02:00       INFO    piecestore      upload canceled {"Piece ID": "Q4XUHYDS5D2JJFR2XMXGG4CH4FOGKFKEHPNQITA3ER7KDLNC77ZQ", "Satellite ID": "1wFTAgs9DP5RSnCqKV1eLf6N9wtk4EAtmN5DpSxcs8EjT69tGE", "Action": "PUT_REPAIR", "Remote Address": "-:50675", "Size": 0}
2024-07-02T17:05:06+02:00       INFO    piecestore      upload canceled {"Piece ID": "GEWN3PTQFTAF2ENIBNNXKEANXLCYXTZGTJNS475CQ65RJMQ35KTA", "Satellite ID": "12L9ZFwhzVpuEKMUNUqkaTLGzwY9G24tbiigLiXpmZWKwmcNDDs", "Action": "PUT_REPAIR", "Remote Address": "-:53648", "Size": 0}
2024-07-02T17:05:06+02:00       INFO    piecestore      upload canceled {"Piece ID": "CD4CBQDKOJQYPVNTYJQA7LK6UCHPSG3Q6Y6IVG7WOYDYDFHRMI6Q", "Satellite ID": "12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S", "Action": "PUT", "Remote Address": "-:46956", "Size": 65536}
2024-07-02T17:05:06+02:00       INFO    piecestore      upload canceled {"Piece ID": "6QYI3YV2E3WCP2JO4C3VBTR3V4BEH6OSWYWIB757IQGII3ODWVEQ", "Satellite ID": "1wFTAgs9DP5RSnCqKV1eLf6N9wtk4EAtmN5DpSxcs8EjT69tGE", "Action": "PUT", "Remote Address": "-:56568", "Size": 65536}
2024-07-02T17:05:06+02:00       INFO    piecestore      upload canceled {"Piece ID": "2CW7L52O7L57AZDMM72LIHR2DGTRZJ2NNOHE4QIJZEKJPC6CN5TQ", "Satellite ID": "12L9ZFwhzVpuEKMUNUqkaTLGzwY9G24tbiigLiXpmZWKwmcNDDs", "Action": "PUT", "Remote Address": "-:34746", "Size": 65536}
2024-07-02T17:05:06+02:00       INFO    piecestore      upload canceled {"Piece ID": "KVM65WS7MK4V6FAIYWVWQTR5LPMGVIHFUHAYBUME3GSVZ2Q2T2PQ", "Satellite ID": "12L9ZFwhzVpuEKMUNUqkaTLGzwY9G24tbiigLiXpmZWKwmcNDDs", "Action": "PUT_REPAIR", "Remote Address": "-:39522", "Size": 0}
2024-07-02T17:05:06+02:00       ERROR   pieces  used-space-filewalker failed    {"Satellite ID": "121RTSDpyNZVcEU84Ticf2L1ntiuUimbWgfATz21tuvgk3vzoA6", "Lazy File Walker": false, "error": "filewalker: context canceled", "errorVerbose": "filewalker: context canceled\n\tstorj.io/storj/storagenode/pieces.(*FileWalker).WalkSatellitePieces:74\n\tstorj.io/storj/storagenode/pieces.(*FileWalker).WalkAndComputeSpaceUsedBySatellite:79\n\tstorj.io/storj/storagenode/pieces.(*Store).SpaceUsedTotalAndBySatellite:720\n\tstorj.io/storj/storagenode/pieces.(*CacheService).Run:58\n\tstorj.io/storj/private/lifecycle.(*Group).Run.func2.1:87\n\truntime/pprof.Do:51\n\tstorj.io/storj/private/lifecycle.(*Group).Run.func2:86\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:78"}
2024-07-02T17:05:06+02:00       INFO    pieces  used-space-filewalker started   {"Satellite ID": "12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S"}
2024-07-02T17:05:06+02:00       INFO    lazyfilewalker.used-space-filewalker    starting subprocess     {"satelliteID": "12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S"}
2024-07-02T17:05:06+02:00       ERROR   lazyfilewalker.used-space-filewalker    failed to start subprocess      {"satelliteID": "12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S", "error": "context canceled"}
2024-07-02T17:05:06+02:00       ERROR   pieces  used-space-filewalker failed    {"Satellite ID": "12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S", "Lazy File Walker": true, "error": "lazyfilewalker: context canceled", "errorVerbose": "lazyfilewalker: context canceled\n\tstorj.io/storj/storagenode/pieces/lazyfilewalker.(*process).run:73\n\tstorj.io/storj/storagenode/pieces/lazyfilewalker.(*Supervisor).WalkAndComputeSpaceUsedBySatellite:130\n\tstorj.io/storj/storagenode/pieces.(*Store).SpaceUsedTotalAndBySatellite:711\n\tstorj.io/storj/storagenode/pieces.(*CacheService).Run:58\n\tstorj.io/storj/private/lifecycle.(*Group).Run.func2.1:87\n\truntime/pprof.Do:51\n\tstorj.io/storj/private/lifecycle.(*Group).Run.func2:86\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:78"}
2024-07-02T17:05:06+02:00       ERROR   pieces  used-space-filewalker failed    {"Satellite ID": "12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S", "Lazy File Walker": false, "error": "filewalker: context canceled", "errorVerbose": "filewalker: context canceled\n\tstorj.io/storj/storagenode/pieces.(*FileWalker).WalkSatellitePieces:74\n\tstorj.io/storj/storagenode/pieces.(*FileWalker).WalkAndComputeSpaceUsedBySatellite:79\n\tstorj.io/storj/storagenode/pieces.(*Store).SpaceUsedTotalAndBySatellite:720\n\tstorj.io/storj/storagenode/pieces.(*CacheService).Run:58\n\tstorj.io/storj/private/lifecycle.(*Group).Run.func2.1:87\n\truntime/pprof.Do:51\n\tstorj.io/storj/private/lifecycle.(*Group).Run.func2:86\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:78"}
2024-07-02T17:05:06+02:00       INFO    pieces  used-space-filewalker started   {"Satellite ID": "12L9ZFwhzVpuEKMUNUqkaTLGzwY9G24tbiigLiXpmZWKwmcNDDs"}
2024-07-02T17:05:06+02:00       INFO    lazyfilewalker.used-space-filewalker    starting subprocess     {"satelliteID": "12L9ZFwhzVpuEKMUNUqkaTLGzwY9G24tbiigLiXpmZWKwmcNDDs"}
2024-07-02T17:05:06+02:00       ERROR   lazyfilewalker.used-space-filewalker    failed to start subprocess      {"satelliteID": "12L9ZFwhzVpuEKMUNUqkaTLGzwY9G24tbiigLiXpmZWKwmcNDDs", "error": "context canceled"}
2024-07-02T17:05:06+02:00       ERROR   pieces  used-space-filewalker failed    {"Satellite ID": "12L9ZFwhzVpuEKMUNUqkaTLGzwY9G24tbiigLiXpmZWKwmcNDDs", "Lazy File Walker": true, "error": "lazyfilewalker: context canceled", "errorVerbose": "lazyfilewalker: context canceled\n\tstorj.io/storj/storagenode/pieces/lazyfilewalker.(*process).run:73\n\tstorj.io/storj/storagenode/pieces/lazyfilewalker.(*Supervisor).WalkAndComputeSpaceUsedBySatellite:130\n\tstorj.io/storj/storagenode/pieces.(*Store).SpaceUsedTotalAndBySatellite:711\n\tstorj.io/storj/storagenode/pieces.(*CacheService).Run:58\n\tstorj.io/storj/private/lifecycle.(*Group).Run.func2.1:87\n\truntime/pprof.Do:51\n\tstorj.io/storj/private/lifecycle.(*Group).Run.func2:86\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:78"}
2024-07-02T17:05:06+02:00       ERROR   pieces  used-space-filewalker failed    {"Satellite ID": "12L9ZFwhzVpuEKMUNUqkaTLGzwY9G24tbiigLiXpmZWKwmcNDDs", "Lazy File Walker": false, "error": "filewalker: context canceled", "errorVerbose": "filewalker: context canceled\n\tstorj.io/storj/storagenode/pieces.(*FileWalker).WalkSatellitePieces:74\n\tstorj.io/storj/storagenode/pieces.(*FileWalker).WalkAndComputeSpaceUsedBySatellite:79\n\tstorj.io/storj/storagenode/pieces.(*Store).SpaceUsedTotalAndBySatellite:720\n\tstorj.io/storj/storagenode/pieces.(*CacheService).Run:58\n\tstorj.io/storj/private/lifecycle.(*Group).Run.func2.1:87\n\truntime/pprof.Do:51\n\tstorj.io/storj/private/lifecycle.(*Group).Run.func2:86\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:78"}
2024-07-02T17:05:06+02:00       ERROR   piecestore:cache        error getting current used space:       {"error": "filewalker: context canceled; filewalker: context canceled; filewalker: context canceled; filewalker: context canceled; filewalker: context canceled; filewalker: context canceled; filewalker: context canceled", "errorVerbose": "group:\n--- filewalker: context canceled\n\tstorj.io/storj/storagenode/pieces.(*FileWalker).WalkSatellitePieces:74\n\tstorj.io/storj/storagenode/pieces.(*FileWalker).WalkAndComputeSpaceUsedBySatellite:79\n\tstorj.io/storj/storagenode/pieces.(*Store).SpaceUsedTotalAndBySatellite:720\n\tstorj.io/storj/storagenode/pieces.(*CacheService).Run:58\n\tstorj.io/storj/private/lifecycle.(*Group).Run.func2.1:87\n\truntime/pprof.Do:51\n\tstorj.io/storj/private/lifecycle.(*Group).Run.func2:86\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:78\n--- filewalker: context canceled\n\tstorj.io/storj/storagenode/pieces.(*FileWalker).WalkSatellitePieces:74\n\tstorj.io/storj/storagenode/pieces.(*FileWalker).WalkAndComputeSpaceUsedBySatellite:79\n\tstorj.io/storj/storagenode/pieces.(*Store).SpaceUsedTotalAndBySatellite:720\n\tstorj.io/storj/storagenode/pieces.(*CacheService).Run:58\n\tstorj.io/storj/private/lifecycle.(*Group).Run.func2.1:87\n\truntime/pprof.Do:51\n\tstorj.io/storj/private/lifecycle.(*Group).Run.func2:86\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:78\n--- filewalker: context canceled\n\tstorj.io/storj/storagenode/pieces.(*FileWalker).WalkSatellitePieces:74\n\tstorj.io/storj/storagenode/pieces.(*FileWalker).WalkAndComputeSpaceUsedBySatellite:79\n\tstorj.io/storj/storagenode/pieces.(*Store).SpaceUsedTotalAndBySatellite:720\n\tstorj.io/storj/storagenode/pieces.(*CacheService).Run:58\n\tstorj.io/storj/private/lifecycle.(*Group).Run.func2.1:87\n\truntime/pprof.Do:51\n\tstorj.io/storj/private/lifecycle.(*Group).Run.func2:86\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:78\n--- filewalker: context canceled\n\tstorj.io/storj/storagenode/pieces.(*FileWalker).WalkSatellitePieces:74\n\tstorj.io/storj/storagenode/pieces.(*FileWalker).WalkAndComputeSpaceUsedBySatellite:79\n\tstorj.io/storj/storagenode/pieces.(*Store).SpaceUsedTotalAndBySatellite:720\n\tstorj.io/storj/storagenode/pieces.(*CacheService).Run:58\n\tstorj.io/storj/private/lifecycle.(*Group).Run.func2.1:87\n\truntime/pprof.Do:51\n\tstorj.io/storj/private/lifecycle.(*Group).Run.func2:86\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:78\n--- filewalker: context canceled\n\tstorj.io/storj/storagenode/pieces.(*FileWalker).WalkSatellitePieces:74\n\tstorj.io/storj/storagenode/pieces.(*FileWalker).WalkAndComputeSpaceUsedBySatellite:79\n\tstorj.io/storj/storagenode/pieces.(*Store).SpaceUsedTotalAndBySatellite:720\n\tstorj.io/storj/storagenode/pieces.(*CacheService).Run:58\n\tstorj.io/storj/private/lifecycle.(*Group).Run.func2.1:87\n\truntime/pprof.Do:51\n\tstorj.io/storj/private/lifecycle.(*Group).Run.func2:86\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:78\n--- filewalker: context canceled\n\tstorj.io/storj/storagenode/pieces.(*FileWalker).WalkSatellitePieces:74\n\tstorj.io/storj/storagenode/pieces.(*FileWalker).WalkAndComputeSpaceUsedBySatellite:79\n\tstorj.io/storj/storagenode/pieces.(*Store).SpaceUsedTotalAndBySatellite:720\n\tstorj.io/storj/storagenode/pieces.(*CacheService).Run:58\n\tstorj.io/storj/private/lifecycle.(*Group).Run.func2.1:87\n\truntime/pprof.Do:51\n\tstorj.io/storj/private/lifecycle.(*Group).Run.func2:86\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:78\n--- filewalker: context 
canceled\n\tstorj.io/storj/storagenode/pieces.(*FileWalker).WalkSatellitePieces:74\n\tstorj.io/storj/storagenode/pieces.(*FileWalker).WalkAndComputeSpaceUsedBySatellite:79\n\tstorj.io/storj/storagenode/pieces.(*Store).SpaceUsedTotalAndBySatellite:720\n\tstorj.io/storj/storagenode/pieces.(*CacheService).Run:58\n\tstorj.io/storj/private/lifecycle.(*Group).Run.func2.1:87\n\truntime/pprof.Do:51\n\tstorj.io/storj/private/lifecycle.(*Group).Run.func2:86\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:78"}
2024-07-02T17:05:07+02:00       ERROR   failure during run      {"error": "piecestore monitor: timed out after 1m0s while verifying writability of storage directory", "errorVerbose": "piecestore monitor: timed out after 1m0s while verifying writability of storage directory\n\tstorj.io/storj/storagenode/monitor.(*Service).Run.func2.1:175\n\tstorj.io/common/sync2.(*Cycle).Run:160\n\tstorj.io/storj/storagenode/monitor.(*Service).Run.func2:164\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:78"}
2024-07-02T17:05:07+02:00       FATAL   Unrecoverable error     {"error": "piecestore monitor: timed out after 1m0s while verifying writability of storage directory", "errorVerbose": "piecestore monitor: timed out after 1m0s while verifying writability of storage directory\n\tstorj.io/storj/storagenode/monitor.(*Service).Run.func2.1:175\n\tstorj.io/common/sync2.(*Cycle).Run:160\n\tstorj.io/storj/storagenode/monitor.(*Service).Run.func2:164\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:78"}

Are you entirely sure you have also updated the storage.path: variable in your config?

It’s the FATAL error that matters:
FATAL Unrecoverable error {“error”: “piecestore monitor: timed out after 1m0s while verifying writability of storage directory”

Is your disk about to die? Is it SMR? Is it fragmented to death?

Yes, the storage path is correct. I got the same errors with the database on the storage drive.

Then I guess the only way is to keep the service alive so I won’t get suspended, copy all the files to another drive, and then set up my storage again?

I have never had problems since 2019, and only in the past weeks have I been getting this problem :sweat_smile:

I am using Microsoft Storage Spaces with 3-way mirroring; probably I should change it to one-way.

Well… most probably, one of the three drives in your mirror has gotten flaky. Have you tested each drive with a SMART utility like CrystalDiskInfo, or anything similar? Consider verifying them, and replacing an array member if it is going defective. If that array is nearly full and has an SSD tier, that will naturally cause incomprehensible fragmentation; Windows will joyfully fragment all your storage node files every 4 hours - lol. You may want to increase the write timeout (within config.yaml) in the interim, while you duplicate the node to another disk or array solution.
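If it helps, a quick way to peek at the member disks behind the pool from PowerShell (standard storage cmdlets; treat it as a first look rather than a full SMART test):

Get-PhysicalDisk | Select-Object FriendlyName, MediaType, HealthStatus, OperationalStatus
Get-PhysicalDisk | Get-StorageReliabilityCounter | Select-Object DeviceId, Temperature, ReadErrorsTotal, WriteErrorsTotal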

Thank you for the suggestions, I’ll check.
The funny thing is that it hasn’t crashed or logged any errors for the last 14 hours.

Hello,

I found a few errors. Of course, I have switched the filewalker mode back to lazy, because my nodes kept restarting due to my weak hardware (low RAM). I know it saves its progress to a file, so I think it will be the better solution now. Please see below:

2024-07-13T16:57:12Z	ERROR	pieces	used-space-filewalker failed	{"Process": "storagenode", "Satellite ID": "12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S", "Lazy File Walker": false, "error": "filewalker: context canceled", "errorVerbose": "filewalker: context canceled\n\tstorj.io/storj/storagenode/pieces.(*FileWalker).WalkSatellitePieces:74\n\tstorj.io/storj/storagenode/pieces.(*FileWalker).WalkAndComputeSpaceUsedBySatellite:79\n\tstorj.io/storj/storagenode/pieces.(*Store).SpaceUsedTotalAndBySatellite:721\n\tstorj.io/storj/storagenode/pieces.(*CacheService).Run.func1:71\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:78"}
2024-07-13T16:57:13Z	ERROR	pieces	used-space-filewalker failed	{"Process": "storagenode", "Satellite ID": "121RTSDpyNZVcEU84Ticf2L1ntiuUimbWgfATz21tuvgk3vzoA6", "Lazy File Walker": false, "error": "filewalker: context canceled", "errorVerbose": "filewalker: context canceled\n\tstorj.io/storj/storagenode/pieces.(*FileWalker).WalkSatellitePieces:74\n\tstorj.io/storj/storagenode/pieces.(*FileWalker).WalkAndComputeSpaceUsedBySatellite:79\n\tstorj.io/storj/storagenode/pieces.(*Store).SpaceUsedTotalAndBySatellite:721\n\tstorj.io/storj/storagenode/pieces.(*CacheService).Run.func1:71\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:78"}
2024-07-13T16:57:14Z	ERROR	pieces	used-space-filewalker failed	{"Process": "storagenode", "Satellite ID": "1wFTAgs9DP5RSnCqKV1eLf6N9wtk4EAtmN5DpSxcs8EjT69tGE", "Lazy File Walker": false, "error": "filewalker: context canceled", "errorVerbose": "filewalker: context canceled\n\tstorj.io/storj/storagenode/pieces.(*FileWalker).WalkSatellitePieces:74\n\tstorj.io/storj/storagenode/pieces.(*FileWalker).WalkAndComputeSpaceUsedBySatellite:79\n\tstorj.io/storj/storagenode/pieces.(*Store).SpaceUsedTotalAndBySatellite:721\n\tstorj.io/storj/storagenode/pieces.(*CacheService).Run.func1:71\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:78"}
2024-07-13T16:57:14Z	ERROR	pieces	used-space-filewalker failed	{"Process": "storagenode", "Satellite ID": "12L9ZFwhzVpuEKMUNUqkaTLGzwY9G24tbiigLiXpmZWKwmcNDDs", "Lazy File Walker": false, "error": "filewalker: context canceled", "errorVerbose": "filewalker: context canceled\n\tstorj.io/storj/storagenode/pieces.(*FileWalker).WalkSatellitePieces:74\n\tstorj.io/storj/storagenode/pieces.(*FileWalker).WalkAndComputeSpaceUsedBySatellite:79\n\tstorj.io/storj/storagenode/pieces.(*Store).SpaceUsedTotalAndBySatellite:721\n\tstorj.io/storj/storagenode/pieces.(*CacheService).Run.func1:71\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:78"}
2024-07-13T16:57:14Z	ERROR	pieces	used-space-filewalker failed	{"Process": "storagenode", "Satellite ID": "12rfG3sh9NCWiX3ivPjq2HtdLmbqCrvHVEzJubnzFzosMuawymB", "Lazy File Walker": false, "error": "filewalker: v0pieceinfodb: context canceled", "errorVerbose": "filewalker: v0pieceinfodb: context canceled\n\tstorj.io/storj/storagenode/storagenodedb.(*v0PieceInfoDB).getAllPiecesOwnedBy:67\n\tstorj.io/storj/storagenode/storagenodedb.(*v0PieceInfoDB).WalkSatelliteV0Pieces:96\n\tstorj.io/storj/storagenode/pieces.(*FileWalker).WalkSatellitePieces:71\n\tstorj.io/storj/storagenode/pieces.(*FileWalker).WalkAndComputeSpaceUsedBySatellite:79\n\tstorj.io/storj/storagenode/pieces.(*Store).SpaceUsedTotalAndBySatellite:721\n\tstorj.io/storj/storagenode/pieces.(*CacheService).Run.func1:71\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:78"}
2024-07-13T16:57:14Z	ERROR	piecestore:cache	error getting current used space: 	{"Process": "storagenode", "error": "filewalker: context canceled; filewalker: context canceled; filewalker: context canceled; filewalker: context canceled; filewalker: v0pieceinfodb: context canceled", "errorVerbose": "group:\n--- filewalker: context canceled\n\tstorj.io/storj/storagenode/pieces.(*FileWalker).WalkSatellitePieces:74\n\tstorj.io/storj/storagenode/pieces.(*FileWalker).WalkAndComputeSpaceUsedBySatellite:79\n\tstorj.io/storj/storagenode/pieces.(*Store).SpaceUsedTotalAndBySatellite:721\n\tstorj.io/storj/storagenode/pieces.(*CacheService).Run.func1:71\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:78\n--- filewalker: context canceled\n\tstorj.io/storj/storagenode/pieces.(*FileWalker).WalkSatellitePieces:74\n\tstorj.io/storj/storagenode/pieces.(*FileWalker).WalkAndComputeSpaceUsedBySatellite:79\n\tstorj.io/storj/storagenode/pieces.(*Store).SpaceUsedTotalAndBySatellite:721\n\tstorj.io/storj/storagenode/pieces.(*CacheService).Run.func1:71\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:78\n--- filewalker: context canceled\n\tstorj.io/storj/storagenode/pieces.(*FileWalker).WalkSatellitePieces:74\n\tstorj.io/storj/storagenode/pieces.(*FileWalker).WalkAndComputeSpaceUsedBySatellite:79\n\tstorj.io/storj/storagenode/pieces.(*Store).SpaceUsedTotalAndBySatellite:721\n\tstorj.io/storj/storagenode/pieces.(*CacheService).Run.func1:71\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:78\n--- filewalker: context canceled\n\tstorj.io/storj/storagenode/pieces.(*FileWalker).WalkSatellitePieces:74\n\tstorj.io/storj/storagenode/pieces.(*FileWalker).WalkAndComputeSpaceUsedBySatellite:79\n\tstorj.io/storj/storagenode/pieces.(*Store).SpaceUsedTotalAndBySatellite:721\n\tstorj.io/storj/storagenode/pieces.(*CacheService).Run.func1:71\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:78\n--- filewalker: v0pieceinfodb: context canceled\n\tstorj.io/storj/storagenode/storagenodedb.(*v0PieceInfoDB).getAllPiecesOwnedBy:67\n\tstorj.io/storj/storagenode/storagenodedb.(*v0PieceInfoDB).WalkSatelliteV0Pieces:96\n\tstorj.io/storj/storagenode/pieces.(*FileWalker).WalkSatellitePieces:71\n\tstorj.io/storj/storagenode/pieces.(*FileWalker).WalkAndComputeSpaceUsedBySatellite:79\n\tstorj.io/storj/storagenode/pieces.(*Store).SpaceUsedTotalAndBySatellite:721\n\tstorj.io/storj/storagenode/pieces.(*CacheService).Run.func1:71\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:78"}
2024-07-13T16:57:17Z	ERROR	failure during run	{"Process": "storagenode", "error": "piecestore monitor: timed out after 1m0s while verifying readability of storage directory", "errorVerbose": "piecestore monitor: timed out after 1m0s while verifying readability of storage directory\n\tstorj.io/storj/storagenode/monitor.(*Service).Run.func1.1:151\n\tstorj.io/common/sync2.(*Cycle).Run:160\n\tstorj.io/storj/storagenode/monitor.(*Service).Run.func1:140\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:78"}
2024-07-13T16:57:17Z	FATAL	Unrecoverable error	{"Process": "storagenode", "error": "piecestore monitor: timed out after 1m0s while verifying readability of storage directory", "errorVerbose": "piecestore monitor: timed out after 1m0s while verifying readability of storage directory\n\tstorj.io/storj/storagenode/monitor.(*Service).Run.func1.1:151\n\tstorj.io/common/sync2.(*Cycle).Run:160\n\tstorj.io/storj/storagenode/monitor.(*Service).Run.func1:140\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:78"}
2024-07-13T20:30:53Z	ERROR	services	unexpected shutdown of a runner	{"Process": "storagenode", "name": "piecestore:monitor", "error": "piecestore monitor: timed out after 1m0s while verifying readability of storage directory", "errorVerbose": "piecestore monitor: timed out after 1m0s while verifying readability of storage directory\n\tstorj.io/storj/storagenode/monitor.(*Service).Run.func1.1:151\n\tstorj.io/common/sync2.(*Cycle).Run:160\n\tstorj.io/storj/storagenode/monitor.(*Service).Run.func1:140\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:78"}
2024-07-13T20:30:56Z	ERROR	pieces	used-space-filewalker failed	{"Process": "storagenode", "Satellite ID": "12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S", "Lazy File Walker": false, "error": "filewalker: context canceled", "errorVerbose": "filewalker: context canceled\n\tstorj.io/storj/storagenode/pieces.(*FileWalker).WalkSatellitePieces:74\n\tstorj.io/storj/storagenode/pieces.(*FileWalker).WalkAndComputeSpaceUsedBySatellite:79\n\tstorj.io/storj/storagenode/pieces.(*Store).SpaceUsedTotalAndBySatellite:721\n\tstorj.io/storj/storagenode/pieces.(*CacheService).Run.func1:71\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:78"}
2024-07-13T20:31:10Z	ERROR	pieces	used-space-filewalker failed	{"Process": "storagenode", "Satellite ID": "121RTSDpyNZVcEU84Ticf2L1ntiuUimbWgfATz21tuvgk3vzoA6", "Lazy File Walker": false, "error": "filewalker: context canceled", "errorVerbose": "filewalker: context canceled\n\tstorj.io/storj/storagenode/pieces.(*FileWalker).WalkSatellitePieces:74\n\tstorj.io/storj/storagenode/pieces.(*FileWalker).WalkAndComputeSpaceUsedBySatellite:79\n\tstorj.io/storj/storagenode/pieces.(*Store).SpaceUsedTotalAndBySatellite:721\n\tstorj.io/storj/storagenode/pieces.(*CacheService).Run.func1:71\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:78"}
2024-07-13T20:31:10Z	ERROR	pieces	used-space-filewalker failed	{"Process": "storagenode", "Satellite ID": "1wFTAgs9DP5RSnCqKV1eLf6N9wtk4EAtmN5DpSxcs8EjT69tGE", "Lazy File Walker": false, "error": "filewalker: context canceled", "errorVerbose": "filewalker: context canceled\n\tstorj.io/storj/storagenode/pieces.(*FileWalker).WalkSatellitePieces:74\n\tstorj.io/storj/storagenode/pieces.(*FileWalker).WalkAndComputeSpaceUsedBySatellite:79\n\tstorj.io/storj/storagenode/pieces.(*Store).SpaceUsedTotalAndBySatellite:721\n\tstorj.io/storj/storagenode/pieces.(*CacheService).Run.func1:71\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:78"}
2024-07-13T20:31:10Z	ERROR	pieces	used-space-filewalker failed	{"Process": "storagenode", "Satellite ID": "12L9ZFwhzVpuEKMUNUqkaTLGzwY9G24tbiigLiXpmZWKwmcNDDs", "Lazy File Walker": false, "error": "filewalker: context canceled", "errorVerbose": "filewalker: context canceled\n\tstorj.io/storj/storagenode/pieces.(*FileWalker).WalkSatellitePieces:74\n\tstorj.io/storj/storagenode/pieces.(*FileWalker).WalkAndComputeSpaceUsedBySatellite:79\n\tstorj.io/storj/storagenode/pieces.(*Store).SpaceUsedTotalAndBySatellite:721\n\tstorj.io/storj/storagenode/pieces.(*CacheService).Run.func1:71\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:78"}
2024-07-13T20:31:22Z	ERROR	piecestore:cache	error getting current used space: 	{"Process": "storagenode", "error": "filewalker: context canceled; filewalker: context canceled; filewalker: context canceled; filewalker: context canceled", "errorVerbose": "group:\n--- filewalker: context canceled\n\tstorj.io/storj/storagenode/pieces.(*FileWalker).WalkSatellitePieces:74\n\tstorj.io/storj/storagenode/pieces.(*FileWalker).WalkAndComputeSpaceUsedBySatellite:79\n\tstorj.io/storj/storagenode/pieces.(*Store).SpaceUsedTotalAndBySatellite:721\n\tstorj.io/storj/storagenode/pieces.(*CacheService).Run.func1:71\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:78\n--- filewalker: context canceled\n\tstorj.io/storj/storagenode/pieces.(*FileWalker).WalkSatellitePieces:74\n\tstorj.io/storj/storagenode/pieces.(*FileWalker).WalkAndComputeSpaceUsedBySatellite:79\n\tstorj.io/storj/storagenode/pieces.(*Store).SpaceUsedTotalAndBySatellite:721\n\tstorj.io/storj/storagenode/pieces.(*CacheService).Run.func1:71\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:78\n--- filewalker: context canceled\n\tstorj.io/storj/storagenode/pieces.(*FileWalker).WalkSatellitePieces:74\n\tstorj.io/storj/storagenode/pieces.(*FileWalker).WalkAndComputeSpaceUsedBySatellite:79\n\tstorj.io/storj/storagenode/pieces.(*Store).SpaceUsedTotalAndBySatellite:721\n\tstorj.io/storj/storagenode/pieces.(*CacheService).Run.func1:71\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:78\n--- filewalker: context canceled\n\tstorj.io/storj/storagenode/pieces.(*FileWalker).WalkSatellitePieces:74\n\tstorj.io/storj/storagenode/pieces.(*FileWalker).WalkAndComputeSpaceUsedBySatellite:79\n\tstorj.io/storj/storagenode/pieces.(*Store).SpaceUsedTotalAndBySatellite:721\n\tstorj.io/storj/storagenode/pieces.(*CacheService).Run.func1:71\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:78"}
2024-07-13T20:31:19Z	ERROR	pieces:trash	emptying trash failed	{"Process": "storagenode", "error": "pieces error: filestore error: context canceled", "errorVerbose": "pieces error: filestore error: context canceled\n\tstorj.io/storj/storagenode/blobstore/filestore.(*blobStore).EmptyTrash:193\n\tstorj.io/storj/storagenode/pieces.(*BlobsUsageCache).EmptyTrash:361\n\tstorj.io/storj/storagenode/pieces.(*Store).EmptyTrash:430\n\tstorj.io/storj/storagenode/pieces.(*TrashChore).Run.func1.1:84\n\tstorj.io/common/sync2.(*Workplace).Start.func1:89"}
2024-07-13T20:31:18Z	ERROR	retain	retain pieces failed	{"Process": "storagenode", "cachePath": "config/retain", "error": "retain: filewalker: context canceled", "errorVerbose": "retain: filewalker: context canceled\n\tstorj.io/storj/storagenode/pieces.(*FileWalker).WalkSatellitePieces:74\n\tstorj.io/storj/storagenode/pieces.(*FileWalker).WalkSatellitePiecesToTrash:181\n\tstorj.io/storj/storagenode/pieces.(*Store).WalkSatellitePiecesToTrash:569\n\tstorj.io/storj/storagenode/retain.(*Service).retainPieces:379\n\tstorj.io/storj/storagenode/retain.(*Service).Run.func2:265\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:78"}
2024-07-13T20:32:22Z	ERROR	failure during run	{"Process": "storagenode", "error": "piecestore monitor: timed out after 1m0s while verifying readability of storage directory", "errorVerbose": "piecestore monitor: timed out after 1m0s while verifying readability of storage directory\n\tstorj.io/storj/storagenode/monitor.(*Service).Run.func1.1:151\n\tstorj.io/common/sync2.(*Cycle).Run:160\n\tstorj.io/storj/storagenode/monitor.(*Service).Run.func1:140\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:78"}
2024-07-13T20:32:22Z	FATAL	Unrecoverable error	{"Process": "storagenode", "error": "piecestore monitor: timed out after 1m0s while verifying readability of storage directory", "errorVerbose": "piecestore monitor: timed out after 1m0s while verifying readability of storage directory\n\tstorj.io/storj/storagenode/monitor.(*Service).Run.func1.1:151\n\tstorj.io/common/sync2.(*Cycle).Run:160\n\tstorj.io/storj/storagenode/monitor.(*Service).Run.func1:140\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:78"}
2024-07-13T20:42:04Z	ERROR	pieces	used-space-filewalker failed	{"Process": "storagenode", "Satellite ID": "12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S", "Lazy File Walker": false, "error": "filewalker: context canceled", "errorVerbose": "filewalker: context canceled\n\tstorj.io/storj/storagenode/pieces.(*FileWalker).WalkSatellitePieces:74\n\tstorj.io/storj/storagenode/pieces.(*FileWalker).WalkAndComputeSpaceUsedBySatellite:79\n\tstorj.io/storj/storagenode/pieces.(*Store).SpaceUsedTotalAndBySatellite:721\n\tstorj.io/storj/storagenode/pieces.(*CacheService).Run.func1:71\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:78"}
2024-07-13T20:42:05Z	ERROR	pieces:trash	emptying trash failed	{"Process": "storagenode", "error": "pieces error: filestore error: context canceled", "errorVerbose": "pieces error: filestore error: context canceled\n\tstorj.io/storj/storagenode/blobstore/filestore.(*blobStore).EmptyTrash:193\n\tstorj.io/storj/storagenode/pieces.(*BlobsUsageCache).EmptyTrash:361\n\tstorj.io/storj/storagenode/pieces.(*Store).EmptyTrash:430\n\tstorj.io/storj/storagenode/pieces.(*TrashChore).Run.func1.1:84\n\tstorj.io/common/sync2.(*Workplace).Start.func1:89"}
2024-07-13T20:42:05Z	ERROR	pieces	used-space-filewalker failed	{"Process": "storagenode", "Satellite ID": "121RTSDpyNZVcEU84Ticf2L1ntiuUimbWgfATz21tuvgk3vzoA6", "Lazy File Walker": false, "error": "filewalker: context canceled", "errorVerbose": "filewalker: context canceled\n\tstorj.io/storj/storagenode/pieces.(*FileWalker).WalkSatellitePieces:74\n\tstorj.io/storj/storagenode/pieces.(*FileWalker).WalkAndComputeSpaceUsedBySatellite:79\n\tstorj.io/storj/storagenode/pieces.(*Store).SpaceUsedTotalAndBySatellite:721\n\tstorj.io/storj/storagenode/pieces.(*CacheService).Run.func1:71\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:78"}
2024-07-13T20:42:05Z	ERROR	pieces	used-space-filewalker failed	{"Process": "storagenode", "Satellite ID": "1wFTAgs9DP5RSnCqKV1eLf6N9wtk4EAtmN5DpSxcs8EjT69tGE", "Lazy File Walker": false, "error": "filewalker: context canceled", "errorVerbose": "filewalker: context canceled\n\tstorj.io/storj/storagenode/pieces.(*FileWalker).WalkSatellitePieces:74\n\tstorj.io/storj/storagenode/pieces.(*FileWalker).WalkAndComputeSpaceUsedBySatellite:79\n\tstorj.io/storj/storagenode/pieces.(*Store).SpaceUsedTotalAndBySatellite:721\n\tstorj.io/storj/storagenode/pieces.(*CacheService).Run.func1:71\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:78"}
2024-07-13T20:42:05Z	ERROR	retain	retain pieces failed	{"Process": "storagenode", "cachePath": "config/retain", "error": "retain: filewalker: context canceled", "errorVerbose": "retain: filewalker: context canceled\n\tstorj.io/storj/storagenode/pieces.(*FileWalker).WalkSatellitePieces:74\n\tstorj.io/storj/storagenode/pieces.(*FileWalker).WalkSatellitePiecesToTrash:181\n\tstorj.io/storj/storagenode/pieces.(*Store).WalkSatellitePiecesToTrash:569\n\tstorj.io/storj/storagenode/retain.(*Service).retainPieces:379\n\tstorj.io/storj/storagenode/retain.(*Service).Run.func2:265\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:78"}
2024-07-13T20:42:05Z	ERROR	pieces	used-space-filewalker failed	{"Process": "storagenode", "Satellite ID": "12L9ZFwhzVpuEKMUNUqkaTLGzwY9G24tbiigLiXpmZWKwmcNDDs", "Lazy File Walker": false, "error": "filewalker: context canceled", "errorVerbose": "filewalker: context canceled\n\tstorj.io/storj/storagenode/pieces.(*FileWalker).WalkSatellitePieces:74\n\tstorj.io/storj/storagenode/pieces.(*FileWalker).WalkAndComputeSpaceUsedBySatellite:79\n\tstorj.io/storj/storagenode/pieces.(*Store).SpaceUsedTotalAndBySatellite:721\n\tstorj.io/storj/storagenode/pieces.(*CacheService).Run.func1:71\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:78"}
2024-07-13T20:42:05Z	ERROR	piecestore:cache	error getting current used space: 	{"Process": "storagenode", "error": "filewalker: context canceled; filewalker: context canceled; filewalker: context canceled; filewalker: context canceled", "errorVerbose": "group:\n--- filewalker: context canceled\n\tstorj.io/storj/storagenode/pieces.(*FileWalker).WalkSatellitePieces:74\n\tstorj.io/storj/storagenode/pieces.(*FileWalker).WalkAndComputeSpaceUsedBySatellite:79\n\tstorj.io/storj/storagenode/pieces.(*Store).SpaceUsedTotalAndBySatellite:721\n\tstorj.io/storj/storagenode/pieces.(*CacheService).Run.func1:71\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:78\n--- filewalker: context canceled\n\tstorj.io/storj/storagenode/pieces.(*FileWalker).WalkSatellitePieces:74\n\tstorj.io/storj/storagenode/pieces.(*FileWalker).WalkAndComputeSpaceUsedBySatellite:79\n\tstorj.io/storj/storagenode/pieces.(*Store).SpaceUsedTotalAndBySatellite:721\n\tstorj.io/storj/storagenode/pieces.(*CacheService).Run.func1:71\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:78\n--- filewalker: context canceled\n\tstorj.io/storj/storagenode/pieces.(*FileWalker).WalkSatellitePieces:74\n\tstorj.io/storj/storagenode/pieces.(*FileWalker).WalkAndComputeSpaceUsedBySatellite:79\n\tstorj.io/storj/storagenode/pieces.(*Store).SpaceUsedTotalAndBySatellite:721\n\tstorj.io/storj/storagenode/pieces.(*CacheService).Run.func1:71\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:78\n--- filewalker: context canceled\n\tstorj.io/storj/storagenode/pieces.(*FileWalker).WalkSatellitePieces:74\n\tstorj.io/storj/storagenode/pieces.(*FileWalker).WalkAndComputeSpaceUsedBySatellite:79\n\tstorj.io/storj/storagenode/pieces.(*Store).SpaceUsedTotalAndBySatellite:721\n\tstorj.io/storj/storagenode/pieces.(*CacheService).Run.func1:71\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:78"}

This means that your drive cannot keep up. You need to either optimize your disk subsystem or increase the timeout and the interval for this check.

I read some of the posts here but did not find a fix for my system.
I get the following error after some hours, and the node restarts:

2024-07-14T12:19:25+02:00       ERROR   orders.12L9ZFwhzVpuEKMUNUqkaTLGzwY9G24tbiigLiXpmZWKwmcNDDs      failed to settle orders for satellite     {"Process": "storagenode", "satellite ID": "12L9ZFwhzVpuEKMUNUqkaTLGzwY9G24tbiigLiXpmZWKwmcNDDs", "error": "order: failed to start settlement: rpc: tcp connector failed: rpc: dial tcp: lookup eu1.storj.io: operation was canceled", "errorVerbose": "order: failed to start settlement: rpc: tcp connector failed: rpc: dial tcp: lookup eu1.storj.io: operation was canceled\n\tstorj.io/storj/storagenode/orders.(*Service).settleWindow:294\n\tstorj.io/storj/storagenode/orders.(*Service).SendOrders.func2:231\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:78"}
2024-07-14T12:19:25+02:00       ERROR   orders.12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S      failed to settle orders for satellite     {"Process": "storagenode", "satellite ID": "12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S", "error": "order: failed to start settlement: rpc: tcp connector failed: rpc: dial tcp: lookup us1.storj.io: operation was canceled", "errorVerbose": "order: failed to start settlement: rpc: tcp connector failed: rpc: dial tcp: lookup us1.storj.io: operation was canceled\n\tstorj.io/storj/storagenode/orders.(*Service).settleWindow:294\n\tstorj.io/storj/storagenode/orders.(*Service).SendOrders.func2:231\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:78"}
2024-07-14T12:19:25+02:00       ERROR   orders.121RTSDpyNZVcEU84Ticf2L1ntiuUimbWgfATz21tuvgk3vzoA6      failed to settle orders for satellite     {"Process": "storagenode", "satellite ID": "121RTSDpyNZVcEU84Ticf2L1ntiuUimbWgfATz21tuvgk3vzoA6", "error": "order: failed to start settlement: rpc: tcp connector failed: rpc: dial tcp: lookup ap1.storj.io: operation was canceled", "errorVerbose": "order: failed to start settlement: rpc: tcp connector failed: rpc: dial tcp: lookup ap1.storj.io: operation was canceled\n\tstorj.io/storj/storagenode/orders.(*Service).settleWindow:294\n\tstorj.io/storj/storagenode/orders.(*Service).SendOrders.func2:231\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:78"}
2024-07-14T12:19:26+02:00       ERROR   failure during run      {"Process": "storagenode", "error": "piecestore monitor: timed out after 1m0s while verifying writability of storage directory", "errorVerbose": "piecestore monitor: timed out after 1m0s while verifying writability of storage directory\n\tstorj.io/storj/storagenode/monitor.(*Service).Run.func2.1:175\n\tstorj.io/common/sync2.(*Cycle).Run:160\n\tstorj.io/storj/storagenode/monitor.(*Service).Run.func2:164\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:78"}
Error: piecestore monitor: timed out after 1m0s while verifying writability of storage directory
2024-07-14 12:19:26,236 INFO exited: storagenode (exit status 1; not expected)
2024-07-14 12:19:27,239 INFO spawned: 'storagenode' with pid 997
2024-07-14 12:19:27,245 WARN received SIGQUIT indicating exit request
2024-07-14 12:19:27,246 INFO waiting for storagenode, processes-exit-eventlistener, storagenode-updater to die
2024-07-14T12:19:27+02:00       INFO    Got a signal from the OS: "terminated"  {"Process": "storagenode-updater"}

I think my problem is that my drive is too full because of the untrusted satellites. I ran the command, but I only gain about 10 GB a day.
It’s XFS formatted and has only 500 GB left. The node thinks it has 5.3 TB, but there are 11.5 TB on it. Ext4 would maybe help a little with this problem, but I don’t have a spare drive, it would take forever, and the online time would drop below 90%. My other drive with ext4 has 6.38 TB free, but that’s not enough to move the data.

It’s not possible to manually delete old data from untrusted satellites, right?
I just don’t know how to fix this. It seems like my node will be gone sooner or later…

Drive:
WD120EDBZ-11B1HA0 with 12 TB (CMR from what I found out)

There are only two options:

  • either improve the disk subsystem (by adding more RAM or an SSD as a cache)
  • or increase the timeout for this check

If you select the second approach, then I would suggest increasing it in 30s steps: save the config and restart the node after each change. Keep increasing it the same way until the node stops crashing.
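For example (option name as I recall it from config.yaml; in your case it is the writability check that fails):

storage2.monitor.verify-dir-writable-timeout: 1m30s
# still crashing? try 2m0s, then 2m30s, and so on, restarting the node after each change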

You need to use the --force flag and list all untrusted satellites after the forget-satellite command.
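For example, something along these lines (the container name and the satellite IDs are placeholders — substitute your own):

docker exec -it <container-name> ./storagenode forget-satellite --force <untrusted-satellite-id-1> <untrusted-satellite-id-2> --config-dir config --identity-dir identity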

1 Like

Thank you! I need to try the second option. There is nothing I can do about the disk subsystem at the moment.
I think more RAM would be good, but the CPU is the bigger bottleneck right now.

My config.yaml is really old.
Do I need to add and change these 2 options?

# how frequently to verify the location and readability of the storage directory
# storage2.monitor.verify-dir-readable-interval: 1m0s

# how long to wait for a storage directory readability verification to complete
# storage2.monitor.verify-dir-readable-timeout: 1m0s

I used this command for the satellites. Is this correct?

docker exec -it storagenode-v3_WD12TB ./storagenode forget-satellite --force 12rfG3sh9NCWiX3ivPjq2HtdLmbqCrvHVEzJubnzFzosMuawymB  12tRQrMTWUWwzwGh18i7Fqs67kmdhH9t6aToeiwbo5mfS2rUmo  118UWpMCHzs6CvSgWd9BfFVjw5K9pZbJjkfZJexMtSkmKxvvAW   --config-dir config  --identity-dir identity

2024-07-15T10:25:26+02:00       INFO    Configuration loaded    {"Process": "storagenode", "Location": "/app/config/config.yaml"}
2024-07-15T10:25:26+02:00       INFO    Anonymized tracing enabled      {"Process": "storagenode"}
2024-07-15T10:25:26+02:00       INFO    Identity loaded.        {"Process": "storagenode", "Node ID": "1SBAXdMLqjBnm9Kchk2WRWr7B74LMXxVba7nT994xWx4eWGEeo"}
Satellite ID                                         Status
12rfG3sh9NCWiX3ivPjq2HtdLmbqCrvHVEzJubnzFzosMuawymB  In Progress
12tRQrMTWUWwzwGh18i7Fqs67kmdhH9t6aToeiwbo5mfS2rUmo   In Progress
118UWpMCHzs6CvSgWd9BfFVjw5K9pZbJjkfZJexMtSkmKxvvAW   In Progress

It’s never a bottleneck. You need to use top to see that most of the CPU load is actually IO wait (the disk is slow).

no. Only the related one, in your case it’s

so, you need to change only

Uncomment the option (remove the # and the space after it), increase it to 1m30s, save the config, and restart the node.

1 Like

Thank you. I still have shutdowns every hour or so. Should I add another 30 seconds until it hopefully works? Is there a limit?

I have only gained 7 GB since yesterday. The command for the untrusted satellites doesn’t seem to do much.

yes, to both the write interval and the timeout

yes, the timeout should stay under 5 min; anything above that endangers the audit score.

1 Like

For the writability check you may leave the check interval as it is until you increase the timeout beyond 5m0s, because the default value for the writable check interval is 5m0s.

However, the readability check has an equal interval and timeout by default, unlike the writability check.
@Pascal51882 please make sure that you are still getting writability check failures and not readability ones.
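For reference, the defaults as I recall them from config.yaml (please verify against your version):

# storage2.monitor.verify-dir-readable-interval: 1m0s
# storage2.monitor.verify-dir-readable-timeout: 1m0s
# storage2.monitor.verify-dir-writable-interval: 5m0s
# storage2.monitor.verify-dir-writable-timeout: 1m0s

So if you raise the readable timeout past 1m0s, you should raise its interval too, while the writable timeout has room up to 5m0s before its interval needs to be touched.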

1 Like