Used space and GC lazy filewalkers start at the same time?

EasyRhino · July 5, 2024, 4:48pm

My fullest node is a nearly full 8TB ext4-formatted drive, and it's probably pretty fragmented.

I’m trying to let the initial used space filewalker finish and it’s taking… 5 days so far. This may be “normal time” from what I’ve seen others post, but while looking at logs, I noticed…

It seems that the lazy gc-filewalker (garbage collection) and the used-space-filewalker both started at the same time when the node started, and both are still running 5 days later.

Would having both running at the same time be thrashing the disk?

Would it perhaps be optimal to have one finish first (probably GC) and then start the second one?

Incidentally, my config.yaml looks like this:

storage2.piece-scan-on-startup: true
pieces.enable-lazy-filewalker: true

(I’m going to hang on for dear life until these walkers finish so I may not be able to test the change for a few more days.)

daki82 · July 5, 2024, 8:24pm

It seems my disk (10TB of data out of 16TB allocated to the node, on a 20TB disk) is permanently running filewalkers.

Before the tests began it held 5TB and was mostly silent (writes once a minute via PrimoCache, with a 512GB NVMe read cache). Filewalkers ran for a few hours a day at most.

Lazy mode is disabled for better dashboard values.

EasyRhino · July 5, 2024, 8:50pm

Coincidentally, in just the few hours since my post, the lazy GC filewalker finished after 5+ days. The used-space walker is still running.

Alexey · July 6, 2024, 4:55am

The trash filewalker and the collector both run periodically.

Collector

$ ./storagenode setup --help | grep collector
      --collector.interval duration                              how frequently expired pieces are collected (default 1h0m0s)

Trash filewalker

The interval is hardcoded to 24h

https://github.com/storj/storj/blob/5430b961c6a8c276b1b613ae42a9d3ee13b36c1b/storagenode/peer.go#L531

peer.Storage2.PieceDeleter = pieces.NewDeleter(process.NamedLog(log, "piecedeleter"), peer.Storage2.Store, config.Storage2.DeleteWorkers, config.Storage2.DeleteQueueSize)
peer.Services.Add(lifecycle.Item{
	Name:  "PieceDeleter",
	Run:   peer.Storage2.PieceDeleter.Run,
	Close: peer.Storage2.PieceDeleter.Close,
})

peer.Storage2.TrashChore = pieces.NewTrashChore(
	process.NamedLog(log, "pieces:trash"),
	24*time.Hour,   // choreInterval: how often to run the chore
	7*24*time.Hour, // trashExpiryInterval: when items in the trash should be deleted
	peer.Storage2.Trust,
	peer.Storage2.Store,
)
peer.Services.Add(lifecycle.Item{
	Name:  "pieces:trash",
	Run:   peer.Storage2.TrashChore.Run,
	Close: peer.Storage2.TrashChore.Close,
})
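
To make the two durations in the snippet above concrete, here is a minimal sketch of how a chore interval and a trash expiry interval interact. It is not the storagenode implementation: `isExpired` and `worstCaseTrashAge` are hypothetical helpers, and the `>=` comparison is an assumption made for illustration.

```go
package main

import (
	"fmt"
	"time"
)

const (
	choreInterval       = 24 * time.Hour     // how often the chore wakes up
	trashExpiryInterval = 7 * 24 * time.Hour // how long a piece stays in the trash
)

// isExpired reports whether a piece that has been in the trash for `age`
// is eligible for deletion (assumed rule: age >= trashExpiryInterval).
func isExpired(age time.Duration) bool {
	return age >= trashExpiryInterval
}

// worstCaseTrashAge is an upper bound on how long a piece can sit in the
// trash in this model: it misses one chore run by a moment and then has to
// wait a full choreInterval for the next run.
func worstCaseTrashAge() time.Duration {
	return trashExpiryInterval + choreInterval
}

func main() {
	day := 24 * time.Hour
	for _, age := range []time.Duration{6 * day, 7 * day, 8 * day} {
		fmt.Printf("age %v: expired=%v\n", age, isExpired(age))
	}
	fmt.Println("upper bound on time in trash:", worstCaseTrashAge())
}
```

Under these assumptions, a piece is removed by the first chore run that happens after it has spent 7 days in the trash, so it can remain on disk for up to roughly 8 days.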

EasyRhino · July 6, 2024, 6:29am

Interesting and thank you for the response.

My collector.interval is unchanged, but on this same node there was a 3hr 20min gap between when the first gc-filewalker finished and the next one began. (It began at the same time my node received a “retain” request.)

Alexey · July 6, 2024, 9:47am

GC filewalkers should run one at a time if the parameter

$ docker exec -it storagenode ./storagenode setup --help | grep concur
      --retain.concurrency int                                   how many concurrent retain requests can be processed at the same time. (default 5)

is set to 1 (the default is 5).
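
The effect of such a concurrency limit can be illustrated with a generic counting semaphore. This is a minimal sketch, not the storagenode's retain code: `runLimited` is a hypothetical function, and the short sleep stands in for a disk-heavy filewalker run.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// runLimited starts one goroutine per job but lets at most `limit` of them
// work at the same time (limit must be >= 1). It returns the highest number
// of jobs that were observed running simultaneously.
func runLimited(limit, jobs int) int {
	sem := make(chan struct{}, limit) // buffered channel used as a counting semaphore
	var (
		mu      sync.Mutex
		running int
		peak    int
		wg      sync.WaitGroup
	)
	for i := 0; i < jobs; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			sem <- struct{}{}        // acquire a slot; blocks while `limit` jobs are running
			defer func() { <-sem }() // release the slot when this job is done

			mu.Lock()
			running++
			if running > peak {
				peak = running
			}
			mu.Unlock()

			time.Sleep(10 * time.Millisecond) // stand-in for a disk-heavy walk

			mu.Lock()
			running--
			mu.Unlock()
		}()
	}
	wg.Wait()
	return peak
}

func main() {
	fmt.Println("limit 1, peak concurrency:", runLimited(1, 5))
	fmt.Println("limit 5, peak concurrency:", runLimited(5, 5))
}
```

With a limit of 1 the jobs are forced to run back to back, which is the behaviour described above for a concurrency setting of 1. With a higher limit, several scans can compete for the same disk at once.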