After disabling lazy mode the filewalkers didn't print anything to the logs

No, it’s the default. You only need to add it if you want to set it to false.


OMG, so currently my nodes do both: the startup scan and the lazy one. Then it’s clear why both are struggling, I think.

This flag is not present in the config of nodes created before it was introduced, so technically you have to add it to the config.yaml file manually and set it to true/false.

Config.yaml isn’t updated when newer flags are added. It’s created only once, when you set up the node.


No, they should not be struggling if lazy mode is enabled (it’s enabled by default, unless you disabled it).
You may disable the startup scan if you do not have discrepancies between the used space shown on the dashboard and the usage reported by the OS (please compare in SI units), and do not have other data sharing the disk with the node’s data; it should reduce the load on startup.
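
For reference, a minimal sketch of the two config.yaml options in question, assuming the flag names pieces.enable-lazy-filewalker and storage2.piece-scan-on-startup (please check the generated config for your version for the authoritative names and defaults):

# both options default to true; add them to config.yaml only to override
pieces.enable-lazy-filewalker: true
storage2.piece-scan-on-startup: false

Restart the node after editing config.yaml for the change to take effect.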

I for one greatly look forward to the filewalker messages in the log.

Even better would be an occasional (every few hours) “still running” message, just because they are so slow to finish.


I believe the “started” and “finished” messages are enough. If you do not see errors along the way, it’s still running until “finished” appears. So check your logs, or the disk activity as a side effect.
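
Something like this would show them (a sketch only; the exact log wording differs between versions and between the lazy and non-lazy walkers, and the container name storagenode is an assumption):

# filewalker start/finish/failure lines from the container logs
docker logs storagenode 2>&1 | grep -i filewalker | grep -iE "started|finished|failed"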

I disagree.
It should print a progress message every once in a while, like every hour.


Yes, something that could indicate the progress. It doesn’t need to be 100% accurate, but if someone wants to do some maintenance, it would be good to know whether they can plan it for the next day or whether they need a week.

And how can it know the progress? The FW reads all files to get the total used space and what files are there. When it starts, it doesn’t know that either, so how could it calculate progress without knowing how much is left until the end?


It would be very rough, but the filewalker iterates through each of the prefixes (aa to 77), so if you assume that the number of pieces to be removed from your disk is fairly large, you can check the current prefix it is iterating over and get some idea of the progress.

You can even do that on Linux by checking the file descriptors of the process.

Example snippet of a bash script of mine to get the current prefix:

#!/bin/bash
# Reads the open file descriptors of the lazy filewalker subprocesses
# to find the prefix directory they are currently iterating.

NODE="your-node-here"   # docker container name of the storagenode

declare -A SATELLITE_NAMES
SATELLITE_NAMES["ukfu6bhbboxilvt7jrwlqk7y2tapb5d2r2tsmj2sjxvw5qaaaaaa"]="US1"
SATELLITE_NAMES["pmw6tvzmf2jv6giyybmmvl4o2ahqlaldsaeha4yx74n5aaaaaaaa"]="SLC"
SATELLITE_NAMES["v4weeab67sbgvnbwd5z7tweqsqqun7qox2agpbxy44mqqaaaaaaa"]="EU1"
SATELLITE_NAMES["qstuylguhrn2ozjv4h2c6xpxykd622gtgurhql2k7k75wqaaaaaa"]="AP1"

# Garbage collection: open directory looks like .../storage/blobs/<satellite>/<prefix>

FILEWALKER=$(docker top "$NODE" | grep gc-filewalker)
if [[ -n $FILEWALKER ]]; then
    PID=$(echo "$FILEWALKER" | awk '{print $2}')                         # host PID of the subprocess
    FOLDER=$(ls -l "/proc/$PID/fd" | grep storage/ | awk '{print $NF}')  # directory the walker holds open
    SATELLITE_HASH=$(basename "$(dirname "$FOLDER")")
    CURRENT_PREFIX=$(basename "$FOLDER")
    SATELLITE_NAME=${SATELLITE_NAMES[$SATELLITE_HASH]}
    echo "Garbage: $NODE $SATELLITE_NAME $CURRENT_PREFIX"
fi

# Trash cleanup: open directory looks like .../storage/trash/<satellite>/<date>/<prefix>

FILEWALKER=$(docker top "$NODE" | grep trash-cleanup-filewalker)
if [[ -n $FILEWALKER ]]; then
    PID=$(echo "$FILEWALKER" | awk '{print $2}')
    FOLDER=$(ls -l "/proc/$PID/fd" | grep storage/ | awk '{print $NF}')
    SATELLITE_HASH=$(basename "$(dirname "$(dirname "$FOLDER")")")
    CURRENT_PREFIX=$(basename "$FOLDER")
    SATELLITE_NAME=${SATELLITE_NAMES[$SATELLITE_HASH]}
    echo "Trash: $NODE $SATELLITE_NAME $CURRENT_PREFIX"
fi

It’s 1024 folders per satellite. They could just count how many are done and log that number frequently.
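
As a rough sketch of that idea, building on the script above: if you assume the walker visits prefixes in lexical order (an assumption on my part), the number of finished prefixes can be approximated by counting how many prefix directories sort before the current one:

# hypothetical follow-up to the script above; adjust the blobs path to your node
BLOBS="/path/to/storage/blobs/$SATELLITE_HASH"
TOTAL=$(ls "$BLOBS" | wc -l)                                              # normally 1024 prefixes
DONE=$(ls "$BLOBS" | sort | awk -v p="$CURRENT_PREFIX" '$0 < p' | wc -l)  # prefixes before the current one
echo "approx $((100 * DONE / TOTAL))% of prefixes done for this satellite"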

You may see it right now:

Even for a non-lazy one:

Incidentally I noticed the used space filewalkers, which previously had been taking days or erroring out, now finish on my TB disks in just a few hours. Which isn’t so bad from a log perspective.

The reported disk usage is still pretty different from what the OS shows as used, but we’re down to a discrepancy of hundreds of gigabytes instead of terabytes, so I guess that’s good.

Do you have any errors related to the filewalkers or databases in your logs?
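
Something like this is a quick way to check (a sketch, assuming a docker container named storagenode):

# filewalker- or database-related errors in the last few days of logs
docker logs --since 72h storagenode 2>&1 | grep -iE "filewalker|database" | grep -iE "error|failed|context canceled"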

You know, I was like “of course I would have ensured that file walkers finished before saying this” but… sigh…

On one node the latest used-space walker errored with “context canceled”, and on the other node the lazy filewalker was apparently still running.

For the node with the “context canceled” error, I just restarted with the non-lazy filewalker.


It can be easily calculated based on which of the prefixes (the 1024 two-letter subdirectories created for each satellite) it is currently processing. Because the distribution of pieces across these prefixes is pseudo-random, they contain approximately equal numbers of files (valid within the same satellite only).

Current versions of storagenode ALREADY track and record this information for all FWs in the database: the FWs update it every time they switch to the next prefix.
The developers recently (versions 104-105) added this feature to save FW progress when a node restarts, so that the FWs can continue after a restart from roughly the place where their work was interrupted (or after a crash/reboot).

So all that needs to be done to display FW progress is to periodically output this information to the log in addition to the database, probably converting it to a percentage for convenience. For example, if the 341st prefix out of 1024 is being processed, then this FW run is about 33% done.
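
As a trivial illustration of that conversion (the index is just the example number from above):

PREFIX_INDEX=341                                  # 341st prefix out of 1024
echo "progress: $((100 * PREFIX_INDEX / 1024))%"  # prints: progress: 33%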

This is not a precise measurement.

Does it need to be? For a simple progress bar indicator on the web UI it should be fine. A lot of the error would be hidden anyway if you reported the progress percentage as an integer.


Unfortunately, it’s tracked in memory, and the database is only updated when the walker successfully completes.
However, it could use the prefixes to track progress, yes.