Tuning the filewalker

Ruskiem · October 26, 2023, 9:14am

Hello.

I would like to propose adding a progress report for the filewalker to the log file (or even to the SNO dashboard, if possible), showing what state the filewalker is in at the moment.

Yes, in the log I get the start and exit lines, but nothing in between,
and monitoring it in debug mode is problematic: I'm not sure why, but the filewalker is not shown in the processes, although the logs assure me it has started and has not exited yet.
https://forum.storj.io/t/guide-to-debug-my-storage-node-uplink-s3-gateway-satellite/1372/33?u=ruskiem

Right now, for example, I don't know whether my node is halfway through checking all of its 15 million files, or at what other percentage it is…
I need to know this to plan maintenance, and simply not to waste the filewalker's work.
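
To illustrate the kind of progress line meant here, a minimal Python sketch follows. It is not storagenode code: the function name, the log wording, and the `expected_total` parameter (an estimate, e.g. the file count from the last completed run) are all made up for illustration.

```python
import logging
import os

log = logging.getLogger("filewalker-sketch")  # hypothetical logger name


def walk_with_progress(root, expected_total, report_every=100_000):
    """Count files and bytes under root, logging progress periodically.

    expected_total is only an estimate used to compute the percentage.
    """
    files = 0
    total_bytes = 0
    stack = [root]
    while stack:
        current = stack.pop()
        with os.scandir(current) as entries:
            for entry in entries:
                if entry.is_dir(follow_symlinks=False):
                    stack.append(entry.path)
                elif entry.is_file(follow_symlinks=False):
                    files += 1
                    total_bytes += entry.stat(follow_symlinks=False).st_size
                    if files % report_every == 0:
                        # Percentage is approximate: it depends on the estimate.
                        pct = 100.0 * files / expected_total if expected_total else 0.0
                        log.info("filewalker progress: %d files (~%.1f%%), %d bytes",
                                 files, pct, total_bytes)
    log.info("filewalker finished: %d files, %d bytes", files, total_bytes)
    return files, total_bytes
```

With `logging.basicConfig(level=logging.INFO)` this would print one line every `report_every` files, which is enough to tell whether a run is at 10% or 90% before planning a restart.
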
Checking all the files in the storage folder takes days, even for Windows itself, even with a lot of RAM, here:

[image: V1dv932]

It took Windows 6-7 days to count all the files.
The same is true for the filewalker, which in its full (not lazy) form is crucial for reporting the real disk usage.

I allocated 7GB of RAM. I can go up to 14GB, but counting small files just doesn't seem to get any faster with more RAM.
It's a normal PC, quite modern, Win10, storagenode GUI
(processor AMD 4700GE, 8 cores/16 threads,
DDR4 3200MHz RAM, SATA III, 8TB HDD HelioSeal WD Ultrastar, NTFS).

(quote from the Release preparation v1.90 topic)

In another topic I also found people confirming the problem:

Those are @jammerdan's quotes, from this topic: here

So I'm being paid for 4.81TB, my node actually holds 6.41TB, and the dashboard shows 6.98TB. That's all because the filewalker hasn't been able to finish its work without interruption over the course of 6-7 days: there is always some unexpected shutdown or service restart that I notice in the log. I have logs going back to 2021. From what I saw, over the last months
I was never able to determine whether the process completed successfully even once;
I didn't see that even once in the logs.

Probably there's always something occurring, a restart or a new version update. I may provide logs later, but that doesn't change the core of the problem: the filewalker can do nearly the whole job, and a restart at the last moment makes it forget all the work it has done… and start all over again from 0.

Unexpected shutdowns can show up in the log a few times a month, and it seems that's enough to interrupt the file counting if it takes so long to complete. I realized I have only 3, maybe 5 chances in a month to complete a full filewalker run, so that the node sends up-to-date information to the satellites and I get paid correctly for the files it actually holds. (If the stored space currently grows by around 0.4TB to 0.6TB a month, that's a loss of almost $1 per disk, per node, whenever a month goes by without the filewalker checking all the files.) It is an arduous task to watch whether the run succeeded every month, and to track that on every node. It would be much easier if the process remembered where it stopped and didn't start from 0 every time.
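
The figures above can be checked with quick arithmetic. The TB values are the ones quoted in this post; the payout rate of $1.50 per TB-month is an assumption added here for illustration (it is not stated in the post), so treat the dollar amounts as rough.

```python
# Figures quoted in this post, in TB.
paid_tb = 4.81
on_disk_tb = 6.41
dashboard_tb = 6.98

# ASSUMPTION: storage payout in USD per TB-month; not stated in the post.
ASSUMED_RATE_USD_PER_TB_MONTH = 1.50


def monthly_shortfall_usd(unreported_tb, rate=ASSUMED_RATE_USD_PER_TB_MONTH):
    """Payout missed in one month for space that was never reported."""
    return round(unreported_tb * rate, 2)


# Space held on disk but not paid for, and space the dashboard over-reports.
unpaid_tb = round(on_disk_tb - paid_tb, 2)
dashboard_excess_tb = round(dashboard_tb - on_disk_tb, 2)

# One month of growth (0.4 to 0.6 TB) that goes unreported.
low_usd = monthly_shortfall_usd(0.4)
high_usd = monthly_shortfall_usd(0.6)

print(f"unpaid: {unpaid_tb} TB, dashboard excess: {dashboard_excess_tb} TB")
print(f"missed payout per month of growth: ${low_usd} to ${high_usd}")
```

Under that assumed rate, one month of unreported growth comes to between $0.60 and $0.90 per node, which matches the "almost $1" estimate.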

I'm doing defragmentation and MFT optimization, just as advised here on the forum, but I'm not sure my large nodes will ever be able to finish a full normal (not lazy) filewalker run before some unexpected restart of the service.

So I raise this to your attention now, when egress prices are being slashed from $20/TB to a planned $2/TB from Dec 1, 2023. A correct payout for storage is crucial for me as an SNO. I would warmly welcome an announcement that Storj Inc. is improving the filewalker so that it can SAVE the work done if a restart occurs and, IF the interruption was no longer than a few minutes ago, it knows it can CONTINUE WHERE IT STOPPED counting the files on the HDD, instead of starting from the beginning.
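
The save-and-continue idea could look roughly like the Python sketch below. This is only an illustration under stated assumptions, not how storagenode works: it assumes the files live in subdirectories of one root folder, and the function names, the JSON checkpoint file, and the 10-minute `max_age_seconds` default are all hypothetical.

```python
import json
import os
import time


def load_checkpoint(path, max_age_seconds):
    """Return the saved state if it exists and is fresh enough, else None."""
    try:
        with open(path, "r", encoding="utf-8") as f:
            state = json.load(f)
        age = time.time() - state["saved_at"]
    except (FileNotFoundError, json.JSONDecodeError, KeyError, TypeError):
        return None
    return state if age <= max_age_seconds else None


def save_checkpoint(path, state):
    """Write the state atomically so a crash cannot leave a partial file."""
    state["saved_at"] = time.time()
    tmp = path + ".tmp"
    with open(tmp, "w", encoding="utf-8") as f:
        json.dump(state, f)
    os.replace(tmp, path)


def resumable_walk(root, checkpoint_path, max_age_seconds=600):
    """Count files and bytes per subdirectory, skipping finished ones."""
    state = load_checkpoint(checkpoint_path, max_age_seconds)
    if state is None:
        # No usable checkpoint (missing, corrupt, or too old): start from 0.
        state = {"done": [], "files": 0, "bytes": 0}
    done = set(state["done"])
    subdirs = sorted(n for n in os.listdir(root)
                     if os.path.isdir(os.path.join(root, n)))
    for name in subdirs:
        if name in done:
            continue  # already counted before the interruption
        for dirpath, _dirnames, filenames in os.walk(os.path.join(root, name)):
            for filename in filenames:
                state["files"] += 1
                state["bytes"] += os.path.getsize(os.path.join(dirpath, filename))
        state["done"].append(name)
        save_checkpoint(checkpoint_path, state)  # progress survives a restart
    try:
        os.remove(checkpoint_path)  # a finished run leaves no checkpoint behind
    except FileNotFoundError:
        pass
    return state["files"], state["bytes"]
```

The age limit matters because files in already-counted subdirectories can change while the node is down; an old checkpoint is discarded and the walk starts over, while a restart a few minutes ago loses at most one subdirectory of work.
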
Thank you!