Ways of speeding up the filewalker on NTFS

Hello, I am looking for ways to speed up the filewalkers for GC and used-space on many Windows nodes.
I know it's about IOPS, and the following can be done:
DB on SSD
Log on SSD
No file indexing on the drive

What else, though?

I have a spare, brand-new 2 TB 2.5″ SSD from a used machine I bought. I might sell it, but wanted to know if anyone has added caching to Windows nodes to speed things up?
I also have many 120-500 GB SSDs and NVMe drives, so if the 2 TB is not needed I can sell it and use a smaller one.
Thanks in advance

There’s a new metadata caching layer coming that you should be able to try soon. But I’m not familiar with many options on Windows: I know I’ve seen a few SNOs mention PrimoCache; maybe search for their threads?

If you have any nodes on Linux try using your SSD with ZFS: either as a metadata-only L2ARC (for a single device) or as a metadata special device (if you can create a mirror).
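For reference, both ZFS options can be sketched as below; the pool name `tank` and the device names `/dev/sdb` and `/dev/sdc` are placeholders, so substitute your own:

```shell
# Option 1: single SSD as an L2ARC, restricted to caching metadata only
zpool add tank cache /dev/sdb
zfs set secondarycache=metadata tank

# Option 2: mirrored SSD pair as a metadata special vdev
# (it must be redundant: losing a special vdev loses the whole pool)
zpool add tank special mirror /dev/sdb /dev/sdc
```

Note that a special vdev only holds metadata written after it is added, while an L2ARC fills up as blocks are read.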

You may also:

  1. Defragment the disk and enable automatic defragmentation if you disabled it (it’s enabled by default)
  2. Disable NTFS 8dot3name generation
  3. Disable last-access-time updates: [Solved] Win10 20GB Ram Usage - #17 by arrogantrabbit
  4. Try to add an SSD as tiered storage:
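Items 2 and 3 can be done with `fsutil` from an elevated command prompt; a sketch, assuming the data volume is `D:` (adjust the drive letter for your node):

```shell
:: Stop generating legacy 8.3 short names (item 2)
fsutil behavior set disable8dot3 1

:: Remove already-existing 8.3 names from the data volume
fsutil 8dot3name strip /s D:

:: Stop updating last-access timestamps on every read (item 3)
fsutil behavior set disablelastaccess 1
```

The `behavior` settings may require a reboot to take effect.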

As @Roxor mentioned, there is a new experimental feature for caching in the next release.

3 Likes

Thanks for the answers Roxor and Alexey - I will give it all a look :slight_smile:

1 Like

Also, you can install more RAM; then more metadata will be cached and things will work faster.
But the more often you restart the PC, the less it helps.

1 Like

Here is where I read it:
cache - How do I tune windows server 2012 R2 to handle NTFS file structure with 50 million files? - Server Fault

In case someone needs to do it for Linux:

And do not use NTFS under Linux :slight_smile:

1 Like

Oh yeah you are right! I still stick to FAT16, good ol’ times :crazy_face:

(Just to not confuse anyone, that was a joke)

1 Like

FAT16 under Linux? Are you a beta tester, or do you want to have an extraordinary experience?
(that was almost a joke)
Sorry, but you have hijacked a Windows NTFS thread…

As I have said before, Microsoft Drive Optimizer - Wikipedia is a magic wand. There is simply nothing else that can help if Windows plays stupid. No defragmentation, no stupid 3rd-party programs, no cache, nothing will replace what Windows is supposed to do on its own. And this tool unlocks it.

Press “Analyze”; it can unlock the drive’s hidden powers (actually Windows’ hidden powers).
Do the test before and after on Storj’s blobs folder: “right click → Properties”. If it counts files just as slowly, restart the PC and run the analysis again.

This effect is lost on every restart or disk dismount, so you have to redo it, but it’s worth it: it makes the filewalker run several times faster, on bare metal for sure.
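The same “Analyze” pass can be scripted from PowerShell, which is handy if you want to re-warm the cache after every reboot (drive letter `D` here is an assumption, substitute your data drive):

```shell
# Run the analysis pass only (no defragmentation); this walks the MFT
Optimize-Volume -DriveLetter D -Analyze -Verbose
```

You could schedule this at logon via Task Scheduler so the effect is recreated automatically after each restart.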

3 Likes

All good, no offense meant :slight_smile:
I just put/linked it here because of a reference to this thread related to the access-time settings, and thought it may help someone else coming down the same way (independent of the OS). But I’ll stop hijacking the thread, sorry about that.

1 Like

Thanks, I will give this a try. Why does it “reset” when Windows reboots? If the drive has been optimized, shouldn’t that last longer?

1 Like

We need more power Scotty! It’s just a f#ing cache - not a miracle Jim! When you run any analysis on a disk in Windows, or access files, Windows caches them, similar to Linux. That’s all, no miracle. It’s not a priority cache unless you know how to set it; Windows will naturally evict the cache when it’s not in use and the memory is needed for other normal operations, on a basis of popularity/re-access. If you don’t have the memory available, you simply don’t have the memory. It will eventually evict everything - within minutes or hours, depending on how much RAM you have and the config of the OS.

So where are you left? Having to re-read the files again in order to cache them again. So there’s really no advantage, except that you’re basically reading the MFT sequentially instead of randomly - for momentary gains. It only depends on your memory use - rather, non-starvation - that lets it stay in RAM. In older versions it became a big problem on servers, as Windows aggressively kept too much file cache and would not evict non-addressed files, severely hampering applications regularly trying to live their lives and bogging down servers. So in summary, it’s not just a first-in, first-out cache; it tries to manage it dynamically.

Just get more RAM.

Next reboot, even for servers, it’ll have to ingest it all again, unless you figure out how to prepopulate it by purposeful cache structuring or use persistent memory.

2 cents

How much RAM can you buy for this?

It’s not an Optimize, it’s an Analyze. What does this command do? It walks over the MFT and loads it into the memory cache (to analyze whether defragmentation is required or not); if a filewalker is then started, it will be much faster. However, as soon as the files become less used, the cache starts to expire. For example, if the node were stopped for several hours, the magic effect of Analyze would likely disappear (the cache would likely be cleared). Of course, a reboot clears the cache too; the same happens if you unmount the disk.
The Optimize will do an actual defragmentation, so its effect is more persistent.

1 Like

Perhaps at least double the MFT size, plus at least 8 GB for Windows itself, according to this discussion:
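As a rough back-of-the-envelope check (a standard NTFS MFT FILE record is 1 KiB per file, sometimes more for heavily fragmented files; the “double the MFT plus 8 GB” rule is the suggestion above, not an official figure):

```python
def recommended_ram_gb(file_count: int,
                       mft_record_bytes: int = 1024,   # standard NTFS FILE record size
                       windows_overhead_gb: float = 8.0) -> float:
    """Rule-of-thumb RAM sizing: double the estimated MFT size
    plus headroom for Windows itself."""
    mft_gb = file_count * mft_record_bytes / 1024**3
    return 2 * mft_gb + windows_overhead_gb

# e.g. a node holding 20 million pieces:
# recommended_ram_gb(20_000_000)  ->  ~46 GB (2 x ~19 GB MFT + 8 GB)
```

Real MFTs run larger than this lower bound, as the 66 GB figure for a 14 TB disk mentioned below illustrates.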

Funny, I didn’t notice any gains in used RAM during or after the “Analyze”.
Also, my VM has like 4 GB of RAM, and the MFT of all those millions of Storj files is way above that (it was some double-digit number of GBs for the MFT alone of a full 4 TB disk; a 14 TB disk’s MFT is 66 GB, for example, that’s what UltraDefrag 7.14 is showing me).

So I don’t know what it does, but the whole used-space filewalker was able to finish in 1 hour after that.
And before, it took like 6-7 days.
(Windows 10 Pro, and the disk was formatted under Win10 Pro too;
might not work if the disk was formatted under Win 7.)

These are interesting results. But you should also have no other load, or use a non-lazy filewalker, to compare before and after; otherwise it would also depend on what other things the node is doing.

I’m talking only about the Windows Drive Optimizer’s “Analyze”. No Storj node was online during that (it was off). I’m just questioning what exactly this Windows tool is doing, given that its RAM usage is low.