Ways of speeding up filewalker on NTFS

After I upgraded a server with 8 nodes from 8GB RAM to 24GB RAM, it is getting around 20% more ingress, so RAM is helping significantly. Also, after I upgraded a lot of nodes from 32GB RAM to 48-64GB, the nodes work much more stably and don't shut down under big load. Before, they shut down from time to time because of the writability check.

gr8, but that will not speed up the filewalker; more RAM helps with stability, yes. I'm maxed out on mobo RAM too, at 64GB :frowning:

anyone here running 128gb of ram? does storj even use half of it?

Storj is not using it directly, but Windows will use it to cache the MFT and recently used files, and that will speed up Storj.
I use 64 GB of RAM and it works better than before.

Individual nodes won't benefit from that much, but when you have multiple nodes and the OS can use a healthy chunk of RAM for filesystem/metadata caching, it will help keep things running smoothly.
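If you're curious how much of that RAM Windows is actually devoting to cache at any moment, here's a quick sketch (counter names are as I recall them; verify with `Get-Counter -ListSet Memory`):

```powershell
# Rough view of how much RAM Windows is currently using for file/metadata caching.
$counters = '\Memory\Cache Bytes', '\Memory\Standby Cache Normal Priority Bytes'
(Get-Counter $counters).CounterSamples |
    Select-Object Path, @{ n = 'GB'; e = { [math]::Round($_.CookedValue / 1GB, 2) } }
```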

If I had 128GB RAM… I'd probably set up an Ubuntu 24.04 install and let ZFS ARC have half. Then steal 16GB as a compressed-RAM swap device. Which would leave about 48GB for OS + nodes. Probably would run 50+ nodes easily?

How much RAM usage do you get? I use 32GB, and like 25-27GB is used constantly.

So probably my 3 nodes won't benefit from 128GB :rofl:


I was thinking about tiered storage, but some people said it makes your hard drives work harder.
Now I'm thinking about increasing RAM to 128GB; maybe that would work better, with less stress on the HDDs?

It sounds like help may be on the way without changing anything. (But more RAM is always better :stuck_out_tongue_winking_eye: )


I have 14 nodes here.
The fewer nodes you have and the more memory, the more of each node's MFT your PC can keep in RAM.

Hey, it looks like you have deep knowledge about how MS Windows storage tiers work, especially metadata caching. Or maybe not knowledge but emotions.
I searched this forum and found nobody using it. And since I'd be using what I already have running anyway, I will give it a go. And if such a setup proves not to work great, I'm OK with it; nothing to lose…

You can try the badger cache; it works from v1.108. I tried it on some nodes and I like it.
The first run takes time, then it's faster. You don't need anything additional.


What does the badger thingy do exactly?


It builds a little database about the pieces in use, information that is currently only retrieved by querying the file system. That makes subsequent filewalker and garbage-collection runs much faster.

It's still considered experimental by Storj.
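If I remember right, it's toggled by the `pieces.file-stat-cache` option in the node's config.yaml (verify the exact key against the v1.108 changelog before relying on it). A minimal sketch for a Windows-service node, assuming the default install path and service name:

```powershell
# Assumptions: the key is "pieces.file-stat-cache: badger", config.yaml lives in the
# default GUI-install location, and the service is named "storagenode" - verify all three.
$config = "C:\Program Files\Storj\Storage Node\config.yaml"
Add-Content -Path $config -Value "pieces.file-stat-cache: badger"
Restart-Service storagenode
```

As noted above, the first filewalker run after enabling it still takes time while the cache is built; only subsequent runs get the speedup.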


Sounds good. Storage Spaces has its uses, certainly.
If you plan to use parity at all, the GUI defaults to a RAID-5-style layout with three columns and a 256KB interleave. If you'd like to make a proper/better setup, then it's PowerShell command-line stuff.
For example:
The ultimate preference for a parity setup would be 5 disks, 16K interleave, paired with a 64K cluster size. The reason: with stripes of 4 data disks at 16K interleave, you match your formatting cluster (4 x 16K = 64K), meaning there is no compute overhead at all for files larger than 64K; they just bypass the writeback cache and can be written sequentially at max speed.
If you were to use three columns, you could set an interleave of 16K, and with two data disks you could match that with a 32K cluster size.
The difference is meaningful: 40MB/s vs. 160-200MB/s (the max of one disk).
Of course, the parity read speed and the above efficient write settings would be beneficial for nodes.
Anyway, that's a newbie trap to avoid: the default GUI setup with 256K interleave.
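A hedged sketch of the PowerShell involved, assuming a pool named "Pool1" already exists with the five HDDs (names and the drive letter are placeholders):

```powershell
# Parity virtual disk: 5 columns (4 data + 1 parity), 16KB interleave.
New-VirtualDisk -StoragePoolFriendlyName "Pool1" -FriendlyName "StorjParity" `
    -ResiliencySettingName Parity -NumberOfColumns 5 -Interleave 16KB `
    -ProvisioningType Fixed -UseMaximumSize

# Format with a 64KB cluster so one cluster spans one full stripe (4 x 16KB = 64KB).
Get-VirtualDisk -FriendlyName "StorjParity" | Get-Disk |
    Initialize-Disk -PartitionStyle GPT -PassThru |
    New-Partition -AssignDriveLetter -UseMaximumSize |
    Format-Volume -FileSystem NTFS -AllocationUnitSize 65536
```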
Quick synopsis of other considerations.
-1GB writeback cache is the default. You can change that, but it’s suggested not to go above 50% of the tier size.
-The automated heatmap scan of your disks every 4 hours can be disabled; it's merely an entry in Task Scheduler (see the sketch after this list).
-When making a pool, the order of the disks added makes a difference.
-There is the opportunity when adding disks to have the system self-align data (evenly balance between disks) - think carefully, very carefully, about whether that's advantageous to your use case.
-I've had problems when doing fancy mirroring outside of server versions (in Win 10/11), where the mirror tier can break to the point of two identical disks. Only one (half the mirror) can be in the pool at a time, and thus both cannot be loaded at driver level at OS boot; your bootup will brick as an unsolvable kernel-mode panic ensues. Your drive & data are toast.
-Simple tier may be your friend.
-Storage Spaces works nicely with USB disks, generally, even if your controller is flaky, has bad cabling or an overtaxed bus, frequent disconnects, etc.
-Tiering used to work the opposite way it does now. It used to push all new data into the fast tier, then drain from that fast tier into the slow tier. Now it will push data to the slow tier (aside from the writeback cache), and, by schedule, pull the 'hot' data back into the fast tier. As mentioned in a previous post of mine, this causes major (100%) fragmentation of the main body of your slow-tier data over a short amount of time.
-Using a simple pool, is a good way to just stack disks, and make a jbod.
-Once you choose your configuration for a tiered system, you cannot change it.
-ReFS will do some special things with metadata and makes a black box of its caching techniques. NTFS will not; you would have to do your own special caching of file records. ReFS only comes in 4K and 64K cluster flavours.
-Also, sector size, physical sector size, and file record size are all specifically adjustable. For instance, if you have 4K-native disks that's very useful: no write amplification, etc. You can even make the file record size match, so that about 15-25% of your node's files fit nicely inside the file records themselves; of course, files >4K will waste that space. Nevertheless, do stuff like that correctly and even an HDD filesystem can be faster than the badger cache; you don't even need a fast tier.
-You can pin files specifically to the fast tier, say if you want your .db files on there (see the sketch after this list).
-Do not ever put an actively used dynamic .vhdx on a tiered system; the fragmentation will destroy its usefulness, unless it's small enough to pin to the fast tier.
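For reference, a hedged sketch of the scheduler and pinning items above (tier name, drive letter, and DB path are placeholders; I believe the task path below is where the heatmap job lives, but confirm with Get-ScheduledTask first):

```powershell
# Disable the periodic tiering-optimization (heatmap) scheduled task.
# Assumption: this is its location on current Windows builds - confirm before disabling.
Disable-ScheduledTask -TaskPath "\Microsoft\Windows\Storage Tiers Management\" `
    -TaskName "Storage Tiers Optimization"

# Pin a node's .db files to the fast tier, then apply the placement.
$fast = Get-StorageTier -FriendlyName "FastTier"        # placeholder tier name
Get-ChildItem "D:\storagenode\*.db" | ForEach-Object {  # placeholder DB path
    Set-FileStorageTier -FilePath $_.FullName -DesiredStorageTier $fast
}
Optimize-Volume -DriveLetter D -TierOptimize
```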

Etc., etc.

Any specific questions, I’m happy to answer.
6 cents


Ah, and the nice thing is, you can change/pause the cache configuration as you want, without a restart.
I created the NVMe cache partition while the drive was in use, and put 12GB of RAM cache on the databases drive, while reducing the OS SSD cache.
Nice upgrade, especially when the hardware is free.


oh man, it's getting a bit expensive, not sure I can afford it :innocent:

This is not the case. When setting up a tiered space, there is nothing like interleave size and number of columns. Therefore, none of those calculations apply here.

I don't possess the server anymore, but I will be getting a similar setup in a couple of days, so I will dig into this later…

I tested Primocache on NTFS briefly; it did not cache metadata and did not help at all with filewalkers (which are the heaviest thing on a storagenode). Maybe I set it up wrongly; I really didn't have spare time. Maybe someone else will have more luck. When I search this forum for Primocache-related posts, I don't see anyone really using it long term and sharing their experiences.

A simple space, obviously not. And yes, any tier of a tiered space can have whatever parameters you want: the fast tier itself could be mirrored, 3-way mirror, parity, whatever, in addition to any combination desired for the slow tier, or even a mid-tier; think multiple classes, even. Apparently I wasted a detailed explanation on you. Above, I specifically referenced the backing data of a tier, for purposes of efficient parity acceleration of rust spindles: the slow tier. Any fast tier will beget utter and complete fragmentation of the slow tier if it's just a single HDD, without the concept of parity read acceleration/column spanning/etc. to counterbalance such effects. For the data pattern of Storj, a simple tiered system would be pointless. Even so, my advice above merely scratches the surface of what's necessary to utilize Storage Spaces efficiently.
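For the record, a hedged sketch of what I mean by per-tier parameters (pool, tier, and volume names plus sizes are placeholders; whether New-StorageTier accepts -NumberOfColumns/-Interleave depends on your Windows build, so treat that as an assumption to verify):

```powershell
# Mirrored SSD fast tier and a parity HDD slow tier with its own column/interleave settings.
New-StorageTier -StoragePoolFriendlyName "Pool1" -FriendlyName "FastTier" `
    -MediaType SSD -ResiliencySettingName Mirror
New-StorageTier -StoragePoolFriendlyName "Pool1" -FriendlyName "SlowTier" `
    -MediaType HDD -ResiliencySettingName Parity -NumberOfColumns 5 -Interleave 16KB

# One tiered volume spanning both tiers.
New-Volume -StoragePoolFriendlyName "Pool1" -FriendlyName "StorjTiered" -FileSystem NTFS `
    -StorageTierFriendlyNames "FastTier","SlowTier" -StorageTierSizes 200GB,16TB
```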

Good luck with those calculations!

Primocache is decent software.

It's not a matter of luck to make it work. First you have to understand its nature. It acts primarily on idle, so unless you park your node and purposely & properly seed its cache, it will fall on its face. With the correct parameters it's fine and will fully accelerate your filewalker activities, etc., so long as you have the RAM (and the patience and intelligence/knowledge), but then again you could just let the file system do that and go by luck. Caching file records on Windows isn't rocket science: it's reading the file metadata from RAM, at RAM speed, vs. a max HDD speed of ~50k files per second. And it's simple math too, although nobody seems to have a clue on these forums. You have 10 million files to traverse/'touch'? You need a cache of 9.54 gigs available. And if the OS does it, it's known as metafile cache allocation, be it the files themselves (in their whole) or just the metafile info.
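Spelled out, assuming the default 1KB NTFS file record size (which is where the 9.54 figure comes from):

```powershell
# 10 million files x 1KB MFT file record each, expressed against 1GB = 2^30 bytes.
10000000 * 1KB / 1GB   # ~9.54
```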

Excellent for gamers who don't need/have a clue. Its best efficiency comes from examining last-access file times while the target disk/volume is idle, i.e. overnight. For all the lovely games that ran and loaded their data files before its last scan, it aggregates a heatmap of your most popular runtime hits and pulls the underlying blocks of those files into its own cache; think of it as file caching done onto, and from, a block-level cache. That's by design.

2 1/2 cents … 3 cents now (had to edit this)
