Tuning the filewalker

Hm, usually you do not need to update DDNS in the storagenode’s config if you configured a DDNS updater on your router or installed their app.

Maybe you’re right,
I was referring to changing the DDNS provider,
or the name, e.g. from “mynode1.ddns.com” to “optimusprime.ddns.com”.
All of that is in the config file, and for a change to take effect, the storagenode needs to be restarted.
And then it filewalks again. It should not do that after a manual restart of the service; it’s a waste of energy and resources. Thank you.
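For what it’s worth, on a Docker node both things can be handled at run time: the DDNS name goes into the run command, and the startup scan can be switched off so a manual restart doesn’t trigger it. A rough sketch, assuming the standard Docker setup; the wallet, e-mail, paths, ports and sizes below are placeholders, not your values:

```bash
# Sketch only: re-create the container with the new DDNS name and the startup
# scan disabled. Wallet, email, paths and sizes are placeholders.
docker stop -t 300 storagenode && docker rm storagenode
docker run -d --restart unless-stopped --stop-timeout 300 --name storagenode \
  -p 28967:28967/tcp -p 28967:28967/udp -p 127.0.0.1:14002:14002 \
  -e WALLET="0x0000000000000000000000000000000000000000" \
  -e EMAIL="you@example.com" \
  -e ADDRESS="optimusprime.ddns.com:28967" \
  -e STORAGE="14TB" \
  --mount type=bind,source=/mnt/storj/identity,destination=/app/identity \
  --mount type=bind,source=/mnt/storj/data,destination=/app/config \
  storjlabs/storagenode:latest \
  --storage2.piece-scan-on-startup=false   # skip the used-space filewalker at startup
```

The same setting can go into config.yaml as `storage2.piece-scan-on-startup: false` if you’d rather not touch the run command.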

Alexey, click to unfold, to read

@Alexey Also, Alex, does this tag trigger your notification
when it’s placed here inside a “hide details” block?
I’ve never used this option on the forum.

Yes, the tag triggers a notification regardless of where it’s placed.


Storagenode ver. 1.86.1. I turned the File Walker back on on all nodes, just to refresh the database and see how the lazy FW is doing, and I see that it still runs at startup at full throttle, keeping the CPU at 80% and the HDD at 100%, but I do see some regular dips to no activity from time to time, an hour or more apart. The running time of the FW is:

  1. Synology 18GB RAM - 9.52/14TB - 7h 30’
  2. Synology 10GB RAM - 8.85/14TB - 29h
  3. Synology 1GB RAM - 4.65/7TB + 4.03/7TB - 58h

Still too much, I’ll turn it off again. The good news is that it doesn’t affect the success rates too much.
On the first node of the 1GB Syno, the success rates are:

  1. FW ON + mem compression ON: 93% UP / 84.1% DOWN;
  2. FW OFF + mem compression OFF: 96.91% UP / 94.47% DOWN;
  3. FW OFF + mem compression ON: 96.77% UP / 97.14% DOWN.

So it seems that memory compression helps with the success rate on low-RAM systems.
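If you want to time the filewalker rather than watch the CPU graph, the start/finish lines in the node’s log are enough. A quick, non-authoritative sketch for a Docker node; the exact wording of the log messages varies between versions, so the grep is intentionally loose:

```bash
# Show recent filewalker-related log lines to estimate when the scan started and finished.
docker logs storagenode 2>&1 | grep -iE "filewalker|used-space" | tail -n 40
```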

Without an SSD cache, there’s nothing that can be done, I think.

You should get a small 64GB SSD ($5 where I live) to use as a boot drive and also to hold the storagenode Docker setup + databases. Huge improvement.
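If it helps, the databases don’t have to live on the data HDD at all; the node has a `storage2.database-dir` option for that. A sketch assuming a Docker node, with example paths, done only while the node is stopped:

```bash
# Sketch: move the SQLite databases to an SSD. All paths and names are examples.
docker stop -t 300 storagenode && docker rm storagenode
mkdir -p /mnt/ssd/storagenode-dbs
cp -a /mnt/storj/data/storage/*.db /mnt/ssd/storagenode-dbs/

docker run -d --restart unless-stopped --name storagenode \
  -p 28967:28967/tcp -p 28967:28967/udp \
  -e WALLET="0x0000000000000000000000000000000000000000" \
  -e EMAIL="you@example.com" -e ADDRESS="mynode1.ddns.com:28967" -e STORAGE="14TB" \
  --mount type=bind,source=/mnt/storj/identity,destination=/app/identity \
  --mount type=bind,source=/mnt/storj/data,destination=/app/config \
  --mount type=bind,source=/mnt/ssd/storagenode-dbs,destination=/app/dbs \
  storjlabs/storagenode:latest \
  --storage2.database-dir=/app/dbs
```

After it starts cleanly, check the log for database errors before removing the old *.db files from the HDD.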

You could stop using BTRFS and the time would get much better. Or you could use an SSD cache, as @snorkel suggested.

Put it where? On USB? No thanks! I’ve witnessed too much drama with USB drives here on the forum to even consider it.

It depends on the system, and on the SSD itself. You can find a working USB-to-SATA bridge that even supports SMART, or just use one of those Samsung T5 or T7 USB SSDs, and they will be fine. (Raspberry Pis are of course special, but you can get a stable solution by connecting to the USB 2.0 ports.)
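A quick way to check whether a given bridge actually passes SMART through is smartmontools’ SAT pass-through; the device name here is just an example:

```bash
# Ask a USB-to-SATA bridge for SMART data via the SAT pass-through (device is an example).
sudo smartctl -d sat -a /dev/sdb | head -n 40
```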

Even if this fails - databases are disposable and non-essential anyway.

Of course, if you ordered a $2 thumb drive from Amazon – send that e-waste back :smiley:

Booting from an SSD vs. an HDD, however, of course makes no difference. If the system already boots from something else, let it be.

Can someone explain how to turn it off so that small files don’t come to the node? For the last few months the average file size has been below 10 MB and keeps getting smaller. How can I avoid small files and collect only bigger ones?

@Walter1, that would be a violation of the Node T&C. Though, if you have some programming skills, it’s pretty simple.

I don’t think your calculation is correct. The segment size is 64MB and it is split into dozens of erasure-coded chunks. A lot of files customers upload are smaller than the segment size.

The vast majority of files on the node are actually smaller than 16 kilobytes.

Multiple fellow forum users posted file size histograms from their nodes.

Average size is a poor metric to begin with.

LOL. I assume you are joking and won’t comment on this.

Use a filesystem that lets you manage vast numbers of small files effortlessly. Use compression to avoid wasting space on incomplete segments. You can’t control what customers upload, but you can optimize your system for any use case, at the subvolume or dataset level (see the sketch below). Of course, if you are using exFAT or NTFS, very little if anything can be done. Don’t use those filesystems.
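As a concrete, purely illustrative example of “at the dataset level”, this is roughly what it could look like on ZFS; the pool and dataset names are made up:

```bash
# Example-only ZFS dataset for a node's blobs (pool/dataset names are placeholders).
zfs create tank/storj
zfs set compression=lz4 tank/storj   # lightweight compression; partial records take less space
zfs set atime=off tank/storj         # no metadata write on every piece read
zfs set xattr=sa tank/storj          # store extended attributes as system attributes, fewer IOPS
```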


And I thought it would be in the config.yaml itself.

Walter probably has this in mind.

Screenshot (see Grafana)


Here, I counted myself :sweat_smile:
files total: 5.221.174
fragmented files: 4.530
tiny … < 10 KB: 949.347
small … < 100 KB: 3.590.024
average … < 1 MB: 458.258
big … < 16 MB: 223.540

And that’s a mere 500GB of node data.

And here is my 7.5TB of node data on NTFS:

files total: 21.331.972
fragmented files: 445.628

tiny … < 10 KB: 4.802.771
small … < 100 KB: 8.544.420
average … < 1 MB: 5.235.664
big … < 16 MB: 2.749.108

PS: the biggest files are around 2.3MB.
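For anyone who wants to produce the same kind of breakdown, something along these lines works with GNU find and awk; the blobs path is an example, and it will hammer the disk about as hard as the filewalker does:

```bash
# Bucket piece sizes under the blobs directory (GNU find; path is an example).
find /mnt/storj/data/storage/blobs -type f -printf '%s\n' | awk '
  { n++ }
  $1 <  10*1024      { tiny++;  next }
  $1 <  100*1024     { small++; next }
  $1 <  1024*1024    { avg++;   next }
  $1 <  16*1024*1024 { big++;   next }
  { huge++ }
  END {
    printf "files total:      %d\n", n
    printf "tiny    < 10 KB:  %d\n", tiny
    printf "small   < 100 KB: %d\n", small
    printf "average < 1 MB:   %d\n", avg
    printf "big     < 16 MB:  %d\n", big
    printf "huge   >= 16 MB:  %d\n", huge
  }'
```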

There’s a method to get a higher proportion of large files, though. Losing races has a disproportionately big impact on small files, as the time to transfer and store a small file is dominated by latency. For big files, on the other hand, bandwidth is the bigger factor. So if you relocate to be further away from the customers, you will start losing more races on small files than on large ones.

In other words, move to Fiji :wink: An added benefit is living on a tropical island.
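To put toy numbers on that intuition (the latency and bandwidth figures below are made up, and handshakes/disk time are ignored), a piece’s transfer time is roughly rtt + size / bandwidth:

```bash
# Toy model: time ≈ rtt + size/bandwidth, comparing a ~4 KiB and a ~2.3 MB piece
# on a 100 Mbit/s link at two made-up round-trip times.
awk 'BEGIN {
  bw = 100e6 / 8                      # 100 Mbit/s in bytes per second
  for (rtt_ms = 20; rtt_ms <= 200; rtt_ms += 180) {
    rtt   = rtt_ms / 1000
    small = rtt + 4096  / bw          # small piece: almost pure latency
    large = rtt + 2.3e6 / bw          # large piece: mostly bandwidth
    printf "rtt %3d ms: 4 KiB piece %4.0f ms, 2.3 MB piece %4.0f ms\n",
           rtt_ms, small * 1000, large * 1000
  }
}'
```

Going from 20 ms to 200 ms away makes the small transfer roughly ten times slower but the large one not even twice as slow, which is why distance skews the races you keep winning toward larger pieces.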


My fast node has a higher percentage of small files than the big slow one.
For the statistics: same connection; the slow one is on LAN, the fast one on WLAN (ax).

Where does this aversion to small files come from? Why are you all obsessing over it?

The node stores what the node stores. It’s an engineering challenge knocking on your door, asking you to make it fast. Take it. Don’t sweep it under the rug.

I, for one, am happy that there are tons of small files. I learned a lot about the filesystem and software I was using. Now my node serves small files much faster than large files (I measured). Yes, I’m not joking. No, it’s not caching; I turned caching off for the measurements. And it is not wasting space on partially utilized sectors either. I feel rather good about it. And you are trying to hide from it. Weird.


A little recap:

I disabled the filewalker because I have more than one node (one node per HDD) and it needs 2 days after a system restart.
Now I need a way to start the filewalker manually when I want to. Can I?