Slowing down filewalker

Is it a VM too? If yes, the same rules apply.
It is as simple as that: if you can run it outside a VM, then run it; otherwise, well, expect slowdowns.

No, I run them with the storage node toolbox.
Nothing has changed on my side in the past 1-2 years; I just keep updating the nodes to the current storagenode.exe versions.

The problem could be related only to the amount of used space, not to the version (they all do the same).
So, I would recommend performing a defragmentation of this drive and re-enabling automatic defragmentation if you disabled it (it’s enabled by default).
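For reference, fragmentation can be checked and fixed from an elevated PowerShell as well (the drive letter is an example):

```powershell
# Analyze only: reports the fragmentation percentage of the volume
Optimize-Volume -DriveLetter D -Analyze -Verbose
# Run a traditional defragmentation pass
Optimize-Volume -DriveLetter D -Defrag -Verbose
```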

Yes, my storage is slow; it’s made up of 7200RPM drives, not SSDs (obviously the Storj rates are not enough to buy SSDs and I do not otherwise have them, so I use what I have).

The thing is, the IO load generated by the usual uploads and downloads (customer traffic) is very low, something like 2 MB/s. I normally get over a 98% success rate, a lot of the time over 99%. The problem comes from a housekeeping process, one that could be slowed down without impacting the customer experience (or, in my case, probably improving it). This is different from a scenario where Storj has so many customers, and they are so active, that my node is constantly uploading and downloading data at 600 Mbps.

Ideally, it should be possible to dynamically limit the speed of that process so that it generates no more than, say, 50% IO load. It could run faster when it is reading mostly-cached data and it would slow down when reading data that is not in the cache, but still not using 100% IO.

Cgroups can only do fixed MB/s or IOPS limits; there is no way to set “50% IO” as a limit. I will try to write a script that monitors the IO load and then increases or decreases the limit accordingly; that should work.
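For reference, this is roughly what a fixed cgroup v2 limit looks like; the script would simply keep rewriting this value. The group name and device numbers are examples (find the real major:minor with lsblk):

```text
# /sys/fs/cgroup/storagenode/io.max (one line per throttled device)
# format: MAJ:MIN rbps=<bytes/s> wbps=<bytes/s> riops=<n> wiops=<n>
# e.g. cap reads and writes at 50 MB/s:
8:16 rbps=52428800 wbps=52428800
```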

Yes, the way I run the node is not optimal for the node, but, as I said, I use the server for other things as well and it is the same setup I use on my other servers. The node is just one VM among others.

That depends on how the node handles a situation when the filewalker process runs for too long. Does it start another one (so now there are two in parallel), does it kill the old one and start a new one, or does it delay the start of the new one if the old one is still running?

This is already done by switching it to lazy mode. The filewalker will use 100% of the disk only if there are no processes with normal priority. As soon as a normal-priority process arises (a download or upload request from a customer or the repair worker), that process gets 100% of the disk. But since you use a VM, this only holds inside the VM, unfortunately. To the host it is still normal priority, unless you de-prioritize the whole VM.
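For reference, lazy mode is the config option quoted later in this thread; in the node’s config.yaml it is:

```yaml
# run the filewalker as a separate process with low IO priority
pieces.enable-lazy-filewalker: true
```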

Auto defrag is enabled and the drive is not fragmented according to Windows. I checked just now.

As for used space:
Node-HDD1: 10TB drive - set for 9TB max - 5TB used
Node-HDD2: 4TB drive - set for 3.6TB max - 2.6TB used
Node-HDD3: 4TB drive - set for 3.6TB max - 2.2TB used
Node-HDD4: 4TB drive - set for 3.6TB max - 1.8TB used

HDD2 used to be full (at max capacity), but due to some recent deletions it is less used now, though it is filling up again…
The used space (+trash) that I see on the dashboards is about the same as what I see in Windows.


Re storage: if you aren’t using all of the storage on the server yourself (and you aren’t, otherwise you wouldn’t have 27TB of Storj data), you should have passed a controller through to the VM and set up any array (or preferably run one node per disk) directly in the VM. This bypasses the slowdowns that are giving you issues: there is NO zfs in between (parity checks + checksums), there is NO translation layer (i.e. virtio), and there are NO extra CPU cycles wasted (the filewalker doesn’t choke the host’s CPU doing the aforementioned translation; it stays within the VM’s limits and gobbles up data from the passed-through disks directly). Your CPU has VT-d, so it can do the passthrough at near-native (within a rounding error) speed.
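If the host uses libvirt/KVM, the passthrough is a hostdev entry in the domain XML along these lines (a minimal sketch; the PCI address is illustrative, take yours from lspci):

```xml
<!-- hand the whole SAS/SATA controller (HBA) to the guest -->
<hostdev mode='subsystem' type='pci' managed='yes'>
  <source>
    <!-- host PCI address of the HBA, e.g. 03:00.0 from lspci -->
    <address domain='0x0000' bus='0x03' slot='0x00' function='0x0'/>
  </source>
</hostdev>
```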

The node handles multiple filewalkers as I have described: if one isn’t finished, another one spawns and so on and so forth.


How are your disks connected? Can you post a SMART output for each of them?

This is partially true. Unfortunately, as shown by many SNOs who use drives passed through to a VM, this does not work as well as expected: sometimes only 30% of the possible performance, for some reason.
So I would assume that passthrough (under Linux specifically) is not so efficient.

I would suggest testing each case separately to help advanced SNOs in the future.

I tested under Windows and it works pretty well (no issues so far, including a Hyper-V VM and WSL2; I used only the “shared folders” feature, no passthrough). I also tested under Linux (KVM), but the results were… confusing. My system has a hardware issue and is incompatible with Linux, unfortunately (if anyone is interested: Moving from Windows to Ubuntu and back).

The disks are in an HP N40L Gen7 MicroServer.
Windows info: [screenshot]

CrystalDiskInfo: [screenshots]

Actually, that doesn’t matter; what matters is the host and where the load is.
If both are the same Windows, it should work.

Not drives: with drive passthrough the controller is still emulated. I mean PCI passthrough of the SAS/SATA controller itself. If the system has VT-d/IOMMU, it should handle that just fine.

Aha, yeah, I was a believer too. Is this a VM? If so, you are inevitably affected, as shown by other SNOs.
Could you move your node out of the VM? Or at least use iSCSI to make it independent?

Drives look ok. Is power management disabled on the 10TB one?

I started with the 6x4TB drives and I was using them both for the node and for other things. Later, I expanded the pool by adding more drives. So the node data and my data are in the same pool (node data takes up the majority, though).
There is no real way to change that now. Also, I would still use zfs for the node even if I were running it on bare metal. Right now it’s ext4 inside a zvol (using zfs inside zfs seemed redundant).

However, I think I have managed to write a script that dynamically adjusts the speed limit for the filewalker. It runs every 5 minutes, checks the current IO load of the data disk (1-minute average), and increases or decreases the speed limit depending on whether the disk load is below or above 60%, with hard limits of 10 MB/s and 100 MB/s.
So far it looks OK; the speed limit went up to 80 MB/s.
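A minimal sketch of that loop, assuming cgroup v2 with the node’s process already in a `storagenode` group; the device name, major:minor, and paths are placeholders for my setup:

```python
#!/usr/bin/env python3
# Run from cron every 5 minutes: sample disk utilization for 60 s,
# then nudge the cgroup v2 IO limit up or down toward a 60% target.
import time

DEV = "sdb"                  # data disk as named in /proc/diskstats (placeholder)
DEV_MAJMIN = "8:16"          # major:minor of that disk, see `lsblk` (placeholder)
CGROUP_IOMAX = "/sys/fs/cgroup/storagenode/io.max"
TARGET_UTIL = 60.0           # aim for ~60% disk utilization
MIN_BPS = 10 * 1024**2       # 10 MB/s floor
MAX_BPS = 100 * 1024**2      # 100 MB/s ceiling
STEP = 10 * 1024**2          # adjust in 10 MB/s increments

def io_time_ms(dev: str) -> int:
    """Milliseconds the disk has spent doing IO (13th column of /proc/diskstats)."""
    with open("/proc/diskstats") as f:
        for line in f:
            fields = line.split()
            if fields[2] == dev:
                return int(fields[12])
    raise RuntimeError(f"device {dev} not found")

def utilization(dev: str, window: float = 60.0) -> float:
    """Average %util over `window` seconds, iostat-style."""
    start = io_time_ms(dev)
    time.sleep(window)
    return (io_time_ms(dev) - start) / (window * 1000.0) * 100.0

def current_limit() -> int:
    """Read the current rbps limit from io.max; unlimited counts as the ceiling."""
    with open(CGROUP_IOMAX) as f:
        for token in f.read().split():
            if token.startswith("rbps="):
                value = token.split("=", 1)[1]
                return MAX_BPS if value == "max" else int(value)
    return MAX_BPS

def set_limit(bps: int) -> None:
    with open(CGROUP_IOMAX, "w") as f:
        f.write(f"{DEV_MAJMIN} rbps={bps} wbps={bps}\n")

util = utilization(DEV)
limit = current_limit()
# Below the target there is headroom, so let the filewalker go faster;
# above it, throttle the filewalker down.
proposed = limit + STEP if util < TARGET_UTIL else limit - STEP
new_limit = max(MIN_BPS, min(MAX_BPS, proposed))
set_limit(new_limit)
print(f"util={util:.1f}%  limit: {limit} -> {new_limit} bytes/s")
```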

The weird part of all of this is that 50 MB/s or whatever sustained speed may not be enough for a service that uploads/downloads data to/from customers at less than 10 Mbps.

I did not disable it myself; everything is at the defaults.

Is that the drive with the error or do all drives error out?

No. I had around 10 stops in the past ~3 months. All of them stopped randomly; 5 out of 10 were HDD3. I put “pieces.enable-lazy-filewalker: true” into that node’s config after the 3rd stop. All the other nodes’ configs are default.
What I noticed is that when there are a lot of files (60-70+GB) in a node’s trash, it is almost certain that one of the nodes will crash.
On the other hand, the nodes can clear the trash sooner or later, because it went down from 300+GB to 5-10GB.
Also, it is random how long the nodes can run without a crash; it can be 2 days or even 3 weeks. If they “survive” the filewalker after the monthly update/restart, they will run for a long time. It seems that 3+ filewalkers running in parallel will cause a crash for sure.
Whenever I noticed a crash, I immediately restarted that node (without restarting the PC), and then everything was fine for another 3 weeks…

You may use a zfs dataset instead; it will likely benefit from that.
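A minimal sketch, assuming a pool named tank (the name and properties are illustrative):

```sh
# create a dedicated dataset for the node instead of ext4 inside a zvol
zfs create -o atime=off tank/storagenode
```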

I would suggest searching the logs for FATAL errors, because that is the only thing that can crash your node (aside from clear hardware issues).
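For example, on a Windows node (the log path is an assumption; it depends on where log.output points in your config):

```powershell
# list FATAL entries in the node log
Select-String -Path "C:\Program Files\Storj\Storage Node\storagenode.log" -Pattern "FATAL"
```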