I have fast internet; my hardware is the bottleneck. Any ideas how I could try to improve disk write speed in terms of Linux / Storj settings? Uploads lose the race: only a small percentage are successful, the rest are “upload failed”.
What hardware is it? Defragmentation? How full is the disk? What does the system resource monitor on Linux show? SMR drive? USB drive? What’s the ingress per day?
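Something like this should answer most of those questions on Linux (mount point and device names are just examples):

```bash
df -h /mnt/storj                       # how full is the disk (example mount point)
lsblk -d -o NAME,SIZE,ROTA,TRAN,MODEL  # ROTA=1 = spinning disk, TRAN shows usb vs sata;
                                       # check MODEL against the manufacturer's SMR lists
```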
First of all, format it with the right file system: ext4, ZFS or NTFS (and especially not FAT, exFAT or BTRFS).
Second: keep the databases not on your data drive but on your system drive, so the many IOPS of the databases don’t compete with your data (rough example commands for both points are sketched at the end of this post).
The rest is of only minor value.
Beyond that, it’s a matter of acceptance. As long as your node is growing by a few GB a day on average, I wouldn’t bother. Even my worst node on SMR was growing 5 GB a day on average. The biggest concern is, and was, that there is a big difference between disk usage and usage according to the Storj network, probably reflecting the slow filewalker process.
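In case it’s useful, here is a rough sketch of both points on Linux. All device names and paths are examples, and I’m assuming `storage2.database-dir` is the relevant option in `config.yaml` and a docker setup named `storagenode`:

```bash
# 1) format the data drive with ext4 (WARNING: wipes the partition; replace /dev/sdX1)
mkfs.ext4 -m 0 -L storj /dev/sdX1

# 2) move the databases off the data drive
docker stop -t 300 storagenode                  # stop the node first
mkdir -p /mnt/ssd/storj-dbs                     # example location on the system/SSD drive
cp /mnt/storj/storage/*.db /mnt/ssd/storj-dbs/  # example source path
# then in config.yaml:
#   storage2.database-dir: /mnt/ssd/storj-dbs
docker start storagenode
```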
Made a node yesterday on 1.80.1; the default is 4 MiB now.
This is unlikely to be related to slow hardware, because the storagenode can run even on a router:
The usage depends on the customers, not your hardware, unless you use network filesystems (NFS, SMB/CIFS, SSHFS, etc.) or SMR drives.
So what would be your suggestion? Why did my node start failing on most of the uploads?
P.S. At moments when the CPU iowait time drops, I see more successful uploads.
Keep your node online, of course
Then probably your disk is not able to keep up. There is not much you can do about it, except using a native filesystem for your OS - ext4 for Linux, NTFS for Windows.
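To see what filesystem the storage folder actually sits on (the path is an example):

```bash
findmnt -T /mnt/storj -o TARGET,SOURCE,FSTYPE,OPTIONS
```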
If you have more RAM, it usually starts to work better, because the OS will use most of the free RAM for cache.
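You can see how much RAM is currently going to the cache with:

```bash
free -h   # the "buff/cache" column is what the OS uses for file caching
```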
If you have a UPS and you use Windows, you may enable the write cache. This could help to increase the number of successful uploads. But if you do not have a UPS, enabling the write cache could lead to data loss in case of a power interruption.
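On Linux the rough equivalent would be the drive’s own write cache via hdparm; the same UPS warning applies, and /dev/sdX is a placeholder:

```bash
hdparm -W /dev/sdX    # query the current write-cache setting
hdparm -W1 /dev/sdX   # enable it (risk of data loss on power failure without a UPS)
```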
How about that suggestion of moving the DBs to another place? I understand it is an additional point of failure…
You should enable TCP Fast Open if you haven’t already. I’ve been losing so many races since TCP Fast Open was enabled on other nodes, because my setup doesn’t support it (Synology NAS).
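For reference, on a plain Linux host that’s one sysctl; for a docker node I believe it also has to be passed on the run command, since the container has its own network namespace (treat that part as an assumption about your setup):

```bash
sysctl -w net.ipv4.tcp_fastopen=3                                    # enable for client and server side
echo "net.ipv4.tcp_fastopen=3" > /etc/sysctl.d/99-tcp-fastopen.conf  # persist across reboots
# docker: add  --sysctl net.ipv4.tcp_fastopen=3  to the docker run command
```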
Lost races are a combination of slow hardware and high network latency between clients and your node.
This is not needed, unless you see a lot of errors like “database is locked”
You need to diagnose your specific bottleneck and remove it. Can you paste the output of iostat
as described here?
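i.e. something like this while the node is busy:

```bash
iostat -x 5   # from the sysstat package; extended per-device stats every 5 seconds
# a data disk sitting near 100 %util with long w_await would confirm the write bottleneck
```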
Not sure if it really worked, but it seems a little bit better after changing the scheduler from deadline to noop.
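In case anyone wants to try the same, that’s roughly (sdX is a placeholder; on newer multi-queue kernels the choices are `none`/`mq-deadline` instead):

```bash
cat /sys/block/sdX/queue/scheduler           # the active scheduler is shown in [brackets]
echo noop > /sys/block/sdX/queue/scheduler   # as root; not persistent across reboots
```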
Out of curiosity, is your setup capable of doing NCQ (native command queueing, a SATA subprotocol—would need support from both the controller and the drives)? It’s somewhat known that NCQ and the deadline scheduler don’t mesh well.
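A quick way to check whether NCQ is actually active (sdX is a placeholder):

```bash
dmesg | grep -i ncq                     # the kernel logs NCQ support when the SATA link comes up
cat /sys/block/sdX/device/queue_depth   # >1 (typically 31 or 32) means NCQ is in use
```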
Since the OP does not provide the OS kernel, hardware setup, or disk type, how should these “ancient texts” help?
I’m not sure why you ask me.
A great improvement in usage can only come from the customers’ side. Everything else is fine-tuning and removing bottlenecks, if you have them.
Somehow I misclicked.
I wanted to say that moving the DBs is indeed not necessary, but it helps a little bit with reducing the load.
Sorry, what do you mean?
This NCQ stuff is about 13 years old,
and @binary does not tell us his settings or hardware…
…so I can practically only say: just make a new node in the same /24 IP to get better performance.
Yeah, and it is still relevant.
LVM NVMe caching.
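A rough sketch of that, assuming the data LV is `vg0/storj` and the NVMe has already been added to `vg0` as a PV (names and sizes are examples):

```bash
# carve cache-data and cache-metadata LVs out of the NVMe
lvcreate -L 100G -n cache      vg0 /dev/nvme0n1p1
lvcreate -L 1G   -n cache_meta vg0 /dev/nvme0n1p1

# combine them into a cache pool and attach it to the data LV
lvconvert --type cache-pool --poolmetadata vg0/cache_meta vg0/cache
lvconvert --type cache --cachepool vg0/cache --cachemode writethrough vg0/storj
```

writethrough keeps the HDD copy authoritative, so losing the NVMe doesn’t lose node data; writeback would speed up writes more but adds risk.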
There is also a lot of RTT you can trim from your networking stack.
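For example (generic Linux tuning, my assumption of the kind of thing meant here, nothing Storj-specific): the fq qdisc plus BBR congestion control tends to reduce queuing delay on busy uplinks:

```bash
sysctl -w net.core.default_qdisc=fq
sysctl -w net.ipv4.tcp_congestion_control=bbr   # needs kernel 4.9+ with the tcp_bbr module
```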
One of the easiest ways is 1 node, 1 disk: if a node starts to fail, attempt to move it before it fails completely, and just run a bunch of smaller nodes.