There’s no graphical interface for this benchmark tool, unfortunately.
No, you can delete both the binary and the benchmark folder after the test.
For these particular patches, it’s best to test with blob, orders, and the database on the same disk.
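In practice that just means running the benchmark from a working directory on the disk you want to test, since (as far as I understand, and as the runs below suggest) it creates all of its test data in the current directory, e.g.:
cd /mnt/disk-under-test/bench   # any empty directory on the disk being tested (example path)
piecestore-benchmark -pieces-to-upload 100000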
I get an error trying to run go install …
go: finding gotest.tools/v3 v3.5.1
go: gotest.tools/v3@v3.5.1: unknown revision gotest.tools/v3.5.1
go: error loading module requirements
Is my version of go too old?
Another drive and FS:
new benchmark:
/media/disk018/bench/new1$ piecestore-benchmark -pieces-to-upload 100000
uploaded 100000 62.07 KB pieces in 22.954588536s (257.87 MiB/s, 4356.43 pieces/s)
collected 100000 pieces in 7.35056042s (805.28 MiB/s)
/media/disk018/bench/new2$ piecestore-benchmark -pieces-to-upload 100000
uploaded 100000 62.07 KB pieces in 22.981609182s (257.57 MiB/s, 4351.31 pieces/s)
collected 100000 pieces in 7.452747976s (794.24 MiB/s)
/media/disk018/bench/new3$ piecestore-benchmark -pieces-to-upload 100000
uploaded 100000 62.07 KB pieces in 23.125528437s (255.96 MiB/s, 4324.23 pieces/s)
collected 100000 pieces in 7.359297953s (804.32 MiB/s)
/media/disk018/bench/new1$ piecestore-benchmark -pieces-to-upload 200000
uploaded 200000 62.07 KB pieces in 50.369984448s (235.03 MiB/s, 3970.62 pieces/s)
collected 200000 pieces in 1m3.155559848s (187.45 MiB/s)
/media/disk018/bench/new2$ piecestore-benchmark -pieces-to-upload 500000
uploaded 500000 62.07 KB pieces in 2m13.992165989s (220.88 MiB/s, 3731.56 pieces/s)
collected 500000 pieces in 1m44.229967126s (283.95 MiB/s)
baseline benchmark:
/media/disk018/bench/old1$ piecestore-benchmark.old -pieces-to-upload 100000
uploaded 100000 62.07 KB pieces in 28.369228058s (208.65 MiB/s, 3524.95 pieces/s)
collected 100000 pieces in 10.508789289s (563.27 MiB/s)
/media/disk018/bench/old2$ piecestore-benchmark.old -pieces-to-upload 100000
uploaded 100000 62.07 KB pieces in 29.169365081s (202.93 MiB/s, 3428.25 pieces/s)
collected 100000 pieces in 8.584800602s (689.51 MiB/s)
/media/disk018/bench/old3$ piecestore-benchmark.old -pieces-to-upload 100000
uploaded 100000 62.07 KB pieces in 28.601768357s (206.95 MiB/s, 3496.29 pieces/s)
collected 100000 pieces in 8.995053427s (658.06 MiB/s)
/media/disk018/bench/old1$ piecestore-benchmark.old -pieces-to-upload 200000
uploaded 200000 62.07 KB pieces in 1m2.618386893s (189.06 MiB/s, 3193.95 pieces/s)
collected 200000 pieces in 1m16.211374239s (155.34 MiB/s)
/media/disk018/bench/old2$ piecestore-benchmark.old -pieces-to-upload 500000
uploaded 500000 62.07 KB pieces in 3m57.504496122s (124.61 MiB/s, 2105.22 pieces/s)
collected 500000 pieces in 2m11.269413295s (225.46 MiB/s)
Disk:
=== START OF INFORMATION SECTION ===
Model Family: Western Digital Ultrastar DC HC550
Device Model: WUH721818ALE6L4
Firmware Version: PCGAW660
User Capacity: 18,000,207,937,536 bytes [18.0 TB]
Sector Sizes: 512 bytes logical, 4096 bytes physical
Rotation Rate: 7200 rpm
Form Factor: 3.5 inches
SATA Version is: SATA 3.3, 6.0 Gb/s (current: 6.0 Gb/s)
FS: EXT4
ext4 seems to be the winner.
What’s your Go version? What are your GOPROXY settings?
Thank you for all your testing @ksp! This is very helpful.
go version go1.11.6 linux/amd64
No idea. Let’s pretend that my only experience with go was copying and pasting the commands from the other post.
When I first run the go install command, it “finds” a huge pile of modules and fails on gotest.tools. Running the command a second, third, etc. time, the pile gets smaller, but it still fails on that one.
Could you upgrade it to the latest version and try building the benchmark again?
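If it helps, this is the usual manual upgrade on Linux (the version number here is just an example; use whatever the current release is):
wget https://go.dev/dl/go1.22.4.linux-amd64.tar.gz
sudo rm -rf /usr/local/go        # remove the old toolchain first, per the official install notes
sudo tar -C /usr/local -xzf go1.22.4.linux-amd64.tar.gz
export PATH=/usr/local/go/bin:$PATH
go version                       # should now report the new version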
Using version 1.22.4, the go install ./cmd/tools/piecestore-benchmark/ command completes with no errors, but there is no piecestore-benchmark executable I can find.
-bash: piecestore-benchmark: command not found
It’s likely placed in ~/go/bin.
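You can confirm where it ended up and put that directory on your PATH for the session (assuming the default GOPATH; adjust if go env says otherwise):
go env GOPATH GOBIN              # binaries go to GOBIN if set, otherwise $GOPATH/bin
ls ~/go/bin/piecestore-benchmark
export PATH="$PATH:$HOME/go/bin"
piecestore-benchmark -pieces-to-upload 100000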
I ran it on my node VM while the ingress was about 55 Mbps.
The original benchmark:
uploaded 100000 62.07 KB pieces in 1m4.736535104s (91.44 MiB/s, 1544.72 pieces/s)
collected 100000 pieces in 11.24156623s (526.55 MiB/s)
The modified benchmark:
uploaded 100000 62.07 KB pieces in 1m5.397864035s (90.51 MiB/s, 1529.10 pieces/s)
collected 100000 pieces in 9.679974413s (611.50 MiB/s)
There doesn’t seem to be much difference.
I also tried the original benchmark with more pieces:
uploaded 1000000 62.07 KB pieces in 11m20.067108739s (87.04 MiB/s, 1470.44 pieces/s)
collected 1000000 pieces in 1m42.173460725s (579.33 MiB/s)
Similar results.
My setup:
Node runs in a VM with 4 CPUs and 32GB RAM. The virtual disk is formatted with ext4.
The host has 2x Xeon X5687 CPUs and 192GB RAM.
The virtual disk is a zvol; the pool is made from 3x raidz2 vdevs (6x4TB, 6x6TB, 6x8TB) and has a SLOG made from a mirror of two SATA SSDs.
This tool will be useful if/when I decide to tweak the settings of the VM.
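For reference, this is how I check the layout and the zvol (pool and dataset names are from my setup):
zpool status vm                     # shows the three raidz2 vdevs and the mirrored SLOG
zfs list -t volume vm/storjv3/stor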
Could you please try to run it on the host?
new on a dataset
uploaded 100000 62.07 KB pieces in 31.646801656s (187.04 MiB/s, 3159.88 pieces/s)
collected 100000 pieces in 11.715940021s (505.23 MiB/s)
old on a dataset
uploaded 100000 62.07 KB pieces in 30.862582035s (191.79 MiB/s, 3240.17 pieces/s)
collected 100000 pieces in 9.509575456s (622.45 MiB/s)
new on a zvol
uploaded 100000 62.07 KB pieces in 32.722675417s (180.89 MiB/s, 3055.98 pieces/s)
collected 100000 pieces in 8.426361066s (702.47 MiB/s)
old on a zvol
uploaded 100000 62.07 KB pieces in 32.768229414s (180.64 MiB/s, 3051.74 pieces/s)
collected 100000 pieces in 8.049847955s (735.33 MiB/s)
I’m starting to suspect that my “new” and “old” benchmark binaries are the same; I probably messed something up with git.
So, running directly on the host (with access to all cores) on an empty dataset or zvol gives about double the write performance compared to a zvol full of data inside a VM with 4 cores. Hmm… maybe I should try tweaking the VM settings (more cores, more queues, or something else) to get better performance when I have the time and ingress traffic is low.
Thanks! You confirmed my suspicions. A VM is bad news for a storagenode, and I thought that was only the case for Windows on VMware. Seems not. If I get more evidence like this, I will add this disclaimer to the documentation.
At the moment I would consider this claim of mine confirmed:
I don’t really see it that way. Double speed on the host, sure (though I will try to tweak the settings of the VM), but even so, the performance from the VM should be good enough, at least for my internet connection.
The reason to use a VM is to be able to use the hardware and drives for other things (you know, so it’s not dedicated to Storj).
Yes, I know that using a VM is convenient; however, there are many ways to do it, and most of the setups presented here have issues with disk subsystem performance when they use a VM, especially a Windows VM on VMware. I do not think VMware itself is bad, but perhaps the way it presents disks to the guest OS is not optimal when you need the full IOPS throughput of the disk. On top of that, the way Windows works with disks and NTFS does not help performance either.
Perhaps using Hyper-V, if the host is Windows too, would cause less of a performance drop (at least my tests showed it is almost the same as bare metal, and in some cases even faster when you use a virtual disk). But I have not performed the test with this tool yet.
Yeah, using a VM results in lower performance (probably more so on my server compared to something newer), but it looks good enough to me. Apparently a new version of QEMU has a way to make the IO multithreaded, but Debian 12 has an older version which does not support this.
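Roughly the kind of flags involved, for the record: even older QEMU can pin a virtio-blk device to a dedicated iothread and give it several queues, and the newer feature apparently lets those queues be spread over multiple iothreads. Option names are from the QEMU docs; the zvol path is just my dataset from above (libvirt exposes the same knobs through the domain XML):
qemu-system-x86_64 ... \
  -object iothread,id=iothread0 \
  -drive file=/dev/zvol/vm/storjv3/stor,if=none,id=stor0,format=raw,cache=none,aio=native \
  -device virtio-blk-pci,drive=stor0,iothread=iothread0,num-queues=4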
I still have the option of redoing the pool as mirrors instead of 3xraidz2, but for that I would need to borrow a server to temporarily put the data on. So far, however, the node does not seem to be too slow, at least with the current usage.
It has a bit more space and is a bit safer: any two of my 18 drives can fail at the same time and the data would survive. With mirrors, if the wrong two drives fail at the same time, you’re in trouble.
Storage efficiency of the zvol seems good enough:
Filesystem Size Used Avail Use% Mounted on
/dev/sda1 40T 29T 12T 72% /storj
NAME USED AVAIL REFER MOUNTPOINT
vm/storjv3/stor 29.4T 32.3T 29.4T -
Running trim on the virtual disk would probably shrink it a bit too.
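Something like this should do it, assuming the virtual disk is attached with discard/unmap enabled so the trims actually reach the zvol:
sudo fstrim -v /storj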
Apparently SQLite does not play well with NFS.
In that case NFS can probably work well. Storj was (is?) against NFS because by default the databases would be there as well and that would lead to problems.
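If I remember the option name correctly, the databases can be pointed at local storage while the blobs stay on the share, with something like this in the node's config.yaml (the path is just an example):
storage2.database-dir: /mnt/local-disk/storagenode-db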
Yeah, having to use a larger block size (I use 64K) is one of the reasons for the lower performance. I remember testing the speed of a zvol from inside a VM (though that pool was on SSDs) and IIRC found that lower block sizes performed better, but then discovered that they made the zvol take up double the space.
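For comparison, the block size and the raw-versus-logical usage of the zvol can be pulled with one command (the dataset name is from my zfs list output above):
zfs get volblocksize,used,logicalused,compressratio vm/storjv3/stor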
Though, at least for now, my node seems to have enough performance and obviously there is no easy way to change the composition of the pool to test.