Benchmark different ZFS pools

I decommissioned my old TrueNAS System.
Now I have a test rig for some ZFS stuff.
My idea is to compare 3 different systems:

  • ARC only
  • persistent L2ARC metadata only
  • special vdev metadata only
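
Roughly how I picture the two cached variants on the command line (device names are placeholders and the data-disk layout is shown as raidz1 just as an example; in practice I will do most of this through the TrueNAS UI):

```
# 1) special vdev, metadata only (mirrored, because losing it loses the pool)
zpool create tank raidz1 sda sdb sdc \
  special mirror nvme0n1 nvme1n1
zfs set special_small_blocks=0 tank   # 0 = metadata only (this is the default)

# 2) persistent L2ARC, metadata only (same data disks, SSD reused as cache)
zpool create tank raidz1 sda sdb sdc
zpool add tank cache nvme0n1
zfs set secondarycache=metadata tank
echo 1 > /sys/module/zfs/parameters/l2arc_rebuild_enabled   # persistence, default since OpenZFS 2.0

# 3) ARC only: plain pool, no cache or special vdev
zpool create tank raidz1 sda sdb sdc
```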

To have some benchmark, the idea is to measure how long the filewalker takes to finish.
For that I will look at the logs.
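
Something like this is what I have in mind for the timing; the exact log message text differs between node versions, so the grep pattern is only a guess, and the path is a placeholder:

```
# pull the filewalker start/finish lines out of the node log,
# then read the duration off the timestamps
grep -i "filewalker" /mnt/tank/storagenode/node.log | grep -iE "started|finished|completed"
```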

Proposed task order:

  • create pool with special vdev
  • rsync data from the old node to the new node
  • do benchmark
  • move data to another pool on the same host
  • destroy special vdev
  • create L2ARC pool
  • move data to L2ARC pool
  • do benchmark
  • do benchmark again, now L2ARC is hot
  • remove L2ARC, which makes it an ARC-only system
  • do benchmark
  • do benchmark again, now ARC is hot

Other ZFS settings: Record Size 1MiB, Sync disabled, Atime off, lz4.
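
In zfs terms that would be something like this (dataset name is a placeholder):

```
zfs set recordsize=1M   tank/storj
zfs set sync=disabled   tank/storj
zfs set atime=off       tank/storj
zfs set compression=lz4 tank/storj
```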


I’m really looking forward to the results, as I will build a new server in the coming weeks :slight_smile: Thanks for testing this!


When moving data (locally, I understand), would you mind taking a look at rsync performance? Just a look, no precise measurements. I listed some alternatives here and planned to update that thread last weekend. I could not find the time, but I hope to update it later today and to add a few more alternatives, if there is any interest of course.

hmmm… To be honest, I am not really interested in installing additional software.
I will probably use rsync to copy from the old node to the new.
Once the data lives on the new node, I will probably use cp locally.
That is the plan for now, because I think these are the fastest options available?!

But to be honest, I expect the pool to be the bottleneck anyway.
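
Roughly what I have in mind (paths and hostname are placeholders; the usual pattern is one or two passes while the old node is still running, then a final pass with the node stopped):

```
# initial copy while the old node keeps running
rsync -aH --info=progress2 root@oldnode:/mnt/old/storagenode/ /mnt/tank/storagenode/

# final pass with the node stopped, removing files deleted in the meantime
rsync -aH --delete --info=progress2 root@oldnode:/mnt/old/storagenode/ /mnt/tank/storagenode/

# later, moving between pools on the same host
cp -a /mnt/tank/storagenode/. /mnt/tank2/storagenode/
```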

Are you implying that instead of Solaris you are running this Kubernetes ZFS cluster? If that is the case, I guess that not only @Pentium100 but also @penfold might be smiling, especially @penfold.

TBH, I don’t know, it’s hard to say.

I thought I read in old threads that @Toyoo had a script that could read a node log and recreate the reads+writes? Wouldn’t that be a repeatable way to compare filesystem layouts (because the filewalker has nothing to do with winning upload/download races)?

It sounds like you want to benchmark performance that would affect node wins, not maintenance utilities.

Yeah, there was a script like that, but I’d have to reengineer it for the new logs now. It also wasn’t actually taking into account factors like partial downloads, and it would now need to reproduce GC more carefully given its bigger importance. Sorry, I don’t have enough time for this kind of work recently.

Correction:
*I guess that not only @Pentium100 but also @penfold might be smiling, especially @penfold a little bit of course. :- )

@IsThisOn
TBH, as for ZFS, I think that the well-documented setup by @arrogantrabbit is spot on. Maybe it’s worth recalling, I mean, reminding of some of the settings in a concise way.

Currently I run a VM with an NFS mount from TrueNAS Core.
I plan to run the test host on TrueNAS Scale with the included STORJ plugin, jail, app, whatever the heck it is called nowadays :slight_smile:

I get where you are coming from but disagree. My guess is that winning upload and download races has mostly to do with peering, because peering has way higher latency than disks.
To see if you win or lose races, there is a script.
You can run it and you should see around 99%.
But it is nothing you can really benchmark, because you can’t reproduce it.
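
If anyone wants that rough number without installing anything, something along these lines works directly against the log; the exact message wording may differ between node versions, so treat the patterns and the path as placeholders:

```
LOG=/mnt/tank/storagenode/node.log
ok=$(grep -c  "piecestore.*uploaded" "$LOG")
bad=$(grep -cE "piecestore.*upload (canceled|failed)" "$LOG")
awk -v o="$ok" -v b="$bad" 'BEGIN {
  if (o + b > 0) printf "upload success rate: %.2f%%\n", 100 * o / (o + b)
  else           print  "no uploads logged"
}'
```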

What I am most interested in is the filewalker. The filewalker is the only task that stresses the disks. So for me, this is the only maintenance benchmark there is.

Link please? Not sure what you are referring to.

I probably have to find out how to store the DBs. The problem with TrueNAS SCALE: is it realistic that besides the L2ARC SSD you also need a mirrored SSD pool for apps?
With the boot SSD, that is a minimum of 4 SSDs :grimacing:
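
As far as I understand, the node can keep its SQLite databases in a separate directory via storage2.database-dir in config.yaml, so maybe an existing apps pool could hold them (paths are placeholders, and I have not tested this on SCALE):

```
# point the node's databases at the SSD pool by adding one line to config.yaml
echo 'storage2.database-dir: "/mnt/apps-ssd/storagenode-db"' >> /mnt/tank/storagenode/config.yaml
```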

I plan to run the test host on TrueNAS Scale with the included STORJ plugin, jail, app, whatever the heck it is called nowadays :slight_smile:

Ah, so … vanilla setup. :- )

My guess is that winning upload and download races has mostly to do with peering, because peering has way higher latency than disks.

I would probably say it has more to do with congestion control algorithms and with packet queuing and scheduling, just to be a bit more precise. Nevertheless, I believe that in general this upload/download race is really a crazy idea. It makes so much noise and is completely out of line with the low-IO piece storage idea. Not to mention that the current remuneration policy is totally out of sync. So it should probably be addressed on both sides, technical and business, as soon as human and financial resources allow.

Link please? Not sure what you are referring to.

I guess it’s scattered all over the place, which is why I pinged @arrogantrabbit, but recently he has been even more silent than @BrightSilence. Maybe @Alexey can provide you with advice on this very particular topic.

Cheers.

I took a look at my notes. I am currently running BBR as the congestion control algorithm. Some info here and here. I noticed a significant improvement when connecting by RDP (I am using NoMachine and NICE DCV, both over Tailscale, sometimes WireGuard). For now I can’t say much about its influence on storagenode operations, but I hope it will make my Storj setup much more stable. I do not know exactly why, but my setup was having problems, mostly at about 04:30 UTC, recently closer to 04:00 UTC, to the point that my online score dropped to about 90-95% and I was not able to raise it with default storagenode settings. (BTW, does anybody know if there is anything special taking place at irregular intervals, let’s say every few days, at about 04-05 UTC, like updates, garbage collection or the filewalker?)
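
For reference, enabling it is just two sysctls; fq as the qdisc is the usual recommendation alongside BBR:

```
# enable BBR with the fq qdisc (persist these in /etc/sysctl.d/ to survive reboots)
sysctl -w net.core.default_qdisc=fq
sysctl -w net.ipv4.tcp_congestion_control=bbr

# verify
sysctl net.ipv4.tcp_congestion_control
```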

As for ZFS, I have limited knowledge (mostly Linux workstations and LXD). My current interests are mostly related to

a) running the Oracle ZFS Appliance in VMware or Proxmox. My experience so far was mixed: on VMware it was working quite well, on Proxmox I got to the point where Solaris was starting, but after that the system was rebooting over and over. The Oracle ZFS Appliance demo seems to be fully functional (link here, look at the very bottom of the page). I like it because it seems to be much lighter than the TrueNAS implementation, especially much lighter than Scale. And of course, because it is exotic. :- )

I took more space than I initially intended. I wish you the best with your testing.

My guess is none, since there is no usage.
Without usage, there is no congestion :slight_smile:

My guess is that you are probably very wrong, but that’s a topic for another conversation. Sorry, I can’t commit more time right now. Happy testing. :- )

My real-life testing tells me I am not :grin:
Usage is 20 Mbit/s at max, the ISP connection is 10 Gbit/s.
There is no congestion :wink: and I win over 99% of races. There are way more important factors at play.

Funny side note:
10 years ago, the CEO of my ISP said this:
„QoS and traffic shaper is for people with lousy ISPs. Throttling your customers makes no sense. Just give everyone 1Gbit“

That was true 10 years ago, when nothing could make use of a gigabit. Now they give every customer 25Gbit… Why? Because their switches are capable of 25Gbit. So why not give every customer 25Gbit?

I chose 10Gbit, because you have to buy the optics yourself :slight_smile:

… test it :slight_smile:
(20 …)

This looks interesting!

If you want, you could test performance running a single drive, RAID10, and perhaps a 6-drive RAID5?
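
In zpool terms, the three layouts would be something like this (device names are placeholders):

```
# single drive
zpool create single sda

# RAID10 equivalent: striped mirrors
zpool create r10 mirror sda sdb mirror sdc sdd

# RAID5 equivalent: 6-drive raidz1
zpool create r5 raidz1 sda sdb sdc sdd sde sdf
```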

:no_mouth:

Just quietly observing… waiting for my moment to strike… :laughing:


Hey, I really can’t wait. :- )


Why? There is no point in it. I already win everything.
And you have not explained how and why it should make a difference.
As far as I am concerned, I could also run a test to see if playing WoW helps me win races :grinning:

I have two 16TB drives and four 8TB drives. So my testing capabilities, and the fact that I have to move data around, make my testing limited.

I am not sure if RAIDZ is even suitable for STORJ.
I am not sure about RAIDZ storage efficiency with suboptimal record sizes and padding overhead. For example, with ashift=12 a 4 KiB piece on RAIDZ1 still allocates one data sector plus one parity sector, so 8 KiB on disk and only 50% efficiency, no better than a mirror for small pieces.
On the other hand, who would host STORJ on a mirror with only 50% storage efficiency and such low payouts.

Because. :slight_smile:

Because I wrote you that in my opinion you are very wrong, and I suppose you have not even looked at the writings I provided? :slight_smile:

And because, even though ZFS is within the topic of your thread, BBR is probably a topic for another thread. :slight_smile:

And because, if you are expressing such opinions as above, you have probably not even run the basic tests. :slight_smile:

And because I told you that currently I can’t commit more time to this discussion. :slight_smile:

Did you get the explanations now? :slight_smile: