Upcoming storage node improvements including benchmark tool

I tried to tell you a few times that I totally expect you to disbelieve my numbers and that there is only one way to verify them yourself. That message didn't get through to you. So tell me, how can I deliver that message to you? I don't understand what the problem is. You try to argue about my node, but that one is out of scope for this conversation. I am happy with the performance of my node. I want to find out what else is needed to improve the performance for other nodes as well, like for example that slowdown we found on Windows. What is needed to focus on that outcome instead of talking about my node, which is currently running better than ever before?

1 Like

On a completely different tangent: for those of us who know what we are doing, is it OK if we update a couple of nodes to the upcoming RC version to test this in production, or are we expecting an apocalypse if we do that?

1 Like

I did update my own storage nodes already. So far they run stable. There is one downside: you can't downgrade this time because this version contains a DB migration. Worst case there might be a way to downgrade by fixing the DB by hand, but it might be better to avoid that.
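
If you do want to try the RC anyway, it may be worth taking a copy of the databases before updating, so a manual downgrade at least has something to start from. A minimal sketch, assuming a docker setup with the default container name and the sqlite databases in the storage directory; the paths are examples, adjust them to your own setup:

# stop the node so the databases are not being written to while you copy them
docker stop -t 300 storagenode
# copy the sqlite databases somewhere safe (paths are examples)
cp -a /path/to/storagenode/storage/*.db /path/to/backup/
docker start storagenode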

Not included in the current release is the latest Windows improvement. That will hopefully get cherry-picked today, together with a fix for the trash cleanup job not updating free-space numbers. It's up to you whether you want to wait for that version or install any version in between.

Up next on our todo list is the next benchmark test. This time the storage node will be pushed to its limits from a remote machine. First test results are looking bad, indicating that there might be some bottleneck on the transport layer. If the team keeps up the speed they might find the root cause later today, and we'll have another round of cherry picks tomorrow.

I don't have a new benchmark tool for Windows yet that contains the latest improvements. The benchmark tool needs to get merged into the main branch for that. As a workaround, if you don't want to wait:

# make sure you delete the old repository first
git clone https://github.com/storj/storj
cd storj
# fetch the not-yet-merged change from Gerrit and apply it on top of main
git fetch https://review.dev.storj.io/storj/storj refs/changes/99/13099/4 && git cherry-pick FETCH_HEAD

This does a cherry-pick, so you need to make sure you reset your repository to the latest main branch first, otherwise it will not contain the Windows improvements. I am sorry this is still a bit more hands-on while we keep improving the performance in the background. This is basically what happens when reducing the feedback loop to just a day between us committing something and you having the chance to run it.
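
If you then want a Windows binary out of that checkout, building it yourself should work along these lines. A minimal sketch; the package path of the benchmark tool is an assumption here, adjust it to wherever the tool actually lives in the repository:

# hypothetical package path, adjust it to the benchmark tool's actual location
go build -o benchmark.exe ./cmd/<benchmark-tool>

The same command works on Windows directly; cross-compiling from another OS with GOOS=windows GOARCH=amd64 in front of it can also work, as long as the tool has no cgo dependencies.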

2 Likes

Try apt install sqlite3

It will only write that file on a successful run. You don't have to go to extreme values; a short test duration of at least 30 seconds is all we need. If you can zip the trace file for that run, we can look into it and see if there is something left to optimize. Based on your dd output I would guess there is not much room for improvement, but if you like I will pass it over to the developer team.

And one more detail: the benchmark tool has some overhead of its own. It needs to sign orders and so on. That part might consume some extra memory and cause crashes. The benchmark isn't designed to run for hours-long workloads. The scale for this one is more like minutes, maybe up to a full hour.

I'm on the same page, @IsThisOn, and I decided this wasn't important enough to keep arguing about. I can be happy the new fsync=off behavior is an improvement in 99.9% of the cases… while understanding these benchmarks don't show what @littleskunk thinks they show. As long as the code makes it into an upcoming release I can tolerate a little delulu :wink:

Alright, let’s calm the tensions a little.

  • Yes, small tests will mostly display the performance of cache writes
  • Yes, large tests represent an unrealistic 100% sustained load scenario
  • Yes, as always, real world performance will lie somewhere in between
  • You can help by testing different scenarios that show different results
  • Regardless, tests so far suggest that it will be an epic performance improvement for most nodes
  • Disputing results without showing your own results clutters up the topic and doesn’t add value

I’ve personally opted out of the benchmark until binaries are available, as I don’t want to bother setting up a go environment on my NAS. And so I’ve also refrained from disputing other people’s results. But I’m very much looking forward to these changes going live.

7 Likes

In the end the real-use results matter, and I'm very curious whether this mod will increase the discrepancy between low-performing and high-performing nodes, or whether lost and won races will stay the same. I will opt in on one and opt out on the other, just to see the difference between nodes.

I was wondering the same thing. I can totally believe the smallest RPi/SBC nodes with something like external USB HDDs would win more races, as the OS caching layer can soak up writes so the node always appears “ready for more”. Those same SNOs may notice higher buff/cache RAM use (in tools like top)… but that's kinda the point.

In my opinion it should decrease that discrepancy:

  • High-performing nodes: previously they could already cope quite well with the IOPS/load on the drive (less than 100% busy), so the change from sync to async calls will still improve performance somewhat (no need to wait for data to be written to disk), but less so, as they were not bottlenecked to begin with.
  • Low-performing nodes: they previously could not cope with the amount of IOPS, which resulted in lost races and such (disk at 100%). With async calls they will be able to perform more “IOPS” (because writes are bunched together in cache), which should in turn mean that they can serve more requests than they previously could.

In essence: the benchmarks show the performance improvement under 100% load. If your disk was not at 100% before (with the current network load), you will not gain as much as the nodes that were at 100%. Of course, if the network load increases, the nodes that still have leftover IOPS will be able to serve more requests than those that are at 100% (even after the performance improvement), but everyone benefits anyway.
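
If you want to get a feel for how big the sync vs. async gap is on your own hardware, a rough sketch with dd is enough. The output path and sizes below are just examples, point it at the disk you actually want to test:

# buffered writes: dd returns as soon as the data is sitting in the page cache
dd if=/dev/zero of=/path/on/your/disk/ddtest bs=128k count=500
# synchronized writes: every block waits for the disk, roughly what a fsync-per-write pattern costs
dd if=/dev/zero of=/path/on/your/disk/ddtest bs=128k count=500 oflag=dsync
rm /path/on/your/disk/ddtest

The first run will typically report something close to memory speed, the second one something much closer to what the disk itself can sustain for synchronous writes; real node traffic lands somewhere in between.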

3 Likes

Game plan for today:

We will look into CPU usage, especially on slower devices like a Pi3. It is unlikely that we can speed it up to full HDD speed, but there might be some smaller gains that are still worth it. The developer team has some trace files to look into. If you have a similarly CPU-constrained device, I am happy to hand your trace file over to the developer team as well.

Up next we will do some spot checks regarding the bandwidth our storage nodes have. We will run the new remote benchmark tool against some randomly selected nodes for a short duration. We are going to use big piece sizes to compensate for the fact that the performance improvements are not rolled out yet. We hope we will still get an accurate reading. The data will be unpaid because we upload the pieces to the storage nodes without committing the result to the satellite. Garbage collection is going to clean it up later. We will keep the upload burst short to keep the amount of unpaid data as low as possible.

6 Likes

I understand that you need to move some beefy slabs of data to get reliable performance numbers… but would your current auditing system already hold some clues too? Like, the round-trip times to service audits may point to faster/slower nodes?

Maybe that rough audit data could give you an idea of which nodes are faster… for you to run your real tests against?

Or maybe audits are just too tiny to really tell you anything…

Fun little story: my coworker with the Pi3 has a 1 GBit/s connection. I have just 100 MBit/s at the moment. It could be even worse: I already thought about downgrading my internet connection to 50 MBit/s to reduce my costs.

In terms of audits we will both respond as quickly as we can. In fact it will look like his node is a bit slower, because my node has a 98% cache hit rate while his node will have to read the piece from disk. Does that mean my node is faster? Nope. The moment a high upload stream hits my node, my limited bandwidth will become the problem (100 MBit/s is only about 12.5 MB/s, well below what even a single HDD can write sequentially). OK, I would still outperform his Pi3, but I hope you get the picture. The question is what peak throughput we can push into the network. We are confident that we managed to improve the storage node performance, but we don't have any data about available bandwidth.

3 Likes

You should benchmark all nodes for the complete picture. If you test only 10% of nodes, you can get results that are way off compared to a complete network test.
The same remote test could be done for the fsync tool too, for all nodes, to get the big picture.
You can use SL to do future tests too, before putting them in production.
And… you can publish general statistics.
And… you can send SNOs, based on their node ID and associated email, the results of the tests, to let everyone see where their node stands in comparison to the general statistics.
Or at least make a microsite where everyone can search by their node ID to see the results.

2 Likes

That is planned for later. One step at a time. Some early samples will work for now, even if they are inaccurate. They still give us an overall direction of what to expect from the full test run.

Can the rest of his suggestions be implemented for this run and any future ones? (And before it's brought up again: yes, I am asking for information for information's sake.)

I don’t understand the question. It is all implemented.

This bit

Not that I am aware of. In my culture we value privacy more than anything else. For me, the benefit of an email notification or a public list isn't worth the privacy violation. I will always look for solutions that do not violate my privacy, for example running a benchmark tool locally on my machine.

2 Likes

How about general ANONYMIZED statistics?
And mail to SNOs: “Your node is better than 40% of the nodes”
or
“You need improvement, your node performs worse than 80% of the nodes”

To get some benefit from such tests,
people could get some feedback and a chance to improve, or get interested and improve!

2 Likes