Upcoming storage node improvements including benchmark tool

We already have a script any SNO can run, whenever they want, that shows if they’re winning most upload/download races… or have areas to improve. And it’s based on the success of client requests: the actions that directly influence payouts.
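For anyone curious what that kind of script boils down to: it parses the node's logs and counts won vs. lost transfer races. A minimal sketch of the idea, with sample log lines standing in for the real `docker logs storagenode 2>&1` output (the actual script handles far more message variants):

```shell
#!/bin/sh
# Sample lines standing in for real storagenode log output.
# The "uploaded" / "upload canceled" / "upload failed" wording is how the
# node reports race outcomes; the exact fields here are simplified.
logs='2024-05-01 INFO piecestore uploaded
2024-05-01 INFO piecestore upload canceled
2024-05-01 INFO piecestore uploaded
2024-05-01 INFO piecestore uploaded'

# Count won races (successful uploads) vs. lost ones.
ok=$(printf '%s\n' "$logs" | grep -c ' uploaded$')
lost=$(printf '%s\n' "$logs" | grep -cE ' upload (canceled|failed)$')
total=$((ok + lost))

echo "upload success rate: $ok/$total ($((100 * ok / total))%)"
```

Running it against the sample lines prints `upload success rate: 3/4 (75%)`; pointed at real logs, the same counting gives the win/loss picture the script reports.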

I can understand Storj needing internal rankings though: that could help their dev and sales teams.

Yeah, but no one wants to run it. I tried, got some PowerShell errors, and gave up. I remember it worked back in the day, but then it stopped, and I'm currently unable to run any test. So I'd welcome with open arms any statistics Storj can gather for me.

I have Go, I have gcc, but what do I enter into cmd to run the test?
Additionally, for the sqlite3 step:

We don’t have access to your node and are unable to run it for you. Running a storage node is easy; optimizing it requires additional knowledge, and there is no shortcut. We can help with setting up Grafana or the new benchmark.

Installed 1.104.0-rc and skipped lazy used space because I saw the comment above about trash being fixed later today.

Installed 1.104.1 and enabled lazy used. All I can say is it flies compared to before. ~1h for ~5TB (dbs still on disk). Well done.

Without the fsync patch? Maybe less than 5 MiB…

You mean the lazy file walker is faster now? How long did it take before? This is just a side effect: we didn’t touch the lazy file walker itself. We made uploads cheaper so that the lazy file walker has more IOPS available and can run a little faster.

Yeah, it’s a lot faster since it doesn’t have to pause all the time waiting for I/O. I don’t remember exact figures, but it was more than 3 hours on that node for sure (the node runs on an array that is also used for other things).

Awesome. That’s a nice side effect.

I’ve also manually upgraded one of my nodes on XFS to v1.104.1, and you can clearly see the difference (the upgrade happened at around 00:43):


And this is with both garbage collection and trash emptying taking place, and garbage collection was already running before the upgrade. The read operations are getting a big boost. The intermittency in the reads seems related to when data is flushed from memory, though I'm not 100% sure.

Unfortunately, I think a side effect of the bandwidth DB change is that the existing Prometheus exporter no longer gives nice bandwidth charts in Grafana.

This graph explains the improvement better than any test could.

PS: How long does it usually take for v1.104 to go live?

I don’t see that on my Grafana or storage node dashboard. To me it looks like it should continue to work just fine.

For reference I am using this exporter: GitHub - anclrii/Storj-Exporter: Prometheus exporter for monitoring Storj storage nodes

My graphs look like this:

It’s only picking up the periodic flushes. Are you using the same exporter?

No. I don’t like the idea of running a third-party tool that basically gets full access to my storage node. Instead I use the built-in metrics endpoint, which works without any additional log-scraping tool.

Ah gotcha. I will have to check that out.

In the meantime I’ll see if I can patch this.

Any hints on how to graph JSON data in Grafana? The exporter mentioned here reads the storagenode API and exposes the data for Prometheus to scrape, but what would be the best way to do the same directly with the storagenode API?

Not the storagenode API. Just the metrics endpoint with Prometheus.

Thank you, I didn’t know about that.
For anyone else wondering, it is at /metrics on the debug.addr port.
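If it helps anyone wiring this up: the debug server binds a random port unless you pin it, so a Prometheus scrape job against it might look like the following sketch. The port 5999 and the job name are assumptions, not defaults:

```yaml
# prometheus.yml (fragment) — scrape the storagenode debug endpoint.
# Port 5999 is an assumption: pin it with `debug.addr: ":5999"` in the
# node's config.yaml (or --debug.addr=:5999) so the target stays stable.
scrape_configs:
  - job_name: storagenode   # hypothetical name
    metrics_path: /metrics
    static_configs:
      - targets: ["localhost:5999"]
```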

try this suggestion: