Benchmark nodes

How? I have a node in my home. I buy a VPS and I want to simulate a file request to my home node and check the speed of the entire operation.

Easy

  • Go to speedtest.net
  • click on "change server"
  • select a random server by searching for a city. As a European I searched for California and clicked on Wyyerd Fiber
  • Get random benchmark numbers that don't mean a lot
  • ???
  • profit :wink:
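
If you want a number that at least involves your own endpoints rather than a random speedtest server, one rough option is to time a plain download from a machine you control. Below is a minimal Go sketch, assuming you temporarily serve a test file from home; the URL is a placeholder, and this still only measures raw bandwidth between two points, not how a node performs in races.

```go
// rough_throughput.go: time a plain HTTP download and report MB/s.
// The URL is a placeholder for a test file you host on your home connection;
// this measures raw end-to-end bandwidth only, not node performance in races.
package main

import (
	"fmt"
	"io"
	"net/http"
	"time"
)

func main() {
	const url = "http://my-home-ip:8080/testfile.bin" // placeholder, adjust to your setup

	start := time.Now()
	resp, err := http.Get(url)
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()

	// Count the bytes but throw them away; we only care about timing.
	n, err := io.Copy(io.Discard, resp.Body)
	if err != nil {
		panic(err)
	}

	elapsed := time.Since(start)
	fmt.Printf("downloaded %d bytes in %s (%.2f MB/s)\n",
		n, elapsed, float64(n)/elapsed.Seconds()/1e6)
}
```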

Is it an arbitrary number that does not say a lot? Yes, but the same would be true if you ran a speedtest against my node.

You're going to test from that city to your PC. Yes, it's easy, but I asked something different.
I need to test the Storj node too.

City to your PC is what you are asking for, minus CPU and HDD.

CPU is easy, look at the usage. It is simply not there.
HDD is easy, look at the usage. It is simply not there. If you really want to go further, use a benchmark tool like CrystalDiskMark. I asked before on the forum what realistic benchmark settings would be but got no answers. So I chose a worst-case setting and still got 30 Mbit/s on my shitty SMR drive, which again exceeds realistic write speeds by far.
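
If CrystalDiskMark is not available (e.g. on Linux), even a crude sequential-write test gives a ballpark number. A minimal Go sketch with an assumed mount point and arbitrary sizes; nothing like CrystalDiskMark's mixed workloads, just enough to see whether the drive is anywhere near being a bottleneck:

```go
// seqwrite.go: crude sequential-write benchmark.
// Writes a 1 GiB file in 1 MiB chunks and reports throughput.
// Path and sizes are arbitrary; point it at the node's storage drive.
package main

import (
	"fmt"
	"os"
	"time"
)

func main() {
	const (
		path      = "/mnt/storagenode/benchfile.tmp" // assumed mount point
		chunkSize = 1 << 20                          // 1 MiB
		chunks    = 1024                             // 1 GiB total
	)

	f, err := os.Create(path)
	if err != nil {
		panic(err)
	}
	defer os.Remove(path)
	defer f.Close()

	buf := make([]byte, chunkSize)
	start := time.Now()
	for i := 0; i < chunks; i++ {
		if _, err := f.Write(buf); err != nil {
			panic(err)
		}
	}
	if err := f.Sync(); err != nil { // make sure data actually hit the disk
		panic(err)
	}
	elapsed := time.Since(start)

	total := float64(chunkSize * chunks)
	fmt.Printf("wrote %.0f MiB in %s (%.1f MiB/s)\n",
		total/(1<<20), elapsed, total/(1<<20)/elapsed.Seconds())
}
```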

Another option would be to use the success rate script to see if your node fails downloads or uploads. If that is not the case, you can stop looking for bottlenecks/errors.
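
The success rate script is essentially a log grep. A toy Go version of the same idea for uploads is below; the matched phrases are assumptions about typical storagenode log messages and may need adjusting to your log path, format and version:

```go
// successrate.go: toy version of the success rate script for uploads.
// The matched phrases are assumptions about storagenode log messages
// and may need adjusting for your log format / version.
package main

import (
	"bufio"
	"fmt"
	"os"
	"strings"
)

func main() {
	f, err := os.Open("/var/log/storagenode.log") // adjust to your setup
	if err != nil {
		panic(err)
	}
	defer f.Close()

	var ok, canceled, failed int
	sc := bufio.NewScanner(f)
	sc.Buffer(make([]byte, 1024*1024), 1024*1024) // log lines can be long
	for sc.Scan() {
		line := sc.Text()
		switch {
		case strings.Contains(line, "upload canceled"):
			canceled++
		case strings.Contains(line, "upload failed"):
			failed++
		case strings.Contains(line, "uploaded"):
			ok++
		}
	}
	if err := sc.Err(); err != nil {
		panic(err)
	}

	total := ok + canceled + failed
	if total > 0 {
		fmt.Printf("uploads: %d ok, %d canceled, %d failed (%.2f%% success)\n",
			ok, canceled, failed, 100*float64(ok)/float64(total))
	}
}
```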

I know this sounds very patronizing, but I know it because I went down the same road you are going down now. We are unhappy with the numbers and are looking for bottlenecks. But there are no bottlenecks. A toaster could run a node. The only real bottleneck we have is underutilization of Storj. And we discuss a Disneyland fantasy world where we have to fine-tune our ZFS arrays with SLOG and ARC and fusion pools, because demand is so high :kissing_heart:

If you have nodes installed in a VM, even changing the VM software can improve your latency, or just changing some settings. There are multiple layers of latency in some environments.

And latency is important how? Serious question, I have no idea.

I would assume that if latency is too high, you would lose some upload races and thus not reach 100% in the success rate script.

Precisely this.

Remember also that the success rate script shows overstated values, i.e. higher than real. This is because a node might finish its work before it notices the dropped connection. In that case the logs will often still report that the data transfer finished successfully. So a high number in the success rate script does not translate into a high number of won races.

So you think that Storj writes success messages into the log before the process/transaction is finished?
If true, can that be changed to wait until it is finished, so we don't have these wrong data points in the log?

It's not me, it's the script's author who thinks so.

Whether this can be corrected for, I don't know. The simplest place to collect lost-race data would be the uplink, because that is the code that actually implements the race. But right now no communication of these results is implemented. AFAIU, in some circumstances the notification of a dropped connection might be delayed by many seconds, maybe even minutes, so trying to work around it might be difficult.

EDIT: after thinking about it a bit, it might be even more difficult. Both the uplink and the node may close the connection without knowing that the other side is also closing it. Then, even if the uplink closed the connection earlier, the node's OS may simply ignore the uplink's close because the node has already closed it as well. You'd probably need to ask someone experienced with socket libraries across different OSes to get an answer.

Not exactly, the transfer probably actually finishes. The uplink terminates all remaining running uploads the moment it has 80 successfully uploaded pieces. But there is of course latency between the uplink detecting this and the cancellation message being received by the remaining nodes. If lots of nodes are fast and roughly equally fast, they may all finish at almost the same time, before that cancellation message arrives. The node may then still end up losing the race, even though the entire transfer was completed successfully.
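
For intuition, that long-tail behaviour can be sketched as a set of parallel uploads sharing one cancellation signal. The 110/80 numbers are the ones mentioned in this thread; the rest is purely illustrative, not the actual uplink code:

```go
// longtail.go: illustration of long-tail cancellation, not actual uplink code.
// Uploads are started to many nodes; once enough pieces are stored, the
// remaining transfers are cancelled via a shared context.
package main

import (
	"context"
	"fmt"
	"math/rand"
	"sync"
	"time"
)

// uploadPiece simulates one piece transfer; a real uplink would stream data here.
func uploadPiece(ctx context.Context) bool {
	d := time.Duration(50+rand.Intn(100)) * time.Millisecond
	select {
	case <-time.After(d):
		return true // piece stored on the node
	case <-ctx.Done():
		return false // cancellation arrived before the transfer finished
	}
}

func main() {
	const offered, needed = 110, 80 // numbers mentioned in this thread

	ctx, cancel := context.WithCancel(context.Background())
	defer cancel()

	var (
		mu       sync.Mutex
		finished int
		wg       sync.WaitGroup
	)
	for i := 0; i < offered; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			if uploadPiece(ctx) {
				mu.Lock()
				finished++
				if finished == needed {
					cancel() // enough pieces: cancel the long tail
				}
				mu.Unlock()
			}
		}()
	}
	wg.Wait()
	fmt.Printf("%d nodes finished their piece, %d were needed\n", finished, needed)
}
```

Typically more than 80 of these goroutines report success, because some finish in the window between the 80th piece completing and the cancellation reaching them; that is exactly the case of a node finishing its transfer yet still losing the race.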

Basically, if the script shows low numbers (<95%) you are losing more races than average… but if the scores are high… you still might be… we just can't know for sure. So low = problem, high = who knows…

Uplink knows. :stuck_out_tongue:

True, but there is no upside for the customer in adding extra communication with nodes to report that data back. And you want to keep any additional overhead as low as possible during transfers. So I doubt we're going to get that information reflected on our nodes.

Indeed! But the uplink already passes this information to the satellite (for accounting reasons), so the satellite could tell the node which transfers it won or lost. Or, well, at least provide node operators with some aggregated information via some web interface.

Hmmm… this might be a strictly better option than the one in post #15.

It's an interesting idea, but I'm pretty sure the satellite doesn't store the nodes it has offered to the uplink for upload, and the uplink only sends back those that finished. So it would require storing interrupted transfers somewhere as well, and also a heavy query to calculate percentages for each node. Probably too much overhead on the satellite DBs for Storj to consider.

Eh, the minimum necessary would be to collect four more integers counting events per node: attempted downloads, successful downloads, attempted uploads, successful uploads. So, let's say, whenever the satellite increments a node's bandwidth consumption, it would also increment the success counter.

Then it would be a matter of periodically sending these counters, just like data for payments is sent now.

I have some experience working with telecom performance monitoring systems and we routinely processed billions of counters like that on almost commodity hardware. 60k counters is nothing.

Sure, it would be great to have a more detailed view (e.g. per customer's area), but this would go a long way already.
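
Just to make the suggestion concrete, the per-node state could be as small as four integers. A sketch (not satellite code) of what collecting them might look like:

```go
// racecounters.go: sketch of the four per-node counters suggested above.
// Not satellite code; only meant to show how little state is involved.
package main

import (
	"fmt"
	"sync"
)

type TransferStats struct {
	AttemptedDownloads  uint64
	SuccessfulDownloads uint64
	AttemptedUploads    uint64
	SuccessfulUploads   uint64
}

type StatsCollector struct {
	mu    sync.Mutex
	nodes map[string]*TransferStats // keyed by node ID
}

func NewStatsCollector() *StatsCollector {
	return &StatsCollector{nodes: make(map[string]*TransferStats)}
}

// RecordUpload bumps the attempt counter and, on success, the success counter.
// A RecordDownload method would look the same for the other two counters.
func (c *StatsCollector) RecordUpload(nodeID string, success bool) {
	c.mu.Lock()
	defer c.mu.Unlock()
	s, ok := c.nodes[nodeID]
	if !ok {
		s = &TransferStats{}
		c.nodes[nodeID] = s
	}
	s.AttemptedUploads++
	if success {
		s.SuccessfulUploads++
	}
}

func main() {
	c := NewStatsCollector()
	c.RecordUpload("node-abc", true)
	c.RecordUpload("node-abc", false)
	fmt.Printf("%+v\n", *c.nodes["node-abc"])
}
```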

I think benchmarks from different regions would be interesting to clients, so that speeds are not only tested from the US and some EU countries, but there are test results from every country in the world.
I understand that the numbers will change over time, but they would give people some measure of what to expect. Also, if every uplink transfer were logged by the satellite, that data would stay accurate all the time. This would give Storj Labs very good information about where improvements are needed.

That may be worse than just dumping the cancelled transfers in a table. What you're suggesting involves updating 110 records for each transfer. And because it's an update, that would require an index seek to find the record, locking the record, updating the counter and releasing the record. All the while transfers are highly parallel and multiple processes might want to update the same record at the same time, leading to lock contention. And the worst part is that this then has to be implemented as part of dealing with a data transfer, instead of in a separate process.

The satellite updates bandwidth usage based on bandwidth orders sent by the node in batches. Exactly to avoid the kind of updates I described above.
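
That batching pattern could look roughly like the sketch below: accumulate deltas in memory per node and apply them to the database in one periodic write, instead of one UPDATE per piece. This is only an illustration, not the actual satellite implementation:

```go
// batcher.go: sketch of batching counter updates instead of one UPDATE per piece.
// The flush callback stands in for a single batched DB write (e.g. a multi-row upsert).
package main

import (
	"fmt"
	"sync"
	"time"
)

type Batcher struct {
	mu      sync.Mutex
	pending map[string]int64 // nodeID -> delta accumulated since the last flush
	flush   func(map[string]int64)
}

func NewBatcher(flush func(map[string]int64), interval time.Duration) *Batcher {
	b := &Batcher{pending: make(map[string]int64), flush: flush}
	ticker := time.NewTicker(interval) // never stopped in this sketch
	go func() {
		for range ticker.C {
			b.Flush()
		}
	}()
	return b
}

// Add is cheap: it only touches an in-memory map under a mutex.
func (b *Batcher) Add(nodeID string, delta int64) {
	b.mu.Lock()
	b.pending[nodeID] += delta
	b.mu.Unlock()
}

// Flush swaps out the pending map and hands it to a single writer.
func (b *Batcher) Flush() {
	b.mu.Lock()
	batch := b.pending
	b.pending = make(map[string]int64)
	b.mu.Unlock()
	if len(batch) > 0 {
		b.flush(batch)
	}
}

func main() {
	b := NewBatcher(func(batch map[string]int64) {
		fmt.Printf("flushing %d node rows in one batched write\n", len(batch))
	}, time.Second)

	// Simulate many transfers touching up to 110 nodes each: still one DB write per flush.
	for i := 0; i < 1000; i++ {
		b.Add(fmt.Sprintf("node-%03d", i%110), 1)
	}
	b.Flush()
}
```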

Ok, you might be right on the technical details; I don't know the code well enough. Still, the scale doesn't look anywhere close to the point where it would be a problem.

I can only invite everyone interested to try to implement these suggestions and create a PR on our GitHub. It could be a nice addition from the Community!

Would it convince you if we could probabilistically update them, e.g. once every 100 transfers? That should give good enough accuracy while limiting overhead.
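
In case it helps the discussion, the sampling idea could be as simple as the sketch below: only about one in every N transfers touches the stored counter, and reads scale the value back up. N = 100 matches the number above; the estimate's accuracy improves with traffic volume.

```go
// sampledcounter.go: sketch of the "update once every 100 transfers" idea.
// Each event increments the stored counter with probability 1/sampleRate;
// reads scale the stored value back up. Accuracy improves with traffic volume.
package main

import (
	"fmt"
	"math/rand"
)

const sampleRate = 100 // matches "once every 100 transfers"

type SampledCounter struct {
	stored int64 // sampled increments actually written (the expensive part)
}

func (c *SampledCounter) Inc() {
	if rand.Intn(sampleRate) == 0 { // ~1% of events cause a write
		c.stored++
	}
}

func (c *SampledCounter) Estimate() int64 {
	return c.stored * sampleRate
}

func main() {
	var c SampledCounter
	const events = 1_000_000
	for i := 0; i < events; i++ {
		c.Inc()
	}
	fmt.Printf("real: %d, estimated: %d\n", events, c.Estimate())
}
```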