Raspberry Pi 3/4 Node Owners - Did you do any optimizations?

Wow! Exactly the same setup as mine and a completely different result!
Then I'd say it's just losing the race, pure and simple.
Even the internet connection is the same; the only difference is that my location is the Netherlands!

I did another experiment this evening. After watching the fairly steady upward trend in used disk space for several hours, I stopped the node, reconnected the disk to the PC, and ran the node there for another 2 hours. Then I moved the disk back and started the RPi node again.

Success rates were ≈10-15% for the RPi and 60% for the PC. However, the used disk space grew at roughly the same rate with both configurations!

My conclusion is that the RPi is just a little bit slower at handling tiny Storj pieces and thus loses many races where a few extra milliseconds of latency are critical.
OTOH, for large pieces, location and network connection seem to matter more.
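
In case anyone wants to reproduce the comparison: one way to estimate the success rate is to count won vs. lost uploads in the node log. A minimal sketch, assuming the usual log wording ("uploaded", "upload canceled", "upload failed"); those match strings are assumptions about the log format, so adjust them to whatever your log actually prints:

```python
# Rough upload success-rate counter for a storagenode log file.
# The matched phrases ("piecestore", "uploaded", "upload canceled",
# "upload failed") are assumptions about the log format -- adjust to taste.
import sys

def upload_success_rate(path):
    ok = lost = 0
    with open(path, errors="replace") as f:
        for line in f:
            if "piecestore" not in line:
                continue
            if "uploaded" in line:
                ok += 1
            elif "upload canceled" in line or "upload failed" in line:
                lost += 1
    total = ok + lost
    return ok, lost, (100.0 * ok / total if total else 0.0)

if __name__ == "__main__":
    ok, lost, rate = upload_success_rate(sys.argv[1])
    print(f"uploads won: {ok}, lost: {lost}, success rate: {rate:.1f}%")
```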

WDYT?

Anyone got thoughts on Pi 4 32-bit vs 64-bit Storj performance? (Try to stick to actual Storj issues and not 64-bit trolling :slight_smile: )

This is a good question. I never tested it, because there is no official 64-bit Raspbian at the moment. I know there are some other OSes, but I don't know if they are up to date and run as smoothly as Raspbian.

I don’t understand the debug numbers:
current: 0, highwater: 5, success: 79, errors: 969, panics: 0
error grpc_Internal: 969
success times:
0.00: 14.229741ms
0.10: 24.029575ms
0.25: 32.821467ms
0.50: 37.695758ms
0.75: 3.234962688s
0.90: 6.0720128s
0.95: 7.82956375s
1.00: 1m17.300383744s
avg: 4.641647216s
ravg: 3.65233536s
failure times:
0.00: 14.415185ms
0.10: 16.974203ms
0.25: 33.165115ms
0.50: 35.168242ms
0.75: 1.515124096s
0.90: 6.091244185s
0.95: 7.093505305s
1.00: 18.997825536s
avg: 2.272000783s
ravg: 1.553624064s
From a Pi 4, 32-bit, 2GB, SSD, 30/10M internet.
Are these the numbers you would use to make a histogram?
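
For what it's worth, the left column is a percentile (0.00 = fastest observation, 0.50 = median, 1.00 = slowest) and the right column is the time at that percentile, i.e. these are quantiles of the upload-time distribution. Rather than a histogram, they plot naturally as an empirical CDF. A minimal sketch with matplotlib, using the success times from the block above:

```python
# The debug values are quantiles: the left column is the percentile
# (0.00 = fastest, 1.00 = slowest), the right column the time at that
# percentile.  Plotting percentile vs. time gives an empirical CDF.
import matplotlib.pyplot as plt

# success times from the Pi 4 post above, converted to seconds
quantiles = [0.00, 0.10, 0.25, 0.50, 0.75, 0.90, 0.95, 1.00]
success_s = [0.0142, 0.0240, 0.0328, 0.0377, 3.235, 6.072, 7.830, 77.300]

plt.plot(success_s, quantiles, marker="o")
plt.xscale("log")                  # the tail is orders of magnitude longer
plt.xlabel("upload time (s)")
plt.ylabel("fraction of uploads completed")
plt.title("Empirical CDF of successful upload times")
plt.show()
```

The log scale makes the split visible: roughly half of the successful uploads finish in tens of milliseconds, while the slower half stretches out to many seconds.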

Normal distribution…

Wikipedia is not the best resource to start with in order to get a general idea…

Here’s a random financial website which seems to be a bit easier to understand.

So I dug out my old node from weeks ago… Core 2 Quad, 4GB RAM, 3TB HDD, WiFi to 30/10 internet.
The 0.00 and 0.10 times are higher at 40 ms and 50 ms, but the success rate is also higher, at about 15%.

This is interesting. I'm running a Ryzen 2400, 16GB DDR4, and a USB 3.0 HDD, and my numbers look almost as “bad” as the RPi ones (I have a ~14% success rate):

[7000443944203516876] storj.io/storj/storagenode/piecestore.(*Endpoint).doUpload
  parents: 3150330293024660781
  current: 0, highwater: 13, success: 31303, errors: 182361, panics: 0
  error grpc_Internal: 182334
  error grpc_InvalidArgument: 23
  error grpc_Unauthenticated: 4
  success times:
    0.00: 3.817766ms
    0.10: 5.15771ms
    0.25: 27.967726ms
    0.50: 618.825568ms
    0.75: 2.235812928s
    0.90: 4.055611776s
    0.95: 6.890583398s
    1.00: 12.941505536s
    avg: 2.172078896s
    ravg: 1.617695744s
  failure times:
    0.00: 1.46311ms
    0.10: 3.996434ms
    0.25: 24.032765ms
    0.50: 292.378016ms
    0.75: 1.261809184s
    0.90: 3.041646566s
    0.95: 4.01124631s
    1.00: 12.3474944s
    avg: 2.431997027s
    ravg: 1.123468544s

However, my internal HDD doesn't have a higher upload success rate, so it isn't the HDD connection that causes the problem.

I like that failure time of 1.46 ms!

Sorry, everyone else was quicker!?

You’re misinterpreting the debug output.

A few failures at 1.46 ms give a maximum client range of:

(3E+8*1.46E-3)/1000/2 = 219 km

However, the average failure time is way up at 2.432 seconds, which gives:

(3E+8*2.432)/1000/2 = 364,800 km

And most failures are sitting within 1.123 seconds of that 2.432-second average. So, the expected range of failure times for data pieces in @kevink's debug data is:

2.432 ± 1.123 seconds

or

1.309 to 3.555 seconds

1.309 seconds is more than enough time for several round trips around the Earth…
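
The same back-of-the-envelope arithmetic as a small sketch (same assumptions as above: the signal travels at the speed of light, c = 3E+8 m/s, and we divide by 2 for the round trip):

```python
# Upper bound on client distance from a measured round-trip time:
# distance_km = (c [m/s] * time [s]) / 1000 [m -> km] / 2 [round trip]
C = 3e8  # speed of light in m/s

def max_distance_km(round_trip_s, divisor=2):
    return C * round_trip_s / 1000 / divisor

print(max_distance_km(1.46e-3))   # ~219 km      (fastest failure)
print(max_distance_km(2.432))     # ~364,800 km  (average failure)

# Expected spread of failure times in the data above:
avg, ravg = 2.432, 1.123
print(f"{avg - ravg:.3f} s to {avg + ravg:.3f} s")   # 1.309 s to 3.555 s
```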

Thanks for the explanation!
That makes me wonder why my powerful machine has such a low success rate although it has plenty of time (2.432 ± 1.123 seconds…).
My internet connection is 1000 Mbit down, so it certainly isn't the connection speed, and obviously not the latency to the uploader… What could it be?

This is determined by the relative speed of other nodes in the same geographical area which are also within the same WAN subnet.

So, it’s possible for something like 2 or 3 very fast nodes to compete for data pieces coming from literally halfway around the Earth… The relative difference in speed of those 2 or 3 nodes might be very tiny… on the order of microseconds. However, the failure time is sitting up at 2 seconds due to the absolute distance to the client.

This is most likely the reason.
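
To make that concrete, here is a toy simulation (my own illustration, not the network's actual node-selection or long-tail-cancellation logic): each candidate node's finish time is the shared round trip to a distant client plus its own processing time, and only the fastest node keeps the piece.

```python
# Toy model of an upload race: each candidate node's finish time is the
# client round trip plus its own processing time; only the fastest k keep
# the piece.  This is an illustration, not the actual node-selection logic.
import random

def race(nodes, rtt_s, winners=1):
    # nodes: {name: mean processing time in seconds}
    times = {name: rtt_s + random.gauss(proc, proc * 0.1)
             for name, proc in nodes.items()}
    ranked = sorted(times, key=times.get)
    return ranked[:winners]

random.seed(1)
neighborhood = {"fast_node": 0.010, "slightly_slower_node": 0.012}
wins = {n: 0 for n in neighborhood}
for _ in range(10_000):
    for w in race(neighborhood, rtt_s=2.0):   # client is far away: 2 s RTT
        wins[w] += 1
print(wins)   # the 2 ms processing gap still decides most races
```

The 2-second round trip is identical for both nodes, so it never changes who wins; the 2 ms processing gap does, which is why a node can lose most races despite having "plenty of time".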

As I said in the original post, my internal HDD has the same success rate, so it definitely is not the USB 3.0 HDD that is the reason.
My internal HDD is a ZFS RAID1 of 4TB WD Reds.

The relative speed of your node may or may not have anything to do with your HDD connection.

However, my best guess is that if two nodes within the same WAN subnet were running right next to one another, with the same processor and other hardware except for the HDD connection, the internal SATA-connected HDD would win most of the data pieces offered vs. a USB 3.0-connected drive.

So your bias is wrong and you should adjust it based on the evidence?
In your formula, could you explain the working? I guess the divide by 2 is for there and back?

I don’t understand your comment.

The debug data are statistical measures. The absolute maximum straight-line distance is the speed of light multiplied by the measured failure time, divided by 2… dividing by 4 is probably a better guesstimate.

However, my point is that there is no point in posting the minimum failure time… since the number of clients connecting at that speed/distance is very small and not representative of the statistical success/failure times of the node in question.

OK, thanks. I had thought that there would be more to it.
I mean to say: we think fast nodes are better, but it really mostly comes down to distance from the client (so long as you aren't really, really slow)?

It’s multivariate…

Absolute distance is important, but so is relative node speed within one's own local neighborhood of nodes.

Of course, with an average failure time of around 2 seconds, the local neighborhood seems to be more important than the absolute distance… which is a bit counter-intuitive, and not what I expected when I first started running a node.

Did any of the Raspberry Pi owners try to change this setting?

# Maximum number of simultaneous transfers
storage2.max-concurrent-requests: 7

Maybe here we can find a good value for some Raspberry Pi nodes.
I know the value for one node is not necessarily the best for another, but maybe someone has had good results with this, so that others can play with it too.
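
For anyone who wants to experiment: the value is set in the node's config.yaml, and the node needs a restart to pick up the change (e.g. docker restart storagenode, assuming the standard container name from the setup docs; adjust for your install). Just as a sketch:

```yaml
# config.yaml (restart the node after changing this)
# Maximum number of simultaneous transfers
storage2.max-concurrent-requests: 7
```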

I can try, but I don't think it would make much of a difference.
I've played with it in the past, with no major changes in traffic performance.
