What determines traffic?

Hello team :wink:,

So I was comparing my node’s data with a friend’s and came up with some strange (to me) results. Device 1: old PC with an Intel i5. Device 2: Raspberry Pi 4B 2GB.

Both devices run Ubuntu and have the same disk, but on the Pi (device 2) the disk goes through an adapter that adds some overhead, and the processor probably isn’t as strong (I don’t know what else could be causing this):

  1. PC: running for 2.5 months, 1.4/2.67 TB filled, ~0.56 TB/month
  2. Pi: running for 7.5 months, 2.65/2.67 TB filled, ~0.35 TB/month (it filled up about a week ago, so the numbers barely change now)
  • Note also that they have the same ISP (I mention this because it can sometimes explain p2p connection problems)
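
(For reference, the monthly rates above are just the filled space divided by the months online, e.g. with bc:)

# fill rate = TB stored / months online
echo "scale=2; 1.4/2.5" | bc     # PC -> .56 TB/month
echo "scale=2; 2.65/7.5" | bc    # Pi -> .35 TB/month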

:point_down:


  • DISK PERFORMANCE

pc:

sudo hdparm -Tt /dev/sda

/dev/sda:
 Timing cached reads:   19410 MB in  1.99 seconds = 9748.55 MB/sec
 Timing buffered disk reads: 488 MB in  3.01 seconds = 162.28 MB/sec

pi:

sudo hdparm -Tt /dev/sda

/dev/sda:
 Timing cached reads:   1526 MB in  2.00 seconds = 763.91 MB/sec
 Timing buffered disk reads: 116 MB in  3.09 seconds =  37.50 MB/sec
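
(hdparm only measures reads. As a rough complement on the write side, a dd test against the same disk could be added; a sketch, where the target path is just an example and oflag=direct bypasses the page cache so the disk itself is measured:)

# rough sequential write test on the node's data disk (path is an example)
dd if=/dev/zero of=/mnt/storj/ddtest bs=1M count=512 oflag=direct status=progress
rm /mnt/storj/ddtest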


  • SPEEDTEST

The thing is, neither internet connection can ever actually sustain these disk speeds, and ironically the PC’s connection is half as fast as the Pi’s.

pc:

Idle Latency:     4.82 ms   (jitter: 0.55ms, low: 4.53ms, high: 5.92ms)
    Download:   102.21 Mbps (data used: 47.5 MB)                                                   
                 18.26 ms   (jitter: 1.89ms, low: 6.90ms, high: 25.11ms)
      Upload:     9.99 Mbps (data used: 4.6 MB)                                                   
                 40.75 ms   (jitter: 28.48ms, low: 5.71ms, high: 497.41ms)
 Packet Loss:     0.0%

pi:

Idle Latency:     4.30 ms   (jitter: 0.24ms, low: 4.11ms, high: 4.70ms)
    Download:   190.95 Mbps (data used: 176.7 MB)                                                   
                 15.62 ms   (jitter: 5.30ms, low: 4.65ms, high: 83.59ms)
      Upload:    20.88 Mbps (data used: 25.9 MB)                                                   
 Packet Loss:     0.0%
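
(Results in this format look like the Ookla speedtest CLI’s output, so assuming that tool is installed they should be reproducible on both machines with just:)

speedtest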

  • LOGS

But the PC produces far more log lines than the Pi, and the difference here is huge:

pc:

docker logs storagenode --since 100m | wc -l
22189

pi:

docker logs storagenode --since 100m | wc -l
7302

So the PC has roughly 3 TIMES the logs of the Pi.
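
(To see what makes up that difference, the same window can be split by action; a quick sketch, where the phrases are what recent storagenode versions log and may need adjusting for other versions:)

# count log lines per action over the same 100-minute window
# 2>&1 captures both streams in case the node logs to stderr
for phrase in "upload started" "uploaded" "download started" "downloaded"; do
    printf '%-18s ' "$phrase:"
    docker logs storagenode --since 100m 2>&1 | grep -c "$phrase"
done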


ERRORS:

But I also found out that the Pi has more errors for some reason. Most are “use of closed network connection” (238 of the 246).

pc:

docker logs storagenode --tail 100000 | grep ERROR | wc -l
66

pi:

docker logs storagenode --tail 100000 | grep ERROR | wc -l
246

That’s still 180 more errors in the same 100k log lines, and as a ratio it’s almost 4 times the errors.
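
(Grouping the errors by message makes it obvious which one dominates; a rough sketch, where the sed/cut step just trims each line down to the start of the error text:)

# tally the most common ERROR messages in the last 100k log lines
docker logs storagenode --tail 100000 2>&1 | grep ERROR \
    | sed 's/.*ERROR//' | cut -c1-60 | sort | uniq -c | sort -rn | head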

Does anyone have an idea what the problem is here? Can disk speed make such a huge difference when neither internet connection can ever sustain those speeds?
What could I do to fix this?

EDIT after @Toyoo’s reply:

pc:

docker logs storagenode --since 100m | grep upload.started | wc -l
3485

pi:

docker logs storagenode --since 100m | grep upload.started | wc -l
110

We see a huge difference in this traffic too…


SUCCESS RATE SCRIPT (thanks to whoever made it):

pc:

========== AUDIT ============== 
Critically failed:     0 
Critical Fail Rate:    0.000%
Recoverable failed:    0 
Recoverable Fail Rate: 0.000%
Successful:            151 
Success Rate:          100.000%
========== DOWNLOAD =========== 
Failed:                1153 
Fail Rate:             0.717%
Canceled:              1433 
Cancel Rate:           0.891%
Successful:            158323 
Success Rate:          98.393%
========== UPLOAD ============= 
Rejected:              0 
Acceptance Rate:       100.000%
---------- accepted ----------- 
Failed:                63 
Fail Rate:             0.015%
Canceled:              745 
Cancel Rate:           0.174%
Successful:            427549 
Success Rate:          99.811%
========== REPAIR DOWNLOAD ==== 
Failed:                0 
Fail Rate:             0.000%
Canceled:              0 
Cancel Rate:           0.000%
Successful:            1270 
Success Rate:          100.000%
========== REPAIR UPLOAD ====== 
Failed:                0 
Fail Rate:             0.000%
Canceled:              0 
Cancel Rate:           0.000%
Successful:            27029 
Success Rate:          100.000%
========== DELETE ============= 
Failed:                0 
Fail Rate:             0.000%
Successful:            80619 
Success Rate:          100.000%

pi:

========== AUDIT ============== 
Critically failed:     0 
Critical Fail Rate:    0.000%
Recoverable failed:    0 
Recoverable Fail Rate: 0.000%
Successful:            168 
Success Rate:          100.000%
========== DOWNLOAD =========== 
Failed:                1154 
Fail Rate:             0.749%
Canceled:              3607 
Cancel Rate:           2.341%
Successful:            149338 
Success Rate:          96.910%
========== UPLOAD ============= 
Rejected:              0 
Acceptance Rate:       100.000%
---------- accepted ----------- 
Failed:                6 
Fail Rate:             0.015%
Canceled:              141 
Cancel Rate:           0.352%
Successful:            39865 
Success Rate:          99.633%
========== REPAIR DOWNLOAD ==== 
Failed:                0 
Fail Rate:             0.000%
Canceled:              0 
Cancel Rate:           0.000%
Successful:            9872 
Success Rate:          100.000%
========== REPAIR UPLOAD ====== 
Failed:                0 
Fail Rate:             0.000%
Canceled:              0 
Cancel Rate:           0.000%
Successful:            3338 
Success Rate:          100.000%
========== DELETE ============= 
Failed:                0 
Fail Rate:             0.000%
Successful:            24305 
Success Rate:          100.000%
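
(For anyone curious, scripts like this essentially just tally log lines per outcome; a minimal sketch of the idea, with grep patterns that are illustrative and may not match every storagenode version:)

# the gist of a success-rate script: tally outcomes from the log
logs=$(docker logs storagenode 2>&1)
dl_ok=$(echo "$logs" | grep '"GET"' | grep -c downloaded)
dl_fail=$(echo "$logs" | grep '"GET"' | grep -c "download failed")
echo "downloads ok/failed: $dl_ok/$dl_fail"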

@BrightSilence more lost races from the PC?
TRASH:

pc: 47.26 GB
pi: 22.81 GB

It seems trash is bigger on the PC, but looking at the stats above it doesn’t seem like the PC is losing more races…
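
(For anyone wanting to reproduce the trash numbers, the trash folder sits inside the node’s storage directory; a sketch, where the mount point is just an example:)

# trash size on disk; adjust the path to your node's storage location
du -sh /mnt/storj/storage/trash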

A lot of traffic is caused by short-lived data. Take blockchain snapshots, for example: they get downloaded many times within a relatively short period, and after a week or so the snapshot is replaced with a new one.

From a storage node perspective, that means a node with free space will get more of this short-lived data than a node that is full and can’t accept new pieces. That is probably the reason why your Pi has less download traffic.

In the last 3 months, ingress has increased significantly, so you should compare the TB added in the same period on both nodes; the average differs a lot between time periods. Back in spring 2021 the added data was approx. 200-250 GB/month; last month it was 660 GB.

Yeah, maybe this kind of explains it, but what was even stranger is that downloads on the Pi were still higher than uploads…

@snorkel You are right about the average, but still, as we saw in the last 100 minutes on both nodes, the PC has way more downloads as well as uploads. @littleskunk explained this a bit, but even now the PC is filling up at a much faster pace.

What about the higher error count on the Pi, though? Is there any chance this is a problem with my hardware? Because the internet connection is really fast and stable…


Yes, the Pi will lose a few more races because of the hardware disadvantage. You could compare upload and download success rates; you would have to count log lines or get the numbers from the debug endpoint.
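
(Side note on the debug endpoint route: assuming the node was started with a debug address configured, e.g. --debug.addr=127.0.0.1:5999 where the address and port are just examples, the counters could be pulled with curl; the /mon/stats path here comes from the monkit debug handler and may differ by version:)

# pull transfer-related counters from the node's debug endpoint
# (address, port, and path are assumptions; use whatever debug.addr is set to)
curl -s http://127.0.0.1:5999/mon/stats | grep -iE 'upload|download'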

Ingress should be roughly the same between your nodes. Compare docker logs storagenode --since 100m | grep upload.started | wc -l (unless, by some chance, someone has another node within the same IPv4 block as one of your nodes). Run these commands at exactly the same time for the best comparison, so that the logs analysed for both nodes cover the same time period.

Egress will depend on the amount and type of data already stored, so it’s natural that this metric will differ between your nodes. Compare docker logs storagenode --since 100m | grep download.started | wc -l.
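
(Counting both in a single pass over the log also guarantees the two numbers come from exactly the same window; a small sketch:)

# count uploads and downloads started in one pass over the same log window
docker logs storagenode --since 100m 2>&1 | awk '
    /upload started/   { up++ }
    /download started/ { dn++ }
    END { print "uploads started:   " up+0; print "downloads started: " dn+0 }'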

Various success rate scripts would be a better way to compare error rates. Your attempt at comparing errors depends too much on the differences in egress traffic between the nodes.

EDIT: actually, given that one of your nodes is full, you won’t be getting ingress there. I noticed this part of your description only after typing the post.

It’s iffy… At this point I would only say that low success rates indicate a problem/bottleneck. High ones don’t ensure you actually get to keep the data: transfers often finish but are still not among the top 80 pieces, and those get cleaned up by garbage collection later. Don’t assume high success rates mean your node is performing well.


OK, I did both and updated the initial post accordingly, but as @BrightSilence said, the same success rate doesn’t mean much when one node has half the traffic (requests).

I still can’t understand: can this really be it, the hardware?
I was expecting the internet connection to be the bottleneck here, since we can clearly see that neither of the two connections ever sustains the disk speeds, and the Pi’s connection is way better…
What would happen if I swapped their locations? Would the PC get 4 times the traffic of the Pi?…

Everyone here says a Raspberry Pi 4B is strong enough and even strongly recommended for this network, but here we see that the possible extra gain from a regular computer, which costs roughly the same as building a Pi setup, is very significant.

It’s possible, but we have no good way to measure it. Could you check and compare the size of the trash on each node? That could indicate more data being moved to trash for “lost races” even though the transfer finished.

Though you mentioned the Pi node is now full, so it may be too late to use this as an indicator.

Thanks for that; I updated my answer again. But lost races don’t appear as errors?
Because now the errors on the Pi are far more numerous even in absolute count, not just as a percentage as I wrote at first.
So the PC still seems to be performing better…

Some do, some appear as failed, but many succeed before they can be cancelled.

This is expected if the Pi is now full. These numbers don’t mean much anymore if the Pi has been full, and thus not receiving data, for the past weeks.
