Bandwidth utilization comparison thread

Hmmm… ok, sooo… the difference in our egress averages might just be down to when I joined (just a few days ago).
But here is one more observation…
@TheMightyGeek has 2x more data than me (judging from his Storage Held, and assuming we all get ~100GB/day spread equally across nodes), but his egress average per TB stored is closer to the guys who have 10x more stored data.
On the other hand, Kevin has even more data than @TheMightyGeek (again 2x more), yet his egress is similar to mine…
And one more thing… if we all have similar data spread around (because one satellite is spreading most of the data now), shouldn't we all observe similar egress on that data? The question arises because @SGC is quoting a very high success rate… unless the script that averages success rates doesn't account for changes over time: if all of SGC's 11TB counts toward his success rate, the last 10 days contributed only ~10% of his storage. So running the success rate on just the last month of data could show a lower rate (because of the distance).

Hope it was clear what I was musing about above :smiley:

it's the download success rate… upload success rate is… well, basically a useless metric, because it's logged wrong which files are cancelled… but looking at our ingress, there is basically no difference unless one has bandwidth limitations or uptime issues… or such…

meaning that the success rates on uploads are, i believe, most likely very uniform across the board…
about 5% deviation on the avg ingress numbers over 7 days, when ignoring a couple of extreme values to get more accurate numbers…

and i have the highest ingress when my system is up 24/7… and well… anyone can beat a 99.4%-99.6% download success rate?… i'm sure it's quite possible my server can be beaten in many ways… it is, after all, ancient technology in the IT world… most of it is a decade old; the only newer parts are the HBAs and the SSDs, and even the SSDs are a couple of generations old, to put it mildly…
so by far not impossible to beat, but they're good numbers… not easy to beat for most people

i know why they hate me…

i had all those crashes when i was doing gpu passthrough only a week or so ago…
it's the satellites that either hate me or have done lots of repair… i'm sure that's it… has to be… i mean, i had over an hour of downtime every day for a week… that has to be it… it would make so much sense…

kinda forgot about that, it didn't really cause me much trouble, i was just trying out some upgrade stuff… dunno why i didn't think of that sooner… i guess i'm not fully used to my server's long reboot cycle; it takes like 20-25 minutes to get back online when it crashes… might have to lower my watchdog timer, as it is set to 5 minutes… but it's also nice that the system just reboots on whatever lag might arise…


yeah… my download success rate is:
========== DOWNLOAD ===========
Failed: 2
Fail Rate: 0.014%
Canceled: 20
Cancel Rate: 0.139%
Successful: 14375
Success Rate: 99.847%

My other hypothesis is that if you are far away from, let's say, europe-north, then running the success rate script on a smaller dataset (not the entire log, just a cut-off of the last 13 days) would yield different success rates than parsing the entire log.
I say that because the last 10 days gave you 1.3 TB of data vs the 10 TB you already have.
So the difference might be hard to spot when parsing the entire log, since the whole 11 TB is weighed against just the last 1.3 TB.

Just an idea :smiley:
Though the idea gets hammered when you take into account that TheMightyGeek is closer (but still farther away than me) and has only 2x more data than me (and egress similar to yours), while Kevin has 4x more data than me and egress similar to mine (I assume we are almost at the same distance from europe-north - Germany and Poland vs europe-north)…

It looks discouraging, yeah… comparing those different results…
And we will probably get frustrated when your egress ends up the highest, while we are trying to beat you on saltlake or us-central.
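In case anyone wants to test the 13-day idea, here is a minimal sketch of slicing the log by date before computing the rate. It is not the successrate.sh script itself; it assumes each storagenode log line starts with an ISO timestamp and contains the usual "downloaded" / "download canceled" / "download failed" markers, and the log path is hypothetical.

```python
# window_successrate.py - a minimal sketch (NOT the usual successrate.sh),
# assuming each storagenode log line starts with an ISO timestamp like
# "2020-07-12T10:15:30.123Z" and contains the usual "downloaded" /
# "download canceled" / "download failed" markers. Log path is hypothetical.
import sys
from datetime import datetime, timedelta, timezone

LOG_PATH = sys.argv[1] if len(sys.argv) > 1 else "node.log"
WINDOW_DAYS = 13  # the cut-off suggested above

cutoff = datetime.now(timezone.utc) - timedelta(days=WINDOW_DAYS)
ok = canceled = failed = 0

with open(LOG_PATH, errors="replace") as f:
    for line in f:
        try:
            stamp = datetime.fromisoformat(line[:19]).replace(tzinfo=timezone.utc)
        except ValueError:
            continue  # not a timestamped log line
        if stamp < cutoff:
            continue
        if "download canceled" in line:
            canceled += 1
        elif "download failed" in line:
            failed += 1
        elif "downloaded" in line:
            ok += 1

total = ok + canceled + failed
if total:
    print(f"last {WINDOW_DAYS} days: {100 * ok / total:.3f}% success "
          f"({ok} ok / {canceled} canceled / {failed} failed)")
```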

I just found out that my ingress and egress are in fact bottlenecked by the speed of my internet connection - not by much, but I think they could spike higher if I had a better connection…

========== DOWNLOAD ===========
Failed: 104
Fail Rate: 0.194%
Canceled: 239
Cancel Rate: 0.446%
Successful: 53251
Success Rate: 99.360%
This is my download success rate since the start of July. Now that I look at it, 100+ failed downloads seems kind of high. Should I be worried about that?

I would say it might be impacting your egress.
Here are two snapshots of my outgoing stream.

It is hard to see in this column chart, unfortunately; when I was on Windows I had a nice bandwidth monitor, but on headless Linux I haven't found anything that gives me something similar.

But here is what I noticed:
Light grey is outgoing traffic, dark grey incoming.
Most transfers look like the one on the left (lower left) - a large spike and then a smaller continuation.
Only rarely does it look like the one on the right (lower right) - a continuous outgoing stream.

That means that whenever the spike is higher than your upstream capacity, the transfer can get truncated.
And I've noticed some spikes of more than 2 MB/s (16 Mbit/s), maybe even 4 MB/s (32 Mbit/s); once I saw about 5 MB/s in one shot (but that was just once).
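Some back-of-the-envelope numbers for the truncation idea: a piece is on the order of a couple of MB, so here is roughly how long pushing one out occupies the uplink at various upstream speeds (the speeds are assumed values, just for illustration). If that is longer than faster nodes need, the long-tail cancellation cuts the transfer.

```python
# burst_math.py - rough sketch: how long a single ~2-4 MB piece occupies the
# uplink at a given upstream speed. The uplink speeds are assumed values.
def drain_seconds(piece_mbytes: float, uplink_mbit: float) -> float:
    """Seconds to push `piece_mbytes` MB through an `uplink_mbit` Mbit/s line."""
    return piece_mbytes * 8 / uplink_mbit

for uplink in (10, 20, 50, 100):  # assumed upstream speeds in Mbit/s
    for piece in (2.0, 4.0):      # spike sizes mentioned above, in MB
        print(f"{piece} MB @ {uplink} Mbit/s up -> "
              f"{drain_seconds(piece, uplink):.2f} s")
```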

I would have to parse my bandwidth monitor's database to see if it has any data supporting my theory, but I don't know how to do that yet :smiley: (I don't even know how it stores its data :wink:)
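I don't know which monitor that is, but if it happens to be vnStat 2.x, it keeps its counters in an SQLite database (by default /var/lib/vnstat/vnstat.db), with rx/tx byte totals per five-minute bucket. A sketch under that assumption - the table and column names here are from memory, so check the schema with `sqlite3 /var/lib/vnstat/vnstat.db .schema` before trusting the output:

```python
# vnstat_peaks.py - a guess at parsing the monitor's data, ASSUMING it is
# vnStat 2.x, which stores byte counters per 5-minute bucket in an SQLite
# database (default: /var/lib/vnstat/vnstat.db). Table/column names are from
# memory - verify against the actual schema first.
import sqlite3

DB = "/var/lib/vnstat/vnstat.db"  # default path; may differ on your system
IFACE = "eth0"                    # hypothetical interface name

con = sqlite3.connect(DB)
rows = con.execute(
    """SELECT f.date, f.tx
       FROM fiveminute AS f
       JOIN interface AS i ON f.interface = i.id
       WHERE i.name = ?
       ORDER BY f.tx DESC
       LIMIT 10""",
    (IFACE,),
).fetchall()

for date, tx_bytes in rows:
    # a 5-minute bucket only gives the average rate; spikes inside it are higher
    print(f"{date}: avg {tx_bytes / 300 / 1e6:.2f} MB/s outgoing")
```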

I'm not sure what you think the impact of that would have been. But it would basically only have an impact while you are down, or for a short while after you were down (until the satellite detects you as online again), since past behavior is currently not a factor in node selection for upload or download, unless you are disqualified or suspended.

actually… that would be more complex, because repair comes into play… increasing the number of pieces in existence and thus leading to more competition. i dunno how it works… nor how long such an effect could last…

ofc that also depends on how egress is initiated… does the customer send a request to the satellite, and the satellite then orders a certain number of storagenodes to send data to that customer?

anyway, way too many unknowns for me to explain why it would happen… but it seems like the obvious explanation… i had uptime issues… and shortly after, i notice my egress is the worst out of a few people over 10 days… sure, it could be other stuff… but no matter, time will tell if this was the cause… and then i'll be a bit wiser xD

Repairs could come into play if, while you were offline, segments your node was holding dropped below the repair threshold. It takes 4 hours before a node is marked as offline, though, and I don't think that applied to your node. But even if it did, this is what would have happened: segments that have dropped below the repair threshold are added to the repair queue. When repair is triggered, the remaining pieces on nodes other than yours are used to recreate the missing pieces, which are then uploaded to other nodes. Note that this doesn't involve your node at all. When your node comes back online, its pieces will eventually be cleaned up by garbage collection. None of this process impacts the ingress and egress on your node.

There is also time between a piece being added to the repair queue and actually being repaired, and it's highly likely your node was back online before the repair ever happened. So no, I highly doubt this had any impact on your node. And even if it did, it wouldn't be a detectable impact.

If there were a detectable impact, you would also see a significant increase in trash over the next few days.
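A toy sketch of the repair decision described above; the thresholds are made-up illustrative values, not Storj's actual Reed-Solomon parameters:

```python
# repair_flow.py - toy model of the repair path described above. The thresholds
# are made-up illustrative values, NOT Storj's actual RS parameters.
MIN_PIECES = 29        # pieces needed to reconstruct a segment (assumed)
REPAIR_THRESHOLD = 35  # below this, the segment is queued for repair (assumed)

def check_segment(online_pieces: int) -> str:
    """Classify a segment by how many of its pieces are on online nodes."""
    if online_pieces < MIN_PIECES:
        return "lost"
    if online_pieces < REPAIR_THRESHOLD:
        # Repair reads MIN_PIECES pieces from the remaining *online* nodes,
        # regenerates the missing pieces, and uploads them to *new* nodes.
        # The offline node is never contacted; when it comes back, its now
        # redundant pieces are later removed by garbage collection.
        return "queued for repair"
    return "healthy"

print(check_segment(34))  # "queued for repair" - without touching the offline node
```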

A list of nodes (I think around 39) is sent back to the uplink/customer, which then initiates the download.
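For context, a sketch of that flow and why it produces "canceled" lines in node logs; the node count of 39 is from the post above, while the threshold of 29 and the latencies are assumptions for illustration:

```python
# long_tail.py - sketch of the download flow described above: the satellite
# returns a node list, the uplink races the nodes and keeps only the fastest
# responses. The latencies and the threshold of 29 are assumed values.
import random

NODES_OFFERED = 39   # per the post above
PIECES_NEEDED = 29   # assumed number of pieces required to reconstruct

nodes = [(f"node{i}", random.uniform(0.05, 2.0)) for i in range(NODES_OFFERED)]
nodes.sort(key=lambda t: t[1])   # fastest responders first

winners = nodes[:PIECES_NEEDED]  # these log "downloaded"
losers = nodes[PIECES_NEEDED:]   # these log "download canceled"

print(f"{len(winners)} successful, {len(losers)} canceled on this download")
```

Which is also why a distant or slow node would see a lower success rate: it lands in the canceled tail more often.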

I think the first part implies you don't know enough to make the statement in the second part. :slight_smile: I don't know what caused it, but I don't see any way it could have been the downtime.

didn't think it would delete pieces. i mean, doesn't the network lose pieces all the time? and if it costs to regenerate them, why not keep extras if they are nearly free compared to repair being a, what, 6.6 times more expensive process? ofc that requires knowing the cost vs benefit of it…

it just seemed like the obvious, most performance-friendly way to do it… if pieces are continually lost, then having a few extras should work just fine if each piece is unique… ofc that also depends on how it works, because the generation of new pieces may invalidate the old ones…

but i'm sure you have a better idea about this than i do… lol

didn't think about the 4-hour thing either… but yeah, essentially that should mean i didn't see any detrimental effects from it… which i dunno

yeah, doesn't look like it affected it… keep in mind this graph is very averaged and won't be 100% storj data… just like 90-98%, which is why it has a deviation here and there… but from about the 28th of last month until now it's storj data plus 100-300 KB/s of other in and out traffic…

but yeah, no major changes… one thing i dunno if it has any meaning is node age; it also comes up when i check the egress-to-stored ratio…

my node was created on the 8th, i believe… and on the 8th i also see the egress start to rise…
doesn't really mean anything… because i'm also being beaten by a guy with a 12-day-old node… so yeah, most likely not very relevant… but i dunno… i just don't understand why i would get so much lower egress so consistently for so long… it's like there is some factor to blame for it

@BrightSilence as you can see, my egress-to-stored ratio is consistently lower… dunno if it's just random chance… or maybe related to node size… or something else…

(ingress and egress in GB/day, stored in TB; the ‰ figure is egress divided by stored - see the snippet after the table)

July 3rd
dragonhogan   - ingress 17.75  - egress 19.58 = 1.89 ‰ of stored 10.31 TB
Mark          - ingress 18.17  - egress 3.14  = unknown
SGC           - ingress 17.40  - egress 19.34 = 1.77 ‰ of stored 10.9 TB
striker43     - ingress 25.2   - egress 36.93 = 3.62 ‰ of stored 10.2 TB
(had to extrapolate here, so striker43's numbers are highly inaccurate)

July 4th
Mark          - ingress 57.97  - egress 3.14  = unknown
SGC           - ingress 55.32  - egress 23.58 = 2.16 ‰ of stored 10.9 TB (still with stability issues)
kevink        - ingress 48.05  - egress 9.73  = 2.86 ‰ of stored 3.4 TB approx
(seems slightly off, but he has been changing/tinkering with his number of nodes)
striker43     - ingress 57.75 (i will assume Mark and striker43 are the two accurate ingress numbers here)
dragonhogan   - ingress 57.03  - egress 22.63 = 2.20 ‰ of stored 10.31 TB approx

July 5th
striker43     - ingress 105.81
SGC           - ingress 104.20 - egress 32.39 = 2.97 ‰ of stored 10.9 TB
kevink        - ingress 106.07 - egress 17.61 = 5.18 ‰ of stored 3.4 TB approx
Krystof       - ingress 106.15 - egress 38.24 = 3.55 ‰ of stored 11.85 TB
dragonhogan   - ingress 101.62 - egress 34.74 = 3.37 ‰ of stored 10.31 TB approx
TheMightyGeek - ingress 105.45 - egress 6.27  = 5.22 ‰ of stored 1.2 TB

July 6th
striker43     - ingress 112.83 - (missing egress)
dragonhogan   - ingress 108.38 - egress 34.33 = 3.32 ‰ of stored 10.31 TB
TheMightyGeek - ingress 113.09 - egress 6.87  = 5.72 ‰ of stored 1.2 TB
SGC           - ingress 108.71 - egress 32.64 = 2.93 ‰ of stored 11.11 TB
zagg          - ingress 108.77 - egress 6.52  = 4.93 ‰ of stored 1.32 TB

July 7th
TheMightyGeek - ingress 109.30 - egress 6.06  = 5.0 ‰ of stored 1.3 TB
SGC           - ingress 109.96 - egress 29.61 = 2.6 ‰ of stored 11.21 TB
kevink        - ingress 109.60 - egress 15.69 = 4.6 ‰ of stored 3.4 TB approx
dragonhogan   - ingress 104.21 - egress 29.11 = 2.8 ‰ of stored 10.31 TB
striker43     - ingress 109.76 - egress 25.58 = 2.5 ‰ of stored 10.2 TB

July 8th
TheMightyGeek - ingress 113.82 - egress 7.13  = 5.50 ‰ of stored 1.4 TB
SGC           - ingress 114.14 - egress 36.49 = 3.20 ‰ of stored 11.33 TB
dragonhogan   - ingress 108.71 - egress 41.79 = 4.03 ‰ of stored 10.35 TB
striker43     - ingress 114.25 - egress 36.31 = 3.52 ‰ of stored 10.31 TB

July 9th
dragonhogan   - ingress 98.57  - egress 67.20 = 6.41 ‰ of stored 10.48 TB
SGC           - ingress 103.8  - egress 42.75 = 3.75 ‰ of stored 11.39 TB
TheMightyGeek - ingress 102.65 - egress 7.13  = 6.89 ‰ of stored 1.49 TB
kevink        - ingress 102.73 - egress 45.78 = 12.71 ‰ of stored 3.60 TB
(exceeded 1% - congratz - i kinda feel a bit left out egress-wise lol)
JoshGarza     - data inconclusive

July 10th
dragonhogan   - ingress 107.53 - egress 67.20 = 6.75 ‰ of stored 10.47 TB
TheMightyGeek - ingress 105.78 - egress 7.13  = 6.81 ‰ of stored 1.6 TB
kevink        - ingress 109.26 - egress 47.75 = 12.76 ‰ of stored 3.74 TB
SGC           - ingress 109.89 - egress 47.02 = 4.07 ‰ of stored 11.54 TB

July 11th
SGC           - ingress 118.39 - egress 59.78 = 5.13 ‰ of stored 11.64 TB
TheMightyGeek - ingress 117.14 - egress 11.09 = 6.41 ‰ of stored 1.73 TB
kevink        - ingress 117.66 - egress 44.19 = 11.5 ‰ of stored 3.84 TB
dragonhogan   - ingress 115.75 - egress 73.64 = 6.97 ‰ of stored 10.56 TB

July 12th
SGC           - ingress 114.58 - egress 71.56 = 6.09 ‰ of stored 11.75 TB
TheMightyGeek - ingress 113.97 - egress 11.2  = 6.15 ‰ of stored 1.82 TB
kevink        - ingress 113.68 - egress 37.18 = 9.40 ‰ of stored 3.95 TB
dragonhogan   - ingress 111.00 - egress 68.49 = 6.41 ‰ of stored 10.68 TB
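For anyone reproducing the per-mille column: since egress is in GB and stored in TB, the units work out so that the ratio is simply egress divided by stored.

```python
# permille.py - the ‰ column above is daily egress (GB) divided by stored
# data (TB): GB/TB is 1/1000, and per-mille multiplies by 1000 again, so
# the factors cancel and the plain quotient is already in ‰.
def egress_permille(egress_gb: float, stored_tb: float) -> float:
    return egress_gb / stored_tb

print(f"{egress_permille(19.58, 10.31):.2f} ‰")  # ~1.9, dragonhogan's July 3rd row
```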

Because Storj doesn't work with replication, but rather with erasure codes. After repair, the piece you had is now stored on a different node. Having a copy of that exact same piece would only protect against the loss of that exact piece, instead of against the loss of any piece, like erasure codes do. Therefore it's not worth keeping a copy, and it's probably not even something the piece metadata currently supports.
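A toy simulation of that point, with made-up parameters (not Storj's real RS numbers): when any K distinct pieces rebuild the segment, one extra distinct piece protects better than a duplicate of one specific piece.

```python
# erasure_vs_copy.py - toy illustration: with k-of-n erasure coding, ANY k
# distinct pieces rebuild the segment, so duplicating one specific piece helps
# less than adding one more distinct piece. K and LOSS are made-up values.
import random

K = 4          # any K distinct pieces rebuild the segment (illustrative)
LOSS = 0.5     # chance that each stored piece disappears
TRIALS = 100_000

def survival(piece_ids: list[int]) -> float:
    ok = 0
    for _ in range(TRIALS):
        # a set collapses duplicates: only *distinct* surviving pieces count
        alive = {pid for pid in piece_ids if random.random() > LOSS}
        ok += len(alive) >= K
    return ok / TRIALS

base = list(range(8))                              # 8 distinct pieces
print(f"+ duplicate: {survival(base + [0]):.3f}")  # extra copy of piece 0
print(f"+ distinct:  {survival(base + [8]):.3f}")  # a 9th unique piece wins
```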

Not directly, but there is a clear indirect link. If a lot of data was uploaded in March that is being downloaded now, nodes that weren't online yet in March or were full during that time won't see any of those downloads now. So it's not necessarily age, but which data got uploaded to your node - or more accurately, during which time frames your node was accepting data.

Egress went up for everyone on the same day.

Could simply be the effect of recent data being downloaded more frequently. We’ve seen that before.

I can't give you an exact answer, but I think my informed speculation can help illustrate what may influence these percentages. First off, while your numbers do seem to be the lowest, the differences are fairly small. Considering the rough timing of when your node came online, my guess is that you started right when saltlake began pumping a LOT of data onto nodes. That data doesn't seem to be downloaded as much as the more recent data uploaded by europe-north-1, so you have more data stored relative to egress. Nodes that existed well before the saltlake data was pushed see more downloads from the older data they hold, and nodes that came online after the majority of testing had moved to europe-north-1 see a higher download/stored ratio because they don't hold the less-downloaded saltlake data at all.

I could be way off there. But even if the exact timing and facts don't match, it at least illustrates how the period in which you received data influences your egress numbers.

i did also compare it against dragonhogan's roughly 12-month-old node, which seemed to get better numbers… but i only checked that a few times, so an even smaller dataset… i have also had nearly days-long downtimes in the past, and there might be some sort of cycle to it that we just cannot see yet, because this is a limited time span…

i'll try to run through it and compare it with dragonhogan's 12-month-old node then.

anyway… at least you are also puzzled lol


@shoofar this is a better thread to talk about bandwidth.
Today, 50-100 GB incoming looks OK.

Hi, yeah :smiley:

Is that 100GB/day per node?

Because 100GB/day is the usual amount, but it is shared between all nodes on the same public IP (as you can see in the thread above)… that gives around 1.3-1.5 MB/s per public IP.
But your snapshot from a different thread
suggests that you get ~60 Mbit/s → 6.25 MB/s; you have 4 nodes, so ~1.5 MB/s per node.
Given that you have 2 separate internet uplinks, that gives you 2x more on top of what our nodes achieve.
That is what really stumps me here :open_mouth:
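For reference, the conversions behind those numbers (note that 60 Mbit/s works out to 7.5 MB/s, while 6.25 MB/s corresponds to 50 Mbit/s, so one of the two figures above is a little off):

```python
# units.py - the unit conversions used above (decimal units: 1 GB = 1000 MB).
def gb_per_day_to_mb_per_s(gb_per_day: float) -> float:
    return gb_per_day * 1000 / 86400   # MB per second, averaged over a day

def mbit_to_mbyte_per_s(mbit: float) -> float:
    return mbit / 8                    # 8 bits per byte

print(f"{gb_per_day_to_mb_per_s(100):.2f} MB/s")  # ~1.16 MB/s for 100 GB/day
print(f"{mbit_to_mbyte_per_s(60):.2f} MB/s")      # 7.50 MB/s; 6.25 MB/s is 50 Mbit/s
```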

To be absolutely fair, it is all my home traffic, so it contains a little TV and some online movie traffic as well, but most of it is the nodes, I hope.


If you are able to get 2 connections from 2 different providers, you will have 2 different subnets. It also matters that there is no other SNO in your /24 network - that is also possible.
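A quick way to check whether two addresses fall in the same /24 (the granularity the satellites reportedly use when spreading pieces); the addresses below are documentation examples, not real SNO IPs:

```python
# subnet_check.py - check whether two public IPs land in the same /24,
# the granularity Storj reportedly uses to spread pieces across nodes.
from ipaddress import ip_network

def same_slash24(a: str, b: str) -> bool:
    return (ip_network(f"{a}/24", strict=False)
            == ip_network(f"{b}/24", strict=False))

print(same_slash24("203.0.113.10", "203.0.113.200"))  # True  (example addresses)
print(same_slash24("203.0.113.10", "198.51.100.7"))   # False
```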

my stats per node (from July 1st)

(screenshot: per-node stats)

some of the nodes are full, so the average is not a really helpful indicator

today:


Is every row a node? I think you haven't measured repair egress - it is big this month; at the beginning of the month it was bigger than the usual egress.