Bandwidth utilization comparison thread

Math is a set of agreed-upon rules for the manipulation of numbers. Applying those agreed-upon rules is by definition objective.

Just going to put this here…

Source: Physical constant - Wikipedia

There are a few different lists going around, and there may be some discussion on the ones closer to the bottom, but the speed of light is absolutely on every version of this list you can find.

So let’s stop with the reductio ad absurdum every time your logic doesn’t hold up. It’s not the fault of math. It’s not a matter of opinion. It’s a matter of not providing sufficient evidence for the claims you are making.

I highly doubt our nodes will ever see 240MB/s on a single node; there are over 9000 other nodes out there that would be sharing the load.

200mbit sounds like a more realistic usage, maybe 400mbit if you’re lucky. But never 240MB/s; we’re talking over-1-gig internet territory now.

Also keep in mind the number of nodes will keep going up, meaning your chances of ever seeing that kind of bandwidth are very, very slim. At least on a single node.

Also, even if someone decided to fill up the entire Storj network, it would take them a while due to network and hard drive speeds.

i wasn’t aware that there are fixed limitations on customer uploads; that would ofc restrict what can happen and how fast.

the 240MB/s was in relation to my internet connection; initially i thought it was full duplex, which would mean a 1gbit full duplex connection can do 240MB/s of data transfer, 120MB/s in each direction…
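
To make that arithmetic explicit, here is a small sketch of the conversion being used; the helper function is purely illustrative, and the ~240MB/s figure above is just this result rounded down a bit for protocol overhead.

```python
# Quick sanity check on the link-speed arithmetic above (illustrative helper).
# 1 Gbit/s is 1000 / 8 = 125 MB/s per direction; a full-duplex link can move
# that much in each direction at once, so roughly 250 MB/s aggregate.

def link_mb_per_s(gbit: float, full_duplex: bool = True) -> float:
    """Convert a nominal link speed in Gbit/s to aggregate MB/s."""
    per_direction = gbit * 1000 / 8  # decimal megabytes per second
    return per_direction * (2 if full_duplex else 1)

print(link_mb_per_s(1))         # 250.0 MB/s aggregate (125 each way)
print(link_mb_per_s(1, False))  # 125.0 MB/s one way
```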

so that would be my theoretical max that could hit my server from the internet, but i suspect my cpu’s would give out way before then; even 20-30MB/s of high-iops workloads seems to put a strain on them at times.

what i was trying to do was give an example of how my system wouldn’t be bandwidth limited, but rather the bottleneck would be cpu during intensive storj traffic… i wasn’t trying to say that i would expect 120MB/s ingress… that’s like… well, the highest sustained avg i’ve seen is like 3.5MB/s, which is about 300GB a day.

i was saying 10x of what we saw during testing… putting it at a peak of 35MB/s
also that was over 3 years… the world and data amounts will also change quite a bit over the next 3 years, i really don’t think that’s unrealistic if storj gets popular.
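
For the record, the daily-volume figure above checks out; a quick sketch, illustrative only:

```python
# Rough check of the daily-volume figures above: sustained MB/s to GB/day,
# plus the speculative "10x the test peak" scenario.

def gb_per_day(mb_per_s: float) -> float:
    return mb_per_s * 86_400 / 1000  # seconds per day, decimal GB

print(gb_per_day(3.5))  # ~302 GB/day, matching the ~300 GB/day above
print(10 * 3.5)         # 35 MB/s, the hypothetical 10x peak
```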

that is a very machine-like sentence, but yeah i guess… don’t really like the word objective tho, it sort of implies that something is really defined; i find that when i dig enough into basically anything, reality and sense just dissolve as the fabric slowly comes undone.
we can agree that numbers and math are pretty useful… ofc if we accept numbers we need to accept infinities, and then math starts to get interesting…

this has nothing to do with my logic; i’ve studied physics for decades, it’s a topic in which i’m very well versed. sorry that reality doesn’t conform to what you want it to be.
let me tell you a secret… it’s turtles all the way down…

@deathlessdd
yeah i know, that was the max theoretical internet bandwidth to my server, not the storj ingress.

@Pentium100
yeah, but it would get faster from a free-capacity node’s perspective as the network fills up and more nodes become full…
certainly iops seems to be pretty demanding in the higher TB ranges… my node seems to use at least a couple hundred iops… my avg is at 175 iops
and that’s at 2.5MB/s, so if that is any indication then i doubt i could push it past 20MB/s
which would be pushing it… i think my array caps out at about 1000 or 2000 iops, not sure if it will run that sustained tho.

if the avg hdd can do like 400 write iops, then wouldn’t that mean a CMR drive might cap out if storj ingress goes above 7.3MB/s?
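
The same back-of-envelope math written out; it assumes the average bytes per I/O stays near today’s observed ratio, which is a big simplification (piece sizes, sync writes and filesystem overhead all shift it):

```python
# Back-of-envelope sketch of the IOPS ceiling argument above.

observed_mb_s = 2.5
observed_iops = 175
avg_io_kb = observed_mb_s * 1000 / observed_iops   # ~14 KB per operation
print(f"average I/O size: {avg_io_kb:.1f} KB")

def max_ingress_mb_s(drive_write_iops: float) -> float:
    """Throughput ceiling if every write costs one I/O of avg_io_kb."""
    return drive_write_iops * avg_io_kb / 1000

print(max_ingress_mb_s(400))    # ~5.7 MB/s for a single ~400-IOPS CMR drive
print(max_ingress_mb_s(1500))   # ~21 MB/s for an array around 1000-2000 IOPS
# The 7.3 MB/s figure above corresponds to assuming ~18 KB per write instead.
```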

so if we were to see ingress above that, or even close to it, i suppose many 1-hdd nodes would be unable to keep up, which… well this is getting long already… let’s just say that’s another good reason for having more than one node on a subnet…

and that SMR drives might be even more handicapped than i thought

It’s not only internet bandwidth; you also have to think about the hardware bandwidth. There is no way you’re going to get a max of 120MB/s read and write simultaneously on your drives anyway, because the size of the files is really going to be a bottleneck to everything. But let’s stick to realistic numbers for what your drive will actually see: maybe 22MB/s read and write speeds.

Your cpu shouldn’t be under that much load; the only time it should see a lot of load is when compressing and decompressing files. Simple file transfers don’t really use that much cpu.
This is my Pi running a node:
[image]

I have a 10gig internet connection; do you know how hard it is to get that kind of speed over the internet? Then on top of that, my hard drives can’t even keep up with the speed. I have to use more than one PC to max out the connection speed.

Except that the balance between supply and demand will probably stay roughly the same. You still fail to factor the similar rise in nodes into your calculations. I think my previous post sufficiently showed how that balance would be maintained. It’s impossible for 10x the ingress to happen if I’m correct about that. I’m open to anyone pointing out the flaw in my thinking though.

I haven’t noticed any physics in your comments around expected future traffic. It sounds more like speculation to me. I don’t doubt your credentials in the field, but I’m not sure where it applies to the conversation at hand.

I recommend that instead of drowning in details, you take a look at the bigger picture. Storj Labs did tests that toppled some of the slower nodes. They did those tests partially to find the limits of the network. They have tools to balance the load. So they know the limits and they know how to prevent crossing them. So do you think they will let anything close to 10x those limits happen on the network?
There’s really no need to make it more complicated than that.

but in 3 years the same amount of data will be a smaller % of the total capacity of a node, thus it wouldn’t scale linearly over time even when you take the supply and demand into account…

just like we had and used less data 3 years ago, in 3 years the norm will be higher…

i know it doesn’t seem like it, but i do attempt to stay at least a little on related topics… even tho i’m terrible at it lol. but yeah i really should get better at this forum business.

that’s a good point, sure maybe they can control the ingress and plan ahead… but i suspect the internet is a bit like the ocean, very difficult to control…

there may be some fixed limits storj labs has in place to keep everything running smoothly… and like was pointed out, it does seem like the hdd ingress max will be set by iops…

meaning at less than 100mbit per node there would be nodes starting to skip a beat, and thus other nodes would have to pick up the slack.

wait… was that actually a stress test back when we got 300gb or whatever a day?

it’s the whole zfs thing i think… it’s not a problem… i can just see that before i would reach my internet / network max i might reach a cpu bottleneck. like you say, you got 10gbit, but there are other bottlenecks :smiley: i got 9 drives serving my main storagenode, 10 if we count the OS SSD
but yeah… from what my numbers say it doesn’t look like i can do past 40MB/s max, and more realistically more like 20MB/s sustained… ofc that would depend on how big the files are at the time… sequential transfers i can ofc do much higher, but this is based on my avg recorded iops over the last week and the avg MB/s, in relation to what i believe my max raw write iops to the array to be.

made my array the size i did because i was considering setting up 10gbit at home and wanted to be ready for that… so it can do like 600MB/s-1.1GB/s sustained sequential reads, and like i said before i think i can get 1k or 2k iops out of it sustained.

haven’t really used it for much yet tho… but at least i won’t have to migrate my big storage node when i finally get a 10gbit link to the server… just some long-distance cable pulls left, and i haven’t been totally sure on where i even want it terminating.

I read this at least 5 times and I still have no idea what you mean.

Balanced growth means 10x the demand on 10x the nodes. How exactly does that lead to more per node?
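
A trivial sketch of that balanced-growth argument; the totals below are made up, only the ratio matters:

```python
# Minimal illustration of the balanced-growth point: if total ingress and the
# node count both grow by the same factor, per-node ingress is unchanged.

def per_node_ingress(total_mb_s: float, nodes: int) -> float:
    return total_mb_s / nodes

today = per_node_ingress(total_mb_s=30_000, nodes=9_000)     # made-up totals
future = per_node_ingress(total_mb_s=300_000, nodes=90_000)  # both scaled 10x
print(today, future)  # identical: ~3.3 MB/s either way
```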

All the heaviest loads we’ve seen were mainly Storj Labs testing. They don’t go into details about what the tests are for. But even if it wasn’t a stress test, it showed them some of the limits nodes will run into. So it functioned as one whether that was the intention or not.

You do, but you’re only on what, 1x or 4x PCIe 2.0? That can make a difference in speeds. It might sound like a lot when you have 9 drives working together, but they’re going to be limited by the speed of the bus, which was fast for older servers back in their day. But even 4x PCIe 3.0 is already much faster and can handle higher speeds.
If I remember right, the max speed for PCIe gen 2 is 500MB/s per lane, read and write, so if you have 9 drives they’re not actually running at max speed. Unless you’re running at 16x.

Can all of this be broken out into a lobby thread? The last 40 posts or so have nothing to do with bandwidth comparison… This thread was extra helpful when I was just getting started, to get a view of the storj environment, and this recent conversation is rather diluting the topic’s quality.


should be a lot more than that… i think my lowest-bandwidth connection is 8gbit to my south bridge…
aside from that i’m running 2 or 3 HBA’s over multiple PCIe 2.1 x8 slots with the disks scattered across them,
depending on the different pools; the main one i split across two…

i think PCIe 2.1 x8 is like 20 or 40gbit, must be 40 actually because i got 200gbit on the pcie bus and that’s split over 5 slots of x8… well that didn’t fit… maybe it’s the dual cpus that makes that math go bad.

says it’s 16gbit on this, so maybe that’s right…

even my ESI is like 8Gbit and that’s the slowest link i think there is internally on the mobo.

i suppose you aren’t far from right tho… 16gbit and then x4 would be 8gbit… and then maybe some overhead and what not
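
For reference, the PCIe 2.0 numbers being juggled here, worked out in one place; PCIe 2.0 signals at 5 GT/s per lane with 8b/10b encoding, so about 500 MB/s of usable bandwidth per lane, per direction:

```python
# PCIe 2.0 usable bandwidth per lane count, one direction.

def pcie2_gbit(lanes: int) -> float:
    raw_gt_per_s = 5.0   # gigatransfers per second per lane
    encoding = 8 / 10    # 8b/10b line-coding overhead
    return raw_gt_per_s * encoding * lanes  # usable Gbit/s, one direction

print(pcie2_gbit(1))   # 4 Gbit/s  (~500 MB/s)
print(pcie2_gbit(4))   # 16 Gbit/s (~2 GB/s) -- the "16gbit" figure above
print(pcie2_gbit(8))   # 32 Gbit/s (~4 GB/s)
```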

but yeah there is plenty of power left in this puppy, but i did try to make sure i got just about the minimum of what i would need to setup a mean storage rig.

Oh ok, that makes sense then, I didn’t realize you had an 8x card.

certainly one does need to pay attention to the limited bandwidth on the bus, that’s for sure…
there is both a dual 1gbit nic controller and a 6-port sata controller on the south bridge… behind an ESI of 8gbit, and i think there’s also an additional pcie socket back there… i had put one of my HBA’s in that early on and couldn’t understand why it wasn’t running right: it was bouncing network data over the ESI to the cpu, then to a cache / l2arc on the SATA ports, so back over the ESI, then when the data was needed again it came from the l2arc back over the ESI, and ofc when accessing and writing stuff on that one HBA it went over the ESI yet again.
8gbit quickly becomes very little

let’s just say it didn’t run right :smiley: had to pull out the block diagram lol

PCIe 2.0 on a 1x lane does 500 MB/s max; most 5400 RPM drives will max out around 125 MB/s write speed, so you can have a max of 4 SATA 6Gb ports per AIB on a 1x PCIe 2.0 link. FYI.
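
The same rule of thumb as a quick calculation; this only covers sequential peaks, since random I/O will saturate the drives long before the lane:

```python
# "Drives per lane" rule of thumb: one PCIe 2.0 lane (~500 MB/s usable)
# against 5400 RPM drives topping out around 125 MB/s sequential writes.

lane_mb_s = 500
drive_mb_s = 125
print(lane_mb_s // drive_mb_s)  # 4 drives before a single lane saturates
```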

i don’t have any PCIe 2.0 x1 slots; the lowest i got is an x8 which seems to get 4 lanes, and then the rest seems to go to the dual nics and a 6-port sata 6Gb controller, which i would assume gets an additional two lanes so that there is some headroom.

not sure if it actually has the lanes for 8x tho, and if the other stuff is disabled or unused it might be able to use them anyway… the block diagram has the slot as x8, but that’s just the physical slot i think… because the path to it seems to be x4

all the rest are full x8 tho; one of them shares the x8 with an onboard sas controller, but that’s not enabled.

PCIe slots on most sata add-in boards are set to 2.0, so the limit is set there; most motherboards are on 3.0 unless they are really old. And since it seems you’ve done this before, I think you can set the slot config in your BIOS on high-end mobos.

both mobo and hardware are PCIe 2.0 to ensure proper compatibility; sometimes there can be weird stuff, even tho in most cases it might not be an issue… also these are enterprise LSI SAS HBA’s built to run 2x 6Gb
so many numbers :smiley: they seem to be giving very different numbers than the PCIe wiki… but the PCIe wiki may be simplified for ease of use… oh, that’s not the right documentation either…
mine are a 4i 4e and an 8i
same board and same lsi controller basically, so it was pretty much plug and play because it’s enterprise gear; it would most likely start yelling at me if it wasn’t getting at least 4x

their numbers are a bit interesting for the PCIe 2.0 bus…
they list 5Gbit for PCIe 2.0 x1, which would be about 625MB/s raw, so maybe the last bit is lost to encoding overhead, or the 500MB/s figure is simplified for the wiki; it’s kinda rare that wiki is really correct :smiley:
also it’s bidirectional / full duplex… so you could essentially be writing at 500-600MB/s to one set of hdd’s and reading the same from another set…

and since the HDD’s are unable to do two tasks at one time, the limitation really isn’t that bad, at least for effective use; sure it will affect your peaks… but the effective throughput when utilizing full duplex will be immensely higher… ofc it won’t apply to all use cases…
not really worth anything if you never write and read at the same time…

I don’t think so. I have only seen Intel 10G network cards (or rather, the ixgbe module) complain about not having enough bandwidth. I have not seen other stuff do it.

However, if your uplink is 1G, then it does not really matter how fast the HBA is - even a PCI-X one would be enough. On the other hand, if you have a 10G uplink, it gets a bit more interesting, but still not that hard, at least for the interface bandwidth - being able to read/write to hard drives at 10Gbps, though, would require a large array.
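
As a rough sizing sketch of that last point; the per-drive sequential throughput is an assumption in the 150-250 MB/s range for modern HDDs:

```python
# How many spinning drives it takes just to keep a 10 Gbit/s uplink busy
# with sequential transfers.

uplink_mb_s = 10 * 1000 / 8  # 1250 MB/s
for drive_mb_s in (150, 200, 250):
    drives = -(-uplink_mb_s // drive_mb_s)  # ceiling division
    print(f"{drive_mb_s} MB/s per drive -> at least {int(drives)} drives")
# Random I/O makes this far worse, which is the point being made above.
```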

yeah, even with 8 enterprise sata hdd’s i get a sustained read of like 600MB/s at the slowest times… and ofc that’s sequential… i’m sure with enough iops that can go down to a 1/10th of that
i actually think my node migration was at about that 30-40MB/s, and then ofc closer to 100MB/s for a good while… but much of the time it was around that…

takes a lot of hdd’s to do what most SSD’s do so easily, if they are proper SSD’s, ofc most aren’t lol
but the numbers look good short term… ofc who needs 4GByte/s sustained

my IoMemory SSD tracks PCIe lanes also… or i think it does… it tracks wattage on the PCIe bus… sadly mine being 2.0 can only do 25 watts… which means i cannot boost my SSD for 50% more performance lol tsk tsk… power-hungry little beast tho… 37 watts for an SSD lol, i would have thought that decreased lifetime or something… but it seems like they claim it doesn’t… maybe it was simply part of their design and they decided to allow it to pull the power for more performance if it could get it.

my SSD can keep up with 10gbit sustained, unboosted

I’ll bite, but only because this steps inside of the SAN and distributed storage architecting I do for work.

So, yes, individual storage nodes would suffer, but only because they would get less successful ingress - not so much that they just wouldn’t get any ingress. Storj operates on a latency-based distributed model, so once an upload meets the durability level required by the EC profile, the remaining upload requests out of the original 110 nodes are cancelled (as per the white paper). Basically, such nodes would only be able to confirm fast enough to successfully take in up to that 5-7MBps, and would start failing uploads at an ever-increasing percentage until they either reach full capacity or are simply never fast enough to commit the segment piece before enough other nodes do.
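
A hedged sketch of that upload race, not Storj’s actual code: the 110-started / ~80-needed counts follow the whitepaper figures mentioned above, and everything else here is illustrative.

```python
# Fire a piece at many nodes, keep the fastest commits, cancel the stragglers.
import asyncio
import random

async def upload_piece(node_id: int) -> int:
    # Simulate one node's commit time; slow nodes simply take longer.
    await asyncio.sleep(random.uniform(0.05, 1.0))
    return node_id

async def race(started: int = 110, needed: int = 80) -> list[int]:
    tasks = [asyncio.create_task(upload_piece(i)) for i in range(started)]
    done: list[int] = []
    for finished in asyncio.as_completed(tasks):
        done.append(await finished)
        if len(done) >= needed:
            break
    for t in tasks:  # cancel everything still in flight
        t.cancel()
    return done

winners = asyncio.run(race())
print(f"{len(winners)} pieces committed, the rest were cancelled")
```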

Secondly, from the hardware side of things, you’re kinda on target with the number of IOPS that a single spinning disk might be able to do, but this is also highly dependent on several factors - size of data being written (512B, 4K, 1M, 4M, etc.), format of the drive, spindle speed, cache, queue depth, and even how full the drive platters are. For example, if you were to take a 7200rpm HUH721212ALE600 drive, fill it to about 70% and hit it with a 4K R/W ratio of 100/0, you may be able to get 70-150 IOPS and about 180-250MBps on reads. Mix up the R/W ratio to 30/70 (high ingress, low egress) and things get even worse: you might drop the throughput to the floor and see as low as 250KBps, due to the drive spending a lot of time in latency, spinning the platter around to where it needs to place or read data at that head position. Basically, writes are expensive, latency-wise, and can very negatively impact reads. Reads are also prioritized over writes, unless we’re talking about drives with firmware that prioritizes writes (i.e., WD Purple drives for DVRs/NVRs).
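
The IOPS-times-transfer-size relationship behind those numbers, as a quick illustration; the IOPS values here are assumptions in the same ballpark as the ones quoted above:

```python
# Throughput is just IOPS multiplied by transfer size, which is why random 4K
# workloads collapse a spinning disk's MB/s even when the IOPS look fine.

def throughput_mb_s(iops: float, io_kb: float) -> float:
    return iops * io_kb / 1000

print(throughput_mb_s(150, 4))     # ~0.6 MB/s: pure random 4K
print(throughput_mb_s(150, 1024))  # ~150 MB/s: the same IOPS with 1M I/Os
# A 30/70 read/write mix adds seek and rotational latency between bursts,
# which is how the sub-1 MB/s worst cases above come about.
```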

This is some of the IOPs and throughput of the 2x2x12TB ZFS “RAID10” I run at home:

              capacity     operations     bandwidth 
pool        alloc   free   read  write   read  write
----------  -----  -----  -----  -----  -----  -----
tank        19.3T  2.55T    388    246  35.3M  5.06M
tank        19.3T  2.55T    542     13  8.44M  1.16M
tank        19.3T  2.55T    557      2  9.35M   187K
tank        19.3T  2.55T    580     11  9.19M  1.08M
tank        19.3T  2.55T    567      7  11.9M   912K
tank        19.3T  2.55T    400      1  36.2M  4.80K
tank        19.3T  2.55T    371      7  35.3M   915K
tank        19.3T  2.55T    475    234  23.1M  4.53M
tank        19.3T  2.55T    537      9  9.35M   932K
tank        19.3T  2.55T    517      5  8.90M   434K
tank        19.3T  2.55T    471      0  12.3M  1.60K
tank        19.3T  2.55T    405      4  13.9M   222K
tank        19.3T  2.55T    378     12  33.7M  1.29M
tank        19.3T  2.55T    307    155  29.9M  2.95M
tank        19.3T  2.55T    523    112  12.3M  3.34M
tank        19.3T  2.55T    617      3  9.18M   136K
tank        19.3T  2.55T    331     10  15.6M  1.10M
tank        19.3T  2.55T    325      8  19.0M   830K
tank        19.3T  2.55T    378    239  29.7M  3.37M

Granted, my “IOWait” is hovering around 45-64%, but that’s a ZFS scrub for you, while still taking in 2.5-3.5Mbps of ingress, dealing with NextCloud’s noisy mysqld, and a few other things (an SMB share with indexing running, openHAB, and more nonsense). But… I really wanted to do a scrub before I move from a 2x2x12TB to a 3x2x12TB array; without a scrub running, the 2x2x12TB can take in 50+MBps with low read pressure (read queue depth).

Also, hell of a day today:
[image]
