So my 21-hour scrub did nothing to my ingress.
Egress is a bit lower than the day before.
Bandwidth usage is still stable.
The “disk space used this month” graph is weird: it says I stored less data on the 10th than on the 9th, which is basically impossible because I had 100+ GB of ingress on both of those days.
From my understanding that would mean more than 100 GB was deleted from my disk in one day. Does anyone have an idea of what causes these dips in disk space usage?
11 July
SGC - ingress 118.39 GB - egress 59.78 GB = 5.13 ‰ of 11.64 TB stored
TheMightyGeek - ingress 117.14 GB - egress 11.09 GB = 6.41 ‰ of 1.73 TB stored
kevink - ingress 117.66 GB - egress 44.19 GB = 11.5 ‰ of 3.84 TB stored
dragonhogan - ingress 115.75 GB - egress 73.64 GB = 6.97 ‰ of 10.56 TB stored
12 July
SGC - ingress 114.58 GB - egress 71.56 GB = 6.09 ‰ of 11.75 TB stored
TheMightyGeek - ingress 113.97 GB - egress 11.2 GB = 6.15 ‰ of 1.82 TB stored
kevink - ingress 113.68 GB - egress 37.18 GB = 9.40 ‰ of 3.95 TB stored
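(The ‰ figures are just daily egress divided by stored data, e.g. for SGC on 11 July: 59.78 GB / 11,640 GB ≈ 0.513% ≈ 5.13 ‰.)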
My egress is consistently lower even at a 99.5% success rate… dunno what that actually means or why… but I’m going to start shutting down VMs and let the server focus completely on the storagenode… see if that helps… Sadly this internet connection is shared, but I doubt other people can really affect a 400 Mbit/400 Mbit fiber connection long term… I mean, they would run out of space eventually…
That should be all of it updated… weird that mine is consistently lower… I can’t even beat TheMightyGeek…
It has been running continuously, no deviations, for the last 23 hours, since I had to shut down a second scrub I accidentally started by kicking a keyboard… xD it happens
If anyone thinks they can produce better numbers, they are welcome to try… I think my numbers are fairly respectable for an active pool running a storagenode…
Well, spinning down what I can… I hate doing this… some of these systems take like 72 hours just to boot… and only run correctly after maybe a week…
zpool iostat -l 3600
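# -l adds the latency columns (total_wait, disk_wait, syncq_wait, asyncq_wait, scrub, trim); 3600 is the sampling interval in seconds, so each block below is roughly a one-hour average (the very first sample is averaged since boot)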
capacity operations bandwidth total_wait disk_wait syncq_wait asyncq_wait scrub trim
pool alloc free read write read write read write read write read write read write wait wait
---------- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- -----
opool 2.73T 2.72T 0 0 0 0 - - - - - - - - - -
rpool 63.4G 75.6G 0 28 1.30K 325K 1ms 3ms 1ms 830us 2us 1us 3us 2ms - 48ms
tank 16.4T 16.3T 2.06K 233 625M 5.26M 36ms 9ms 4ms 2ms 5us 3us 749us 10ms 31ms 20ms
---------- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- -----
opool 2.73T 2.72T 0 0 0 0 - - - - - - - - - -
rpool 63.4G 75.6G 0 28 1.05K 331K 885us 3ms 883us 829us 3us 1us 4us 2ms - 47ms
tank 16.4T 16.3T 6 298 457K 9.30M 37ms 14ms 32ms 2ms 173us 10us 11ms 15ms - 22ms
---------- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- -----
opool 2.73T 2.72T 0 0 0 0 - - - - - - - - - -
rpool 63.4G 75.6G 0 27 984 307K 827us 3ms 823us 743us 3us 1us 4us 2ms - 40ms
tank 16.4T 16.3T 6 287 511K 7.13M 27ms 8ms 24ms 1ms 3us 4us 5ms 8ms - 19ms
---------- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- -----
opool 2.73T 2.72T 0 0 0 0 - - - - - - - - - -
rpool 63.4G 75.6G 0 27 2.80K 301K 1ms 2ms 1ms 693us 3us 2us 141us 2ms - 39ms
tank 16.4T 16.3T 38 289 625K 6.47M 17ms 10ms 16ms 2ms 87us 5us 10ms 11ms - 22ms
---------- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- -----
opool 2.73T 2.72T 0 0 27 878 4ms 1ms 4ms 411us 3us 1us - 1ms - -
rpool 63.4G 75.6G 1 27 48.1K 305K 686us 2ms 506us 694us 19us 1us 225us 2ms - 40ms
tank 16.4T 16.3T 284 491 2.63M 8.53M 17ms 12ms 13ms 2ms 986us 3us 27ms 18ms - 27ms
---------- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- -----
opool 2.73T 2.72T 0 0 0 0 - - - - - - - - - -
rpool 63.4G 75.6G 1 27 43.1K 303K 861us 3ms 636us 785us 84us 1us 247us 2ms - 46ms
tank 16.4T 16.3T 175 480 2.26M 8.12M 22ms 13ms 19ms 2ms 1ms 3us 12ms 20ms - 46ms
---------- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- -----
opool 2.73T 2.72T 0 0 0 0 - - - - - - - - - -
rpool 63.4G 75.6G 0 28 496 310K 904us 3ms 904us 767us 3us 1us 3us 2ms - 45ms
tank 16.5T 16.3T 11 273 425K 5.90M 49ms 19ms 46ms 4ms 3ms 3us 6ms 22ms - 36ms
---------- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- -----
opool 2.73T 2.72T 0 0 0 0 - - - - - - - - - -
rpool 63.4G 75.6G 0 28 972 311K 857us 3ms 857us 767us 3us 1us 3us 2ms - 46ms
tank 16.5T 16.2T 4 323 287K 6.27M 24ms 8ms 24ms 2ms 34us 3us 205us 9ms - 53ms
---------- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- -----
opool 2.73T 2.72T 0 0 0 0 - - - - - - - - - -
rpool 63.5G 75.5G 0 27 5.18K 303K 667us 3ms 602us 779us 2us 1us 93us 2ms - 41ms
tank 16.5T 16.2T 3 294 286K 5.90M 27ms 8ms 27ms 2ms 3us 2us 163us 8ms - 27ms
---------- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- -----
opool 2.73T 2.72T 0 0 0 0 - - - - - - - - - -
rpool 63.4G 75.6G 0 26 1.39K 299K 973us 2ms 970us 670us 3us 1us 3us 2ms - 43ms
tank 16.5T 16.2T 6 348 330K 6.73M 34ms 15ms 32ms 3ms 120us 3us 6ms 19ms - 50ms
---------- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- -----
opool 2.73T 2.72T 0 0 0 0 - - - - - - - - - -
rpool 63.4G 75.6G 0 25 309 286K 841us 3ms 824us 727us 3us 1us 4us 2ms - 45ms
tank 16.5T 16.2T 2 288 269K 5.84M 40ms 12ms 39ms 2ms 6us 3us 3ms 14ms - 33ms
---------- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- -----
opool 2.73T 2.72T 0 0 0 0 - - - - - - - - - -
rpool 63.5G 75.5G 0 25 139 294K 2ms 3ms 2ms 837us 5us 1us 4us 3ms - 44ms
tank 16.5T 16.2T 2 280 304K 5.48M 32ms 9ms 32ms 2ms 3us 2us 142us 9ms - 32ms
---------- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- -----
opool 2.73T 2.72T 0 0 0 0 - - - - - - - - - -
rpool 63.5G 75.5G 0 25 180 292K 900us 3ms 900us 776us 3us 1us 3us 3ms - 43ms
tank 16.5T 16.2T 8 274 439K 5.52M 38ms 14ms 27ms 3ms 1ms 3us 35ms 15ms - 36ms
---------- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- -----
opool 2.73T 2.72T 0 0 0 0 - - - - - - - - - -
rpool 63.5G 75.5G 0 26 17.6K 310K 3ms 4ms 1ms 772us 2ms 1us 1ms 3ms - 43ms
tank 16.5T 16.2T 4 287 362K 5.84M 37ms 11ms 30ms 2ms 2ms 3us 14ms 11ms - 23ms
---------- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- -----
opool 2.73T 2.72T 0 0 0 0 - - - - - - - - - -
rpool 63.5G 75.5G 0 25 2.02K 295K 561us 3ms 371us 778us 2us 1us 822us 3ms - 39ms
tank 16.5T 16.2T 2 270 335K 5.81M 39ms 14ms 39ms 3ms 13us 3us 18us 15ms - 35ms
---------- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- -----
capacity operations bandwidth total_wait disk_wait syncq_wait asyncq_wait scrub trim
pool alloc free read write read write read write read write read write read write wait wait
---------- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- -----
opool 2.73T 2.72T 0 0 0 0 - - - - - - - - - -
rpool 63.4G 75.6G 0 24 136 280K 1ms 3ms 1ms 748us 3us 1us 4us 2ms - 42ms
tank 16.5T 16.2T 4 272 345K 5.99M 28ms 10ms 25ms 2ms 432us 3us 7ms 11ms - 46ms
---------- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- -----
opool 2.73T 2.72T 0 0 0 0 - - - - - - - - - -
rpool 63.4G 75.6G 0 25 6.12K 295K 966us 3ms 589us 778us 2us 1us 733us 2ms - 42ms
tank 16.5T 16.2T 3 290 372K 6.51M 33ms 9ms 33ms 2ms 3us 4us 169us 9ms - 36ms
---------- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- -----
opool 2.73T 2.72T 0 0 0 0 - - - - - - - - - -
rpool 63.5G 75.5G 0 25 116 287K 983us 2ms 983us 725us 3us 1us 4us 2ms - 42ms
tank 16.5T 16.2T 7 291 278K 6.18M 19ms 7ms 17ms 1ms 129us 2us 6ms 7ms - 27ms
---------- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- -----
opool 2.73T 2.72T 0 0 0 0 - - - - - - - - - -
rpool 63.5G 75.5G 0 25 125 286K 1ms 3ms 1ms 758us 3us 1us 4us 2ms - 43ms
tank 16.5T 16.2T 9 284 322K 5.96M 21ms 14ms 19ms 2ms 58us 3us 10ms 15ms - 50ms
---------- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- -----
opool 2.73T 2.72T 0 0 0 0 - - - - - - - - - -
rpool 63.5G 75.5G 0 24 131 283K 589us 3ms 589us 728us 3us 1us 3us 2ms - 41ms
tank 16.5T 16.2T 2 280 279K 5.96M 32ms 10ms 32ms 2ms 3us 3us 23us 10ms - 39ms
---------- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- -----
opool 2.73T 2.72T 0 0 0 0 - - - - - - - - - -
rpool 63.5G 75.5G 0 24 362 275K 419us 3ms 419us 734us 3us 1us 3us 2ms - 43ms
tank 16.5T 16.2T 3 287 310K 6.14M 33ms 8ms 29ms 2ms 9us 3us 9ms 8ms - 39ms
---------- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- -----
opool 2.73T 2.72T 0 0 0 0 - - - - - - - - - -
rpool 63.4G 75.6G 0 27 32.0K 337K 942us 3ms 669us 825us 15us 1us 2ms 3ms - 52ms
tank 16.5T 16.2T 7 282 311K 6.21M 16ms 8ms 16ms 2ms 129us 3us 457us 9ms - 38ms
---------- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- -----
opool 2.73T 2.72T 0 0 0 0 - - - - - - - - - -
rpool 63.5G 75.5G 0 24 2.15K 282K 2ms 3ms 874us 778us 2us 1us 3ms 2ms - 40ms
tank 16.6T 16.1T 2 244 287K 5.77M 64ms 22ms 63ms 4ms 88us 3us 2ms 25ms - 40ms
---------- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- -----
opool 2.73T 2.72T 0 0 0 0 - - - - - - - - - -
rpool 63.5G 75.5G 0 24 1021 282K 555us 3ms 550us 793us 4us 1us 15us 2ms - 43ms
tank 16.6T 16.1T 2 283 263K 5.84M 24ms 7ms 24ms 1ms 60us 3us 392us 7ms - 29ms
---------- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- -----
@kevink and yeah, 60 GB of egress is nice… but it still doesn’t change the ratio… I’m almost always 20%, if not even more, behind even the closest node.
And yeah, 20% is putting it mildly; it’s more like 50-100% in many cases against many nodes… I just ran through the summary of our data… I’ve only beaten one node in the 9 days we’ve been testing… on July 7th I beat striker43 by a slight margin… otherwise I’ve been consistently the lowest every damn day… while you seem to be among the highest performers… I also think that when we started, my VMs were down… so yeah, I doubt this will help…
Maybe it’s a node age… node size thing… I also think I’m the node with the most stored data… so maybe that’s why… which would suck…
Looks like that could be accurate… in most cases one can almost sort the nodes by egress-to-stored ratio and they will also end up sorted by stored… of course the egress-to-stored ratio is derived from half of that… so maybe that’s why… it still seems kind of interesting that the more data one stores, the less relative egress one gets… might be interesting to see if that also correlates when having multiple nodes on a subnet…
Anyway, more questions than answers at the moment.
One question: what RPM do your drives have? Are they 5400 or 7200 RPM?
Can anyone post their drives’ RPM? Maybe this can shed some light.
I have 1x 6 TB HDD at 5400 RPM and my nodes are only a few days old; my egress is also ~1%… or maybe someone else on your shared network is spinning up a new storage node that is impacting your “vetting process”?
See my egress before the “offline” annotation (until that day I had only one node; the day prior to the offline I was rsyncing to a different filesystem and it probably caused drive saturation, hence the lower egress).
Here are my stats for a couple of days (didn’t have time to contribute to the thread but I’m finally catching up):
node 1
node 2 (spun up because I had some audit failures caused by a node migration gone bad)
3x 8 TB 5400 RPM in raidz1. So not the fastest either.
My pool consists of
3x raidz1 of 3 drives each… they are all 7200 RPM enterprise SATA.
I’ve got a 750 GB L2ARC with 600 GB allocated, a dedicated SLOG/OS SSD, and 48 GB of RAM, of which 23 GB is usually assigned to the ARC.
The 3x raidz1 layout is to give me three times the IOPS and thus decrease load on the pool, so that the drives get better durability and better latency…
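(Rough rule of thumb: each raidz vdev delivers roughly the random IOPS of a single drive, so three 3-disk raidz1 vdevs give about 3× the IOPS of one wide 9-disk raidz.)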
I would like to see anyone show better latency; besides, my success rate on the downloads I get assigned is 99.4-99.6%. That’s a 0.4% loss, which is about at the limits of what the TCP/IP protocol can do… I believe its average rate of failed connections is around that mark…
Yeah, you cannot really use the numbers you get early on… very unreliable… I doubt this is a hardware issue…
Though maybe I should test running sync=standard for a while…
I seem to remember that having an effect on egress… at least in the Proxmox graph, but I’ve since found out the Proxmox graph is unreliable for gauging sustained numbers…
@kevink yeah, that kind of shows what it isn’t, lol.
I basically get 33% less seek time, 3 times the max IOPS and 3 times the max throughput,
and still you have me beat by a long shot… on sync=always, no less…
I use darkstat, set to monitor only the docker ports; it has nice features, though it hasn’t been in development for a few years now…
darkstat -i <interface> -p <http port> -f "port 28967 or port 28968" --user <username> --chroot ~/darkstat/ --import <db name> --export <db name>
(--chroot is the directory where the database is stored; --import/--export set the database file name.)
This is how it looks when you open http://localhost:<http port> in a browser:
Another hypothesis could be: someone else on your network is setting up a Storj node. Look at my stats near the offline annotation.
Before that I had even 2.5%.
After that day it dropped to ~1%; that’s when I started the second node.
Easy check: how many port forwards does your router have?
Or: are you the only person who has access to the router config?
Egress is independent of how many nodes are behind a single IP and only depends on the data stored.
Your stats look fine… though it just goes to show how much the stored data seems to affect it… 2.55% is your best in 12 days…
But let’s remove that because it’s an extreme… so your best is about 1.12%, which isn’t too far from kevink’s 1.55 or whatever it was… and actually around the same time too…
So most likely the same download.
Going to try and set sync=standard… but if this improves my numbers it would be really weird…
From what I can tell it only improves my read latency, which is what should be relevant for downloads.
If there were more nodes behind my IP, my ingress would deviate by the ratio of the other nodes.
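(Rough illustration, assuming ingress is split evenly between nodes on the same /24 subnet: with ~115 GB/day coming in and one other node behind the same subnet, each node would only see about 115 / 2 ≈ 57 GB/day.)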
Could still just be random chance… node age, or maybe location…
What people keep trying to hammer into my head is that node setup doesn’t matter, aside from whether you have failing success rates… which seems perfectly reasonable…
The requests come from the outside… either one can process them or not… one cannot get more or fewer of them…
That would be weird because there is no writing involved in egress. And since your latency is already a lot better than mine, that can’t be it.
Tuning your setup will do nothing if you just don’t get as many download requests as others. Those won’t go up magically.
The location idea is pretty good though… I’m in a different small country… maybe for whatever reason that could have an effect… dunno why it would… but it’s a unique factor of my node… aside from that I’ve got more data stored…
And then I’ll keep testing my uptime long shot…
Yeah, you are right about sync=standard, or should be… I also think it’s a waste of time to turn off all my VMs…
But it’s difficult to argue against there being some cause for my node being consistently the lowest… I’m sure we will figure it out over time… I just really only started to notice the trend now…
It could simply be something related to which data blocks each node has… and I’m just in one of the blocks that is on low-egress test data right now… and after a while it switches over… the test-data satellites / test customer servers, or whatever it is, will have limited bandwidth and will have to switch between clusters of data over time… that could also be what we are seeing…
Crunched the previous numbers and added DragonHogan.
Here is the difference in egress, in kB/s per TB stored.
The formulas are:
Egress - summed all egress (nodes, usage, repair), in [GB]
Stored - just the stored data, in [GB]
egress % = Egress / Stored
egress [kB/s] = Egress [GB] × 1,000,000 / (24 × 3600)
egress [kB/s per TB stored] = egress [kB/s] / (Stored [GB] / 1000)
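A quick sketch of those formulas in Python, just to make the unit conversions explicit (the sample inputs are SGC’s 11 July figures from above; the function and variable names are my own):

def egress_stats(egress_gb_per_day: float, stored_gb: float):
    """Daily egress relative to stored data, per the formulas above."""
    egress_ratio = egress_gb_per_day / stored_gb                # egress %, as a fraction
    egress_kb_s = egress_gb_per_day * 1_000_000 / (24 * 3600)   # GB/day -> average kB/s
    egress_kb_s_per_tb = egress_kb_s / (stored_gb / 1000)       # normalised per TB stored
    return egress_ratio, egress_kb_s, egress_kb_s_per_tb

# Example: SGC on 11 July - 59.78 GB egress, 11.64 TB (= 11640 GB) stored
ratio, kb_s, kb_s_per_tb = egress_stats(59.78, 11640)
print(f"{ratio:.2%}  {kb_s:.0f} kB/s  {kb_s_per_tb:.2f} kB/s per TB")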
Kevin’s node and mine have the highest egress per TB stored.
But Kevin also has more stored than TheMightyGeek…
Can you state which continents you are on, and what your main satellite is?
I am in eastern EU and my main satellite is europe-north (like 95% of my usage is from there).
kevink is in Germany, so location-wise you are closer than I am… it could be a location thing.
Also an important thing when considering that… is what the world map of internet connections looks like…
There are certain bandwidth limitations between nations and other such geographic clusters that can greatly restrict traffic, and thus latency, or make network communication avoid certain routes if possible…
With my connection I can see how my bandwidth drops across the Atlantic… which makes good sense.
I’m located near Geneva in Switzerland and the bulk of my data is from europe-north as well.
@SGC is your main satellite europe-north?
So… I’m wondering why your main satellite is not saltlake or us-central?
Don’t all satellites have their own testing data/processes? (I am new so I might be missing something.)
Shouldn’t a satellite choose you for the lowest ping? (Unless there is not that much data from the other satellites.)
Though TheMightyGeek is in Europe and I would say that he’s closer to europe-west than europe-north… (unless he isn’t).
That’s not really how it works. All satellites work with all nodes; that way data is redundantly spread around the world. Transfers are overprovisioned though, so more transfers are started than are actually needed, and when enough are done the rest are canceled. Those cancellations likely happen a bit more often for more distant nodes. But in general everyone should see about the same traffic. Recently, upload traffic on europe-north has simply been larger than on other satellites. Before that, saltlake got a lot of data. So it differs from time to time, but the patterns tend to be the same or similar for everyone.
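A toy sketch of that overprovisioning / long-tail cancellation idea (this is not Storj’s actual code; the 110-started / 80-kept numbers and the latency model are just assumptions for illustration):

import random

def simulate_upload(num_started=110, num_needed=80, distant_penalty_ms=80):
    # Each candidate node finishes its piece after some random time;
    # "distant" nodes get an extra latency penalty on top of the shared baseline.
    results = []
    for node_id in range(num_started):
        distant = node_id % 4 == 0                      # arbitrarily mark every 4th node as distant
        latency = random.uniform(50, 200) + (distant_penalty_ms if distant else 0)
        results.append((latency, node_id, distant))
    # The uploader keeps the first num_needed pieces to finish and cancels the rest in flight.
    results.sort()
    cancelled = results[num_needed:]
    distant_share = sum(d for _, _, d in cancelled) / len(cancelled)
    return distant_share

# Distant nodes end up overrepresented among the cancelled transfers.
print(f"{simulate_upload():.0%} of cancelled transfers were 'distant' nodes")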
I get the most bandwidth usage from the europe-north sat, and saltlake after that… if that’s what you mean…
The satellites pick you for ingress; test data mainly comes from some satellites, though I’m sure there are some kinds of irrelevant test data on the other satellites’ networks as well…
Egress… well, you can only really get that if you are among the ones that got the ingress in the first place… I would assume it tries to reach everybody on the network… but that would also create a ton of unwanted traffic on the internet… I dunno the details of how it works…
Just think of egress as more random… if you have high success rates on downloads, a.k.a. egress, then you cannot really get much better… at least in theory…
Like BrightSilence said, all satellites work with all nodes. I’m in the US, and the highest amount of traffic this month from a single satellite is from Europe-North, with SaltLake second, like I’m sure most nodes are seeing.