Bandwidth utilization comparison thread

I think it’s a rough estimate; we’ll know more when v1.9.5 rolls out on Docker, as it will show the amount of data deleted per day (from my understanding).

well, we all have direct access to the data… it’s literally on our own systems :smiley: pretty easy to track… in fact, maybe i will…

hold my beer while i check my logs…

brilliant… apparently it tries to log, but nothing is logged… :smiley: just 5 weeks of empty files…

we have the number of deletes in the logs, but does it log the size of the piece that’s been deleted?
Would be interesting to throw together some code to analyze the logs. Might look into that once I’m done with my exams.
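A rough sketch of what that log analysis could look like, assuming delete operations show up as log lines containing the word "deleted" (the exact log format varies between storagenode versions, and the sample lines below are made up). Note that piece sizes aren't in these lines, so this can only count operations, not bytes:

```python
# Sketch: count delete operations in a storagenode log.
# Assumes deletes appear as lines containing "deleted"; the actual
# log format may differ between versions, and the sample lines
# below are invented for illustration.

def count_deletes(lines):
    """Count log lines that look like delete operations."""
    return sum(1 for line in lines if "deleted" in line.lower())

# Made-up example log lines:
sample = [
    '2020-08-01T10:00:00Z INFO piecestore deleted {"Piece ID": "abc"}',
    '2020-08-01T10:00:01Z INFO piecestore upload started',
    '2020-08-01T10:00:02Z INFO piecestore deleted {"Piece ID": "def"}',
]
print(count_deletes(sample))  # 2
```

In practice you would feed it the node's actual log file, e.g. `count_deletes(open("node.log"))` (file name is a placeholder).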

well, i tried to set up a script to make ZFS export its stats on the different stuff…
if it had worked correctly, i would have the total size of the dataset containing the blobs folder, and thus i could calculate the deletes from day to day…

but yeah… when the deleted-this-month graph comes to the dashboard, it will give a much more complete picture, so we can work with numbers we know are 100% real

It’s both very easy and very hard. The easy method is to take all ingress since the start of your node (or the last network wipe) and subtract the current stored amount. Divide that by the number of months, then divide by the average amount stored. There will be a mismatch though, because incomplete uploads still count as ingress. It’s an approximation based on my own nodes and that calculation method.

It’s near impossible right now to do better than that, because not only are deletes not graphed, there is also no deletion history in the storagenode DBs. Hopefully that will be part of an upcoming update.
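The estimate described above can be written out as a short calculation; all the numbers below are made-up placeholders, not real node stats:

```python
# Sketch of the estimate described above: average monthly deletes as a
# percentage of the average amount stored. All numbers are hypothetical.

total_ingress_tb = 15.0  # all ingress since node start / last network wipe
stored_tb = 13.0         # currently stored
months = 5

# Note: this slightly overcounts deletes, because incomplete uploads
# also count as ingress but never end up stored.
deleted_tb = total_ingress_tb - stored_tb      # 2.0 TB gone over the period
deleted_per_month_tb = deleted_tb / months     # 0.4 TB per month

# Crude average stored over the period: half the current amount,
# assuming roughly linear growth from zero.
avg_stored_tb = stored_tb / 2

pct_per_month = deleted_per_month_tb / avg_stored_tb * 100
print(f"~{pct_per_month:.1f}% of stored data deleted per month")
```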

well, in theory, can’t one just take the trash number…
i mean, sure, it’s a week before stuff is deleted from trash or whatever…
and i guess not everything goes through trash… but the stuff we want to worry about does, afaik… :smiley:

so we look at the trash number over time, and each day stuff will move into trash while other stuff moves out of trash…

or should… then our trash average over, say, 1 month would basically only be 1 day off in accuracy… or something like that… which should be almost irrelevant, since it’s out of 30 days…

ofc it’s nowhere near the granularity to actually see 2½%… but it should still be kinda accurate… at least that’s what i think… but i’ve been too lazy to actually try it… also, since my trash barely moves, i take it that it most likely doesn’t delete a lot…

oh, and to make it more accurate, we just use the trash number from the first day as our zero on the number line

but it’s most likely pointless work now… since we can just wait for the graph

Trash has nothing to do with deletes. Deletes don’t end up in trash. Nor do incomplete uploads end up in trash.

yeah i think people keep telling me that…
so it’s just for zombies?

It’s for data that wasn’t caught in any normal processes, so zombie segments, but also deletes that happened while your node was online. It’s just the backup in case the normal way left some garbage around. It won’t really tell us anything about normal deletes.

sounds familiar… i keep thinking deletes go in the trash… microsoft brainwash xD

i suppose, since most of the pieces are that 2MB size (at least like 70%, from the last data i saw), with some bigger and smaller… but i bet that kinda evens out… so one other way of tracking deletions is to just count them, multiply by 2 and call that megabytes… ofc it could be off by 30-40% in either direction… or even more… but that would be my gut shot…

since i cannot sort by file size, i can ignore it completely and calculate around it…

if i run my successrate script, it looks like this…

========== AUDIT ==============
Critically failed:     0
Critical Fail Rate:    0.000%
Recoverable failed:    0
Recoverable Fail Rate: 0.000%
Successful:            48508
Success Rate:          100.000%
========== DOWNLOAD ===========
Failed:                17
Fail Rate:             0.002%
Canceled:              14659
Cancel Rate:           1.733%
Successful:            831110
Success Rate:          98.265%
========== UPLOAD =============
Rejected:              0
Acceptance Rate:       100.000%
---------- accepted -----------
Failed:                63
Fail Rate:             0.003%
Canceled:              945628
Cancel Rate:           45.636%
Successful:            1126418
Success Rate:          54.361%
========== REPAIR DOWNLOAD ====
Failed:                0
Fail Rate:             0.000%
Canceled:              0
Cancel Rate:           0.000%
Successful:            191702
Success Rate:          100.000%
========== REPAIR UPLOAD ======
Failed:                3
Fail Rate:             0.001%
Canceled:              109195
Cancel Rate:           44.908%
Successful:            133955
Success Rate:          55.091%
========== DELETE =============
Failed:                0
Fail Rate:             0.000%
Successful:            287795
Success Rate:          100.000%

then i add the successful and cancelled uploads, because i expect i’m at 98-99% like most of the windows hosts seem to be on v1.9.5

that gives me 2,072,046 files uploaded, and then my deletions are 287,795 files

which comes out to about 13.9% of all the files uploaded since the last update being deleted…
dunno if that’s right… but it’s another number xD

so with 2TB ingress a month, and a deletion rate of 14%, that would give a max node size of 14TB

ofc now we have had an avg of 2.5TB, so the node max since the wipe… if the 14% is accurate to any meaningful degree, which it isn’t, because it’s only my node since the last patch… because that was easiest to check…

14% is 1/7th of 100%, so the 2.5TB would be ×7, giving 17.5TB max node size currently…

which is actually surprisingly close to what you said the largest nodes are at…
might hint that this method of determining the deletion ratio is semi-valid…

ofc it’s not perfectly accurate… but i doubt it’s too far off if done across multiple nodes with averages taken… also, it’s fairly easy to run…

doesn’t need a high deletion rate to really kill the max node size per /24 IP though…

That 14% is a completely different calculation. Forget about the number of operations vs size. You’re suddenly calculating deletes per ingress instead of deletes per amount stored. The rest of your post makes no sense as it builds on this mistake.

The largest nodes are around 16TB right now because there hasn’t been more ingress to fill them up more, not because they are done filling up. These nodes are still growing quickly on average.

Btw, if you want to use the number of deletes instead of the total size of deletes, you should be dividing it by the total number of pieces stored on your node, which could be a simple file count in the blobs folder. If you’re going to use an alternative method, you should do it right.
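A sketch of that alternative: count the piece files under the blobs folder and divide the delete count by it. The path and the delete count are placeholders here, not real values:

```python
# Sketch: deletes as a fraction of pieces stored, per the suggestion
# above. Counts every file under the blobs folder; the path below is
# a placeholder and should point at your node's storage directory.
import os

def count_pieces(blobs_path):
    """Count all piece files under the node's blobs directory."""
    total = 0
    for _dirpath, _dirnames, filenames in os.walk(blobs_path):
        total += len(filenames)
    return total

# Example usage (placeholder path and delete count):
# stored_pieces = count_pieces("/storage/blobs")
# deleted_pieces = 287_795
# print(f"deleted/stored: {deleted_pieces / stored_pieces:.1%}")
```

Walking a multi-terabyte blobs folder takes a while, but it only needs to run occasionally to anchor the ratio.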

i didn’t use a size factor because i figured the files would average out to about the same size… after all, there were millions…

i just calculated the ratio of deletes vs uploads over a certain period… which came out to 14%

it doesn’t need a size factor for that… yeah, it might not be accurate…
but it was the only sensible way i could come up with for checking it…

and then i simply took those 14% and applied them to the 2TB avg ingress (which you mentioned), and then multiplied by 7, because 14×7 = 98… almost 100%, close enough… so 7×2TB is 14TB max before deletions would exceed the ingress… over the last ~20-day period…

i suppose there may be longer deviations in the test data that cause many deletions over some periods and not others… and thus a 20-day period might give an inaccurate number…

i’ll expand it… i’ve got the data anyway… not much though… only like 5 weeks of full log data… kinda trashed them the last time i reinstalled the computer or migrated the storagenode to this pool :smiley:

pretty sure this should be a semi-viable way of getting a deletion ratio

Deleted Pieces / Uploaded Pieces = ratio of deleted to uploaded over the time period of the logs…

and then one simply needs to know how much is uploaded in that period, because we’ve got accurate numbers for that…
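That ratio, using the counts quoted earlier in this thread (treat the ingress figure as a placeholder for whatever your node actually received over the log period):

```python
# Sketch: deleted-to-uploaded ratio from log counts, then a rough
# size estimate using a known ingress figure for the same period.
# Piece counts are the ones quoted in this thread; the ingress
# figure is a placeholder.

deleted_pieces = 287_795
uploaded_pieces = 945_628 + 1_126_418   # cancelled + successful uploads

ratio = deleted_pieces / uploaded_pieces
print(f"deleted/uploaded: {ratio:.1%}")   # about 13.9%

# With a known ingress for the same log period, convert to bytes:
ingress_tb = 2.5                          # placeholder monthly ingress
deleted_tb = ingress_tb * ratio
print(f"estimated deletes: {deleted_tb:.2f} TB")
```

Note this still measures deletes per ingress, not deletes per amount stored, which is the distinction raised above.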

GREAT, and now ofc, because it’s the 1st, the dashboard basically has no useful statistics…

and not even the f’ing screenshots in this thread can tell me… sigh

i really need to set up some proper monitoring of this stuff…

i’ll just grab my total then: 13TB in 5 months… so 2.6TB ingress a month… and then the max would be 7 times that… so like 18TB max

the only issue is that the 14% number is from one node and only from a 20-day period…

Please don’t put words in my mouth. My calculations say 5% of what you have stored is deleted on average per month. You calculated deletes as a percentage of INGRESS instead of STORAGE, which is nowhere near the same thing. But sure… assuming you had around 2.5TB ingress this month as well (including repair), that would be 14% of 2.5TB = about 350GB worth of deletes. Compare that to how much you have stored and you have a better estimate of the % that is deleted. I’m guessing that’s actually quite a bit lower than 5%.

well, i had 280k pieces deleted in the entire log… and i think that’s like 20-30 days’ worth of log…

so if i assume a 2MB avg per piece, then that’s 560GB deleted, and that would then be 1/7th of ingress… so yeah, not super accurate :smiley:

but yeah, you are right… the ingress is only recent and the deletes are from the entire storagenode… oh, that makes sense then… this is the ratio at which it equalizes: if ingress = deletes, it obviously won’t grow… so per my logs, i’m still gaining at 86% of ingress

sorry, got turned around somewhere doing the numbers lol…

i guess this ratio isn’t that useful then… when it’s tricky to convert… ofc, if one has the exact ingress for the log period, then it should, at least in theory, give highly accurate numbers given enough data points…

but yeah, you are right… it’s not the same number… but given a couple of other variables, it should be possible to use it to calculate the same number you calculated… just the other way around…

it is kind of an interesting ratio though… like a scale of how fast the storagenode is growing at present…

storagenode speedometer xD

and the closer it gets to 1/1, the slower it will be growing…
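A minimal sketch of that "speedometer", using the piece counts from earlier in the thread:

```python
# Sketch of the "storagenode speedometer": how much of the ingress
# survives as net growth. A deleted/uploaded ratio of 1.0 means
# deletes cancel out ingress and the node stops growing.

def growth_fraction(deleted_pieces, uploaded_pieces):
    """Fraction of ingress that is net growth: 1 - deleted/uploaded."""
    return 1 - deleted_pieces / uploaded_pieces

# Counts quoted earlier in the thread:
print(f"{growth_fraction(287_795, 2_072_046):.0%} of ingress is net growth")
```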

01Aug2020:
Node 1: (dashboard screenshot not captured)

Node 2: (dashboard screenshot not captured)

2020-08-01

Fiber node:
TBA

Coax node (post migration): (dashboard screenshot not captured)

kinda high ingress for repair compared to last month; wonder if something’s up there
