Trash comparison thread

Is that design public? I don’t see it here: storj/docs/blueprints at main · storj/storj · GitHub

It has not been shared yet; we are still in the phase of internal discussion around it.
As soon as its priority is determined, it will be published.

2 Likes

11 month old node: (screenshot)

3 month old node: (screenshot)

Speedtest: (screenshot)

1 Like

from what i have been able to tell, trash doesn’t seem to change too much…
granted i’ve had anywhere from 2GB to 15GB… but still, it seems very much like it is somehow connected to the node in some way…

oh yeah, and i just had to have the ridiculously absurd Disk Space Used spike…
from 200 to 400 TB*h in a day… :smiley:
only 8.3TB more one day than the other… lol, with no ingress

seems legit…

That is correct and kind of expected.

Now you are applying math that is not correct. 400 TBh is just 400 TBh. Who gives you the guarantee that it reflects 24 hours? That part of your calculation is not correct.

It still isn’t. Stefan already explained where this trash is coming from. We hadn’t executed the zombie segment reaper on 2 satellites for a long time. These zombie segments finally got cleaned up, and that is what you have seen on your node. Other nodes, like one of mine, have been full or didn’t get many zombie segments for other reasons. The only problem here is that these 2 satellites have been paying storage nodes for zombie segments. They could have cleaned them up earlier and cut down the costs.

Edit: One addition to the TBh calculation. Let’s say a satellite needs 23 hours for one tally execution. Tally is the job that calculates the used space. On the storage node side you get one datapoint every 24 hours. Most of the time that shouldn’t be a problem, but every 24 days there is one day where the previous run finishes before 1 am and the second execution finishes on the same day. The TBh result from 46 hours will show up as 24 hours on the storage node side. The additional 22 hours are technically missing in the first 23 days. → For an accurate calculation on the storage node side you have to take the average over 2 or, better, 3 days. In most cases, except for my extreme example, this should work.
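To put rough, purely illustrative numbers on that extreme example: if a normal 23-hour tally reports about 200 TBh for a node, the day with two completed runs holds roughly 46 hours of data, about 400 TBh, so read as a single 24-hour day it looks like the stored amount doubled (400 / 24 ≈ 16.7 TB instead of the real ~8.3 TB). Averaged over that day and its two neighbours, (200 + 400 + 200) TBh / 72 h ≈ 11.1 TB, the spike is already much smaller.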

2 Likes

i really still cannot say anything definite in regards to the trash thing… i don’t understand it… i can just state what i see, which is that some nodes seem affected by high trash-to-stored ratios while others aren’t… i’m sure you guys are working hard to fix stuff like that…

and in regard to the TBh graph of the storage space used this month…
i don’t dispute the totals, it just makes the graph look ridiculous… and also basically useless for any practical purpose for SNOs…

i think i speak for everybody when i say: if the fix is changing a variable for how often the graph updates or whatever, then please get some intern to go change that…

all we want is a graph that’s fairly stable…

you looked at it from the satellite perspective, now let’s look at it from the storagenode perspective… a storagenode gets between 10-300GB ingress a day, and might get some large deletions from time to time, but how often is that beyond 1TB in either ingress or egress… presently i’m betting near zero…
so on a 5TB node it would in the most extreme cases take 5-15 days to fill and about the same to empty…
so yeah, just take the avg of x number of days… i would say 5 days to a week might be fine… and plot that on the graph… that would instantly make it stable as the firmament…

but really what i personally would do would be to make it a variable in a sort of configuration somewhere… just like all the other graphs should be… might not be something SNOs need access to, but it should be how it’s done… then it’s much easier to adjust it later…
but then again it might be easy to fix, yet it can still be difficult enough if one doesn’t have the time…

ofc even at 5 days… the spike i saw recently would still be 1/8th or so of the avg…
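for what it’s worth, a rough sketch of that kind of smoothing, assuming nothing more than the daily TB*h datapoints the dashboard already collects (the function and variable names here are made up, this is not the actual dashboard code):

```go
package main

import "fmt"

// smoothTBh turns daily "disk space used" datapoints (in TB*h) into a moving
// average in TB over the given window, so a single oversized tally datapoint
// no longer shows up as a spike on the graph.
func smoothTBh(dailyTBh []float64, windowDays int) []float64 {
	out := make([]float64, 0, len(dailyTBh))
	for i := range dailyTBh {
		start := i - windowDays + 1
		if start < 0 {
			start = 0
		}
		var sum float64
		for _, v := range dailyTBh[start : i+1] {
			sum += v
		}
		hours := 24 * float64(i+1-start)
		out = append(out, sum/hours)
	}
	return out
}

func main() {
	// made-up daily values; day 3 is the kind of double-tally spike discussed above
	daily := []float64{200, 200, 400, 200, 200}
	fmt.Println(smoothTBh(daily, 5)) // 5-day window
}
```

the window length being exactly the kind of variable that could sit in a configuration file somewhere…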

1 Like

I don’t speak for everybody. I can only speak for myself. I am fully aware where the data for the graph is coming from and how accurate it is. I don’t care how stable the graph is. In fact, I know how I could get accurate results out of it. I believe I could even change the code to make that happen. I just don’t care, because it burns time that I could spend on finding ways to optimize my node and increase my payout. Until then I will just live with the given graph and work on any bug or optimization that has an impact on my payout.

I could try to explain it.

If you don’t understand it, why do you claim that? In that case, I don’t think I want to explain it. I have the feeling you only want to blame someone.

If you need more information about zombie segments I am happy to explain it. I don’t see the point of uploading more and more screenshots that only prove that the root cause is indeed zombie segments, especially on the 2 satellites that are exactly the ones that didn’t execute the zombie segment reaper for a long time.

yeah, i’m just looking for someone to blame for some nodes, like my own, being favored over others… :smiley:

and i really don’t think this is claiming to know how anything works…

The only open question I would still have is this: My node has been online and with available space since March 2019. It should have received as many zombie segments as any other node. Yet it doesn’t have more than 12GB. Even if you add the trash on my 2 smaller nodes (which I probably should add), it’s less than 18GB. There are people reporting 10-20x that amount. That’s a difference I can’t explain based on the information provided. Can you? Have I overlooked something?

2 Likes

There are several reasons why zombie segments are not evenly distributed. For example, I could create zombie segments myself and it would mostly affect storage nodes that are close to me. Or we play the same game the other way around: I try not to create zombie segments, but from time to time an upload might fail. This time it would mostly affect storage nodes that are far away from me. There are several ways zombie segments can happen and none of them will be evenly distributed. Some storage nodes will collect more zombie segments than others.

1 Like

Clearly written, so that even I can understand it!
Thanks!

Is it possible that a node’s hardware (say… USB3 disks) has something to do with getting more zombie segments? Since the disks are slower to respond, the node might take longer to store the segment; the uploader might then cancel the slowest nodes, creating zombie segments on my node.

1 Like

Yes, that is possible, but it wouldn’t change much. As explained, there are several ways to produce zombie segments. You could try to avoid one case, but that doesn’t mean it won’t happen again. Next time it might be another combination and you get zombie segments again. You might even get more zombie segments than with the old setup.

Long tail cancelation doesn’t produce zombie segments by itself. For a zombie segment you have to win the race; the uplink has to tell the satellite that you have stored the piece, and for some reason it becomes a zombie segment on the satellite later.
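To make that concrete, here is a minimal, hypothetical sketch (plain Go; the types and function names are made up and this is not the real satellite code) of that second part: the pieces are stored, the segment is committed, but the object is never finished, so the segment sits on the satellite as a zombie until the reaper removes it.

```go
// Hypothetical, simplified model: not the real uplink/satellite API, just an
// illustration of where zombie segments come from.
package main

import "fmt"

// segment is a committed segment: the pieces are already stored on the nodes
// that won the long-tail race, and the uplink has reported them to the satellite.
type segment struct {
	object string
	index  int
}

// satellite stands in for the satellite metadata: committed segments plus the
// objects that were actually finished.
type satellite struct {
	segments []segment
	objects  map[string]bool
}

// commitSegment: the uplink tells the satellite which pieces exist for a segment.
func (s *satellite) commitSegment(object string, index int) {
	s.segments = append(s.segments, segment{object: object, index: index})
}

// commitObject: the uplink finalizes the object. If this step never happens
// (client crash, abandoned multipart upload, ...), the committed segments
// belong to no finished object: they are zombie segments until the zombie
// segment reaper deletes them and the pieces end up in trash on the nodes.
func (s *satellite) commitObject(object string) {
	if s.objects == nil {
		s.objects = make(map[string]bool)
	}
	s.objects[object] = true
}

// zombieSegments lists committed segments whose object was never finished.
func (s *satellite) zombieSegments() []segment {
	var zombies []segment
	for _, seg := range s.segments {
		if !s.objects[seg.object] {
			zombies = append(zombies, seg)
		}
	}
	return zombies
}

func main() {
	var sat satellite
	sat.commitSegment("video.mp4", 0)
	sat.commitSegment("video.mp4", 1)
	// commitObject("video.mp4") is never called: the upload was abandoned.
	fmt.Println("zombie segments:", sat.zombieSegments())
}
```

A node that lost the long-tail race stores nothing and can’t end up in this situation, which is why slower hardware on its own doesn’t produce zombie segments.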

ah ok, Thanks for explaining.

I might ask stupid questions sometimes.
Trying to learn as i go… which goes slow :stuck_out_tongue:

I don’t think my node will ever fill. It was at 1.65TB a week ago; today it is 1.54TB. Regression.

We’re in a low ingress period because Storj stopped doing tests; it’s basically just customer traffic. From what I understand, testing will resume soon (please correct me if I’m wrong).
