How many 10 Gbit internet lines do you need to keep 1 PB+ of data with the new pattern? Do the math.
Luckily for me I'm a network engineer, so that's the least of the problems, but this data is artificial, so upgrading to 100 gig is out of the question at the moment. I do need to redesign the storage, though. I knew nothing about storage when I first started.
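For what it's worth, here is a rough back-of-envelope sketch in Go. The numbers are my own assumptions, not official figures: ~1 PB kept on disk and the roughly half-month dominant TTL mentioned later in this thread.

```go
package main

import "fmt"

func main() {
	// Back-of-envelope: bandwidth needed to hold ~1 PB at steady state
	// when data arrives with a TTL and is continuously replaced.
	// Assumptions (illustrative only): 1 PB stored, ~15-day dominant TTL.
	const (
		storedTB  = 1000.0 // ~1 PB kept on disk
		ttlDays   = 15.0   // assumed dominant TTL
		secPerDay = 86400.0
	)
	// To keep storedTB full, the same amount must be re-uploaded every ttlDays.
	ingestTBPerDay := storedTB / ttlDays
	gbitPerSec := ingestTBPerDay * 8e12 / secPerDay / 1e9
	fmt.Printf("sustained ingress ~ %.1f TB/day ~ %.1f Gbit/s\n", ingestTBPerDay, gbitPerSec)
	// ~ 66.7 TB/day ~ 6.2 Gbit/s, so a single 10 Gbit line covers the
	// steady state, ignoring peaks, overhead, and repair/egress traffic.
}
```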
I saw when you posted hitting 2 Gbps ingress earlier. With the size of your setup… and kinda averaging/estimating what traffic could be… do you have a feel for how fast you may have to expand? One new 20 TB HDD per month? Two?
I dream of having the first-world problem of needing to buy a new HDD for Storj every month.
I averaged 4-5 Gbit, mostly because my nodes are capped at 50 Mbit. Also my SAS controllers can only do 6 Gb/s, and spinning disks are not known for being fast, yeah?
I'm with @Roxor on this one: I'm sitting on the side until my nodes get filled. After they get filled, I'll start adding drives at a rate I'm financially comfortable with. When those get filled, I'll add a 2nd internet line.
Rinse, repeat, popcorn.
Which other projects are comparable?
I'm aware of Chia and Filecoin, but Chia is like $0.12 per TB and you need pretty beefy hardware to plot, and Filecoin has high hardware requirements and investment.
Everyone is a little stressed out. Let's take a step back and evaluate things. Storj isn't asking anyone to buy anything right now. We did ask that if you had extra capacity that you could bring online temporarily, that would be helpful to know. Otherwise, continue working with what you have.
If some of the customers sign contracts and begin heavy use of Storj services, we will discuss what a base node configuration looks like and provide information at that time. There is ongoing discussion both here and internally on what that should look like. I think everyone would agree that with increased usage, system requirements will change, but we don't want to put the cart before the horse. Sometimes sales deals get very close and then, for one reason or another, they fall through. So, let's maintain what we have, but obviously work on adjusting crashing nodes and unworkable configurations now while testing is happening.
We will continue to relay needed information as milestones are met.
Nothing has changed. Good old protato will still work. No enterprise gear needed. In fact, I am far from enterprise gear myself. The only device I might call enterprise is a Netgate 3100 router.
Everything else in my setup is the cheapest hardware you can think of, and it still works. I am not sure if my router will be able to deal with 500 MBit/s, but if we really get to that point, I would be looking at a $240 payout. So, not a big deal.
I am trying to explain to you that this artificial load is simulating some existing use cases. So the moment these deals get signed, this artificial load will turn into real load. There are some aspects that our artificial load doesn't match, but it should be close enough.
First of all, HDDs don't age that way.
Second, what extreme load are you talking about? The highest I have seen was 500 MBit/s, and if that goes to a single disk, it will be full in a short time anyway. Just by the numbers, there is no extreme load on the disk.
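To put numbers on that (a 20 TB disk is assumed purely for illustration):

```go
package main

import "fmt"

func main() {
	// Sanity check for the "extreme load" point above: if the full
	// 500 MBit/s peak landed on one disk, how long until it is full?
	const (
		peakMbit = 500.0
		diskTB   = 20.0 // assumed disk size
	)
	bytesPerSec := peakMbit * 1e6 / 8           // ~ 62.5 MB/s
	days := diskTB * 1e12 / bytesPerSec / 86400 // seconds -> days
	fmt.Printf("%.1f MB/s fills a %.0f TB disk in ~ %.1f days\n", bytesPerSec/1e6, diskTB, days)
	// ~ 3.7 days, well within what a single HDD can sustain sequentially.
}
```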
We don't have control over that. If these customers want to upload with a TTL, then a TTL it is. You can either adapt to it or argue with me about something that I can't change anyway.
So you are saying I should tell our sales team not to sign any deals? I see the bipolar disorder on your side. (Just to give back the compliment.)
You don't have to trust me at all. Please do not trust me. I am claiming that with all the code changes we have 6 times more throughput. Don't trust me on that. You can verify that on your end. I am claiming that these customers will upload with a TTL. Don't trust me on that. Verify it on your end. Depending on how skeptical you are, you can wait for these customers to sign deals and start uploading. If my claims are correct, you will see a similar upload pattern. Don't trust me on that. You can verify that on your end.
Still don't know what my attitude has to do with that. What is that going to change?
Thanks for the information @Knowledge
No one questions this. You and your team have literally revolutionized the performance of every single node. A step forward not seen in the 5 years since the launch of my first node.
Consider the alternative. Would you really prefer only PR-cleansed communications? Yeah, the way communication happens on this forum is fairly unique, because most companies would stick to PR statements reviewed by 5 departments and stripped of any substantial information.
But look at all the information provided to us in this topic. We get a heads up about testing, explanations about testing patterns and how they resemble prospective customers' patterns, info about changes made for different tests and the results of those tests. This is a "careful what you wish for" situation. If people here are going to demand PR-like statements, do you think we would know what we know at this point? Do you think it will make your setup planning any easier?
I for one highly appreciate the way @littleskunk has communicated despite the hours he's clearly been putting in. He's taking time to keep us well informed. And I will gladly take the personalities as they come, even if it's a little blunt at times.
Let's not do this though. I know you're responding "in kind", but there's a big difference between saying it about a company and saying it about a person. Even though neither has any place on a public forum, there was no need to escalate this.
That is still happening in the background. The uploads we are simulating on SLC are on top of whatever you get from the other satellites. It is not a replacement. What is going to change is that this additional load will one day shift from SLC to US1, but it would still be on top of whatever US1 is already uploading without a TTL.
If you have some kind of traffic constraint, you could limit your node size before hitting your traffic limit. I understand that you would like to use most of that traffic for data without a TTL, but if I look at the upload rates, this is kind of useless. Any idea I can come up with would at least reduce your growth rate on US1 to 0, so you might as well limit the size of your entire node and end up with almost the same outcome.
If you are able to accept it, the best growth rate you can get is with a high success rate. So you could say the TTL uploads work like a magnet: the more TTL data you get later from US1, the more non-TTL data you also get. With our current choiceof2 factor you can reach a maximum of 2 times higher growth rate of non-TTL data. Next deployment we might increase that to choiceof4, which would give you a 4 times higher growth rate of non-TTL data. So there are some advantages to accepting the TTL data.
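A minimal toy sketch of the "choice of n" idea (my own illustration under assumed scoring, not the actual satellite code):

```go
package main

import (
	"fmt"
	"math/rand"
)

// Node is a toy stand-in for a storage node with a tracked success rate.
type Node struct {
	ID      int
	Success float64 // fraction of recent uploads that succeeded
}

// chooseOfN sketches "choice of n" selection: draw n random candidates and
// keep the one with the best tracked success rate. With n=2 a reliable node
// can be picked up to ~2x as often as a uniform pick; with n=4, up to ~4x.
func chooseOfN(nodes []Node, n int) Node {
	best := nodes[rand.Intn(len(nodes))]
	for i := 1; i < n; i++ {
		c := nodes[rand.Intn(len(nodes))]
		if c.Success > best.Success {
			best = c
		}
	}
	return best
}

func main() {
	nodes := []Node{{1, 0.99}, {2, 0.95}, {3, 0.80}, {4, 0.50}}
	counts := map[int]int{}
	for i := 0; i < 100000; i++ {
		counts[chooseOfN(nodes, 2).ID]++
	}
	fmt.Println(counts) // node 1 ends up close to double a uniform share
}
```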
I need to be a bit careful with this. To my knowledge it will be a constant flow of data. Whatever the TTL is, it will get replaced by another upload with a similar TTL. There is also not a single TTL for all the data: there will be some data with a shorter TTL and some data with a longer TTL. Our current uploads should match the TTL that we expect to be most dominant.
The flow of data will vary over the day. In the past few days we tested the peak load that we need to maintain for a few hours per day. There will be hours with a much lower upload rate.
I think that is all I can say for now. Your own node should give you some of the answers you are looking for.
Thank you, I understand and appreciate your efforts.
What I think that says (and perhaps not everyone is understanding this) is that on the 1st of the month some data will be uploaded (let's say with a 30-day TTL). This data will be deleted on the 1st of the next month (let's go with that). On the 2nd day of the month, some more data will be uploaded, which will be deleted on the 2nd of the next month, and so on.
I get the feeling that some people think the data will come as one huge burst on the 1st of the month, then quiet down, stay for a month, and the cycle repeats.
Assuming again (I'm not in @littleskunk's mind) that there will indeed be a constant "rotation of data", it doesn't matter if the data is TTL or not. Even if the TTL is less than a month, you still get paid proportionally anyway.
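A tiny simulation of that rolling pattern, with made-up numbers (1 TB/day, 30-day TTL), just to show that stored data plateaus rather than cycling in bursts:

```go
package main

import "fmt"

// Each day's upload carries a 30-day TTL. Stored data ramps up for 30 days,
// then plateaus, because every new day's upload is matched by the expiry of
// the batch uploaded 30 days earlier.
func main() {
	const (
		dailyTB = 1.0
		ttlDays = 30
	)
	stored := 0.0
	for day := 1; day <= 90; day++ {
		stored += dailyTB
		if day > ttlDays {
			stored -= dailyTB // the batch from ttlDays ago expires
		}
		if day%30 == 0 {
			fmt.Printf("day %2d: %.0f TB stored\n", day, stored)
		}
	}
	// day 30: 30 TB, day 60: 30 TB, day 90: 30 TB
	// i.e. steady state ~ upload rate x TTL, not a monthly burst-and-drain.
}
```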
I think TTL is even better, as it will be deleted without sitting in trash for 7 days, so space will free up almost instantly and you are ready to get more data again.
Today it is much worse: the client deletes, then we wait several days for the bloom filter, then 7 days in trash. In total that is 10-14 days where you are not paid for the data and can't receive paid data in its place.
I vote for "cabbage nodes". You know, layers…
lol. Or lettuce, as in: all fluff, no substance.
Yes, I fully agree. It took a long time to get there. Why wasn't it done earlier? Because there was no need for it. Well, to some extent there was, but the current improvements go much further than what we would have implemented, let's say, a year earlier.
Something we also learned in the last few tests is that there is a difference between maximum throughput and fast uploads. We can offer both at the same time, just not for the same upload. So in the future there might be one customer that needs fast uploads but limited throughput. Bestofn will dial in on the fastest nodes and ignore almost all slow nodes. Good for performance, but only as long as you don't fill the fast nodes to the last byte. So for maximum throughput it is important to stretch out the resources of the fast nodes and try to max out the resources of the slow nodes. Ideally, let all nodes participate. Choiceofn does that.
Also, the bitshift success tracker has some incredible thought process behind it. The earlier percentage-based success tracker would be better at finding and remembering the fast nodes, while the bitshift tracker is kind of dumb and forgets about the performance of the nodes really fast. It works more like TCP congestion control. Oh, there is a new node. Let me try to select it a few times. Oops, it missed an upload. Let me dial down a bit. I can't remember what this node can handle, how about I scale up the request rate. Oh, there was another miss, I will dial down again. This dumb behavior turns out to be incredibly effective and reacts quickly to changes. If I start a Netflix stream, the bitshift success tracker will notice it and dial down quickly. If I stop the Netflix stream, it will intentionally forget about my history and increase the upload rate based on the resources I have available in that moment.
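I don't know the exact implementation, but here is a toy sketch of what a "shift successes in, forget history on misses" tracker could look like; the shift widths and scoring are my own assumptions, not Storj code:

```go
package main

import (
	"fmt"
	"math/bits"
)

// tracker keeps a small history word per node. Each success shifts a 1 in;
// each miss drops two history bits, so a run of misses clears the history
// much faster than it was built - similar in spirit to TCP congestion control.
type tracker struct {
	history uint32
}

func (t *tracker) Success() { t.history = t.history<<1 | 1 }
func (t *tracker) Failure() { t.history >>= 2 } // dial down, forget old wins
func (t *tracker) Score() int {
	return bits.OnesCount32(t.history) // more recent successes = higher score
}

func main() {
	var t tracker
	for i := 0; i < 10; i++ {
		t.Success()
	}
	fmt.Println("after 10 successes:", t.Score()) // 10
	t.Failure()
	t.Failure()
	fmt.Println("after 2 misses:    ", t.Score()) // 6: dialed down quickly
	for i := 0; i < 3; i++ {
		t.Success()
	}
	fmt.Println("after recovering:  ", t.Score()) // climbs back as uploads land
}
```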
So my conclusion is that it is only thanks to the current use cases that we are able to make these improvements. We have been talking about better node selection for a few months now, but all we would have done would have been a choiceof2 and call it a day. The current use cases are forcing us to question whether that is enough, and so we discover all the other improvements that we wouldn't have found otherwise. It is the right time for these improvements.
Oh, and the next challenge is already waiting for us. Watch the execution times of the file walker. My math is telling me that it might escalate in the next few weeks. We talked about possible mitigations, but it is hard to tell which solution works best without seeing the problem. So for now we can only wait until the file walker gets too expensive to run and needs improvements.
One more question: we see insane speed on our end, but what speed did you manage to get on the upload side? Max and average?
Correct. The TTL has an impact on how much bandwidth you need per $ of payout. So with my 250 MBit/s I can take an inflow of 81 TB per month, but if that data has a TTL of half a month, I will end up with just 40.5 TB on disk, and that payout, while still consuming 81 TB of bandwidth. (That is the currently dominant TTL, which you can see from your own node.)
There is an advantage. I would get this 40.5 TB (or 81 TB with a TTL of a month) in the first month after vetting. I don't have to wait. It is a short feedback loop, and mistakes like getting disqualified become less painful.
The downside is that I would keep these 40.5 / 81 TB only as long as the inflow stays the same. If these customers run away, the fun is over quickly. If you decide to add more nodes, they will also be subject to the shorter feedback loop and reduce my growth rate.
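For completeness, the arithmetic behind those figures (assuming a 30-day month and a half-month dominant TTL, as above):

```go
package main

import "fmt"

func main() {
	// 250 MBit/s of sustained ingress over a 30-day month.
	const (
		mbitPerSec = 250.0
		monthSec   = 30 * 86400.0
	)
	bytesPerSec := mbitPerSec * 1e6 / 8
	inflowTB := bytesPerSec * monthSec / 1e12
	fmt.Printf("monthly inflow: %.1f TB\n", inflowTB) // ~ 81 TB
	// With a ~15-day TTL, steady-state stored data is inflow rate x TTL:
	storedTB := bytesPerSec * (monthSec / 2) / 1e12
	fmt.Printf("steady state on disk: %.1f TB\n", storedTB) // ~ 40.5 TB
	// The same 81 TB of bandwidth is consumed either way; the TTL only
	// decides how much of it sits on disk (and is paid for) at any moment.
}
```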