Updates on Test Data

I’m sorry, I can’t say. I am internally pushing for somebody to write a blog post, and I don’t want to undermine that effort by leaking something. If the deals get signed and there is no blog post, I will answer that question. So please ask again later.

8 Likes

TL;DR all replies, but I believe we all got used to a specific pattern over the last 3-4 years, and now the sudden big inflow of data took us by surprise and we’re not sure what to do.
And this test inflow creates more uncertainty because it’s not a guarantee of anything; it just simulates a possible future. Yes, this future sounds very promising, and I believe there will be room for every level of node: low performance, medium and high. But it is just a possibility. I won’t give anyone advice on what they should do. I can only tell you what I will do.
I won’t do anything out of the ordinary; I already added new drives in January, because the first ones were full, and I won’t add any more until these are filled. So pretty much the same strategy as always. I won’t start small nodes to keep them ready, or buy new drives to keep on the shelf, just in case my nodes fill up overnight.
And Storj can’t be accused of giving wrong advice so far; they made recommendations based on historic usage. They can’t predict the future or the clients that will come. Now they have some promises of new usage patterns, so they recommend accordingly to let us prepare. Their behavior is pretty much on our side. So thumbs up from me!

5 Likes

Why not? I was doing that up until now. I have a few 500 GB nodes on standby in case one of my disks dies and needs to get replaced. I will migrate the 500 GB node and fill the new drive again.

1 Like

Laziness, I guess. :sweat_smile:
I don’t expect to be some kind of model for anyone.

My thoughts on this:

  1. We are talking about uploads here, but what about downloads? What is the expected GET traffic, assuming anybody knows and is allowed to say (it doesn’t need to be precise)? I’m sure that if my node needed to read data as well, it would be slower than right now (especially if the access pattern was such that it was not possible to cache the pieces).
  2. I sure hope the expected use case is not just upload-and-delete without ever reading the uploaded data; that would make it worse than the archive/backup use case, where the customer uploads some data and never accesses it, but the data stays for a long time.
  3. Based on the currently running test, the piece size is either 3328 or 249856 bytes (with about an even count of both), and I assume this is indicative of the expected usage pattern. The small pieces depend more on latency than the bigger ones, so a node may be winning races for the bigger pieces but losing them for the smaller ones, resulting in lower utilization (and thus network performance) because of that (see the sketch after this list).
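A rough back-of-the-envelope sketch of point 3, assuming one round trip of per-piece overhead and a 100 Mbps uplink (the latency and bandwidth numbers are mine, purely illustrative):

```python
# Back-of-the-envelope piece transfer time: one RTT of per-piece overhead plus payload time.
PIECE_SMALL = 3328       # bytes, as seen in the current test
PIECE_LARGE = 249856     # bytes, as seen in the current test

def transfer_ms(piece_bytes: int, rtt_ms: float, uplink_mbps: float) -> float:
    payload_ms = piece_bytes * 8 / (uplink_mbps * 1e6) * 1000
    return rtt_ms + payload_ms

for rtt in (10, 50):  # a "close" node vs a "far" node
    small = transfer_ms(PIECE_SMALL, rtt, 100)
    large = transfer_ms(PIECE_LARGE, rtt, 100)
    print(f"RTT {rtt:>2} ms: small piece ~{small:.1f} ms, large piece ~{large:.1f} ms")

# At 100 Mbps the small piece's payload takes ~0.3 ms, so latency alone decides the race;
# the large piece's payload takes ~20 ms, so bandwidth matters much more there.
```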

I vote for Hash Browns. You took the simplicity of a Potato node and messed with it: mostly to be fancy. Some may find it an improvement: but many just think it wasn’t cooked properly…

1 Like

@Knowledge
Am I the only one who sees a contradiction here? I bet 90% of SNOs don’t have enterprise-grade routers, but the person who ran these tests has one. That’s why he doesn’t understand the problems many SNOs had while the tests were running. My main concern (and the problem for my router) is the number of connections per node that we have now. I remember that in January-February we also had a decent amount of incoming data, but the connection count was much lower than it is now. Can you please check this part and optimize it if possible? Do we really need so many connections (as far as I can see, most of them carry no constant incoming data)? For many routers this will be a problem, and I guess many ISPs might also be unhappy about it. And please, people with 2 or 3 nodes, don’t write that you didn’t have any problems. I am talking about SNOs with 50+ nodes and 100+ TB of data.

P.S. Maybe the developers can also provide recommendations on which OS parameters we should change to optimize network performance.
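
Not speaking for the developers, but the kinds of knobs that usually matter when a Linux host (or a Linux-based router) has to juggle a lot of concurrent connections look something like this; the file name and values below are illustrative guesses, not official recommendations, and depend on your RAM and traffic:

```
# /etc/sysctl.d/99-storagenode.conf (illustrative values only; adjust to your hardware)
net.netfilter.nf_conntrack_max = 262144      # more room for tracked connections (NAT/firewall)
net.core.somaxconn = 4096                    # larger accept backlog for listening sockets
net.ipv4.tcp_max_syn_backlog = 4096          # survive bursts of new inbound connections
net.ipv4.ip_local_port_range = 1024 65535    # more ephemeral ports for outbound connections
net.ipv4.tcp_fin_timeout = 15                # recycle half-closed connections faster
net.core.rmem_max = 8388608                  # bigger socket receive buffers for fast links
net.core.wmem_max = 8388608                  # bigger socket send buffers for fast links
```

Apply with `sysctl --system` and keep an eye on memory usage afterwards; connection-tracking entries in particular cost RAM, and most consumer routers can’t be tuned this way at all.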

They are not testing downloads, so I guess there will be no significant egress, if any.

A 30-day TTL could be CCTV. :thinking:

4 Likes

Are there any SNOs that size with networking issues? They have the resources (and monthly payouts) for it not to be a problem. Many are in this thread posting monster throughput graphs!

3 Likes

Yes, I know many SNOs of “that size” who had problems during the tests.

How? 20 x 20TB x $1.5 = $600/month just in storage. A 60Gbps router is ~$7000.

1 Like

Does it even need to be a 60Gbps router? The maximum I can get is 1Gbps fiber, and that is not going to happen any time soon. Even with a 1Gbps line there is a cap on how much storage I would need, and that is close to your 20 x 20 TB example. So a good 1Gbps router should be enough for that amount of storage, or am I missing something?

2 Likes

I’m assuming “that size” refers to multiple uplinks to different ISPs, redundant power, the whole lot. A 3.3Gbps router is ~$600, and that can run 3 different 1Gbps uplinks. With two months’ payout I’d even get two and set them up in a redundant configuration.

1 Like

If you’re talking about @littleskunk’s gerrit link, this is about piece expiration, not GC. Yes, pieces were removed only 24 hours after they expired. If this patch gets merged, that delay will be configurable, with 1 hour as the default.

…and…

The Storj business model is based on the fact that SNOs take on the risk of unused or underperforming hardware. If you are a SNO, you have to accept this. Complaining is useless. If you don’t accept this fact, there’s no point in operating nodes.

Well, apparently there are people like that, which is why Storj still works.

SNOs started this, not Storj.

How’s the Storj regular salary? I do hope they pay you well; dealing with SNOs on the forums should warrant some better health benefits.

I think nobody knew, even Storj. It’s not possible to predict upfront what kind of customers a startup will attract.

This was actually a somewhat recent discovery in social media.

The new drives offer a 1M-hour MTBF and only a 300 TB/year workload rate. WD’s definition: “Workload Rate is defined as the amount of user data transferred to or from the hard drive.”

This would mean the traffic you quote (500 Mbps) would use up the workload limit on writes alone in about 55 days.
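
Quick sanity check of that figure (my arithmetic, assuming the 500 Mbps is sustained writes to a single drive), which lands on roughly the same 55-ish days:

```python
# Rough check of the ~55 day figure, assuming 500 Mbps of sustained writes to one drive.
ingress_mbps = 500                                # quoted traffic, megabits per second
workload_limit_tb = 300                           # the drive's rated TB/year (reads + writes)

tb_per_day = ingress_mbps / 8 * 1e6 * 86400 / 1e12   # ~5.4 TB written per day
print(f"~{tb_per_day:.1f} TB/day -> rating used up in ~{workload_limit_tb / tb_per_day:.0f} days")
# prints: ~5.4 TB/day -> rating used up in ~56 days
```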

I’ve stated this before, but I do wish SNOs also got per-piece write revenue to balance this.

I don’t think this is how it works.

Rube Goldberg nodes?

Yeah, also works.

I’ll save this quote for later.

Yeah, this is funny, good observation. I suspect the Algorithm should have separate counters for small and big pieces. My nodes were usually winning way more big pieces than small pieces, being located somewhat outside of the customer-hot areas (so higher latency), but with decent bandwidth.

I am kinda close, for historical reasons: I used to operate in multiple locations, but had to bring my nodes back together after losing access. Though I’m already down to ~40 TB, hosted behind an ISP-provided potato router which couldn’t deal with the traffic.

Not with these piece sizes, I think.

2 Likes

I’m just a software engineer talking to another software engineer here. I am so happy my job doesn’t require me to directly talk to either vendors or customers.

5 Likes

Oh, that was between some meetings, and I didn’t explain it very well. Because the TTL data will stay on disk for an additional 24 hours, there is a time window in which garbage collection can move the pieces to trash instead of them being removed by the TTL delete. The fix will reduce that time window to the point that pieces should always be removed by the TTL delete and never garbage collected. That timing problem was causing a few pieces to leak into the trash folder.
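
Here is a toy sketch of that window (my simplification with made-up timestamps, not the actual storagenode code):

```python
# Toy model: a TTL piece expires at `expires_at` but is only deleted `retention_hours` later.
# A bloom filter built from a satellite snapshot taken after expiry no longer lists the piece,
# so if GC runs inside that window it moves the piece to trash before the TTL delete happens.
from datetime import datetime, timedelta

def piece_fate(expires_at, snapshot_at, gc_runs_at, retention_hours=24):
    ttl_delete_at = expires_at + timedelta(hours=retention_hours)
    if snapshot_at > expires_at and gc_runs_at < ttl_delete_at:
        return "moved to trash by GC"
    return "deleted by the TTL collector"

expiry = datetime(2024, 6, 1, 12, 0)
snapshot = expiry + timedelta(hours=2)
gc_run = expiry + timedelta(hours=18)

print(piece_fate(expiry, snapshot, gc_run))                     # 24h retention: trashed
print(piece_fate(expiry, snapshot, gc_run, retention_hours=1))  # 1h retention: TTL delete wins
```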

More problems like that will show up in the following weeks. We have never had such an amount of TTL data before.

4 Likes

Did ingress just get turned up to 11 again? Hooray! :heart_eyes:

This is interesting. I would assume this should not happen if the delay between database snapshot and sending bloom filters is bigger than 24 hours, which is what historically happened. Is this delay shorter now?

1 Like

Looks like it. I found log lines on my node that indicated about a 16-hour delay; I didn’t track it down any further. I also noticed that for some satellites I have 3 folders, indicating that the team has improved much more than just the bloom filter size.

1 Like

With the current ingress rates from SLC… if you did want to incubate a spare node that was only 500GB-1TB… I wonder if it would be best to remove SLC from its list of satellites before you ever start it? If auditing requires a node to have data from each satellite… then if you did nothing, a new 500GB node could fill from SLC in a day or two… stop accepting uploads… and then never be able to pass AP/US/EU audits?

Like maybe disable SLC from the start… let the AP/US/EU audits complete… then turn SLC on? I may be burning brainpower on something that can’t even become a problem…