Storj Network Growth Plan

Bryanm · June 25, 2024, 5:14pm

The Storj network is experiencing a pivotal transition from an innovative concept to a thriving platform, with paid storage growing at ~12%+ monthly and multiple 10+ PB prospects in active testing. This growth necessitates strategic expansion of our network capacity, performance, and geographic distribution.

To address these needs, we’re implementing changes to incentivize proactive capacity expansion. These modifications aim to increase our total node count, enhance node connectivity and performance, and optimize geographic distribution across the network.

This document outlines our Storj Network Growth Plan, detailing these strategic changes and seeking community feedback. Our goal is to build a robust network capable of meeting evolving customer demands while providing enhanced opportunities for node operators. Your input is crucial as we collaboratively shape the future of Storj.

Approach

Our strategy builds on historic success using synthetic data to encourage node growth and involves a multi-faceted approach. We’ll focus on regional capacity targets, a process for ongoing data upload and management, and targeted incentivisation to drive growth in areas where it’s needed most.

Important aspects:

Customers who have churned, been suspended, or become delinquent will have their data and accounts deleted on an ongoing basis under our internal policies and procedures. That data will no longer be held for node incentivisation purposes (as has been done in the past).
Synthetic data will continue to be uploaded with the SLC satellite only.
The amount of synthetic data maintained on the network will be triangulated per region by a combination of the following: 2-3 months worth of projected growth targets, 2-3 months of historical growth, and growth required to support the network in a new region with sufficient capacity and number of nodes to be performant and durable.
Since network performance has a material impact on success with customers, we have improved our upload algorithms to better incentivize fast nodes by placing them in direct competition with other nodes; nodes that lose the race have their upload canceled. We intend to eventually share this performance data with operators so they can better optimize their setups.
Synthetic data will use a time-to-live (TTL). The data in the network will be uploaded continuously and will be adjusted by changing the upload rate, the time the data is retained, the size of the data, etc. to better match expected customer onboarding and storage patterns.
Because synthetic data is continually uploaded and expired via TTL, it will continuously be deleted, and nodes will no longer be able to store synthetic data long term in an archive-like fashion. Only customer use cases will involve long-term storage. This means that our fastest nodes won’t just fill up and stay full with synthetic data, and that to be successful, nodes will need to stay performant and “win races” with other nodes to avoid long tail cancelation.
The switch to TTL synthetic data will also allow us to remove synthetic data without having to go through the garbage collection process, improving efficiency of the network as well as node economics.
To create “regional” incentives, we will upload synthetic data using specific placements (geographic or other) to encourage node growth based on sales projections, prospects, or company strategy.
We are updating the minimum requirements for a storage node to more accurately reflect the profile of a successful node on the network. This update will be based on actual customer use cases and usage patterns such that the aggregate network performance meets customer needs.
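The "race" described above can be pictured with a minimal sketch. This is an illustration only, not Storj's actual upload code: the `PieceUpload` type, the `run_race` function, the node names, the finish times, and the number of pieces needed are all hypothetical. More uploads are started than are needed, the first ones to finish are kept, and the slowest ones are canceled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PieceUpload:
    node_id: str
    finish_ms: float  # hypothetical time at which this node would finish storing its piece


def run_race(uploads, needed):
    """Keep the first `needed` uploads to finish and cancel the rest (the long tail)."""
    if needed > len(uploads):
        raise ValueError("not enough nodes to satisfy the upload")
    ranked = sorted(uploads, key=lambda u: u.finish_ms)
    winners = [u.node_id for u in ranked[:needed]]
    cancelled = [u.node_id for u in ranked[needed:]]
    return winners, cancelled


# Hypothetical example: five nodes compete, three pieces are needed.
uploads = [
    PieceUpload("node-a", 120.0),
    PieceUpload("node-b", 95.0),
    PieceUpload("node-c", 310.0),
    PieceUpload("node-d", 150.0),
    PieceUpload("node-e", 480.0),
]
winners, cancelled = run_race(uploads, needed=3)
print("stored:", winners)
print("cancelled:", cancelled)
```

In this toy model a node only keeps (and is paid for) a piece if it finishes among the winners, which is why staying performant matters more once synthetic data expires and has to be re-won continuously.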

Expected outcomes

Sufficient capacity and resources in place to handle anticipated growth in paid customer data
Nodes ready and incentivized according to appropriate customer use cases
Greater network performance across different regions
Increased participation and contribution from node operators in high-demand regions
Improved ability to meet the diverse needs of our global customer base

Capacity targets

As of June 25, 2024. Targeted amount of synthetic data.

Placement	Current	Target	Reasoning
Global	11 PB	20 PB	Large expected deals
U.S.	0 PB	5 PB	3 month growth target
Europe	0 PB	2 PB	3 month growth target
South America	0 PB	1 PB	Bootstrapping network
APAC	0 PB	1 PB	Bootstrapping network
India	0 PB	1 PB	Bootstrapping network
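For readers who want the totals, the table above can be summed with a short sketch. The figures are copied from the table; the dictionary layout and variable names are only for illustration.

```python
# Synthetic data per placement, in PB: (current, target), copied from the table above.
targets_pb = {
    "Global": (11, 20),
    "U.S.": (0, 5),
    "Europe": (0, 2),
    "South America": (0, 1),
    "APAC": (0, 1),
    "India": (0, 1),
}

total_current = sum(current for current, _ in targets_pb.values())
total_target = sum(target for _, target in targets_pb.values())
additional = total_target - total_current

print(f"current: {total_current} PB, target: {total_target} PB, to add: {additional} PB")
```

This gives 11 PB currently stored, a 30 PB combined target, and 19 PB still to be uploaded across all placements.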

Conclusion

By implementing this Storj Network Growth Plan, we aim to proactively address anticipated growth in paid customer data, ensure optimal network performance, and strengthen our position as a leading distributed storage solution provider. Our targeted approach to capacity allocation, combined with strategic incentivization and continuous monitoring, will enable us to meet the evolving needs of our customers and strengthen our partnerships with storage providers/node operators while fostering sustainable growth across the Storj network.

Please see the FAQs below, but let us know if there are additional questions, concerns, or feedback.

The Storj Team

FAQ

Does this apply to the commercial operator program?
- No. That network has other options available to incentivize growth, but might adopt its own form of synthetic load for testing and vetting purposes.
Won’t uploading test data by using IP based geolocation only incentivize operators to leverage VPNs to get the data rather than encouraging real growth in those regions?
- We won’t know until we try it, but yes, we believe that initially a lot of nodes will try this approach. Because data is constantly replaced, the nodes have to continue to win races against each other, with nodes actually located in the area having an advantage over VPN’d nodes. Over time the data should concentrate on those serving the area best which should be local nodes. We can add additional checks or adapt this strategy over time as we learn more.

Roxor · June 25, 2024, 5:37pm

Can’t wait to see node performance leaderboards. And hooray for anti-baked-potato spec changes! And more capacity reservation! Make it rain!


(Edit: no mention of your own ‘surge nodes’ to help maintain minimum levels of upload performance?)

Vadim · June 25, 2024, 5:51pm

I would be happy to get some feedback on the dashboard showing how successful a node is, some metrics. It could be a 1h average or something similar. Then we can tweak nodes to perform better.

BrightSilence · June 25, 2024, 5:58pm

I assume you are referring to the new node selection, but this sounds more like the old school long tail cancelation. You specifically mention long tail cancelation a little further down your post. I think it would be good to be more specific and mention that you won’t even be selected for an upload. This actually has benefits too. At the moment, I’m running a RAID repair as well as a large data migration along with the regular GC and other stuff. And the new node selection helps to not overload my system while it’s busy too.

Wouldn’t count on leaderboards. Storj has always been good about protecting SNO privacy. Most likely it will be an indicator of percentile or something like that to show on your dashboard to indicate performance compared to the average.

@Bryanm these changes sound excellent to me. And it’s exciting to see the network grow. I’m looking forward to seeing what happens in the upcoming months/years. Do you expect to also be doing more outreach to get new SNOs on board? So far it doesn’t really seem like the tests have brought many new people in and while expanding by existing SNOs helps, there is nothing better than getting more SNOs to join.

Roxor · June 25, 2024, 6:02pm

I vote for a new unit of performance measurement to display in the Node UI. The Relative Potato Index (RPI). One unit of “RPI” will be equivalent in performance to whatever gimped SBC config @littleskunk is planning to put together.

Satellites keep track of the data his node handles… and every 24h churns out RPI stats for all active nodes (beside the ‘Average Disk Space Used This Month’ stats we usually get today)

Finally… we’ll have reliable performance numbers SNOs can use to compare any nodes!

snorkel · June 25, 2024, 6:02pm

Maybe you will consider reducing the stress on nodes from that TTL synthetic data. I don’t see the point of stressing them more when they must receive customer data.
I am thinking as optimisation you should use bigger pieces, easier to be deleted. The number of pieces is the problem, not their volume.

Roxor · June 25, 2024, 6:04pm

Large pieces… and perhaps a 3-month TTL? It sounds like their targets are quarterly anyways. Less continuous data to replace.

littleskunk · June 25, 2024, 6:05pm

Hey don’t blame me for all the bad nodes. I did my homework and will put everything I have learned to the test.

littleskunk · June 25, 2024, 6:11pm

Yea we somehow forgot to add that. But to be fair the overall idea is a bit different this time. You should read this as a commitment. Up until now we had the risk that the deals might not get signed. Now even if they don’t get signed that wouldn’t reduce the amount of paid test data that much. I read this commitment more as a 3-month plan that reduces the risk of spinning up new nodes.

littleskunk · June 25, 2024, 6:21pm

The first part with the 20PB we are currently uploading isn’t going to change much. We expect a specific usage pattern and want to encourage the nodes to adapt to that. We can’t bend reality. We are not going to tell our future customers to please upload data with a longer TTL or bigger files. I mean we can but that would mean we don’t have to upload 20PB in the first place.

For some of the other buckets that are currently still 0PB in size we might use bigger files trying to match expected customer uploads as best as possible. We might also increase the TTL but since the timespan for this planning is 3 months I would argue that the highest possible TTL is 1.5 months so that we can still free up space fast enough the moment a bigger customer might sign up.
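The TTL trade-off can be made concrete with a rough model. This is a sketch under stated assumptions, not anything Storj has published: it assumes a constant upload rate, treats 1.5 months as 45 days, and uses hypothetical function names. Under a constant rate, the amount held at steady state equals the upload rate times the TTL, and once uploads stop, the stored amount drains linearly to zero over one TTL.

```python
def required_upload_rate(target_pb, ttl_days):
    """Constant upload rate (PB/day) needed to hold `target_pb` at steady state."""
    return target_pb / ttl_days


def remaining_after_stop(target_pb, ttl_days, days_since_stop):
    """Synthetic data (PB) still stored `days_since_stop` days after uploads stop.

    Assumes the data was uploaded at a constant rate, so it expires at that same rate.
    """
    fraction = max(0.0, 1.0 - days_since_stop / ttl_days)
    return target_pb * fraction


# Hypothetical example: a 1 PB regional target with a 45-day TTL.
rate_tb_per_day = required_upload_rate(1, 45) * 1000  # 1 PB = 1000 TB
print(f"upload rate: {rate_tb_per_day:.1f} TB/day")
print(f"left 30 days after stopping: {remaining_after_stop(1, 45, 30):.2f} PB")
```

With a longer TTL the same target needs a lower upload rate, but the space takes correspondingly longer to free up when a customer signs, which is the tension behind capping the TTL below the 3-month planning window.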

Vadim · June 25, 2024, 6:27pm

No one is blaming you. Potato nodes existed long before you, they just weren’t described.

zeromachine · June 25, 2024, 6:28pm

This is very encouraging.

I would make 1 suggestion as far as incentives go. Change the Held amount timing. Something I would like to see is this:

Months 1-6 = 50% held, 50% paid
Months 7+ = 100% paid
Month 12 = return 50% of held amount

That would increase my motivation to add capacity further.

Mitsos · June 25, 2024, 6:40pm

Does bootstrapping network mean that more satellites will be added in those regions?

Bryanm · June 25, 2024, 6:44pm

Yes we are working on a plan to try and bring new SNOs onboard.

snorkel · June 25, 2024, 7:39pm

That’s simple. Once the word is out we are getting paid a lot in a short time, they will come.
Just tweak the vetting period/process and the held amount.
But get ready for an exponential growth in forum posts asking for help.
Until now, SNOs fit the enthusiast/hobbyist type looking for some revenue along with the pleasure of tweaking and learning.
Now you will have a plethora of farmers seeking only quick income.

Roxor · June 25, 2024, 8:07pm

TTL data can still be manually deleted if you need the room: the trash system can clear up any amount of space in about 10 days, can’t it?

pangolin · June 25, 2024, 8:09pm

So 30 days TTL data with no egress at unchanged payment rates is the incentive? Or is there something I missed?

Roxor · June 25, 2024, 8:10pm

If I’m reading the table in the top-post right: the incentive is paying us for 30PB of data no customer needs yet. I feel motivated!

Roxor · June 25, 2024, 8:15pm

I think we have around 6000 nodes with free space? And many of them will fill (and not expand) while capacity-reservation data is going in? Sounds like an opportunity to me!

littleskunk · June 25, 2024, 8:16pm

20 PB * 1.875 (if we keep the current RS numbers) + 10 PB * 2.25 (assuming we use the default RS numbers)
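Worked through, the expression above gives the raw disk space the synthetic data would occupy on nodes once the Reed-Solomon expansion factor is applied. The sketch below just evaluates it. The two expansion factors are the ones quoted in the post; reading the 20 PB as the Global target and the 10 PB as the five regional targets combined is an assumption, and the variable names are illustrative.

```python
# Expansion factors quoted in the post (raw bytes stored per byte uploaded).
CURRENT_RS_EXPANSION = 1.875
DEFAULT_RS_EXPANSION = 2.25

global_pb = 20    # assumed to be the Global target
regional_pb = 10  # assumed to be the regional targets combined (5 + 2 + 1 + 1 + 1)

raw_global = global_pb * CURRENT_RS_EXPANSION
raw_regional = regional_pb * DEFAULT_RS_EXPANSION
raw_total = raw_global + raw_regional

print(f"{raw_global} PB + {raw_regional} PB = {raw_total} PB on disk")
```

So the 30 PB of synthetic data would correspond to roughly 60 PB of raw node capacity under these factors.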