Updates on Test Data

Hello Storj Community,

We are excited to announce some changes that we have begun to implement regarding how we are using test data on the network!

What is changing

We are currently seeing exponential growth in our paid customer data and expect that trend to continue based on our sales pipeline. Because of this, we want to make sure we are sending timely demand signals to the node community, so that the capacity and other resources our growing customer base needs are available when customers need them. As we experience this growth, we believe our two limiting factors will be capacity and IOPS. To secure resources and to test at scale, we have decided to begin uploading additional data to the SLC satellite, essentially “reserving capacity” for our customer growth a few months before we anticipate it will arrive. At times we may also use this method to test synthetic bandwidth load.

We have prospective customers with a number of different usage patterns and several of them have significant requirements for volume of writes and ingress throughput. Many of the synthetic load tests we’re conducting are designed to simulate and validate those customer use cases.

How will this be implemented

To accomplish this we plan to analyze our anticipated data growth and begin to continuously upload synthetic data, with the anticipated customer segment sizes, to the SLC satellite with a TTL. Because of the TTL, the data will expire and be removed on the node side without the need for garbage collection. In this way we are testing our upload throughput, reserving capacity, and automatically deleting data as it is being replaced by paid customer data.
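To illustrate the mechanism described above, here is a toy sketch (hypothetical Python, not the actual storagenode code) of how TTL-based expiry differs from garbage collection: each piece carries an expiration timestamp set at upload time, and the node drops expired pieces locally on its own, with no satellite-driven garbage-collection pass required.

```python
import time

class PieceStore:
    """Toy model of TTL-based piece expiry; names and structure are illustrative only."""

    def __init__(self):
        # piece_id -> (data, expires_at or None for no TTL)
        self.pieces = {}

    def put(self, piece_id, data, ttl_seconds=None):
        """Store a piece; a TTL records an absolute expiration time."""
        expires_at = time.time() + ttl_seconds if ttl_seconds else None
        self.pieces[piece_id] = (data, expires_at)

    def expire(self, now=None):
        """Locally remove expired pieces and return their ids.

        This is the key difference from garbage collection: no bloom
        filter from the satellite is needed, the node already knows
        when each TTL piece becomes deletable.
        """
        now = now if now is not None else time.time()
        dead = [pid for pid, (_, exp) in self.pieces.items()
                if exp is not None and exp <= now]
        for pid in dead:
            del self.pieces[pid]
        return dead

store = PieceStore()
store.put("synthetic-1", b"test data", ttl_seconds=60)   # synthetic load with TTL
store.put("customer-1", b"paid data")                    # no TTL: kept until deleted/GC'd
removed = store.expire(now=time.time() + 120)            # simulate 2 minutes later
```

After the simulated two minutes, only the synthetic TTL piece is removed; the paid-customer piece stays until it is explicitly deleted or collected.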

As previously mentioned when we announced the discontinuation of the free tier, we have also been removing old data from our production satellites, composed mostly of old abandoned accounts as well as free-tier accounts that were closed by users following the conversion to a free trial. We plan to increase the rate at which we are doing this in order to move all non-paid data to the SLC satellite and clean up the production satellites. As a consequence, nodes will see a short-term increase in the amount of garbage they are storing.

Other items of note

We hope that this process will enable us to scale up capacity as our growth requires. In addition, we may send out announcements asking for more nodes to be brought online in specific locations as needed.

We also anticipate the need to continually improve many aspects of the network and the storage node code and experience, including better scaling of our garbage collection processes and improving our IOPS efficiency (thanks to our community for getting involved and suggesting design docs). We are excited to continue working with all of you as we grow the network together.

Actions for node operators

We don’t expect this load to be temporary; it may in fact be the new normal. Please take steps to correct any issues your nodes have keeping up with this load, and as always, adding more nodes can help reduce the per-node load on the network. We will continue to be active here, assisting and gathering data to help inform further improvements.

Thank you for all your continued support and contribution to this endeavor.

The Storj Team

28 Likes

Hooray for strong sales! It feels like only yesterday that every TB was sold to customers at a loss. Finally time for Storj to make some money! :money_mouth_face:

I can understand IOPS being a concern: but I think you’ll be fine for capacity. I get a strong sense from the SNOs I see here that they’d be very happy to add new disks as old ones fill. Bring it on! :slight_smile:

4 Likes

So they want to upload data to us, as fast as possible?
That! That I can very much provide! :heart_eyes:

2 Likes

Somewhere out in the world @Th3Van just got excited… though he doesn’t yet know why…

4 Likes

Ssshhhhhh… he’ll hear you and buy another server rack…

3 Likes

Well that sure is a great reason for additional testing. Keep that data coming!

What kind of TTLs are being used for this testing?

3 Likes

Great news, we will be ready to store that data!

Not sure… did you check his nodes’ status? 100% full and 60% paid. Just full of garbage that he cannot delete (yet).

I like the idea of reserved space, but I think we are safe for now. There is a lot of free space on the network… I have 180TB available. :grin:

Reminder that there is an expansion factor, and the available space shown here still includes Storj Select nodes, too.

4 Likes

Can this be indicated in Network Statistics? What is Select data and what is “normal”? A drop down like there is for satellite would be nice.

1 Like

Why would that be of any use for storage node operators? You get paid the same. How are you planning to benefit from that information?

The same way I benefit from the rest of the info on that page. I like knowing. Also, Select is the certified nodes, right? So I wouldn’t get that data or be paid for it?

“Liking to know” non-actionable details, a.k.a. “idle curiosity,” is not a good justification for a feature request, with all due respect.

But I agree with you that most information there is just as useless: from the bandwidth plots, which misrepresent data and hide their axes, to the payout information, which is approximate at best.

Why do I, as node operator, need to know any of that, if I cannot act on that information?

The only half-useful bit is the “Online” status, which I’m getting from an external monitoring system anyway: if a node loses its internet connection, it won’t be able to tell me that it lost its internet connection, because it lost its internet connection. You need external monitoring anyway.

And so much disk IO is wasted on maintaining this gimmick — that’s what needs to be fixed.

The ideal dashboard will be: “all good” green circle OR red circle, with last 20 lines of logs in a searchable/filterable text field. With “download logs” button.

3 Likes

This is great news. I hope the storagenode code is prepared for that, and for nodes getting even bigger than today’s largest nodes.

Hopefully you are going to assign enough dev resources to it and stop wasting IOPS on the nodes for databases and filewalkers that run over and over, when the node would be better off serving customer files.

If you listen, there are many ideas for areas where to improve the experience:

Improvements on managing, monitoring and moving nodes might help

2 Likes

I’d like to add that testing on the test satellite is not representative of the entire network. Some of us did a GE (Graceful Exit) from it; some limited their egress to it. We will see the real power and problems of the network when real customers test the system with real data. Maybe give them a month or two of free traffic to run their own tests. I know you know what you have to do and don’t need us to tell you, but I’ll tell it anyway. :grin:

No way. Storj has just realized, with the garbage disaster, that the storagenode code is not as good as they thought and might not be well suited for current and future load and growing nodes. Of course, customers should not experience issues from that; that would be more disastrous than anything. So they need to test with the anticipated load patterns to fix things before customers notice them.

5 Likes

This is worrying me as well, but I don’t think your diagnosis is correct. Code quality is just a symptom. It’s always just a symptom—in all software projects. Focusing on code quality on its own does not create a working system.

1 Like

Well this is good to hear. I was getting worried since it looks like Storj has quite literally flatlined for the last month… just as I’m looking to expand to TX. Suppose I’ll wait to see what happens.

Didn’t we just get done telling everyone we have way too much capacity and too many nodes on the network, or am I just suffering from that Mandela effect thing again?

I doubt your limiting factor will be capacity, since SNOs are itching to add more. However, IOPS is a real issue. As SNOs grow, it’s not necessarily the disk IOPS that cause problems directly; it’s the limitations of the disk controllers. Add one too many drives to the same system and it will completely choke and lock up under high demand. It took me a while to track that issue down. For anyone trying to build larger, efficient systems: watch out for that, and use multiple controllers.

So you’re currently deleting a ton of old useless data just to replace it with more useless data? Why not just slowly delete the current useless data as needed, instead of throwing a ton of unnecessary write operations at our hardware?

5 Likes

Because they need to replicate the usage patterns of the upcoming customers they are talking with, to verify it will work for them and that the network will not experience the next disaster.

2 Likes