Helping customers to migrate may help

@sorendanielson @bre
Interesting idea. Maybe launch something similar for Storj DCS?

we announce our Cloud to Cloud Migration program—an offer to pay the transfer costs for customers that want to migrate their data from Amazon S3 to Backblaze B2 Cloud Storage

Thank you. Nice find. We’ll talk about it.

Maybe the first ones to be offered support for switching from AWS S3 to Storj could be this project: Just heard of Opacity the first time - #2 by jammerdan, as they might currently be locked in to AWS S3.

As a crypto project, they might even be willing to accept Storj token for compensation.

Together with the promise from your pricing page

If you discover that Storj DCS isn’t a fit for your project or application and you transfer your data to another service, use our support portal to submit a ticket and let us know. As long as you follow the process, we won’t charge you for that egress bandwidth.

it should be a killer argument to convince customers to at least try out whether Storj would be a better fit for them than AWS S3.

Another idea to consider: bulk imports and exports with either drives or multi-disk enclosures. This might be pretty simple since Storj already has a local S3 gateway. A modified version could, instead of contacting the real Storj gateway, store the data on a local hard drive in a form that can be imported later. Then the drive(s) are shipped to Storj and uploaded at high speed. Bulk exports could work in a similar way.

The way Backblaze does this is they charge $100 for a big USB drive up front, export your data, and mail it to you. If you send the drive back, they refund you, or you can keep the drive and they keep the $100. They don’t allow importing drives (I think they should!), but they have a Fireball product that’s a multi-disk NAS. It costs $550 to rent it for 30 days: they ship it to you, you load it up, then ship it back. It’s something like 50TB.

S3 allows bulk imports / exports with their Snowball product.

I agree and we had a discussion about such an idea some time ago:

Previously the Storj product was called Tardigrade. I still believe that if you want hundreds of terabytes of data from a customer, there should be a better solution than making the customer upload it on their own. This is what Wasabi is saying:

While a 10 Gbps Internet link is pretty fast, 10 petabytes of data is a massive (but not uncommon) amount of data. Let’s say you’re fortunate enough to have access to a 10 Gbps connection and want to move 10 petabytes from your on-premises data center to Wasabi. It’s going to take you over a year to get all that data uploaded.

And here is what they do:

We’re shipping them five high-capacity Wasabi Balls, each capable of transporting 87 terabytes of data. Because each appliance connects directly to their on-premises storage with its own 10 Gbps fiber link, they’re not clogging up their Internet bandwidth. It takes a little bit over a day to fill a Wasabi Ball and all five appliances can be used in parallel. Once full, the customer ships them to our data center for uploading to the Wasabi cloud, and we ship them five more Wasabi Balls.

In theory, 10 petabytes of data could be moved this way in just a couple of months.

It is a very interesting read from Wasabi here: https://wasabi.com/blog/wasabi-ball-fast-time-to-savings/
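As a rough sanity check on the Wasabi Ball numbers quoted above, here is a minimal sketch of the arithmetic. It assumes decimal units (1 TB = 10^12 bytes) and a link running at its full line rate, neither of which the quote states; the function names are made up for illustration.

```python
import math

TB = 10**12  # decimal terabyte in bytes (assumption)
PB = 10**15  # decimal petabyte in bytes (assumption)


def fill_hours(capacity_bytes, link_gbps, efficiency=1.0):
    """Hours needed to fill one appliance over a dedicated link."""
    bits = capacity_bytes * 8
    seconds = bits / (link_gbps * 1e9 * efficiency)
    return seconds / 3600


def batches_needed(total_bytes, appliances, capacity_bytes):
    """Number of shipments when `appliances` units travel together."""
    return math.ceil(total_bytes / (appliances * capacity_bytes))


hours = fill_hours(87 * TB, link_gbps=10)
batches = batches_needed(10 * PB, appliances=5, capacity_bytes=87 * TB)

print(f"fill time at line rate: {hours:.1f} h")
print(f"batches of five appliances for 10 PB: {batches}")
```

At line rate one appliance fills in a bit over 19 hours, so "a little bit over a day" looks plausible once real-world throughput is below the nominal 10 Gbps. Moving 10 PB takes 23 batches of five, which means "a couple of months" only works if each batch turns around in roughly two to three days.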

There is one major difference though. These centralized players can all just load that data locally into their data centers. With Storj, someone has to still upload it to nodes. There is no way to short cut this process with local transfers. So sure, Storj could have some major ingress servers with fast connections to help out customers with slower connections, but I’m betting the customers with petabytes of data will have fast enough servers to do that themselves. Just because of the way the network works, the advantages of offloading this are a lot smaller than with centralized providers.

Well according to Wasabi:

However, if you have a 10 Gbps pipe it’s usually because you need a 10 Gbps pipe. In other words, it’s being used for normal business. Since you don’t want to clog it up during work hours, that leaves roughly 12 hours a day for uploads. So, double the time it would take to 232 days.
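The 232-day figure in the quote can be reproduced with simple arithmetic. The sketch below assumes decimal petabytes and an effective throughput of 80% of the nominal link speed; the 80% is my assumption, chosen because it makes the numbers match, and is not something the quote states.

```python
PB = 10**15  # decimal petabyte in bytes (assumption)


def upload_days(total_bytes, link_gbps, efficiency=1.0, hours_per_day=24):
    """Days needed to push total_bytes through a link of link_gbps.

    efficiency is the fraction of the nominal rate actually achieved,
    hours_per_day is how long the link is available for uploads each day.
    """
    bits = total_bytes * 8
    seconds = bits / (link_gbps * 1e9 * efficiency)
    return seconds / (hours_per_day * 3600)


ideal = upload_days(10 * PB, 10)
derated = upload_days(10 * PB, 10, efficiency=0.8)
half_day = upload_days(10 * PB, 10, efficiency=0.8, hours_per_day=12)

print(f"full line rate, 24 h/day: {ideal:.1f} days")
print(f"80% of line rate, 24 h/day: {derated:.1f} days")
print(f"80% of line rate, 12 h/day: {half_day:.1f} days")
```

This gives about 93 days at full line rate, about 116 days at 80%, and about 231.5 days when only 12 hours a day are usable, which lines up with the 232 days in the quote.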

True that the data has to be uploaded, but I am not sure if you can take it for granted that the customer is properly equipped to upload hundreds of terabytes fast.
Parallelization and decentralization could be amazing in this field too: split the hundreds of terabytes into smaller chunks, send them on disks to several partner data centers with tens or hundreds of Gbps of bandwidth and/or a direct connection to Internet Exchange Points, and upload them from there.

I mean if large competitors like AWS, Wasabi, Google and Backblaze offer such a service, then I would take it for granted that there is a demand for it.

Yeah you could definitely scale it if you have multiple partners with multiple uplinks working through that data. But that’s quite a bit of logistics to set up. Including trusted partner relationships. It would also require splitting the encryption from the upload process.

I was more responding to if Storj would have to do it themselves. Clearly the entire point is that Storj doesn’t want to run data centers to begin with. So it would just be shifting the bottleneck.

I think the scaling options may at some point get interesting, but there would have to be massive demand for it. And I could be wrong, but I don’t think Storj is at that point yet.

I totally agree. I am not seeing Storj operating data centers. I mean the whole point of Storj is not to operate data centers.

That’s why I made the suggestion in that thread to offload ingestion to professional SNOs, by which I of course meant those professional data centers that Storj has at least thought about partnering with.

Yes it involves trust, yes encryption is a ‘problem’, yes it would involve coordination and probably additional coding and even hardware considerations. And yes, Storj is clearly not there at least from a technical standpoint. But if some large corp approaches Storj tomorrow and asks how to transfer large amounts of data, then Storj would have to decline. This is an undesirable result.
It seems to be a bit of a chicken-and-egg problem. At least I think the way for such a service should be paved in advance, by considering and implementing this in code and starting to think about what kind of partners, hardware and processes would be required to offer it.

I think there are other issues with quickly onboarding large petabyte-scale customers as well, since there is just under 6PB of free space atm. There is some more that Storj could free up, but clearly they won’t be able to onboard a 100PB client on short notice. So this gets back to balanced growth. Right now they should probably focus on slightly smaller customers, which probably gets rid of the need for these large onboarding processes. Luckily that also allows them to focus their development efforts elsewhere. It’s not just that technically they aren’t there yet, but the network scale really isn’t there yet.

They’ll get there, but we need some smaller scale growth first before the whales come in.

It is quite interesting to think that over. In contrast to a data center operator Storj normally will never be able to really promise such a capacity. If a competitor like AWS receives a request like that they can start building servers, add more space etc. and guarantee this capacity to a customer.
Given the data of today, the Storj network is 15 PB with 6 PB free. If tomorrow a 10 PB customer is asking for space what would be done to satisfy such a customer?

And even if Storj managed to attract lots of new SNOs by ‘hinting’ at big news, would those stay with the network long enough to keep the data at a sustainable level?
But also we have already seen with Chia that massive growth of capacity can happen really fast when there is enough (potential) incentive.
We have around 12k nodes currently. If only 1k of them add 10TB each (and I guess many more have the option to add bigger hard drives on short notice) then the network would grow by 10 PB. I could and would love to add 10 TB instantly… :grinning_face_with_smiling_eyes:
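The capacity numbers above are easy to check. This is only a back-of-the-envelope sketch: it uses decimal units and treats the customer's 10 PB and the network's free space as the same kind of number, ignoring any storage overhead, which is an assumption on my part.

```python
TB = 10**12  # decimal terabyte in bytes (assumption)
PB = 10**15  # decimal petabyte in bytes (assumption)

free_space = 6 * PB          # free space mentioned in the thread
customer_request = 10 * PB   # hypothetical large customer

# How much is missing if the request arrived tomorrow.
shortfall = max(0, customer_request - free_space)

# Capacity added if 1k of the ~12k nodes each add 10 TB.
per_node_upgrade = 10 * TB
added = 1_000 * per_node_upgrade

# Nodes that would need to upgrade just to cover the shortfall
# (ceiling division on integers).
nodes_needed = -(-shortfall // per_node_upgrade)

print(f"shortfall: {shortfall / PB:.0f} PB")
print(f"added by 1k nodes: {added / PB:.0f} PB")
print(f"nodes needed to cover the shortfall: {nodes_needed}")
```

Under these assumptions the shortfall is 4 PB, which 400 nodes adding 10 TB each would cover, and 1k such upgrades would add the full 10 PB.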

Yeah, same here, but it’s hard to say whether 10% of nodes would. However, Storj can do quite a few things to help that growth. If they know some time in advance, they can start pushing dummy data to the network, giving SNOs an incentive to upgrade already. They can do surge payouts to attract more SNOs as well. If needed they could temporarily spin up some of their own nodes even.

Now I wouldn’t say Chia-type growth is a reasonable expectation. There was the hope there that income could be massive, while with Storj income is more stable. But despite that, natural growth will probably happen anyway. It just won’t be instantly 100PB. But 10% growth over and over will eventually also get you there.
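To put a number on "10% growth over and over", here is a small sketch of the compounding. The 15 PB starting point comes from earlier in the thread; what counts as one growth step (a month, a quarter) is left open, and the function name is just for illustration.

```python
import math


def steps_to_reach(current_pb, target_pb, growth=0.10):
    """Number of compounding steps of `growth` needed to reach target_pb."""
    if target_pb <= current_pb:
        return 0
    return math.ceil(math.log(target_pb / current_pb) / math.log(1 + growth))


steps = steps_to_reach(15, 100)
print(f"10% steps from 15 PB to 100 PB: {steps}")
```

That comes out at 20 steps, so whether it feels fast or slow depends entirely on how long each 10% step takes.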

But yes, you can consider my nodes effectively endless as I will just expand when needed.

Just to add to your list:

  1. Delete data from the temp folder (there is a low-priority ticket for it).
  2. Lower the trash data’s removal time from 7 days to fewer than 7 days.
  3. Delete disqualified satellites’ data (suggested here).
  4. Limit the storagenode’s log size.
  5. Shrink the databases if possible.

These problems exist for the general file upload case, but some backup products support seeding. With HashBackup, you can create a “dir” destination on a USB drive, do your initial backup, take the drive to the target server and copy the data. The data is already encrypted by the backup process and the key is never stored on the USB drive. You’d just have to change a config file and switch the destination from a dir destination to a storj destination.

So if there was a service to copy files into a Storj bucket at high speed, some backup programs could use it today I think. Maybe there would be a hangup with access grants, I dunno; not that familiar with things yet.

Anyone heard of S3Mule?

I saw today that MinIO’s S3 gateway can be used to transition away from S3 and/or to back it up:

Continuous sync between Amazon S3 and Azure

MinIO can migrate or keep objects in sync between Amazon S3 and Azure Blob Storage or MinIO object storage continuously using Lambda compute notifications.

I think the way this works is if you upload to S3 it triggers an event and MinIO downloads the object. This would allow the Storj MT Gateway to mirror an S3 bucket for backup. You’d have to run an rclone sync in the background initially to start the cloning process, but new objects would get mirrored on the fly.
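A minimal sketch of that mirroring idea, with plain dicts standing in for the S3 bucket and the Storj bucket. The event shape and the function names are hypothetical; a real setup would use the provider's notification format and an actual copy tool such as rclone for the initial pass.

```python
def initial_sync(source, mirror):
    """One-off bulk copy of everything the mirror is missing (the rclone step)."""
    copied = 0
    for key, data in source.items():
        if key not in mirror:
            mirror[key] = data
            copied += 1
    return copied


def on_object_created(event, source, mirror):
    """Handle a hypothetical 'object created' notification by copying one object."""
    key = event["key"]
    mirror[key] = source[key]


# Stand-ins for the two buckets.
s3_bucket = {"old-1": b"x", "old-2": b"y"}
storj_bucket = {}

# Background pass for objects that existed before mirroring was enabled.
copied = initial_sync(s3_bucket, storj_bucket)

# A new upload to S3 triggers an event, and the object is mirrored on the fly.
s3_bucket["new-1"] = b"z"
on_object_created({"key": "new-1"}, s3_bucket, storj_bucket)

print(sorted(storj_bucket))
```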

Another way to do it is like Cloudflare R2: Storj is the primary reference. If it has an object, it serves it. If it doesn’t have an object, it checks the S3 backing bucket, downloads the object, serves it, and stores it in Storj. Migrating objects as they are referenced avoids a huge S3 egress bill. The S3 object could have its expiration date set for auto-deletion or it could be actually deleted if the customer didn’t want to maintain an S3 backup.
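The migrate-on-reference idea can be sketched as a read-through store. Dicts stand in for the Storj bucket and the S3 backing bucket here, and the class name and the `delete_from_backing` flag are made up for illustration; this is not how any existing gateway is implemented.

```python
class ReadThroughStore:
    """Serve from the primary store, falling back to a legacy backing store."""

    def __init__(self, primary, backing, delete_from_backing=False):
        self.primary = primary
        self.backing = backing
        self.delete_from_backing = delete_from_backing

    def get(self, key):
        # Primary is the reference: if it has the object, serve it.
        if key in self.primary:
            return self.primary[key]
        # Otherwise fetch from the backing bucket (KeyError if it is
        # missing there too) and store it in the primary for next time.
        data = self.backing[key]
        self.primary[key] = data
        # Optionally drop the old copy if no S3 backup is wanted.
        if self.delete_from_backing:
            del self.backing[key]
        return data


s3 = {"a.txt": b"alpha", "b.txt": b"beta"}
storj = {}
store = ReadThroughStore(storj, s3, delete_from_backing=True)

first = store.get("a.txt")   # migrated from the backing bucket
second = store.get("a.txt")  # now served from the primary

print(first, second, sorted(storj), sorted(s3))
```

Only objects that are actually requested get pulled out of the backing bucket, which is where the egress saving would come from; objects nobody reads stay where they are until they expire or are deleted.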