Tardigrade Thursday: Key concepts, decentralization (short read)

jocelyn · September 18, 2020, 1:14am

It’s Thursday! Ahead of our interview series on decentralization, here is a refresher from our documentation on the key concepts of decentralization.

Decentralization

Decentralized data storage means more security and privacy. Decentralized cloud storage is more difficult to attack than traditional centralized data storage. On a decentralized network, files are broken apart and spread across multiple nodes. The Tardigrade network uses Erasure Coding to distribute file pieces over many nodes located in different physical locations around the world.

There are a number of reasons why you may wish to use decentralized storage over legacy alternatives, namely:

Better performance
Simple and economical pricing
Ease of integration
Client-side encryption and key-based ownership of object data

One of the main motivations for preferring decentralization is to drive down infrastructure costs for maintenance, utilities, and bandwidth. We believe that there are significant underutilized resources at the edge of the network for many smaller operators. In our experience building decentralized storage networks, we have found a long tail of resources that are presently unused or underused and that could provide affordable, geographically distributed cloud storage.

Our decentralization goals for fundamental infrastructure, such as storage, are also driven by our desire to provide a viable alternative to the few major centralized storage entities who dominate the market at present. We believe that there exists inherent risk in trusting a single entity, company, or organization with a significant percentage of the world’s data. In fact, we believe that there is an implicit cost associated with the risk of trusting any third party with custodianship of personal data.‌

Unique Advantages of Decentralized Storage

Client-side encryption: the cryptographic technique of encrypting data on the sender’s side, before it is transmitted to a server such as a cloud storage service. Client-side encryption features an encryption key that is not available to the service provider (in this case, Storj Labs), making it difficult or impossible for service providers to decrypt hosted data. Client-side encryption allows for the creation of applications whose providers cannot access the data its users have stored, thus offering a high level of privacy. (Source: Client-side encryption - Wikipedia)‌
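To make the idea concrete, here is a minimal Python sketch of the client-side pattern: the key is generated and kept on the client, and the provider only ever receives ciphertext. It uses a one-time pad (XOR with a random key as long as the data) purely because it fits in a few lines of standard library code. It is not the cipher Tardigrade uses, and the function names and the `provider_storage` dictionary are hypothetical stand-ins.

```python
import secrets


def encrypt_on_client(plaintext):
    """Return (key, ciphertext). The key is generated and kept on the client."""
    # One-time pad: a fresh random key exactly as long as the data.
    key = secrets.token_bytes(len(plaintext))
    ciphertext = bytes(p ^ k for p, k in zip(plaintext, key))
    return key, ciphertext


def decrypt_on_client(key, ciphertext):
    """XOR with the same key restores the plaintext."""
    return bytes(c ^ k for c, k in zip(ciphertext, key))


# Hypothetical stand-in for a storage provider: it only ever sees ciphertext.
provider_storage = {}

document = b"quarterly report"
key, blob = encrypt_on_client(document)
provider_storage["report.txt"] = blob  # upload: the key is never sent

# Only the holder of the key can turn the stored blob back into the document.
restored = decrypt_on_client(key, provider_storage["report.txt"])
```

The point of the sketch is the data flow rather than the cipher: whatever algorithm is used, the provider stores bytes it has no key for.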

Erasure Coding: In coding theory, an erasure code is a forward error correction (FEC) code under the assumption of bit erasures (rather than bit errors), which transforms a message of k symbols into a longer message (code word) with n symbols such that the original message can be recovered from a subset of the n symbols. The fraction r = k/n is called the code rate. The fraction k’/k, where k’ denotes the number of symbols required for recovery, is called reception efficiency. (Source: Erasure code - Wikipedia).‌

You can learn more about Storj’s take on Erasure Codes here: https://storj.io/blog/2018/11/replication-is-bad-for-decentralized-storage-part-1-erasure-codes-for-fun-and-profit/
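As a rough illustration of the definition above, the following Python sketch implements a toy k-of-n erasure code: k message symbols define a polynomial over the prime field GF(257), the n code symbols are its values at n points, and any k surviving symbols are enough to interpolate the message back. The field size, function names, and example parameters (k = 5, n = 8) are assumptions chosen for readability. This is not the Tardigrade implementation.

```python
P = 257  # smallest prime above 255, so every byte value is a field element


def _interpolate(points, x):
    """Evaluate at x the unique polynomial of degree < len(points)
    that passes through the given (xi, yi) points, working modulo P."""
    total = 0
    for i, (xi, yi) in enumerate(points):
        num, den = 1, 1
        for j, (xj, _) in enumerate(points):
            if i != j:
                num = num * (x - xj) % P
                den = den * (xi - xj) % P
        # pow(den, -1, P) is the modular inverse (Python 3.8+).
        total = (total + yi * num * pow(den, -1, P)) % P
    return total


def encode(message, n):
    """Turn k message symbols into n (index, value) code symbols.
    The first k values equal the message itself (a systematic code)."""
    points = list(enumerate(message))
    return [(x, _interpolate(points, x)) for x in range(n)]


def decode(pieces, k):
    """Recover the k message symbols from any k surviving pieces."""
    if len(pieces) < k:
        raise ValueError("not enough pieces to recover the message")
    subset = pieces[:k]
    return [_interpolate(subset, x) for x in range(k)]


message = list(b"storj")           # k = 5 symbols
pieces = encode(message, 8)        # n = 8 symbols, code rate r = 5/8
survivors = [pieces[7], pieces[1], pieces[5], pieces[6], pieces[3]]  # 3 erased
recovered = bytes(decode(survivors, 5))
```

In this toy code any k pieces suffice, so k’ = k and the reception efficiency is 1.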

Data Repair is necessary when the number of available pieces of a file still held on the network approaches the minimum threshold below which it would become impossible to recover the file. When we reach this threshold, the network will proceed to repair the data in such a way that the number of available pieces is always big enough to prevent the file from becoming irretrievable.‌

You can learn more about data repair in the Tardigrade network here: https://storj.io/blog/2018/11/replication-is-bad-for-decentralized-storage-part-1-erasure-codes-for-fun-and-profit/.‌
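The repair rule described above can be sketched as a simple threshold check. In this Python sketch the `Segment` class, its field names, and the numbers used are all hypothetical. The only thing taken from the text is the logic: repair is triggered while the file is still recoverable, before the number of available pieces falls below the minimum.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    minimum: int    # pieces needed to reconstruct the data
    repair_at: int  # hypothetical repair threshold, set above the minimum
    target: int     # number of pieces to restore after a repair
    available: int  # pieces currently retrievable from the network


def check_and_repair(seg):
    """Return True if a repair was triggered, False if none was needed."""
    if seg.available < seg.minimum:
        raise RuntimeError("too few pieces left: the data is irretrievable")
    if seg.available <= seg.repair_at:
        # Still recoverable: rebuild the data from the surviving pieces
        # and re-create the missing ones (modelled here as a counter reset).
        seg.available = seg.target
        return True
    return False


seg = Segment(minimum=4, repair_at=6, target=10, available=10)
first = check_and_repair(seg)   # healthy: nothing to do
seg.available = 6               # nodes went offline, threshold reached
second = check_and_repair(seg)  # repair restores the piece count
```

Keeping the threshold above the minimum leaves a safety margin, so that a few more pieces can disappear while the repair is still in progress.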

File Audit is the action of testing whether a random piece can be successfully retrieved from a node that is storing it. File Audits are performed continually to ensure the durability of the files on the Tardigrade network.
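Here is a small Python sketch of what such an audit could look like: pick one piece at random, ask the node that should hold it, and check what comes back. Verifying against a stored SHA-256 hash is an assumption made for this example and is not necessarily how the Tardigrade network verifies pieces. The node dictionary and function names are likewise hypothetical.

```python
import hashlib
import random


def audit_piece(nodes, expected_hashes, node_id, piece_id):
    """Return True if the node returns the piece and its hash matches."""
    data = nodes.get(node_id, {}).get(piece_id)
    if data is None:
        return False  # node is gone or no longer has the piece
    return hashlib.sha256(data).hexdigest() == expected_hashes[(node_id, piece_id)]


def audit_random(nodes, expected_hashes, rng=random):
    """Audit one randomly chosen (node, piece) pair."""
    node_id, piece_id = rng.choice(sorted(expected_hashes))
    return node_id, piece_id, audit_piece(nodes, expected_hashes, node_id, piece_id)


# Hypothetical network state: node-b has lost the piece it was given.
nodes = {"node-a": {"piece-1": b"alpha"}, "node-b": {}}
expected_hashes = {
    ("node-a", "piece-1"): hashlib.sha256(b"alpha").hexdigest(),
    ("node-b", "piece-2"): hashlib.sha256(b"beta").hexdigest(),
}

node_id, piece_id, passed = audit_random(nodes, expected_hashes)
```

Because the piece is chosen at random, a node cannot know in advance which pieces it will be asked for, so repeated audits give a running estimate of how reliably it stores data.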

S3 Compatibility - If you are currently using Amazon S3, you can connect directly to Storj through an S3 gateway hosted alongside your application server. This means that switching to the decentralized cloud and reaping the cost, performance and privacy benefits is as easy as changing two lines in your config file! No new code needed!

BrightSilence · September 18, 2020, 1:29pm

That’s a very neat summary of the key concepts! You refer to documentation here, but I’m not familiar with where to find the original source for this? Do you have a link for it or is this a newly compiled set of info? This seems like a great summary to link people to.

Each advantage title also links to a missing gitbook page.

Pentium100 · September 18, 2020, 1:55pm

I have a small nit to pick. Encryption is not related in any way to decentralization. I can encrypt (client-side) files and store them on a centralized system and I can store unencrypted files on a decentralized system.

Other than that, I think it is a good summary.

Alexey · September 20, 2020, 11:07am

It’s not easy with Tardigrade: you would need to modify the code of the uplink, and I’m not sure it is possible to make it work after that. Encryption is a core part of the uplink.

Pentium100 · September 20, 2020, 5:06pm

Well, Storj’s implementation of semi-decentralized storage requires encryption (though I could still probably set the key to zero or something). However, whether something is centralized and whether it is encrypted are not related.

I could take your code, remove the encryption part and still advertise my version as decentralized. People would probably not want to use it like that, but the technology itself does not require encryption.

On the other hand, nothing prevents me from using client-side encryption with Amazon or similar services.

As I said, I am nitpicking here.

BrightSilence · September 22, 2020, 5:48pm

I mean that kind of goes for erasure coding, data repair and audits as well. You could just split up a file, have copies of the pieces or just hope for the best with a single one.

These are more the advantages of this specific implementation of decentralized storage, not necessarily of decentralization in itself.