Swapping backup HDD to storagenode and putting backup onto Tardigrade

I have a 2 TB backup drive, and I was thinking of storing my backups on Tardigrade and using that drive for another storagenode.

The first “problem” I noticed is that all my backups are zfs snapshots, and I’m not aware of any zfs backup tool that can store those snapshots as files in a specific location, so I would need to change the way I do backups (which is not so great…). But I guess Duplicati might do a good job here.

The other aspect is cost vs. revenue. Storing 2TB of backups on Tardigrade and never downloading them will cost me $20 per month! The revenue for running a storagenode holding 2TB of data without egress is $3… The egress of my nodes during the last months hardly makes up for the gap, and adding another node on the same IP will not make things better, since nodes behind the same IP split the traffic between them. However, the egress during the last months is hardly representative, as there were lots of tests and my nodes were full (getting even less traffic than others). On the other hand, it’ll take a long time until there are customers with high download demand on Tardigrade. I would assume most customers will use it for backups, which is basically upload only, so almost no egress for SNOs.
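The gap can be spelled out as simple arithmetic. A minimal shell sketch, assuming the rates implied by the figures above (Tardigrade storage at $10/TB/month and a storage node paid $1.50/TB/month for stored data, as of this thread; check current price lists). The helper name `monthly_usd` is hypothetical:

```sh
#!/bin/sh
# Monthly amount in USD for "$1" TB billed at "$2" USD per TB per month.
monthly_usd() {
  LC_ALL=C awk -v tb="$1" -v rate="$2" 'BEGIN { printf "%.2f\n", tb * rate }'
}

# Assumed rates, derived from the figures in the post above.
CUSTOMER_STORAGE_RATE=10   # Tardigrade storage: 2 TB -> $20/month
SNO_STORAGE_RATE=1.5       # storage node payout: 2 TB -> $3/month

cost=$(monthly_usd 2 "$CUSTOMER_STORAGE_RATE")
revenue=$(monthly_usd 2 "$SNO_STORAGE_RATE")
# Difference that egress payouts would have to cover each month.
gap=$(LC_ALL=C awk -v c="$cost" -v r="$revenue" 'BEGIN { printf "%.2f\n", c - r }')

echo "cost=\$$cost revenue=\$$revenue gap=\$$gap"
```

With these assumed rates, egress would have to cover $17 every month before the swap breaks even.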

Any thoughts on both problems/considerations?

zfs send ... > /some/file/somewhere
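Spelled out a bit more: a full stream of the first snapshot, plus incremental streams (`zfs send -i`) between consecutive snapshots, each redirected to its own file, gives plain files that any uploader can handle. A minimal sketch; the dataset `tank/data`, the snapshot names, the helper names and the file-naming scheme are all hypothetical:

```sh
#!/bin/sh
# File name for the stream from snapshot "$2" to snapshot "$3" of dataset "$1".
# An empty "$2" means a full stream. The naming scheme is only an example.
stream_file() {
  name=$(printf '%s' "$1" | tr '/' '_')
  if [ -z "$2" ]; then
    printf '%s@%s.full.zfs\n' "$name" "$3"
  else
    printf '%s@%s--%s.incr.zfs\n' "$name" "$2" "$3"
  fi
}

# Write the stream into directory "$4" and print the resulting path.
# A partially written file is removed if zfs send fails.
send_to_file() {
  out="$4/$(stream_file "$1" "$2" "$3")"
  if [ -z "$2" ]; then
    zfs send "$1@$3" > "$out"               # full stream
  else
    zfs send -i "$1@$2" "$1@$3" > "$out"    # only the changes between the two
  fi || { rm -f "$out"; return 1; }
  printf '%s\n' "$out"
}
```

For example, `send_to_file tank/data "" 2020-04-01 /backup` once, then `send_to_file tank/data 2020-04-01 2020-04-02 /backup` for the next day. Restoring means feeding the full stream and then every incremental, in order, to `zfs receive` (e.g. `zfs receive tank/restored < file`).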

Unless you value the fact that a Tardigrade backup is “offsite”, it does not make much sense, except maybe if you have an 8TB hard drive and your backups are only 1TB or similar.

I will probably use Tardigrade for backups, but instead of keeping a full copy there (which would be expensive and not worth it for me), I plan to keep the daily incrementals there for a week until I write them to tape. My daily incrementals would only be a few gigabytes, but I have a few terabytes of files.
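Keeping only a week of daily incrementals boils down to finding the uploaded streams older than a cutoff date. A minimal sketch, assuming each stream's name contains its snapshot date as YYYY-MM-DD (the first such date is used, so string comparison matches date order); `expired` is a hypothetical helper:

```sh
#!/bin/sh
# Read object names (one per line) on stdin and print those whose first
# YYYY-MM-DD date is earlier than the cutoff date given as "$1".
# Names without a date are never printed, so they are never deleted.
expired() {
  LC_ALL=C awk -v cutoff="$1" 'match($0, /[0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9]/) && substr($0, RSTART, 10) < cutoff'
}
```

The cutoff could come from GNU date, e.g. `expired "$(date -d '7 days ago' +%F)"`. Since each incremental stream depends on the one before it, a stream should only be deleted once it (or a newer full stream) is safely on tape.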

The problem with this approach is that you would need to build a whole program that:

  • checks which snapshots are not in your bucket yet and uploads them
  • checks which snapshots were deleted locally and deletes them from your bucket (or retains them for a certain amount of time)

It certainly is possible, but there is no ready-to-use solution AFAIK.
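The core of such a program is two set differences between the local snapshot list and the bucket listing. A minimal sketch of just that part, assuming both lists are plain files with one name per line and the same naming on both sides; producing them (e.g. from `zfs list -t snapshot -H -o name` and a listing of the bucket) and acting on the result are left out, and the function names are hypothetical:

```sh
#!/bin/sh
# Print lines present in list file "$1" but missing from list file "$2".
# Both lists are sorted with the C collation that comm is run with.
only_in_first() {
  a=$(mktemp) || return 1
  b=$(mktemp) || { rm -f "$a"; return 1; }
  LC_ALL=C sort -u "$1" > "$a"
  LC_ALL=C sort -u "$2" > "$b"
  LC_ALL=C comm -23 "$a" "$b"
  rm -f "$a" "$b"
}

# Snapshots that exist locally but are not in the bucket yet.
to_upload() { only_in_first "$1" "$2"; }   # $1=local list, $2=bucket list

# Objects still in the bucket whose local snapshot has been deleted.
to_delete() { only_in_first "$2" "$1"; }   # same argument order as to_upload
```

Keeping deleted snapshots for a while would then be a filter on the `to_delete` output rather than a separate listing.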

“Offsite” is the main argument for storing backups on Tardigrade. There are lots of other backup options, some even considerably cheaper than STORJ, but they are typically in a single datacenter.
However, my backups are 1.6TB and my HDD has a size of 1.8TB…

That’s a nice idea. I don’t have secure tape so Tardigrade would be my long-term backup.

Storj is not as decentralized as you might think: while the data itself is distributed across 80 nodes etc., the metadata lives on the satellite, which, even if it runs on redundant servers, is probably not much more reliable than the servers of some other company, be it AWS or someone else.

Tape is a good idea for me because, while I have more than 2TB of files, most of them do not ever change, so I can just write them to tapes and put those tapes on a shelf somewhere, gradually adding more tapes with newer files.

I absolutely agree. I have been criticizing that point since I became an SNO. STORJ always said they’d improve it by splitting the service into smaller units spread across multiple servers, adding redundancy, …
So once that works, it is not too bad, but still not a truly decentralized solution.

Can you define what truly decentralized means? How about keeping the satellite database in a CockroachDB cluster? How about splitting that cluster across several regions? How many CockroachDB nodes are needed to call that decentralized?

How does it compare with AWS or other companies? I’m pretty sure Google and Amazon have quite a few datacenters and quite a few servers in those datacenters, though I do not know how many servers would have to go down for my files to be inaccessible/destroyed.

I do not know the design of the satellite for sure, but if I somehow obtained root access to one of its servers and ran the equivalent of “drop database”, would the network continue running?

I think that true decentralization means that there is no “more equal” server anywhere, like trackerless torrents or, kind of, Bitcoin or Ethereum.

However, that being said, the fact that the data is spread out over lots of nodes probably makes it cheaper, with the same or better overall reliability compared to Amazon or others; I just do not know how the metadata side compares with Amazon etc.

You are saying Google and Amazon are running decentralized servers? In that case, you can say the same about our current solution. The difference is that even if all of our servers burned down at the same time, the data would only be temporarily inaccessible. We would spin up the satellite in a new location, import the database backup, and all your files would still be stored on storage nodes across the globe.

With CockroachDB, that would mean the replica you are connected to becomes unusable, and the remaining CockroachDB nodes jump in and keep the network alive. We would need to spin up a new CockroachDB node to replace the lost one.

That is already the case now.

In a truly decentralized system there is no single authority. With STORJ there is always a single authority, the satellite (even if it is distributed across multiple servers and countries).
A truly decentralized system would have a network of completely equal entities. They all check each other. Basically just like Ethereum and Bitcoin.

I’m aware that this might be extremely difficult, if not impossible, for a storage network.
So compared to the typical cloud storage providers (who certainly don’t store your data in multiple locations unless you pay a lot or configure it that way), Tardigrade is a fairly decentralized solution under a single authority. But not a truly decentralized solution.

However, this got us quite far from the original questions I had :smiley:
I’m still considering using STORJ as my primary backup location and using my backup HDD as a storagenode, but the cost of storing that data seems to outweigh the revenue from using the HDD as an additional storagenode by far.
If that HDD fills to 2TB and 10% of it gets downloaded each month, that would be a revenue of $3 (storage) + $2 (egress) = $5. Of course, egress is completely unpredictable.
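Turning this around gives the egress a node would need for its payout to cover the storage bill. A minimal sketch; the $10/TB egress rate is only what the figures above imply (10% of 2TB = 0.2 TB paying $2), not a confirmed payout rate, and the $20 cost is the Tardigrade figure from the first post. The helper name is hypothetical:

```sh
#!/bin/sh
# Egress in TB/month needed so that storage payout + egress payout equals
# the monthly cost of keeping the backups on Tardigrade.
break_even_egress_tb() {  # $1=cost USD, $2=storage payout USD, $3=egress USD/TB
  LC_ALL=C awk -v c="$1" -v s="$2" -v e="$3" 'BEGIN { printf "%.2f\n", (c - s) / e }'
}

COST=20            # 2 TB on Tardigrade per month
STORAGE_PAYOUT=3   # 2 TB stored on the node per month
EGRESS_RATE=10     # assumption implied by "10% of 2TB -> $2"

tb=$(break_even_egress_tb "$COST" "$STORAGE_PAYOUT" "$EGRESS_RATE")
# Share of the 2 TB stored that would have to be downloaded every month.
share=$(LC_ALL=C awk -v t="$tb" 'BEGIN { printf "%.0f\n", t / 2 * 100 }')
echo "break-even egress: $tb TB/month (${share}% of the 2 TB stored)"
```

Under these assumptions, the node would need about 1.7 TB of egress (85% of its data downloaded) every month, compared to the 10% assumed above.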

Also, I’m still a bit undecided about switching from sending incremental zfs snapshots to making actual incremental file backups. Redirecting zfs snapshot streams to files would be a nice solution if there were a program that actually supported it, because I don’t like messing around with my backups manually or with a program I have to write myself. Too much could go wrong.

Even Ethereum has different node types and only a few so-called full archive nodes. We still call it decentralized because everyone could set up a full archive node.

The same is true for Storj. Everyone can run a satellite. If you can be the authority yourself, that is already decentralized. Why don’t you run a full archive node? Because it is expensive and you don’t have to. The same can be said about running a Storj satellite: you can, but most of us will not.

I might join you on that question. My current QNAP NAS is getting a bit slow. Time to buy new hardware, and this time I am confident I can manage a self-built NAS with zfs or btrfs. I already have the hardware list. Soon I will start to try out different setups. The question of how to deal with backups will be part of that experiment.

Not quite. In Ethereum I am unaffected if a full archive node goes offline. With STORJ every satellite builds its own network, controlled by a single authority. If my satellite goes down, so does my network.

No, I would create a new authority in my own network, making storage decentralized but control centralized.

I’m glad I’m not the only one with this problem!
Let’s see if we can find a good solution. (please go for zfs and not btrfs :smiley: :wink: )

I think you got it mixed up: a masternode acts much the same as, say, a satellite does. You can have many of both; who’s to say Storj won’t have many more satellites once everything is up and running? Currently the data is mostly decentralized, and the satellites can be, and will be, even more decentralized than they are now.
As for authority, they don’t have much more authority than we do: they can’t access your data, just like people who own a masternode can’t access your crypto directly.

No bitcoin/ethereum node can make your crypto money disappear, the whole network secures it.
With STORJ, my satellite can make all your data disappear that you have uploaded using my satellite (but of course not the data you uploaded using a different satellite).
So they are very different things.

Sure, but have you ever heard of an attack on a Bitcoin/ETH node that screws up the entire blockchain, where they make an exact copy of the blockchain and clone it, making your crypto useless? The same goes for both.

You can’t compare an attack on a network to a basic working mechanism… One is an exception, the other is how the network works.

It works exactly the same way. You’re saying Storj has control over all the data on everyone’s hard drives, as if someone out there couldn’t take your Bitcoin or ETH away from you the same way.

Let’s just stop here, this makes no sense anymore and it was not the goal of my post to talk about decentralization.

Good post here that might be relevant: https://storj.io/blog/2020/03/the-electric-car-example-applied-to-decentralized-cloud-storage/

I thought storing on STORJ cost $3/TB/month, so that would be $6/month, right?

EDIT: Wrong numbers, sorry.