Ability to recover part of lost data / restore the node from backup

If we agree that the customer should not have to deal with pieces that may or may not be downloadable, then your proposal seems rather to be a way to trigger auditing more often, namely for the pieces a customer wants to download, with the SNO paid only for those audited pieces.

However, if those “maybe-lost” pieces are offered to the customer without verification, I don’t agree with that. I don’t think Tardigrade should offer such data to the customer. It is not the customer’s job, nor in the customer’s interest, to be the one verifying whether the data is good or not.

This is called a blockchain…

It comes down to the following parameters:

  • Speed of verification
  • Bandwidth that has to be paid for
  • Computation cost on the node and the satellite

Let’s assume some type of recovery mode is implemented…

If a node fails an audit, then the node will need to be removed from the network for verification. While the node is offline in “untrusted” mode, the pieces the node may still hold are being requested… but they are being downloaded from other nodes and rebuilt. This means that while the node is in “untrusted” mode, that node is effectively losing pieces. By the time the node has been checked, many GB of pieces may have been moved. If the node is slow, it may already have lost most of its pieces to network repair before the verification of stored data is complete.

Simply put… there’s no practical method to implement WAN connected ZFS.

Local filesystems operate very fast, and so can do really neat things like checksumming in a very small amount of time for the energy expended. Storj nodes can’t do that.

well from what i understood when they talk about the erasure coding, there are never pieces located on the node which the network will miss… because it doesn’t need all the pieces, it needs a certain number of pieces to read the data; the individual pieces are irrelevant…

which is actually quite brilliant… but don’t ask me how it really works… with such a system, putting a node into a degraded mode wouldn’t have any significance for the network, aside from how close the individual datasets are to the threshold where the network starts copying data around or doing repairs.
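
To make the k-of-n idea concrete, here is a minimal sketch using the klauspost/reedsolomon library; the 10 data + 10 parity split is purely illustrative, not Storj’s actual parameters. Half the shards can disappear and the data still comes back:

```go
package main

import (
	"bytes"
	"fmt"
	"log"

	"github.com/klauspost/reedsolomon"
)

func main() {
	// Illustrative parameters only: 10 data shards + 10 parity shards.
	// Any 10 of the 20 shards are enough to rebuild the original data.
	enc, err := reedsolomon.New(10, 10)
	if err != nil {
		log.Fatal(err)
	}

	original := []byte("some segment of customer data, padded and split into shards")

	// Split the data into 10 data shards and compute 10 parity shards.
	shards, err := enc.Split(original)
	if err != nil {
		log.Fatal(err)
	}
	if err := enc.Encode(shards); err != nil {
		log.Fatal(err)
	}

	// Simulate losing half the shards (e.g. nodes that went offline or lost data).
	for _, lost := range []int{0, 2, 3, 5, 11, 14, 16, 17, 18, 19} {
		shards[lost] = nil
	}

	// Reconstruct the missing shards from the 10 that survived.
	if err := enc.Reconstruct(shards); err != nil {
		log.Fatal(err)
	}

	// Join the data shards back together and confirm the content matches.
	var out bytes.Buffer
	if err := enc.Join(&out, shards, len(original)); err != nil {
		log.Fatal(err)
	}
	fmt.Println("recovered intact:", bytes.Equal(out.Bytes(), original))
}
```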

it’s why i would have wanted to do it locally in multiple places and compare the uberblock checksums between nodes, basically… but because it’s erasure coding… all data is unique… and the concept falls apart.

at least that’s how i understand it… i cannot really speculate about how to set something like that up in something i basically don’t know, heh. erasure coding sounds really smart tho.

My post wasn’t directed at answering network health. It was addressing how the network would route around the degraded node. The network would be fine… the node would lose a tremendous amount of data while in degraded mode due to the erasure coding algorithm.

So, at the end of the verification process, the node ID would be clean (or not), but the node would have paid out more to the satellite than the held escrow AND have lost nearly all its pieces.

It’s better for everyone just to forget the DQ… get a new ID, and restart.

When I first started a node, I thought the same as the OP… but after a long time reading and watching the network function, it’s fairly clear that the current implementation is quite fair to SNOs.

it’s not that it isn’t fair; i don’t plan on losing data, but i accept that it can happen, and if i only run a limited number of 24tb nodes then the loss of one could be a significant part of my fleet…
so if i lost 1% i wouldn’t mind being able to report it and then move on with some sort of punishment… but then again i already have reasonable redundancy and will have more layers of redundancy, so it’s really doubtful imo that it will impact my node…

and my pool takes 8 or so hours to scrub, so i don’t really see why it should take that long to deal with data loss…

Your local filesystem operates at the bus speed of the computing platform – not really, but close enough for practical comparison.

In that 8 hours, a 100 Mbit connection downloading at 2/3 of max bandwidth could have retrieved about 240 GB of data…

100,000,000 / 8 × 2/3 × 60 × 60 × 8 / 1,000,000,000 ≈ 240 GB

If your particular node has 24 TB of stored data, it could have lost about 1% of the stored data to erasure coding network repair in those 8 hours…

0.24 / 24 = 0.01

If a node is only storing 5 TB of data, total data loss to algorithmic erasure coding re-routing is roughly 5% in that same 8 hours of verification time.
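
The same estimate as a sketch in Go, assuming the numbers above (2/3 of a 100 Mbit link, an 8-hour verification window, and a 24 TB or 5 TB node):

```go
package main

import "fmt"

func main() {
	const (
		linkBitsPerSec = 100e6       // 100 Mbit/s connection
		usableFraction = 2.0 / 3.0   // assume repair traffic uses 2/3 of the link
		windowSeconds  = 8 * 60 * 60 // 8-hour verification window
	)

	// Bytes the rest of the network could repair/move in that window.
	movedBytes := linkBitsPerSec / 8 * usableFraction * windowSeconds
	movedTB := movedBytes / 1e12

	fmt.Printf("repaired elsewhere during verification: ~%.0f GB\n", movedBytes/1e9)

	// Fraction of the node's data effectively lost, for two node sizes.
	for _, storedTB := range []float64{24, 5} {
		fmt.Printf("%2.0f TB node: ~%.1f%% of stored data\n",
			storedTB, movedTB/storedTB*100)
	}
}
```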

It’s just not practical. The network is already repaired by the time a node proves the data store.

A friend of mine has a favorite saying…

“It’s always amazing to me that all these new decentralized storage solutions are basically replicating usenet.”


i… guess… that… makes… sense… then… not very familiar with this erasure coding… is that what SDS is? software defined storage

sorry, wrong thread for this… i guess… nm, i’ll look it up

so would there be a way to keep the erasure code “alive” on each node?
so it also gains the benefit of the larger system by being able to reprocess lost data live.

Such a node would need to store extra copies of data pieces in order to verify and actively repair locally lost data… this would decrease the amount of storage available for the paid data store while at the same time increasing the cost of running the node and the wear on the hard drives.

It’s useful to remember that ZFS already does this…

So, the practical solution is to simply recommend that node operators run ZFS.


i’m very impressed by what ZFS can manage… that’s for sure.

The customer already has to do that. Unless the particular piece has been audited recently, there is no guarantee that it is still there, that the node is online, etc. That’s why the customer is given a list of more nodes than are needed to download the pieces.

Unless the node gets DQed for something the SNO has no control over, like db corruption.

Also, if it takes a year or longer to get back to where I was and the probability of getting DQed remains the same (one small mistake and I’m done), it may be better for me to not even try to make the node as reliable as possible - after all, restarting every few months would be better than trying to keep the node running for a year and then having to restart anyway.

I already use raidz2 with 6 drives (so 33% of space used for parity) to make the node more reliable. However, RAID is not a backup.

In some sense, Storj requirements are stricter than the requirements for a datacenter, while at the same time the recommendation is to not run the node on datacenter-grade hardware.
What do I mean? If there was some kind of disaster and I had to restore customers’ emails from a day-old backup, the customer would not be happy, but would be less angry than if I told him that I have no backup.
For some customers with databases, a backup can be done every 5 minutes, etc.

I am more concerned about the database getting corrupted than about actually losing files, though. There is currently no way to back up the database logs to be able to restore them.

Also, right now (almost no traffic) I could probably restore a day-old backup and the satellite may not notice it for a long time.


In many cases, an SNO does have control over DB corruption. However, this particular case needs to be addressed as part of node software improvement, bug fixes, and so forth… not as an architectural change.

Remember that Beta just ended, and all the payments to SNOs before the end of Beta were quite generous considering that Storj was not receiving payments from actual customers.

The problem with this comparison is that an SNO is not storing a complete set of data… nor a static set of files… there’s no rollback to a prior state, because the current state of the network has changed. If an SNO has a backup, the backup is useless and out of date within a few hours.

As @BrightSilence indicated in one of the other similar threads, an SNO would need to recreate all the lost pieces from the complete files… So, if an SNO is storing 1 TB of data pieces, that SNO would need to download 30 TB of data from all the other nodes… then split the files… toss out the pieces duplicated on other nodes… and then prove that the process was done correctly…
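
For a rough sense of where the 30× comes from: regenerating a lost piece requires downloading roughly k other pieces of the same segment, and k is on the order of 30 (treat the exact Reed-Solomon parameters as an assumption here). A minimal sketch:

```go
package main

import "fmt"

func main() {
	const (
		storedTB        = 1.0  // data pieces the node lost and wants to rebuild
		piecesPerRepair = 30.0 // assumption: ~k pieces of a segment needed to regenerate one piece
	)

	// Every lost piece requires downloading ~k same-sized pieces of its segment.
	downloadTB := storedTB * piecesPerRepair

	fmt.Printf("to rebuild %.0f TB of pieces, download ~%.0f TB from other nodes\n",
		storedTB, downloadTB)
}
```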

There’s simply no possible way to economically perform the task. Both the network itself as well as the SNO would lose funds, time, energy, and bandwidth.

It should be remembered that the recommendation of Storj is to run one drive per node and start a new ID and node when the drive dies… My own simulations of this procedure have shown me that this recommendation is about the same as running a more robust RAID or ZFS array.

The lost escrow when a node dies seems like a lot of lost funding… however, the cost of rebuilding the lost data and verifying it is more than the cost of just running a new node.

The database issues are due to node hardware. It may be that they could be resolved by tweaking some parameters in sqlite and/or migrating some table data to a flat XML/JSON format… but that is a specific issue which is currently being discussed, with mitigation solutions being worked through…
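
For the sqlite side, here is a minimal sketch of the kind of parameter tweaking being discussed, assuming the mattn/go-sqlite3 driver and a hypothetical database path; whether WAL journaling and a busy timeout actually fit the node’s access pattern is for the developers to decide:

```go
package main

import (
	"database/sql"
	"fmt"
	"log"

	_ "github.com/mattn/go-sqlite3" // sqlite driver (assumption: not necessarily what the node uses)
)

func main() {
	// Hypothetical path, purely for illustration.
	db, err := sql.Open("sqlite3", "./storage/bandwidth.db")
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()

	// Switch to write-ahead logging: readers and writers block each other less,
	// and a crash mid-write is less likely to leave the main db file corrupted.
	var mode string
	if err := db.QueryRow("PRAGMA journal_mode=WAL;").Scan(&mode); err != nil {
		log.Fatal(err)
	}

	// Wait up to 5 s on a locked database instead of failing immediately.
	var timeout int
	if err := db.QueryRow("PRAGMA busy_timeout=5000;").Scan(&timeout); err != nil {
		log.Fatal(err)
	}

	fmt.Printf("journal mode: %s, busy timeout: %d ms\n", mode, timeout)
}
```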

Long term, the erasure coding architecture is quite robust and individual SNOs are better served not attempting to buy back lost data or reputation.

Just the “starting over” part and the months with no traffic are bad enough on their own, even disregarding the escrow money.

After all, if I have to start from scratch multiple times, why should I even bother to try to keep the node running reliably? I’d rather restart every three months than every year.

(I am the sort of person who, after seeing that a switch has, say, 3 years of uptime, puts in extra effort to keep that switch running, but, if the switch gets rebooted anyway, has no problem rebooting it multiple times after that.)

The current issue is a specific one - the database could get corrupted for some other reason later.

That would require running it on a datacenter-grade SSD - I would not trust a hard drive for that.

A node in the proposed “recovery mode” will be losing funds at a higher rate than the node operator who simply drops the ID in the trash bin and simply starts a new one.

No. This is not a correct assessment.

Running regular consumer hardware will earn less than running enterprise-level hardware. That’s a true statement, generally. However, there’s a cost differential in obtaining enterprise-level hardware. Once all the expenses and incomes are listed, there’s really not much difference between running 4 single-drive nodes on consumer hardware in succession, each one DQed when its hardware fails, and running one long-term node on enterprise hardware.

For me, having to start over means the next time I will put in less effort because why bother. Setting up a node expecting to have to set up a new one when the drive fails? Why bother with it in the first place? I do not build ice sculptures for a reason. Well, lack of skill is one reason, but if I were to build a sculpture, I would not be using ice as the material.

So, if I absolutely had to run a node on a single drive (no way to have a RAID etc), that drive would be an enterprise-grade SSD or nothing at all.

The fact that I cannot have a backup makes me uneasy, especially every time I am doing something. I cannot disconnect the uplink, create a backup, then do whatever I need to do, run the node for a while, see that it is working properly*, and then connect the uplink back. That would result in a lot of downtime.

  • I would like it if it was possible to have a list of pieces that my node is supposed to have; I could at least verify that the files exist, even if I could not verify the data itself (see the sketch below). It is more likely that a file disappears because of a filesystem problem (or is not written to disk in the first place because the node uses async writes) than that the data itself gets corrupted.
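
A minimal sketch of that existence check, assuming the satellite could export a plain list of expected piece paths (piece-list.txt and the blobs directory layout are hypothetical):

```go
package main

import (
	"bufio"
	"fmt"
	"log"
	"os"
	"path/filepath"
)

func main() {
	// Both paths are hypothetical; there is no such export from the satellite today.
	const listFile = "piece-list.txt" // one relative piece path per line
	const blobsDir = "/storagenode/storage/blobs"

	f, err := os.Open(listFile)
	if err != nil {
		log.Fatal(err)
	}
	defer f.Close()

	missing := 0
	scanner := bufio.NewScanner(f)
	for scanner.Scan() {
		rel := scanner.Text()
		if rel == "" {
			continue
		}
		// Only check that the file exists; this says nothing about the content.
		if _, err := os.Stat(filepath.Join(blobsDir, rel)); os.IsNotExist(err) {
			fmt.Println("missing:", rel)
			missing++
		}
	}
	if err := scanner.Err(); err != nil {
		log.Fatal(err)
	}
	fmt.Println("missing pieces:", missing)
}
```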

If it was possible to use MySQL for the database, I would have set up a cluster a long time ago, with backups of the logs in case an update etc messes something up.

It’s just a few lines of code away… Go MySQL Driver
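
A minimal sketch of the driver side with go-sql-driver/mysql (the DSN, schema, and credentials are placeholders; wiring it into the node’s database layer would be the real work):

```go
package main

import (
	"database/sql"
	"fmt"
	"log"

	_ "github.com/go-sql-driver/mysql" // registers the "mysql" driver
)

func main() {
	// Placeholder DSN: user, password, host and schema are illustrative only.
	dsn := "storagenode:secret@tcp(127.0.0.1:3306)/storagenode?parseTime=true"

	db, err := sql.Open("mysql", dsn)
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()

	// Verify the connection actually works before the node relies on it.
	if err := db.Ping(); err != nil {
		log.Fatal(err)
	}
	fmt.Println("connected to MySQL")
}
```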

I was thinking of trying out psql … but haven’t had time.

There’s even a Storj network simulator to test out your MySQL implementation.

Yes, this is what I may end up doing. Storj v2 made me learn some node.js, I guess Storj v3 will make me learn some Go.

I’d bet if someone writes the code and posts it to github, they would consider incorporating it.

My modifications usually do not look like a properly written code, especially if I do not know the language very well. I am not really a programmer, though I can write some simple stuff, mainly with bash or php. This is why I never published my modifications to the v2 node - why make people laugh at me :). My modified nodes worked well enough though.

I remember my update process with v2. Download the new version, run diff on the files I had touched (that is, a diff between the original old and new versions of those files) and copy my changes to the new version.

Same here…

But I also write assembly, ANSI C, and yabasic…

Utilitarian code with plenty of errors and lots of inefficiency… However, I’m slowly coming out of my “shell” so to speak…

I guess I’ll probably need to join the crowd and just post my code so that others may mock it if they wish. Tomorrow… Just like the Frog and Toad books my kids like to read… over and over and over again.