Bad stuff can happen. A single 10TB drive can develop a few bad sectors, a power failure can corrupt the database, and so on. Nothing is perfect. However, under the current rules, losing a bit of data means I have to delete the rest of the data and start over. That looks like a disproportionate response: the database got corrupted somehow, and now I have to wait a year for my node to fill back up.
This matters all the more because there have been cases where nodes were disqualified because of a corrupt or locked database, and the database cannot be run in a cluster.
So, how about this (@Alexey formulated it, and I will expand on it later):
You could be allowed to recover missing files, but you would have to pay in advance an amount large enough to repair all of the node's data. The held amount could be deducted from that payment.
My specific example involves a backup. I had to restore a backup from yesterday, so my node does not have the 200GB that were uploaded to it since the backup was made. I now have to pay in advance for the data recovery (for my entire node, because maybe I am lying about having restored a backup) and give the timestamp of the backup. All the newer data gets repaired (and maybe placed on other nodes, whatever), and my node enters some kind of extended vetting state where it has to pass a lot of audits (only for the older data, since we know for sure that it does not have the newer data).
This could go one of two ways:
My node gets audited for every piece it is supposed to have, and any pieces it has lost get repaired. After the process is complete, the satellite gives me the remaining money with the next payout.
My node gets a lot of audits until the satellite is satisfied that the node either has the data or has lost more data than I claim. If the satellite is satisfied with my node, it gives me the remaining money. If my node has lost more data than I claim, it gets disqualified anyway and I do not get the money back.
The end result would be that I do not have to wait a year to fill my node up again if I have a recent enough backup.
Just as, if I manage to lose some of my customers’ emails, I do not go and delete the rest of them but restore from a backup instead.
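To make the pay-in-advance idea concrete, here is a minimal sketch. It assumes repair is priced per TB (borrowing the $10/TB figure mentioned further down in the thread purely as an assumption), that the held amount is simply subtracted from the deposit, and that a node which turns out to have lost more than it claimed gets nothing back. The function names are hypothetical; nothing here is a real satellite API.

```python
# Hypothetical sketch of the pay-in-advance recovery idea; not a real satellite API.
REPAIR_COST_PER_TB = 10.0  # assumed repair cost in USD per TB


def required_deposit(stored_tb: float, held_amount: float) -> float:
    """Up-front payment: enough to repair the whole node, minus what is already held."""
    full_repair_cost = stored_tb * REPAIR_COST_PER_TB
    return max(full_repair_cost - held_amount, 0.0)


def settle(stored_tb: float, held_amount: float, repaired_tb: float,
           disqualified: bool) -> float:
    """Money returned to the operator once the extended vetting is over."""
    if disqualified:
        # Lost more than claimed: deposit and held amount are both forfeited.
        return 0.0
    pool = required_deposit(stored_tb, held_amount) + held_amount
    return max(pool - repaired_tb * REPAIR_COST_PER_TB, 0.0)


# 10TB node, $30 held, 200GB (0.2TB) actually missing and repaired.
print(required_deposit(10, 30))     # 70.0
print(settle(10, 30, 0.2, False))   # 98.0
```

With the held amount folded into the pool, an honest operator ends up paying only for the data that actually had to be repaired.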
While I agree with the general purpose of the proposal, I highly doubt it will function as intended.
There simply is no possible method that could audit enough of the existing data pieces to ensure that a disqualified node could safely re-enter the network. A full audit of all data would take a very long time. A node that voluntarily pays to remain in the network would lose money rather than earn it.
The current system functions fairly well for both sides… even if a long-running node loses the escrow payment… because a node that pays to remain would be paying more than the escrow anyway.
If my node gets DQ’d because the database got corrupted, I do not know whether I would be willing to put the same effort into keeping the node running a second time. It will just get DQ’d again, so why bother? At least the next time my node gets DQ’d, it will be my own fault.
Or I can restore the node from a backup and hope that the newer pieces do not get audited one after the other, so the network never finds out that some of the data is missing.
With a DQ, the satellite knows which pieces are gone and might trigger a repair.
If restoring a node from a backup were allowed, everyone would do it, or at least claim to, and the satellite would have no idea which node lost which data. So one piece of data could end up lost on too many nodes and become unrecoverable.
I don’t see any good solution for restoring from a backup and proving to the network that the node still has all the pieces it should have.
I agree with you that data integrity is a huge problem. I believe local backups cannot be trusted. We all know that backups can fail, too. Every SNO would use a different backup application, a different schedule, etc., so the data coming from a backup is in a completely unknown state.
With escrow working the way it does now, I could probably make Storj lose money by repeatedly setting up new nodes, limiting their upload speed (so that as few GET requests as possible succeed, while the connection speed still stays above the minimums), and then deleting them once they hold 1TB or so of data. My earnings plus the held amount would be lower than the $10 Storj would pay other nodes for the repair.
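The arithmetic behind this can be sketched as follows. The monthly earnings, the 75% held percentage and the $10/TB repair cost are assumptions for illustration only; in particular, the held percentages are passed in as a parameter rather than taken from Storj’s actual schedule.

```python
# Hypothetical numbers only: earnings, held percentages and repair cost are assumptions.

def held_amount(monthly_earnings, held_fractions):
    """Total withheld: each month's earnings times that month's held percentage."""
    return sum(e * f for e, f in zip(monthly_earnings, held_fractions))


def attack_balance(monthly_earnings, held_fractions, tb_stored,
                   repair_cost_per_tb=10.0):
    """What a node that quits after a few months costs the satellite."""
    held = held_amount(monthly_earnings, held_fractions)
    repair_cost = tb_stored * repair_cost_per_tb
    return {
        "paid_out": sum(monthly_earnings) - held,
        "held_forfeited": held,
        "repair_cost": repair_cost,
        # Positive means the forfeited held amount does not cover the repair.
        "uncovered_repair": repair_cost - held,
    }


print(attack_balance([1.0, 2.0, 3.0], [0.75, 0.75, 0.75], tb_stored=1))
# {'paid_out': 1.5, 'held_forfeited': 4.5, 'repair_cost': 10.0, 'uncovered_repair': 5.5}
```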
The backup should have a timestamp, so the satellite knows which data the restored node has definitely lost.
As for the rest of the data, how about this: the node is placed in a special state where its data is split into “before backup” and “after restore”. The satellite considers the “before backup” part damaged, except that:
the satellite still issues audits for it - any pieces that pass the audit are considered OK.
the satellite still allows downloads from that node, including repair downloads if a repair is triggered. A successful customer download marks the piece as OK, and so does a successful repair download from this node.
Repair downloads for those pieces are not paid.
For the “after restore” data, the node is treated as new: it has to pass the vetting phase again, and the held amount percentage goes back to its starting level.
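A minimal sketch of this split, assuming each piece has an upload timestamp and that the restore time is known alongside the backup timestamp (both hypothetical inputs). The download helper encodes the rule that a successful download confirms a “before backup” piece, but only customer downloads are paid.

```python
from datetime import datetime
from enum import Enum


class Bucket(Enum):
    BEFORE_BACKUP = "before backup"  # treated as damaged until proven OK
    LOST = "lost"                    # uploaded after the backup, before the restore
    AFTER_RESTORE = "after restore"  # new data; the node is treated as a new node


def classify(uploaded_at: datetime, backup_ts: datetime, restore_ts: datetime) -> Bucket:
    """Sort a piece into one of the three buckets by its upload time."""
    if uploaded_at < backup_ts:
        return Bucket.BEFORE_BACKUP
    if uploaded_at < restore_ts:
        return Bucket.LOST
    return Bucket.AFTER_RESTORE


def before_backup_download(kind: str, succeeded: bool) -> tuple[bool, bool]:
    """Return (piece_now_ok, node_gets_paid) for a download of a "before backup" piece.

    kind is "customer" or "repair"; repair downloads confirm the piece but are not paid.
    """
    if not succeeded:
        return False, False
    return True, kind == "customer"


backup = datetime(2020, 5, 1)
restore = datetime(2020, 5, 2)
print(classify(datetime(2020, 4, 20), backup, restore))  # Bucket.BEFORE_BACKUP
```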
Alternative 1: the node is allowed a partial GE (graceful exit) with the data it still has, under the same requirements as a normal GE, except that no data newer than the backup timestamp is requested. In this case, I get part of the escrow money back.
Alternative 2: the node is allowed to do a “special” partial GE, handing the data it still has over to my new node. In this case, I do not get the escrow money.
From what I understand, the number of audits is much lower than the actual number of pieces, so it is impossible to issue an audit for every piece of data.
If you trust only the audited pieces, only a small fraction of the pieces will ever count as good.
If you trust more data than you can actually audit, you are facing the same problem again: whether or not to trust data from a node that has proven to be unreliable.
Node gets disqualified: the satellite considers all of its pieces lost. It may trigger a repair if the number of remaining pieces falls below 40 (I do not know what the real numbers are, so I just made them up).
Node gets restored from a backup:
1. Satellite considers all “between backup and now” pieces as lost.
2. Satellite considers all “before backup” pieces as “maybe-lost”:
2a. If the number of pieces drops below 50 (a bit higher than the repair threshold), it triggers an audit to find out whether the “maybe-lost” piece is actually lost or not.
2b. The maybe-lost pieces also get audited as normal.
2c. The customer is given a node with a “maybe-lost” piece in the list for download.
2d. If one of the previous steps succeeds, the piece is considered “not lost”.
For Storj, this should reduce the amount of repair traffic without affecting data integrity.
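Steps 2a to 2d can be sketched per segment like this. The thresholds are the made-up 50 and 40 from the post, audit is a hypothetical callback standing in for a real audit, and, following the clarification later in the thread, only pieces already known to be OK are counted against the thresholds.

```python
from typing import Callable, List, Tuple

AUDIT_THRESHOLD = 50   # made-up number, a bit above the repair threshold
REPAIR_THRESHOLD = 40  # made-up number


def check_segment(states: List[str], audit: Callable[[int], bool]) -> Tuple[List[str], bool]:
    """Return the updated piece states and whether the segment needs repair.

    Each state is "ok", "maybe-lost" or "lost"; audit(i) returns True if the
    node holding piece i still has it.
    """
    states = list(states)
    if states.count("ok") < AUDIT_THRESHOLD:
        # 2a: too few confirmed pieces, so resolve every maybe-lost piece now.
        for i, state in enumerate(states):
            if state == "maybe-lost":
                states[i] = "ok" if audit(i) else "lost"
    needs_repair = states.count("ok") < REPAIR_THRESHOLD
    return states, needs_repair


states = ["ok"] * 45 + ["maybe-lost"] * 5 + ["lost"] * 30
new_states, repair = check_segment(states, audit=lambda i: i % 2 == 0)
print(new_states.count("ok"), repair)  # 47 False
```

Regular audits (2b) and customer downloads (2c) would flip individual maybe-lost pieces to OK the same way, outside this function.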
But the question remains: where does the backup come from, and why should the satellite trust it, even partially?
It starts with the timestamp. How does it get verified? As you mentioned, the implications are huge: the piece is considered either lost or “maybe-lost”.
And as for the backup itself, why and how should the satellite trust it?
This is the question that needs to be answered.
The SNO is saying: “Hey satellite, I have lost 1 TB of my 10 TB. But I have a local backup. I cannot tell you how I created it, but I can tell you it is there and it looks OK.”
So what should the satellite do? As I understand it, it is impossible to audit every piece of the 9 TB backup. I don’t know the numbers, but maybe for each audited piece, 1000 remain unchecked. So the satellite has to decide: trust or not trust. And I don’t know how to get around that fundamental decision.
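One standard way to reason about this is random sampling: the satellite cannot check every piece, but a modest number of uniformly random audits bounds how much data can be missing without being noticed. The sketch assumes audits pick pieces independently and uniformly at random. It tells the satellite nothing about which pieces are gone, only how large the loss can plausibly be.

```python
import math


def audits_needed(loss_fraction: float, confidence: float) -> int:
    """Random audits needed so that a node missing loss_fraction of its pieces
    fails at least one of them with probability >= confidence.

    All n audits pass with probability (1 - loss_fraction) ** n, so we need
    (1 - loss_fraction) ** n <= 1 - confidence.
    """
    return math.ceil(math.log(1 - confidence) / math.log(1 - loss_fraction))


# A node missing 1% of its pieces is caught with 99% probability after:
print(audits_needed(0.01, 0.99))  # 459
```

The result does not depend on how many pieces the node stores, which is why sampling scales to a 9 TB backup. The catch is that smaller losses need more audits: 0.1% needs roughly ten times as many.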
I think I outlined that relatively well: a piece is lost, maybe-lost, or OK.
Start with all pieces from before the timestamp as “maybe-lost” and all pieces from after it as “lost”. If a piece passes an audit, it becomes “OK”. If its segment approaches the repair threshold (while still above it), the piece gets audited and becomes either “lost” or “OK”. If a customer succeeds in downloading the piece, it becomes “OK”.
Some pieces would never approach the repair threshold and would go unaudited for a long time, remaining “maybe-lost”. Those pieces are in no danger. Also, if that is your concern, the satellite could pay only for storing “OK” pieces.
So, say I have a 10TB node, mess something up, and restore a day-old backup containing 9.9TB of data.
The satellite gets informed about that and marks the “newer” pieces as “lost”.
The satellite marks all the other pieces as “maybe-lost”.
Until the satellite is sure that a piece is OK, I do not get paid for storing it.
If a piece is audited at random and passes, it is marked as OK, and the satellite pays me for storing it from the moment of the successful audit.
If a piece’s segment approaches the “slightly-above-repair” threshold, the piece is audited and marked as either “OK” or “lost”.
If the customer wants to download the piece, he can try to download it from my node (among others) and if the download succeeds and is not canceled, the piece gets marked as “OK” and I get paid for egress.
Let’s say I lied and have actually lost much more data. For the satellite, it would be the same as DQing my node: pieces would get repaired as they hit the repair threshold.
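The payment rules in this walkthrough can be sketched per piece. Days are plain numbers, and the functions are hypothetical bookkeeping, not how the satellite actually accounts for storage.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Piece:
    state: str                        # "lost", "maybe-lost" or "ok"
    ok_since: Optional[float] = None  # day on which the piece was confirmed OK


def audit(piece: Piece, passed: bool, day: float) -> None:
    """A failed audit marks the piece lost; a passed one confirms a maybe-lost piece."""
    if not passed:
        piece.state, piece.ok_since = "lost", None
    elif piece.state == "maybe-lost":
        piece.state, piece.ok_since = "ok", day


def customer_download(piece: Piece, succeeded: bool, day: float) -> bool:
    """Return True if the node earns egress for this download."""
    if piece.state == "lost" or not succeeded:
        return False
    if piece.state == "maybe-lost":
        piece.state, piece.ok_since = "ok", day
    return True


def paid_storage_days(piece: Piece, day: float) -> float:
    """Storage is paid only from the moment the piece was confirmed OK."""
    if piece.state != "ok" or piece.ok_since is None:
        return 0.0
    return day - piece.ok_since


p = Piece("maybe-lost")
audit(p, passed=True, day=10.0)
print(paid_storage_days(p, day=30.0))  # 20.0
```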
If you do not get paid for unaudited pieces, how long would it take to get 10 TB back into a state where you get paid again?
That is what I was trying to say. As I understand it, the number of audits is nowhere near the actual number of pieces stored.
So if you were paid only for audited pieces, your earnings would not recover for a very long time, maybe even longer than if you started all over again.
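This concern can be put into numbers. Assuming the satellite runs A audits per day against the node, each picking one of its N pieces uniformly at random, the expected fraction audited at least once after d days is 1 - (1 - 1/N)^(A*d). The piece count and audit rate below are made-up placeholders, not real network figures, and customer downloads (which would also confirm pieces under the proposal) are ignored.

```python
import math


def confirmed_fraction(pieces: int, audits_per_day: float, days: float) -> float:
    """Expected share of pieces audited at least once after the given number of days."""
    return 1.0 - (1.0 - 1.0 / pieces) ** (audits_per_day * days)


def days_until(pieces: int, audits_per_day: float, target: float) -> float:
    """Days until the expected confirmed share reaches the target fraction."""
    return math.log(1.0 - target) / (audits_per_day * math.log(1.0 - 1.0 / pieces))


PIECES = 4_000_000      # hypothetical piece count for a 10 TB node
AUDITS_PER_DAY = 1_000  # hypothetical audit rate

print(round(confirmed_fraction(PIECES, AUDITS_PER_DAY, 365), 3))  # 0.087
print(round(days_until(PIECES, AUDITS_PER_DAY, 0.5)))             # 2773
```

Under these placeholder numbers, less than 9% of the pieces would be confirmed after a year, and reaching half would take over seven years.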
Do I understand correctly that you are trying to find a way to keep earning egress payments?
Well, I think what you describe happens a little too late in the chain. When a “maybe-lost” piece is offered to the customer, its state is unknown. The satellite does not know whether the piece is there, and neither does the node. Why should such uncertainty be allowed? The customer expects the piece to be there. Why should the satellite or Storj wait until the moment the customer wants to download the piece to find out that it is not there?
That’s why I say it is a fundamental decision Storj has to make, even with your proposal: do we want to keep trusting a failed node, yes or no?
My previous response was a little short, and I can do better. I like the idea in principle, but I’m trying to find a fair way to go about it. I think marking all storage as untrusted actually goes a little far, but if you do take that approach, repair happens at the same time it would have if those pieces had been lost, so it would incur pretty much the same repair costs. Most pieces will never be touched before that repair happens, so most of them would still be untrusted at that point. To cover these costs, the only fair approach would be to take the held amount and start collecting the held amount all over again.
But as I mentioned, I think marking them all as untrusted is going way too far. Instead, we should create a system that gives the node an incentive to correctly mark which pieces are lost. We should focus on how to deal with the repair costs for those pieces and let the normal audit process determine whether the remaining data is still reliable.
The challenge is that you could try to cheat this system. Say you lose 100GB: you calculate how many failed audits your node can likely survive and determine that it can get by reporting only 50GB worth of pieces as lost. That kind of thing will be really hard to prevent.
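Why this is hard to prevent can be illustrated with a binomial model. Assume, hypothetically, a 10TB node, audits that pick pieces uniformly at random, and a rule that disqualifies the node once more than max_failures of a batch of audits fail. This rule is a simplified stand-in, not the satellite’s actual reputation scoring.

```python
from math import comb


def survival_probability(hidden_fraction: float, audits: int, max_failures: int) -> float:
    """Probability that at most max_failures of the random audits hit a missing piece."""
    return sum(
        comb(audits, k) * hidden_fraction ** k * (1 - hidden_fraction) ** (audits - k)
        for k in range(max_failures + 1)
    )


# Lost 100GB of 10TB but reported only 50GB: 0.5% of pieces are missing unreported.
print(round(survival_probability(0.005, audits=100, max_failures=2), 3))  # 0.986
```

Under these assumptions the under-reporting node survives 100 audits almost 99% of the time, which is exactly the kind of gamble the post describes.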
I would get paid for egress.
And yes, I would probably not get paid for most of the pieces that are not accessed for a long time, but storage payments are low anyway.
The customer gets a list of nodes but only needs to download successfully from some of them to reconstruct the file (the others get canceled). If it turns out that my node does not actually have the piece, the customer simply downloads it from another node.
If my node were DQ’d, the customer just would not have the option of downloading from my node, that’s it.
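The download side of this argument can be sketched as below. It is a simplification: real downloads run in parallel and cancel the slowest transfers, while this sketch walks the node list in order and stops once enough pieces have arrived. The node list and the needed count are hypothetical.

```python
from typing import List, Optional


def download_segment(node_has_piece: List[bool], needed: int) -> Optional[List[int]]:
    """Return the indices of the nodes that delivered a piece, or None if the
    segment cannot be reconstructed from this list."""
    used = []
    for index, has_piece in enumerate(node_has_piece):
        if has_piece:
            used.append(index)
            if len(used) == needed:
                return used  # enough pieces; remaining transfers are canceled
    return None


# Node 1 is a restored node whose "maybe-lost" piece is actually gone.
print(download_segment([True, False, True, True], needed=3))  # [0, 2, 3]
```

Nodes that delivered a piece would then have it marked OK; a restored node that failed is simply skipped, just as if it had not been on the list.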
Yet they do trust a failed node operator.
How is a “maybe-lost” piece worse than a “definitely lost” one?
something like this might make it possible to verify everything “easily”
the satellite sends an algorithm to three or more nodes that all hold a certain set of pieces, and asks the nodes to apply it to the pieces in question.
here is my idea thus far…
this gives back a checksum created by reading all the data and doing some basic math, taking up 1/100000000000 or whatever of the original data. after some hours (depending on the job size, ofc) the satellite gets the checksums back and compares them… then it maybe stores the created checksum, or in some way uses the processing the nodes did to build a sort of checksum puzzle that the nodes cannot figure out, because
they don’t know the algorithm they need to apply, they don’t know which pieces will be selected, and because of the many variations nobody can prepare this data without actually having the data, which works as a focal lens to verify the integrity of the whole…
that way the satellite just needs to keep a map of sorts…
there is no real usage of bandwidth
and all data on the nodes could be verified, and if one nests and scales it in the right ways, one would very likely also be able to identify the exact pieces that are broken… but for right now that is beyond what i can do with back-of-a-napkin conceptual storjnerding.
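A rough sketch of the challenge idea, with one substitution: instead of a secret algorithm, it uses a standard keyed hash (HMAC-SHA256) with a fresh random key per challenge, which likewise prevents nodes from precomputing answers. Like the post, it assumes the challenged nodes hold identical copies of the selected pieces, so honest nodes return identical answers. The function names are hypothetical.

```python
import hashlib
import hmac
import random
import secrets
from typing import Dict, Iterable, List, Tuple


def make_challenge(piece_ids: Iterable[str], sample_size: int) -> Tuple[bytes, List[str]]:
    """Satellite side: a fresh secret key plus a random selection of pieces."""
    key = secrets.token_bytes(32)
    chosen = random.SystemRandom().sample(sorted(piece_ids), sample_size)
    return key, chosen


def answer_challenge(store: Dict[str, bytes], key: bytes, chosen: List[str]) -> str:
    """Node side: the answer can only be computed by reading every chosen piece."""
    mac = hmac.new(key, digestmod=hashlib.sha256)
    for piece_id in chosen:
        data = store.get(piece_id, b"")  # a missing piece changes the answer
        mac.update(piece_id.encode())
        mac.update(len(data).to_bytes(8, "big"))
        mac.update(data)
    return mac.hexdigest()


node_a = {"p1": b"aaa", "p2": b"bbb", "p3": b"ccc"}
node_b = dict(node_a)
node_c = dict(node_a, p2=b"bbX")  # one corrupted piece

key, chosen = make_challenge(node_a.keys(), sample_size=3)
answers = [answer_challenge(n, key, chosen) for n in (node_a, node_b, node_c)]
print(answers[0] == answers[1], answers[0] == answers[2])  # True False
```

Comparing answers across nodes singles out the odd node, but on its own it does not say which piece inside the selection is damaged.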
Honestly, I don’t think it should be the customer’s problem if a node cannot reliably keep its data. I would leave the customer out of this problem entirely. The problem is purely internal and should never ever reach the customer.
So my suggestion would be that the satellite must not offer a maybe-lost piece to a customer but audit it instead. Auditing would then extend to the pieces a customer actually wants to download.
The satellite would still have to deal with “maybe-lost” data. Since its state is unknown until proven, the satellite would need to maintain redundancy, because one scenario must never happen: a customer wants to download data, only “maybe-lost” pieces are available, and they fail.
well, before anything reaches the customer, the satellite has to verify that everything is there…
a customer who sees “please wait while we reconstruct your data” isn’t likely to come back…
the satellite must be able to verify data on the nodes… anything else is just an insane house of cards…
ALL OF IT, even if it’s done mostly locally in multiple locations at the satellite’s request, using some tricks that make it impossible for nodes to arrive at the same numbers without having read all the data. and the result could be segmented into blocks, giving the location of a damaged piece within the super / hyper piece
yeah, so you use checksums on 3 sides of a hyper piece, and as long as no more than 1 piece within it is damaged, it can be located and maybe even corrected… i might just have put the final nail in the “cannot verify and find the data that has gone wrong” issue with this…
and ofc if one needed a better setup, one would just use more block sections… it might add more data to the matrix we seem to be building here, but it would be totally possible to identify multiple bad blocks that way…
lol maybe we can get some of those smart repair functions the network has onto our nodes too… that would be kinda cool…
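The localization part of this idea can be sketched with checksums along two axes rather than three: arrange the blocks of a larger “hyper piece” in a grid and keep one hash per row and one per column. If exactly one block is damaged, the single mismatching row and column intersect at it. With two or more damaged blocks the candidates become ambiguous, which is where more block sections would come in. This only locates damage; correcting it would need parity data, which is not shown.

```python
import hashlib
from typing import List, Sequence, Tuple

Grid = List[List[bytes]]


def digest(blocks: Sequence[bytes]) -> str:
    h = hashlib.sha256()
    for block in blocks:
        h.update(len(block).to_bytes(8, "big"))  # length prefix keeps boundaries unambiguous
        h.update(block)
    return h.hexdigest()


def grid_checksums(grid: Grid) -> Tuple[List[str], List[str]]:
    """One checksum per row and one per column."""
    rows = [digest(row) for row in grid]
    cols = [digest(col) for col in zip(*grid)]
    return rows, cols


def locate_damage(grid: Grid, rows: List[str], cols: List[str]) -> List[Tuple[int, int]]:
    """Candidate (row, col) cells whose row and column checksums both mismatch."""
    now_rows, now_cols = grid_checksums(grid)
    bad_rows = [i for i, (old, new) in enumerate(zip(rows, now_rows)) if old != new]
    bad_cols = [j for j, (old, new) in enumerate(zip(cols, now_cols)) if old != new]
    return [(i, j) for i in bad_rows for j in bad_cols]


grid = [[bytes([r * 3 + c]) * 4 for c in range(3)] for r in range(3)]
rows, cols = grid_checksums(grid)
grid[1][2] = b"XXXX"  # damage one block
print(locate_damage(grid, rows, cols))  # [(1, 2)]
```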
Maybe I did not write it entirely correctly. For this purpose, the maybe-lost pieces should not be counted. If the number of “OK” pieces drops to 50 (or whatever), an audit gets triggered.
Right now if the number of “OK” pieces drops to 40, a repair gets triggered.
The number for audit should be higher than the number for repair.
i’ll need to read up on exactly how that works… but it doesn’t really matter, it might just make it easier to do what i suggest… because the system is sort of designed to be able to do it already… we just need to add more vectors to the mathematics of it… instead of it being linear
gutshot tho… will take a look at how that actually works…