Audit system improvement idea

It would, in order to get a portion of the escrow money proportional to the amount of data they can still send back.

Somewhat related to this thread: Reduce storage / partial graceful "exit"

Why should there be an incentive to keep bad nodes running?

It is way more expensive than you might think. Instead of downloading the pieces to your machine, the satellite can simply repair the segment. That will reconstruct more than one piece at the same cost.
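To see roughly why, here is a back-of-the-envelope sketch (the numbers are placeholders for illustration, not the network's real parameters): one repair pass downloads just enough pieces to decode the segment, and every missing piece regenerated from that decoded segment shares that same download cost.

```python
# Illustrative only: rough repair-cost arithmetic with assumed numbers,
# not Storj's actual parameters. The point: the download cost of one
# repair pass is fixed, no matter how many pieces get regenerated.

def repair_download_cost(piece_size_mb: float, k: int) -> float:
    """Data the repair worker must download to decode the segment once."""
    return piece_size_mb * k

def cost_per_rebuilt_piece(piece_size_mb: float, k: int, pieces_rebuilt: int) -> float:
    """Average download cost per regenerated piece in a single repair pass."""
    return repair_download_cost(piece_size_mb, k) / pieces_rebuilt

if __name__ == "__main__":
    k = 29           # assumed reconstruction threshold (placeholder)
    piece_mb = 2.3   # assumed piece size in MB (placeholder)
    for rebuilt in (1, 5, 20):
        print(rebuilt, "pieces rebuilt:",
              round(cost_per_rebuilt_piece(piece_mb, k, rebuilt), 2), "MB downloaded each")
```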

You may want to read back a couple of quotes for more context

Someone from Storj said that it costs more to repair the data of a DQ node than there is money in escrow for that node. If that is true, wouldn’t it be better for everyone if, say, I could upload 4.4TB of data back to the network after losing 10GB or whatever the threshold for DQ is?
As it is right now, if my node fails an audit, I can either delete it and start over or continue running it until it is disqualified, as starting a graceful exit will not help me. So, a full repair is triggered instead of only repairing whatever data I lost.


To get the data off the nodes as a part of a forced graceful exit process and reduce the amount of repairs.

No pieces are downloaded. The node hashes the pieces, checks whether each hash matches the stored/received one, and reports the bad ones. Only two lists of pieces are transferred.
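A minimal sketch of that idea (hypothetical helper names, not the actual storagenode/satellite API): the satellite sends a list of piece IDs with the hashes it expects, the node hashes what it has on disk, and only the list of missing or corrupted pieces goes back.

```python
# Sketch only: hash-based self-check of stored pieces, assuming the satellite
# provided a mapping of piece ID -> expected hash. Function names are made up.
import hashlib
from pathlib import Path

def check_pieces(expected: dict[str, str], storage_dir: Path) -> list[str]:
    """Return piece IDs whose local data is missing or does not match the expected hash."""
    bad = []
    for piece_id, expected_hash in expected.items():
        path = storage_dir / piece_id
        if not path.exists():
            bad.append(piece_id)
            continue
        actual = hashlib.sha256(path.read_bytes()).hexdigest()
        if actual != expected_hash:
            bad.append(piece_id)
    return bad
```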

For the network it is better to have a real penalty so that you don’t delete data in the first place!

There is no point in recovering a node, as there are tens, hundreds or thousands of nodes that have the same segment(s) that were lost, and they will be more than happy to serve them.
The whole point is to make the network redundant, not the node.

There is, because repairing data is expensive

We are more than happy to pay that to make sure bad nodes are getting the right incentive next time.

I hope you edited your post if this was incorrect?

I think it's good to spend time on the network as a whole instead of one single SN. If Uber has a bad driver, they just kick him or her out of the network so that I, as a customer, can take another Uber.

There is no way the satellites can check every segment. If someone uploads a 10GB file, it will be split into thousands of smaller pieces, and you might not even have all of them. If the satellite tried to fetch every file, you would lose a lot of bandwidth, as the only way to check whether the file is actually there is to download it.
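Rough numbers to show the scale (segment size and piece count are assumptions for illustration, not exact network parameters):

```python
# Rough arithmetic on why "verify by downloading everything" does not scale.
# Segment size and pieces-per-segment are assumed values, not exact figures.

file_gb = 10
segment_mb = 64          # assumed max segment size
pieces_per_segment = 80  # assumed number of erasure-coded pieces per segment

segments = file_gb * 1024 // segment_mb
pieces = segments * pieces_per_segment

print(segments, "segments,", pieces, "pieces for one", file_gb, "GB file")
# Verifying by download would mean re-transferring the whole 10 GB (and more,
# because of the erasure-coding expansion) for every single check.
```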

If the old HD that you think will fail does fail, run multiple instances instead.

This is a bit like if you were an Uber driver and said you shouldn’t be punished as you just killed one of your customers :slight_smile:

And yet, the recommendation is to use a single hard drive, with the design of the system such that backups are impossible. Some people are going to get punished for following recommendations.

I was replying to the point that it is a financial loss for Storj to disqualify a node. Since a node can fail by losing only part of the data (as opposed to a hard drive crash etc), it would look like some kind of partial GE would be a good thing for both sides. However, as @littleskunk pointed out, it is better to have a real punishment for the node operator, even if it does result in a financial loss for Storj. So, whatever.

Ah. I don’t know how the network works, but I think I remember something about 60 copies of each segment; is that always the case? If one node fails an audit and the satellite initiates a repair for all the segments the node had, then I agree. But what if the network has 120 copies?

If I recall correctly, data is repaired when there are 30 or fewer pieces available, and 25 pieces are needed to reconstruct a segment. I might be totally wrong though.

“Copies” is the wrong word to use though; there are not 60+ complete copies of an entire segment. It uses some form of erasure code.
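To make the distinction concrete, here is a rough comparison using the figures mentioned in this thread (25 pieces to reconstruct, roughly 60 pieces stored) purely as assumptions, not the network's actual Reed-Solomon parameters:

```python
# Rough illustration of erasure coding vs. full copies. The parameters below
# are taken from guesses earlier in the thread, not from official settings.

def erasure_overhead(segment_mb: float, k: int, n: int) -> float:
    """Total stored data when a segment is split into n pieces, any k of which reconstruct it."""
    return segment_mb * n / k

def replication_overhead(segment_mb: float, copies: int) -> float:
    """Total stored data when the whole segment is copied verbatim."""
    return segment_mb * copies

if __name__ == "__main__":
    segment = 64.0  # MB, assumed segment size
    print("erasure coded (k=25, n=60):", erasure_overhead(segment, 25, 60), "MB stored")
    print("60 full copies:            ", replication_overhead(segment, 60), "MB stored")
```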

There are no copies.
Please read this blog post:


And they will still make more money, even if they lose a hard drive, compared to the reduced income they would have with a RAID setup.


I don’t think “copies” is the right terminology, but I’m calling them that anyway, even if they are Erasure Shares and every share is unique.

Slightly off-topic, but this is a complicated consideration. I understand the official advice of one node per drive, because that means more raw capacity and so, theoretically, more money. But if you keep losing nodes after 6 months, you only get a small percentage of that due to escrow. I guess it depends on how reliable the drives you use are.

If you run one node per drive, even for 15+ months, you will always lose 50% of the escrow (the part that you would get back by doing a graceful exit), unless you exit before the drive fails, which realistically very few people are going to do. A disk pool will probably survive longer than someone’s interest in this project, so a graceful exit is possible.
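To put rough numbers on that, here is a sketch using the commonly cited held-amount schedule (an assumption here, the current official docs may differ) and an assumed $10/month of earnings:

```python
# Back-of-the-envelope held-amount ("escrow") arithmetic. Assumed schedule:
# 75% of earnings held in months 1-3, 50% in months 4-6, 25% in months 7-9,
# 0% afterwards; half of the total is paid back around month 15 and the
# other half only on a successful graceful exit.

def held_fraction(month: int) -> float:
    """Fraction of that month's earnings that is held back."""
    if month <= 3:
        return 0.75
    if month <= 6:
        return 0.50
    if month <= 9:
        return 0.25
    return 0.0

def held_total(monthly_earnings: float, months: int) -> float:
    """Total amount held back over the node's lifetime so far."""
    return sum(monthly_earnings * held_fraction(m) for m in range(1, months + 1))

if __name__ == "__main__":
    held = held_total(monthly_earnings=10.0, months=15)    # assumed $10/month
    print("total held:", held)                              # 45.0
    print("paid back around month 15:", held / 2)           # 22.5
    print("lost if the drive dies before a graceful exit:", held / 2)
```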

Even worse, IMO, is that it is the official recommendation. Usually, a “recommendation” means that if you do it like that, you are in the clear if something bad happens. Here it’s essentially setting the node up for failure.

Considering that someone may want to run the node for a long time, a drive failure is essentially inevitable. If there was a way to back the node up, then I would consider it to be a viable option, but backups are impossible, so redundancy is a must in my opinion.

I agree with @Pentium100 that if we meet all requirements then there shouldn’t be penalties for that.

Today the best option is to do a GE every 6 months, because most HDDs are used and are 2-5 years old. Otherwise it is not profitable at all to buy new HDDs that will survive such a long period; from time to time the load is very heavy. Sometimes HDDs just turn themselves off because of overheating. And most HDDs will not survive 15 months.

Please confirm if I understand this correctly.
You’re fine with losing money and potentially data to punish an anonymous SNO for partially losing data to drive or other hardware issues? While at the same time you recommend that people use non-RAID setups and run on inherently unstable systems like Windows PCs and Raspberry Pis?