Restore a node after a hard drive failure

Hey,
my node is currently offline, and I think the hard drive is broken. I am getting a lot of read and write failures. Now I would like to create a new node. By the way, I can't back up the data folder because the drive is not readable. I have saved my keys and could use them, but how?

Hi Amnesia,

Since you have lost the data associated with that node, any attempt to recreate the node with the same identity key file will cause the recreated node to fail audits and quickly be disqualified. You will have to request a new authorization token with a different email address, and start from scratch. Refer to this post to avoid problems when requesting a new token.

Thanks for the fast response… fu**, this was a full 8TB drive with a lot of traffic… okay, on to a new one…

Why is there no mechanism to repair the data in such a case?
It is kind of unfair to long-term trusted SNOs that, when a hard drive fails, there is not even an attempt to repair the data for that node ID.

Hi Champmine18. The problem there is that other nodes would have to repair the data and send it to the node that lost it. Who pays for that? The SNO would have to, and that would be taken from the escrow money. So there is no benefit to the SNO in going through a repair process, since they would still likely lose their funds. On top of that, sending 8TB to one specific node would be a burden on the network, especially if that node operator were malicious and did this constantly. It would be unnecessary traffic.

@Knowledge interesting, thanks for the explanation. So what does the network do to recover from, say, 20 lost nodes that hold the same data?

As I understand it, you can lose 2/3rds of a file’s data and it can still be recreated with the error correction Storj is using. So it would be difficult to lose 2/3rds of the data from all sources at once. It’s not set up in such a way that one node would get all the shards of a file. It’s distributed, and I believe it has some IP/region distribution, though I haven’t looked into it enough to know exactly how that works.
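
To put rough numbers on that, here is a minimal sketch of the "any k of n pieces" idea. The values n = 30 and k = 10 are hypothetical, picked only to match the "lose 2/3rds" figure above; they are not Storj's actual parameters, and the independence assumption is a simplification.

```python
from math import comb


def max_tolerated_losses(n: int, k: int) -> int:
    """Pieces that can vanish while at least k of the n pieces remain."""
    return n - k


def segment_loss_probability(n: int, k: int, p_fail: float) -> float:
    """Chance that fewer than k of n pieces survive.

    Assumes (hypothetically) that every piece sits on a different node
    and that each node fails independently with probability p_fail.
    """
    p_ok = 1.0 - p_fail
    # Sum the binomial probabilities of exactly s survivors for s < k.
    return sum(
        comb(n, s) * p_ok**s * p_fail ** (n - s)
        for s in range(k)
    )


if __name__ == "__main__":
    n, k = 30, 10  # hypothetical: any 10 of 30 pieces rebuild the data
    print("tolerated losses:", max_tolerated_losses(n, k))
    for p in (0.05, 0.2, 0.5):
        print(p, segment_loss_probability(n, k, p))
```

With these made-up numbers, losing 20 of the 30 nodes holding a file's pieces still leaves exactly the 10 pieces needed to rebuild it, which is why a single failed node is not treated as an emergency.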

You can find more details on the subject here

In the case where there is only corruption of some data (not all) it would be helpful to have this feature. If a disk develops a bad sector and one specific piece becomes corrupt as a result, I would absolutely be willing to pay other nodes to help me repair that piece (after moving the data to a new disk, of course). The alternative is losing everything and starting over.

A “voluntary audit” where the storagenode hashes all of its pieces and requests repairs on any that have become corrupt would be a very beneficial thing for the network and SNOs. This could even be automated – verify some number of blobs every day and request repairs on any that are damaged.
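
As a rough illustration of what I mean, here is a sketch of the local half of such a check. This is not an existing storagenode feature: the `expected` mapping of piece name to SHA-256 hash is an assumption (the node would need to have recorded those hashes when the pieces were stored), and `find_damaged` is a hypothetical helper name.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large blobs never sit fully in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_damaged(blob_dir: Path, expected: dict[str, str]) -> list[str]:
    """Return names of pieces that are missing, unreadable, or altered.

    `expected` maps a piece file name to the hash recorded at upload time
    (an assumed manifest, not something the node is known to keep).
    """
    damaged = []
    for name, want in sorted(expected.items()):
        try:
            intact = sha256_of(blob_dir / name) == want
        except OSError:
            # Missing file or a read error from a bad sector.
            intact = False
        if not intact:
            damaged.append(name)
    return damaged
```

The returned list is what the node would then ask to have repaired; how that request and its payment would work is exactly the open question in this thread.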

Thanks for the interesting discussion, very insightful.

I fully agree, automated verification and repair of blobs on the nodes would be a good thing to protect the network against bad sectors and bit rot/flipping.

If some number x of nodes holding the same data go down, how does the network repair this? Would the satellites need to pay for the repair traffic? And would the cost not be the same if the failed hard drives of those x nodes were simply refilled when the nodes come back online?

Thanks @heunland, but that link only leads to the general support site. Were you trying to point to specific information there?

Sorry, I accidentally pasted the wrong link. I have corrected it in the post above. You can find the article here.

Hi,
Does this all mean that the optimal way to mitigate HDD loss and maximise profit is to run a separate node on each hard drive, instead of using some kind of redundancy (for example, RAID)?

There are two options:

  • Run one node per disk. There is no overhead, and if one disk fails it won’t matter that much because you’ve still got the other node(s). This is the officially recommended solution.
  • Put all disks in a redundant pool and run one storage node. This way you won’t have to wait for the vetting process on several nodes, but you do lose a lot of storage capacity to get the redundancy.
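
To make the trade-off concrete, here is a small back-of-the-envelope sketch. The disk count and size are hypothetical, and the pool is assumed to be a single-parity layout (RAID 5 style); other redundancy levels would give up more capacity.

```python
def per_disk_nodes(disks: int, size_tb: float) -> dict:
    """One node per disk: full capacity, but a dead disk takes its node along."""
    return {
        "usable_tb": disks * size_tb,
        "at_risk_on_one_failure_tb": size_tb,
    }


def single_parity_pool(disks: int, size_tb: float) -> dict:
    """One node on a single-parity pool: one disk's worth goes to parity."""
    if disks < 2:
        raise ValueError("a parity pool needs at least two disks")
    return {
        "usable_tb": (disks - 1) * size_tb,
        "at_risk_on_one_failure_tb": 0.0,
    }


if __name__ == "__main__":
    disks, size_tb = 4, 8.0  # hypothetical setup
    print("one node per disk:", per_disk_nodes(disks, size_tb))
    print("single-parity pool:", single_parity_pool(disks, size_tb))
```

In this example the per-disk setup offers 32 TB and puts at most 8 TB at risk per disk failure, while the pool offers 24 TB and survives one disk failure without losing node data.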

Please note that if you plan to run multiple nodes on the same IP/subnet, it is recommended that you do not start them all at once, since that is what causes excessive vetting times on all of them. Preferably, wait until the latest node is almost full before starting the next one. At a minimum, if you don’t want to wait for the first node to fill, wait until it is vetted before starting the next. This way you can start a new node every month or so until you reach your full capacity.
