Node suspended - rebuild - corruption

Hello,
I was moving to a RAID5 LVM volume and had several hangs. I forgot to run a full fsck, and now the filesystem is full and I can’t extend it, either because of a hardware failure or because the ext4 filesystem is corrupted. fsck gets stuck every time. The node has been suspended. I’m trying to copy every readable file to a new, empty volume.

My questions are:
How much time do I have before disqualification?
Is there a way to rebuild (download) my node?

Best regards

You have around a month before you get disqualified for being offline. If you fail audits while online, it can happen quicker. You can copy your identity files off the server, create a new node, and then use the identity. However, if you lost data you will fail audits and eventually get disqualified. In that case, if the data is not recoverable, it’s better to start over.

Hello,
The node is currently running, and people are still downloading and uploading even though the disk is almost 100% full and I can’t extend it because of the fsck issue (it hangs). Of course I get a lot of errors, including during the copy:
"pieces error: filestore error: unable to open "config/storage/blobs/v4weeab67sbgvnbwd5z7tweqsqqun7qox2agpbxy44mqqaaaaaaa/na/z5tgwdabmtl7i5k4rlfr7ayamkvom4qiaiw7s2epkeqhe7oxda.sj1": open config/storage/blobs/v4weeab67sbgvnbwd5z7tweqsqqun7qox2agpbxy44mqqaaaaaaa/na/z5tgwdabmtl7i5k4rlfr7ayamkvom4qiaiw7s2epkeqhe7oxda.sj1: input/output error

Two questions (and one remark):

  1. Is it better to stay online with lots of errors, while people are still able to download and upload, or to stay offline during the copy?
  2. The copy will never be complete: a lot of files are lost (input/output error).
  3. Is there a procedure to ask the satellite to send the data again? (*)

(*) What do you mean when you say to start over by copying the identity? Is that what I’m asking in point 3?

current state attached

Thanks

If you can’t copy the data to a different drive, then moving your identity is of no use because it is tied to the data and will get disqualified if you move it to a location without the current data.

If you have drive errors, you will get suspended and disqualified. I would suggest you take the node offline until you are able to repair the drive or move the data to a different drive. Otherwise the errors will likely result in your being disqualified.

OK,
I will copy as much data as possible. rsync will skip the files it can’t read!
As suggested, I will keep the node offline during the copy; it will take 2-3 days (4.5 TB, full).
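
For readers who want to see the "copy what is readable, skip the rest" idea spelled out, here is a minimal Python sketch. It is not rsync and not a Storj tool; the function name `copy_readable` is made up for illustration, and it assumes that an unreadable file surfaces as an `OSError` (for example the input/output error shown earlier in the thread) when it is opened or read.

```python
import os
import shutil
from pathlib import Path


def copy_readable(src_root, dst_root):
    """Copy every readable file under src_root to dst_root.

    Returns (copied, failed): lists of paths relative to src_root.
    Hypothetical helper for illustration only.
    """
    src_root, dst_root = Path(src_root), Path(dst_root)
    copied, failed = [], []
    # By default os.walk silently skips directories it cannot list.
    for dirpath, _dirnames, filenames in os.walk(src_root):
        for name in filenames:
            src = Path(dirpath) / name
            rel = src.relative_to(src_root)
            dst = dst_root / rel
            try:
                dst.parent.mkdir(parents=True, exist_ok=True)
                shutil.copy2(src, dst)  # copies data and metadata
                copied.append(rel)
            except OSError:
                # Assumption: read failures such as EIO land here.
                # Drop any partial copy so it is not mistaken for a good file.
                dst.unlink(missing_ok=True)  # needs Python 3.8+
                failed.append(rel)
    return copied, failed
```

Keeping the `failed` list is the useful part: it shows how many files were lost instead of leaving that to guesswork.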

How many days do I have before getting disqualified?

Thanks

If it is skipping files it can’t read, you will just get suspended once the node is back online and the missing files cannot be audited.

So guys,
what should I do?
I have one node with 4.5 TB and a corrupted filesystem with I/O errors, and no way to fix it with fsck.
The only thing I can do is copy somewhere around 95-99% of the data.
I thought there might be a way (automatic or manual) to get the files again from the satellite or from other nodes.
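
A rough way to check an estimate like "95-99%" is to compare the file listings of the old and new volumes. The Python sketch below does that by relative path. The names `list_files` and `recovery_report` are hypothetical, it counts files rather than bytes, and on a damaged filesystem the source listing may itself be incomplete, so treat the result as an approximation.

```python
import os
from pathlib import Path


def list_files(root):
    """Return the set of file paths under root, relative to root."""
    root = Path(root)
    return {
        (Path(dirpath) / name).relative_to(root)
        for dirpath, _dirs, names in os.walk(root)
        for name in names
    }


def recovery_report(src_root, dst_root):
    """Compare two trees by relative path.

    Returns (recovered_fraction, missing), where missing is the sorted
    list of files present in the source but absent from the destination.
    """
    src_files = list_files(src_root)
    dst_files = list_files(dst_root)
    missing = sorted(src_files - dst_files)
    if not src_files:
        return 1.0, missing  # nothing to recover
    return len(src_files & dst_files) / len(src_files), missing
```

The fraction only says how much of the file set made it across; it says nothing about whether the node will pass audits.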

Thanks

I would shut it down, and investigate why you have drive errors. Is the drive failing? Is there another check tool you can use? If the drive data is bad, there is nothing you can do to get the data replaced. You could then leave it running until it is disqualified. I would investigate the drive errors and see if you can figure out if it is a hardware failure or just some kind of free space issue, broken chains, etc.

There is no way to recover missing pieces to your node from the network. They will eventually be recovered on other nodes, but not on yours.