Node disqualified after config pointed to wrong storj directory - what steps next?

After adding a new HDD, Windows decided to give it the drive letter that was formerly used by my Storj pool drive. I didn't notice at first, but then uptimerobot.com alerted me. I checked on the node and found it had been DQed after some hours of downtime (faster on some satellites, slower on others).

I have since corrected the path in the config.yaml. Will the satellites just pick up my node again? I don't think my storagenode data is corrupted.
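
For reference, the setting I corrected in config.yaml is the storage path; if I remember the key correctly, it looks roughly like this (the drive letter shown is just a placeholder, not my actual path):

```
# directory to store data in
storage.path: E:\storagenode\
```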

The disqualification happened because of data loss, not the downtime. Disqualification is permanent and not reversible.
If your node is disqualified on all satellites, you will have to start from scratch with a new identity, a new authorization token, and clean storage.
If it isn't disqualified on all satellites yet, you can decide to keep running this node and receive payments from the remaining satellites for their usage.


I also lost a node this way. Back then there was no suspension mode; now it exists. It would be nice if it worked in situations like this.

I agree, there have been too many SNOs who fell into this trap with the best intentions. I understand that it's a challenge to allow nodes to recover from missing files without giving cheaters a way out, but I think something could be done that satisfies both criteria. Just off the top of my head: suspend the node. When the node restarts, send it one of the audits it previously failed. If it succeeds, send the node all the audits it failed recently. If all of them succeed, remove the suspension.


There is a way to avoid the problem of changing drive letters: you can mount partitions to folders on the system volume (usually C:) instead of assigning them drive letters. The option is in Disk Management, in the same place where you change drive letters.
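
If you prefer the command line over the Disk Management GUI, PowerShell can do the same thing; a rough sketch (the disk/partition numbers and the folder are placeholders you'd have to adjust):

```
# create an empty folder to use as the mount point
New-Item -ItemType Directory -Path "C:\mounts\storj"
# attach the data partition to that folder instead of a drive letter
Add-PartitionAccessPath -DiskNumber 1 -PartitionNumber 2 -AccessPath "C:\mounts\storj"
```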

However, the SN installer will normally refuse to work with this, because it isn't aware of the mount point and only checks the free space on the root file system, which is often too small. A small change to the installer would be required here.

I have not tried what happens if I put such a path into config.yaml, and I'm not going to if it risks my node being DQed when it can't access the required files.

There might be an easier way. When the node starts, it checks in with the satellite. Without any files, the node doesn't know whether it is brand new or quite old. So let the satellite answer the first check-in with some information about the node (e.g. the used space recorded on the satellite, or 20 random piece IDs).
The storagenode would then realize that it has none of those pieces (and no used space for Storj) and stay offline, because the most likely cause is a wrong mount point, while the satellite could send the SNO an email and put the node into suspension mode.

This way the SNO can correct the problem while the node is in suspension mode, and we're not even touching the area of recovering from missing files or merging newly received files.


I want to add something:

The HDD that I put into the PC already had storagenode data on it (old data for a different identity). Maybe this wouldn't have happened with an empty HDD?

It wouldn't have started if the path didn't exist. But if you had pointed it to the root of that drive, it wouldn't have mattered. If the node finds an empty storage location, it just assumes it's new and starts with no data.
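
If you want to guard against that, a small check before starting the service helps. A rough sketch for a Windows setup (the path and service name are just examples, adjust them to your install):

```
# refuse to start if the storage location looks empty (wrong drive letter / wrong mount)
$storage = "E:\storagenode\storage"
if (-not (Test-Path "$storage\blobs") -or -not (Get-ChildItem "$storage\blobs")) {
    Write-Error "Storage path looks empty - not starting the node, check your drives"
    exit 1
}
Start-Service storagenode
```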


This problem is not only about Windows drive letters and node start; it's broader. For example, I lost a node on Linux when I changed the config to a new path, started the node, double-checked the logs, and then unmounted the old path.
I had forgotten daemon-reload, so the new config was never applied, but the node started anyway because the old path was still there.
Detecting a lost storage path is simple: just suspend the node after several audits fail in a row, or before the score drops to 0.6.
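
For the record, the sequence that would have saved me looks roughly like this (the unit name is just an example):

```
sudo systemctl stop storagenode
# edit the unit file / config with the new storage path here
sudo systemctl daemon-reload       # the step I forgot
sudo systemctl start storagenode
journalctl -u storagenode -f       # confirm the new path is actually in use
# only unmount the old path after the logs confirm the new one
```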

I started over with a new identity, but the dashboard still says I'm DQed on all sats (showing the date and time when I was FIRST DQed).

Both the identity and storage paths are correct, the node shows as "online", and the port is open.

/EDIT: nvm, I managed to forget to empty my storage folder, so it seems I got myself DQed again right away :smiley:

I have now stopped the node, formatted my storage drive, and restarted the service, and it seems fine. Is it possible that this didn't lead to disqualification? Or would it be safer to just get a new identity now and start over?

DQ can currently only happen if you fail audits. So if you have more pieces than you should have, this can’t lead to DQ. Maybe the trash mechanism will even eventually remove all the pieces that don’t belong to your new node (if you remove the DBs?).

But since you already formatted your drive, you probably removed the few new pieces that node had received, so it will very likely get DQed quickly anyway. So starting with a new identity seems best.
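
From memory, the fresh start looks roughly like this (request a new authorization token first, and double-check the current docs for the exact syntax):

```
identity create storagenode
identity authorize storagenode <email>:<auth-token>
# then point the node at an empty storage directory before the first start
```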

This is my assumption as well. It has only been running for about 1 minute tho :wink:

kk, new identity now. Thx!