Rebuild a node from scratch to replace a disqualified node

Hello, my node has been disqualified and I am not sure why. My monitoring was down and I do not know if the node has been unreachable and for how long.

From the email I have received, I have to start from scratch, so I created a new identity and authorized it. I built the latest version (I build from source to run it on 32-bit hardware).
I cannot find any documentation or forum article on what to do next. If I try to start the node, it crashes continuously, but I could not capture a clear reason from the logs.

The log is indeed full of errors, which I understand have to do with walking the storage or trying to clean up.

In short: is there any documentation on how to start from scratch? If I do so replacing the existing deployment, do I have to do anything else in addition to getting and authorising the new identity?

Can anyone help me troubleshoot the issue?

Thanks in advance.

Hello @ste,
Welcome back!

You can follow this guide:

The disqualification usually happens in two cases:

  1. The node lost or corrupted data (this also includes permission issues).
  2. The node was offline for more than 30 days.

I guess you followed the guide linked above and ran into some issues.
To help troubleshoot, we usually need log excerpts containing FATAL or Unrecoverable errors.
In case you also forgot how to check the logs:
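The linked instructions are not reproduced here. As a rough sketch of the filtering step only, assuming the node log has already been saved to a plain text file, something like the following pulls out lines mentioning FATAL or Unrecoverable. The function name `find_critical_lines` and the file-based workflow are illustrative assumptions, not part of the storagenode tooling.

```python
import sys
from typing import Iterable, List


def find_critical_lines(lines: Iterable[str]) -> List[str]:
    """Return log lines that mention FATAL or an unrecoverable error.

    "FATAL" is matched case-sensitively (it is a log level);
    "unrecoverable" is matched case-insensitively.
    """
    hits = []
    for line in lines:
        if "FATAL" in line or "unrecoverable" in line.lower():
            hits.append(line.rstrip("\n"))
    return hits


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: python find_critical.py node.log
    with open(sys.argv[1], encoding="utf-8", errors="replace") as fh:
        for hit in find_critical_lines(fh):
            print(hit)
```

Posting only the matching lines (plus a few lines of context around them) is usually easier to read than a full log dump.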

First, remove the old installation and containers (if on Docker), and remove all directories and files related to the old node. You can’t reuse anything.
Then proceed with all the steps for a new node: obtain a new token, generate a new identity, run setup, run the node.

Ah OK, thanks. Do I also have to physically remove all the data in the storage?

Yes. That is what starting from scratch means: literally a completely new node.

OK, I wiped it and started from scratch; luckily I had a playbook for it.
I am currently facing this error when starting the node:

ERROR	contact:service	ping satellite failed 	{"Process": "storagenode", "Satellite ID": "12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S", "attempts": 3, "error": "ping satellite: check-in network: IP address not allowed: [::]:28967", "errorVerbose": "ping satellite: check-in network: IP address not allowed: [::]:28967\n\tstorj.io/storj/storagenode/contact.(*Service).pingSatelliteOnce:210\n\tstorj.io/storj/storagenode/contact.(*Service).pingSatellite:158\n\tstorj.io/storj/storagenode/contact.(*Chore).updateCycles.func1:89\n\tstorj.io/common/sync2.(*Cycle).Run:102\n\tstorj.io/common/sync2.(*Cycle).Start.func1:77\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:78"}

NAT hasn’t changed and works correctly.

I can SSH into the box, so networking should be OK. I can also wget from the box to the outside. However, the console (http://host:14002) is not reachable.

What can it be?

Read the error message you posted; the answer is there:

Check your configuration file or launch arguments for the value of contact.external-address.
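For illustration, here is a minimal local sanity check for the kind of value that ends up in contact.external-address. The error above shows `[::]:28967`, which is the unspecified address, not something a satellite can dial back. The helper names below are hypothetical, and the exact rules the satellite applies are an assumption; this only flags address literals that are clearly not publicly routable.

```python
import ipaddress
from typing import Tuple


def split_host_port(address: str) -> Tuple[str, int]:
    """Split 'host:port' or '[ipv6]:port' into (host, port)."""
    host, sep, port = address.rpartition(":")
    if not sep or not port.isdigit():
        raise ValueError(f"expected host:port, got {address!r}")
    return host.strip("[]"), int(port)


def looks_publicly_reachable(address: str) -> bool:
    """Heuristic only: reject empty hosts and non-global IP literals."""
    host, _port = split_host_port(address)
    if not host:
        return False
    try:
        ip = ipaddress.ip_address(host)
    except ValueError:
        # Not an IP literal, so treat it as a DNS name.
        # The name is NOT resolved here; that is left to the operator.
        return True
    # is_global is False for unspecified, loopback, and private ranges.
    return ip.is_global
```

With this sketch, `looks_publicly_reachable("[::]:28967")` is rejected, while a DNS name with a port is accepted without being resolved.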

Post your start command.

I would also rename the whole topic; it’s misleading. You can’t rebuild a disqualified node. Disqualification is permanent.

That was the issue indeed! Thanks.

However, the console shows the allocated space as 64GB, while the disk is 2TB and allocated-disk-space is set as:

storage.allocated-disk-space: 1800 GB

How are you running the node? In the container? Directly?

Also:

Never mind… it’s still an issue with the configuration… :expressionless:

Ok, so you are not passing any parameters, and relying on the default location of the config file.

Then yes, your specification shall take effect. However, the allocated space will be capped by the actual amount of space on the mountpoint. Many here use this behavior to dynamically adjust available space: we set 500TB in the config file and adjust the quota on the dataset.

Since you see 65GB but expect more, check where storage.path is pointing and make sure the volume is mounted.

It looks a lot like you have a 65GB system (VM, jail, or container) with external storage that failed to mount, so the node is seeing the local volume.
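A quick way to check this from the node host is sketched below. It assumes decimal units (1 GB = 10^9 bytes) and that the cap is the total size of the volume holding storage.path; both are assumptions for illustration, and the helper names are hypothetical and not part of storagenode.

```python
import os
import shutil

# Assumption: decimal units, as in "1800 GB" from the config above.
UNITS = {"B": 1, "KB": 10**3, "MB": 10**6, "GB": 10**9, "TB": 10**12}


def parse_size(text: str) -> int:
    """Parse a size such as '1800 GB' into bytes."""
    number, unit = text.split()
    return int(float(number) * UNITS[unit.upper()])


def effective_allocation(configured: int, volume_total: int) -> int:
    """The configured allocation cannot exceed the size of the volume."""
    return min(configured, volume_total)


def describe_storage(path: str, configured: str) -> str:
    """Summarise what the node would see at the given storage path."""
    usage = shutil.disk_usage(path)      # total/used/free for the volume
    mounted = os.path.ismount(path)      # False if path is a plain directory
    capped = effective_allocation(parse_size(configured), usage.total)
    return (
        f"{path}: mountpoint={mounted}, "
        f"volume={usage.total / 1e9:.0f} GB, "
        f"effective allocation={capped / 1e9:.0f} GB"
    )
```

Calling `describe_storage` on the directory from storage.path with "1800 GB" should report a volume close to 2 TB; a figure near 65 GB, or `mountpoint=False` on a path that is supposed to be the mount itself, points at a failed or wrong mount.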

Thanks, this was very helpful; as you said, the mountpoint was incorrect.
