Stop the deployment of update 1.97.2 immediately

Version v1.97.3 is now available and should be able to capture the startup failure.

If the failure happens before the service is fully operational, it will try to log to the Windows Event Viewer. If that also fails, it writes the error to stdout, which is only visible when you start the service binary from the command line as administrator with the same arguments as the service.
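
For example (a sketch, assuming the default install path and the service being registered as storagenode – adjust to your setup), you can check which arguments the service was registered with and then run the binary manually from an elevated PowerShell:

# Show the registered command line (BINARY_PATH_NAME) of the service
sc.exe qc storagenode   # 'storagenode' is the assumed default service name

# Run the same binary with the same arguments, sending the log to the console
& "$env:ProgramFiles\Storj\Storage Node\storagenode.exe" run --config-dir="$env:ProgramFiles\Storj\Storage Node\\" --log.output=stderr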

Hopefully this will help us figure out what’s causing the issue.

@AiS1972 have you had a chance to test v1.97.3, or does the issue not happen with that version any more?

Most of the nodes running on Windows encounter the same problem.
A node that was automatically updated to version 1.96.6 failed to start. When it was rolled back to 1.95.1, everything was normal.
Some of the nodes upgraded to 1.96.6 were running normally, but they stopped working when updated to 1.97.3 again. After manually rolling back to the previous version, they run normally.
More than ten of my nodes have the same problem, and I have deleted the updater program.
Because the program stops immediately after being started manually, no new error information is written to the log file.

Could the problem be accidentally doubled config entries in the yaml, causing nodes not to start?
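
A rough way to check for that (a sketch, assuming the default install path – adjust as needed) is to list keys that appear more than once in config.yaml:

# List uncommented keys that occur more than once in config.yaml
# (path assumes the default install location)
Get-Content "$env:ProgramFiles\Storj\Storage Node\config.yaml" |
    Where-Object { $_ -match '^\s*[A-Za-z0-9.\-]+:' } |
    ForEach-Object { ($_ -split ':')[0].Trim() } |
    Group-Object |
    Where-Object { $_.Count -gt 1 } |
    Select-Object Name, Count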

Yes, the reason is that zksync payment collection is enabled. If the operator.wallet-features parameter in the yaml file is disabled, the node can run normally.

You can edit the existing line using the search function.
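
For example (assuming the default install path), this shows the line and its line number:

# Find the wallet-features line(s) in config.yaml
Select-String -Path "$env:ProgramFiles\Storj\Storage Node\config.yaml" -Pattern 'wallet-features'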

Can I just change the zkSync wallet to an ETH chain wallet?

Yes, just comment out the option for wallet features, save the config and restart the node.
The default is still Ethereum (L1).
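
A minimal sketch of that change (assuming the default service name storagenode): put a # in front of the existing operator.wallet-features line in config.yaml, whatever its current value is, save the file, and then restart the service from an elevated PowerShell:

# After commenting out operator.wallet-features in config.yaml, restart the service
Restart-Service storagenode   # assumes the default service name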

I’m having the same issue with a Windows node. Unable to start the service.
I rolled back to 1.95.1 and disabled the auto-updater. Hopefully someone will post here when this is properly vetted and fixed, and I’ll then turn the updater back on.

Might not be related to the original issue, but I saw one node with start issues after an automatic update, due to two lines in the config for console.address.

Obviously this was a mistake on my part, but it shouldn’t cause a node to fail to start – perhaps just use the first occurrence?

> Obviously this was a mistake on my part, but it shouldn’t cause a node to fail to start – perhaps just use the first occurrence?

I do understand how it can be annoying to fix the config when the additional restriction came unannounced.

The issue is that for config flags there isn’t an obvious answer as to when the node should fail to start and when it shouldn’t.

For example, let’s say you have an additional operator.wallet: 0x00... causing your payments to be sent to the wrong location. Sure, it’s the operator’s fault for having an incorrect config, but should storagenode fail to start? Or what if the user moves their storagenode directory and adds a new config param, but forgets to remove the old one – causing audits to fail, because the satellite now thinks the storagenode has lost all the data?

There are roughly two options for the configuration:

A. Allow incorrect configuration: display a warning or error in the log and still start the storagenode (assuming the config is good enough). The config errors persist and are unlikely to be fixed; however, it’s less annoying to the operator. Further config changes can introduce new errors.

B. Disallow incorrect configuration: the operator gets an annoying failure and needs to fix the config for things to start. Any further misconfiguration will be clearer when the config is changed.

If the B approach had been intentional, we would have made the change gradually – e.g. notify operators, then have the first release only log an error, then a few releases later fail to start. Unfortunately, it was accidental. If I had a time machine, I would have enforced that restriction from the storagenode v0.0.1 release.

To me it seems that going back to a more permissive config would allow more mistakes down the road, although I don’t feel too strongly about it. Re-doing the change as a “gradual config correctness enforcement” now seems somewhat unnecessary – the config needs to be fixed regardless.

Either way, I’m sorry that this change happened accidentally without any pre-announcement or notices.

Hello @Bishop-T,
Welcome back!

I would recommend fixing the duplicate keys in your config.yaml and enabling the updater again.
To see where the mistake is, you may run the node from PowerShell:

& "$env:ProgramFiles\Storj\Storage Node\storagenode.exe" run --config-dir="$env:ProgramFiles\Storj\Storage Node\\" --log.output=stderr