Node fails to start due to race condition / no internet

Whenever my Windows machine is restarted (which happens often due to Windows updates), much of the time my node fails to start automatically. Looking in the logs, the last entry I see in these situations is: “Invalid configuration: invalid contact.external-address: lookup “[my-ddns-hostname]” failed: lookup [my-ddns-hostname]: no such host”

It seems to me that there is a race condition between the storage node starting and the computer's network connection coming up. If the node starts before the computer is connected to the network, it simply fails to start and does not try again. If I then manually start the storage node service once the computer is online, it starts without issues. This probably affects nodes running on Linux as well, but I didn’t test that.

The storage node should be fixed so that it still starts even if the computer is offline… the node would simply stay in an offline state until internet connectivity is restored.

Please start cmd.exe with Administrator rights and run the command from this post:

Then restart the storagenode service, either from the Services applet or from an elevated PowerShell:

Restart-Service storagenode

Thanks for the link to the other post. This will help mitigate the problem, but it's not a permanent solution. There will be times when the DNS service has started, yet the node is still offline.

If you have set up this dependency and still have problems, let us know. Thank you!

This is still a problem after making that dependency. My computer restarted while my internet was intermittently down, and the storagenode never started.

What is the reason given in the logs and the system events?

I thought I had replied to your question, but I guess not. Anyway, I just ran into this again and the errors are below. I censored my dynamic dns hostname.

2020-06-08T04:11:59.970-0700    ERROR   Invalid configuration.  {"error": "invalid contact.external-address: lookup \"XXXXXXXXX.dynamic-dns.net\" failed: lookup XXXXXXXXX.dynamic-dns.net: no such host", "errorVerbose": "invalid contact.external-address: lookup \"XXXXXXXXX.dynamic-dns.net\" failed: lookup XXXXXXXXX.dynamic-dns.net: no such host\n\tstorj.io/storj/storagenode.(*Config).Verify:150\n\tmain.cmdRun:142\n\tstorj.io/private/process.cleanup.func1.4:359\n\tstorj.io/private/process.cleanup.func1:377\n\tgithub.com/spf13/cobra.(*Command).execute:840\n\tgithub.com/spf13/cobra.(*Command).ExecuteC:945\n\tgithub.com/spf13/cobra.(*Command).Execute:885\n\tstorj.io/private/process.ExecWithCustomConfig:88\n\tstorj.io/private/process.Exec:65\n\tmain.(*service).Execute.func1:66\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:57"}

2020-06-08T04:12:00.014-0700    FATAL   Unrecoverable error     {"error": "invalid contact.external-address: lookup \"XXXXXXXXX.dynamic-dns.net\" failed: lookup XXXXXXXXX.dynamic-dns.net: no such host", "errorVerbose": "invalid contact.external-address: lookup \"XXXXXXXXX.dynamic-dns.net\" failed: lookup XXXXXXXXX.dynamic-dns.net: no such host\n\tstorj.io/storj/storagenode.(*Config).Verify:150\n\tmain.cmdRun:142\n\tstorj.io/private/process.cleanup.func1.4:359\n\tstorj.io/private/process.cleanup.func1:377\n\tgithub.com/spf13/cobra.(*Command).execute:840\n\tgithub.com/spf13/cobra.(*Command).ExecuteC:945\n\tgithub.com/spf13/cobra.(*Command).Execute:885\n\tstorj.io/private/process.ExecWithCustomConfig:88\n\tstorj.io/private/process.Exec:65\n\tmain.(*service).Execute.func1:66\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:57"}

When this happens the Storj node service stops and does not try to start again. On most computer restarts I don’t run into this issue, but every once in a while the Storj service (and the DNS resolver service that it depends on) starts before the computer gets an IP address from my router. Even though this is a wired Ethernet connection, the issue still happens sometimes.

There are other times when internet could be down for people when their computer starts, such as after a power outage. I think Storj should remain running and trying to resolve DNS hostnames instead of giving up and exiting. I am sure this has already caused some node operators to be unnecessarily disqualified.

In your specific case I would recommend configuring automatic restart for the storagenode service via the Services applet.

A post was split to a new topic: Node fails to start on Linux