Additional HDD nodes are shown offline in the dashboard

amaneusan · August 28, 2020, 1:37pm

Hello, Operator.

I set up a second node on another HDD in a computer that already runs one node, but the dashboard shows the new node as offline.

I have confirmed that port 28968 is accessible from the outside.

docker run -d --restart unless-stopped --stop-timeout 300 \
    -p 28968:28967 \
    -p 100.100.0.0:14003:14002 \
    -e WALLET="0x000000000000000000000" \
    -e EMAIL="example@example.com" \
    -e ADDRESS="example.com: 28968" \
    -e STORAGE="1.5TB" \
    -v "/storj/identity/storagenode2":/app/identity \
    -v "/storj/file_storage":/app/config \
    --name storagenode2 storjlabs/storagenode:latest

amaneusan · August 28, 2020, 1:41pm

Here are the last 20 lines of the log:

2020-08-28T12:27:35.228Z INFO Configuration loaded {"Location": "/app/config/config.yaml"}
2020-08-28T12:27:35.315Z INFO Operator email {"Address": "example@example.com"}
2020-08-28T12:27:35.315Z INFO Operator wallet {"Address": "0x000000000000000000000"}
2020-08-28T12:27:36.539Z INFO Telemetry enabled
2020-08-28T12:27:36.546Z INFO db.migration Database Version {"version": 43}
2020-08-28T12:27:37.534Z INFO preflight:localtime start checking local system clock with trusted satellites' system clock.
2020-08-28T12:27:38.391Z INFO preflight:localtime local system clock is in sync with trusted satellites' system clock.
2020-08-28T12:27:38.391Z INFO bandwidth Performing bandwidth usage rollups
2020-08-28T12:27:38.392Z INFO Node 10000000000000000 started
2020-08-28T12:27:38.392Z INFO Public server started on 0.0.0.0:28967
2020-08-28T12:27:38.392Z INFO Private server started on 127.0.0.1:7778
2020-08-28T12:27:38.392Z INFO trust Scheduling next refresh {"after": "4h3m35.08540843s"}

baker · August 28, 2020, 2:12pm

I see a space between the : and 28968 in your run command. Did you run it with this space, or was that just from when you removed your address/ip?

amaneusan · August 28, 2020, 3:13pm

Thanks for the comment.

The space was introduced when I edited the post; it is not in the actual command.

baker · August 28, 2020, 3:18pm

Okay. You should verify that your new identity was signed properly.

https://documentation.storj.io/dependencies/identity#confirm-the-identity

amaneusan · August 28, 2020, 3:31pm

Thank you.
I confirmed that the identity was signed properly.

I checked the logs for the first time in several hours and found an error. What does this mean?

2020-08-28T12:27:38.392Z	INFO	trust	Scheduling next refresh	{"after": "4h3m35.08540843s"}
2020-08-28T13:27:38.392Z	INFO	bandwidth	Performing bandwidth usage rollups
2020-08-28T13:27:38.445Z	ERROR	contact:service	ping satellite failed 	{"Satellite ID": "121RTSDpyNZVcEU84Ticf2L1ntiuUimbWgfATz21tuvgk3vzoA6", "attempts": 1, "error": "ping satellite error: rpccompat: dial tcp: lookup asia-east-1.tardigrade.io on 100.100.0.0: no such host", "errorVerbose": "ping satellite error: rpccompat: dial tcp: lookup asia-east-1.tardigrade.io on 100.100.0.0: no such host\n\tstorj.io/common/rpc.Dialer.dialTransport:211\n\tstorj.io/common/rpc.Dialer.dial:188\n\tstorj.io/common/rpc.Dialer.DialNodeURL:148\n\tstorj.io/storj/storagenode/contact.(*Service).pingSatelliteOnce:124\n\tstorj.io/storj/storagenode/contact.(*Service).pingSatellite:95\n\tstorj.io/storj/storagenode/contact.(*Chore).updateCycles.func1:87\n\tstorj.io/common/sync2.(*Cycle).Run:152\n\tstorj.io/common/sync2.(*Cycle).Start.func1:71\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:57"}
2020-08-28T14:27:38.392Z	INFO	bandwidth	Performing bandwidth usage rollups
2020-08-28T14:27:38.446Z	ERROR	contact:service	ping satellite failed 	{"Satellite ID": "121RTSDpyNZVcEU84Ticf2L1ntiuUimbWgfATz21tuvgk3vzoA6", "attempts": 1, "error": "ping satellite error: rpccompat: dial tcp: lookup asia-east-1.tardigrade.io on 100.100.0.0: no such host", "errorVerbose": "ping satellite error: rpccompat: dial tcp: lookup asia-east-1.tardigrade.io on 100.100.0.0: no such host\n\tstorj.io/common/rpc.Dialer.dialTransport:211\n\tstorj.io/common/rpc.Dialer.dial:188\n\tstorj.io/common/rpc.Dialer.DialNodeURL:148\n\tstorj.io/storj/storagenode/contact.(*Service).pingSatelliteOnce:124\n\tstorj.io/storj/storagenode/contact.(*Service).pingSatellite:95\n\tstorj.io/storj/storagenode/contact.(*Chore).updateCycles.func1:87\n\tstorj.io/common/sync2.(*Cycle).Run:152\n\tstorj.io/common/sync2.(*Cycle).Start.func1:71\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:57"}

baker · August 28, 2020, 3:37pm

This suggests that the container cannot reach the outside internet. It is trying to resolve the address of the satellite, but the DNS query fails. Have you tried restarting your system?
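
To see which names fail to resolve and which resolver was asked, the "lookup ... on ...: no such host" errors can be pulled out of the container log. This is only a minimal sketch, assuming a POSIX shell and a sed that supports -E; the function name extract_failed_lookups is made up for illustration and is not part of storagenode or Docker.

```shell
# Hypothetical helper (not part of storagenode or Docker).
# Reads log lines on stdin and prints one "HOST RESOLVER" pair for every
# "lookup HOST on RESOLVER: no such host" error, with duplicates removed.
extract_failed_lookups() {
    sed -n -E 's/.*lookup ([^ ]+) on ([^ ]+): no such host.*/\1 \2/p' | sort -u
}

# Usage (assumes the container is named storagenode2, as in the run command above):
#   docker logs storagenode2 2>&1 | extract_failed_lookups
```

If the resolver printed is not the one you expect, something like `docker run --rm busybox nslookup asia-east-1.tardigrade.io` should show whether a fresh container on the default network can resolve the name at all. The busybox image is an assumption here; any image that ships nslookup would do.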

amaneusan · August 28, 2020, 3:54pm

Rebooting the system did not solve the problem.
Another node running on the same machine is working fine, so could it be something else?

I found that the identity path was incorrect, and the check did not return the expected numbers.
I will generate the identity again.

baker · August 28, 2020, 4:11pm

That probably just means it was not signed. You don’t need to regenerate it, you just need to do the signing part of the instructions.
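
For reference, the check from the linked page counts the "BEGIN" markers in the two certificate files. A rough sketch follows, assuming (as I recall from that page) that a signed identity has 2 markers in ca.cert and 3 in identity.cert; the function name check_identity_signed is hypothetical.

```shell
# Hypothetical helper: check_identity_signed DIR
# Counts the PEM "BEGIN" markers in DIR/ca.cert and DIR/identity.cert.
# Assumption: a signed identity has 2 markers in ca.cert and 3 in identity.cert.
check_identity_signed() {
    local dir=$1 ca_count id_count
    # grep -c exits non-zero when the count is 0 or the file is missing,
    # so "|| true" keeps the function going in either case.
    ca_count=$(grep -c BEGIN "$dir/ca.cert" 2>/dev/null) || true
    id_count=$(grep -c BEGIN "$dir/identity.cert" 2>/dev/null) || true
    echo "ca.cert: ${ca_count:-missing}, identity.cert: ${id_count:-missing}"
    # Succeed only when both counts match the expected values.
    [ "$ca_count" = 2 ] && [ "$id_count" = 3 ]
}

# Usage (path taken from the run command earlier in the thread):
#   check_identity_signed /storj/identity/storagenode2
```

Pointing it at the directory that is actually mounted to /app/identity matters, since a wrong path gives wrong counts.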

nerdatwork · August 28, 2020, 8:55pm

STOP your node immediately and update your command as per the documentation:

docker run -d --restart unless-stopped --stop-timeout 300 \
    -p 28967:28967 \
    -p 127.0.0.1:14002:14002 \
    -e WALLET="0xXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX" \
    -e EMAIL="user@example.com" \
    -e ADDRESS="domain.ddns.net:28967" \
    -e STORAGE="2TB" \
    --mount type=bind,source="<identity-dir>",destination=/app/identity \
    --mount type=bind,source="<storage-dir>",destination=/app/config \
    --name storagenode storjlabs/storagenode:latest

amaneusan · August 29, 2020, 6:11am

In my environment, --mount is not recognized and not available, so I use the previous -v to mount it instead.

After replacing the identity correctly, the storagenode2 container is now repeatedly restarting.

2020-08-28T21:36:20.558Z	ERROR	preflight:localtime	unable to get satellite system time	{"Satellite ID": "121RTSDpyNZVcEU84Ticf2L1ntiuUimbWgfATz21tuvgk3vzoA6", "error": "context canceled"}
2020-08-28T21:36:20.752Z	ERROR	preflight:localtime	unable to get satellite system time	{"Satellite ID": "1wFTAgs9DP5RSnCqKV1eLf6N9wtk4EAtmN5DpSxcs8EjT69tGE", "error": "context canceled"}
2020-08-28T21:36:20.826Z	ERROR	preflight:localtime	unable to get satellite system time	{"Satellite ID": "12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S", "error": "context canceled"}
2020-08-28T21:36:21.144Z	ERROR	preflight:localtime	unable to get satellite system time	{"Satellite ID": "12L9ZFwhzVpuEKMUNUqkaTLGzwY9G24tbiigLiXpmZWKwmcNDDs", "error": "context canceled"}
2020-08-28T21:36:21.218Z	ERROR	preflight:localtime	unable to get satellite system time	{"Satellite ID": "12rfG3sh9NCWiX3ivPjq2HtdLmbqCrvHVEzJubnzFzosMuawymB", "error": "context canceled"}
2020-08-28T21:36:21.241Z	ERROR	preflight:localtime	unable to get satellite system time	{"Satellite ID": "118UWpMCHzs6CvSgWd9BfFVjw5K9pZbJjkfZJexMtSkmKxvvAW", "error": "context canceled"}
2020-08-28T21:36:21.241Z	FATAL	Failed preflight check.	{"error": "system clock is out of sync: system clock is out of sync with all trusted satellites", "errorVerbose": "system clock is out of sync: system clock is out of sync with all trusted satellites\n\tstorj.io/storj/storagenode/preflight.(*LocalTime).Check:96\n\tstorj.io/storj/storagenode.(*Peer).Run:712\n\tmain.cmdRun:204\n\tstorj.io/private/process.cleanup.func1.4:353\n\tstorj.io/private/process.cleanup.func1:371\n\tgithub.com/spf13/cobra.(*Command).execute:840\n\tgithub.com/spf13/cobra.(*Command).ExecuteC:945\n\tgithub.com/spf13/cobra.(*Command).Execute:885\n\tstorj.io/private/process.ExecWithCustomConfig:88\n\tstorj.io/private/process.ExecCustomDebug:70\n\tmain.main:330\n\truntime.main:203"}
2020-08-29T01:14:49.019Z	INFO	Configuration loaded	{"Location": "/app/config/config.yaml"}
2020-08-29T01:14:49.039Z	INFO	Operator email	{"Address": "example@example.com"}
2020-08-29T01:14:49.039Z	INFO	Operator wallet	{"Address": "0x000000000000000001"}
2020-08-29T01:14:50.762Z	INFO	Telemetry enabled
2020-08-29T01:14:50.766Z	INFO	db.migration	Database Version	{"version": 43}
2020-08-29T01:14:51.198Z	INFO	preflight:localtime	start checking local system clock with trusted satellites' system clock.
2020-08-29T01:14:51.247Z	ERROR	preflight:localtime	unable to get satellite system time	{"Satellite ID": "12L9ZFwhzVpuEKMUNUqkaTLGzwY9G24tbiigLiXpmZWKwmcNDDs", "error": "rpccompat: dial tcp: lookup europe-west-1.tardigrade.io on 192.168.0.1:53: no such host", "errorVerbose": "rpccompat: dial tcp: lookup europe-west-1.tardigrade.io on 192.168.0.1:53: no such host\n\tstorj.io/common/rpc.Dialer.dialTransport:211\n\tstorj.io/common/rpc.Dialer.dial:188\n\tstorj.io/common/rpc.Dialer.DialNodeURL:148\n\tstorj.io/storj/storagenode/preflight.(*LocalTime).getSatelliteTime:110\n\tstorj.io/storj/storagenode/preflight.(*LocalTime).Check.func1:67\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:57"}
2020-08-29T01:14:51.250Z	ERROR	preflight:localtime	unable to get satellite system time	{"Satellite ID": "1wFTAgs9DP5RSnCqKV1eLf6N9wtk4EAtmN5DpSxcs8EjT69tGE", "error": "rpccompat: dial tcp: lookup saltlake.tardigrade.io on 192.168.0.1:53: no such host", "errorVerbose": "rpccompat: dial tcp: lookup saltlake.tardigrade.io on 192.168.0.1:53: no such host\n\tstorj.io/common/rpc.Dialer.dialTransport:211\n\tstorj.io/common/rpc.Dialer.dial:188\n\tstorj.io/common/rpc.Dialer.DialNodeURL:148\n\tstorj.io/storj/storagenode/preflight.(*LocalTime).getSatelliteTime:110\n\tstorj.io/storj/storagenode/preflight.(*LocalTime).Check.func1:67\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:57"}
2020-08-29T01:14:51.377Z	ERROR	preflight:localtime	unable to get satellite system time	{"Satellite ID": "121RTSDpyNZVcEU84Ticf2L1ntiuUimbWgfATz21tuvgk3vzoA6", "error": "context canceled"}
2020-08-29T01:14:51.651Z	ERROR	preflight:localtime	unable to get satellite system time	{"Satellite ID": "12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S", "error": "context canceled"}
2020-08-29T01:14:52.070Z	ERROR	preflight:localtime	unable to get satellite system time	{"Satellite ID": "12rfG3sh9NCWiX3ivPjq2HtdLmbqCrvHVEzJubnzFzosMuawymB", "error": "context canceled"}
2020-08-29T01:14:52.645Z	ERROR	preflight:localtime	unable to get satellite system time	{"Satellite ID": "118UWpMCHzs6CvSgWd9BfFVjw5K9pZbJjkfZJexMtSkmKxvvAW", "error": "context canceled"}
2020-08-29T01:14:52.645Z	FATAL	Failed preflight check.	{"error": "system clock is out of sync: system clock is out of sync with all trusted satellites", "errorVerbose": "system clock is out of sync: system clock is out of sync with all trusted satellites\n\tstorj.io/storj/storagenode/preflight.(*LocalTime).Check:96\n\tstorj.io/storj/storagenode.(*Peer).Run:712\n\tmain.cmdRun:204\n\tstorj.io/private/process.cleanup.func1.4:353\n\tstorj.io/private/process.cleanup.func1:371\n\tgithub.com/spf13/cobra.(*Command).execute:840\n\tgithub.com/spf13/cobra.(*Command).ExecuteC:945\n\tgithub.com/spf13/cobra.(*Command).Execute:885\n\tstorj.io/private/process.ExecWithCustomConfig:88\n\tstorj.io/private/process.ExecCustomDebug:70\n\tmain.main:330\n\truntime.main:203"}

nerdatwork · August 29, 2020, 7:39am

Can you elaborate? Which operating system are you using? What is your Docker version?

amaneusan · August 29, 2020, 7:45am

CentOS 7
Docker version 1.13.1, API version 1.26 (minimum version 1.12)

nerdatwork · August 29, 2020, 8:00am

You are risking your data by using -v instead of --mount. Your Docker version should be at least version 2. Is this Docker Community Edition?

amaneusan · August 29, 2020, 8:43am

I’ve updated the docker version and set up using --mount, but the container still keeps restarting.

nerdatwork · August 29, 2020, 10:57am

Try this checklist

Alexey · August 29, 2020, 5:56pm

Please check your logs for the reason. I think you used curly quotes somewhere instead of straight ones (use " rather than “ and ”), or a single typographic dash (–) instead of a double hyphen (--).
It is also possible that you have hit a problem with the firewall and Docker: Topics tagged centos
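
A quick way to rule out the quote and dash problem is to scan the saved command for typographic characters before running it. The following is only a sketch, assuming the command is UTF-8 text and that grep supports -a and -E; the function name check_pasted_command is invented for this example.

```shell
# Hypothetical helper: check_pasted_command
# Reads a command on stdin and prints every line (with its line number) that
# contains a curly quote or a typographic dash.
# Returns 0 when the text is clean and 1 when such characters are found.
check_pasted_command() {
    local pattern
    # UTF-8 bytes, as octal escapes, for: left/right double quotes (U+201C, U+201D),
    # left/right single quotes (U+2018, U+2019), en dash (U+2013), em dash (U+2014).
    pattern=$(printf '\342\200\234|\342\200\235|\342\200\230|\342\200\231|\342\200\223|\342\200\224')
    # LC_ALL=C makes grep compare raw bytes; -a treats the input as text.
    if LC_ALL=C grep -a -n -E "$pattern"; then
        return 1
    fi
    return 0
}

# Usage (run.sh is a hypothetical file holding the docker run command):
#   check_pasted_command < run.sh && echo "no typographic characters found"
```

Any line it prints should be retyped by hand with plain " and -- characters.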

Dunc4n1d4h0 · August 30, 2020, 12:54pm

IMHO, if you added the second node as a second instance on the same OS, check that the new node does not use ports already taken by the first instance, and that port forwarding on the router is set up correctly for the new node.
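
The host-side ports of all -p mappings can be compared mechanically. Below is a small sketch under some simplifying assumptions: mappings look like HOST:CONTAINER or IP:HOST:CONTAINER (no IPv6 addresses), and the same port bound on two different IPs is still reported, although that does not strictly conflict. Both function names are made up for illustration.

```shell
# Hypothetical helper: host_port MAPPING
# Prints the host-side port of a docker -p mapping given as
# "HOST:CONTAINER" or "IP:HOST:CONTAINER".
host_port() {
    local mapping=$1 without_container
    without_container=${mapping%:*}            # drop the trailing ":CONTAINER"
    printf '%s\n' "${without_container##*:}"   # drop the optional leading "IP:"
}

# Hypothetical helper: find_duplicate_host_ports MAPPING...
# Prints each host port that appears in more than one mapping.
find_duplicate_host_ports() {
    local m
    for m in "$@"; do
        host_port "$m"
    done | sort | uniq -d
}

# Usage with the mappings of both nodes (the first node's values are an
# assumption based on the documented defaults):
#   find_duplicate_host_ports 28967:28967 127.0.0.1:14002:14002 \
#       28968:28967 100.100.0.0:14003:14002
```

No output means no host port is used twice. On a running system, `docker ps --format '{{.Names}}\t{{.Ports}}'` lists the mappings that are actually in effect.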

amaneusan · August 31, 2020, 7:16am

Port forwarding is set up correctly, but the log still reports that the system clock is out of sync.
Another node on the same machine is working fine, so it’s probably a Docker issue.
I’ll give it some time.

Alexey · August 31, 2020, 8:23am

I suggest taking a look at the firewall issues with Docker on CentOS: