Help: Storage Node Offline!

My storage node has been offline for almost a week! No notification was sent. I can confirm the node has internet access and port forwarding, and the logs simply show this:

2022-06-07T13:04:58.319327394Z 2022-06-07T13:04:58.318Z INFO piecestore downloaded {“Process”: “storagenode”, “Piece ID”: “MIRUQ4BV3A4BEQNOSBXG446BVXMCD4SFM6VHUVGV2ZSXBAGQDGXQ”, “Satellite ID”: “1wFTAgs9DP5RSnCqKV1eLf6N9wtk4EAtmN5DpSxcs8EjT69tGE”, “Action”: “GET_REPAIR”}
2022-06-07T13:06:01.257681052Z 2022-06-07T13:06:01.257Z INFO piecestore downloaded {“Process”: “storagenode”, “Piece ID”: “ZBHH46DETOCWE7UKIR4IX6BUUBX37NQCQE2TNKIU5HZZ22OEYIBQ”, “Satellite ID”: “1wFTAgs9DP5RSnCqKV1eLf6N9wtk4EAtmN5DpSxcs8EjT69tGE”, “Action”: “GET_REPAIR”}
2022-06-07T13:06:36.121855827Z 2022-06-07T13:06:36.121Z INFO Downloading versions. {“Process”: “storagenode-updater”, “Server Address”: “https://version.storj.io”}
2022-06-07T13:06:37.895980152Z 2022-06-07T13:06:37.894Z INFO bandwidth Performing bandwidth usage rollups {“Process”: “storagenode”}
2022-06-07T13:06:44.532094479Z 2022-06-07T13:06:44.531Z INFO orders.12tRQrMTWUWwzwGh18i7Fqs67kmdhH9t6aToeiwbo5mfS2rUmo sending {“Process”: “storagenode”, “count”: 6}
2022-06-07T13:06:44.532143359Z 2022-06-07T13:06:44.531Z INFO orders.12L9ZFwhzVpuEKMUNUqkaTLGzwY9G24tbiigLiXpmZWKwmcNDDs sending {“Process”: “storagenode”, “count”: 121}
2022-06-07T13:06:44.532147669Z 2022-06-07T13:06:44.531Z INFO orders.12rfG3sh9NCWiX3ivPjq2HtdLmbqCrvHVEzJubnzFzosMuawymB sending {“Process”: “storagenode”, “count”: 210}
2022-06-07T13:06:44.532150199Z 2022-06-07T13:06:44.531Z INFO orders.1wFTAgs9DP5RSnCqKV1eLf6N9wtk4EAtmN5DpSxcs8EjT69tGE sending {“Process”: “storagenode”, “count”: 609}
2022-06-07T13:06:44.532152468Z 2022-06-07T13:06:44.531Z INFO orders.121RTSDpyNZVcEU84Ticf2L1ntiuUimbWgfATz21tuvgk3vzoA6 sending {“Process”: “storagenode”, “count”: 97}
2022-06-07T13:06:44.532154619Z 2022-06-07T13:06:44.531Z INFO orders.12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S sending {“Process”: “storagenode”, “count”: 426}
2022-06-07T13:06:56.150069216Z 2022-06-07T13:06:56.137Z ERROR Error retrieving version info. {“Process”: “storagenode-updater”, “error”: “version checker client: Get "https://version.storj.io": dial tcp: lookup version.storj.io on x.x.x.x:53: read udp x.x.x.x:45613->x.x.x.x:53: i/o timeout”, “errorVerbose”: “version checker client: Get "https://version.storj.io": dial tcp: lookup version.storj.io on x.x.x.x:53: read udp x.x.x.x:45613->x.x.x.x: i/o timeout\n\tstorj.io/storj/private/version/checker.(*Client).All:68\n\tmain.loopFunc:21\n\tstorj.io/common/sync2.(*Cycle).Run:152\n\tmain.cmdRun:136\n\tstorj.io/private/process.cleanup.func1.4:372\n\tstorj.io/private/process.cleanup.func1:390\n\tgithub.com/spf13/cobra.(*Command).execute:852\n\tgithub.com/spf13/cobra.(*Command).ExecuteC:960\n\tgithub.com/spf13/cobra.(*Command).Execute:897\n\tstorj.io/private/process.ExecWithCustomConfigAndLogger:93\n\tmain.main:20\n\truntime.main:255”}
2022-06-07T13:06:57.896634119Z 2022-06-07T13:06:57.895Z ERROR contact:service ping satellite failed {“Process”: “storagenode”, “Satellite ID”: “12tRQrMTWUWwzwGh18i7Fqs67kmdhH9t6aToeiwbo5mfS2rUmo”, “attempts”: 1, “error”: “ping satellite: rpc: tcp connector failed: rpc: dial tcp: i/o timeout”, “errorVerbose”: “ping satellite: rpc: tcp connector failed: rpc: dial tcp: i/o timeout\n\tstorj.io/common/rpc.HybridConnector.DialContext.func1:189”}
2022-06-07T13:06:57.896871721Z 2022-06-07T13:06:57.895Z ERROR contact:service ping satellite failed {“Process”: “storagenode”, “Satellite ID”: “12L9ZFwhzVpuEKMUNUqkaTLGzwY9G24tbiigLiXpmZWKwmcNDDs”, “attempts”: 1, “error”: “ping satellite: rpc: tcp connector failed: rpc: dial tcp: i/o timeout”, “errorVerbose”: “ping satellite: rpc: tcp connector failed: rpc: dial tcp: i/o timeout\n\tstorj.io/common/rpc.HybridConnector.DialContext.func1:189”}
2022-06-07T13:06:57.896934360Z 2022-06-07T13:06:57.895Z ERROR contact:service ping satellite failed {“Process”: “storagenode”, “Satellite ID”: “12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S”, “attempts”: 1, “error”: “ping satellite: rpc: tcp connector failed: rpc: dial tcp: i/o timeout”, “errorVerbose”: “ping satellite: rpc: tcp connector failed: rpc: dial tcp: i/o timeout\n\tstorj.io/common/rpc.HybridConnector.DialContext.func1:189”}
2022-06-07T13:06:57.896955216Z 2022-06-07T13:06:57.896Z ERROR contact:service ping satellite failed {“Process”: “storagenode”, “Satellite ID”:

Hi @Monthrect
The logs seem to suggest a local DNS or traffic routing issue. Please check you can resolve DNS queries directly on the node. In previous cases of the same error a full restart (i.e. off then on) has fixed the problem.
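To make the "check you can resolve DNS queries directly on the node" step concrete, here is a minimal sketch in Python. It only asks the system resolver for an address, which is the same kind of lookup that timed out in the log. The helper name `can_resolve` is made up for illustration and is not part of any Storj tooling.

```python
import socket


def can_resolve(hostname: str, port: int = 443) -> bool:
    """Return True if the system resolver returns at least one address.

    Hypothetical helper for illustration only; it uses whatever DNS
    servers the operating system is configured with.
    """
    try:
        return len(socket.getaddrinfo(hostname, port)) > 0
    except socket.gaierror:
        # Raised when the name cannot be resolved (no answer, timeout, etc.).
        return False
```

Running `can_resolve("version.storj.io")` on the node itself (or inside the container, if the node runs in one) should return `True` when DNS is healthy. If it returns `False` while other machines on the same network resolve the name fine, the node's resolver settings are the likely culprit.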

Well, my topic was hidden/auto-moderated. What a shambles.

Figured it out; it was exactly as you say: DNS. I reset the DNS servers and all appears to be working.

I was mainly peeved at the fact I received no notification that the node was down! No email, nothing!

There is no such feature yet.
Others here suggest using Grafana or a simple external monitor such as uptimerobot.com. The latter is very easy to set up, but it only checks whether your endpoint is reachable (which, in your case, wouldn’t have detected anything at all).
Hope such a feature will come soon!
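Until something like that exists, one stopgap is to watch the node's own log for the errors shown in the first post. The sketch below is only an illustration: the function names and the threshold of 3 are assumptions, and how you feed it log lines and deliver the alert (email, chat, etc.) is left to you.

```python
from typing import Iterable


def count_contact_failures(lines: Iterable[str]) -> int:
    """Count ERROR lines that report a failed satellite ping."""
    return sum(
        1
        for line in lines
        if "ERROR" in line and "ping satellite failed" in line
    )


def should_alert(lines: Iterable[str], threshold: int = 3) -> bool:
    """Return True once the number of failed pings reaches the threshold.

    The default threshold is an arbitrary, illustrative value.
    """
    return count_contact_failures(lines) >= threshold
```

Unlike an external reachability check, this looks at what the node itself reports, so it would have caught an outage like this one where the port was still reachable but the node could not contact the satellites.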

By the way, could you tag the answer from @Stob as the solution?

Thanks!

How did you figure out it was a DNS issue?

No, because he didn’t solve it, I did! :rofl:

Because I can read logs :weary:

Anyway, this is the solution… It may help other people get to the solution quickly.

It was the log entry which referenced a DNS name and port 53:

dial tcp: lookup version.storj.io on x.x.x.x:53: read udp x.x.x.x:45613->x.x.x.x:53: i/o timeout
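For anyone who wants to spot this pattern without reading every line, here is a rough Python sketch that picks out "lookup <name> on <server>:53 ... i/o timeout" entries like the updater error in the first post. The regular expression and the function name `dns_failure_host` are my own assumptions based on that single log line, not an official log format.

```python
import re
from typing import Optional

# Matches e.g. "lookup version.storj.io on x.x.x.x:53: read udp ...: i/o timeout".
# Port 53 is the standard DNS port, so a timeout there points at the resolver.
DNS_FAILURE = re.compile(
    r"lookup (?P<host>\S+) on (?P<server>\S+?):53: .*i/o timeout"
)


def dns_failure_host(line: str) -> Optional[str]:
    """Return the hostname that failed to resolve, or None if the line
    does not look like a DNS lookup timeout."""
    match = DNS_FAILURE.search(line)
    return match.group("host") if match else None
```

Note that the "ping satellite failed" lines do not match, because they only say `dial tcp: i/o timeout` without a `lookup ... :53` part. It is the updater's version check that gives away the DNS problem.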
