Node goes offline randomly, and greetings

Giak · September 21, 2020, 7:28pm

Good evening,
I’m Giak and I’m from Italy. If there is a dedicated topic for greetings, I’d be glad to post there.

My problem is that my node randomly goes offline, so I need to restart it to get it working again.
It happens about every 100 hours of node uptime (I run it on a Raspberry Pi 4).
I’m using a VPN because my ISP uses NAT, so I wouldn’t be reachable without it.
I’m also using DDNS because I don’t have a static IP.
Port forwarding works correctly; is there anything else I could check?

Thanks

nerdatwork · September 21, 2020, 7:39pm

Welcome to the forum @Giak!

How is your HDD connected?

Giak · September 21, 2020, 7:53pm

Hello!
It’s connected via USB 3.0; it’s a 2 TB Toshiba mechanical hard drive (I have 1.5 TB dedicated to Storj).

nerdatwork · September 21, 2020, 8:34pm

Can you tell me its model name/number?

This is in regards to

Giak · September 22, 2020, 12:59pm

Sorry, it’s a WD Elements, model: WDBU6Y0020BBK-WESN

nerdatwork · September 22, 2020, 1:41pm

Western Digital has said that their WD Red to WD Black HDDs use SMR, so I think WD Elements is not one of them.

What does your log show when it goes offline?

https://documentation.storj.io/resources/faq/check-logs

Giak · September 22, 2020, 1:59pm

Next time it goes offline (I hope not :D) I’ll let you know.

nerdatwork · September 22, 2020, 2:17pm

This is why it’s advisable to redirect your logs.
https://documentation.storj.io/resources/faq/redirect-logs

buchette · September 22, 2020, 4:37pm

Is your USB HDD mounted statically via fstab?

Giak · September 22, 2020, 5:42pm

@buchette yes, I confirm.
@nerdatwork I tried to edit the line like this:

log.output: "/home/pi/node.log"

but when I save it, the dashboard GUI stops working and the Docker container keeps restarting.

My log line is probably wrong, but I can’t find the error (I commented it out with # again and now everything works).

nerdatwork · September 22, 2020, 6:10pm

That’s wrong. Do exactly as shown in the link; the line should look like this:

log.output: "/app/config/node.log"

Then restart the node. It will create node.log.
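
Once node.log exists, a quick summary of which components are logging errors can make it easier to spot what happened before the node went offline. The following is only a minimal sketch: it assumes the usual Docker setup in which /app/config inside the container is bound to the storage directory on the host (so node.log appears there), and it assumes each log line starts with a timestamp, a level and a component name separated by whitespace, as in the log lines quoted in this thread. The function name summarize_errors is made up for this example.

```python
from collections import Counter


def summarize_errors(lines):
    """Count ERROR entries per component.

    Assumes each line looks like: <timestamp> <LEVEL> <component> <message...>
    (fields separated by tabs or spaces). Lines that do not fit are skipped.
    """
    counts = Counter()
    for line in lines:
        parts = line.split(None, 3)  # split on any whitespace, at most 4 fields
        if len(parts) >= 3 and parts[1] == "ERROR":
            counts[parts[2]] += 1
    return counts


def summarize_file(path):
    """Read a log file and return the per-component ERROR counts."""
    with open(path, encoding="utf-8", errors="replace") as handle:
        return summarize_errors(handle)
```

For example, calling summarize_file on the node.log inside your storage directory and printing the result with .most_common() lists the noisiest components first.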

Giak · September 22, 2020, 9:43pm

Oh, I thought I could set any path I wished.
Anyway, I have now enabled the log and everything works correctly.
As soon as I have more information I’ll let you know, thanks!

Giak · September 26, 2020, 7:48am

Good morning, I just updated my node to v1.12.13; it worked well for a few minutes and now I see these errors:

2020-09-26T07:38:04.218Z	ERROR	orders	listing orders	{"error": "order: unexpected EOF", "errorVerbose": "order: unexpected EOF\n\tstorj.io/storj/storagenode/orders.readOrder:555\n\tstorj.io/storj/storagenode/orders.(FileStore).ListUnsentBySatellite.func1:245\n\tpath/filepath.walk:360\n\tpath/filepath.walk:384\n\tpath/filepath.Walk:406\n\tstorj.io/storj/storagenode/orders.(FileStore).ListUnsentBySatellite:196\n\tstorj.io/storj/storagenode/orders.(Service).sendOrdersFromFileStore:398\n\tstorj.io/storj/storagenode/orders.(Service).SendOrders:192\n\tstorj.io/storj/storagenode/orders.(Service).Run.func1:139\n\tstorj.io/common/sync2.(Cycle).Run:152\n\tstorj.io/common/sync2.(Cycle).Start.func1:71\n\tgolang.org/x/sync/errgroup.(Group).Go.func1:57"}
2020-09-26T07:38:13.426Z	ERROR	contact:service	ping satellite failed	{"Satellite ID": "118UWpMCHzs6CvSgWd9BfFVjw5K9pZbJjkfZJexMtSkmKxvvAW", "attempts": 8, "error": "ping satellite error: rpccompat: dial tcp 78.94.240.189:7777: connect: connection refused", "errorVerbose": "ping satellite error: rpccompat: dial tcp 78.94.240.189:7777: connect: connection refused\n\tstorj.io/common/rpc.Dialer.dialTransport:211\n\tstorj.io/common/rpc.Dialer.dial:188\n\tstorj.io/common/rpc.Dialer.DialNodeURL:148\n\tstorj.io/storj/storagenode/contact.(Service).pingSatelliteOnce:124\n\tstorj.io/storj/storagenode/contact.(Service).pingSatellite:95\n\tstorj.io/storj/storagenode/contact.(Chore).updateCycles.func1:87\n\tstorj.io/common/sync2.(Cycle).Run:152\n\tstorj.io/common/sync2.(Cycle).Start.func1:71\n\tgolang.org/x/sync/errgroup.(Group).Go.func1:57"}

I haven’t had contact for 30 minutes, but the node is still online.

nerdatwork · September 26, 2020, 8:19am

Ignore the error for 118UWpMCHzs6CvSgWd9BfFVjw5K9pZbJjkfZJexMtSkmKxvvAW, as this satellite has been shut down.

Go to your orders/unsent folder and remove the 5 oldest files, which should have nearly the same timestamp. These 5 files represent orders for 5 satellites. You can move them to another folder instead if you want. Then restart your node. Watch the log to make sure you see orders being sent, which will confirm that the issue is fixed.
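
If you prefer to script the step of setting aside the oldest files, something along these lines could work. It is a rough sketch, not an official tool: it uses each file's modification time as a stand-in for "oldest", the function names and the backup folder are invented for this example, and it defaults to a dry run that only reports what it would move. Stop the node before moving anything.

```python
import shutil
from pathlib import Path


def oldest_files(folder, count=5):
    """Return the `count` oldest regular files in `folder`, by modification time."""
    files = [p for p in Path(folder).iterdir() if p.is_file()]
    files.sort(key=lambda p: p.stat().st_mtime)
    return files[:count]


def move_oldest(unsent_dir, backup_dir, count=5, dry_run=True):
    """Move the oldest files out of `unsent_dir` into `backup_dir`.

    With dry_run=True (the default) nothing is moved; the selected
    file names are only returned so they can be reviewed first.
    """
    selected = oldest_files(unsent_dir, count)
    if not dry_run:
        backup = Path(backup_dir)
        backup.mkdir(parents=True, exist_ok=True)
        for path in selected:
            shutil.move(str(path), str(backup / path.name))
    return [path.name for path in selected]
```

Run it once with dry_run=True, check that the listed files really are the five with nearly the same timestamp, then run it again with dry_run=False and restart the node.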

Here is a reference thread:

Giak · September 26, 2020, 8:43am

Great, everything is fixed, thanks!

Giak · October 2, 2020, 8:10am

Hello,
My node went offline (I have now reconnected to the VPN and restarted the node, and it’s online again).
The last message I see in the log is:

2020-09-27T12:20:49.800Z INFO contact:service retries timed out for this cycle

and some pings to satellites that fail.

baker · October 2, 2020, 12:39pm

This is probably related to the stefan-benten satellite, which has been shut down; you can confirm this by checking for the satellite ID 118UWpMCHzs6CvSgWd9BfFVjw5K9pZbJjkfZJexMtSkmKxvvAW. You can ignore errors related to this satellite, so your problem must be something else. I would double-check that your DDNS updater is working properly. If you use No-IP, I believe the account must be verified manually once per month.
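
One way to double-check the DDNS updater is to compare what the hostname currently resolves to with the public IP you expect (for a node behind a VPN, the VPN server's address). Below is a small sketch using only Python's standard library; the function names, the hostname mynode.example.net and the address 203.0.113.7 are placeholders, not values from this thread. How you obtain the expected IP is left open here.

```python
import socket


def resolve_ipv4(hostname):
    """Resolve a hostname to a single IPv4 address via the system resolver."""
    return socket.gethostbyname(hostname)


def ddns_is_current(hostname, expected_ip, resolver=resolve_ipv4):
    """Check whether `hostname` resolves to `expected_ip`.

    Returns (ok, message). `resolver` can be swapped out, which keeps
    the check testable without touching the network.
    """
    try:
        resolved = resolver(hostname)
    except OSError as exc:  # socket.gaierror is a subclass of OSError
        return False, f"could not resolve {hostname}: {exc}"
    if resolved == expected_ip:
        return True, f"{hostname} -> {resolved} (matches)"
    return False, f"{hostname} -> {resolved}, expected {expected_ip}"
```

Calling ddns_is_current("mynode.example.net", "203.0.113.7") from a scheduled job and logging the message would show when the record drifted, which can then be lined up with the times the node went offline.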

nerdatwork · October 2, 2020, 12:53pm

In addition to ^ you can check this thread too

Giak · October 2, 2020, 2:36pm

@baker yes, it’s that satellite; in fact, I’m ignoring those errors.
@nerdatwork I took a look; my problem is that my IP is behind NAT, so without DDNS I can’t use my node.
Next time I’ll try running a new node directly with the VPN server’s IP (even though it is dynamic); as long as my connection stays up it doesn’t change, so I’ll give it a try.

Giak · October 5, 2020, 5:05pm

I found the problem: the DUC wasn’t working properly, so whenever my VPN server changed its IP address, the connection stopped working.
I ran some tests and the DDNS now always points to the right IP.
You can close this thread.