Docker node offline after power failure

Hi there,
There was a power failure last night and the power only came back this morning. I have 2 nodes, each on a separate internet address; one went back online with no problem, but the other one cannot get back online. I checked port 28967 on the router and nothing has changed: it is still forwarded to the Ubuntu/Docker server. ufw is disabled and so is SELinux. I checked whether the server is listening on port 28967, and it is, using

netstat -tulnp | grep 28967

I also checked the router (pfSense) and it is forwarding port 28967, but it seems the Ubuntu/Docker server is not responding to or receiving the forwarded packets. I am not sure what to do now. Could the Docker container have been corrupted by the power failure?
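For completeness, a couple more checks I can run if helpful (a rough sketch; the container name is whatever was used in docker run):

docker ps --format '{{.Names}} {{.Ports}}'    # is the container up and publishing 28967?
sudo iptables -t nat -nL DOCKER | grep 28967  # does Docker's NAT table still have the DNAT rule?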
Here are the first 100 lines of the log. Hoping anybody can help; I want to salvage the 3.5 TB of Storj data:
2023-11-25T02:14:04Z INFO Current binary version {“Process”: “storagenode-updater”, “Service”: “storagenode”, “Version”: “v1.91.2”}
2023-11-25T02:14:04Z INFO Version is up to date {“Process”: “storagenode-updater”, “Service”: “storagenode”}
2023-11-25T02:14:04Z INFO Current binary version {“Process”: “storagenode-updater”, “Service”: “storagenode-updater”, “Version”: “v1.91.2”}
2023-11-25T02:14:04Z INFO Version is up to date {“Process”: “storagenode-updater”, “Service”: “storagenode-updater”}
2023-11-25T02:14:05Z INFO server kernel support for server-side tcp fast open remains disabled. {“process”: “storagenode”}
2023-11-25T02:14:05Z INFO server enable with: sysctl -w net.ipv4.tcp_fastopen=3 {“process”: “storagenode”}
2023-11-25 02:14:05,501 INFO success: processes-exit-eventlistener entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2023-11-25 02:14:05,502 INFO success: storagenode entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2023-11-25 02:14:05,502 INFO success: storagenode-updater entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2023-11-25T02:14:05Z INFO Telemetry enabled {“process”: “storagenode”, “instance ID”: “12EjWjJABAPLojBe8Z3BqiU9hRytjk5iothvQQkH4cWMJdM83Mg”}
2023-11-25T02:14:05Z INFO Event collection enabled {“process”: “storagenode”, “instance ID”: “12EjWjJABAPLojBe8Z3BqiU9hRytjk5iothvQQkH4cWMJdM83Mg”}
2023-11-25T02:14:05Z INFO db.migration Database Version {“process”: “storagenode”, “version”: 54}
2023-11-25T02:14:07Z INFO preflight:localtime start checking local system clock with trusted satellites’ system clock. {“process”: “storagenode”}
2023-11-25T02:14:07Z INFO preflight:localtime local system clock is in sync with trusted satellites’ system clock. {“process”: “storagenode”}
2023-11-25T02:14:07Z INFO trust Scheduling next refresh {“process”: “storagenode”, “after”: “8h15m48.178379494s”}
2023-11-25T02:14:07Z INFO bandwidth Performing bandwidth usage rollups {“process”: “storagenode”}
2023-11-25T02:14:07Z INFO Node 12EjWjJABAPLojBe8Z3BqiU9hRytjk5iothvQQkH4cWMJdM83Mg started {“process”: “storagenode”}
2023-11-25T02:14:07Z INFO pieces:trash emptying trash started {“process”: “storagenode”, “Satellite ID”: “12L9ZFwhzVpuEKMUNUqkaTLGzwY9G24tbiigLiXpmZWKwmcNDDs”}
2023-11-25T02:14:07Z INFO Public server started on [::]:28967 {“process”: “storagenode”}
2023-11-25T02:14:07Z INFO Private server started on 127.0.0.1:7778 {“process”: “storagenode”}
2023-11-25T02:14:07Z INFO failed to sufficiently increase send buffer size (was: 208 kiB, wanted: 2048 kiB, got: 416 kiB). See UDP Buffer Sizes · quic-go/quic-go Wiki · GitHub for details. {“process”: “storagenode”}
2023-11-25T02:14:07Z WARN piecestore:monitor Disk space is less than requested. Allocated space is {“process”: “storagenode”, “bytes”: 3690264898432}
2023-11-25T02:14:07Z INFO lazyfilewalker.used-space-filewalker starting subprocess {“process”: “storagenode”, “satelliteID”: “12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S”}
2023-11-25T02:14:07Z INFO lazyfilewalker.used-space-filewalker subprocess started {“process”: “storagenode”, “satelliteID”: “12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S”}
2023-11-25T02:14:07Z INFO lazyfilewalker.used-space-filewalker.subprocess Database started {“process”: “storagenode”, “satelliteID”: “12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S”, “process”: “storagenode”}
2023-11-25T02:14:07Z INFO lazyfilewalker.used-space-filewalker.subprocess used-space-filewalker started {“process”: “storagenode”, “satelliteID”: “12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S”, “process”: “storagenode”}
2023-11-25T02:14:08Z INFO collector deleted expired piece {“process”: “storagenode”, “Satellite ID”: “12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S”, “Piece ID”: “SGCP6IFOJD3VEQSWC7N5FQOGONMJL6EWJ4KVNI24ZC4C4PXH2ETA”}
2023-11-25T02:14:08Z INFO collector collect {“process”: “storagenode”, “count”: 1}
2023-11-25T02:15:18Z INFO reputation:service node scores updated {“process”: “storagenode”, “Satellite ID”: “1wFTAgs9DP5RSnCqKV1eLf6N9wtk4EAtmN5DpSxcs8EjT69tGE”, “Total Audits”: 330633, “Successful Audits”: 328561, “Audit Score”: 1, “Online Score”: 0.9637788778877888, “Suspension Score”: 1, “Audit Score Delta”: 0, “Online Score Delta”: 0, “Suspension Score Delta”: 0}
2023-11-25T02:15:18Z INFO reputation:service node scores updated {“process”: “storagenode”, “Satellite ID”: “121RTSDpyNZVcEU84Ticf2L1ntiuUimbWgfATz21tuvgk3vzoA6”, “Total Audits”: 410866, “Successful Audits”: 409675, “Audit Score”: 1, “Online Score”: 0.9647834274952919, “Suspension Score”: 1, “Audit Score Delta”: 0, “Online Score Delta”: 0, “Suspension Score Delta”: 0}
2023-11-25T02:15:18Z INFO reputation:service node scores updated {“process”: “storagenode”, “Satellite ID”: “12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S”, “Total Audits”: 1439762, “Successful Audits”: 1436849, “Audit Score”: 1, “Online Score”: 0.9643069606806078, “Suspension Score”: 1, “Audit Score Delta”: 0, “Online Score Delta”: 0, “Suspension Score Delta”: 0}
2023-11-25T02:15:19Z INFO reputation:service node scores updated {“process”: “storagenode”, “Satellite ID”: “12L9ZFwhzVpuEKMUNUqkaTLGzwY9G24tbiigLiXpmZWKwmcNDDs”, “Total Audits”: 326028, “Successful Audits”: 324568, “Audit Score”: 1, “Online Score”: 0.9644727172842854, “Suspension Score”: 1, “Audit Score Delta”: 0, “Online Score Delta”: 0, “Suspension Score Delta”: 0}
2023-11-25T02:15:43Z INFO pieces:trash emptying trash started {“process”: “storagenode”, “Satellite ID”: “121RTSDpyNZVcEU84Ticf2L1ntiuUimbWgfATz21tuvgk3vzoA6”}
2023-11-25T02:16:05Z INFO pieces:trash emptying trash started {“process”: “storagenode”, “Satellite ID”: “1wFTAgs9DP5RSnCqKV1eLf6N9wtk4EAtmN5DpSxcs8EjT69tGE”}
2023-11-25T02:16:10Z INFO pieces:trash emptying trash started {“process”: “storagenode”, “Satellite ID”: “12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S”}
2023-11-25T02:16:16Z ERROR contact:service ping satellite failed {“process”: “storagenode”, “Satellite ID”: “12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S”, “attempts”: 1, “error”: “ping satellite: failed to ping storage node, your node indicated error code: 0, rpc: tcp connector failed: rpc: dial tcp 154.20.146.89:28967: connect: connection timed out”, “errorVerbose”: “ping satellite: failed to ping storage node, your node indicated error code: 0, rpc: tcp connector failed: rpc: dial tcp 154.20.146.89:28967: connect: connection timed out\n\tstorj.io/storj/storagenode/contact.(*Service).pingSatelliteOnce:209\n\tstorj.io/storj/storagenode/contact.(*Service).pingSatellite:157\n\tstorj.io/storj/storagenode/contact.(*Chore).updateCycles.func1:87\n\tstorj.io/common/sync2.(*Cycle).Run:99\n\tstorj.io/common/sync2.(*Cycle).Start.func1:77\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:75”}
2023-11-25T02:16:18Z ERROR contact:service ping satellite failed {“process”: “storagenode”, “Satellite ID”: “121RTSDpyNZVcEU84Ticf2L1ntiuUimbWgfATz21tuvgk3vzoA6”, “attempts”: 1, “error”: “ping satellite: failed to ping storage node, your node indicated error code: 0, rpc: tcp connector failed: rpc: dial tcp 154.20.146.89:28967: connect: connection timed out”, “errorVerbose”: “ping satellite: failed to ping storage node, your node indicated error code: 0, rpc: tcp connector failed: rpc: dial tcp 154.20.146.89:28967: connect: connection timed out\n\tstorj.io/storj/storagenode/contact.(*Service).pingSatelliteOnce:209\n\tstorj.io/storj/storagenode/contact.(*Service).pingSatellite:157\n\tstorj.io/storj/storagenode/contact.(*Chore).updateCycles.func1:87\n\tstorj.io/common/sync2.(*Cycle).Run:99\n\tstorj.io/common/sync2.(*Cycle).Start.func1:77\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:75”}
2023-11-25T02:16:19Z ERROR contact:service ping satellite failed {“process”: “storagenode”, “Satellite ID”: “12L9ZFwhzVpuEKMUNUqkaTLGzwY9G24tbiigLiXpmZWKwmcNDDs”, “attempts”: 1, “error”: “ping satellite: failed to ping storage node, your node indicated error code: 0, rpc: tcp connector failed: rpc: dial tcp 154.20.146.89:28967: connect: connection timed out”, “errorVerbose”: “ping satellite: failed to ping storage node, your node indicated error code: 0, rpc: tcp connector failed: rpc: dial tcp 154.20.146.89:28967: connect: connection timed out\n\tstorj.io/storj/storagenode/contact.(*Service).pingSatelliteOnce:209\n\tstorj.io/storj/storagenode/contact.(*Service).pingSatellite:157\n\tstorj.io/storj/storagenode/contact.(*Chore).updateCycles.func1:87\n\tstorj.io/common/sync2.(*Cycle).Run:99\n\tstorj.io/common/sync2.(*Cycle).Start.func1:77\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:75”}
2023-11-25T02:16:19Z ERROR contact:service ping satellite failed {“process”: “storagenode”, “Satellite ID”: “1wFTAgs9DP5RSnCqKV1eLf6N9wtk4EAtmN5DpSxcs8EjT69tGE”, “attempts”: 1, “error”: “ping satellite: failed to ping storage node, your node indicated error code: 0, rpc: tcp connector failed: rpc: dial tcp 154.20.146.89:28967: connect: connection timed out”, “errorVerbose”: “ping satellite: failed to ping storage node, your node indicated error code: 0, rpc: tcp connector failed: rpc: dial tcp 154.20.146.89:28967: connect: connection timed out\n\tstorj.io/storj/storagenode/contact.(*Service).pingSatelliteOnce:209\n\tstorj.io/storj/storagenode/contact.(*Service).pingSatellite:157\n\tstorj.io/storj/storagenode/contact.(*Chore).updateCycles.func1:87\n\tstorj.io/common/sync2.(*Cycle).Run:99\n\tstorj.io/common/sync2.(*Cycle).Start.func1:77\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:75”}

2023-11-25 02:18:01,578 WARN received SIGTERM indicating exit request
2023-11-25 02:18:01,603 INFO waiting for storagenode, processes-exit-eventlistener, storagenode-updater to die
2023-11-25T02:18:01Z INFO Got a signal from the OS: “terminated” {“Process”: “storagenode-updater”}
2023-11-25 02:18:01,674 INFO stopped: storagenode-updater (exit status 0)
2023-11-25T02:18:01Z INFO Got a signal from the OS: “terminated” {“process”: “storagenode”}
2023-11-25T02:18:01Z ERROR contact:service ping satellite failed {“process”: “storagenode”, “Satellite ID”: “1wFTAgs9DP5RSnCqKV1eLf6N9wtk4EAtmN5DpSxcs8EjT69tGE”, “attempts”: 2, “error”: “ping satellite: context canceled”, “errorVerbose”: “ping satellite: context canceled\n\tstorj.io/storj/storagenode/contact.(*Service).pingSatelliteOnce:203\n\tstorj.io/storj/storagenode/contact.(*Service).pingSatellite:157\n\tstorj.io/storj/storagenode/contact.(*Chore).updateCycles.func1:87\n\tstorj.io/common/sync2.(*Cycle).Run:99\n\tstorj.io/common/sync2.(*Cycle).Start.func1:77\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:75”}
2023-11-25T02:18:01Z INFO contact:service context cancelled {“process”: “storagenode”, “Satellite ID”: “1wFTAgs9DP5RSnCqKV1eLf6N9wtk4EAtmN5DpSxcs8EjT69tGE”}
2023-11-25T02:18:01Z ERROR contact:service ping satellite failed {“process”: “storagenode”, “Satellite ID”: “12L9ZFwhzVpuEKMUNUqkaTLGzwY9G24tbiigLiXpmZWKwmcNDDs”, “attempts”: 2, “error”: “ping satellite: context canceled”, “errorVerbose”: “ping satellite: context canceled\n\tstorj.io/storj/storagenode/contact.(*Service).pingSatelliteOnce:203\n\tstorj.io/storj/storagenode/contact.(*Service).pingSatellite:157\n\tstorj.io/storj/storagenode/contact.(*Chore).updateCycles.func1:87\n\tstorj.io/common/sync2.(*Cycle).Run:99\n\tstorj.io/common/sync2.(*Cycle).Start.func1:77\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:75”}
2023-11-25T02:18:01Z INFO contact:service context cancelled {“process”: “storagenode”, “Satellite ID”: “12L9ZFwhzVpuEKMUNUqkaTLGzwY9G24tbiigLiXpmZWKwmcNDDs”}
2023-11-25T02:18:01Z ERROR contact:service ping satellite failed {“process”: “storagenode”, “Satellite ID”: “121RTSDpyNZVcEU84Ticf2L1ntiuUimbWgfATz21tuvgk3vzoA6”, “attempts”: 2, “error”: “ping satellite: context canceled”, “errorVerbose”: “ping satellite: context canceled\n\tstorj.io/storj/storagenode/contact.(*Service).pingSatelliteOnce:203\n\tstorj.io/storj/storagenode/contact.(*Service).pingSatellite:157\n\tstorj.io/storj/storagenode/contact.(*Chore).updateCycles.func1:87\n\tstorj.io/common/sync2.(*Cycle).Run:99\n\tstorj.io/common/sync2.(*Cycle).Start.func1:77\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:75”}
2023-11-25T02:18:01Z INFO contact:service context cancelled {“process”: “storagenode”, “Satellite ID”: “121RTSDpyNZVcEU84Ticf2L1ntiuUimbWgfATz21tuvgk3vzoA6”}
2023-11-25T02:18:01Z ERROR contact:service ping satellite failed {“process”: “storagenode”, “Satellite ID”: “12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S”, "atte

Try this checklist

I did this; the WAN IP is the same. However, port 28967 is closed. Checking the router, the port is forwarded correctly; the router's log even says there was an attempt to forward a packet to the Storj Ubuntu/Docker server, but it closed the connection since the server was not responding. As I mentioned, the server is listening on 28967, but for some reason it is not receiving any connection attempts. I also checked other ports that are purposely open and ran the test through that site, and it shows those other ports are indeed open. Only port 28967 is closed.
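One more thing I can try, to split the problem in half: test the port from another machine on the LAN, bypassing pfSense entirely (a sketch; substitute the server's LAN IP):

nc -vz <server-lan-ip> 28967   # succeeds only if the node answers on the LAN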

Did you check the local IP of your server and compare it with the IP in the port forwarding rule?
Also, does the IP from the Open Port Check Tool - Test Port Forwarding on Your Router match this IP?

Yes, everything matches; that's why I am very confused. In fact, I have other applications running on this server, and I tested their ports, and all are being forwarded. Only Storj, in the Docker container, has an issue.

This has been running for 2 years now, and the data in it is already maxed out at 3.5 TB for Storj. Would reinstalling Docker have a significant effect on the Storj application?

No, it wouldn't. You may try to reinstall the Docker subsystem if you suspect it's corrupted in your OS.
However, network issues are usually related to the network itself, so you need to check all points between the internet and the Docker container.
Please show your port forwarding rules for this node, any firewall rules related to this node (perhaps on your router), and the result of

hostname -I

Also, does your router have any DDoS protection or throttling (greylisting, etc.)?
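For the host firewall part, for example (a rough sketch):

sudo ufw status verbose            # confirm ufw really is inactive
sudo iptables -L INPUT -n | head   # any leftover rules that could block 28967?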

Is it possible that your container is restarting due to some fatal error?
Search for FATAL in your logs:

docker logs storagenode 2>&1 | grep FATAL | tail
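You can also ask Docker whether it keeps restarting the container (a sketch; adjust the container name if yours differs):

docker inspect -f '{{.RestartCount}} {{.State.Status}} {{.State.StartedAt}}' storagenode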

What does curl ifconfig.me/ip say, and does it match your external IP?
What does ip route get 8.8.8.8 | head -n 1 say, and does it match the local IP in your router's forwarding rule?
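For example, side by side (a sketch):

echo "WAN: $(curl -s ifconfig.me/ip)"
ip route get 8.8.8.8 | head -n 1

The src address in the route output should be the same LAN IP your forwarding rule points to.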

It definitely looks like a network configuration problem.

I am attaching here screenshots of all the info that you asked for.

Please see attached screenshots.


As you can see in the screenshots, the Storj server has the LAN IP address 192.168.200.50, and pfSense is forwarding port 28967 to 192.168.200.50.
The external IP address is the same as what is on the router, and I can tell there is no DDoS attack on my home network. FYI, I have 2 nodes that went down at the same time when the power went out; both are on different external IPs, both have been running for 2 years, and both are maxed out at 3.5 TB of data. The only difference between the two is that one is behind pfSense and the other behind OpenWrt. The one using OpenWrt had no issue when the power came back: it went on to repair and has been back online ever since. However, the one connected to pfSense did not come back online. Nothing has changed on the pfSense side, and nothing has changed on the server side either.

See the attached screenshot. I don’t see any mistake on the networking side.

Yeah, I see. Quite strange.

But I see docker start storagenode2. So is there also a storagenode1 on the same device? If so, on what port is that one listening?

And I also see the machine is called multivpn. So maybe you are connected to a VPN, and the VPN forwarding rules aren't right?

Could you possibly post your docker script?

The easiest one: did you already restart the whole operating system at least once? Preferably cold?

That's just a name, so that I won't get confused, since I have another node on another device and another external IP. No VPN. As I mentioned, it has been running for 2 years now, and the power outage caused it to stop. I restarted it and voila, it doesn't go online.
docker run -d --restart unless-stopped --stop-timeout 300
-p 28967:28967/tcp -p 28967:28967/udp
-p 14002:14002
-e WALLET="bbbcbcbcbcbcbbcbccbcbcbbcbcbbcbc"
-e EMAIL="hdhdhdhdhdhdd@gmail.com"
-e ADDRESS="xxxxxxxxx.xxxxxxxxx.ca:28967"
-e STORAGE="3900GB"
--mount type=bind,source="/root/.local/share/storj/identity/storagenode",destination=/app/identity
--mount type=bind,source="/mnt/storagenode2/",destination=/app/config
--name storagenode2 storjlabs/storagenode:latest

Indeed, but is it also listening on port 28967? Because above, something is shown listening on 0.0.0.0:28967.

Yup, I see. But did you also restart the device as a whole after that? Sometimes just that makes problems magically disappear. And in the two years before, did you also restart now and then, or just let it run continuously?

The script you posted, aside from the changed mail and wallet, can't be your actual script, because it won't execute as shown: the newlines aren't escaped. Besides, the most interesting part has been masked: the external IP/address.

Of course, they are listening same port number, but different network and different external IP.
In between those 2 years, I have restarted and no issues at all. The script is good, since if there are errors there, it will be caught right away and the service won’t start. This part 0.0.0.0:28967, it is 0.0.0.0, because the server is set to dhcp, however on the router the mac address of the server is statically bind to 192.168.200.50.

Please check your hostname; perhaps it's not updated:

nslookup xxxxxxxxx.xxxxxxxxx.ca 8.8.8.8

It should return the same public IP as your WAN IP and the IP from yougetsignal.
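You can compare it directly with the current WAN IP (a sketch):

nslookup xxxxxxxxx.xxxxxxxxx.ca 8.8.8.8   # what the world resolves your hostname to
curl -s ifconfig.me/ip                    # what your WAN IP actually is

If the two differ, the DDNS record is stale and the satellites are dialing the wrong address.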

Do you want me to do the nslookup on the server itself?

Yes, please try from there, but it should produce the same result.

Strange, and I am not sure why: if you look at the result, when I use a specific DNS server (8.8.8.8) to do the nslookup, there is no result; however, if I don't specify a DNS server, it gives the result. It seems that when I use a specific DNS server, it is either not receiving any answer or not sending the query. I am more confused now.


So, what happens if you change the hostname to just your IP and restart? Does it work then? Because that might indeed point to a DNS issue, although I can't square that with the fact that in an earlier post it was stated that pfSense saw traffic, but it wasn't being accepted by the node.
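To narrow that down, you could also test from the server whether 8.8.8.8 is reachable at all (a rough sketch):

ping -c 3 8.8.8.8              # basic reachability to Google DNS
dig @8.8.8.8 storj.io +short   # a direct query against 8.8.8.8
dig storj.io +short            # the same query via the default resolver

If only the direct query fails, something between the server and the internet is intercepting port 53; pfSense, for example, can be configured to redirect or block outbound DNS.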

Another thing to try may be to change the -p parameters to just --network host, to exclude bridging issues.
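That is, something like this (a sketch based on the command posted above; with --network host the -p mappings are dropped, since the container shares the host's network stack):

docker run -d --restart unless-stopped --stop-timeout 300 \
  --network host \
  -e WALLET="bbbcbcbcbcbcbbcbccbcbcbbcbcbbcbc" \
  -e EMAIL="hdhdhdhdhdhdd@gmail.com" \
  -e ADDRESS="xxxxxxxxx.xxxxxxxxx.ca:28967" \
  -e STORAGE="3900GB" \
  --mount type=bind,source="/root/.local/share/storj/identity/storagenode",destination=/app/identity \
  --mount type=bind,source="/mnt/storagenode2/",destination=/app/config \
  --name storagenode2 storjlabs/storagenode:latest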
