Node not starting - i/o timeout / connection timeout

Hello,
I have two Ubuntu nodes that stopped working after an OS upgrade and reboot.

I have already tried destroying and recreating the Docker storagenode application (I use Docker Compose).

These are my logs:
2025-02-21T16:41:32Z INFO Configuration loaded {“Process”: “storagenode”, “Location”: “/app/config/config.yaml”}
2025-02-21T16:41:32Z INFO Anonymized tracing enabled {“Process”: “storagenode”}
2025-02-21T16:41:32Z INFO Operator email {“Process”: “storagenode”, “Address”: “xxxxx”}
2025-02-21T16:41:32Z INFO Operator wallet {“Process”: “storagenode”, “Address”: “xxxxx”}
2025-02-21T16:41:33Z INFO server existing kernel support for server-side tcp fast open detected {“Process”: “storagenode”}
2025-02-21T16:41:41Z INFO hashstore hashstore opened successfully {“Process”: “storagenode”, “satellite”: “121RTSDpyNZVcEU84Ticf2L1ntiuUimbWgfATz21tuvgk3vzoA6”, “open_time”: “5.562200562s”}
2025-02-21T16:41:47Z INFO hashstore hashstore opened successfully {“Process”: “storagenode”, “satellite”: “12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S”, “open_time”: “5.454139228s”}
2025-02-21T16:41:52Z INFO hashstore hashstore opened successfully {“Process”: “storagenode”, “satellite”: “12L9ZFwhzVpuEKMUNUqkaTLGzwY9G24tbiigLiXpmZWKwmcNDDs”, “open_time”: “5.547410667s”}
2025-02-21T16:41:58Z INFO hashstore hashstore opened successfully {“Process”: “storagenode”, “satellite”: “1wFTAgs9DP5RSnCqKV1eLf6N9wtk4EAtmN5DpSxcs8EjT69tGE”, “open_time”: “5.397359886s”}
2025-02-21T16:42:28Z ERROR version failed to get process version info {“Process”: “storagenode”, “error”: “version checker client: Get "https://version.storj.io": dial tcp 34.173.164.90:443: i/o timeout”, “errorVerbose”: "version checker client: Get "https://version.storj.io": dial tcp 34.173.164.90:443: i/o timeout\n\tstorj.io/storj/private/version/checker.(*Client).All:68\n\tstorj.io/storj/private/version/checker.(*Client).Process:89\n\tstorj.io/storj/private/version/checker.(*Service).checkVersion:104\n\tstorj.io/storj/private/version/checker.(*Service).CheckVersion:78\n\tmain.cmdRun:91\n\tmain.newRunCmd.func1:33\n\tstorj.io/common/process.cleanup.func1.4:392\n\tstorj.io/common/process.cleanup.func1:410\n\tgithub.com/spf13/cobra.(*Command).execute:985\n\tgithub.com/spf13/cobra.(*Command).ExecuteC:1117\n\tgithub.com/spf13/cobra.(*Command).Execute:1041\n\tstorj.io/common/process.ExecWithCustomOptions:112\n\tmain.main:34\n\truntime.main:272"}
2025-02-21T16:42:28Z INFO Telemetry enabled {“Process”: “storagenode”, “instance ID”: “1xQxZaPRchNV8qd74uy6fEqDZzjPkX15Sk17WkaUNVbZZ4TRhy”}
2025-02-21T16:42:28Z INFO Event collection enabled {“Process”: “storagenode”, “instance ID”: “1xQxZaPRchNV8qd74uy6fEqDZzjPkX15Sk17WkaUNVbZZ4TRhy”}
2025-02-21T16:42:28Z INFO db.migration Database Version {“Process”: “storagenode”, “version”: 62}
2025-02-21T16:42:59Z WARN trust Failed to fetch URLs from source; used cache {“Process”: “storagenode”, “source”: “https://static.storj.io/dcs-satellites”, “error”: “HTTP source: Get "https://static.storj.io/dcs-satellites\”: dial tcp 34.120.119.150:443: i/o timeout", “errorVerbose”: “HTTP source: Get "https://static.storj.io/dcs-satellites\”: dial tcp 34.120.119.150:443: i/o timeout\n\tstorj.io/storj/storagenode/trust.(*HTTPSource).FetchEntries:68\n\tstorj.io/storj/storagenode/trust.(*List).fetchEntries:90\n\tstorj.io/storj/storagenode/trust.(*List).FetchURLs:49\n\tstorj.io/storj/storagenode/trust.(*Pool).fetchURLs:326\n\tstorj.io/storj/storagenode/trust.(*Pool).Refresh:209\n\tstorj.io/storj/storagenode.(*Peer).Run:1079\n\tmain.cmdRun:127\n\tmain.newRunCmd.func1:33\n\tstorj.io/common/process.cleanup.func1.4:392\n\tstorj.io/common/process.cleanup.func1:410\n\tgithub.com/spf13/cobra.(*Command).execute:985\n\tgithub.com/spf13/cobra.(*Command).ExecuteC:1117\n\tgithub.com/spf13/cobra.(*Command).Execute:1041\n\tstorj.io/common/process.ExecWithCustomOptions:112\n\tmain.main:34\n\truntime.main:272"}
2025-02-21T16:42:59Z INFO preflight:localtime start checking local system clock with trusted satellites’ system clock. {“Process”: “storagenode”}
2025-02-21T16:45:11Z ERROR preflight:localtime unable to get satellite system time {“Process”: “storagenode”, “Satellite ID”: “12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S”, “error”: “rpc: tcp connector failed: rpc: dial tcp 34.150.199.48:7777: connect: connection timed out”, “errorVerbose”: “rpc: tcp connector failed: rpc: dial tcp 34.150.199.48:7777: connect: connection timed out\n\tstorj.io/common/rpc.HybridConnector.DialContext.func1:190”}

2025-02-21T16:45:11Z ERROR preflight:localtime unable to get satellite system time {“Process”: “storagenode”, “Satellite ID”: “1wFTAgs9DP5RSnCqKV1eLf6N9wtk4EAtmN5DpSxcs8EjT69tGE”, “error”: “rpc: tcp connector failed: rpc: dial tcp 34.94.153.46:7777: connect: connection timed out”, “errorVerbose”: “rpc: tcp connector failed: rpc: dial tcp 34.94.153.46:7777: connect: connection timed out\n\tstorj.io/common/rpc.HybridConnector.DialContext.func1:190”}
2025-02-21T16:45:11Z ERROR preflight:localtime unable to get satellite system time {“Process”: “storagenode”, “Satellite ID”: “121RTSDpyNZVcEU84Ticf2L1ntiuUimbWgfATz21tuvgk3vzoA6”, “error”: “rpc: tcp connector failed: rpc: dial tcp 34.126.92.94:7777: connect: connection timed out”, “errorVerbose”: “rpc: tcp connector failed: rpc: dial tcp 34.126.92.94:7777: connect: connection timed out\n\tstorj.io/common/rpc.HybridConnector.DialContext.func1:190”}
2025-02-21T16:45:11Z ERROR preflight:localtime unable to get satellite system time {“Process”: “storagenode”, “Satellite ID”: “12L9ZFwhzVpuEKMUNUqkaTLGzwY9G24tbiigLiXpmZWKwmcNDDs”, “error”: “rpc: tcp connector failed: rpc: dial tcp 34.159.134.91:7777: connect: connection timed out”, “errorVerbose”: “rpc: tcp connector failed: rpc: dial tcp 34.159.134.91:7777: connect: connection timed out\n\tstorj.io/common/rpc.HybridConnector.DialContext.func1:190”}

So there is a strange i/o timeout on “dial tcp 34.173.164.90:443”, and the connections to the satellites on port 7777 also time out.

Any ideas?

Thanks

How are your disks connected? Check your firewall settings too.

If you run

curl -I "https://version.storj.io"

do you also get a timeout there? What if you do the same inside the container?
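
If it helps, here is a minimal sketch for comparing the two vantage points in one go. It runs the same request on the host and in a throwaway container. The `curlimages/curl` image is my assumption (it is not part of the Storj setup); the image is pulled by the Docker daemon on the host, so this should work even when the storagenode container itself has no curl.

```shell
#!/bin/sh
# Compare outbound HTTPS from the host and from a throwaway container.
# Assumption: pulling the curlimages/curl image is acceptable on this machine.
URL="https://version.storj.io"

# Run a command quietly and report whether it succeeded.
# $1 = label, remaining arguments = the command to run.
probe() {
    label="$1"
    shift
    if "$@" >/dev/null 2>&1; then
        echo "$label: ok"
    else
        echo "$label: FAILED"
    fi
}

compare_connectivity() {
    # curl exits 0 on any HTTP response (even 405), non-zero on connect errors.
    probe "host" curl -sS -I --max-time 10 "$URL"
    probe "container" docker run --rm curlimages/curl -sS -I --max-time 10 "$URL"
}
```

Call `compare_connectivity` after sourcing the script. If the host reports ok and the container reports FAILED, the problem is most likely between Docker and the host (bridge, NAT or firewall rules) rather than with your internet connection.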

Hello,
it seems to work from the server

ubuntu@hpool:~$ curl -I "https://version.storj.io"
HTTP/2 405
date: Fri, 21 Feb 2025 18:20:58 GMT
strict-transport-security: max-age=15724800; includeSubDomains

From inside the Docker container (storagenode) there is no curl, and I can’t install it because the container has no connectivity.
I’m not sure what happened.

Maybe I will remove everything, clean up Docker, and reinstall it all.

Best regards

Unfortunately, that did not help.

I reinstalled the OS (Ubuntu 24.04), so there is no leftover configuration.
I copied over my docker-compose.yml (the same one I use on other nodes).

storagenode4 | 2025-02-21T19:16:43Z ERROR Error retrieving version info. {“Process”: “storagenode-updater”, “error”: “version checker client: Get "https://version.storj.io": dial tcp 34.173.164.90:443: connect: no route to host”, “errorVerbose”: “version checker client: Get "https://version.storj.io": dial tcp 34.173.164.90:443: connect: no route to host\n\tstorj.io/storj/private/version/checker.(*Client).All:68\n\tmain.loopFunc:20\n\tstorj.io/common/sync2.(*Cycle).Run:102\n\tmain.cmdRun:139\n\tstorj.io/common/process.cleanup.func1.4:392\n\tstorj.io/common/process.cleanup.func1:410\n\tgithub.com/spf13/cobra.(*Command).execute:985\n\tgithub.com/spf13/cobra.(*Command).ExecuteC:1117\n\tgithub.com/spf13/cobra.(*Command).Execute:1041\n\tstorj.io/common/process.ExecWithCustomOptions:112\n\tstorj.io/common/process.ExecWithCustomConfigAndLogger:77\n\tmain.main:22\n\truntime.main:272”}

How can I solve this error?

Best regards

Could you post a sanitized (no email, no wallet, no hostname etc) version of your docker-compose.yml? It’s like the container has no internet access at all.

That error is a bit different: no route to host. Do you have other nodes on that server with a similar docker-compose file that work?
Did you use the official instructions to install Docker (Ubuntu | Docker Docs)?
Are you using some kind of tunnelling in your setup? Check your firewalls.
And yes, please show the docker-compose file.
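
To see which kind of failure dominates, a small sketch like the one below can count the three dial errors quoted in this thread. As a rule of thumb (not a certainty), a timeout usually means packets are being dropped silently, while "no route to host" means the kernel has no route or received an explicit unreachable reply, which on Linux can also come from a firewall REJECT rule. The function name is hypothetical.

```shell
#!/bin/sh
# Count the dial failures seen in this thread in log text read from stdin.
# Matching is case-insensitive because wget prints "No route to host".
summarize_dial_errors() {
    awk '
        { line = tolower($0) }
        line ~ /i\/o timeout/         { io++ }
        line ~ /connection timed out/ { timed_out++ }
        line ~ /no route to host/     { no_route++ }
        END {
            printf "i/o timeout: %d\n", io
            printf "connection timed out: %d\n", timed_out
            printf "no route to host: %d\n", no_route
        }'
}
```

Usage, assuming the container name from the compose file: `docker logs storagenode4 2>&1 | summarize_dial_errors`.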

Hello,
A bit of background: these two nodes (virtual machines) had been working for a long time (more than a year) with the same configuration. Yesterday I upgraded the OS (but the upgrade itself is not the problem, because I have since reinstalled one node from scratch) and restarted. The nodes have not worked since.
No tunnels. Regarding the firewall, which port should I check? 28967, or something else?

services:
  storagenode:
    image: storjlabs/storagenode:latest
    container_name: storagenode4
    volumes:
      - type: bind
        source: /NODE4/identity/storagenode4
        target: /app/identity
      - type: bind
        source: /NODE4
        target: /app/config
      - type: bind
        source: /STORJ_LOCAL-4
        target: /app/dbs
      - type: bind
        source: /STORJ_LOCAL-4/LOG
        target: /app/config/LOG
    ports:
      - 28967:28967/tcp
      - 28967:28967/udp
      - 14002:14002
    restart: unless-stopped
    stop_grace_period: 300s
    sysctls:
      net.ipv4.tcp_fastopen: 3
    environment:
      - WALLET=
      - EMAIL=xxx@gmail.com
      - ADDRESS=xxxxx:28967
      - STORAGE=8800GB
      - STORJ_PIECES_ENABLE_LAZY_FILEWALKER=true
      - STORJ_STORAGE2_PIECE_SCAN_ON_STARTUP=false
      #- STORJ_OPERATOR_WALLET_FEATURES=zksync
      - STORJ_LOG_LEVEL=info
      - STORJ_LOG_CUSTOM_LEVEL=piecestore=info,collector=error
      #- STORJ_RETAIN_CONCURRENCY=1

  watchtower:
    image: storjlabs/watchtower
    restart: always
    container_name: watchtower
    command: storagenode4 watchtower --stop-timeout 300s --interval 21600
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock

  storj_exporter:
    image: thechristech/storj-exporter:latest
    restart: unless-stopped
    container_name: storj-exporter4
    environment:
      - STORJ_HOST_ADDRESS=storagenode4
    ports:
      - "9651:9651"

Thanks

Hello,
I tried again this morning, and the Docker container still has no internet access.

storagenode4 | downloading storagenode-updater
storagenode4 | --2025-02-22 07:20:19-- https://version.storj.io/processes/storagenode-updater/minimum/url?os=linux&arch=amd64
storagenode4 | Resolving version.storj.io (version.storj.io)… 34.173.164.90
storagenode4 | Connecting to version.storj.io (version.storj.io)|34.173.164.90|:443… failed: No route to host.
storagenode4 | http://: Invalid host name.

I really don’t know what happened here.
There is no firewall in the OS, and the ports are open.

Thank you

Fixed it myself!

It was an issue with iptables, which was broken by the apt-get dist-upgrade (OS patching).

To fix it, flush iptables back to the defaults and restart Docker, which will recreate the rules it needs. Be aware that if you have additional iptables rules, you need to apply them again.

ubuntu@hpool2:~$ sudo iptables -F
ubuntu@hpool2:~$ sudo iptables -X
ubuntu@hpool2:~$ sudo iptables -Z
ubuntu@hpool2:~$ sudo iptables -P FORWARD ACCEPT
ubuntu@hpool2:~$ sudo iptables -P INPUT ACCEPT
ubuntu@hpool2:~$ sudo iptables -P OUTPUT ACCEPT
ubuntu@hpool2:~$ sudo service docker restart
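
For anyone repeating this, here is a hedged sketch of the same reset with two precautions added that are not in my commands above: it saves the current rules with `iptables-save` first, and it sets the ACCEPT policies before flushing, so a remote session is not cut off if a policy happened to be DROP. The backup path and the `DRY_RUN` switch are my own assumptions; by default the script only prints what it would run.

```shell
#!/bin/sh
# Reset the filter table as in the commands above, after saving a backup.
# Assumptions: the backup path is illustrative; set DRY_RUN=0 to really execute.
DRY_RUN="${DRY_RUN:-1}"
BACKUP="${BACKUP:-/root/iptables-before-flush.rules}"

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

reset_iptables() {
    # Keep a copy of the current rules so custom ones can be re-applied later.
    run sudo sh -c "iptables-save > $BACKUP"
    # Open the default policies first, then flush rules, chains and counters.
    run sudo iptables -P INPUT ACCEPT
    run sudo iptables -P FORWARD ACCEPT
    run sudo iptables -P OUTPUT ACCEPT
    run sudo iptables -F
    run sudo iptables -X
    run sudo iptables -Z
    # Docker recreates its own chains and rules on restart.
    run sudo service docker restart
}
```

Call `reset_iptables` to review the printed commands, then run it again with `DRY_RUN=0`. The saved file can be loaded back with `iptables-restore` if something you needed was lost.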

Thanks
