Hello,
I have two Ubuntu nodes that stopped working after an OS upgrade and reboot.
I already tried destroying and recreating the storagenode Docker container (I use Docker Compose).
These are my logs:
2025-02-21T16:41:32Z INFO Configuration loaded {"Process": "storagenode", "Location": "/app/config/config.yaml"}
2025-02-21T16:41:32Z INFO Anonymized tracing enabled {"Process": "storagenode"}
2025-02-21T16:41:32Z INFO Operator email {"Process": "storagenode", "Address": "xxxxx"}
2025-02-21T16:41:32Z INFO Operator wallet {"Process": "storagenode", "Address": "xxxxx"}
2025-02-21T16:41:33Z INFO server existing kernel support for server-side tcp fast open detected {"Process": "storagenode"}
2025-02-21T16:41:41Z INFO hashstore hashstore opened successfully {"Process": "storagenode", "satellite": "121RTSDpyNZVcEU84Ticf2L1ntiuUimbWgfATz21tuvgk3vzoA6", "open_time": "5.562200562s"}
2025-02-21T16:41:47Z INFO hashstore hashstore opened successfully {"Process": "storagenode", "satellite": "12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S", "open_time": "5.454139228s"}
2025-02-21T16:41:52Z INFO hashstore hashstore opened successfully {"Process": "storagenode", "satellite": "12L9ZFwhzVpuEKMUNUqkaTLGzwY9G24tbiigLiXpmZWKwmcNDDs", "open_time": "5.547410667s"}
2025-02-21T16:41:58Z INFO hashstore hashstore opened successfully {"Process": "storagenode", "satellite": "1wFTAgs9DP5RSnCqKV1eLf6N9wtk4EAtmN5DpSxcs8EjT69tGE", "open_time": "5.397359886s"}
2025-02-21T16:42:28Z ERROR version failed to get process version info {"Process": "storagenode", "error": "version checker client: Get \"https://version.storj.io\": dial tcp 34.173.164.90:443: i/o timeout", "errorVerbose": "version checker client: Get \"https://version.storj.io\": dial tcp 34.173.164.90:443: i/o timeout\n\tstorj.io/storj/private/version/checker.(*Client).All:68\n\tstorj.io/storj/private/version/checker.(*Client).Process:89\n\tstorj.io/storj/private/version/checker.(*Service).checkVersion:104\n\tstorj.io/storj/private/version/checker.(*Service).CheckVersion:78\n\tmain.cmdRun:91\n\tmain.newRunCmd.func1:33\n\tstorj.io/common/process.cleanup.func1.4:392\n\tstorj.io/common/process.cleanup.func1:410\n\tgithub.com/spf13/cobra.(*Command).execute:985\n\tgithub.com/spf13/cobra.(*Command).ExecuteC:1117\n\tgithub.com/spf13/cobra.(*Command).Execute:1041\n\tstorj.io/common/process.ExecWithCustomOptions:112\n\tmain.main:34\n\truntime.main:272"}
2025-02-21T16:42:28Z INFO Telemetry enabled {"Process": "storagenode", "instance ID": "1xQxZaPRchNV8qd74uy6fEqDZzjPkX15Sk17WkaUNVbZZ4TRhy"}
2025-02-21T16:42:28Z INFO Event collection enabled {"Process": "storagenode", "instance ID": "1xQxZaPRchNV8qd74uy6fEqDZzjPkX15Sk17WkaUNVbZZ4TRhy"}
2025-02-21T16:42:28Z INFO db.migration Database Version {"Process": "storagenode", "version": 62}
2025-02-21T16:42:59Z WARN trust Failed to fetch URLs from source; used cache {"Process": "storagenode", "source": "https://static.storj.io/dcs-satellites", "error": "HTTP source: Get \"https://static.storj.io/dcs-satellites\": dial tcp 34.120.119.150:443: i/o timeout", "errorVerbose": "HTTP source: Get \"https://static.storj.io/dcs-satellites\": dial tcp 34.120.119.150:443: i/o timeout\n\tstorj.io/storj/storagenode/trust.(*HTTPSource).FetchEntries:68\n\tstorj.io/storj/storagenode/trust.(*List).fetchEntries:90\n\tstorj.io/storj/storagenode/trust.(*List).FetchURLs:49\n\tstorj.io/storj/storagenode/trust.(*Pool).fetchURLs:326\n\tstorj.io/storj/storagenode/trust.(*Pool).Refresh:209\n\tstorj.io/storj/storagenode.(*Peer).Run:1079\n\tmain.cmdRun:127\n\tmain.newRunCmd.func1:33\n\tstorj.io/common/process.cleanup.func1.4:392\n\tstorj.io/common/process.cleanup.func1:410\n\tgithub.com/spf13/cobra.(*Command).execute:985\n\tgithub.com/spf13/cobra.(*Command).ExecuteC:1117\n\tgithub.com/spf13/cobra.(*Command).Execute:1041\n\tstorj.io/common/process.ExecWithCustomOptions:112\n\tmain.main:34\n\truntime.main:272"}
2025-02-21T16:42:59Z INFO preflight:localtime start checking local system clock with trusted satellites' system clock. {"Process": "storagenode"}
2025-02-21T16:45:11Z ERROR preflight:localtime unable to get satellite system time {"Process": "storagenode", "Satellite ID": "12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S", "error": "rpc: tcp connector failed: rpc: dial tcp 34.150.199.48:7777: connect: connection timed out", "errorVerbose": "rpc: tcp connector failed: rpc: dial tcp 34.150.199.48:7777: connect: connection timed out\n\tstorj.io/common/rpc.HybridConnector.DialContext.func1:190"}
2025-02-21T16:45:11Z ERROR preflight:localtime unable to get satellite system time {"Process": "storagenode", "Satellite ID": "1wFTAgs9DP5RSnCqKV1eLf6N9wtk4EAtmN5DpSxcs8EjT69tGE", "error": "rpc: tcp connector failed: rpc: dial tcp 34.94.153.46:7777: connect: connection timed out", "errorVerbose": "rpc: tcp connector failed: rpc: dial tcp 34.94.153.46:7777: connect: connection timed out\n\tstorj.io/common/rpc.HybridConnector.DialContext.func1:190"}
2025-02-21T16:45:11Z ERROR preflight:localtime unable to get satellite system time {"Process": "storagenode", "Satellite ID": "121RTSDpyNZVcEU84Ticf2L1ntiuUimbWgfATz21tuvgk3vzoA6", "error": "rpc: tcp connector failed: rpc: dial tcp 34.126.92.94:7777: connect: connection timed out", "errorVerbose": "rpc: tcp connector failed: rpc: dial tcp 34.126.92.94:7777: connect: connection timed out\n\tstorj.io/common/rpc.HybridConnector.DialContext.func1:190"}
2025-02-21T16:45:11Z ERROR preflight:localtime unable to get satellite system time {"Process": "storagenode", "Satellite ID": "12L9ZFwhzVpuEKMUNUqkaTLGzwY9G24tbiigLiXpmZWKwmcNDDs", "error": "rpc: tcp connector failed: rpc: dial tcp 34.159.134.91:7777: connect: connection timed out", "errorVerbose": "rpc: tcp connector failed: rpc: dial tcp 34.159.134.91:7777: connect: connection timed out\n\tstorj.io/common/rpc.HybridConnector.DialContext.func1:190"}
So there is a strange i/o timeout on "dial tcp 34.173.164.90:443", plus "connection timed out" errors when contacting the satellites.
Any ideas?
Thanks
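Every failure in these logs is an outbound TCP connection from the node: HTTPS (443) to version.storj.io and static.storj.io, and port 7777 to the satellites. A minimal sketch for testing those same endpoints from the host with bash's built-in /dev/tcp redirection follows. It assumes bash and coreutils' timeout are available; tcp_probe and probe_storj_endpoints are hypothetical helper names, and the satellite IPs are simply what DNS returned in this log, so they may change.

```shell
#!/usr/bin/env bash
# Sketch: probe the outbound TCP endpoints that time out in the log above.
# Assumes bash (for /dev/tcp) and coreutils' `timeout`; tcp_probe and
# probe_storj_endpoints are hypothetical helpers, not part of storagenode.

# tcp_probe HOST PORT [SECONDS] -> exit status 0 if a TCP connection succeeds.
tcp_probe() {
  local host="$1" port="$2" secs="${3:-5}"
  # A child bash opens fd 3 on /dev/tcp/HOST/PORT; `timeout` kills it if
  # the connection attempt hangs (the "i/o timeout" case in the log).
  timeout "$secs" bash -c 'exec 3<>"/dev/tcp/$1/$2"' _ "$host" "$port" \
    2>/dev/null
}

# Endpoints copied from the log lines above (IPs may change over time).
probe_storj_endpoints() {
  local target
  for target in version.storj.io:443 static.storj.io:443 \
                34.150.199.48:7777 34.94.153.46:7777 \
                34.126.92.94:7777 34.159.134.91:7777; do
    # Split "host:port" into its two parts.
    if tcp_probe "${target%:*}" "${target##*:}"; then
      echo "OK    $target"
    else
      echo "FAIL  $target"
    fi
  done
}
```

Usage: `source probe.sh && probe_storj_endpoints`. If everything is OK on the host while the node still logs timeouts, the problem is more likely in Docker's networking or the host's iptables rules than in the upstream network.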
How are your disks connected? Check your firewall settings too.
If you run
curl -I "https://version.storj.io"
do you also get a timeout there? What if you do the same inside the container?
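If the storagenode image has no curl, one option is to run a throwaway curl container inside the node container's network namespace with Docker's `--network container:<name>` mode, so the request takes the same network path as the node. A minimal sketch, assuming the container is named storagenode4 (as in the compose file later in this thread) and using the public curlimages/curl image; probe_from_container and the DOCKER override (set DOCKER=echo for a dry run) are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: run curl from inside another container's network namespace.
# probe_from_container is a hypothetical helper; DOCKER defaults to
# `docker`, and DOCKER=echo prints the command instead of running it.
probe_from_container() {
  local container="$1" url="$2"
  # --network container:NAME joins NAME's network namespace, so the
  # request uses the same interfaces, routes and NAT as the node.
  "${DOCKER:-docker}" run --rm --network "container:${container}" \
    curlimages/curl:latest -sS -I --max-time 10 "$url"
}
```

Usage: `probe_from_container storagenode4 https://version.storj.io`. Any HTTP status line means the container can reach the internet; a timeout or "No route to host" points at Docker networking or the host firewall rather than at the node itself.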
Hello,
It seems to work from the server:
ubuntu@hpool:~$ curl -I "https://version.storj.io"
HTTP/2 405
date: Fri, 21 Feb 2025 18:20:58 GMT
strict-transport-security: max-age=15724800; includeSubDomains
From inside the Docker container (storagenode) there is no curl, and I can't install it because there is no connectivity.
I'm not sure what happened.
Maybe I will try to remove everything, clean up Docker and reinstall it all.
Best regards
Unfortunately, no luck.
I reinstalled the OS (Ubuntu 24.04), so there is no leftover configuration.
I copied my docker-compose.yml (the same one I use on other nodes) and still get this:
storagenode4 | 2025-02-21T19:16:43Z ERROR Error retrieving version info. {"Process": "storagenode-updater", "error": "version checker client: Get \"https://version.storj.io\": dial tcp 34.173.164.90:443: connect: no route to host", "errorVerbose": "version checker client: Get \"https://version.storj.io\": dial tcp 34.173.164.90:443: connect: no route to host\n\tstorj.io/storj/private/version/checker.(*Client).All:68\n\tmain.loopFunc:20\n\tstorj.io/common/sync2.(*Cycle).Run:102\n\tmain.cmdRun:139\n\tstorj.io/common/process.cleanup.func1.4:392\n\tstorj.io/common/process.cleanup.func1:410\n\tgithub.com/spf13/cobra.(*Command).execute:985\n\tgithub.com/spf13/cobra.(*Command).ExecuteC:1117\n\tgithub.com/spf13/cobra.(*Command).Execute:1041\n\tstorj.io/common/process.ExecWithCustomOptions:112\n\tstorj.io/common/process.ExecWithCustomConfigAndLogger:77\n\tmain.main:22\n\truntime.main:272"}
How can I solve this error?
Best regards
Roxor
February 21, 2025, 9:47pm
Could you post a sanitized version (no email, no wallet, no hostname, etc.) of your docker-compose.yml? It looks like the container has no internet access at all.
That error is a bit different: no route to host. Do you have other nodes on that server, with a similar docker-compose file, that work?
Did you use these instructions to install Docker? Ubuntu | Docker Docs
Are you using some kind of tunnelling in your setup? Check your firewalls.
And yes, please show the docker-compose file.
Hello,
A bit of background: these two nodes (virtual machines) had been working for a long time (more than a year) with the same configuration. Yesterday I upgraded the OS (but that is not the problem, because I reinstalled one from scratch) and restarted them. After that, the nodes stopped working.
No tunnels. Regarding the firewall, which ports should I check? 28967 or something else?
services:
  storagenode:
    image: storjlabs/storagenode:latest
    container_name: storagenode4
    volumes:
      - type: bind
        source: /NODE4/identity/storagenode4
        target: /app/identity
      - type: bind
        source: /NODE4
        target: /app/config
      - type: bind
        source: /STORJ_LOCAL-4
        target: /app/dbs
      - type: bind
        source: /STORJ_LOCAL-4/LOG
        target: /app/config/LOG
    ports:
      - 28967:28967/tcp
      - 28967:28967/udp
      - 14002:14002
    restart: unless-stopped
    stop_grace_period: 300s
    sysctls:
      net.ipv4.tcp_fastopen: 3
    environment:
      - WALLET=
      - EMAIL=xxx@gmail.com
      - ADDRESS=xxxxx:28967
      - STORAGE=8800GB
      - STORJ_PIECES_ENABLE_LAZY_FILEWALKER=true
      - STORJ_STORAGE2_PIECE_SCAN_ON_STARTUP=false
      #- STORJ_OPERATOR_WALLET_FEATURES=zksync
      - STORJ_LOG_LEVEL=info
      - STORJ_LOG_CUSTOM_LEVEL=piecestore=info,collector=error
      #- STORJ_RETAIN_CONCURRENCY=1
  watchtower:
    image: storjlabs/watchtower
    restart: always
    container_name: watchtower
    command: storagenode4 watchtower --stop-timeout 300s --interval 21600
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock
  storj_exporter:
    image: thechristech/storj-exporter:latest
    restart: unless-stopped
    container_name: storj-exporter4
    environment:
      - STORJ_HOST_ADDRESS=storagenode4
    ports:
      - "9651:9651"
Thanks
Hello,
I tried again this morning,
and the container has no internet access:
storagenode4 | downloading storagenode-updater
storagenode4 | --2025-02-22 07:20:19-- https://version.storj.io/processes/storagenode-updater/minimum/url?os=linux&arch=amd64
storagenode4 | Resolving version.storj.io (version.storj.io)... 34.173.164.90
storagenode4 | Connecting to version.storj.io (version.storj.io)|34.173.164.90|:443... failed: No route to host.
storagenode4 | http://: Invalid host name.
I really don’t know what happened here.
There is no firewall in the OS, and the ports are open.
Thank you
Fixed it myself!
It was an iptables issue: the rules had been broken by the apt-get dist-upgrade (OS patching).
To fix it, flush iptables back to the defaults and restart Docker, which recreates the rules it needs. Be aware that if you have additional iptables rules of your own, you need to apply them again.
ubuntu@hpool2:~$ sudo iptables -F
ubuntu@hpool2:~$ sudo iptables -X
ubuntu@hpool2:~$ sudo iptables -Z
ubuntu@hpool2:~$ sudo iptables -P FORWARD ACCEPT
ubuntu@hpool2:~$ sudo iptables -P INPUT ACCEPT
ubuntu@hpool2:~$ sudo iptables -P OUTPUT ACCEPT
ubuntu@hpool2:~$ sudo service docker restart
Thanks
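The same fix can be wrapped with a backup taken before the flush and a check that Docker actually recreated its chains afterwards. A minimal sketch, assuming it is run as root (e.g. via sudo) on a host where `service docker restart` works, as in the commands above; reset_iptables_for_docker, has_docker_chain and the backup path are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch of the fix above, plus a backup and a post-restart check.
# Assumptions: run as root; the default backup path is only an example.

# Reads `iptables -S` output on stdin and succeeds if the DOCKER chain
# exists. The flush (-F/-X) deletes it; the Docker daemon recreates it.
has_docker_chain() {
  grep -qx -- '-N DOCKER'
}

reset_iptables_for_docker() {
  local backup="${1:-/root/iptables-backup.rules}"
  iptables-save > "$backup" || return 1  # keep a copy of any custom rules
  iptables -F                             # flush all filter-table rules
  iptables -X                             # delete user-defined filter chains
  iptables -Z                             # zero packet/byte counters
  iptables -P FORWARD ACCEPT
  iptables -P INPUT ACCEPT
  iptables -P OUTPUT ACCEPT
  service docker restart                  # Docker re-adds the rules it needs
  if iptables -S | has_docker_chain; then
    echo "DOCKER chain recreated; reapply your own rules from $backup"
  else
    echo "DOCKER chain missing after restart; check the Docker service" >&2
    return 1
  fi
}
```

Because iptables-save captures everything, including whatever the upgrade broke, copy only your own rules back from the backup rather than restoring the whole file.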
LxdrJ
February 23, 2025, 7:38am
I switched to nftables; it's usually a bit simpler, IMO.
LxdrJ
February 23, 2025, 9:41am
I don't know if you should post your firewall rules on the forum; I just have this simple script, and I like the layout:
table ip my_filter {
    chain input {
        type filter hook input priority filter; policy accept;
        iifname "eth0" tcp dport 28967 accept
        iifname "eth0" udp dport { 28967, 51820 } accept
        iifname "eth0" ip protocol icmp accept
        iifname "eth0" ct state established,related accept
        iifname "eth0" ct state invalid drop
        iifname "eth0" icmpv6 type { echo-request, nd-neighbor-solicit } accept
        iifname "eth0" drop
    }
    chain forward {
        type filter hook forward priority filter; policy accept;
    }
    chain output {
        type filter hook output priority filter; policy accept;
    }
}
table ip nat {
    chain POSTROUTING {
        type nat hook postrouting priority srcnat; policy accept;
        oifname "WG45" masquerade
    }
    chain PREROUTING {
        type nat hook prerouting priority dstnat; policy accept;
        iifname "eth0" tcp dport 28967 dnat to 10.1.1.2
        iifname "eth0" udp dport 28967 dnat to 10.1.1.2
    }
}
lol. You don’t have to, but it helps as a reference for future readers.
You can appreciate the complexity of your configuration compared with the few firewall-cmd commands that would accomplish the same thing, regardless of the underlying setup. I strongly believe that if there's a tool that works at a higher level of abstraction, that tool should be used. Messing with iptables is too low-level, and the added complexity and opportunities for error add no value, especially for such trivial configurations.
LxdrJ
February 23, 2025, 5:45pm
It's just an executable text file, which I like. I also fiddled around with ufw commands, but they weren't to my taste, and you can read everywhere that ufw isn't compatible with Docker. The iptables commands gave me a bit of a headache. On the other hand, I also read that for WireGuard, ifup/ifdown is not supported, so you just hard-code the NAT rules in the tables; the interface will be running anyway. So far I don't know what happens when you mess up your nftables.
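On the question of what happens when an nftables file is wrong: nft can parse and validate a ruleset file without applying it (`nft -c -f FILE`), which catches syntax errors before they touch the live rules. A minimal sketch that checks first, snapshots the current ruleset with `nft list ruleset`, and only then applies the file, assuming it runs as root; apply_nft_ruleset, the backup path and the NFT override (handy for testing with a stub) are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: apply an nftables ruleset file only if it passes `nft -c`,
# keeping a snapshot of the live ruleset first.
# Assumptions: run as root; NFT defaults to `nft` and may be overridden.
apply_nft_ruleset() {
  local file="$1" backup="${2:-/root/nftables-backup.nft}"  # example path
  local nft="${NFT:-nft}"
  # -c/--check: parse and validate the file without changing anything.
  if ! "$nft" -c -f "$file"; then
    echo "check failed, nothing applied: $file" >&2
    return 1
  fi
  "$nft" list ruleset > "$backup" || return 1  # snapshot current rules
  "$nft" -f "$file"                             # apply for real
}
```

One caveat on Docker hosts: if a ruleset file starts with `flush ruleset`, it also removes the tables created through iptables-nft, which includes Docker's rules when `iptables --version` reports nf_tables; restarting Docker recreates them, as in the fix earlier in this thread.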