I get the same error message with VERSION_SERVER_URL="https://version.qa.storj.io", but it works fine with VERSION_SERVER_URL=https://version.qa.storj.io (no quotes).
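My guess at why the quotes matter: a shell would strip them, but in a docker-compose environment list (or an env file) they can end up as part of the value, so the updater tries to fetch a URL that literally starts with a quote character. An illustrative compose snippet:

```yaml
environment:
  # works: the URL is passed through verbatim
  - VERSION_SERVER_URL=https://version.qa.storj.io
  # breaks: the quotes become part of the value, so the updater requests
  # "https://version.qa.storj.io" including the literal quote characters
  # - VERSION_SERVER_URL="https://version.qa.storj.io"
```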
But at the end the process just hangs:
testnode_1 | 2022-03-17T22:57:34.734Z INFO Running on version {"Service": "storagenode-updater", "Version": "v1.50.3"}
testnode_1 | 2022-03-17T22:57:34.734Z INFO Downloading versions. {"Server Address": "https://version.qa.storj.io"}
testnode_1 | 2022-03-17T22:57:35.212Z INFO Current binary version {"Service": "storagenode", "Version": "v1.48.2"}
testnode_1 | 2022-03-17T22:57:35.212Z INFO Download started. {"From": "https://github.com/storj/storj/releases/download/v1.50.3/storagenode_linux_amd64.zip", "To": "/tmp/storagenode_linux_amd64.1411460718.zip"}
testnode_1 | 2022-03-17T22:57:38.131Z INFO Download finished. {"From": "https://github.com/storj/storj/releases/download/v1.50.3/storagenode_linux_amd64.zip", "To": "/tmp/storagenode_linux_amd64.1411460718.zip"}
testnode_1 | 2022-03-17 22:57:38,131 INFO waiting for processes, storagenode, storagenode-updater to die
testnode_1 | 2022-03-17T22:57:38.155Z INFO Restarting service. {"Service": "storagenode"}
testnode_1 | 2022-03-17T22:57:38.540Z ERROR Error updating service. {"Service": "storagenode", "error": "error stopping storagenode service: strconv.Atoi: parsing \"unix:///run/supervisord.sock no such file\": invalid syntax", "errorVerbose": "error stopping storagenode service: strconv.Atoi: parsing \"unix:///run/supervisord.sock no such file\": invalid syntax\n\tmain.restartService:37\n\tmain.update:68\n\tmain.loopFunc:27\n\tstorj.io/common/sync2.(*Cycle).Run:92\n\tmain.cmdRun:128\n\tstorj.io/private/process.cleanup.func1.4:363\n\tstorj.io/private/process.cleanup.func1:381\n\tgithub.com/spf13/cobra.(*Command).execute:852\n\tgithub.com/spf13/cobra.(*Command).ExecuteC:960\n\tgithub.com/spf13/cobra.(*Command).Execute:897\n\tstorj.io/private/process.ExecWithCustomConfig:88\n\tstorj.io/private/process.Exec:65\n\tmain.main:12\n\truntime.main:255"}
testnode_1 | 2022-03-17T22:57:38.552Z INFO Current binary version {"Service": "storagenode-updater", "Version": "v1.50.3"}
testnode_1 | 2022-03-17T22:57:38.553Z INFO Version is up to date {"Service": "storagenode-updater"}
testnode_1 | 2022-03-17T22:57:38.771Z ERROR contact:service contact/service.go:103 ping satellite failed {"Satellite ID": "12ZQbQ8WWFEfKNE9dP78B1frhJ8PmyYmr8occLEf1mQ1ovgVWy", "attempts": 4, "error": "ping satellite: check-in ratelimit: node rate limited by id", "errorVerbose": "ping satellite: check-in ratelimit: node rate limited by id\n\tstorj.io/storj/storagenode/contact.(*Service).pingSatelliteOnce:136\n\tstorj.io/storj/storagenode/contact.(*Service).pingSatellite:98\n\tstorj.io/storj/storagenode/contact.(*Chore).updateCycles.func1:87\n\tstorj.io/common/sync2.(*Cycle).Run:92\n\tstorj.io/common/sync2.(*Cycle).Start.func1:71\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:57"}
testnode_1 | storj.io/storj/storagenode/contact.(*Service).pingSatellite
testnode_1 | /go/src/storj.io/storj/storagenode/contact/service.go:103
testnode_1 | storj.io/storj/storagenode/contact.(*Chore).updateCycles.func1
testnode_1 | /go/src/storj.io/storj/storagenode/contact/chore.go:87
testnode_1 | storj.io/common/sync2.(*Cycle).Run
testnode_1 | /go/pkg/mod/storj.io/common@v0.0.0-20220131120956-e74f624a3d55/sync2/cycle.go:92
testnode_1 | storj.io/common/sync2.(*Cycle).Start.func1
testnode_1 | /go/pkg/mod/storj.io/common@v0.0.0-20220131120956-e74f624a3d55/sync2/cycle.go:71
testnode_1 | golang.org/x/sync/errgroup.(*Group).Go.func1
testnode_1 | /go/pkg/mod/golang.org/x/sync@v0.0.0-20210220032951-036812b2e83c/errgroup/errgroup.go:57
testnode_1 | 2022-03-17 22:57:41,773 INFO waiting for processes, storagenode, storagenode-updater to die
testnode_1 | 2022-03-17 22:57:44,776 WARN killing 'storagenode-updater' (104) with SIGKILL
testnode_1 | 2022-03-17 22:57:44,777 INFO waiting for processes, storagenode, storagenode-updater to die
testnode_1 | 2022-03-17 22:57:44,780 INFO stopped: storagenode-updater (terminated by SIGKILL)
testnode_1 | 2022-03-17T22:57:44.780Z INFO process/exec_conf.go:116 Got a signal from the OS: "terminated"
testnode_1 | 2022-03-17T22:57:44.781Z INFO contact:service contact/service.go:107 context cancelled {"Satellite ID": "12ZQbQ8WWFEfKNE9dP78B1frhJ8PmyYmr8occLEf1mQ1ovgVWy"}
testnode_1 | 2022-03-17 22:57:44,805 INFO stopped: storagenode (exit status 0)
testnode_1 | 2022-03-17 22:57:44,806 INFO stopped: processes (terminated by SIGTERM)
Now the docker container is still running, so docker doesn't restart it. Not a great ending.
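For what it's worth, the strconv.Atoi in the ERROR above suggests the updater runs supervisorctl and parses its stdout as a PID; when the control socket is missing, supervisorctl prints an error string instead of a number, and the integer parse fails. A hypothetical Go reconstruction of just that failure mode (not the actual updater code):

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

func main() {
	// Hypothetical stand-in for the output of `supervisorctl pid storagenode`
	// when the control socket at /run/supervisord.sock does not exist:
	// supervisorctl prints an error message rather than a numeric PID.
	out := "unix:///run/supervisord.sock no such file"

	// Parsing that message as an integer fails exactly like the log above.
	pid, err := strconv.Atoi(strings.TrimSpace(out))
	if err != nil {
		fmt.Println("error stopping storagenode service:", err)
		return
	}
	fmt.Println("would send stop signal to PID", pid)
}
```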
Neither of those works on my end, unfortunately.
I'm a little confused: if we're all using the same docker image, we should all be using the same wget version. Why am I the only one getting this error?
The Alpine docker image by default uses the built-in BusyBox wget, which, unlike GNU wget, cannot handle SSL traffic via a proxy. It seems like you're behind a firewall/proxy?
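If BusyBox wget really is the culprit, one possible workaround (untested here, and the base image tag is only an example) would be to install GNU wget in a derived image:

```dockerfile
# Untested sketch: Alpine's `wget` package is the GNU implementation,
# which, unlike the BusyBox applet, can do HTTPS through a proxy.
FROM storjlabs/storagenode:latest
RUN apk add --no-cache wget
```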
Not that I'm aware of, for a proxy that is. Isn't everyone behind a firewall? I do use double NAT because of ISP complications, and I guess there are firewalls on both routers and on the Synology NAS.
I guess it doesn't matter that much; I can't be the only one running into this, so I think it will need to be fixed before taking this to production, or quite a few nodes will go dead.
A new version is available: 5f2777af9-v1.50.4-go1.17.5
It seems to work for me. I noticed one minor issue when running as a non-root user. I will have to compare some log outputs to rule out a mistake on my end.
That seems to have done the trick.
I am getting an error though, maybe the same one you had?
2022-03-18T20:08:33.067Z FATAL Unrecoverable error {"error": "Error during preflight check for storagenode databases: preflight: database: \"storage_usage\": failed inserting test value: context canceled\n\tstorj.io/storj/storagenode/storagenodedb.(*DB).preflight:428\n\tstorj.io/storj/storagenode/storagenodedb.(*DB).Preflight:353\n\tmain.cmdRun:241\n\tstorj.io/private/process.cleanup.func1.4:363\n\tstorj.io/private/process.cleanup.func1:381\n\tgithub.com/spf13/cobra.(*Command).execute:852\n\tgithub.com/spf13/cobra.(*Command).ExecuteC:960\n\tgithub.com/spf13/cobra.(*Command).Execute:897\n\tstorj.io/private/process.ExecWithCustomConfig:88\n\tstorj.io/private/process.ExecCustomDebug:70\n\tmain.main:474\n\truntime.main:255", "errorVerbose": "Error during preflight check for storagenode databases: preflight: database: \"storage_usage\": failed inserting test value: context canceled\n\tstorj.io/storj/storagenode/storagenodedb.(*DB).preflight:428\n\tstorj.io/storj/storagenode/storagenodedb.(*DB).Preflight:353\n\tmain.cmdRun:241\n\tstorj.io/private/process.cleanup.func1.4:363\n\tstorj.io/private/process.cleanup.func1:381\n\tgithub.com/spf13/cobra.(*Command).execute:852\n\tgithub.com/spf13/cobra.(*Command).ExecuteC:960\n\tgithub.com/spf13/cobra.(*Command).Execute:897\n\tstorj.io/private/process.ExecWithCustomConfig:88\n\tstorj.io/private/process.ExecCustomDebug:70\n\tmain.main:474\n\truntime.main:255\n\tmain.cmdRun:243\n\tstorj.io/private/process.cleanup.func1.4:363\n\tstorj.io/private/process.cleanup.func1:381\n\tgithub.com/spf13/cobra.(*Command).execute:852\n\tgithub.com/spf13/cobra.(*Command).ExecuteC:960\n\tgithub.com/spf13/cobra.(*Command).Execute:897\n\tstorj.io/private/process.ExecWithCustomConfig:88\n\tstorj.io/private/process.ExecCustomDebug:70\n\tmain.main:474\n\truntime.main:255"}
Despite being logged as a FATAL error, the node seems to run just fine after that.
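For background: the preflight check inserts and reads back a test value in each storagenode database at startup, so a one-off "context canceled" there points at the startup context being cancelled mid-check rather than at a corrupted database. If I remember correctly the check can be toggled via a preflight.database-check config option; the environment-variable spelling below follows Storj's usual STORJ_ prefix convention and should be treated as my assumption, and disabling the check is probably only sensible for debugging:

```yaml
environment:
  # Assumption: name derived from the `preflight.database-check` config entry;
  # verify it before relying on it. Disables the startup DB preflight check.
  - STORJ_PREFLIGHT_DATABASE_CHECK=false
```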
I am running my storage node as a restricted user now. That also works. For reference, here is my docker-compose file:
version: "3.3"
services:
multinode:
restart: unless-stopped
user: 1000:1000
image: storjlabs/multinode:latest
ports:
- 15002:15002/tcp
volumes:
- /home/storagenode/.local/share/storj/identity/multinode:/app/identity
- /home/storagenode/.local/share/storj/multinode:/app/config
testnode:
restart: unless-stopped
user: 1000:1000
image: storjlabs/storagenode:5f2777af9-v1.50.4-go1.17.5
ports:
- 12012:14002/tcp
- 12019:12019/tcp
- 12010:12010/tcp
- 12010:12010/udp
volumes:
- /mnt/testnode/storagenode/identity/storagenode:/app/identity
- /mnt/testnode/storagenode/storagenode:/app/config
environment:
- STORJ_SERVER_ADDRESS=:12010
- STORJ_DEBUG_ADDR=:12019
- STORJ_CONTACT_EXTERNAL_ADDRESS=<my ddns name>:12010
- STORJ_OPERATOR_WALLET=0x0000000000000000000000000000000000000000
- STORJ_OPERATOR_EMAIL=<my email>
- STORJ_LOG_LEVEL=debug
- STORJ_LOG_CALLER=true
- STORJ_LOG_STACK=true
- STORJ_SERVER_USE_PEER_CA_WHITELIST=false
- STORJ_STORAGE2_TRUST_SOURCES=1GGZktUwmMKTwTWNcmGnFJ3n7rjE58QnNcRp98Y23MmbDnVoiU@satellite.qa.storj.io:7777,12ZQbQ8WWFEfKNE9dP78B1frhJ8PmyYmr8occLEf1mQ
1ovgVWy@testnet.satellite.stefan-benten.de:7777
- STORJ_VERSION_SERVER_ADDRESS=https://version.qa.storj.io
- VERSION_SERVER_URL=https://version.qa.storj.io
- STORJ_ALLOCATED_DISK_SPACE=100GB
- STORJ_STORAGE2_MONITOR_MINIMUM_DISK_SPACE=0B
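In case anyone wants to reproduce this setup, the usual compose workflow applies:

```sh
# Start both services in the background, then follow the test node's logs.
docker-compose up -d
docker-compose logs -f testnode
```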
I did notice these two lines in the docker logs.
2022-03-18 20:08:27,212 CRIT Supervisor is running as root. Privileges were not dropped because no user is specified in the config file. If you intend to run as root, you can set user=root in the config file to avoid this message.
2022-03-18 20:08:27,218 CRIT Server 'inet_http_server' running without any HTTP authentication checking
That kind of sounds like it might be a security issue. But I guess it depends on how it's used.
Edit: looks like it's only listening on localhost, so it shouldn't matter. storj/supervisord.conf at 5f2777af9714de21e4c68a74d152f117c7d09588 · storj/storj · GitHub
Additionally, I hope some others jump in on testing this. This topic hasn't gotten much response from people actually testing it, and I think you need a somewhat broader testing base, especially since this can't really be rolled out on production to just a small controlled sample of nodes. (At least last time it couldn't, I guess.)
This is logged at critical level by supervisord because running without authentication checks is assumed to be a critical security concern. It is safe to ignore as long as we do not allow HTTP access from outside the container; since it is configured to listen only on localhost, it's reasonable to run without HTTP authentication checks.
We intentionally unset user=root in the supervisord.conf to allow running the container as a non-root user. When the container runs as root, supervisord will automatically run as root but logs this warning. If the container is run as a restricted user, supervisord will run with that user's privileges.
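Conceptually, the relevant bit of the supervisord.conf looks something like this (an illustrative sketch, not the exact file; see the GitHub link above for the real one):

```ini
[supervisord]
nodaemon=true
; `user=` is intentionally left unset: supervisord inherits whatever UID the
; container was started with (root by default, or e.g. 1000:1000 from compose).
; When that UID is 0, supervisord logs the CRIT "running as root" message.
```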
Mkay, both of those make sense to me. It's a bit unfortunate that it's logged as CRIT, as it may lead to questions, but I don't see a simple way around that either. I thought user=root may have been a possible fix, but there is a good reason not to use that, so that's indeed not an option.
There is a suggested solution to set a dummy user/password in the [unix_http_server] and [supervisorctl] sections of the config file to remove the CRIT log entries: CRIT Server 'unix_http_server' running without any HTTP authentication checking · Issue #717 · Supervisor/supervisor · GitHub
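For reference, the suggestion from that issue boils down to something like this (sketch only; the username/password values are arbitrary placeholders, and the socket path matches the one from the logs above):

```ini
; Placeholder credentials purely to silence the CRIT message;
; pick any values, they just have to match between the two sections.
[unix_http_server]
file=/run/supervisord.sock
username=dummy
password=dummy

[supervisorctl]
serverurl=unix:///run/supervisord.sock
username=dummy
password=dummy
```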
I don't really like this suggestion.
That's missing the most important word, haha. Well, it's up to you, of course. But with many people looking at logs, I think there will be questions. If it doesn't impact anything else, I would probably use this suggestion. But as it's not actually anything critical, it isn't that important.
Hmm, I haven't found a concrete answer to that question. I had this "FATAL" message today as well, once. Anything to be concerned about? @BrightSilence @littleskunk
It hasn't caused any problems for me; it just happened once and the node kept working. Still, FATAL errors seem like a legitimate concern to me. For me it was happening on a test node; is it happening on production nodes for you? Maybe @littleskunk or @clement can look into that one. As you can see, I didn't really get a response to that error here; either it didn't worry them or it got overlooked.
I think it's probably more of an annoyance than a concern unless you get stuck in a loop of these errors. It sounds like the check is a bit flaky.
Yes, it seems it was thrown (once) after (or before) the latest update was done. Maybe it's linked to the update process(?).