For docker nodes
Like last time, we would like docker node operators to test the new image early (bfc3d4c9d-v1.52.2-go1.17.5). The reason is that we switched to a different base image, which should fix most of the bugs we saw last time. We are especially interested in arm32 nodes, but all other architectures are welcome too.
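If you want to try it, a minimal sketch, assuming the standard storjlabs/storagenode image and a container named storagenode (reuse your usual run options where the ... placeholder is):

# pull the test tag explicitly
docker pull storjlabs/storagenode:bfc3d4c9d-v1.52.2-go1.17.5
# recreate the container from that tag instead of :latest
docker stop -t 300 storagenode
docker rm storagenode
docker run -d --name storagenode ... storjlabs/storagenode:bfc3d4c9d-v1.52.2-go1.17.5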
Just tried the new image, seems to be working fine. Running on a rock64 (aarch64, ARMv8-A).
Update:
My first node, running the latest tag, and my second node, running the bfc3d4c9d-v1.52.2-go1.17.5 image, both just auto-updated from v1.50.4 to v1.52.2 without issue.
Well, this is certainly different this time around; I'm not used to my nodes updating the same day an update comes out. One of my nodes did fail to start back up for some odd reason, though.
The error I got:
2022-04-06T23:33:21.037Z INFO Restarting service. {"Service": "storagenode"}
2022-04-06T23:33:21.415Z INFO piecestore download started {"Piece ID": "TFDMUL3NBHFDQZ2OCCIYVPN5HO6TWMI6ESJW3QIQTWQR2PBGHBVA", "Satellite ID": "12rfG3sh9NCWiX3ivPjq2HtdLmbqCrvHVEzJubnzFzosMuawymB", "Action": "GET_REPAIR"}
2022-04-06T23:33:21.840Z INFO piecestore downloaded {"Piece ID": "TFDMUL3NBHFDQZ2OCCIYVPN5HO6TWMI6ESJW3QIQTWQR2PBGHBVA", "Satellite ID": "12rfG3sh9NCWiX3ivPjq2HtdLmbqCrvHVEzJubnzFzosMuawymB", "Action": "GET_REPAIR"}
2022-04-06T23:33:23.125Z INFO Service restarted successfully. {"Service": "storagenode"}
2022-04-06T23:33:23.127Z INFO Got a signal from the OS: "interrupt"
2022-04-06T23:33:23.160Z INFO Current binary version {"Service": "storagenode-updater", "Version": "v1.50.4"}
2022-04-06T23:33:23.160Z INFO Version is up to date {"Service": "storagenode-updater"}
2022-04-06T23:33:38.128Z WARN servers service takes long to shutdown {"name": "server"}
2022-04-06T23:33:38.790Z INFO servers slow shutdown {"stack": "goroutine 975 [running]:\nstorj.io/storj/private/lifecycle.(*Group).logStackTrace.func1()\n\t/go/src/storj.io/storj/private/lifecycle/group.go:107 +0x94\nsync.(*Once).doSlow(0x40004828f0, 0x4000a54e68)\n\t/usr/local/go/src/sync/once.go:68 +0x108\nsync.(*Once).Do(...)\n\t/usr/local/go/src/sync/once.go:59\nstorj.io/storj/private/lifecycle.(*Group).logStackTrace(0x40004828d0)\n\t/go/src/storj.io/storj/private/lifecycle/group.go:104 +0x58\nstorj.io/storj/private/lifecycle.(*Group).Run.func1(0x40001ae9e0, {0x110d998, 0x400042b140}, 0x40004828d0, {{0xfb2072, 0x6}, 0x40003e54c0, 0x40003e54d0})\n\t/go/src/storj.io/storj/private/lifecycle/group.go:77 +0x260\ncreated by storj.io/storj/private/lifecycle.(*Group).Run\n\t/go/src/storj.io/stor
2022-04-07T01:03:20.123Z INFO Version is up to date {"Service": "storagenode-updater"}
2022-04-07T01:18:19.071Z INFO Downloading versions. {"Server Address": "https://version.storj.io"}
2022-04-07T01:18:20.000Z INFO Current binary version {"Service": "storagenode", "Version": "v1.52.2"}
2022-04-07T01:18:20.001Z INFO Version is up to date {"Service": "storagenode"}
2022-04-07T01:18:20.126Z INFO Current binary version {"Service": "storagenode-updater", "Version": "v1.50.4"}
2022-04-07T01:18:20.126Z INFO Version is up to date {"Service": "storagenode-updater"}
2022-04-07T01:33:20.503Z INFO Downloading versions. {"Server Address": "https://version.storj.io"}
2022-04-07T01:33:49.752Z ERROR Error retrieving version info. {"error": "version checker client: Get \"https://version.storj.io\": net/http: TLS handshake timeout", "errorVerbose": "version checker client: Get \"https://version.storj.io\": net/http: TLS handshake timeout\n\tstorj.io/storj/private/version/checker.(*Client).All:66\n\tmain.loopFunc:21\n\tstorj.io/common/sync2.(*Cycle).Run:152\n\tmain.cmdRun:128\n\tstorj.io/private/process.cleanup.func1.4:363\n\tstorj.io/private/process.cleanup.func1:381\n\tgithub.com/spf13/cobra.(*Command).execute:852\n\tgithub.com/spf13/cobra.(*Command).ExecuteC:960\n\tgithub.com/spf13/cobra.(*Command).Execute:897\n\tstorj.io/private/process.ExecWithCustomConfig:88\n\tstorj.io/private/process.Exec:65\n\tmain.main:12\n\truntime.main:255"}
It can. But if we introduce a bug or want to update the initial downloader (as we plan to do in this version, by the way), we will push a new image. The updater inside the image is only able to update itself and storagenode, not the base image.
Watchtower can help update the base image in this case.
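For reference, a minimal sketch of running watchtower against a storagenode container, assuming a container named storagenode (the image, interval, and timeout here are only examples; check the watchtower documentation for the options you actually want):

docker run -d --restart unless-stopped --name watchtower \
  -v /var/run/docker.sock:/var/run/docker.sock \
  containrrr/watchtower storagenode --interval 21600 --stop-timeout 300s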
My Raspberry Pi nodes are still on 1.49.5, while my AMD64 nodes already updated to 1.50 and now to 1.52.2. Will that become a problem? When do you plan to release the ARM version of the latest release?
Problem with Synology DS918+ 12GB RAM
ALL 12 nodes (1x SHR1 volume, 11x USB disks) updated to 1.52.2 tonight and killed my Synology when all 12 containers started at the same time. After starting, the docker containers consume about 1 GB of RAM each (the biggest one 4 GB) because of the filewalker.
Everything eventually showed as "started", but then I got:
Your Storage Node on the europe-north-1 Satellite was suspended because it produced errors too often during audits.
I stopped all containers and am starting them gradually, one at a time (see the sketch below).
I didn't test "bfc3d4c9d-v1.52.2-go1.17.5"; I have the "latest" image on all nodes.
Please don't kill my 18 TB node with this update process...
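A minimal sketch of such a staggered start, assuming containers named storagenode1 through storagenode12 (the names and the delay are placeholders):

# start one container every 30 minutes so the filewalker runs don't all overlap
for i in $(seq 1 12); do
  docker start "storagenode$i"
  sleep 1800
done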
My understanding is that synchronized updates like this will happen very rarely in the future, as the storagenode binary will get updated gradually inside the docker container rather than on all nodes at the same time.
It may still be an issue when the container image itself gets updated via watchtower, as is the case now.
One of my docker nodes went offline tonight, but the container did not restart. In the logs I see a single 500 KB line with a stack trace, and after that it's just the updater happily checking for updates.
The hardware this node runs on is not perfect, but up until this point I never had to manually intervene to restart the container.
I've tested the new image on my testnet node, and it seems to be working fine on a Synology DS3617xs. I'll leave my production nodes on latest; I prefer not to mess with those too much. That's what test nodes are for.
This may be a bit of an issue with the new way the container works. If the storagenode process stops, but the updater doesn’t, the container still has running processes, so it won’t restart. That’s not ideal…
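Until that changes, a possible workaround is an external check that restarts the container when the node process is gone but the container is still up. A hedged sketch only: it assumes a container named storagenode and that the node process shows up as "storagenode run" in docker top, so verify both on your own setup first:

#!/bin/sh
# cron-able check: restart the container if the storagenode process is no longer running inside it
if docker ps --format '{{.Names}}' | grep -qx storagenode; then
  if ! docker top storagenode | grep -q 'storagenode run'; then
    docker restart storagenode
  fi
fi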
I have not; I haven't touched this node. It auto-updated to 1.50.4 on its own, then failed to start on 1.52.
I have 3 Pi 4s running nodes, all on arm64, and this is the only one that failed.
Wow, 12 nodes across USB drives and an SHR1 volume. I run a similar setup, a DS920+ with 12 GB RAM, but only 1 SHR1 volume. I did that forced update where you type the version, but when I went back to latest, I was back on 1.49-something. I had to manually pull down "latest"; just using the docker run command wasn't enough. Mine just auto-updated from 1.50.4 to 1.52.2 an hour ago.
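For anyone hitting the same thing: docker run reuses whatever image the local :latest tag already points to, so the explicit pull is the step that actually fetches the new version (image name assumed to be the standard one):

# refresh the local :latest tag before recreating the container
docker pull storjlabs/storagenode:latest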