Node stopped working as Docker container on QNAP Container Station

Hi!
I have had a problem since today.
The Docker container isn't starting anymore, and I get no log messages.
The container status keeps switching from “running” to “other”.
I could not find any point where I can start troubleshooting.
Does someone have a hint for me?

Greetings
Thomas

The most frequent reason is that one of your mounts does not exist anymore.

But you need to ask on the QNAP forums about where to find the logs for their container management abomination.

At this point it has nothing to do with the storagenode.

Checked more than twice :confused:

The only messages I saw are:

WARN    image/image.go:71       Get "http://unix.socket/1.0": dial unix /var/lib/lxd/unix.socket: connect: no such file or directory
WARN    containers/container.go:386     failed to get to container resource:failed to get resource from redis: redis: nil

The NAS got no updates.
I'm still searching.

Please check the logs of the storagenode container. They may show the problem.

The Container Station (Docker manager) is constantly switching between “running” and “other” without creating any storagenode logs.
I have no clue where to look. Currently a RAID consistency check is running.

The usual recommendation is to not use the flaky GUI and to use the robust CLI instead.
So, please try to log in via SSH and check the logs:

sudo docker logs --tail 20 storagenode

You likely need to replace the storagenode name in that command with whatever name was used for it.
You may check:

sudo docker ps

I already operated the “robust” way :wink:
There is absolutely no sign of any storagenode components :confused:
I am also confused about the redis and unix.socket error messages I got from Docker.
I have absolutely no clue where to search in this direction.

In the logs, in the first place. Then we can help better.
The socket messages are from the system logs, which are not needed right now; you need to get the node's logs. They may contain useful info about why it crashed.

Got the same problem with the latest image on QNAP Container Station.
No log output.
Downgraded to 1.112.2 and the node is accessible again.


Thank you for moving my troubleshooting further forward. I have now started with version storjlabs/storagenode:0c21bd6, and the node runs immediately!

It seems storjlabs/storagenode:2159415 is broken.

Checked this already… I had none of the usual messages about non-executable binaries.

:face_with_spiral_eyes:


@Scriptiee may actually be right. Do you have the exec mount option on the disk with the data?
You may check your /etc/mtab.
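For example, a minimal check over SSH (the grep targets are assumptions; use your own data share's path):

grep CACHEDEV1_DATA /etc/mtab
# or list every mount that carries the noexec flag:
grep noexec /proc/mounts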
The image has a change where the binaries are now located in the bin subfolder of the data location.
You need to check the Docker container logs, not redirected logs, because the supervisor prints messages only to the Docker container logs:

docker logs storagenode

See also:

Hello @arumes31,
Welcome to the forum!

This likely wouldn't help; the only change that could affect it is the image above. And I suspect the issue is that the drive on QNAP is mounted with noexec.

The base image contains neither storagenode nor storagenode-updater; it has only a downloader and a supervisor. It then runs storagenode-updater and storagenode; storagenode-updater will download new versions of storagenode and itself, update them, and restart. So the node will be updated again if it's eligible to be updated.
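As a quick check (a sketch only; the host path is an assumption, substitute your own data location), you can see what the downloader has placed into the bin subfolder and whether the files kept their executable bit:

# list the downloaded binaries and their permission bits
ls -l /share/Container/STORJ/config/bin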

The simplest solution would be to use a named Docker volume and mount it to /app/config/bin in the container. You may add this option to your docker run command before the image name:

...
-v storagenode-binaries:/app/config/bin \
...
storjlabs/storagenode:latest
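Docker creates the named volume automatically on the first run; if you prefer, you can create it explicitly beforehand and check where its data lives on the host (standard Docker CLI):

# create the volume up front and show its host "Mountpoint"
sudo docker volume create storagenode-binaries
sudo docker volume inspect storagenode-binaries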

I already looked for noexec; here is the complete mtab of this box.

none /new_root tmpfs rw,mode=0755,size=460M 0 0
/proc /proc proc rw 0 0
devpts /dev/pts devpts rw 0 0
tmpfs /tmp tmpfs rw,size=64M 0 0
tmpfs /dev/shm tmpfs rw 0 0
tmpfs /share tmpfs rw,size=16M 0 0
tmpfs /mnt/snapshot/export tmpfs rw,size=16M 0 0
/dev/md9 /mnt/HDA_ROOT ext4 rw,data=ordered,barrier=1,nodelalloc 0 0
cgroup_root /sys/fs/cgroup tmpfs rw 0 0
none /sys/fs/cgroup/memory cgroup rw,memory 0 0
cpu /sys/fs/cgroup/cpu cgroup rw,cpu 0 0
/dev/md13 /mnt/ext ext4 rw,data=ordered,barrier=1,nodelalloc 0 0
tmpfs /samba_third_party tmpfs rw,size=32M 0 0
none /sys/kernel/config configfs rw 0 0
tmpfs /tmp/wfm tmpfs rw,size=80M,mode=0777 0 0
cgroup /sys/fs/cgroup/systemd cgroup rw,none,name=systemd 0 0
tmpfs /samba tmpfs rw,size=64M 0 0
/dev/mapper/cachedev1 /share/CACHEDEV1_DATA ext4 rw,usrjquota=aquota.user,jqfmt=vfsv1,user_xattr,data=ordered,data_err=abort,delalloc,nopriv,nodiscard,noacl 0 0
tmpfs /tmp/default_dav_root tmpfs rw,size=1k 0 0
tmpfs /share/CACHEDEV1_DATA/.samba/lock/msg.lock tmpfs rw,size=48M 0 0
tmpfs /mnt/ext/opt/samba/private/msg.sock tmpfs rw,size=16M 0 0

Since there is no “defaults” entry and the exec permission is not listed, I would assume that exec is not allowed.
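A direct way to verify is a tiny test script on the data share (a hypothetical file name; with noexec the last step fails with “Permission denied”):

# write, mark executable, run, then clean up a throwaway script
printf '#!/bin/sh\necho exec works\n' > /share/CACHEDEV1_DATA/exec_test.sh
chmod +x /share/CACHEDEV1_DATA/exec_test.sh
/share/CACHEDEV1_DATA/exec_test.sh && rm /share/CACHEDEV1_DATA/exec_test.sh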

I also think that you cannot change the mount options on QNAP directly, am I right?
If so, the option with a Docker volume should help to solve the issue.

Yes. I tried to figure out a method to verify/change the exec flag, with no luck.

I double-checked and switched the container version back to :latest.
The container was continuously switching from “other” to “running” and so on.

I created a volume “storagenode-binaries” in the QNAP Container Station GUI and added "-v storagenode-binaries:/app/config/bin" to the run command.

I saw that the volume flag of “storagenode-binaries” switched to “in use”.

But there was no change in the behavior.

Container-Station-Log:

Sun, 22 Sep 2024 19:12:23 CEST  WARN    images/update_images.go:62      Get "http://unix.socket/1.0": dial unix /var/lib/lxd/unix.socket: connect: no such file or directory
Sun, 22 Sep 2024 19:12:23 CEST  INFO    images/top_image.go:31  started getting docker top images worker
Sun, 22 Sep 2024 19:12:23 CEST  INFO    network-cache/network.go:37     started network config worker
Sun, 22 Sep 2024 19:12:23 CEST  INFO    network-cache/vswitches.go:41   started virtual switch worker
Sun, 22 Sep 2024 19:12:23 CEST  INFO    network-cache/vlan.go:42        started vlan worker
Sun, 22 Sep 2024 19:12:23 CEST  INFO    gpu/gpu.go:37   started GPU worker
Sun, 22 Sep 2024 19:12:23 CEST  INFO    network-cache/iface.go:42       started network interface worker
Sun, 22 Sep 2024 19:12:23 CEST  INFO    analytics/analytics.go:22       started analytics worker
Sun, 22 Sep 2024 19:12:27 CEST  INFO    middleware/debug.go:37  request url: [GET] /container-station/ws/v3/containers/inspect/docker/5f9ad3947aca8a93adeb68149fbc86ac5533c291740375fac320d0b4d5cbc686
Sun, 22 Sep 2024 19:12:27 CEST  INFO    middleware/debug.go:37  request url: [GET] /container-station/ws/v3/containers/logs/docker/5f9ad3947aca8a93adeb68149fbc86ac5533c291740375fac320d0b4d5cbc686
Sun, 22 Sep 2024 19:12:42 CEST  INFO    middleware/debug.go:37  request url: [GET] /container-station/ws/v3/containers/logs/docker/5f9ad3947aca8a93adeb68149fbc86ac5533c291740375fac320d0b4d5cbc686
Sun, 22 Sep 2024 19:12:43 CEST  INFO    middleware/debug.go:37  request url: [GET] /container-station/ws/v3/containers/inspect/docker/5f9ad3947aca8a93adeb68149fbc86ac5533c291740375fac320d0b4d5cbc686
Sun, 22 Sep 2024 19:12:43 CEST  ERROR   resource/resource.go:207        get container cgroup data fail
Sun, 22 Sep 2024 19:12:50 CEST  ERROR   ws/container_inspect.go:144     failed to send inspect message:websocket: close sent
Sun, 22 Sep 2024 19:12:50 CEST  INFO    ws/container_inspect.go:152     close ws
Sun, 22 Sep 2024 19:13:03 CEST  WARN    containers/container.go:386     failed to get to container resource:failed to get resource from redis: redis: nil
Sun, 22 Sep 2024 19:13:08 CEST  ERROR   resource/resource.go:207        get container cgroup data fail
Sun, 22 Sep 2024 19:13:24 CEST  WARN    containers/container.go:386     failed to get to container resource:failed to get resource from redis: redis: nil
Sun, 22 Sep 2024 19:13:24 CEST  WARN    containers/container.go:386     failed to get to container resource:failed to get resource from redis: redis: nil
Sun, 22 Sep 2024 19:13:28 CEST  ERROR   resource/resource.go:207        get container cgroup data fail
Sun, 22 Sep 2024 19:13:37 CEST  INFO    middleware/debug.go:37  request url: [GET] /container-station/ws/v3/containers/stats/docker/5f9ad3947aca8a93adeb68149fbc86ac5533c291740375fac320d0b4d5cbc686
Sun, 22 Sep 2024 19:13:44 CEST  ERROR   ws/container_stats.go:75        failed to send container resource: failed to get resource from redis: redis: nil
Sun, 22 Sep 2024 19:13:44 CEST  WARN    containers/container.go:386     failed to get to container resource:failed to get resource from redis: redis: nil
Sun, 22 Sep 2024 19:13:44 CEST  WARN    containers/container.go:386     failed to get to container resource:failed to get resource from redis: redis: nil
Sun, 22 Sep 2024 19:13:46 CEST  INFO    middleware/debug.go:37  request url: [GET] /container-station/ws/v3/containers/logs/docker/5f9ad3947aca8a93adeb68149fbc86ac5533c291740375fac320d0b4d5cbc686
Sun, 22 Sep 2024 19:13:48 CEST  ERROR   resource/resource.go:207        get container cgroup data fail

Container-Mount-Inspector:

{
Destination:"/app/config/bin"
Driver:"local"
Mode:"z"
Name:"storagenode-binaries"
Propagation:""
RW:true
Source:"/share/CACHEDEV1_DATA/Container/container-station-data/lib/docker/volumes/storagenode-binaries/_data"
Type:"volume"
}

Again, I have no way to get any log entry from storagenode.

I just switched back to storjlabs/storagenode:0c21bd6 but kept the binary volume.
The container keeps running, but there are no logs.

Then I removed the "-v" parameter and the node instantly started generating logs.

:face_with_spiral_eyes:

The Container Station log is not helpful; I need the logs of the storagenode container only.
Could you please provide the last 20 lines of the storagenode container logs when you use the latest tag?
Please use SSH and the CLI; it seems the GUI didn't capture something.
To find the name of the container:

sudo docker ps -qf ancestor=storjlabs/storagenode

or just

sudo docker ps

Then to get logs:

sudo docker logs --tail 20 container_name_here
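If the container restarts too quickly to emit anything, you can also ask Docker for the last exit state (standard docker inspect; replace the container name as needed):

# print the exit code and any runtime error of the last run
sudo docker inspect -f '{{.State.ExitCode}} {{.State.Error}}' container_name_here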

Could you please post your docker run command? You may mask all private info.
I have a suspicion that you might have put it after the image name, not before. Or you connected the volume via the GUI; in that case you shouldn't add the -v parameter at all. The GUI would generate a correct docker run command for you.
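For reference, a minimal sketch of the placement rule: everything after the image name is passed to the container as arguments and is not parsed by Docker:

# correct: the -v option comes before the image name
sudo docker run -d ... -v storagenode-binaries:/app/config/bin storjlabs/storagenode:latest
# wrong: here -v becomes an argument of the container's entrypoint
sudo docker run -d ... storjlabs/storagenode:latest -v storagenode-binaries:/app/config/bin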

Hi!
When the error is occurring, I really have not one line of log from storagenode:
not in the GUI, not via docker logs on the CLI, nor in any files created by Docker on the filesystem.

For clarification, I never use the GUI to create the run command.

My actual run command:

docker run -d --restart unless-stopped --privileged \
    -p xx:28967:28967/tcp \
    -p xx:28967:28967/udp \
    -p xx:14002:14002 \
    -e WALLET="xx" \
    -e EMAIL="xx" \
    -e ADDRESS="xx:28967" \
    -e STORAGE="25TB" \
    -m 6144M \
    --user $(id -u):$(id -g) \
    --mount type=bind,source="/share/Container/STORJ/identity",destination=/app/identity \
    --mount type=bind,source="/share/Container/STORJ/config",destination=/app/config \
    --stop-timeout 300 \
    # here I added: -v storagenode-binaries:/app/config/bin \
    --name storagenode storjlabs/storagenode:0c21bd6

I now have a second QNAP NAS in the “other/running” loop.
0c21bd6 comes up instantly…

I hope we find the issue… There are several more nodes :frowning: