Node suddenly goes offline

slonick81 · August 26, 2025, 1:57pm

I’ve been running a node in an LXC container on Proxmox since last October. Yesterday it went offline for no visible reason (i.e. I wasn’t updating software or messing with the server in general).

The Docker container is running:

root@storj:~# docker ps
CONTAINER ID   IMAGE                          COMMAND                  CREATED         STATUS          PORTS                                                                                                                                         NAMES
3bdb216fc959   storjlabs/storagenode:latest   "/entrypoint --opera…"   10 months ago   Up 13 seconds   0.0.0.0:14002->14002/tcp, :::14002->14002/tcp, 0.0.0.0:28967->28967/tcp, :::28967->28967/tcp, 0.0.0.0:28967->28967/udp, :::28967->28967/udp   storagenode

The services inside the container are running as well:

root@storj:~# docker top storagenode                                 
UID                 PID                 PPID                C                   STIME               TTY                 TIME                CMD
root                22450               22429               0                   13:57               ?                   00:00:00            /usr/bin/python3 /usr/bin/supervisord -c /etc/supervisor/supervisord.conf
root                22472               22450               0                   13:57               ?                   00:00:00            /bin/bash /usr/bin/stop-supervisor
root                22473               22450               0                   13:57               ?                   00:00:00            /app/config/bin/storagenode run --config-dir config --identity-dir identity --version.server-address=https://version.storj.io --storage.allocated-disk-space=10TB --contact.external-address=<address>:28967 --operator.email=<my mail> --operator.wallet=0xBC9c92ABf0De49e0C6df4Be31B023eb9D925915b --operator.wallet-features=zksync-era
root                22474               22450               0                   13:57               ?                   00:00:00            /app/config/bin/storagenode-updater run --binary-location /app/config/bin/storagenode --config-dir config --identity-dir identity --version.server-address=https://version.storj.io

storagenode diag prints this:

root@storj:~# docker exec storagenode /app/config/bin/storagenode diag
2025-08-26T13:59:53Z    INFO    Anonymized tracing enabled      {"Process": "storagenode"}
Error: stat /root/.local/share/storj/storagenode: no such file or directory
storage node directory doesn't exist /root/.local/share/storj/storagenode

But the dashboard is not accessible and I’m getting “your node is down” emails.

What should I do, and in which direction should I dig?

RecklessD · August 26, 2025, 9:59pm

Looks like your data is not mounted

arrogantrabbit · August 26, 2025, 10:48pm

Or the user account the node is running as is no longer root.

It is recommended practice to keep the configuration folder in the same place as the storage.

Also, don’t run the node as root. It is not necessary.

xgDkAbzkp9yi · August 27, 2025, 4:34am

Check host dmesg and other logs to find out the reason for the unmount, or other data inconsistency.
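A minimal sketch of that log check, assuming a Linux host with dmesg and journalctl available. The helper name storage_events and its keyword list are illustrative and not from the thread:

```shell
#!/usr/bin/env bash
# Hypothetical helper: keep only kernel log lines that hint at storage trouble.
# The keyword list is an assumption; extend it for your filesystem.
storage_events() {
  grep -iE 'i/o error|remount|read-only|unmount|ext4|zfs|xfs' || true
}

# Typical use on the Proxmox host (needs root):
#   dmesg -T | storage_events
#   journalctl -k --since "2 days ago" | storage_events
```

An empty result only means none of these keywords appeared; it does not prove the disk was healthy.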

slonick81 · August 27, 2025, 8:41am

It turned out I had to set a DNS server IP in the LXC container settings. The field is optional and was empty by default. I set it to 8.8.8.8 and the node came online. Why did it become necessary after 10 months of smooth operation? Who knows.

“storagenode diag” still produces the same output, BTW:
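One way to confirm which resolver the container actually ends up with, sketched under the assumption that it uses a standard /etc/resolv.conf. The helper name list_nameservers is hypothetical:

```shell
#!/usr/bin/env bash
# Print the nameserver addresses from a resolv.conf-style file.
# Defaults to /etc/resolv.conf; pass another path as $1.
list_nameservers() {
  awk '$1 == "nameserver" { print $2 }' "${1:-/etc/resolv.conf}"
}

# Inside the LXC container:
#   list_nameservers                 # empty output means no resolver is configured
#   getent hosts version.storj.io    # non-zero exit means the lookup failed
```

On a stock Proxmox host the same setting should also be reachable from the command line with something like pct set <vmid> --nameserver 8.8.8.8, where <vmid> is the container ID. Treat that as an assumption about your Proxmox version and check pct help set first.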

root@storj:~# docker exec storagenode /app/config/bin/storagenode diag
2025-08-27T08:35:25Z    INFO    Anonymized tracing enabled      {"Process": "storagenode"}
storage node directory doesn't exist /root/.local/share/storj/storagenode
Error: stat /root/.local/share/storj/storagenode: no such file or directory

Go figure…

hwm.land · August 27, 2025, 8:51am

Can you elaborate a bit on how/where it’s done?

slonick81 · August 27, 2025, 9:06am

Sure

arrogantrabbit · August 27, 2025, 3:36pm

This makes zero sense. Looks like you have multiple issues. DNS settings don’t make files in the user’s local home unreadable. Restarting the container after setting DNS is what made a difference. This time. It will fail again, because you did not find, let alone fix, the root cause.

slonick81 · August 27, 2025, 5:57pm

Look, during two days of digging into it I’ve restarted the container about two dozen times and rebooted the whole server twice. It didn’t help. And this thing with DNS is repeatable: I changed it back and forth several times to be sure.

And about this message:

Error: stat /root/.local/share/storj/storagenode: no such file or directory
storage node directory doesn't exist /root/.local/share/storj/storagenode

AFAIK it’s about mounting the “identity” folder, according to the Docker config:

                "Type": "bind",
                "Source": "/root/.local/share/storj/identity/storagenode",
                "Destination": "/app/identity",
                "Mode": "",
                "RW": true,
                "Propagation": "rprivate"

It does exist

root@storj:~# ls -la /root/.local/share/storj/identity/storagenode/
total 36
drwxr--r-- 2 root root    8 Oct  5  2024 .
drwxr--r-- 3 root root    4 Oct  5  2024 ..
-rw-r--r-- 1 root root  558 Oct  5  2024 ca.1728140999.cert
-rw-r--r-- 1 root root 1092 Oct  5  2024 ca.cert
-rw------- 1 root root  241 Oct  5  2024 ca.key
-rw-r--r-- 1 root root 1100 Oct  5  2024 identity.1728140999.cert
-rw-r--r-- 1 root root 1634 Oct  5  2024 identity.cert
-rw------- 1 root root  241 Oct  5  2024 identity.key

and is mounted inside the docker container

root@storj:~# docker exec storagenode ls -la /app/identity
total 36
drwxr--r-- 2 root root    8 Oct  5  2024 .
drwxrwxrwx 1 root root    4 Oct  5  2024 ..
-rw-r--r-- 1 root root  558 Oct  5  2024 ca.1728140999.cert
-rw-r--r-- 1 root root 1092 Oct  5  2024 ca.cert
-rw------- 1 root root  241 Oct  5  2024 ca.key
-rw-r--r-- 1 root root 1100 Oct  5  2024 identity.1728140999.cert
-rw-r--r-- 1 root root 1634 Oct  5  2024 identity.cert
-rw------- 1 root root  241 Oct  5  2024 identity.key
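The same check can be run over every bind mount at once instead of one path at a time. This is a rough sketch: check_mount_sources is a hypothetical helper, and it assumes the paths contain no whitespace.

```shell
#!/usr/bin/env bash
# Read "source destination" pairs on stdin and flag sources missing on the host.
# Assumes paths contain no whitespace.
check_mount_sources() {
  local src dst
  while read -r src dst; do
    [ -n "$src" ] || continue          # skip blank lines
    if [ -e "$src" ]; then
      echo "ok      $src -> $dst"
    else
      echo "MISSING $src -> $dst"
    fi
  done
}

# Feed it the container's mounts (container name "storagenode" as in the thread):
#   docker inspect -f '{{range .Mounts}}{{println .Source .Destination}}{{end}}' storagenode \
#     | check_mount_sources
```

A MISSING line for the storage or config mount would support the "data is not mounted" theory. All ok lines point the investigation elsewhere.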

Why should any utility try to poke at an external path from inside the Docker container to begin with?

arrogantrabbit · August 27, 2025, 6:53pm

It’s not an external path. By default, if nothing is configured in config.yml, or not passed as arguments to storagenode, it will indeed be looking for identity under ~/.local/share/storj/identity/<name>. It just happens to be the same in the container as on your host because in both cases username is root (seriously, don’t run node as root), but it’s not reaching outside – that is simply impossible.
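To make the coincidence concrete, here is an illustrative sketch that rebuilds the two default paths quoted in this thread from $HOME. The function names are made up for the example, and the path layout is taken from the posts above rather than from the storagenode source:

```shell
#!/usr/bin/env bash
# Illustrative only: rebuild the default paths quoted in this thread from $HOME.
default_config_dir()   { echo "$HOME/.local/share/storj/storagenode"; }
default_identity_dir() { echo "$HOME/.local/share/storj/identity/$1"; }

# root's home is /root on the host and in the container, so both sides print
# the same string even though it names two separate filesystems.
#   default_config_dir
#   default_identity_dir storagenode
```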

The fact that it attempts to look there in the container is a strong signal that either your other docker run arguments are messed up, or the config file is unreadable for whatever reason, because otherwise it would be looking for it under /app.

Understand that while DNS could be the culprit if you mount some network share by unicast names, specifying a public DNS server would not fix it: a public resolver such as 8.8.8.8 does not know about your network layout.

Show your full container invocation and/or LXC config and node invocation. It should closely match the Storage Node page in the Storj Docs. Also check permissions on the config file. If you have AppArmor/SELinux enabled, then it would be a whole other can of worms.

(However, inside LXC, it should be indistinguishable from full Linux, so if that does not work, it only means LXC still sucks, and you should migrate to a proper OS that just works instead of tilting at windmills. I’m talking FreeBSD & jails. (The degree of contempt I feel towards the majority of Linux is indescribable.) As a stop-gap measure you can scrap LXC and run the storj container in rootless Podman. This is a supported configuration and should work well.)

Alexey · August 28, 2025, 3:50am

To do this, you also need to add the option to specify where the configuration file is located, i.e.

docker exec storagenode /app/config/bin/storagenode diag --config-dir config

It’s a literal command, so you can copy and execute it as is. Here config is a folder named config in the current directory, because the image has its workdir set to /app. So you may also specify it as --config-dir /app/config to avoid using a relative location.

because if you do not provide a path, it will try to use the default location, which is

in your case.
It’s easy to check:

docker exec -it storagenode /app/config/bin/storagenode setup --help | grep config-dir

slonick81 · August 28, 2025, 3:33pm

I tried it. It loads the config (with both the relative and the absolute path), but the output of plain “diag” at the end doesn’t change.

root@storj:~# docker exec storagenode /app/config/bin/storagenode diag --config-dir config
2025-08-28T15:22:18Z    INFO    Configuration loaded    {"Process": "storagenode", "Location": "/app/config/config.yaml"}
2025-08-28T15:22:18Z    INFO    Anonymized tracing enabled      {"Process": "storagenode"}
                                             Satellite|      Total|         Put|         Get|   Delete|   Audit Get|   Repair Get|Repair Put
    1wFTAgs9DP5RSnCqKV1eLf6N9wtk4EAtmN5DpSxcs8EjT69tGE|   13.2 GiB|     6.8 GiB|     2.7 GiB|      0 B|    15.2 MiB|      1.4 GiB|2.1 GiB
   121RTSDpyNZVcEU84Ticf2L1ntiuUimbWgfATz21tuvgk3vzoA6|    1.2 TiB|   671.9 GiB|    98.6 GiB|      0 B|    56.5 MiB|    425.0 GiB|70.6 GiB
   12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S|    9.4 TiB|     7.5 TiB|     1.4 TiB|      0 B|    29.4 MiB|    267.9 GiB|232.8 GiB
   12L9ZFwhzVpuEKMUNUqkaTLGzwY9G24tbiigLiXpmZWKwmcNDDs|    6.0 TiB|     4.3 TiB|   257.5 GiB|      0 B|    31.3 MiB|      1.2 TiB|243.9 GiB
root@storj:~# docker exec storagenode /app/config/bin/storagenode diag --config-dir /app/config
2025-08-28T15:23:33Z    INFO    Configuration loaded    {"Process": "storagenode", "Location": "/app/config/config.yaml"}
2025-08-28T15:23:33Z    INFO    Anonymized tracing enabled      {"Process": "storagenode"}
                                             Satellite|      Total|         Put|         Get|   Delete|   Audit Get|   Repair Get|Repair Put
    1wFTAgs9DP5RSnCqKV1eLf6N9wtk4EAtmN5DpSxcs8EjT69tGE|   13.2 GiB|     6.8 GiB|     2.7 GiB|      0 B|    15.2 MiB|      1.4 GiB|2.1 GiB
   121RTSDpyNZVcEU84Ticf2L1ntiuUimbWgfATz21tuvgk3vzoA6|    1.2 TiB|   671.9 GiB|    98.6 GiB|      0 B|    56.5 MiB|    425.0 GiB|70.6 GiB
   12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S|    9.4 TiB|     7.5 TiB|     1.4 TiB|      0 B|    29.4 MiB|    267.9 GiB|232.8 GiB
   12L9ZFwhzVpuEKMUNUqkaTLGzwY9G24tbiigLiXpmZWKwmcNDDs|    6.0 TiB|     4.3 TiB|   257.5 GiB|      0 B|    31.3 MiB|      1.2 TiB|243.9 GiB
root@storj:~# docker exec storagenode /app/config/bin/storagenode diag
2025-08-28T15:23:38Z    INFO    Anonymized tracing enabled      {"Process": "storagenode"}
storage node directory doesn't exist /root/.local/share/storj/storagenode
Error: stat /root/.local/share/storj/storagenode: no such file or directory

It’s somewhat expected, because the node has been loading the correct config all along, even before I issued these commands.

Alexey · August 29, 2025, 6:45am

You need to add --config-dir every time. The option doesn’t modify anything, so it’s not persistent.
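If typing the option every time gets tedious, a small shell function on the host can add it for you. This is only a convenience sketch: the function name sn is hypothetical, while the container name, binary path and config directory are the ones shown earlier in this thread.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper: run a storagenode subcommand in the container and
# always append --config-dir, so the default location is never used.
# Container name and binary path are the ones shown in this thread.
sn() {
  docker exec storagenode /app/config/bin/storagenode "$@" --config-dir /app/config
}

# Usage:
#   sn diag
```

The function lives only in the current shell session unless you add it to your shell profile.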