How to add a second node

Feel free to do so. Depending on whether your computer is really included in those networks, the instances sharing the same virtual LAN might be able to access your page. I don’t know whether that’s really an issue in your case. I would even expose them to the internet, if there were any use for it.

Your help is completely fine as it is :smiley: Despite my sabotage attempts :smiley:

Unfortunately, I never had storj docker containers starting correctly after a reboot or shutdown. I tried many things. They just stay dead. I think the default expectation is for storj to magically turn itself back on after a restart, but I never had this privilege. I don’t know whether it’s an issue with my docker version, my OS (Devuan, which is a Debian system without systemd), or something else. I wasted a lot of time trying to get storj working after reboot, to no avail, so I wrote this script, and it works now.

This script will do the following upon reboot or manual call:

docker stop -t 300 (nodename) (if the node is running)
docker rm (nodename) (if it exists)
docker run -d (all parameters)
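The sequence above can be sketched in Python. The node name and run arguments here are illustrative placeholders; the full set of parameters is in the complete script further down.

```python
import subprocess

def restart_node(name: str, run_args: str, dry_run: bool = True) -> list[str]:
    """Build (and optionally execute) the stop/rm/run sequence for one node."""
    commands = [
        f"docker stop -t 300 {name}",  # graceful stop, 5-minute timeout
        f"docker rm {name}",           # remove the stale container
        f"docker run -d {run_args} --name {name} storjlabs/storagenode:latest",
    ]
    if not dry_run:
        for cmd in commands:
            subprocess.run(cmd, shell=True)
    return commands
```

With `dry_run=True` it only returns the commands, which makes it easy to inspect what would be executed before letting it touch docker.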

It does the job; I tested it with many reboots. Does that sound correct?

Some RTFMs:

The --restart unless-stopped option in Docker provides automatic restart behavior, which can be useful for ensuring your nodes come back online after a reboot or a Docker daemon restart. Here’s an overview of the restart policy options you could use:

--restart unless-stopped:
Container restarts automatically unless explicitly stopped.
If your primary goal is to have the nodes running after reboots without manually starting them, this option is generally suitable.

Never worked for me. :frowning:

My next best could be:
--restart on-failure:10
That will restart the node’s docker instance if it crashes, up to a maximum of 10 retries, but it will not start the node after a reboot, since I am doing that with the script anyway.
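For illustration, a docker `--restart` value like `on-failure:10` is just a policy name plus an optional retry cap. A small helper (hypothetical, not part of the script) to split such a value:

```python
def parse_restart_policy(policy: str) -> tuple[str, int]:
    """Split a docker --restart value into (policy name, max retries).

    A retry cap only applies to 'on-failure'; 0 means no cap was given.
    """
    name, _, retries = policy.partition(":")
    return name, int(retries) if retries else 0
```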

Sorry, but that turns out to be incorrect. I removed it, and just realized the second node is failing:

QUIC is misconfigured. You must forward port 28967 for both TCP and UDP to enable QUIC.

See Step 3. Setup Port Forwarding - Storj Docs on how to do this

It seems SERVER_ADDRESS is mandatory to declare, unless you declare it in config.yaml. My node failed after I reverted my changes in the yaml file while trying to consolidate everything into the script. So now I brought STORJ_CONSOLE_ADDRESS back into the script, and it’s working. For simplicity, I will try to get everything into the script and keep all the nodes’ config files intact.

It sounds like you didn’t enable docker to start automatically.

This only means that you changed the server.address: option in config.yaml for that node. Otherwise it shouldn’t work, because the default value is

server.address: :28967
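If you want to check what a node’s config.yaml actually sets (a commented-out line means the default applies), a minimal stdlib-only scan could look like this. It is a deliberate simplification, not a full YAML parser:

```python
def read_server_address(config_text: str):
    """Return the active server.address value, or None if commented out or absent."""
    for line in config_text.splitlines():
        stripped = line.strip()
        # lines starting with '#' are comments, so the default applies
        if stripped.startswith("server.address:"):
            return stripped.split(":", 1)[1].strip()
    return None
```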

I’ve been on that page. I don’t have systemd, but SysVinit on Devuan system. Devuan is a Debian Stable fork but with systemd stripped out.
I also don’t have a containerd service in my system. The containerd package is installed though, and everything works, except autostart of storj.

Alright, that’s what counts: it works.

I’ve got Ubuntu and Debian, and both indeed always restart at reboot. So I had to make sure everything was mounted properly before the actual restart (although, if it wasn’t, there are some measures implemented by STORJ to prevent any harm).

Indeed seems to be a known Devuan problem:

That isn’t true, because you already forward that port by

Exactly the same thing @Alexey is saying.

So, what do your logs say?

And I can’t emphasize enough that you need to keep it as simple as possible: the example I gave you is also a second node on the same system. It always works fine, without any further ado or adaptations. I usually don’t even adapt any other options in the config.yaml.

Thanks for sharing. That’s what I suspected: not a storj issue, but an OS issue. Anyway, I found my way around it by using my own startup script.

My apologies if I was wrong, and thank you. I will dig more, to figure it out!

Agreed! I like my script, and I want deployment to be described in the following steps:

  • format a partition
  • add it to fstab and mount it
  • add partition path, ports and capacity into my script
  • run my script

That’s it. No meddling with config.yaml, that would be perfect.

For anyone interested, this is my script in the current form:

#!/usr/bin/env python3

import os
import subprocess

# Define the common parameters
WALLET = "x"
EMAIL = "x"
ADDRESS = "x"

# Free space of empty drive:
# df
# Available 3877849020 1k blocks = 3877 GB

# node name: (mount point, data port, web port, capacity)
NODES = {
    "PK2334PBJ8WX9T":   ("/mnt/PK2334PBJ8WX9T",  28967, 14002, "3867GB"),
    "WD-WCC7K3NHF837":  ("/mnt/WD-WCC7K3NHF837", 28968, 14003, "3867GB"),
    #"S301AT7P":        ("/mnt/S301AT7P",        28968, 14003, "3697GB"),
    #"S301AT9B":        ("/mnt/S301AT9B",        28969, 14004, "3697GB"),
    #"S30198AE":        ("/mnt/S30198AE",        28970, 14005, "3697GB"),
    #"S30198MA":        ("/mnt/S30198MA",        28971, 14006, "3697GB"),
}

# Get the IP address of the eth0 interface
def get_ip_address(interface='eth0'):
    try:
        ip_info = subprocess.check_output(f'ip addr show {interface}', shell=True).decode()
        for line in ip_info.splitlines():
            if 'inet ' in line:
                return line.split()[1].split('/')[0]
    except subprocess.CalledProcessError:
        return None

IP_ADDRESS = get_ip_address()

# Temporary override to 127.0.0.1
# IP_ADDRESS = "127.0.0.1"

# Get UID and GID
UID = subprocess.check_output('id -u', shell=True).decode().strip()
GID = subprocess.check_output('id -g', shell=True).decode().strip()

# Loop through each node and start the Docker container
for NODE_NAME, (MOUNT_LOCATION, NODE_DATA_PORT, NODE_WEB_PORT, STORAGE_CAPACITY) in NODES.items():
    ca_key_path = os.path.join(MOUNT_LOCATION, "identity/ca.key")
    
    if not os.path.isfile(ca_key_path):
        print(f"Error: ca.key file not found in {MOUNT_LOCATION}/identity. Skipping {NODE_NAME}.")
        continue

    # Attempt to remove the container if it exists
    try:
        existing_container = subprocess.check_output(f'docker ps -aq -f name={NODE_NAME}', shell=True).decode().strip()
        
        if existing_container:
            # Check if the container is running
            running_container = subprocess.check_output(f'docker ps -q -f name={NODE_NAME}', shell=True).decode().strip()
            if running_container:
                print(f"Stopping the running container: {NODE_NAME}")
                subprocess.run(f'docker stop -t 300 {NODE_NAME}', shell=True)

            print(f"Removing the container: {NODE_NAME}")
            subprocess.run(f'docker rm {NODE_NAME}', shell=True)
    except subprocess.CalledProcessError:
        pass  # If the container doesn't exist, ignore the error

    # Run the Docker container
    # restart options:
    # --restart unless-stopped (starts node after reboot, didn't work)
    # --restart on-failure:10 (restart on node crash, max 10 retries)
    print(f"Starting the Docker container for {NODE_NAME}...")
    command = (
        f'docker run -d --restart on-failure:10 --stop-timeout 300 --memory="4g" '
        f'-p "{NODE_DATA_PORT}:{NODE_DATA_PORT}/tcp" '
        f'-p "{NODE_DATA_PORT}:{NODE_DATA_PORT}/udp" '
        #f'-p "{IP_ADDRESS}:{NODE_WEB_PORT}:14002" '
        f'-p "{NODE_WEB_PORT}:14002" '
        f'-e STORJ_SERVER_ADDRESS=":{NODE_DATA_PORT}" '
        f'-e WALLET="{WALLET}" '
        f'-e EMAIL="{EMAIL}" '
        f'-e ADDRESS="{ADDRESS}:{NODE_DATA_PORT}" '
        f'-e STORAGE="{STORAGE_CAPACITY}" '
        f'--user "{UID}:{GID}" '
        f'--mount type=bind,source="{MOUNT_LOCATION}/identity",destination=/app/identity '
        f'--mount type=bind,source="{MOUNT_LOCATION}",destination=/app/config '
        f'--name "{NODE_NAME}" storjlabs/storagenode:latest'
    )

    # Print the constructed command
    print(f"Running command: {command}")
    
    # Execute the command
    subprocess.run(command, shell=True)

Still yet to determine if I need STORJ_SERVER_ADDRESS= or not with 100% default config.yaml.

You don’t seem to be listening…

command = (
        f'docker run -d --restart on-failure:10 --stop-timeout 300 --memory="4g" '
        f'-p "{NODE_DATA_PORT}:28967/tcp" '
        f'-p "{NODE_DATA_PORT}:28967/udp" '
        f'-p "{NODE_WEB_PORT}:14002" '
        f'-e WALLET="{WALLET}" '
        f'-e EMAIL="{EMAIL}" '
        f'-e ADDRESS="{ADDRESS}:{NODE_DATA_PORT}" '
        f'-e STORAGE="{STORAGE_CAPACITY}" '
        f'--user "{UID}:{GID}" '
        f'--mount type=bind,source="{MOUNT_LOCATION}/identity",destination=/app/identity '
        f'--mount type=bind,source="{MOUNT_LOCATION}",destination=/app/config '
        f'--name "{NODE_NAME}" storjlabs/storagenode:latest'
)

Is just fine and enough.

Besides, it is quite cumbersome to adapt your script for every new node.

I myself used in the past:

  • Mount all drives under storj/nd[0-9]*
  • For-each on all these folders
  • Data port = 28966 + number following the nd-foldername above (starting with 1).
  • Dashboard port = 14001 + number following the nd-foldername above (starting with 1).
  • Size as calculated above.

So I didn’t have to keep the arrays up-to-date.
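That numbering convention could be sketched like this; the base path and folder naming are assumptions based on the description above:

```python
import glob
import os

def node_ports(folder_name: str) -> tuple[int, int]:
    """Derive (data port, dashboard port) from a mount folder named nd<N>."""
    n = int(folder_name[2:])           # strip the 'nd' prefix
    return 28966 + n, 14001 + n        # nd1 -> 28967/14002, nd2 -> 28968/14003, ...

def discover_nodes(base: str = "/mnt/storj"):
    """Map each nd* mount folder under `base` to its derived ports."""
    return {
        os.path.basename(path): node_ports(os.path.basename(path))
        for path in sorted(glob.glob(os.path.join(base, "nd[0-9]*")))
    }
```

With this approach the script needs no per-node table at all; adding a drive is just mounting it under the next `nd<N>` folder.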

No, you don’t. As long as you don’t overcomplicate things. That’s only necessary if you weren’t using docker. With docker, it’s like you’re running several virtual systems/VMs, in which you manage the NAT with the -p option.

Why do you assign 4G to each docker instance? Mine use 250M at maximum…

I think you meant 28967 in both cases:

        f'-p "{NODE_DATA_PORT}:28967/tcp" '
        f'-p "{NODE_DATA_PORT}:28966/udp" '

Also, you removed a commented-out line; that line doesn’t affect anything.
And lastly, STORJ_SERVER_ADDRESS: yesterday my second node failed due to the lack of it. I will try again with the suggestions you made.

No special adaptation needed. Everything is declared in the table here:

NODES = {
    "PK2334PBJ8WX9T":   ("/mnt/PK2334PBJ8WX9T",  28967, 14002, "3867GB"),
    "WD-WCC7K3NHF837":  ("/mnt/WD-WCC7K3NHF837", 28968, 14003, "3867GB"),
    #"S301AT7P":        ("/mnt/S301AT7P",        28968, 14003, "3697GB"),
    #"S301AT9B":        ("/mnt/S301AT9B",        28969, 14004, "3697GB"),
    #"S30198AE":        ("/mnt/S30198AE",        28970, 14005, "3697GB"),
    #"S30198MA":        ("/mnt/S30198MA",        28971, 14006, "3697GB"),
}

I can add as many nodes as I want, easily. All I need to do beforehand is format the partition and mount it where it should be.
And each mount point is the serial number of the drive. If anything disappears, I will know exactly which drive died.
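A small pre-flight check along those lines (a hypothetical helper, not part of the script): skip any node whose mount location isn’t actually a mounted filesystem, so a dead or missing drive is caught before docker starts writing into an empty directory on the root disk.

```python
import os

def mounted_nodes(nodes: dict) -> dict:
    """Keep only nodes whose mount location is an actual mount point."""
    ready = {}
    for name, (mount, *rest) in nodes.items():
        if os.path.ismount(mount):
            ready[name] = (mount, *rest)
        else:
            print(f"Skipping {name}: {mount} is not mounted")
    return ready
```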

Good. And what do you use now, if anything?

Agreed :+1:

I will be happy to decrease it later. Better to declare 4G than not to declare anything at all.

This version works perfectly well with two nodes so far. Just restarted it, nodes got up right away. I made sure config.yaml files are all-default.

    command = (
        f'docker run -d --restart on-failure:10 --stop-timeout 300 --memory="1g" '
        f'-p "{NODE_DATA_PORT}:28967/tcp" '
        f'-p "{NODE_DATA_PORT}:28967/udp" '
        f'-p "{NODE_WEB_PORT}:14002" '
        f'-e WALLET="{WALLET}" '
        f'-e EMAIL="{EMAIL}" '
        f'-e ADDRESS="{ADDRESS}:{NODE_DATA_PORT}" '
        f'-e STORAGE="{STORAGE_CAPACITY}" '
        f'--user "{UID}:{GID}" '
        f'--mount type=bind,source="{MOUNT_LOCATION}/identity",destination=/app/identity '
        f'--mount type=bind,source="{MOUNT_LOCATION}",destination=/app/config '
        f'--name "{NODE_NAME}" storjlabs/storagenode:latest'
    )

Thank you for your invaluable help, friend. :pray:


Since I wanted my nodes to each have a local IP, I ended up using lxc containers, since docker doesn’t support local area networking out of the box. Otherwise you have to do complicated stuff, like assigning multiple LAN IPs to your ethernet adapter using macvlan. That solution broke every time the network went down for any reason, like simply rebooting the router.

I still use docker, but it would be more lightweight if I just used the binary. But then I’d also have to take care of the update process myself, so I’m not up to spending the time right now.


should be

f'-p "{NODE_DATA_PORT}:28967/udp" '

Correct, it was already noticed by the TS, and therefore not corrected afterwards. But for future reference, I will change it in the original post as well.


Thanks, I’ve read those threads, and I tried adding the option they arrived at to docker, re-launching the nodes with it, and then rebooting (due to system updates). Unfortunately, the nodes still didn’t start after the reboot. So I will stick to my trusted script. I have since added further new nodes to it, and everything works nicely.