Understanding Docker port publishing

Craig · January 1, 2021, 8:32pm

I’ve been doing some reading today on how I would go about adding a second HDD and node to my existing Linux Docker setup (2 nodes on 1 machine). One thing I know would need to be handled carefully is the use of ports. I didn’t see a good explanation of what the port list ("-p") parameters in our Docker commands represent. I’ve read up on Docker and think I now understand. Thinking of a “container” as a sort of packaged virtual machine helped me wrap my head around this. I’m putting this out there for critique (I’ll edit if needed, so please provide feedback) and for reference by the community. @SGC, maybe something for your flight manual.

Each storage node container has two fixed ports on which it listens: 28967 for node traffic from the Storj network and 14002 for the dashboard. The Docker -p (--publish) parameter lets us define forwarding to these ports at the machine level. The structure of the parameter is as follows:
-p [MachineIP:]MachinePort:ContainerPort
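A minimal sketch of how the three parts of a -p value line up, assuming only the two forms used in this post (MachinePort:ContainerPort and MachineIP:MachinePort:ContainerPort). The helper name parse_publish is hypothetical, and Docker accepts more forms (port ranges, a /udp suffix, IPv6 addresses) that this sketch simply rejects:

```python
from typing import NamedTuple, Optional


class PortPublish(NamedTuple):
    """One -p value split into its parts."""
    machine_ip: Optional[str]  # None: no IP given, Docker publishes on all host interfaces
    machine_port: int
    container_port: int


def parse_publish(spec: str) -> PortPublish:
    """Split a -p value of the form [MachineIP:]MachinePort:ContainerPort.

    Only the two forms used in this thread are handled; anything else
    (ranges, "/udp", IPv6, an empty machine port) raises ValueError.
    """
    parts = spec.split(":")
    if len(parts) == 2:
        ip, machine, container = None, parts[0], parts[1]
    elif len(parts) == 3:
        ip, machine, container = parts
    else:
        raise ValueError(f"unsupported -p value: {spec!r}")
    return PortPublish(ip, int(machine), int(container))


if __name__ == "__main__":
    for spec in ["28967:28967", "127.0.0.1:14002:14002", "28970:28967"]:
        print(spec, "->", parse_publish(spec))
```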

Because the container ports are defined in the node code, these will always be 28967 and 14002 unless the code changes.

The machine values control how communications from outside the container (at the machine level) reach the container’s internal ports; this is how we publish the ports. For a single node, the documented and recommended settings use the same machine ports as the container ports. If you will be using the dashboard from the same machine hosting the node, this is where you would use the machine IP string “127.0.0.1:” to allow loopback.

If you intend to run multiple nodes on the same machine, or for some reason already have the recommended ports in use on the machine, you will need to specify different ports on the machine side. This changes the ports you use to access the dashboard and the ports to which your firewall forwards node traffic. For example, if you already have a node on the machine set up with:
-p 28967:28967
-p 14002:14002

Your second node will need to have different published machine ports defined. You might use:
-p 28970:28967
-p 14010:14002
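Docker will normally refuse to start a container whose machine port is already published by another container on the same host. A quick check like the sketch below can catch that before starting a new node. The node names and bindings are hypothetical, and treating a missing IP (or 0.0.0.0) as clashing with any IP on the same port is an assumption that matches how a wildcard bind usually behaves on Linux. Container ports can repeat freely, since with Docker's default bridge networking each container has its own network namespace.

```python
from itertools import combinations
from typing import Dict, List, Optional, Tuple

# (machine_ip, machine_port). None means no IP was given in -p,
# so Docker listens on every host interface for that port.
Binding = Tuple[Optional[str], int]
WILDCARDS = {None, "0.0.0.0"}


def clashes(a: Binding, b: Binding) -> bool:
    """Same machine port, and either side is a wildcard or both name the same IP."""
    (ip_a, port_a), (ip_b, port_b) = a, b
    if port_a != port_b:
        return False
    return ip_a in WILDCARDS or ip_b in WILDCARDS or ip_a == ip_b


def find_conflicts(nodes: Dict[str, List[Binding]]) -> List[Tuple[str, str, int]]:
    """Return (node, other node, port) for every pair of clashing machine bindings."""
    found = []
    for (name_a, binds_a), (name_b, binds_b) in combinations(nodes.items(), 2):
        for a in binds_a:
            for b in binds_b:
                if clashes(a, b):
                    found.append((name_a, name_b, a[1]))
    return found


if __name__ == "__main__":
    ok = {
        "storagenode": [(None, 28967), (None, 14002)],
        "storagenode2": [(None, 28970), (None, 14010)],
    }
    bad = {
        "storagenode": [(None, 28967), (None, 14002)],
        "storagenode2": [(None, 28967), (None, 14010)],
    }
    print(find_conflicts(ok))   # → []
    print(find_conflicts(bad))  # → [('storagenode', 'storagenode2', 28967)]
```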

So to access the dashboard of your first node from a separate machine, you would use:
http://<node hostname/IP>:14002

But to access your second node’s dashboard, you would use:
http://<node hostname/IP>:14010

Similarly, your firewall forwarding rules would need to specify internal ports of 28967 for your first node and 28970 for your second. Remember that it is your firewall’s external ports that are referenced in the “-e ADDRESS” parameter; changing your Docker published ports does not directly affect this value.
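The external ports on the router are not specified above, so the sketch below picks hypothetical ones (28968 for the second node, deliberately different from its machine port 28970), plus a hypothetical LAN IP and DNS name, just to show which number goes where:

```python
from dataclasses import dataclass


@dataclass
class NodePorts:
    name: str
    external_port: int  # port opened on the router/firewall (hypothetical values below)
    machine_port: int   # machine side of -p, e.g. 28970 in "-p 28970:28967"


def forward_rule(node: NodePorts, machine_lan_ip: str) -> str:
    """Router rule: external port -> machine LAN IP and published machine port."""
    return f"{node.external_port} -> {machine_lan_ip}:{node.machine_port}"


def address_value(node: NodePorts, external_host: str) -> str:
    """-e ADDRESS value: external host plus the router's EXTERNAL port.
    The machine port never appears here."""
    return f"{external_host}:{node.external_port}"


if __name__ == "__main__":
    # Hypothetical LAN IP, DNS name and external ports.
    nodes = [
        NodePorts("storagenode", external_port=28967, machine_port=28967),
        NodePorts("storagenode2", external_port=28968, machine_port=28970),
    ]
    for node in nodes:
        print(node.name, "| forward", forward_rule(node, "192.168.1.20"),
              "| ADDRESS", address_value(node, "mynode.example.com"))
```

Changing the machine port of storagenode2 would change its forward rule but leave its ADDRESS value untouched.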

What do you think, sirs?

SGC · January 1, 2021, 8:52pm

I was going to say that the information is here…

https://documentation.storj.io/setup/cli/storage-node

but yeah, it really isn’t, or it’s too generally described, and it also confused me when I first got started…
I went with different IP addresses tho… so each Docker container is on its own IP address, which correlates with the ports.

I suppose running them all on the same IP would be kinda neat, because then one could just switch the final number in the URL to access them… but is that possible? I really need to try out running multiple containers in one Docker lol, and not multiple Dockers lol

I’ll give it a whirl tomorrow. Currently I’ve got each storagenode in its own container with its own IP address… so I don’t really have those issues with the ports… I really like that it would be so easy to switch between the different dashboards and such…
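On a single Docker host, a roughly similar "one IP per node" layout is possible by filling in the [MachineIP:] part of -p, assuming the host already has those addresses assigned to its network interface. This is a different technique from running separate Docker instances; a small sketch with hypothetical IPs:

```python
from typing import List

NODE_PORT = 28967       # fixed inside each container
DASHBOARD_PORT = 14002  # fixed inside each container


def publish_on_ip(machine_ip: str) -> List[str]:
    """-p flags that publish one node's default ports on a single host IP.

    Because every node binds a different machine_ip, all of them can keep
    the default machine ports without clashing.
    """
    return [
        "-p", f"{machine_ip}:{NODE_PORT}:{NODE_PORT}",
        "-p", f"{machine_ip}:{DASHBOARD_PORT}:{DASHBOARD_PORT}",
    ]


def dashboard_url(machine_ip: str) -> str:
    """Dashboard address for a node published on machine_ip."""
    return f"http://{machine_ip}:{DASHBOARD_PORT}"


if __name__ == "__main__":
    # Hypothetical addresses already assigned to the host.
    for ip in ("192.168.1.21", "192.168.1.22"):
        print(" ".join(publish_on_ip(ip)), "|", dashboard_url(ip))
```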

Alexey · January 1, 2021, 11:43pm

I would not recommend publishing your private information (the dashboard) with unrestricted access. Please do not publish the dashboard’s port to the Internet; use remote access instead: https://documentation.storj.io/resources/faq/how-to-remote-access-the-web-dashboard
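One common way to reach a dashboard that is published only on 127.0.0.1 is an SSH local port forward; the linked article may describe its own variant, so treat this as a sketch. The helper only builds the OpenSSH command line, and the login pi@node.lan is hypothetical:

```python
import shlex
from typing import List


def ssh_tunnel_cmd(ssh_target: str, local_port: int = 14002, dashboard_port: int = 14002) -> List[str]:
    """OpenSSH command that forwards local_port on this machine to the
    dashboard on the node host.

    -N  run no remote command; only hold the tunnel open
    -L  local forward, [bind_address:]port:host:hostport; the host part
        (127.0.0.1 here) is connected to from the node side, so it reaches
        a dashboard published with -p 127.0.0.1:14002:14002
    """
    return ["ssh", "-N", "-L", f"{local_port}:127.0.0.1:{dashboard_port}", ssh_target]


if __name__ == "__main__":
    # Hypothetical login; run the printed command, then browse http://localhost:14002
    print(shlex.join(ssh_tunnel_cmd("pi@node.lan")))
```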

xavidpr4 · January 2, 2021, 1:09am

That kind of remote access is difficult to do with a smartphone.
I have multiple dispersed nodes, and I managed to play with the iptables of each to allow only my public IP. Connecting from my house works fine now; however, connecting with a smartphone from outside my house won’t work.

A login on the dashboard screen would be great…

Regards.

kevink · January 2, 2021, 6:21am

The easier way (and maybe generally useful) is to run a VPN on your node, e.g. using https://github.com/kylemanna/docker-openvpn, which runs OpenVPN in Docker. Worked well for me.

Alexey · January 2, 2021, 9:49am

I use the Termius and JuiceSSH clients on my smartphone; nothing difficult. One of these clients (Termius) is described in the article linked above.

SGC · January 2, 2021, 9:53am

I just use the dash over my LAN…
My stability has been getting better and better all the time… so I don’t really pay it that much attention any more… or I try not to… when I’m at the computer it’s quite distracting lol.

Pac · January 2, 2021, 12:50pm

I use JuiceSSH too, and it works really well

Craig · January 4, 2021, 5:11am

So in looking at my node today via docker ps -a, I noticed how the result listed the ports. I’m assuming that the displayed 0.0.0.0:14002->14002/tcp indicates that the hardware is listening on port 14002 across all IPs configured on the device. If that is correct, then I suppose I should change:

to:

Can someone confirm whether my new interpretation is correct: if the IP isn’t specified, the port is open to the local network, but if the loopback address is specified, the port listens only on that IP, meaning you need to be on the hardware itself to access it?

I’ve got this open (within my network, not outside my firewall) in my own setup so I can quickly check the dashboard from whatever computer I’m using at home at the time. If I’m remote and want to check, I’ve already got an SSH tunnel available to a separate Linux server in my network; I’ll remote into that device and then access the dashboard from there. If I feel I need to bump up the security another step, I’ll use the directions Alexey has provided. For me this is just a hobby right now, putting some spare hardware I have kicking around to use. This is more a project in expanding my knowledge than a money-making endeavor; the possible extra bucks that come in are just a nice bonus.
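For reading the PORTS column of docker ps, a small sketch: the regex covers single-port entries like the 0.0.0.0:14002->14002/tcp shown above and skips anything else (port ranges, ports that are exposed but not published). The labels returned by reachability() are a rough summary of the interpretation kevink confirms in the next post, not Docker's own wording:

```python
import ipaddress
import re
from typing import List, NamedTuple

# One published mapping as docker ps prints it, e.g. "0.0.0.0:14002->14002/tcp".
# The greedy IP group also copes with IPv6 forms such as ":::14002->14002/tcp".
_MAPPING = re.compile(r"^(?P<ip>.+):(?P<host>\d+)->(?P<container>\d+)/(?P<proto>\w+)$")


class Published(NamedTuple):
    host_ip: str
    host_port: int
    container_port: int
    proto: str


def parse_ports(column: str) -> List[Published]:
    """Parse a PORTS value; entries that do not match are skipped."""
    mappings = []
    for entry in column.split(","):
        m = _MAPPING.match(entry.strip())
        if m:
            mappings.append(Published(m["ip"], int(m["host"]), int(m["container"]), m["proto"]))
    return mappings


def reachability(host_ip: str) -> str:
    """Rough summary of who can connect to a port published on host_ip."""
    addr = ipaddress.ip_address(host_ip)
    if addr.is_unspecified:  # 0.0.0.0 or ::, i.e. every interface, loopback included
        return "all interfaces (this machine and the LAN)"
    if addr.is_loopback:  # 127.0.0.1 or ::1
        return "this machine only"
    return f"connections to {host_ip} only"


if __name__ == "__main__":
    for p in parse_ports("0.0.0.0:28967->28967/tcp, 127.0.0.1:14002->14002/tcp"):
        print(f"{p.host_ip}:{p.host_port} -> {p.container_port}/{p.proto}: {reachability(p.host_ip)}")
```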

kevink · January 4, 2021, 7:51am

Your understanding is correct.

Craig · January 4, 2021, 3:05pm

Thanks kevink, learned something new. I’m quite green with Docker and only slightly experienced with Linux but I’m having fun with learning more about this as I go. I could probably run circles around you guys on an IBM i (AS/400) tho!

Unfortunately, it looks like the window for editing my initial post has passed, as the edit function no longer shows as available, so having the post later in the thread will have to suffice.

deathlessdd · January 4, 2021, 4:27pm

Do you mean DOS? Some days I kinda miss DOS, but Linux is so much better than DOS…

Craig · January 4, 2021, 5:43pm

Nope, talking about the IBM midrange platform that now runs on their Power architecture.

The command line of DOS was where I cut my teeth and favoring the command line to a mouse helped me to learn the IBM i. Also sets me up as a good fit for Linux, the next area I’ve been exploring.

deathlessdd · January 4, 2021, 5:46pm

Oh OK, for some reason I was thinking of something else; I hadn’t played around with this OS. I did look at it, and it looks pretty interesting for its time.

SGC · January 6, 2021, 9:41am

AS/400 was an antique like 20 years ago tho… but it being a mainframe, I suppose those are a pain to switch out. I was actually taught on those… can’t remember anything about it tho, aside from it being terminal coding / DB type stuff.

Anyhow, so I got around to testing out running multiple storagenodes on the same IP address… seems kinda cool… ofc with the new multi dashboard it might not be needed, but the port configuration sure is confusing now.

And it works… I do see some other advantages of putting everything on one IP tho… but it’s also more difficult to move stuff around, and for port mapping, because each node isn’t on its own IP that’s easily redirected by changing the DHCP server entry.

I’ll go over it and then add it to the SNO flight manual.

Craig · January 6, 2021, 2:58pm

I went through some steps over the weekend to prepare for adding a second node to the RPi where I’m running my first one. It’s not really necessary, but it satisfies some OCD itches: things like renaming the drive’s mount point from “storj” to “storj1”, and updating my script that starts and defines the node in Docker so it’s named “storagenode1” rather than the documented “storagenode”. This way, when I add a second drive and set up the node, it will be mounted at “storj2” and named “storagenode2” in Docker. Each mapped port will be incremented by 1, and I’ll either add a new single port forward in my firewall or change my existing storj port forward to a port range forward that includes both. (I’m running DD-WRT on an Asus RT-N66U router.) Then I’ll also be ready for the eventual addition of “storagenode3”, since I have a 3rd drive sitting around. I’m not sure I’d ever add a 4th node on that RPi (it has 4 cores, so it could meet the requirement of 1 core per drive), but it should be simple to do if I wanted.
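The naming and port scheme above can be sketched as a small generator. It only builds the --name and -p part of docker run; everything else a real run needs (mounts, -e settings such as ADDRESS, the image name) is intentionally left out, and the optional dashboard_ip parameter is a hypothetical addition for anyone who wants to restrict the dashboards to loopback:

```python
from typing import List, Optional

NODE_PORT = 28967       # fixed node port inside every container
DASHBOARD_PORT = 14002  # fixed dashboard port inside every container


def node_args(n: int, dashboard_ip: Optional[str] = None) -> List[str]:
    """--name and -p flags for node number n (1-based): storagenode1,
    storagenode2, ... with each machine port incremented by 1 per node."""
    offset = n - 1
    ip_prefix = f"{dashboard_ip}:" if dashboard_ip else ""
    return [
        "--name", f"storagenode{n}",
        "-p", f"{NODE_PORT + offset}:{NODE_PORT}",
        "-p", f"{ip_prefix}{DASHBOARD_PORT + offset}:{DASHBOARD_PORT}",
    ]


def forward_range(count: int) -> str:
    """Machine-port range a single router range forward would need to cover."""
    return f"{NODE_PORT}-{NODE_PORT + count - 1}"


if __name__ == "__main__":
    for n in (1, 2, 3):
        print(" ".join(node_args(n)))
    print("port range forward:", forward_range(3))  # → port range forward: 28967-28969
```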

Pac · January 8, 2021, 10:15pm

That’s right, as per the official docs. However, many of us (me included) have successfully run more nodes on an RPi 4B, mainly because the more nodes you run, the less activity each of them has to handle. So all in all, the CPU is enough even if it doesn’t have as many cores as there are nodes.

Each node does require a bit of RAM, though, so that’s something to keep an eye on, especially if it’s not the 4GB version.

Do not take that as a recommendation, just as feedback

Craig · January 9, 2021, 3:28pm

Crap, I completely forgot about the RAM. It’s only an RPi 3 B+, so with one node running and the --memory=800m parm as recommended in one of the guides, I’m about maxed out on the 1GB of RAM this unit has. I guess my next move with this particular node will be to migrate from the 1TB drive I started with to one of the 2TB drives I now have available. Oh well. I’m looking to be a node op using only spare hardware I’ve got lying around the house, so that might be as far as I go for a bit.

I’ve just removed and restarted my container without the --memory parm to see how it does sorting out memory allocation on its own.

Pac · January 9, 2021, 7:25pm

Then I guess it has 1GB of RAM only. That’s not much, so I would probably not run 10 nodes on this machine…

This said, memory usage can go up if activity is high and disks can’t keep up. Otherwise, it tends to stay pretty low per node.

Right now everything is quiet, and here is the total RAM usage on my RPi 4B (raspbian headless) running 6 nodes:

top - 19:23:21 up 20:23,  1 user,  load average: 0.84, 0.73, 0.68
Tasks: 167 total,   1 running, 166 sleeping,   0 stopped,   0 zombie
%Cpu(s):  3.4 us,  2.0 sy,  0.0 ni, 87.2 id,  7.0 wa,  0.0 hi,  0.3 si,  0.0 st
MiB Mem :   3906.0 total,    670.0 free,    331.0 used,   2905.1 buff/cache
MiB Swap:      0.0 total,      0.0 free,      0.0 used.   3441.1 avail Mem

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
 1885 root      20   0  818936  38120  15116 S  14.2   1.0 143:25.82 storagenode
 2344 root      20   0  818684  34588  15316 S   4.0   0.9  35:07.65 storagenode
 2038 root      20   0  819008  35836  15284 S   2.6   0.9  87:58.73 storagenode
 2199 root      20   0  818564  34096  15456 S   2.0   0.9  55:34.90 storagenode
[...]

Less than 400MiB in total (OS included), and roughly 40MB per node.
But we should keep in mind that the more RAM there is, the better it will perform overall, as Linux uses the rest of the free RAM for caching (you can see I have almost 3GB of cached RAM above).

But still, considering these numbers, I guess an RPi 3B+ could run 2 or 3 nodes. Maybe more… If someone who has already tried this could confirm, that’d be cool
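To redo the per-node estimate from the top output above, a small sketch that pulls the RES column of the storagenode rows. Only four of the six nodes are shown there (the rest is elided), the numbers are an idle snapshot, and the sketch assumes top printed RES as plain KiB values with no m/g suffix; as noted above, usage can rise when the disks can't keep up:

```python
from typing import List

# The storagenode rows from the top output above, copied verbatim.
TOP_ROWS = """\
 1885 root      20   0  818936  38120  15116 S  14.2   1.0 143:25.82 storagenode
 2344 root      20   0  818684  34588  15316 S   4.0   0.9  35:07.65 storagenode
 2038 root      20   0  819008  35836  15284 S   2.6   0.9  87:58.73 storagenode
 2199 root      20   0  818564  34096  15456 S   2.0   0.9  55:34.90 storagenode
"""

RES_COLUMN = 5  # PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND


def res_mib(rows: str, command: str = "storagenode") -> List[float]:
    """RES of each matching process, converted from KiB to MiB."""
    values = []
    for line in rows.splitlines():
        fields = line.split()
        if fields and fields[-1] == command:
            values.append(int(fields[RES_COLUMN]) / 1024)
    return values


per_node = res_mib(TOP_ROWS)
average = sum(per_node) / len(per_node)
print(f"{len(per_node)} nodes, average RES {average:.1f} MiB")  # → 4 nodes, average RES 34.8 MiB
```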

Craig · January 9, 2021, 8:03pm

Thanks for the input, Pac. I might go ahead and try it myself then; the 2TB drive is already just sitting aside doing nothing. If I start up the new node now, that gets me moving towards being out of the “held back” window that much quicker. I know that as both nodes would be behind one public IP, I shouldn’t expect my overall traffic to change much, if at all, and my existing node will likely see a reduction in ingress as data shifts to balance between the nodes.

It has only been a few hours but I haven’t seen any ill effects from my removal of the --memory=800m parm in my existing node.