VPN port forwarding for Pi nodes?


I think you’re onto something with this theory, and thankfully it helped me figure out the problem.

I had consistently been setting up my Wireguard connection on the Pi in advance of setting up Docker and Storj. As soon as I used sudo wg-quick down wg0 to cut the Wireguard connection and then reran the initial Docker setup for the Storj node container, it worked without any issues.

Then I turned the Wireguard connection back on with sudo wg-quick up wg0, and when I executed the initial Docker run command for the storage node, I ran into my earlier issues where I couldn’t get the web or CLI dashboard to start, and the container was running in limbo. So I wiped out the container, turned off Wireguard and redid setup and the initial Docker run.

This time, the Storj container went up, and I could access the dashboard completely fine, but the node was offline and QUIC was misconfigured. But as soon as I turned Wireguard back on, the dashboard showed that the node had snapped back online and QUIC was OK. (Thanks to @arrogantrabbit for the recommendation about the server address)

So, I guess the takeaway from this is: Wireguard can be installed and configured in advance, but turning it on has to be the absolute final step in the setup process, or else Storj won't start properly.
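As a minimal sketch of the order that worked for me (assuming the interface is wg0; the `...` stands in for your usual mounts, identity paths, and flags):

```shell
# 1. Tunnel DOWN for the one-time SETUP and the first run
sudo wg-quick down wg0
docker run --rm -e SETUP="true" ... storjlabs/storagenode:latest   # one-time SETUP only
docker run -d ... storjlabs/storagenode:latest                     # initial run

# 2. Tunnel UP as the absolute final step
sudo wg-quick up wg0
```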

I also set Wireguard to start automatically at boot in case of power outages with this: sudo systemctl enable --now wg-quick@wg0, but we'll have to see if this interferes with Storj starting up in Docker on a restart. If so, this setup will require manual intervention after every reboot.
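If it does interfere, one possible fix is a systemd drop-in that delays Docker until the tunnel is up. This is a hypothetical sketch, assuming the stock docker.service and the wg-quick@wg0 unit:

```ini
# /etc/systemd/system/docker.service.d/wait-for-wg.conf (hypothetical)
[Unit]
Wants=wg-quick@wg0.service
After=wg-quick@wg0.service
```

After creating the file, run sudo systemctl daemon-reload for the drop-in to take effect on the next boot.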


Just received an email that my node has gone offline on the Saltlake satellite, and QUIC is showing as misconfigured again. It seems as though the node is connecting without issue and holds a 100% score with all other satellites except the Tardigrade one, which has had a 0% score from the start.

That is strange. Perhaps there is something blocking traffic to that location. Are you behind any kind of firewall?

I’m fairly certain my home network is behind CG-NAT or Double NAT from the ISP, but I’ve disabled my firewall on the Oracle instance. I’m confused why that specific server is blocked if it’s using the same port as the other three.

Would my node be at risk of being disqualified or suspended completely for a 0% score on one satellite?

You can switch to podman, and use systemd dependencies to ensure the node starts after the network is up. Here is the reference on how to switch to podman: Running auto-updatable services in rootless containers with podman on Oracle Linux/RHEL/Fedora with SELinux enabled | Trinkets, Odds, and Ends
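For illustration, the kind of systemd dependency that guide leans on can be sketched like this (hedged: the unit name container-storagenode is an assumption, e.g. as produced by podman generate systemd, and wg-quick@wg0 matches the tunnel discussed above):

```ini
# Hypothetical drop-in, e.g.
# ~/.config/systemd/user/container-storagenode.service.d/override.conf
[Unit]
Wants=network-online.target
After=network-online.target wg-quick@wg0.service
```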


No; disqualification and suspension are per satellite, so a 0% score only affects that one. Ultimately it would be good to figure out why traffic is blocked on just that one. Have you checked the logs for Saltlake entries to see if there is any traffic at all?

Some firewalls, as well as ISPs, could block traffic from a host if they think it isn't legitimate traffic. You might contact your ISP and ask if they blocked it.

My ISP is Spectrum (Hello from Columbus!) so it’s certainly possible they may block it. I can call them and ask for more information, but they don’t really budge much when I ask them for anything.

Checking my log file, there are errors sprinkled in for various piece IDs similar to these. Nothing that outright names Saltlake or Tardigrade, but you might know better if that satellite ID matches:

{"log":"2024-07-09T23:56:42Z\u0009ERROR\u0009piecestore\u0009upload failed\u0009{\"Process\": \"storagenode\", \"Piece ID\": \"MSFV67H7JBOV4MQIDO5SCAE23S6PP5PCCSIEPDJRYPP4SEEX6M2A\", \"Satellite ID\": \"12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S\", \"Action\": \"PUT\", \"Remote Address\": \"10.66.66.1:47232\", \"Size\": 262144, \"error\": \"manager closed: unexpected EOF\", \"errorVerbose\": \"manager closed: unexpected EOF\\n\\tgithub.com/jtolio/noiseconn.(*Conn).readMsg:225\\n\\tgithub.com/jtolio/noiseconn.(*Conn).Read:171\\n\\tstorj.io/drpc/drpcwire.(*Reader).read:68\\n\\tstorj.io/drpc/drpcwire.(*Reader).ReadPacketUsing:113\\n\\tstorj.io/drpc/drpcmanager.(*Manager).manageReader:229\"}\n","stream":"stdout","time":"2024-07-09T23:56:42.278873402Z"}
{"log":"2024-07-09T21:08:36Z\u0009ERROR\u0009piecestore\u0009upload failed\u0009{\"Process\": \"storagenode\", \"Piece ID\": \"HETU5SF5EH3PNBFJG5HMM2XXRV4AMKOT6Z2KZQ5VIILQZK3UNHUA\", \"Satellite ID\": \"12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S\", \"Action\": \"PUT\", \"Remote Address\": \"10.66.66.1:54942\", \"Size\": 196608, \"error\": \"manager closed: unexpected EOF\", \"errorVerbose\": \"manager closed: unexpected EOF\\n\\tgithub.com/jtolio/noiseconn.(*Conn).readMsg:225\\n\\tgithub.com/jtolio/noiseconn.(*Conn).Read:171\\n\\tstorj.io/drpc/drpcwire.(*Reader).read:68\\n\\tstorj.io/drpc/drpcwire.(*Reader).ReadPacketUsing:113\\n\\tstorj.io/drpc/drpcmanager.(*Manager).manageReader:229\"}\n","stream":"stdout","time":"2024-07-09T21:08:36.518564699Z"}

Those messages are from US1 satellite. Saltlake ID is: 1wFTAgs9DP5RSnCqKV1eLf6N9wtk4EAtmN5DpSxcs8EjT69tGE

Thank you, that helped me find it in the logs. Looks like there are errors reported every 15-30 minutes referencing it.

{"log":"2024-07-09T21:15:04Z\u0009ERROR\u0009contact:service\u0009ping satellite failed \u0009{\"Process\": \"storagenode\", \"Satellite ID\": \"1wFTAgs9DP5RSnCqKV1eLf6N9wtk4EAtmN5DpSxcs8EjT69tGE\", \"attempts\": 11, \"error\": \"ping satellite: rpc: tcp connector failed: rpc: read tcp 172.17.0.2:40832-\u003e34.94.153.46:7777: read: connection reset by peer\", \"errorVerbose\": \"ping satellite: rpc: tcp connector failed: rpc: read tcp 172.17.0.2:40832-\u003e34.94.153.46:7777: read: connection reset by peer\\n\\tstorj.io/common/rpc.HybridConnector.DialContext.func1:190\"}\n","stream":"stdout","time":"2024-07-09T21:15:04.439801212Z"}

Some additional info - This is from my Pi-hole set up as the DNS gateway for my router. It looks like there are repeated requests for saltlake.tardigrade.io in my approved traffic. I have the Pi hosting the Storj node set up to bypass the Pi-hole’s adlist so it shouldn’t interfere with its connection.

Exactly; this one will not work, unfortunately. You need to provide it as a command-line option after the image name, i.e.

```shell
...
storjlabs/storagenode:latest \
--server.address=10.66.66.2:28967
```

Or use an environment variable using this pattern:

```shell
-e "STORJ_SERVER_ADDRESS=10.66.66.2:28967"
```

However, this is a bad example anyway, because -e ADDRESS would override the last one, unlike the command-line option.


So, this only means that you do have a public IP. If so, why run a VPN at all? Leave it working without overcomplicating things; remove the excess layer of complexity.
I hope by "initial docker command" you didn't mean the SETUP step? It should be executed only once per identity, otherwise there is a great chance of destroying a working node.

So, this only means that you do have a public IP. If so, why run a VPN at all? Leave it working without overcomplicating things; remove the excess layer of complexity.

It’s possible I have a public IP, but speaking from experience testing Storj without this setup, I also have a draconian ISP that blocks traffic on virtually all ports. Using the Docker run command with Wireguard turned off works to start the Docker container, but then the node dashboard shows it as offline. Using a Wireguard tunnel to Oracle gives me a public IP where I have the ability to open ports.

And I was referring to the initial Docker run command, not setup. Can confirm that setup has only been executed once.

Thank you for providing the command-line option. To clarify, is this something that would be added in the Docker run command, or is it added somewhere else, like the CLI?

It’s for storagenode, neither docker nor the CLI, but it can be specified in your docker run command after the image name (to pass it to storagenode). You may also change this option in the config.yaml file instead.
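For reference, the same setting in config.yaml would look something like this (a sketch; 10.66.66.2:28967 is the tunnel address used earlier in the thread, and the node needs a restart after editing):

```yaml
# config.yaml: the address (and port) the storagenode server binds to
server.address: 10.66.66.2:28967
```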

but you said previously

"Worked without any issues" confused me, because "without any issues" means not just a running container, but also that it's ONLINE and QUIC is OK.

I may try to go that way if I can briefly stop and remove my storagenode's Docker container, then run it again with that option added after the image name. Thank you for sharing that.

Sorry, I should clarify, you are right that I ran setup more than once, but it only fully executed once. What I was documenting there was this: I had Wireguard running before I ran the initial setup command, and when I ran setup in that scenario, it would freeze and fail to finish the setup, seen here:

Connecting to version.storj.io (version.storj.io)|34.173.164.90|:443... connected.

But when I turned Wireguard OFF, and then ran setup, setup executed without any issues. I wasn’t trying to say that the Storj node container was online with QUIC OK, just that setup had completed properly.

I observed a similar situation with the Docker run command, where having Wireguard turned on when I first ran it would prevent it from starting, but having Wireguard off allowed the Docker container to actually start up, albeit showing offline in the dashboard. And when I turned Wireguard back on, that’s what finally allowed my Storj node to go online, albeit going back and forth between QUIC OK and Misconfigured.

This makes me think that you have something conflicting in your network when you bring up the Wireguard client.
Check that they don't overlap by IP or port anywhere.
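A few generic diagnostics that might help spot such an overlap while the tunnel is up (standard Linux tooling, nothing Storj-specific; 10.66.66.2 and port 28967 are the values from this thread):

```shell
ip addr show wg0              # confirm the tunnel IP (e.g. 10.66.66.2)
ip route                      # look for subnets overlapping Docker's 172.17.0.0/16
sudo ss -tulpn | grep 28967   # check which process is bound to the node port
sudo wg show                  # handshakes and the tunnel's own listen port
```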

Could that also potentially be the reason that I am unable to connect to the Saltlake server? Error from logs below:

{"log":"2024-07-09T21:15:04Z\u0009ERROR\u0009contact:service\u0009ping satellite failed \u0009{\"Process\": \"storagenode\", \"Satellite ID\": \"1wFTAgs9DP5RSnCqKV1eLf6N9wtk4EAtmN5DpSxcs8EjT69tGE\", \"attempts\": 11, \"error\": \"ping satellite: rpc: tcp connector failed: rpc: read tcp 172.17.0.2:40832-\u003e34.94.153.46:7777: read: connection reset by peer\", \"errorVerbose\": \"ping satellite: rpc: tcp connector failed: rpc: read tcp 172.17.0.2:40832-\u003e34.94.153.46:7777: read: connection reset by peer\\n\\tstorj.io/common/rpc.HybridConnector.DialContext.func1:190\"}\n","stream":"stdout","time":"2024-07-09T21:15:04.439801212Z"}



Have you tried different servers from your vpn provider? :thinking:

Since it works for the other satellites it seems to be something between the vpn server and saltlake satellite.

I think I am locked in with the server I have. Oracle makes you choose a region where your instance will be hosted when you first set up your account. I think you can add on other regions, but it costs money. If the Storj node is working for free on three out of four satellites, I’m good with that.

Also @arrogantrabbit, if my node dashboard is correct, I've accrued about 1 terabyte of bandwidth usage in 5 days. So, at the current rate, I will average about 6 terabytes of bandwidth usage in a month, which keeps me within the 10 terabytes of free data egress Oracle allots per month. I should be able to avoid added data fees as long as I keep that rate.
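The extrapolation above can be checked with a quick bit of shell arithmetic (numbers straight from the post):

```shell
# ~1 TB of bandwidth in 5 days, extrapolated over a 30-day month,
# versus Oracle's 10 TB/month free egress allowance
tb_used=1
days_elapsed=5
days_in_month=30
free_egress_tb=10

monthly=$(( tb_used * days_in_month / days_elapsed ))
echo "Projected: ${monthly} TB/month (free allowance: ${free_egress_tb} TB/month)"
```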