New node offline for its first 24h, disqualified?

Hello,

Could you please tell me if the new node I created is now disqualified? I made a mistake when I set it up. I launched it a first time, but I thought the connection had failed (I probably didn't wait long enough; I was tired and convinced it wasn't working), so I stopped it and went to sleep. I only tried again the next day and was surprised to see it had actually been working since the first attempt. Then I noticed in the dashboard that the uptime had been affected and is at 50% or less on most of the satellites, even though the node was working normally during those first days, except that I never saw any audit requests, and the dashboard just shows 100% for each satellite.

This evening, an hour or two ago, after it sent a last order, it stopped working, and when I restart the node it shows as offline, even though the port is working normally. Can I conclude that it is disqualified?

Node is 12B5suGhHi6Wfqx562gRRxm9zD4nuvb2Tonrk9tzkERnzSgRngf.

Regards.

Hi Floxit, I spoke with one of the engineers here and they said that disqualified nodes are still online. They also asked whether you had checked your dashboard, because you can open it to see if you have been disqualified.

I'm glad you asked this question, because it's a good reminder (for us inside the Storj company) to have a look at the documentation here. We are doing a lot of things fast, so it's good to have reminders to make sure the documentation is keeping up with our pace and being updated.


Thank you Jocelyn. Yes, I checked the dashboard directly. I don't know what happened, but after the last order was sent, the node just stopped receiving new activity. Before I stopped and restarted the node, the dashboard was showing it as "online", with nothing about disqualification or any other issue (the node was still checking the version normally, but nothing else, while my other node was working normally). Then I restarted it, and now the node stays in "offline" status… I rebooted the router and the computer; it has been offline for at least one hour.

About uptime, I guess I had better start another node because of the first 24h offline, if that uptime counter is not reset in an upcoming update and if the node is still offline tomorrow?



(Screenshot: Capture3)

I wanted to change the internal address or port because I was trying to run this node on my other Ethernet interface (I have two ports, one Intel and one Realtek device), which connects to another router, with another Internet connection and IP. It had been working these last days, so I assumed it worked, but both nodes were using the same internal IP. I used "-p 192.168.1.2:28967:28967 -p 192.168.1.2:14002:14002" hoping it would use my other Internet interface. The node has its own unique identity as well. I think it was working properly, because the traffic was going through the other Internet connection (and, on Windows, through the other Ethernet card, as the Task Manager shows).

I know it is better to run on separate hardware, but the drives are connected to the same machine, and that machine is also the most secure one in the house, with a UPS backup. I wanted to try this option before thinking about buying a small dedicated machine like a Pi 4, and because I want to use that second connection from my Windows machine anyway. Also, each node has a dedicated hard drive, so performance is not a problem.

EDIT: I stopped the first node and relaunched the second, and it went back online again… I think my setup is just not right, because both nodes may be sharing the same internal private address, which is probably a very bad thing, I guess? I'm not sure the internal address is bound to my second Ethernet interface, and I didn't find a way to change it (I tried changing the port to 7779 in the settings to avoid that kind of conflict, and replacing 127.0.0.1 with 192.168.1.2, which is my second local IP; the first/default one, associated with the other router, is 192.168.178.2. Both failed, and I read that 7778 is hardcoded)… Maybe I had better stop that second node… It is also working more slowly than the first one, even though the connection speed is about the same. I don't know if that is because the node is new with a low reputation, or if it is actually not working at all and the data is being split between the two nodes instead of each running as a standalone node, which was my goal. But I think the internal address/port could be the problem, if 127.0.0.1 uses the default card and a conflict sometimes takes down the internal server of a node?

Well, sorry, it is actually my fault for setting things up like that… At least, with a proper identity for each node, I hope they are working individually and not splitting the data as if I had installed two nodes behind the same Internet connection, which is not the case and not what I wanted. The goal was to legitimately route the traffic of this new node through the other Internet connection, like a standalone node.

From your screenshot:
Public server address ::28967
This means the node listens on all network interfaces. If the second node is set up this way, you get a conflict.
You need to change this port or bind to a specific IP. Changing the port is simpler.
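For example, for the second node you could put something like this in its config.yaml (key name from memory, check your own file):

server.address: :28968

Then map the same port in the docker run command (-p 192.168.1.2:28968:28968) and use it in the external ADDRESS as well, so the satellites can reach the node.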

That way you make the inbound traffic work properly, but not the outbound. All your nodes will use the default route for outbound traffic. To fix this you need more knowledge about networking and routers.
I also have two providers and a few nodes on one server, but in my case both providers are connected to one Linux router which makes the routing decisions for inbound and outbound traffic for each storagenode.
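As a very rough sketch of the idea on Linux (the IPs, interface name and table number here are just examples): policy routing uses a separate routing table per uplink, plus a rule that selects the table based on the source address.

ip route add default via 192.168.1.1 dev eth1 table 100
ip rule add from 192.168.1.2/32 table 100

With that, packets leaving from 192.168.1.2 go out through the second provider instead of the default route.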

7778 is not hardcoded. This port is for the dashboard; you can specify it with the storagenode dashboard --address command.
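For example, if you move the second node's private API to another port (via server.private-address in its config, if I remember the key name correctly), you open its dashboard with something like:

storagenode dashboard --address 127.0.0.1:7779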

If anyone is interested, I can describe how to set up a multihomed network on Linux for storagenodes in another post.

Thank you Krey.

Oh, actually, that might be it; maybe the other network had an issue, since it only happened after a few days. But it looks like it is working fine again. I think the storagenode replaces that default ::28967 with my run command, which is -e ADDRESS="floxit2.ddns.net:28967" and -p "192.168.1.2:28967:28967". The two Internet connections work properly on Windows 10: I can already connect remotely to sessions on all the other machines, or stream with OBS over the other network while I play online on the main connection. Docker was installed with this setup; I assume it uses the main connection by default, but it does bind the local ports correctly for each dashboard.
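For reference, the full run command for this second node is roughly the following (line breaks added for readability; the wallet, email and mount paths are placeholders here, and anything not relevant is left out):

docker run -d --restart unless-stopped --name storagenode2 \
  -p 192.168.1.2:28967:28967 \
  -p 192.168.1.2:14002:14002 \
  -e ADDRESS="floxit2.ddns.net:28967" \
  -e WALLET="0x…" -e EMAIL="…" -e STORAGE="8TB" \
  --mount type=bind,source="<identity folder>",destination=/app/identity \
  --mount type=bind,source="<storage folder>",destination=/app/config \
  storjlabs/storagenode:beta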

When I check the GUI dashboard and the CLI dashboard, both work, in PowerShell and in the browser, and I can see the dashboards of the two nodes individually with no issue. I have basic networking knowledge; I already managed another project on Linux/Ubuntu and on a Raspberry Pi (which is ultimately the alternative if this second node turns out to be a mess). So yes, I read a bit about how to create networks in Docker, but I also read that Docker Desktop is quite different from Linux because it deals with the Windows networking. I guess the outbound is working if I am able to get both nodes back online with normal activity? I can see the download/upload traffic using the other connection, and only for the operations shown in the logs, while the main node continues to handle its own traffic normally. But I can also see that my oldest node gets faster traffic and the new node is slower (I guess because the new one also has less data and fewer download requests).

The only thing I don't see on this new node is GET_AUDIT requests. Maybe it is just because the node is new and I haven't received any audit requests yet?

One thing I did not do, though, was specify my main local IP for the first node, to make sure it listens on the other local interface. That was maybe the reason for the conflict, since it listens on all networks by default (I noticed that when I had only one node, I could see its dashboard through both local addresses).

No, it doesn't. Look for "public server started on" in your logs.
If this line is the same for both nodes, that is bad and you need to fix it.
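For example, something like this (in PowerShell, since you are on Windows):

docker logs storagenode 2>&1 | Select-String "public server started"
docker logs storagenode2 2>&1 | Select-String "public server started"

The two addresses printed there should be different.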

What command do you use to open each CLI dashboard? Do you see a different node ID on each?

node1:
docker exec -it storagenode /app/dashboard.sh --color
docker logs -f --tail 100 storagenode
GUI: (screenshot)

node2:
docker exec -it storagenode2 /app/dashboard.sh --color
docker logs -f --tail 100 storagenode2
GUI: (screenshot)

Yes, I missed the fact that you are using Docker. It creates its own subnet and adds another NAT translation.
But again, it is better to add -p 192.168.x.x:28967:28967 with its own IP in each docker run command.


I agree, and that is actually what I did with -p "192.168.1.2:28967:28967", but I forgot to add the proper address for the first node. It is also possible that the problem came from outside at that time (the other router comes from my ISP and is old; it hasn't been updated for years and the advanced/custom settings are hidden, like packet priority, system monitoring and logs, so I don't fully trust it).

But it would be great to write a little guide for your Linux setup, as you suggested. I think it would be helpful, because not everybody wants to buy another dedicated machine and monitor it remotely. I keep that option in mind if the troubles continue (and would monitor it with PuTTY), but I prefer to keep my drives locally attached over SATA on my Windows machine, because I keep part of the two 12 TB drives for myself at the moment (8 TB goes to Storj on each) and I don't like the idea of accessing them remotely through Linux (with the potential problems and possibly lower speed). It is also the most stable and secure machine I have here right now, with the UPS backup (though yes, in terms of stability, Linux rocks, I agree with you!).

My eventual upgrade would be two new 12 TB drives to complete my rack, spanned virtually to get 2x 24 TB (with at least 16 TB for Storj on each, if necessary). I don't really care much about backups: Storj has stated they are not necessary for the network (because of erasure coding, multiple copies, well, you know it!). Worst case I lose the node, and I'll start another one if I have no time to make a full backup to recover it. I don't think RAID is worth it if we don't use the hardware for ourselves. I personally bought a mini server to connect a cheap refurbished tape drive from eBay for my long-term cold backups (essentially our photos/videos), plus DVD backups.

Thanks Krey. :+1:

The problem is now fixed. For people reading this topic later: the reason was probably a port/subnet conflict, because this is a second node running in Docker Desktop (on Windows), with two physical Ethernet cards connected to two separate routers and two Internet connections from two different ISPs. The node apparently worked, but it went offline randomly, sometimes only after a couple of days (no error, but no activity in the logs except the version checks, and an "offline" status in the dashboard).

In the listening server address in the config file, I changed the port for node 2 to ::28968. To keep things "logical" in my mind, I also changed the external/public port to 28968 (also updated in the router, of course). The logs now show it listening on that port. In the run command, I changed the mapping to -p 192.168.1.2:28968:28968. And the other node is now listening on its own subnet as well (192.168.178.2:28967).
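So, roughly, the final setup looks like this (only the parts relevant here; node 1's hostname replaced with a placeholder):

node 1: config listens on :28967, run command has -p 192.168.178.2:28967:28967, public address <node 1 hostname>:28967
node 2: config listens on :28968, run command has -p 192.168.1.2:28968:28968, public address floxit2.ddns.net:28968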

Since then, it works smoothly without interruption, so the problem was obviously the port conflict. (It was not possible to add any local or public address in the config file besides that port; you can only change the port, at least in my case, otherwise at startup it reports that it can't bind the address to the port, even if it is the actual subnet where the port is listening.)

It was actually simple, but I had read contradictory (and probably obsolete, from old versions) information on this forum saying "do not change the port", when that was actually the thing to do.

The uptime was severely affected during the first days of running, though, so I hope it will recover normally with time. The new node is still quite quiet, but I think that is normal. I have also received my first, rare audit requests, so that is okay.

Thanks again for your help.
Cheers!

You can do the port mapping this way: -p 192.168.178.2:28967:28967; in this case the node will bind only to 192.168.178.2.
Do the same for the other node. However, Windows will mix the traffic for both network interfaces anyway, and your nodes would still go offline from time to time. It just can't divide the traffic by design; it is usually not suitable for routing purposes, Linux would be better.


Yes, that is actually the command I added for the first node, and for the other ports too. I checked with docker ps, and I can see each node listening properly on its own subnet and port. I can also see in the Windows Task Manager that the traffic transits individually, with the upload/download matching the current request, so it looks really fine. (For my personal use of the desktop I also only use the main connection; the other connection is for connecting remotely to other machines on that network, though I have already used it in OBS for streaming, since you can attach an individual interface in that software.) No errors in the logs, except that the first node already had an issue before I installed this second node, with all the GET_REPAIR requests, since I updated the source line storjlabs/storagenode:alpha to beta (and added the GUI port binding, available since the latest updates, to see it in my browser). It looked like the database crashed when I was following the logs at startup; after a moment it erased the db files automatically and the node started again, but the reported storage size is now wrong.
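For reference, a quick way to see the bindings (the --format part is optional):

docker ps --format "table {{.Names}}\t{{.Ports}}"

which should show each node mapped to its own IP and port (192.168.178.2:28967 and 192.168.1.2:28968).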

The error is "file does not exist", or this one, which I couldn't find on the forum:

could not get hash and order limit {“error”: “v0pieceinfodb error: sql: no rows in result set”, “errorVerbose”: “v0pieceinfodb error: sql: no rows in result set\n\tstorj.io/storj/storagenode/storagenodedb.(*v0PieceInfoDB).Get:132\n\tstorj.io/storj/storagenode/pieces.(*Store).GetV0PieceInfo:539\n\tstorj.io/storj/storagenode/pieces.(*Store).GetHashAndLimit:352\n\tstorj.io/storj/storagenode/piecestore.(*Endpoint).doDownload:494\n\tstorj.io/storj/storagenode/piecestore.(*drpcEndpoint).Download:396\n\tstorj.io/storj/pkg/pb.DRPCPiecestoreDescription.Method.func2:838\n\tstorj.io/drpc/drpcserver.(*Server).doHandle:175\n\tstorj.io/drpc/drpcserver.(*Server).HandleRPC:153\n\tstorj.io/drpc/drpcserver.(*Server).ServeOne:114\n\tstorj.io/drpc/drpcserver.(*Server).Serve.func2:147\n\tstorj.io/drpc/drpcctx.(*Tracker).track:51”}

The last one is a consequence of the missing piece.


Much better to remove Docker and set up all the addresses in the config.yaml. There are at least two addresses: the public address that is reported to the satellite, and the listening address which the storagenode process binds to for inbound packets. It is a simpler and more transparent configuration.
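If I remember the key names correctly, that would be something like this in config.yaml (with your own hostname and IP, of course):

contact.external-address: floxit2.ddns.net:28968
server.address: 192.168.1.2:28968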
