Satellite online < 100%

deathlessdd · December 27, 2020, 3:31pm

Unfortunately, I wasn't doing any testing, and I wasn't tinkering with my node either. A USB flash drive actually died completely. Luckily I was home and have plenty of spare USB flash drives lying around. So anyone running an RPi off an SD card or a USB flash drive should always be prepared for it to die on them. Some server RAM also died on me, go figure, all around the same time.

SGC · December 27, 2020, 4:15pm

I think there are specific types of flash drives these days designed for running operating systems on an RPi and similar devices.
Regular flash drives are mostly meant for storage, and many of them are pretty terrible in several respects, such as raw IOPS and how quickly they respond to requests…
I have a USB 3 one that is fast, but it won't work for latency-sensitive workloads like a TV, because there is some delay between a request being sent and the drive responding; after that it does fast transfers… not sure about the IOPS, though.
It takes a second or two, if not more, before it wakes up…

I also saw something about USB 3 drives running TrueNAS, which was very interesting: apparently, in most cases, they overheat and burn themselves out. The answer seems to be to use USB 2 flash drives, though those are of course even worse than USB 3 in some respects.

I'm not really that well versed in all that, but I will say that running SSDs over USB works excellently.
I don't really use USB flash drives for anything more than moving stuff around and installing an OS from time to time.

I plan on trying some PXE and related stuff now that I have pfSense as the DHCP server, so I should be able to network boot from the BIOS, but I'm not sure how, or whether, that works on an RPi and similar boards…
Sadly, I don't have any RPis to tinker with. For the similar boards I've been looking at, I wanted a lot of analog I/O, because I want to use them as controllers for temperature sensors, light switches and such… and it's a bother if all the I/O is digital.

Here is the video on the TrueNAS USB 3 issue, if you want it… I hadn't closed it yet because I hadn't gotten around to finishing it, lol… after a week.

Onet · December 27, 2020, 5:36pm

Hello,

Lots of new answers. OK, I understand the 12-hour delay. I have now made some modifications to my network. So my next question is: how can I be sure that my node is fully operational and online?

If the status is Online and the port check shows "Port 28967 is open.", can I be sure that everything is working? Do you have any tools or suggestions for running an external check to validate that everything is OK?

Thanks.
Olivier

SGC · December 27, 2020, 9:03pm

Personally, I like to check the logs; the entries will mostly be upload started, uploaded, download started, downloaded, or deletions.

There shouldn't be too many errors, but there may be some… usually nothing to worry about too much…
Some uploads or downloads will be canceled if your storagenode or internet connection isn't fast enough to send or receive the data in time… these are marked as errors, even though they really aren't.

It will also log an error if a connection is dropped…

But if Online shows green on the dashboard when you refresh the page in your browser,
then it's usually online…

And of course you can see on the dashboard graphs that you are using bandwidth. It takes a long time to accumulate any significant amount, but there will always be some bandwidth usage… that's also a good indicator that it's working correctly.

The thing to watch out for is the audit score… if that starts to drop, the storagenode is in trouble. That's really the alpha and omega… the audit score must never drop.

If it's receiving and sending data and the audit score is 100%, then it's most likely fine.
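The log check described above can be automated with a small script that counts transfer events. This is a minimal Python sketch, assuming each log line contains the event name ("uploaded", "download started", etc.) as a whitespace-separated field; the exact event names and line layout vary between storagenode versions, so compare them against your own logs first. The "completed transfers mean it works" rule is just the heuristic from this post, not an official health check.

```python
import re
from collections import Counter

# Assumed event names as they typically appear in storagenode logs;
# check them against your own log output.
EVENTS = (
    "upload started",
    "uploaded",
    "upload canceled",
    "upload failed",
    "download started",
    "downloaded",
    "download canceled",
    "download failed",
    "deleted",
)

# One pattern per event, requiring whitespace before it and whitespace or
# end-of-line after it, so the event is matched as a whole field.
PATTERNS = {e: re.compile(r"\s" + re.escape(e) + r"(?:\s|$)") for e in EVENTS}


def count_events(lines):
    """Count known transfer events in an iterable of log lines."""
    counts = Counter()
    for line in lines:
        for event, pattern in PATTERNS.items():
            if pattern.search(line):
                counts[event] += 1
                break  # at most one event per line
    return counts


def looks_healthy(counts):
    """Heuristic: completed uploads or downloads suggest the node is working."""
    return counts["uploaded"] + counts["downloaded"] > 0
```

For example, `with open("node.log") as f: print(count_events(f))`, where `node.log` is a placeholder for wherever your node writes its log. A few canceled or failed entries next to plenty of completed ones are normal, as described above.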

nerdatwork · December 28, 2020, 2:33am

In case you overlooked this post.

Onet · December 28, 2020, 2:55am

I saw it, and I already have an account on UptimeRobot that checks the port and IP of my node. But an open port and a responding IP don't necessarily mean the node is working. This is why I'm looking for a way to check it (maybe I could initiate a TCP/UDP flow on the port to get the correct answer? Or something like that?).
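A plain TCP connect test is roughly what a port checker does. This is a minimal Python sketch, assuming the node listens on TCP port 28967 (the port from the check above) and that you run it from a machine outside your own network, since many routers can't connect to their own public IP from inside. Like UptimeRobot, it only proves that something completes a TCP handshake on that port, not that the storagenode is actually serving pieces; the logs remain the better evidence for that.

```python
import socket


def port_is_open(host: str, port: int = 28967, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be established."""
    try:
        # create_connection resolves the host and completes the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and DNS resolution failures.
        return False
```

Usage: `port_is_open("node.example.com")`, where `node.example.com` is a placeholder for your node's public address.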

I didn't understand what happened, in fact… I have a 500/500 Mbps fiber connection, a Proxmox cluster with HA enabled and storage on a DS418, and a full MikroTik network. I think my configuration is adequate to host a node. The only thing I recently changed is that I added an Untangle firewall to my system to act as a UTM. I bypassed it this afternoon to check whether the online score stops decreasing…

Thanks, everyone, for your help!

deathlessdd · December 28, 2020, 3:46am

If your node goes down, the port is no longer open, which is why UptimeRobot is already pretty good. I don't think there's any reason to run anything else and make it more complicated.

kevink · December 28, 2020, 6:39am

When I update/restart I always check the logs. If I have uploads and downloads in there, then my node is fine.

Alexey · December 28, 2020, 3:28pm

You should not block any incoming traffic, from any source, to your node's port, nor any outgoing traffic from your node.
Data flows directly between customers and your node; the satellite contacts your node only for audits and repairs.

SGC · December 28, 2020, 5:26pm

I was actually pondering that a bit recently. I noticed a … significant amount of blocked incoming traffic on my pfSense… nothing major, maybe 1%; I didn't really check, it might be 0.1%. It wasn't a big number, but there was still quite a stack…

If I were blocking some incoming Storj connections, would that show up in the log…?

Because I'm pretty sure I haven't seen any connection-related errors or issues in the storagenode logs…

Of course, it's quite possible that pfSense is blocking something else; I haven't had time to dig into what it actually is… yet.

Alexey · December 28, 2020, 6:35pm

How? The traffic never even reached the storagenode… You can only see that in your firewall's logs.

SGC · December 28, 2020, 8:21pm

Valid point…
But isn't it the satellite that directs the whole orchestra? I suppose it doesn't really, then… when you put it like that, I guess satellites play more of an accounting, tracking and directory role.

So a client requesting data sends that request directly to the storagenodes…?
I'm also pretty sure I run UPnP, so they should be able to pass through whatever they want… but they can only get in on the correct TCP port for that particular node…

Usually it's not a problem to connect to ports online, so why wouldn't they connect to the port my node announces in the first place?

Do you really think I could be blocking incoming Storj traffic…?
And in that case, how would I fix it? I have to forward data to a specific port or port range; otherwise, if all ports had to be forwarded to it, I would basically be directing the entire internet connection to one storagenode.

Personally, I don't think it's Storj traffic I'm blocking… but I figured I would ask…

Well, it's a very small amount anyway… it doesn't seem to affect my audits, and I have other priorities, so maybe I'll get around to figuring out what it is in Q1 2021.

BrightSilence · December 28, 2020, 10:35pm

Yes, the order is signed by the satellite though, so the node can verify it’s legit without communicating with the satellite.
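To illustrate why a signed order lets the node check legitimacy on its own, here is a toy Python sketch using textbook RSA built from two Mersenne primes. This is NOT Storj's actual signature scheme or key format (neither is described in this thread); the order content is a hypothetical byte string, and the key is far too small and unpadded to be secure. The only point is that `node_verify` needs just the public values `N` and `E` and never calls back to the satellite. It needs Python 3.8+ for `pow(E, -1, m)`.

```python
import hashlib

# Toy textbook-RSA key from two Mersenne primes. Illustration only:
# far too small and unpadded to be secure, and not Storj's real scheme.
P = 2**61 - 1
Q = 2**89 - 1
N = P * Q                           # public modulus
E = 65537                           # public exponent
D = pow(E, -1, (P - 1) * (Q - 1))   # private exponent, known only to the "satellite"


def digest(message: bytes) -> int:
    """Hash the message and reduce it into the range [0, N)."""
    return int.from_bytes(hashlib.sha256(message).digest(), "big") % N


def satellite_sign(order: bytes) -> int:
    """Satellite side: sign the order's digest with the private exponent D."""
    return pow(digest(order), D, N)


def node_verify(order: bytes, signature: int) -> bool:
    """Node side: verify using only the public key (N, E), offline."""
    return pow(signature, E, N) == digest(order)
```

A tampered order (say, a hypothetical order whose size field was changed) fails verification, as does a modified signature, so a client cannot forge or alter what the satellite authorized.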

UPnP is not relevant; it just allows something inside your network to open a port and forward it. It doesn't allow someone from the outside to do that (thank God). It is, however, known to be very insecure. Turn it off and use explicit port forwarding when needed instead.

I think you skipped over the part where you explained what you’re actually looking at. If something is trying to connect to other ports than your storage node port, I highly doubt it has anything to do with Storj. The internet is full of trash trying every port all the time. It’s not unusual to get some stuff that is blocked.

Please stop and think about what you’re trying to do here… The internet is f’ing evil and you’re trying to open the doors to everything.

You could be blocking Storj traffic, but only if you have rules blocking something that would affect your Storj port. This could be blocking ip ranges or countries of origin etc. Incoming traffic on other ports is by definition not Storj traffic. If you don’t know what it is, you want to block it.
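The rule of thumb above makes for a quick triage: only blocked packets aimed at the node's own port could possibly be Storj traffic. This is a minimal Python sketch, assuming you have already exported your firewall's block log into simple records; the `BlockedPacket` type is hypothetical and does not mirror pfSense's actual log format. 28967 is the default node port mentioned earlier in the thread.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class BlockedPacket:
    """Hypothetical, simplified firewall block-log record."""
    src_ip: str
    dst_port: int
    proto: str  # e.g. "tcp" or "udp"


def split_by_node_port(entries, node_port=28967):
    """Separate blocks on the node port from blocks on every other port.

    Returns (list of packets blocked on node_port,
             Counter of blocked packets per other destination port).
    """
    on_node_port = [e for e in entries if e.dst_port == node_port]
    other_ports = Counter(e.dst_port for e in entries if e.dst_port != node_port)
    return on_node_port, other_ports
```

If the first list is empty, the firewall isn't blocking anything destined for the node, and the rest is the usual background noise of the internet probing other ports.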

Now if you’re talking outgoing, you shouldn’t be blocking anything.

SGC · December 28, 2020, 11:09pm

Yeah, that is the plan, but to avoid complaints I turned it on so that the other users on the network wouldn't notice the switch from the old router to pfSense.
I fully intend to restrict their options; I'm not even sure they really use it anyway…
I'm aware it's not very secure; it's always a matter of walking the line between usability and security.

No argument there. I was trying to argue that even if it were Storj traffic, there would be no sensible way to fix it.

No, I was saying that if I forwarded all incoming ports to a single IP on the network, nothing else would work, because only that single local IP would receive any incoming traffic.

But yeah, maybe I could have explained it more clearly.

(P.S.)
I figured it wise to check the UPnP setup, because I wasn't totally sure how I had left it… I was tinkering a lot to get pfSense working and basically ended up toggling everything off and on to try to get it working… without any luck.
Apparently I had reset it to defaults before starting my last config, so UPnP was off…
It's just commonly enabled on most routers, and the old one had it running, so I figured I would leave it on for a bit… apparently I forgot. The weirdest part, though, is that nobody complained…

Not really a huge fan of UPnP, but it sure does make life much more practical… until it doesn't, lol.

I guess nobody hosts online games anymore, certainly not now, hehe.
I've also been thinking of clamping down on torrent traffic… but I'm sure that won't go unnoticed.

kalloritis · December 29, 2020, 4:27am

https://192.168.0.1/status_upnp.php

See what is being used, and either create NAT rules for those services or tell the people to deal with it.

ndragun · December 30, 2020, 8:03pm

Yeah, honestly, I'm not convinced the dashboard is very accurate at reporting uptime. Here is a screenshot from a node that's at least 18 months old (I think), running v1.18.1:

SGC · December 30, 2020, 8:36pm

In most cases, you will have a different amount of data on each satellite.
Thus the number of audits per month differs a lot, and each audit basically represents a slice of time. With a node around your size, the satellite with the most data will most likely audit you every 30 seconds to 2 minutes,

while the ones with the least data might audit only about once an hour because they hold so little data… so if you are offline for a certain amount of time, how many audits happen on your data while you are gone determines how much your online score % drops.
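A simplified Python model of this granularity effect: audits from one satellite arrive at a fixed interval, and the online % is just the share of audits that landed while the node was up. The fixed intervals (60 s for a busy satellite, 3600 s for a quiet one) are illustrative assumptions based on the rough figures above; real audits are not evenly spaced, and the real score is window-based.

```python
MONTH_S = 30 * 24 * 3600  # 30-day tracking period, in seconds


def audit_based_online_pct(interval_s: int, down_start_s: int, down_len_s: int) -> float:
    """Online % as seen through audits spaced interval_s apart (simplified)."""
    audit_times = range(0, MONTH_S, interval_s)
    down_end_s = down_start_s + down_len_s
    # Audits falling inside the outage are the ones the node misses.
    missed = sum(1 for t in audit_times if down_start_s <= t < down_end_s)
    return 100.0 * (1 - missed / len(audit_times))


def true_online_pct(down_len_s: int) -> float:
    """Actual share of the 30 days the node was up."""
    return 100.0 * (1 - down_len_s / MONTH_S)
```

For a 20-minute outage (1200 s) starting 1000 s into the month, the busy satellite misses 20 of its 43,200 audits, which matches the true downtime fraction, while the quiet satellite happens to miss none of its 720 audits and still reports 100%. Shift the same outage to start at 3000 s and the quiet satellite misses one audit instead.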

It's calculated over time windows: if an audit fails in a window, that window reports the % of audits that succeeded during that period… and after a month, that window drops out of the average.

So each satellite has its own accuracy, based on the number of audits, i.e. the amount of data you store for it.
Thus the online scores will basically never be the same.
If you have downtime, it will count against you for an entire month from when it happened.
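The windowed averaging can be sketched the same way. The 12-hour window length is an assumption here (the post above doesn't give one), as are the hypothetical `Window` record and the choice to skip windows with no audits and to report 1.0 when there is no history. Each window scores its own success ratio, the score is the mean over windows from the last 30 days, and a bad window simply ages out.

```python
from dataclasses import dataclass

WINDOW_H = 12      # assumed window length, in hours
HISTORY_DAYS = 30  # windows older than this drop out of the average


@dataclass
class Window:
    """Hypothetical per-window audit tally for one satellite."""
    start_h: int  # window start, in hours since an arbitrary epoch
    audits: int   # audits attempted in this window
    online: int   # audits answered while the node was online


def online_score(windows, now_h):
    """Mean per-window success ratio over the last HISTORY_DAYS days."""
    cutoff_h = now_h - HISTORY_DAYS * 24
    recent = [w for w in windows if w.start_h >= cutoff_h and w.audits > 0]
    if not recent:
        return 1.0  # assumption: no audit history counts as fully online
    return sum(w.online / w.audits for w in recent) / len(recent)
```

With 60 twelve-hour windows where one window answered only 5 of 10 audits, the score is (59 + 0.5) / 60, roughly 0.992; a few days later, once the 30-day cutoff passes that window's start, it has aged out and the score is back to 1.0.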

So yeah, it is quite accurate. It's not perfect… but it doesn't need to be… and even though the satellites with little data count much more coarsely, they end up with roughly the same numbers as the more accurate satellites.

Satellites with no or little data can be very, very coarse in their count… I think I got away with 3 days of downtime and it barely registered… but that was on a new node… so basically no data.

A 3% deviation isn't that bad IMO… remember, before this, people could vanish for hours and the network's uptime tracking wouldn't even know they were gone… now you basically cannot shut down for 30-60 seconds without the network knowing about it…

But yeah, it does look kind of inaccurate… it's just like the hands of a clock… they are imprecise indicators that can only be as accurate as the gears behind them…

That doesn't mean the clock cannot tell "accurate" time… even if it counts in 15-minute ticks.

If you want to compare it with your total downtime, use the number from the satellite with the most data…
I'm sure you'll find that its % is quite close to your actual uptime % over the last 30 days.