Topic of strategic importance

BoberHuligan · May 24, 2020, 3:33pm

I have a stack of 16TB disks and a scattering of 6TB disks.
I have 5 internet connection points in different parts of the world with public IPs; at some of them 3-4 different providers are connected, each with a public IP.
Each of the 5 locations has an “old” i5-class PC with 4/8/16GB of RAM. Each location is a private house or apartment.

How can I put all this hardware to use in an optimal way?
Let’s take the hardest case: 4 different internet providers (i.e. 4 public IPs), 1 i5 PC, 16GB of RAM, and the OS is, say, Debian 10 with docker (or maybe ESXi is better: one VM, one node). How many disks and nodes can or should be run in an optimal way? (Ignore the number of SATA ports; that can be expanded with adapters.)
PS: unrelated questions, such as administration, port forwarding, etc., can be omitted.

Alexey · May 24, 2020, 4:02pm

You can run as many nodes as you want, as long as each node meets the minimum requirements: https://documentation.storj.io/before-you-begin/prerequisites
The optimal configuration is one node per disk, but start the next node only once the previous one is fully vetted (this applies when the nodes share one public IP, or have public IPs from the same /24 subnet).
If the nodes have public IPs that are not from the same /24 subnet, they can be started in parallel and will work independently. In that case, though, preferably do not run them on the same PC, so that a failure of that PC does not lose a large amount of data across several connections at once, which would put both your income and the network at risk.

If you want to simplify management at the cost of potential profit, you can give up two disks for redundancy in RAID6/RAID10 and run only one node per site. This reduces the potential income by at least the capacity of those disks, and possibly by 10% of the egress traffic from each of them. In return, it simplifies management and protects against the failure of two disks (one during operation, and a second during the array rebuild after the first is replaced).
I wouldn’t recommend running RAID5 with today’s disks; it will most likely fail during the rebuild after the first failed disk is replaced.
Please do not start a RAID/No RAID flame war here; use this thread for that: RAID vs No RAID choice

You can use the different ISPs for failover, but only if you know how to set that up.
If you don’t, you can use a different ISP for each node, although that also requires knowing how to do it.

SGC · May 24, 2020, 4:58pm

i would run a separate node in each location… or the best of the locations… in theory customers can be all over the world and thus the closest nodes will have the advantage for those particular customers…

maybe a bigger more powerful node in what seems to be the best location, else i doubt it really matters, it’s generally a latency to the customer and bandwidth thing.

ofc you will run into a maintenance issue at one point where you might wish you had everything in one place, thus you might want to select a couple of great spots… fiber uplinks near some big datacenter or other such internet nexus.

and then set it up with a good measure of redundancy, like say raid 6 with a hotspare or raidz3 with a hot or a couple of cold spares you can spin up remotely if need be…
depending on how big of a pool you are making…
sure you may get a bit more data from using all locations… but i’m on 400mbit and i barely ever use more than 40-70mbit, and it’s mostly just uploads thus far… ofc having stuff in one spot could reduce the maintenance issue even further and allow you to slack a bit more on redundancy.

but if you do a remote site op, you will need the ability to at least replace disks when they break down to keep the storagenode intact.

the best setup… well you would most likely get more out of having nodes in many places… but i doubt its worth the effort… it could take a long time before you even see any profit… more nodes draw more power…

Alexey · May 24, 2020, 5:48pm

3 posts were split to a new topic: Costs of electricity to run a storagenode

Sasha · May 25, 2020, 2:25am

Can you run multiple nodes, one per HDD, on a single NAS? I.e., can a NAS with 10 disks run 10 nodes? In other words, 10 docker containers?
All from a single /24 IP?

kevink · May 25, 2020, 5:19am

Yes, you can do that. I run 3 nodes on 3 disks on one host with docker on a single IP. All nodes within the same /24 subnet share traffic.

frances · May 25, 2020, 8:32am

not sure if i’m understanding this.

i’m on a /20 subnet.
if my ip is 180.200.200.140 and my netmask is 255.255.240.0,
then my subnet is 180.200.192.0,
and i would be grouped with every other node in that subnet.

but you’re saying it’s as if i’m in a /24.
So for Storj my subnet will be 180.200.200.0, and I will collide only with nodes whose IP looks like 180.200.200.XXX

is that right?

SGC · May 25, 2020, 8:42am

it’s a binary mask, it’s just confusing to look at it in decimal numbers.

Determining the network prefix

An IPv4 subnet mask consists of 32 bits; it is a sequence of ones ( 1 ) followed by a block of zeros ( 0 ). The ones indicate bits in the address used for the network prefix and the trailing block of zeros designates that part as being the host identifier.

The following example shows the separation of the network prefix and the host identifier from an address ( 192.0.2.130 ) and its associated /24 subnet mask ( 255.255.255.0 ). The operation is visualized in a table using binary address formats.

	Binary form	Dot-decimal notation
IP address	`11000000.00000000.00000010.10000010`	192.0.2.130
Subnet mask	`11111111.11111111.11111111.00000000`	255.255.255.0
Network prefix	`11000000.00000000.00000010.00000000`	192.0.2.0
Host identifier	`00000000.00000000.00000000.10000010`	0.0.0.130

The result of the bitwise AND operation of IP address and the subnet mask is the network prefix 192.0.2.0 . The host part, which is 130 , is derived by the bitwise AND operation of the address and the one’s complement of the subnet mask.

Subnetting

Subnetting is the process of designating some high-order bits from the host part as part of the network prefix and adjusting the subnet mask appropriately. This divides a network into smaller subnets. The following diagram modifies the above example by moving 2 bits from the host part to the network prefix to form four smaller subnets each one quarter the previous size.

	Binary form	Dot-decimal notation
IP address	`11000000.00000000.00000010.10000010`	192.0.2.130
Subnet mask	`11111111.11111111.11111111.11000000`	255.255.255.192
Network prefix	`11000000.00000000.00000010.10000000`	192.0.2.128
Host part	`00000000.00000000.00000000.00000010`	0.0.0.2
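
The two tables above can be reproduced with a few lines of Python. This is only a minimal sketch of the bitwise AND described in the excerpt; the function name `split_address` is made up for illustration, and only the standard-library `ipaddress` module is used.

```python
import ipaddress


def split_address(ip: str, mask: str) -> tuple[str, str]:
    """Return (network prefix, host part) in dot-decimal notation."""
    ip_int = int(ipaddress.IPv4Address(ip))
    mask_int = int(ipaddress.IPv4Address(mask))
    # Network prefix: bitwise AND of the address and the subnet mask.
    prefix = ip_int & mask_int
    # Host part: bitwise AND of the address and the inverted mask (32 bits).
    host = ip_int & (~mask_int & 0xFFFFFFFF)
    return str(ipaddress.IPv4Address(prefix)), str(ipaddress.IPv4Address(host))


# First table: /24 mask.
print(split_address("192.0.2.130", "255.255.255.0"))    # ('192.0.2.0', '0.0.0.130')
# Second table: two more bits moved into the prefix (/26 mask).
print(split_address("192.0.2.130", "255.255.255.192"))  # ('192.0.2.128', '0.0.0.2')
```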

http://www.subnet-calculator.com/subnet.php?net_class=b

so basically a /20 means you would have 255 x 255 type c subnets
a subnet mask of 255.255.0.0
ergo you can put a storagenode on every IP : 180.200.B.C
all the B numbers / addresses would be a full type C subnet
and the satellites assume people are behind type C subnets

i doubt it will last, because it would become fully unreasonable at one point… also you have to take into account that these have to be global IPs… you can be behind a firewall / nat / routing of sorts that gives you a near infinite amount of ip addresses at your disposal, but all of them share a few global IPs on the other side of the routing / nat / firewall.

so afaik you could put a storagenode on all 255 type b subnets… ergo in your case… 180.200.b.1
b being 1-255 and it would count as you being 255 people…
not sure how happy storj would be tho… nor if you actually have a type b global ip subnet assigned…
seems like a lot… if you aren’t like an ISP

frances · May 25, 2020, 8:54am

yes, I know… I was investigating it last night. It took me 3 hours to fully understand it. The main problem was that I was thinking of the hosts as being on the ISP side, when they are on the user side. Your explanation is great, I wish I had seen it before

so, yes, my netmask is
11111111.11111111.11110000.00000000
but for Storj it is always
11111111.11111111.11111111.00000000

so no matter what, every node with an IP that has the same first three octets as yours will be trouble
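
As a rough illustration of the point above: the grouping can be checked by forcing a /24 mask onto any address, regardless of the netmask the ISP actually assigned. This sketch assumes the satellite groups nodes purely by the first three octets, as described in this thread; the helper names `storj_subnet` and `share_traffic` are hypothetical and not part of any Storj tool.

```python
import ipaddress


def storj_subnet(ip: str) -> ipaddress.IPv4Network:
    """Return the /24 network an address falls into (assumed grouping rule)."""
    # strict=False lets us pass a host address and get its enclosing network.
    return ipaddress.ip_network(f"{ip}/24", strict=False)


def share_traffic(ip_a: str, ip_b: str) -> bool:
    """True if both addresses would land in the same /24 group."""
    return storj_subnet(ip_a) == storj_subnet(ip_b)


# The ISP-side view of the address from the question (a /20 netmask).
isp_view = ipaddress.ip_network("180.200.200.140/255.255.240.0", strict=False)
print(isp_view)                         # 180.200.192.0/20
print(storj_subnet("180.200.200.140"))  # 180.200.200.0/24

print(share_traffic("180.200.200.140", "180.200.200.7"))  # True
print(share_traffic("180.200.200.140", "180.200.201.7"))  # False
```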

SGC · May 25, 2020, 9:02am

the /20 or /24 is kinda confusing… seems to be network administrator short form… too lazy to write 255.255.255.0 or 255.255.0.0

well it won’t be in trouble, it will just share the load with 255 other people at most… assuming they all have storagenodes, which is pretty unlikely… i suppose it’s a perfectly reasonable way to sort it in a quick and dirty way… but i doubt they will keep that approach for long… especially since they seem so focused on reducing abuse and unfairness… in which case it is a wholly unreasonable way of controlling the data streams.

stop giving me evil ideas in my head…
meh can’t even find any data on renting a type b global subnet… bet it ain’t cheap tho… easily thousands of $ a month.
so i’m guessing you don’t actually have full roam over a /20 global ip subnet, in that case you should know what it was lol

stuberman · May 25, 2020, 12:48pm

3 hours? That is very determined, most people never get it, even in IT unless you are a network engineer or a renaissance geek.

fmoledina · May 25, 2020, 1:08pm

@SGC, the subnet calculator that you linked to indicates that a /20 has a subnet mask of 255.255.240.0, which was already indicated by @frances. This would allow for 16 /24 networks on the one connection.

@frances, this setup would theoretically allow 16 unencumbered Storj nodes behind your connection. What this means is that the satellite will not intentionally split the bandwidth between your nodes, and you’d likely only be limited by your ISP provided bandwidth.

This wouldn’t necessarily be in the spirit of decentralization that the Storj project is seeking, but would technically work.
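
The figure of 16 can be double-checked with the standard-library `ipaddress` module. This is just a sketch of the arithmetic, using the /20 network from the example earlier in the thread (180.200.192.0 with mask 255.255.240.0).

```python
import ipaddress

# The /20 network from the earlier example.
isp_net = ipaddress.ip_network("180.200.192.0/20")

# Split it into the /24 blocks that are treated as separate subnets.
blocks = list(isp_net.subnets(new_prefix=24))

print(isp_net.netmask)  # 255.255.240.0
print(len(blocks))      # 16
print(blocks[0])        # 180.200.192.0/24
print(blocks[-1])       # 180.200.207.0/24
```

Each extra prefix bit halves the block, so a /20 contains 2^(24-20) = 16 networks of size /24.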

SGC · May 25, 2020, 2:27pm

gotcha, well he did also spend 3 hours on looking at it… i might have skimmed it a bit too quickly, no surprise there…
Been decades since i’ve really had to deal with subnet math, that also makes it a bit more reasonable a global ip range to have… i think i pay like close to 10$ for 1 global ip… most likely getting robbed by my ISP on that, but not really much choice if one wants to keep it simple.

ofc might be an easy place to cut some extra expense and just do ddns or something like that.
i would assume a ddns type solution would add a slight latency to the overall time… not much tho… i suppose…

stuberman · May 25, 2020, 3:37pm

I am not sure what we are getting at other than an interesting, if academic, exercise. Most people will have multiple nodes behind a single NAT’ed public IP (assuming they are at home rather than in a data center). I use AT&T, which does not subnet at /24, but Storj will assume /24 ranges, meaning that if any of my neighbors in the range run Storj, we will be considered as a bloc. Yeah, there are some exotic ways to split traffic, such as using a cloud proxy to redirect traffic, but what does that practically buy you? Is it the feeling you can grab more traffic and profit more? I doubt the math would work out well in that case. Nonetheless the discussion is interesting and helpful.

SGC · May 25, 2020, 5:03pm

if storj gets really popular it might be nice to know where nodes are located on the subnets, so that we don’t step on each others toes… right now i doubt it’s a real consideration, but man would it suck to be say 3 people sharing a subnet on storj and then one guy is running 98 storagenodes…

wouldn’t that essentially mean the other two would get 1% of the subnet data from storj each… while the other guy gets 98% obviously… also the other two wouldn’t be able to improve their numbers… until they start adding more nodes…
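
The arithmetic behind that scenario, as a small sketch. It assumes that traffic for a /24 is spread evenly over every node in it and that all nodes are vetted and online, which is a simplification; the function name `subnet_shares` and the operator labels are made up for illustration.

```python
def subnet_shares(nodes_per_operator: dict[str, int]) -> dict[str, float]:
    """Fraction of a subnet's traffic each operator gets, assuming every
    node in the subnet is equally likely to be picked."""
    total = sum(nodes_per_operator.values())
    return {name: count / total for name, count in nodes_per_operator.items()}


# Three operators in one /24: one runs 98 nodes, the other two run 1 each.
shares = subnet_shares({"big": 98, "small_a": 1, "small_b": 1})
for name, share in shares.items():
    print(f"{name}: {share:.0%}")
# big: 98%
# small_a: 1%
# small_b: 1%
```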

well i’m being told that i make threads go off topic… which i suppose is fair, maybe that’s just how my mind works, or doesn’t lol so i think i’ll leave it there since this is only mildly related to the topic of strategic importance.

now that i think about it tho…it is kinda related to strategic importance of a storagenode… even if more in the theoretical, but could be a real issue for some…and they might not even know.

stuberman · May 25, 2020, 5:33pm

It is far from clear to me how traffic gets apportioned. I understand that those close to a customer should get preferential treatment. Not sure what that algorithm looks like: is it by latency? Router hops? Today I have a 9 TB node that is nearly full (97%) and has been active for 6 months, and another that is new and nearly untouched. (I understand it is in the process of being vetted.) I think in your example, @SGC, the disparity is probably just at startup (months), and over time all nodes will be comparably loaded (assuming they are reputable), so it is perhaps more luck of the draw in terms of customer activity. One customer uploads a set of files that is very popular and constantly being downloaded by their customers. Another customer uploads files and they just sit there; that would be less profitable.

SGC · May 25, 2020, 5:59pm

i was mainly thinking test data, littleskunk talked about it and it seems to be a sort of subnet lottery, ofc when we will start to see more customer data it might become irrelevant as the locations and bandwidth may be much more of a factor…

BrightSilence · May 25, 2020, 6:16pm

Node selection is currently basically random. The satellite first selects 110 /24 subnets at random, then selects a random node within each of those subnets. Offline, suspended and disqualified nodes are excluded. Unvetted nodes are selected at only 5% of the rate of vetted nodes.

Latency comes in later. When those nodes are being uploaded to, the race starts. As soon as 80 pieces have been uploaded, the rest are cancelled. If your node isn’t among the fastest 80, you lose out on that piece.

Currently reputation doesn’t yet play a role, but it might later on.
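
A toy simulation of the selection and race described above. This is a sketch under stated assumptions, not the satellite’s actual code: the network, the subnet labels, the node names and the upload times are all made up, and the exclusion of offline nodes and the 5% rule for unvetted nodes are left out.

```python
import random


def select_nodes(nodes_by_subnet, rng, subnet_count=110):
    """Pick distinct /24 subnets at random, then one random node in each."""
    subnets = rng.sample(sorted(nodes_by_subnet),
                         k=min(subnet_count, len(nodes_by_subnet)))
    return [rng.choice(nodes_by_subnet[subnet]) for subnet in subnets]


def run_race(selected, upload_time, needed=80):
    """Keep the nodes that finish first; the remaining uploads are cancelled."""
    return sorted(selected, key=lambda node: upload_time[node])[:needed]


rng = random.Random(42)

# Hypothetical network: 200 subnets, each holding 1 to 4 nodes.
nodes_by_subnet = {
    f"10.0.{i}.0/24": [f"node-{i}-{j}" for j in range(rng.randint(1, 4))]
    for i in range(200)
}
# Hypothetical time each node needs to receive one piece, in seconds.
upload_time = {
    node: rng.uniform(0.1, 2.0)
    for nodes in nodes_by_subnet.values()
    for node in nodes
}

selected = select_nodes(nodes_by_subnet, rng)
winners = run_race(selected, upload_time)
print(len(selected), len(winners))  # 110 80
```

In this model, a subnet with four nodes is picked exactly as often as a subnet with one, so adding nodes inside the same /24 splits the same traffic rather than increasing it.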

frances · May 25, 2020, 8:53pm

I’ve been thinking about that a lot. Right now almost all of the traffic is just test traffic, isn’t it? So in many ways it is not representative of how the network will behave. I can’t remember the last time I went into my Dropbox files, and those are free to download. I’m guessing that few people will frequently download files that cost them money to move. So the most probable way of using Tardigrade is as a static store. Obviously, I’m just projecting my own needs and habits onto this, so maybe I’m very wrong (I hope so, actually)

stuberman · May 25, 2020, 9:13pm

I wonder what the paying customers will look like. I imagined it would be commercial accounts looking for a way to save money on hosting files. I just downloaded some old AMD drivers; would it make sense for them?

Before I was aware of Storj, I was interested in IPFS as a way to host and distribute files where the closest storage would serve the request. One concept was that a large chunk of the static Internet files could be stored there and literally used on a Mars or moon base station (rather than the ridiculous notion of retrieving the files from Earth).