Put All the Hardware to Work

  1. The Storj daemon, credentials, and config should be placed on the same physical drive where the node’s data is stored.
    In case of a hardware failure (for example the PSU or motherboard), repairing the machine takes much longer than simply plugging the physical drive into a different PC.

  2. A limitation of one node per IP does not make any sense. If an operator has lots of hardware (motherboards, HDDs, RAM, CPUs, PSUs, etc.), one can put all of it to work
    by assembling many PCs with many HDDs each.
    It will not decrease the survivability of the data in any way.
    On the contrary, together with node detachment and mobility, it will give the storage facility more reliability.
    The exact opposite is guaranteed if there is a limitation of one node per IP or per operator.

  • Because of the low income and low efficiency, this is no longer a craft; it is a toy instead.
    Docker is another thorn: it is slow and buggy, and it requires update after update…
    No one is going to burn electricity and waste time on a business that is not worth the time and money spent.
    What I did last time: I just switched my hosting off and went on with my life,
    because during an update it dropped all the data, making it unrecoverable.
    Those who want to “try it” with just one node and a little space are notably less reliable than experienced operators with lots of hardware and well-engineered wiring and internet connections.
    Most people use a “gray IP”, i.e. a connection behind NAT,
    whereas a real IP costs about $3 per month and is used mostly by proficient IT technicians, not by “I think I’ll try it for a few weeks” people.
  3. There should probably be some kind of contract for reliable operators, or for operators who intend to be reliable, because every operator asks himself:
  • is it worth it?
    A technician may be a very good one, but the effort invested and the depth of planning all depend on the promise of the final result.
    And of course, no one is going to build a reliable facility for miserable cents… which is all one can hope for with a one-node-per-IP limitation.
    That is literally an artificial limitation, a blockage of hardware usage.
  4. And by the way, I had very reliable nodes, with a rating of 5000 out of a possible 5000.

Hi @realcm,

I’ll take your bait…

  1. Already mentioned as part of the installation - Identity - Node Operator

  2. Decentralised means there is no single point of failure. Having PBs of data in one location on a single IP means that if you lose power or internet, or have a hardware failure, the network could be badly affected.
    A business with an increasing number of suppliers and customers hardly seems like a ‘toy’.
    You turned off; why are you back? (rhetorical)

  3. Reliable node operators have the held-back amount reduced and then returned.

  4. Good for you.

:+1:


There is no limitation of one node per IP address.

I have three nodes, each on its own hard drive, on the same machine/IP address.

But wait, there is no single point of failure…!
Data is spread nearly holographically across the whole Internet.
That means that if there is an electricity or ISP failure, it temporarily disconnects the redundant data blocks of randomly selected users.
It is no more of a failure than, say, 10 nodes going down at once instead of 1 node at once.
Compare it to overall failures and the difference is going to be less than 0.0001%.
If an ISP fails, the whole segment fails with all its nodes, whether it is 1 node per 1 IP or 10 000 nodes per 10 000 IPs.
If electricity fails in the whole city, same thing.
And now compare the reliability of “noob” nodes to the reliability and care of dedicated-facility nodes built by home-grown technicians.
So this limitation of 1 node per IP is not what is going to work.
Besides, over the last 5 years we have not had more than 10 hours of ISP and electricity failures in total.

I mean that a 1-node-per-IP limitation is not going to help improve reliability or reward rates.
Of course there might be some reasonable limitations, without certification.
But if I have not just a home garage facility, but a dedicated facility with backup power generators, a backup satellite uplink, and a temperature- and humidity-controlled environment,

  • darn! users would like to be serviced there!
    Count on there being hundreds of such facilities in each country; I’m telling you, there is no single point of failure!

But let’s take a look at a single node with JBOD or RAID 5: an inexperienced user will not be able to manage it. If the PC fails, all data goes out of sync; if a single HDD fails, all data is lost, except with RAID 5.
For RAID 5 one needs more or less identical HDDs, and RAID is beyond a home user’s level to maintain.
This is where the single point of failure is!

If only one of 10 nodes fails, because 1 of 10 PCs fails, it is just the same common failure as any other, and it is not a single point of failure!

And yes, I’m back because I hope for improvements, my wallet has grown a little, I have over 20TB of hard drives and about 10 ready-to-use, well-tested PCs, and I have 2 locations to place them with 2 different ISPs.

So are we working or what?

The documentation says they will be treated on a per-IP basis, as one node.
So the throughput of your Internet connection is not going to be utilized efficiently, even though you pay for it every month or so…
I may be wrong, but I expect the same thing is going to happen to your HDD space and electricity.

You can have as many nodes as you want. They will be used as one huge node. This is almost equivalent to a RAID, but unlike RAID you will spread the load and the risks.
Regarding RAID you can read here:

And take a look at the RAID vs No RAID choice as well.

In v3 we do not have replication anymore; only erasure codes are used. You need 29 pieces out of 80 to reconstruct the file.
So the replication logic is not relevant anymore.
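
A quick illustration of what those numbers imply (only the 29-of-80 ratio comes from the description above; the rest is simple arithmetic):

```python
# Toy illustration of the 29-of-80 erasure coding mentioned above.
k, n = 29, 80                 # pieces needed / pieces stored per segment

expansion = n / k             # storage overhead compared to the raw segment size
tolerable_losses = n - k      # pieces that may be lost while the segment stays recoverable

print(f"expansion factor: {expansion:.2f}x")           # ~2.76x
print(f"pieces that can be lost: {tolerable_losses}")  # 51 of 80
```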

We want to be as decentralized as possible, so we select one node from each /24 subnet of public IPs for every piece of the segment, to make sure that pieces of the same segment do not end up in the same physical place or at the same ISP. As a result, all your nodes behind the same /24 subnet of public IPs work as one node.
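
A hedged sketch of what that selection rule means in practice; the grouping logic here is my own simplification, not Storj’s actual code, and the node names and IPs are made up:

```python
import ipaddress
import random

def group_by_subnet(node_ips):
    """Group node IDs by the /24 network of their public IP."""
    groups = {}
    for node_id, ip in node_ips.items():
        subnet = ipaddress.ip_network(f"{ip}/24", strict=False)
        groups.setdefault(subnet, []).append(node_id)
    return groups

def select_for_segment(node_ips, pieces):
    """Pick at most one node per /24 subnet, then sample subnets for the pieces."""
    groups = group_by_subnet(node_ips)
    candidates = [random.choice(members) for members in groups.values()]
    return random.sample(candidates, min(pieces, len(candidates)))

nodes = {
    "node-a": "203.0.113.10",   # these two share a /24 ...
    "node-b": "203.0.113.77",   # ... so at most one of them gets a piece
    "node-c": "198.51.100.5",
}
print(select_for_segment(nodes, pieces=2))
```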

The problem with access and speed is solved differently: when the customer wants to upload a file, their uplink requests 110 random nodes and starts uploads in parallel; as soon as the first 80 are finished, all remaining uploads are canceled. The same goes for downloads: the uplink requests 39 nodes, starts downloads in parallel, and cancels all remaining ones when the first 29 are downloaded (the customer needs only 29 of the 80 pieces to reconstruct the file). As a result, files are stored on the fastest nodes closest to the customer’s location. However, because the node selection is random, they are still spread across the globe: Visualizing Decentralized Data Distribution with the Linkshare Object Map
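
The long-tail cancellation described above can be modelled roughly like this (an asyncio toy, not the real uplink; the 110/80 split is from the description above, the latencies are invented, and the same pattern with 39 started and 29 kept applies to downloads):

```python
import asyncio
import random

async def transfer_piece(node_id: str) -> str:
    # Simulated piece transfer with an invented, variable network latency.
    await asyncio.sleep(random.uniform(0.05, 0.5))
    return node_id

async def long_tail(candidates, needed):
    """Start transfers to every candidate, keep the first `needed` that finish,
    and cancel the slow remainder."""
    tasks = [asyncio.create_task(transfer_piece(n)) for n in candidates]
    done = []
    for finished in asyncio.as_completed(tasks):
        done.append(await finished)
        if len(done) >= needed:
            break
    for t in tasks:
        t.cancel()                 # drop whatever is still in flight
    return done

# Upload: contact 110 nodes, keep the first 80 successful pieces.
uploaded_to = asyncio.run(long_tail([f"node-{i:03d}" for i in range(110)], needed=80))
print(f"stored {len(uploaded_to)} pieces; the remaining uploads were cancelled")
```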


Yes, that is what I’m talking about.
Many nodes, no RAID: load and risk balancing.
So why would all nodes behind the same IP be treated as a single node?
If I have 20 TB of space to offer in total, why would all my nodes be treated as a single node with only 500 GB?
This is not logical.
Besides, with IPv6 coming into play, there will be a “white” (public) IP for each device,
or at least a white IP will no longer be a problem.
Many ISPs provide an IP behind NAT, and an extra $3 payment for a real IP will scare off many users, because a simple calculation will show the worthlessness of the enterprise.
The more advanced ones will use tricks to gather IPs, with a VPN for example.
People are going to do tricks; they always do tricks to gain more. But that does not yet mean their nodes are unreliable.
So why make technicians’ lives harder?
Others will devote the rest of their storage to other projects like Sia or BTFS or another…
Decentralization will happen, though not the way you expect it.

How much storage can a single technician handle alone?
An inexperienced user will hardly manage even 500GB.
One with experience can handle 5 to 10, maybe 20 TB.
And the most advanced can deal with 100TB.

If hosting 100TB gives around $12 000 a year, why would one abandon maintenance?
With high probability one will not.

To spread the data across multiple geographical areas.
The customer’s data is split into 80 pieces and those pieces are uploaded to separate nodes, selected by IP. This is so that you do not get multiple pieces in the same location.

Take this hypothetical scenario: you have 70 nodes and each node receives one piece from the customer. Now power fails in your datacenter or something, and since pretty much all the pieces were on your nodes, the customer cannot access his file anymore.

However, your node is not limited to 500GB. My node, for example, has 17TB of data stored on it. If I had created two nodes at the same time (on the same IP), each would have about 8.5TB.

Huh?
Currently 1 TB earns you around $3 - $4 per month. So at 100TB you would not make more than $4800 a year. And probably not even that.
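
For comparison, the back-of-the-envelope math behind both figures (the per-TB rates are the rough numbers quoted in this thread, not official prices):

```python
# Rough annual revenue for 100 TB of stored data at a few per-TB monthly rates.
stored_tb = 100
for rate_per_tb_month in (3, 4, 10):   # ~$10/TB/month would be needed to reach ~$12,000/yr
    print(f"${rate_per_tb_month}/TB/mo -> ${stored_tb * rate_per_tb_month * 12:,}/yr")
```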

As was said above, it is not going to affect network reliability much,
because other storage locations with comparable capacity are still online.
And the data is not inaccessible “anymore”; it is just temporarily down.
“Anymore” is when an atomic blast has evaporated the data center; that is “anymore”.
Say I have 10 PCs with 3 to 4 HDDs connected to each, one node per HDD.

  • The failure of 1 HDD or even of 1 PC will not make all the data inaccessible “anymore”.
    It will make only 1 node inaccessible “anymore”, in the case of physical destruction of an HDD,
    and 3 to 4 nodes temporarily down in the case of a PC failure (1 of 10).
    But it is not going to take down all 30 to 40 nodes.
    On the one hand, bandwidth allocation according to capacity puts more data at risk.
    But on the other hand, the same approach brings up and fills more storage overall, and decreases the risk.
    And I believe a simple calculation shows that the decrease in risk is orders of magnitude larger than the increase.
    Even for an enthusiast data center, especially if the storj protocol supports local peer discovery for high-speed risk management.

And the customer is unhappy.

The network has enough nodes in separate locations that it does not need to use all the available space in your (or my) node.

One of the advertising points of Storj is that being “decentralized” allows the customer’s data to be available at all times, because it would take a large-scale disaster to make enough nodes unavailable (even temporarily) to make the customer’s data unavailable. If, however, a critical amount of the customer’s data is stored on your nodes, then it is no more reliable than the customer just storing his data on your servers (using FTP or whatever, while you use a cluster).

At the same time, if the network needs the storage space, it can use your one big node just as well as your 50 smaller ones.

Also, there is another problem. With Storj v2, having more nodes gave you more data. What do you think happened? Some people ran thousands of very small nodes. When those nodes went down (because of power failure, ISP failure, etc.), customers lost access to their data (and maybe got it back later).

The same would happen here: if v3 gave me more data for running more nodes, I would run a lot of nodes on the same array in an attempt to fill that array as fast as possible. After I ran out of space, I would just set the nodes to advertise that they are out of space. While that may be more profitable for me, I don’t think it would be a good thing for the network.

Because nodes are aggregated by /24 subnet, very few places (large datacenters, ISPs) would be able to run a lot of nodes in the same location, and those nodes would not make up the majority of the network.


I believe this is a misconception, and I explained before why:
because other nodes bring more capacity too, and thus more redundancy too.

One big node requires many HDDs united JBOD-style,
in which case one HDD fault brings down the whole node, making it inaccessible “anymore”.

This is a misconception again. I’m telling you about capacity/allocation.
It does not matter how many nodes there are; what matters is how much total space the operator provides.

The data is duplicated; it is not going to be lost because one storage site disconnects.
For example, today my laptop stopped working; I hosted a node there on an external hard drive.
If multiple nodes were supported out of the box, I could plug that HDD into the desktop PC where I host another node and bring it up in just a few minutes.
But because it is not supported out of the box, running many nodes on a single PC requires dealing with messy scripting, and I can’t do that in a sane amount of time.

I, for example, can set up nodes in a compressed way, because I see no use in connecting 3TB drives for a miserable amount of data. When the nodes grow, if they do, I will transfer them to bigger HDDs and then use the small 2.5" HDDs to set up new nodes.
It depends on how you are going to fulfill your promise of the declared storage.
If a node is giving good results, you can just go and buy a new HDD for it.
If not, you do not.

There are lots of misconceptions, in my judgement.
Decentralization is needed more when operators run out of bandwidth, for example.
The hardware should work. Empty drives are not going to help anyone.

I think the misconceptions are in your understanding of the storj network. Please read more about it on the blog, in the whitepaper, or in other forum threads.

Pentium100’s explanations are all correct, but maybe they are difficult to follow without enough background on the storj network.

I use raidz2, which is equivalent to RAID6. My array can tolerate the failure of two to four drives (out of 12).

All the IP filtering does is make sure that when the customer’s data is split into pieces and distributed to the nodes, your nodes at the same location do not get more than one piece.

You can run as many nodes as you want, but the network will treat them as one big node; the only difference is that if one of your small nodes fails, the rest keep running.

(numbers taken completely out of thin air here)
Say there’s 1PB of customer data (with expansion due to erasure coding) to go around and 1000 node operators with nodes.
What do you propose?

  1. Give each node operator an equal amount of data (1TB); if some operators’ nodes are too small, divide the remaining data equally among the other operators.
  2. Give each node an equal amount of data, meaning node operators with more nodes would get more data.
  3. Give nodes data proportional to their size, so a 10TB node gets more data than a 2TB one.
  4. Something else?

Storj has chosen a slightly modified option 1: each IP subnet (/24) gets about an equal amount of data (unless its nodes are too small, in which case they get as much data as they can hold).

Options 2 and 3 can be abused.
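
A toy comparison of why option 2 invites abuse while the per-subnet variant of option 1 does not (the operator names and numbers here are made up):

```python
# Toy model: 100 TB of customer data to distribute.
def share_per_node(total_tb, nodes_per_operator):
    """Option 2: every node gets an equal share, so splitting one array into
    many small nodes earns an operator a bigger slice."""
    per_node = total_tb / sum(nodes_per_operator.values())
    return {op: round(per_node * n, 1) for op, n in nodes_per_operator.items()}

def share_per_subnet(total_tb, subnets_per_operator):
    """Modified option 1: every /24 subnet gets an equal share, so splitting
    one location into many nodes changes nothing."""
    per_subnet = total_tb / sum(subnets_per_operator.values())
    return {op: round(per_subnet * s, 1) for op, s in subnets_per_operator.items()}

print(share_per_node(100, {"alice": 1, "bob": 1}))    # 50 TB each
print(share_per_node(100, {"alice": 1, "bob": 50}))   # bob splits one array into 50 nodes and gets ~98 TB
print(share_per_subnet(100, {"alice": 1, "bob": 1}))  # per subnet, the split gains bob nothing
```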

That is only correct in the sense that someone might speculate on storage capacity.
But a possible solution is a contract guaranteeing the declared capacity.
If I have 20TB of storage in total, I declare it, and I am the one who is responsible.
If I do not provide it as declared, my revenue is decreased.
Everyone on the network profits if mid-range technicians build reliable storage sites and care about the equipment and the data it keeps. Not for free, of course, but for a reasonable reward.

I don’t think that storj can seriously compete with dedicated data centers, where just double redundancy is 99.9999% reliable.
And they charge about 3 to 5 times more than a storj operator is going to receive at his best.
My friend ran out of his 15GB mailbox storage, and he was offered 100GB for $60.
They charge $600 a year for 1TB…

Specialization is not discarded, and it never will be.
The more you put in, the more you get.
It has worked this way all along, and it is not going to change.

How much data have you got?
I’ve got only 7GB in 3 days. It will take a year to fill a small 2.5" 500GB drive!

I’d propose No. 3: if someone has more storage, they should be given more bandwidth.
Maybe not exactly proportional, but at least with hope for faster filling.
Most people today have a 1TB drive in their PC (I know, since I often service their PCs).
So from 100GB to 1TB there might be no difference in bandwidth.
But above 1TB, efficient space utilization definitely requires being given more bandwidth.

What do you propose?

Suppose I go to my friend and tell him: “Do not turn off your PC, buy an IP address for $3 and you may earn $3.50 a month in tokens… I will look after your hardware and we’ll split the profit… OK?”
Even though my friend has more than 500GB of free space, he is not willing to leave his PC running all the time. So that 500GB is lost to the network.
So I would have to set up a VPN into someone’s real-IP network, to tunnel traffic to my home,
and plug in my own HDD instead,
because I have many diverse HDDs… and PCs that can run nodes.

I can set up nodes in the offices of small companies, where a PC runs all the time.
But in the end, it is me who operates these nodes.
I can handle maybe up to 3 additional locations, not more.
And it is not really reliable.

This is exactly what I mean: you don’t really understand the storj network or the reasons for its decisions. The network doesn’t gain anything from “mid-range technicians”, and most of your suggestions are simply impossible to implement or confirm.
And your mailbox example… well… couldn’t you find anything more expensive to post? :smiley:

17.64TB
I have been running this node since March 2019.

Storj does not want to pay you for holding empty drives that may never get filled. If the total network capacity is something like 50PB but there’s only 10PB or whatever of customer data, then nodes will be mostly empty. There’s no other way. Storj also uses test data, but the amount of it is limited.

If Storj paid money for “proof of space”, then all the Chia farmers would start running nodes and Storj would very quickly run out of money, because the amount of customer data would not increase that fast.


You know, for a mid-range businessman, $60 is a question of 15 minutes of working time.
They’d buy it without hesitation. I helped my friend filter out the big emails that were causing the overflow, and he got 14GB of his 15GB of space back. Without my help he wouldn’t have managed it.
So now I have free coffee any time in his office. But it still will not convince him to run a node on his office PC…

I don’t see your point… running a node on his office PC would increase the supply for the storj network, while buying space increases the demand for the storj network…
And the storj network isn’t really targeting mailing systems either (even though any developer could of course build a mailing system on top of it).