Held amount per node vs per SNO

I created my first node in July 2019, and now it’s close to earning 100%. I ran out of disk space during the last couple of months, and after adding one 2x larger disk I needed to migrate the existing node.

From an economic standpoint, adding a fresh node on the new disk, as advised, doesn’t make sense: I’d be starting vetting from scratch, the held amount percentage would be back at its maximum, and the pre-existing node would get no ingress.

In hindsight, a year ago I should have created a few nodes, one by one, each with a minimal amount of disk space, and just waited.
But that feels like I’m trying to game the system instead of making the rules better.

Why is the held amount (%) tied to the node rather than to the SNO who owns it? It’s the SNO who is dedicated to running the node, not the node itself!



It’s the node that will eventually fail, and its held amount that needs to cover repair. If the failure of a new node has no impact on the held amount, you have much less incentive to keep it running well.

Well, I thought I was the only one who’d thought about that (I had the idea a few weeks after spinning up my first node).
I didn’t post it on the forum for fear that people would use (and abuse) it and eventually harm the network, because failed nodes would have no held-back amount to pay for data restoration.

I also only feel half bad, because Storj charges its customers more than twice what it pays SNOs to store data (I doubt running satellites is THAT expensive).
If someone could explain that big difference, it would be awesome.
EDIT: The expansion factor for files stored on Tardigrade is 2.75 to provide redundancy, but the cost to store 1TB is 4x higher than what SNOs get paid.
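Putting the EDIT’s two figures together (using the post’s numbers as assumptions, not official Storj/Tardigrade pricing), the 4x markup shrinks considerably once the expansion factor is accounted for:

```python
# Rough margin estimate from the figures in this post (assumptions,
# not official pricing): customers pay 4x the per-TB SNO storage rate,
# and each customer TB expands to 2.75 TB of pieces on nodes.
sno_rate = 1.0       # normalized SNO payout per TB of pieces stored
customer_rate = 4.0  # customer price per TB stored (4x the SNO rate)
expansion = 2.75     # erasure-coding expansion factor

payout_per_customer_tb = sno_rate * expansion  # network pays out 2.75 units
gross_margin = customer_rate - payout_per_customer_tb
print(payout_per_customer_tb, gross_margin)    # -> 2.75 1.25
```

So of every 4 units of storage revenue, roughly 2.75 go to SNOs, leaving about 1.25 before satellite, repair, and development costs.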

In theory (if we don’t take into account the strain it puts on satellites), the cost of a node failing is $10 per TB stored (since repair egress is only paid at that rate).
It would be interesting to see the average held-back amount per TB SNOs have once the first 9 months are over.

Held amount doesn’t even come close to covering repair costs. You’re forgetting the biggest cost for repair. Sure they pay $10 per TB to SNOs for retrieving pieces to recreate the missing ones, but they have to pay $80 per TB to GCP for egress to get the data from the satellites out to new nodes after the repair is done.


Ouch, that hurts. Is that one of the reasons why they pay SNOs way less than they charge customers for data (storage and egress)?

I imagine so. But obviously they also have to pay their people/rent/other cloud costs for the satellites etc.


I don’t know much about the process of recreating lost pieces, but it would be awesome if it were possible to decentralize that as well: put all those idle node CPUs to good use (as long as it’s not a very CPU-intensive task, since most nodes have small CPUs) and avoid having to pay Google for bandwidth.

I’m sure most SNOs would be open to the idea, since the cost of our internet connections isn’t a function of egress. However, since I’m not much of an expert on how erasure-coded storage works, I don’t know if it’s even doable.
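For anyone else unfamiliar with erasure coding: the core idea is that a file is split so that any k of n pieces suffice to reconstruct it. A toy sketch using polynomial interpolation over a prime field (purely illustrative; Storj uses a real Reed-Solomon implementation, and none of this is their code or parameters):

```python
# Toy k-of-n erasure coding via polynomial interpolation over a prime
# field. Illustrative only, not Storj's actual Reed-Solomon code.
P = 257  # a prime > 255, so each symbol can hold one byte

def lagrange_at(x, points):
    """Evaluate the unique polynomial through `points` at `x`, mod P."""
    total = 0
    for i, (xi, yi) in enumerate(points):
        num = den = 1
        for j, (xj, _) in enumerate(points):
            if i != j:
                num = num * (x - xj) % P
                den = den * (xi - xj) % P
        total = (total + yi * num * pow(den, -1, P)) % P
    return total

def encode(data, n):
    """Treat the k data symbols as points at x = 0..k-1 and emit n
    shares (evaluations at x = k..k+n-1); any k shares reconstruct."""
    k = len(data)
    base = list(enumerate(data))
    return [(x, lagrange_at(x, base)) for x in range(k, k + n)]

def decode(shares, k):
    """Reconstruct the original k symbols from any k distinct shares."""
    return [lagrange_at(i, shares[:k]) for i in range(k)]

data = [42, 17, 99]              # k = 3 data symbols
shares = encode(data, n=6)       # any 3 of the 6 shares suffice
print(decode(shares[2:5], k=3))  # -> [42, 17, 99]
```

With Storj’s parameters mentioned later in this thread (29-of-80), losing a piece just means any other 29 surviving pieces can regenerate it, which is what repair does.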

Also, I don’t know how transparent they are, but I’d be delighted to see a recap of how Storj invests its revenue. It might shed some light on how much the satellites cost compared to the nodes.


They’re working on it in steps, starting with trusted delegated repair.

Having repair done by storagenodes is briefly mentioned as a future intended solution. It’s doable, but having repair done by untrusted entities is something that needs a lot of checks and verifications. It’s not the simplest problem to solve.


Nice, thanks a lot for sharing!
I should really start digging through the GitHub repos, as much of the interesting information seems to be in there.

I guess moving away from Google Cloud must be a priority given how expensive it is haha.
Thanks for sharing the info!


That’s what I’m doing now. Trying to have 2-3 nodes per drive.

Remember that these fees also cover the cost of software development, which is crazy expensive unless the network is already huge, which it isn’t yet.

I see this discussion has derailed from my original topic.

Let me quote one of the posts by a storjling:

The short explanation is that the earning potential of a node, our network performance and - most important - file durability are all strongly affected by node operator reliability.

Not the node reliability but the operator. And I wholeheartedly agree with this distinction.

I suspect that it’s the human operator who most often ruins the node by e.g. disconnecting the USB cable from the running computer, wiping the file system on a wrong device, etc.

In the original post I described a situation where my brand-new drive may fail and ruin the whole old node. However, this is the only reasonable course of action for me, even though I’d prefer to keep the node on the already-proven device (which doesn’t guarantee reliability, but that’s not the point right now).

This is the kind of situation where my benefits as an SNO contradict the benefits of the network.

P.S. I can even imagine a business of raising a fleet of lean vetted nodes (500GB, or even lower by artificially slowing down uploads once fully vetted) aged 9 months (starting from the 10th month SNOs get 100%), and then selling them for a profit. The held amount would be marginal and would not come close to covering repair costs for the newly stored data. But that’s what the node-vs-operator “reliability” currently amounts to.

I’m slightly exaggerating, of course, but only a bit.

That is a faulty conclusion based on the statement you quoted. Just because the node operator has a big impact doesn’t mean that it’s the sole impact. Different nodes tend to run on different hardware or at least different HDDs. Yes, a node operator could mess them up all at once. But a hardware failure could take out one or just a subset of nodes. Each failure should have a corresponding loss of held amount.

Not really. The network would be just fine if you moved the node to a bigger HDD as long as you keep downtime to a minimum. And soon enough, if you don’t keep downtime to a minimum, you will be disqualified (when that’s implemented again).
However if something goes wrong during that migration, it’s a loss for both sides. So seems like incentives are aligned.

As for growing and selling pre-vetted nodes past the held-back period: go right ahead. Held amounts really aren’t going to be huge normally. The only reason some of the older nodes have pretty large held amounts is surge payouts. Nobody is going to pay upfront for a pre-vetted node if they can just have the amount held back without upfront costs. I’d say this is not at all a significant risk to the network.

Fair enough, but I don’t yet understand why it’s not possible to attribute the loss to the owner rather than to a node.

On a related note, to me it’d make more sense to simplify the rules, e.g.

  • make the held amount proportional to the used disk space, so that for any individual node it would cover the repair cost of 100% of the stored data;
  • whenever a set of nodes is offline and the corresponding piece needs to be reconstructed due to less-than-desired redundancy, “charge” all of those offline nodes from their held amounts;
  • node age becomes irrelevant;
  • SNOs face no hard choice between spinning up a new disk with a fresh node (and therefore earning less) and migrating the existing node to a larger disk: the result is exactly the same.
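A rough sketch of what the first two bullets could look like in practice (the $10/TB rate is the repair-egress figure quoted earlier in the thread; the function names and the withholding mechanics are my assumptions, not a real proposal spec):

```python
# Sketch of the proposed rule: the held amount tracks used space so it
# always covers repairing 100% of the node's data. All figures are
# assumptions for illustration.
def required_held(stored_tb, repair_cost_per_tb=10.0):
    """Held amount needed to cover full repair of the node's data.
    $10/TB is only the repair-egress payout to SNOs; real repair costs
    (cloud egress, compute) would push this target much higher."""
    return stored_tb * repair_cost_per_tb

def monthly_withholding(earnings, held_so_far, stored_tb):
    """Withhold from this month's earnings until the target is met."""
    shortfall = max(0.0, required_held(stored_tb) - held_so_far)
    return min(earnings, shortfall)

# An 8TB node with $30 already held and $20 earned this month:
print(monthly_withholding(earnings=20.0, held_so_far=30.0, stored_tb=8.0))
# -> 20.0: the target is $80, so the entire month's pay is withheld
```

Note that under this rule a growing node keeps losing its whole paycheck to the held amount for a long time, which is exactly the objection raised further down the thread.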

It’s a loss incurred by the need to migrate.

Good to know it’s resilient to this kind of attack.

So, are you suggesting this scenario:

  1. Started with empty 10TB HDD, the held amount is 0%
  2. Have used 8TB after a year, with 75% of my earnings going into the held amount every month?

No deal.
I’d have no incentive to keep running my node as soon as the held amount starts to grow: I’d immediately start over from scratch.

In the current scheme I have an incentive to keep my node running as long as possible, because after 9 months I’ll keep 100% of my earnings, and after 15 months I’ll have half of the total held amount returned.
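For reference, the tapering schedule described above can be sketched as follows (hedged: check the official Storj documentation for the authoritative numbers; the 75/50/25 steps are the commonly cited schedule):

```python
# The current held-amount schedule as described in this thread: a
# percentage of each month's earnings is held, tapering to zero after
# month 9, with half of the accumulated held amount returned at month
# 15 and the rest kept until graceful exit.
def held_percent(node_age_months):
    """Percentage of the month's earnings that is withheld."""
    if node_age_months <= 3:
        return 75
    if node_age_months <= 6:
        return 50
    if node_age_months <= 9:
        return 25
    return 0

print([held_percent(m) for m in (1, 4, 8, 10)])  # -> [75, 50, 25, 0]
```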

@Alexey beat me to it, but I did the work, so I’m going to post it anyway.

Let’s assume a 10TB node:
  • minimum threshold: 29 (pieces needed to recreate a file)
  • repair threshold: 52 (pieces left when repair gets triggered)
  • success threshold: 80 (desired pieces after repair/upload)

28 lost pieces (80 − 52) are needed to trigger repair. We use this repair as an example.

  • The satellite downloads 29 pieces’ worth of data from nodes: 29 * 10TB * $10 = $2900
  • The satellite recreates all missing pieces: $? compute costs
  • The satellite uploads 28 repaired pieces at $80+ per TB: 28 * 10TB * $80 = $22400
  • If we assume $100 for compute costs, that’s $25400 total.
  • Your node was only one of the 28 that failed for this to happen. So $25400 / 28 ≈ $907.
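The same back-of-the-envelope calculation as a quick script (all rates here are this thread’s assumptions, not official figures):

```python
# Repair cost estimate for one failed 10TB node, assuming 28 nodes
# failed together and triggered repair. Rates are assumptions from
# this thread, not official numbers.
node_tb = 10
repair_egress_per_tb = 10  # paid to SNOs for serving pieces ($/TB)
cloud_egress_per_tb = 80   # assumed GCP egress rate ($/TB)
compute = 100              # assumed satellite compute cost ($)

download = 29 * node_tb * repair_egress_per_tb  # fetch 29 pieces' worth
upload = 28 * node_tb * cloud_egress_per_tb     # push 28 repaired pieces
total = download + upload + compute
per_failed_node = total / 28                    # share of one failed node
print(total, round(per_failed_node))            # -> 25400 907
```

Note that cloud egress dominates: the $80/TB upload leg is roughly eight times the cost of paying SNOs for the download leg.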

Your node can get to that 10TB level in about half a year at the current rate of ingress, and would have made about $125 during that time. Unless you’re OK with making absolutely no money for a long time, you really don’t want the system you’re suggesting.

At the current rate of ingress, that means it takes about 4 years of making no money until the money finally starts coming in (that is, if you have enough free space available). Interestingly, smaller nodes would need less collateral and would start making money sooner. So now we’ve introduced an incentive not to share too much space!?