Announcement: Changes to node payout rates as of December 1st 2023 (Open for comment)

I don’t think your average hard drive failure cares about held amounts. :laughing: Offline nodes also count against this. So does data rot on nodes. ISP outages which can cause multiple nodes to go offline. Uncertain geographical areas that might disappear suddenly, etc etc. With so much to protect against and a razor thin margin of nodes you could lose (I didn’t calculate it, but I’m guessing with a 1.2 expansion factor the actual percentage you could safely lose is in the very low single digits), 1.2 is just obviously impossible.

This does not seem to be a response to the line you quoted from me. But I agree, held amounts need to be higher and need to be related to amount stored.

What do you mean?

Can’t blame you for saying escrow. Storj used to use that term themselves before their legal team told them to change it because of the legal implications. And they were definitely native speakers. :laughing:

How so? Losing half your held amount and having no way to ever get it back provides no incentive to do better with your node at all.
In fact, being able to get the whole amount back provides a much larger incentive to run graceful exit and actually helps encourage better behavior.


In that case don’t show it. Say that my held amount is $40 and that’s it. Showing a higher amount which is then taken away as “fees” or whatever is misleading. Taking an extreme example you could say that my node earns $100/TB/month, $98 of which is held back and when I exit the network, $96 of that is taken as “repair fees”. Now, put that $100 value on the web site and you’ll have lots of angry new node operators.

Not by that much, and it is still counter-intuitive in that the amount of data goes up but the payment goes down. This could lead someone to think that maybe he should limit the node size. The only way this would make sense to me is if the held amount was taken from the earnings for the additional data, so that I don’t get paid less for a node with 11TB on it compared to when my node had 10TB. However, that would really complicate accounting.
Also, what would be the incentive for keeping a node operational for a long time vs continuously running GE and creating new nodes? Right now, if the node is older than 16 months, no more earnings are subtracted (and half of the previously subtracted earnings are returned). Under your rules, I could create a new node, wait for it to get some data (while making my old node get no additional data), then GE the old node to recover the held amount; that should get me more money than just keeping one node operational for a long time.

@IsThisOn @BrightSilence

May I interrupt you with the following?

Click to unroll (I don’t want to take up forum space, but I do want to reply publicly)

That’s a question for Storj Inc., not us SNOs.
But I think no. I absolutely don’t believe in Storj as a backup solution as its main job.

Because, as you noticed, reliability: it’s hard to predict whether it will still exist 2, 5, 10 years from now
if the network has only 23k nodes (and no rapid increase).
If it had 500k nodes?
Then it would be much easier to trust. But again, you won’t have that many if you pay SNOs too little, and you need small SNOs for that, 4-10TB HDDs in mass, to build a big network,
and small SNOs have nothing to look for here at the current pay rates.
And MAYBE all it takes is to lower the Reed-Solomon expansion factor from 2.75 to 2
and pay SNOs $2/TB for storage (instead of $1.5/TB)
to open the doors for network growth (SNO numbers, not only node counts),
while slashing egress prices at the same time = more customers, and you have the space and performance to welcome them!
Not like currently, where Storj tries to magically spawn SNOs in Africa, lol,
not at these payout rates, pals!
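
A quick back-of-the-envelope check of that trade-off (my own arithmetic, not Storj’s actual cost model): the storage payout per TB of customer data is roughly the expansion factor times the per-TB SNO rate.

```python
# Back-of-the-envelope sketch, assuming storage payouts are simply
# expansion_factor * SNO storage rate (my simplification, not Storj's books).

def network_storage_cost(expansion_factor: float, sno_rate_per_tb: float) -> float:
    """Approximate payout per TB of customer data stored."""
    return expansion_factor * sno_rate_per_tb

print(network_storage_cost(2.75, 1.5))  # current-ish: $4.125 per customer TB
print(network_storage_cost(2.0, 2.0))   # proposed:    $4.00  per customer TB
```

So on that simplified math, storage would cost the network about the same, while each SNO earns more per TB they actually hold.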

I think Storj’s future relies purely on entertainment content.
Video content.
Game content.
Apps content for mobile.

Or in other words, all that content, mass-served for the masses, whatever it turns out to be.
This is where the MONEY is, this is where the NEED lies.
None of this backup Bulls**t.

Where the price for 1TB of egress matters, whether it’s $6 or $4.
Critically matters.
So companies are interested in ever lower costs for content hosting.
Where a saving of $1/TB = tens of millions in savings at scale.

How much does Netflix pay for space and traffic?
How much does Youtube pay to host its videos?
How much does Rumble pay for its videos?
Do you have a solution for them?
You do, but you have to lower the price of traffic!
For that use case, such a high Reed-Solomon expansion isn’t even needed!

Well, I think the potential is there.
We sit right behind the ISPs, right?
So technically, it’s like the ISPs already have the HDDs (we, the SNOs, have them).
So why should Netflix fund new HDDs and take on all the costs (as they only buy brand new, right?)
if they can use what is already there? (If needed, SNOs can get some used ones;
thanks to the Storj network’s magic nobody loses any data,
even with some hypothetically high disk failure rate.)

They could rent what’s already there for as low as $3-4/TB/mo TOTAL; at $4 that’s $576 per year for a 12TB HDD. Theoretically they could buy a brand new 12TB SAS Ultrastar for $200-270, but is it really cheaper once you add all the costs? Electricity, maintenance, infrastructure? Those HDDs need other parts to work, entire computers… so the building blocks are there;
it’s a matter of what to build from them.
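
Just to spell out that arithmetic (hypothetical rates, same numbers as above):

```python
# Rent-vs-buy arithmetic from the paragraph above, using the hypothetical
# $4/TB/month rental rate and a 12TB drive.

def yearly_rent(tb: float, rate_per_tb_month: float) -> float:
    return tb * rate_per_tb_month * 12

print(yearly_rent(12, 4.0))  # $576/year to rent 12TB at $4/TB/month
# vs. roughly $200-270 up front for the drive itself, before electricity,
# maintenance, and the rest of the machine it has to live in.
```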

I stated that there could even be cost-free egress if you pay SNOs something like $2/TB for storage,
and on top of that +20%, +50% to Storj Inc.
But you have to watch whether SNOs behave according to the agreement. I know I would be super grateful to earn $2 per TB stored, and would be willing to give as much bandwidth as possible in return.
But in order to shield yourself from bad actors, you would need to establish requirements and enforce them, and add an additional requirement like a minimum win ratio, for example 60% daily,
to make sure nodes don’t cap their upstream drastically compared to other nodes.
Most people won’t bother cheating to be honest.

People are generous. IF YOU pay them well, they are grateful and willing to go the extra mile for you!

And if S3 is standing in the way of that, then get rid of it eventually!

I mean… you’re wrong. 1.2 isn’t even high on a local RAID array, where you are in control of the HDDs and they run in a highly trusted environment. And this time I did do the calculation. If you lose 10% of nodes in a 29/35 RS scheme (1.207), you’d lose roughly 5.5% of data. That’s a deathblow.
In fact if you want to ensure 11 9’s of durability, you can’t afford to lose more than 0.5% of nodes in that scheme. So I was wrong, it isn’t in the very low single digits. It doesn’t even get into the single digits.

Okay, sure. But how would you enforce that? Blocking customers from uploading smaller files?

Ehh, I see what you’re getting at now, but a few comments on that. The longer you hold a node, if it’s growing, the more money you lose in your suggested setup. And I’m inferring that you mean to prevent people stopping their node and starting over several times. There is already a massive penalty for that in losing all paid data. Compared to that, half the held amount is not going to matter. And I think if Storj would look into this, they’ll probably confirm that the data shows this doesn’t currently happen to begin with. The choice people make is, will I keep going or stop? Not will I keep going or start over? And for that first choice, there is no incentive from money you’ve already lost.

I mean, sometimes analogies don’t work. What we need is competitive pricing for native implementations. So just charge $4 for native egress and add a $4 additional charge for using edge services. Why not find out through differentiated price models whether customers find it worth it to switch? Get them on board with cheap S3 and low migration costs. Entice them from there to use native and save money. Best of both worlds. And you can start doing this now, even if some use cases don’t work with native only.
Or, since Storj also has to pay for edge services on ingress, charge $2 for edge services on both ingress and egress. Make the customer feel the cost difference. Perhaps it is possible for some customers to upload native but not download native; they could switch at least on the upload side. Right now the customer thinks S3 is free, so why not use it?
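
To make that concrete, a tiny sketch using the hypothetical prices from this post (not Storj’s actual price list):

```python
# Hypothetical differentiated egress pricing from the post above,
# not Storj's published rates.

NATIVE_PRICE = 4.0    # $/TB for native egress
EDGE_SURCHARGE = 4.0  # extra $/TB when going through the hosted S3 gateway

def egress_bill(tb: float, via_edge: bool) -> float:
    return tb * (NATIVE_PRICE + (EDGE_SURCHARGE if via_edge else 0.0))

print(egress_bill(10, via_edge=False))  # 10 TB native:          $40
print(egress_bill(10, via_edge=True))   # 10 TB via the gateway: $80
```

The point isn’t the exact numbers; it’s that the customer sees the cost difference and can decide whether the convenience of S3 is worth it.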

Yeah, Netflix and Youtube are such a different scale. And they can basically threaten ISPs with the promise of high peering cost and network congestion if they don’t host cache servers for them. It’s a different world and a scale that Storj can’t handle anyway. Think a little smaller. Maybe floatplane or curiosity stream? Services like that might work. But you still have the web integration issue.


Initial conditions:

  1. My node has 7TB, a $56 balance, and $56 ($28) held back.
  2. $0 in my pocket (let’s ignore the previous earnings).
  3. Node grows by 1TB every month.

Option 1 - I leave the node running:
Month 8 - 8TB, held back $64 ($32), I earn $16, balance $56+$16=$72, I get paid $8 ($8 in my pocket)
Month 9 - 9TB, held back $72 ($36), I earn $18, balance $64+$18=$82, I get paid $10 ($18 in my pocket).
Month 14 - 14TB, held back $112 ($56), I earn $28, balance $132, I get paid $20 ($98 in my pocket)
Month 15 - 15TB, held back $120 ($60), I earn $30, balance $142, I get paid $22 ($120 in my pocket)
Month 16 - 16TB, held back $128 ($64), I earn $32, balance $152, I get paid $24 ($144 in my pocket)

Option 2 - I start a new node, stop growth on the old.
Old node stays at 7TB and consistently makes $14/month, all of which I get paid.
New node starts at zero
Month 8 - 1TB, held back $8, earn $2, balance $2, paid $0+14 ($14 in my pocket)
Month 9 - 2TB, held back $16, earn $4, balance $6, paid $0+14 ($28 in my pocket)
Month 10 - 3TB, held back $24, earn $6, balance $12, paid $0+14 ($42 in my pocket)
Month 14 - 7TB, held back $56, earn $14, balance $56, paid $0+14 ($98 in my pocket)
Month 15 - 8TB, held back $64, earn $16, balance $72, paid $8+$14=$22 ($120 in my pocket)
Now I run GE on the old node, I get $28 ($148 in my pocket)
Month 16 - 9TB, held back $72, earn $18, balance $82, paid $10 ($158 in my pocket)
Month 17 - 10TB, held back $80, earn $20, balance $92, paid $12 ($170 in my pocket)

In the long term it would even itself out, but I would start a new node and get the held amount back from the old one. The idea would be to keep the held amount lower and get the money sooner.
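
For anyone who wants to poke at the numbers, here is a small sketch of the two options under the same assumptions as above: $2/TB/month earned, a held-back target of $8/TB stored, everything above the target paid out each month, and graceful exit returning half the held amount.

```python
# Sketch of the scenario above (assumed scheme: $2/TB/month earned, a held-back
# target of $8/TB stored, graceful exit returns half of the held amount).

RATE = 2.0         # $ earned per TB per month
HELD_PER_TB = 8.0  # $ held back per TB stored

def run_month(tb, balance):
    """Earn for the month, then pay out everything above the held-back target."""
    balance += tb * RATE
    paid = max(0.0, balance - tb * HELD_PER_TB)
    return balance - paid, paid

# Option 1: one node growing from 7TB by 1TB every month.
balance, pocket = 56.0, 0.0
for month in range(8, 17):
    balance, paid = run_month(month, balance)   # month 8 -> 8TB, month 9 -> 9TB, ...
    pocket += paid
print(f"Option 1, month 16: ${pocket:.0f} in my pocket")   # $144

# Option 2: old node frozen at 7TB ($14/month, fully paid out), new node grows
# 1TB/month, and the old node runs graceful exit at the end of month 15.
balance, pocket = 0.0, 0.0
for month in range(8, 18):
    balance, paid = run_month(month - 7, balance)  # new node: 1TB in month 8, ...
    if month <= 15:
        paid += 7 * RATE              # old node: $14/month, nothing withheld
    if month == 15:
        paid += 7 * HELD_PER_TB / 2   # graceful exit returns half of the $56 held
    pocket += paid
print(f"Option 2, month 17: ${pocket:.0f} in my pocket")   # $170
```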


Sure, which 1) isn’t high in that scenario; it could even be considered low redundancy by some. But more importantly, 2) while in a RAID array all data is spread over all disks, so losing 20% in your example still leaves the data fine, with Storj, segments are placed on a small random subset of nodes. Meaning if 10% goes offline, it’s theoretically possible that ALL of the pieces were stored on that 10%. With a 29/35 RS scheme, you only need at least 7 pieces stored on the failing 10%, and there is about a 5.5% chance of that happening for each individual segment.


Number of subnets and segments might be a little outdated. Didn’t feel like looking up the new numbers.

So yeah, Storj isn’t like RAID in this sense. They need more redundancy.
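
For reference, that ~5.5% figure can be roughly reproduced with a binomial model, treating each of the 35 piece placements as an independent 10% chance of landing on a lost node (a simplification of the real one-piece-per-subnet placement):

```python
# Binomial approximation of segment loss in a 29/35 RS scheme when 10% of
# nodes/subnets disappear (piece placements treated as independent).
from math import comb

n, k, p = 35, 29, 0.10  # pieces per segment, pieces needed, fraction of nodes lost

# A segment is lost if more than n - k = 6 of its pieces sat on lost nodes.
p_segment_lost = sum(comb(n, i) * p**i * (1 - p)**(n - i)
                     for i in range(n - k + 1, n + 1))
print(f"{p_segment_lost:.1%}")  # ~5.5%
```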

No need, here ya go.


That’s 11 9’s. Just 16 subnets offline would do it.

4m is a LOT bigger though. But uplink could pack files together. And if you do that without encryption, you could still use the feature of Storj to request data from a file with an offset and length to grab only the file you need from that pack. This could be automated on larger batch uploads, but wouldn’t help when files are uploaded one by one. I think there is something there that could be interesting though. Just needs some tweaking and massaging to turn into something useful that isn’t too wasteful for customers.
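
Purely to illustrate the packing idea (a hypothetical helper, not uplink’s actual API): bundle small files into one object, keep an index of offsets and lengths, and later fetch a single file with a ranged read of the packed object.

```python
# Hypothetical sketch of the packing idea, not uplink's actual API.

def pack(files: dict[str, bytes]):
    """Concatenate small files into one blob and record (offset, length) per file."""
    blob, index, offset = bytearray(), {}, 0
    for name, data in files.items():
        index[name] = (offset, len(data))
        blob += data
        offset += len(data)
    return bytes(blob), index

def read_one(blob: bytes, index: dict, name: str) -> bytes:
    """Fetch one file; in practice this slice would be an offset/length download."""
    offset, length = index[name]
    return blob[offset:offset + length]

packed, idx = pack({"a.txt": b"hello", "b.txt": b"world"})
assert read_one(packed, idx, "b.txt") == b"world"
```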

I mean, reiterating a broken analogy isn’t going to be helpful. Storj is clearly capable of providing S3 compatible services. They’re doing it. And I don’t think either of us can know whether they run it at a loss. But the fact that they said they won’t lower prices again any time soon makes me think that is unlikely. It sucks, but they need S3 for customer onboarding right now. So what’s the downside of offering both with differentiated pricing?

I like how you and I are demoing the conversations the industry had before this became standard practice. :smiley: Both sides sorely needed this solution, but for a long time they stared each other down, thinking the other side had to pay them for it. I think words like threaten, holding hostage and bribe were probably frequently uttered in those discussions. From both sides.

There are single node operators who run 16 /24 subnets. I don’t think we’re talking about the same thing. I run nodes on 6 /24 subnets myself.

If you’re trying to convince me with the analogy, what I think is what matters. :wink:

I mean, you responded to my comment, but I have no idea what you mean now. Storj is providing an S3 compatible service, is it not? So stop with the analogy and just point out what you mean without using an analogy, cause I’m clearly not following. As for how different the services are: under the hood, it is exactly the same. The only difference is a Storj hosted service in between that translates S3 to Storj native. So what makes you say that business model doesn’t work?

Right, so am I correct in inferring that you mean they take a loss on edge services? I don’t think we have the data to conclude that one way or the other. They will certainly not be paying list rates for GCP egress and I have no idea what their volume discounts look like. But since they clearly communicated that they won’t be lowering payouts more until market conditions change, I doubt they are currently taking a loss on unit economics, edge services included. If they are, there would be even more of a reason to differentiate prices and push customers to the profitable integration as soon as possible.

I’m not going to argue about the “other” line of the token report. It’s combining so much stuff that it’s entirely unclear how much of that would go to edge services. I’m not even sure edge services are part of that line, let alone what proportion of that amount goes to edge services. Yet you seem to assume all of that is edge services. I doubt it.

You could argue whether it’s net neutral for ISPs to host cache servers for some services and not for others. But from the point of view of ISPs, their networks were overloaded by Netflix and Youtube and those platforms made all the profit from it. They tried to make those platforms pay for access to their customers at some point too. ISP customers pay ISPs for internet… They pay Netflix for Netflix, but that business model dumped a lot of additional costs on ISPs.

This was an industry debate happening worldwide and it raged on for quite a while for a reason. It’s not a simple thing to solve. And now you have me defending the ISPs, playing devil’s advocate to show the other side. FYI, I don’t believe what I just said above, but I can understand their thinking is all.

:rofl: I’m with you on that. I even remember that T-Mobile US at some point decided to just compress video streams on their mobile networks. ISPs have gotten away with murder there by dividing up the market geographically like drug cartels would, to enforce monopoly rule. :wink:


This would be an illusion and it won’t work.

(Padding is silly, I won’t comment on that, why waste space for no reason?)

What would be inside those combined compressed blobs? The same user data as today. After you find the chunk in the filesystem you need to unpack it and then search for the user data.

Essentially you are cramming another, highly inefficient, filesystem into every chunk.
Because your ad-hoc filesystem cannot outperform the host filesystem, the end result will be a worse-performing node with extra steps.

All of that added design complexity, which definitely makes retrieving files slower for all customers, only to speed up the filewalker on misconfigured nodes, to benefit (in what way? The filewalker does not need to be fast in the first place; random data access needs to be fast) a few node operators who cannot be arsed to buy an SSD?


Don’t forget - “do not buy anything, use what you have”. Not many people have unused SSDs lying around and file servers can work without them.

That’s what happens when you don’t have competition between ISPs. Here, one ISP I know asked Google for a cache server (and got it), because the traffic from Youtube was getting a bit high. Oh, and if Youtube (or Netflix, or torrents) does not work well, customers just go to another ISP.


Right. $10 gets you an enterprise SSD on eBay. “Don’t buy anything” means don’t build servers or buy extra storage; $10 is noise. The alternative is not to host the node, because we are already approaching the IOPS limit of the drives. I can’t imagine how people tolerate the performance of a raw array even in a home setting. It’s abysmal for anything, including streaming media with Plex.

Everyone should go right now and buy an SSD for their array, even if they don’t plan to host a node. It would be the best $10 ever spent.

That’s where you are wrong. It goes to the MFT (at least in NTFS), which sometimes should be defragmented (with the free program UltraDefrag).

Easy: a 36€ program called PrimoCache.
Also, who knows?


We are not talking about sequential reads here. Cache does nothing for them, of course. We are talking about loading a library screen with thumbnails. If I see it gradually load, it’s too slow for me. Unacceptably slow. A lot of RAM will fix it by caching, but I also want the first load to be fast.

Those two SSDs I bought last week are going in as a special device for a Plex machine I’m putting together. Because friends don’t let friends tolerate latency in the user experience. I literally paid $20 for a human to not see a page load.

That’s a horrible marketing story. You don’t force users to change their usage. You make your system support that usage.

You will lose 99% of customers. Look at distribution of files on your node. That’s what customers need to store. Now you are making it orders of magnitude more expensive by rounding up to whatever random huge blob you decide.

And for what? To save $10 for a SNO that does not have a proper setup?

Nobody complains about storagenode performance at the current usage level except people who have not heard of IOPS. Everyone else has an SSD.

Well, it should not be. Who hosts file servers on Windows? And if you are referring to people running a storagenode as a side gig on their gaming master 9000: they are not paying for a Windows Server license, and you can’t run a node on a consumer one. Building a business based on your supplier violating the ToS is not a good idea.

On Windows, on the other hand, there are some storage tiering solutions too that accomplish the same thing. They are just not free. (Whatever was rebranded to StoreMI is one example.)

You can make that persistent and it will help with rendering of Plex library. I think it’s a big win.

Today I won’t even consider a storage device without ZFS (or a consumer device with Windows, for that matter). There is no upside to using anything else, only massive downsides.


Enmotus FuzeDrive, later known as AMD StoreMI, is better for this: it’s tiering as opposed to caching, so you don’t waste space, and it’s free for AMD board owners.


Yes, it may fit a small subset of SNOs. Even though I have a chipset for which it is free, it is limited in configuration and does not support RAM caching, or more than one active cache path.

PrimoCache uses 2-level caching (aka tiering), vs. one level with StoreMI.
Sorry, this time you are wrong.

Elaborate: which space?

I have three servers with hard drive arrays:
Server 1 - 12x3TB 5400RPM drives arranged as two 6-drive raidz2s - the pool is 83% full. I use it to store various files (usually large). No node here. The server has 24GB of RAM.
Server 2 - 6x4TB + 6x6TB 7200RPM drives arranged as two 6-drive raidz2s - the pool is 68% full, most of the data is the node (22TB). Here I use two SSDs as SLOG and the server has 192GB of RAM.
Server 3 - 2x1TB 5400RPM drives as a mirror, with two 240GB SSDs as SLOG and L2ARC; the server has 128GB of RAM. No node here.

Server 1 works great, because I mostly store large files as an archive (it also backs up my other servers). An SSD would not help it at all. I did not have a problem with its speed, even though the hard drives are slow and the CPUs are also slow (2xOpteron 270). Last time I backed up that server to tape I got over 100MB/s of read speed, though I was smart and put all small files into an archive first, before putting that archive and the large files to tape.
Server 2 needs SLOG because of the node.
Server 3 runs something else that does a lot of sync reads, that’s why it needs the L2ARC.

$10 is what you get for storing 6.6TB for a month or uploading 5TB to customers. Not a small amount with the new rates. My node got $35 last month, but that was with 20TB of data.

I never use thumbnails. If they are on by default, I immediately turn them off. I prefer the “details” view. I also don’t use Plex though, so maybe that works differently.

You know, “use what you have, don’t buy stuff”. This gets repeated quite often, so why are people surprised that others follow that advice?


Not at all.

Am I? Maybe it’s a language barrier.

Caching is distinct from tiering. It’s an entirely different beast.

Caching copies frequently used data onto another device, be that RAM or SSD or both, in levels. All data still lives on the HDD. If you have a 1TB HDD and a 100GB SSD, you have 1TB of available storage. Examples would be PrimoCache, Synology’s cache, and (L2)ARC on ZFS.

Tiering, on the other hand, moves and stores frequently used blocks on the SSD, and keeps infrequently used blocks on the HDD (being careful not to move sequentially accessed blocks, for example). If you have a 1TB HDD and a 100GB SSD, you will have 1.1TB of total storage. No caching is involved, and no “slow first access but fast subsequent accesses” business either. It’s fast every time. And you get to use the combined space. Nothing is wasted. Examples would be FuzeDrive/StoreMI, CoreStorage Fusion Drive on macOS, and (to some degree) the special virtual device on ZFS.
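
A minimal sketch of that difference in read paths, with hypothetical classes rather than any particular product’s implementation:

```python
# Hypothetical sketch of caching vs. tiering; not any product's implementation.

class Cache:
    """All data stays on the HDD; hot blocks get an extra copy on the SSD."""
    def __init__(self, hdd: dict, ssd: dict):
        self.hdd, self.ssd = hdd, ssd
    def read(self, block):
        if block in self.ssd:
            return self.ssd[block]   # hit: fast
        data = self.hdd[block]       # miss: slow first access
        self.ssd[block] = data       # keep a copy for next time
        return data
    # Usable capacity == HDD capacity; the SSD only ever holds copies.

class Tier:
    """Each block lives on exactly one device; hot blocks are moved to the SSD."""
    def __init__(self, hdd: dict, ssd: dict):
        self.hdd, self.ssd = hdd, ssd
    def read(self, block):
        if block in self.ssd:
            return self.ssd[block]   # fast every time once promoted
        data = self.hdd.pop(block)   # promote: move, don't copy
        self.ssd[block] = data
        return data
    # Usable capacity == HDD capacity + SSD capacity; nothing is duplicated.
```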

Usable space: you “waste” space equal to SSD size. Hope that is clear from above

Because it’s not a caching solution. It’s a tiered storage solution. Now you know.

Further reading: Hierarchical storage management - Wikipedia

Copying here to save a click (should have looked it up first before typing the whole thing…)

While tiering solutions and caching may look the same on the surface, the fundamental differences lie in the way the faster storage is utilized and the algorithms used to detect and accelerate frequently accessed data.[5]

Caching operates by making a copy of frequently accessed blocks of data, storing the copy in the faster storage device, and using this copy instead of the original data source on the slower, high capacity backend storage. Every time a storage read occurs, the caching software looks to see if a copy of this data already exists in the cache and uses that copy, if available. Otherwise, the data is read from the slower, high capacity storage.[5]

Tiering on the other hand operates very differently. Rather than making a copy of frequently accessed data into fast storage, tiering moves data across tiers, for example by relocating cold data to low cost, high capacity nearline storage devices.[6][5] The basic idea is, mission-critical and highly accessed or “hot” data is stored in an expensive medium such as SSD to take advantage of high I/O performance, while nearline or rarely accessed or “cold” data is stored in a nearline storage medium such as HDD and tapes, which are inexpensive.[7] Thus, the “data temperature” or activity level determines the primary storage hierarchy.[8]