You know - I hadn’t thought of that, but seriously it’s one of the reasons why the whole /24 thing is honestly VERY stupid.
The way some ISPs allocate addresses, that could well be on the other side of town, too.
I get the idea behind it, mind you… but it ends up in situations like this: I have ZERO way of knowing.
I mean, I -guess- I could do a portscan for 28967/28987 and see if anybody’s listening…
Could cause some unhappy feedback tho
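(For the curious: a check like that is only a handful of lines. Here’s a rough sketch in Python - purely illustrative, the /24 prefix below is a placeholder, 28967 is just the default node port, and actually pointing this at strangers’ hosts is exactly the kind of thing that earns unhappy feedback.)

```python
# Illustrative only: probe a /24 for listeners on the default storagenode port.
# The subnet below is a documentation placeholder, not a real network.
import socket

SUBNET = "203.0.113."   # hypothetical /24 prefix
PORT = 28967            # default storagenode external port

def is_listening(host: str, port: int, timeout: float = 0.5) -> bool:
    """Return True if a TCP connect to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

if __name__ == "__main__":
    # Worst case this takes a couple of minutes with the 0.5 s timeout.
    hits = [f"{SUBNET}{i}" for i in range(1, 255) if is_listening(f"{SUBNET}{i}", PORT)]
    print(f"{len(hits)} host(s) answering on port {PORT}: {hits}")
```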
So, that’s a funny question.
Yesterday I looked and almost crapped myself when I saw all the 0%’s… and then remembered I’d set logging to “info”.
Roughly 18h later:
$ ./successrate.sh
========== AUDIT ==============
Critically failed: 0
Critical Fail Rate: 0.000%
Recoverable failed: 0
Recoverable Fail Rate: 0.000%
Successful: 3485
Success Rate: 100.000%
========== DOWNLOAD ===========
Failed: 30
Fail Rate: 0.029%
Canceled: 206
Cancel Rate: 0.198%
Successful: 103651
Success Rate: 99.773%
========== UPLOAD =============
Rejected: 444
Acceptance Rate: 99.766%
---------- accepted -----------
Failed: 265
Fail Rate: 0.140%
Canceled: 117
Cancel Rate: 0.062%
Successful: 188687
Success Rate: 99.798%
========== REPAIR DOWNLOAD ====
Failed: 4
Fail Rate: 0.020%
Canceled: 0
Cancel Rate: 0.000%
Successful: 20481
Success Rate: 99.980%
========== REPAIR UPLOAD ======
Failed: 78
Fail Rate: 0.752%
Canceled: 4
Cancel Rate: 0.039%
Successful: 10290
Success Rate: 99.209%
========== DELETE =============
Failed: 0
Fail Rate: 0.000%
Successful: 0
Success Rate: 0.000%
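(For anyone wondering where those percentages come from: each rate is just a count divided by the total attempts in that category. A quick sanity check on the download numbers - this is a hypothetical re-calculation, not the actual successrate.sh:)

```python
# Hypothetical re-check of the percentages above: each rate is count / total attempts.
def rates(successful: int, failed: int, canceled: int = 0) -> dict:
    total = successful + failed + canceled
    return {
        "fail":    100 * failed / total,
        "cancel":  100 * canceled / total,
        "success": 100 * successful / total,
    }

# Download counts from the output above: 30 failed, 206 canceled, 103651 successful.
print(rates(103651, 30, 206))   # ~0.029% / ~0.198% / ~99.773%, matching the report
```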
HMMMMM.
You’re making it sound like I might be better off if I split up my disks into separate nodes…
Do you think that would work out better? Storage bandwidth is almost entirely idle so that isn’t a factor… but wouldn’t I basically be “racing with myself” then?
You can get that information from this website.
Pretty much. That’s why the generally prevailing advice is to not spin up a second node until the previous one is full.
Thanks!
Also, yeah looks like I’m alone.
Would that somehow get the second node different data (due to the first not being able to hold any more)? If so, that seems like it would work out better…
Data gets split more or less evenly between nodes per /24 that have free space. Unless you a) run into performance problems or b) are a few TBs away from being full, there’s no point splitting, because you can’t merge later and you’re stuck keeping n half-filled drives spinning.
It would be very stupid not to filter nodes at least by /24: without it you could easily execute a Sybil attack on your own, without any expense. So, thanks, but the filter will remain. However, we are improving node selection and are not relying only on this filter.
It’s still the same ISP. As soon as it has issues, all nodes in its subnets will disappear.
We want to have a lot of uncorrelated nodes: preferably different ISPs, different power supplies, different physical locations, and (optionally) different Node Operators. The lower the correlation, the higher the reliability of storing customer data.
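Roughly what that filter amounts to, as a sketch (this is not the actual satellite selection code, and the node IDs and IPs are made up): candidates are grouped by the first three octets of their public IP, and at most one node per /24 is picked, so many nodes behind one subnet only ever get one “slot” per selection.

```python
# Illustrative sketch of per-/24 node selection; not the real satellite code.
import random
from collections import defaultdict

def subnet24(ip: str) -> str:
    """Return the /24 prefix of an IPv4 address, e.g. '203.0.113.7' -> '203.0.113'."""
    return ".".join(ip.split(".")[:3])

def pick_nodes(candidates: dict[str, str], wanted: int) -> list[str]:
    """candidates: node_id -> public IP. At most one node per /24 is chosen."""
    by_subnet = defaultdict(list)
    for node_id, ip in candidates.items():
        by_subnet[subnet24(ip)].append(node_id)
    # One representative per subnet, then sample the requested number of subnets.
    reps = [random.choice(nodes) for nodes in by_subnet.values()]
    return random.sample(reps, min(wanted, len(reps)))

# Three nodes behind the same /24 count as one slot; the fourth is independent.
nodes = {"A": "203.0.113.10", "B": "203.0.113.20", "C": "203.0.113.30", "D": "198.51.100.5"}
print(pick_nodes(nodes, 2))
```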
Yes, it would be better to run separate nodes, each with its own unique identity and its own disk; that way you don’t need RAID and won’t waste space on redundancy. You wouldn’t be competing with yourself: nodes behind the same /24 subnet of public IPs act as one big node, a kind of RAID at the network level. If a disk fails, you lose only that node’s part of the common data, not everything, as you would with a RAID failure. This can also spread the load more evenly, because each node receives only a fraction of the common ingress, and the total load on the disks would be lower, since RAID usually puts extra load on the disks to handle redundancy.
However, this advice (not spinning up a second node until the previous one is full) is correct too. It applies when you want to start several new nodes: each node has to be vetted, and while it’s unvetted it receives only 1-3% of the ingress from customers. To be vetted on one satellite a node must pass 100 audits from it, but to pass audits the node needs data. Since all vetting nodes behind the same /24 subnet of public IPs share the same low amount of ingress, each gets less data, and the vetting process can be slower by roughly the same factor as the number of vetting nodes.
But once they are vetted, they share the same ingress and each receives a proportional share of the ingress for your subnet. This is why we usually recommend starting the next node only when the previous one is almost full, or at least vetted.
Usually it’s better to start the new node when the previous one is almost full: the new node will then likely finish vetting around the time the previous disk fills up, and can take over the whole ingress for the subnet. You can also reduce power consumption by spinning only the disks that hold nodes and powering off the empty ones.
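Back-of-the-envelope numbers for that (purely illustrative: the ingress figure is a made-up assumption, only the “1-3% while unvetted” and “100 audits per satellite” come from the explanation above):

```python
# Rough illustration of why vetting several new nodes behind one /24 at once is slower.
# The ingress figure is a made-up assumption; real numbers vary a lot.

UNVETTED_SHARE = 0.03           # unvetted nodes get roughly 1-3% of the customer ingress
SUBNET_INGRESS_GB_DAY = 50.0    # hypothetical total customer ingress for the /24, GB/day

for n_vetting in (1, 2, 4):
    per_node = UNVETTED_SHARE * SUBNET_INGRESS_GB_DAY / n_vetting
    print(f"{n_vetting} vetting node(s): ~{per_node:.2f} GB/day each "
          f"-> roughly {n_vetting}x longer to gather enough pieces for 100 audits")
```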
This still leaves one question unanswered:
Say I set up a single 14TB node (I wouldn’t, because it seems apparent that once they get a little over 6TB the data goes to trash at about the same rate it builds up) and wait until that’s full.
If that’s full, does the next node I spin up get different data at a similar rate of ingress? I thought the general understanding was that it intentionally would NOT get any data that’s different from the rest of the /24… do I have that twisted?
No, the nodes will continue to fill. The deletion of data was mostly brought on by the Salt Lake test data. The rest of the data is customer data, and what customers do with their data is variable; in “most” cases the customer retains their data, so your nodes will continue to fill. There is also repair traffic: when node owners leave or have issues, your node can gain their lost data. The struggle, historically, with large drives is the scan processes, such as the filewalker, that would tie up the drive for long periods of time. There have been a lot of improvements lately to many of these processes, and I expect that to continue as the engineers refine things to be more optimal.
…so I GE’d from Saltlake for that exact reason. It already wasn’t taking up much space (about 56GB or so), but that’s where the test data was, right?
This is ext4 on LVM, but the LVM PV is a bcache device backed by a RAID6 (and cached on mirrored SSDs). I do have this habit of building my stuff so that… by the time it dies I won’t care anymore lol
In practice the metadata gets accessed more often than anything else in the fs, which is why the caches are set up with LRU eviction (almost all the metadata - and therefore, the filewalker - gets fed from flash this way…)
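A toy illustration of why LRU eviction keeps the metadata hot (this has nothing to do with bcache’s actual internals, it’s just the general idea): keys that are touched constantly never age out, while blocks read once get evicted.

```python
# Toy LRU cache: a frequently touched "metadata" key survives, cold "data" keys get evicted.
from collections import OrderedDict

class LRUCache:
    def __init__(self, capacity: int):
        self.capacity = capacity
        self.items = OrderedDict()

    def touch(self, key):
        if key in self.items:
            self.items.move_to_end(key)         # mark as most recently used
        else:
            self.items[key] = True
            if len(self.items) > self.capacity:
                self.items.popitem(last=False)  # evict the least recently used entry

cache = LRUCache(capacity=4)
for i in range(100):
    cache.touch("metadata-block")   # hot: hit on every pass
    cache.touch(f"data-block-{i}")  # cold: touched once, then aged out
print(list(cache.items))            # the metadata block is still cached at the end
```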
Leaving Saltlake can limit your ability to earn when synthetic traffic is generated. Let’s say we need 100 PB of storage available for a future customer. In order to guarantee we have that much storage, Storj Labs will upload 100 PB of synthetic data to Saltlake. It will then keep that storage there until the customer begins to replace it. That could potentially be a long time. You would miss out on being paid for that data during that time since you are no longer accepting data from Saltlake. The recent test data was part of a test for a lot of simultaneous uploads with a relatively quick expiration time for when the data gets deleted. This was for a specific customer’s needs. Other customer data may simply need to reserve a large section of space, and by not being part of Saltlake, you would not be able to earn from the synthetic data guaranteeing that space availability. Of course, once the customer starts uploading, your nodes will earn from that. But, ultimately you’ll make less money overall if you remove Saltlake.
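To put a rough number on that missed income (both figures below are assumptions for illustration only; check the current payout terms for the real storage rate and your own held share):

```python
# Back-of-the-envelope: earnings a node forgoes by not holding reserved synthetic data.
# Both constants are illustrative assumptions, not official figures.

STORAGE_RATE_USD_PER_TB_MONTH = 1.50   # assumed storage payout rate
HELD_SYNTHETIC_TB = 4.0                # hypothetical synthetic data a node might hold

def missed_storage_pay(months: float) -> float:
    return HELD_SYNTHETIC_TB * STORAGE_RATE_USD_PER_TB_MONTH * months

print(f"~${missed_storage_pay(6):.2f} over six months of reserved synthetic data")
```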
The whole “I’m going to join the project to share my unused space and then will actively prevent the project from using that space” stance is highly irrational.
There are plenty of drawbacks to exiting the test satellite and none of the benefits. I’m not even trying anymore to understand what those folks are thinking.
Well. 100 PB. I hope that day will come.
We’re growing every day. Storj is growing every day, and more customers are being onboarded every day.
This project has never been about overnight rags to riches - it has always been about constant expected growth.
My thoughts are that I started years ago with ext4 on 5400 RPM USB drives and they filled up. Then I went to 5400 RPM SATA drives, and they filled up. Then larger 7200 RPM drives, and amazingly they filled up.
At some point the filewalker started going crazy and I added some SSDs to go to cached LVM, and that sorted it. After that the inconsistent usage-vs-used-space numbers did my head in, so I wiped my disks and started from scratch; that also went fine and they started to fill up. But now they aren’t doing much, so new err again.
You only needed to fix the issue with the filewalker or the databases instead of restarting from scratch, but well.