Updates on Test Data

I assume you have also considered only choosing the surge nodes for downloads when there aren’t enough regular nodes? Wherever you end up hosting the surge nodes, this would reduce your egress while maybe making SNOs a little bit happier (at least those who wouldn’t complain about their egress).

That’s just life…

1 Like

It sounds like a lot of SNOs need to put their money where their mouths are. Lots of talk of “when I fill… I’ll expand”, so we’ll see if that’s true. Because that’s over 10% of online nodes: if they stop accepting ingress it’s definitely significant.

Many of us are wary of investing in resources for test data…

6 Likes

So far my nodes haven’t even gotten back to the pre-deletion level. Storj doesn’t want my nodes now :person_shrugging:

6 Likes

Not full yet, so not expanding yet :smile:

As a reminder, if the uploaded test data has a 30-day TTL, it’s almost time for it to be deleted. We’ll see how that works out as well, and go from there.

We can also see it from the flip side: after a month of pushing everything to the limits (and sometimes beyond), 10% of the nodes became full.

4 Likes

Do you mean the uptime on the dashboard? If it’s reset, then the node was probably restarted. Please check your logs (both for storagenode and storagenode-updater) to see why it restarted. The last message before a restart should have some explanation (unless you had a momentary power cut… but in that case dmesg or journalctl should show something).
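If it helps, here’s a rough way to pull out the last log lines before each restart (a minimal sketch in Python; the log path and the restart marker string are assumptions about a typical setup, adjust them to yours):

```python
# Minimal sketch: print the lines logged just before each detected restart.
# Assumptions: logs are redirected to a file, and a fresh start is recognizable
# by some marker string. Adjust LOG_PATH and START_MARKER for your setup.
from collections import deque

LOG_PATH = "/mnt/storj/node/node.log"   # hypothetical path
START_MARKER = "Node started"           # hypothetical marker for a fresh start
CONTEXT = 5                             # lines of context to keep before a restart

recent = deque(maxlen=CONTEXT)
with open(LOG_PATH, errors="replace") as f:
    for line in f:
        if START_MARKER in line and recent:
            print("---- lines before restart ----")
            for old in recent:
                print(old.rstrip())
        recent.append(line)
```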

Referring back to this, I have now started seeing a large amount of the test data get garbage collected:

This might give insights into the issue you were diagnosing with the dashboard. To my untrained eye, it looks like the uploaded TTL data is not being correctly registered as “active” data, and is therefore being removed by GC before the TTL kicks in.
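For context on why that would happen: as I understand it, GC works by the satellite sending the node a bloom filter built from the pieces it considers active, and the node trashing whatever isn’t matched by the filter. A toy sketch of that idea (not the actual storagenode code; the piece IDs and filter are made up) shows how an unregistered TTL piece would get collected:

```python
# Toy illustration of bloom-filter-based GC (not the real storagenode implementation).
# If recently uploaded TTL pieces are missing from the satellite's filter,
# the node moves them to trash even though their TTL hasn't expired yet.
import hashlib

class TinyBloom:
    def __init__(self, size_bits=1024, hashes=3):
        self.size = size_bits
        self.hashes = hashes
        self.bits = 0

    def _positions(self, piece_id: str):
        for i in range(self.hashes):
            digest = hashlib.sha256(f"{i}:{piece_id}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.size

    def add(self, piece_id: str):
        for pos in self._positions(piece_id):
            self.bits |= 1 << pos

    def maybe_contains(self, piece_id: str) -> bool:
        return all((self.bits >> pos) & 1 for pos in self._positions(piece_id))

# The satellite builds the filter from pieces it has registered as "active".
registered = ["piece-A", "piece-B"]
bloom = TinyBloom()
for pid in registered:
    bloom.add(pid)

# The node checks everything it stores; pieces absent from the filter go to trash.
stored = ["piece-A", "piece-B", "ttl-piece-C"]   # ttl-piece-C was never registered
to_trash = [pid for pid in stored if not bloom.maybe_contains(pid)]
print(to_trash)  # expected: ['ttl-piece-C'] -> trashed before its TTL expires
```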

1 Like

One of my nodes just moved like half of its data to trash. Seems to be data from slc sat. :thinking:

The dashboard makes me think this data was never correctly registered by the satellite and will not be paid. :roll_eyes:

I will probably expand by adding 20-30 nodes in the near future. Nothing close to 1000 of course, but I think other people will add capacity as well. It takes a little bit of time though.

Right now I’m just trimming away some free space on existing nodes.

1 Like

Since a lot of data on my node has been deleted again, I don’t need to create a new node for now. I have plenty of space again.

2 Likes

Data deletion will start on Sunday.

I still don’t understand why the dashboard cannot provide reliable data. As a customer, it is good to have accurate data, and in my opinion we node operators are the same as paying customers.
I really don’t want to complain, but the stats from the satellite are wrong almost every month, the public status page is inconsistent, and so on. Why is that? How do we know the satellite has the right data? And why is it sending wrong (or no) data to the storage nodes if it knows the real values?

I really like the project and will continue to support it, but I wasn’t able to find clarification for this. All I could find was “the satellite knows the right data and you will get paid the right amount”.

5 Likes

SNOs are service providers: we don’t pay Storj like customers, we get paid. And payouts have been correct month after month. If we don’t feel we’re getting paid adequately, we stop providing our service (and find another project).

5 Likes

I have the same problem: 71 TB in trash and 166 TB used. Ext4, 2 nodes on a single HDD (I know that in the “new Storj reality” we shouldn’t run more than 1 node per HDD, but we’ve been here for years and we have what we have). It’s been a while since my nodes last completed GC for the US satellite, which is why I have so much data that should have been removed.

What I really don’t understand is why the devs took until yesterday to roll out version 1.105.4 to Docker (this version fixes a critical bug where retain files were removed when the node restarted). It is obvious that many SNOs restarted their nodes while the tests were running, to adapt to the increased data flow, and lost retain files because of the bug. If someone like me has a lot of uncollected garbage (counted in used space, not in the trash directory) and hasn’t manually updated to 1.105.4 yet (I did a couple of weeks ago), then it will be another month before their nodes clean themselves up. If I understood @littleskunk correctly, this may decrease network throughput.
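If anyone wants to check whether a pending bloom filter survived a restart, something like this lists what’s left in the retain directory (a rough sketch; the path is an assumption about a typical setup, adjust it to your own):

```python
# Minimal sketch: list saved bloom filters ("retain" files) and their timestamps,
# to see whether pending GC work survived a restart.
# The directory location is an assumption about a typical setup.
import os, time

RETAIN_DIR = "/mnt/storj/node/config/retain"   # hypothetical path

if not os.path.isdir(RETAIN_DIR):
    print("no retain directory found at the assumed path")
else:
    entries = sorted(os.listdir(RETAIN_DIR))
    if not entries:
        print("retain directory is empty: no pending GC work saved")
    for name in entries:
        mtime = os.path.getmtime(os.path.join(RETAIN_DIR, name))
        print(name, "saved at", time.strftime("%Y-%m-%d %H:%M", time.localtime(mtime)))
```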

The satellite payout data is actually quite useful, especially now that we are being asked to bring new capacity online.
It is probably the only reliable source for income predictions, which in turn let SNOs make purchasing decisions.
Me, for example: I’m now flying totally blind, because calculations predicting income based solely on used space are quite useless, especially during GC issues, where unpaid data might sit on nodes untrashed for months, skewing the income predictions quite dramatically.
Now there is another unknown with the Saltlake test data: how much of it is actually being accounted for?
So I guess making this data reliable should be one of the priorities.
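A quick back-of-the-envelope of what I mean (the rate and the numbers are placeholders, not official figures; the point is just that the estimate is only as good as the “paid used space” input):

```python
# Back-of-the-envelope income estimate from used space (placeholder values, not official).
# The problem: "used" on disk includes uncollected garbage, which is not paid,
# so an estimate based on raw disk usage can be far off.
STORAGE_RATE_PER_TB_MONTH = 1.5   # assumed $/TB-month, placeholder value

disk_used_tb = 20.0               # what the node reports as used
uncollected_garbage_tb = 6.0      # data waiting for GC, unpaid
paid_tb = disk_used_tb - uncollected_garbage_tb

naive_estimate = disk_used_tb * STORAGE_RATE_PER_TB_MONTH
realistic_estimate = paid_tb * STORAGE_RATE_PER_TB_MONTH
print(f"naive: ${naive_estimate:.2f}/month, realistic: ${realistic_estimate:.2f}/month")
```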

I like the project, and some criticism isn’t bad. I just wanted to understand the things I mentioned. Maybe it’s a fault on my end that I don’t quite understand it.
I will continue to support the project. Maybe it sounded more aggressive than it should have. Sorry if that was the case.

3 Likes

Maybe you should slow down a bit in defending every developer’s failure. This is getting ridiculous. Stop being a fanatic.

2 Likes

It’s actually the Old Way of Doing Things™, see https://www.storj.io/legal/supplier-terms-conditions:

4.1.4.1. Have a minimum of one (1) hard drive and one (1) processor core dedicated to each Storage Node;

The last GCs from all the satellites were a couple of weeks back. Your GCs can’t finish because you are overloading your nodes, against all advice not to.

Since we recently had successful GCs on all satellites, and the bloom filters have been increased in size because larger nodes couldn’t keep up with the deletes (files were missed because the blooms were too small), I don’t think we’ll see “months” of untrashed data again. A week, sure, but that’s nothing in the grand scheme of things.
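For anyone wondering why an undersized bloom filter causes missed deletes: a bloom filter can only produce false positives (“keep” for a piece that should be deleted), and that rate climbs quickly once the filter is too small for the number of pieces on the node. A rough calculation with the standard formula (the piece counts and filter sizes below are made up, just to show the effect):

```python
# Approximate bloom filter false-positive rate: p = (1 - e^(-k*n/m))^k
# n = pieces the satellite wants kept, m = filter size in bits, k = hash count.
# A false positive here means "keep" for a piece that should have been deleted,
# i.e. garbage that survives this GC round. The numbers below are made up.
import math

def false_positive_rate(n_pieces: int, m_bits: int, k_hashes: int) -> float:
    return (1 - math.exp(-k_hashes * n_pieces / m_bits)) ** k_hashes

small_filter = 4 * 8 * 1024**2    # 4 MiB filter, in bits
large_filter = 16 * 8 * 1024**2   # 16 MiB filter, in bits
n = 20_000_000                    # pieces on a large node (made-up number)
k = 7

print(f"4 MiB filter:  {false_positive_rate(n, small_filter, k):.1%} of garbage kept")
print(f"16 MiB filter: {false_positive_rate(n, large_filter, k):.1%} of garbage kept")
```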

All, except a single 10x64MB test.

2 Likes

My GCs can’t finish because:

  1. We didn’t have the “save-state-resume feature for GC filewalker” until May (April?) 2024.
  2. The “save-state-resume feature for GC filewalker” actually wasn’t working until version 1.105.4, because of a bug that removed retain files on restart.

I am pretty sure that my nodes will be okay now that these bugs have been fixed.

I wouldn’t say it’s a developer failure. I would more likely say it didn’t have that priority. They did a great job increasing performance and so on. But the metrics weren’t fixed for like 2 (or 3) months, and the problem with wrong data is not unknown.

2 Likes