Updates on Test Data

Beyond Expectations: The significance of Storj’s Veeam Ready status

Does this have anything to do with the testing data or is it a separate use case?

We can't share anything about customers unless agreements are reached with them to do so.

We have various customers in various stages of onboarding currently. Which one starts first, and by how much, depends on their internal timing, not ours, but we are pausing testing to monitor the platform as they come online. Decisions about restarting testing will come after these milestones have been met.

21 Likes

I think the massive emptying of data you are seeing is what many operators have seen (you are not alone), for three main reasons:

  • Storj had been flooding nodes with test data
  • this test data has been reduced over the last week or so, resulting in deletions exceeding new ingress (even with everything working smoothly), so data on nodes will shrink a lot unless test or real data ingress increases.
  • things were NOT working smoothly for a couple of months, and many terabytes that were actually trash were not being listed as trash or cleaned up.

So as others have said, I don’t think being in Eastern Europe is hurting you geographically. “Distant” nodes seem to be able to win plenty of races.

Now, you are having electricity problems, which are well documented. I can see how it may not be worth it to maintain the uptime (the effort and expense of massive battery backup and generators may exceed the gain). It could still be very reasonable to decide not to operate your nodes.

3 Likes

Or could it be that nowadays enough homes have fiber?
Something like L.A. to Berlin in 60 ms, better than L.A. to L.A. on copper.

Besides, I insist on delaying test data uploads if any are planned; my nodes are full of trash, and I hope most of it will be gone in 5 days.

As I understand it, it is up to the client when to begin and how much to upload.

1 Like

Do we have a hunch whether new customer onboarding data will resemble the test data in “flow”? From a node operator perspective:

  • high ingress
  • low egress
  • lotta files
  • TTL deletes

If so, that will be useful so we can be ready from a BOHICA* perspective.

* Bend Over Here It Comes Again

1 Like

I think the original post of this thread details expectations. I understand why you ask, but each customer has different needs. We will know more as they onboard.

1 Like

It depends. My nodes are located in St. Petersburg, so @vovannovig has nodes much closer to the nearest network hub and should have many more wins than mine.
However, right now most of the used space (more than 80%) on my nodes is used by US1 customers.

I don’t think so; customers cannot do it without an explicit code modification. We support geofencing, however, it’s limited to certain regions, and Ukraine is part of the European geofence as far as I know. Some customers request excluding Russia, or including only the EU for GDPR reasons, but I’m not aware of anyone who requests excluding Ukraine.

I believe that may be the culprit; the node selector prefers nodes with a high reputation. What is the reputation of these nodes? Are there any scores below 100%?
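
If it's easier than opening the dashboard, the scores can also be pulled from the node's local API. A minimal sketch, assuming the default dashboard port 14002 and that you run it on the node host (adjust the host and port for your setup):

# Query the storagenode dashboard API and dump the per-satellite data
# (audit/suspension/online scores are expected in the response) as JSON
Invoke-RestMethod -Uri 'http://localhost:14002/api/sno/satellites' | ConvertTo-Json -Depth 5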

Suspension: 100%.
But online is, of course, below 100%.
Ensuring 100% is almost impossible in these conditions.

For testing, I launched a node on an NVMe drive; it still logs quite a lot of “manager closed: unexpected EOF” errors.
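
For reference, a quick sketch to count how often that error shows up, assuming the node logs to a storagenode.log file (adjust the path for your setup):

# Count "manager closed: unexpected EOF" entries in the node log
(Select-String -Path .\storagenode.log -Pattern 'manager closed: unexpected EOF').Count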

Yes, I can understand, but this is what I can assume from the situation. I believe that your nodes should have much less latency than mine; you should be close to the network hub (at least I think so…).

This is quite normal, I have them as well. Right now, after the cleanup, my nodes are half full. So, I would expect traffic from these customers soon.
Please, do not give up!

Thank you.
I’m trying very hard to optimize everything I can and hope that the nodes will start filling up and storing more and more data.
Waiting.

1 Like

No real update yet. I already had the feeling nothing was going to happen this week. Monday is a holiday, so maybe it continues on Tuesday next week.

5 Likes

Being already accustomed to more persistent user data, with growth of about 250 GB/day (which is expected to increase after the weekend, when all the trash has been deleted), I wouldn’t mind if it stayed this way :slight_smile:

2 Likes

I just want to say, I had something like 50% of data that wasn’t marked as trash and was just sitting there unpaid. Good that you fixed this.

1 Like

After a few days at around -10 TB/day last week, I am still at around -1.5 TB/day, so a few things could change :slight_smile: Now I hold less data than I was paid for last month, so it’s not only unpaid trash being deleted for me. But I prefer customer data, which is growing.

@storjlings Are all the “hacked and free accounts” data from EU and US deleted by now?

It doesn’t matter. Each satellite has a deletion queue of abandoned or failed-to-pay accounts, and they are deleted all the time; nobody can say when they will all be deleted, because the queue is not empty as far as I know. Some customers pay their invoices and their accounts are removed from that queue, so it’s almost like with customer data: it’s not predictable when it depends on human behavior.

Also, not only Saltlake has TTL data; on my nodes I see a lot of such data from US1 customers as well.

Please tell me what script you used to watch this?

====
I have old tools and they show quite good results, but this node is actually very loaded: there are quite large delays in the file system and a queue for processing by the bloom filter (deletions).

pwsh .\successrate.ps1 -Path storagenode.log
========== AUDIT =============
Successful:             46
Recoverable failed:     0
Unrecoverable failed:   0
Success Min:            100%
Success Max:            100%
========== DOWNLOAD ==========
Successful:             748
Failed:                 3
Success Rate:           99.6005326231691
========== UPLOAD ============
Successful:             6056
Rejected:               0
Failed:                 107
Acceptance Rate:        100
Success Rate:           98.2638325490832
========== REPAIR DOWNLOAD ===
Successful:             179
Failed:                 0
Success Rate:           100
========== REPAIR UPLOAD =====
Successful:             179
Failed:                 38
Success Rate:           97.2343522561863

If you use the same tool, it looks quite strange: the 98-99% numbers suggest that everything is fine, but if you look at the delays (more than 200 ms) and the queue lengths (more than 3.4), the node actually requires attention and optimization.

If even such a node can handle the load (yes, I understand that it is small and there is almost no test traffic now), then my other nodes are in great condition))) But then why are there discrepancies in occupied space, long scan times, a bloom filter processing queue, and so on…

What tool are we talking about?
I mean those delays and queues; how do you get those insights?

I use Windows; the delays are visible in Task Manager, and the queue length in Resource Monitor.
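
If you’d rather script it, the same numbers can be sampled from the Windows performance counters. A minimal sketch for an English-language Windows install (counter names are localized on other system languages):

# Sample per-disk latency (Avg. Disk sec/Transfer) and the current disk queue length
# every 5 seconds, 12 times (about a minute of data)
Get-Counter -Counter '\PhysicalDisk(*)\Avg. Disk sec/Transfer','\PhysicalDisk(*)\Current Disk Queue Length' -SampleInterval 5 -MaxSamples 12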

Very little traffic; after these large-scale deletions I really want to fill it back up.

Please tell me, will the download resume?