Updates on Test Data

I do both. SMART gives me a better picture of individual drive health, and not all HDDs are in an array.

1 Like

Windows node crashed with a stack trace.

2024-05-21T01:18:10Z	ERROR	services	unexpected shutdown of a runner	{"Process": "storagenode", "name": "piecestore:monitor", "error": "piecestore monitor: timed out after 1m0s while verifying readability of storage directory", "errorVerbose": "piecestore monitor: timed out after 1m0s while verifying readability of storage directory\n\tstorj.io/storj/storagenode/monitor.(*Service).Run.func1.1:154\n\tstorj.io/common/sync2.(*Cycle).Run:160\n\tstorj.io/storj/storagenode/monitor.(*Service).Run.func1:143\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:78"}
2024-05-21T01:18:11Z	FATAL	Unrecoverable error	{"Process": "storagenode", "error": "manager closed: closed: read tcp 127.0.0.1:49212->127.0.0.1:7778: read: connection reset by peer", "errorVerbose": "manager closed: closed: read tcp 127.0.0.1:49212->127.0.0.1:7778: read: connection reset by peer\n\tstorj.io/drpc/drpcmanager.(*Manager).manageReader:232"}

Part 1 of log :point_up:

Part 2 :point_down:

The above file is just a single line of stack trace with 1.45 million characters. Discourse has a per-post limit of 32k characters.

1 Like

Coolio… Getting near my own saturation point… WTF?

1 Like

A little note from my end. Some of my nodes are beginning to fill up, but I’m a little hesitant to upgrade space now, since I still have 27TB of trash, which would free up plenty of space for further testing. Are you expecting to fill more than what was removed? If so, I’ll go shopping now, but I’d need some assurances first.

3 Likes

Interesting, isn’t it?
I suppose it boils down to your disposable income and how much of a drama it would be for you to have another spun-up node with little to no ingress for a while.

Personally, I’m waiting for all the trash data to be finally purged from my disks to get a better sense of what’s going on before going shopping.

2 Likes

Personally, I’ll still expand, just not at the rate I was, until we figure out what future usage looks like.

Yeah, on the one hand I don’t want to miss out on the test data. On the other, I don’t want to waste money if I don’t have to. But to be fair, I’ve made more than enough since the last expansion to pay for another purchase. So it’ll depend on what I should expect. Also, due to the short TTL, I wouldn’t be missing out on permanent storage anyway. So I’m just pondering my options.

Thanks @Mitsos . I guess I did read that but forgot about it. That gives some guidance at least.

3 Likes

Well, I currently have 8.5TB in trash on one of my machines. That, along with the free space on those nodes, will be MORE than enough to accommodate any spikes for the foreseeable future.

1 Like

Well… perhaps I’m just a bit of a doubter but… well… growth historically hasn’t really been massive and this recent frenzy is all on the promise of a new client which may or may not come to pass (fingers crossed, by the way!).
The history of the last couple of years suggests that reality will be less optimistic than the plans.

So long as the network currently has the performance and capacity to onboard big customers I see no reason to expand.
I remain somewhat sceptical but following developments with great interest.

PS: I hope that didn’t sound too pessimistic. I do believe in the project and have nothing but respect for all the Storj team.

4 Likes

What kind of router do you have to manage 7.5 Gbit (PPPoE?)?

I agree. You’ve just perfectly countered my FOMO @ACarneiro :smiley:

5 Likes

You should probably check the interface bandwidth figures, as they are currently broken and show roughly ten times the actual interface bandwidth.

I have so much trash on my nodes for multiple reasons too and I would like to see that purged first.

6 Likes

Oh boy. I can’t keep up with all the messages in this thread.

None of these deals are signed yet. → Let’s wait for the customers before making any decisions.

Also, don’t forget the impact of the TTL. Let’s take a TTL of 4 weeks or so. At first you will see an amazing growth rate and might order new hard drives, only to find out that after 4 weeks there is almost no growth on your node. → We need to adapt our estimations and take the TTL into account. I don’t know how yet. It is an SQLite DB, so a script could run the math for us.
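For a rough feel of what the TTL does to growth, here is a back-of-the-envelope sketch. The ingress rate and TTL below are made-up illustrative numbers, not measurements: with a constant ingress rate R and a fixed TTL T, stored TTL data stops growing once it reaches roughly R × T, because from then on each day’s expirations match each day’s ingress.

```python
# Back-of-the-envelope sketch: steady-state size of TTL-only data.
# The ingress rate and TTL below are assumed, illustrative values.
ingress_tb_per_day = 0.5   # constant ingress of TTL data (TB/day), assumed
ttl_days = 28              # TTL of ~4 weeks

stored = 0.0
for day in range(1, 61):
    stored += ingress_tb_per_day        # new pieces land every day
    if day > ttl_days:
        stored -= ingress_tb_per_day    # pieces uploaded `ttl_days` ago expire
# after `ttl_days` days the node plateaus at ingress * TTL
print(f"plateau ≈ {ingress_tb_per_day * ttl_days} TB")  # 14.0 TB
```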

I believe we ruled that out yesterday. If we put load on 2 satellites, we still hit the same throughput, so it can’t be the satellite. It looks more like a storage node limitation. We could try a different node selection. We already designed a new node selection based on upload success rate that could be useful here. Parts of it are already implemented, but it would take some more effort to finish and also test it. This sounds like it would be further down on our priority list, and we will try other methods first.

One method that is easy to apply is a different RS setting. I guess that will be one of the tests for today.
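For a rough sense of why the RS setting and segment size matter for node-side throughput, here is a sketch with assumed numbers, not the settings actually being tested: a segment of size S is encoded into n pieces, any k of which reconstruct it, so each node stores a piece of size S/k and the network-wide upload expansion is n/k. Fewer, larger pieces per segment mean fewer connections and less per-piece overhead on each node.

```python
# Illustrative only: how Reed-Solomon parameters shape per-node load.
# k = pieces needed to reconstruct, n = pieces actually uploaded.
# All numbers below are assumptions for the example, not the test settings.
segment_mib = 64                          # assumed segment size
for k, n in [(29, 80), (16, 38)]:         # hypothetical RS choices
    piece_mib = segment_mib / k           # each node stores one piece of this size
    expansion = n / k                     # upload overhead per segment
    print(f"k={k} n={n}: piece ≈ {piece_mib:.2f} MiB, expansion ≈ {expansion:.2f}x")
```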

8 Likes

Just wanted to bring this to your attention in case it helps.

From my limited understanding, this is a bit of a worry, isn’t it?
Any idea if nodes are being held back by IOPS or bandwidth or any other “non-tuneable” constraints?

With the few tests we executed yesterday, it looked more like a bandwidth limitation than an IOPS limitation. But don’t worry, it is tuneable for sure. We still have plenty of ideas for what to try next.

7 Likes

That’s very interesting. I think many of us “pleb SNOs” still have loads of available domestic bandwidth that didn’t get used.
In my particular case, my home nodes are behind 600Mbit upstream and I didn’t go higher than around 40 during yesterday’s test (despite the Success Rate script claiming over 98% successful transfers). My nodes in other locations and countries saw similarly low upstream usage.

Fair enough. I’ll pull my finger off the buy trigger. :wink:

Yeah, but the piece size isn’t in the DB, so you don’t get much further than counting the number of pieces about to expire per day.
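As a starting point, something like this would at least give the per-day expiry counts. It is only a sketch: the database path, table name (piece_expirations) and column name (piece_expiration) are my assumptions, so check your own node’s schema (e.g. with `.schema` in the sqlite3 shell) before trusting the output.

```python
# Rough sketch: count pieces expiring per day from the node's piece_expiration.db.
# Table/column names are assumptions -- verify against your node's schema first.
import sqlite3

DB_PATH = "/path/to/storage/piece_expiration.db"  # adjust to your node's storage dir

with sqlite3.connect(DB_PATH) as con:
    rows = con.execute(
        """
        SELECT date(piece_expiration) AS day, count(*) AS pieces
        FROM piece_expirations
        WHERE piece_expiration >= datetime('now')
        GROUP BY day
        ORDER BY day
        """
    ).fetchall()

for day, pieces in rows:
    print(day, pieces)  # only counts: as noted above, the piece size isn't stored here
```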

Really curious about what this would yield. I’ve been looking forward to seeing some changes in that. Segment size might also be a thing to look at, though that would only help if you expect files to be big.

Yeah, it’s peaking at about 9% of what I have available, plenty of room for more.