Updates on Test Data

They do. Otherwise they wouldn’t know which node to direct downloads to.

Not typically, no.

I used to run arrays and guess what… everyone here at the time said not to, especially if I wanted to run multiple nodes. I did this primarily because at the time it was the ‘what I had’ setup before moving to dedicated systems. Back then everyone was arguing against arrays, and now we’re arguing for them? I think the reality is nobody really knows a good way of running Storj, because no matter how you do it you still have very high IO.

Edit: I ultimately decided to stay away from arrays so as not to waste capacity on redundancy that’s not needed, and to keep nodes at manageable sizes in case they need to be moved for whatever reason. Although I had considered going back to arrays at some point if this all keeps growing, and then only running maybe 2 nodes per system. Either way, switching back or slowly transitioning to arrays would require significant investment now, so I’ll wait and see how things play out with the overall network.

Since when? Why are all my drives pegged at 100% for hours after a restart unless filewalker is turned off? Why does everyone else seem to have the same issues? Why did we get the option to turn filewalker off in the first place? It sure as hell does reach the drives.

If you’re going to argue this point with me, don’t start with ‘it should’ and compare it with a single drive running on a Pi, simply because you’re thinking in terms of raw processing power. Try connecting nine 18TB drives to that potato and see if it keeps up… nine FULL drives. Maybe that’s not fair… try it on whatever you want. Although nine does actually seem to be OK on a single controller, much over that isn’t. Depends on the controller.

However… I have a feeling we may be misunderstanding each other here. You suggest my problem might be the controller… my point is that it IS the controller and I already know that. I’m not here trying to figure out my problem, I’ve already identified it. I’m trying to point that out. Once you go above a certain threshold, the controller becomes a bottleneck (not really ‘the problem’). The point at which this happens will vary depending on hardware, your overall capacity, number of nodes running, disks, bandwidth (network load / number of IPs if that’s your thing), and whether the filewalker is on or off, garbage collection is running, etc. You can only max out the IO on so many drives at a time before maxing out the controller.

1 Like

I saw that misguided advice too and I ignored it. (And I’m not changing anything, let alone array layout, for storagenode. Storagenode is second class citizen on my server. It gets leftovers)

Because you don’t have enough RAM for metadata, and metadata lives on the disks.

If you had (even non-persistent) cache, or enough RAM, the metadata would get prefetched and cached very quickly and IOPS would plummet.

On my array, when the node was 10TB, the filewalker would take a few minutes to complete. I would see a 30,000-40,000 IOPS peak on the SSD device for a few seconds, and that would decrease exponentially toward zero over those few minutes as the metadata got cached in RAM.

I had a persistent ZFS ARC cache device. Later I switched to using a special device for metadata, and the filewalker just reads from SSDs.

1 Like

512 GB RAM is not enough? And if the metadata is on the disk and I restart (when the filewalker runs), how does that data get from the disk into memory? Now, having a persistent cache on SSD would probably make a difference, sure. But since I don’t have arrays, that’s not really feasible. Plus, I didn’t want to be chewing through SSDs, as that shouldn’t be necessary to run Storj, though I may change my view on this in the future if I go back to arrays.

Also, I did edit my previous response about considering moving back to arrays at some point if you care to read it.

1 Like

Certain people do. Not all. Don’t take one person’s opinion as collective “we”.

I personally see no point in going for an array, standalone drives work for me.

Again, I do not have this issue. You are severely exaggerating your points, and this does not help the conversation. Though, if you want to rant, not discuss, you are still free to do so. Just don’t expect anyone to want to help you then.

I believe you are misunderstanding IsThisOn’s statement. “It should scale” means that if a potato RPi4 with 4 GB of RAM can handle, let’s say, a 2 TB node, then a setup with 512 GB of RAM should probably handle 256 TB worth of nodes. But nobody here claims an RPi4 should do the latter.

Now, if it doesn’t scale this way, then indeed something is wrong. Given that we have examples of large nodes on not-so-large setups, this sounds like there is a problem not on the storage node software side, but in the setup itself.

Are you sure this RAM is connected to the motherboard? You know, just making sure.

The storage node code itself does not manage it in any way; it just assumes the kernel will do its job and keep file metadata in its caches. As such, as long as the kernel doesn’t evict this cache, it will survive any number of node restarts.

BTW, if you want to pre-warm this cache at any point, anything that reads file metadata will do that. Some tools do it efficiently by taking advantage of knowing the file system layout and operating at the block level; for example, this is a nice side effect of plain old fsck.ext4.
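To make the “anything that reads file metadata” idea concrete, here is a minimal sketch in Go (the node’s own language). The blobs path is just a placeholder for wherever your node keeps its data; this is not the node’s own filewalker, and fsck.ext4 will still be faster because it reads inodes in on-disk order:

```go
// prewarm.go: walk a storage directory and stat every entry so the kernel
// pulls file metadata (dentries/inodes) into its caches before the node's
// own used-space walk kicks in. The path below is only a placeholder.
package main

import (
	"fmt"
	"io/fs"
	"os"
	"path/filepath"
)

func main() {
	// Hypothetical location of a node's blobs; adjust for your own layout.
	root := "/mnt/storj/storage/blobs"

	var files int
	err := filepath.WalkDir(root, func(path string, d fs.DirEntry, walkErr error) error {
		if walkErr != nil {
			return nil // skip unreadable entries instead of aborting the whole walk
		}
		// d.Info() triggers an lstat on the entry, which is exactly the
		// metadata read we want cached; file contents are never touched.
		if info, err := d.Info(); err == nil && !info.IsDir() {
			files++
		}
		return nil
	})
	if err != nil {
		fmt.Fprintln(os.Stderr, "walk failed:", err)
		os.Exit(1)
	}
	fmt.Printf("touched metadata of %d files under %s\n", files, root)
}
```

Running something like this before a node restart (or from a scheduled task) warms the cache, so the real filewalker then mostly hits RAM instead of the disks.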

Any decent consumer-grade SSD has enough TBW durability to survive many, many years of caching file system metadata.
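A rough back-of-the-envelope check, using typical vendor numbers rather than anything measured in this thread: a mainstream 500 GB consumer SSD is commonly rated around 300 TBW. Even if metadata caching rewrote a generous 100 GB every single day, that would be

300 TB ÷ 0.1 TB/day = 3000 days ≈ 8 years

and real metadata churn on a node is far below that.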

1 Like

I am speaking generally, based on my experience and interactions in the forum. At the time it seemed the general consensus was very much against arrays. Although I wasn’t convinced they were necessarily bad for Storj, I decided against them for multiple other reasons. As for the filewalker issue, again I’m speaking generally, as there are many people here with the very same issue, most of whom have much smaller nodes / drive counts. I admit I do have a habit of assuming people understand I’m speaking generally, since it should be quite obvious I don’t literally think everyone has the same exact problem or opinion on any particular subject.

Yes, what’s wrong is drive controller limitations from high IO. I realize that saying to plug 9 drives into a Pi wasn’t really the appropriate response. What I should have said is that you can’t think only in terms of CPU and RAM when calculating scalability. When scaling up, neither of those is a major factor with modern hardware, as any decently equipped server could run hundreds if not thousands of instances of the node software without issue. Your bottleneck is the IOPS of the disks, followed by the IOPS of the disk controllers, and high RAM usage really comes into play when these can’t keep up! Do I really need to spell out how this works? If a potato can run 1 drive maxed out in terms of IO, then 100 potatoes can run 100 drives, but a single system equal to 100 potatoes in terms of CPU/RAM etc. cannot run all 100 drives. I hope this makes more sense. If not, I quit.

Quite sure, yes.

This was not meant to be a real question.

1 Like

I used to run my node on ZFS, but it ate too much RAM and it felt like it was not fast enough.

Now I just run ext4 on LVM with a mirrored SSD writeback cache, running 18-20 drives without much problem on a 16-core 2nd-gen EPYC with 128GB of RAM and a single LSI 9300-8i.

3 Likes

Yeah… I just can’t anymore with you people. You’d clearly rather just attack me because you don’t get it. But that’s fine. Despite what you seem to think, my systems are running fine. I DO NOT have a problem, I fixed it a while ago (so yes… I guess I was right). I’m NOT here looking for help, I’m just trying to share my experience running systems under much higher load than most of you are used to, and I’m basically being told I don’t know what I’m doing… which I assure you is not the case, or I wouldn’t have the data I do in a way that is actually quite profitable. Worst of all, the examples being compared to mine are not even close comparisons, so just stop. And you’re right… I’ll be happy to take on the extra data when the ignorant come to realize the actual issue I’m trying to convey. It’s OK, I have the rackspace.

3 Likes

And can one do any substitute for a cache inside a virtual machine?

The VM already has 1 SSD (Windows 10, with the storagenode installed in C:/Program Files, holding logs etc.)
and 1 HDD of 14.5TB at D:\ for Storj’s files.
I would like to make sure the SSD is caching Storj’s metadata from D:.
If you know how, PM me!
No arrays here, it’s NTFS.
I would like to do caching, I just don’t know how.

It depends. My home server is not even close to enterprise gear - the usual consumer components, from the box to the disks - and it doesn’t struggle. Even though it’s running Windows.

So, what’s wrong here? How much RAM do you have, and how many nodes do you run?

1 Like

I still think that RAID is not needed; I can only use my anecdotal experience, of course. But so far, almost everyone who uses RAID is having problems now. Maybe because they run all their nodes on the same pool? I do not know. A redundant array with parity averages fewer than 200 IOPS (much less), so the whole array works like the slowest disk in the array.
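Rough numbers for comparison, assuming a typical 7200 RPM drive does on the order of 150-250 random-read IOPS (a generic figure, not measured here) and taking a raidz-style parity vdev as the example:

one 9-disk parity vdev: ~150-250 random-read IOPS total (about one disk’s worth, since every block is spread across the whole vdev)
9 standalone disks, one node each: ~9 × (150-250) ≈ 1350-2250 IOPS in aggregate

That gap is why metadata-heavy work like the filewalker tends to feel so much slower on one big parity pool than on separate disks.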

Actually, we do not have such a database; we have segments and pointers to where their pieces are located. It’s not the same, but close.

A 1GB RAM potato (RPi3B+) can handle 2TB.

Like the used-space-filewalker on start?

1 Like

Please do not assume that we are all the same; we are trying to help you, just in different ways.

And we are very grateful to you for that!

oops.

oops 2x

You need third-party software to use a cache properly; Windows is a worse OS when it comes to caching.
Or migrate to Linux.

1 Like

Yeah, it would be so interesting
if only Storj could build in some cache mechanism!

Look at Matt - he has solid components, yet he was struggling.
Many nodes, even in VMs, can easily get some SSD space, or already have it for the Windows files and the Storj installation files; maybe the missing component is some cache mechanism?
I would like to have Storj’s metadata cached on SSD, but “I use only what I have” - I won’t physically rebuild the whole machine just for Storj optimisation.
BUT I could make use of my SSDs: every one of my nodes has some SSD space assigned, besides the HDD dedicated to Storj’s data.
That could solve a lot of slow nodes.

Sounds like the ideal Storj node where people just use the excess hard drive capacity they have lying around so that Storj is ‘green’ and you don’t have to buy anything to participate. /snark

2 Likes

So you came here to vent…

and then you do not even try to understand that we were trying to give you advice because we thought you had a real problem. Please put yourself in our position: we see someone saying that despite their efforts Storj is still slow for them, and then complaining that we interpret it as a request for help.

Sure. I wasn’t trying to describe all the details. All that matters is that the satellite knows what pieces exist and where. The information is effectively compressed, derived from other pieces of information, but for all practical purposes, it is there.

Great!

Indeed, but the filewalker does not know the optimal order in which to read metadata. fsck.ext4 does: it knows that ext4 collocates inodes next to each other, so it can scan them sequentially, ignoring the fact that they are in different directories. The storage node cannot assume a particular file system, and does not have block-level access to the file system either.

If only node operators could use operating systems designed to act as servers.

It is crazy annoying that whenever server software is written for Windows, it also has to implement half of Linux on its own to operate. Why can’t Microsoft do that? They have 1000× the engineering staff of Storj.

3 Likes

It’s possible; I recommend reading the prerequisites and the rest.

1 Like

And you are typical?

These stress tests have just proved that the “use what you have” dogma is old history for Storj’s needs.
You can’t subject your daily work laptop or PC or RasPi or NAS to stress tests or real client use that freeze your system for a few pennies. If Storj considers making this the norm, then pretty much everyone should switch to enterprise servers or dedicated hardware; otherwise the daily activity on those machines will be affected. As a one-time thing, OK, test the limits, but don’t make it the standard and expect everyone to use what they have. It’s good to know now what we need to improve in our setups for future Storj use cases.

5 Likes