Monthly updated new node report

Some things can be compressed, like piece headers or databases. Since it likely uses ZFS compression, it also compresses away the difference between a “cluster size” and the actually used data.

Sorry, I should have been more clear.

The dataset only hosts STORJ data. Nothing else. No DB. And yes, it uses the default setting lz4.

I highly doubt that lz4 achieves a 4.5× compression ratio on STORJ data. That would mean the real data is around 20TB, compressed down to 5TB, with STORJ also only paying for 5TB.

Since it’s ZFS, this compression ratio is more related to the record size versus the really used data; see the discussion here:


Hmm… not sure if I get that. I use a dataset (not a zvol) with a 128k recordsize.

So if a blob is 25k in size, it would use a 32k record.
Assuming this wasted space is what compression reclaims (is that the case? Or does compression happen before allocation?), that would be a compression ratio of roughly 1.3 (32/25). That is still far away from 4.5.
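For illustration, the allocation-rounding arithmetic above can be sketched like this. It is a hypothetical model only: real ZFS allocation also depends on ashift, embedded blocks, and the actual compressibility of the data.

```python
# Hypothetical model: apparent "compression" from allocation rounding,
# assuming a record for a small file is rounded up to the next power of two.
# Real ZFS behavior is more involved; this only shows the arithmetic.

def next_power_of_two(n: int) -> int:
    p = 1
    while p < n:
        p *= 2
    return p

def apparent_ratio(file_size_k: int) -> float:
    allocated_k = next_power_of_two(file_size_k)
    return allocated_k / file_size_k

print(apparent_ratio(25))  # a 25k blob in a 32k record -> 1.28
```

Even under this model, rounding alone only explains a ratio around 1.3 for 25k blobs, nowhere near 4.5.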

You may ask the ZFS users in these threads how it works and what their results are.

Day 104: 4.78TB

Time for me to draw some conclusions.

  • Ingress is pretty much unpredictable.
  • Ingress is still pretty slow. That makes buying hardware for it basically impossible. Which is good, don’t get me wrong :slight_smile: By the time you have filled your newly bought 20TB HDD, it has already reached EoL /s
  • STORJ has, in my opinion, a fundamental problem. The design basically asks for ZFS and is not suited to running on a Pi with an external 3.5″ HDD. That is fine, but at the same time I ask myself whether the added fragmentation is worth the payout.

Anyway, I had fun playing around with Docker and will probably gracefully exit in the next few days.
But before I do that, Best filesystem for storj gave me some ideas. I will open another post for that.

I decommissioned my old TrueNAS System.
Now I have a perfect test rig for some ZFS stuff.
My idea is to compare 3 different systems:

  • ARC only
  • persistent L2ARC metadata only
  • special vdev metadata only

To have some benchmark, the idea is to see how long it takes the filewalker to finish.
For that, I will look at the logs.
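As a rough sketch of how that measurement could work, the snippet below parses the timestamps of two log lines and prints the elapsed time. The timestamp format matches the log excerpts quoted later in this thread; the exact start/finish line wording used here is a hypothetical placeholder.

```python
# Sketch: filewalker duration from two node log lines.
# The timestamp format matches the node's logs (RFC 3339, UTC);
# the specific log-line wording below is a hypothetical example.
from datetime import datetime, timezone

def parse_ts(line: str) -> datetime:
    # The timestamp is the first whitespace-separated field of the line.
    ts = line.split()[0]
    return datetime.strptime(ts, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)

start = parse_ts("2024-02-02T09:47:50Z    INFO    lazyfilewalker started")
end = parse_ts("2024-02-02T13:12:05Z    INFO    lazyfilewalker subprocess finished successfully")
print(end - start)  # elapsed wall-clock time, here 3:24:15
```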

Proposed task order:

  • create pool with special vdev
  • rsync data from the old node to the new node
  • do benchmark
  • move data to another pool on the same host
  • destroy special vdev
  • create L2ARC pool
  • move data to L2ARC pool
  • do benchmark
  • do benchmark again, now L2ARC is hot
  • remove L2ARC, which makes it an ARC-only system
  • do benchmark
  • do benchmark again, now ARC is hot

Benchmark post:


It always was.

As intended

That’s ok, as an opinion.

Have fun testing around :nerd_face: (I will have to wait for retirement to do something like this)

You cannot,

So, you may use this node for tests, especially for filewalker.

I do not think so. You may use ext4 and get acceptable results, and it can be used on a Raspberry Pi starting from model 2 (see Can Storj run on a Raspberry Pi 2? - #2 by ipoli800).
But perhaps put it on top of LVM to make your life a little bit easier when you decide to move this node to a bigger drive. This combination should have less impact on performance and memory usage in comparison with single-drive ZFS.

Ahh, I missed that. Does not really matter, I don’t even think I am using a valid wallet currently :slight_smile:

Does it really?
How long does it take a pi to run filewalker for 16TB?
Or to be even more precise, does a 16TB STORJ filewalker run fine on any single HDD without some form of hardware cache?

I’ve seen SNOs mention about 3-4h/used-TB (lazy) for general nodes. But it’s the drive speed that will always be the limitation: it uses almost no compute. It’s not like you have to watch the filewalker, and the default lazy version stays out of the way of any regular IO a node is dealing with, so if it takes a couple of days nobody cares.


It runs fine unless you use a VM, BTRFS or ZFS, any kind of RAID with parity, or SMR. But I read elsewhere that even SMR disks are fine; they are usually slow on writes, but better on reads.


ext4, no cache:
On 1GB RAM, which is the worst case, with 2 nodes running, for 4.6TB + 4TB, it took 58h.
For 18GB RAM, 13TB, it took 39.5h.
Both with lazy off. Lazy will take +50% more time.
Cutting the ingress will speed up the process by about 2x.
The CPU doesn’t matter, only RAM and cache, besides the drive and filesystem.
I will put my final tests in Tuning the Filewalker thread.


Thank you guys for your answers. That makes me really reconsider things.

@snorkel is that without offloading the DB?

So let’s say, worst case, a single disk with a single 16TB node.
13TB = 40h
16TB = 50h
plus 50%
75h or roughly 3 days.
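The estimate above is simple arithmetic, using the numbers reported earlier in the thread (13TB in ~40h with lazy off; lazy adds ~50%):

```python
# Back-of-the-envelope worst case for a 16TB node,
# scaled linearly from the ~40h-per-13TB measurement quoted above.
measured_tb, measured_h = 13, 40
node_tb = 16

scaled_h = measured_h * node_tb / measured_tb  # ~49h with lazy off
lazy_h = scaled_h * 1.5                        # lazy adds roughly 50%
print(round(scaled_h), round(lazy_h), round(lazy_h / 24, 1))  # 49 74 3.1
```

So roughly 50h without lazy, and around 74h (about 3 days) with it.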

What downsides do I get during these 3 days?

  • High load on the disk. Possible wear-out?
  • Losing ingress and egress races
  • 2W of additional power consumption :laughing:

These downsides seem fine to me. So the next question would be: how often does the filewalker run? Because if it runs every day, then it would be a huge problem. As far as I understand, the filewalker only runs on every restart, so probably every 2 months when there is a Docker update?

No downsides actually, if it is able to finish. If it fails… well… You need to switch to the non-lazy one to make sure that it won’t fail.

likely “yes” to all.

On which filesystem?

Is that dependent on the filesystem?
I thought it runs after every node restart.
Is there an additional timer to run filewalker?

check this:

Oh, I re-read your post; you asked how many times it is executed, not how long. You are correct, the frequency is independent of the filesystem.
It is executed roughly 1-2 times per week per satellite:

2024-02-02T09:47:50Z    INFO    lazyfilewalker.gc-filewalker    subprocess finished successfully    {"process": "storagenode", "satelliteID": "12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S"}
2024-02-08T23:10:12Z    INFO    lazyfilewalker.gc-filewalker    subprocess finished successfully    {"process": "storagenode", "satelliteID": "12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S"}
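For what it’s worth, the interval between those two completions can be computed directly from the timestamps:

```python
# Interval between the two gc-filewalker completions quoted above.
from datetime import datetime

fmt = "%Y-%m-%dT%H:%M:%SZ"
a = datetime.strptime("2024-02-02T09:47:50Z", fmt)
b = datetime.strptime("2024-02-08T23:10:12Z", fmt)
print(b - a)  # 6 days, 13:22:22 -- roughly one run per week for this satellite
```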

Cheers. Interesting that you see so little improvement from moving the DB.

If it is executed 1-2 times per week, and we estimate roughly 4h/TB, a 16TB drive has a 64h (16TB × 4h/TB) filewalker run. That is, worst case, 38% (64 / (7 × 24) × 100) of the time your node suffers:
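That duty-cycle estimate is easy to sanity-check. It uses this thread’s rough figures (~4h per used TB), which are estimates rather than measurements, so treat the result as an upper bound:

```python
# Duty-cycle estimate: fraction of the week spent in a filewalker run,
# using the rough numbers from this thread (assumptions, not measurements).
tb_used = 16
hours_per_tb = 4
runs_per_week = 1

run_hours = tb_used * hours_per_tb               # 64h per run
busy_fraction = run_hours * runs_per_week / (7 * 24)
print(f"{busy_fraction:.0%}")                    # 38%
```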

If there are 2 fw runs per week, that would be (64h x 2) / (168h/week)… so wouldn’t it be 76% of the time? And then if it really was per satellite… multiply that by 4… and that 16TB disk would never stop running fw? Yikes! That doesn’t sound right…

No, you got it wrong. The one FW that consumes those resources is the one run on node restart (the occupied-space calculator). The other is the garbage collector, which runs when a new bloom filter is received, once every 5 days I think, and moves pieces to trash. And the other 2, which delete pieces from trash and delete expired pieces (those that have an expiration time), run hourly, I think. The one you must worry about is the first one, run at each restart, as long as you don’t disable it. So, if your system doesn’t reboot because of updates or whatever, you only have the storagenode updates that restart the node and the FW. In recent months, this is once every 10-30 days.
