Monthly updated new node report

Some things can be compressed, like piece headers or databases. Since it likely uses ZFS compression, it also compresses away the difference between a “cluster size” and the actually used data.

Sorry, I should have been more clear.

The dataset only hosts STORJ data. Nothing else. No DB. And yes, it uses the default setting lz4.

I highly doubt that lz4 achieves a 4.5× compression ratio on STORJ data. That would mean the real data is around 20TB, compressed down to 5TB, with STORJ also only paying for 5TB.

Since it’s ZFS, this compression ratio is more related to the record size versus the really used data; see the discussion here:


Hmm… not sure if I get that. I use a dataset (not a zvol) with a 128k recordsize.

So if a blob is 25k in size, it would use a 32k record.
Assuming this wasted space is what compression reclaims (is that the case? Or does compression happen before allocation?), that would be a compression ratio of roughly 1.3 (32/25). That is still far away from 4.5.
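For illustration, the allocation-rounding arithmetic above can be sketched like this. It is a hypothetical model only: real ZFS allocation also depends on ashift, embedded blocks, and the actual compressibility of the data.

```python
# Hypothetical model: apparent "compression" from allocation rounding,
# assuming a record for a small file is rounded up to the next power of two.
# Real ZFS behavior is more involved; this only shows the arithmetic.

def next_power_of_two(n: int) -> int:
    p = 1
    while p < n:
        p *= 2
    return p

def apparent_ratio(file_size_k: int) -> float:
    allocated_k = next_power_of_two(file_size_k)
    return allocated_k / file_size_k

print(apparent_ratio(25))  # a 25k blob in a 32k record -> 1.28
```

Even under this model, rounding alone only explains a ratio around 1.3 for 25k blobs, nowhere near 4.5.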

You may ask the ZFS users in these threads how it works and what their results are.

Day 104: 4.78TB

Time for me to draw some conclusions.

  • Ingress is pretty much unpredictable.
  • Ingress is still pretty slow. That makes buying hardware for it basically impossible. Which is good, don’t get me wrong :slight_smile: By the time you have filled your newly bought 20TB HDD, it has already reached EoL /s
  • STORJ has, in my opinion, a fundamental problem. The design basically asks for ZFS and is not suited to running on a Pi with an external 3.5″ HDD. That is fine, but at the same time I ask myself whether the added fragmentation is worth the payout.

Anyway, I had fun playing around with Docker and will probably gracefully exit in the next few days.
But before I do that, Best filesystem for storj gave me some ideas. I will open another post for that.

I decommissioned my old TrueNAS System.
Now I have a perfect test rig for some ZFS stuff.
My idea is to compare 3 different systems:

  • ARC only
  • persistent L2ARC metadata only
  • special vdev metadata only

To have some benchmark, the idea is to see how long it takes the filewalker to finish.
For that, I will look at the logs.
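As a rough sketch of how that measurement could work, the snippet below parses the timestamps of two log lines and prints the elapsed time. The timestamp format matches the log excerpts quoted later in this thread; the exact start/finish line wording used here is a hypothetical placeholder.

```python
# Sketch: filewalker duration from two node log lines.
# The timestamp format matches the node's logs (RFC 3339, UTC);
# the specific log-line wording below is a hypothetical example.
from datetime import datetime, timezone

def parse_ts(line: str) -> datetime:
    # The timestamp is the first whitespace-separated field of the line.
    ts = line.split()[0]
    return datetime.strptime(ts, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)

start = parse_ts("2024-02-02T09:47:50Z    INFO    lazyfilewalker started")
end = parse_ts("2024-02-02T13:12:05Z    INFO    lazyfilewalker subprocess finished successfully")
print(end - start)  # elapsed wall-clock time, here 3:24:15
```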

Proposed task order:

  • create pool with special vdev
  • rsync data from the old node to the new node
  • do benchmark
  • move data to another pool on the same host
  • destroy special vdev
  • create L2ARC pool
  • move data to L2ARC pool
  • do benchmark
  • do benchmark again, now L2ARC is hot
  • remove L2ARC, which makes it an ARC-only system
  • do benchmark
  • do benchmark again, now ARC is hot

Benchmark post:


It always was.

As intended

That’s ok, as an opinion.

Have fun testing around :nerd_face: (I will have to wait for retirement to do something like this)

You cannot,

So, you may use this node for tests, especially for filewalker.

I do not think so. You may use ext4 and get acceptable results, and it can be used on a Raspberry Pi starting from model 2 (see Can Storj run on a Raspberry Pi 2? - #2 by ipoli800).
But perhaps put it on top of LVM to make your life a little bit easier when you decide to move this node to a bigger drive. This combination should have less impact on performance and memory usage in comparison with single-drive ZFS.

Ahh, I missed that. Does not really matter, I don’t even think I am using a valid wallet currently :slight_smile:

Does it really?
How long does it take a pi to run filewalker for 16TB?
Or to be even more precise, does a 16TB STORJ filewalker run fine on any single HDD without some form of hardware cache?

I’ve seen SNOs mention about 3-4h/used-TB (lazy) for general nodes. But it’s the drive speed that will always be the limitation: it uses almost no compute. It’s not like you have to watch the filewalker, and the default lazy version stays out of the way of any regular IO a node is dealing with, so if it takes a couple of days nobody cares.


It runs fine unless you use a VM, BTRFS or ZFS, any kind of RAID with parity, or SMR. But I read elsewhere that even SMR disks are fine; they are usually slow on writes, but better on reads.


ext4, no cache:
On 1GB RAM, which is the worst case, with 2 nodes running, for 4.6TB + 4TB, it took 58h.
For 18GB RAM, 13TB, it took 39.5h.
Both with lazy off. Lazy will take +50% more time.
Cutting the ingress will speed up the process by about 2x.
The CPU doesn’t matter, only RAM and cache, besides the drive and filesystem.
I will put my final tests in Tuning the Filewalker thread.


Thank you guys for your answers. That makes me really reconsider things.

@snorkel is that without offloading the DB?

So let’s say, worst case, a single disk with a single 16TB node.
13TB = 40h
16TB = 50h
plus 50%
75h or roughly 3 days.
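The estimate above is simple arithmetic, using the numbers reported earlier in the thread (13TB in ~40h with lazy off; lazy adds ~50%):

```python
# Back-of-the-envelope worst case for a 16TB node,
# scaled linearly from the ~40h-per-13TB measurement quoted above.
measured_tb, measured_h = 13, 40
node_tb = 16

scaled_h = measured_h * node_tb / measured_tb  # ~49h with lazy off
lazy_h = scaled_h * 1.5                        # lazy adds roughly 50%
print(round(scaled_h), round(lazy_h), round(lazy_h / 24, 1))  # 49 74 3.1
```

So roughly 50h without lazy, and around 74h (about 3 days) with it.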

What downsides do I get during these 3 days?

  • High load on the disk. Possible wear-out?
  • Losing ingress and egress races
  • 2W of additional power consumption :laughing:

These downsides seem fine to me. So the next question would be: how often does the filewalker run? Because if it runs every day, then it would be a huge problem. As far as I understand, the filewalker only runs on every restart, so probably every 2 months when there is a Docker update?

No downsides actually, if it is able to finish. If it fails… well… You need to switch to the non-lazy one to make sure that it won’t fail.

likely “yes” to all.

On which filesystem?

Is that dependent on the filesystem?
I thought it runs after every node restart.
Is there an additional timer to run filewalker?

check this:

Oh, I re-read your post; you asked how many times it is executed, not how long. You are correct, the frequency is independent of the filesystem.
It is executed roughly 1-2 times per week per satellite:

2024-02-02T09:47:50Z    INFO    lazyfilewalker.gc-filewalker    subprocess finished successfully    {"process": "storagenode", "satelliteID": "12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S"}
2024-02-08T23:10:12Z    INFO    lazyfilewalker.gc-filewalker    subprocess finished successfully    {"process": "storagenode", "satelliteID": "12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S"}
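For what it’s worth, the interval between those two completions can be computed directly from the timestamps:

```python
# Interval between the two gc-filewalker completions quoted above.
from datetime import datetime

fmt = "%Y-%m-%dT%H:%M:%SZ"
a = datetime.strptime("2024-02-02T09:47:50Z", fmt)
b = datetime.strptime("2024-02-08T23:10:12Z", fmt)
print(b - a)  # 6 days, 13:22:22 -- roughly one run per week for this satellite
```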

Cheers. Interesting that you see so little improvement from moving the DB.

If it is executed 1-2 times per week, and we estimate roughly 4h/TB, a 16TB drive has a 64h (16TB × 4h/TB) filewalker run. That is, worst case, 38% (64 / (7 × 24) × 100) of the time your node suffers:
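That duty-cycle estimate is easy to sanity-check. It uses this thread’s rough figures (~4h per used TB), which are estimates rather than measurements, so treat the result as an upper bound:

```python
# Duty-cycle estimate: fraction of the week spent in a filewalker run,
# using the rough numbers from this thread (assumptions, not measurements).
tb_used = 16
hours_per_tb = 4
runs_per_week = 1

run_hours = tb_used * hours_per_tb               # 64h per run
busy_fraction = run_hours * runs_per_week / (7 * 24)
print(f"{busy_fraction:.0%}")                    # 38%
```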

If there are 2 fw runs per week, that would be (64h x 2) / (168h/week)… so wouldn’t it be 76% of the time? And then if it really was per satellite… multiply that by 4… and that 16TB disk would never stop running fw? Yikes! That doesn’t sound right…

No, you got it wrong. The one FW that consumes those resources is the one run on node restart (the occupied-space calculator). The other is the garbage collector, which runs when a new bloom filter is received, once every 5 days I think, and moves pieces to trash. And the other 2, which delete pieces from trash and delete expired pieces (those that have an expiration time), run hourly, I think. The one you must worry about is the first one, run at each restart, as long as you don’t disable it. So, if your system doesn’t reboot because of updates or whatever, you only have the storagenode updates that restart the node and the FW. In recent months, this is once every 10-30 days.
