If file seeks are a problem with ZFS, then it’s ZFS’s fault. On ext4, seeks are essentially free: you just look up the right position through the extent tree embedded in the file’s inode/metadata and start reading from there. That said, I recall ZFS has a similar mechanism, so I have a feeling seeks are likely cheap there as well.
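Just to illustrate (the path below is only a placeholder), you can dump the extent map ext4 keeps for a file and see which physical blocks each logical offset maps to:
# print the logical-to-physical extent mapping stored for the file;
# a seek only has to walk this map to find the right on-disk block
filefrag -v /path/to/some/file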
Brilliant ideas. I was thinking about how to limit the I/O for these operations some time back; this would certainly meet my requirements.
Background info from my setup:
I run some large, heavy rsync housekeeping jobs for mirrors on a schedule, and I’ve implemented a check for housekeeping on the Storj nodes to make it all co-exist on the same multi-vdev ZFS pool. But it’s not very “nice” or clean.
Sorry, didn’t notice earlier.
Filewalker is never a problem, nor are multiple concurrent filewalkers. They don’t generate I/O to the disks and are effectively rate-limited by CPU utilization.
The concern is that single-threaded compaction can produce enough undesirable disk I/O to be an annoyance.
I think the discussion about compaction is overblown. I’m running on 10-year-old hardware with ZFS and have no problems with compaction. It works great with default settings.
I’ve been using hashstore since it was released, I think in April/May of last year. There are 24 nodes with about 130 TB of data. When compaction runs on a node, the system load goes up by about 4-5 and I/O delay goes up by about 3%. It’s nothing. Even the really small newer computers are faster than my steam machine, so I can’t see hashstore being a problem.
I only run Linux so I can’t say anything about Windows.
Only Storj can say for sure, but I very much suspect it’ll be a very significant proportion.
It would be very interesting if Storj ever published a breakdown of OS, CPU/RAM, and filesystem across network nodes, but it would probably scare prospective customers away.
We do not have this stat as far as I know, but you may use our GitHub to check the number of downloads:
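For example, a rough sketch against the GitHub releases API (the jq filter and pagination are kept simple on purpose, and asset names vary between releases):
# list per-asset download counts for the most recent storj/storj releases, highest first
curl -s "https://api.github.com/repos/storj/storj/releases?per_page=100" \
  | jq -r '.[].assets[] | "\(.download_count)\t\(.name)"' \
  | sort -rn | head -n 20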
We need to optimize for all file systems which are widely used by people for everyday tasks. One goal is supporting storage nodes that run alongside existing workloads. And many (most?) servers are using ext4 today.
I am glad that you are happy with the current version, and I hope that you will be happy with improvements…
We were not happy when we tested the current implementation under heavy load while storing a high number of pieces (10-16 TB disks).
Most of the problems are solved with hashstore; now it’s time to make sure that the public network is also ready for higher throughput, lower latency, and more data.
If you want to upgrade us to hashstore… then you know… maybe push out a deluge of paid test data like last summer… I’d be OK with that
Lol good point, if there was a flood of paid test data that only worked on hashstore then you would definitely get a lot of petabytes migrated quickly!
How will the rollout for hashstore be handled? In waves (i.e. update-cursor style), or do we all go to sleep one night and wake up with the entire network running on hashstore all at once?
It will come with a chosen update: when the node is updated, it starts with migration enabled, and it will take something like a month to migrate all the data on a node.
Please give us the option to opt out, put that option in a version predating the migration version, and inform us when we can enable it. I want to stay on piecestore as long as possible.
I believe the big problem I have with hashstore is fragmentation. With small files, ext4 does a pretty good job avoiding it, but with 1 GB files I believe fragmentation will happen even on ext4. Please correct me if I am wrong.
ext4 tries to allocate the file contiguously, i.e. not split it into chunks. On top of that, ext4 tries to reserve some extra space at the end of the file, just in case the file is edited and grows a bit. As long as the disk has enough free space, the file will not be badly fragmented.
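If you want to sanity-check that on your own disk, a quick experiment along these lines (path and size are just examples) should show only a handful of extents:
# preallocate a 1 GiB file in one shot, roughly the size of a large hashstore log
fallocate -l 1G /mnt/ext4-test/bigfile
# count its extents; one or two extents means effectively no fragmentation
filefrag /mnt/ext4-test/bigfile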
These are stats on storagenode binary:
Linux (amd64) | ██████████████████████████████████████████████████ 609,835
Windows (amd64) | █████████████████████████ 299,042
Linux (arm64) | █████████ 102,395
Linux (arm) | ██ 28,259
FreeBSD (amd64) | 230
A few notes:
- I’d expect some of those Linux thingies are actually Windows users running Docker. This is horrendous; why do so many people still keep putting up with Microsoft’s abomination of an OS?! That’s beyond my comprehension, so I won’t even try.
- Raspberry Pis and other archaic ARM-based Synologies are less than 3%. They can be disregarded. This I expected.
- Where are the FreeBSD fellas? What happened?!
Another interesting tidbit – stats on the storagenode-updater binary:
Windows (amd64) | ██████████████████████████████████████████████████ 309,692
Linux (amd64) | ███████████████████████████ 162,176
Linux (arm64) | ██████████ 63,441
Linux (arm) | █ 7,163
FreeBSD (amd64) | 28
I’m not sure why there is a discrepancy in the distribution here, but it’s noteworthy. Maybe some issue: why are Windows machines pulling the updater so frequently?
Yeah, I’m fuzzy on compaction myself; I would have to look into the actual implementation again. Give me FALLOC_FL_COLLAPSE_RANGE and I’m happy. Give me FALLOC_FL_PUNCH_HOLE, with a full rewrite only when we run out of reasonable address space, and I can understand that. In the other cases I’m sort of on the fence, probably siding with Storj, but barely.
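For reference, both of those operations are also exposed through the util-linux fallocate tool (file name, offsets and lengths below are made up; collapse-range additionally needs block-aligned offsets/lengths on ext4/XFS):
# punch a hole: deallocate 4 MiB in the middle of the log without changing the file size
fallocate --punch-hole --offset $((16*1024*1024)) --length $((4*1024*1024)) some-hashstore.log
# collapse the range instead: drop those bytes entirely and shrink the file
fallocate --collapse-range --offset $((16*1024*1024)) --length $((4*1024*1024)) some-hashstore.log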
A year ago I was somewhat close to dedicating some time to writing my own piece store, and did some small experiments, but the revenue is not good enough yet, and I saw that external patches aren’t really being integrated. Besides, a Linux-specific piece store wouldn’t fly in the general repository anyway. Now I could vibe-code at least half of the damn thing, but I just don’t have time.
If the Select nodes have already been the guinea pigs, and hashstore looks to be better, and the plan is to remove piecestore… it doesn’t seem like making opt-out easy is worth it. They control the speed of the rollout: they can make it so slow a full migration takes three months or something. Plenty of time to deal with any issues.
Now that I know they have a utility that can rebuild the hashtables from the logs, I’m much more comfortable with the migration.
It has even been used at least once:
And you can use this method if you do not want to install Go:
Should be higher. Raspberry Pi OS supports 64-bit and can run even on the old 3B series. Also, plenty of Synology NASes run on x86.
I also wonder how many of those linux-amd64 nodes are from Select, and if anyone is running a node off of an Android phone lol.
Just a note about a new forum thread - Hashstore is coming out of tech preview: Hashstore rollout commencing!
So basically all that is required is:
echo -n 'true' > 121RTSDpyNZVcEU84Ticf2L1ntiuUimbWgfATz21tuvgk3vzoA6.migrate_chore
echo -n 'true' > 12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S.migrate_chore
echo -n 'true' > 12L9ZFwhzVpuEKMUNUqkaTLGzwY9G24tbiigLiXpmZWKwmcNDDs.migrate_chore
echo -n 'true' > 1wFTAgs9DP5RSnCqKV1eLf6N9wtk4EAtmN5DpSxcs8EjT69tGE.migrate_chore
Then restart, and the node will run on hashstore? No changes in the docker command or anything? Or should I remove the lines with the badger cache or filewalker options, or whatever?
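For reference, the same four flags written as a loop; the meta directory below is my assumption about where these files live, so adjust it to wherever your node keeps its hashstore metadata:
cd /path/to/storagenode/storage/hashstore/meta   # assumed location, adjust to your setup
for sat in 121RTSDpyNZVcEU84Ticf2L1ntiuUimbWgfATz21tuvgk3vzoA6 \
           12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S \
           12L9ZFwhzVpuEKMUNUqkaTLGzwY9G24tbiigLiXpmZWKwmcNDDs \
           1wFTAgs9DP5RSnCqKV1eLf6N9wtk4EAtmN5DpSxcs8EjT69tGE; do
  # write 'true' into <satelliteID>.migrate_chore to enable the migration chore for that satellite
  echo -n 'true' > "${sat}.migrate_chore"
done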