Could that be huge?
https://review.dev.storj.io/c/storj/storj/+/14910?tab=comments
Could be good for shingled disks
How do deletes work? Do you need to read the entire log file, process the deletes, and then write it back to disk?
I wonder how much write overhead there would be if that's the case.
I'd assume the same way SSDs delete data (hint: they don't). At some point a read/modify/write will be needed.
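For illustration, a naive read/modify/write compaction pass over one log file could look roughly like this. Everything here (the record layout, names, and file paths) is invented for the sketch and is not taken from the linked change:

```go
package main

import (
	"encoding/binary"
	"io"
	"os"
)

// compactLog rewrites srcPath into dstPath, dropping pieces whose IDs are in
// deleted. Layout assumed purely for illustration: each record is a 32-byte
// piece ID, a big-endian uint32 length, then the piece payload.
func compactLog(srcPath, dstPath string, deleted map[[32]byte]bool) error {
	src, err := os.Open(srcPath)
	if err != nil {
		return err
	}
	defer src.Close()

	dst, err := os.Create(dstPath)
	if err != nil {
		return err
	}
	defer dst.Close()

	var id [32]byte
	var sizeBuf [4]byte
	for {
		if _, err := io.ReadFull(src, id[:]); err == io.EOF {
			break // clean end of the log
		} else if err != nil {
			return err
		}
		if _, err := io.ReadFull(src, sizeBuf[:]); err != nil {
			return err
		}
		size := binary.BigEndian.Uint32(sizeBuf[:])

		if deleted[id] {
			// Deleted piece: skip its payload instead of copying it.
			if _, err := src.Seek(int64(size), io.SeekCurrent); err != nil {
				return err
			}
			continue
		}

		// Live piece: copy header and payload to the new log sequentially.
		if _, err := dst.Write(id[:]); err != nil {
			return err
		}
		if _, err := dst.Write(sizeBuf[:]); err != nil {
			return err
		}
		if _, err := io.CopyN(dst, src, int64(size)); err != nil {
			return err
		}
	}
	return dst.Sync()
}

func main() {
	// Hypothetical usage: rewrite one log file, then atomically swap it in.
	deleted := map[[32]byte]bool{}
	if err := compactLog("pieces.log", "pieces.log.tmp", deleted); err != nil {
		panic(err)
	}
	if err := os.Rename("pieces.log.tmp", "pieces.log"); err != nil {
		panic(err)
	}
}
```

Whether the actual implementation rewrites whole log files like this, punches holes, or keeps a separate index of dead pieces is exactly the open question above.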
I am personally strongly opposed to any and all attempts to reinvent filesystem features in the application. If an app needs to write a file, just write a file. Don't do batching/striping/logging/appending/anything else trying to work around perceived filesystem [mis]behavior. Just do what you need to do, and let the filesystem handle performance. It's designed to do that. It has been tested over billions of hours on millions of devices. A storagenode reincarnation of some hackery won't be.
For example, there are absolutely no issues with node performance on zfs, even on the previous decade's ancient hardware. None. There is no need to complicate the node software and risk data loss. For what benefit, exactly? To run a node on a potato for a day?
These shenanigans will never be as reliable as any underlying filesystem. So how about we stop wasting development time reinventing the wheel and optimizing things that already work just fine?
Two people using ancient ext4 or NTFS (who seriously runs server software on Windows today, come on) are not worth degrading node reliability for everyone. We are talking about customer data here.
Remember, we moved away from using databases in the past because it was too fragile. Now we are back on the dangerous track of reinventing a database. Just stop. Don't repeat history.
The same is true about the badger cache. It's not needed. It may (and did? I remember reading some thread about it) cause issues. Throw it away. Let the filesystem do its job, and if it does not, tune it.
Don't try to make the storagenode into its own OS. That's too big a piece to bite off.
I believe that this new storage wouldn't become the default any time soon. At least, it's at too early a stage of testing approaches.
To use ZFS you need a lot of RAM and more powerful hardware (and more expensive, either in its own cost and/or in power consumption). It also doesn't fully match the "use what you have" idea. Even a low-power device can run a node, and we want it to stay that way.
That's not true. I've experimented with a Raspberry Pi 4 with 8GB RAM, an HDD, and an SSD as a special device. No issues. It ran great.
"Use what you have" does not mean it shall run on anything you have.
People stringing Raspberry Pis together are not the backbone of the network. Those are just toys. People who have "more powerful hardware", like home or SMB NASes, are who needs to be focused on. These are inherently more reliable, more persistent, and less likely to drop dead when the cat sneezes nearby, compared to a Pi with all those GPIOs exposed. The Pi is a prototyping device, not a production server.
Storj can collect telemetry and figure out the fleet of devices it runs on, normalized by space/bandwidth. I highly doubt Raspberry Pis are the backbone of the network.
Compared to a Raspberry Pi? Yes. But it already runs anyway. It does not matter what it consumes if it's already online and available.
Not if this takes away precious development resources only to improve performance of 3% of nodes by volume while reducing reliability.
Does Storj have stats on the CPU/RAM/arch/bandwidth of the devices running nodes today, albeit virtual? Maybe it is OK to drop Raspberry Pis? That would be much cheaper. Or just do nothing and wait until a Raspberry Pi 9 with 34 cores and 128GB of RAM is released. It's best to avoid writing code that does not absolutely have to be written.
And with an RPi 3B+ with 1GB of RAM, no SSD, and one HDD too? I doubt it.
Why not, if it meets the minimum prerequisites?
Maybe, but they use much less energy than anything else. This is a good choice for those who want to learn Linux and earn something on top. So, even if they run it for fun, it would still be useful. At least my RPi 3B+ was able to pay for the electricity of all four nodes and my internet subscription. For me it was useful. So, why not?
This is probably true. However, we accept anyone who has spare space and bandwidth.
In this case, it is exactly in line with our ideas of sustainable development.
It's not only for a small number of slow nodes. It's also to reduce hardware costs, if someone were to build hardware only for Storj (yes, we are against it, but we are not blind). And we want to reduce the footprint, not increase it.
Maybe; I didn't search enough to find that stat. I usually used the downloads stat from GitHub.
This one implementation shows a performance increase. We want our nodes to have high performance. Any node. Even on weak devices. Highly distributed nodes are our strength, so why not squeeze out every ms we can?
Well, as long as it does not work as badly as the storage method in Storj v2, it will probably be good.
There are some people with large MSSQL databases…
I'm using ext4 on top of a zvol (the node runs inside a VM; the virtual disk is a zvol). I thought ZFS on top of a zvol would be bad for performance, but maybe it would not have been that bad.
I also do not know if this new method would improve the performance of my setup.
I'm with arrogantrabbit on this: that's simply not true.
ZFS's ARC does benefit from having more memory, but large amounts aren't needed. The special sauce for Storj is its ability to easily and natively use an SSD as a "special metadata device" to speed up almost everything heavy a node does, like filewalkers completing in seconds/minutes instead of hours/days.
Anyways… if this change means welding all our millions of current .sj1's into fewer/larger files so HDDs spend more time doing sequential transfers… I'm intrigued. But I'm interested in what happens on deletes: it sounds like you either swiss-cheese those new larger files with tons of IO punching holes… or you rewrite them with their new contents (but benefit from those rewrites being sequential).
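For the hole-punching route, Linux at least exposes it directly. Here's a minimal sketch using golang.org/x/sys/unix; the file name and offsets are invented for the example, and this is not necessarily how the change actually handles deletes:

```go
package main

import (
	"os"

	"golang.org/x/sys/unix"
)

// punchHole deallocates length bytes at offset inside the log file without
// changing its apparent size, leaving a sparse region where a deleted piece
// used to live. Linux-only; the filesystem must support FALLOC_FL_PUNCH_HOLE.
func punchHole(f *os.File, offset, length int64) error {
	return unix.Fallocate(int(f.Fd()),
		unix.FALLOC_FL_PUNCH_HOLE|unix.FALLOC_FL_KEEP_SIZE,
		offset, length)
}

func main() {
	f, err := os.OpenFile("pieces.log", os.O_RDWR, 0)
	if err != nil {
		panic(err)
	}
	defer f.Close()

	// Hypothetical example: drop a 2 MiB piece that started at offset 16 MiB.
	if err := punchHole(f, 16<<20, 2<<20); err != nil {
		panic(err)
	}
}
```

Each punched hole is still its own small write plus extent bookkeeping though, so it trades one big sequential rewrite for scattered metadata IO.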
Hmmm… Copy-on-Write semantics… where have I heard that before?
When it comes to filesystems, I agree with Bart Simpson…
100% agree: ZFS brings more headaches than what it can offer, especially on low-end hardware.
No, the disks will still need to deal with the same amount of IO, likely more:
You've just added an extra step or two that the filesystem cannot optimize, because it has no idea about your second-layer secret filesystem.
This is actively hurting performance.
The only use case that will improve is copying node data as-is to another volume. Hardly a burning performance issue.
I always thought this project could benefit from having a choice of piece stores. So this is great news to me. Can't wait to try it.
I would not be so enthusiastic. The piece store in v2, based on very well-proven databases (pretty much as good as it gets in terms of reliability), collapsed catastrophically, and it was determined that there is nothing more reliable than plain old files on disk. That's what we have in v3.
Now that history seems to be forgotten, and someone is trying to move away from plain old files on disk to what is essentially a database, but one made from sap and twigs.
I seriously have no clue how this exercise in futility got allocated development time. Aren't there architecture reviews? Proof-of-concept work before the changes?
If the team has extra time to burn, there are more pressing things to address: the still-broken updater on FreeBSD (no, I'm not filing yet another bug report), broken plots in the UI on page refresh, and the gigabytes of logs per day that the node generates. The whole logging needs to be overhauled. It's 2024, use tracing. Especially if the alleged reason for this database nonsense is the concern about running on weak hardware. Fix the logs. Then you'll maybe realize the filesystem isn't the bottleneck.
So far the best solution I have found is ZFS with a special device or L2ARC for metadata. This works great, but it needs an additional SSD. The proposed piece store could work without such an SSD, in my opinion.
Having 30 million files on a 4-6 TB HDD is also not very efficient. I think the best way would be to use a 4k or 8k file/block size, depending on the HDD size; making it configurable would be best.
Exactly. The only people who suffer through Windows are those who are locked into the Windows ecosystem/software. That is not accidental.
A line must be drawn somewhere. An 8GB RPi is as good a line as any.
I was using ZFS with mirrored vdevs.
Switched to ext4. Reason: doubled space.
Now switching back to single vdevs (HDD) with mirrored special devices for metadata (SSD). Reason: slow file listings on ext4 and too much I/O on metadata.
ZFS is set to cache metadata only, so not much RAM is needed. Filewalkers are blazing fast. I/O goes primarily to the SSD.
NTFS? It's not an appropriate filesystem.
Hiding part of the data inside files is like moving money from a bank account to cash in the mattress: now the bank can't see it and can't optimize your cashflow. Those pieces still need to be found and addressed, but now not by the stable filesystem code, rather by a new contraption Storj invented, which prevents the filesystem from optimizing the data flow.
There is absolutely no reason for the storage node to work around the deficiencies of an ill-fitting filesystem.
A few comments here:
Why not? A filesystem's literal one job is to store files. In fact, storing 30M files on a single filesystem is better than storing 10K containers with 1,000 files each on a secret filesystem, because the host filesystem then cannot optimize access to the files in the hidden filesystem: it cannot distinguish data from metadata, among other things. Your cache misses skyrocket, thrashing skyrockets. Everything collapses.
If there were optimizations to be made, they would have already been (and are) made in the host filesystem itself.
There are modern filesystems that have no issues with hundreds of millions of files. NTFS is not one of them. It's not Storj's job to fix the shortcomings of a filesystem that is used on a small minority of nodes. (I don't have numbers; I assume it's a minority, because nobody willingly puts up with Windows unless they have to, due to legacy software like the aforementioned MSSQL server. But then VMs exist to run that legacy software, so there is really no excuse.)
I think I made my position on this topic very clear, and it baffles me that nobody else sees the obvious futility of the proposed approach. I don't really care, but it would be a shame to waste development time on a clear dead end instead of moving the project forward.
Why do you mirror the special device if you have no redundancy to begin with? Use an old enterprise SSD with PLP if you don't already.
Exactly.