It is necessary to add a parameter for the minimum accepted piece size

It is necessary to add a parameter for the minimum accepted piece size. Since there is now an “attack” with small files, for example 1280-byte ones, I would like to set a limit on the minimum piece size my nodes accept.

Since storage is paid for at a “hard disk” price rather than an “SSD” price, it would be good to be able to specify, for example, a minimum of 512 KB per piece. For the small pieces, I am ready to run an “all-flash” node, but with a minimum acceptable price of $10-15/TB rather than $1.5/TB.
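A minimal sketch of what such a check could look like on the node side, assuming a hypothetical MinAcceptedPieceSize setting (no such option exists in the storage node today):

```go
// Hypothetical sketch only: the storage node has no such setting today.
// A minimum-piece-size option could gate incoming uploads like this.
package main

import (
	"errors"
	"fmt"
)

// nodeConfig mimics a node-side setting; the field name is invented for illustration.
type nodeConfig struct {
	MinAcceptedPieceSize int64 // bytes; 0 means "accept any size"
}

// acceptPiece decides whether an incoming piece should be stored.
func (c nodeConfig) acceptPiece(pieceSize int64) error {
	if c.MinAcceptedPieceSize > 0 && pieceSize < c.MinAcceptedPieceSize {
		return errors.New("piece smaller than operator-configured minimum, rejecting upload")
	}
	return nil
}

func main() {
	cfg := nodeConfig{MinAcceptedPieceSize: 512 * 1024} // 512 KB, as suggested above
	fmt.Println(cfg.acceptPiece(1280))    // rejected: the 1280-byte "attack" piece
	fmt.Println(cfg.acceptPiece(2 << 20)) // accepted: a 2 MiB piece
}
```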

I completely agree with you.
Small chunks have to be stored on an SSD and at a different price.
There should be such a setting.

Another option would be to store small files in an SQLite database in the blob directory instead of the filesystem:

https://www.sqlite.org/fasterthanfs.html

Reads are faster because the database is kept open (no open and close for each file), and less disk space is used because SQLite packs data tighter than filesystems do. Writes are also faster, I believe, when compared to doing an fsync after every file written to the filesystem (which the storage node software does). It says:

“We found that invoking fsync() or FlushFileBuffers() on each file written causes direct-to-disk storage to be about 10 times or more slower than writes to SQLite.”

But they did not actually run that test. Instead, they disabled transactions in SQLite since they were not doing fsync() on the filesystem. The other problem with their test results is that SQLite needs to be in FULL sync mode; they ran their test in NORMAL mode, which may lose data, though it will not corrupt the database.
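For what it’s worth, the “small pieces in SQLite” idea is easy to prototype. Below is a rough sketch in Go, using the well-known mattn/go-sqlite3 driver and FULL synchronous mode as discussed above; this is not how the storage node actually stores pieces:

```go
// Rough sketch of storing small pieces as BLOBs in SQLite, not production code.
package main

import (
	"database/sql"
	"fmt"
	"log"

	_ "github.com/mattn/go-sqlite3" // cgo SQLite driver; any SQLite driver would do
)

func main() {
	db, err := sql.Open("sqlite3", "small-pieces.db")
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()

	// FULL synchronous mode, as discussed above: durability comparable to fsync-per-file.
	if _, err := db.Exec(`PRAGMA synchronous = FULL;`); err != nil {
		log.Fatal(err)
	}
	if _, err := db.Exec(`CREATE TABLE IF NOT EXISTS pieces (id TEXT PRIMARY KEY, data BLOB)`); err != nil {
		log.Fatal(err)
	}

	// Store and read back a small piece; each statement runs in its own implicit transaction.
	piece := make([]byte, 1280) // the 1280-byte example from the first post
	if _, err := db.Exec(`INSERT OR REPLACE INTO pieces (id, data) VALUES (?, ?)`, "piece-1", piece); err != nil {
		log.Fatal(err)
	}
	var data []byte
	if err := db.QueryRow(`SELECT data FROM pieces WHERE id = ?`, "piece-1").Scan(&data); err != nil {
		log.Fatal(err)
	}
	fmt.Printf("read back %d bytes\n", len(data))
}
```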

I do like the idea of setting a minimum piece size. That could be used when selecting nodes to enable nodes with lower egress bandwidth to store smaller pieces. Retrieving and serving a piece has several different parts (not an expert on this!):

  • getting the request
  • verifying signatures
  • opening the file
  • reading the data
  • sending the data
  • closing the file

For a small piece, egress bandwidth is less important in relation to all the other necessary steps than it is when serving a large piece.
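As a rough illustration of that point, here is a back-of-envelope estimate of serve time versus piece size; the 5 ms per-request overhead and 10 Mbit/s uplink are made-up assumptions, not measurements of the real node:

```go
// Back-of-envelope illustration of why per-request overhead dominates for small pieces.
// The overhead and bandwidth figures below are assumptions, not measurements.
package main

import (
	"fmt"
	"time"
)

const (
	perRequestOverhead = 5 * time.Millisecond // request parsing, signature check, open/close (assumed)
	uplinkBandwidth    = 10 * 1000 * 1000 / 8 // 10 Mbit/s uplink in bytes/s (assumed)
)

// serveTime estimates the total time to serve a piece of the given size.
func serveTime(pieceSize int64) time.Duration {
	transfer := time.Duration(float64(pieceSize) / float64(uplinkBandwidth) * float64(time.Second))
	return perRequestOverhead + transfer
}

func main() {
	for _, size := range []int64{1280, 512 * 1024, 2 * 1024 * 1024} {
		fmt.Printf("%8d bytes: ~%v total, of which %v is fixed overhead\n",
			size, serveTime(size), perRequestOverhead)
	}
}
```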

SQLite also gets corrupted more often than a filesystem, as evidenced by the multiple threads here about node database corruption.

However, filesystems usually have a block size of at least 4 KB, so a smaller piece just wastes the space anyway. Maybe it should be counted as 4 KB?

I would like the ability to set minimum AND maximum piece size on my node, so I could have a second “small piece only” node on SSDs or HDDs in mirror instead of raidz2.

There are a lot of ways storage services try to address the cost overhead of small files. The two popular ones are billing for a minimum of X KB per object and charging per operation. I don’t mind services charging for these when they are reasonable, and they usually are.

Backblaze B2 only uses transaction fees: anything related to uploads is free (class A), API calls related to downloads cost 4 cents per 100K (class B), and other calls like directory listings, i.e. database operations, cost 4 cents per 10K (class C). I think it makes more sense to charge the same transaction fee for uploads and downloads. Backblaze offers a free daily allowance of 2500 class B and 2500 class C transactions, so many typical customers probably don’t have to pay anything.

S3’s pricing is all over the place. The first sentence says “Pay only for what you use”, but that’s only true for S3 Standard. The other 5 storage classes have file size minimums (128K) and storage time minimums (30, 90, or 180 days), and they charge for 8K of metadata storage at standard S3 rates plus 32K of metadata at Glacier rates for either of the Archive storage classes. In many cases it’s cheaper to store data in the “higher cost” Standard S3 storage than the other classes because of all the pricing variations, and it’s often hard to figure out these fees ahead of time. Unlike Backblaze’s free allowances, which are ongoing, AWS’s Free Tier allowance is only for the first year.

Wasabi does not charge for ingress, egress (if reasonable), or transactions (if reasonable). They bill for a minimum file size of 4K, which to me is reasonable since that’s how much disk space a file < 4K actually uses with most filesystems. Their two policies I dislike are the $5.99 monthly minimum and the 90-day minimum storage time (I call it the delete penalty). I posted on Hacker News about how uploading the same 4K file every day lets Wasabi charge 360x more for storage because of the delete penalty. And unlike S3, Wasabi doesn’t have other storage class options to avoid the fees.

Storj does dip its toe into this with its per-segment fee (8.8 cents per 10K segment-months), though this fee is only seen by the company and not the SNOs. For payments to SNOs, Storj could round up the size of every file stored to the next 4K, as @Pentium100 mentioned, and pay for this out of the per-segment fees. That seems reasonable, since it reflects what is actually happening on the SN end. It would also be motivation for Storj to come up with a reliable way to store small files more efficiently.
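For illustration, the round-up @Pentium100 suggested is just a ceiling to the next filesystem block; the 4 KiB block size below is an assumption about the node’s filesystem:

```go
// Illustration of the round-up idea above: bill each stored piece as at least one
// filesystem block. The 4 KiB block size is an assumption, not something Storj does today.
package main

import "fmt"

const blockSize = 4096 // assumed filesystem block size in bytes

// billedSize rounds a piece's size up to the next multiple of blockSize.
func billedSize(pieceSize int64) int64 {
	if pieceSize <= 0 {
		return 0
	}
	return ((pieceSize + blockSize - 1) / blockSize) * blockSize
}

func main() {
	for _, size := range []int64{1280, 4096, 5000} {
		fmt.Printf("piece of %5d bytes billed as %5d bytes\n", size, billedSize(size))
	}
}
```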

Charging a small per-operation fee seems very reasonable to me. A minimum file size of 4K would be reasonable too, but Storj is already sort of doing that with the segment fee. The segment fee is a little more aggressive because even large files contribute toward that, but there is a generous free allowance too.

It ideally needs to be done in the node selection code rather than rejecting the upload once a node is selected, right? And it seems like there are two different criteria: avoiding or wanting small pieces, and avoiding or wanting frequent access.
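A sketch of what the piece-size part could look like at selection time, with invented MinPieceSize/MaxPieceSize preferences that do not exist in the real satellite code:

```go
// Hypothetical sketch of filtering candidate nodes by an advertised piece-size
// preference before picking upload targets. These fields are invented for illustration.
package main

import "fmt"

type node struct {
	ID           string
	MinPieceSize int64 // 0 = no lower bound
	MaxPieceSize int64 // 0 = no upper bound
}

// eligible returns the nodes willing to accept a piece of the given size.
func eligible(nodes []node, pieceSize int64) []node {
	var out []node
	for _, n := range nodes {
		if n.MinPieceSize > 0 && pieceSize < n.MinPieceSize {
			continue
		}
		if n.MaxPieceSize > 0 && pieceSize > n.MaxPieceSize {
			continue
		}
		out = append(out, n)
	}
	return out
}

func main() {
	nodes := []node{
		{ID: "hdd-node", MinPieceSize: 512 * 1024}, // big pieces only
		{ID: "ssd-node", MaxPieceSize: 512 * 1024}, // "small piece only" node
		{ID: "any-node"},                           // no preference
	}
	fmt.Println(eligible(nodes, 1280))        // small piece: ssd-node, any-node
	fmt.Println(eligible(nodes, 2*1024*1024)) // large piece: hdd-node, any-node
}
```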

No, they are billed only for downloads (egress from the network/nodes) and for storage. There is also a fee for the number of segments: Requesting and Understanding Usage Limit Increases - Storj Docs

Uh, I have bad memories of using SQLite this way some 3 startups ago. It turned out to be very slow for anything larger than a few kilobytes. They’d better run the actual tests. I think someone on the team figured out that the problem was with the SQL parser, or the driver API, not the storage itself, but it was a pain to work with.

As for storage, for S3-like workloads (no random writes, no incomplete files), I suspect a good log-structured filesystem (and one lenient with fsync! [*]) would be the best match. But this requires separating database storage from blob storage, so some additional configuration for the node operator. I was using btrfs before this separation was possible, and the performance was absolutely terrible because, as it turns out, SQLite’s performance is terrible on btrfs.

[*] Losing a couple of files in case of a rare power outage is actually a decent trade-off for Storj.

Small pieces are already stored on satellites themselves, IIRC, with no storage node being involved.