Huge ingress improvement HDD vs. SSD?

Ok, but that’s just the write path, and I’ve never really counted on that—you can’t really know which files will be read often at upload time. You can’t do read-through cache? LVMcache does it quite well.

The write path determines the destination. Now small pieces end up on HDD and suffer huge latency (relative to total transfer time) on first read.

At least on my nodes I keep seeing lots of small reads from large files too. And as these are small reads, latency is the dominant factor in winning a race—despite coming from big files.

Right. Subsequent reads are served from ram either way.

Also, I guess the fact that these are still separate write calls per piece doesn’t help?

No, they will likely get coalesced in the transaction group, exceed the small block size and get written to hdd.

A few small writes separated by a sync may become separate small blocks, but that will kill write performance.
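
To make the placement argument concrete, here is a rough Go model of the decision, not ZFS code. It assumes that a file smaller than recordsize gets one block of about its own size, that a larger file is split into recordsize blocks, and that blocks at or below special_small_blocks go to the special device. Compression and sector rounding are ignored, and the function names and the sizes in main are made up for illustration.

```go
package main

import "fmt"

// blockSizeFor is a simplified model (an assumption, not ZFS code): a file
// smaller than recordsize is stored as one block of roughly its own size,
// while a larger file is split into recordsize-sized blocks.
func blockSizeFor(fileSize, recordSize int) int {
	if fileSize < recordSize {
		return fileSize
	}
	return recordSize
}

// goesToSpecial models the special_small_blocks rule: a data block is placed
// on the special device when its size is at or below the threshold.
// A threshold of 0 means the feature is disabled.
func goesToSpecial(blockSize, specialSmallBlocks int) bool {
	return specialSmallBlocks > 0 && blockSize <= specialSmallBlocks
}

func main() {
	const (
		recordSize = 128 * 1024 // hypothetical dataset recordsize
		threshold  = 64 * 1024  // hypothetical special_small_blocks value
	)

	// A 4 KiB piece stored as its own file, piecestore style.
	pieceFile := blockSizeFor(4*1024, recordSize)
	fmt.Println("separate 4 KiB file on special:", goesToSpecial(pieceFile, threshold))

	// The same piece appended to a large log file, hashstore style:
	// the log's blocks are recordsize-sized regardless of piece size.
	logFile := blockSizeFor(1<<30, recordSize)
	fmt.Println("piece inside large log on special:", goesToSpecial(logFile, threshold))
}
```

Under these assumptions the standalone small file qualifies for the special device, while the same bytes inside a large log file do not, which is the point made above about the write path determining the destination.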

Well, then you can still hack hashstore so that it syncs/closes and reopens a log file after each piece. This would distribute concurrent writes to different log files in a similar manner to distributing them by their TTL, and should give you piece-sized transactions to different log files.
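
A minimal Go sketch of what such a sync/close/reopen loop could look like. This is not hashstore's actual code: `appendPieceSynced` and `runDemo` are hypothetical names, and the real log writer has its own types and bookkeeping. It only shows the per-piece open, append, sync, close pattern.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// appendPieceSynced is a hypothetical helper, not part of Storj's code.
// It reopens the log file in append mode, writes exactly one piece,
// syncs it to stable storage, and closes the file again.
func appendPieceSynced(path string, piece []byte) error {
	f, err := os.OpenFile(path, os.O_CREATE|os.O_WRONLY|os.O_APPEND, 0644)
	if err != nil {
		return err
	}
	if _, err := f.Write(piece); err != nil {
		f.Close()
		return err
	}
	// The sync is what separates this piece from the next write.
	if err := f.Sync(); err != nil {
		f.Close()
		return err
	}
	return f.Close()
}

// runDemo writes `pieces` pieces of `pieceSize` bytes to a log file in a
// temporary directory and returns the resulting file size.
func runDemo(pieces, pieceSize int) (int64, error) {
	dir, err := os.MkdirTemp("", "perpiece")
	if err != nil {
		return 0, err
	}
	defer os.RemoveAll(dir)

	path := filepath.Join(dir, "log-0001") // hypothetical log file name
	for i := 0; i < pieces; i++ {
		if err := appendPieceSynced(path, make([]byte, pieceSize)); err != nil {
			return 0, err
		}
	}
	info, err := os.Stat(path)
	if err != nil {
		return 0, err
	}
	return info.Size(), nil
}

func main() {
	size, err := runDemo(3, 4096)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("log size after 3 pieces:", size)
}
```

Whether the filesystem then stores each piece as its own small block is a separate question; the sketch only guarantees one sync per piece, with the write-throughput cost that implies.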

Probably simpler than maintaining piecestore yourself.

That would be the case where the cure is worse than the disease :). I don’t want to make everything synchronous on the off chance that some data will go to ssd and maybe.

I’m not going to maintain it. Piece store is pretty stable.

Once a month, “dear friend, rebase on origin/release-xxx, test and build” is good enough. I don’t think the interface is much more than “put piece” and “get piece”. I’ll ask robot friend to assess it too.

Will piecestore be completely purged from database? That’s a pity.

Storj likely doesn’t want to maintain piece store, and on top of that—all the logic that routes queries between piecestore and hashstore. It’s actually quite a lot of code that sits in the hot path.

Also, interfaces do change. I maintain my own set of patches to adjust nodes to my preferences, and while the storage node code is more stable than it used to be, I still need to resolve some merge conflicts, probably every two months.

On the other hand, if I remember the code correctly, the sync/close/reopen patch would be just a few lines of code. And piecestore is already synchronous in the same sense: it syncs, then closes and reopens between pieces.

I’ve been running hashstore metadata on a special dev for half a year or so.
Those files are busy, and writes to them are expensive performance-wise on my rust disks.
Those can be used more efficiently.

Btw - never noticed a change for inbound data speeds from this.