This is great. It’s a really well put-together blueprint. We’ve discussed using journaling before but no one has fleshed the idea out with this much detail. I didn’t even know FALLOC_FL_COLLAPSE_RANGE
existed; that certainly makes this approach more interesting, at least for the Linux side.
I expect I (or someone) will be looking at this more in the weeks to come. It might need to compete with a proof-of-concept experiment using BadgerDB.
No, I believe xe meant that in a reasonable situation data loss would be much less than 2% (2% being the level of data loss where node reputation begins to suffer sharp penalties).

I mean, if two files are being uploaded at the same time, how does the node decide the order in which to write them to the pack file? You can write multiple files to a filesystem concurrently with no problem, but trying to append two streams of data to the same file at the same time could cause corruption.
This is the reason for keeping each uploaded piece in memory until it’s committed, at which point it gets written to the journal. Only a single thread is needed to write to the journal, since the full contents of each input will be immediately available.