Option to store pieces with erasure coding

andrew2.hart · May 24, 2022, 5:32pm

I sometimes got corrupted pieces when using old 1TB disks, due to media errors.
I thought of an option to save pieces with built-in erasure coding… I assume this would be fairly simple for Storj devs?
I guess this goes hand in hand with my optional scrub idea.

Edit: Apparently some people call this “local parity erasure coding”
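
To make the idea concrete, here is a minimal sketch of local parity in its simplest form: one XOR parity chunk per piece, which can rebuild a single chunk the disk reports as unreadable. This is only an illustration, not anything the storagenode does; the function names (`encode_with_parity`, `recover_chunk`) and the chunk count are made up for the example, and a real scheme would more likely use Reed-Solomon to survive more than one bad chunk.

```python
from functools import reduce
from typing import List, Optional


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """XOR two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def encode_with_parity(piece: bytes, k: int) -> List[bytes]:
    """Split a piece into k equal chunks (zero-padded) and append one XOR parity chunk."""
    chunk_len = -(-len(piece) // k)  # ceiling division
    padded = piece.ljust(chunk_len * k, b"\x00")
    chunks = [padded[i * chunk_len:(i + 1) * chunk_len] for i in range(k)]
    parity = reduce(xor_bytes, chunks)
    return chunks + [parity]


def recover_chunk(chunks: List[Optional[bytes]]) -> List[bytes]:
    """Rebuild the single chunk marked None by XORing all surviving chunks."""
    missing = [i for i, c in enumerate(chunks) if c is None]
    if len(missing) != 1:
        raise ValueError("XOR parity can repair exactly one missing chunk")
    survivors = [c for c in chunks if c is not None]
    repaired = list(chunks)
    repaired[missing[0]] = reduce(xor_bytes, survivors)
    return repaired


# Hypothetical usage: chunk 2 sits on a bad sector and cannot be read.
piece = b"storj piece payload"
stored = encode_with_parity(piece, 4)
damaged = list(stored)
damaged[2] = None
repaired = recover_chunk(damaged)
restored = b"".join(repaired[:4])[:len(piece)]
```

Note that this only works when the location of the damage is known (for example, a read error on a sector). Silent corruption would additionally need a per-chunk checksum to find the bad chunk first.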

Toyoo · May 24, 2022, 5:56pm

If your block device corrupts data, you have far bigger problems than just trying to make file contents more recoverable. If your file system’s metadata gets corrupted, you could easily lose everything on that drive.

Besides, erasure coding makes sense only if the erasure stripes are stored in blocks whose failures are independent (or as close to independent as possible). So you want either a proper parity RAID (with which the storage node has no business interfacing) so that stripes are on distinct devices, or at least to spread the stripes across the drive (which can also be done by RAIDing partition devices). No need to involve storage node code.

andrew2.hart · May 24, 2022, 7:45pm

With RAID you would have trouble with a 100/2 coding. Or are you recommending a 102-disk RAID?

BrightSilence · May 24, 2022, 9:11pm

Well, the idea is nice, but I don’t think there is any disk that is theoretically reliable enough to survive, yet causes enough bitrot for your node to be disqualified. Either that HDD is going to be completely gone soon anyway or your node will survive just fine if you just accept that your audit score will drop a little from time to time.

This is how you make devs your enemy

Edit: Actually, let me add some more information for this case specifically. All the erasure coding and decoding takes place either on the client side or, in the case of repair, on the satellite side. The storagenode side is intentionally left as simple as possible so it can run on super low-powered hardware. It doesn’t ever do erasure coding, so this would be entirely new to the storagenode component. Furthermore, it would have a significant impact on hardware requirements (mostly CPU), and adding it as an option would significantly increase the different types of regression testing required and the different kinds of hardware to optimize for, depending on settings.
I also think that this may be something you could solve on the file system level independent from Storj.
I’m not on the Storj dev team, so I won’t speak for them. But I’m betting you would get a similar response from them. Simple is never as simple as you think.

Toyoo · May 24, 2022, 11:30pm

Not sure if mdraid is capable of that many stripes. But then it would be quite impractical anyway, needing to read 100 blocks whenever a customer requests even the smallest file.

6/5 would probably work, though, and would probably even be fast enough for Storj purposes with some tuning.
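
For a rough feel of the trade-off, the arithmetic can be sketched as below. It assumes "100/2" means 100 data chunks plus 2 parity chunks (hence 102 disks) and "6/5" means 6 chunks in total of which 5 carry data; both readings are assumptions, and `scheme_cost` is a hypothetical helper, not part of mdraid or Storj.

```python
def scheme_cost(data_chunks: int, parity_chunks: int) -> dict:
    """Summarize the basic costs of a k-data + m-parity stripe layout."""
    total = data_chunks + parity_chunks
    return {
        "devices": total,                          # one chunk per device
        "storage_overhead": total / data_chunks,   # raw bytes per useful byte
        "tolerated_failures": parity_chunks,       # chunks that may be lost per stripe
        "chunks_per_full_stripe": data_chunks,     # chunks read to get a whole stripe's data
    }


wide = scheme_cost(100, 2)   # assumed meaning of "100/2"
narrow = scheme_cost(5, 1)   # assumed meaning of "6/5"
print(wide)
print(narrow)
```

The wide layout has very little storage overhead but touches 100 chunks for every full stripe, while the narrow one costs more space and touches only 5.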

Pentium100 · May 25, 2022, 3:59am

If you want to use old drives that give you media errors, then you should use zfs and raidz3 or mirror from at least 3 drives.

Implementing that in the node software would only end up as an inferior version of zfs, while making the software more complicated.

andrew2.hart · May 25, 2022, 5:48pm

Actually I have found something called dm-verity that seems to do what I want. It talks about fec with rs(m,n), which sounds like it. If only I could work out what it all means

Toyoo · May 25, 2022, 7:09pm

Not really. dm-verity is read-only, i.e. after preparing the disk image where dm-verity is used, you can’t modify its contents anymore.

andrew2.hart · May 26, 2022, 4:51pm

Doh! There’s also something called par2, but maybe it has the same issue.
Oh well GE it is.

BrightSilence · May 29, 2022, 8:41pm

I realize that it may be too late now, but I came across this and wanted to share.

https://www.thanassis.space/rsbep.html

I didn’t test this and it will require quite a bit of effort to set up. But I like the concept and didn’t find a more fleshed-out option for local on-disk parity file systems, as most erasure coding file systems are built for large network storage. Perhaps worth a look.
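
As far as I understand the general idea behind tools of this kind, the key trick is interleaving: the bytes of many error-correcting codewords are mixed on disk so that a run of bad sectors damages each codeword only a little. Below is a minimal sketch of just the interleaving part, with no Reed-Solomon coding at all; the function names and sizes are hypothetical and are not taken from the linked tool.

```python
from typing import List


def interleave(codewords: List[bytes]) -> bytes:
    """Write byte 0 of every codeword, then byte 1 of every codeword, and so on."""
    length = len(codewords[0])
    if any(len(c) != length for c in codewords):
        raise ValueError("all codewords must have the same length")
    return bytes(c[i] for i in range(length) for c in codewords)


def deinterleave(stream: bytes, count: int) -> List[bytes]:
    """Undo interleave(): codeword j owns every count-th byte starting at offset j."""
    return [stream[j::count] for j in range(count)]


def damage_per_codeword(burst_start: int, burst_len: int, count: int) -> List[int]:
    """Count how many bytes of each codeword a contiguous on-disk burst destroys."""
    hits = [0] * count
    for pos in range(burst_start, burst_start + burst_len):
        hits[pos % count] += 1
    return hits


# Four toy "codewords" of 8 bytes each, interleaved into one 32-byte stream.
codewords = [bytes([i]) * 8 for i in range(4)]
stream = interleave(codewords)
print(damage_per_codeword(burst_start=5, burst_len=4, count=4))
```

Stored back to back, a 4-byte burst could wipe out 4 bytes of a single codeword; interleaved, the same burst costs each codeword one byte, which is far easier for a per-codeword error-correcting code to repair. This is the single-disk version of the independence argument made earlier in the thread.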