There seems to be a lot of confusion when it comes to fsync and the new default.
Let us take a look at what fsync is and why you don’t have to worry about it.
Modern SSDs and operating systems have tons of caches: read cache, write cache, cache in RAM, cache on the SSD, pseudo-SLC cache, you name it. There are even DRAM-less SSDs that borrow RAM from the host as a write cache (Host Memory Buffer).
These caches are all tricks to speed up your system!
Some of them are even volatile. Take your RAM as an example. It is fast as hell, way faster than your SSD, but as soon as your power goes out, all data is gone.
When we talk about fsync, we are talking about writes only.
For writes on your PC, there are two ways a write can happen: sync or async.
If an application decides its write is really important (like a database that must never lose a committed transaction), it can ask for the write to be done synchronously. That is what fsync is for: the OS will only report the write as done once the data (and metadata) is completely written to the disk, so the data is safe even in case of a crash. The good thing about fsync = enabled is that you can crash and not lose data. The bad thing is that it is very slow. A normal consumer 2.5″ SSD can easily do async writes at 500 MB/s but will sync write at maybe 30 MB/s.
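To make the difference concrete, here is a minimal Go sketch of a sync write (the file name and payload are made up for illustration):

```go
package main

import (
	"log"
	"os"
)

func main() {
	// Hypothetical file, just for illustration.
	f, err := os.Create("important.db")
	if err != nil {
		log.Fatal(err)
	}
	defer f.Close()

	if _, err := f.Write([]byte("critical record\n")); err != nil {
		log.Fatal(err)
	}

	// Up to this point the data may only live in the OS page cache
	// (an async write). Sync() issues fsync(2) and blocks until the
	// kernel reports data and metadata as flushed to the device.
	if err := f.Sync(); err != nil {
		log.Fatal(err)
	}

	log.Println("write is durable, assuming the drive isn't lying")
}
```

Skip the Sync() call and you get the fast async path: Write() returns as soon as the data lands in the page cache.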
For async writes, there are tons of caching mechanisms, and the OS will say “Yeah, data is transferred” even though the data may still only be sitting in RAM.
There are enterprise SSDs that come with capacitors (power-loss protection). The neat thing is that thanks to these capacitors, the SSD can “lie” to the OS and say “Sure bud, all done” while the data is still in flight. If the power is cut, the capacitors provide enough energy for the SSD to write all the cached data down to flash. Currently, enterprise SSDs with capacitors (and Intel Optane) are the only option for fast sync writes! Everything else will be dead slow!
Now for STORJ data, we know that we can lose 4% of data before we get disqualified.
Let’s assume you have 1TB of clean data already. Now your node gets 100 Mbit/s of ingress (would be nice, wouldn’t it). Your PC crashes. Your last 5 seconds of writes are gone (technically only true for ZFS, which flushes its transaction groups roughly every 5 seconds by default, but it is a good ballpark number). 100 Mbit/s is 12.5 MB/s; multiplied by 5 seconds, that is 62.5 MB of loss.
Your 1TB of data = 1,048,576 MB. So now about 0.006% of your data is gone.
That is still far away from 4%.
4% would be 41,943 MB. At 62.5 MB lost per crash, you would need about 671 crashes to reach that 4%.
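If you want to play with the numbers yourself, here is the same arithmetic as a small Go sketch (the ingress rate and loss window are the assumptions from above):

```go
package main

import "fmt"

func main() {
	const ingressMBps = 100.0 / 8.0  // 100 Mbit/s ingress = 12.5 MB/s
	const lostSeconds = 5.0          // assumed in-flight window per crash
	const storedMB = 1024.0 * 1024.0 // 1TB of clean data = 1,048,576 MB

	lossMB := ingressMBps * lostSeconds // 62.5 MB per crash
	lossPct := lossMB / storedMB * 100  // share of stored data
	dqMB := storedMB * 0.04             // 4% disqualification threshold

	fmt.Printf("loss per crash: %.1f MB = %.6f%% of stored data\n", lossMB, lossPct)
	fmt.Printf("crashes to reach 4%%: %.0f\n", dqMB/lossMB) // ~671
}
```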
Because it comes up a lot: NO, data loss does not add up! Well, the lost bytes do accumulate, but so does your new, fresh, good data, so the percentage stays roughly constant. You don’t have to worry that your node “adds up errors” over time.
But let’s say you start fresh without that 1TB of clean data. One day has 24h = 86,400 seconds. Assuming you lose 5 seconds per crash, 86,400 seconds is 17,280 five-second intervals. 4% of these intervals would be 691.2.
So assuming you get a constant ingress, no matter how high that number is, and assuming you lose 5 seconds of in-flight data at every crash, as long as you stay under 691.2 crashes per day you are fine!
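The same back-of-the-envelope logic for the fresh-node case, again as a Go sketch (the 2% line anticipates the update at the end of this post):

```go
package main

import "fmt"

func main() {
	const secondsPerDay = 86400.0
	const lostSeconds = 5.0 // assumed in-flight window per crash

	intervals := secondsPerDay / lostSeconds // 17,280 five-second slots
	fmt.Printf("crash budget at 4%%: %.1f per day\n", intervals*0.04) // 691.2
	fmt.Printf("crash budget at 2%%: %.1f per day\n", intervals*0.02) // 345.6
}
```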
If you are even remotely close to that number, you have bigger problems than fsync.
I would even argue that if you have more than one crash a day, something is extremely wrong with your system!
TLDR: Don’t worry!
Update:
- Brightsilent is right: to make it statistically impossible to get disqualified, you should stay under 2%, not 4%. So don’t go over 345.6 crashes a day.
- ZFS profits even more, because there is now less fragmentation caused by STORJ. Async writes skip the ZIL entirely: the data collects in RAM in transaction groups and is written to the disk(s) only once.