How many failed audits are acceptable?

Hi all,

my node had been running without any failed audits since April. A week ago I had to take it down for a bit over an hour for maintenance. When I brought it back up, I noticed that I had 4 unrecoverable failed audits within a couple of hours, and as of now I have 10.

Could shutting down the server be the reason for the failed audits? And how many failed audits in relation to successful audits are “acceptable”? Here are my stats:

Count of unrecoverable failed audits: 10
Count of recoverable failed audits: 5
Count of successful audits: 8330

The reason for the failed audits is always “no such file or directory”

2019-07-30T07:06:42.026Z	INFO	piecestore	download failed	{"Piece ID": "ROQ5RTRKKITPULQ7YL4KFT6EUC64F3SBHH6BJCLIU2JP6EXVEWTQ", "SatelliteID": "12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S", "Action": "GET_AUDIT", "error": "rpc error: code = NotFound desc = open config/storage/blobs/ukfu6bhbboxilvt7jrwlqk7y2tapb5d2r2tsmj2sjxvw5qaaaaaa/ro/q5rtrkkitpulq7yl4kft6euc64f3sbhh6bjcliu2jp6exvewtq: no such file or directory"}

Let’s try to find the cause first. Usually this happens when people are still using the -v mount syntax in their run command and for some reason the storage location isn’t available while the node is starting. If that happens, the container stores data inside a Docker volume, and that data is gone once you remove the container. So please refer to the docs for the correct --mount syntax:
https://documentation.storj.io/setup/storage-node#running-the-storage-node

The second question is why the storage location may not have been available. On Linux, make sure you mount your disk via fstab during boot; the automatic mounts in /media are not reliable. On Windows/macOS, make sure the HDDs are shared in the Docker settings.

I use the --mount option and mount the hard disk via /etc/fstab under /mnt:

docker run -d --restart unless-stopped -p 28967:28967 \
-e WALLET="$WALLET" \
-e EMAIL="$EMAIL" \
-e ADDRESS="$ADDRESS:28967" \
-e BANDWIDTH="$BANDWIDTH" \
-e STORAGE="$STORAGE" \
--mount type=bind,source="$identityDir",destination=/app/identity \
--mount type=bind,source="$storageDir",destination=/app/config \
--name storagenode storjlabs/storagenode:alpha

In the alpha, your node would be paused after failing 40% of audits.
In production, it will be disqualified after the first failed audit. So the audit rate should be 1.

Please show the line from /etc/fstab for your $storageDir.


In the alpha, your node would be paused after failing 40% of audits.
In production, it will be disqualified after the first failed audit. So the audit rate should be 1.

So nothing to worry about right now with > 8000 successful audits. But in the long term I need to get to the bottom of this. I searched the log for the missing pieces, but couldn’t find any of them being uploaded to me. So they must have been uploaded either before the last reboot or not at all.
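For reference, this is roughly how I searched: just grepping the container log for each piece ID from the failed audit lines (the ID below is the one from the log excerpt above; the 2>&1 is needed because part of the output goes to stderr).

docker logs storagenode 2>&1 | grep ROQ5RTRKKITPULQ7YL4KFT6EUC64F3SBHH6BJCLIU2JP6EXVEWTQ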

Please show the line from /etc/fstab for your $storageDir.

UUID=09cf8270-456e-49a3-92d9-9b5e204a253f /mnt/storjsharev3 btrfs defaults,subvol=@storjsharev3 0 2

If the hard disk were not mounted at boot time, I would expect many more failed audits…

Is there a way to get a list of all pieces I should have from the sqlite database? Then I could write a script to look for them in the storage directory and see if there are more missing.
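Something along these lines might work. This is only a rough sketch: it assumes piece metadata lives in a pieceinfo table inside info.db with satellite_id/piece_id stored as blobs, and that pieces sit under blobs/<satellite>/<first two characters>/<rest of the piece ID> in lower-case base32, like the path in the failed audit log line above. It needs sqlite3, xxd and coreutils base32, and I would run it against a copy (or snapshot) of the database rather than the live one:

#!/bin/bash
# Rough sketch: list every piece the database knows about and report
# files that are missing on disk. Table/column names are an assumption
# based on the current info.db schema.
STORAGE=/mnt/storjsharev3/storage   # adjust to <your $storageDir>/storage

b32() {  # hex blob from sqlite -> lower-case unpadded base32, as used in the blob paths
  xxd -r -p | base32 | tr -d '=' | tr 'A-Z' 'a-z'
}

sqlite3 "$STORAGE/info.db" "SELECT hex(satellite_id), hex(piece_id) FROM pieceinfo;" |
while IFS='|' read -r sat piece; do
  sat_dir=$(echo "$sat" | b32)
  name=$(echo "$piece" | b32)
  path="$STORAGE/blobs/$sat_dir/${name:0:2}/${name:2}"
  [ -e "$path" ] || echo "missing: $path"
done

It spawns a couple of processes per row, so it won’t be fast, but it only has to run once.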

What is your device?
If it is not a Synology, then please change the filesystem ASAP.

There is currently (2019-07-07, linux ≤ 5.1.16) a bug that causes a two-disk raid1 profile to forever become read-only the second time it is mounted in a degraded state, for example due to a missing/broken/SATA link reset disk (unix.stackexchange.com, How to replace a disk drive that is physically no more there?). It is probable that this issue is mitigated by using a three-disk raid1 profile filesystem rather than a two-disk one. With two copies of all data and metadata spread over three disks, the filesystem can lose any one disk and continue to function across reboots unless a second disk dies, because with two surviving devices, two copies of data and metadata can be made. The “filesystem becomes read-only” bug is avoided, because it is only triggered when it becomes impossible to make two copies of data and metadata on two different devices. As an alternative, Adam Borowski has submitted [PATCH] [NOT-FOR-MERGING] btrfs: make “too many missing devices” check non-fatal to linux-btrfs, which addresses this issue, which is also addressed by Qu Wenruo’s yet-unmerged Btrfs: Per-chunk degradable check patch. The thread surrounding Borowski’s patch is an excellent introduction to the debate surrounding whether or not btrfs volumes should be run in a degraded state.

https://wiki.debian.org/Btrfs

Another key issue of BTRFS:

Raid5 and Raid6 Profiles
“Do not use BTRFS raid6 mode in production, it has at least 2 known serious bugs that may cause complete loss of the array due to a disk failure. Both of these issues have as of yet unknown trigger conditions, although they do seem to occur more frequently with larger arrays” (Austin S. Hemmelgarn, 2016-06-03, linux-btrfs).

Do not use raid5 mode in production because, “RAID5 with one degraded disk won’t be able to reconstruct data on this degraded disk because reconstructed extent content won’t match checksum. Which kinda makes RAID5 pointless” (Andrei Borzenkov, 2016-06-24, linux-btrfs).

2016-06-26 Update
Once again, please do not use btrfs’ raid5 or raid6 profiles at this point in time! In the thread [BUG] Btrfs scrub sometime recalculate wrong parity in raid5 Chris Murphy found the following while testing the btrfs raid5’s ability to recover from csum errors:

I just did it a 2nd time and both file’s parity are wrong now. So I did it several more times. Sometimes both files’ parity is bad. Sometimes just one file’s parity is bad. Sometimes neither file’s parity is bad. It’s a very bad bug, because it is a form of silent data corruption and it’s induced by Btrfs. And it’s apparently non-deterministically hit (2016-06-26).
In another email in this thread, Duncan suggested “And what’s even clearer is that people /really/ shouldn’t be using raid56 mode for anything but testing with throw-away data, at this point. Anything else is simply irresponsible” (linux-btrfs, 2016-06-26).

Not that I am aware of. But if you could write such a script - welcome!

What is your device?
If it is not a Synology, then please change the filesystem ASAP.

No, it’s an Ubuntu 16.04 server. The disk I use for the storagenode is a single disk with a BTRFS filesystem, so no RAID there.

I have another 3 disks in a BTRFS RAID1 for other files. I know that RAID5 is not stable yet, and I don’t use it.

If you have already failed audits on this one BTRFS disk, I cannot consider it reliable, sorry.

You are assuming that BTRFS is to blame for the missing pieces. Nothing you quoted about BTRFS applies to my configuration: I run the storagenode on a single disk without any RAID.

I can migrate my node to a disk with ext4 in a few days if it’s really necessary. But I like the ability to create a snapshot and run the earnings script on the snapshot. This way I don’t need to stop the storagenode.
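For context, that workflow is basically this (just a sketch; the snapshot name is an example, and both commands need root):

btrfs subvolume snapshot -r /mnt/storjsharev3 /mnt/storjsharev3/.earnings-snap
# ... run the earnings script against the databases inside the snapshot ...
btrfs subvolume delete /mnt/storjsharev3/.earnings-snap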

But what are my options now? There could be more missing pieces. Does each audit check a single piece? If yes, then it could take several months to find them all. I have > 600000 pieces right now, and if I get disqualified over 1 missing piece once Storj reaches production, I can only hope for another wipe, or wait for “graceful exit” to be implemented and start over.

Once again, I doubt BTRFS is to blame. I run all my PCs with BTRFS on a single disk for the OS, plus the 3-disk RAID1 in that server for storage, and have never had missing files. The problems started a week ago, after the update to 0.15.3 (I think). I never had a failed audit before, and I started in April.


I just saw it in the #hardware channel of the Rocket.Chat. I do not trust a filesystem with issues.
If you believe the failed audits are just a coincidence - then fine.

Audits are almost random; the satellite will not check every piece. But if your node fails more than 40% of audits, it will be paused on that satellite.

Ok then, I’ll change it. Would you recommend gracefully exiting this node as soon as that is available? Or will there be another wipe before production? I don’t wanna lose my escrow…

Graceful exit is not implemented yet.
I don’t know about a wipe. The goal is not to wipe anymore, but it’s not set in stone.

Keep in mind that even when implemented, your node has to survive for 15 months to even qualify for graceful exit. I assume that within that time frame the percentage of allowed failed audits will drop significantly.

Keep in mind that even when implemented, your node has to survive for 15 months to even qualify for graceful exit.

Oh damn, didn’t know that. So I can only hope that all missing pieces will be found before production. Yesterday I had 3 more failed audits, today 0…

BTW:
~# btrfs scrub status /mnt/storjsharev3/
scrub status for 09cf8270-456e-49a3-92d9-9b5e204a253f
scrub started at Wed Jul 31 17:45:56 2019 and finished after 11:31:41
total bytes scrubbed: 1.42TiB with 0 errors

Just found this in another thread:

we have identified an issue with that which is affecting several users, that have these types of failed audits for pieces that actually never were on their node in the first place. We are working on a fix for this already.

So it seems to be a known problem.