Easy way to add more HDDs

Is there an easy way to quickly and safely add more HDDs to my node?

Start a new node on a new HDD.

Or on a new device? I tried to start a node on the same device with a new HDD. There was an error because the first node is already registered there. The instructions in the docs on how to do this are not clear to me. So I am asking again: how do I start a new node on the same PC?

And what about an array with hardware RAID? Should I take a disk array from a server and set up a large node on it? What do you think?

Exactly the same way you started the first node, but point it to different folders for identity, config, and storage, different from the first ones. If you are using Docker containers, you will have two instances of the container running, with different mounts.

Edit: and a different port.
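
To make that concrete, here is a rough sketch of what a second Docker instance might look like. The paths, ports, wallet, address, and storage size below are placeholders, and the exact flags should be taken from the official setup documentation:

```
# Second node: its own identity, its own storage folder, its own container name,
# and different host-side ports (all values below are placeholders)
docker run -d --restart unless-stopped --stop-timeout 300 \
  --name storagenode2 \
  -p 28968:28967/tcp -p 28968:28967/udp \
  -p 14003:14002 \
  -e WALLET="0x..." \
  -e EMAIL="you@example.com" \
  -e ADDRESS="your.external.address:28968" \
  -e STORAGE="4TB" \
  --mount type=bind,source=/mnt/hdd2/identity,destination=/app/identity \
  --mount type=bind,source=/mnt/hdd2/storagenode,destination=/app/config \
  storjlabs/storagenode:latest
```

The only thing that matters is that nothing is shared with the first container: not the identity folder, not the storage folder, not the name, and not the published ports.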

Search the forum, there is a huge discussion thread regarding this.

The official recommendation from Storj is no RAID, one HDD per node, one node per HDD.

My personal opinion: if you already have an array for other purposes, there is no harm in using its excess space for Storj. Otherwise, use separate drives. I would also not use hardware RAID controllers; instead, put them into IT mode and use a modern filesystem like ZFS. But this is completely off topic… for the purposes of this discussion, I would stick to 1:1 node:HDD.

So do I need a second machine for a second node? Or would RAID be better?

You can set up a second node on the same machine on a separate drive.

Thanks for the idea! I will try it.

This paragraph, however,

Even using RAID5/6 with today’s disks is too risky due to bit failure rate on disks with high capacity: High risk to lose a RAID5 volume during rebuild.

is fearmongering, based on faulty assumptions, and factually and verifiably incorrect. If that “almost guaranteed rebuild failure” were the reality, we would see every other periodic scrub result in new bad sectors being discovered (sectors that would have otherwise caused the rebuild failure, had this not been a scrub). This is far from what actually happens.

Of course, if the array is set up, run for 5 years, and only then scrubbed, bad sectors are to be expected, and once one disk is removed the data is unrecoverable.

Nobody operates arrays this way. Data is scrubbed periodically, monthly or even weekly. Bad sectors developing concurrently on multiple disks within a single month is a systemic failure; otherwise the scrub succeeds, the data following the scrub is confirmed to be healthy on all remaining disks, and a successful rebuild is all but ensured.
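
As an illustration only (the pool and array names are made up), a periodic scrub is typically just a cron entry or the distribution’s stock job:

```
# ZFS: crontab entry scrubbing the pool "tank" at 02:00 on the 1st of every month
0 2 1 * * /sbin/zpool scrub tank

# mdraid: manually trigger the equivalent consistency check on /dev/md0
# (Debian/Ubuntu ship a similar periodic job for mdadm out of the box)
echo check > /sys/block/md0/md/sync_action
```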

Furthermore, more advanced solutions such as raidz1 (which is a lot like RAID5 in many ways) provide a way to replace a disk while maintaining access to the data on the disk being replaced, completely eliminating even the ridiculously unlikely hypothetical scenario of bad sectors appearing a few seconds after a successful scrub.
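
For example (the pool and device names here are hypothetical), with ZFS the replacement disk resilvers while the old one stays attached, so redundancy is not reduced during the swap:

```
zpool replace tank ata-OLD_DISK_SERIAL ata-NEW_DISK_SERIAL
zpool status tank    # the old disk remains part of the vdev until the resilver completes
```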

That article is being quoted and regurgitated all over the internet, and it’s complete BS.

Please do not assume that everyone uses ZFS to build RAID. Most hardware and software RAID solutions do not use checksums and auto-healing. So when you replace a failed disk, you need to rebuild the array. With today’s disk sizes, the probability of hitting bit rot or an unreadable sector is very high, and traditional RAID will fail to rebuild the array in that case.
I have had experience with exactly such cases; it was a normal RAID5 on a hardware RAID controller from a well-known vendor in those days.

Just do not confuse raidz1 with RAID5 and raidz2 with RAID6; they are different even if they use the same topology.

I referred to ZFS just as an example; that detour can be ignored.

Lack of bit-rot detection and correction in conventional RAID does not depend on the RAID level. Neither RAID5 nor RAID6 can correct rot, simply due to the lack of checksumming, but this is beside the point and does not affect the deduction logic here.

The claim made there is that, merely due to the vast amount of data stored on a single modern high-density disk, the probability of a URE or rot is so high that a rebuild is more likely to fail than to succeed.
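
For context, the arithmetic behind those write-ups typically goes like this, taking the commonly quoted consumer spec of one URE per $10^{14}$ bits read and a four-drive RAID5 of 4 TB disks, where a rebuild must read the three surviving disks, roughly $3 \times 4\,\text{TB} \approx 9.6 \times 10^{13}$ bits:

$$P(\text{at least one URE}) = 1 - \left(1 - 10^{-14}\right)^{9.6 \times 10^{13}} \approx 1 - e^{-0.96} \approx 0.62$$

The whole dispute is about whether the spec-sheet rate can really be treated as an independent per-bit probability on a healthy, regularly scrubbed array.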

This claim is disproven by the factually observed very low frequency of scrub failures in realistic installations. And a scrub on RAID is exactly the same process that would happen during a rebuild.

But one might argue: well, how do you know the scrubs don’t fail? RAID5 does not support checksums; it must be just silently murdering data.

To that I say that some vendors, notably Synology, deploy a checksumming solution (btrfs) on top of conventional mdadm-managed RAID5 or RAID6. This hybrid solution would still exhibit all the alleged RAID5 insufficiencies that would have caused a rebuild failure, but they would be detectable, reportable, and correctable by the checksumming layer. So we would expect a lot of corrections logged after every scrub.

In my experience managing a few relatively large Synology arrays, this was not the case. According to the article, I should expect half of the scrubs to correct something. That was not even close to the observed reality.

In other words, the outcomes those articles describe are possible under the worst conditions, but they are highly improbable and unrealistic for the vast majority of users who operate their arrays according to best practices.

I would not quote those questionable claims in the official documentation.

Again, I’m not saying this does not happen; I’m saying the probability of this is horribly exaggerated in the linked and derivative articles circulating on the internet with titles like “is raid5 dead” etc.

You are again using a filesystem with checksums and auto-healing. And notably, Synology doesn’t call their analogues RAID5/RAID6, because they are improved versions of these topologies.
So, at my past job I proved this claim about unreliable RAID5 six times without intending to. It resulted in the decision to replace RAID5 with RAID10 in all 13 branches of that company.
You may also take a look at

And keep believing in reliable RAID5/RAID6 (not zfs/btrfs).

I’m not getting my point across… I’ll try to rephrase.

If the claim were correct, we would expect to see evidence of frequent corrections on at least some machines. Drive reliability does not depend on the filesystem it’s used with. If modern high-density drives develop rot within weeks, we should see evidence in the logs of a higher checksumming layer fixing those rotten sectors (the underlying RAID5 fails, the improved btrfs on top corrects and reports). And yet, I did not.

There will always be people for whom things failed, even with raidz3. The existence of such reports, including your own anecdotal experience, proves nothing. Neither does mine. In fact, it suggests things are not as bad as the article claims. Six times? The article claims a 50% failure rate; I’m sure you did more than 12 scrubs over the years.

It’s really simple. If the claim is a 50% chance of a successful rebuild, then if I run a scrub 10 times right now on a Synology, I should see a non-zero count of corrected bad blocks in half of those runs. Those would have been rebuild failures had this been an actual rebuild rather than a scrub.

Do you agree so far?

And yet, in my 4 years with Synology I saw one monthly scrub finding and correcting bad blocks on a (dying) drive. One.
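
To put a number on that, treat each monthly scrub as an independent trial with the article’s claimed 50% chance of hitting damaged sectors; over roughly 48 scrubs in 4 years:

$$E[\text{scrubs with corrections}] = 48 \times 0.5 = 24, \qquad P(\text{at most one}) = 49 \times 0.5^{48} \approx 1.7 \times 10^{-13}$$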

That is a sufficient lack of failures to question the math in those write-ups.

I will not change my opinion regarding RAID5/RAID6. They are not as reliable as everyone thinks, and they are not a silver bullet. And they are obviously more expensive solutions.
I do not want to repeat the whole RAID vs No RAID choice thread; you may read it yourself.

We all make our own decisions, so calculate, think, and act in the way you believe is right for you. The warning regarding RAID will remain for others.

Nobody argues with that. They are not a panacea; they are risk-management solutions with well-defined applicability. But they are not as unreliable as the “articles” claim either.

Either way, for the purposes of hosting nodes, RAID should not be used, since one or two disks’ worth of fault tolerance is negligible compared to the redundancy offered by the network (80/29 Reed-Solomon). So the whole point is moot.

I’ve read it. It’s focused on SNO profit, not on the merits of RAID reliability for hosting actual data.

No problem. I feel, however, that opinions should be formed on facts, not on exaggerations and clickbait. All I wanted to express is that I find it weird to see those articles linked in the official documentation, that’s all. It’s my personal opinion, based on prior research and similar conversations in the past.

I agree with your post, but thought some clarification might be useful here. A lot of engineering has been done since the original RAID levels were invented, so it becomes quite confusing what kind of actual solution people have in mind when using these names. This makes discussions ineffective.

Original RAID schemes assume that a drive can reliably tell that it lost data. And so, for example, a total drive failure is a pretty reliable indicator of lost data. So is a drive-level checksum error, if the drive uses error-correcting codes (ECC, not all drives do!). In both cases RAID works just fine: data is re-read from other drives and potentially corrected. What we usually call bit rot is a situation when there was a change not caught by the drive and silently passed to the software layer… which happens, even if very rarely.

With this definition of bit rot, just raw btrfs on top of mdraid RAID5 would not be enough. Consider what happens when there is bit rot. Pure RAID5 could technically notice that a stripe’s parity does not match the data, but it would not be able to figure out which drive is the wrong one. So it guesses. If the guess is incorrect, btrfs will return a checksum error. But the kernel does not have any facility to then go back to the lower layer and retry the guess so that btrfs can verify the checksum again.

It could, but it does not… mdraid does not actually verify parity on each read; that would make random reads slower. It does so only if the drive itself reports an error. And even when you force a scrub, it is always the parity that is assumed to be wrong (see man md, the “Scrubbing and mismatches” section).
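
For reference, this is the scrub being discussed (assuming the array is /dev/md0); a “check” only counts inconsistent stripes, while “repair” rewrites the parity it assumes to be wrong:

```
echo check > /sys/block/md0/md/sync_action
cat /sys/block/md0/md/mismatch_cnt   # non-zero means inconsistencies were found
```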

It is suspected that Synology may have actually changed their kernel to introduce this feedback loop between btrfs and mdraid, at the cost of greater complexity. Alternatively, they might have written their own implementation of dm-integrity, which also helps.

RAID6 would technically be slightly better in this regard, because any N-1 set of drives which passes the parity check is fine, so the verification can be done purely within RAID code. But then you can only defend against one malfunctioning drive, not two, which is not what people usually hope for with RAID6. And, as stated above, mdraid does not consult parity if the drive does not explicitly report an error.

And so Synology is probably measurably safer than just mdraid+btrfs on a random Linux box. Naming both approaches just RAID5/6 is misleading.

Yes, this was introduced in DSM 6.1. Per the white paper:

Silent data corruption detection and recovery (file self-healing) - In DSM 6.1 and above, Btrfs file system has the ability to not only detect the silent corruption on the user data but also to recover it. If checksum mismatch is detected, the file system layer instructs MD layer to read redundant copy (parity or mirror) of the damaged block. It will then automatically recover the damaged block with the good redundant copy. This file self-healing feature requires the RAID volumes to run RAID 1, 5, 6, 10, F1, or SHR.

My understanding is that it’s done via the LVM extension, but I did not look deeply into the implementation.

This solution logs when data is corrected, essentially creating a paper trail showing whether any bits rotted between one scrub and the next.

Oh, great source, I didn’t know they published the details. Thank you!