5 nodes on the same HDD vs 5 nodes on separate disks

It doesn't make any sense. These nodes are likely behind the same public IP, so together they hold the data of a single node, and they also share the same storage in this case. It could make sense to run them on their own disks; then you would not need RAID5 (which is not a safe solution with today's disks) and they would work as independent nodes.
But such a setup is just a waste of one disk for redundancy, plus the resources to run two nodes instead of one.

I really do not understand why you have many nodes if they share the same IP and storage.

I can see a reason for it. Run 2 nodes on the same array for a while, until the array fills up, then move one of the nodes to another array.


You can find it in storage manager:

The important thing to consider is that Storj does tons of small reads and writes. While RAID5 can speed things up, that is only true for throughput-bound workloads, not IOPS-bound ones. Every I/O hits all disks, with additional RAID overhead on top. With Storj this likely means lower performance overall. That said, I have nodes running on an SHR2 array because I simply had free space there. But it's also SSD-accelerated for both reads and writes, which solves the IOPS issue (while wearing down the SSDs).
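To put rough numbers on that IOPS point, a quick sketch (all figures below are assumptions, not measurements):

```python
# Rough back-of-envelope: random-I/O capacity of a RAID5 array vs. independent disks.
# All numbers here are illustrative assumptions, not measurements.

DISK_IOPS = 150          # assumed random IOPS of a single 7200 rpm HDD
DISKS = 4                # assumed array size

# Throughput-bound sequential work scales with the number of data disks...
sequential_gain = DISKS - 1   # RAID5 stripes data across n-1 disks

# ...but a small random write on RAID5 typically costs read-old-data,
# read-old-parity, write-new-data, write-new-parity (4 disk ops).
raid5_write_penalty = 4
raid5_random_write_iops = DISK_IOPS * DISKS / raid5_write_penalty

# The same disks run as independent single-disk nodes:
independent_random_write_iops = DISK_IOPS * DISKS

print(f"Sequential throughput gain vs one disk: ~{sequential_gain}x")
print(f"RAID5 random-write IOPS (approx):       {raid5_random_write_iops:.0f}")
print(f"Same disks as independent nodes:        {independent_random_write_iops:.0f}")
```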

No data scrubs (that one takes 2-3 days), no Time Machine, SMART tests run once a month. I do have some SMB shares… I will check Synogear.
I never record access time of files.

That is the problem with big nodes, and that is why I started a new node on the same hardware. The node we're talking about is 6.7TB and my free disk space is ~5TB.
I'm still trying to decide what to do: buy another disk (it would be my first Storj expense!) and grow the RAID5, or install the new disk in the NAS as a single disk and move the node to it.

I know that thread. It’s an old one. I read somewhere the problem was fixed and it can’t happen anymore.

It could, but is it? I have 16GB, could upgrade to 32GB if it got used. Isn’t Syno-Docker limited to 2GB?

Life sometimes happens. When I started a node I had a Syno RAID5. I could have used an RPi, but I had no trustworthy disks besides the ones inside the NAS.

Don't believe everything you read. I've recovered some RAID5 pools after a disk crash and never lost one. I have 3 highly reliable disks that were bought ~1 year apart.

It’s not to run Storj. It’s what I have and I run Storj on it…

Because big nodes are hard to manage. I want to move the 1st node into a volume without checksumming and I can't, because I don't have enough space available. I have some spare 4TB disks and spare bays on the NAS, but I can't move my 6.7TB node onto one of those disks. It's just that: big is hard to manage. I created the 2nd node so the first would stop growing so fast, to gain some time until I figure out how to proceed. The more it grows, the bigger the problem.
I could stop the growth, but I'm greedy :upside_down_face:


The thing is, I think there's something very wrong with Syno's SSD cache algorithms. My previous SSD got killed in 1 year (5-year warranty), and it went way past its maximum specified write cycles.
My current SSDs are only ~10% in use (DBs and some VMs), and even then I'm not that far from the maximum write cycles.


Can’t say I recognize your issue. My SSDs have run for almost 7 years without issues. I even did some Chia plotting on that accelerated array (bad idea).


This thing is chugging along just fine.

But not in a syno, right?

Maybe then stop the node and move data? You have about 4 hours of "free" downtime. You could use something like mergerfs to maintain uptime and keep moving data in 4-hour chunks monthly. Or a combination of the two…
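Just to put rough numbers on the 4-hour-chunk idea (the copy speed below is an assumption):

```python
# Sketch: how many 4-hour offline windows would it take to move a node,
# given an assumed sustained copy speed? All inputs are assumptions.

node_size_tb = 6.7            # size of the node to move (from this thread)
copy_speed_mb_s = 100         # assumed sustained HDD-to-HDD copy speed
window_hours = 4              # "free" downtime per window

tb_per_window = copy_speed_mb_s * 3600 * window_hours / 1e6  # MB -> TB
windows_needed = node_size_tb / tb_per_window

print(f"~{tb_per_window:.2f} TB moved per 4-hour window")
print(f"~{windows_needed:.1f} windows to move {node_size_tb} TB")
```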

At this point you might as well just add a disk dedicated to Storj; there is no benefit in having redundancy.

They "fixed" it by releasing their own branded NVMe sticks, with artificially capped throughput and heavy under-provisioning to avoid the issues described. Hardly a solution…

Depending on the chipset in your NAS and its version, it may support more memory than specified, as evidenced by other vendors releasing products on the same chipset that advertise higher memory capacity, and by user reports. The memory controller is part of the CPU, so it tracks the chipset.

Why would it be? And even if it were, it's irrelevant – the filesystem cache is managed by the host, and it will take up all unused RAM. This is the reason for having more RAM on a storage system – to get much better responsiveness by offloading the bulk of repetitive I/O from the disk system, not to use it all up with applications. (Synology went the other way recently by soldering half of the RAM in entry-level devices, but that's a separate discussion.)
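If you're curious how much of that "unused" RAM the host is already using as filesystem cache, a quick check on Linux (reads /proc/meminfo, nothing Synology-specific):

```python
# Quick look at how much RAM the kernel is using as page cache (Linux only).
# Reads /proc/meminfo; no third-party libraries needed.

def meminfo():
    info = {}
    with open("/proc/meminfo") as f:
        for line in f:
            key, value = line.split(":", 1)
            info[key] = int(value.split()[0])  # values are reported in kB
    return info

m = meminfo()
print(f"Total RAM:  {m['MemTotal'] / 1024 / 1024:.1f} GiB")
print(f"Page cache: {m['Cached'] / 1024 / 1024:.1f} GiB")
print(f"Buffers:    {m['Buffers'] / 1024 / 1024:.1f} GiB")
```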

SSDs are "spent" when they are rewritten: it's possible for a specific combination of SSD size and workload to result in a "stable" configuration where the bulk of the data has been cached and is updated infrequently, essentially serving as a static read-only cache. For example, if your workload results in random access to 40GB of data, a 41GB cache will end up written once and read many times, not wearing out the SSD at all. On the other hand, if you had a 20GB cache, it would be getting rewritten all the time and wear out very quickly.

What will happen with Storj is unpredictable and customer-driven, and hence caching is likely pointless (unless your cache is equal to the array size, which is a bit extreme). Instead, it may be more productive to size the cache for all the other tasks, with the goal of offloading all other I/O from the array so it is available to serve Storj requests, along with some in-RAM cache for the filesystem metadata.
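A toy model of the working-set point above (the 40GB example); block size and the eviction policy are gross simplifications:

```python
# Toy model: SSD cache wear vs. working-set size.
# If the cache fits the whole hot working set, blocks are written once and
# then mostly read; if it doesn't, blocks keep getting evicted and rewritten.
# Purely illustrative; real cache policies are more complicated.

import random

def simulate(cache_gb, working_set_gb, accesses=200_000):
    cache, writes = set(), 0
    for _ in range(accesses):
        block = random.randrange(working_set_gb)   # 1 "block" = 1 GB, random access
        if block not in cache:
            writes += 1                            # cache miss -> block written to SSD
            if len(cache) >= cache_gb:
                cache.remove(random.choice(list(cache)))  # naive random eviction
            cache.add(block)
    return writes

random.seed(1)
for cache_gb in (20, 41):
    w = simulate(cache_gb, working_set_gb=40)
    print(f"{cache_gb} GB cache over a 40 GB working set -> ~{w} GB written to SSD")
```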

This is also why having a separate single disk for Storj will likely yield a better overall outcome than trying to mitigate the abuse of the existing array by caching…

Move where? Right now I have no space left to move it. If I go the route of buying a new disk, I guess I’ll rsync it. But since I still have 5TB free on the NAS, I might as well wait for it to fill up and only then buy a new disk.
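For reference, the usual staged rsync pattern looks something like this (a sketch only; the paths are placeholders, not real ones):

```python
# Sketch of a staged node migration with rsync (paths are placeholders).
# Passes 1..n run while the node is still online; the final pass runs with the
# node stopped so the destination ends up consistent.

import subprocess

SRC = "/volume1/storj/"      # placeholder source path (trailing slash matters to rsync)
DST = "/volume2/storj/"      # placeholder destination path

def rsync_pass(delete=False):
    cmd = ["rsync", "-a", "--info=progress2"]
    if delete:
        cmd.append("--delete")   # only on the final pass, to drop files removed meanwhile
    subprocess.run(cmd + [SRC, DST], check=True)

rsync_pass()                 # initial bulk copy while the node keeps running
rsync_pass()                 # second pass picks up what changed during the first
# ...stop the node here (e.g. via Docker / the package manager), then:
rsync_pass(delete=True)      # final pass with the node stopped
# ...point the node's storage path at DST and start it again
```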

I will probably add a disk dedicated to Storj. But I can’t understand how there is no benefit in having redundancy…

Well, I can install 32GB with Synology's blessing. I can also install 64GB, according to many users…
Synology did worse than that: they soldered 1/4 of the RAM. You can only upgrade to 3/4 of the maximum RAM supported by the CPU (the DS720+ comes to mind).

This is why I'm only caching the Storj DBs. The amount of cache is more than enough (it never gets full) and yet, with an uptime of 19 days, I have this result:


It has written half a terabyte and read 50GB…
I also have other stuff I'd like to cache, but right now it's all running in the same volume as the Storj storage, so I can't…
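Putting rough numbers on those counters (a projection only, assuming the rate stays the same):

```python
# Quick numbers from the cache counters above (uptime and totals as reported).

written_gb = 500          # ~half a terabyte written to the cache
read_gb = 50              # ~50GB read back from it
uptime_days = 19

write_read_ratio = written_gb / read_gb
writes_per_year_tb = written_gb / uptime_days * 365 / 1000

print(f"Write/read ratio on the cache:   ~{write_read_ratio:.0f}:1")
print(f"Projected cache writes per year: ~{writes_per_year_tb:.1f} TB")
```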

Move to another share (with no checksumming and access time updates disabled) within the same pool. No extra space needed.

The Storj network provides redundancy by splitting data into multiple pieces and spreading them over multiple nodes with erasure coding. If a few bytes on your node rot, it's not a big deal; it's expected and is part of the design. The only reason to run a Storj node on a redundant array is that you already have an array for another purpose and want to share the extra space with Storj. Otherwise, a single, non-redundant disk is all you need. In other words, running a Storj node on a redundant array is akin to wearing a belt and suspenders :slight_smile:

As a reference, I myself also host a node on a redundant RAIDZ1 array of 4 drives, and to accelerate sync writes I'm using a 16GB Optane as SLOG. My node is relatively young, about 2TB of data stored so far (edit: was 1TB), but the disk usage is under 5% at all times, and apparently the SLOG is helping by taking over some of the synchronous writes (which I assume are DB writes). There is also an ARC in RAM of about 24GB, with a 99% hit ratio, for what I assume is metadata lookups.

There is a whole debate on whether you should run a node on a single disk or an array with redundancy. I am not planning on restarting it now, just wanted to say that this is not as clear-cut as it looks. Both options have their pros and cons.

Yes, I’ve read that discussion. It is focused on SNO profit, and not technical limitations.

From the network's perspective, redundancy within a node is not required; it is managed at the network level and is recommended against by Storj. A single disk per node is the recommended configuration, but if you have redundancy, there's no harm, except maybe lower performance.

If you focus on extracting max profit/minimum loss from running a node — then yes, it’s debatable.

Personally, I share extra space on my array; if it fills up, I'll just buy the cheapest HDD I can find and add another node. But I'm not doing it for profit.

I could do that. Access time is a volume-level setting and it's already disabled. The only thing I could gain is dropping the checksumming (a constant payload, not the thing that wakes up every day at midnight). What I really wanted was another volume.

I'm not thinking about the Storj network providing redundancy. I know the network will survive without me. The question is, will I survive without it?
Wearing a belt and suspenders is the definition of redundancy. If the belt fails, you still have the suspenders.

How do you measure your disk usage? If it is “%util” from iostat, mine are at 40%.

PS: I've been reading about Optane Memory. Sounds like Intel bullshit. It's an NVMe disk with caching algorithms built in. That wouldn't be useful in a Synology; there, the caching algorithms are outside the disk.

It was a poor analogy, sorry. Data is split into 80 pieces and any 29 are sufficient to recover it (I may be misremembering the exact numbers). Your extra 1 vs 80 makes no difference.
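Using the numbers as quoted (they may be slightly off, as I said), the arithmetic:

```python
# Reed-Solomon style erasure coding with the numbers quoted above
# (n = 80 pieces stored, any k = 29 recover the file; exact values may differ).

n, k = 80, 29

expansion_factor = n / k          # how much raw space the network uses per byte stored
pieces_that_can_vanish = n - k    # pieces that can be lost before the file is at risk

print(f"Expansion factor: ~{expansion_factor:.2f}x")
print(f"Pieces that can be lost per file: {pieces_that_can_vanish}")
print("Losing the single piece a given node holds changes almost nothing.")
```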

Subvolume, but yes.

I don’t understand what you mean by this.
Your data should be stored on a redundant array with bit-rot recovery support, but Storj data is fine if you store it on rotting media?

It's what TrueNAS reports. I'm not sure what they are using, but I'd guess it's a derivative of the queue length.

RAID5 in Synology uses additional techniques (checksums and auto-healing), so it could survive even bit rot. That article is about pure RAID5 setups.