I am consolidating some of my disk arrays, and in the process I am thinking of redesigning my storage setup. So here is my current setup, along with the explanation for my choices.
Synchronous 1 Gbit WAN
Two identical VMware nodes:
- 6c/12t
- 64 GB RAM
- 2 TB local NVMe
- 1x 1 Gbit for VM network
- 2x 1 Gbit for vMotion and iSCSI
3 Synology boxes, each with 4 disks in RAID 5. Each NAS has 2x 1 Gbit links to provide iSCSI, and 1x 1 Gbit for management and regular file transfers.
One is running Btrfs because it also hosts my own files, and I use the snapshot feature there.
The other two are running ext4 and host mostly StorJ files.
14 nodes, ranging from around 2 TB to 10 TB. All of them are Windows VMs.
Why Windows? - Because I am comfortable with Windows, and StorJ was intended primarily as a good way of mimicking a “real” user with real data back when I was getting my certs. It has since turned into a great hobby.
Each VM has the following:
- 3 cores
- 6 GB vRAM
- C:\ drive on the VM host's local NVMe. I have not yet moved the StorJ databases to C:\, but this is in my pipeline… soonish. You know the feeling.
- E:\ (StorJ) is made up of enough 1 TB VMdisks spanned together in Windows.
All LUNs that present VMware datastores for StorJ on the Synology boxes are thick provisioned, and all the VMdisks are thin provisioned in their respective VMware datastores. When a VM drops below 1 TB free on its E:\ drive (StorJ data), I have a Jenkins job that automatically adds another 1 TB disk to the VM, expands the VM's drive pool, reconfigures the config.yaml file and restarts the node. This works flawlessly and without my intervention.

The thought process behind this approach came about when I was still at two nodes on a single Synology. By splitting my StorJ data into 1 TB chunks, I could more easily juggle data between the Synology boxes, which was great at the time because it let me make better use of the space I had. I am now worried that it leaves too much performance on the table.
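For the curious, the logic of that job boils down to something like the Python sketch below. It is a simplified stand-in, not the actual pipeline: the VM name, the threshold and the config.yaml path are just examples, and add_vmdk() / extend_spanned_volume() are placeholders for the vCenter API call and the Windows disk-management step.

```python
# Simplified sketch of the auto-expansion job, not the actual Jenkins pipeline.
# add_vmdk() and extend_spanned_volume() are placeholders for the vCenter API
# call and the Windows disk-management step; the config.yaml path is the
# default location used by the Windows GUI install.
import shutil
import subprocess

FREE_THRESHOLD = 1 * 1024**4   # expand once less than ~1 TB is free on E:\
NEW_DISK_BYTES = 1 * 1024**4   # each added VMdisk is 1 TB


def add_vmdk(vm_name: str, size_bytes: int) -> None:
    """Placeholder: attach another thin 1 TB VMdisk to the VM via vCenter."""
    raise NotImplementedError


def extend_spanned_volume(drive_letter: str) -> None:
    """Placeholder: extend the spanned E:\\ volume onto the new disk."""
    raise NotImplementedError


def bump_allocation(config_path: str, new_alloc_tb: float) -> None:
    """Rewrite storage.allocated-disk-space in config.yaml to the new size."""
    with open(config_path) as f:
        lines = f.readlines()
    with open(config_path, "w") as f:
        for line in lines:
            if line.lstrip().startswith("storage.allocated-disk-space"):
                f.write(f"storage.allocated-disk-space: {new_alloc_tb:.1f} TB\n")
            else:
                f.write(line)


def main() -> None:
    total, _used, free = shutil.disk_usage("E:\\")
    if free >= FREE_THRESHOLD:
        return  # enough headroom left, nothing to do
    add_vmdk("storj-node-01", NEW_DISK_BYTES)
    extend_spanned_volume("E")
    bump_allocation(r"C:\Program Files\Storj\Storage Node\config.yaml",
                    (total + NEW_DISK_BYTES) / 1024**4)
    # restart the storagenode Windows service so it picks up the new allocation
    subprocess.run(["powershell", "-Command", "Restart-Service storagenode"],
                   check=True)


if __name__ == "__main__":
    main()
```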
I’ve recently acquired a 12-bay rackmount Synology. It currently has 6x 20 TB disks in RAID 5 and 4x 1 TB SATA SSDs in RAID 10 for caching. I also got two additional disk shelves, each with 12 slots. When I fill up the main unit, I will create a new volume on the disk shelf, which in time will expand to also have 8x disks and 4x SSDs. With my current growth rate, it will take a fair amount of time before I run out of disk slots to expand into.
I am aware that StorJ discourages the use of RAID for the disks, but I house a lot of other data, so for now, having RAID arrays present storage for my VMs is my preferred way of operating.
Here comes the question(s):
Since I no longer need to split my data into 1 TB chunks, should I ditch the StorJ VMware datastores entirely and just make LUNs that the VMs attach to directly, thus eliminating a layer of virtualization? (The Synology boxes will still present iSCSI LUNs to non-StorJ VMs.)
EDIT: I’ve fixed the formatting. This turns out to just be markdown - great!
Please never use spanned disks for a storage node - with one disk failure the whole node is gone.
Since you are using VMs, you may just expand the existing virtual disk. However, you would need to restart the storagenode service anyway for it to pick up the new allocation. You could also thin provision these virtual disks to a maximum size and configure the node for the maximum allocation; since it doesn't use this space right away, it will grow automatically as long as it has space to grow. No need for dangerous spanned drives and Jenkins jobs, but you do need to have the promised free space available on your Synologys.
You may use iSCSI directly, but then you would need to manage the size of the disks on your Synologys.
And I would ask - why not run storagenodes directly on your Synologys, to simplify things as much as possible and reduce the electricity cost of the hypervisors?
Please never use spanned disks for a storage node - with one disk failure the whole node is gone.
Under normal circumstances I agree - but the spanned disks are virtual disks. They will not fail unless the underlying spinning rust does, and if that happens, I have bigger problems.
You could also thin provision these virtual disks to a maximum size and configure the node for the maximum allocation
Thin provisioning would allow me to use VAAI as well, which would severely cut down on some vMotion times, but I am pushing my arrays IOPS-wise as it is, and I don’t want to leave performance on the table.
why not run storagenodes directly on your Synologys, to simplify things as much as possible and reduce the electricity cost of the hypervisors?
Good question! It would be easy to point at my Synology and say “it’s too old”. With just 16 GB of RAM and a 2-core processor, I don’t think I would be comfortable running 14 nodes on it. It could be expanded to 32 GB, sure, but that brings me to my other point: learning.
I specifically chose my Synology + VMware setup to mimic what I am using at my real job, giving me a readily available homelab with VMs that actually have data and resource usage on them. I could of course ditch the iSCSI shares for simple NFS volumes and remove the VM hosts in favor of a single Docker host. That would allow me to run a handful of nodes directly on the Synology, and the rest over the network.
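For reference, each node on such a Docker host would just be the standard storagenode container. Below is a minimal sketch using the Docker SDK for Python, roughly equivalent to the usual docker run command from the Storj docs; the wallet, email, external address and the /volume1/... bind-mount paths are placeholders for my own values.

```python
# Minimal sketch: one storagenode container via the Docker SDK for Python.
# Wallet, email, address and the /volume1/... bind-mount paths are placeholders.
import docker
from docker.types import Mount

client = docker.from_env()

client.containers.run(
    "storjlabs/storagenode:latest",
    name="storagenode1",
    detach=True,
    restart_policy={"Name": "unless-stopped"},
    environment={
        "WALLET": "0xYOURWALLET",               # placeholder
        "EMAIL": "you@example.com",             # placeholder
        "ADDRESS": "your.ddns.example:28967",   # placeholder external address
        "STORAGE": "2TB",                       # allocation for this node
    },
    ports={
        "28967/tcp": 28967,
        "28967/udp": 28967,
        "14002/tcp": ("127.0.0.1", 14002),      # dashboard, local only
    },
    mounts=[
        Mount(target="/app/identity", type="bind",
              source="/volume1/storj/identity/storagenode1"),
        Mount(target="/app/config", type="bind",
              source="/volume1/storj/storagenode1"),
    ],
)
```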
Thanks, I just want to have some kind of FAQ for operators who have a lot of hardware (and, I suppose, money) and use virtualization to run storagenodes.
And collect them all in some kind of FAQ or Quick Start guide for Power Storage Node Operators at Home.
I see you saw the other comment I wrote. I can rephrase it and make it into a standalone post if you want? I could add some additional information and pictures.