FAQ: Best practices for virtualized nodes

Hello all. The forum gets a lot of questions about how to virtualize nodes, and I'd like to shed some light on the topic.

I've written a bit about the topic before, and I'm now creating this topic to give it more visibility.


This guide is written for a VMware environment, but many of the concepts are the same even if you're on XCP-ng, Xen, VirtualBox, oVirt, KVM, Hyper-V or Proxmox.

The setup can be broken into three major areas, all connected by networking, which is so tied into all three that I won't treat it as its own area:

  • Storage Array
  • Hardware
  • Virtualization settings
Just to be clear, here are the concepts I work with
  • VMhost = The physical hardware that you’ve installed your hypervisor on. The physical hardware is not the hypervisor, and the hypervisor is just the software implementation on top of your hardware
  • VM = The virtualized OS that lives on the VMhost
  • Node = A single instance of the StorJ Software.
  • Local = Resources directly connected (and thus not networked) to the physical hardware I'm talking about at the time

And here are the words I tend to stay away from

  • Server - Because it's an ambiguous word. Is it the VM? Is it the physical hardware? Is it a VMhost? No one knows, and for all the things a server could describe, there is more accurate wording

Let’s get into it.


Storage Array:

  • If your HDDs are local to your VMhosts, just use a single datastore per disk
  • My Synology runs single-disk volumes on single-disk storage pools. Always formatted ext4, always serving iSCSI (not NFS!) on dedicated NICs, and always thick provisioning the LUNs (see the esxcli sketch after this list for how the uplinks are wired in on the ESXi side)
    • I am debating switching to ZFS on a homebrew machine so I can use the special metadata device, but what I have now works and I don’t want to spend any additional time or money right now.
  • I am using the aforementioned standalone 8x 20TB disks (MG10ACA20TE), and a RAID10 array of 2TB SSDs as read/write cache, allocating 400GB to each disk. This is massively overkill, but the performance is great. Here is a screenshot of three of the disks:
  • And here is a screenshot of the performance of one of the LUNs. It could be better, but it's in the upper end of what I expect from cached HDDs:
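A quick sketch of the iSCSI side mentioned above, since it comes up often: on ESXi, the dedicated SAN NICs get their VMkernel ports bound to the software iSCSI adapter, and the Synology is added as a dynamic discovery target. The adapter name, vmk interfaces and portal IP below are placeholders for whatever your environment uses, and the same thing can be done through the vSphere UI:

    # Bind the VMkernel ports that sit on the dedicated SAN NICs to the software iSCSI adapter
    esxcli iscsi networkportal add --adapter=vmhba64 --nic=vmk1
    esxcli iscsi networkportal add --adapter=vmhba64 --nic=vmk2
    # Point dynamic discovery at the Synology iSCSI portal, then rescan
    esxcli iscsi adapter discovery sendtarget add --adapter=vmhba64 --address=192.168.50.10:3260
    esxcli storage core adapter rescan --adapter=vmhba64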

Hardware:

  • Identical VMhosts of: Intel 12400, 128GB RAM, 2TB local mirrored NVMe, 4x 1Gb NICs
  • … But none of that really matters. As long as your hypervisor is not swapping to disk due to memory exhaustion, and as long as you're alright with the CPU utilization, you can use whatever you want. I'd still recommend local NVMe for the OS drives and dedicated NICs for the SAN uplinks
    • Sidenote: I use 1Gb NICs because I have them and they work fine for me. That being said, older 10Gbit PCIe cards are dirt cheap and will give you a considerable network speed boost.

VMware:

  • Here is where it gets more interesting. I always assign 1GB of RAM per TB of used StorJ space for StorJ workloads, and a minimum of 2GB, because I run Windows on my guests. I give them 2 vCPUs, because I like giving them the ability to burst up when doing updates.
  • All my VMs are regular Win10 VMs, but stripped of most of their unused stuff thanks to Chris Titus' scripts: Debloat Windows in 2022. This brings down idle memory consumption significantly.
    • I do this on all my windows machines. It’s a wonderful script.
  • All VMs have their C:\ drive on local NVMe. All VMs have a second 10GB disk, also on VMhost-local NVMe, which is used only for the StorJ databases. The StorJ data directory is built on multiple spanned 1TB disks:
  • I use spanned disks in Windows, and not striped, because the limiting factor for performance is going to be my underlying hardware and not Windows' implementation of striping. Yes, there is a higher chance of data corruption when running it this way - but the disk should only corrupt if your SAN goes out. If this method is good enough for work, it's good enough for me.
  • Remember to enable round-robin on your SAN uplinks (see the esxcli sketch after this list), and try to have enough NICs so your SAN uplinks are not also your VM network.
  • When running larger nodes, it is advised to add additional vSCSI controllers to the VM and balance the disks out across them, since each controller gets its own IO queue (see the .vmx sketch after this list). It is also advised to put each VMdisk on its own datastore, since each datastore also gets its own IO queue. This would be good advice when running massive nodes (30TB+), but I don't, and even if I did, my underlying storage of single HDDs would be the bottleneck - not the storage setup.
  • I run more nodes than I have disks, so I break the “One Node, One Disk” rule, but while one disk can hold multiple nodes I do make sure that a node only lives on a single disk.
  • I have spare local NVMe. I’ve thought about using it as local host cache for SWAP, but like I said, if you’re swapping, you have other issues.
  • Don't thin provision any disks. The write penalty for requesting new blocks when expanding the disks is simply too big. If your disk is full and already thin provisioned, there is no need to inflate it (because all the bytes in its allocated size are already written), but I would highly suggest inflating all other thin disks (see the vmkfstools sketch after this list). My old nodes are running thick lazy-zeroed disks, but I create all newer disks as thick eager-zeroed. I can't feel the performance difference yet, but when creating new disks I can pay the future IO cost of zeroing up front, and that's wonderful.
    • If you want to inflate a thin disk, you can either svMotion it to a different datastore and choose thick provisioning as your preferred storage type, or navigate to the .vmdk file in the datastore file browser and press "Inflate" there.
  • Don't use VMware snapshots. Snapshots are not backups, they are terrible for performance, and they grow crazy fast in this use case. If you have to take one (perhaps you're worried about an update?), please stop your node process first. Delete the snapshot before starting the node again.
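To make the round-robin bullet above concrete: the path selection policy can be set per LUN from the ESXi shell. A minimal sketch - the naa identifier below is a placeholder you'd look up first with the list command:

    # Find the naa identifier of the Synology LUN
    esxcli storage nmp device list
    # Switch that device's path selection policy to round robin
    esxcli storage nmp device set --device naa.60014051234567890abcdef --psp VMW_PSP_RR

The same setting is also exposed per device in the vSphere UI under the multipathing policies.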
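And to make the extra-vSCSI-controller bullet concrete: you'd normally add the controllers in the VM settings UI, but the resulting .vmx entries look roughly like this (disk file names are placeholders, and whether you pick pvscsi or lsisas1068 depends on your guest):

    scsi0.present = "TRUE"
    scsi0.virtualDev = "lsisas1068"        # boot disk stays on the default controller
    scsi1.present = "TRUE"
    scsi1.virtualDev = "pvscsi"            # second controller, with its own IO queue
    scsi1:0.present = "TRUE"
    scsi1:0.fileName = "node01_data1.vmdk" # data disk hanging off the second controller

Spreading the data disks across scsi0-scsi3 (and across datastores) is all that advice really boils down to.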
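For the thin-provisioning bullet: besides svMotion and the datastore browser's Inflate button, vmkfstools from the ESXi shell does the same conversions, if I remember the flags right. The paths are placeholders, and the VM must be powered off:

    # Inflate a thin disk to thick (all unallocated blocks get written out)
    vmkfstools --inflatedisk /vmfs/volumes/datastore1/node01/node01_data1.vmdk
    # Convert an existing lazy-zeroed thick disk to eager-zeroed
    vmkfstools --eagerzero /vmfs/volumes/datastore1/node01/node01_data1.vmdk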

Aaaaaand I think that's it, really. It's a LOT more involved than running a single machine with docker and assigning each disk to a container. If you have to use virtualization, follow the advice above, and consider running a Linux distro with docker instead of Windows.
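If you do go the Linux-plus-docker route, a node boils down to one container per disk, roughly like the official docs describe - all the values below (wallet, email, address, paths, allocated size) are placeholders:

    docker run -d --restart unless-stopped --stop-timeout 300 \
        -p 28967:28967/tcp -p 28967:28967/udp -p 14002:14002 \
        -e WALLET="0xYOURWALLET" \
        -e EMAIL="you@example.com" \
        -e ADDRESS="your.external.address:28967" \
        -e STORAGE="18TB" \
        --mount type=bind,source=/mnt/disk1/identity,destination=/app/identity \
        --mount type=bind,source=/mnt/disk1/storagenode,destination=/app/config \
        --name storagenode1 storjlabs/storagenode:latest

One container per disk, each with its own identity, port and name, and you get "One Node, One Disk" without any of the VMware plumbing above.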

I only do it this way because it conveniently allows me to test some larger scripts and orchestration I use at work in my spare time.

I'll update this post if I find anything else useful.


Hi Ottetal, great value, many hints I didn't know…
It turned out I don't even have the option to choose a controller in my VM version…
And what about the PVSCSI controller?

Because some Llama 3 70B on Groq told me it's better:
“If you’re looking to squeeze out more performance from your VM, switching to the BusLogic controller (buslogic) might be a good option. However, keep in mind that this change might affect compatibility with your VM’s operating system or disk configuration, so be sure to test carefully before making any changes.”

  • will PVSCSI work in version 12.5.9 of this Workstation?

The PVSCSI controller is a paravirtualized SCSI controller that’s optimized for virtualized environments. However, it’s not supported in VMware Workstation 12.5.9.

The PVSCSI controller was introduced in VMware Workstation 15 and later versions. In Workstation 12.5.9, you won’t be able to use the PVSCSI controller.

If you want to take advantage of the PVSCSI controller, you’ll need to upgrade to a newer version of VMware Workstation, such as version 15 or later.

In your current version, you can still try modifying the SCSI controller type to buslogic or lsilogic, as I mentioned earlier. These controllers are supported in Workstation 12.5.9 and might provide some performance improvements."

  • I opened the .vmx file as you advised, and I see there is a line with: scsi0.virtualDev = “lsisas1068”. Hmm, it's not lsilogic; I wonder what I can change it to for better performance?

Answer:
"You’ve opened the .vmx file and found the line with scsi0.virtualDev = "lsisas1068".

The lsisas1068 value specifies the SCSI controller type, which is an LSI SAS 1068 controller. This is a popular and widely-supported SCSI controller type.

To explore other options, you can try modifying the scsi0.virtualDev value to one of the following:

  • buslogic: This will change the SCSI controller to a BusLogic controller, which is another popular type.
  • lsilogic: This will change the SCSI controller to an LSI Logic controller, which is similar to the current lsisas1068 controller.
  • pvscsi: This will change the SCSI controller to a Paravirtual SCSI (PVSCSI) controller, which is a high-performance controller optimized for virtualized environments.

Keep in mind that changing the SCSI controller type might affect compatibility with your VM’s operating system or disk configuration. Make sure to test the changes carefully and revert if you encounter any issues.

Before making any changes, create a backup of the original .vmx file, and then modify the scsi0.virtualDev line to the desired value. Save the changes and restart the VM instance to test the new configuration.

Remember to monitor the VM’s performance and adjust the settings as needed to achieve the best results.

  • It's version 12.5.9; it seems there are no additional settings for the disk controller in the settings, hmmm

You’re running VMware Workstation 12.5.9, and it seems that the disk controller settings are not exposed in the UI.

In that case, you can try editing the VM’s configuration file (.vmx) to modify the disk controller settings. This will require some manual editing, but it can give you more control over the disk controller configuration.

Here’s an example of how you can modify the .vmx file to change the disk controller type:

  1. Shut down the VM instance.
  2. Locate the .vmx file associated with the VM instance. This file is usually found in the same directory as the VM’s virtual disk files.
  3. Open the .vmx file in a text editor (e.g., Notepad++).
  4. Add the following lines to the file, depending on the disk controller type you want to use:
  • For a SCSI controller: scsi0.present = "TRUE" and scsi0.virtualDev = "lsilogic" (or buslogic for a BusLogic SCSI controller)
  • For a SATA controller: sata0.present = "TRUE" and sata0.controllerKey = "sata" (or ahci for an AHCI SATA controller)
  5. Save the changes to the .vmx file.
  6. Restart the VM instance.

Keep in mind that editing the .vmx file can be risky, and incorrect changes can cause issues with the VM. Make sure to create a backup of the original file before making any changes.

If you’re not comfortable editing the .vmx file, you can also try upgrading to a newer version of VMware Workstation, which might provide more disk controller settings in the UI.

  • I'm interested in number 4 that you mentioned. I added whole disks to the VM, and there was only a SATA or SCSI controller, I guess

You’ve added whole disks to the VM, and you’re limited to SATA or SCSI controllers. In that case, let me explain the difference between these two controllers and provide some additional information to help you optimize disk performance.

SATA Controller:

  • Emulates a SATA disk controller, which is a common type of disk controller found in modern computers.
  • Supports up to 6 devices (disks) per controller.
  • Generally, SATA controllers are suitable for most use cases, and they provide a good balance between performance and compatibility.

SCSI Controller:

  • Emulates a SCSI (Small Computer System Interface) disk controller, which is a more advanced and feature-rich disk controller type.
  • Supports up to 15 devices (disks) per controller.
  • SCSI controllers are generally faster and more efficient than SATA controllers, especially in virtualized environments.
  • However, SCSI controllers might require additional configuration and may not be compatible with all operating systems or disk types.

So many quirks and also hybrids (like a Linux host for the hypervisor and a Windows guest, spanned disks instead of expanding the existing virtual disk, and also Windows to run a node).
If you're forced to use VMware on a Linux host, it's highly advisable to run Linux as a guest too and use docker. The docker setup doesn't limit how many containers you may run.
Honestly, I would skip using a VM as a unit and run docker containers directly on your host, avoiding all the performance and resource-usage impacts.
With the selected approach you will likely always have failed readable/writable checks with the default options, you will also likely have problems with all the filewalkers, high CPU and RAM usage, and corruptions and database locks (unless you move the databases to a separate SSD - I saw the disk for databases in your screenshot, but it's not obvious what's under the hood).

So it looks very expensive and overcomplicated (and thus unreliable and likely slow) in general.


I'm doing various tests inside the VM and outside, on the same computer, same VM, same 16TB SATA HDD (HC550).
And this is just insane, Alex.

1st. In the Random Access Read test in HD Tune Pro 5.75,
I'm getting the same times and speeds inside the VM and outside (no downside for the VM!)
(for small files like 4KB it's around ~0.330KB/s, ~81 IOPS and ~11ms avg. access time)

2nd. Inside the VM, with the node turned off, counting files under Windows 10 -
around ~1 million files totaling ~3TB, but with the data written close to each other - HD Tune Pro shows
I/O reads at ~1000/s, with spikes to 1500/s (it counts that 3TB really fast inside the VM!).
But when I go to some Storj blobs folder it gets no more than ~190/s.
Then I go to the next folder and it gets no more than ~85/s!

What's the matter?
Random access, ladies and gentlemen, plus SMALL files.
The difference between one satellite folder and another is:
one has folders with mostly 2266KB files,
the other has mostly <200KB files, like 110KB, 7KB, 4KB, 2KB, and even 1KB files!

I know it's probably nothing new,
but those small files take ~2.2 times MORE time to finish the used-space filewalkers.

And this 16TB HDD is not 4Kn, but 512e!
If only the files from one satellite could somehow be written next to each other…
otherwise we have 1-4KB files from some .us1 or .eu1 satellite scattered all over the 16TB disk, which is quite some madness, if you ask me.
Therefore, I postulate that we stop denigrating Virtual Machines, or they will UPRISE one day! :smile:

Hmmmm, putting different files from different satellites on different volumes could be a fun endeavor to look at.