4 VMware VMs on different PCs crash at the same time

Luis · May 30, 2024, 6:49pm

I have several nodes on several PCs, each PC running 2 VMware VMs.
Sometimes all 4 VMs crash simultaneously with a blue screen.
I suspect it happens when downloads are very high.

Any help would be appreciated.

Mitsos · May 30, 2024, 7:01pm

What do you mean by 2 VMware? Nested?

Luis · May 30, 2024, 7:19pm

On each PC I have two VMs running at the same time.
Each VM runs one node. The other PC has the same setup.
All 4 VMs crash at the same time.

Mitsos · May 30, 2024, 7:24pm

Do you have any logs or any crash reports we can work with?

If they are all crashing at the same time, that means something is being shared between them. Are you using any network storage shared to them?

Luis · May 30, 2024, 8:17pm

The VMs don’t share network storage.
The crash happened during high ingress, at about 150 Mbit/s on each VM at the time.
I have now limited the ingress on the VMs to 30 Mbit/s.
I suspect the VMs can’t cope with high ingress at the same time.

Roxor · May 30, 2024, 8:57pm

VMware is used globally with much more serious workloads than a Storj node, so this sounds like something specific to your configuration. My first guess would be the virtual NIC: are you using the same type in every VM? Try switching one VM to an alternate type (E1000, E1000E, VMXNET 2, VMXNET 3, etc.) and see if it still crashes at the same time as the others.
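To compare the NIC type across VMs without opening each one in the GUI, the adapter type can be read from each VM's `.vmx` file. The sketch below is a minimal, hypothetical helper: it assumes the usual `key = "value"` layout of a `.vmx` file and looks for `ethernetN.virtualDev` entries. The sample text is made up for illustration; an adapter with no `virtualDev` line is not reported, since VMware then picks a default for it.

```python
import re

# Matches lines such as: ethernet0.virtualDev = "e1000e"
_NIC_LINE = re.compile(
    r'^\s*(ethernet\d+)\.virtualDev\s*=\s*"([^"]*)"\s*$', re.IGNORECASE
)


def nic_types(vmx_text):
    """Return {adapter name: device type} for every ethernetN.virtualDev line."""
    found = {}
    for line in vmx_text.splitlines():
        match = _NIC_LINE.match(line)
        if match:
            found[match.group(1).lower()] = match.group(2).lower()
    return found


# Made-up sample content, not taken from any VM in this thread.
sample = '''\
memsize = "4096"
ethernet0.present = "TRUE"
ethernet0.virtualDev = "e1000e"
ethernet0.connectionType = "bridged"
'''

print(nic_types(sample))
```

In practice you would pass in the contents of each VM's real `.vmx` file (for example via `open(path).read()`) and check whether all VMs report the same type.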

Ruskiem · May 30, 2024, 9:02pm

I would also say 1 CPU core per VM is not enough, at least at the moment. Use at least 2 cores per VM. 2 of my VMs also crashed last night, at the same moment, but the rest were fine. It was the first blue screen I have seen on my Storj VM instances in 4 years. I’m not sure whether Storj was running a test at that time of night (02:00-03:00, UTC+2) or whether it was just an ISP network problem, because all nodes lost their connection and then reconnected. I don’t know; I will watch whether it happens again.

lyoth · May 31, 2024, 12:18am

I experienced ISP problems as well, in 2 different locations, around that time. I couldn’t figure out what happened, and it fixed itself in the morning.

Alexey · May 31, 2024, 2:04am

I suspect VMware is to blame. I have seen many posts on the forum about VMs crashing on VMware, the filewalker being unable to finish, high CPU and RAM usage, etc.
And I believe that you also use Windows as the guest OS. Exactly this mix is unstable.

I think the driver used to virtualize the storage is the culprit, especially in combination with a Windows guest.

@Ruskiem do you run Windows VMs too?

Ruskiem · May 31, 2024, 2:46am

Yes, Windows 10 Pro, but only 2 out of 13 nodes crashed that night for me, and all of them run Windows.
It never happened before. I guess it was a combination of events: high load from Storj while my ISP cut the connection, and the nodes probably went crazy or something.

Ottetal · May 31, 2024, 7:28am

None of my nodes on VMware have crashed.

@Luis, can you confirm you’re running the vmxnet3 driver? It’s the default driver, you should not change it. Can you also confirm that you’re running with VMware tools enabled, and tell us a bit more about your hardware and storage topology? Questions that are relevant for this case:

What is the hardware on your two VM hosts (the machines hosting the VMs)?
What is the virtual hardware assigned to the VMs?
How are you assigning .VMDK hard disks to your VMs?
On what underlying storage are your datastores built?
How is the network connectivity to your VM hosts?
And finally, what versions of Windows, ESXi and VMware Tools are you running?

I’ve written some about optimization of virtualized nodes below:
FAQ: Best practices for virtualized nodes - Node Operators / FAQ - Storj Community Forum (official)

Luis · May 31, 2024, 11:56am

Thanks for the help everyone.
Since I limited the network adapter to 30 Mbit/s, everything has been fine with the VMs.
I will gradually make some suggested modifications to isolate the possible problem.

To answer some questions, the Host is Windows 10 also with a Node.
On this Host there are 2 VMs with 1 node each.

Each VM is connected to the adapter (VMnet0).

This is the configuration of a Host.
Thanks

Ottetal · May 31, 2024, 12:26pm

Hello again, friend. Great to hear that you got it all working.

Looks like you’re using virtual NVMe for the boot volumes and a physical disk in passthrough mode for node storage. This is a good setup.

Depending on how large the nodes are (used space, not allocated), 4GB on a Windows VM could be too low. If Windows decides to upgrade itself (or run other processes, for that matter), the RAM could have a high baseline usage. With the increased storage node load due to performance testing, the strain on your HDDs is significantly increased. If the HDDs cannot keep up with the order queue, RAM will rapidly start to fill up, which could be the cause of your BSOD.
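As a rough illustration of how quickly that can happen, here is a back-of-the-envelope sketch in Python. Only the 150 Mbit/s and 30 Mbit/s ingress figures come from this thread; the sustained disk write rate (10 MB/s) and the free RAM (1024 MB) are made-up assumptions, and the model ignores caching, compression and everything else the OS does.

```python
def mbit_to_mbyte(mbit_per_s):
    """Convert a rate in Mbit/s to MB/s (8 bits per byte)."""
    return mbit_per_s / 8.0


def seconds_until_full(ingress_mbit_s, disk_mbyte_s, free_ram_mbyte):
    """Seconds until the backlog fills the free RAM.

    Returns None when the disk keeps up and no backlog builds.
    """
    backlog_rate = mbit_to_mbyte(ingress_mbit_s) - disk_mbyte_s
    if backlog_rate <= 0:
        return None
    return free_ram_mbyte / backlog_rate


# Assumed values, for illustration only.
DISK_MBYTE_S = 10.0      # hypothetical sustained write rate of the HDD
FREE_RAM_MBYTE = 1024.0  # hypothetical RAM left over for buffering

print(mbit_to_mbyte(150))  # 18.75 MB/s of ingress per VM
print(seconds_until_full(150, DISK_MBYTE_S, FREE_RAM_MBYTE))  # roughly two minutes
print(seconds_until_full(30, DISK_MBYTE_S, FREE_RAM_MBYTE))   # None: the disk keeps up
```

Under these assumptions the backlog at 150 Mbit/s would exhaust the free RAM in about two minutes, while at the 30 Mbit/s limit no backlog builds at all, which would be consistent with the VMs staying stable after the limit was applied.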

That raises the critical question: what hard disk(s) are you running?

Additionally, to combat baseline RAM usage, you could run the debloat script below on your VMs:
The Ultimate Windows Utility (christitus.com). You could also try assigning 5GB vRAM instead of 4GB to your VMs. Doing both would be best.

Kind regards.

Luis · May 31, 2024, 4:21pm

Hi

It’s true, low RAM could be the reason.

On the first PC, the 1st VM has 14TB (12TB used) and the 2nd VM has 16TB (5TB used).

On the other PC, the 1st VM has 16TB (6TB used) and the 2nd VM has 16TB (11TB used).

The OS on each VM has 4GB of RAM.

These are the HDDs:

PC 1

PC 2

Thanks

Alexey · June 1, 2024, 8:36am

Why not use Hyper-V, which will not have any issues?

Honestly, I would suggest going with either @Vadim’s solution (Win GUI Storj Node Toolbox) or Docker Desktop, so you do not waste so many resources.

Ottetal · June 1, 2024, 12:16pm

Ahh yeah, that’s a much better idea.

donald.m.motsinger · June 1, 2024, 6:55pm

Hosting 1 node per VM is such a waste of resources, especially if the guest is Windows. If you really must use virtualization, create 1 VM with a slim Linux server OS, install Docker and run all your nodes with Docker.