4 VMware VMs on different PCs crash at the same time

I have several nodes on several PCs, each PC running 2 VMware VMs.
Sometimes all 4 VMs crash simultaneously with a blue screen.
I suspect it happens when downloads are very high.

Any help would be appreciated.

What do you mean by 2 VMware? Nested?

On each PC I run two VMs at the same time.
Each VM runs one node. The other PC has the same setup.
All 4 VMs crash at the same time.

Do you have any logs or any crash reports we can work with?

If they are all crashing at the same time, that means something is being shared between them. Are you using any network storage shared to them?

The VMs don’t share network storage.
The crash happened during high ingress; at that time each VM was receiving about 150 Mbit/s.
I have now limited the ingress on the VMs to 30 Mbit/s.
I suspect the VMs can’t cope with high ingress at the same time.
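
For scale, here is a rough conversion of the ingress figures quoted above into disk write load. It is only a sketch: it assumes "Mbits" means megabits per second, that all ingress ends up written to disk, and it ignores the node running on the host itself. The function names are made up for illustration.

```python
# Back-of-the-envelope conversion of the ingress figures quoted in this thread.
# Assumption: the quoted rates are megabits per second, and 1 MB = 8 Mbit.

VMS_PER_HOST = 2  # from the thread: two VMs per PC


def mbit_to_mbyte(mbit_per_s: float) -> float:
    """Convert megabits per second to megabytes per second."""
    return mbit_per_s / 8


def host_write_load(mbit_per_s_per_vm: float, vms: int = VMS_PER_HOST) -> float:
    """Combined ingress (MB/s) that the VMs on one host have to write to disk."""
    return mbit_to_mbyte(mbit_per_s_per_vm) * vms


for label, rate in (("at the crash", 150), ("after the limit", 30)):
    print(f"{label}: {mbit_to_mbyte(rate):.2f} MB/s per VM, "
          f"{host_write_load(rate):.2f} MB/s per host")
```

Sequential HDD writes can usually absorb rates like these, but node traffic is many small files, so the sustained rate a disk can actually take may be much lower.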

VMware is used globally with much more serious workloads than a Storj node, so this sounds like something specific to your configuration. My first guess would be the virtual NIC: are you using the same type in every VM? Try switching one VM to an alternate type (E1000, E1000E, VMXNET 2/3, etc.) and see if it still crashes at the same time as the others.

I would also say 1 CPU core per VM is not enough, at least right now; use at least 2 cores per VM. 2 of my VMs also crashed last night at the same moment, but the rest were fine. It was the first blue screen I have seen in my Storj VM instances in 4 years. I’m not sure whether there was some Storj test running during that time (02:00-03:00, UTC+2) or it was just an ISP network problem, because all nodes lost their connection and then reconnected. I don’t know; I will watch to see if it happens again.

I experienced ISP problems as well, in 2 different locations, around that time. I couldn’t figure out what happened, and it fixed itself in the morning.

I suspect VMware is to blame. I have seen many posts on the forum about VMs crashing on VMware, the filewalker being unable to finish, high CPU and RAM usage, etc.
I also believe that you use Windows as the guest OS. Exactly this mix is unstable.

I think the driver used to virtualize the storage is the culprit, especially in combination with a Windows guest.

@Ruskiem do you run Windows VMs too?

Yes, Windows 10 Pro, but only 2 out of 13 nodes crashed that night for me, and all of them run Windows.
It never happened before. I guess it was a combination of events: high load from Storj plus my ISP cutting the connection, and the nodes probably went crazy or something.

None of my nodes on VMware have crashed.

@Luis, can you confirm you’re running the vmxnet3 driver? It’s the default driver, you should not change it. Can you also confirm that you’re running with VMware tools enabled, and tell us a bit more about your hardware and storage topology? Questions that are relevant for this case:

  • What is the hardware on your two VM hosts (the machines hosting the VMs)?
  • What is the virtual hardware assigned to the VMs?
  • How are you assigning .vmdk hard disks to your VMs?
  • On what underlying storage are your datastores built?
  • How is the network connectivity to your VM hosts?
  • And finally, what versions of Windows, ESXi and VMware Tools are you running?

I’ve written a bit about optimizing virtualized nodes here:
FAQ: Best practices for virtualized nodes - Node Operators / FAQ - Storj Community Forum (official)


Thanks for the help everyone.
Since I limited the network adapter to 30 Mbit/s, everything has been fine with the VMs.
I will gradually make some of the suggested modifications to isolate the possible problem.

To answer some questions: the host runs Windows 10 and also has a node of its own.
On this host there are 2 VMs with 1 node each.

Each VM is connected to the adapter (VMnet0)

This is the configuration of one host.
Thanks

[Image: F1]

Hello again friend. Great to hear that you got it all working :slight_smile:

Looks like you’re using virtual NVMe for the boot volumes and a physical disk in passthrough mode for node storage. This is a good setup.

Depending on how large the nodes are (used space, not allocated), 4GB on a Windows VM could be too low. If Windows decides to update itself (or runs other background processes, for that matter), baseline RAM usage could be high. With the increased storage node load due to performance testing, the strain on your HDDs is significantly increased. If the HDDs cannot keep up with the queue of incoming requests, RAM will rapidly start to fill up, which could be the cause of your BSOD.
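
To illustrate how quickly that can happen, here is a very simplified model: data arrives at the ingress rate, the disk drains it at its sustained write rate, and the difference piles up in memory. The free RAM and disk write figures below are hypothetical placeholders, not measurements from this setup, and real Windows caching and storagenode buffering are more complicated than this.

```python
import math


def seconds_until_ram_full(free_ram_mb: float,
                           ingress_mb_s: float,
                           disk_write_mb_s: float) -> float:
    """Time until buffered writes exhaust free RAM (simplified model).

    Data arrives at ingress_mb_s, the disk drains it at disk_write_mb_s,
    and the difference accumulates in memory.
    Returns math.inf when the disk keeps up with the ingress.
    """
    backlog_rate = ingress_mb_s - disk_write_mb_s
    if backlog_rate <= 0:
        return math.inf
    return free_ram_mb / backlog_rate


# Hypothetical numbers, NOT measured on the machines in this thread:
FREE_RAM_MB = 1024       # assumed headroom left in a 4GB Windows guest
DISK_WRITE_MB_S = 10.0   # assumed sustained small-file write rate of the HDD

# Ingress rates taken from the thread (Mbit/s per VM), converted to MB/s.
for ingress_mbit in (150, 30):
    ingress_mb_s = ingress_mbit / 8
    t = seconds_until_ram_full(FREE_RAM_MB, ingress_mb_s, DISK_WRITE_MB_S)
    print(f"{ingress_mbit} Mbit/s ingress -> free RAM exhausted in {t:.0f} s")
```

Under these assumed numbers the backlog would use up the headroom in roughly two minutes at 150 Mbit/s, while at 30 Mbit/s the disk keeps up and nothing accumulates. Plug in your own figures to see where your setup lands.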

That raises the critical question: what hard disk(s) are you running?

Additionally, to combat baseline RAM usage, you could run the debloat script below on your VMs:
The Ultimate Windows Utility (christitus.com)
Alternatively, try assigning 5GB of vRAM instead of 4GB to your VMs. Doing both would be best.

Kind regards.


Hi

It’s true, low RAM could be the reason.

On the first PC, the 1st VM has 14TB (12TB used) and the 2nd VM has 16TB (5TB used).

On the other PC, the 1st VM has 16TB (6TB used) and the 2nd VM has 16TB (11TB used).

Each VM’s OS has only 4GB of RAM :roll_eyes:

These are the HDDs:

PC 1

PC 2

Thanks

Why not use Hyper-V, which will not have these issues?

To be honest, I would suggest going with either @Vadim’s solution (Win GUI Storj Node Toolbox) or Docker Desktop, so as not to waste so many resources.

Ahh yeah, that’s a much better idea.


Hosting 1 node per VM is such a waste of resources, especially if the guest is Windows. If you really must use virtualization, create 1 VM with a slim Linux server OS, install Docker and run all your nodes with Docker.
