A node keeps stopping

Blade · July 15, 2025, 10:59am

Hello community!

I’m new to STORJ so I have to ask for your help.

I’ve been running two nodes on separate Windows 11 VMs (one in Parallels, one in VMware Fusion). Two identical external 2 TB hard drives are connected to the Intel Mac mini that hosts the VMs, and each drive is mounted directly in its VM.

Now, I never have any problems with the VMware one. It never stops on its own and has 100% uptime on all satellites. The Parallels one, however, never lasts longer than a couple of days, sometimes much less, and I have to restart it manually. I’m monitoring both with UptimeRobot, so it is easy to tell when a node stops listening on its port, but if that happens during the night and I only notice it in the morning, it adds up to many hours of downtime, which is not great.

The logs show the following last fatal errors:

Get-Content "$env:ProgramFiles/Storj/Storage Node/storagenode.log" | sls fatal | select -last 10

2025-07-07T11:37:03+02:00       FATAL   Unrecoverable error     {"error": "piecestore monitor: timed out after 1m0s while verifying readability of storage directory", "errorVerbose": "piecestore monitor: timed out after 1m0s while verifying readability of storage directory\n\tstorj.io/storj/storagenode/monitor.(*Service).verifyStorageDir:157\n\tstorj.io/common/sync2.(*Cycle).Run:163\n\tstorj.io/storj/storagenode/monitor.(*Service).Run.func1:111\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:78"}
2025-07-07T23:41:38+02:00       FATAL   Unrecoverable error     {"error": "piecestore monitor: timed out after 1m0s while verifying readability of storage directory", "errorVerbose": "piecestore monitor: timed out after 1m0s while verifying readability of storage directory\n\tstorj.io/storj/storagenode/monitor.(*Service).verifyStorageDir:157\n\tstorj.io/common/sync2.(*Cycle).Run:163\n\tstorj.io/storj/storagenode/monitor.(*Service).Run.func1:111\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:78"}
2025-07-08T22:19:02+02:00       FATAL   Unrecoverable error     {"error": "piecestore monitor: timed out after 1m0s while verifying readability of storage directory", "errorVerbose": "piecestore monitor: timed out after 1m0s while verifying readability of storage directory\n\tstorj.io/storj/storagenode/monitor.(*Service).verifyStorageDir:157\n\tstorj.io/common/sync2.(*Cycle).Run:163\n\tstorj.io/storj/storagenode/monitor.(*Service).Run.func1:111\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:78"}
2025-07-11T00:48:06+02:00       FATAL   Unrecoverable error     {"error": "piecestore monitor: timed out after 1m0s while verifying writability of storage directory", "errorVerbose": "piecestore monitor: timed out after 1m0s while verifying writability of storage directory\n\tstorj.io/storj/storagenode/monitor.(*Service).verifyWritability:184\n\tstorj.io/common/sync2.(*Cycle).Run:163\n\tstorj.io/storj/storagenode/monitor.(*Service).Run.func2:114\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:78"}
2025-07-11T18:47:12+02:00       FATAL   Unrecoverable error     {"error": "database is locked"}
2025-07-12T14:25:49+02:00       INFO    piecestore      uploaded        {"Piece ID": "VRHTQIMW2E6LYWJRGOVQXKOLWIHX3VPZZU6DYCUVYUJMFATALNLA", "Satellite ID": "12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S", "Action": "PUT", "Remote Address": "79.127.226.98:41362", "Size": 4864}
2025-07-14T07:56:27+02:00       FATAL   Unrecoverable error     {"error": "piecestore monitor: timed out after 1m0s while verifying readability of storage directory", "errorVerbose": "piecestore monitor: timed out after 1m0s while verifying readability of storage directory\n\tstorj.io/storj/storagenode/monitor.(*Service).verifyStorageDir:157\n\tstorj.io/common/sync2.(*Cycle).Run:163\n\tstorj.io/storj/storagenode/monitor.(*Service).Run.func1:111\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:78"}
2025-07-14T14:30:55+02:00       FATAL   Unrecoverable error     {"error": "piecestore monitor: timed out after 1m0s while verifying readability of storage directory", "errorVerbose": "piecestore monitor: timed out after 1m0s while verifying readability of storage directory\n\tstorj.io/storj/storagenode/monitor.(*Service).verifyStorageDir:157\n\tstorj.io/common/sync2.(*Cycle).Run:163\n\tstorj.io/storj/storagenode/monitor.(*Service).Run.func1:111\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:78"}
2025-07-14T21:00:21+02:00       FATAL   Unrecoverable error     {"error": "piecestore monitor: timed out after 1m0s while verifying readability of storage directory", "errorVerbose": "piecestore monitor: timed out after 1m0s while verifying readability of storage directory\n\tstorj.io/storj/storagenode/monitor.(*Service).verifyStorageDir:157\n\tstorj.io/common/sync2.(*Cycle).Run:163\n\tstorj.io/storj/storagenode/monitor.(*Service).Run.func1:111\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:78"}
2025-07-14T21:25:12+02:00       FATAL   Unrecoverable error     {"error": "piecestore monitor: timed out after 1m0s while verifying readability of storage directory", "errorVerbose": "piecestore monitor: timed out after 1m0s while verifying readability of storage directory\n\tstorj.io/storj/storagenode/monitor.(*Service).verifyStorageDir:157\n\tstorj.io/common/sync2.(*Cycle).Run:163\n\tstorj.io/storj/storagenode/monitor.(*Service).Run.func1:111\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:78"}

So, basically, there are only two repeating errors: timeouts while verifying the readability and the writability of the storage directory.
Plus, a single sudden “database is locked” error.
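One detail in the excerpt above: the `sls fatal` filter also returned an INFO line, because that line's Piece ID happens to contain the letters "FATAL". A tally that looks only at the level column avoids this. The following is a minimal Python sketch, not an official Storj tool; it assumes the level is the second whitespace-separated column and that the message carries an `"error"` field, as in the lines above. The function name and the sample lines (trimmed copies of entries from the excerpt) are only for illustration.

```python
import re
from collections import Counter

# Level is the second whitespace-separated column, as in the excerpt above.
LEVEL_RE = re.compile(r"^\S+\s+(\S+)\s")
# Pulls the short "error" field out of the JSON-like tail of the line.
ERROR_RE = re.compile(r'"error":\s*"((?:[^"\\]|\\.)*)"')


def count_fatal(lines):
    """Count FATAL entries per error message, ignoring other log levels."""
    counts = Counter()
    for line in lines:
        level = LEVEL_RE.match(line)
        if not level or level.group(1) != "FATAL":
            continue  # skips e.g. an INFO line whose Piece ID contains "FATAL"
        err = ERROR_RE.search(line)
        counts[err.group(1) if err else "(unparsed)"] += 1
    return counts


# Trimmed copies of lines from the excerpt (errorVerbose and some fields dropped).
SAMPLE = [
    '2025-07-11T00:48:06+02:00\tFATAL\tUnrecoverable error\t{"error": "piecestore monitor: timed out after 1m0s while verifying writability of storage directory"}',
    '2025-07-11T18:47:12+02:00\tFATAL\tUnrecoverable error\t{"error": "database is locked"}',
    '2025-07-12T14:25:49+02:00\tINFO\tpiecestore\tuploaded\t{"Piece ID": "VRHTQIMW2E6LYWJRGOVQXKOLWIHX3VPZZU6DYCUVYUJMFATALNLA", "Action": "PUT"}',
    '2025-07-14T07:56:27+02:00\tFATAL\tUnrecoverable error\t{"error": "piecestore monitor: timed out after 1m0s while verifying readability of storage directory"}',
]

if __name__ == "__main__":
    for error, n in count_fatal(SAMPLE).most_common():
        print(n, error)
```

To run it against the real log, pass an open file object for `storagenode.log` (the path used in the command above) to `count_fatal` in place of `SAMPLE`.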

Since I’m new to this: is this a sign of a physical problem, with the HDD nearing the end of its life, or could the Parallels virtual controller somehow be interfering?

I appreciate your attention. Let me know if I should give any more details or specs.

jammerdan · July 15, 2025, 11:24am

Connected via USB?

I am asking because I think USB-connected disks are not recommended or supported, so there might be unexpected issues like the one you are experiencing.

Blade · July 15, 2025, 11:33am

Yes, both are connected via USB. The second one, however, shows zero issues. That’s why I am wondering at which level (physical, controller, VM, etc.) the problem could be.

jammerdan · July 15, 2025, 11:39am

Have you tested, for example, with only one disk connected, trying both ports? Or swapped the ports between the two disks/nodes? The question is whether it is always the same node or the same port, and whether it only happens when two disks are connected. That may narrow down the cause of your issue.

hwm.land · July 15, 2025, 11:59am

Perhaps a stupid question: why are you running the Parallels instance if it performs so badly? Why don’t you run both instances on VMware, since it’s rock solid for you?

Blade · July 15, 2025, 12:02pm

No, I did not.
Trying only one disk would mean downtime for the second node. But I will try swapping ports and see if anything changes.

jammerdan · July 15, 2025, 12:17pm

A couple of hours of downtime usually does not hurt much, and the online score will recover. If it helps to solve your issue, it should be worth it.

Blade · July 15, 2025, 12:23pm

It’s not stupid. In fact, I would be asking the same question.

Actually, I was already running some services on Parallels. When I got these HDDs I decided to try out STORJ, but after I found out I couldn’t run two nodes on the same system without tinkering, I decided to install it on a separate VM and try out VMWare at the same time.

Everything else runs solid on Parallels.

Blade · July 15, 2025, 12:25pm

A couple of hours, I agree. But since those errors happen roughly every 72 hours, and sometimes more frequently, a couple of hours would not be enough.

arrogantrabbit · July 15, 2025, 3:47pm

Don’t do this. You are starving them of RAM: if your storage can’t keep up, Storj will start caching writes in RAM. And die.

This is the problem. A single disk won’t keep up. A single disk can only provide 200 IOPS. As soon as usage rises above that, your node blows up. It’s fully expected.

You can stretch it somewhat by moving the databases to an SSD, using a lot of RAM to cache metadata, and disabling sync and atime updates, but sooner or later you will hit the 200 IOPS wall: your disk won’t keep up, the node starts caching in RAM, and it gets killed.

Running nodes in VM makes zero sense and only exacerbates the problem.

RecklessD · July 15, 2025, 9:41pm

Seems like Parallels’ handling of USB-attached HDDs is not working as well as VMware’s,

or

one drive/USB adaptor is not working properly (dying after some time).
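One way to gather evidence for either explanation might be to log how long a small, synced write takes on each drive over time, and to compare the two VMs (or the same drive before and after swapping ports). The Python sketch below is only an illustration under stated assumptions: it does not reproduce the storage node's own readability/writability check, the function names and the one-second "slow" threshold are arbitrary, and inside a VM `fsync` may not guarantee that data reached the physical disk.

```python
import os
import tempfile
import time


def probe_write(directory, size=4096):
    """Write `size` random bytes to a temp file in `directory`, force them
    out with fsync, and return the elapsed time in seconds."""
    payload = os.urandom(size)
    fd, path = tempfile.mkstemp(prefix="latency-probe-", dir=directory)
    try:
        start = time.perf_counter()
        with os.fdopen(fd, "wb") as f:
            f.write(payload)
            f.flush()
            os.fsync(f.fileno())
        return time.perf_counter() - start
    finally:
        os.remove(path)  # never leave probe files behind on the node's drive


def watch(directory, rounds=5, interval=1.0, slow_after=1.0):
    """Probe repeatedly; return a list of (timestamp, seconds, is_slow)."""
    results = []
    for i in range(rounds):
        elapsed = probe_write(directory)
        stamp = time.strftime("%Y-%m-%dT%H:%M:%S")
        results.append((stamp, elapsed, elapsed > slow_after))
        if i < rounds - 1:
            time.sleep(interval)
    return results


if __name__ == "__main__":
    # tempfile.gettempdir() is a stand-in; point this at a folder on the
    # drive under test (not inside the node's data directories).
    for stamp, elapsed, slow in watch(tempfile.gettempdir(), rounds=3):
        print(stamp, f"{elapsed:.4f}s", "SLOW" if slow else "ok")
```

If latency spikes follow the drive when the ports or VMs are swapped, that would point toward the disk or its USB adapter; if they stay with the Parallels VM, the virtualization layer looks more suspect.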