What are the best practices for running Storj nodes?
More specifically, for virtualized Windows nodes?
I’m asking this because I feel there must be something important or obvious that I’m missing.
→ Reading this is optional for answering the question, but here is the context for why I'm asking.
When I started my first node, I just followed the setup guide on the website, thinking it would be fine. I was wrong. The setup process should include important information about hardware and configuration so we don't f*** up in the long run, but I guess nobody cares. I've invested over 100 hours of my time reading this forum and trying different things, even more if you count the node migrations, and I still feel like my nodes are a failure and I'm close to just doing a graceful exit (GE).
My first nodes ran on Windows 10 with USB SMR drives, and after about 2 TB they started failing all the time. I shucked the drives, moved them into my Proxmox server, and set up some Windows Server VMs. That solved the problem, but at a little over 3 TB it started happening again, and there was a BIG discrepancy between the space actually occupied on disk and what the node reported. I bought CMR Exos drives and deleted the databases. The problem still hasn't reappeared over a year later, but my drives are ALWAYS at 100% activity. In the logs I can see that lazy filewalker runs take over 20 days and that there are a lot of cancelled uploads. Also, whenever Storj needs to update, my node just dies until I manually restart the process. That never happened on Windows 10 or 11, so maybe Storj doesn't work well with Windows Server?
Turning off indexing on the drive took over a month to complete, but it seemed to help a little.
Enabling writethrough caching and an iothread on the virtual disk gives a significant speed boost, but it also causes some unexpected behavior in IO wait and guest CPU usage, and it's still too slow anyway.
Benchmarks inside the guest give me IOPS and bandwidth within 15% of bare metal at all queue depths, so I guess the virtualization overhead is not significant.
So what the hell am I missing? Is it because I left the default cluster size of 4 KB? Should I change some settings in the config file? Maybe defrag the MFT? Is the only way to run a decent Storj node to use Linux with ext4 or something similar? I'm giving this one last shot, but I'm way too tired to make another mistake. Please help. Ideally I would stick to Windows VMs, because they fit the way I manage the server.