SMR detail question

Pac · February 22, 2021, 7:37pm

That’s my guess too, because when an SMR drive stalls, RAM usage usually climbs dangerously.

Well, at least that’s what I experienced when my SMR drive could not keep up with the load. That happened during heavy tests in the past, and that particular situation hasn’t happened since, which doesn’t mean it will never happen again, though.
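A back-of-the-envelope way to see why RAM climbs when an SMR drive stalls: whatever arrives faster than the disk can write has to sit in memory, so the backlog grows linearly until the OOM killer steps in. The sketch below is a simplified model, not how the storage node actually manages memory. The function name and all of the figures in the usage line are hypothetical.

```python
from typing import Optional


def minutes_until_oom(ingress_mib_s: float, disk_mib_s: float,
                      free_ram_mib: float) -> Optional[float]:
    """Minutes until buffered data fills free RAM, or None if the disk keeps up.

    Simplified model (assumption): every MiB/s of ingress the disk cannot
    absorb stays in memory, and nothing else competes for that RAM.
    """
    backlog_growth = ingress_mib_s - disk_mib_s  # MiB/s piling up in memory
    if backlog_growth <= 0:
        return None  # the disk drains everything, so the backlog never grows
    return free_ram_mib / backlog_growth / 60


# Hypothetical figures: 20 MiB/s of ingress, an SMR drive that has dropped to
# 5 MiB/s while it rewrites shingled zones, and 4 GiB of free RAM.
print(minutes_until_oom(20, 5, 4096))
```

With those made-up numbers, the node would run out of memory in well under ten minutes. A bigger or smaller write buffer only changes how the backlog is chunked. It does not change the difference between ingress and disk throughput.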

Tuning the buffer size will not help much. Even though in theory it’s better to set it to 1 or 2 MiB, I tried many values and none of them prevented the disk from stalling, the RAM from filling up and, eventually, the node process from being killed by the OOM killer.

Some improvements have been made to the storage node software since then, though, such as moving one of the databases to RAM, among other enhancements, so things are probably better now.
There are also some minor things that can be improved, such as moving the databases and logs to another disk, as you mentioned.

But ultimately, if the disk cannot keep up with the incoming ingress load, there’s not much that can be done apart from throttling the number of pieces accepted in parallel (or switching to a CMR disk, obviously ^^'). That is far from ideal, but AFAIK there’s no other solution yet.
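The throttling idea boils down to a fixed number of upload slots: when all of them are busy, new uploads get rejected immediately instead of queued, because queued uploads would just pile up in RAM. The toy model below illustrates that with a non-blocking semaphore. `UploadLimiter` is a hypothetical name and this is not the storage node's implementation. For the real node, I believe the relevant setting is `storage2.max-concurrent-requests`, but check your own config.yaml before relying on that.

```python
import threading


class UploadLimiter:
    """Toy model of 'accept at most N pieces in parallel' (hypothetical, not storagenode code)."""

    def __init__(self, max_concurrent: int) -> None:
        # BoundedSemaphore raises if finish() is called more often than try_accept() succeeded.
        self._slots = threading.BoundedSemaphore(max_concurrent)

    def try_accept(self) -> bool:
        # Non-blocking: if every slot is busy, reject the upload right away
        # instead of letting it wait (and buffer data) in memory.
        return self._slots.acquire(blocking=False)

    def finish(self) -> None:
        # Call once a piece has been fully written to disk, freeing its slot.
        self._slots.release()


limiter = UploadLimiter(max_concurrent=2)
results = [limiter.try_accept() for _ in range(3)]
print(results)  # [True, True, False]
limiter.finish()
print(limiter.try_accept())  # True
```

Rejecting rather than queueing is what keeps memory bounded. The cost is lost ingress (and possibly lost races for pieces), which is why it's far from ideal.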

Sure, logs can be redirected anywhere you like, so it’s doable if you set up a RAMdisk, but that would reduce the amount of RAM available to the system.
For your information, my nodes currently hold roughly 5 TB of data, and their logs for January took 735 MiB in total (and that was a quiet month). So the RAMdisk couldn't be too small, and depending on how much RAM you can spare for Storj-related things, this might not be ideal. It would be better to find some space on a spare disk somewhere ^^’
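To put a rough number on the RAMdisk size, the one data point above (about 735 MiB of logs per month for about 5 TB stored, so roughly 147 MiB per TB per month) can be scaled to another node. This is only a sketch. It assumes log volume grows linearly with stored data, which is a simplification since logs really track request counts, and the 2× headroom factor for busier months is an arbitrary assumption, as is the function name.

```python
import math

# Single data point from this post: ~5 TB stored, 735 MiB of logs in a quiet January.
OBSERVED_LOG_MIB = 735
OBSERVED_STORED_TB = 5


def ramdisk_size_mib(stored_tb: float, months_kept: float = 1.0,
                     headroom: float = 2.0) -> int:
    """Rough RAMdisk size (MiB) for node logs.

    Assumptions: log volume scales linearly with stored data, and `headroom`
    (an arbitrary factor) covers months busier than the observed quiet one.
    """
    mib_per_tb_month = OBSERVED_LOG_MIB / OBSERVED_STORED_TB  # ~147 MiB per TB per month
    return math.ceil(stored_tb * mib_per_tb_month * months_kept * headroom)


print(ramdisk_size_mib(5))  # 1470
print(ramdisk_size_mib(2))  # 588
```

Even with logs rotated monthly, that's well over 1 GiB of RAM for a 5 TB node, which is why a spare disk is usually the easier option.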