Upcoming storage node improvements including benchmark tool

Hi and “WUT?”
That looks like the speed advertised on an SSD's packaging,
and reality, well, we all know what reality is … ;>
Please test it in some normal environment, like Windows 10 ;>

Told ya you need to test it yourself, otherwise you will not believe it. This was one of my slower hard drives.

What are the downsides of filestore.force-sync=false?

In case of a power loss or a similar event, your node would lose whatever it hasn’t yet written to disk. Some failed audits would be the consequence.
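For anyone looking for the setting being discussed: on a storagenode it would go in `config.yaml` (key name taken from the thread; double-check against your node’s documented options):

```yaml
# Skip the per-piece fsync on uploads. Faster writes, but pieces
# still sitting in the OS page cache are lost on power failure.
filestore.force-sync: false
```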

Are we talking about the kernel’s commit-to-disk time, which depends on the ext4 journal?

I think what you need to make clear is that, with fsync off, the drives don’t finish the writes until some time after the benchmark completes. Your OS was in the background, with the writes queued in RAM, completing them as fast as it could. (`iostat` or `zpool iostat` will show that background activity.)

I can repeat myself a few more times: told ya you need to test it yourself, otherwise you will not believe it.

Unrelated to why you mention it, but since bandwidth usage is now cached in memory, I assume stats that haven’t been persisted will also get lost in this scenario. I scanned the commit (admittedly quickly) and didn’t see anything about it being persisted on shutdown of the node. Did I overlook something? If not, that would mean losing some bandwidth data every time the node updates or restarts?

I believe yes. The same goes for the TTL DB. Garbage collection will be the backup for that one.

It’s not that I don’t believe those are the numbers you saw. It’s that I understand what fsync is doing. One mode is “make sure the data makes it to the HDD: I’ll wait until you’re sure.” And the new mode is “YOLO! Toss this data in the general direction of the HDD… I’m not waiting for anything to be confirmed: I-don’t-know-stick-it-in-RAM-or-something… Jesus take the wheel!!!1!” :wink:

Disabling fsync should be safe for nodes because the data is protected in other ways. But this isn’t a standalone performance improvement: it’s a tradeoff of durability for speed.

If you refuse to run the benchmark yourself, this discussion is kind of useless. We both agree that without running it yourself it is unbelievable.

I can run it. And I may get numbers identical to yours. But… again… I haven’t seen anyone say they don’t believe your numbers. We’re saying you’re measuring different things.

Perhaps you need to run your benchmark again, even using more pieces if you want, and pull system power right when the benchmark completes. When you boot back up, notice the fsync version wrote every piece, while the fsync=off benchmarks are missing files.** Until you try that yourself, this discussion is kind of useless. :wink:

I still think having fsync off for a Storj node is the correct decision. But it should be presented to SNOs in a way that makes the tradeoff clear.

** (Yes, there are controllers with battery-backed caches, SSDs with power-loss protection, and other tricks that improve durability. You’re talking about the average config, and so am I.)

On a default install, the most it “should” lose is ~5 seconds of data. If that leads to a couple of failed audits, I’m good with that. If it nukes the entire node because every single one of those pieces got audited on the next restart, I’m not good with that.

I did, many times. Performance results are about the same, sometimes a bit more and sometimes a bit less. I also ran it with higher concurrency, different file sizes, a longer runtime, and so on. Come on, this benchmark is there to avoid these discussions. Run it and get some numbers yourself.

I would say the chance of an audit failure is low. Audits are spot checks, and it is unlikely that an audit picks exactly the pieces you are missing. But things might escalate if you have, let’s say, one power loss per day. At that point I would suggest enabling the sync call.

Maybe now you have to fsck/chkdsk to get things running again after an outage (which would probably have been avoided in fsync mode). But yeah, so little data should be lost that the Storj network wouldn’t care. This should be a great change!

Understandable; at one power loss a day I would already be using a UPS, an ATS, and a generator anyway :smile:

For what it’s worth, around here we have pretty stable power, maybe a couple of power losses per year.

It wouldn’t be avoided in fsync mode; data currently being written would still be lost, so it’s a tradeoff of how much data is in flight.

There is a final commit phase that makes the difference. With fsync you will not lose data after telling the uplink that you flushed it to disk; you will only lose data that hasn’t reached that final state.

On a side note, this should speed up the lazy filewalkers as well, correct?