Announcement: major storage node release (potential config changes needed!)

Hello everyone!

If you’ve been following the forum closely, you may have noticed three separate things already.

But if you haven’t been following the forum as closely, this thread is for you!

Big performance changes for nodes in v1.104.x

v1.104.1 has started rolling out, and it includes some major performance improvements for storage nodes. Many of the nodes we’ve tested have gone from being able to handle under 100 uploads/sec to over 1,000 uploads/sec. Some nodes have seen a 100x improvement!

Possible config change necessary

There are a number of changes we’ve made to achieve these performance improvements, but one of the major ones is that we’ve stopped using fsync by default. fsync is a feature that makes the storage node software wait until new bytes are completely flushed to the disk hardware. The waiting is the problem, and our system doesn’t actually need every node to write all bytes to disk.
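
To make the tradeoff concrete, here’s a minimal Go sketch of the difference (illustration only, with a hypothetical file name; this is not the actual storage node code):

    package main

    import (
        "log"
        "os"
    )

    func main() {
        // Hypothetical file name, for illustration only.
        f, err := os.Create("piece.dat")
        if err != nil {
            log.Fatal(err)
        }
        defer f.Close()

        // Without fsync, Write returns as soon as the bytes are in the
        // operating system's page cache; the kernel flushes them to the
        // disk hardware later.
        if _, err := f.Write([]byte("piece data")); err != nil {
            log.Fatal(err)
        }

        // With fsync, Sync blocks until the hardware reports the bytes
        // are durably on disk. That waiting is what the node now skips
        // by default.
        if err := f.Sync(); err != nil {
            log.Fatal(err)
        }
    }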

However, if your node is fairly unstable, this may be a regrettable change, as losing bytes could cause audit failures. Most nodes should have no trouble (a node that is shut down safely will have all bytes flushed to disk by the operating system naturally), but a node that loses power regularly without any kind of backup system could lose data it claims to be responsible for. If this is you, you’ll want to re-enable fsync.

On the other hand, keeping fsync disabled is an enormous performance benefit and will help your node win more upload races. If you enable fsync, you run the risk of losing most upload races.

Summary:

  • With fsync disabled (default): much faster performance, but make sure to shut your node’s operating system down safely.
  • With fsync enabled (former default): slower and more likely to lose races, but the safer choice if your node’s operating system regularly shuts down unsafely.

If you want to enable fsync, you’ll need to start your node with --filestore.force-sync=true (or filestore.force-sync: true in the config file).
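
For example, the config file entry would look like this:

    # re-enable fsync (force disk synchronization and atomic writes):
    filestore.force-sync: true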

Thanks!

Please let us know what your experience is like with v1.104.x. As this rollout progresses we’re excitedly watching many graphs, but we’d be thrilled if you shared yours!

11 Likes

I guess you mean enable?

1 Like

Oops! Thanks, fixed.

Sorry if I’m being dense, but just for clarification.

Even though fsync is disabled by default, we still need to add filestore.force-sync: true to the config file to disable it?

If you are running your storage node with storage2.piece-scan-on-startup: false, you need to run the piece scan on startup once after updating to the new version. It isn’t critical. There was a bug in the current version where the size of the trash folder didn’t match its actual size, which also impacted the free space. The new version contains a fix for that, but you need to run the piece scan at least once to correct the free space value.
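
For example, you could temporarily flip the setting in your config file (using the key named above) and restart the node once:

    # temporarily re-enable the startup piece scan:
    storage2.piece-scan-on-startup: true
    # once a full scan has completed, this can be set back to false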

7 Likes

I think you meant: if you want to enable fsync, it’s true.

3 Likes

I hope so, otherwise I’m confused :smile:

2 Likes

That’s right, it should be turned off by default.

1 Like

This is the default as generated by storagenode setup:

# if true, force disk synchronization and atomic writes
# filestore.force-sync: false
1 Like

I’ve seen a couple of nodes move from 1.102 → 1.104 already: thanks for not allowing downgrades to happen this time!

Looking at version.storj.io, the minimum is 1.101.3, so nodes may get downgraded to an even lower version than before.

2 Likes

Suggested is v1.104.1, so that’s the version being rolled out.

Regarding downgrades: If the cursor goes blank, the updater will install the minimum version.

In simple terms, the cursor defines a rollout group: nodes inside that group should be running the suggested version.
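
Conceptually, the check works something like this (a simplified sketch, not the actual updater code; the exact hashing details are my assumption):

    package main

    import (
        "bytes"
        "crypto/hmac"
        "crypto/sha256"
        "fmt"
    )

    // inRolloutGroup sketches the cursor check: each node hashes its ID
    // against the rollout seed, and only nodes whose hash sorts at or
    // below the cursor update to the suggested version; the rest stay on
    // (at least) the minimum version.
    func inRolloutGroup(nodeID, seed, cursor []byte) bool {
        mac := hmac.New(sha256.New, seed)
        mac.Write(nodeID)
        return bytes.Compare(mac.Sum(nil), cursor) <= 0
    }

    func main() {
        // Hypothetical values for illustration.
        nodeID := []byte("example-node-id")
        seed := []byte("rollout-seed")
        blankCursor := make([]byte, sha256.Size) // an all-zero cursor

        // With a blank cursor, virtually no node's hash sorts at or
        // below it, so the updater falls back to the minimum version.
        fmt.Println(inRolloutGroup(nodeID, seed, blankCursor))
    }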

Is that what made so many restarted nodes go backwards to 1.99.x last time?

Yes, the cursor went blank (empty) for a while. I don’t think this is happening this time round, since that would mean nodes would break (the new version has database migrations, as far as I have read here).

In the last week I ran it on a few of my nodes. Do I need to run it again on these nodes?

1 Like

Is there any command that will make the node start a one-time piece scan, without restarting the node?
I really don’t want to stop the node, turn the piece scan on, wait a day, and then turn it off again on 97 of my nodes. It could just be a command, like forget untrusted satellite was.

2 Likes

This was suggested by many, but I guess the devs didn’t consider it. Maybe in the future they will implement this much-needed switch.
I’m stressed about doing this for 17 nodes; I can’t imagine how it would be with 100 nodes… :cold_sweat:

Yes, it seems this again would lead to downgrades for nodes that haven’t had their turn for 1.104 yet, but get their docker container restarted. @ifraixedes @jtolio @elek sorry to ping you. I don’t usually do that, but these downgrades caused issues last time and I’m afraid this is again unintentional and possibly untested. The minimum version should always be raised to a completely rolled out version before a new rollout starts to prevent downgrades on docker nodes.

The only intentional exception could be when an ongoing rollout is halted and a new version is rolled out without changing the seed. That doesn’t seem to be the case here, as all nodes were already at 1.102.

6 Likes

I just restarted a containerized node that was on 1.102.3, and now it’s on 1.101.3. Just to confirm.

3 Likes