Announcement: major storage node release (potential config changes needed!)

Hello everyone!

If you’ve been following the forum closely, you may have noticed three separate things already.

But if you haven’t been following the forum as closely, this thread is for you!

Big performance changes for nodes in v1.104.x

v1.104.1 has started rolling out, and it includes some major performance improvements for storage nodes. Many of the nodes we’ve tested have gone from being able to handle under 100 uploads/sec to over 1,000 uploads/sec. Some nodes have seen a 100x improvement!

Possible config change necessary

There are a number of changes we’ve made to achieve these performance improvements, but one of the major ones is that we’ve stopped using fsync by default. fsync is a feature that makes the storage node software wait until new bytes are completely flushed to the disk hardware. The waiting is the problem, and our system doesn’t actually need every node to write all bytes to disk.
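
To make the tradeoff concrete, here’s a minimal Go sketch of the difference (illustration only, with a hypothetical file name; this is not the actual storage node code):

    package main

    import (
        "log"
        "os"
    )

    func main() {
        // Hypothetical file name, for illustration only.
        f, err := os.Create("piece.dat")
        if err != nil {
            log.Fatal(err)
        }
        defer f.Close()

        // Without fsync, Write returns as soon as the bytes are in the
        // operating system's page cache; the kernel flushes them to the
        // disk hardware later.
        if _, err := f.Write([]byte("piece data")); err != nil {
            log.Fatal(err)
        }

        // With fsync, Sync blocks until the hardware reports the bytes
        // are durably on disk. That waiting is what the node now skips
        // by default.
        if err := f.Sync(); err != nil {
            log.Fatal(err)
        }
    }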

However, if your node is fairly unstable, this may be a regrettable change, as losing bytes could cause audit failures. Most nodes should have no trouble (a node that is shut down safely will have all bytes flushed to disk by the operating system naturally), but a node that loses power regularly without any kind of backup system could lose data it claims to be responsible for. If this is you, you’ll want to re-enable fsync.

On the other hand, keeping fsync disabled is an enormous performance benefit and will help your node win more upload races. If you enable fsync, you run the risk of losing most upload races.

Summary:

  • With fsync disabled (default): much faster performance, but make sure to shut your node’s operating system down safely.
  • With fsync enabled (former default): slower and more likely to lose races, but the safer choice if your node’s operating system regularly shuts down unsafely.

If you want to enable fsync, you’ll need to start your node with --filestore.force-sync=true (or filestore.force-sync: true in the config file).
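
For example, the config file entry would look like this:

    # re-enable fsync (force disk synchronization and atomic writes):
    filestore.force-sync: true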

Thanks!

Please let us know what your experience is like with v1.104.x. As this rollout progresses we’re excitedly watching many graphs, but we’d be thrilled if you shared yours!

11 Likes

I guess you mean enable?

1 Like

Oops! Thanks, fixed.

Sorry if I’m being dense, but just for clarification.

Even though fsync is disabled by default, we still need to add filestore.force-sync: true to the config file to disable it?

If you are running your storage node with storage2.piece-scan-on-startup: false, you need to run the piece scan on startup once after updating to the new version. It isn’t critical. There was a bug in the current version where the size of the trash folder didn’t match its actual size, which also impacted the free space. The new version contains a fix for that, but you need to run the piece scan at least once to correct the free space value.
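
For example, you could temporarily flip the setting in your config file (using the key named above) and restart the node once:

    # temporarily re-enable the startup piece scan:
    storage2.piece-scan-on-startup: true
    # once a full scan has completed, this can be set back to false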

7 Likes

I think you meant: if you want to enable fsync, it’s true.

3 Likes

I hope so, otherwise I’m confused :smile:

2 Likes

That’s right, it should be turned off by default.

1 Like

This is the default as generated by storagenode setup:

# if true, force disk synchronization and atomic writes
# filestore.force-sync: false
1 Like

I’ve seen a couple of nodes move from 1.102 → 1.104 already: thanks for not allowing downgrades to happen this time!

Looking at version.storj.io, the minimum is 1.101.3, so nodes may get downgraded to an even lower version than before.

2 Likes

Suggested is v1.104.1, so that’s the version being rolled out.

Regarding downgrades: If the cursor goes blank, the updater will install the minimum version.

In simple terms, the cursor defines a rollout group: nodes inside that group should be running the suggested version.
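
Conceptually, the check works something like this (a simplified sketch, not the actual updater code; the exact hashing details are my assumption):

    package main

    import (
        "bytes"
        "crypto/hmac"
        "crypto/sha256"
        "fmt"
    )

    // inRolloutGroup sketches the cursor check: each node hashes its ID
    // against the rollout seed, and only nodes whose hash sorts at or
    // below the cursor update to the suggested version; the rest stay on
    // (at least) the minimum version.
    func inRolloutGroup(nodeID, seed, cursor []byte) bool {
        mac := hmac.New(sha256.New, seed)
        mac.Write(nodeID)
        return bytes.Compare(mac.Sum(nil), cursor) <= 0
    }

    func main() {
        // Hypothetical values for illustration.
        nodeID := []byte("example-node-id")
        seed := []byte("rollout-seed")
        blankCursor := make([]byte, sha256.Size) // an all-zero cursor

        // With a blank cursor, virtually no node's hash sorts at or
        // below it, so the updater falls back to the minimum version.
        fmt.Println(inRolloutGroup(nodeID, seed, blankCursor))
    }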

Is that what made so many restarted nodes go backwards to 1.99.x last time?

Yes, the cursor went blank (empty) for a while. I don’t think this is happening this time round, since that would mean nodes would break (the new version has database migrations, as far as I have read here).

In the last week I ran it on a few of my nodes. Do I need to run it again on these nodes?

1 Like

Is there any command that will make the node start a one-time piece scan, without restarting the node?
I really don’t want to stop the node, turn the piece scan on, wait a day, and then turn it off again on 97 of my nodes. It could just be a command, like forget untrusted satellite was.

2 Likes

This was suggested by many, but I guess the devs didn’t consider it. Maybe in the future they will implement this much-needed switch.
I’m stressed about doing this for 17 nodes; I can’t imagine how it would be with 100 nodes… :cold_sweat:

Yes, it seems this again would lead to downgrades for nodes that haven’t had their turn for 1.104 yet, but get their docker container restarted. @ifraixedes @jtolio @elek sorry to ping you. I don’t usually do that, but these downgrades caused issues last time and I’m afraid this is again unintentional and possibly untested. The minimum version should always be raised to a completely rolled out version before a new rollout starts to prevent downgrades on docker nodes.

The only intentional exception could be when an ongoing rollout is halted and a new version is rolled out without changing the seed. That doesn’t seem to be the case here, as all nodes were already at 1.102.

6 Likes

I just restarted a containerized node that was on 1.102.3, and now it’s on 1.101.3. Just to confirm.

3 Likes