Storagenode Operational updates

:crossed_fingers: This forum topic is meant to give more insight into storagenode operations.

  • Similar to https://status.storj.io/, but focused on storage nodes (SNs)
  • This may help when debugging issues, but following it shouldn't be required for normal operation
  • Please consider opening a dedicated thread for specific problems

Current status, notable, known issues:

History:

  • 2024-03-22 v1.97.3 rollout has been started
  • 2024-03-23 v1.99.3 rollout has been started
    • storagenode log levels are configurable
  • ~2024-03-28 Bloom filters were not sent out due to a permission problem with temp buckets
  • 2024-04-10 New problem identified: max sized (4100003) bloom filters were not sent out to big nodes
  • 2024-04-14 US1/AP1/EU1 bloom filters are generated successfully and sent out to all nodes
  • 2024-04-14 v1.100.3 rollout has been started
    • trash pieces immediately 52881d2db
    • support big bloom filters 87611d002
    • per-day trash directories 21de0c3fd
    • save-state-resume GC filewalker
  • 2024-04-15 v1.101.3 storagenode rollout has been started (instead of continuing v1.100)
    • save-state-resume feature for GC filewalker 2a8d0d44a
    • important GC filewalker fix (don’t use v1.100!)
  • 2024-04-23: v1.102 rollout has been started
    • save-state-resume feature for GC filewalker (0f90f061b)
    • some logging changes
  • 2024-04-24: We started testing 10 MB (max size) bloom filters. Sufficiently recent nodes may receive these bigger BFs. (We are also planning to send out backward-compatible BFs with max=4 MB, especially for US1.)
  • 2024-04-29: Ongoing investigation of not-deleted, date-based trash folders. → see https://review.dev.storj.io/c/storj/storj/+/13113
  • 2024-05-13: The v1.104 rollout will be started soon, with significant performance improvements.
    • you may see different performance characteristics, as some bottlenecks have been solved
    • fsync becomes optional; you may need to turn it on if you expect random shutdowns / power outages (see this topic)
  • 2024-05-16: the stable version from the v1.104 line is v1.104.5 (or later). It includes a patch for old storagenodes, which have semi-corrupted data in bandwidthdb: https://review.dev.storj.io/c/storj/storj/+/13163
  • 2024-05-31: Storagenodes connected to the SLC satellite may have observed higher load recently. Some cold test data was deleted and replaced by the data from the load. (Similar load can be expected from now on, so it may be useful to tune your nodes for higher load / better performance if you had problems.)
  • 2024-06-24: Due to the increased number of segments on SLC, we need to improve the BF generation scheduler. You may see BFs arrive in different, unusual patterns (but still frequently enough).
  • 2024-06-24: Investigating false-positive storage discrepancies: old storage stats may remain even if the data is already gone.
  • 2024-07-09 The v1.107 rollout is stopped, as it used a bumped Go version which is not supported on older Windows versions.
  • 2024-07-09 Some updates on the current state of GC BF generation: "Two weeks working for free in the waste storage business :-(" (#131 by elek)
  • 2024-08-22 Deletion of TTL-ed test data is being improved. Lots of unnecessary garbage will be deleted soon :crossed_fingers: (see "When will "Uncollected Garbage" be deleted?", #283 by elek)
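For readers wondering why the bloom filter size limits in the history matter mostly for big nodes: as commonly described for Storj garbage collection, the satellite sends a bloom filter of the pieces a node should keep, and pieces that don't match are moved to trash. A false positive therefore means a garbage piece survives. This is a rough sketch using the textbook approximation p ≈ (1 − e^(−kn/m))^k with the optimal hash count k ≈ (m/n)·ln 2. It assumes a plain Bloom filter, which may not match the satellite's actual implementation. It also assumes "10 MB" means 10,000,000 bytes and that the 4100003 limit is in bytes. The function names and piece counts are hypothetical and for illustration only.

```python
import math


def optimal_hash_count(m_bits: int, n_items: int) -> int:
    """Hash count k minimising false positives, k = (m/n) * ln 2 (at least 1)."""
    return max(1, round((m_bits / n_items) * math.log(2)))


def false_positive_rate(m_bits: int, n_items: int, k: int) -> float:
    """Textbook approximation p = (1 - e^(-k*n/m))^k for a plain Bloom filter."""
    return (1.0 - math.exp(-k * n_items / m_bits)) ** k


# Assumed filter sizes in bytes: the 4100003 limit and a hypothetical 10 MB filter.
FILTER_SIZES = {"4MB": 4_100_003, "10MB": 10_000_000}


def report(n_pieces: int) -> dict:
    """False-positive rate (share of garbage pieces wrongly kept) per filter size."""
    rates = {}
    for label, size_bytes in FILTER_SIZES.items():
        m_bits = size_bytes * 8
        k = optimal_hash_count(m_bits, n_pieces)
        rates[label] = false_positive_rate(m_bits, n_pieces, k)
    return rates


if __name__ == "__main__":
    # Hypothetical piece counts for a small, a medium and a big node.
    for n in (1_000_000, 10_000_000, 50_000_000):
        rates = report(n)
        print(f"{n:>11,} pieces: 4MB -> {rates['4MB']:.4%}, 10MB -> {rates['10MB']:.4%}")
```

With these assumptions, the false-positive rate stays negligible for small nodes but climbs steeply once the piece count approaches the number of bits in the filter. That is consistent with big nodes retaining more garbage until larger filters were supported.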

I would prefer the history in ascending time order, to make the events easy to follow. To understand the present, we need to read the past, so it would be natural to read the updates in chronological order.


I would prefer history in ascending time order, to easily follow

I am fine with both sort orders. But having the latest news on top helps to understand recent changes faster, IMHO. You don’t need to scroll down. (Eventually we will have a lot of events, but we can prune them into an archive post after a while…)

Also, the reverse-time order is more changelog style, where the most recent release is on top. (But a changelog doesn’t contain timestamps, so it can be confusing.)


Great job, Storj!
Number one in distributed cloud storage


I completely agree. I think the label “History” automatically makes us want to see things in chronological order, to follow the progress. In a changelog, on the other hand, a user is going to run the latest version of the software, so showing what is included in the latest version at the top makes sense.


New at the top makes the most sense. And thank you for using the YYYY-MM-DD syntax as the ISO intended… and not some bass-ackward Americanism :wink:


Or just a sortable table, so everyone can choose his ordering :wink:

3 posts were split to a new topic: It seems that the new feature “save-state-resume GC filewalker” isn’t functioning as expected

I think we could use the edit history for this?

We are not all from the Americas :wink:


Do you know how it could be shown on the forum?
Perhaps only the table view, but I’m not sure that it could be sorted without editing…

I would suggest that instead of making this into a complete changelog, only the important dates+versions+info should be kept.

For example:
2024-04-15 v1.101.3 storagenode rollout has been started (instead of continuing v1.100)

  • save-state-resume feature for GC filewalker 2a8d0d44a
  • important gc file walker fix (don’t use v1.100!)

Or in other words, “update to this version asap because we fixed so and so”.

IMHO “regular” SNOs (i.e. those running a couple of nodes on auto-updates) don’t even have to bother with this, unless they notice something wrong (e.g. a version downgrade). Bigger SNOs (i.e. those running dozens or hundreds of nodes and updating manually) will be interested in this topic, since it gives important information about edge cases (e.g. bigger nodes and trash).

Yes, my idea was to put the important, current information under Current status, and keep History for posterity…

Probably the warning about v1.100 deserves a line under the current status…


Has the rollout been paused? I haven’t seen the cursor move in a while and many of us are eagerly awaiting the performance improvements.

Also, side note: edits on the top post don’t bring this topic back up to the top, nor mark it as unread, so changes may not really be visible unless you also post a quick reply on the topic.


Well, the only one of my nodes that updated to 1.104.1 yesterday is now running 1.104.5 (updated about 10 minutes ago), so I think there may have been some changes under the hood.

My other 9 nodes are still waiting for 1.104.x

I figured that was the case. I saw 1.104.5 on my testnet node. Would have been nice to have that mentioned on this topic though.


They changed the rollout version


I think it is related to: safeguard against corrupted db in v57 migration


Hello all,

My node still hasn’t updated automatically to 1.104. I have tried to update it manually, following the official Storj instructions. Is there something I can / should do to make sure everything is OK? My node is running on Ubuntu 22.04 / Docker.