Storagenode Operational updates

elek · April 15, 2024, 8:12am

This forum topic is supposed to give more insight into Storagenode operations

Similar to https://status.storj.io/, but focused on storage nodes (SN)
This may help when debugging issues, but following it shouldn’t be required for normal operation
Please consider opening a dedicated thread for specific problems

Current status and notable known issues:

Huge performance improvements are rolling out, attention may be required: Announcement: major storage node release (potential config changes needed!) - #38
The storagenode dashboard may show bigger discrepancies because 1024-based and 1000-based units are used in different ways
Storagenodes may have bigger storage overhead because of previous bloom filter (BF) issues. Multiple successful BF processing runs might be required
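
The 1024/1000 unit mismatch mentioned above can be illustrated with a small sketch. It assumes the discrepancy comes from the same byte count being displayed once in decimal units (TB) and once in binary units (TiB). The helper names are made up for illustration and are not part of the storagenode code.

```python
# Illustrative only: these helpers are hypothetical, not storagenode APIs.
TB = 1000 ** 4   # decimal terabyte, in bytes
TIB = 1024 ** 4  # binary tebibyte, in bytes


def to_tb(num_bytes: int) -> float:
    """Convert a byte count to decimal terabytes."""
    return num_bytes / TB


def to_tib(num_bytes: int) -> float:
    """Convert a byte count to binary tebibytes."""
    return num_bytes / TIB


def unit_gap_percent() -> float:
    """How much larger one TiB is than one TB, in percent."""
    return (TIB / TB - 1) * 100


if __name__ == "__main__":
    used = 10 * TB
    print(f"{used} bytes = {to_tb(used):.2f} TB = {to_tib(used):.2f} TiB")
    print(f"gap at the tera scale: {unit_gap_percent():.2f}%")
```

At the tera scale the two conventions differ by roughly 10%, so a gap of about that size between two reported figures may be a display artifact rather than missing data.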

History:

2024-03-22 v1.97.3 rollout has been started
2024-03-23 v1.99.3 rollout has been started
- storagenode log levels are configurable
~2024-03-28 Bloom filters are not sent out due to a permission problem with temp buckets
2024-04-10 New problem identified: max sized (4100003) bloom filters were not sent out to big nodes
2024-04-14 US1/AP1/EU1 bloom filters are generated successfully and sent out to all nodes
2024-04-14 v1.100.3 rollout has been started
- trash pieces immediately 52881d2db
- support big bloom filters 87611d002
- per-day trash directories 21de0c3fd
- save-state-resume GC filewalker
2024-04-15 v1.101.3 storagenode rollout has been started (instead of continuing v1.100)
- save-state-resume feature for GC filewalker 2a8d0d44a
- important gc file walker fix (don’t use v1.100!)
2024-04-23: v1.102 rollout has been started
- save-state-resume feature for GC filewalker (0f90f061b)
- some logging changes
2024-04-24: we started testing 10 MB (max size) bloom filters. New enough nodes may receive these bigger BFs. (We are planning to send out backward-compatible BFs with a 4 MB max size as well, especially for US1.)
2024-04-29: Ongoing investigation of not-deleted, date-based trash folders. → see https://review.dev.storj.io/c/storj/storj/+/13113
2024-05-13: v1.104 rollout will be started soon with significant performance improvements.
- you may see different performance characteristics, as some bottlenecks have been resolved
- fsync becomes optional; you may need to turn it on if you expect random shutdowns / power outages (see this topic)
2024-05-16: the stable version from the v1.104 line is v1.104.5 (or later). It includes a patch for old storagenodes, which have semi-corrupted data in bandwidthdb: https://review.dev.storj.io/c/storj/storj/+/13163
2024-05-31: storagenodes connected to the SLC satellite may have observed higher load during the recent period. Some cold test data is being deleted and replaced by data from the load. (Similar load can be expected from now on: it can be useful to tune your nodes for higher load / better performance if you had problems.)
2024-06-24: Due to the increased number of segments on SLC, we need to improve the BF generation scheduler. You may receive BFs in different, unusual patterns (but still frequently enough).
2024-06-24: Investigating false-positive storage discrepancies: old storage stats may remain even if the data is already gone.
2024-07-09 The 1.107 rollout is stopped as it used a bumped Go version which is not supported on older Windows versions.
2024-07-09 Some updates on the current state of GC BF generation: Two weeks working for free in the waste storage business :-( - #131 by elek
2024-08-22 Deletion of TTL-ed test data is being improved. Lots of unnecessary garbage will be deleted soon… See: When will "Uncollected Garbage" be deleted? - #283 by elek
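
For context on why the maximum bloom filter size matters for big nodes (see the 2024-04-10 and 2024-04-24 entries), here is a rough sketch based on the standard textbook bloom filter formula. The 10% false-positive target is an assumption picked for illustration, and the function name is hypothetical; the real generator may use different parameters and a different filter layout.

```python
import math


def max_pieces(filter_bytes: int, false_positive_rate: float) -> int:
    """Pieces a bloom filter of this size can cover at the given
    false-positive rate, assuming an optimal number of hash functions.

    Textbook formula: n = -m * (ln 2)^2 / ln(p), with m in bits.
    """
    bits = filter_bytes * 8
    return int(-bits * math.log(2) ** 2 / math.log(false_positive_rate))


if __name__ == "__main__":
    # 4100003 is the old max size from the 2024-04-10 entry; 10 MB is
    # taken here as 10_000_000 bytes (an assumption about the new max).
    for size in (4100003, 10_000_000):
        print(size, "bytes ->", max_pieces(size, 0.10), "pieces at 10% FP")
```

Under those assumptions a 4100003-byte filter covers a bit under 7 million pieces. A node holding more pieces than that would either need a bigger filter or would keep a larger share of garbage after each run.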

snorkel · April 15, 2024, 6:31pm

I would prefer history in ascending time order, to easily follow the events. In order to understand the now, we need to read the past, so it would be natural to read the updates in chronological order.

elek · April 16, 2024, 10:25am

I would prefer history in ascending time order, to easily follow

I am fine with both sort orders. But having the latest news on top helps to understand recent changes faster, IMHO. You don’t need to scroll down. (Eventually we will have a lot of events, but we can prune them to an archive post after a while…)

Also, the reverse-time order is more changelog style, where the most recent release is on top. (But a changelog doesn’t contain timestamps, so it can be confusing.)

digitalfrank · April 16, 2024, 12:52pm

Great Job, Storj
Number One of Cloud Distributed Storage

nerdatwork · April 16, 2024, 2:23pm

I completely agree. I think the label “History” automatically makes us want to see things in chronological order in order to see the progress. While in terms of changelog, a user is going to use the latest version of the software hence showing what is included in the latest version at the top makes sense.

Roxor · April 16, 2024, 2:48pm

New at the top makes the most sense. And thank you for using the YYYY-MM-DD syntax as the ISO intended… and not some bass-ackward Americanism

JWvdV · April 16, 2024, 4:23pm

Or just a sortable table, so everyone can choose his ordering

Alexey · April 17, 2024, 8:35am

3 posts were split to a new topic: It seems that the new feature “save-state-resume GC filewalker” isn’t functioning as expected

Alexey · April 17, 2024, 8:31am

I think we could use the edit history for this?

Alexey · April 17, 2024, 8:32am

We are not all from the Americas

Alexey · April 17, 2024, 8:34am

Do you know how it could be shown on the forum?
Perhaps only the table view, but I’m not sure that it could be sorted without editing…

Mitsos · April 17, 2024, 8:55am

I would suggest that instead of making this into a complete changelog, only the important dates+versions+info should be kept.

Ie:
2024-04-15 v1.101.3 storagenode rollout has been started (instead of continuing v1.100)
- save-state-resume feature for GC filewalker 2a8d0d44a
- important gc file walker fix (don’t use v1.100!)

Or in other words, “update to this version asap because we fixed so and so”.

IMHO “regular” SNOs (ie those running a couple of nodes on auto updates) don’t even have to bother with this, unless they notice something wrong (ie a version downgrade). Bigger SNOs (ie those running dozens or hundreds of nodes, manually updating) will be interested in this topic since it gives important information about edge cases (ie bigger nodes and trash).

elek · April 17, 2024, 9:42am

Yes, my idea was to put the important, current information under Current status, and keep History for posterity…

Probably the warning about v1.100 deserves a line under the current status…

BrightSilence · May 15, 2024, 9:24am

Has the rollout been paused? I haven’t seen the cursor move in a while and many of us are eagerly awaiting the performance improvements.

Also, side note: edits on the top post don’t bring this topic back up to the top, nor mark it as unread, so changes may not really be visible unless you also post a quick reply on the topic.

ACarneiro · May 15, 2024, 3:07pm

Well, the only one of my nodes that updated to 1.104.1 yesterday is now running 1.104.5 (updated about 10 minutes ago), so I think there may have been some changes under the hood.

My other 9 nodes are still waiting for 1.104.x

BrightSilence · May 15, 2024, 3:31pm

I figured that was the case. I saw 1.104.5 on my testnet node. Would have been nice to have that mentioned on this topic though.

Roberto · May 15, 2024, 3:46pm

They changed the rollout version

agente · May 15, 2024, 6:11pm

I think it is related to: safeguard against corrupted db in v57 migration

Seb · May 16, 2024, 9:36pm

Hello all,

My node still hasn’t updated automatically to 1.104. I have tried to update it manually, as per the official Storj instructions. Is there something I can / should do to make sure everything is OK? My node is running on Ubuntu 22.04 / Docker.