Release preparation v1.146

New release candidate is already on QA Satellite

Changelog

General

  • 42bcc9b shared/dbutil/*,all: add log argument to OpenUnique
  • f58fd99 shared/dbutil/spannerutil: pass in minimum session pool size
  • d51aa96 all: use spanner emulator per test
  • 2069f70 shared/event: wide event based per-request eventkit logging with rich information
  • 03aa933 Jenkinsfile.public: run tests wit Spanner in a new way
  • 94f5256 Makefile.dev.mk: increase linting parallelism
  • e4c9eee go.mod: Bump deps with reported vulnerabilities
  • bed11db go.mod: Set back go to 1.24.7
  • 2b23e9a shared/modular/tracing: put a letter d aside for a rainy day (=typofix)
  • 99403f8 Jenkinsfile{verify,public}: increase go test parallelism to 9
  • 51f687e shared/dbutil: add support for composite foreign keys in schema
  • 79205c6 Jenkinsfiles: Use same golang version
  • 219c5d6 shared/{h/m}otel: monkit helpers to use open telemetry
  • b502a76 shared/modular: debug flag to print out current / all components
  • 629927a Revert “shared/event: wide event based per-request eventkit logging with rich information”
  • 7927201 scripts: move go scripts to separate folder
  • 9e49dd7 shared/dbutil/spannerutil: fix test failing for production Spanner
  • 5368782 shared/modular/cli: fix typo in file name
  • 7d89b49 all: Homogenize zap fields
  • a97b6d9 ci: new release makefile
  • beef4f3 go.mod: bump monkit
  • 6abd28e ci: move build cache outside of tmp directory
  • f9194ce shared/modular/eventkit: replace mock destination with eventkitspy
  • e7ebd6d ci: use a dockerkit based binaries releasing
  • 5d25a8b : Add blank line to deprecated & fix warnings
  • 21e54a0 repairer: add name for rpc connection pool
  • c066c5d shared/mud: add RemoveAllRequirements method to Component
  • ff7ca21 ci: share web/storagenode/Dockerfile
  • 57ebf43 ci: fix make images
  • ec87e24 ci: use --push for pushing images, fix LATEST_TAG
  • ea8e5bb ci: setup buildx builder
  • ba5d3cb ci: fix buildx builder setup
  • 24557b9 ci: switch to Jenkins coverage plugin
  • 909b591 images: optimize delve and storj-up building
  • 47588ec installer/windows: fix path escaping
  • b272ed8 ci: add missing compress stage and reorder
  • 47e83c8 script/bake.sh: include satellite web for version calculation
  • 8334a0b ci: merge images and release-binaries makefiles
  • 6f78b89 ci: fix export the environment variables
  • 6914015 Makefile: remove commit hash from tagged builds
  • e193dcf ci: match also release candidate tags
  • 763554f ci: publish to github should use git tag
  • b2242d6 ci,scripts: fix rc tag detection
  • 5ffd61e ci,scripts: fix additional check

Multinode

  • 2c83c12 web/multinode: remove jest tests
  • 4a69fd7 web/multinode: use new styling syntax
  • 4bb49f1 web/multinode: add pinia modules
  • 710b9ae web/multinode: regenerate lock file and migrate a couple of components
  • bd3aa1f web/multinode: migrated common components to use SFC syntax
  • 584181d web/multinode: migrate a bunch of components to use SFC syntax
  • 32e9709 web/multinode: migrate payouts components to use SFC syntax
  • 5446377 web/multinode: migrated and reworked charts to use SFC syntax
  • 2ce0531 web/multinode: migrate the last bunch of components to use SFC syntax
  • b30bd7b web/multinode: remove Vue filters
  • c8d819f web/multinode: replace Vuex with Pinia
  • 5ee235e web/multinode: finalize migration to Vue 3
  • d9d80c0 web/multinode: add Dockerfile

Satellite

  • a39ae7d satellite/satellitedb: Orders methods use Spanner statements
  • 72a5067 satellite/admin: add delete project endpoint
  • 2cefdb7 satellite/{console,db}: use tenantID aware db methods
  • 5e3119c satellite/metabase/changestream: add partition metadata tests
  • 20442d2 satellite/satellitedb: optimize UpdateBucketBandwidthAllocation
  • 454a1b0 satellite/metainfo: fix multipart upload detection in batch
  • 2c0f02a web/satellite: make UI config backward compatible
  • ed47307 satellite/{console,web}: prevent abbreviated project deletion if object lock is enabled
  • 56d339e satellite/satellitedb: remove unused GE tables
  • a817c53 satellite/{console,web}: improve storage tier UI visuals
  • b685f91 satellite/admin-ui: fix project detail page
  • 23d573c satellite/admin: allow abbreviated project deletion
  • 0672b71 satellite/admin-ui: add delete project UI
  • a64ce09 satellite/{console,analytics}: add tenantID to some segment events
  • a46ace4 satellite/metabase: fix Postgres precommit query to honor ExcludeFromPending.Object
  • c5c1b0c satellite/metabase: fix retention time storage for objects without retention
  • 2bfcee4 web/satellite: Update text for managed encryption selection
  • a17da10 satellite: add Run hook for services required by console
  • 40d6ab0 web/satellite: add WASM build support for modular satellite image
  • e13631d satellite/sso: skip initialization when SSO is disabled
  • 79c54d2 satellite/console: added config API endpoint for dynamic frontend branding
  • de38602 web/satellite: dynamic branding system
  • b290988 satellite/console: add more whitelabel tests
  • 2e42dd2 satellite/satellitedb: database layer for bucket eventing config
  • c33c22d satellite/metabasetest: update EqualRetention to fix tests for PG
  • d519a4b satellite/satellitedb: add change histories table
  • 54e94df satellite/admin/back-office: save change log to DB
  • 04f47c2 satellite/audit: convert reporter_test to use mudplanet instead of testplanet
  • 8d1db11 satellite/metabase: fix test to use UncoordinatedDeleteAllBucketObjects
  • cf500c4 satellite/admin/back-office: add change history endpoint
  • a368f2a satellite/admin-ui: show change history
  • e5d6ff1 web/satellite: do not expect compute instance’s ‘remote’ field to be defined
  • dcf32cf satellite/admin-ui: change bucket ID to bucket name
  • cfa0ec6 web/satellite: resolve some build warnings
  • d49ed20 satellite/{console,web}: show product shortname in more places
  • b087bcb web/satellite: move usage graphs off main dashboard
  • 1ee8d37 satellite/metabase/avrometabase: refactor parser to use error accumulation pattern
  • 3092f6b satellite/metabase/avrometabase: refactor parser to use errs.Combine pattern
  • d41c5e6 satellite/metabase/avrometabase: add objects iterator
  • cbf67b7 satellite/console: avoid using console.Config as a global variable
  • ea61983 satellite/metabase: change spanner.Insert to spanner.InsertMap
  • 6b7cd61 satellite/metabase/changestream: fix flaky TestDeleteChangeStream
  • 7ac55e6 satellite/metabase: fix test context in TestListObjectsPendingDuplicates
  • ad9c94e satellite/payments: remove partner-specific billing logic
  • 7af72bd satellite/metainfo: switch logging to use public project id
  • 0f2dde7 satellite/accounting: remove unused method
  • 46a7f01 satellite/metainfo: add comprehensive doc.go for AI agents
  • 9ed2c7f satellite/metabase: fix flaky tests
  • c4ae04a satellite/eventing/eventingconfig: project-level gating config
  • 4118d0f satellite/metabase/avrometabase: add node aliases parsing
  • ac2658d satellite/eventing: add event type validation
  • aa324df satellite/eventing: add test event support
  • 15cb5fc satellite/metainfo: add bucket eventing config API endpoints
  • 2ca7e71 satellite/metainfo: add metadata size restriction to BeginObject
  • cd7abcd satellite/analytics: send tenant ID to Hubspot
  • c848f47 web/satellite: re-open create project dialog after member account conversion
  • ec4cada satellite/admin/back-office: cascade user agent and placement update
  • 06fe87d satellite/admin/back-office: treat user agent as string
  • 7381f59 satellite/admin: reorganize code
  • 91b4d91 satellite: whitelabel email system
  • 2fabdff satellite/admin/back-office: update user default placement
  • 4f1f0c8 satellite/admin-ui: update user default placement
  • 27075ff satellite/admin-ui: clean up head/rate limit display
  • e230a46 satellite/{console,payments}: add pricing endpoint
  • 2f4c453 web/satellite: hide satellite selector for white labeled satellite
  • a87d759 satellite/eventing: add Redis cache for bucket notification configs
  • 1a952c8 satellite/{console,payments}: Add config to require address
  • 8492b67 satellite/eventing: add event type and filter matching utilities
  • 2a4f889 satellite/metainfo: use Redis cache for bucket eventing
  • ad304fe satellite/eventing: use Redis cache in the eventing service
  • 61abebc web/satellite: small compute UI updates
  • ea3d4e5 web/satellite: move compute dialogs to a dedicated folder
  • d2bfe19 web/satellite: added start/stop compute instance
  • 7e0c2b8 web/satellite: add restart instance functionality
  • 96b22c3 satellite/satellitedb: add remainder_bytes column to bucket_storage_tallies
  • fa7d311 satellite/repair/repairer: add RPC connection pool
  • afefc2f satellite/metabase: fix swapped Spanner tags in precommitUnversionedObjectFull
  • b072ce3 satellite/console: simplify whitelabel smpt password
  • bdc5e59 satellite/satellitedb: simplify migration for bucket_storage_tallies table
  • fa74298 satellite/admin: Homogenize zap fields
  • 6400c79 satellite/{accounting,satellitedb}: start tracking remainder bytes explicitly
  • 10b2a40 satellite/{console,web}: dynamically set default favicons
  • 0152dd5 web/satellite: hide docs links for white-labeled satellite
  • e589079 web/satellite: hide satellites dropdown on ‘forgot password’ screen
  • b77c6cf satellite/satellitedb: add deletion_remainder_charges table
  • dac660a satellite/{accounting,db}: add remainder db methods
  • 9f7c9c2 satellite/payments: populate minimum object size invoice line item with the real data
  • 7394e60 satellite/payments: update product charges endpoint to return small object fee
  • 2445dbd {satellite,storagenode}/contact: add amnesty reporting
  • a4418d3 satellite/admin: added tenant ID indication to users table/details
  • 56ed972 web/satellite: hide linksharing for whitelabeled UI
  • 04f6768 satellite: use eventkit to report usage
  • 68ea408 satellite,storagenode: Fix Zap field names
  • c3cddc2 satellite/eventing: ensure public project id is logged
  • 9be1ff7 satellite/metabase: reorganize object and segment begin and commit code
  • 45d50b6 satellite/analytics: add admin_initiated flag to account deletion tracking
  • 2a427fa satellite/db: simplify and rename deletion remainder table
  • cb60b2b web/satellite: update estimated charges UI to show small object fee
  • 03b3c71 satellite/admin: Display “reason” in audit history in UI
  • 3155794 satellite/satellitedb: remove spanner emulator workaround
  • 75840f6 satellite/accounting: normalize rolled-up usage storage across day boundaries
  • cff4b1f satellite/metabase: make UpdateObjectLastCommittedMetadata reject invalid
  • 94eb64d satellite/console: added new ‘tenant’ user kind
  • 7555eaf satellite/satellitedb: add new method to query project invites by email and tenant ID
  • b6e0024 satellite/{admin,console}: don’t update user.upgrade_time if already set
  • b6417d2 satellite/eventing: consolidate retry logic for publishing events
  • f75fb15 satellite/console: fix tenant-specific links in emails
  • d0d1a4e web/satellite: Don’t request pricing if billing is disabled
  • 96bc165 satellite/console: Make redirects tenant-aware
  • 4339fc5 web/satellite: fix date calculation in billing history
  • 22561bb web/satellite: update feedback form to not accept issue reports
  • b180877 satellite/metabase/changestream: fix flaky test event filtering
  • a9f990b satellite/console: use combined GetByPublicOrPrivateID method
  • 9348510 web/satellite: Fix time unit conversion for REST API keys
  • f6bb098 satellite: allow unsetting of upgrade_time in update user method
  • 5637f3f satellite/console: create api keys that support bucket eventing
  • 930357c satellite/admin: add an explicit endpoint to update user’s upgrade time
  • 23a6028 satellite/admin/ui: add explicit functionality to update user’s upgrade time
  • a469555 satellite/satellitedb: add new DB method to delete API keys by project ID and owner ID
  • 51b4c4c satellite/{console,web}: extend delete members endpoint with RemoveAccesses flag
  • 091de5f web/satellite: added prompt to remove to be deleted member’s access keys
  • 2a8c9cd satellite/console: add config for hiding project encryption options
  • 2fc4c33 satellite/console: hide project encryption options
  • ce6627c satellite/{console,web}: support configuring button text and overall background colors
  • 1f9c947 web/satellite: use branding support URL instead of general request URL
  • fbd5fbc satellite/console: added feature flag to disable uplink behavior in the UI
  • 63e8fc7 satellite/admin/{,legacy}/ui: add Dockerfile
  • 21ae463 satellite/console: allow gateway URL overrides for tenants
  • a451d45 satellite/console: reduce billing address requirement

Storagenode

  • fa9459f storagenode/hashstore: annotate spans with compaction information
  • 3aea9a5 storagenode: adjusting used/available space calculation
  • 0b13733 storagenode: enable nodestats.Cache for select runner
  • cd68e34 cmd/storagenode-updater: avoid using tag for binary
  • 9a0ebaa storagenode/hashstore: basic informational fsck
  • d092dfa storagenode/hashstore: log reconciliation
  • 4988c84 storagenode/hashstore: modernize + add some tests
  • 3fe4fd7 storagenode/hashstore: full table reconstruction
  • 079e19c storagenode/hashstore: mmap for log reading
  • 0ec2381 storagenode/select: disable bandwith.DB usage
  • 7da1f04 storagenode/hashstore: Add build flags for synctest
  • 844b464 storagenode/monitor: remove duplicated free space calculation
  • e06551f storagenode/hashstore: Change build directive
  • 67c3e6c storagenode/hashstore: fix fsck on zero size files

Test

  • b94e77c private/testplanet: simplify user creation
  • 07fe1d9 private/teststorj: remove package
  • 58f0981 testsuite/playwright-ui: cover multi-tenancy functionality
  • 6c1ec16 {Jenkinsfile/testsuite}: bump postgres to v17

What about fsck?


fsck was mentioned twice:

  • 9a0ebaa storagenode/hashstore: basic informational fsck
  • 67c3e6c storagenode/hashstore: fix fsck on zero size files

this is great! I hope it will make nodes more reliable unattended.

Storagenodes don’t do OS-level fsync (for performance reasons), which means that an ungraceful shutdown can cause a very small percentage of data loss on the SN: the last written pieces may or may not have been written to disk. It’s not a full data loss (thanks to the RS replication), but it is still annoying that the node doesn’t have a piece which is supposed to be there, and in the worst case it can cause DQ, or corrupted log files can block SN startup.

The new version will check the tail of the log files during startup and compare it with the metabase to check whether they are in sync.

This can be very slow if all log files have to be checked; therefore not all files are checked, just the ones which were open.

Since v1.142, storagenodes generate hint files, so they should be there on (almost) all storagenodes in the meta directory (example location: /config/storage/hashstore/12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S/s0/meta)

Example hint file:

cat hint-000000000000004c | head
largest: 000000000000d555
writable: 000000000000d1a0
writable: 0000000000007ab6
writable: 000000000000d113
writable: 000000000000c4ed
writable: 000000000000ce8b
writable: 000000000000d3a2
writable: 000000000000ce8a
writable: 000000000000d517
writable: 000000000000d539

This means that all log files newer than d555 should be checked, plus all others which were open (writable). That is still only a few.

After a graceful shutdown, the hint file should contain only one line (largest). If the node is down and writable lines are there, it was not a graceful shutdown.
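
To make the rule above concrete, here is a rough sketch of how a hint file like the example could be interpreted. It is only an illustration based on the sample output shown above, not the actual hashstore code: the function names are made up, and treating the IDs as hexadecimal numbers is an assumption drawn from how they look.

```python
def parse_hint(text):
    """Parse hint-file text into (largest, writable) log IDs.

    Assumption: each line is "<key>: <id>" and IDs are hexadecimal,
    as the example hint file suggests.
    """
    largest = None
    writable = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        key, _, value = line.partition(":")
        log_id = int(value.strip(), 16)
        if key == "largest":
            largest = log_id
        elif key == "writable":
            writable.append(log_id)
    return largest, writable


def logs_to_check(largest, writable, existing_ids):
    """Return the log IDs that would need checking at startup."""
    if largest is None:
        # No usable hint file: every log file has to be checked.
        return sorted(existing_ids)
    newer = {i for i in existing_ids if i > largest}
    return sorted(newer | set(writable))


def was_graceful(writable):
    """A hint file with no 'writable' lines indicates a graceful shutdown."""
    return not writable


HINT = """largest: 000000000000d555
writable: 000000000000d1a0
writable: 0000000000007ab6
"""

largest, writable = parse_hint(HINT)
existing = [0x7AB6, 0xD1A0, 0xD555, 0xD556]  # hypothetical log IDs on disk
print([format(i, "x") for i in logs_to_check(largest, writable, existing)])
print("graceful:", was_graceful(writable))
```

With the hypothetical IDs above, only the two writable logs and the one log newer than d555 are selected, which is the "still only a few" case.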

Without the (automatically generated) hint files, the startup of a node can be very slow, because all log files have to be checked. That should happen only if you upgrade from <v1.142 to v1.146.

Obviously, a malicious SNO can disable this fsck with a code change, but it’s better to have it. Fsck will report the lost pieces to the satellite, and in exchange for this information the satellite will not punish the node for them with a lower audit score (as long as their number is acceptable; don’t make a hobby of losing data). This is called amnesty in the protocol, but the satellite-side processing is still taking shape.

Still, the best option is a graceful shutdown, after which startup should be as fast as possible.

The hashstore opened line also includes information about fsck:

storagenode1729  | {"L":"INFO","T":"2026-01-30T10:10:36Z","N":"piecestore:hashstorebackend","M":"hashstore opened successfully","satellite":"12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S","open_time":"31.32114617s","logs_skipped":10643,"logs_matched":0,"logs_mismatched":0}

logs_skipped is the best case: those logs are not checked by fsck because they were properly closed. logs_matched counts logs that were not properly closed but where no data loss was detected. logs_mismatched means pieces disappeared due to a bad shutdown.
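
If you want to watch for this in your own logs, a small script along these lines could pull the three counters out of the line shown above. It is a sketch that assumes the log stays JSON with exactly these field names and message text; the function names and the verdict wording are mine, not something the storagenode prints.

```python
import json

SAMPLE = ('storagenode1729  | {"L":"INFO","T":"2026-01-30T10:10:36Z",'
          '"N":"piecestore:hashstorebackend","M":"hashstore opened successfully",'
          '"satellite":"12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S",'
          '"open_time":"31.32114617s","logs_skipped":10643,'
          '"logs_matched":0,"logs_mismatched":0}')


def parse_open_line(line):
    """Return the JSON record of a 'hashstore opened' line, or None."""
    start = line.find("{")  # skip any container prefix before the JSON
    if start < 0:
        return None
    record = json.loads(line[start:])
    if record.get("M") != "hashstore opened successfully":
        return None
    return record


def summarize(record):
    """Classify the startup check based on the three fsck counters."""
    skipped = record.get("logs_skipped", 0)
    matched = record.get("logs_matched", 0)
    mismatched = record.get("logs_mismatched", 0)
    if mismatched > 0:
        verdict = "pieces lost"
    elif matched > 0:
        verdict = "unclean shutdown, no loss detected"
    else:
        verdict = "clean"
    return {"skipped": skipped, "matched": matched,
            "mismatched": mismatched, "verdict": verdict}


print(summarize(parse_open_line(SAMPLE)))
```

For the sample line, all 10643 logs were skipped and nothing was matched or mismatched, so the sketch classifies it as clean.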


And what will logs_mismatched do to a node? Will it prevent the node from starting?

It shouldn’t, but it should report it:

If hashstore had been running a year in Select environments… why are the fsck+amnesty features only being added now? Is it that the Select gear is more reliable (so the features weren’t needed)… and the public network experiences way more disk-errors/unclean-shutdowns?

Of course. That’s the same reason the original database-based implementation failed: people run nodes on rotten consumer-grade potatoes without UPSes. That’s why it was switched to using strictly filesystem features with sync on by default. That was piecestore.

Hashstore is an optimization of a specific hot path only high bandwidth nodes in select network saw, at the expense of re-inventing of an albeit simple database. Performance goes up, resilience to potatoes goes down. Very much expected.

I’m amazed how people manage to reboot and crash stuff all the time. Just leave the server alone.


Nooo! People have to toy and tinker and optimize all the time, you see! What do you mean, leave the “server” alone, when users run Raspberry Pi with Sata “HATs”, running on some 20 year old bronze rated PSU, cables everywhere built in a plastic food container? What do you mean “leave the ‘server’ alone”, if my four gigabytes of RAM are constantly over congesting my 20000 nodes on under sized non-reliable consumer hardware makes me swap into RAM at all times? I need to reboot every 36 hours to prevent crashes, you see???


Obligatory /s. But some people really are like this. You, and your stubborn ways, really are the way to go. I’m migrating to ZFS as we speak; it’s server hardware, commodity disks, high-end power delivery, all housed in a metal get-your-fingers-away chassis, and I see myself morphing into you every second.


I guess the “we should add database features to the filesystem” people, and the “we should add filesystem features to the database” people… will always be fighting… until balance is brought to The Force.

I’m also a fan of using ZFS for piecestore. Let the flash do flash-things, and the HDDs just handle bulk bits. As ${DEITY} intended.


While I agree with @arrogantrabbit’s assessment as likely, there’s also another possible factor: select node operators probably don’t mind killing broken nodes given they have different guarantees regarding utilization. So they’re not looking at each node warning and trying to fix it, and instead just get rid of a failing node and automatically set up another. For us common folks losing a single node is a big deal. For them providing petabytes of storage? Uh, not really…

BTW, at this point we know that select node operators don’t use Windows. Guess how we learned that? :rofl:

I use 1.146, and I see that this version kills metadata. I just stop the node for an update and the metadata file is gone. I already have several nodes where, after a stop, the metadata (hashtbl) just disappeared.

All my nodes are on 1.146 or 1.147 rc without any problems.

Vadim is on Windows. Maybe Windows isn’t tested enough.

I am running Windows, Ubuntu and TrueNAS nodes.

100 nodes run fine, but this is already the third node that doesn’t start after a normal restart, with the event hashtbl missing.

I am using Windows Server, so maybe the problem only exists in consumer Windows. :thinking:

I’m on 1.142.7 and relaxed :sweat_smile:


Can you share the log files from just before and after the restart?
