Release preparation v1.108

A new release candidate, v1.108, is deployed on the QA Satellite

Changelog

General

  • 406c2c3 nodeselection: support subnet filter with any bit size (/25)
  • 17c7163 shared/nodeidmap: add a generic NodeID map
  • 79f3eb3 ci: move satellite UI tests after unit tests
  • 8c2d8c0 release v1.108.1

Multinode

  • 28a1e52 web/multinode: implement table sorting (#6974)

Satellite

  • dcfc2ab web/satellite: fix no passphrase bug
  • 09805a4 web/satellite: improve account setup dialog code
  • 697225a satellite/satellitedb/dbx: add spanner support
  • a2f8961 satellite/{web,console}: add config flag for new limits UI
  • 247e22d web/satellite: improve limit update
  • 8f9f7ff satellite/{buckets,metainfo}: don’t suspend Object Lock bucket versioning
  • 1de87c3 satellite/{console, db}: additional actions on account delete
  • c28ebac satellite/payments: remove free trial feature flag
  • 9f2ab74 satellite/console: extend account freeze to affect op specific limits
  • e5bb367 satellite/console: attempt payment on card added
  • 2f89fd6 satellite/nodeselection: dual selector
  • 0daae24 web/satellite: fix error happening on logout
  • 49d2c92 web/satellite: fix text wrapping for manage passphrase dialog
  • cd7e49a satellite/metabase: exempt Postgres from UpdateTableStats test
  • 24fc2f3 web/satellite: fix delete notification
  • c0cf0db satellite/metabase: alias cache, only fetch missing nodes
  • 4a7d284 satellite/console: fix flaky account freeze tests
  • d6689fe web/satellite: fix limits behaviour
  • d8103ae web/satellite: optimize file delete
  • 753b045 satellite/{console,web}: support disabling satellite managed encryption
  • 12e13b0 satellite/metabase: use generic nodealiasmap, fix alias lookup
  • b2b9c3f web/satellite: fix swapped columns in versioned objects
  • 85b3ff1 satellite/nodeselection/selector_test: increase test delta epsilon
  • 8fc91a1 satellite/payments: update invoicing logic to handle accounts marked for deletion
  • 9106714 satellite/repair: even more logging
  • a9f888f web/satellite: add account type selection to account setup dialog
  • cb9e5b5 web/satellite: update managed passphrase step
  • 6d660b2 web/satellite: ui improvements
  • e0ab857 satellite/metabase: update GetTableStats for postgresql
  • 9b6d39d satellite/satellitedb/dbx: pull in dbx changes
  • fb9a0a2 satellite/repair: instrumentation around queue insert
  • a538566 satellite/satellitedb/dbx: add DriverMethods
  • 56b7948 satellite/{admin, db}: new endpoint for downloading CSV with user emails marked for deletion
  • 8265e00 satellite/admin: force delete projects of the users which were requested for deletion
  • 5a30a1d satellite/satellitedb: fix ApiKeys methods
  • 9f24535 satellite/satellitedb: fix billingDB use tx instead of db
  • 3cb6625 satellite/satellitedb: ensure consoledb tables use tx
  • 0681218 satellite/audit/verifier: Improve code comment
  • b9219e7 satellite/satellitedb: stripe customers, avoid leaking the underlying implementation
  • 3ad4a84 satellite/satellitedb: users, ensure code works in a tx
  • 0efc3f1 satellite/satellitedb: add tx and db mixing check
  • d8f9425 satellite/console: avoid a tx in apikeys
  • c02ec50 satellite/console: send email when user changes password
  • 91cdfe8 web/satellite: hide encryption notice for Storj managed projects
  • 02306e3 satellite/kms: add support for multiple kms keys
  • fe6ee95 web/satellite: improve exceed bandwidth limit error handling
  • 311f938 satellite/*: fix things next staticcheck noticed
  • b148438 satellite/satellitedb: Change meter by counter audit queues
  • d18201b satellite/metainfo: big bitshifttracker (bitshift tracker with variable size)
  • 252bc5a web/satellite: fix setup account flow

Storagenode

  • 2fceb6c storagenode/blobstore: blobstore with caching file stat information (mod time, size)
  • 9e99540 storagenode/retain: reduce concurrent retain requests to 1
  • 30f80af storagenode/storagenodedb: buffer up GC Filewalker progress storage
  • ea083c0 storagenode/orders: avoid Lstat calls on order files when listing
  • 845efc6 storagenode/pieces: collector delete should update usage cache
  • 32456d3 storagenode/collector: test used space updating

https://github.com/storj/storj/commit/406c2c3 returns a 404 error

I have high hopes for this one. And not a bunch of forum posts about “BadgerDB corrupted log entries are preventing my node from starting”. :crossed_fingers:

Will the file stat info db be stored alongside the other databases? Meaning, is there any other config change required by the SNOs if the DBs are already on SSD?

I’ve enabled pieces.file-stat-cache: badger in the config and spun the node up with the lazy filewalker disabled. So far there is a new filestatcache directory in the storage directory, where badger creates a 2 GB .vlog file. The actual on-disk usage a few minutes after the restart is on the order of KB, possibly due to ZFS compression.

To be clear, the filestatcache directory is also stored on the HDD (i.e. it follows the storage.path config option, as opposed to storage2.database-dir). The IOPS don’t seem significant even when triggering the used-space filewalker; inotifywait sees an entry maybe every 20-30 seconds. If you are memory constrained, that may not be the case.

Once the used-space filewalker completes I will report what I see for on-disk space.
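
For reference, the relevant config.yaml entries would look roughly like the sketch below. The pieces.file-stat-cache value is the one described above; the lazy-filewalker option name is an assumption on my part, so verify both against the output of storagenode setup --help for your version.

    # config.yaml sketch (option names assumed; check "storagenode setup --help")

    # enable the badger-backed file stat cache; it creates <storage.path>/filestatcache/
    pieces.file-stat-cache: badger

    # run the filewalkers in-process instead of in the lazy subprocess
    pieces.enable-lazy-filewalker: false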

Yeah. I really don’t like that.

It’s the job of a filesystem, not software. Software shall not reimplement system services; it just adds complexity and points of failure. It’s called feature creep and bloat. It’s a move in the opposite direction. We need fewer databases and lower complexity on the node, not the other way around.

I’m disappointed this was approved.

Thankfully this will only be active if the lazy filewalker is disabled (from a brief glance; I might be wrong).

And if the filesystem doesn’t do it? And no system service covers it? Doesn’t the feature then fall back to the app to implement? If it works, Storj has something cross-platform and filesystem-agnostic that speeds up filewalker housekeeping.

If it doesn’t work, then it gets cut.

I’m willing to try it: it’s a reasonable path forward. Even if ZFS smokes it :wink:

Thanks for trying it out.

I don’t quite understand why the filestatcache directory is located on the storage path. I would expect it on the database path (SSD) to reduce I/O on the HDD.

Seems hardcoded: storagenode/blobstore: blobstore with caching file stat information (… · storj/storj@2fceb6c · GitHub

Maybe this will change in the future. If not, a directory symlink to the SSD should help here. Could you also try this, @Ambifacient? Thanks a lot.

Yeah: BadgerDB says it’s “Designed for SSDs”… so defaulting to a path that’s almost certainly not an SSD is a strange choice. Perhaps because of its size?

When using docker, in the worst case you can provide a bind mount for /app/config/storage/filestatcache/ from a place on your SSD.
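
Something along these lines should do it. This is only a minimal docker-compose sketch of that bind-mount idea, with placeholder paths, and the usual storagenode environment variables and mounts (identity, wallet, address, and so on) left out:

    # docker-compose sketch (placeholder paths; the usual storagenode settings are omitted)
    services:
      storagenode:
        image: storjlabs/storagenode:latest
        volumes:
          - /mnt/hdd/storagenode:/app/config                           # existing data location
          - /mnt/ssd/filestatcache:/app/config/storage/filestatcache   # badger cache on the SSD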

Maybe Storj can make it configurable. Today, for example, I use an NVMe drive for the DBs; I could put the cache there too and then it would really work fast.

No, that means the system is not suitable for running that service.

I see it differently. This feature will only be useful on underpowered, poorly sized rotten potatoes with crappy filesystems. The same potatoes that suffer from database corruption, resource exhaustion, and general poor stability. The same potatoes that would greatly benefit from reduced complexity and the elimination of existing databases, not the inclusion of new ones.

Adding features like these is a slippery slope. In the extreme, storagenode will eventually end up supporting EFI secure boot and running a reinvented ZFS. It will be called StorjOS, and Storj as a company will switch to manufacturing SANs.

Instead, there shall be a hard line: minimum system requirements, per OS, per filesystem.

There is absolutely no reason to waste development, QA, and maintenance time on an issue that is solvable with a slightly larger memory stick, or a cheap SSD from eBay. Or, better yet, a line in a minimum system requirements text file.

With the anticipated increase in load, Storj does not need to be in the business of sustaining zombies on life support. It should be attracting hardware that does not need crutches to fetch files from the disk. There is a surplus of node operators; Storj can safely lose the Raspberry Pis.

The problem is that it will create an illusion of working: it will actually work, but do more harm than good in the long run, in the form of decreased stability (especially on the weak nodes that benefit the most from this feature) and of the time investment for support. It’s just wasting resources to delay the inevitable. It’s a poor use of everyone’s time.

Yet it only seems that way in isolation; it does not fit into the whole picture.

Out of curiosity, what do you think of Ceph’s BlueStore? Ceph has migrated to this solution (object storage directly on top of block devices) from a filesystem-based approach, citing performance issues.

Doesn’t BlueStore replace NTFS/ext4, sitting right on the devices? That doesn’t really fit the “use what you have” mantra if you wipe your existing filesystems…

Storj just wants to work with everyone and not end up like Filecoin with its insane requirements, but that really isn’t scalable in the real world.

According to @elek this is what you get with ext4:

As ext4 is the default for most Linux distributions, I’d say it’s the implementation that is not suited to the system. That’s why the implementation needs to be adapted.

I don’t think that is true either. Looking at this issue:

The filewalkers are a problem on the select nodes as well, and those should be datacenter-grade nodes.
What Storj could maybe do is make system recommendations for individual node operators that handle the load better than the default installations. For select nodes they might be able to mandate specific hardware or software setups in the datacenters, like telling them the nodes must run on ZFS with an SSD metadata cache of at least 1 GB per TB stored. A datacenter should be able to follow such a requirement.

@Mad_Max thanks again for your extensive investigations and reports. This is just great.

This begs the question: why does the node need size and modification time for what are supposed to be immutable objects?

And if the answer is “to show pretty pictures in the dashboard”, this only reinforces my earlier conviction that the dashboard must go. The satellite knows how much data is stored on the node. It knows the limit. Calculating sizes locally is counterproductive.

I’m not familiar with it; I’ll have a look.

We are talking about tens of terabytes of space. I doubt people who have that much space available as pocket change to share with Storj are running NTFS or ext4. And on smaller nodes even NTFS works OK.

Please do not use symlinks of any type. You will likely end up disqualified; this has been proven many times.