Release preparation v1.108

A new release candidate, v1.108, is deployed on the QA Satellite

Changelog

General

  • 406c2c3 nodeselection: support subnet filter with any bit size (/25)
  • 17c7163 shared/nodeidmap: add a generic NodeID map
  • 79f3eb3 ci: move satellite UI tests after unit tests
  • 8c2d8c0 release v1.108.1

Multinode

  • 28a1e52 web/multinode: implement table sorting (#6974)

Satellite

  • dcfc2ab web/satellite: fix no passphrase bug
  • 09805a4 web/satellite: improve account setup dialog code
  • 697225a satellite/satellitedb/dbx: add spanner support
  • a2f8961 satellite/{web,console}: add config flag for new limits UI
  • 247e22d web/satellite: improve limit update
  • 8f9f7ff satellite/{buckets,metainfo}: don’t suspend Object Lock bucket versioning
  • 1de87c3 satellite/{console, db}: additional actions on account delete
  • c28ebac satellite/payments: remove free trial feature flag
  • 9f2ab74 satellite/console: extend account freeze to affect op specific limits
  • e5bb367 satellite/console: attempt payment on card added
  • 2f89fd6 satellite/nodeselection: dual selector
  • 0daae24 web/satellite: fix error happening on logout
  • 49d2c92 web/satellite: fix text wrapping for manage passphrase dialog
  • cd7e49a satellite/metabase: exempt Postgres from UpdateTableStats test
  • 24fc2f3 web/satellite: fix delete notification
  • c0cf0db satellite/metabase: alias cache, only fetch missing nodes
  • 4a7d284 satellite/console: fix flaky account freeze tests
  • d6689fe web/satellite: fix limits behaviour
  • d8103ae web/satellite: optimize file delete
  • 753b045 satellite/{console,web}: support disabling satellite managed encryption
  • 12e13b0 satellite/metabase: use generic nodealiasmap, fix alias lookup
  • b2b9c3f web/satellite: fix swapped columns in versioned objects
  • 85b3ff1 satellite/nodeselection/selector_test: increase test delta epsilon
  • 8fc91a1 satellite/payments: update invoicing logic to handle accounts marked for deletion
  • 9106714 satellite/repair: even more logging
  • a9f888f web/satellite: add account type selection to account setup dialog
  • cb9e5b5 web/satellite: update managed passphrase step
  • 6d660b2 web/satellite: ui improvements
  • e0ab857 satellite/metabase: update GetTableStats for postgresql
  • 9b6d39d satellite/satellitedb/dbx: pull in dbx changes
  • fb9a0a2 satellite/repair: instrumentation around queue insert
  • a538566 satellite/satellitedb/dbx: add DriverMethods
  • 56b7948 satellite/{admin, db}: new endpoint for downloading CSV with user emails marked for deletion
  • 8265e00 satellite/admin: force delete projects of the users which were requested for deletion
  • 5a30a1d satellite/satellitedb: fix ApiKeys methods
  • 9f24535 satellite/satellitedb: fix billingDB use tx instead of db
  • 3cb6625 satellite/satellitedb: ensure consoledb tables use tx
  • 0681218 satellite/audit/verifier: Improve code comment
  • b9219e7 satellite/satellitedb: stripe customers, avoid leaking the underlying implementation
  • 3ad4a84 satellite/satellitedb: users, ensure code works in a tx
  • 0efc3f1 satellite/satellitedb: add tx and db mixing check
  • d8f9425 satellite/console: avoid a tx in apikeys
  • c02ec50 satellite/console: send email when user changes password
  • 91cdfe8 web/satellite: hide encryption notice for Storj managed projects
  • 02306e3 satellite/kms: add support for multiple kms keys
  • fe6ee95 web/satellite: improve exceed bandwidth limit error handling
  • 311f938 satellite/*: fix things next staticcheck noticed
  • b148438 satellite/satellitedb: Change meter by counter audit queues
  • d18201b satellite/metainfo: big bitshifttracker (bitshift tracker with variable size)
  • 252bc5a web/satellite: fix setup account flow

Storagenode

  • 2fceb6c storagenode/blobstore: blobstore with caching file stat information (mod time, size)
  • 9e99540 storagenode/retain: reduce concurrent retain requests to 1
  • 30f80af storagenode/storagenodedb: buffer up GC Filewalker progress storage
  • ea083c0 storagenode/orders: avoid Lstat calls on order files when listing
  • 845efc6 storagenode/pieces: collector delete should update usage cache
  • 32456d3 storagenode/collector: test used space updating

https://github.com/storj/storj/commit/406c2c3 returns a 404 error

I have high hopes for this one. And not a bunch of forum posts about “BadgerDB corrupted log entries are preventing my node from starting”. :crossed_fingers:

Will the file stat info db be stored alongside the other databases? Meaning, is there any other config change required by the SNOs if the DBs are already on SSD?

I’ve enabled pieces.file-stat-cache: badger in the config and spun the node up with the lazy filewalker disabled. So far there is a new filestatcache directory in the storage directory, where badger creates a 2 GB .vlog file. The actual on-disk usage a few minutes after the restart is on the order of KB, possibly due to ZFS compression.

To be clear, the filestatcache directory is also stored on the HDD (i.e. it follows the storage.path config option, as opposed to storage2.database-dir). The IOPS don’t seem significant even when triggering the used-space filewalker; inotifywait sees an entry maybe every 20-30 seconds. If you are memory constrained, that may not be the case.

Once the used-space filewalker completes I will report what I see for on-disk space.
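
For reference, the relevant config.yaml entries would look roughly like the sketch below. The pieces.file-stat-cache value is the one described above; the lazy-filewalker option name is an assumption on my part, so verify both against the output of storagenode setup --help for your version.

    # config.yaml sketch (option names assumed; check "storagenode setup --help")

    # enable the badger-backed file stat cache; it creates <storage.path>/filestatcache/
    pieces.file-stat-cache: badger

    # run the filewalkers in-process instead of in the lazy subprocess
    pieces.enable-lazy-filewalker: false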

Yeah. I really don’t like that.

It’s the job of a filesystem, not software. Software shall not reimplement system services; it just adds complexity and points of failure. It’s called feature creep and bloat. It’s a move in the opposite direction. We need fewer databases and lower complexity on the node, not the other way around.

I’m disappointed this was approved.

Thankfully this will only be active if the lazy filewalker is disabled (from a brief glance; I might be wrong).

And if the filesystem doesn’t do it? And no system service covers it? Doesn’t the feature then fall back to the app to implement? If it works, Storj has something cross-platform and filesystem-agnostic that speeds up filewalker housekeeping.

If it doesn’t work, then it gets cut.

I’m willing to try it: it’s a reasonable path forward. Even if ZFS smokes it :wink:

Thanks for trying it out.

I don’t quite understand why the filestatcache directory is located on the storage path. I would expect it on the database path (SSD) to reduce I/O on the HDD.

Seems hardcoded: storagenode/blobstore: blobstore with caching file stat information (… · storj/storj@2fceb6c · GitHub

Maybe this will change in the future. If not, a directory symlink to the SSD should help here. Could you also try this, @Ambifacient? Thanks a lot.

Yeah: BadgerDB says it’s “Designed for SSDs”… so defaulting to a path that’s almost certainly not an SSD is a strange choice. Perhaps because of its size?

When using docker, in the worst case you can provide a bind mount for /app/config/storage/filestatcache/ from a place on your SSD.
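
Something along these lines should do it. This is only a minimal docker-compose sketch of that bind-mount idea, with placeholder paths, and the usual storagenode environment variables and mounts (identity, wallet, address, and so on) left out:

    # docker-compose sketch (placeholder paths; the usual storagenode settings are omitted)
    services:
      storagenode:
        image: storjlabs/storagenode:latest
        volumes:
          - /mnt/hdd/storagenode:/app/config                           # existing data location
          - /mnt/ssd/filestatcache:/app/config/storage/filestatcache   # badger cache on the SSD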

Maybe Storj can make it configurable. Today, for example, I use an NVMe drive for the DBs; I could put the cache there too and then it would really work fast.

No, that means the system is not suitable for running that service.

I see it differently. This feature will only be useful on underpowered, poorly sized rotten potatoes with crappy filesystems. The same potatoes that suffer from database corruption, resource exhaustion, and general poor stability. The same potatoes that would greatly benefit from reduced complexity and the elimination of existing databases, not the inclusion of new ones.

Adding features like these is a slippery slope. In the extreme, storagenode will eventually end up supporting EFI secure boot and running a reinvented ZFS. It will be called StorjOS, and Storj as a company will switch to manufacturing SANs.

Instead, there shall be a hard line: minimum system requirements, per OS, per filesystem.

There is absolutely no reason to waste development, QA, and maintenance time on an issue that is solvable with a slightly larger memory stick, or a cheap SSD from eBay. Or, better yet, a line in a minimum system requirements text file.

With the anticipated increase in load, Storj does not need to be in the business of sustaining zombies on life support. It should be attracting hardware that does not need crutches to fetch files from the disk. There is a surplus of node operators; Storj can safely lose the Raspberry Pis.

The problem is that it will create an illusion of working: it will actually work, but do more harm than good in the long run, in the form of decreased stability (especially on the weak nodes that benefit the most from this feature) and of the time investment for support. It’s just wasting resources to delay the inevitable. It’s a poor use of everyone’s time.

Yet it only seems that way in isolation; it does not fit into the whole picture.

Out of curiosity, what do you think of Ceph’s BlueStore? Ceph has migrated to this solution (object storage directly on top of block devices) from a filesystem-based approach, citing performance issues.

Doesn’t BlueStore replace NTFS/ext4, sitting right on the devices? That doesn’t really fit the “use what you have” mantra if you wipe your existing filesystems…

Storj just wants to work with everyone and not end up like Filecoin with its insane requirements, but that really isn’t scalable in the real world.

According to @elek this is what you get with ext4:

As ext4 is the default for most Linux distributions, I’d say it’s the implementation that is not suited to the system. That’s why the implementation needs to be adapted.

I don’t think that is true either. Looking at this issue:

The filewalkers are a problem on the select nodes as well, and those should be datacenter-grade nodes.
What Storj could maybe do is make system recommendations for individual node operators that handle the load better than the default installations. For select nodes they might be able to mandate specific hardware or software setups in the datacenters, like telling them the nodes must run on ZFS with an SSD metadata cache of at least 1 GB per TB stored. A datacenter should be able to follow such a requirement.

@Mad_Max thanks again for your extensive investigations and reports. This is just great.

This begs the question: why does the node need size and modification time for what are supposed to be immutable objects?

And if the answer is “to show pretty pictures in the dashboard”, this only reinforces my earlier conviction that the dashboard must go. The satellite knows how much data is stored on the node. It knows the limit. Calculating sizes locally is counterproductive.

I’m not familiar with it; I’ll have a look.

We are talking about tens of terabytes of space. I doubt people who have that much space available as pocket change to share with Storj are running NTFS or ext4. And on smaller nodes even NTFS works OK.

Please do not use symlinks of any type. You will likely end up disqualified; this has been proven many times.