ERROR lazyfilewalker.gc-filewalker.subprocess failed to save progress in the database

It’s too long - how do I find the right messages?
journalctl is empty.

I cannot give you an exact string to search for there. Try using the name of the process, and you only need to review the time interval when the restart happened.
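For example, something along these lines, with the time window and the process name adjusted to your actual restart:

journalctl --since "2024-06-15 06:00" --until "2024-06-15 07:00" | grep -i storagenode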

This is the requested ls --si of the databases as they exist on my HDD. I haven’t been using these since yesterday morning, when I switched them to the SSD, but these are the typical sizes I saw when Storj was running against them.

qwinn@Gungnir:/mnt/storj/storj1/storj1/storage$ ls -l --si *.db
-rw-r--r-- 1 qwinn qwinn  59M Jun 15 06:32 bandwidth.db
-rw-r--r-- 1 qwinn qwinn  25k Jun 15 02:32 garbage_collection_filewalker_progress.db
-rw-r--r-- 1 qwinn qwinn  33k Jun 15 02:32 heldamount.db
-rw-r--r-- 1 qwinn qwinn  17k Jun 15 02:32 info.db
-rw-r--r-- 1 qwinn qwinn  25k Jun 15 02:32 notifications.db
-rw-r--r-- 1 qwinn qwinn  33k Jun 15 02:32 orders.db
-rw-r--r-- 1 qwinn qwinn 893M Jun 15 06:04 piece_expiration.db
-rw-r--r-- 1 qwinn qwinn  25k Jun 15 02:32 pieceinfo.db
-rw-r--r-- 1 qwinn qwinn  25k Jun 15 02:32 piece_spaced_used.db
-rw-r--r-- 1 qwinn qwinn    0 Jun 14 15:01 piece_space_used.db
-rw-r--r-- 1 qwinn qwinn  25k Jun 15 02:32 pricing.db
-rw-r--r-- 1 qwinn qwinn  33k Jun 15 06:36 reputation.db
-rw-r--r-- 1 qwinn qwinn  33k Jun 15 02:32 satellites.db
-rw-r--r-- 1 qwinn qwinn  25k Jun 15 02:32 secret.db
-rw-r--r-- 1 qwinn qwinn 193k Jun 15 02:32 storage_usage.db
-rw-r--r-- 1 qwinn qwinn  21k Jun 15 02:32 used_serial.db
-rw-r--r-- 1 qwinn qwinn  25k Jun 15 02:32 used_space_per_prefix.db

Additional notes:

It does seem like the node has mostly stopped growing now that free space has dropped under 5GB. But not entirely.

/dev/sdf 15501595164 15496770964 4807816 100% /mnt/storj/storj1

That free space seems to be dropping by about 8 bytes every few seconds. That should last a while, but not forever, and probably not as long as it’ll take to finish the current lazy filewalker run. It’s been running since I switched the databases to the SSD around 24 hours ago and it’s at directory “dg”. At that rate it’ll take weeks just to finish the Salt Lake satellite.

I can confirm that in fact, since I switched the databases to my OS drive, I have not seen any database locked messages. As I said previously, I think this is somehow compensating for the bug, but were the bug solved, the HDD would be capable of handling this.

Of interest, I just ran the same ls --si on the SSD directory that I moved the databases to, and the size of the piece_expiration db here is already MUCH MUCH bigger.

-rw-r--r-- 1 qwinn qwinn  59M Jun 16 09:24 bandwidth.db
-rw-r--r-- 1 qwinn qwinn  25k Jun 15 07:24 garbage_collection_filewalker_progress.db
-rw-r--r-- 1 qwinn qwinn  33k Jun 16 07:26 heldamount.db
-rw-r--r-- 1 qwinn qwinn  17k Jun 15 07:24 info.db
-rw-r--r-- 1 qwinn qwinn  25k Jun 15 07:24 notifications.db
-rw-r--r-- 1 qwinn qwinn  33k Jun 15 07:24 orders.db
-rw-r--r-- 1 qwinn qwinn 1.4G Jun 16 08:54 piece_expiration.db
-rw-r--r-- 1 qwinn qwinn  25k Jun 15 07:24 pieceinfo.db
-rw-r--r-- 1 qwinn qwinn  25k Jun 15 07:24 piece_spaced_used.db
-rw-r--r-- 1 qwinn qwinn    0 Jun 15 06:49 piece_space_used.db
-rw-r--r-- 1 qwinn qwinn  25k Jun 15 07:24 pricing.db
-rw-r--r-- 1 qwinn qwinn  33k Jun 16 07:27 reputation.db
-rw-r--r-- 1 qwinn qwinn  33k Jun 15 07:25 satellites.db
-rw-r--r-- 1 qwinn qwinn  25k Jun 15 07:24 secret.db
-rw-r--r-- 1 qwinn qwinn 193k Jun 16 07:26 storage_usage.db
-rw-r--r-- 1 qwinn qwinn  21k Jun 15 07:24 used_serial.db
-rw-r--r-- 1 qwinn qwinn  25k Jun 15 07:24 used_space_per_prefix.db

That’s like nearly double the size. I don’t think it was ever able to get anywhere near that big when it was on the HDD.

Please remember that, sometimes, all the locks go against the much smaller bandwidth.db while it updates the much larger piece_expiration.db with no problem. So please don’t just decide “your HDD can’t handle a bigger db”.


As an update, the free space left on my drive does seem to have increased back up to just over 5GB, and seems to be roughly stable there now. So, OK, cool, looks like the node might not be doomed. (One does still have to wonder why it doesn’t use this way of calculating free space for the dashboard all of the time, or why we are asked to leave 10% of the HDD free at all times if it can handle being this close to the edge. The 10% free I had left (I was allocating 14.5TB on a 16TB drive) didn’t help me much at all in this scenario.)

The node dashboard is still showing that the node thinks it has 2.04TB free to play with.

As I noted above, at the current rate the filewalker is running it looks like it’ll be done in a few weeks (I am running my other 40%-CPU tasks at the moment, and I’m not really willing to shut them down again when it would still take well over a couple of days to finish anyway). I’ll try not to shut the node down again, but I very much doubt it won’t be automatically updated in that much time. I’ve turned off watchtower hoping that helps.

Perhaps we have another case of orders.db/bandwidth.db. Orders used to be stored in a SQLite database, and because it had to be inserted into on each upload, it kept locking up. This was changed to append-only log files. Then we had bandwidth.db locking up, as it was also inserted into on each upload—this was changed to keep an in-memory cache instead and only update the totals in the database periodically. Now we have piece expirations that have to be inserted on almost each upload… and we have orders of magnitude more uploads now as well.
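Out of curiosity, the number of pending expiration records can be checked directly against that database, something like this (the piece_expirations table name is my assumption from the schema, so adjust it if yours differs):

sqlite3 piece_expiration.db 'SELECT count(*) FROM piece_expirations;'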


So a funny thing just happened.

Went to have a look at my worst “potato node”. That one is running on a Pi 5 with an 18TB Exos via USB, but with a small SD card for the OS, so I moved the logs and databases to the spinning rust.

A few days ago I noticed that the daily used bandwidth in the dashboard didn’t really tally with the throughput I was seeing on the machine. I didn’t think much of it.
Today I noticed that although the dashboard shows around 8TB used, 8TB free and about 1TB of trash, a df shows that 17TB are actually in use. So there are 8TB of files unaccounted for in the dashboard.

Running a grep on my logs for “locked” showed LOADS of “database locked” errors.

So I have disabled the lazy filewalker, reduced storage2.max-concurrent-requests to 5 (so I’m not hammered), and set storage2.piece-scan-on-startup: true.
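In config.yaml that ends up looking roughly like this (the lazy-filewalker key name is from memory, so double-check it against the comments in your own config.yaml):

# run the used-space scan on startup
storage2.piece-scan-on-startup: true
# limit concurrent requests so the node isn't hammered
storage2.max-concurrent-requests: 5
# disable the lazy filewalker (key name from memory - verify)
pieces.enable-lazy-filewalker: false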

I restarted the node and it’s now running the filewalker. I expect this should take a few days.

Funny thing is, when I was messing with config.yaml I noticed that I had

db.max_open_conns: 5

Now, I cannot for the life of me remember uncommenting that parameter, and I have no idea whether it can cause “database locked” errors, although it seems plausible that it might.
And then I remembered all the problems @Qwinn is having on a system so highly specced that “database locked” errors really shouldn’t be happening.
Could that setting have something to do with that?

I’ve got my databases on an NVMe Gen 4 SSD, and I got curious. So I did this:

root@VM-HOST:/var/lib/lxc# for i in STORJ[1-9]*; do echo $'\n\n'$i; lxc-attach $i -- docker logs storagenode 2> /dev/null | grep locked; done


STORJ10
2024-06-16T06:48:42Z    ERROR   orders  failed to add bandwidth usage   {"Process": "storagenode", "satellite ID": "1wFTAgs9DP5RSnCqKV1eLf6N9wtk4EAtmN5DpSxcs8EjT69tGE", "action": "GET_AUDIT", "amount": 2304, "error": "bandwidthdb: database is locked", "errorVerbose": "bandwidthdb: database is locked\n\tstorj.io/storj/storagenode/storagenodedb.(*bandwidthDB).Add:76\n\tstorj.io/storj/storagenode/orders.(*Service).SendOrders.func2:249\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:78"}
2024-06-16T06:48:42Z    ERROR   orders  failed to add bandwidth usage   {"Process": "storagenode", "satellite ID": "121RTSDpyNZVcEU84Ticf2L1ntiuUimbWgfATz21tuvgk3vzoA6", "action": "GET", "amount": 187136, "error": "bandwidthdb: database is locked", "errorVerbose": "bandwidthdb: database is locked\n\tstorj.io/storj/storagenode/storagenodedb.(*bandwidthDB).Add:76\n\tstorj.io/storj/storagenode/orders.(*Service).SendOrders.func2:249\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:78"}
2024-06-16T06:48:42Z    ERROR   orders  failed to add bandwidth usage   {"Process": "storagenode", "satellite ID": "12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S", "action": "GET_AUDIT", "amount": 2304, "error": "bandwidthdb: database is locked", "errorVerbose": "bandwidthdb: database is locked\n\tstorj.io/storj/storagenode/storagenodedb.(*bandwidthDB).Add:76\n\tstorj.io/storj/storagenode/orders.(*Service).SendOrders.func2:249\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:78"}


STORJ11
2024-06-15T16:14:05Z    ERROR   orders  failed to add bandwidth usage   {"Process": "storagenode", "satellite ID": "12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S", "action": "GET_AUDIT", "amount": 768, "error": "bandwidthdb: database is locked", "errorVerbose": "bandwidthdb: database is locked\n\tstorj.io/storj/storagenode/storagenodedb.(*bandwidthDB).Add:76\n\tstorj.io/storj/storagenode/orders.(*Service).SendOrders.func2:249\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:78"}
2024-06-15T16:14:05Z    ERROR   orders  failed to add bandwidth usage   {"Process": "storagenode", "satellite ID": "121RTSDpyNZVcEU84Ticf2L1ntiuUimbWgfATz21tuvgk3vzoA6", "action": "GET", "amount": 181248, "error": "bandwidthdb: database is locked", "errorVerbose": "bandwidthdb: database is locked\n\tstorj.io/storj/storagenode/storagenodedb.(*bandwidthDB).Add:76\n\tstorj.io/storj/storagenode/orders.(*Service).SendOrders.func2:249\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:78"}
2024-06-15T16:14:15Z    ERROR   orders  failed to add bandwidth usage   {"Process": "storagenode", "satellite ID": "12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S", "action": "GET", "amount": 4707338, "error": "bandwidthdb: database is locked", "errorVerbose": "bandwidthdb: database is locked\n\tstorj.io/storj/storagenode/storagenodedb.(*bandwidthDB).Add:76\n\tstorj.io/storj/storagenode/orders.(*Service).SendOrders.func2:249\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:78"}
2024-06-15T16:14:15Z    ERROR   orders  failed to add bandwidth usage   {"Process": "storagenode", "satellite ID": "121RTSDpyNZVcEU84Ticf2L1ntiuUimbWgfATz21tuvgk3vzoA6", "action": "GET_AUDIT", "amount": 256, "error": "bandwidthdb: database is locked", "errorVerbose": "bandwidthdb: database is locked\n\tstorj.io/storj/storagenode/storagenodedb.(*bandwidthDB).Add:76\n\tstorj.io/storj/storagenode/orders.(*Service).SendOrders.func2:249\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:78"}
2024-06-15T16:14:25Z    ERROR   orders  failed to add bandwidth usage   {"Process": "storagenode", "satellite ID": "12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S", "action": "GET_REPAIR", "amount": 102144, "error": "bandwidthdb: database is locked", "errorVerbose": "bandwidthdb: database is locked\n\tstorj.io/storj/storagenode/storagenodedb.(*bandwidthDB).Add:76\n\tstorj.io/storj/storagenode/orders.(*Service).SendOrders.func2:249\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:78"}
2024-06-15T16:14:35Z    ERROR   orders  failed to add bandwidth usage   {"Process": "storagenode", "satellite ID": "12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S", "action": "PUT", "amount": 10, "error": "bandwidthdb: database is locked", "errorVerbose": "bandwidthdb: database is locked\n\tstorj.io/storj/storagenode/storagenodedb.(*bandwidthDB).Add:76\n\tstorj.io/storj/storagenode/orders.(*Service).SendOrders.func2:249\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:78"}
2024-06-15T16:51:39Z    ERROR   services        unexpected shutdown of a runner                                        {"Process": "storagenode", "name": "forgetsatellite:chore", "error": "database is locked"}
2024-06-15T17:07:00Z    ERROR   failure during run      {"Process": "storagenode", "error": "database is locked"}
Error: database is locked
2024-06-15T18:23:52Z    ERROR   gracefulexit:chore      error retrieving satellites.                                   {"Process": "storagenode", "error": "satellitesdb: database is locked", "errorVerbose": "satellitesdb: database is locked\n\tstorj.io/storj/storagenode/storagenodedb.(*satellitesDB).ListGracefulExits:197\n\tstorj.io/storj/storagenode/gracefulexit.(*Service).ListPendingExits:59\n\tstorj.io/storj/storagenode/gracefulexit.(*Chore).AddMissing:55\n\tstorj.io/common/sync2.(*Cycle).Run:160\n\tstorj.io/storj/storagenode/gracefulexit.(*Chore).Run:48\n\tstorj.io/storj/private/lifecycle.(*Group).Run.func2.1:87\n\truntime/pprof.Do:51\n\tstorj.io/storj/private/lifecycle.(*Group).Run.func2:86\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:78"}
2024-06-15T18:24:52Z    ERROR   gracefulexit:chore      error retrieving satellites.                                   {"Process": "storagenode", "error": "satellitesdb: database is locked", "errorVerbose": "satellitesdb: database is locked\n\tstorj.io/storj/storagenode/storagenodedb.(*satellitesDB).ListGracefulExits:197\n\tstorj.io/storj/storagenode/gracefulexit.(*Service).ListPendingExits:59\n\tstorj.io/storj/storagenode/gracefulexit.(*Chore).AddMissing:55\n\tstorj.io/common/sync2.(*Cycle).Run:160\n\tstorj.io/storj/storagenode/gracefulexit.(*Chore).Run:48\n\tstorj.io/storj/private/lifecycle.(*Group).Run.func2.1:87\n\truntime/pprof.Do:51\n\tstorj.io/storj/private/lifecycle.(*Group).Run.func2:86\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:78"}
2024-06-15T20:08:15Z    ERROR   orders  failed to add bandwidth usage   {"Process": "storagenode", "satellite ID": "12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S", "action": "GET", "amount": 7413514, "error": "bandwidthdb: database is locked", "errorVerbose": "bandwidthdb: database is locked\n\tstorj.io/storj/storagenode/storagenodedb.(*bandwidthDB).Add:76\n\tstorj.io/storj/storagenode/orders.(*Service).SendOrders.func2:249\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:78"}
2024-06-15T20:08:15Z    ERROR   orders  failed to add bandwidth usage   {"Process": "storagenode", "satellite ID": "121RTSDpyNZVcEU84Ticf2L1ntiuUimbWgfATz21tuvgk3vzoA6", "action": "GET_AUDIT", "amount": 256, "error": "bandwidthdb: database is locked", "errorVerbose": "bandwidthdb: database is locked\n\tstorj.io/storj/storagenode/storagenodedb.(*bandwidthDB).Add:76\n\tstorj.io/storj/storagenode/orders.(*Service).SendOrders.func2:249\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:78"}
2024-06-15T20:08:16Z    ERROR   orders  failed to add bandwidth usage   {"Process": "storagenode", "satellite ID": "1wFTAgs9DP5RSnCqKV1eLf6N9wtk4EAtmN5DpSxcs8EjT69tGE", "action": "PUT", "amount": 627310592, "error": "bandwidthdb: database is locked", "errorVerbose": "bandwidthdb: database is locked\n\tstorj.io/storj/storagenode/storagenodedb.(*bandwidthDB).Add:76\n\tstorj.io/storj/storagenode/orders.(*Service).SendOrders.func2:249\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:78"}
2024-06-15T20:08:25Z    ERROR   orders  failed to add bandwidth usage   {"Process": "storagenode", "satellite ID": "12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S", "action": "GET_AUDIT", "amount": 512, "error": "bandwidthdb: database is locked", "errorVerbose": "bandwidthdb: database is locked\n\tstorj.io/storj/storagenode/storagenodedb.(*bandwidthDB).Add:76\n\tstorj.io/storj/storagenode/orders.(*Service).SendOrders.func2:249\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:78"}
2024-06-15T20:08:26Z    ERROR   orders  failed to add bandwidth usage   {"Process": "storagenode", "satellite ID": "1wFTAgs9DP5RSnCqKV1eLf6N9wtk4EAtmN5DpSxcs8EjT69tGE", "action": "GET_AUDIT", "amount": 256, "error": "bandwidthdb: database is locked", "errorVerbose": "bandwidthdb: database is locked\n\tstorj.io/storj/storagenode/storagenodedb.(*bandwidthDB).Add:76\n\tstorj.io/storj/storagenode/orders.(*Service).SendOrders.func2:249\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:78"}


STORJ16
2024-06-15T20:08:42Z    ERROR   orders  failed to add bandwidth usage   {"Process": "storagenode", "satellite ID": "121RTSDpyNZVcEU84Ticf2L1ntiuUimbWgfATz21tuvgk3vzoA6", "action": "GET_AUDIT", "amount": 512, "error": "bandwidthdb: database is locked", "errorVerbose": "bandwidthdb: database is locked\n\tstorj.io/storj/storagenode/storagenodedb.(*bandwidthDB).Add:76\n\tstorj.io/storj/storagenode/orders.(*Service).SendOrders.func2:249\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:78"}
2024-06-15T20:08:42Z    ERROR   orders  failed to add bandwidth usage   {"Process": "storagenode", "satellite ID": "12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S", "action": "GET_REPAIR", "amount": 43595008, "error": "bandwidthdb: database is locked", "errorVerbose": "bandwidthdb: database is locked"

(...)

STORJ22
2024-06-15T15:11:39Z    ERROR   orders  failed to add bandwidth usage   {"Process": "storagenode", "satellite ID": "121RTSDpyNZVcEU84Ticf2L1ntiuUimbWgfATz21tuvgk3vzoA6", "action": "PUT", "amount": 7482368, "error": "bandwidthdb: database is locked", "errorVerbose": "bandwidthdb: database is locked\n\tstorj.io/storj/storagenode/storagenodedb.(*bandwidthDB).Add:76\n\tstorj.io/storj/storagenode/orders.(*Service).SendOrders.func2:249\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:78"}
2024-06-15T15:11:40Z    ERROR   orders  failed to add bandwidth usage   {"Process": "storagenode", "satellite ID": "1wFTAgs9DP5RSnCqKV1eLf6N9wtk4EAtmN5DpSxcs8EjT69tGE", "action": "PUT", "amount": 475791360, "error": "bandwidthdb: database is locked", "errorVerbose": "bandwidthdb: database is locked\n\tstorj.io/storj/storagenode/storagenodedb.(*bandwidthDB).Add:76\n\tstorj.io/storj/storagenode/orders.(*Service).SendOrders.func2:249\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:78"}
2024-06-15T16:11:43Z    ERROR   orders  failed to add bandwidth usage   {"Process": "storagenode", "satellite ID": "121RTSDpyNZVcEU84Ticf2L1ntiuUimbWgfATz21tuvgk3vzoA6", "action": "GET", "amount": 362496, "error": "bandwidthdb: database is locked", "errorVerbose": "bandwidthdb: database is locked\n\tstorj.io/storj/storagenode/storagenodedb.(*bandwidthDB).Add:76\n\tstorj.io/storj/storagenode/orders.(*Service).SendOrders.func2:249\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:78"}
2024-06-15T16:11:43Z    ERROR   orders  failed to add bandwidth usage   {"Process": "storagenode", "satellite ID": "12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S", "action": "GET_AUDIT", "amount": 512, "error": "bandwidthdb: database is locked", "errorVerbose": "bandwidthdb: database is locked\n\tstorj.io/storj/storagenode/storagenodedb.(*bandwidthDB).Add:76\n\tstorj.io/storj/storagenode/orders.(*Service).SendOrders.func2:249\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:78"}
2024-06-15T16:11:53Z    ERROR   orders  failed to add bandwidth usage   {"Process": "storagenode", "satellite ID": "12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S", "action": "PUT", "amount": 10, "error": "bandwidthdb: database is locked", "errorVerbose": "bandwidthdb: database is locked\n\tstorj.io/storj/storagenode/storagenodedb.(*bandwidthDB).Add:76\n\tstorj.io/storj/storagenode/orders.(*Service).SendOrders.func2:249\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:78"}
2024-06-15T16:12:03Z    ERROR   orders  failed to add bandwidth usage   {"Process": "storagenode", "satellite ID": "12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S", "action": "GET", "amount": 8182794, "error": "bandwidthdb: database is locked", "errorVerbose": "bandwidthdb: database is locked\n\tstorj.io/storj/storagenode/storagenodedb.(*bandwidthDB).Add:76\n\tstorj.io/storj/storagenode/orders.(*Service).SendOrders.func2:249\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:78"}
2024-06-15T16:12:13Z    ERROR   orders  failed to add bandwidth usage   {"Process": "storagenode", "satellite ID": "12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S", "action": "GET_REPAIR", "amount": 3797504, "error": "bandwidthdb: database is locked", "errorVerbose": "bandwidthdb: database is locked\n\tstorj.io/storj/storagenode/storagenodedb.(*bandwidthDB).Add:76\n\tstorj.io/storj/storagenode/orders.(*Service).SendOrders.func2:249\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:78"}
2024-06-15T16:17:28Z    ERROR   gracefulexit:chore      error retrieving satellites.                                   {"Process": "storagenode", "error": "satellitesdb: database is locked", "errorVerbose": "satellitesdb: database is locked\n\tstorj.io/storj/storagenode/storagenodedb.(*satellitesDB).ListGracefulExits:197\n\tstorj.io/storj/storagenode/gracefulexit.(*Service).ListPendingExits:59\n\tstorj.io/storj/storagenode/gracefulexit.(*Chore).AddMissing:55\n\tstorj.io/common/sync2.(*Cycle).Run:160\n\tstorj.io/storj/storagenode/gracefulexit.(*Chore).Run:48\n\tstorj.io/storj/private/lifecycle.(*Group).Run.func2.1:87\n\truntime/pprof.Do:51\n\tstorj.io/storj/private/lifecycle.(*Group).Run.func2:86\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:78"}
2024-06-15T16:48:29Z    ERROR   gracefulexit:chore      error retrieving satellites.                                   {"Process": "storagenode", "error": "satellitesdb: database is locked", "errorVerbose": "satellitesdb: database is locked\n\tstorj.io/storj/storagenode/storagenodedb.(*satellitesDB).ListGracefulExits:197\n\tstorj.io/storj/storagenode/gracefulexit.(*Service).ListPendingExits:59\n\tstorj.io/storj/storagenode/gracefulexit.(*Chore).AddMissing:55\n\tstorj.io/common/sync2.(*Cycle).Run:160\n\tstorj.io/storj/storagenode/gracefulexit.(*Chore).Run:48\n\tstorj.io/storj/private/lifecycle.(*Group).Run.func2.1:87\n\truntime/pprof.Do:51\n\tstorj.io/storj/private/lifecycle.(*Group).Run.func2:86\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:78"}
2024-06-15T18:52:28Z    ERROR   gracefulexit:chore      error retrieving satellites.                                   {"Process": "storagenode", "error": "satellitesdb: database is locked", "errorVerbose": "satellitesdb: database is locked\n\tstorj.io/storj/storagenode/storagenodedb.(*satellitesDB).ListGracefulExits:197\n\tstorj.io/storj/storagenode/gracefulexit.(*Service).ListPendingExits:59\n\tstorj.io/storj/storagenode/gracefulexit.(*Chore).AddMissing:55\n\tstorj.io/common/sync2.(*Cycle).Run:160\n\tstorj.io/storj/storagenode/gracefulexit.(*Chore).Run:48\n\tstorj.io/storj/private/lifecycle.(*Group).Run.func2.1:87\n\truntime/pprof.Do:51\n\tstorj.io/storj/private/lifecycle.(*Group).Run.func2:86\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:78"}
2024-06-15T18:53:28Z    ERROR   gracefulexit:chore      error retrieving satellites.                                   {"Process": "storagenode", "error": "satellitesdb: database is locked", "errorVerbose": "satellitesdb: database is locked\n\tstorj.io/storj/storagenode/storagenodedb.(*satellitesDB).ListGracefulExits:197\n\tstorj.io/storj/storagenode/gracefulexit.(*Service).ListPendingExits:59\n\tstorj.io/storj/storagenode/gracefulexit.(*Chore).AddMissing:55\n\tstorj.io/common/sync2.(*Cycle).Run:160\n\tstorj.io/storj/storagenode/gracefulexit.(*Chore).Run:48\n\tstorj.io/storj/private/lifecycle.(*Group).Run.func2.1:87\n\truntime/pprof.Do:51\n\tstorj.io/storj/private/lifecycle.(*Group).Run.func2:86\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:78"}
2024-06-15T18:54:28Z    ERROR   gracefulexit:chore      error retrieving satellites.                                   {"Process": "storagenode", "error": "satellitesdb: database is locked", "errorVerbose": "satellitesdb: database is locked\n\tstorj.io/storj/storagenode/storagenodedb.(*satellitesDB).ListGracefulExits:197\n\tstorj.io/storj/storagenode/gracefulexit.(*Service).ListPendingExits:59\n\tstorj.io/storj/storagenode/gracefulexit.(*Chore).AddMissing:55\n\tstorj.io/common/sync2.(*Cycle).Run:160\n\tstorj.io/storj/storagenode/gracefulexit.(*Chore).Run:48\n\tstorj.io/storj/private/lifecycle.(*Group).Run.func2.1:87\n\truntime/pprof.Do:51\n\tstorj.io/storj/private/lifecycle.(*Group).Run.func2:86\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:78"}
2024-06-15T18:55:28Z    ERROR   gracefulexit:chore      error retrieving satellites.                                   {"Process": "storagenode", "error": "satellitesdb: database is locked", "errorVerbose": "satellitesdb: database is locked\n\tstorj.io/storj/storagenode/storagenodedb.(*satellitesDB).ListGracefulExits:197\n\tstorj.io/storj/storagenode/gracefulexit.(*Service).ListPendingExits:59\n\tstorj.io/storj/storagenode/gracefulexit.(*Chore).AddMissing:55\n\tstorj.io/common/sync2.(*Cycle).Run:160\n\tstorj.io/storj/storagenode/gracefulexit.(*Chore).Run:48\n\tstorj.io/storj/private/lifecycle.(*Group).Run.func2.1:87\n\truntime/pprof.Do:51\n\tstorj.io/storj/private/lifecycle.(*Group).Run.func2:86\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:78"}
2024-06-15T22:00:28Z    ERROR   services        unexpected shutdown of a runner                                        {"Process": "storagenode", "name": "forgetsatellite:chore", "error": "database is locked"}
2024-06-15T22:01:32Z    ERROR   failure during run      {"Process": "storagenode", "error": "database is locked"}
Error: database is locked


STORJ23
2024-06-15T14:58:36Z    ERROR   orders  failed to add bandwidth usage   {"Process": "storagenode", "satellite ID": "121RTSDpyNZVcEU84Ticf2L1ntiuUimbWgfATz21tuvgk3vzoA6", "action": "GET", "amount": 3229952, "error": "bandwidthdb: database is locked", "errorVerbose": "bandwidthdb: database is locked\n\tstorj.io/storj/storagenode/storagenodedb.(*bandwidthDB).Add:76\n\tstorj.io/storj/storagenode/orders.(*Service).SendOrders.func2:249\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:78"}
2024-06-15T14:58:46Z    ERROR   orders  failed to add bandwidth usage   {"Process": "storagenode", "satellite ID": "121RTSDpyNZVcEU84Ticf2L1ntiuUimbWgfATz21tuvgk3vzoA6", "action": "GET_AUDIT", "amount": 768, "error": "bandwidthdb: database is locked", "errorVerbose": "bandwidthdb: database is locked\n\tstorj.io/storj/storagenode/storagenodedb.(*bandwidthDB).Add:76\n\tstorj.io/storj/storagenode/orders.(*Service).SendOrders.func2:249\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:78"}
2024-06-15T16:02:14Z    ERROR   services        unexpected shutdown of a runner                                        {"Process": "storagenode", "name": "forgetsatellite:chore", "error": "database is locked"}
2024-06-15T16:18:15Z    ERROR   failure during run      {"Process": "storagenode", "error": "database is locked"}
Error: database is locked
2024-06-15T18:57:39Z    ERROR   gracefulexit:chore      error retrieving satellites.                                   {"Process": "storagenode", "error": "satellitesdb: database is locked", "errorVerbose": "satellitesdb: database is locked\n\tstorj.io/storj/storagenode/storagenodedb.(*satellitesDB).ListGracefulExits:197\n\tstorj.io/storj/storagenode/gracefulexit.(*Service).ListPendingExits:59\n\tstorj.io/storj/storagenode/gracefulexit.(*Chore).AddMissing:55\n\tstorj.io/common/sync2.(*Cycle).Run:160\n\tstorj.io/storj/storagenode/gracefulexit.(*Chore).Run:48\n\tstorj.io/storj/private/lifecycle.(*Group).Run.func2.1:87\n\truntime/pprof.Do:51\n\tstorj.io/storj/private/lifecycle.(*Group).Run.func2:86\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:78"}

I think the TS might be right…
It’s probably not primarily a hardware issue.

root@VM-HOST:/var/lib/lxc# iostat -x /dev/nvme0n1
Linux 6.1.0-21-amd64 (VM-HOST)  17-06-24        _x86_64_        (8 CPU)

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           2,65    0,49    4,06   73,74    0,00   19,06

Device            r/s     rkB/s   rrqm/s  %rrqm r_await rareq-sz     w/s     wkB/s   wrqm/s  %wrqm w_await wareq-sz     d/s     dkB/s   drqm/s  %drqm d_await dareq-sz     f/s f_await  aqu-sz  %util
nvme0n1         41,13   1035,76     1,35   3,19    0,64    25,18   42,01    911,61     3,93   8,55    2,12    21,70    2,36    305,93     0,00   0,00    0,61   129,84    0,31    0,33    0,12   4,93 

Doesn’t look like the drive is overwhelmed…

root@VM-HOST:/var/lib/lxc# free -m
               total        used        free      shared  buff/cache   available
Mem:           47781       14918        1146        6202       40978       32863
Swap:          45392        4028       41363

Neither is memory…

I think it’s the same issue as I posted.

And no, these databases contain no errors; also, the frequency of those errors would be too low in that case. The pragma checks are OK so far.
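For reference, the check is just SQLite’s integrity pragma against each file, something like this with the node stopped and the path adjusted to your setup:

for db in /storj/DBs/*.db; do echo "$db: $(sqlite3 "$db" 'PRAGMA integrity_check;')"; done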

Also no malformed issues:

root@VM-HOST:/var/lib/lxc# for i in STORJ[1-9]*; do echo $'\n\n'$i; lxc-attach $i -- docker logs storagenode 2> /dev/null | grep malformed; done


STORJ10


STORJ11


STORJ16


STORJ17


STORJ18


STORJ22


STORJ23


STORJ4


STORJ6

I do not have that max_open_conns parameter uncommented in my config.

One piece of happy news: now that running out of room has stopped the constant ingress, the lazy filewalker is starting to move a bit faster. It’s now up to “ra” on the first satellite. This probably still means it’s going to take a week, but that’s better than the near-month it was looking like. I’ll almost certainly get an auto-update to interrupt it before it can finish though, sigh.

Thanks!
Could you please show the size of these databases?

Thank you for the info!
Do you see “database is locked” errors on the node where the databases are on the SSD?

I already answered that in the post you just replied to.

I will add that the piece_expiration.db on the SSD does not appear to be growing further. Still showing 1.4GB.

Oh, yes, sorry I overlooked it.

No, I wouldn’t. It’s too suspicious that these errors can occur even when the DBs are on an SSD, so I think there could be some relation between the size of the database and the frequency of “database is locked” errors.

Your node is full, so there are no uploads and no modifications of that database. As soon as some data is removed, the ingress will likely start again. But I’m not sure that this database would shrink.

My biggest node now has the “database is locked” errors too, but only for piece_expiration.db so far (its databases are on the HDD). That wasn’t the case for many years.

So I believe that @Toyoo is right:

And your HW is actually OK, but we need a new feature to change how we work with that database.

As I have said, I have only ever experienced the database locking on a single db at a time, and which one it is seems to change with each restart of the node, with the chance of a given db being the one that gets locked proportional to its size as a percentage of the whole. I don’t know how big your biggest node is, but if it’s small enough that piece_expiration.db isn’t too much bigger than bandwidth.db, maybe only 5 or 6x bigger, then try restarting the node a few times. Eventually it’ll be bandwidth.db, and only bandwidth.db, that gets all the locking.

It’s not so big; I used an existing, not fully used disk. The two others are 1.4TB and 0.9TB. All nodes are full at the moment.

Just checked. It should be 7TB, but shows 6.75TB on the dashboard. Interesting.


My 16TB disk has 16TB of data, minus the 5GB keeping it from crashing. The node says that I have 14.5TB allocated, 11.14TB used, 1.32TB trash, and 2.04TB free.

Not as much as is being complained about by others:

root@VM-HOST:~# cd /var/lib/lxc; for i in STORJ[1-9]*; do echo $'\n\n'$i; lxc-attach $i -- ls -lSh /storj/DBs; done     

STORJ10
totaal 830M
-rw-r--r-- 1 root root 829M 17 jun 07:18 piece_expiration.db
-rw-r--r-- 1 root root 268K 16 jun 22:22 storage_usage.db
-rw-r--r-- 1 root root 212K 17 jun 07:18 bandwidth.db
-rw-r--r-- 1 root root  52K 16 jun 22:22 heldamount.db
-rw-r--r-- 1 root root  32K 15 jun 22:18 orders.db
-rw-r--r-- 1 root root  32K 17 jun 07:23 piece_spaced_used.db-shm
-rw-r--r-- 1 root root  32K 17 jun 06:21 reputation.db
-rw-r--r-- 1 root root  32K 17 jun 07:33 satellites.db-shm
-rw-r--r-- 1 root root  28K 15 jun 22:18 satellites.db
-rw-r--r-- 1 root root  24K 15 jun 22:18 garbage_collection_filewalker_progress.db
-rw-r--r-- 1 root root  24K 15 jun 22:18 notifications.db
-rw-r--r-- 1 root root  24K 15 jun 22:18 pieceinfo.db
-rw-r--r-- 1 root root  24K 17 jun 06:53 piece_spaced_used.db
-rw-r--r-- 1 root root  24K 15 jun 22:18 pricing.db
-rw-r--r-- 1 root root  24K 15 jun 22:18 secret.db
-rw-r--r-- 1 root root  24K 15 jun 22:18 used_space_per_prefix.db
-rw-r--r-- 1 root root  16K 15 jun 22:18 info.db
-rw-r--r-- 1 root root  16K 15 jun 22:18 used_serial.db
-rw-r--r-- 1 root root 8,1K 17 jun 07:23 piece_spaced_used.db-wal
-rw-r--r-- 1 root root    0 17 jun 07:33 satellites.db-wal


STORJ11
totaal 396M
-rw-r--r-- 1 root root 396M 17 jun 07:37 piece_expiration.db
-rw-r--r-- 1 root root  88K 17 jun 07:38 storage_usage.db
-rw-r--r-- 1 root root  68K 17 jun 07:38 bandwidth.db
-rw-r--r-- 1 root root  32K 17 jun 07:38 heldamount.db
-rw-r--r-- 1 root root  32K 15 jun 19:37 orders.db
-rw-r--r-- 1 root root  32K 17 jun 07:37 orders.db-shm
-rw-r--r-- 1 root root  32K 17 jun 07:16 satellites.db-shm
-rw-r--r-- 1 root root  28K 15 jun 19:51 satellites.db
-rw-r--r-- 1 root root  24K 15 jun 19:37 garbage_collection_filewalker_progress.db
-rw-r--r-- 1 root root  24K 15 jun 19:38 notifications.db
-rw-r--r-- 1 root root  24K 15 jun 19:37 pieceinfo.db
-rw-r--r-- 1 root root  24K 17 jun 07:42 piece_spaced_used.db
-rw-r--r-- 1 root root  24K 15 jun 19:38 pricing.db
-rw-r--r-- 1 root root  24K 17 jun 07:38 reputation.db
-rw-r--r-- 1 root root  24K 15 jun 19:40 secret.db
-rw-r--r-- 1 root root  24K 15 jun 19:38 used_space_per_prefix.db
-rw-r--r-- 1 root root  16K 15 jun 19:37 info.db
-rw-r--r-- 1 root root  16K 15 jun 19:37 used_serial.db
-rw-r--r-- 1 root root    0 17 jun 07:37 orders.db-wal
-rw-r--r-- 1 root root    0 17 jun 07:16 satellites.db-wal


STORJ16
totaal 66M
-rw-r--r-- 1 root root  65M 17 jun 07:38 piece_expiration.db
-rw-r--r-- 1 root root 104K 16 jun 21:42 storage_usage.db
-rw-r--r-- 1 root root  76K 17 jun 07:38 bandwidth.db
-rw-r--r-- 1 root root  32K 16 jun 21:42 heldamount.db
-rw-r--r-- 1 root root  32K 15 jun 21:38 orders.db
-rw-r--r-- 1 root root  32K 17 jun 07:23 orders.db-shm
-rw-r--r-- 1 root root  32K 17 jun 07:19 satellites.db-shm
-rw-r--r-- 1 root root  28K 15 jun 21:38 satellites.db
-rw-r--r-- 1 root root  24K 15 jun 21:38 garbage_collection_filewalker_progress.db
-rw-r--r-- 1 root root  24K 15 jun 21:38 notifications.db
-rw-r--r-- 1 root root  24K 15 jun 21:38 pieceinfo.db
-rw-r--r-- 1 root root  24K 17 jun 07:42 piece_spaced_used.db
-rw-r--r-- 1 root root  24K 15 jun 21:38 pricing.db
-rw-r--r-- 1 root root  24K 17 jun 05:38 reputation.db
-rw-r--r-- 1 root root  24K 15 jun 21:38 secret.db
-rw-r--r-- 1 root root  24K 15 jun 21:38 used_space_per_prefix.db
-rw-r--r-- 1 root root  16K 15 jun 21:38 info.db
-rw-r--r-- 1 root root  16K 15 jun 21:38 used_serial.db
-rw-r--r-- 1 root root    0 17 jun 07:23 orders.db-wal
-rw-r--r-- 1 root root    0 17 jun 07:19 satellites.db-wal


STORJ17
totaal 58M
-rw-r--r-- 1 root root  56M 17 jun 07:25 piece_expiration.db
-rw-r--r-- 1 root root 580K 17 jun 07:29 piece_expiration.db-wal
-rw-r--r-- 1 root root 290K 17 jun 07:32 heldamount.db-wal
-rw-r--r-- 1 root root 182K 17 jun 07:32 storage_usage.db-wal
-rw-r--r-- 1 root root 100K 16 jun 21:27 storage_usage.db
-rw-r--r-- 1 root root  80K 17 jun 07:26 bandwidth.db
-rw-r--r-- 1 root root  77K 17 jun 07:30 bandwidth.db-wal
-rw-r--r-- 1 root root  65K 17 jun 07:29 pricing.db-wal
-rw-r--r-- 1 root root  65K 17 jun 07:31 reputation.db-wal
-rw-r--r-- 1 root root  41K 17 jun 07:30 piece_spaced_used.db-wal
-rw-r--r-- 1 root root  33K 17 jun 07:29 garbage_collection_filewalker_progress.db-wal
-rw-r--r-- 1 root root  33K 17 jun 07:29 info.db-wal
-rw-r--r-- 1 root root  33K 17 jun 07:29 notifications.db-wal
-rw-r--r-- 1 root root  33K 17 jun 07:29 orders.db-wal
-rw-r--r-- 1 root root  33K 17 jun 07:29 pieceinfo.db-wal
-rw-r--r-- 1 root root  33K 17 jun 07:29 satellites.db-wal
-rw-r--r-- 1 root root  33K 17 jun 07:29 secret.db-wal
-rw-r--r-- 1 root root  33K 17 jun 07:29 used_serial.db-wal
-rw-r--r-- 1 root root  33K 17 jun 07:29 used_space_per_prefix.db-wal
-rw-r--r-- 1 root root  32K 17 jun 07:30 bandwidth.db-shm
-rw-r--r-- 1 root root  32K 17 jun 07:29 garbage_collection_filewalker_progress.db-shm
-rw-r--r-- 1 root root  32K 16 jun 21:27 heldamount.db
-rw-r--r-- 1 root root  32K 17 jun 07:32 heldamount.db-shm
-rw-r--r-- 1 root root  32K 17 jun 07:29 info.db-shm
-rw-r--r-- 1 root root  32K 17 jun 07:29 notifications.db-shm
-rw-r--r-- 1 root root  32K 15 jun 21:19 orders.db
-rw-r--r-- 1 root root  32K 17 jun 07:29 orders.db-shm
-rw-r--r-- 1 root root  32K 17 jun 07:29 piece_expiration.db-shm
-rw-r--r-- 1 root root  32K 17 jun 07:29 pieceinfo.db-shm
-rw-r--r-- 1 root root  32K 17 jun 07:30 piece_spaced_used.db-shm
-rw-r--r-- 1 root root  32K 17 jun 07:29 pricing.db-shm
-rw-r--r-- 1 root root  32K 17 jun 07:31 reputation.db-shm
-rw-r--r-- 1 root root  32K 17 jun 07:29 satellites.db-shm
-rw-r--r-- 1 root root  32K 17 jun 07:29 secret.db-shm
-rw-r--r-- 1 root root  32K 17 jun 07:32 storage_usage.db-shm
-rw-r--r-- 1 root root  32K 17 jun 07:29 used_serial.db-shm
-rw-r--r-- 1 root root  32K 17 jun 07:29 used_space_per_prefix.db-shm
-rw-r--r-- 1 root root  28K 15 jun 21:19 satellites.db
-rw-r--r-- 1 root root  24K 15 jun 21:25 garbage_collection_filewalker_progress.db
-rw-r--r-- 1 root root  24K 15 jun 21:19 notifications.db
-rw-r--r-- 1 root root  24K 15 jun 21:19 pieceinfo.db
-rw-r--r-- 1 root root  24K 17 jun 07:26 piece_spaced_used.db
-rw-r--r-- 1 root root  24K 15 jun 21:25 pricing.db
-rw-r--r-- 1 root root  24K 17 jun 05:31 reputation.db
-rw-r--r-- 1 root root  24K 15 jun 21:25 secret.db
-rw-r--r-- 1 root root  24K 15 jun 21:25 used_space_per_prefix.db
-rw-r--r-- 1 root root  16K 15 jun 21:19 info.db
-rw-r--r-- 1 root root  16K 15 jun 21:19 used_serial.db


STORJ18
totaal 376M
-rw-r--r-- 1 root root 371M 17 jun 07:42 piece_expiration.db
-rw-r--r-- 1 root root 5,1M 17 jun 07:42 piece_expiration.db-wal
-rw-r--r-- 1 root root  96K 17 jun 01:46 storage_usage.db
-rw-r--r-- 1 root root  80K 17 jun 06:47 bandwidth.db
-rw-r--r-- 1 root root  41K 17 jun 07:17 bandwidth.db-wal
-rw-r--r-- 1 root root  32K 17 jun 07:17 bandwidth.db-shm
-rw-r--r-- 1 root root  32K 17 jun 01:46 heldamount.db
-rw-r--r-- 1 root root  32K 17 jun 01:46 orders.db
-rw-r--r-- 1 root root  32K 17 jun 07:16 orders.db-shm
-rw-r--r-- 1 root root  32K 17 jun 07:42 piece_expiration.db-shm
-rw-r--r-- 1 root root  32K 17 jun 07:16 pieceinfo.db-shm
-rw-r--r-- 1 root root  32K 17 jun 06:54 satellites.db-shm
-rw-r--r-- 1 root root  28K 17 jun 01:46 satellites.db
-rw-r--r-- 1 root root  24K 17 jun 01:46 garbage_collection_filewalker_progress.db
-rw-r--r-- 1 root root  24K 17 jun 01:46 notifications.db
-rw-r--r-- 1 root root  24K 17 jun 01:46 pieceinfo.db
-rw-r--r-- 1 root root  24K 17 jun 01:46 piece_spaced_used.db
-rw-r--r-- 1 root root  24K 17 jun 01:46 pricing.db
-rw-r--r-- 1 root root  24K 17 jun 05:49 reputation.db
-rw-r--r-- 1 root root  24K 17 jun 01:46 secret.db
-rw-r--r-- 1 root root  24K 17 jun 01:46 used_space_per_prefix.db
-rw-r--r-- 1 root root  16K 17 jun 01:46 info.db
-rw-r--r-- 1 root root  16K 17 jun 01:46 used_serial.db
-rw-r--r-- 1 root root    0 17 jun 07:16 orders.db-wal
-rw-r--r-- 1 root root    0 17 jun 07:16 pieceinfo.db-wal
-rw-r--r-- 1 root root    0 17 jun 06:54 satellites.db-wal


STORJ22
totaal 307M
-rw-r--r-- 1 root root 306M 17 jun 07:31 piece_expiration.db
-rw-r--r-- 1 root root  96K 17 jun 00:33 storage_usage.db
-rw-r--r-- 1 root root  76K 17 jun 07:31 bandwidth.db
-rw-r--r-- 1 root root  32K 17 jun 00:33 heldamount.db
-rw-r--r-- 1 root root  32K 16 jun 00:31 orders.db
-rw-r--r-- 1 root root  32K 17 jun 07:36 orders.db-shm
-rw-r--r-- 1 root root  32K 17 jun 07:25 satellites.db-shm
-rw-r--r-- 1 root root  28K 16 jun 00:31 satellites.db
-rw-r--r-- 1 root root  24K 16 jun 00:31 garbage_collection_filewalker_progress.db
-rw-r--r-- 1 root root  24K 16 jun 00:31 notifications.db
-rw-r--r-- 1 root root  24K 16 jun 00:31 pieceinfo.db
-rw-r--r-- 1 root root  24K 17 jun 07:32 piece_spaced_used.db
-rw-r--r-- 1 root root  24K 16 jun 00:31 pricing.db
-rw-r--r-- 1 root root  24K 17 jun 04:34 reputation.db
-rw-r--r-- 1 root root  24K 16 jun 00:31 secret.db
-rw-r--r-- 1 root root  24K 16 jun 00:31 used_space_per_prefix.db
-rw-r--r-- 1 root root  16K 16 jun 00:31 info.db
-rw-r--r-- 1 root root  16K 16 jun 00:31 used_serial.db
-rw-r--r-- 1 root root    0 17 jun 07:36 orders.db-wal
-rw-r--r-- 1 root root    0 17 jun 07:25 satellites.db-wal


STORJ23
totaal 1,6G
-rw-r--r-- 1 root root 1,6G 17 jun 07:16 piece_expiration.db
-rw-r--r-- 1 root root  96K 17 jun 01:22 storage_usage.db
-rw-r--r-- 1 root root  76K 17 jun 07:16 bandwidth.db
-rw-r--r-- 1 root root  32K 17 jun 01:22 heldamount.db
-rw-r--r-- 1 root root  32K 16 jun 01:16 orders.db
-rw-r--r-- 1 root root  32K 17 jun 07:31 orders.db-shm
-rw-r--r-- 1 root root  32K 17 jun 07:30 satellites.db-shm
-rw-r--r-- 1 root root  28K 16 jun 01:16 satellites.db
-rw-r--r-- 1 root root  24K 16 jun 01:16 garbage_collection_filewalker_progress.db
-rw-r--r-- 1 root root  24K 16 jun 01:16 notifications.db
-rw-r--r-- 1 root root  24K 16 jun 01:16 pieceinfo.db
-rw-r--r-- 1 root root  24K 17 jun 07:19 piece_spaced_used.db
-rw-r--r-- 1 root root  24K 16 jun 01:16 pricing.db
-rw-r--r-- 1 root root  24K 17 jun 05:18 reputation.db
-rw-r--r-- 1 root root  24K 16 jun 01:16 secret.db
-rw-r--r-- 1 root root  24K 16 jun 01:16 used_space_per_prefix.db
-rw-r--r-- 1 root root  16K 16 jun 01:16 info.db
-rw-r--r-- 1 root root  16K 16 jun 01:16 used_serial.db
-rw-r--r-- 1 root root    0 17 jun 07:31 orders.db-wal
-rw-r--r-- 1 root root    0 17 jun 07:30 satellites.db-wal


STORJ4
totaal 243M
-rw-r--r-- 1 root root 242M 17 jun 07:05 piece_expiration.db
-rw-r--r-- 1 root root 248K 17 jun 01:34 storage_usage.db
-rw-r--r-- 1 root root 156K 17 jun 07:34 bandwidth.db
-rw-r--r-- 1 root root  48K 17 jun 01:34 heldamount.db
-rw-r--r-- 1 root root  32K 17 jun 01:34 orders.db
-rw-r--r-- 1 root root  32K 17 jun 07:34 orders.db-shm
-rw-r--r-- 1 root root  32K 17 jun 05:20 reputation.db
-rw-r--r-- 1 root root  32K 17 jun 07:34 satellites.db-shm
-rw-r--r-- 1 root root  28K 17 jun 01:34 satellites.db
-rw-r--r-- 1 root root  24K 17 jun 01:34 garbage_collection_filewalker_progress.db
-rw-r--r-- 1 root root  24K 17 jun 01:34 notifications.db
-rw-r--r-- 1 root root  24K 17 jun 01:34 pieceinfo.db
-rw-r--r-- 1 root root  24K 17 jun 01:34 piece_spaced_used.db
-rw-r--r-- 1 root root  24K 17 jun 01:34 pricing.db
-rw-r--r-- 1 root root  24K 17 jun 01:34 secret.db
-rw-r--r-- 1 root root  24K 17 jun 01:34 used_space_per_prefix.db
-rw-r--r-- 1 root root  16K 17 jun 01:34 info.db
-rw-r--r-- 1 root root  16K 17 jun 01:34 used_serial.db
-rw-r--r-- 1 root root    0 17 jun 07:34 orders.db-wal
-rw-r--r-- 1 root root    0 17 jun 07:34 satellites.db-wal


STORJ6
totaal 752M
-rw-r--r-- 1 root root 751M 17 jun 06:48 piece_expiration.db
-rw-r--r-- 1 root root 455K 17 jun 07:18 piece_expiration.db-wal
-rw-r--r-- 1 root root  96K 17 jun 06:51 storage_usage.db
-rw-r--r-- 1 root root  68K 17 jun 06:48 bandwidth.db
-rw-r--r-- 1 root root  32K 17 jun 07:18 bandwidth.db-shm
-rw-r--r-- 1 root root  32K 17 jun 06:51 heldamount.db
-rw-r--r-- 1 root root  32K 15 jun 18:48 orders.db
-rw-r--r-- 1 root root  32K 17 jun 07:18 orders.db-shm
-rw-r--r-- 1 root root  32K 17 jun 07:18 piece_expiration.db-shm
-rw-r--r-- 1 root root  32K 17 jun 07:18 pieceinfo.db-shm
-rw-r--r-- 1 root root  32K 17 jun 07:19 piece_spaced_used.db-shm
-rw-r--r-- 1 root root  32K 17 jun 06:49 satellites.db-shm
-rw-r--r-- 1 root root  28K 15 jun 18:48 satellites.db
-rw-r--r-- 1 root root  24K 15 jun 18:48 garbage_collection_filewalker_progress.db
-rw-r--r-- 1 root root  24K 15 jun 18:48 notifications.db
-rw-r--r-- 1 root root  24K 15 jun 18:48 pieceinfo.db
-rw-r--r-- 1 root root  24K 17 jun 06:49 piece_spaced_used.db
-rw-r--r-- 1 root root  24K 15 jun 18:48 pricing.db
-rw-r--r-- 1 root root  24K 17 jun 06:50 reputation.db
-rw-r--r-- 1 root root  24K 15 jun 18:48 secret.db
-rw-r--r-- 1 root root  24K 15 jun 18:48 used_space_per_prefix.db
-rw-r--r-- 1 root root  17K 17 jun 07:18 bandwidth.db-wal
-rw-r--r-- 1 root root  16K 15 jun 18:48 info.db
-rw-r--r-- 1 root root  16K 15 jun 18:48 used_serial.db
-rw-r--r-- 1 root root 8,1K 17 jun 07:19 piece_spaced_used.db-wal
-rw-r--r-- 1 root root    0 17 jun 07:18 orders.db-wal
-rw-r--r-- 1 root root    0 17 jun 07:18 pieceinfo.db-wal
-rw-r--r-- 1 root root    0 17 jun 06:49 satellites.db-wal

The satellites and bandwidth databases aren’t usually the biggest ones, yet they’re the ones most complained about in the logs (so size probably isn’t the diagnosis, at least not for the individual files). And on every restart of my nodes I vacuum them, so the lack of optimization isn’t it either.
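The vacuum itself is nothing special, just sqlite3 against each file while the node is stopped, roughly:

for db in /storj/DBs/*.db; do sqlite3 "$db" 'VACUUM;'; done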

Yes, this information should be updated by the used-space-filewalker when it finishes its scan for each trusted satellite, since now it doesn’t have a “database is locked” issue.

Do all these nodes use the same SSD to store their databases?

Unfortunately, dmesg was newly created on startup.

In my case, yes, too.

pi@raspberrypi:~/storjDatabasesLocal $ ls -lh
total 792M
-rwxr-xr-x 1 pi pi  83M Jun 17 08:27 bandwidth.db
-rwxr-xr-x 1 pi pi  32K Jun 17 08:28 bandwidth.db-shm
-rwxr-xr-x 1 pi pi    0 Jun 17 08:28 bandwidth.db-wal
-rw-r--r-- 1 pi pi  24K Jun 16 10:29 garbage_collection_filewalker_progress.db
-rwxr-xr-x 1 pi pi 136K Jun 16 22:30 heldamount.db
-rwxr-xr-x 1 pi pi  32K Jun 17 08:43 heldamount.db-shm
-rwxr-xr-x 1 pi pi    0 Jun 17 08:43 heldamount.db-wal
-rwxr-xr-x 1 pi pi  16K Jun 16 10:29 info.db
-rwxr-xr-x 1 pi pi  24K Jun 16 10:29 notifications.db
-rwxr-xr-x 1 pi pi  32K Jun 17 08:43 notifications.db-shm
-rwxr-xr-x 1 pi pi    0 Jun 17 08:43 notifications.db-wal
-rwxr-xr-x 1 pi pi  32K Jun 16 10:29 orders.db
-rwxr-xr-x 1 pi pi 705M Jun 17 08:32 piece_expiration.db
-rwxr-xr-x 1 pi pi  32K Jun 17 08:54 piece_expiration.db-shm
-rwxr-xr-x 1 pi pi 2.3M Jun 17 08:54 piece_expiration.db-wal
-rwxr-xr-x 1 pi pi  24K Jun 16 10:29 pieceinfo.db
-rwxr-xr-x 1 pi pi  24K Jun 16 10:29 piece_spaced_used.db
-rwxr-xr-x 1 pi pi  24K Jun 16 10:29 pricing.db
-rwxr-xr-x 1 pi pi  32K Jun 17 08:33 pricing.db-shm
-rwxr-xr-x 1 pi pi    0 Jun 17 08:33 pricing.db-wal
-rwxr-xr-x 1 pi pi  36K Jun 17 06:26 reputation.db
-rwxr-xr-x 1 pi pi  32K Jun 17 08:33 reputation.db-shm
-rwxr-xr-x 1 pi pi    0 Jun 17 08:33 reputation.db-wal
-rwxr-xr-x 1 pi pi  32K Jun 16 10:59 satellites.db
-rwxr-xr-x 1 pi pi  32K Jun 17 08:30 satellites.db-shm
-rwxr-xr-x 1 pi pi    0 Jun 17 08:30 satellites.db-wal
-rwxr-xr-x 1 pi pi  24K Jun 16 10:29 secret.db
-rwxr-xr-x 1 pi pi 1.2M Jun 16 22:02 storage_usage.db
-rwxr-xr-x 1 pi pi  32K Jun 17 08:33 storage_usage.db-shm
-rwxr-xr-x 1 pi pi    0 Jun 17 08:33 storage_usage.db-wal
-rwxr-xr-x 1 pi pi  20K Jun 16 10:29 used_serial.db
-rw-r--r-- 1 pi pi  24K Jun 16 10:29 used_space_per_prefix.db

Then we need to wait for the next occurrence, because it’s not the storagenode itself.
By the way, what does docker logs show? (I assume you redirected the logs, so docker logs should show only the supervisor’s and the updater’s logs.)
I also don’t like the time between events in the logs. It looks like the node just hangs, then something resets it. Power cuts?
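To check the docker logs question above, something like this should show whether the supervisor or the updater logged anything around the restart:

docker logs --since 2h storagenode 2>&1 | tail -n 100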