It’s too long - how do I find the right messages?
journalctl is empty.
I cannot give you an exact string to search for there. Try using the name of the process, and review only the time interval in which the restart happened.
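For example, something like this narrows it down (the timestamps are placeholders, so use the window around the restart, and swap in however your node process or systemd unit is actually named):
journalctl --since "2024-06-15 06:00" --until "2024-06-15 07:00" | grep -i storagenode
or, if the node runs as a systemd service:
journalctl -u storagenode.service --since "2024-06-15 06:00" --until "2024-06-15 07:00"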
This is the requested ls --si of the databases as they exist on my HDD. I haven’t been using them since yesterday morning, when I switched the databases to the SSD, but these are the typical sizes I saw while storj was running against them.
qwinn@Gungnir:/mnt/storj/storj1/storj1/storage$ ls -l --si *.db
-rw-r--r-- 1 qwinn qwinn 59M Jun 15 06:32 bandwidth.db
-rw-r--r-- 1 qwinn qwinn 25k Jun 15 02:32 garbage_collection_filewalker_progress.db
-rw-r--r-- 1 qwinn qwinn 33k Jun 15 02:32 heldamount.db
-rw-r--r-- 1 qwinn qwinn 17k Jun 15 02:32 info.db
-rw-r--r-- 1 qwinn qwinn 25k Jun 15 02:32 notifications.db
-rw-r--r-- 1 qwinn qwinn 33k Jun 15 02:32 orders.db
-rw-r--r-- 1 qwinn qwinn 893M Jun 15 06:04 piece_expiration.db
-rw-r--r-- 1 qwinn qwinn 25k Jun 15 02:32 pieceinfo.db
-rw-r--r-- 1 qwinn qwinn 25k Jun 15 02:32 piece_spaced_used.db
-rw-r--r-- 1 qwinn qwinn 0 Jun 14 15:01 piece_space_used.db
-rw-r--r-- 1 qwinn qwinn 25k Jun 15 02:32 pricing.db
-rw-r--r-- 1 qwinn qwinn 33k Jun 15 06:36 reputation.db
-rw-r--r-- 1 qwinn qwinn 33k Jun 15 02:32 satellites.db
-rw-r--r-- 1 qwinn qwinn 25k Jun 15 02:32 secret.db
-rw-r--r-- 1 qwinn qwinn 193k Jun 15 02:32 storage_usage.db
-rw-r--r-- 1 qwinn qwinn 21k Jun 15 02:32 used_serial.db
-rw-r--r-- 1 qwinn qwinn 25k Jun 15 02:32 used_space_per_prefix.db
Additional notes:
It does seem like the node has mostly, though not entirely, stopped growing now that free space has dropped under 5GB.
/dev/sdf 15501595164 15496770964 4807816 100% /mnt/storj/storj1
That free space seems to be dropping by about 8 bytes every few seconds. That should last a while, but not forever, and probably not as long as it’ll take to finish the current lazy filewalker run. It’s been running since I switched the databases to the SSD around 24 hours ago and it’s only at directory “dg”. At that rate it’ll take weeks just to finish the Salt Lake satellite.
I can confirm that, since I switched the databases to my OS drive, I have in fact not seen any “database locked” messages. As I said previously, I think this is somehow compensating for the bug; if the bug were fixed, the HDD would be capable of handling this.
Of interest, I just ran the same ls --si on the SSD directory that I moved the databases to, and the size of the piece_expiration db here is already MUCH MUCH bigger.
-rw-r--r-- 1 qwinn qwinn 59M Jun 16 09:24 bandwidth.db
-rw-r--r-- 1 qwinn qwinn 25k Jun 15 07:24 garbage_collection_filewalker_progress.db
-rw-r--r-- 1 qwinn qwinn 33k Jun 16 07:26 heldamount.db
-rw-r--r-- 1 qwinn qwinn 17k Jun 15 07:24 info.db
-rw-r--r-- 1 qwinn qwinn 25k Jun 15 07:24 notifications.db
-rw-r--r-- 1 qwinn qwinn 33k Jun 15 07:24 orders.db
-rw-r--r-- 1 qwinn qwinn 1.4G Jun 16 08:54 piece_expiration.db
-rw-r--r-- 1 qwinn qwinn 25k Jun 15 07:24 pieceinfo.db
-rw-r--r-- 1 qwinn qwinn 25k Jun 15 07:24 piece_spaced_used.db
-rw-r--r-- 1 qwinn qwinn 0 Jun 15 06:49 piece_space_used.db
-rw-r--r-- 1 qwinn qwinn 25k Jun 15 07:24 pricing.db
-rw-r--r-- 1 qwinn qwinn 33k Jun 16 07:27 reputation.db
-rw-r--r-- 1 qwinn qwinn 33k Jun 15 07:25 satellites.db
-rw-r--r-- 1 qwinn qwinn 25k Jun 15 07:24 secret.db
-rw-r--r-- 1 qwinn qwinn 193k Jun 16 07:26 storage_usage.db
-rw-r--r-- 1 qwinn qwinn 21k Jun 15 07:24 used_serial.db
-rw-r--r-- 1 qwinn qwinn 25k Jun 15 07:24 used_space_per_prefix.db
That’s like nearly double the size. I don’t think it was ever able to get anywhere near that big when it was on the HDD.
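If you’re curious how much of that 1.4G is live data versus free pages SQLite hasn’t released yet, the sqlite3 CLI can report it (run it against a copy, or with the node stopped, to be safe):
sqlite3 piece_expiration.db "PRAGMA page_size; PRAGMA page_count; PRAGMA freelist_count;"
Multiply page_size by page_count to get the file size; page_size times freelist_count is roughly the space a VACUUM would reclaim.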
Please remember that, sometimes, all the locks go against the much smaller bandwidth.db while it updates the much larger piece_expiration.db with no problem. So please don’t just decide “your HDD can’t handle a bigger db”.
As an update, the free space left on my drive does seem to have increased back up to just over 5GB, and seems to be more or less stable there now. So, ok, cool, looks like the node might not be doomed. (One does still have to wonder why it doesn’t use this way of calculating free space for the dashboard all of the time, or why we are asked to leave 10% of the HDD free at all times if it can handle being this close to the edge. The roughly 10% free I had left, since I was allocating 14.5TB on a 16TB drive, didn’t help me much at all in this scenario.)
The node dashboard is still showing that the node thinks it has 2.04TB free to play with.
As I noted above, at the rate the filewalker is currently running, it looks like it’ll be done in a few weeks (I am running my other ~40% CPU tasks at the moment, and I’m not really willing to shut them down again, not when the filewalker would still take well over a couple of days to finish anyway). I’ll try not to shut the node down again, but I very much doubt it won’t be automatically updated in that much time. I’ve turned off watchtower hoping that helps.
Perhaps we have another case of orders.db/bandwidth.db. Orders used to be stored in a SQLite database, and because it had to be inserted into on each upload, it kept locking up. This was changed to append-only log files. Then we had bandwidth.db locking up, as it was also inserted into on each upload—this was changed to keep an in-memory cache instead and only update the totals in the database periodically. Now we have piece expirations that have to be inserted on almost each upload… and we have orders of magnitude more uploads now as well.
So a funny thing just happened.
Went to have a look at my worst “potato node”. That one is running on a Pi 5 with an 18TB Exos attached via USB, but with a small SD card as the OS drive, so I moved the logs and databases to the spinning rust.
A few days ago I noticed that the daily used bandwidth in the dashboard didn’t really tally with the throughput I was seeing on the machine. I didn’t think much of it.
Today I noticed that although the dashboard is showing around 8TB used, 8TB free and about 1TB of trash, a df shows that 17TB are actually in use. So there are roughly 8TB of files unaccounted for in the dashboard.
Running a grep on my logs for “locked” showed LOADS of “database locked” errors.
So I have disabled the lazy filewalker, reduced storage2.max-concurrent-requests to 5 (so I’m not hammered) and set storage2.piece-scan-on-startup: true.
I restarted the node and it’s now running the filewalker. I expect this to take a few days.
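For reference, the relevant lines in my config.yaml now look roughly like this (the lazy-filewalker option name is from memory, so double-check it against the comments in your own config.yaml):
pieces.enable-lazy-filewalker: false
storage2.max-concurrent-requests: 5
storage2.piece-scan-on-startup: true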
Funny thing is, when I was messing with config.yaml I noticed that I had
db.max_open_conns: 5
Now, I cannot for the life of me remember having uncommented that parameter and I have no idea if it may cause “database locked” errors, although it seems plausible that it might.
And then I remembered all the problems that @Qwinn is having with a system that is so highly specced that “database locked” errors really shouldn’t be happening.
Could that setting have something to do with that?
I’ve got my databases on a gen 4 NVMe SSD, and I got curious. So I did this:
root@VM-HOST:/var/lib/lxc# for i in STORJ[1-9]*; do echo $'\n\n'$i; lxc-attach $i -- docker logs storagenode 2> /dev/null | grep locked; done
STORJ10
2024-06-16T06:48:42Z ERROR orders failed to add bandwidth usage {"Process": "storagenode", "satellite ID": "1wFTAgs9DP5RSnCqKV1eLf6N9wtk4EAtmN5DpSxcs8EjT69tGE", "action": "GET_AUDIT", "amount": 2304, "error": "bandwidthdb: database is locked", "errorVerbose": "bandwidthdb: database is locked\n\tstorj.io/storj/storagenode/storagenodedb.(*bandwidthDB).Add:76\n\tstorj.io/storj/storagenode/orders.(*Service).SendOrders.func2:249\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:78"}
2024-06-16T06:48:42Z ERROR orders failed to add bandwidth usage {"Process": "storagenode", "satellite ID": "121RTSDpyNZVcEU84Ticf2L1ntiuUimbWgfATz21tuvgk3vzoA6", "action": "GET", "amount": 187136, "error": "bandwidthdb: database is locked", "errorVerbose": "bandwidthdb: database is locked\n\tstorj.io/storj/storagenode/storagenodedb.(*bandwidthDB).Add:76\n\tstorj.io/storj/storagenode/orders.(*Service).SendOrders.func2:249\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:78"}
2024-06-16T06:48:42Z ERROR orders failed to add bandwidth usage {"Process": "storagenode", "satellite ID": "12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S", "action": "GET_AUDIT", "amount": 2304, "error": "bandwidthdb: database is locked", "errorVerbose": "bandwidthdb: database is locked\n\tstorj.io/storj/storagenode/storagenodedb.(*bandwidthDB).Add:76\n\tstorj.io/storj/storagenode/orders.(*Service).SendOrders.func2:249\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:78"}
STORJ11
2024-06-15T16:14:05Z ERROR orders failed to add bandwidth usage {"Process": "storagenode", "satellite ID": "12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S", "action": "GET_AUDIT", "amount": 768, "error": "bandwidthdb: database is locked", "errorVerbose": "bandwidthdb: database is locked\n\tstorj.io/storj/storagenode/storagenodedb.(*bandwidthDB).Add:76\n\tstorj.io/storj/storagenode/orders.(*Service).SendOrders.func2:249\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:78"}
2024-06-15T16:14:05Z ERROR orders failed to add bandwidth usage {"Process": "storagenode", "satellite ID": "121RTSDpyNZVcEU84Ticf2L1ntiuUimbWgfATz21tuvgk3vzoA6", "action": "GET", "amount": 181248, "error": "bandwidthdb: database is locked", "errorVerbose": "bandwidthdb: database is locked\n\tstorj.io/storj/storagenode/storagenodedb.(*bandwidthDB).Add:76\n\tstorj.io/storj/storagenode/orders.(*Service).SendOrders.func2:249\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:78"}
2024-06-15T16:14:15Z ERROR orders failed to add bandwidth usage {"Process": "storagenode", "satellite ID": "12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S", "action": "GET", "amount": 4707338, "error": "bandwidthdb: database is locked", "errorVerbose": "bandwidthdb: database is locked\n\tstorj.io/storj/storagenode/storagenodedb.(*bandwidthDB).Add:76\n\tstorj.io/storj/storagenode/orders.(*Service).SendOrders.func2:249\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:78"}
2024-06-15T16:14:15Z ERROR orders failed to add bandwidth usage {"Process": "storagenode", "satellite ID": "121RTSDpyNZVcEU84Ticf2L1ntiuUimbWgfATz21tuvgk3vzoA6", "action": "GET_AUDIT", "amount": 256, "error": "bandwidthdb: database is locked", "errorVerbose": "bandwidthdb: database is locked\n\tstorj.io/storj/storagenode/storagenodedb.(*bandwidthDB).Add:76\n\tstorj.io/storj/storagenode/orders.(*Service).SendOrders.func2:249\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:78"}
2024-06-15T16:14:25Z ERROR orders failed to add bandwidth usage {"Process": "storagenode", "satellite ID": "12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S", "action": "GET_REPAIR", "amount": 102144, "error": "bandwidthdb: database is locked", "errorVerbose": "bandwidthdb: database is locked\n\tstorj.io/storj/storagenode/storagenodedb.(*bandwidthDB).Add:76\n\tstorj.io/storj/storagenode/orders.(*Service).SendOrders.func2:249\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:78"}
2024-06-15T16:14:35Z ERROR orders failed to add bandwidth usage {"Process": "storagenode", "satellite ID": "12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S", "action": "PUT", "amount": 10, "error": "bandwidthdb: database is locked", "errorVerbose": "bandwidthdb: database is locked\n\tstorj.io/storj/storagenode/storagenodedb.(*bandwidthDB).Add:76\n\tstorj.io/storj/storagenode/orders.(*Service).SendOrders.func2:249\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:78"}
2024-06-15T16:51:39Z ERROR services unexpected shutdown of a runner {"Process": "storagenode", "name": "forgetsatellite:chore", "error": "database is locked"}
2024-06-15T17:07:00Z ERROR failure during run {"Process": "storagenode", "error": "database is locked"}
Error: database is locked
2024-06-15T18:23:52Z ERROR gracefulexit:chore error retrieving satellites. {"Process": "storagenode", "error": "satellitesdb: database is locked", "errorVerbose": "satellitesdb: database is locked\n\tstorj.io/storj/storagenode/storagenodedb.(*satellitesDB).ListGracefulExits:197\n\tstorj.io/storj/storagenode/gracefulexit.(*Service).ListPendingExits:59\n\tstorj.io/storj/storagenode/gracefulexit.(*Chore).AddMissing:55\n\tstorj.io/common/sync2.(*Cycle).Run:160\n\tstorj.io/storj/storagenode/gracefulexit.(*Chore).Run:48\n\tstorj.io/storj/private/lifecycle.(*Group).Run.func2.1:87\n\truntime/pprof.Do:51\n\tstorj.io/storj/private/lifecycle.(*Group).Run.func2:86\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:78"}
2024-06-15T18:24:52Z ERROR gracefulexit:chore error retrieving satellites. {"Process": "storagenode", "error": "satellitesdb: database is locked", "errorVerbose": "satellitesdb: database is locked\n\tstorj.io/storj/storagenode/storagenodedb.(*satellitesDB).ListGracefulExits:197\n\tstorj.io/storj/storagenode/gracefulexit.(*Service).ListPendingExits:59\n\tstorj.io/storj/storagenode/gracefulexit.(*Chore).AddMissing:55\n\tstorj.io/common/sync2.(*Cycle).Run:160\n\tstorj.io/storj/storagenode/gracefulexit.(*Chore).Run:48\n\tstorj.io/storj/private/lifecycle.(*Group).Run.func2.1:87\n\truntime/pprof.Do:51\n\tstorj.io/storj/private/lifecycle.(*Group).Run.func2:86\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:78"}
2024-06-15T20:08:15Z ERROR orders failed to add bandwidth usage {"Process": "storagenode", "satellite ID": "12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S", "action": "GET", "amount": 7413514, "error": "bandwidthdb: database is locked", "errorVerbose": "bandwidthdb: database is locked\n\tstorj.io/storj/storagenode/storagenodedb.(*bandwidthDB).Add:76\n\tstorj.io/storj/storagenode/orders.(*Service).SendOrders.func2:249\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:78"}
2024-06-15T20:08:15Z ERROR orders failed to add bandwidth usage {"Process": "storagenode", "satellite ID": "121RTSDpyNZVcEU84Ticf2L1ntiuUimbWgfATz21tuvgk3vzoA6", "action": "GET_AUDIT", "amount": 256, "error": "bandwidthdb: database is locked", "errorVerbose": "bandwidthdb: database is locked\n\tstorj.io/storj/storagenode/storagenodedb.(*bandwidthDB).Add:76\n\tstorj.io/storj/storagenode/orders.(*Service).SendOrders.func2:249\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:78"}
2024-06-15T20:08:16Z ERROR orders failed to add bandwidth usage {"Process": "storagenode", "satellite ID": "1wFTAgs9DP5RSnCqKV1eLf6N9wtk4EAtmN5DpSxcs8EjT69tGE", "action": "PUT", "amount": 627310592, "error": "bandwidthdb: database is locked", "errorVerbose": "bandwidthdb: database is locked\n\tstorj.io/storj/storagenode/storagenodedb.(*bandwidthDB).Add:76\n\tstorj.io/storj/storagenode/orders.(*Service).SendOrders.func2:249\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:78"}
2024-06-15T20:08:25Z ERROR orders failed to add bandwidth usage {"Process": "storagenode", "satellite ID": "12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S", "action": "GET_AUDIT", "amount": 512, "error": "bandwidthdb: database is locked", "errorVerbose": "bandwidthdb: database is locked\n\tstorj.io/storj/storagenode/storagenodedb.(*bandwidthDB).Add:76\n\tstorj.io/storj/storagenode/orders.(*Service).SendOrders.func2:249\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:78"}
2024-06-15T20:08:26Z ERROR orders failed to add bandwidth usage {"Process": "storagenode", "satellite ID": "1wFTAgs9DP5RSnCqKV1eLf6N9wtk4EAtmN5DpSxcs8EjT69tGE", "action": "GET_AUDIT", "amount": 256, "error": "bandwidthdb: database is locked", "errorVerbose": "bandwidthdb: database is locked\n\tstorj.io/storj/storagenode/storagenodedb.(*bandwidthDB).Add:76\n\tstorj.io/storj/storagenode/orders.(*Service).SendOrders.func2:249\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:78"}
STORJ16
2024-06-15T20:08:42Z ERROR orders failed to add bandwidth usage {"Process": "storagenode", "satellite ID": "121RTSDpyNZVcEU84Ticf2L1ntiuUimbWgfATz21tuvgk3vzoA6", "action": "GET_AUDIT", "amount": 512, "error": "bandwidthdb: database is locked", "errorVerbose": "bandwidthdb: database is locked\n\tstorj.io/storj/storagenode/storagenodedb.(*bandwidthDB).Add:76\n\tstorj.io/storj/storagenode/orders.(*Service).SendOrders.func2:249\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:78"}
2024-06-15T20:08:42Z ERROR orders failed to add bandwidth usage {"Process": "storagenode", "satellite ID": "12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S", "action": "GET_REPAIR", "amount": 43595008, "error": "bandwidthdb: database is locked", "errorVerbose": "bandwidthdb: database is locked"
(...)
STORJ22
2024-06-15T15:11:39Z ERROR orders failed to add bandwidth usage {"Process": "storagenode", "satellite ID": "121RTSDpyNZVcEU84Ticf2L1ntiuUimbWgfATz21tuvgk3vzoA6", "action": "PUT", "amount": 7482368, "error": "bandwidthdb: database is locked", "errorVerbose": "bandwidthdb: database is locked\n\tstorj.io/storj/storagenode/storagenodedb.(*bandwidthDB).Add:76\n\tstorj.io/storj/storagenode/orders.(*Service).SendOrders.func2:249\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:78"}
2024-06-15T15:11:40Z ERROR orders failed to add bandwidth usage {"Process": "storagenode", "satellite ID": "1wFTAgs9DP5RSnCqKV1eLf6N9wtk4EAtmN5DpSxcs8EjT69tGE", "action": "PUT", "amount": 475791360, "error": "bandwidthdb: database is locked", "errorVerbose": "bandwidthdb: database is locked\n\tstorj.io/storj/storagenode/storagenodedb.(*bandwidthDB).Add:76\n\tstorj.io/storj/storagenode/orders.(*Service).SendOrders.func2:249\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:78"}
2024-06-15T16:11:43Z ERROR orders failed to add bandwidth usage {"Process": "storagenode", "satellite ID": "121RTSDpyNZVcEU84Ticf2L1ntiuUimbWgfATz21tuvgk3vzoA6", "action": "GET", "amount": 362496, "error": "bandwidthdb: database is locked", "errorVerbose": "bandwidthdb: database is locked\n\tstorj.io/storj/storagenode/storagenodedb.(*bandwidthDB).Add:76\n\tstorj.io/storj/storagenode/orders.(*Service).SendOrders.func2:249\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:78"}
2024-06-15T16:11:43Z ERROR orders failed to add bandwidth usage {"Process": "storagenode", "satellite ID": "12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S", "action": "GET_AUDIT", "amount": 512, "error": "bandwidthdb: database is locked", "errorVerbose": "bandwidthdb: database is locked\n\tstorj.io/storj/storagenode/storagenodedb.(*bandwidthDB).Add:76\n\tstorj.io/storj/storagenode/orders.(*Service).SendOrders.func2:249\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:78"}
2024-06-15T16:11:53Z ERROR orders failed to add bandwidth usage {"Process": "storagenode", "satellite ID": "12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S", "action": "PUT", "amount": 10, "error": "bandwidthdb: database is locked", "errorVerbose": "bandwidthdb: database is locked\n\tstorj.io/storj/storagenode/storagenodedb.(*bandwidthDB).Add:76\n\tstorj.io/storj/storagenode/orders.(*Service).SendOrders.func2:249\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:78"}
2024-06-15T16:12:03Z ERROR orders failed to add bandwidth usage {"Process": "storagenode", "satellite ID": "12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S", "action": "GET", "amount": 8182794, "error": "bandwidthdb: database is locked", "errorVerbose": "bandwidthdb: database is locked\n\tstorj.io/storj/storagenode/storagenodedb.(*bandwidthDB).Add:76\n\tstorj.io/storj/storagenode/orders.(*Service).SendOrders.func2:249\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:78"}
2024-06-15T16:12:13Z ERROR orders failed to add bandwidth usage {"Process": "storagenode", "satellite ID": "12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S", "action": "GET_REPAIR", "amount": 3797504, "error": "bandwidthdb: database is locked", "errorVerbose": "bandwidthdb: database is locked\n\tstorj.io/storj/storagenode/storagenodedb.(*bandwidthDB).Add:76\n\tstorj.io/storj/storagenode/orders.(*Service).SendOrders.func2:249\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:78"}
2024-06-15T16:17:28Z ERROR gracefulexit:chore error retrieving satellites. {"Process": "storagenode", "error": "satellitesdb: database is locked", "errorVerbose": "satellitesdb: database is locked\n\tstorj.io/storj/storagenode/storagenodedb.(*satellitesDB).ListGracefulExits:197\n\tstorj.io/storj/storagenode/gracefulexit.(*Service).ListPendingExits:59\n\tstorj.io/storj/storagenode/gracefulexit.(*Chore).AddMissing:55\n\tstorj.io/common/sync2.(*Cycle).Run:160\n\tstorj.io/storj/storagenode/gracefulexit.(*Chore).Run:48\n\tstorj.io/storj/private/lifecycle.(*Group).Run.func2.1:87\n\truntime/pprof.Do:51\n\tstorj.io/storj/private/lifecycle.(*Group).Run.func2:86\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:78"}
2024-06-15T16:48:29Z ERROR gracefulexit:chore error retrieving satellites. {"Process": "storagenode", "error": "satellitesdb: database is locked", "errorVerbose": "satellitesdb: database is locked\n\tstorj.io/storj/storagenode/storagenodedb.(*satellitesDB).ListGracefulExits:197\n\tstorj.io/storj/storagenode/gracefulexit.(*Service).ListPendingExits:59\n\tstorj.io/storj/storagenode/gracefulexit.(*Chore).AddMissing:55\n\tstorj.io/common/sync2.(*Cycle).Run:160\n\tstorj.io/storj/storagenode/gracefulexit.(*Chore).Run:48\n\tstorj.io/storj/private/lifecycle.(*Group).Run.func2.1:87\n\truntime/pprof.Do:51\n\tstorj.io/storj/private/lifecycle.(*Group).Run.func2:86\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:78"}
2024-06-15T18:52:28Z ERROR gracefulexit:chore error retrieving satellites. {"Process": "storagenode", "error": "satellitesdb: database is locked", "errorVerbose": "satellitesdb: database is locked\n\tstorj.io/storj/storagenode/storagenodedb.(*satellitesDB).ListGracefulExits:197\n\tstorj.io/storj/storagenode/gracefulexit.(*Service).ListPendingExits:59\n\tstorj.io/storj/storagenode/gracefulexit.(*Chore).AddMissing:55\n\tstorj.io/common/sync2.(*Cycle).Run:160\n\tstorj.io/storj/storagenode/gracefulexit.(*Chore).Run:48\n\tstorj.io/storj/private/lifecycle.(*Group).Run.func2.1:87\n\truntime/pprof.Do:51\n\tstorj.io/storj/private/lifecycle.(*Group).Run.func2:86\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:78"}
2024-06-15T18:53:28Z ERROR gracefulexit:chore error retrieving satellites. {"Process": "storagenode", "error": "satellitesdb: database is locked", "errorVerbose": "satellitesdb: database is locked\n\tstorj.io/storj/storagenode/storagenodedb.(*satellitesDB).ListGracefulExits:197\n\tstorj.io/storj/storagenode/gracefulexit.(*Service).ListPendingExits:59\n\tstorj.io/storj/storagenode/gracefulexit.(*Chore).AddMissing:55\n\tstorj.io/common/sync2.(*Cycle).Run:160\n\tstorj.io/storj/storagenode/gracefulexit.(*Chore).Run:48\n\tstorj.io/storj/private/lifecycle.(*Group).Run.func2.1:87\n\truntime/pprof.Do:51\n\tstorj.io/storj/private/lifecycle.(*Group).Run.func2:86\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:78"}
2024-06-15T18:54:28Z ERROR gracefulexit:chore error retrieving satellites. {"Process": "storagenode", "error": "satellitesdb: database is locked", "errorVerbose": "satellitesdb: database is locked\n\tstorj.io/storj/storagenode/storagenodedb.(*satellitesDB).ListGracefulExits:197\n\tstorj.io/storj/storagenode/gracefulexit.(*Service).ListPendingExits:59\n\tstorj.io/storj/storagenode/gracefulexit.(*Chore).AddMissing:55\n\tstorj.io/common/sync2.(*Cycle).Run:160\n\tstorj.io/storj/storagenode/gracefulexit.(*Chore).Run:48\n\tstorj.io/storj/private/lifecycle.(*Group).Run.func2.1:87\n\truntime/pprof.Do:51\n\tstorj.io/storj/private/lifecycle.(*Group).Run.func2:86\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:78"}
2024-06-15T18:55:28Z ERROR gracefulexit:chore error retrieving satellites. {"Process": "storagenode", "error": "satellitesdb: database is locked", "errorVerbose": "satellitesdb: database is locked\n\tstorj.io/storj/storagenode/storagenodedb.(*satellitesDB).ListGracefulExits:197\n\tstorj.io/storj/storagenode/gracefulexit.(*Service).ListPendingExits:59\n\tstorj.io/storj/storagenode/gracefulexit.(*Chore).AddMissing:55\n\tstorj.io/common/sync2.(*Cycle).Run:160\n\tstorj.io/storj/storagenode/gracefulexit.(*Chore).Run:48\n\tstorj.io/storj/private/lifecycle.(*Group).Run.func2.1:87\n\truntime/pprof.Do:51\n\tstorj.io/storj/private/lifecycle.(*Group).Run.func2:86\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:78"}
2024-06-15T22:00:28Z ERROR services unexpected shutdown of a runner {"Process": "storagenode", "name": "forgetsatellite:chore", "error": "database is locked"}
2024-06-15T22:01:32Z ERROR failure during run {"Process": "storagenode", "error": "database is locked"}
Error: database is locked
STORJ23
2024-06-15T14:58:36Z ERROR orders failed to add bandwidth usage {"Process": "storagenode", "satellite ID": "121RTSDpyNZVcEU84Ticf2L1ntiuUimbWgfATz21tuvgk3vzoA6", "action": "GET", "amount": 3229952, "error": "bandwidthdb: database is locked", "errorVerbose": "bandwidthdb: database is locked\n\tstorj.io/storj/storagenode/storagenodedb.(*bandwidthDB).Add:76\n\tstorj.io/storj/storagenode/orders.(*Service).SendOrders.func2:249\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:78"}
2024-06-15T14:58:46Z ERROR orders failed to add bandwidth usage {"Process": "storagenode", "satellite ID": "121RTSDpyNZVcEU84Ticf2L1ntiuUimbWgfATz21tuvgk3vzoA6", "action": "GET_AUDIT", "amount": 768, "error": "bandwidthdb: database is locked", "errorVerbose": "bandwidthdb: database is locked\n\tstorj.io/storj/storagenode/storagenodedb.(*bandwidthDB).Add:76\n\tstorj.io/storj/storagenode/orders.(*Service).SendOrders.func2:249\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:78"}
2024-06-15T16:02:14Z ERROR services unexpected shutdown of a runner {"Process": "storagenode", "name": "forgetsatellite:chore", "error": "database is locked"}
2024-06-15T16:18:15Z ERROR failure during run {"Process": "storagenode", "error": "database is locked"}
Error: database is locked
2024-06-15T18:57:39Z ERROR gracefulexit:chore error retrieving satellites. {"Process": "storagenode", "error": "satellitesdb: database is locked", "errorVerbose": "satellitesdb: database is locked\n\tstorj.io/storj/storagenode/storagenodedb.(*satellitesDB).ListGracefulExits:197\n\tstorj.io/storj/storagenode/gracefulexit.(*Service).ListPendingExits:59\n\tstorj.io/storj/storagenode/gracefulexit.(*Chore).AddMissing:55\n\tstorj.io/common/sync2.(*Cycle).Run:160\n\tstorj.io/storj/storagenode/gracefulexit.(*Chore).Run:48\n\tstorj.io/storj/private/lifecycle.(*Group).Run.func2.1:87\n\truntime/pprof.Do:51\n\tstorj.io/storj/private/lifecycle.(*Group).Run.func2:86\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:78"}
I think the TS might be right…
It’s probably not primarily a hardware issue.
root@VM-HOST:/var/lib/lxc# iostat -x /dev/nvme0n1
Linux 6.1.0-21-amd64 (VM-HOST) 17-06-24 _x86_64_ (8 CPU)
avg-cpu: %user %nice %system %iowait %steal %idle
2,65 0,49 4,06 73,74 0,00 19,06
Device r/s rkB/s rrqm/s %rrqm r_await rareq-sz w/s wkB/s wrqm/s %wrqm w_await wareq-sz d/s dkB/s drqm/s %drqm d_await dareq-sz f/s f_await aqu-sz %util
nvme0n1 41,13 1035,76 1,35 3,19 0,64 25,18 42,01 911,61 3,93 8,55 2,12 21,70 2,36 305,93 0,00 0,00 0,61 129,84 0,31 0,33 0,12 4,93
Doesn’t look like the drive is overwhelmed…
root@VM-HOST:/var/lib/lxc# free -m
total used free shared buff/cache available
Mem: 47781 14918 1146 6202 40978 32863
Swap: 45392 4028 41363
Neither is memory…
I think it’s the same issue as the one I posted.
And, no, these databases contain no errors. Also the frequency of those errors would be too low in that case. Pragma checks are ok so far.
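For anyone wanting to run the same check, it’s just the sqlite3 CLI against each database file (with the node stopped, or on a copy; adjust the path to wherever your databases live), e.g.:
sqlite3 /storj/DBs/piece_expiration.db "PRAGMA integrity_check;"
It prints “ok” if the file is healthy.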
Also no malformed issues:
root@VM-HOST:/var/lib/lxc# for i in STORJ[1-9]*; do echo $'\n\n'$i; lxc-attach $i -- docker logs storagenode 2> /dev/null | grep malformed; done
STORJ10
STORJ11
STORJ16
STORJ17
STORJ18
STORJ22
STORJ23
STORJ4
STORJ6
I do not have that max_open_conns parameter uncommented in my config.
One piece of happy news: now that running out of room has stopped the constant ingress, the lazy filewalker is starting to move a bit faster. It’s now up to “ra” on the first satellite. That probably still means it’s going to take a week, but that’s better than the near month it was looking like. I’ll almost certainly get an auto-update to interrupt it before it can finish tho, sigh.
Thanks!
Could you please show the size of these databases?
Thank you for the info!
Do you see “database is locked” errors on the node where databases are on SSD?
I already answered that in the post you just replied to.
I will add that the piece_expiration.db on the SSD does not appear to be growing further. Still showing 1.4GB.
Oh, yes, sorry I overlooked it.
No, I wouldn’t. It’s too suspicious that these errors can happen even when the DBs are on an SSD, so I think there could be some relation between the size of the database and the frequency of “database is locked” errors.
Your node is full, so there are no uploads and no modifications to that database. As soon as some data is removed, the ingress would likely start again. But I’m not sure that this database would shrink.
My biggest node now has the “database is locked” errors too, but so far only for piece_expiration.db (the databases are on the HDD). It wasn’t the case for many years.
So I believe that @Toyoo is right:
And your HW is actually ok, but we need a new feature to change how we work with that database.
As I have said, I have only ever experienced the locking hitting a single db at a time, and which one it is seems to change with each restart of the node, with the chance of a given db being the one that gets locked proportional to its size as a percentage of the whole. I don’t know how big your biggest node is, but if it’s small enough that piece_expiration.db isn’t too much bigger than bandwidth.db, maybe only 5 or 6x bigger, then try restarting the node a few times. Eventually it’ll be bandwidth.db, and only bandwidth.db, that gets all the locking.
It’s not so big; I used an existing, not fully used disk. The two others are 1.4TB and 0.9TB. All nodes are full at the moment.
Just checked. It should be 7TB, but shows 6.75TB on the dashboard. Interesting.
My 16TB disk has 16TB of data, minus the 5GB keeping it from crashing. The node says that I have 14.5TB allocated, 11.14TB used, 1.32TB trash, and 2.04TB free.
Not as much as is being complained about by others:
root@VM-HOST:~# cd /var/lib/lxc; for i in STORJ[1-9]*; do echo $'\n\n'$i; lxc-attach $i -- ls -lSh /storj/DBs; done
STORJ10
totaal 830M
-rw-r--r-- 1 root root 829M 17 jun 07:18 piece_expiration.db
-rw-r--r-- 1 root root 268K 16 jun 22:22 storage_usage.db
-rw-r--r-- 1 root root 212K 17 jun 07:18 bandwidth.db
-rw-r--r-- 1 root root 52K 16 jun 22:22 heldamount.db
-rw-r--r-- 1 root root 32K 15 jun 22:18 orders.db
-rw-r--r-- 1 root root 32K 17 jun 07:23 piece_spaced_used.db-shm
-rw-r--r-- 1 root root 32K 17 jun 06:21 reputation.db
-rw-r--r-- 1 root root 32K 17 jun 07:33 satellites.db-shm
-rw-r--r-- 1 root root 28K 15 jun 22:18 satellites.db
-rw-r--r-- 1 root root 24K 15 jun 22:18 garbage_collection_filewalker_progress.db
-rw-r--r-- 1 root root 24K 15 jun 22:18 notifications.db
-rw-r--r-- 1 root root 24K 15 jun 22:18 pieceinfo.db
-rw-r--r-- 1 root root 24K 17 jun 06:53 piece_spaced_used.db
-rw-r--r-- 1 root root 24K 15 jun 22:18 pricing.db
-rw-r--r-- 1 root root 24K 15 jun 22:18 secret.db
-rw-r--r-- 1 root root 24K 15 jun 22:18 used_space_per_prefix.db
-rw-r--r-- 1 root root 16K 15 jun 22:18 info.db
-rw-r--r-- 1 root root 16K 15 jun 22:18 used_serial.db
-rw-r--r-- 1 root root 8,1K 17 jun 07:23 piece_spaced_used.db-wal
-rw-r--r-- 1 root root 0 17 jun 07:33 satellites.db-wal
STORJ11
totaal 396M
-rw-r--r-- 1 root root 396M 17 jun 07:37 piece_expiration.db
-rw-r--r-- 1 root root 88K 17 jun 07:38 storage_usage.db
-rw-r--r-- 1 root root 68K 17 jun 07:38 bandwidth.db
-rw-r--r-- 1 root root 32K 17 jun 07:38 heldamount.db
-rw-r--r-- 1 root root 32K 15 jun 19:37 orders.db
-rw-r--r-- 1 root root 32K 17 jun 07:37 orders.db-shm
-rw-r--r-- 1 root root 32K 17 jun 07:16 satellites.db-shm
-rw-r--r-- 1 root root 28K 15 jun 19:51 satellites.db
-rw-r--r-- 1 root root 24K 15 jun 19:37 garbage_collection_filewalker_progress.db
-rw-r--r-- 1 root root 24K 15 jun 19:38 notifications.db
-rw-r--r-- 1 root root 24K 15 jun 19:37 pieceinfo.db
-rw-r--r-- 1 root root 24K 17 jun 07:42 piece_spaced_used.db
-rw-r--r-- 1 root root 24K 15 jun 19:38 pricing.db
-rw-r--r-- 1 root root 24K 17 jun 07:38 reputation.db
-rw-r--r-- 1 root root 24K 15 jun 19:40 secret.db
-rw-r--r-- 1 root root 24K 15 jun 19:38 used_space_per_prefix.db
-rw-r--r-- 1 root root 16K 15 jun 19:37 info.db
-rw-r--r-- 1 root root 16K 15 jun 19:37 used_serial.db
-rw-r--r-- 1 root root 0 17 jun 07:37 orders.db-wal
-rw-r--r-- 1 root root 0 17 jun 07:16 satellites.db-wal
STORJ16
totaal 66M
-rw-r--r-- 1 root root 65M 17 jun 07:38 piece_expiration.db
-rw-r--r-- 1 root root 104K 16 jun 21:42 storage_usage.db
-rw-r--r-- 1 root root 76K 17 jun 07:38 bandwidth.db
-rw-r--r-- 1 root root 32K 16 jun 21:42 heldamount.db
-rw-r--r-- 1 root root 32K 15 jun 21:38 orders.db
-rw-r--r-- 1 root root 32K 17 jun 07:23 orders.db-shm
-rw-r--r-- 1 root root 32K 17 jun 07:19 satellites.db-shm
-rw-r--r-- 1 root root 28K 15 jun 21:38 satellites.db
-rw-r--r-- 1 root root 24K 15 jun 21:38 garbage_collection_filewalker_progress.db
-rw-r--r-- 1 root root 24K 15 jun 21:38 notifications.db
-rw-r--r-- 1 root root 24K 15 jun 21:38 pieceinfo.db
-rw-r--r-- 1 root root 24K 17 jun 07:42 piece_spaced_used.db
-rw-r--r-- 1 root root 24K 15 jun 21:38 pricing.db
-rw-r--r-- 1 root root 24K 17 jun 05:38 reputation.db
-rw-r--r-- 1 root root 24K 15 jun 21:38 secret.db
-rw-r--r-- 1 root root 24K 15 jun 21:38 used_space_per_prefix.db
-rw-r--r-- 1 root root 16K 15 jun 21:38 info.db
-rw-r--r-- 1 root root 16K 15 jun 21:38 used_serial.db
-rw-r--r-- 1 root root 0 17 jun 07:23 orders.db-wal
-rw-r--r-- 1 root root 0 17 jun 07:19 satellites.db-wal
STORJ17
totaal 58M
-rw-r--r-- 1 root root 56M 17 jun 07:25 piece_expiration.db
-rw-r--r-- 1 root root 580K 17 jun 07:29 piece_expiration.db-wal
-rw-r--r-- 1 root root 290K 17 jun 07:32 heldamount.db-wal
-rw-r--r-- 1 root root 182K 17 jun 07:32 storage_usage.db-wal
-rw-r--r-- 1 root root 100K 16 jun 21:27 storage_usage.db
-rw-r--r-- 1 root root 80K 17 jun 07:26 bandwidth.db
-rw-r--r-- 1 root root 77K 17 jun 07:30 bandwidth.db-wal
-rw-r--r-- 1 root root 65K 17 jun 07:29 pricing.db-wal
-rw-r--r-- 1 root root 65K 17 jun 07:31 reputation.db-wal
-rw-r--r-- 1 root root 41K 17 jun 07:30 piece_spaced_used.db-wal
-rw-r--r-- 1 root root 33K 17 jun 07:29 garbage_collection_filewalker_progress.db-wal
-rw-r--r-- 1 root root 33K 17 jun 07:29 info.db-wal
-rw-r--r-- 1 root root 33K 17 jun 07:29 notifications.db-wal
-rw-r--r-- 1 root root 33K 17 jun 07:29 orders.db-wal
-rw-r--r-- 1 root root 33K 17 jun 07:29 pieceinfo.db-wal
-rw-r--r-- 1 root root 33K 17 jun 07:29 satellites.db-wal
-rw-r--r-- 1 root root 33K 17 jun 07:29 secret.db-wal
-rw-r--r-- 1 root root 33K 17 jun 07:29 used_serial.db-wal
-rw-r--r-- 1 root root 33K 17 jun 07:29 used_space_per_prefix.db-wal
-rw-r--r-- 1 root root 32K 17 jun 07:30 bandwidth.db-shm
-rw-r--r-- 1 root root 32K 17 jun 07:29 garbage_collection_filewalker_progress.db-shm
-rw-r--r-- 1 root root 32K 16 jun 21:27 heldamount.db
-rw-r--r-- 1 root root 32K 17 jun 07:32 heldamount.db-shm
-rw-r--r-- 1 root root 32K 17 jun 07:29 info.db-shm
-rw-r--r-- 1 root root 32K 17 jun 07:29 notifications.db-shm
-rw-r--r-- 1 root root 32K 15 jun 21:19 orders.db
-rw-r--r-- 1 root root 32K 17 jun 07:29 orders.db-shm
-rw-r--r-- 1 root root 32K 17 jun 07:29 piece_expiration.db-shm
-rw-r--r-- 1 root root 32K 17 jun 07:29 pieceinfo.db-shm
-rw-r--r-- 1 root root 32K 17 jun 07:30 piece_spaced_used.db-shm
-rw-r--r-- 1 root root 32K 17 jun 07:29 pricing.db-shm
-rw-r--r-- 1 root root 32K 17 jun 07:31 reputation.db-shm
-rw-r--r-- 1 root root 32K 17 jun 07:29 satellites.db-shm
-rw-r--r-- 1 root root 32K 17 jun 07:29 secret.db-shm
-rw-r--r-- 1 root root 32K 17 jun 07:32 storage_usage.db-shm
-rw-r--r-- 1 root root 32K 17 jun 07:29 used_serial.db-shm
-rw-r--r-- 1 root root 32K 17 jun 07:29 used_space_per_prefix.db-shm
-rw-r--r-- 1 root root 28K 15 jun 21:19 satellites.db
-rw-r--r-- 1 root root 24K 15 jun 21:25 garbage_collection_filewalker_progress.db
-rw-r--r-- 1 root root 24K 15 jun 21:19 notifications.db
-rw-r--r-- 1 root root 24K 15 jun 21:19 pieceinfo.db
-rw-r--r-- 1 root root 24K 17 jun 07:26 piece_spaced_used.db
-rw-r--r-- 1 root root 24K 15 jun 21:25 pricing.db
-rw-r--r-- 1 root root 24K 17 jun 05:31 reputation.db
-rw-r--r-- 1 root root 24K 15 jun 21:25 secret.db
-rw-r--r-- 1 root root 24K 15 jun 21:25 used_space_per_prefix.db
-rw-r--r-- 1 root root 16K 15 jun 21:19 info.db
-rw-r--r-- 1 root root 16K 15 jun 21:19 used_serial.db
STORJ18
totaal 376M
-rw-r--r-- 1 root root 371M 17 jun 07:42 piece_expiration.db
-rw-r--r-- 1 root root 5,1M 17 jun 07:42 piece_expiration.db-wal
-rw-r--r-- 1 root root 96K 17 jun 01:46 storage_usage.db
-rw-r--r-- 1 root root 80K 17 jun 06:47 bandwidth.db
-rw-r--r-- 1 root root 41K 17 jun 07:17 bandwidth.db-wal
-rw-r--r-- 1 root root 32K 17 jun 07:17 bandwidth.db-shm
-rw-r--r-- 1 root root 32K 17 jun 01:46 heldamount.db
-rw-r--r-- 1 root root 32K 17 jun 01:46 orders.db
-rw-r--r-- 1 root root 32K 17 jun 07:16 orders.db-shm
-rw-r--r-- 1 root root 32K 17 jun 07:42 piece_expiration.db-shm
-rw-r--r-- 1 root root 32K 17 jun 07:16 pieceinfo.db-shm
-rw-r--r-- 1 root root 32K 17 jun 06:54 satellites.db-shm
-rw-r--r-- 1 root root 28K 17 jun 01:46 satellites.db
-rw-r--r-- 1 root root 24K 17 jun 01:46 garbage_collection_filewalker_progress.db
-rw-r--r-- 1 root root 24K 17 jun 01:46 notifications.db
-rw-r--r-- 1 root root 24K 17 jun 01:46 pieceinfo.db
-rw-r--r-- 1 root root 24K 17 jun 01:46 piece_spaced_used.db
-rw-r--r-- 1 root root 24K 17 jun 01:46 pricing.db
-rw-r--r-- 1 root root 24K 17 jun 05:49 reputation.db
-rw-r--r-- 1 root root 24K 17 jun 01:46 secret.db
-rw-r--r-- 1 root root 24K 17 jun 01:46 used_space_per_prefix.db
-rw-r--r-- 1 root root 16K 17 jun 01:46 info.db
-rw-r--r-- 1 root root 16K 17 jun 01:46 used_serial.db
-rw-r--r-- 1 root root 0 17 jun 07:16 orders.db-wal
-rw-r--r-- 1 root root 0 17 jun 07:16 pieceinfo.db-wal
-rw-r--r-- 1 root root 0 17 jun 06:54 satellites.db-wal
STORJ22
totaal 307M
-rw-r--r-- 1 root root 306M 17 jun 07:31 piece_expiration.db
-rw-r--r-- 1 root root 96K 17 jun 00:33 storage_usage.db
-rw-r--r-- 1 root root 76K 17 jun 07:31 bandwidth.db
-rw-r--r-- 1 root root 32K 17 jun 00:33 heldamount.db
-rw-r--r-- 1 root root 32K 16 jun 00:31 orders.db
-rw-r--r-- 1 root root 32K 17 jun 07:36 orders.db-shm
-rw-r--r-- 1 root root 32K 17 jun 07:25 satellites.db-shm
-rw-r--r-- 1 root root 28K 16 jun 00:31 satellites.db
-rw-r--r-- 1 root root 24K 16 jun 00:31 garbage_collection_filewalker_progress.db
-rw-r--r-- 1 root root 24K 16 jun 00:31 notifications.db
-rw-r--r-- 1 root root 24K 16 jun 00:31 pieceinfo.db
-rw-r--r-- 1 root root 24K 17 jun 07:32 piece_spaced_used.db
-rw-r--r-- 1 root root 24K 16 jun 00:31 pricing.db
-rw-r--r-- 1 root root 24K 17 jun 04:34 reputation.db
-rw-r--r-- 1 root root 24K 16 jun 00:31 secret.db
-rw-r--r-- 1 root root 24K 16 jun 00:31 used_space_per_prefix.db
-rw-r--r-- 1 root root 16K 16 jun 00:31 info.db
-rw-r--r-- 1 root root 16K 16 jun 00:31 used_serial.db
-rw-r--r-- 1 root root 0 17 jun 07:36 orders.db-wal
-rw-r--r-- 1 root root 0 17 jun 07:25 satellites.db-wal
STORJ23
totaal 1,6G
-rw-r--r-- 1 root root 1,6G 17 jun 07:16 piece_expiration.db
-rw-r--r-- 1 root root 96K 17 jun 01:22 storage_usage.db
-rw-r--r-- 1 root root 76K 17 jun 07:16 bandwidth.db
-rw-r--r-- 1 root root 32K 17 jun 01:22 heldamount.db
-rw-r--r-- 1 root root 32K 16 jun 01:16 orders.db
-rw-r--r-- 1 root root 32K 17 jun 07:31 orders.db-shm
-rw-r--r-- 1 root root 32K 17 jun 07:30 satellites.db-shm
-rw-r--r-- 1 root root 28K 16 jun 01:16 satellites.db
-rw-r--r-- 1 root root 24K 16 jun 01:16 garbage_collection_filewalker_progress.db
-rw-r--r-- 1 root root 24K 16 jun 01:16 notifications.db
-rw-r--r-- 1 root root 24K 16 jun 01:16 pieceinfo.db
-rw-r--r-- 1 root root 24K 17 jun 07:19 piece_spaced_used.db
-rw-r--r-- 1 root root 24K 16 jun 01:16 pricing.db
-rw-r--r-- 1 root root 24K 17 jun 05:18 reputation.db
-rw-r--r-- 1 root root 24K 16 jun 01:16 secret.db
-rw-r--r-- 1 root root 24K 16 jun 01:16 used_space_per_prefix.db
-rw-r--r-- 1 root root 16K 16 jun 01:16 info.db
-rw-r--r-- 1 root root 16K 16 jun 01:16 used_serial.db
-rw-r--r-- 1 root root 0 17 jun 07:31 orders.db-wal
-rw-r--r-- 1 root root 0 17 jun 07:30 satellites.db-wal
STORJ4
totaal 243M
-rw-r--r-- 1 root root 242M 17 jun 07:05 piece_expiration.db
-rw-r--r-- 1 root root 248K 17 jun 01:34 storage_usage.db
-rw-r--r-- 1 root root 156K 17 jun 07:34 bandwidth.db
-rw-r--r-- 1 root root 48K 17 jun 01:34 heldamount.db
-rw-r--r-- 1 root root 32K 17 jun 01:34 orders.db
-rw-r--r-- 1 root root 32K 17 jun 07:34 orders.db-shm
-rw-r--r-- 1 root root 32K 17 jun 05:20 reputation.db
-rw-r--r-- 1 root root 32K 17 jun 07:34 satellites.db-shm
-rw-r--r-- 1 root root 28K 17 jun 01:34 satellites.db
-rw-r--r-- 1 root root 24K 17 jun 01:34 garbage_collection_filewalker_progress.db
-rw-r--r-- 1 root root 24K 17 jun 01:34 notifications.db
-rw-r--r-- 1 root root 24K 17 jun 01:34 pieceinfo.db
-rw-r--r-- 1 root root 24K 17 jun 01:34 piece_spaced_used.db
-rw-r--r-- 1 root root 24K 17 jun 01:34 pricing.db
-rw-r--r-- 1 root root 24K 17 jun 01:34 secret.db
-rw-r--r-- 1 root root 24K 17 jun 01:34 used_space_per_prefix.db
-rw-r--r-- 1 root root 16K 17 jun 01:34 info.db
-rw-r--r-- 1 root root 16K 17 jun 01:34 used_serial.db
-rw-r--r-- 1 root root 0 17 jun 07:34 orders.db-wal
-rw-r--r-- 1 root root 0 17 jun 07:34 satellites.db-wal
STORJ6
totaal 752M
-rw-r--r-- 1 root root 751M 17 jun 06:48 piece_expiration.db
-rw-r--r-- 1 root root 455K 17 jun 07:18 piece_expiration.db-wal
-rw-r--r-- 1 root root 96K 17 jun 06:51 storage_usage.db
-rw-r--r-- 1 root root 68K 17 jun 06:48 bandwidth.db
-rw-r--r-- 1 root root 32K 17 jun 07:18 bandwidth.db-shm
-rw-r--r-- 1 root root 32K 17 jun 06:51 heldamount.db
-rw-r--r-- 1 root root 32K 15 jun 18:48 orders.db
-rw-r--r-- 1 root root 32K 17 jun 07:18 orders.db-shm
-rw-r--r-- 1 root root 32K 17 jun 07:18 piece_expiration.db-shm
-rw-r--r-- 1 root root 32K 17 jun 07:18 pieceinfo.db-shm
-rw-r--r-- 1 root root 32K 17 jun 07:19 piece_spaced_used.db-shm
-rw-r--r-- 1 root root 32K 17 jun 06:49 satellites.db-shm
-rw-r--r-- 1 root root 28K 15 jun 18:48 satellites.db
-rw-r--r-- 1 root root 24K 15 jun 18:48 garbage_collection_filewalker_progress.db
-rw-r--r-- 1 root root 24K 15 jun 18:48 notifications.db
-rw-r--r-- 1 root root 24K 15 jun 18:48 pieceinfo.db
-rw-r--r-- 1 root root 24K 17 jun 06:49 piece_spaced_used.db
-rw-r--r-- 1 root root 24K 15 jun 18:48 pricing.db
-rw-r--r-- 1 root root 24K 17 jun 06:50 reputation.db
-rw-r--r-- 1 root root 24K 15 jun 18:48 secret.db
-rw-r--r-- 1 root root 24K 15 jun 18:48 used_space_per_prefix.db
-rw-r--r-- 1 root root 17K 17 jun 07:18 bandwidth.db-wal
-rw-r--r-- 1 root root 16K 15 jun 18:48 info.db
-rw-r--r-- 1 root root 16K 15 jun 18:48 used_serial.db
-rw-r--r-- 1 root root 8,1K 17 jun 07:19 piece_spaced_used.db-wal
-rw-r--r-- 1 root root 0 17 jun 07:18 orders.db-wal
-rw-r--r-- 1 root root 0 17 jun 07:18 pieceinfo.db-wal
-rw-r--r-- 1 root root 0 17 jun 06:49 satellites.db-wal
satellites.db and bandwidth.db aren’t usually the biggest ones, yet they are the ones complained about most in the logs (so size probably isn’t the diagnosis, at least not for the individual files). On every restart of my nodes I vacuum them, so a lack of optimization isn’t the cause either.
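The vacuum itself is nothing exotic, just SQLite’s VACUUM against each file while the node is stopped, along the lines of:
for db in /storj/DBs/*.db; do sqlite3 "$db" "VACUUM;"; done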
Yes, this information should be updated by the used-space-filewalker when it finishes its scan for each trusted satellite, since it now doesn’t have a “database is locked” issue.
Do all these nodes use the same SSD to store their databases?
Unfortunately, dmesg was newly created on startup.
In my case, yes, too.
pi@raspberrypi:~/storjDatabasesLocal $ ls -lh
total 792M
-rwxr-xr-x 1 pi pi 83M Jun 17 08:27 bandwidth.db
-rwxr-xr-x 1 pi pi 32K Jun 17 08:28 bandwidth.db-shm
-rwxr-xr-x 1 pi pi 0 Jun 17 08:28 bandwidth.db-wal
-rw-r--r-- 1 pi pi 24K Jun 16 10:29 garbage_collection_filewalker_progress.db
-rwxr-xr-x 1 pi pi 136K Jun 16 22:30 heldamount.db
-rwxr-xr-x 1 pi pi 32K Jun 17 08:43 heldamount.db-shm
-rwxr-xr-x 1 pi pi 0 Jun 17 08:43 heldamount.db-wal
-rwxr-xr-x 1 pi pi 16K Jun 16 10:29 info.db
-rwxr-xr-x 1 pi pi 24K Jun 16 10:29 notifications.db
-rwxr-xr-x 1 pi pi 32K Jun 17 08:43 notifications.db-shm
-rwxr-xr-x 1 pi pi 0 Jun 17 08:43 notifications.db-wal
-rwxr-xr-x 1 pi pi 32K Jun 16 10:29 orders.db
-rwxr-xr-x 1 pi pi 705M Jun 17 08:32 piece_expiration.db
-rwxr-xr-x 1 pi pi 32K Jun 17 08:54 piece_expiration.db-shm
-rwxr-xr-x 1 pi pi 2.3M Jun 17 08:54 piece_expiration.db-wal
-rwxr-xr-x 1 pi pi 24K Jun 16 10:29 pieceinfo.db
-rwxr-xr-x 1 pi pi 24K Jun 16 10:29 piece_spaced_used.db
-rwxr-xr-x 1 pi pi 24K Jun 16 10:29 pricing.db
-rwxr-xr-x 1 pi pi 32K Jun 17 08:33 pricing.db-shm
-rwxr-xr-x 1 pi pi 0 Jun 17 08:33 pricing.db-wal
-rwxr-xr-x 1 pi pi 36K Jun 17 06:26 reputation.db
-rwxr-xr-x 1 pi pi 32K Jun 17 08:33 reputation.db-shm
-rwxr-xr-x 1 pi pi 0 Jun 17 08:33 reputation.db-wal
-rwxr-xr-x 1 pi pi 32K Jun 16 10:59 satellites.db
-rwxr-xr-x 1 pi pi 32K Jun 17 08:30 satellites.db-shm
-rwxr-xr-x 1 pi pi 0 Jun 17 08:30 satellites.db-wal
-rwxr-xr-x 1 pi pi 24K Jun 16 10:29 secret.db
-rwxr-xr-x 1 pi pi 1.2M Jun 16 22:02 storage_usage.db
-rwxr-xr-x 1 pi pi 32K Jun 17 08:33 storage_usage.db-shm
-rwxr-xr-x 1 pi pi 0 Jun 17 08:33 storage_usage.db-wal
-rwxr-xr-x 1 pi pi 20K Jun 16 10:29 used_serial.db
-rw-r--r-- 1 pi pi 24K Jun 16 10:29 used_space_per_prefix.db
Then we need to wait for the next occurrence, because it’s not the storagenode itself.
By the way, what is docker logs showing (I assumed that you redirected the logs, so docker logs should show only the supervisor’s and the updater’s logs)?
I also do not like the gaps between events in the logs. It looks like the node just hangs, then something resets it. Power cuts?
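If an unexpected reboot or power cut is the suspect, the boot history should show it (assuming persistent journald logging and wtmp are available on that machine):
journalctl --list-boots
last -x reboot shutdown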