Docker container with node restarting and offline

Hi there,

My node has been offline since an RPi restart.

docker ps:

CONTAINER ID   IMAGE                          COMMAND                  CREATED          STATUS                         PORTS                                       NAMES
f452c9b749f3   storjlabs/storagenode:latest   "/entrypoint"            29 seconds ago   Restarting (1) 3 seconds ago                                               storagenode

log:

2021-11-07T20:25:47.996Z	INFO	Operator wallet	{"Address": "xxx"}
2021-11-07T20:25:48.578Z	INFO	Telemetry enabled	{"instance ID": "xxx"}
2021-11-07T20:25:48.635Z	INFO	db.migration	Database Version	{"version": 53}
2021-11-07T20:25:48.840Z	INFO	preflight:localtime	start checking local system clock with trusted satellites' system clock.
2021-11-07T20:25:49.575Z	INFO	preflight:localtime	local system clock is in sync with trusted satellites' system clock.
2021-11-07T20:25:49.575Z	INFO	bandwidth	Performing bandwidth usage rollups
2021-11-07T20:25:49.576Z	WARN	piecestore:monitor	Disk space is less than requested. Allocated space is	{"bytes": 25070940160}
2021-11-07T20:25:49.576Z	ERROR	piecestore:monitor	Total disk space is less than required minimum	{"bytes": 500000000000}
2021-11-07T20:25:49.576Z	ERROR	services	unexpected shutdown of a runner	{"name": "piecestore:monitor", "error": "piecestore monitor: disk space requirement not met", "errorVerbose": "piecestore monitor: disk space requirement not met\n\tstorj.io/storj/storagenode/monitor.(*Service).Run:125\n\tstorj.io/storj/private/lifecycle.(*Group).Run.func2.1:87\n\truntime/pprof.Do:40\n\tstorj.io/storj/private/lifecycle.(*Group).Run.func2:86\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:57"}
2021-11-07T20:25:49.576Z	ERROR	gracefulexit:blobscleaner	couldn't receive satellite's GE status	{"error": "context canceled"}
2021-11-07T20:25:49.577Z	INFO	Node xxx started
2021-11-07T20:25:49.577Z	INFO	Public server started on [::]:28967
2021-11-07T20:25:49.577Z	INFO	Private server started on 127.0.0.1:7778
2021-11-07T20:25:49.577Z	ERROR	bandwidth	Could not rollup bandwidth usage	{"error": "sql: transaction has already been committed or rolled back"}
2021-11-07T20:25:49.577Z	INFO	failed to sufficiently increase receive buffer size (was: 208 kiB, wanted: 2048 kiB, got: 416 kiB). See https://github.com/lucas-clemente/quic-go/wiki/UDP-Receive-Buffer-Size for details.
2021-11-07T20:25:49.577Z	ERROR	collector	error during collecting pieces: 	{"error": "pieceexpirationdb: context canceled", "errorVerbose": "pieceexpirationdb: context canceled\n\tstorj.io/storj/storagenode/storagenodedb.(*pieceExpirationDB).GetExpired:39\n\tstorj.io/storj/storagenode/pieces.(*Store).GetExpired:521\n\tstorj.io/storj/storagenode/collector.(*Service).Collect:88\n\tstorj.io/storj/storagenode/collector.(*Service).Run.func1:57\n\tstorj.io/common/sync2.(*Cycle).Run:92\n\tstorj.io/storj/storagenode/collector.(*Service).Run:53\n\tstorj.io/storj/private/lifecycle.(*Group).Run.func2.1:87\n\truntime/pprof.Do:40\n\tstorj.io/storj/private/lifecycle.(*Group).Run.func2:86\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:57"}
2021-11-07T20:25:49.577Z	ERROR	nodestats:cache	Get pricing-model/join date failed	{"error": "context canceled"}
2021-11-07T20:25:49.578Z	INFO	trust	Scheduling next refresh	{"after": "6h14m56.728789352s"}
2021-11-07T20:25:49.579Z	ERROR	piecestore:cache	error during init space usage db: 	{"error": "piece space used: context canceled", "errorVerbose": "piece space used: context canceled\n\tstorj.io/storj/storagenode/storagenodedb.(*pieceSpaceUsedDB).Init:73\n\tstorj.io/storj/storagenode/pieces.(*CacheService).Run:75\n\tstorj.io/storj/private/lifecycle.(*Group).Run.func2.1:87\n\truntime/pprof.Do:40\n\tstorj.io/storj/private/lifecycle.(*Group).Run.func2:86\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:57"}
Error: piecestore monitor: disk space requirement not met

Strange: after another reboot it is working and the node is online.
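A likely root cause for the first failure (my own guess, not confirmed in this thread): if the data drive is not yet mounted when Docker auto-starts the container after a reboot, the node sees only the SD card's free space. The log's "Allocated space" of ~25 GB against the required 500 GB minimum fits that picture. A minimal pre-start check, assuming the /mnt/mywd1002 mount point that appears in the run command later in this thread:

```shell
# Sketch: refuse to (re)start the node unless the data disk is mounted.
# /mnt/mywd1002 is the mount point from the run command later in this
# thread - adjust DATA_MOUNT to your own setup.
DATA_MOUNT=${DATA_MOUNT:-/mnt/mywd1002}

is_mounted() {
    # check /proc/mounts directly so this works without mountpoint(1)
    grep -qsF " $1 " /proc/mounts
}

if is_mounted "$DATA_MOUNT"; then
    echo "data disk mounted - safe to start the container"
else
    echo "data disk NOT mounted - the node would allocate the SD card" >&2
fi
```

If this matches your situation, making the mount a hard prerequisite (e.g. a systemd dependency or a wrapper script like the above) avoids the restart loop entirely.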

In the logs I see a lot of this. Should I be concerned?

2021-11-07T20:38:02.193Z	ERROR	collector	unable to delete piece	{"Satellite ID": "12L9ZFwhzVpuEKMUNUqkaTLGzwY9G24tbiigLiXpmZWKwmcNDDs", "Piece ID": "VOYY3HAUYI2P6CFHE3CSKD2NADAAM46XRVAZC5UEBAOBESQYWNLQ", "error": "pieces error: filestore error: file does not exist", "errorVerbose": "pieces error: filestore error: file does not exist\n\tstorj.io/storj/storage/filestore.(*blobStore).Stat:103\n\tstorj.io/storj/storagenode/pieces.(*BlobsUsageCache).pieceSizes:239\n\tstorj.io/storj/storagenode/pieces.(*BlobsUsageCache).Delete:220\n\tstorj.io/storj/storagenode/pieces.(*Store).Delete:299\n\tstorj.io/storj/storagenode/collector.(*Service).Collect:97\n\tstorj.io/storj/storagenode/collector.(*Service).Run.func1:57\n\tstorj.io/common/sync2.(*Cycle).Run:92\n\tstorj.io/storj/storagenode/collector.(*Service).Run:53\n\tstorj.io/storj/private/lifecycle.(*Group).Run.func2.1:87\n\truntime/pprof.Do:40\n\tstorj.io/storj/private/lifecycle.(*Group).Run.func2:86\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:57"}
2021-11-07T20:38:02.194Z	ERROR	collector	unable to update piece info	{"Satellite ID": "12L9ZFwhzVpuEKMUNUqkaTLGzwY9G24tbiigLiXpmZWKwmcNDDs", "Piece ID": "UJ5UPKDEEU5VR2CNHU4NXPYUNDV3PDFZ7TVRHZNSOUIERWNU2VOA", "error": "pieceexpirationdb: database disk image is malformed", "errorVerbose": "pieceexpirationdb: database disk image is malformed\n\tstorj.io/storj/storagenode/storagenodedb.(*pieceExpirationDB).DeleteFailed:99\n\tstorj.io/storj/storagenode/pieces.(*Store).DeleteFailed:547\n\tstorj.io/storj/storagenode/collector.(*Service).Collect:103\n\tstorj.io/storj/storagenode/collector.(*Service).Run.func1:57\n\tstorj.io/common/sync2.(*Cycle).Run:92\n\tstorj.io/storj/storagenode/collector.(*Service).Run:53\n\tstorj.io/storj/private/lifecycle.(*Group).Run.func2.1:87\n\truntime/pprof.Do:40\n\tstorj.io/storj/private/lifecycle.(*Group).Run.func2:86\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:57"}
2021-11-07T20:38:02.194Z	ERROR	collector	unable to delete piece	{"Satellite ID": "12L9ZFwhzVpuEKMUNUqkaTLGzwY9G24tbiigLiXpmZWKwmcNDDs", "Piece ID": "UJ5UPKDEEU5VR2CNHU4NXPYUNDV3PDFZ7TVRHZNSOUIERWNU2VOA", "error": "pieces error: filestore error: file does not exist", "errorVerbose": "pieces error: filestore error: file does not exist\n\tstorj.io/storj/storage/filestore.(*blobStore).Stat:103\n\tstorj.io/storj/storagenode/pieces.(*BlobsUsageCache).pieceSizes:239\n\tstorj.io/storj/storagenode/pieces.(*BlobsUsageCache).Delete:220\n\tstorj.io/storj/storagenode/pieces.(*Store).Delete:299\n\tstorj.io/storj/storagenode/collector.(*Service).Collect:97\n\tstorj.io/storj/storagenode/collector.(*Service).Run.func1:57\n\tstorj.io/common/sync2.(*Cycle).Run:92\n\tstorj.io/storj/storagenode/collector.(*Service).Run:53\n\tstorj.io/storj/private/lifecycle.(*Group).Run.func2.1:87\n\truntime/pprof.Do:40\n\tstorj.io/storj/private/lifecycle.(*Group).Run.func2:86\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:57"}

Thx, @Alexey.

I’ve repaired piece_expiration.db successfully.
But the storage node does not start; it fails with the following error:

Error: Error starting master database on storagenode: database: database is locked
	storj.io/storj/storagenode/storagenodedb.(*DB).openDatabase:323
	storj.io/storj/storagenode/storagenodedb.(*DB).openExistingDatabase:305
	storj.io/storj/storagenode/storagenodedb.(*DB).openDatabases:281
	storj.io/storj/storagenode/storagenodedb.OpenExisting:248
	main.cmdRun:160
	storj.io/private/process.cleanup.func1.4:363
	storj.io/private/process.cleanup.func1:381
	github.com/spf13/cobra.(*Command).execute:852
	github.com/spf13/cobra.(*Command).ExecuteC:960
	github.com/spf13/cobra.(*Command).Execute:897
	storj.io/private/process.ExecWithCustomConfig:88
	storj.io/private/process.ExecCustomDebug:70
	main.main:385
	runtime.main:255

Have you closed the sqlite3 shell after the fix?
If yes, but the error is still there, then the temporary solution would be to restart the container. However, if the disk is slow, it could become locked again.
The permanent solution would be to move the databases to an SSD, if you have one: How to move DB’s to SSD on Docker

I did the fix as described in the howto mentioned above,
with sqlite3 installed directly.
I tried .quit followed by ENTER, but it did not help.
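For context, the repair follows the dump-and-reload pattern from the howto. A rough sketch of that pattern against a scratch database (the /tmp paths and the demo table are stand-ins, not the node's real files; always work on a copy of the damaged database, never the live one):

```shell
# Dump-and-reload sqlite repair sketch (scratch paths only).
DB=/tmp/demo_piece_expiration.db
# stand-in table so the demo has something to dump; the real database
# already has its own schema and data
sqlite3 "$DB" "CREATE TABLE IF NOT EXISTS demo(id INTEGER); INSERT INTO demo VALUES (1);"

# 1. dump whatever the damaged file still yields
sqlite3 "$DB" ".dump" > /tmp/dump_all.sql
# 2. strip transaction statements that a partial dump can leave unbalanced
grep -v -e TRANSACTION -e ROLLBACK -e COMMIT /tmp/dump_all.sql > /tmp/dump_clean.sql
# 3. rebuild a fresh database from the cleaned dump
rm "$DB"
sqlite3 "$DB" < /tmp/dump_clean.sql
# 4. verify, and make sure every sqlite3 shell is closed afterwards (.quit)
sqlite3 "$DB" "PRAGMA integrity_check;"   # prints "ok" when the rebuild succeeded
```

Note that each step above runs sqlite3 non-interactively and exits, so no shell is left holding the file open afterwards.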

The container was not able to connect. Restarting, removing, and recreating it: nothing helped.

I’ll try the DB fix from the howto again and explicitly run .quit at the end. If the result is the same, I’ll try moving the DBs to the SSD.

But moving them means that if either disk fails, SSD or HDD, the node is broken. Right? I assume the node cannot restart without its DBs.

UPDATE: still locked, even with a new DB created. How can I find out which DB is locked?
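One way to answer that question yourself (my own approach, not from the guide): scan /proc for processes that still hold any file in the database folder open. This works even where fuser or lsof is not installed. The DB_DIR default below is the database path used later in this thread; point it at your own folder:

```shell
# Sketch: find which process still holds a storagenode database open,
# via /proc, so it works even without fuser or lsof installed.
DB_DIR=${DB_DIR:-/home/pi/storjDatabases}

who_holds() {
    dir=$1
    # walk every open file descriptor of every process
    for fd in /proc/[0-9]*/fd/*; do
        target=$(readlink "$fd" 2>/dev/null) || continue
        case "$target" in
            "$dir"/*)
                pid=${fd#/proc/}
                echo "pid ${pid%%/*} holds $target"
                ;;
        esac
    done
    return 0
}

who_holds "$DB_DIR"   # no output means no process has these files open
```

If this prints nothing while the node still reports "database is locked", the lock is more likely a stale WAL/-shm state or slow disk I/O than a competing process.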

The node can restart without databases and will re-create them.
In the worst case you will lose all statistics on your dashboard.

Please try to add --log.level=debug after the image name:

docker run .... storjlabs/storagenode:latest --log.level=debug

And check your logs.

Including payment stuff?
I don’t care about the statistics; I just hope it can run again.
So could it help to delete ALL (?) *.db files?

The past payout history: yes, it will be lost. It should recover from the existing satellites after a while. But if you had the Stefan satellite, its stats are lost forever.

I’m not sure. Perhaps your disk has started to have issues.

--log.level=debug: command not found

You need to stop and remove the container:

docker stop -t 300 storagenode
docker rm storagenode

Then run it again with all your parameters, but add this option after the image name:

docker run .... storjlabs/storagenode:latest --log.level=debug
2021-11-09T20:05:27.493Z	DEBUG	Unrecoverable error	{"error": "Error starting master database on storagenode: database: database is locked\n\tstorj.io/storj/storagenode/storagenodedb.(*DB).openDatabase:323\n\tstorj.io/storj/storagenode/storagenodedb.(*DB).openExistingDatabase:305\n\tstorj.io/storj/storagenode/storagenodedb.(*DB).openDatabases:281\n\tstorj.io/storj/storagenode/storagenodedb.OpenExisting:248\n\tmain.cmdRun:160\n\tstorj.io/private/process.cleanup.func1.4:363\n\tstorj.io/private/process.cleanup.func1:381\n\tgithub.com/spf13/cobra.(*Command).execute:852\n\tgithub.com/spf13/cobra.(*Command).ExecuteC:960\n\tgithub.com/spf13/cobra.(*Command).Execute:897\n\tstorj.io/private/process.ExecWithCustomConfig:88\n\tstorj.io/private/process.ExecCustomDebug:70\n\tmain.main:385\n\truntime.main:255", "errorVerbose": "Error starting master database on storagenode: database: database is locked\n\tstorj.io/storj/storagenode/storagenodedb.(*DB).openDatabase:323\n\tstorj.io/storj/storagenode/storagenodedb.(*DB).openExistingDatabase:305\n\tstorj.io/storj/storagenode/storagenodedb.(*DB).openDatabases:281\n\tstorj.io/storj/storagenode/storagenodedb.OpenExisting:248\n\tmain.cmdRun:160\n\tstorj.io/private/process.cleanup.func1.4:363\n\tstorj.io/private/process.cleanup.func1:381\n\tgithub.com/spf13/cobra.(*Command).execute:852\n\tgithub.com/spf13/cobra.(*Command).ExecuteC:960\n\tgithub.com/spf13/cobra.(*Command).Execute:897\n\tstorj.io/private/process.ExecWithCustomConfig:88\n\tstorj.io/private/process.ExecCustomDebug:70\n\tmain.main:385\n\truntime.main:255\n\tmain.cmdRun:162\n\tstorj.io/private/process.cleanup.func1.4:363\n\tstorj.io/private/process.cleanup.func1:381\n\tgithub.com/spf13/cobra.(*Command).execute:852\n\tgithub.com/spf13/cobra.(*Command).ExecuteC:960\n\tgithub.com/spf13/cobra.(*Command).Execute:897\n\tstorj.io/private/process.ExecWithCustomConfig:88\n\tstorj.io/private/process.ExecCustomDebug:70\n\tmain.main:385\n\truntime.main:255"}
2021-11-09T20:05:28.712Z	INFO	Configuration loaded	{"Location": "/app/config/config.yaml"}
2021-11-09T20:05:28.713Z	DEBUG	Anonymized tracing disabled

Still unclear which DB is affected.

Yeah, I would suggest creating a bug report there: Issues · storj/storj · GitHub

Please give me the result of this command:

ps -aux | grep sqlite
pi@raspberrypi:~ $ ps -aux | grep sqlite
pi       24871  0.0  0.0   5820   680 pts/0    S+   21:12   0:00 grep --color=auto sqlite

So, nothing is blocking the database.
Could you please reboot the device?

Still the same issue after the reboot:

Error: Error starting master database on storagenode: database: database is locked

with:

pi@raspberrypi:~ $ ps -aux | grep sqlite
pi        5183  0.0  0.0   5820   652 pts/0    S+   21:39   0:00 grep --color=auto sqlite

… still nothing locking it, right?

✅ Done: Locked database: debug does not provide details, which database is affected · Issue #4271 · storj/storj · GitHub

▶️ Might it help to just delete piece_expiration.db? ❓

If I can trust the reports here, that could be an option. Please correct me if I am wrong:

You would need to follow this guide: https://support.storj.io/hc/en-us/articles/4403032417044-How-to-fix-database-file-is-not-a-database-error It will help to recreate only this one database, if you think it is to blame.
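As I understand the linked guide, recreating just one database amounts to stopping the container, moving that database file (and its -wal/-shm sidecars) out of the way, and letting the node rebuild the schema on the next start; the "database does not exists" log line further down in this thread is exactly that rebuild. A sketch using a scratch directory as a stand-in for the real folder:

```shell
# Stop and remove the container first (not run here):
#   docker stop -t 300 storagenode && docker rm storagenode
# DB_DIR below is a scratch stand-in; on the real node it would be the
# folder holding the *.db files.
DB_DIR=${DB_DIR:-/tmp/storj-demo-dbs}
mkdir -p "$DB_DIR"
touch "$DB_DIR/piece_expiration.db"                 # stand-in file

# move the suspect database aside and drop its WAL sidecar files
mv "$DB_DIR/piece_expiration.db" "$DB_DIR/piece_expiration.db.bak"
rm -f "$DB_DIR/piece_expiration.db-wal" "$DB_DIR/piece_expiration.db-shm"

# then start the container again with the usual parameters; the node
# should log: database does not exists {"database": "piece_expiration"}
# and recreate the schema.
```

Keeping the .bak copy means nothing is lost if it turns out a different database was the culprit.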

Now it’s getting weird: only info.db in the storage folder was recreated, with 0 bytes, and the logs still show “database locked”. /confused/

Restored all *.db files except piece_expiration.db:

pi@raspberrypi:~ $ docker logs --tail 40 storagenode
2021-11-09T21:39:45.749Z	INFO	Configuration loaded	{"Location": "/app/config/config.yaml"}
2021-11-09T21:39:46.021Z	INFO	db	database does not exists	{"database": "piece_expiration"}
Error: Error creating tables for master database on storagenode: migrate: database: database is locked
...
2021-11-09T21:39:58.594Z	INFO	Configuration loaded	{"Location": "/app/config/config.yaml"}

After piece_expiration.db was recreated and the container was stopped, deleted, and recreated, the previous log content appears again:

2021-11-09T21:43:56.971Z	INFO	Configuration loaded	{"Location": "/app/config/config.yaml"}
Error: Error starting master database on storagenode: database: database is locked
...
2021-11-09T21:44:08.447Z	INFO	Configuration loaded	{"Location": "/app/config/config.yaml"}

Did this, and the node (going by the logs) is online, but the dashboard is not available. I am somehow “blind”.

Log looks like this:

2021-11-09T22:32:00.548Z	INFO	piecestore	uploaded	{"Piece ID": "V7WX7B7BMBC5XSXJV3X2PFLAECXYA6TOUKAE3SMHZLNG5LNMDE7A", "Satellite ID": "121RTSDpyNZVcEU84Ticf2L1ntiuUimbWgfATz21tuvgk3vzoA6", "Action": "PUT", "Size": 181504}
2021-11-09T22:32:01.057Z	INFO	piecestore	upload started	{"Piece ID": "VCORJ5KVBHJJE7CO3PXLY4HIW5JX5BQ27N6GT6IRPBE67P35BFXQ", "Satellite ID": "12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S", "Action": "PUT", "Available Space": 8583685517796}
2021-11-09T22:32:01.435Z	INFO	piecestore	uploaded	{"Piece ID": "VCORJ5KVBHJJE7CO3PXLY4HIW5JX5BQ27N6GT6IRPBE67P35BFXQ", "Satellite ID": "12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S", "Action": "PUT", "Size": 2304}
2021-11-09T22:32:02.438Z	INFO	piecestore	upload started	{"Piece ID": "CGG4VKSKHSM3XRYCBS6CXUALL6JVYZAJY7GVS7454SEUSC3MWZVA", "Satellite ID": "12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S", "Action": "PUT", "Available Space": 8583685514980}

The new DB folder looks like this:

pi@raspberrypi:~ $ ls -al storjDatabases/
total 23592
drwxr-xr-x  2 pi pi     4096 Nov  9 23:14 .
drwxr-xr-x 24 pi pi     4096 Nov  9 23:11 ..
-rwxr-xr-x  1 pi pi 10760192 Nov  9 23:34 bandwidth.db
-rwxr-xr-x  1 pi pi    32768 Nov  9 23:35 bandwidth.db-shm
-rwxr-xr-x  1 pi pi  4124152 Nov  9 23:35 bandwidth.db-wal
-rwxr-xr-x  1 pi pi    49152 Nov  9 23:08 heldamount.db
-rwxr-xr-x  1 pi pi    32768 Nov  9 23:14 heldamount.db-shm
-rwxr-xr-x  1 pi pi   811672 Nov  9 23:14 heldamount.db-wal
-rwxr-xr-x  1 pi pi    16384 Nov  9 23:08 info.db
-rwxr-xr-x  1 pi pi    32768 Nov  9 23:14 info.db-shm
-rwxr-xr-x  1 pi pi    32992 Nov  9 23:14 info.db-wal
-rwxr-xr-x  1 pi pi    24576 Nov  9 23:08 notifications.db
-rwxr-xr-x  1 pi pi    32768 Nov  9 23:14 notifications.db-shm
-rwxr-xr-x  1 pi pi    32992 Nov  9 23:14 notifications.db-wal
-rwxr-xr-x  1 pi pi    32768 Nov  9 23:08 orders.db
-rwxr-xr-x  1 pi pi    32768 Nov  9 23:14 orders.db-shm
-rwxr-xr-x  1 pi pi    32992 Nov  9 23:14 orders.db-wal
-rwxr-xr-x  1 pi pi  2723840 Nov  9 23:15 piece_expiration.db
-rwxr-xr-x  1 pi pi    32768 Nov  9 23:24 piece_expiration.db-shm
-rwxr-xr-x  1 pi pi  4124152 Nov  9 23:24 piece_expiration.db-wal
-rwxr-xr-x  1 pi pi    24576 Nov  9 23:08 pieceinfo.db
-rwxr-xr-x  1 pi pi    32768 Nov  9 23:14 pieceinfo.db-shm
-rwxr-xr-x  1 pi pi    32992 Nov  9 23:14 pieceinfo.db-wal
-rwxr-xr-x  1 pi pi    24576 Nov  9 23:08 piece_spaced_used.db
-rwxr-xr-x  1 pi pi    32768 Nov  9 23:14 piece_spaced_used.db-shm
-rwxr-xr-x  1 pi pi    32992 Nov  9 23:14 piece_spaced_used.db-wal
-rwxr-xr-x  1 pi pi    24576 Nov  9 23:08 pricing.db
-rwxr-xr-x  1 pi pi    32768 Nov  9 23:14 pricing.db-shm
-rwxr-xr-x  1 pi pi    82432 Nov  9 23:14 pricing.db-wal
-rwxr-xr-x  1 pi pi    32768 Nov  9 23:08 reputation.db
-rwxr-xr-x  1 pi pi    32768 Nov  9 23:19 reputation.db-shm
-rwxr-xr-x  1 pi pi   140112 Nov  9 23:19 reputation.db-wal
-rwxr-xr-x  1 pi pi    32768 Nov  9 23:08 satellites.db
-rwxr-xr-x  1 pi pi    32768 Nov  9 23:14 satellites.db-shm
-rwxr-xr-x  1 pi pi    41232 Nov  9 23:14 satellites.db-wal
-rwxr-xr-x  1 pi pi    24576 Nov  9 23:08 secret.db
-rwxr-xr-x  1 pi pi    32768 Nov  9 23:14 secret.db-shm
-rwxr-xr-x  1 pi pi    32992 Nov  9 23:14 secret.db-wal
-rwxr-xr-x  1 pi pi   208896 Nov  9 23:08 storage_usage.db
-rwxr-xr-x  1 pi pi    32768 Nov  9 23:18 storage_usage.db-shm
-rwxr-xr-x  1 pi pi    82432 Nov  9 23:18 storage_usage.db-wal
-rwxr-xr-x  1 pi pi    20480 Nov  9 23:08 used_serial.db
-rwxr-xr-x  1 pi pi    32768 Nov  9 23:14 used_serial.db-shm
-rwxr-xr-x  1 pi pi    41232 Nov  9 23:14 used_serial.db-wal

An external ping says “OK”, but opening http://192.168.178.70:14002 in the browser does not work; nothing to be seen here, just “site not available”.

The run command was this:

docker run -d --restart unless-stopped --stop-timeout 300 \
    -p 28967:28967/tcp \
    -p 28967:28967/udp \
    -p 127.0.0.1:14002:14002 \
    -e WALLET="xxx" \
    -e EMAIL="xxx" \
    -e ADDRESS="xxx:28967" \
    -e STORAGE="9.5TB" \
    --log-opt max-size=50m \
    --log-opt max-file=10 \
    --mount type=bind,source="/home/pi/.local/share/storj/identity/storagenode",destination=/app/identity \
    --mount type=bind,source="/mnt/mywd1002",destination=/app/config \
    --mount type=bind,source="/home/pi/storjDatabases/",destination=/app/dbs \
    --name storagenode storjlabs/storagenode:latest 

Changed it to:

docker run -d --restart unless-stopped --stop-timeout 300 \
    -p 28967:28967 \
    -p 14002:14002 \
    -e WALLET="xxx" \
    -e EMAIL="xxx" \
    -e ADDRESS="xxx:28967" \
    -e STORAGE="9.5TB" \
    --log-opt max-size=50m \
    --log-opt max-file=10 \
    --mount type=bind,source="/home/pi/.local/share/storj/identity/storagenode",destination=/app/identity \
    --mount type=bind,source="/mnt/mywd1002",destination=/app/config \
    --mount type=bind,source="/home/pi/storjDatabases/",destination=/app/dbs \
    --name storagenode storjlabs/storagenode:latest

… and finally it is up and running again.

It seems that

    -p 28967:28967/tcp \
    -p 28967:28967/udp \
    -p 127.0.0.1:14002:14002 \

is NOT working. I’ve copied that from here:

This mapping allows access only from the device where the node is running. You need to use either How to remote access the web dashboard - Node Operator or [Tech Preview] Multinode Dashboard Binaries.
Removing 127.0.0.1: from the port mapping will work too, but your dashboard will be open to your LAN without any protection. In that case, at least do not publish this port to the internet; all your private information would be available to anyone.
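For reference, the two mapping styles side by side (a fragment of the docker run options; the LAN address in the comment is the Pi's IP from this thread, used purely as an example):

```shell
# Fragment of the docker run port options - pick one of the two:

# (a) dashboard reachable only on the Pi itself; from another machine,
#     tunnel it over SSH instead (example host/user):
#       ssh -L 14002:localhost:14002 pi@192.168.178.70
#   -p 127.0.0.1:14002:14002 \

# (b) dashboard reachable from the whole LAN, with no authentication -
#     never forward this port to the internet:
#   -p 14002:14002 \
```

So the original 127.0.0.1 mapping was not broken; it just restricted the dashboard to the host, which is why the browser on another machine showed "site not available".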

Since you removed

you will have Warning and error logs since updating to latest release