4ich
July 23, 2021, 10:33pm
1
Hi there,
I have a Quad SATA HAT on my Pi 4B 8GB with 4x2TB drives in RAID 5.
For a few days now I have been getting notifications that my node is offline for a few minutes at a time.
I got this error in the log:
2021-07-23T22:13:25.576Z ERROR piecestore download failed {"Piece ID": "K7DCW53X3AA4PSB3BA2NAOLZEU3L4GL7YLFEL2QJIRS7CXJ3NCPA", "Satellite ID": "121RTSDpyNZVcEU84Ticf2L1ntiuUimbWgfATz21tuvgk3vzoA6", "Action": "GET_REPAIR", "error": "file does not exist", "errorVerbose": "file does not exist\n\tstorj.io/common/rpc/rpcstatus.Wrap:73\n\tstorj.io/storj/storagenode/piecestore.(*Endpoint).Download:534\n\tstorj.io/common/pb.DRPCPiecestoreDescription.Method.func2:217\n\tstorj.io/drpc/drpcmux.(*Mux).HandleRPC:33\n\tstorj.io/common/rpc/rpctracing.(*Handler).HandleRPC:58\n\tstorj.io/drpc/drpcserver.(*Server).handleRPC:102\n\tstorj.io/drpc/drpcserver.(*Server).ServeOne:60\n\tstorj.io/drpc/drpcserver.(*Server).Serve.func2:95\n\tstorj.io/drpc/drpcctx.(*Tracker).track:51"}
2021-07-23T22:13:54.548Z ERROR pieces:trash emptying trash failed {"error": "pieces error: filestore error: context canceled", "errorVerbose": "pieces error: filestore error: context canceled\n\tstorj.io/storj/storage/filestore.(*blobStore).EmptyTrash:154\n\tstorj.io/storj/storagenode/pieces.(*BlobsUsageCache).EmptyTrash:310\n\tstorj.io/storj/storagenode/pieces.(*Store).EmptyTrash:367\n\tstorj.io/storj/storagenode/pieces.(*TrashChore).Run.func1:51\n\tstorj.io/common/sync2.(*Cycle).Run:92\n\tstorj.io/common/sync2.(*Cycle).Start.func1:71\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:57"}
2021-07-23T22:13:54.856Z ERROR piecestore:cache error getting current used space: {"error": "lstat config/storage/blobs/qstuylguhrn2ozjv4h2c6xpxykd622gtgurhql2k7k75wqaaaaaa/y2/fofv75k5nij2mhx7aq2kuyu66nwkv6l5kvncudl7n32pastujq.sj1: structure needs cleaning; lstat config/storage/blobs/v4weeab67sbgvnbwd5z7tweqsqqun7qox2agpbxy44mqqaaaaaaa/fr/5rwz7spaqjqdhmj5wobmyi436eqzcnjtawgwfjifvq5pmpzjxq.sj1: structure needs cleaning; lstat config/storage/blobs/ukfu6bhbboxilvt7jrwlqk7y2tapb5d2r2tsmj2sjxvw5qaaaaaa/wn/qg55nknqud4j6xf2npqmmtkgud5pk5jypmv3zhzei2kbom2tnq.sj1: structure needs cleaning; lstat config/storage/blobs/pmw6tvzmf2jv6giyybmmvl4o2ahqlaldsaeha4yx74n5aaaaaaaa/dk/fzys34za3bz2sxdgaut2elcyppz3yuxisro3br4fugf4gxd3iq.sj1: structure needs cleaning; lstat config/storage/blobs/6r2fgwqz3manwt4aogq343bfkh2n5vvg4ohqqgggrrunaaaaaaaa/hq/a6ohctx4ymlesavuvud3s3s5n5ljd4d3yb6vlcgetzgsknrdzq.sj1: structure needs cleaning; context canceled; context canceled", "errorVerbose": "group:\n--- lstat config/storage/blobs/qstuylguhrn2ozjv4h2c6xpxykd622gtgurhql2k7k75wqaaaaaa/y2/fofv75k5nij2mhx7aq2kuyu66nwkv6l5kvncudl7n32pastujq.sj1: structure needs cleaning\n\tstorj.io/storj/storage/filestore.walkNamespaceWithPrefix:788\n\tstorj.io/storj/storage/filestore.(*Dir).walkNamespaceInPath:725\n\tstorj.io/storj/storage/filestore.(*Dir).WalkNamespace:685\n\tstorj.io/storj/storage/filestore.(*blobStore).WalkNamespace:284\n\tstorj.io/storj/storagenode/pieces.(*Store).WalkSatellitePieces:497\n\tstorj.io/storj/storagenode/pieces.(*Store).SpaceUsedTotalAndBySatellite:662\n\tstorj.io/storj/storagenode/pieces.(*CacheService).Run:54\n\tstorj.io/storj/private/lifecycle.(*Group).Run.func2.1:87\n\truntime/pprof.Do:40\n\tstorj.io/storj/private/lifecycle.(*Group).Run.func2:86\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:57\n--- lstat config/storage/blobs/v4weeab67sbgvnbwd5z7tweqsqqun7qox2agpbxy44mqqaaaaaaa/fr/5rwz7spaqjqdhmj5wobmyi436eqzcnjtawgwfjifvq5pmpzjxq.sj1: structure needs 
cleaning\n\tstorj.io/storj/storage/filestore.walkNamespaceWithPrefix:788\n\tstorj.io/storj/storage/filestore.(*Dir).walkNamespaceInPath:725\n\tstorj.io/storj/storage/filestore.(*Dir).WalkNamespace:685\n\tstorj.io/storj/storage/filestore.(*blobStore).WalkNamespace:284\n\tstorj.io/storj/storagenode/pieces.(*Store).WalkSatellitePieces:497\n\tstorj.io/storj/storagenode/pieces.(*Store).SpaceUsedTotalAndBySatellite:662\n\tstorj.io/storj/storagenode/pieces.(*CacheService).Run:54\n\tstorj.io/storj/private/lifecycle.(*Group).Run.func2.1:87\n\truntime/pprof.Do:40\n\tstorj.io/storj/private/lifecycle.(*Group).Run.func2:86\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:57\n--- lstat config/storage/blobs/ukfu6bhbboxilvt7jrwlqk7y2tapb5d2r2tsmj2sjxvw5qaaaaaa/wn/qg55nknqud4j6xf2npqmmtkgud5pk5jypmv3zhzei2kbom2tnq.sj1: structure needs cleaning\n\tstorj.io/storj/storage/filestore.walkNamespaceWithPrefix:788\n\tstorj.io/storj/storage/filestore.(*Dir).walkNamespaceInPath:725\n\tstorj.io/storj/storage/filestore.(*Dir).WalkNamespace:685\n\tstorj.io/storj/storage/filestore.(*blobStore).WalkNamespace:284\n\tstorj.io/storj/storagenode/pieces.(*Store).WalkSatellitePieces:497\n\tstorj.io/storj/storagenode/pieces.(*Store).SpaceUsedTotalAndBySatellite:662\n\tstorj.io/storj/storagenode/pieces.(*CacheService).Run:54\n\tstorj.io/storj/private/lifecycle.(*Group).Run.func2.1:87\n\truntime/pprof.Do:40\n\tstorj.io/storj/private/lifecycle.(*Group).Run.func2:86\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:57\n--- lstat config/storage/blobs/pmw6tvzmf2jv6giyybmmvl4o2ahqlaldsaeha4yx74n5aaaaaaaa/dk/fzys34za3bz2sxdgaut2elcyppz3yuxisro3br4fugf4gxd3iq.sj1: structure needs 
cleaning\n\tstorj.io/storj/storage/filestore.walkNamespaceWithPrefix:788\n\tstorj.io/storj/storage/filestore.(*Dir).walkNamespaceInPath:725\n\tstorj.io/storj/storage/filestore.(*Dir).WalkNamespace:685\n\tstorj.io/storj/storage/filestore.(*blobStore).WalkNamespace:284\n\tstorj.io/storj/storagenode/pieces.(*Store).WalkSatellitePieces:497\n\tstorj.io/storj/storagenode/pieces.(*Store).SpaceUsedTotalAndBySatellite:662\n\tstorj.io/storj/storagenode/pieces.(*CacheService).Run:54\n\tstorj.io/storj/private/lifecycle.(*Group).Run.func2.1:87\n\truntime/pprof.Do:40\n\tstorj.io/storj/private/lifecycle.(*Group).Run.func2:86\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:57\n--- lstat config/storage/blobs/6r2fgwqz3manwt4aogq343bfkh2n5vvg4ohqqgggrrunaaaaaaaa/hq/a6ohctx4ymlesavuvud3s3s5n5ljd4d3yb6vlcgetzgsknrdzq.sj1: structure needs cleaning\n\tstorj.io/storj/storage/filestore.walkNamespaceWithPrefix:788\n\tstorj.io/storj/storage/filestore.(*Dir).walkNamespaceInPath:725\n\tstorj.io/storj/storage/filestore.(*Dir).WalkNamespace:685\n\tstorj.io/storj/storage/filestore.(*blobStore).WalkNamespace:284\n\tstorj.io/storj/storagenode/pieces.(*Store).WalkSatellitePieces:497\n\tstorj.io/storj/storagenode/pieces.(*Store).SpaceUsedTotalAndBySatellite:662\n\tstorj.io/storj/storagenode/pieces.(*CacheService).Run:54\n\tstorj.io/storj/private/lifecycle.(*Group).Run.func2.1:87\n\truntime/pprof.Do:40\n\tstorj.io/storj/private/lifecycle.(*Group).Run.func2:86\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:57\n--- context canceled\n--- context canceled"}
2021-07-23T22:13:54.958Z ERROR pieces:trash emptying trash failed {"error": "pieces error: filestore error: context canceled", "errorVerbose": "pieces error: filestore error: context canceled\n\tstorj.io/storj/storage/filestore.(*blobStore).EmptyTrash:154\n\tstorj.io/storj/storagenode/pieces.(*BlobsUsageCache).EmptyTrash:310\n\tstorj.io/storj/storagenode/pieces.(*Store).EmptyTrash:367\n\tstorj.io/storj/storagenode/pieces.(*TrashChore).Run.func1:51\n\tstorj.io/common/sync2.(*Cycle).Run:92\n\tstorj.io/common/sync2.(*Cycle).Start.func1:71\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:57"}
Alexey
July 23, 2021, 10:39pm
2
There are two problems in your question. The first one - missing pieces; this alone will not take the node offline. The second one is not identified yet, and it leads to the restart of the container.
To diagnose the first one - please search your logs for all records with this Piece ID: when was it uploaded, and was it deleted or did it just vanish?
For the second one - please search your logs for ERROR records. I would like to see the last 5 such errors, especially FATAL ones (not the ones about moving to trash).
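For reference, pulling those records out of a docker-based node's log can look like this (the container name "storagenode" is an assumption - adjust it to yours):

```shell
# Dump the container log to a file first (run this on the host):
#   docker logs storagenode > node.log 2>&1
# Trace the piece through its lifetime (upload, audit, delete):
grep "K7DCW53X3AA4PSB3BA2NAOLZEU3L4GL7YLFEL2QJIRS7CXJ3NCPA" node.log || true
# And show the last five ERROR/FATAL records:
grep -E "ERROR|FATAL" node.log | tail -n 5 || true
```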
4ich
July 26, 2021, 7:10pm
3
Ok, I investigated a bit more…
I have high iowait from the storagenode process.
And in the logs I have:
2021-07-26T19:07:20.176Z ERROR db Unable to read the disk, please verify the disk is not corrupt
I am checking the RAID right now, but since all seems fine when I stop the node, I don't think I will find any bad blocks…
Any suggestions?
Maybe move the DB to the SD card, or even to a USB (2.0) attached SSD/HDD/stick?
Alexey
July 26, 2021, 7:35pm
4
This is not good. What type of RAID do you use? What is the filesystem? How are these disks connected to the Raspberry?
How do you run the docker run? With sudo or without?
If without - please show the result of the command:
docker inspect storagenode -f '{{.Mounts}}'
4ich
July 26, 2021, 7:51pm
5
I run the container without sudo.
This is the result of docker inspect storagenaspi -f '{{.Mounts}}':
[{bind /mnt/r5/storj/data /app/config true rprivate} {bind /mnt/r5/storj/identity /app/identity true rprivate}]
I have this:
https://wiki.radxa.com/Dual_Quad_SATA_HAT
and 4x2TB drives in RAID 5, since I use it as a NAS too…
And as said, it's an RPi 4B with 8 GB RAM.
Alexey
July 26, 2021, 8:08pm
6
Please show the result of these commands:
ls -l /mnt/r5/storj/data/storage
and
df -HT
How did you build the RAID? I assume with OMV.
4ich
July 26, 2021, 9:24pm
7
Thanks Alexey… seems I found the issue… I think I have a faulty drive, or at least a pre-failing one.
I use plain Raspbian with mdadm.
ls: cannot access '/mnt/r5/storj/data/storage/info.db-shm': No such file or directory
total 63400
-rw-r--r-- 1 root root 21786624 Jul 26 20:20 bandwidth.db
-rw-r--r-- 1 root root 32768 Jul 26 22:21 bandwidth.db-shm
-rw-r--r-- 1 root root 684032 Jul 26 20:24 bandwidth.db-wal
drwx------ 9 root root 4096 Jan 7 2021 blobs
drwx------ 2 root root 4096 Jul 26 20:12 garbage
-rw-r--r-- 1 root root 65536 Jul 26 20:20 heldamount.db
-rw-r--r-- 1 root root 32768 Jul 26 22:21 heldamount.db-shm
-rw-r--r-- 1 root root 1623312 Jul 26 20:21 heldamount.db-wal
-rw-r--r-- 1 root root 16384 Jul 26 22:21 info.db
**-????????? ? ? ? ? ? info.db-shm <--- doesn't seem correct?**
-rw-r--r-- 1 root root 0 Jul 26 22:23 info.db-wal
-rw-r--r-- 1 root root 24576 Jul 26 20:20 notifications.db
-rw-r--r-- 1 root root 32768 Jul 26 22:21 notifications.db-shm
-rw-r--r-- 1 root root 32992 Jul 26 20:20 notifications.db-wal
-rw-r--r-- 1 root root 34328576 Jul 26 20:20 orders.db
-rw-r--r-- 1 root root 32768 Jul 26 22:21 orders.db-shm
-rw-r--r-- 1 root root 41232 Jul 26 20:20 orders.db-wal
-rw-r--r-- 1 root root 2101248 Jul 26 20:20 piece_expiration.db
-rw-r--r-- 1 root root 32768 Jul 26 22:21 piece_expiration.db-shm
-rw-r--r-- 1 root root 41232 Jul 26 20:20 piece_expiration.db-wal
-rw-r--r-- 1 root root 24576 Jul 26 20:20 pieceinfo.db
-rw-r--r-- 1 root root 32768 Jul 26 22:21 pieceinfo.db-shm
-rw-r--r-- 1 root root 32992 Jul 26 20:20 pieceinfo.db-wal
-rw-r--r-- 1 root root 24576 Jul 26 20:20 piece_spaced_used.db
-rw-r--r-- 1 root root 32768 Jul 26 22:21 piece_spaced_used.db-shm
-rw-r--r-- 1 root root 32992 Jul 26 20:20 piece_spaced_used.db-wal
-rw-r--r-- 1 root root 24576 Jul 26 20:20 pricing.db
-rw-r--r-- 1 root root 32768 Jul 26 22:21 pricing.db-shm
-rw-r--r-- 1 root root 82432 Jul 26 20:20 pricing.db-wal
-rw-r--r-- 1 root root 40960 Jul 26 20:20 reputation.db
-rw-r--r-- 1 root root 32768 Jul 26 22:21 reputation.db-shm
-rw-r--r-- 1 root root 189552 Jul 26 20:22 reputation.db-wal
-rw-r--r-- 1 root root 32768 Jul 26 20:20 satellites.db
-rw-r--r-- 1 root root 32768 Jul 26 22:21 satellites.db-shm
-rw-r--r-- 1 root root 41232 Jul 26 20:20 satellites.db-wal
-rw-r--r-- 1 root root 24576 Jul 26 20:20 secret.db
-rw-r--r-- 1 root root 32768 Jul 26 22:21 secret.db-shm
-rw-r--r-- 1 root root 32992 Jul 26 20:20 secret.db-wal
-rw-r--r-- 1 root root 32 Dec 6 2020 storage-dir-verification
-rw-r--r-- 1 root root 348160 Jul 26 20:20 storage_usage.db
-rw-r--r-- 1 root root 32768 Jul 26 22:21 storage_usage.db-shm
-rw-r--r-- 1 root root 193672 Jul 26 20:20 storage_usage.db-wal
drwx------ 2 root root 2473984 Jul 26 20:24 temp
drwx------ 8 root root 4096 Jan 12 2021 trash
-rw-r--r-- 1 root root 20480 Jul 26 20:20 used_serial.db
-rw-r--r-- 1 root root 32768 Jul 26 22:21 used_serial.db-shm
-rw-r--r-- 1 root root 41232 Jul 26 20:20 used_serial.db-wal
-rw------- 1 root root 0 Nov 12 2020 write-test084195137
-rw------- 1 root root 0 Nov 10 2020 write-test290185210
-rw------- 1 root root 0 Nov 10 2020 write-test303027288
-rw------- 1 root root 0 Nov 22 2020 write-test512637377
-rw------- 1 root root 0 Nov 11 2020 write-test608393685
-rw------- 1 root root 0 Nov 16 2020 write-test910705580
Filesystem Type Size Used Avail Use% Mounted on
/dev/root ext4 32G 3.6G 27G 12% /
devtmpfs devtmpfs 4.0G 0 4.0G 0% /dev
tmpfs tmpfs 4.2G 29k 4.2G 1% /dev/shm
tmpfs tmpfs 4.2G 11M 4.2G 1% /run
tmpfs tmpfs 5.3M 4.1k 5.3M 1% /run/lock
tmpfs tmpfs 4.2G 0 4.2G 0% /sys/fs/cgroup
/dev/mmcblk0p1 vfat 265M 56M 209M 22% /boot
tmpfs tmpfs 826M 0 826M 0% /run/user/1000
overlay overlay 32G 3.6G 27G 12% /var/lib/docker/overlay2/b064aed017c94b5bbe8b78bfe3eff0aa9091db630e6e971d20ff7a567cee9729/merged
overlay overlay 32G 3.6G 27G 12% /var/lib/docker/overlay2/92efe3a8997b0794a6b2731125c532d3afead5da386cc0f6053a2a96025bf0a3/merged
/dev/md0 ext4 6.0T 5.2T 466G 92% /mnt/r5
BrightSilence
8
Be sure to run fsck on it. It might be just a filesystem issue and not necessarily a failing disk.
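A minimal sketch of an offline check, taking the device and mountpoint from the df -HT output above (/dev/md0 on /mnt/r5) and the container name from the docker inspect output:

```shell
# Offline filesystem check of the array. fsck must run on an unmounted
# filesystem, so stop the node first so nothing writes to it.
DEV=/dev/md0
MNT=/mnt/r5
if [ -b "$DEV" ]; then
    docker stop -t 300 storagenaspi
    umount "$MNT"
    fsck.ext4 -f -y "$DEV"   # -f: full check even if marked clean, -y: auto-repair
    mount "$MNT"
    docker start storagenaspi
fi
```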
Alexey
July 27, 2021, 7:43am
9
I agree with @BrightSilence - you need to run fsck; maybe that would be enough.
I would also check the S.M.A.R.T. for every disk.
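A quick pass with smartctl (from the smartmontools package) could look like the sketch below; the /dev/sd? device names are assumptions - check them against lsblk first:

```shell
# Overall health verdict plus the attributes that usually flag a dying
# disk first (reallocated / pending / uncorrectable sectors).
for d in /dev/sd[a-d]; do
    [ -b "$d" ] || continue
    echo "=== $d ==="
    smartctl -H "$d"
    smartctl -A "$d" | grep -Ei 'Reallocated|Pending|Uncorrectable' || true
done
```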
4ich
July 28, 2021, 4:02pm
10
Is the offline DQ/penalty already a thing?
Alexey
July 28, 2021, 6:37pm
11
The DQ/penalty for lost data has not been disabled.
For downtime your node can be suspended (if the online score falls below 60%). DQ will be applied if your node is offline for more than 30 days.
4ich
July 28, 2021, 6:50pm
12
Okay, then due to performance issues when Storj is running during the rebuild…
I'll try to dd the faulty disk, skip the errors, add it to the RAID, and finally repair the failed sectors.
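For copying everything readable off a failing disk, GNU ddrescue (gddrescue package) is usually gentler than plain dd, since it logs progress and can retry bad areas. The device names below are examples, not from the thread - verify them with lsblk before running anything:

```shell
SRC=/dev/sdb   # failing member disk -- example name only
DST=/dev/sde   # spare disk of at least the same size -- example name only
if [ -b "$SRC" ] && [ -b "$DST" ]; then
    ddrescue -f "$SRC" "$DST" rescue.map       # first pass: copy all readable data
    ddrescue -f -r3 "$SRC" "$DST" rescue.map   # retry the bad areas 3 times
    mdadm --manage /dev/md0 --add "$DST"       # re-add the clone and let mdadm resync
fi
```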
4ich
July 29, 2021, 4:55am
13
My RAID looks fine now, but for some reason it impacted the "master database"…
2021-07-29T04:53:25.318Z INFO Operator email {"Address": "censored@gmail.com"}
2021-07-29T04:53:25.318Z INFO Operator wallet {"Address": "someaddress"}
Error: Error starting master database on storagenode: database: unable to open database file: structure needs cleaning
storj.io/storj/storagenode/storagenodedb.(*DB).openDatabase:323
storj.io/storj/storagenode/storagenodedb.(*DB).openExistingDatabase:305
storj.io/storj/storagenode/storagenodedb.(*DB).openDatabases:281
storj.io/storj/storagenode/storagenodedb.OpenExisting:248
main.cmdRun:160
storj.io/private/process.cleanup.func1.4:363
storj.io/private/process.cleanup.func1:381
github.com/spf13/cobra.(*Command).execute:852
github.com/spf13/cobra.(*Command).ExecuteC:960
github.com/spf13/cobra.(*Command).Execute:897
storj.io/private/process.ExecWithCustomConfig:88
storj.io/private/process.ExecCustomDebug:70
main.main:385
runtime.main:204
Meantime I'll try to determine which one to fix… with
4ich
July 29, 2021, 4:59am
14
yeeay
/mnt/r5/storj/data/storage# sqlite3 /mnt/r5/storj/data/storage/orders.db "PRAGMA integrity_check;"
Error: unable to open database file
Alexey
July 29, 2021, 6:01am
15
Seems that file was not recovered.
You can re-create it using this manual: How to fix database: file is not a database error - Storj
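The manual's procedure boils down to dumping whatever sqlite3 can still read and rebuilding a fresh file from the dump. A rough sketch (path from this thread; `.recover` needs sqlite3 >= 3.29, older versions use `.dump` instead):

```shell
DB=/mnt/r5/storj/data/storage/orders.db
if [ -f "$DB" ] && command -v sqlite3 >/dev/null; then
    cp "$DB" "$DB.bak"                    # always keep the broken original around
    sqlite3 "$DB" ".recover" > dump.sql   # salvage whatever is still readable
    rm "$DB"
    sqlite3 "$DB" < dump.sql              # rebuild a clean database from the dump
    sqlite3 "$DB" "PRAGMA integrity_check;"
fi
```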
4ich
July 29, 2021, 3:02pm
16
It doesn't help.
After loading the config this comes immediately:
Error: trust: malformed cache: invalid character '\x01' looking for beginning of value
Alexey
July 29, 2021, 6:10pm
17
This is another corrupted file, see Search results for 'malformed cache: invalid character' - Storj Community Forum (official)
Ok, it seems my trust-cache.json has been corrupted. I removed it and new one has been downloaded. I am back online
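Clearing it is just removing the file and restarting the node; by default the cache lives in the config directory, which in this setup is mounted at /mnt/r5/storj/data:

```shell
CACHE=/mnt/r5/storj/data/trust-cache.json
if [ -f "$CACHE" ]; then
    docker stop -t 300 storagenaspi
    rm "$CACHE"                  # the node downloads a fresh copy on start
    docker start storagenaspi
fi
```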
4ich
July 29, 2021, 6:55pm
18
Thanks… seems to be back online…
but
I had to drop that index like in Malformed database, fixed database, now won't start after I removed the trust-cache.json:
2020-08-17T15:12:47.601-0500 FATAL Unrecoverable error {"error": "Error creating tables for master database on storagenode: migrate: index idx_order_archived_at already exists\n\tstorj.io/storj/private/migrate.SQL.Run:274\n\tstorj.io/storj/private/migrate.(*Migration).Run.func1:179\n\tstorj.io/storj/private/dbutil/txutil.withTxOnce:75\n\tstorj.io/storj/private/dbutil/txutil.WithTx:36\n\tstorj.io/storj/private/migrate.(*Migration).Run:178\n\tstorj.io/storj/storagenode/storagenodedb.(*DB).MigrateToLatest:295\n\tmain.cmdRun:183\n\tstorj.io/private/process.cleanup.func1.4:353\n\tstorj.io/private/process.cleanup.func1:371\n\tgithub.com/spf13/cobra.(*Command).execute:840\n\tgithub.com/spf13/cobra.(*Command).ExecuteC:945\n\tgithub.com/spf13/cobra.(*Command).Execute:885\n\tstorj.io/private/process.ExecWithCustomConfig:88\n\tstorj.io/private/process.Exec:65\n\tmain.(*service).Execute.func1:66\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:57", "errorVerbose": "Error creating tables for master database on storagenode: migrate: index idx_order_archived_at already 
exists\n\tstorj.io/storj/private/migrate.SQL.Run:274\n\tstorj.io/storj/private/migrate.(*Migration).Run.func1:179\n\tstorj.io/storj/private/dbutil/txutil.withTxOnce:75\n\tstorj.io/storj/private/dbutil/txutil.WithTx:36\n\tstorj.io/storj/private/migrate.(*Migration).Run:178\n\tstorj.io/storj/storagenode/storagenodedb.(*DB).MigrateToLatest:295\n\tmain.cmdRun:183\n\tstorj.io/private/process.cleanup.func1.4:353\n\tstorj.io/private/process.cleanup.func1:371\n\tgithub.com/spf13/cobra.(*Command).execute:840\n\tgithub.com/spf13/cobra.(*Command).ExecuteC:945\n\tgithub.com/spf13/cobra.(*Command).Execute:885\n\tstorj.io/private/process.ExecWithCustomConfig:88\n\tstorj.io/private/process.Exec:65\n\tmain.(*service).Execute.func1:66\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:57\n\tmain.cmdRun:185\n\tstorj.io/private/process.cleanup.func1.4:353\n\tstorj.io/private/process.cleanup.func1:371\n\tgithub.com/spf13/cobra.(*Command).execute:840\n\tgithub.com/spf13/cobra.(*Command).ExecuteC:945\n\tgithub.com/spf13/cobra.(*Command).Execute:885\n\tstorj.io/private/process.ExecWithCustomConfig:88\n\tstorj.io/private/process.Exec:65\n\tmain.(*service).Execute.func1:66\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:57"}
Then I got a ton of errors, but it started downloading/uploading… I hope it doesn't get DQed because of the downtime / missing or faulty pieces.