4ich
July 23, 2021, 10:33pm
1
Hi there
I have a Quad SATA HAT on my Pi 4B (8 GB) with 4x 2TB drives in RAID 5.
For a few days now I have been getting notifications that my node is offline for a few minutes.
I got these errors in the log:
2021-07-23T22:13:25.576Z ERROR piecestore download failed {"Piece ID": "K7DCW53X3AA4PSB3BA2NAOLZEU3L4GL7YLFEL2QJIRS7CXJ3NCPA", "Satellite ID": "121RTSDpyNZVcEU84Ticf2L1ntiuUimbWgfATz21tuvgk3vzoA6", "Action": "GET_REPAIR", "error": "file does not exist", "errorVerbose": "file does not exist\n\tstorj.io/common/rpc/rpcstatus.Wrap:73\n\tstorj.io/storj/storagenode/piecestore.(*Endpoint).Download:534\n\tstorj.io/common/pb.DRPCPiecestoreDescription.Method.func2:217\n\tstorj.io/drpc/drpcmux.(*Mux).HandleRPC:33\n\tstorj.io/common/rpc/rpctracing.(*Handler).HandleRPC:58\n\tstorj.io/drpc/drpcserver.(*Server).handleRPC:102\n\tstorj.io/drpc/drpcserver.(*Server).ServeOne:60\n\tstorj.io/drpc/drpcserver.(*Server).Serve.func2:95\n\tstorj.io/drpc/drpcctx.(*Tracker).track:51"}
2021-07-23T22:13:54.548Z ERROR pieces:trash emptying trash failed {"error": "pieces error: filestore error: context canceled", "errorVerbose": "pieces error: filestore error: context canceled\n\tstorj.io/storj/storage/filestore.(*blobStore).EmptyTrash:154\n\tstorj.io/storj/storagenode/pieces.(*BlobsUsageCache).EmptyTrash:310\n\tstorj.io/storj/storagenode/pieces.(*Store).EmptyTrash:367\n\tstorj.io/storj/storagenode/pieces.(*TrashChore).Run.func1:51\n\tstorj.io/common/sync2.(*Cycle).Run:92\n\tstorj.io/common/sync2.(*Cycle).Start.func1:71\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:57"}
2021-07-23T22:13:54.856Z ERROR piecestore:cache error getting current used space: {"error": "lstat config/storage/blobs/qstuylguhrn2ozjv4h2c6xpxykd622gtgurhql2k7k75wqaaaaaa/y2/fofv75k5nij2mhx7aq2kuyu66nwkv6l5kvncudl7n32pastujq.sj1: structure needs cleaning; lstat config/storage/blobs/v4weeab67sbgvnbwd5z7tweqsqqun7qox2agpbxy44mqqaaaaaaa/fr/5rwz7spaqjqdhmj5wobmyi436eqzcnjtawgwfjifvq5pmpzjxq.sj1: structure needs cleaning; lstat config/storage/blobs/ukfu6bhbboxilvt7jrwlqk7y2tapb5d2r2tsmj2sjxvw5qaaaaaa/wn/qg55nknqud4j6xf2npqmmtkgud5pk5jypmv3zhzei2kbom2tnq.sj1: structure needs cleaning; lstat config/storage/blobs/pmw6tvzmf2jv6giyybmmvl4o2ahqlaldsaeha4yx74n5aaaaaaaa/dk/fzys34za3bz2sxdgaut2elcyppz3yuxisro3br4fugf4gxd3iq.sj1: structure needs cleaning; lstat config/storage/blobs/6r2fgwqz3manwt4aogq343bfkh2n5vvg4ohqqgggrrunaaaaaaaa/hq/a6ohctx4ymlesavuvud3s3s5n5ljd4d3yb6vlcgetzgsknrdzq.sj1: structure needs cleaning; context canceled; context canceled", "errorVerbose": "group:\n--- lstat config/storage/blobs/qstuylguhrn2ozjv4h2c6xpxykd622gtgurhql2k7k75wqaaaaaa/y2/fofv75k5nij2mhx7aq2kuyu66nwkv6l5kvncudl7n32pastujq.sj1: structure needs cleaning\n\tstorj.io/storj/storage/filestore.walkNamespaceWithPrefix:788\n\tstorj.io/storj/storage/filestore.(*Dir).walkNamespaceInPath:725\n\tstorj.io/storj/storage/filestore.(*Dir).WalkNamespace:685\n\tstorj.io/storj/storage/filestore.(*blobStore).WalkNamespace:284\n\tstorj.io/storj/storagenode/pieces.(*Store).WalkSatellitePieces:497\n\tstorj.io/storj/storagenode/pieces.(*Store).SpaceUsedTotalAndBySatellite:662\n\tstorj.io/storj/storagenode/pieces.(*CacheService).Run:54\n\tstorj.io/storj/private/lifecycle.(*Group).Run.func2.1:87\n\truntime/pprof.Do:40\n\tstorj.io/storj/private/lifecycle.(*Group).Run.func2:86\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:57\n--- lstat config/storage/blobs/v4weeab67sbgvnbwd5z7tweqsqqun7qox2agpbxy44mqqaaaaaaa/fr/5rwz7spaqjqdhmj5wobmyi436eqzcnjtawgwfjifvq5pmpzjxq.sj1: structure needs 
cleaning\n\tstorj.io/storj/storage/filestore.walkNamespaceWithPrefix:788\n\tstorj.io/storj/storage/filestore.(*Dir).walkNamespaceInPath:725\n\tstorj.io/storj/storage/filestore.(*Dir).WalkNamespace:685\n\tstorj.io/storj/storage/filestore.(*blobStore).WalkNamespace:284\n\tstorj.io/storj/storagenode/pieces.(*Store).WalkSatellitePieces:497\n\tstorj.io/storj/storagenode/pieces.(*Store).SpaceUsedTotalAndBySatellite:662\n\tstorj.io/storj/storagenode/pieces.(*CacheService).Run:54\n\tstorj.io/storj/private/lifecycle.(*Group).Run.func2.1:87\n\truntime/pprof.Do:40\n\tstorj.io/storj/private/lifecycle.(*Group).Run.func2:86\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:57\n--- lstat config/storage/blobs/ukfu6bhbboxilvt7jrwlqk7y2tapb5d2r2tsmj2sjxvw5qaaaaaa/wn/qg55nknqud4j6xf2npqmmtkgud5pk5jypmv3zhzei2kbom2tnq.sj1: structure needs cleaning\n\tstorj.io/storj/storage/filestore.walkNamespaceWithPrefix:788\n\tstorj.io/storj/storage/filestore.(*Dir).walkNamespaceInPath:725\n\tstorj.io/storj/storage/filestore.(*Dir).WalkNamespace:685\n\tstorj.io/storj/storage/filestore.(*blobStore).WalkNamespace:284\n\tstorj.io/storj/storagenode/pieces.(*Store).WalkSatellitePieces:497\n\tstorj.io/storj/storagenode/pieces.(*Store).SpaceUsedTotalAndBySatellite:662\n\tstorj.io/storj/storagenode/pieces.(*CacheService).Run:54\n\tstorj.io/storj/private/lifecycle.(*Group).Run.func2.1:87\n\truntime/pprof.Do:40\n\tstorj.io/storj/private/lifecycle.(*Group).Run.func2:86\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:57\n--- lstat config/storage/blobs/pmw6tvzmf2jv6giyybmmvl4o2ahqlaldsaeha4yx74n5aaaaaaaa/dk/fzys34za3bz2sxdgaut2elcyppz3yuxisro3br4fugf4gxd3iq.sj1: structure needs 
cleaning\n\tstorj.io/storj/storage/filestore.walkNamespaceWithPrefix:788\n\tstorj.io/storj/storage/filestore.(*Dir).walkNamespaceInPath:725\n\tstorj.io/storj/storage/filestore.(*Dir).WalkNamespace:685\n\tstorj.io/storj/storage/filestore.(*blobStore).WalkNamespace:284\n\tstorj.io/storj/storagenode/pieces.(*Store).WalkSatellitePieces:497\n\tstorj.io/storj/storagenode/pieces.(*Store).SpaceUsedTotalAndBySatellite:662\n\tstorj.io/storj/storagenode/pieces.(*CacheService).Run:54\n\tstorj.io/storj/private/lifecycle.(*Group).Run.func2.1:87\n\truntime/pprof.Do:40\n\tstorj.io/storj/private/lifecycle.(*Group).Run.func2:86\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:57\n--- lstat config/storage/blobs/6r2fgwqz3manwt4aogq343bfkh2n5vvg4ohqqgggrrunaaaaaaaa/hq/a6ohctx4ymlesavuvud3s3s5n5ljd4d3yb6vlcgetzgsknrdzq.sj1: structure needs cleaning\n\tstorj.io/storj/storage/filestore.walkNamespaceWithPrefix:788\n\tstorj.io/storj/storage/filestore.(*Dir).walkNamespaceInPath:725\n\tstorj.io/storj/storage/filestore.(*Dir).WalkNamespace:685\n\tstorj.io/storj/storage/filestore.(*blobStore).WalkNamespace:284\n\tstorj.io/storj/storagenode/pieces.(*Store).WalkSatellitePieces:497\n\tstorj.io/storj/storagenode/pieces.(*Store).SpaceUsedTotalAndBySatellite:662\n\tstorj.io/storj/storagenode/pieces.(*CacheService).Run:54\n\tstorj.io/storj/private/lifecycle.(*Group).Run.func2.1:87\n\truntime/pprof.Do:40\n\tstorj.io/storj/private/lifecycle.(*Group).Run.func2:86\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:57\n--- context canceled\n--- context canceled"}
2021-07-23T22:13:54.958Z ERROR pieces:trash emptying trash failed {"error": "pieces error: filestore error: context canceled", "errorVerbose": "pieces error: filestore error: context canceled\n\tstorj.io/storj/storage/filestore.(*blobStore).EmptyTrash:154\n\tstorj.io/storj/storagenode/pieces.(*BlobsUsageCache).EmptyTrash:310\n\tstorj.io/storj/storagenode/pieces.(*Store).EmptyTrash:367\n\tstorj.io/storj/storagenode/pieces.(*TrashChore).Run.func1:51\n\tstorj.io/common/sync2.(*Cycle).Run:92\n\tstorj.io/common/sync2.(*Cycle).Start.func1:71\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:57"}
Alexey
July 23, 2021, 10:39pm
2
There are two problems in your report. The first one, missing pieces, will not by itself make the node go offline; the second one, which causes the container to restart, has not been identified yet.
To determine the first: please search your logs for all records with this Piece ID. When was it uploaded, and was it deleted or did it just vanish?
For the second: please search your logs for ERROR records. I would like to see the last 5 such errors, especially any FATAL ones (excluding those about moving pieces to trash).
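The log search Alexey asks for can be sketched like this. With a docker node you would normally pipe docker logs storagenode 2>&1 into grep; the sample log below is a trimmed, illustrative stand-in so the sketch is self-contained (only the Piece ID is taken from the post above):

```shell
# Create a small sample log so the grep patterns can be demonstrated without a real node.
cat > /tmp/node-sample.log <<'EOF'
2021-07-23T22:13:25.576Z ERROR piecestore download failed {"Piece ID": "K7DCW53X3AA4PSB3BA2NAOLZEU3L4GL7YLFEL2QJIRS7CXJ3NCPA", "Action": "GET_REPAIR", "error": "file does not exist"}
2021-07-23T22:13:54.548Z ERROR pieces:trash emptying trash failed {"error": "pieces error: filestore error: context canceled"}
2021-07-23T22:14:01.000Z INFO piecestore downloaded {"Piece ID": "SOMEOTHERPIECEIDFORILLUSTRATION"}
EOF

# All records for one Piece ID (uploads, downloads, deletes, repairs):
grep 'K7DCW53X3AA4PSB3BA2NAOLZEU3L4GL7YLFEL2QJIRS7CXJ3NCPA' /tmp/node-sample.log

# Last 5 ERROR/FATAL records, excluding trash-related ones:
grep -E ' (ERROR|FATAL) ' /tmp/node-sample.log | grep -v 'trash' | tail -n 5
```

For a real docker node, substitute the file with docker logs storagenode 2>&1 (logs go to stderr, hence the redirect).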
4ich
July 26, 2021, 7:10pm
3
OK, I investigated a bit more…
I see high iowait from the storagenode processes.
And in the logs I have:
2021-07-26T19:07:20.176Z ERROR db Unable to read the disk, please verify the disk is not corrupt
I am checking the RAID right now, but since all seems fine when I stop the node, I don't think I will find any bad blocks…
Any suggestions?
Maybe move the DBs to the SD card, or even to a USB (2.0) attached SSD/HDD/stick?
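On moving the databases: the storagenode does support keeping its SQLite files on a separate device via the storage2.database-dir option in config.yaml. The paths below are illustrative, not taken from this setup:

```yaml
# config.yaml fragment (illustrative path): keep the node's SQLite DBs
# off the RAID array, on a dedicated mount inside the container.
storage2.database-dir: /app/dbs
```

With docker you would then bind-mount a faster disk there, e.g. --mount type=bind,source=/mnt/ssd/storj-dbs,destination=/app/dbs. Note that SD cards and USB 2.0 sticks tend to wear out quickly under SQLite's write load, so a small SSD is usually the safer choice.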
Alexey
July 26, 2021, 7:35pm
4
This is not good. What type of RAID do you use? What is the filesystem? How are these disks connected to the Raspberry Pi?
How do you run docker run: with sudo or without?
If without, please show the result of this command:
docker inspect storagenode -f '{{.Mounts}}'
4ich
July 26, 2021, 7:51pm
5
I run the container without sudo
This is the result of docker inspect storagenaspi -f '{{.Mounts}}':
[{bind /mnt/r5/storj/data /app/config true rprivate} {bind /mnt/r5/storj/identity /app/identity true rprivate}]
I have this:
https://wiki.radxa.com/Dual_Quad_SATA_HAT
and 4x 2TB drives in RAID 5, since I use it as a NAS too…
And as said, it's an RPi 4B with 8 GB RAM.
Alexey
July 26, 2021, 8:08pm
6
Please show the result of these commands:
ls -l /mnt/r5/storj/data/storage
and
df -HT
How did you build the RAID? I assume with OMV (OpenMediaVault).
4ich
July 26, 2021, 9:24pm
7
Thanks Alexey… it seems I found the issue… I think I have a faulty drive, or at least a pre-failing one.
I use plain Raspbian with mdadm.
ls: cannot access '/mnt/r5/storj/data/storage/info.db-shm': No such file or directory
total 63400
-rw-r--r-- 1 root root 21786624 Jul 26 20:20 bandwidth.db
-rw-r--r-- 1 root root 32768 Jul 26 22:21 bandwidth.db-shm
-rw-r--r-- 1 root root 684032 Jul 26 20:24 bandwidth.db-wal
drwx------ 9 root root 4096 Jan 7 2021 blobs
drwx------ 2 root root 4096 Jul 26 20:12 garbage
-rw-r--r-- 1 root root 65536 Jul 26 20:20 heldamount.db
-rw-r--r-- 1 root root 32768 Jul 26 22:21 heldamount.db-shm
-rw-r--r-- 1 root root 1623312 Jul 26 20:21 heldamount.db-wal
-rw-r--r-- 1 root root 16384 Jul 26 22:21 info.db
-????????? ? ? ? ? ? info.db-shm <--- doesn't seem correct?
-rw-r--r-- 1 root root 0 Jul 26 22:23 info.db-wal
-rw-r--r-- 1 root root 24576 Jul 26 20:20 notifications.db
-rw-r--r-- 1 root root 32768 Jul 26 22:21 notifications.db-shm
-rw-r--r-- 1 root root 32992 Jul 26 20:20 notifications.db-wal
-rw-r--r-- 1 root root 34328576 Jul 26 20:20 orders.db
-rw-r--r-- 1 root root 32768 Jul 26 22:21 orders.db-shm
-rw-r--r-- 1 root root 41232 Jul 26 20:20 orders.db-wal
-rw-r--r-- 1 root root 2101248 Jul 26 20:20 piece_expiration.db
-rw-r--r-- 1 root root 32768 Jul 26 22:21 piece_expiration.db-shm
-rw-r--r-- 1 root root 41232 Jul 26 20:20 piece_expiration.db-wal
-rw-r--r-- 1 root root 24576 Jul 26 20:20 pieceinfo.db
-rw-r--r-- 1 root root 32768 Jul 26 22:21 pieceinfo.db-shm
-rw-r--r-- 1 root root 32992 Jul 26 20:20 pieceinfo.db-wal
-rw-r--r-- 1 root root 24576 Jul 26 20:20 piece_spaced_used.db
-rw-r--r-- 1 root root 32768 Jul 26 22:21 piece_spaced_used.db-shm
-rw-r--r-- 1 root root 32992 Jul 26 20:20 piece_spaced_used.db-wal
-rw-r--r-- 1 root root 24576 Jul 26 20:20 pricing.db
-rw-r--r-- 1 root root 32768 Jul 26 22:21 pricing.db-shm
-rw-r--r-- 1 root root 82432 Jul 26 20:20 pricing.db-wal
-rw-r--r-- 1 root root 40960 Jul 26 20:20 reputation.db
-rw-r--r-- 1 root root 32768 Jul 26 22:21 reputation.db-shm
-rw-r--r-- 1 root root 189552 Jul 26 20:22 reputation.db-wal
-rw-r--r-- 1 root root 32768 Jul 26 20:20 satellites.db
-rw-r--r-- 1 root root 32768 Jul 26 22:21 satellites.db-shm
-rw-r--r-- 1 root root 41232 Jul 26 20:20 satellites.db-wal
-rw-r--r-- 1 root root 24576 Jul 26 20:20 secret.db
-rw-r--r-- 1 root root 32768 Jul 26 22:21 secret.db-shm
-rw-r--r-- 1 root root 32992 Jul 26 20:20 secret.db-wal
-rw-r--r-- 1 root root 32 Dec 6 2020 storage-dir-verification
-rw-r--r-- 1 root root 348160 Jul 26 20:20 storage_usage.db
-rw-r--r-- 1 root root 32768 Jul 26 22:21 storage_usage.db-shm
-rw-r--r-- 1 root root 193672 Jul 26 20:20 storage_usage.db-wal
drwx------ 2 root root 2473984 Jul 26 20:24 temp
drwx------ 8 root root 4096 Jan 12 2021 trash
-rw-r--r-- 1 root root 20480 Jul 26 20:20 used_serial.db
-rw-r--r-- 1 root root 32768 Jul 26 22:21 used_serial.db-shm
-rw-r--r-- 1 root root 41232 Jul 26 20:20 used_serial.db-wal
-rw------- 1 root root 0 Nov 12 2020 write-test084195137
-rw------- 1 root root 0 Nov 10 2020 write-test290185210
-rw------- 1 root root 0 Nov 10 2020 write-test303027288
-rw------- 1 root root 0 Nov 22 2020 write-test512637377
-rw------- 1 root root 0 Nov 11 2020 write-test608393685
-rw------- 1 root root 0 Nov 16 2020 write-test910705580
Filesystem Type Size Used Avail Use% Mounted on
/dev/root ext4 32G 3.6G 27G 12% /
devtmpfs devtmpfs 4.0G 0 4.0G 0% /dev
tmpfs tmpfs 4.2G 29k 4.2G 1% /dev/shm
tmpfs tmpfs 4.2G 11M 4.2G 1% /run
tmpfs tmpfs 5.3M 4.1k 5.3M 1% /run/lock
tmpfs tmpfs 4.2G 0 4.2G 0% /sys/fs/cgroup
/dev/mmcblk0p1 vfat 265M 56M 209M 22% /boot
tmpfs tmpfs 826M 0 826M 0% /run/user/1000
overlay overlay 32G 3.6G 27G 12% /var/lib/docker/overlay2/b064aed017c94b5bbe8b78bfe3eff0aa9091db630e6e971d20ff7a567cee9729/merged
overlay overlay 32G 3.6G 27G 12% /var/lib/docker/overlay2/92efe3a8997b0794a6b2731125c532d3afead5da386cc0f6053a2a96025bf0a3/merged
/dev/md0 ext4 6.0T 5.2T 466G 92% /mnt/r5
BrightSilence
8
Be sure to run fsck on it. It might be just a file system issue and not necessarily a failing disk.
Alexey
July 27, 2021, 7:43am
9
I agree with @BrightSilence: you need to run fsck, maybe that will be enough.
I would also check the S.M.A.R.T. data for every disk.
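The checks suggested above can be sketched as follows. The array name /dev/md0 matches the df output earlier in the thread, but the member device names are hypothetical; the DRY_RUN guard only prints the commands, since fsck must never run on a mounted filesystem:

```shell
# Sketch of the suggested checks for an mdadm RAID 5 on a Pi.
# DRY_RUN=1 (the default here) only echoes the commands; member
# device names /dev/sd[a-d] are assumptions, verify with lsblk first.
DRY_RUN="${DRY_RUN:-1}"

run() {
  if [ "$DRY_RUN" = "1" ]; then echo "would run: $*"; else "$@"; fi
}

run umount /mnt/r5                 # filesystem must be unmounted for fsck
run fsck.ext4 -f /dev/md0          # full filesystem check on the array
for d in /dev/sda /dev/sdb /dev/sdc /dev/sdd; do
  run smartctl -a "$d"             # S.M.A.R.T. health per member disk (smartmontools)
done
run cat /proc/mdstat               # mdadm array/rebuild status
```

A pending-sector or reallocated-sector count climbing in the smartctl output would support the "pre-failing drive" theory.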
4ich
July 28, 2021, 4:02pm
10
Is the offline DQ/penalty already a thing?
Alexey
July 28, 2021, 6:37pm
11
The DQ/penalty for lost data has not been disabled.
For downtime your node can be suspended (if the online score falls below 60%). DQ is applied if your node is offline for more than 30 days.
4ich
July 28, 2021, 6:50pm
12
Okay, then due to performance issues when Storj is running during the rebuild…
I'll try to dd the faulty disk, skipping the errors, add it back to the RAID, and finally repair the failed sectors.
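The dd approach above can be sketched like this. The device names are hypothetical, and for a failing disk GNU ddrescue is usually the safer tool since it retries and logs bad regions; the runnable part below copies a plain file so the demo works without real disks:

```shell
# On real hardware (hypothetical device names, triple-check before running!):
#   dd if=/dev/sdX of=/dev/sdY bs=64K conv=noerror,sync status=progress
# conv=noerror keeps going past read errors; conv=sync pads failed blocks
# with zeros so the output stays the same size and offsets stay aligned.

# Self-contained demo of the same command shape on a regular file:
dd if=/dev/urandom of=/tmp/demo-src.img bs=4K count=16 2>/dev/null
dd if=/tmp/demo-src.img of=/tmp/demo-dst.img bs=4K conv=noerror,sync 2>/dev/null
cmp /tmp/demo-src.img /tmp/demo-dst.img && echo "copy matches"
```

Note the trade-off: blocks that could not be read come back as zeros, so some pieces on the cloned disk will be silently corrupt; the Storj repair process is what eventually replaces those.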
4ich
July 29, 2021, 4:55am
13
My RAID looks fine now, but for some reason it impacted the 'master database'…
2021-07-29T04:53:25.318Z INFO Operator email {"Address": "censored@gmail.com"}
2021-07-29T04:53:25.318Z INFO Operator wallet {"Address": "someaddress"}
Error: Error starting master database on storagenode: database: unable to open database file: structure needs cleaning
storj.io/storj/storagenode/storagenodedb.(*DB).openDatabase:323
storj.io/storj/storagenode/storagenodedb.(*DB).openExistingDatabase:305
storj.io/storj/storagenode/storagenodedb.(*DB).openDatabases:281
storj.io/storj/storagenode/storagenodedb.OpenExisting:248
main.cmdRun:160
storj.io/private/process.cleanup.func1.4:363
storj.io/private/process.cleanup.func1:381
github.com/spf13/cobra.(*Command).execute:852
github.com/spf13/cobra.(*Command).ExecuteC:960
github.com/spf13/cobra.(*Command).Execute:897
storj.io/private/process.ExecWithCustomConfig:88
storj.io/private/process.ExecCustomDebug:70
main.main:385
runtime.main:204
In the meantime I'll try to determine which one to fix… with:
4ich
July 29, 2021, 4:59am
14
Yay…
/mnt/r5/storj/data/storage# sqlite3 /mnt/r5/storj/data/storage/orders.db 'PRAGMA integrity_check;'
Error: unable to open database file
Alexey
July 29, 2021, 6:01am
15
It seems that file was not recovered.
You can re-create it using this guide: How to fix database: file is not a database error – Storj
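For context, the guide linked above boils down to dumping whatever SQLite can still read and loading it into a fresh file. This is a rough sketch, not the verbatim manual; the commented commands use paths from this thread but should be adapted, and the runnable part is a toy demo of the same dump/reload cycle:

```shell
# Sketch of the dump-and-recreate approach (adapt paths, back up first):
#   sqlite3 /mnt/r5/storj/data/storage/orders.db ".mode insert" ".output /tmp/dump.sql" ".dump" ".exit"
#   mv /mnt/r5/storj/data/storage/orders.db /mnt/r5/storj/data/storage/orders.db.bad
#   sqlite3 /mnt/r5/storj/data/storage/orders.db < /tmp/dump.sql

# Runnable toy demo of the same cycle:
command -v sqlite3 >/dev/null || { echo "sqlite3 not installed, skipping demo"; exit 0; }
rm -f /tmp/toy.db /tmp/toy2.db
sqlite3 /tmp/toy.db "CREATE TABLE t(x INTEGER); INSERT INTO t VALUES (1),(2);"
sqlite3 /tmp/toy.db ".dump" > /tmp/toy.sql          # export readable content as SQL
sqlite3 /tmp/toy2.db < /tmp/toy2.db.sql 2>/dev/null || sqlite3 /tmp/toy2.db < /tmp/toy.sql
sqlite3 /tmp/toy2.db "PRAGMA integrity_check;"       # prints: ok
```

If the original file is so damaged that .dump itself fails partway, the guide's fallback is an empty replacement database, at the cost of the historical stats stored in it.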
4ich
July 29, 2021, 3:02pm
16
It doesn't help.
After loading the config, this comes up immediately:
Error: trust: malformed cache: invalid character '\x01' looking for beginning of value
Alexey
July 29, 2021, 6:10pm
17
This is another corrupted file, see Search results for 'malformed cache: invalid character' - Storj Community Forum (official). Quoting one of those results:
"Ok, it seems my trust-cache.json has been corrupted. I removed it and a new one has been downloaded. I am back online."
4ich
July 29, 2021, 6:55pm
18
Thanks… it seems I'm back online…
But after I removed the trust-cache.json, I had to drop that table like in Malformed database, fixed database, now won't start:
2020-08-17T15:12:47.601-0500 FATAL Unrecoverable error {"error": "Error creating tables for master database on storagenode: migrate: index idx_order_archived_at already exists\n\tstorj.io/storj/private/migrate.SQL.Run:274\n\tstorj.io/storj/private/migrate.(*Migration).Run.func1:179\n\tstorj.io/storj/private/dbutil/txutil.withTxOnce:75\n\tstorj.io/storj/private/dbutil/txutil.WithTx:36\n\tstorj.io/storj/private/migrate.(*Migration).Run:178\n\tstorj.io/storj/storagenode/storagenodedb.(*DB).MigrateToLatest:295\n\tmain.cmdRun:183\n\tstorj.io/private/process.cleanup.func1.4:353\n\tstorj.io/private/process.cleanup.func1:371\n\tgithub.com/spf13/cobra.(*Command).execute:840\n\tgithub.com/spf13/cobra.(*Command).ExecuteC:945\n\tgithub.com/spf13/cobra.(*Command).Execute:885\n\tstorj.io/private/process.ExecWithCustomConfig:88\n\tstorj.io/private/process.Exec:65\n\tmain.(*service).Execute.func1:66\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:57", "errorVerbose": "Error creating tables for master database on storagenode: migrate: index idx_order_archived_at already 
exists\n\tstorj.io/storj/private/migrate.SQL.Run:274\n\tstorj.io/storj/private/migrate.(*Migration).Run.func1:179\n\tstorj.io/storj/private/dbutil/txutil.withTxOnce:75\n\tstorj.io/storj/private/dbutil/txutil.WithTx:36\n\tstorj.io/storj/private/migrate.(*Migration).Run:178\n\tstorj.io/storj/storagenode/storagenodedb.(*DB).MigrateToLatest:295\n\tmain.cmdRun:183\n\tstorj.io/private/process.cleanup.func1.4:353\n\tstorj.io/private/process.cleanup.func1:371\n\tgithub.com/spf13/cobra.(*Command).execute:840\n\tgithub.com/spf13/cobra.(*Command).ExecuteC:945\n\tgithub.com/spf13/cobra.(*Command).Execute:885\n\tstorj.io/private/process.ExecWithCustomConfig:88\n\tstorj.io/private/process.Exec:65\n\tmain.(*service).Execute.func1:66\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:57\n\tmain.cmdRun:185\n\tstorj.io/private/process.cleanup.func1.4:353\n\tstorj.io/private/process.cleanup.func1:371\n\tgithub.com/spf13/cobra.(*Command).execute:840\n\tgithub.com/spf13/cobra.(*Command).ExecuteC:945\n\tgithub.com/spf13/cobra.(*Command).Execute:885\n\tstorj.io/private/process.ExecWithCustomConfig:88\n\tstorj.io/private/process.Exec:65\n\tmain.(*service).Execute.func1:66\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:57"}
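A hedged note on the migration error above: in the linked thread the fix amounted to dropping the leftover idx_order_archived_at index so the migration could recreate it. The database path in the commented command is illustrative (back up the file first); the runnable part is a toy demo of the same idea:

```shell
# Sketch of the fix from the linked thread (path illustrative, back up first):
#   sqlite3 /mnt/r5/storj/data/storage/orders.db "DROP INDEX idx_order_archived_at;"

# Runnable toy demo: create an index, drop it, confirm it is gone.
command -v sqlite3 >/dev/null || { echo "sqlite3 not installed, skipping demo"; exit 0; }
rm -f /tmp/idx.db
sqlite3 /tmp/idx.db "CREATE TABLE orders(archived_at INTEGER);
                     CREATE INDEX idx_order_archived_at ON orders(archived_at);"
sqlite3 /tmp/idx.db "DROP INDEX idx_order_archived_at;"
sqlite3 /tmp/idx.db "SELECT count(*) FROM sqlite_master WHERE type='index';"   # prints: 0
```

The "already exists" error happens when a partially recovered or recreated database still contains the index while the migration step tries to create it again.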
Then I got a ton of errors, but it started downloading/uploading… I hope it doesn't get DQed because of the downtime / missing or faulty pieces.