Fixing the error “database disk image is malformed” takes forever

After the setup step, do I need to put the YAML file back?

No, it will be created during setup.
You can change it later if needed (for example, to opt in to zkSync).

OK, the main error I get now is:

2021-11-28T13:40:29.856Z ERROR piecestore:cache error getting current used space: {"error": "readdirent config/storage/blobs/qstuylguhrn2ozjv4h2c6xpxykd622gtgurhql2k7k75wqaaaaaa: structure needs cleaning; readdirent config/storage/blobs/ukfu6bhbboxilvt7jrwlqk7y2tapb5d2r2tsmj2sjxvw5qaaaaaa/3k: structure needs cleaning; readdirent config/storage/blobs/6r2fgwqz3manwt4aogq343bfkh2n5vvg4ohqqgggrrunaaaaaaaa/gx: structure needs cleaning", "errorVerbose": "group:\n--- readdirent config/storage/blobs/qstuylguhrn2ozjv4h2c6xpxykd622gtgurhql2k7k75wqaaaaaa: structure needs cleaning\n--- readdirent config/storage/blobs/ukfu6bhbboxilvt7jrwlqk7y2tapb5d2r2tsmj2sjxvw5qaaaaaa/3k: structure needs cleaning\n--- readdirent config/storage/blobs/6r2fgwqz3manwt4aogq343bfkh2n5vvg4ohqqgggrrunaaaaaaaa/gx: structure needs cleaning"}

2021-11-28T13:40:29.857Z ERROR services unexpected shutdown of a runner {"name": "piecestore:cache", "error": "readdirent config/storage/blobs/qstuylguhrn2ozjv4h2c6xpxykd622gtgurhql2k7k75wqaaaaaa: structure needs cleaning; readdirent config/storage/blobs/ukfu6bhbboxilvt7jrwlqk7y2tapb5d2r2tsmj2sjxvw5qaaaaaa/3k: structure needs cleaning; readdirent config/storage/blobs/6r2fgwqz3manwt4aogq343bfkh2n5vvg4ohqqgggrrunaaaaaaaa/gx: structure needs cleaning", "errorVerbose": "group:\n--- readdirent config/storage/blobs/qstuylguhrn2ozjv4h2c6xpxykd622gtgurhql2k7k75wqaaaaaa: structure needs cleaning\n--- readdirent config/storage/blobs/ukfu6bhbboxilvt7jrwlqk7y2tapb5d2r2tsmj2sjxvw5qaaaaaa/3k: structure needs cleaning\n--- readdirent config/storage/blobs/6r2fgwqz3manwt4aogq343bfkh2n5vvg4ohqqgggrrunaaaaaaaa/gx: structure needs cleaning"}

Error: readdirent config/storage/blobs/qstuylguhrn2ozjv4h2c6xpxykd622gtgurhql2k7k75wqaaaaaa: structure needs cleaning; readdirent config/storage/blobs/ukfu6bhbboxilvt7jrwlqk7y2tapb5d2r2tsmj2sjxvw5qaaaaaa/3k: structure needs cleaning; readdirent config/storage/blobs/6r2fgwqz3manwt4aogq343bfkh2n5vvg4ohqqgggrrunaaaaaaaa/gx: structure needs cleaning

Also, the node is now suspended. How do I get it unsuspended?

So, the filesystem is actually NOT fixed. You need to fix it.
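For reference, a typical repair sequence for an XFS filesystem looks like the sketch below. The device name /dev/md7 matches the one mentioned later in this thread; the mount point /mnt/storj is an assumption. The commands are echoed rather than executed, so nothing destructive runs by accident; remove the echoes to run them for real.

```shell
# Dry-run sketch of an XFS repair; commands are echoed, not executed.
# /dev/md7 is the device from this thread; /mnt/storj is an assumed
# mount point -- substitute your own paths.
DEV=/dev/md7
MNT=/mnt/storj
# Stop the node first, then unmount: xfs_repair refuses a mounted fs.
echo "docker stop storagenode"
echo "umount $DEV"
# -v is verbose; for ext4 the equivalent check would be e2fsck -f.
echo "xfs_repair -v $DEV"
# Remount and inspect what the repair salvaged into lost+found.
echo "mount $DEV $MNT"
echo "ls $MNT/lost+found"
```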

ok
I have run the command xfs_repair -v /dev/md7 on the relevant disk (7/8) and it found a lot of files that were bad and put those in a lost+found dir; it is about 1 GB (3700 files).
I have started the Storj node and it looks fine now.
I have a 95% suspension score on the satellite us1.storj.io:7777.
Do I need to do anything to get unsuspended, or will it get unsuspended in due time?

The suspension score would recover with successful audits.
Make sure that you do not have any blobs-related errors anymore.
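To give a feel for why the score recovers on its own: the suspension score behaves roughly like an exponentially weighted moving average over recent audit outcomes, so each successful audit moves it a fixed fraction of the way back toward 1.0. The sketch below is purely illustrative: the starting score, the smoothing factor lambda, and the formula itself are assumptions, not Storj's actual reputation math.

```shell
# Illustrative only: model the suspension score as an exponential
# moving average over audit results (1.0 = successful audit). The
# starting score 0.60 and lambda 0.95 are assumptions, not Storj's
# real parameters.
awk 'BEGIN {
  score = 0.60
  lambda = 0.95
  # 20 consecutive successful audits, each nudging the score upward
  for (i = 1; i <= 20; i++)
    score = lambda * score + (1 - lambda) * 1.0
  printf "score after 20 good audits: %.2f\n", score
}'
# -> score after 20 good audits: 0.86
```

The takeaway is only qualitative: recovery is gradual but automatic, as long as audits keep succeeding.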

The only errors I get now are like this:

2021-11-30T05:47:49.185Z ERROR collector unable to delete piece {"Satellite ID": "12L9ZFwhzVpuEKMUNUqkaTLGzwY9G24tbiigLiXpmZWKwmcNDDs", "Piece ID": "4XHLITHENSY25BNKRLLGRBK5Q2NCH4SPWWMXFPIQ37HKV2IO4RJQ", "error": "pieces error: filestore error: file does not exist", "errorVerbose": "pieces error: filestore error: file does not exist\n\tstorj.io/storj/storage/filestore.(*blobStore).Stat:103\n\tstorj.io/storj/storagenode/pieces.(*BlobsUsageCache).pieceSizes:239\n\tstorj.io/storj/storagenode/pieces.(*BlobsUsageCache).Delete:220\n\tstorj.io/storj/storagenode/pieces.(*Store).Delete:299\n\tstorj.io/storj/storagenode/collector.(*Service).Collect:97\n\tstorj.io/storj/storagenode/collector.(*Service).Run.func1:57\n\tstorj.io/common/sync2.(*Cycle).Run:152\n\tstorj.io/storj/storagenode/collector.(*Service).Run:53\n\tstorj.io/storj/private/lifecycle.(*Group).Run.func2.1:87\n\truntime/pprof.Do:40\n\tstorj.io/storj/private/lifecycle.(*Group).Run.func2:86\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:57"}
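A quick way to verify the filesystem-level errors are really gone is to grep the node log for the blob-related patterns. A minimal sketch, using an embedded sample line instead of the real log; in practice you would pipe the node log in (e.g. `docker logs storagenode 2>&1 | grep ...`, where the container name is an assumption):

```shell
# Count blob-related filesystem errors in a log stream. The sample
# line stands in for the real log; pipe `docker logs storagenode`
# (container name is an assumption) through the same grep in practice.
LOG='2021-11-28T13:40:29.856Z ERROR piecestore:cache error getting current used space: readdirent: structure needs cleaning'
printf '%s\n' "$LOG" | grep -cE 'readdirent|structure needs cleaning'
# -> 1   (a count of 0 would mean the errors are gone)
```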

See

ok
I have managed to fix everything, but in the process I lost a lot of data and it caused me to be disqualified on 2 satellites.
Is there a way to start over only for those 2 satellites, or
do I need to start a new identity and start all over again?

Thank you for all the help you gave me!!

You are welcome!

This could happen only if you lost customers’ data; it’s unrelated to losing databases. Databases are replaceable or can be regenerated, unlike the customers’ data.

With a new identity, yes. But if you are going to make a new identity, why limit it to only these two satellites? Run it for all of them.

No, you do not have to. You can run it with the remaining satellites; they will pay as usual unless your node is disqualified on them too.
If the node is disqualified on all satellites, then you need to generate a new identity, sign it with a new authorization token, and start with clean storage.
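For completeness, starting over with a fresh identity usually looks like the steps below. This is a dry-run sketch (commands are echoed, not executed): the `identity create` and `identity authorize` subcommands come from Storj's setup documentation, but verify against the current docs before running anything, and the token value is a placeholder.

```shell
# Dry-run sketch of starting over with a new identity; commands are
# echoed, not executed. The token is a placeholder -- a real one comes
# from Storj's registration page. After authorizing, run the node's
# setup step again against an empty storage directory.
TOKEN='user@example.com:1Abc...'
echo "identity create storagenode"
echo "identity authorize storagenode $TOKEN"
```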

I did not have a choice and started a new node.
The current one has only one satellite that is not disqualified; I will do a graceful exit on that satellite next month.

So Unraid still “helps” to disqualify your node… This is not great news. All Unraid users hoped that once we implemented the check for a missing drive (the main reason for DQ on Unraid in the past), DQ would no longer happen to nodes on Unraid.
It seems that is still not the case. The reason is different, but the result is the same.
Unraid is still not safe to use for a storagenode.
I’m worried that your data is at risk too. I would not use a NAS that can lose my data at a random time, even from a simple power loss.

May I ask if you run your node on the Unraid array or on a single disk using the “Unassigned Devices” plugin?

Seems it’s the Unraid array…
And I would call it an epic failure for the Unraid platform, because it does not protect against data loss even in the simplest possible case of failure - just a loss of power.

My node is on the array disks.

After about 1.5 years of almost no loss, I think it is pretty good.
Think of it: if I had run a node on a disk with no parity and the disk died, I would immediately lose the node and be disqualified. On Unraid, if I lose one or two drives to disk failure, I can recover from that.
I think that Unraid is still better than just using individual disks.
In my case I had corrupted data that was written to the parity, so it would not have helped.