Db: unable to read the disk, please verify the disk is not corrupt

Omg, what does that mean? The storage node is still running and online, and this is the only error that occurred:

2022-01-11T17:44:47.661Z        ERROR   db      Unable to read the disk, please verify the disk is not corrupt

Audits at 100%.
The HDD is connected via SMB from within the same network.

Within the last 24 hours I only see the following errors (referencing my recent post):

2022-01-11T01:02:01.765Z	ERROR	pieces:trash	emptying trash failed	{"error": "pieces error: filestore error: readdirent config/storage/trash/ukfu6bhbboxilvt7jrwlqk7y2tapb5d2r2tsmj2sjxvw5qaaaaaa/kj: no such file or directory", "errorVerbose": "pieces error: filestore error: readdirent config/storage/trash/ukfu6bhbboxilvt7jrwlqk7y2tapb5d2r2tsmj2sjxvw5qaaaaaa/kj: no such file or directory\n\tstorj.io/storj/storage/filestore.(*blobStore).EmptyTrash:154\n\tstorj.io/storj/storagenode/pieces.(*BlobsUsageCache).EmptyTrash:310\n\tstorj.io/storj/storagenode/pieces.(*Store).EmptyTrash:367\n\tstorj.io/storj/storagenode/pieces.(*TrashChore).Run.func1:51\n\tstorj.io/common/sync2.(*Cycle).Run:92\n\tstorj.io/common/sync2.(*Cycle).Start.func1:71\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:57"}
2022-01-11T12:39:15.889Z	ERROR	pieces:trash	emptying trash failed	{"error": "pieces error: filestore error: readdirent config/storage/trash/ukfu6bhbboxilvt7jrwlqk7y2tapb5d2r2tsmj2sjxvw5qaaaaaa/cp: no such file or directory", "errorVerbose": "pieces error: filestore error: readdirent config/storage/trash/ukfu6bhbboxilvt7jrwlqk7y2tapb5d2r2tsmj2sjxvw5qaaaaaa/cp: no such file or directory\n\tstorj.io/storj/storage/filestore.(*blobStore).EmptyTrash:154\n\tstorj.io/storj/storagenode/pieces.(*BlobsUsageCache).EmptyTrash:310\n\tstorj.io/storj/storagenode/pieces.(*Store).EmptyTrash:367\n\tstorj.io/storj/storagenode/pieces.(*TrashChore).Run.func1:51\n\tstorj.io/common/sync2.(*Cycle).Run:92\n\tstorj.io/common/sync2.(*Cycle).Start.func1:71\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:57"}
2022-01-11T17:44:47.661Z	ERROR	db	Unable to read the disk, please verify the disk is not corrupt

There was a network issue yesterday, so the HDD / SMB share was not available for 15 minutes.
Should I do something like checking the databases for corruption?
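
Something like this is what I have in mind for the check (just a rough sketch; the path is a placeholder for wherever the .db files live, and it assumes sqlite3 is installed):

cd /path/to/storagenode/storage      # placeholder: directory containing the node's .db files
for db in *.db; do
  echo "$db: $(sqlite3 "$db" 'PRAGMA integrity_check;')"
done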

Connecting your data drive via SMB is prone to corrupting the databases - why did you choose this approach?

The only “allowed” way of connecting HDDs over the network for Storj is iSCSI.

Interesting. Thank you.

Current setup: a storage enclosure with 3 HDDs for different purposes, connected via USB-C/Thunderbolt to an M1 Mac mini; one of the drives is the storage HDD, which is shared via SMB to an RPi4, where the node runs in Docker on Debian Bullseye.

Some plans have changed since the initial setup in May '21, so if I can manage to share the other 2 disks from the RPi4 to the Mac mini, I could connect the Icy Box directly to the RPi4, since the box has its own power supply (keeping the RPi's limited power budget in mind).

The network mount, as is now clear, also impacts my download / upload rates, which have been hit by timeouts since the beginning of December '21. Most likely I can resolve that by reconnecting the drive directly to the RPi.

Do you know how to share the other 2 mounts once they are connected to the RPi, or can you help with that?

You can install an iSCSI target server on the Pi and create iSCSI targets from your disks.
If macOS supports iSCSI, you can mount the targets on your Mac mini.
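
A rough sketch of what that could look like on Debian with targetcli-fb (the device name and IQNs below are placeholders, adjust them to your setup):

sudo apt install targetcli-fb
sudo targetcli
/> backstores/block create name=disk1 dev=/dev/sda
/> iscsi/ create iqn.2022-01.local.pi:disk1
/> iscsi/iqn.2022-01.local.pi:disk1/tpg1/luns create /backstores/block/disk1
/> iscsi/iqn.2022-01.local.pi:disk1/tpg1/acls create iqn.2022-01.local.mac:initiator
/> saveconfig
/> exit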

Is the M1 Mac mini running 24/7 anyway? Rosetta + Docker should work just fine for Storj, while letting you use the Thunderbolt Icy Box and ditch the RPi completely.

You should not really need Rosetta; it’s arm64 and thus should work natively.

As far as I remember, Docker does not fully support the M1, and 9 months ago it was not recommended to install the node there. Has that changed? The CLI docker install guide still references Docker 2.1.x - and a Windows version for macOS:

Yes, quite a good idea. I’ll try that with the most current Docker Desktop version.
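
For reference, I intend to use roughly the standard docker run command from the docs on the Mac (just a sketch; wallet, email, address, size and the /Volumes paths below are placeholders for my real values):

docker run -d --restart unless-stopped --stop-timeout 300 \
  -p 28967:28967/tcp -p 28967:28967/udp -p 127.0.0.1:14002:14002 \
  -e WALLET="0x..." -e EMAIL="user@example.com" -e ADDRESS="my.ddns.example:28967" -e STORAGE="2TB" \
  --mount type=bind,source=/Volumes/storj/identity,destination=/app/identity \
  --mount type=bind,source=/Volumes/storj/data,destination=/app/config \
  --name storagenode storjlabs/storagenode:latest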

I am running an M1 mini with Docker (the Desktop variant, however) just fine.

Thanks for the confirmation!

I have everything prepared and wanted to change the port forwarding to the M1 mini, but the port(s) are shown as closed. I temporarily shut down the firewall on the Mac, but that did not help. Could it be that, as long as the node is not running, the port will not show as open?

Update: yes, the port was shown as open as soon as the node was running, with the firewall enabled. :wink:
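
For anyone checking the same thing, a quick way to test from outside is something like this (the hostname is a placeholder for your external address; 28967 is the default node port):

nc -vz my.ddns.example 28967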

Solution (hopefully): I have moved the node to the M1 Mac mini and shut down the RPi4.
[crossing fingers]

Everything works fine, except that the download and upload success rates are very low:

 *** downloads (c: 34,038%, f: 0,000%, s: 66%)
 *** uploads (a: 100,000%, c: 40,047%, f: 0,158%, s: 60%)

The failure (f) rate seems acceptable, but the cancellation (c) rate is horrible.
Ping is OK and better than on the RPi 4, at around 10 ms on average instead of 30-50 ms.

All databases passed the integrity check. I cannot imagine that local HDD access is slower than going via SMB / network from the RPi4 to the macOS-mounted HDD; normally it should be much faster and more reliable. So it must be something else…

The Spotlight service is disabled as well.

Update: it looks much better after a while, at least for downloads.

How can I investigate the low upload success rate further?

downloads (cancelled: 4,466%, failed: 0,051%, success: 95%)
uploads (accepted: 100,000%, cancelled: 36,571%, failed: 0,022%, success: 63%)

It may have been bad timing. I saw a lot of canceled uploads in my log around the time you posted this as well. Keep an eye on it and make sure no system resources are saturated. Though I doubt you’ll be strapped for resources on an M1 Mac mini.

Thanks @BrightSilence, I can confirm the figures are going up to 90-100%.

But: Docker on the M1 seems to be unstable… The first time, a Docker restart helped. The second time, the whole Mac had to be restarted.

The node is up and running again now. I am thinking about installing a tiny Linux VM and letting the node run there. Not sure if that helps. And I do not want to spend 80 bucks (per year) or more for a VM; there’s no business case for it. :wink:

@stefanbenten Docker hangs every 2-3 days, throwing these errors:

[013:06:38:44.027][I] time="2022-01-13T06:38:44.024220838Z" level=error msg="Failed to get event" error="rpc error: code = Unavailable desc = transport is closing" module=libcontainerd namespace=plugins.moby
[013:06:38:44.027][I] time="2022-01-13T06:38:44.024419921Z" level=error msg="Failed to get event" error="rpc error: code = Unavailable desc = transport is closing" module=libcontainerd namespace=moby
[013:06:47:08.337][I] time="2022-01-13T06:47:08.332335669Z" level=error msg="Error sending stop (signal 15) to container" container=069ca1e8703d4d5836ac05e174044b3b4494432413eb1ab484035287d893e30e error="Cannot kill container 069ca1e8703d4d5836ac05e174044b3b4494432413eb1ab484035287d893e30e: connection error: desc = \"transport: Error while dialing dial unix /var/run/desktop-containerd/containerd.sock: connect: connection refused\": unavailable"
[013:06:47:18.368][I] time="2022-01-13T06:47:18.368299423Z" level=error msg="Container failed to exit within 10 seconds of kill - trying direct SIGKILL" container=069ca1e8703d4d5836ac05e174044b3b4494432413eb1ab484035287d893e30e error="context deadline exceeded"
[013:06:47:58.153][I] time="2022-01-13T06:47:58.152787261Z" level=error msg="Error sending stop (signal 15) to container" container=069ca1e8703d4d5836ac05e174044b3b4494432413eb1ab484035287d893e30e error="Cannot kill container 069ca1e8703d4d5836ac05e174044b3b4494432413eb1ab484035287d893e30e: connection error: desc = \"transport: Error while dialing dial unix /var/run/desktop-containerd/containerd.sock: connect: connection refused\": unavailable"
[013:06:48:08.168][I] time="2022-01-13T06:48:08.168030794Z" level=error msg="Container failed to exit within 10 seconds of kill - trying direct SIGKILL" container=069ca1e8703d4d5836ac05e174044b3b4494432413eb1ab484035287d893e30e error="context deadline exceeded"
[013:06:48:59.458][I] time="2022-01-13T06:48:59.458115251Z" level=error msg="Error getting services: This node is not a swarm manager. Use \"docker swarm init\" or \"docker swarm join\" to connect this node to swarm and try again."
[013:06:48:59.458][I] time="2022-01-13T06:48:59.458115251Z" level=error msg="Error getting servic
es: This node is not a swarm manager. Use \"docker swarm init\" or \"docker swarm join\" to conne
ct this node to swarm and try again."
[013:06:48:59.458][I] time="2022-01-13T06:48:59.458143918Z" level=error msg="Handler for GET /v1.
24/services returned error: This node is not a swarm manager. Use \"docker swarm init\" or \"dock
er swarm join\" to connect this node to swarm and try again."
[013:06:48:59.465][I] time="2022-01-13T06:48:59.464935043Z" level=error msg="Error getting servic
es: This node is not a swarm manager. Use \"docker swarm init\" or \"docker swarm join\" to conne
ct this node to swarm and try again."
[013:06:48:59.465][I] time="2022-01-13T06:48:59.464971043Z" level=error msg="Handler for GET /v1.
24/services returned error: This node is not a swarm manager. Use \"docker swarm init\" or \"dock
er swarm join\" to connect this node to swarm and try again."
[016:07:23:38.060][I] time="2022-01-16T07:23:38.059893918Z" level=info msg="ignoring event" container=62e8926c69e4aec95d23ef6b73221478b64ffe10bdcec161876c0e2e66633234 module=libcontainerd namespace=moby topic=/tasks/delete type="*events.TaskDelete"
[016:07:23:38.532][I] time="2022-01-16T07:23:38.532424376Z" level=info msg="ignoring event" container=62e8926c69e4aec95d23ef6b73221478b64ffe10bdcec161876c0e2e66633234 module=libcontainerd namespace=moby topic=/tasks/delete type="*events.TaskDelete"

Not sure if running docker swarm init would have side effects.
Have you had this issue?

My fallback will be checking port reachability plus Docker availability via crontab and forcing a macOS restart if both report as unavailable. But this would be some kind of “hard kill”, and there might be another way to solve it.
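
Roughly what I have in mind for that fallback (just a sketch; the port is an assumption and the reboot would need passwordless sudo when run from cron):

#!/bin/sh
# hypothetical watchdog: reboot if the node port and the Docker daemon are both unresponsive
PORT=28967
if ! nc -z -w 5 127.0.0.1 "$PORT" && ! docker info >/dev/null 2>&1; then
  sudo shutdown -r now   # the "hard kill" mentioned above
fi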

Update: I’ll let Docker restart automatically once a day via crontab on macOS.
I hope it helps:

45 20 * * * osascript -e 'quit app "Docker"' && open -a Docker

It could hang because of OOM. It’s worth limiting the RAM for the container; see https://support.storj.io/hc/en-us/articles/360026612332-Install-storagenode-on-Raspberry-Pi3-or-higher-version for an example.
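
The limit can be added to the docker run command with the --memory flag, for example (a sketch; keep all your other options as they are and pick a value that fits your machine):

docker run -d --memory 800m --restart unless-stopped \
  [all other existing options, mounts and environment variables unchanged] \
  --name storagenode storjlabs/storagenode:latest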

That’s a good point. I’ll limit each container to 800 MB, too. Thanks @Alexey!

It’s not necessary to specify exactly 800 MB; you need to figure out your own conditions. That limit was for a Raspberry Pi 3 with 1 GB of RAM and a headless OS (without a desktop), where 800 MB is the free RAM left after the OS.
Perhaps your device has much more free RAM?

8 GB, of which 6 GB are used - so 1.5+ GB free.

Then you can limit it to 1 GB or more - a trial-and-error approach.
