Migrated node after a week of downtime, now it won't start

I had about a week or more of downtime recently, and I’m trying to get back up and running. I migrated my data from one filesystem to another (keeping the same mount point), and when I try to start my Docker container I get errors saying the disk doesn’t exist. Error log here, since I “can’t post more than 2 links”.

I realize my node might have been disqualified, but I’m trying to figure out how to come back online with the data I have. Or, if I need to start again from scratch, how do I do that?

Do you have a Windows GUI node or Docker? It looks like you have Docker on Linux?

Error: Error starting master database on storagenode: database: disk I/O error: no such device
This means you have a mounting problem; it can’t find your HDD.

Did you correctly mount your device?

Is your time set correctly? Also check your node logs; they might show further errors that help troubleshoot this.
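For example (assuming a systemd-based host, and that your container is named storagenode):

$ timedatectl status
$ docker logs --tail 50 storagenode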

I am on Docker on Ubuntu Linux. My device is correctly mounted; everything else is working fine. I’m wondering whether permissions changed or something got corrupted during the move?

I think based on the error, you’ll need to convince us that the drive is properly mounted. Are you able to show the output of an ls -al at the location that’s bind mounted to /app/config in the container?

Compose:

  storagenode:
    restart: unless-stopped
    ports:
       - '28967:28967'
       - '14002:14002'
    environment:
      - WALLET=0x5-------7
      - EMAIL=------
      - 'ADDRESS=----.url:28967'
      - STORAGE=2TB
    volumes:
      - /opt/docker/storj:/app/identity
      - /data/Storj:/app/config
    image: 'storjlabs/storagenode:latest'

Nothing about my compose file has changed.

Mount:

$ sudo ls -al /data/Storj/ 
total 64
drwxr-xr-x  4 root  root    124 Dec 31 14:26 .
drwxrwxrwx 22 media media  8192 Jan  5 14:26 ..
-rw-r--r--  1 root  root   8447 Oct 17 21:29 config.yaml
drwxr-xr-x  4 root  root     47 Dec 30 00:45 orders
-rw-r--r--  1 root  root  32768 Dec 27 23:21 revocations.db
drwx------  6 root  root   4096 Jan  5 11:00 storage
-rw-r--r--  1 root  root   1204 Dec 27 23:21 trust-cache.json

Something doesn’t look right. I thought it should look like this:

    volumes:
      - type: bind
        source: /mnt/storj/identity
        target: /app/identity
      - type: bind
        source: /mnt/storj/storage
        target: /app/config

I guess I can try that, but it was working fine for several months with exactly what I posted above.

This is recommended for the edge case where the node could start up even though the volume isn’t mounted (i.e. the directory doesn’t exist on the host). When using volumes without bind (i.e. the equivalent of -v in a docker run command), if the directory doesn’t exist on the host, Docker will create an empty one. This can lead to a quick disqualification, because Storj would see that a bunch of data is missing and you’d fail all audits. Believe me, I know; I lost my first node this way. Storj has recently made some improvements to help mitigate this disaster scenario.
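To illustrate the difference (the paths and the alpine image here are just placeholders for the demo):

$ docker run --rm -v /no/such/dir:/data alpine ls -al /data
# starts fine; Docker silently created an empty /no/such/dir on the host

$ docker run --rm --mount type=bind,source=/no/such/dir2,target=/data alpine ls -al /data
# docker: Error response from daemon: invalid mount config for type "bind":
# bind source path does not exist: /no/such/dir2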

With a bind mount, Docker will fail to start the container if the host directory doesn’t exist. It’s an easy change in your compose file, and I’d recommend following the format specified by @deathlessdd with your source directories /opt/docker/storj and /data/Storj. Having said that, it’s important to understand that with a Docker bind mount the directory has to exist, but if that directory is itself the mount point for the drive you’re storing Storj data on, and the drive isn’t mounted, Docker will still start the container with the existing empty directory. Ideally, to use your path as an example, your drive would be mounted at /data, so that /data/Storj would not exist if the drive weren’t mounted. That would provide the protection the bind mount is intended to give.
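In other words, with that layout (a sketch; adjust to your actual mount point):

$ mountpoint /data
/data is a mountpoint
$ ls -d /data/Storj
/data/Storj

# ...and with the drive unmounted, the bind source is gone,
# so Docker refuses to start the container:
$ sudo umount /data
$ ls -d /data/Storj
ls: cannot access '/data/Storj': No such file or directory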

In any case, it looks like the disk is there. Just as another sanity check, can you verify that ls -al storage shows a bunch of .db files and some directories?

I’ve updated the config as suggested; the storage directory looks fine:

$ sudo ls -al /data/Storj/storage/
total 3124
drwx------ 6 root root   4096 Jan  5 11:00 .
drwxr-xr-x 4 root root    124 Dec 31 14:26 ..
-rw-r--r-- 1 root root 835584 Dec 27 18:31 bandwidth.db
-rw-r--r-- 1 root root  32768 Dec 27 23:21 bandwidth.db-shm
-rw-r--r-- 1 root root  32992 Dec 27 23:21 bandwidth.db-wal
drwx------ 7 root root    330 Oct 17 21:47 blobs
drwxr-xr-x 2 root root    240 Dec 30 00:45 garbage
-rw-r--r-- 1 root root  32768 Dec 27 23:16 heldamount.db
-rw-r--r-- 1 root root  32768 Dec 27 23:25 heldamount.db-shm
-rw-r--r-- 1 root root 238992 Dec 27 23:25 heldamount.db-wal
-rw-r--r-- 1 root root  16384 Dec 27 23:16 info.db
-rw-r--r-- 1 root root  32768 Jan  6 19:09 info.db-shm
-rw-r--r-- 1 root root  32992 Dec 27 23:21 info.db-wal
-rw-r--r-- 1 root root  24576 Dec 27 23:16 notifications.db
-rw-r--r-- 1 root root  32768 Dec 27 23:21 notifications.db-shm
-rw-r--r-- 1 root root  32992 Dec 27 23:21 notifications.db-wal
-rw-r--r-- 1 root root  32768 Dec 27 18:31 orders.db
-rw-r--r-- 1 root root  32768 Dec 27 23:22 orders.db-shm
-rw-r--r-- 1 root root  32992 Dec 27 23:21 orders.db-wal
-rw-r--r-- 1 root root 368640 Dec 27 23:08 piece_expiration.db
-rw-r--r-- 1 root root  32768 Dec 27 23:30 piece_expiration.db-shm
-rw-r--r-- 1 root root  98912 Dec 27 23:30 piece_expiration.db-wal
-rw-r--r-- 1 root root  24576 Dec 27 18:31 pieceinfo.db
-rw-r--r-- 1 root root  32768 Dec 27 23:21 pieceinfo.db-shm
-rw-r--r-- 1 root root  32992 Dec 27 23:21 pieceinfo.db-wal
-rw-r--r-- 1 root root  24576 Dec 27 23:16 piece_spaced_used.db
-rw-r--r-- 1 root root  32768 Dec 27 23:23 piece_spaced_used.db-shm
-rw-r--r-- 1 root root  57712 Dec 27 23:23 piece_spaced_used.db-wal
-rw-r--r-- 1 root root  24576 Dec 27 23:16 pricing.db
-rw-r--r-- 1 root root  32768 Dec 27 23:22 pricing.db-shm
-rw-r--r-- 1 root root  74192 Dec 27 23:21 pricing.db-wal
-rw-r--r-- 1 root root  24576 Dec 27 23:16 reputation.db
-rw-r--r-- 1 root root  32768 Dec 27 23:23 reputation.db-shm
-rw-r--r-- 1 root root  74192 Dec 27 23:22 reputation.db-wal
-rw-r--r-- 1 root root  32768 Dec 27 18:30 satellites.db
-rw-r--r-- 1 root root  32768 Dec 27 23:21 satellites.db-shm
-rw-r--r-- 1 root root  41232 Dec 27 23:21 satellites.db-wal
-rw-r--r-- 1 root root  24576 Dec 27 23:16 secret.db
-rw-r--r-- 1 root root  32768 Dec 27 23:21 secret.db-shm
-rw-r--r-- 1 root root  32992 Dec 27 23:21 secret.db-wal
-rw-r--r-- 1 root root     32 Dec 10 11:35 storage-dir-verification
-rw-r--r-- 1 root root  81920 Dec 27 23:16 storage_usage.db
-rw-r--r-- 1 root root  32768 Dec 27 23:25 storage_usage.db-shm
-rw-r--r-- 1 root root 197792 Dec 27 23:25 storage_usage.db-wal
drwx------ 2 root root  65536 Jan  4 12:02 temp
drwx------ 7 root root    330 Oct 25 22:00 trash
-rw-r--r-- 1 root root  20480 Dec 27 23:16 used_serial.db
-rw-r--r-- 1 root root  32768 Dec 27 23:21 used_serial.db-shm
-rw-r--r-- 1 root root  41232 Dec 27 23:21 used_serial.db-wal
-rw-r--r-- 1 root root      0 Dec 25 18:04 write-test172937416
-rw-r--r-- 1 root root      0 Dec 25 15:39 write-test837270165
-rw-r--r-- 1 root root      0 Dec 25 20:03 write-test936430880

Edit: forgot to mention, I’m still getting the same error as before.

Could be a corrupt database. I’m not able to help out much anymore. Perhaps @Alexey could support?
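If you want to rule out corruption, the node’s databases are plain SQLite, so you can check them on the host (a sketch, assuming the sqlite3 CLI is installed and the node is stopped):

$ sudo sh -c 'for db in /data/Storj/storage/*.db; do
    echo "$db: $(sqlite3 "$db" "PRAGMA integrity_check;")"
  done'
# every line should end in "ok"; anything else identifies the corrupt file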

Perhaps you can check from inside the container.

docker exec -it storagenode ls -al /app/config

You should see what’s in your storage location as well.

That’s hard to do when the container fails to start.

Looks like Docker does not have permission on the data folder. From your output of “sudo ls -al /data/Storj/storage/”, only the root user has full read and write permission, and I think Docker does not run as root. You may need to run “sudo chmod -R 777 /data/Storj/storage/” to allow all users read and write access to the Storj data folder.
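Before opening everything up with 777, it may be worth checking what user the image is configured to run as (an empty result means it defaults to root, in which case root-owned files should be readable by the container):

$ docker image inspect --format '{{.Config.User}}' storjlabs/storagenode:latest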

I guess something works, since it managed to write there. Perhaps it is the other path, /opt/docker/storj? Only guessing, because I can’t see your log due to a proxy firewall.

The container just needs to exist for that to work. It doesn’t need to be started.

Unfortunately, it must be running to exec any command.
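One workaround: a throwaway container with the same bind mount can inspect the path without the storagenode container running (alpine here is just an example image):

$ docker run --rm -v /data/Storj:/app/config alpine ls -al /app/config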


My bad, I could have sworn I executed stuff without the container running before.

I’m about ready to give up and restart. Should I just delete everything in both directories on the host and restart the container, or do I need a new identity?