ZFS pool didn’t re-mount automatically

If ZFS complains that the folder isn't empty, you most likely had software running that accessed the location where it's trying to mount and created folders or wrote data there.
ZFS therefore assumes, for safety reasons, that you made a mistake.
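A quick way to check what ZFS currently thinks about the mount state is the get subcommand (a sketch; poolname is just the placeholder pool name used throughout this thread):

$ zfs get mounted,mountpoint poolname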

The folder(s) can be deleted if they are empty…

Verify that the mount folder is really empty:

$ du -hs /zpoolmountfolder/*

If the folder is littered but essentially empty, you may proceed with

$ rm -rf /zpoolmountfolder

and then, right after, run

$ zfs mount -a
or
$ zfs mount poolname
or
$ zfs mount -f poolname

The folders should now be restored to their original location.
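To verify the mount actually came back, something like the following should show mounted = yes and the data under the expected mountpoint again (same placeholder names as above):

$ zfs list -o name,mounted,mountpoint poolname
$ du -hs /zpoolmountfolder/*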

You might be able to do this with just $ zfs mount -f poolname,
but I forgot to test that the last time I had the issue.

Regarding whether the storagenode goes all crazy without the zpool mounted: well, that's a good reason to keep the identity on the storagenode, but even without doing that, my storagenode simply refused to run when I ran into the issue.

And it was my own fault that ZFS couldn't mount it, because I had run the OS without all of the zfs pool drives connected. You could try to limit how much stuff in the OS goes directly in and tries to manage things on the zfs pool, but that seems rather inconvenient.

The node is online again. The problem now is that the pool does not re-mount after reboots. zfs-mount.service refuses to start when the system starts and I don't know why. The only recent system change was replacing a disk in the pool, and that was 3 weeks ago.
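For what it's worth, the boot-time failure of zfs-mount.service can be inspected with systemd's own tools; a sketch (nothing here is specific to this setup):

$ systemctl status zfs-mount.service
$ journalctl -b -u zfs-mount.service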

As donald.m.motsinger said:

This explains why the directory still existed when the pool unmounted for a still-unknown reason.

Yes. Sorry, I'm still a GNU/Linux novice. I'm slowly getting better at troubleshooting issues by myself, but I still don't have a complete grasp of the terminology.

When I mounted the ZFS pool, I told Docker to use the main/parent directory of the pool. The root of the disk (pool), as you said.

So it would report that the directory doesn't exist and exit with an error code, yes? Would the following actions work, or cause problems?

sudo docker stop -t 300 storagenode
mv /storj/* /storj/another-directory

Edit the docker launcher: --mount type=bind,source="/storj" to --mount type=bind,source="/storj/another-directory"

Then re-run the launcher. Is that something that would help prevent storj from trying to make a new node if the pool were to un-mount again? Of course I still need to diagnose why it refuses to mount on start-up for no apparent reason, but this would be a preventative measure in the meantime.

@SGC After the pool unmounted itself, STORJ re-created the files/folders it needs in this directory, which put them on the boot drive instead of the ZFS pool. This happened because even though the pool unmounted, the directory doesn't disappear. That's where the data came from and why the directory wasn't empty.

What needs diagnosing now is why the pool refuses to mount on system startup.

My ZFS pools are happy to mount automatically… if they can… though ZFS has a lot of fault tolerances, so it will not allow you to do anything… stupid.

Does the pool still exist / register? Otherwise you might need to run zpool import poolname first;
I don't think it will allow you to mount it otherwise, but I'm far from an expert on ZFS :smiley:
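A sketch of what that looks like: running zpool import with no arguments lists pools that are available for import, and then you import the one you want by name (poolname again being a placeholder).

$ zpool import
$ zpool import poolname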

So long as the mount point either doesn't exist or is empty, ZFS is happy to let me import the pool manually, but it doesn't want to do it automatically on start-up anymore, for no apparent reason.

One other change that took place around the same time was the node update to 1.3.3, but I have reason to doubt that would impact a ZFS storage pool. Very unrelated processes.

Yes, that's pretty much it. You'll need to create /storj/another-directory first, and when you move the files you'll get an error message that you can't move /storj/another-directory into a subdirectory of itself. That's fine, and all the other files should have been moved. And double or triple check that the storagenode is not running when moving the files.
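For reference, a consolidated sketch of the whole sequence discussed above, using the paths and names from this thread:

sudo docker stop -t 300 storagenode
sudo mkdir /storj/another-directory
sudo mv /storj/* /storj/another-directory

(The complaint about moving another-directory into itself can be ignored, as noted above.) After that, change the --mount source to /storj/another-directory and re-run the launcher.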


ZFS is software… your system should load the ZFS driver and launch that before it tries to mount the ZFS pool… might be something related to that.
But really, I don't see why you would need to force it to mount during boot; it should figure that out all by itself. I might not be on Ubuntu, but my Proxmox is also Debian, and it checks and mounts the pool during boot… no clue about what commands it uses, though.

Maybe don't use the mount -a command; use a specific pool, and maybe throw in a -f to force it.
That does override the failsafes, though, and I'm not sure what they're good for in this case, aside from making sure ZFS doesn't mount on top of existing data… and thus delete it, or whatever.
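On the boot-ordering point above: with the systemd-based OpenZFS packages on Debian/Ubuntu/Proxmox, the pool is normally imported at boot from a cache file, so it may be worth confirming that the cachefile property is set and the import/mount units are enabled. A sketch, again with poolname as a placeholder:

$ sudo zpool set cachefile=/etc/zfs/zpool.cache poolname
$ sudo systemctl enable zfs-import-cache.service zfs-mount.service zfs.target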

This is what I saw during the VM’s boot sequence.
[Screenshot from 2020-05-09 12-02-17]
My hypervisor is Proxmox. I really like it. The ZFS commands are the same across both OSes, at least assuming you installed the zfsutils-linux package. Maybe other packages are different; I don't know.

@donald.m.motsinger
sudo docker container ls
I’ll make sure it’s not running before I alter the storage pool.
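A slightly stricter check, as a sketch, filters on the container name used in this thread and lists it whether running or stopped:

sudo docker ps -a --filter name=storagenode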

Then, just to address everyone here who has hopped in: I rebooted the VM, and for some reason it mounted successfully. I couldn't be more confused.

  1. It unmounted by itself
  2. It refused to remount on start-up
  3. It started remounting by itself again

So confused. Bit flip? Gamma ray burst hit my house? EMP? No idea.

As an added note, the STORJ node no longer says I'm suspended on any satellite. So that's good. Fingers crossed no customer data was written to the unmounted /storj directory.

Your virtual machine has direct access to ZFS? Is it also virtualized ZFS? Or maybe you use LXC?

Depending on the state of the pool when you run zpool status, you might want to simply import it. On a related note, the way ZFS pools are mounted can be a bit weird on Linux, because the /dev/sdX assignments can change; if Proxmox / ZFS has, say, sdb as a vdev for a particular pool, then it wants that exact vdev / disk to be on sdb in order to mount / import the pool correctly.

For this reason it's good practice to use a different way of identifying the exact disks. Personally I use /dev/disk/by-id to identify them, but it's not a perfect solution, because it can be tricky to tell which physical disk is which in some programs and/or tools, which can be fundamental when working with the drives…

However, no matter which identifier you use, it just needs to be one that is unique to each disk… you can label their partitions or the drives, I believe (only 2 months into really using Linux on a daily basis, so…).
There are several options you can find… labeling is good if you can see the drives or bays: put a sticker on them with a name, so you can see which drives act up at a later date and thus replace them physically much more easily.

ls /dev/disk/by-id
worked pretty well for me… otherwise I use lshw -class disk.
Then, depending on whether it's SATA or SAS or whatever, you simply grab the ata… or scsi… name of the drives and use it like so:

Like this… you should try to remount the drives using a similar method; that lets you shuffle the drives around, and the sdX names become fully irrelevant to ZFS.
Dunno if that is what is going on, but it could be…

An example command I had nearby :smiley:
zpool create zroot /dev/disk/by-id/ata-TOSHIBA_DT01ACA300_531RH5DGS /dev/disk/by-id/ata-TOSHIBA_DT01ACA300_Z252JW8AS /dev/disk/by-id/ata-TOSHIBA_DT01ACA300_99QJHASCS /dev/disk/by-id/ata-TOSHIBA_DT01ACA300_99PGNAYCS

It won't always make your life simpler, but most of the time it will…

Using this, I can literally pull my ZFS drives and switch them around on a live system, and it just reconnects them to the pool when they hit the backplane… with a bit of complaining and resilvering…
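If a pool was originally created with plain sdX device names, it shouldn't need to be recreated to switch over; as far as I know, exporting it and re-importing while pointing the import at /dev/disk/by-id relabels the vdev paths. A sketch (poolname is a placeholder, and anything using the pool, like the storagenode, should be stopped first):

$ sudo zpool export poolname
$ sudo zpool import -d /dev/disk/by-id poolname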

I can also highly recommend this lecture.
I sure learned a lot of dos and don'ts from it.


Can't use Docker in an LXC container. At least not storagenode. It's possible that it might work if I enabled nesting, but I went with HBA pass-through and full virtualization as my other option. If I lose this node, I will try LXC again.