Have you tried turning it off and then back on again?
A full shutdown of the machine can sometimes clear weird errors… but yeah, no clue here really…
I have LVM configured, and it's a 15 TB LVM volume using XFS. This sits on top of a hardware RAID setup. The hardware RAID controller is not showing any errors or failed disks.
The filesystem is secondary to the partition table. MBR with 512-byte sectors is limited to 2 TB. If the partition is larger than 2 TB, GPT should be used. If MBR is used, the sector size would need to be increased to 4096 bytes.
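The 2 TB ceiling comes straight from MBR's 32-bit sector fields; a quick sanity check in the shell:

```shell
# MBR stores partition start and length as 32-bit sector counts,
# so the largest addressable partition is (2^32 - 1) sectors:
echo $(( (2**32 - 1) * 512 ))    # with 512-byte sectors: ~2 TiB
echo $(( (2**32 - 1) * 4096 ))   # with 4096-byte sectors: ~16 TiB
```

Even with 4K sectors MBR tops out around 16 TiB, so a disk in the tens of TiB needs GPT regardless.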
```
sudo gdisk -l /dev/your_disk
```
Should provide information on the partition table type.
Here is the output of that command:
```
GPT fdisk (gdisk) version 0.8.8

Partition table scan:
  MBR: not present
  BSD: not present
  APM: not present
  GPT: not present

Creating new GPT entries.
Disk /dev/sdo: 175786426368 sectors, 81.9 TiB
Logical sector size: 512 bytes
Disk identifier (GUID): A5AF73A4-C3AE-490B-B4A1-6986692F2758
Partition table holds up to 128 entries
First usable sector is 34, last usable sector is 175786426334
Partitions will be aligned on 2048-sector boundaries
Total free space is 175786426301 sectors (81.9 TiB)

Number  Start (sector)    End (sector)  Size       Code  Name
```
As you can see, there is no partition information: LVM takes over the whole disk as a physical volume, so there is no partition on it as you would traditionally think of it.
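For anyone checking a similar setup, a sketch of how to confirm LVM owns the bare disk (the device path is from the gdisk output above; adjust to yours):

```shell
# List LVM physical volumes -- a PV on the bare disk (no partition
# number suffix) confirms LVM is using the whole device.
sudo pvs
# List the logical volumes carved out of each volume group.
sudo lvs
# Detailed view of one physical volume:
sudo pvdisplay /dev/sdo
```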
So it seems that the metadata on the XFS volume had become corrupted and an xfs_repair couldn’t fix it.
I’m working through the entire volume to see if there are any bad blocks, but there are so many errors coming back from xfs_repair that I doubt it will be able to recover it, and I don’t have much faith in it currently.
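If the goal is to rule out the underlying device, a read-only surface scan is safer than repeated repair attempts; this is just a sketch, and the device path is a placeholder for the actual logical volume:

```shell
# Non-destructive, read-only scan of the block device; any unreadable
# blocks are logged to the output file. /dev/vg0/storj is a placeholder.
sudo badblocks -sv -o bad-blocks.txt /dev/vg0/storj
# An empty bad-blocks.txt afterwards means every block was readable,
# which would point at filesystem metadata rather than the hardware.
```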
So it seems that the metadata on the XFS volume had become corrupted
I realize it doesn’t help the current situation… but ZFS helps with that problem.
Adding for other readers: perhaps it would be useful to add a recommendation in the documentation, if it hasn’t already been done (I haven’t checked), for Linux Docker nodes to run ZFS for large storage systems…
There are so many configurations to choose from… but the more I see and read on the forum, the more sure I am that ZFS is the way to go for a Storj node.
I like ZFS, but I also kinda hate that one gets locked into the pool sizes a bit too easily…
But it’s rock solid; I cannot see a ZFS pool dying without neglect…
Though I have only been using it a short time, I’ve been so mean to it that it is kind of ridiculous…
Pulling the power, pulling drives from my raidz while running, even down below the redundant drives… it’s not happy about it… but it managed to not lose a byte.
It’s not so easy to work with on smaller scales, though… you really want to add drives in sets of at least 4, and it quickly adds up…
I’ve been able to restore my storagenode, but want to pass this info along in case it helps anyone.
My neighborhood lost power for about 15 minutes this morning, taking my Linux host down with it. When power returned, the host booted itself back up, and docker restarted the storagenode container (thanks to --restart unless-stopped). Unfortunately, the container boot-looped every 30 - 45 seconds. The final error message from the container was
I found this forum item and tried the recommendation for xfs_repair.
```
$ xfs_repair -v /dev/sdd
xfs_repair: /dev/sdd contains a mounted filesystem
xfs_repair: /dev/sdd contains a mounted and writable filesystem
fatal error -- couldn't initialize XFS library
```
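For anyone hitting the same message: that fatal error is xfs_repair refusing to touch a mounted filesystem. The usual sequence is to unmount first (the mount point below is an example, not from my setup):

```shell
# xfs_repair must not run on a mounted filesystem -- unmount first.
sudo umount /dev/sdd
# Dry run: -n reports problems without modifying anything.
sudo xfs_repair -n /dev/sdd
# Actual repair, once the dry-run output looks sane.
sudo xfs_repair /dev/sdd
sudo mount /dev/sdd /mnt/storagenode   # example mount point
```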
I wasn’t confident after seeing that fatal error, so I took a chance on rebooting and letting the system tell me about filesystem problems. Fortunately, rebooting did the trick (others here may be able to explain why, relative to fsck or other checks during boot).
My system is now successfully running
v1.5.2 (storagenode:latest as of this morning)
hosted by Docker version 19.03.8, build afacb8b7f0
on Ubuntu 20.04 LTS (Focal Fossa)
Storj data hosted on single drive, Ext4, with 21.2% free space
The information above was premature. It turns out that storagenode runs for 20-30 minutes, then fails with “structure needs cleaning”. When Docker restarts the container, it fails within the first minute, and then it’s back in the boot-looping problem. After rebooting the host, storagenode will again run for longer periods. As I have time today, I’ll try to further diagnose and resolve.
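Since the data drive here is Ext4 (per the specs above) and “structure needs cleaning” is the kernel’s message for the EXT4 corruption error (EUCLEAN), an offline fsck may be the next step; the device and mount point below are placeholders:

```shell
# Stop the container so nothing writes during the check.
docker stop storagenode
sudo umount /mnt/storj            # placeholder mount point
# Force a full check even if the filesystem is marked clean.
sudo fsck.ext4 -f /dev/sdd1       # placeholder device
# Remount (assumes an fstab entry for the mount point) and restart.
sudo mount /mnt/storj
docker start storagenode
```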