I am unable to successfully run fsck on the NVMe drive. The drive is used exclusively for the storagenode, so nothing else is on the disk.
$sudo e2fsck -f /dev/nvme0n1p1
e2fsck 1.47.1 (20-May-2024)
storj3: recovering journal
Superblock needs_recovery flag is clear, but journal has data.
Run journal anyway<y>? yes to all
e2fsck: unable to set superblock flags on storj3
storj3: ********** WARNING: Filesystem still has errors **********
$sudo e2fsck -b 512000000 /dev/nvme0n1p1
e2fsck 1.47.1 (20-May-2024)
Superblock needs_recovery flag is clear, but journal has data.
Recovery flag not set in backup superblock, so running journal anyway.
storj3: recovering journal
Superblock needs_recovery flag is clear, but journal has data.
Recovery flag not set in backup superblock, so running journal anyway.
Superblock needs_recovery flag is clear, but journal has data.
Recovery flag not set in backup superblock, so running journal anyway.
e2fsck: unable to set superblock flags on storj3
storj3: ***** FILE SYSTEM WAS MODIFIED *****
storj3: ********** WARNING: Filesystem still has errors **********
I have tried other backup superblocks, but they all fail.
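(One way to list candidate backup superblock locations, in case it helps anyone reproduce this, is a dry run of mke2fs – the -n flag only prints the layout it would create and writes nothing, assuming the defaults match how the filesystem was originally made:)
$ sudo mke2fs -n /dev/nvme0n1p1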
nvme0n1 259:0 0 3.6T 0 disk
└─nvme0n1p1 259:1 0 3.6T 0 part /mnt/storj
Just to reiterate: I need help fixing my new node, as I keep getting more “file does not exist” errors. While trying to fix the issue by running fsck, I get the errors above.
Note that many “file does not exist” warnings are benign: if a customer deletes a file before its TTL expires and the piece is removed by a bloom filter (garbage collection), it will show up as an error later, when the TTL expires and the node tries to delete it again.
In this case you should rerun e2fsck until it no longer says there is anything weird left in the filesystem.
It cannot write a superblock to your drive. Meaning the drive either rejects writes, or accepts them and drops them on the floor.
Your NVMe drive is likely dead. Becoming read-only is the safe way most decent drives die. What remaining endurance does it report? What does the rest of its SMART data say? smartctl -a /dev/nvme0
What brand is the drive? 0x10ec is Realtek, and OUI 00:E0:4C is also Realtek. Realtek is pretty new to SSD controllers, and the stuff they are not new to – e.g. Ethernet controllers – is known to be shit. I’m wondering who actually made that SSD, and who is so careless as to “forget” to update the PCI IDs to their own, probably shipping whatever buggy firmware they found.
If I were you I would return it and buy something else. Maybe buy something used from eBay, but made by a company with an established reputation.
It lost data – strike one. It can’t repair the filesystem – strike two. It’s out.
This is a bad idea. I would explicitly configure it with a 4096-byte block size, because 512 is a lie.
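A rough sketch of what I mean (assuming the disk is /dev/nvme0n1 and you are recreating the filesystem anyway – the mkfs step destroys whatever is on the partition):
$ cat /sys/block/nvme0n1/queue/logical_block_size /sys/block/nvme0n1/queue/physical_block_size
$ sudo nvme id-ns /dev/nvme0n1 -H | grep "LBA Format"   # needs nvme-cli; shows which LBA formats the drive supports
$ sudo mkfs.ext4 -b 4096 /dev/nvme0n1p1                 # force a 4096-byte filesystem block size
If the drive exposes a native 4096-byte LBA format, switching to it with nvme format before partitioning is also an option, but that wipes the whole namespace.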
Have a look at eBay. I recently bought a pair of Intel DC P3600 2TB SSDs with 98% endurance remaining for under $100.
I’m pretty sure you also want to ensure the partition is aligned to 4K, but I don’t have much experience with Linux filesystems.
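Something like this should check it, I think (assuming /dev/nvme0n1 with partition 1):
$ sudo parted /dev/nvme0n1 align-check optimal 1
$ cat /sys/block/nvme0n1/nvme0n1p1/start   # start sector; divisible by 8 means 4 KiB aligned with 512-byte sectors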
A common way to produce a fake high-capacity drive is to alter the firmware to report a higher capacity than is physically available. The disk will work initially, until you fill the available flash; then writes wrap around to the beginning, at which point “bad things happen”.
I would never use a no-name drive, let alone pay money for one. It’s never worth it.
If you can’t return it – open it up and count how many NAND chips are in there and of what capacity. You might be surprised… Then file a dispute with your credit card provider, since there is a difference between a poor-quality product and a fake.
I’ve been a victim of oversized USB flash, but yeah, I guess NVMe drives could misreport the same way. I know there are apps (H2testw, ValiDrive, CapacityTester, etc.) to verify… but I’m not sure which are USB-only (or whether they even care). Maybe f3 on Linux?
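If f3 does handle NVMe, usage is something like this, I believe (f3write fills the mount point with test files, f3read verifies them; assuming the drive is mounted at /mnt/storj):
$ sudo f3write /mnt/storj
$ sudo f3read /mnt/storj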
I had previously performed a surface test using EaseUS Partition Master. IIRC it was progressing at ~61 GB per minute and lasted for about an hour. So 4000 GB / 61 GB per minute ≈ 65 minutes, which makes it seem like it’s indeed 4 TB.
It’s not the same thing. A surface test writes a bunch of data and reads a bunch of data – it will succeed even if the disk lies about its size, as long as the actual size is larger than the block being written and read at any one moment. The purpose of these tools is to verify that every logical sector is writable and readable – to find defects, not deceit. (BTW, it’s quite pointless to do this on an SSD, because an SSD remaps everything all the time as part of normal operation; it made sense on HDDs.)
Dedicated tools like f3 (and it’s really disheartening that we had to develop such tools in the first place) verify that after you have finished writing to the end of the drive, the data originally written across the whole drive is still intact – to do that they write pseudorandom data, then read it back and compare it with the same pseudorandom sequence generated from the same seed.
You can easily do that yourself if you want – feed zeroes to openssl enc with a fixed password; the sequence of bytes on the output will look random but will always be the same. Then read back from the drive and compare.
Edit:
This is how you can do that (note: this is obviously destructive, all data on the drive will be destroyed):
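(A sketch – the cipher and password are arbitrary, just use the same ones in the read-back step; the disk is assumed to be /dev/nvme0n1:)
$ openssl enc -aes-256-ctr -pass pass:"mypassword" -nosalt < /dev/zero | sudo dd of=/dev/nvme0n1 bs=1M status=progress
The pipeline runs until dd hits the end of the device and reports "No space left on device", which is expected once the whole disk has been filled.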
Then diff the content of the disk against the same sequence. You don’t have to read the whole disk; the first few GB should suffice, since a capacity-faking drive wraps later writes back to the start, so that is exactly where a mismatch would show up. Something like this (replace the 10,485,760 with whatever you like, up to the disk size):
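(Again a sketch, with GNU cmp comparing the first bytes of the device against the regenerated stream – same cipher and password as in the write step:)
$ openssl enc -aes-256-ctr -pass pass:"mypassword" -nosalt < /dev/zero | sudo cmp -n 10485760 - /dev/nvme0n1
If cmp prints nothing and exits 0, the first 10,485,760 bytes still match the sequence that was written.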
It was already done correctly from scratch when the drive was first put into service, and yet it still failed. Repeating the same experiment and expecting a different outcome is not wise.
If the drive lies about its size, freshly formatting it will make it work again, but only until the amount of data written exceeds the flash that actually exists. Then it will fail in exactly the same way. You still can’t trust it.
“Low-level formatting” was a thing in the 1990s, when disks had a mechanical stepper motor to move the heads and could format themselves from a clean state. Since then, hard drives have relied on servo tracks on the media to guide the heads; these tracks are written at the factory, and if they are lost, your drive is toast. When it comes to SSDs, a low-level format is meaningless: there are no tracks, there is nothing to format. An SSD reads and writes data at addresses that mimic logical blocks, so it can be a drop-in replacement for an HDD. NVMe SSDs don’t even do that, so the term loses its meaning entirely.
And the most important bit:
If the media “starts to work” again, it still goes to e-waste. There are no second chances. It already failed. It can never be trusted again.
To be precise, no media can be trusted – that’s one of the reasons we do backups. Working media merely gets the benefit of the doubt: it hasn’t failed yet, and it may work another day.
Media that has failed has proven it’s bad. Why would you trust bad media?