Storj + SMART long test takes 15 days!

Actually it is the opposite. I started the node in 2023 with a dedicated 18TB drive, but Storj does not fill it; it occupies less than 8TB now. So why not use the empty space for backups?

log.level: error
noatime - I have it
indexing on the drive? What do you mean?

I’ll try these. Is this correct?
pieces.file-stat-cache: badger
pieces.enable-lazy-filewalker: false

I wouldn’t criticize. However, I used my backup (and online) drives for Storj, because they were not full. Why not? I also use some of them to run my VMs, which I still need, and I sometimes got “database is locked” errors due to high load from the VMs. But I still run nodes on those already-used drives.

I have my backups on Storj too, because it’s convenient and fast. Part of the data is also backed up to other cloud providers, because you shouldn’t keep all your eggs in one basket. For the other providers I have to encrypt my data before uploading, which is less convenient, but oh well.

Yes! Or as Docker run parameters:

	--log.level=info \
	--log.custom-level=piecestore=FATAL,collector=FATAL,blobscache=FATAL \
	--pieces.enable-lazy-filewalker=false \
	--storage2.piece-scan-on-startup=true \
	--pieces.file-stat-cache=badger
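
For context, a minimal sketch of where those flags sit in a full docker run command, assuming a standard single-node setup; the wallet, e-mail, address, storage size and paths are placeholders to replace with your own values:

	docker run -d --restart unless-stopped --stop-timeout 300 \
	  -p 28967:28967/tcp -p 28967:28967/udp -p 127.0.0.1:14002:14002 \
	  -e WALLET="0x..." -e EMAIL="you@example.com" \
	  -e ADDRESS="node.example.com:28967" -e STORAGE="17TB" \
	  --mount type=bind,source=/mnt/storj/identity,destination=/app/identity \
	  --mount type=bind,source=/mnt/storj/storagenode,destination=/app/config \
	  --name storagenode storjlabs/storagenode:latest \
	  --log.level=info \
	  --log.custom-level=piecestore=FATAL,collector=FATAL,blobscache=FATAL \
	  --pieces.enable-lazy-filewalker=false \
	  --storage2.piece-scan-on-startup=true \
	  --pieces.file-stat-cache=badger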

I’m not sure whether badger works if you enable it without running the piece-scan-on-startup once to activate it. But you can keep the setting on regardless: if it works, great; if not, it doesn’t matter.
Maybe Alexey knows the answer. In your current situation, though, I wouldn’t start the piece scan only to activate the badger cache. Wait for the long SMART test to finish and then switch the piece scan ON.
Logging is also better with my parameters, because you get useful information with a minimum of log entries. You can see when the piece scan starts and finishes and how long it took, you can see how long the retain jobs take (moving pieces to trash), etc.

Some commands:

docker logs storagenode 2>&1
docker logs storagenode2 2>&1
docker logs watchtower 2>&1

# Walkers:
# used space filewalker
docker logs storagenode 2>&1 | grep "used-space-filewalker"
docker logs storagenode2 2>&1 | grep "used-space-filewalker"

# Garbage Collector filewalker
docker logs storagenode 2>&1 | grep "gc-filewalker"
docker logs storagenode2 2>&1 | grep "gc-filewalker"

# collector - sends pieces with user's expiration time to trash
docker logs storagenode 2>&1 | grep "collector"
docker logs storagenode2 2>&1 | grep "collector"

# retain - processes the satellite's bloom filter, moves deleted (garbage) pieces to trash
docker logs storagenode 2>&1 | grep "retain"
docker logs storagenode2 2>&1 | grep "retain"

# trash - cleans the trash
docker logs storagenode 2>&1 | grep "pieces:trash"
docker logs storagenode2 2>&1 | grep "pieces:trash"

# Other loggers:
docker logs storagenode 2>&1 | grep "database"
docker logs storagenode2 2>&1 | grep "database"

docker logs storagenode 2>&1 | grep "Database"
docker logs storagenode2 2>&1 | grep "Database"

docker logs storagenode 2>&1 | grep "blobscache"
docker logs storagenode2 2>&1 | grep "blobscache"

docker logs storagenode 2>&1 | grep "orders"
docker logs storagenode2 2>&1 | grep "orders"

Those database log entries help you spot db errors.

I disagree with your statement there, and I speak from experience. The first thing I do when I notice that a SMART short test cannot fix a drive is a SMART long test, or several of them run repeatedly. I repeat them until they fix everything the short test could not: usually that means turning pending sectors into reallocated sectors, or simply until all the stats are stable and none of them (pending, reallocated, and so on) keep growing. This happens on old or troublesome drives, of course.
I see the SMART long test as an excellent preventative-maintenance tool, which can spot and fix bad sectors without waiting until I hit them under real-life load.
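
For reference, a minimal sketch of the smartmontools commands involved; /dev/sdX is a placeholder for the drive under test:

# start an extended (long) self-test; it runs inside the drive's firmware in the background
smartctl -t long /dev/sdX

# later: check progress, the self-test log, and the attributes (pending/reallocated counts)
smartctl -l selftest -A /dev/sdX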

I thought SMART tests (including the long one) were designed to run at a lower priority than real activity. So while the SMART test could take forever, real system performance shouldn’t be hurt?

For pending sectors, my understanding is that the only way they can get “cleared” is if a write is attempted. Then the sector is either cleared or permanently reallocated.

In principle yes, but in reality a background long test degrades drive performance. I’ll show some numbers when the test finishes.

You are right, one needs to overwrite the unreadable pending sector in order to remap it to the spare sectors (if there are still some left). The long and short tests, on the other hand, only read; the read fails, but the sector does not get remapped.
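
If the failing LBA is known (the SMART self-test log reports it), one way to force that overwrite on a single sector is hdparm. A hedged sketch, with <LBA> and /dev/sdX as placeholders; note that the write destroys whatever was stored in that sector:

# reading a pending sector will typically fail
hdparm --read-sector <LBA> /dev/sdX

# overwriting it with zeros lets the drive either fix it in place or remap it to a spare
hdparm --write-sector <LBA> --yes-i-know-what-i-am-doing /dev/sdX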

So I use the long test to spot a problem on a drive, then I recover what can be recovered with ddrescue, and then re-write the whole disk with badblocks -w -svf -b 4096 -c 4096. After several passes the pending sectors are usually remapped and the disk appears fine again. It doesn’t always work, though; it depends on the damage.
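
Roughly, that workflow could look like the sketch below; the device and image paths are placeholders, and the badblocks -w pass wipes the entire disk, so it must only run after the rescued image is safely stored elsewhere:

# 1. image everything that is still readable, retrying bad areas a few times
ddrescue -r3 /dev/sdX /backup/sdX.img /backup/sdX.map

# 2. destructive write test over the whole disk, forcing pending sectors to be remapped
badblocks -w -svf -b 4096 -c 4096 /dev/sdX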

In my experience, a SMART long test can successfully trigger a remap of a pending sector. It’s quite useful that way.
My favourite tool to remap all sectors with 100% success is SpinRite, a commercial program from grc.com. It’s much more predictable than using just the SMART long test or badblocks, and it gets the job done, but it’s not free.

When you start to see bad sectors, maybeee it’s time to move the data to a new drive? Just saying…

Agreed. I am not recommending that anyone use a drive with bad sectors. I only do it on redundant arrays storing not-so-important data, and I monitor what’s going on.
But when you have a spare drive acquired for free or very cheap, you can squeeze more life out of it if you regularly check for bad sectors and fix them.

My own example:
Bought a 2TB drive off eBay for £5.59, because the seller had detected bad sectors on it
I fixed them all using SMART long test and SpinRite read/write runs
Stats today, 3.5 months later:

SATA ST2000VN004-2E4164: 30°C
Power On: 53528 hours
Power Cycles: 250 (214 hours/cycle)
Reallocated Sectors: 184
Load Cycles: 407552 (182/day)

The drive is part of a Btrfs software RAID1 array; if it dies or sends back corrupted data, it will all be corrected on the fly. Btrfs does checksumming, preventing bad data from being stored or read.

The test finished with a plot twist: there are errors on the disk!

Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed: read failure       10%     26626         83878328

Looks like one of the files, storj/storage/blobs/xxxxxxx/o4/yyyyy.sj1, is not readable. It is barely 53 kB and dated Sep 3. What are these files? Can it be removed?

Is there a way to recheck all the storj files?

I wasn’t able to measure any difference in disk performance between running the test and not running it. Maybe the difference is very small, and that is why the test takes so long.

BTW I have another 18TB HDD, which farms Chia. That one finishes SMART tests very quickly. Maybe because the Chia load is read-only and not so intensive?

Thanks, that was useful. However, I have just one Docker container, storagenode. I have neither watchtower nor storagenode2. Is that a problem?

Those .sj1 files hold customer data: they’re what you’re being paid to keep :slight_smile: . Stop Storj and ‘fsck -y’ the filesystem: the file will probably get moved to lost+found. Yes, you can delete them… but remember the Satellites perform periodic audits, so it’s possible (though extremely unlikely) that your node would be asked for some data from that file… so you could see a tiny drop in audit score.

But most likely absolutely nothing will happen if you delete it.

Unfortunately, this is XFS, not Btrfs or ZFS, so the filesystem check only verifies metadata integrity, not file content integrity. So fsck will probably not do anything.

Anyway, I thought customer data came in bigger chunks than 53 kB.

Can I somehow ask the Satellites to perform an audit of everything they think I have and clean the corrupted/unusable/unreadable data? Or can it be done locally?

You’re not trying to repair the file: it’s probably unusable. You’re just making sure your filesystem is consistent and corrupted entries are purged. (I haven’t used XFS: is xfs_repair maybe the equivalent?)
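
xfs_repair is indeed the usual XFS check/repair tool; a minimal sketch, assuming the node is stopped, the filesystem is unmounted, and /dev/sdX1 and /mnt/storj are placeholders:

# stop the node and unmount the filesystem first
docker stop -t 300 storagenode
umount /mnt/storj

# dry run: report problems without changing anything
xfs_repair -n /dev/sdX1

# actual repair, once the dry-run output looks reasonable
xfs_repair /dev/sdX1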

I don’t think the Satellite can audit everything: I seem to remember that the client gives it a pile of challenges to use… but that doesn’t cover the state of every one and zero they uploaded. The Satellite tracks the overall availability of data (i.e., whether there are enough online sources of segments that a customer, or the repair process, could rebuild the data).

Like… the Satellite cares whether things would be OK if your entire node disappeared. If it’s covered for that… a few corrupted files here and there are fine. A lot of corruption will get caught by audits.

What will most likely happen… is that your node is never asked for that .sj1 (neither by the customer nor an audit, nor for repair)… and eventually the customer deletes the source data (like erasing an old backup)… and a bloom filter will finally tell your node to trash it (but it’s already gone).

I don’t believe there are local utils to check everything: but maybe a Storjling will confirm? I hope there’s no way to ask a satellite, because that would be a heavy burden.

storagenode2, 3, etc. are the names of the containers, if you prefer to name them like that and if you run more than one.
I run 2 on each machine, so that’s why my notes are written for 2 containers/nodes.
I don’t really know whether watchtower is still required.
There have been many changes in how the containers are updated since I started, back when watchtower was necessary for updating the node. Maybe Alexey or another Storjling could enlighten us:
Is Watchtower still necessary for Linux Docker installs or Windows installs? Or can we remove it?

That’s customer data you are storing. If you start deleting it, you risk audit failures. The audit system is currently tuned to quickly disqualify any node that loses 4% of its data, but it tolerates a loss of 2%.

If this is a single file, do whatever you want, Storj has enough redundancy. If this is millions of files, well, the node is toast.

In theory each .sj1 file has a header with a checksum, so it would be possible to write a locally-run tool to proactively check the state of each file, but a scan over all files is impractical and rather unnecessary given the redundancy Storj provides as a network.
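
Not the header-checksum check described above, but a crude readability-only sweep can be sketched; the blobs path is a placeholder, it only catches pieces the disk cannot read at all, and it generates heavy I/O on an already busy drive:

# read every piece once and record the ones the disk fails to return
find /mnt/storj/storage/blobs -type f -name '*.sj1' -print0 \
  | xargs -0 -n1 sh -c 'dd if="$0" of=/dev/null bs=1M status=none 2>/dev/null || echo "unreadable: $0"' \
  > unreadable_pieces.txt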

Oh, that’s interesting. Is the header described somewhere? And is the checksum computed over the encrypted data? A tool to verify all the files would be useful, as users would not need to store corrupted or unreadable files and could thus fit more good ones.

Is there, in theory, any signaling from the node to the satellite along the lines of “I no longer have that one file, it got corrupted, so don’t count on me”?

Why would that be a heavy burden on the satellite? It would just send the filenames and checksums; the work would then be done by the node.

I don’t see much income from Repair & Audit; it’s about 3%. But I think that is for replacing data not available from other nodes, right?

I see. Updating seems to be automatic and to work with just the storagenode container.

BTW, is there a way to run Storj read-only? I can recover the HDD by making a ddrescue image only if the disk content does not change. At the same time I would like to keep serving the files I already have, so I don’t get disqualified, since recovering 18TB can take a considerable amount of time.

Only by going offline, meaning stopping the container.

No, there isn’t.

Migrate the node using rsync as described in the documentation instead. Then it doesn’t matter how long it takes.
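
The documented pattern is roughly: rsync repeatedly while the node keeps running, then stop it for one final pass with --delete; the source and destination paths below are placeholders:

# first passes while the node is still running; repeat until the delta gets small
rsync -aP /mnt/old-disk/storagenode/ /mnt/new-disk/storagenode/

# final pass with the node stopped, so nothing changes underneath
docker stop -t 300 storagenode
rsync -aP --delete /mnt/old-disk/storagenode/ /mnt/new-disk/storagenode/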