A very slow fsck that started itself on boot

Hello

I run a Storj node on its own disk in a VM on Proxmox. After Proxmox crashed and I restarted the VM, a disk fsck ran automatically. Fsck has been running for over 2 days and is at 80%! The HDD is enterprise class, 4 TB and about 90% full, and there is nothing in SMART that would indicate a disk failure. The disk is on LVM, with an ext4 file system. In Proxmox, the Disk IO graph for that disk shows about 1/1 MB/s read/write. iostat on the Proxmox host shows disk activity of up to 200 r/s or w/s, which is typical for that disk at maximum load (e.g. when filewalker is running). I think the first 75% of the fsck passed normally in a couple of hours; now it is running very slowly, but it is still progressing. By comparison, a manually started fsck of a similarly full disk of Storj data normally takes no more than 6-7 hours.

Does anyone have any advice? Is it wise to stop? I estimate that if it continues at this slow pace, it will take about 7-8 more days.
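For anyone wanting to reproduce that kind of estimate, here is a minimal sketch of a straight-line extrapolation. The function name and the input figures (75% reached early on, 80% roughly two days later) are assumptions read off the post above, not measurements, and fsck progress is not linear, so treat the result as a rough guess only.

```python
def eta_days(progress_now, progress_then, days_between):
    """Linear ETA in days: remaining percentage divided by the recent rate.

    Hypothetical helper for illustration; fsck progress is not linear,
    so the result is only a rough guess.
    """
    rate = (progress_now - progress_then) / days_between  # percent per day
    if rate <= 0:
        raise ValueError("no progress in the interval, cannot extrapolate")
    return (100.0 - progress_now) / rate


# Assumed figures from the post: about 75% early on, 80% roughly 2 days later.
print(eta_days(80.0, 75.0, 2.0))  # prints 8.0
```

With those assumed inputs the extrapolation lands on about 8 days, in line with the 7-8 days guessed above.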

Thanks.

So Linux/Proxmox is fsck’ing a filesystem that wasn’t shut down cleanly (and this has nothing to do with Storj)?

Let it run: fsck speeds can be lumpy… it may have finished by the time you’ve read this (even if it looked like it had days to go).

Exactly. This is really a Linux issue; Storj's only “guilt” is the large number of small files.

Thank you.


It is possible that fsck has run out of RAM and the system is swapping. A good rule of thumb is at least 1 GB of RAM for every 1 TB of filesystem to fsck. How much RAM do you have in there?

The VM has 2 GB of RAM. I doubt that swap is in use; Proxmox reports only 1.12 GB in use. Fsck is currently at 86.8%, so I guess it will finish before the node gets disqualified.
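Whether swap is actually in use can be checked from inside the VM rather than guessed from the Proxmox summary. A minimal sketch, assuming a Linux guest that exposes `/proc/meminfo` with the usual `SwapTotal` and `SwapFree` lines in kB; the function names are made up for illustration.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of integer values (kB)."""
    values = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        if not sep:
            continue  # skip lines without a "Key: value" shape
        fields = rest.split()
        if fields and fields[0].isdigit():
            values[key.strip()] = int(fields[0])
    return values


def swap_used_kb(text):
    """Swap in use = SwapTotal - SwapFree (missing fields count as 0)."""
    info = parse_meminfo(text)
    return info.get("SwapTotal", 0) - info.get("SwapFree", 0)


def read_swap_used_kb(path="/proc/meminfo"):
    """Convenience wrapper for a live Linux system (path is an assumption)."""
    with open(path) as handle:
        return swap_used_kb(handle.read())
```

Calling `read_swap_used_kb()` in the VM while fsck is running gives a number to watch; a value that keeps growing would support the swapping theory.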

Memory requirements will vary depending on filesystem size, number of files, allocated data and whatever errors fsck may find.
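To put numbers on the rule of thumb from earlier in the thread (about 1 GB of RAM per 1 TB of filesystem), here is a small arithmetic sketch. The function names are hypothetical, and as noted above the real requirement depends on file count and on what fsck finds, so this is a rough lower bound at best.

```python
def fsck_ram_rule_of_thumb_gb(fs_size_tb, gb_per_tb=1.0):
    """Rough RAM suggestion in GB: about 1 GB per TB of filesystem.

    This is only the rule of thumb quoted in the thread, not a guarantee.
    """
    return fs_size_tb * gb_per_tb


def ram_shortfall_gb(fs_size_tb, vm_ram_gb):
    """How far the VM's RAM falls short of the rule of thumb (0 if enough)."""
    return max(0.0, fsck_ram_rule_of_thumb_gb(fs_size_tb) - vm_ram_gb)


# Figures from the thread: a 4 TB disk and a VM with 2 GB of RAM.
print(fsck_ram_rule_of_thumb_gb(4.0))  # prints 4.0
print(ram_shortfall_gb(4.0, 2.0))      # prints 2.0
```

By that yardstick the VM has about half the suggested memory, which fits the suspicion that fsck was memory-starved.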

Good luck, hopefully you are back online soon!


Thanks. Shortly after my last report, fsck sped up and finished within 2 hours. Storj is online now :).
I found that Storj was running filewalker at the time Proxmox crashed, and apparently that left a nasty mess on the disk…
I think the main reason fsck slowed down was the lack of RAM!


Possibly, though it is hard to tell now of course. Some errors, if encountered, prompt fsck to do additional work, which could have been the case here.

However, a simple crash shouldn’t be able to leave a huge mess. Are you using writeback caching in the Proxmox settings for the disk? Consider switching to writethrough. This should let ext4 get by with just a journal replay if things crash again.
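One way to check the current cache mode is to look at the disk line in the VM's config file on the Proxmox host. A minimal sketch, assuming the usual comma-separated `key=value` option format of those lines; the sample line, storage name and VM ID below are made up for illustration, and the helper is not part of any Proxmox tooling.

```python
def disk_cache_mode(config_line):
    """Return the cache= option from a Proxmox-style disk line, or None.

    None means no cache option is set on the line, i.e. the disk uses
    whatever the hypervisor's default is.
    """
    _, sep, options = config_line.partition(":")
    if not sep:
        return None
    for option in options.strip().split(","):
        key, eq, value = option.partition("=")
        if eq and key.strip() == "cache":
            return value.strip()
    return None


# Hypothetical disk line; real storage names and sizes will differ.
line = "scsi1: local-lvm:vm-100-disk-1,cache=writeback,size=4T"
print(disk_cache_mode(line))  # prints writeback
```

If this reports `writeback`, the suggestion above applies: change the disk's cache setting to writethrough in the Proxmox disk options.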