Ungraceful but polite exit?

I had a node totally fail. I tried to recover it, but there was no way to get it back anymore, and I had to nuke the underlying virtual volume to preserve other workloads that were suffering from the disk thrashing caused by the filesystem repairs.

Is there a “polite exit”? E.g. a “don’t expect my node to come back, start the repair now” type of notice? Seems like that would help improve network reliability, albeit in fringe cases…

Trying to be a good operator!

More info below:

The primary boot volume for the node is fine, but the data volume started doing the weirdest things recently and would continually end up corrupting the underlying filesystem (NTFS; I have a different node as well, but this is not related to that recent ReFS post).

I tried to recover it multiple times but it’s a lost cause… Couldn’t keep it online after correcting things each time. I have no idea what caused it but I am working to track it down. The underlying redundant array of drives is fine and has no faults. It was entirely filesystem corruption. Other stuff on the same array but different FS (and even some with the same FS) have been flawless.

Like many others I’ve seen recently, the whole array has been impacted by more writes than usual after a recent update; I can’t explain it, but I suspect it’s related.

Luckily (unluckily?) this node was only about 6 months old and still vetting. It only had about 3TB stored and $15 withheld.

Thanks in advance!

There is nothing you can or have to do. Once your node is offline for more than 4 hours, its pieces will be marked as unhealthy, and repair will start for any segments whose number of healthy pieces drops below the repair threshold. Basically, the network assumes nodes won’t come back after they’ve been offline for more than 4 hours.
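
To illustrate the idea (a minimal sketch, not the actual satellite code; the grace window is the 4 hours mentioned above, and the repair threshold value here is just a placeholder for whatever the satellite is configured with):

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

OFFLINE_GRACE = timedelta(hours=4)  # pieces count as unhealthy after this
REPAIR_THRESHOLD = 52               # placeholder healthy-piece threshold per segment

@dataclass
class Node:
    last_contact: datetime

@dataclass
class Piece:
    node_id: str

@dataclass
class Segment:
    pieces: list

def healthy_pieces(segment: Segment, nodes: dict, now: datetime) -> int:
    """Count pieces whose node has checked in within the offline grace window."""
    return sum(
        1 for p in segment.pieces
        if now - nodes[p.node_id].last_contact <= OFFLINE_GRACE
    )

def needs_repair(segment: Segment, nodes: dict, now: datetime) -> bool:
    """Repair is queued once the healthy pieces drop below the repair threshold."""
    return healthy_pieces(segment, nodes, now) < REPAIR_THRESHOLD
```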

Hope you sort out your file system issues, but you don’t need to worry about signalling the network of a lost node.

4 Likes

Is it really only 4 hours?

I thought repair didn’t kick in until suspension; looks like I need to read up.

I know 4 hrs is the rough downtime allowed per month, but as for how that relates to everything else, I’ve seen mixed information and clearly didn’t read deeply enough about it all to know better.

I knew I didn’t have to do anything; now it sounds like repairs kick off much earlier than I expected, which negates the potential impact I was worried about. Cool!

After 4 hours offline, the pieces on the offline node are considered unhealthy and can trigger a repair job if the number of healthy pieces drops below a threshold.
Suspension is the next level: it is applied when the online score drops below 60%, and your node will not get any ingress until the online score is back above 60%.
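
A rough sketch of that suspension rule (illustration only, not the real reputation code; the 60% figure is the one mentioned above):

```python
ONLINE_SCORE_SUSPENSION = 0.60  # suspension applies below a 60% online score

def ingress_allowed(online_score: float) -> bool:
    """A suspended node gets no ingress until its online score recovers above 60%."""
    return online_score >= ONLINE_SCORE_SUSPENSION

print(ingress_allowed(0.72))  # True: node keeps receiving ingress
print(ingress_allowed(0.55))  # False: suspended, no ingress until it recovers
```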

Just curious - what RAID and what filesystem?

1 Like

A Windows VM running on a Synology DS1821+.
8x 10TB drives in SHR-2 (roughly equivalent to RAID 6 or RAID-Z2) with 2x 2TB NVMe drives in RAID 1 as read and write cache.
The VHD volume was thin provisioned with space reclamation on, but otherwise standard NTFS.

I suspect what caused the problem was the reclaim space setting within VMM, but I need to test further. I know it does bad things to boot volumes but have never had problems for any other workloads. Just another ghost :slight_smile:

It wasn’t until this recent update and the increased write workload that it started having problems.

Increased write workloads are usually related to the customers’ activity, not the update. And if the setup is not capable of handling it, then perhaps the setup was not made reliable enough.
It could be related to BTRFS, or maybe to thin provisioning, or both.

The node is doing the usual things: it writes received data and reads data to serve back to the customer.
On restart there is a filewalker process that updates the caches (databases) with the actual usage and free space in the allocation, and another one that checks the expiration of pieces (in the trash). From time to time it also runs a garbage collection process. Most of these loops are reads, not writes, though (except for the databases).
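
Conceptually, the used-space filewalker does something like this (just a sketch of the idea, not the actual storagenode code; the path is a placeholder for your own storage location):

```python
import os

def used_space(blobs_dir: str) -> int:
    """Walk every piece file under the blobs directory and sum the sizes.
    This is metadata-heavy read I/O, which is why a restart can keep a
    large node's disk busy for a long time."""
    total = 0
    for root, _dirs, files in os.walk(blobs_dir):
        for name in files:
            total += os.path.getsize(os.path.join(root, name))
    return total

# Placeholder path; point it at your node's storage\blobs directory.
# print(used_space(r"D:\storagenode\storage\blobs"))
```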

1 Like

Yep, like I said, I can’t explain it.

The system could “handle it” from a disk throughput and IO perspective, it just eventually hit corruption and halted.

At peak it was using around 30-40 MB/s and 100-150 IOPS.

I could throw another 400 MB/s of contiguous writes or 100 MB/s of mixed read/write plus 100 IOPS at it and see no impact on the STORJ VM’s numbers (8K raw video transfer from a workstation and bulk image processing, respectively).

Another, very different and physically distant node also saw increased write traffic around the same time (see my other post about ReFS). I suppose it could just be network patterns, but it’s been sustained for some time now.

Cheers!

The storagenode doesn’t produce continuous writing or reading; it’s usually random access to relatively small files.
So only IOPS and access time under random access matter.

This only confirms that customers (or maybe repair workers) produce this write traffic.
You may also just check the ingress traffic as well.
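
If it helps, something like this can pull the per-day ingress numbers from the node’s local dashboard API (assuming the default port 14002 and the /api/sno/satellites endpoint; field names may differ between versions):

```python
import json
from urllib.request import urlopen

DASHBOARD = "http://localhost:14002"  # default local dashboard; adjust if yours differs

def daily_ingress() -> None:
    """Print per-day ingress (usage + repair) reported by the node dashboard API."""
    with urlopen(f"{DASHBOARD}/api/sno/satellites") as resp:
        data = json.load(resp)
    for day in data.get("bandwidthDaily") or []:
        ingress = day.get("ingress", {})
        total = ingress.get("usage", 0) + ingress.get("repair", 0)
        print(day.get("intervalStart"), f"{total / 1e9:.2f} GB ingress")

if __name__ == "__main__":
    daily_ingress()
```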

1 Like

Well, whatever happened, I won’t be running it with space reclamation on again, at the very least. Probably just gonna migrate all my nodes over to TrueNAS Scale soon anyway; it’s been awesome!

Thanks for all the insights!

1 Like

By the way, why not a usual Docker container, why a VM? And a resource-consuming Windows VM at that.

1 Like

I had the resources to spare, but primarily it was because I wanted to see what kind of insights I could glean into how STORJ functions, and I knew how to do that best with Windows at the time.

Plus, with some of the data-analytics PowerShell scripts I’d seen, being able to use Remote Desktop (something I have to use daily for other work) made for easy access, etc.

TL;DR: it was the more familiar option at the time for that platform.

1 Like

This isn’t true. There are multiple threads discussing the filewalker continuously reading the disk on startup. On my Windows node it would read the disk at 100% utilisation for a week on every node update/restart.

What I mean is that there is no linear reading or writing; these are random accesses to relatively small files, so the linear speed of the disk subsystem almost doesn’t matter.

3 Likes