1 1/2 Years old, reliable 4TiB StorageNode with 99,9% uptime was unjustifiably suspended from eu1 after short NAS reboot

Hello there,

As I wrote in the title, my storagenode was suspended from eu1 for no apparent reason. The suspension score fell below 60%, so I have to fear that the data I have accumulated over a long time will be transferred off my node.

I demand an explanation of why this happened now, when nothing happened the 100 times before after a NAS reboot.

Sorry, but with this attitude I don’t believe you deserve anything. If your node has stopped providing service, it’s natural that the node will be suspended. It’s your setup, you need to debug it. All we can do is to help you help yourself.

Please provide logs from your node.

Yeah, I’m sorry for being so rude, but I was very surprised that this happened to me because, as I said before, I never had this kind of problem and my other 3 nodes are healthy.

I will provide logs, give me a moment please.

It looks like your file system was damaged after the sudden restart or something similar:

  1. Error: piecestore monitor: error verifying writability of storage directory: open config/storage/write-test677643309: read-only file system

As you can see, there are lots of I/O errors. Your NAS couldn’t read your file system, and the node detected that during its writability check and shut itself down. Perhaps the filesystem was unmounted during shutdown before the node had stopped? I tend to recommend stopping nodes manually before a reboot.
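For anyone who wants to test the same condition by hand, here is a minimal sketch of a writability check. It is not the storagenode's actual implementation, only an approximation of the idea: try to create and delete a small file in the storage directory and report the OS error if that fails. The function name `check_writable` and the `write-test` prefix are illustrative assumptions.

```python
import errno
import os
import tempfile
from typing import Tuple


def check_writable(directory: str) -> Tuple[bool, str]:
    """Try to create and remove a small test file in `directory`.

    Returns (True, "") on success, or (False, reason) if the write fails,
    for example because the file system was remounted read-only.
    """
    try:
        # NamedTemporaryFile creates the file and deletes it again on close.
        with tempfile.NamedTemporaryFile(dir=directory, prefix="write-test") as f:
            f.write(b"ok")
            f.flush()
            os.fsync(f.fileno())  # force the data to disk so I/O errors surface
        return True, ""
    except OSError as exc:
        if exc.errno == errno.EROFS:
            return False, "read-only file system"
        return False, exc.strerror or str(exc)
```

Calling `check_writable` on the node's storage directory should return `(False, "read-only file system")` in a situation like the one in the log line above.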

I’m not gonna reiterate the point about your wording, but it turns out your system was indeed having trouble responding to requests. This wasn’t unjustified. The upside, though, is that the suspension score is volatile: it updates fast, and you’re only just under the threshold. Your node should recover very soon and resume normal operation, as long as the underlying issue has been resolved or was of a temporary nature.

The log is very clear about it:

“ERROR piecestore download failed” and “pieces error: filestore error: unable to open”

The node cannot access the file, most likely because of a permission issue or a broken HDD.
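To get a feel for how widespread these errors are, a small script can count them. This is only a sketch: the error fragments are the ones quoted in this thread, while the function name `count_errors` is a hypothetical choice.

```python
from collections import Counter
from typing import Iterable

# Error fragments quoted in this thread; extend the tuple as needed.
PATTERNS = (
    "piecestore download failed",
    "filestore error: unable to open",
    "read-only file system",
)


def count_errors(lines: Iterable[str]) -> Counter:
    """Count how many log lines contain each known error fragment."""
    counts = Counter()
    for line in lines:
        for pattern in PATTERNS:
            if pattern in line:
                counts[pattern] += 1
    return counts
```

One way to use it is to save the node log to a file (for example from `docker logs`, assuming the node runs in Docker) and pass the open file object to `count_errors`.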

I can see your point, but I still don’t understand why this is a problem now of all times.

Even after a sudden power outage, I didn’t have that kind of problem.

My nodes run in Docker inside Debian Linux VMs, which mount the volume from the NAS via iSCSI.

The thing is: I once ran a crazy experiment with what was then my second node and established an iSCSI connection to a remote VPS via VPN. That didn’t always run stably either, but it never got these hard suspension errors, because a script monitored the connection and shut down the Docker instance after a minute at the latest.
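The script itself was not posted, so the following is only a rough sketch of how such a watchdog could look, not the original. The mount point `/mnt/storagenode`, the container name `storagenode`, the polling interval, and all function names are assumptions. The only external command used is `docker stop -t`.

```python
import os
import subprocess
import time
from typing import Callable

MOUNT_POINT = "/mnt/storagenode"  # assumption: where the iSCSI volume is mounted
CONTAINER = "storagenode"         # assumption: name of the Docker container
GRACE_SECONDS = 60                # stop the node after at most one minute of failure
POLL_SECONDS = 10                 # assumption: how often to check


def storage_ok(path: str = MOUNT_POINT) -> bool:
    """Return True if `path` is a mount point and can be listed."""
    try:
        if not os.path.ismount(path):
            return False
        os.listdir(path)
        return True
    except OSError:
        return False


def stop_container(name: str = CONTAINER) -> None:
    """Ask Docker to stop the container, allowing up to 300 s for shutdown."""
    subprocess.run(["docker", "stop", "-t", "300", name], check=False)


def watch(check: Callable[[], bool] = storage_ok,
          stop: Callable[[], None] = stop_container,
          grace: float = GRACE_SECONDS,
          poll: float = POLL_SECONDS,
          clock: Callable[[], float] = time.monotonic,
          sleep: Callable[[float], None] = time.sleep) -> None:
    """Poll `check`; call `stop` once it has failed continuously for `grace` seconds."""
    failing_since = None
    while True:
        if check():
            failing_since = None  # storage is back, reset the timer
        else:
            now = clock()
            if failing_since is None:
                failing_since = now
            if now - failing_since >= grace:
                stop()
                return
        sleep(poll)
```

Note that this check only notices a missing or unreadable mount. A volume that was remounted read-only would still pass, so a write test would be needed to catch the case from this thread.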

Hello @node_operator0815,
Welcome to the forum!

Please stop and remove the storagenode container, fix the underlying issue with the filesystem, and then start the storagenode again.
As soon as it starts passing audits, your suspension score should recover pretty quickly.

It looks like the node has already recovered from the short outage. I would like to apologize for being so upset. Storj is a very nice project!
