What is Suspension & Audit?

lalaland · October 13, 2020, 3:22pm

What happened?

BrightSilence · October 13, 2020, 3:55pm

That satellite was turned off. Nothing wrong on your end.

It’ll be removed from the trust list which will prevent that from showing up, but before that can happen all nodes need to be updated to v1.14.x, which hasn’t finished just yet.

As for what it means: audit displays your audit score. If your node fails to respond correctly to an audit, that score will drop. If it drops below 60%, your node will get disqualified. The score drops when the node returns a known error, like file not found, or fails to respond within 5 minutes several times in a row.

The suspension score drops if your node fails with an unknown error. Your node will be suspended when that score drops below 60%. At that point it won’t receive new data and can lose data if repair is triggered. But you can recover the node if you fix the underlying problem and let the score recover.
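To make the two thresholds concrete, here is a minimal sketch of how the scores described above map to a node status. The function name and the status labels are made up for illustration and are not part of the storagenode software; only the 60% cut-offs come from the explanation above.

```python
# Hypothetical helper: maps the two dashboard scores to a coarse status.
# Only the 60% thresholds come from the post; all names are illustrative.
DISQUALIFY_BELOW = 60.0  # audit score threshold (percent)
SUSPEND_BELOW = 60.0     # suspension score threshold (percent)


def node_status(audit_score: float, suspension_score: float) -> str:
    """Return a coarse status for one satellite, given scores in percent."""
    if audit_score < DISQUALIFY_BELOW:
        # Driven by known audit failures (file not found, repeated timeouts).
        return "disqualified"
    if suspension_score < SUSPEND_BELOW:
        # Driven by unknown errors; recoverable once the cause is fixed.
        return "suspended"
    return "ok"


if __name__ == "__main__":
    print(node_status(100.0, 100.0))
    print(node_status(100.0, 55.0))
    print(node_status(40.0, 100.0))
```

A 0% reported for a satellite that is offline is missing data rather than a real score, so it should not be fed into a check like this.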

However, the 0% you’re seeing is simply because that satellite is no longer online and can’t report the actual scores to your node. You can safely ignore it.

lalaland · October 13, 2020, 3:59pm

Thank you so much

donald.m.motsinger · October 13, 2020, 4:03pm

Is it only me where it shows 100% for stefan-benten? My nodes are on 1.14.7 and 1.13.3.

BrightSilence · October 13, 2020, 4:04pm

No, I think only new nodes see the 0%. You just see the last reported status if you have dealt with stefan-benten in the past.

SGC · October 13, 2020, 4:53pm

Same here… I think Bright is spot on.
My 2 week old node has 0% for stefan-benten, because it never saw the satellite and thus never got vetted.
The old node got vetted, so it remembers it was at 100% when the satellite vanished.

madbitz · December 9, 2020, 10:47pm

Just wanted to know if this will result in a suspension as my ‘online’ is not 100% on any satellites?

BrightSilence · December 10, 2020, 1:51pm

Not immediately. This system is still fairly new and suspension has been disabled for now. The last reports said that when it is enabled, they would initially allow for 288 hours of downtime. But you should expect the uptime requirement to be much higher in the future. As a guideline I would recommend ensuring at least 95% uptime for now and aiming for 99.5% or higher. Final suspension limits haven’t been decided on, but they’ll likely be higher than where you are at right now. Currently the scores display a warning when you drop below 95%.
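For a feel of what those percentages mean in hours, here is a small sketch of the arithmetic. It assumes a 30-day (720-hour) measurement window, which is an assumption for illustration and not something stated above; the function names are made up as well.

```python
# Assumption: a 30-day (720-hour) window; the post does not state the window.
WINDOW_HOURS = 30 * 24


def allowed_downtime_hours(uptime_percent: float,
                           window_hours: float = WINDOW_HOURS) -> float:
    """Hours a node may be offline in the window while meeting uptime_percent."""
    return window_hours * (1.0 - uptime_percent / 100.0)


def uptime_percent(downtime_hours: float,
                   window_hours: float = WINDOW_HOURS) -> float:
    """Uptime percentage implied by a given number of offline hours."""
    return 100.0 * (1.0 - downtime_hours / window_hours)


if __name__ == "__main__":
    for target in (95.0, 99.5):
        hours = allowed_downtime_hours(target)
        print(f"{target}% uptime allows {hours:.1f} h offline")
    print(f"288 h offline corresponds to {uptime_percent(288):.0f}% uptime")
```

Under that 30-day assumption, 95% uptime leaves about 36 hours of downtime, 99.5% leaves about 3.6 hours, and 288 hours of downtime works out to about 60% uptime.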

madbitz · December 11, 2020, 7:23pm

I don’t know what is going on. My node keeps turning off and on all day long. I need to get to the bottom of this at some point.

Alexey · December 12, 2020, 3:43pm

It should show the reason in its logs: https://documentation.storj.io/resources/faq/check-logs

madbitz · December 14, 2020, 10:35am

Hi Alexey,
2020-12-14T11:02:43.456Z WARN console:service unable to get Satellite URL {"Satellite ID": "118UWpMCHzs6CvSgWd9BfFVjw5K9pZbJjkfZJexMtSkmKxvvAW", "error": "storage node dashboard service error: trust: satellite "118UWpMCHzs6CvSgWd9BfFVjw5K9pZbJjkfZJexMtSkmKxvvAW" is untrusted", "errorVerbose": "storage node dashboard service error: trust: satellite "118UWpMCHzs6CvSgWd9BfFVjw5K9pZbJjkfZJexMtSkmKxvvAW" is untrusted\n\tstorj.io/storj/storagenode/trust.(*Pool).getInfo:228\n\tstorj.io/storj/storagenode/trust.(*Pool).GetNodeURL:167\n\tstorj.io/storj/storagenode/console.(*Service).GetDashboardData:168\n\tstorj.io/storj/storagenode/console/consoleapi.(*StorageNode).StorageNode:44\n\tnet/http.HandlerFunc.ServeHTTP:2042\n\tgithub.com/gorilla/mux.(*Router).ServeHTTP:210\n\tnet/http.serverHandler.ServeHTTP:2843\n\tnet/http.(*conn).serve:1925"}

baker · December 14, 2020, 4:07pm

That warning shouldn’t affect your node. It only shows up when you access the web dashboard, which is trying to reference an old satellite that has been shut down. Is this the only error you see in your logs? Is this a Windows or a Linux node?
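If the log is noisy, one way to check whether anything else is being reported is to filter out this known warning. Below is a rough sketch, assuming each log line starts with a timestamp followed by a level such as WARN or ERROR, as in the line quoted above. The function name is made up for illustration, and the second sample line is invented purely to show the filter working.

```python
import re

# Assumption: lines look like "<timestamp> <LEVEL> <rest of line>", as in the
# log line quoted above. Fields may be separated by spaces or tabs.
LINE_RE = re.compile(r"^(\S+)\s+(DEBUG|INFO|WARN|ERROR|FATAL)\s+(.*)$")


def interesting_lines(lines):
    """Yield (timestamp, level, rest) for WARN/ERROR/FATAL lines, skipping
    the harmless 'is untrusted' warning about the retired satellite."""
    for line in lines:
        match = LINE_RE.match(line.strip())
        if not match:
            continue  # not a recognisable log line
        timestamp, level, rest = match.groups()
        if level not in ("WARN", "ERROR", "FATAL"):
            continue
        if "is untrusted" in rest:
            continue
        yield timestamp, level, rest


if __name__ == "__main__":
    sample = [
        "2020-12-14T11:02:43.456Z WARN console:service unable to get "
        'Satellite URL {"error": "trust: satellite is untrusted"}',
        # Invented line, only here to demonstrate the filter:
        "2020-12-14T11:05:00.000Z ERROR example:component something failed",
    ]
    for entry in interesting_lines(sample):
        print(entry)
```

To run it against a real log, pass an open file object instead of the sample list.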

madbitz · December 15, 2020, 12:17am

I am using Win10. No other errors that I can see. I sat watching the whole time until it dropped out, and the only error showing was the issue above. So it doesn’t make any sense why it keeps dropping out. My connection is fine; I have been using the internet from the same machine while UptimeRobot shows the Storj node as down. Can I point out that this is not a recent event? It has been going on for a couple of months at least.

Here are just some of my latest logs showing the downtime and how often it happens.

deathlessdd · December 15, 2020, 12:36am

Are you using a 3rd party antivirus/firewall program?

madbitz · December 15, 2020, 8:22am

No, only the default Windows antivirus, as it’s on a dedicated virtual machine. I have not installed anything other than the general updates from Windows. I have been running this node for over 10 months now with very few problems, until this started.

baker · December 15, 2020, 1:26pm

How full is the disk? Do you know if it is SMR? Are you seeing high memory usage by the VM?

madbitz · December 15, 2020, 5:38pm

Hi Baker, the disk is around 50% full. Unfortunately, the drives are SMR. Come to think of it, I have also seen an increase in memory usage on the VM over the last few months. Any reason you ask?

baker · December 15, 2020, 5:46pm

I ask because SMR drives appear to have worse IOPS performance as they approach full capacity. Although I wouldn’t expect a big difference at 50%. An increase in RAM usage could indicate that the drive is unable to keep up with transfers. The node then has to cache the data in RAM if the system can’t write to disk quickly enough. If this continues and the node keeps caching in RAM, it will eventually hit the limit and the process is typically killed with an out of memory (OOM) error. This wouldn’t show up in your node logs, but should show up in your system logs.
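As a back-of-the-envelope illustration of that failure mode, here is a deliberately simplified sketch: if data arrives faster than the disk can write it, the backlog held in RAM grows at the difference between the two rates until it hits the memory limit. All numbers and names are hypothetical, and a real node is more complicated than a single constant-rate buffer.

```python
def seconds_until_oom(ingest_mb_s: float, disk_write_mb_s: float,
                      ram_limit_mb: float):
    """Return how many seconds pass before buffered data exceeds ram_limit_mb,
    or None if the disk keeps up. All inputs are hypothetical figures."""
    backlog_rate = ingest_mb_s - disk_write_mb_s  # MB of backlog added per second
    if backlog_rate <= 0:
        return None  # disk writes at least as fast as data arrives
    return ram_limit_mb / backlog_rate


if __name__ == "__main__":
    # Made-up example: 5 MB/s incoming, 3 MB/s sustained writes, 2 GiB of RAM.
    print(seconds_until_oom(5.0, 3.0, 2048.0))
    # Disk keeps up, so the buffer never fills.
    print(seconds_until_oom(2.0, 3.0, 2048.0))
```

The point of the sketch is only that even a small sustained gap between incoming data and write speed eventually exhausts a fixed amount of memory.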

deathlessdd · December 15, 2020, 6:31pm

I’m pretty sure it’s not directly related to the SMR drives, because I have SMR drives for my nodes and they are full, and I have never seen any issues where a node shows offline for any reason. Each node has its own SMR drive; none are in RAID.

What kind of VM are you running, how are the drives connected to the system, and are you using any type of RAID?

madbitz · December 15, 2020, 7:16pm

Think I have found the issue, or rather you guys did. I am more full than I previously thought. Just for info, I am using a Storage Spaces pool with a mirror, not RAID 0. So what appears to be the case is that I am actually almost full, with a drive error on one of the physical drives within the storage pool. I am looking into how I can make changes right now. Things like this always happen when you don’t have a lot of time, don’t they? lol.
Here is my storage setup: 2 identical 5 TB disks and a storage pool size of 9.09 TB. I set up Storj to only use 4 TB just in case of a failed drive, but I am currently at 7.27 TB. This is fine, but I was also relying on HDD Sentinel, which reported zero issues with my drives. I just looked back into Storage Spaces and it appears I have 1 drive issue. I need to make some physical changes to my setup, as this could be the issue. Also, to answer @deathlessdd, I am using Hyper-V by the way.
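For anyone checking similar numbers, here is a rough sketch of the capacity arithmetic. It rests on three assumptions that are not confirmed in this thread: the pool is a two-way mirror, so every byte of data uses two bytes of raw pool space; Windows shows sizes in binary units while labelling them TB; and the 4 TB Storj allocation is in decimal terabytes. Pool metadata overhead is ignored, and the function names are made up.

```python
TIB = 2 ** 40  # Windows reports binary units but labels them "TB"
TB = 10 ** 12  # drive vendors use decimal terabytes


def as_windows_tb(num_bytes: float) -> float:
    """Size as Windows would display it (binary units)."""
    return num_bytes / TIB


def mirrored_raw_bytes(data_bytes: float, copies: int = 2) -> float:
    """Raw pool space consumed by data stored with `copies` mirror copies."""
    return data_bytes * copies


if __name__ == "__main__":
    pool_raw = 2 * 5 * TB  # two 5 TB disks
    print(f"pool capacity: {as_windows_tb(pool_raw):.2f}")
    node_raw = mirrored_raw_bytes(4 * TB)  # a full 4 TB allocation, mirrored
    print(f"raw space for a full 4 TB allocation: {as_windows_tb(node_raw):.2f}")
```

If those assumptions hold, the pool comes out at roughly 9.09 and a full 4 TB allocation stored twice at roughly 7.28, which is close to the 9.09 and 7.27 figures above. That would suggest the usage figure may largely be the node’s allocation counted twice by the mirror.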