Improve disqualification information

Looking more closely at one of my node’s disqualifications, I found that the information about it could be improved.

In that node’s dashboard there is not a single notification:

No notifications yet

And the disqualification message lacks useful information:

Your node has been disqualified on 12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S. If you have any questions regarding this please check our Node Operators thread on Storj forum.

It should be improved:

  • Disqualification should trigger a notification
  • The message (both notification and dashboard) should include the date and time of disqualification (it is already present in the API)
  • The message should include the reason for disqualification. The satellite logic must have this information, so it could be passed on to the node.

The satellite doesn’t log the reason for disqualification; it just places the timestamp in this field and that’s all, especially since most of the useful information is already in the storagenode logs.
The storage in the distributed database is not free.
Everything else might be implemented, I think.
I passed your request to the team.

I would add that it should also mention the human-readable name of the satellite, since that’s what is used in other parts of the dashboard.

In the meantime, if you want to see the name and time, the earnings calculator will show those as well. Earnings calculator (Update 2024-04-06: v13.2.1 - Now includes separate lines for trash and blobs under disk current - Detailed earnings info and health status of your node, including vetting progress) But as always, I am very open for good ideas in that tool to be “borrowed” by Storj Labs to use in the official dashboards. :wink:

I don’t know about the communication options between the satellite and the node.
What I mean is that the internal logic of the satellite must know the reason for disqualification: some parameter(s) did not meet the criteria required to remain qualified, hence the disqualification.
Maybe this could be translated into human language and transmitted to the node. It might not even have to be stored on the satellite.

Yes, it “knows”, in a sense. Auditors are separate workers; they just return the result (passed or not), not the reason. The reason is partially logged in their logs and in the storagenode logs.
So, to transmit the reason, all workers (there are multiple instances) would have to flood the satellite’s distributed database with these messages, then we would need to decouple them and store them in the database… too much effort for nothing.
What would a reason like “piece is corrupted” without additional details give you? I believe nothing. You cannot revert or undo it. So what’s the point? The main outcome is that your node was not capable of storing customers’ data safely.
Especially since you can use the tool to figure out whether they were corrupted or not:

You mean it is too much effort to make the node display information like “Disqualified for offline time”, “Disqualified for failing audits” or “Disqualified for piece retrieval errors” or something like that? Hard to believe and sad.

Yes, because it’s not stored. Are you requesting to store it, perhaps with some retention? Otherwise it would fill the database with informative but not useful information about disqualified nodes (disqualification is not reversible anyway, so it is useless at best).

You still haven’t explained the value of this effort. How does it help to improve the customer experience?
You, as an SNO, already know that your node is disqualified on that satellite. You can figure out the reason yourself, either from the logs or with the tool provided above.
Why add extra load on the satellites to get the same result? It would help the few dozen who need it, but it would require additional useless work for tens of thousands of nodes.

I am not talking about customers. It would improve the SNO experience.

This distinction can already be made just by looking at the scores, which means the dashboard could already derive it from them. At least the difference between the online score and the audit score causing the DQ can be derived. Considering how the audit workers operate, I do agree with @Alexey that further distinction may be difficult and expensive, especially considering that it may not be a single type of audit failure. It may be that some files were corrupted, some were inaccessible and some were simply missing. This would even complicate how it should be displayed on the node side.
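As a rough illustration of the score-based distinction described above, here is a minimal Python sketch of how a dashboard might guess the DQ category from the last scores the node has seen. It is not Storj code: the function name `infer_dq_reason` is hypothetical, the 0.97 audit threshold is the figure quoted later in this thread, and the 0.60 online threshold is an assumption. As noted above, scores alone cannot tell corrupted, inaccessible and missing pieces apart.

```python
# Assumed thresholds, for illustration only. The audit figure is the one quoted
# in this thread; the online figure is a hypothetical placeholder.
AUDIT_DQ_THRESHOLD = 0.97
ONLINE_DQ_THRESHOLD = 0.60


def infer_dq_reason(audit_score: float, online_score: float,
                    audit_threshold: float = AUDIT_DQ_THRESHOLD,
                    online_threshold: float = ONLINE_DQ_THRESHOLD) -> str:
    """Guess why a node was disqualified using only its last known scores."""
    reasons = []
    if audit_score < audit_threshold:
        reasons.append("audit failure")
    if online_score < online_threshold:
        reasons.append("offline")
    if not reasons:
        # Neither score explains the DQ, so the node cannot tell from scores alone.
        return "other"
    return " and ".join(reasons)
```

This maps directly onto the “Offline”, “Audit failure” and “Other” categories suggested in the thread, without any change on the satellite side.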

I don’t know what level of distinction is possible with the data that the satellite already has to determine whether a node should be disqualified or not. Too much distinction is not required, and certainly not an in-depth analysis by the satellite. Something like “Offline”, “Audit failure” and “Others” (I don’t even know if there are other reasons), together with the date and the human-readable satellite name, without having to make assumptions, call the API or check logs, would already be an improvement.

The satellite does a simple comparison: if the node has an audit score below 97%, disqualify it. The satellite checks only nodes that are not already disqualified.
It doesn’t have the whole picture: the audit workers are separate processes that update a distributed database, and the satellite simply ignores all nodes with an audit score below 0.97.

I do not see any improvements here, sorry (I’m an SNO too). If the audit score is below 0.97, we do not connect to the node anymore. Fatality.
Actually, it doesn’t even check the audit score right now; it only selects nodes that are not disqualified. It doesn’t care about reasons or anything else. A node is either reliable or disqualified.

How can the satellite filter them if it does not have the information?

Two thoughts on that. First of all, you are certainly not the average SNO. Of course you don’t mind checking forum threads, reading code, checking API data, the logs, etc.
But I don’t think the same is true for the average SNO. For an average SNO it would be helpful to receive a notification when the node gets disqualified, and to find information on when it happened and for what reason (Online/Audit/Other, for example) in an easily readable and understandable way.

Second: I think this would be even more helpful for SNOs, and would make things easier to follow and understand, if one day there are more independent satellites with rules and thresholds different from those of the Storj satellites.

They have it, but not immediately. And it seems they will not provide it to nodes that are already disqualified.

Yes, but I set them up following our documentation; otherwise it wouldn’t help.
I do not have any special configuration options for that purpose. They are average: one Windows node and two Docker nodes (sorry, Windows Docker Desktop too… I lost my Pi node because of a dead SD card and the inability to recover it; it needs physical access, which is not possible in the near future).

Yes, they will get a notification if the email in their node setup is valid.

This is explained there:

And one of my nodes is disqualified from Europe-North-1 and Saltlake because of

So, I did not apply any tricks (which I think is what you thought), and I’m a regular SNO like anyone else. Otherwise it wouldn’t make any sense.
How would I be able to help if my nodes were “special”?! So, no, they are ordinary, typical nodes. I miss my Pi node though… but I cannot resurrect it now.

No, I don’t mean a trick or special hardware. I mean your knowledge, that sets you apart from the average SNO.
You know how to solve most problems without additional information or you know how and where to find the solution. The majority of SNOs probably don’t know half of what you know.
That’s why I mean we should make it easy for SNOs, not hard. I don’t think average SNOs want to dig through logs, APIs or forum threads just to find out when and why the node was disqualified and on which satellite. Hence my suggestion to send a notification and to give clear, human-readable information on when, why and where (credits to @BrightSilence). Basically, it is simple user-friendliness.

As mentioned before, the majority of this request can be done with info already available to the node. Based on the scores, it knows what it was disqualified for (offline or audits). More detail might be nice, but this already helps. The local database already stores when it happened. And it obviously knows the human-readable names of the satellites. Doing those three things would only require minimal adjustments to the dashboard. No other component changes required.
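The dashboard-only change described above could look roughly like this minimal Python sketch. It assumes the node already has the satellite ID, its human-readable name, the locally stored DQ timestamp, and a reason string derived from the scores; the function `dq_message` and the message wording are hypothetical, not the actual storagenode dashboard code.

```python
from datetime import datetime, timezone
from typing import Optional


def dq_message(satellite_id: str, satellite_name: str,
               disqualified_at: Optional[datetime], reason: str) -> Optional[str]:
    """Build a human-readable DQ notice from data the node already holds locally.

    Returns None when the satellite has not disqualified the node.
    """
    if disqualified_at is None:
        return None
    # Normalise the stored timestamp to UTC so every satellite reads the same way.
    when = disqualified_at.astimezone(timezone.utc).strftime("%Y-%m-%d %H:%M UTC")
    return (f"Your node was disqualified on {satellite_name} ({satellite_id}) "
            f"on {when}. Likely reason: {reason}.")
```

Because every input is already available on the node, a message like this covers the when, why and where without touching the satellite.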

Yes.
Yes.
But.
I am still trying to put myself in an average SNO’s shoes. So I don’t change anything unless it is confirmed by other SNOs (to be honest, I haven’t changed anything in my nodes since the start back in 2018, when Storj v3 was introduced. Yes, I wiped my v2 nodes, started in 2017…).

Unfortunately, it’s a requirement for being an SNO at the moment… Not like it was in v2, sorry (where we lost customers’ data because they didn’t take into account that they had to renew a contract, which was force-expired after 90 days. We fixed that in v3, though).

The notification exists (if you ever open the dashboard…), and it clearly states that your node is disqualified on these satellites.
You may analyze your logs to find the reason:

I do not know what else you need…

I believe you try, but I don’t think you always can. You simply know too much.

Here is a very recent quote from a SNO:

Fine, but it does not have to, and shouldn’t, stay like that. Hence my suggestion to start making things easier.

The DQ message on the dashboard is the message that I am suggesting to improve.
But with notification I also mean the message under ‘notifications’. There is no DQ message there for my DQ-ed node.

I’m sorry, but why?
Really. We have 20k nodes and they are 50% used in the best case… Why do we need to make it simpler? To have 40k nodes at 10% usage? Who would want to participate for less than $1/mo?
As soon as that changes, I’m sure our managers will pay attention.

+1! As long as they do it slowly… Storj still has a lot of room to reduce SNO payouts. And only minimal node improvements need to be made: most dev time should be focused on things that help paying customers.

I kinda thought the project would peak around 20k nodes… but it keeps going up…

Normally less complexity requires less support.
And with less complexity maybe errors are spotted earlier or are easier to investigate. Like this one:

These seem to be the issues that are getting fixed now. With easier tools and better monitoring, we might have been able to fix them a year ago.

I am all for KISS. If Storj has issues with too many SNOs, other ways seem to be more efficient like pausing onboarding.