Reasoning for orders expiring after 48hrs?

Toyoo · November 13, 2020, 8:04am

It’s one step to truthfully represent the earnings.

Toyoo · November 13, 2020, 8:05am

So far the only communication channel between Storj and SNOs that works reliably is “my node’s not starting up”. Besides, if an SNO isn’t getting paid, why should they keep the node up?

Pac · November 13, 2020, 8:14am

Also, very few SNOs have the skills or patience/time to set up advanced probes and monitoring systems to detect several categories of issues.
The simplest and most important thing to monitor is “is my node still online?”, which can be done with free tools like UptimeRobot, for instance.

When a node shuts down by itself because something’s wrong, the SNO is usually immediately aware of it.

Not saying it’s ideal, just saying that until an advanced notification system is built into nodes (or StorjLabs), shutting down a node might be the most efficient way to notify SNOs.

kevink · November 13, 2020, 8:23am

I’d say simple monitoring needs to be implemented by StorjLabs, and it could just be something like sending all error logs to the satellite, which then sends an email to the operator. Additionally, that error message would be shown on the dashboard.

jammerdan · November 13, 2020, 8:33am

Either way, it has to be on the dashboard.
Would it be sufficient to show whether the last attempt to send orders failed or succeeded?
Or a timer, saying: “Couldn’t send orders for xx hrs.”
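
A timer like that could be derived from the time of the last successful submission. Here is a minimal sketch in Python, not taken from the storagenode code: the `order_status` function and its wording are assumptions for illustration, and it assumes the node keeps retrying, so that time since the last success means time spent failing.

```python
from datetime import datetime
from typing import Optional


def order_status(last_success: Optional[datetime], now: datetime) -> str:
    """Build a one-line dashboard message about order submission (hypothetical)."""
    if last_success is None:
        return "No orders have been sent yet."
    # Whole hours elapsed since the last successful submission.
    hours = int((now - last_success).total_seconds() // 3600)
    if hours < 1:
        return "Orders sent successfully within the last hour."
    return f"Couldn't send orders for {hours} hrs."
```

The dashboard would only need the timestamp of the last success to render this line.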

SGC · November 13, 2020, 9:48am

i think it did post errors in the logs… so it’s not like it wasn’t informing SNOs, just that most weren’t paying attention… but that’s to be expected and part of the problem to be solved.

and really solving the problem shouldn’t just solve this same error and/or people’s inability to notice it, but it should solve future errors and/or people’s inability to notice them.

i made this, which i believe would have prevented most of this whole debacle

greener · November 13, 2020, 11:01am

Likely repeating other messages here but the following should be technically possible and trivial to implement:

1. When the storage node encounters a corrupted order file, it skips over that file, continues to upload newer ones, and writes an error message to the logs.
   - this would be an improvement on current behaviour, as it would prevent valid orders from expiring
   - it still relies on the SNO checking logs, which is not great but is a different problem to solve
   - the current behaviour, where one corrupt file stops further order processing, is really a bug
2. The storage node deletes any expired order files, including corrupted ones, and writes a warning to the logs.
   - an improvement on current behaviour, as it would prevent corrupt files from stacking up and taking space
   - would not require the SNO to manually clear these files
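
The two behaviours above can be sketched roughly as follows. This is only an illustration in Python, not the storagenode's actual code: the `OrderFile` type, the `is_corrupt` flag, the `send` callback and the 48-hour `EXPIRY` value (taken from the thread title) are all assumptions made for the example.

```python
import logging
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Callable, List

log = logging.getLogger("orders")

# Assumed expiry window, taken from the thread title.
EXPIRY = timedelta(hours=48)


@dataclass
class OrderFile:
    """Hypothetical stand-in for one unsent-orders file on disk."""
    name: str
    created: datetime
    is_corrupt: bool = False


def process_orders(files: List[OrderFile], now: datetime,
                   send: Callable[[OrderFile], None]) -> List[OrderFile]:
    """Run one pass over the files and return those left on disk."""
    remaining = []
    for f in sorted(files, key=lambda f: f.created):
        if now - f.created > EXPIRY:
            # Expired files are deleted, corrupt or not, with a warning.
            log.warning("deleting expired order file %s", f.name)
            continue
        if f.is_corrupt:
            # Log and move on instead of aborting the whole run.
            log.error("skipping corrupt order file %s", f.name)
            remaining.append(f)
            continue
        send(f)  # a successfully sent file is no longer kept
    return remaining
```

A corrupt file that has not expired yet stays on disk until it does, so it never blocks newer files and never needs manual cleanup.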

Visibility of issues with orders is not required for implementing the above. Am I missing something?

As others suggested, visibility could potentially be solved by just adding a widget with log error/warning counters to the SNO dashboard. This could be extended with a detailed page of log entries and maybe an acknowledge button, so that the SNO can clear the dashboard status once the issue is seen/solved. But visibility would be less relevant if order processing is implemented as above.
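
A counter widget with an acknowledge button might look something like the sketch below. It is a rough Python illustration under stated assumptions: the `LogCounter` class is made up for the example, and it assumes tab-separated log lines whose second field is the level (`ERROR`, `WARN`, `INFO`, ...).

```python
from collections import Counter
from typing import Iterable


class LogCounter:
    """Counts ERROR/WARN lines seen since the operator last acknowledged."""

    LEVELS = ("ERROR", "WARN")

    def __init__(self) -> None:
        self.counts: Counter = Counter()

    def feed(self, lines: Iterable[str]) -> None:
        for line in lines:
            fields = line.split("\t")
            # Assumed layout: timestamp, level, component, message.
            if len(fields) >= 2 and fields[1] in self.LEVELS:
                self.counts[fields[1]] += 1

    def acknowledge(self) -> None:
        """Reset the counters, as a hypothetical acknowledge button would."""
        self.counts.clear()
```

The dashboard would show `counts["ERROR"]` and `counts["WARN"]` and call `acknowledge()` once the SNO has looked at the issue.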

Also, does anyone know why these files get corrupted in the first place? Based on SNOs reporting these issues, it seems to be a common problem and happens more frequently than we’d normally expect, given that the nodes work fine otherwise.

Toyoo · November 13, 2020, 7:16pm

SNOs are currently taught to ignore many errors in the log, so I don’t think this argument currently works. I wish it was different…

deathlessdd · November 13, 2020, 7:24pm

I don’t think SNOs were taught to ignore errors. There were certain errors you could ignore, such as canceled errors, but I don’t believe that makes it OK to just ignore errors in general. If you saw order errors, audit errors or file missing errors, you should know not to ignore those.
Any errors should be looked at, and it has changed through all the updates.

kevink · November 13, 2020, 7:28pm

Those of us who have been around for more than a year have seen a lot of errors that were completely irrelevant to the functioning of the nodes. Therefore it’s a valid argument that many SNOs have been taught to ignore errors.
And since you’ve been around for over a year now too, you should still remember all that. Even the upload canceled messages were errors once… we had to tell every new SNO to ignore those. So we had to ignore an error for something that occurs during normal operation and is perfectly normal…

It got better though, I don’t think there are any more errors that need to be ignored. So newer SNOs are certainly not used to ignoring error messages.

jammerdan · November 13, 2020, 7:30pm

I am not sure whether the node doesn’t get paid at all or gets paid only for the occupied space. But even without payment, the node can still respond to audits, while shutting it down comes with the potential penalty of reduced uptime and could even lead to disqualification. This does not make sense.

deathlessdd · November 13, 2020, 7:33pm

It’s partly true if you were here over a year ago, but I still always monitored my nodes even though these errors weren’t a big deal. I’m also sure anyone who has been here has checked their nodes’ log files once in a while to make sure it’s all good. But all these new SNOs didn’t have the luxury of experiencing all those errors, so a new SNO wouldn’t ignore them unless they just never learned to check. So now these are legit errors.

I guess the better way would be to have some kind of alert on the dashboard, since that is where most SNOs go to check on their nodes.

SGC · November 13, 2020, 9:17pm

i linked a feature suggestion vote to just such a thing a bit further up.

deathlessdd · November 13, 2020, 9:19pm

Didn’t see it. I voted.

SGC · November 13, 2020, 9:22pm

no matter, if it doesn’t succeed i’ll just call for a recount… lol, vote fraud is everywhere you know…

KillahGoose · November 14, 2020, 2:16am

I agree. I had no idea that my node had more than 2000 unsent orders. On top of that, my node never had any crashes or shutdowns, and egress/ingress/audits were all OK. I even check the logs once a day, but it’s hard to find something if you don’t know what you’re looking for. I only came to know about the ‘listing orders’ error after @nerdatwork told me about it. Maybe we could have something like a ‘Glance’ notification, where if an error occurs on my node, an email is sent for only the first error and all remaining errors/notifications of the same type are muted. Or, if emails are too annoying, something on the dashboard maybe. Even if we can’t fix it at the moment, it would be nice to see on the dashboard that there is an error.
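
The "email only for the first error of a type" idea could be sketched like this. It is a minimal Python illustration, not an existing Storj feature: the `FirstErrorNotifier` class, the `send_email` callback and the `reset` method are all hypothetical names.

```python
from typing import Callable, Set


class FirstErrorNotifier:
    """Emails the first error of each type and mutes the repeats."""

    def __init__(self, send_email: Callable[[str, str], None]) -> None:
        self._send_email = send_email
        self._muted: Set[str] = set()

    def report(self, error_type: str, message: str) -> bool:
        """Return True if an email was sent, False if the type is muted."""
        if error_type in self._muted:
            return False
        self._muted.add(error_type)
        self._send_email(error_type, message)
        return True

    def reset(self, error_type: str) -> None:
        """Unmute a type, for example once the operator has fixed the issue."""
        self._muted.discard(error_type)
```

With this, a node hitting the same ‘listing orders’ error thousands of times would trigger one email, and a different error type would still get its own first notification.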

Side note: I’ve been checking my unsent orders folder and it’s down to 10 to 15 files with the new update (v1.16.1) for now. Thanks for the fix.