Sudden Disqualifications, while operating 24/7 (with minimal downtime)

So:

1. If I parse all pieces from the logs and find an upload action for one of them, will that be enough?

2. If I remove all of this satellite's files, can I start storing files on this satellite like a new node?

3. Is it best practice to store all logs from all nodes forever?

We need to see the history: uploaded, downloaded, something happened, download failed.
I want to know what that "something happened" was. Was it a hardware issue, a software issue, or wrong actions?

No.

No, but it can be helpful in case of issues. If you decide to store logs, then you can install logrotate, if it's not installed yet, and configure it to remove old logs after a while, see

Hi, thank you for the further instructions.

This is the output of the command that you sent me:

35434197:2021-10-08T13:33:26.015+0300    INFO    piecestore      download started {"PieceID": "FYMAHSRJEJTN54U4SYK6MLUDTC6V5KNVZ443CPQN2AL2U6P7264A", "Satellite ID": "12L9ZFwhzVpuEKMUNUqkaTLGzwY9G24tbiigLiXpmZWKwmcNDDs", "Action": "GET_REPAIR"}
35434199:2021-10-08T13:33:26.147+0300    ERROR   piecestore      download failed {"PieceID": "FYMAHSRJEJTN54U4SYK6MLUDTC6V5KNVZ443CPQN2AL2U6P7264A", "Satellite ID": "12L9ZFwhzVpuEKMUNUqkaTLGzwY9G24tbiigLiXpmZWKwmcNDDs", "Action": "GET_REPAIR", "error": "file does not exist", "errorVerbose": "file does not exist\n\tstorj.io/common/rpc/rpcstatus.Wrap:73\n\tstorj.io/storj/storagenode/piecestore.(*Endpoint).Download:534\n\tstorj.io/common/pb.DRPCPiecestoreDescription.Method.func2:228\n\tstorj.io/drpc/drpcmux.(*Mux).HandleRPC:33\n\tstorj.io/common/rpc/rpctracing.(*Handler).HandleRPC:58\n\tstorj.io/drpc/drpcserver.(*Server).handleRPC:104\n\tstorj.io/drpc/drpcserver.(*Server).ServeOne:60\n\tstorj.io/drpc/drpcserver.(*Server).Serve.func2:97\n\tstorj.io/drpc/drpcctx.(*Tracker).track:52"}
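Lines like these carry a JSON payload after the message text, so they can be machine-parsed rather than read by eye. A minimal Python sketch, assuming the payload always starts at the first `{` on the line (the leading `NNNNN:` is just the match offset from the search command):

```python
import json

def parse_log_line(line):
    # The JSON payload starts at the first '{'; everything before it is
    # the match offset, timestamp, level, and subsystem/message text.
    brace = line.index("{")
    return line[:brace].rstrip(), json.loads(line[brace:])

# Shortened sample in the format of the excerpts above.
sample = ('35434199:2021-10-08T13:33:26.147+0300    ERROR   piecestore      '
          'download failed{"PieceID": "FYMAHSRJEJTN54U4SYK6MLUDTC6V5KNVZ443CPQN2AL2U6P7264A", '
          '"Action": "GET_REPAIR", "error": "file does not exist"}')
prefix, fields = parse_log_line(sample)
print(fields["Action"])  # GET_REPAIR
```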

So you do not have the history either. Have you reinstalled the storagenode recently?
Or perhaps deleted the logs?
What is the oldest date in your log?

cat -First 2 "C:\Program Files\Storj\Storage Node\storagenode.log"

Thank you for your reply.

If I decide to recreate the node on this HDD, what is the best way to start?
As I understand it, first of all I need to perform a graceful exit and then create a new identity?
And if I start a graceful exit on one node, it doesn't affect the other nodes?

P.S. I checked all 513 pieces and not a single record includes "upload started".

Best regards.

Yes, if you want to recreate the node, it's better to perform a Graceful Exit, but the new node will start from scratch and it will take at least 9 months to reach the same state.
However, you cannot exit from the satellites that have disqualified your node.
The graceful exit will not affect other nodes, unless your router is unable to handle the load. In that case the graceful exit may fail too and your node will be disqualified on that satellite.

Your current node will be paid by the remaining satellites until graceful exit or disqualification, so it might be better to leave the node running instead.

Actually, the over 9.6 GB log file reaches back to early May this year, as can be seen from the results:

2021-05-10T08:38:43.173+0300    INFO    Configuration loaded    {"Location": "C:\\Program Files\\Storj\\Storage Node\\config.yaml"}
2021-05-10T08:38:43.245+0300    INFO    Operator email  {"Address": "******@*********.com"}

Sadly, because of the size of the log file, I tend to delete it every 6 months. I should probably try to parse it into smaller, more organized files instead, for cases like this.

What would be the best option on Windows for, say, slicing the log file on a weekly basis?

Oh, and let me know if there is anything else I could provide to help solve what this is about.

Yes, I know. I archive logs when they reach 2 GB and delete the source logs.
I wrote a PowerShell script for that: GitHub - AlexeyALeonov/storagenode_win_log_rotate: Windows GUI storagenode log rotation script (not compatible with logrotate)
If the logs grow greater than 2 GB, it stops the service/container, archives the logs, removes the sources, and then starts the services/containers back.

You can also use logrotate for Windows.
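For the weekly slicing question, the split itself is also small enough to script. A minimal Python sketch (it runs on Windows too), assuming every regular log line starts with a `YYYY-MM-DD` timestamp, with continuation lines such as stack traces kept with the preceding line; the output file name pattern is my own choice:

```python
from datetime import date
from pathlib import Path

def weekly_split(log_path, out_dir):
    # Append each line to a per-ISO-week file, e.g. storagenode-2021-W40.log.
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    current, handle = None, None
    with Path(log_path).open(encoding="utf-8", errors="replace") as log:
        for line in log:
            try:
                day = date(int(line[:4]), int(line[5:7]), int(line[8:10]))
                week = "{}-W{:02d}".format(*day.isocalendar()[:2])
            except ValueError:
                week = current  # no leading timestamp: keep with previous line
            if week != current:
                if handle:
                    handle.close()
                handle = (out / f"storagenode-{week}.log").open("a", encoding="utf-8")
                current = week
            if handle:
                handle.write(line)
    if handle:
        handle.close()
```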

Thank you, that helps a lot to keep the logs more organized :+1:

Have you reinstalled storagenode recently?

Since you mentioned it: around May I needed to reformat my Storj drive, as the NTFS filesystem kept constantly corrupting under the number of files that Storj stores on it, and I also found one bad sector in the process. I transferred all the Storj files (6 TB back then) to another drive with GoodSync, and followed steps 3-5 as recommended here. Since I didn't have a single drive big enough for all the data, I needed to divide the databases between 3 smaller HDDs, which is why GoodSync sped up the process. Sadly, I couldn't run the storagenode during the reformat period and was only able to start it again after well over 24 hours of downtime. The audit scores were dangerously low, around 80%, as was the online percentage, since I had tried out the most performant way to move the files before and, earlier that week, had completely re-installed Windows 10. I bought the Windows 10 Enterprise edition for its security features compared to the Pro version and its official ReFS support.

I reformatted to ReFS, as at least a couple of fellows here found that it helped the HDD process the millions of small files better and didn't bog down like NTFS had done for them.

Other than that, I have only updated the storagenode with the .msi installer whenever there was a new update.

I'm also currently trying to parse more information about the specific dates when the audit score crashed, and the day before that. If I succeed, I'll save the results as .log files and put a link here.

Edit: I tried your script, but PowerShell won't let it run on my system. Do you know where to find the latest safe build of logrotate for Windows? There seem to be a bunch of different download locations.

The audit score never falls because of downtime. It can only drop because of lost pieces.

You need to relax the execution policy to at least RemoteSigned in PowerShell with Administrator rights:

Set-ExecutionPolicy -ExecutionPolicy RemoteSigned

If it still does not allow the script to execute, you can set the policy to Unrestricted:

Set-ExecutionPolicy -ExecutionPolicy Unrestricted

You can get logrotate for Windows from LogRotateWin / Wiki / LogRotate.

Is that required / helpful for Linux (RPI4), too?

It's not required, but it is helpful. Fortunately, logrotate is a native tool on Linux: install it and place a configuration.

I did not rotate it daily, though.
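For example, a sketch of a logrotate configuration; the log path, rotation cadence, and retention count below are assumptions to adjust for your own setup:

```
# /etc/logrotate.d/storagenode -- example only; path and limits are assumptions
/var/log/storagenode.log {
    weekly
    rotate 8
    compress
    missingok
    notifempty
    copytruncate
}
```

With copytruncate the node keeps writing to the same file handle, so the service does not need a restart on each rotation.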


One more question:
Does the data from a disqualified node need to be removed manually?
Thanks.

It will not be removed automatically.