Return of used data

Further, this is the report from a run of chkdsk:

Stage 1: Examining basic file system structure ...
  128893184 file records processed.
File verification completed.
  151879 large file records processed.
  0 bad file records processed.

Stage 2: Examining file name linkage ...
  4 reparse records processed.
  128911258 index entries processed.
Index verification completed.
  0 unindexed files scanned.
  0 unindexed files recovered to lost and found.
  4 reparse records processed.

Stage 3: Examining security descriptors ...
Security descriptor verification completed.
  9037 data files processed.

Windows has scanned the file system and found no problems.
No further action is required.

  22890989 MB total disk space.
  19346498 MB in 128732185 files.
  40976888 KB in 9039 indexes.
         0 KB in bad sectors.
 129340999 KB in use by the system.
     65536 KB occupied by the log file.
   3378165 MB available on disk.

      8192 bytes in each allocation unit.
2930046719 total allocation units on disk.
 432405176 allocation units available on disk.

I see you’re using Windows…
Use Notepad++; it will index a .log file rapidly, so you can simply page through huge log files. It has fast search/replace, syntax highlighting for .yaml and other programmatic formats, etc., etc., etc.

I am using it. It has limitations when the log file is that big, too. Either way, the point is moot if you read the current status.

It should contain an Unrecoverable error then; something must be crashing it.
Please search for it:

sls error "$env:ProgramFiles\Storj\Storage Node\storagenode.log" | sls "fatal|unrecoverable" | select -last 10

Your dashboard won't show correct values until the "used-space-filewalker" successfully finishes the scan for all trusted satellites and updates the database without any errors.
If you do not have FATAL or Unrecoverable errors, then the dashboard should be available.
If you have only filewalker errors so far, then you need to disable the lazy mode, save the config, and restart the node.
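
A minimal sketch of that change, assuming the default Windows install paths and that the key is not already present in config.yaml (the flag name pieces.enable-lazy-filewalker is an assumption here, verify it against your own config before relying on it):

# Run from an elevated PowerShell.
$config = "$env:ProgramFiles\Storj\Storage Node\config.yaml"
# Append the lazy-filewalker switch (edit the existing line instead if it is already there):
Add-Content -Path $config -Value 'pieces.enable-lazy-filewalker: false'
Restart-Service storagenode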
You may track the progress on the debug port with the /mon/ps method, or with Resource Monitor by checking which folder in blobs is currently being processed.
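
For example, a sketch of the debug-port query (the address 127.0.0.1:5999 is an assumption; by default the node picks a random debug port unless you set debug.addr in config.yaml):

# Assumes debug.addr: 127.0.0.1:5999 in config.yaml; dumps currently running spans,
# including any active filewalker.
(Invoke-WebRequest -UseBasicParsing -Uri 'http://127.0.0.1:5999/mon/ps').Content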

That command results in the following:

C:\Program Files\Storj\Storage Node\storagenode.log:392471:2024-07-16T20:05:47-04:00    FATAL   Unrecoverable error     {"error": "piecestore monitor: timed out after 1m0s while verifying writability of storage directory", "errorVerbose": "piecestore monitor: timed out after 1m0s while verifying writability of storage directory\n\tstorj.io/storj/storagenode/monitor.(*Service).Run.func2.1:175\n\tstorj.io/common/sync2.(*Cycle).Run:160\n\tstorj.io/storj/storagenode/monitor.(*Service).Run.func2:164\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:78"}

I do not have that entry in my config file.

Bingo!
You should go there:

The solution is to optimize your disk subsystem (check & fix, defragmentation, move the databases to the system drive/SSD, add an SSD as a storage tier with PowerShell…), or to increase the timeout for that exact check (accepting that the node may be disqualified because of the slowness/hardware issues of this disk, especially for the readability checks…).

Then you may add it, save the config, and restart the node, either from the Services applet or from an elevated PowerShell:

Restart-Service storagenode
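
The entry itself could look like this (a sketch; the key name storage2.monitor.verify-dir-writable-timeout and the default install path are assumptions, verify them against your own config.yaml):

# Append the timeout override, then restart as shown above (elevated PowerShell).
$config = "$env:ProgramFiles\Storj\Storage Node\config.yaml"
# Raise the writeability check timeout from the default 1m0s to 1m30s:
Add-Content -Path $config -Value 'storage2.monitor.verify-dir-writable-timeout: 1m30s'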

P.S. The writeability check only discovers that your node would lose most upload requests because it's too slow to save pieces… However, if you were forced to increase the timeout for this check above 5m0s, that alone would indicate a much bigger issue with your hardware than just crashes… You need to check this disk (S.M.A.R.T. in particular); it could be dying.
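
A quick way to sanity-check the disk from PowerShell (a sketch; Get-StorageReliabilityCounter uses the built-in Storage module and may report little for USB-attached drives — a full S.M.A.R.T. readout needs a vendor tool or smartctl):

# List disks with their health status as Windows sees them.
Get-PhysicalDisk | Select-Object FriendlyName, HealthStatus, OperationalStatus
# Reliability counters: rising read/write error totals suggest a dying disk.
Get-PhysicalDisk | Get-StorageReliabilityCounter |
    Select-Object DeviceId, Temperature, ReadErrorsTotal, WriteErrorsTotal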

That just reset my dashboard stats.

How would I know if I would be forced to increase the timeout?
I do not have any issue with the hardware.

I am having issues with STORJ and, on top of that, issues with getting help for the STORJ issues, with a small number of exceptions.

Uhm, you already have these errors. Seems like a good reason to me.

You may or you may not. Given the timeout errors you are having, your drive is either (1) too slow to begin with, (2) suffering hardware issues, or (3) overused. Is each of your nodes running on different hardware and a different drive?

First and foremost, I think everyone wants you to have a nice experience with STORJ. On the other hand, people have been helpful, while you seem to dismiss the possibility that any of your drives are faulty or too slow. How will we ever get to a solution then? (a question we all know the answer to). You have to do something about the timeout errors or this node will stay problematic; potential solutions have been provided more than once already.


Of course, because the used-space-filewalkers didn't update the databases with the current usage.
When a filewalker is interrupted, it will not resume (all past progress will be lost); it will start from scratch.

So please try to fix the underlying issues first (so that the node does not crash).

Ok.

I do not. (1) Then the issue would not have appeared as it did. (2) Already addressed. (3) What does that mean? All on their own separate hardware.

Certainly does not appear that way, but I would hope that is the case. I am not dismissing the possibility - I have eliminated the possibility.

Please, what are they?

Of course to what?

How do I prevent it from being interrupted?

I don’t know what the underlying issues are (I do not know why it seems to be crashing - if that is what it is doing).

If you have an error like the one above, it results in the node crashing (to try to save it from disqualification if your hardware has started to fail. The node doesn't know whether the issue is intermittent or permanent, so it stops to allow the Operator to decide what to do next).
So you need to fix the disk or the configuration so that the writeability checks do not fail.

  1. Try to optimize the filesystem.
  2. If this doesn't help and you still have crashes because of failed writeability/readability checks, then you may increase the related timeout by 30s, save the config, and restart the node.