Return of used data

Further, this is the report from a run of chkdsk:

Stage 1: Examining basic file system structure ...
  128893184 file records processed.
File verification completed.
  151879 large file records processed.
  0 bad file records processed.

Stage 2: Examining file name linkage ...
  4 reparse records processed.
  128911258 index entries processed.
Index verification completed.
  0 unindexed files scanned.
  0 unindexed files recovered to lost and found.
  4 reparse records processed.

Stage 3: Examining security descriptors ...
Security descriptor verification completed.
  9037 data files processed.

Windows has scanned the file system and found no problems.
No further action is required.

  22890989 MB total disk space.
  19346498 MB in 128732185 files.
  40976888 KB in 9039 indexes.
         0 KB in bad sectors.
 129340999 KB in use by the system.
     65536 KB occupied by the log file.
   3378165 MB available on disk.

      8192 bytes in each allocation unit.
2930046719 total allocation units on disk.
 432405176 allocation units available on disk.

I see you’re using Windows…
Use Notepad++; it will index a .log file rapidly, so you can simply page through huge log files. It has fast search/replace, syntax highlighting for .yaml and other programmatic formats, etc., etc., etc.

I am using it. It has limitations when the log file is that big, too. Either way, the point is moot if you read the current status.

It should contain an Unrecoverable error then; something must be crashing it.
Please search for it:

sls error "$env:ProgramFiles\Storj\Storage Node\storagenode.log" | sls "fatal|unrecoverable" | select -last 10

Your dashboard won't show correct values until the "used-space-filewalker" successfully finishes the scan for all trusted satellites and updates the database without any errors.
If you do not have FATAL or Unrecoverable errors, then the dashboard should be available.
If you have only filewalker errors so far, then you need to disable the lazy mode, save the config, and restart the node.
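
A minimal sketch of that change, assuming the default Windows install paths and that the key is not already present in config.yaml (the flag name pieces.enable-lazy-filewalker is an assumption here, verify it against your own config before relying on it):

# Run from an elevated PowerShell.
$config = "$env:ProgramFiles\Storj\Storage Node\config.yaml"
# Append the lazy-filewalker switch (edit the existing line instead if it is already there):
Add-Content -Path $config -Value 'pieces.enable-lazy-filewalker: false'
Restart-Service storagenode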
You may track the progress on the debug port with the /mon/ps method, or with Resource Monitor by checking which folder in blobs is currently being processed.
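
For example, a sketch of the debug-port query (the address 127.0.0.1:5999 is an assumption; by default the node picks a random debug port unless you set debug.addr in config.yaml):

# Assumes debug.addr: 127.0.0.1:5999 in config.yaml; dumps currently running spans,
# including any active filewalker.
(Invoke-WebRequest -UseBasicParsing -Uri 'http://127.0.0.1:5999/mon/ps').Content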

That command results in the following:

C:\Program Files\Storj\Storage Node\storagenode.log:392471:2024-07-16T20:05:47-04:00    FATAL   Unrecoverable error     {"error": "piecestore monitor: timed out after 1m0s while verifying writability of storage directory", "errorVerbose": "piecestore monitor: timed out after 1m0s while verifying writability of storage directory\n\tstorj.io/storj/storagenode/monitor.(*Service).Run.func2.1:175\n\tstorj.io/common/sync2.(*Cycle).Run:160\n\tstorj.io/storj/storagenode/monitor.(*Service).Run.func2:164\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:78"}

I do not have that entry in my config file.

Bingo!
You should go there:

The solution is to optimize your disk subsystem (check & fix, defragmentation, move the databases to the system drive/SSD, add an SSD as a storage tier with PowerShell…), or to increase the timeout for that exact check (accepting that the node may be disqualified because of the slowness/hardware issues of this disk, especially for the readability checks…).

Then you may add it, save the config, and restart the node, either from the Services applet or from an elevated PowerShell:

Restart-Service storagenode
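
The entry itself could look like this (a sketch; the key name storage2.monitor.verify-dir-writable-timeout and the default install path are assumptions, verify them against your own config.yaml):

# Append the timeout override, then restart as shown above (elevated PowerShell).
$config = "$env:ProgramFiles\Storj\Storage Node\config.yaml"
# Raise the writeability check timeout from the default 1m0s to 1m30s:
Add-Content -Path $config -Value 'storage2.monitor.verify-dir-writable-timeout: 1m30s'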

P.S. The writeability check only discovers that your node would lose most upload requests because it's too slow to save pieces… However, if you were forced to increase the timeout for this check above 5m0s, that alone would indicate a much bigger issue with your hardware than just crashes… You need to check this disk (S.M.A.R.T. in particular); it could be dying.
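
A quick way to sanity-check the disk from PowerShell (a sketch; Get-StorageReliabilityCounter uses the built-in Storage module and may report little for USB-attached drives — a full S.M.A.R.T. readout needs a vendor tool or smartctl):

# List disks with their health status as Windows sees them.
Get-PhysicalDisk | Select-Object FriendlyName, HealthStatus, OperationalStatus
# Reliability counters: rising read/write error totals suggest a dying disk.
Get-PhysicalDisk | Get-StorageReliabilityCounter |
    Select-Object DeviceId, Temperature, ReadErrorsTotal, WriteErrorsTotal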

That just reset my dashboard stats.

How would I know if I would be forced to increase the timeout?
I do not have any issue with the hardware.

I am having issues with STORJ and, on top of that, issues with getting help for the STORJ issues, with a small number of exceptions.

Uhm, you already have these errors. Seems like a good reason to me.

You may or you may not. Given the timeout errors you are having, your drive is either (1) too slow to begin with, (2) suffering hardware issues, or (3) overused. Is each of your nodes running on different hardware and a different drive?

First and foremost, I think everyone wants you to have a nice experience with STORJ. On the other hand, people have been helpful, while you seem to dismiss the possibility that any of your drives are faulty or too slow. How will we ever get to a solution then? (a question we all know the answer to). You have to do something about the timeout errors or this node will stay problematic; potential solutions have been provided more than once already.


Of course, because the used-space-filewalkers didn't update the databases with the current usage.
When a filewalker is interrupted, it will not resume (all past progress will be lost); it will start from scratch.

So please try to fix the underlying issues first (so that the node does not crash).

Ok.

I do not. (1) Then the issue would not have appeared as it did. (2) Already addressed. (3) What does that mean? All on their own separate hardware.

Certainly does not appear that way, but I would hope that is the case. I am not dismissing the possibility - I have eliminated the possibility.

Please, what are they?

Of course to what?

How do I prevent it from being interrupted?

I don’t know what the underlying issues are (I do not know why it seems to be crashing - if that is what it is doing).

If you have an error like the one above, it results in the node crashing (to try to save it from disqualification if your hardware has started to fail. The node doesn't know whether the issue is intermittent or permanent, so it stops to allow the Operator to decide what to do next).
So you need to fix the disk or the configuration so that the writeability checks do not fail.

  1. Try to optimize the filesystem.
  2. If this doesn't help and you still have crashes because of failed writeability/readability checks, then you may increase the related timeout by 30s, save the config, and restart the node.