Disk usage discrepancy?

You need to wait until the gc-filewalker successfully finishes for each satellite.
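To check that in the logs on a Docker setup, you can filter for the gc-filewalker start/finish lines. This is a sketch: the sample log lines below are hypothetical stand-ins, and the container name `storagenode` is an assumption; adjust both to your setup.

```shell
# Demo of the filter on sample log lines (hypothetical format);
# on a real node, pipe `docker logs storagenode 2>&1` instead of printf.
printf '%s\n' \
  '2024-05-01T10:00:00Z INFO lazyfilewalker.gc-filewalker.subprocess started' \
  '2024-05-01T10:00:01Z INFO piecestore uploaded' \
  '2024-05-01T10:42:07Z INFO lazyfilewalker.gc-filewalker.subprocess finished' |
  grep 'gc-filewalker' | grep -E 'started|finished'
```

The filter keeps only the first and last lines; a "finished" line per satellite means that satellite's scan completed.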

Sorry for the late reply. I did not do that much, as it was not bothering me too much, but it has piqued my interest now.

And I do know my XFS RAID 10 setup is not at 100%, because the command du --apparent-size --si -s storage/blobs has already been running for a day and a half and still hasn't finished. :frowning:

I am running a Docker setup with the filewalker disabled at startup, since that also takes 2.5 days in total. I guess the du command will take about the same amount of time.

But reading and writing files runs at normal speed, so I guess my RAID setup has some issues with processing metadata. I will try to fix this with a new server at the end of this year.

I also use the script ReneSmeekes-storj_success_rate/successrate.sh at 7c3f6cb12574c49a294204024fc07cb241bf96ff · ifraixedes/ReneSmeekes-storj_success_rate · GitHub, which reports that all is good. My download and upload success rates are about 98%, so that looks fine.

But what I do not understand is why the 'Average disk space used this month' lags behind the actual used storage shown on the right side.

From what I have read, the satellites calculate the average disk space used. And I can imagine that this causes a difference when the disk has 4K sectors and 3K files are written: the satellites just sum up the actual bytes, while the disk sums up full sectors at 4 KB each, which can be more.
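The rounding effect described above can be sketched with some arithmetic (the numbers here are hypothetical; the 4096-byte block size is an assumption about the filesystem):

```shell
# A file's logical size vs. the space it occupies on a filesystem
# with 4096-byte blocks: allocation is rounded up to whole blocks.
apparent=3000   # bytes the satellite accounts for
block=4096      # assumed filesystem block size
allocated=$(( ((apparent + block - 1) / block) * block ))
echo "apparent=$apparent allocated=$allocated"   # allocated=4096
```

Across millions of small pieces, that per-file rounding overhead adds up, which is one plausible contributor to the gap between the satellite's byte count and the on-disk usage.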

I would also expect trash storage to be included in the average calculation.

According to the logs, all my SQLite databases are OK. No malformed-database error messages.

So is there a way to figure out where the difference between average disk usage and total disk usage comes from?
If I missed other topics about this, sorry; post the links and I will follow up there. And thanks for the feedback.

I believe this has nothing to do with the calculations you refer to. It's just a wrong setup for bloom filters, and your node doesn't delete all the pieces that the satellites tell it to. But this is already being worked on and is starting to correct itself.
Just wait. And I don't know if you need to enable the filewalker for this to help with the cleanup and correct the discrepancies.
I still don't understand whether it is needed or not. In practice it looks useless to me, but… I don't know. In a month I can tell you more. I have had nodes with FW on and FW off for a year now. If they all get average used space to match the real used space, then I'm pretty much convinced that the FW is useless.
And if you decide to enable the FW, you can check the last posts of the Tuning the filewalker thread for my testing results. I have more coming.


I had it off on some nodes, which resulted in wrong disk space being displayed and reported to the satellites. I have one node that insists on having 1 TB of trash while du reports much less.

I don't understand why this is happening. You run the FW once at the beginning, let's say in the first month after an update, and then the node counts all the ins and outs anyway. So why does it need the FW regularly? It means there's a bug somewhere that makes the node count wrong.

How do you think it counts the ins and outs?

By updating the db, I guess? Because it doesn't run the FW daily to display the numbers correctly. And yet, after 6 months of no FW, the dashboard shows correct numbers. So, it counts…

I have made a big mistake! This drive is in fact exFAT. I have no idea how that happened, as the other drives are not exFAT. I guess I need to migrate all the files over to a temp drive, format the drive as NTFS, and then move the files back… :confused:

Thanks a lot, Alexey… it's entirely my own fault.


Hi @Alexey, I added this as you mentioned, and I can't see any evidence that the filewalker has been running for over 5 days.
Anything else I should add to config.yaml?

You need to search for "gc-filewalker" and "started|finished" in your logs to see whether the filewalker has successfully finished its scans for each satellite.
If you still see an error for gc-filewalker, please show it.

Can you please paste the PowerShell commands I need to run to obtain those outputs?

There is already a fix on the way. Please search the forum before making duplicate posts.

1 Like

To capture errors:

sls "gc-filewalker" "C:\Program Files\Storj\Storage Node\storagenode.log" | sls "failed|error"

To check the filewalker's progress (it should complete for all trusted satellites):

sls "gc-filewalker" "C:\Program Files\Storj\Storage Node\storagenode.log" | sls "started|finished"
1 Like

Both commands have no output whatsoever.

Did you switch the log level from info? Or did you delete the log file?

No. The log level is "info".
I do delete the log file every Sunday at 3 pm GMT, which means the commands I ran searched through one week of log entries.

Perhaps you simply haven't caught the GC filewalker at work then - it runs only once or twice per week.

Side question: as implemented today, the garbage collection process does not update the total used space; it only checks whether pieces match the received bloom filter, sending non-matching pieces to trash. To my understanding, it "walks" over all the files on the node anyway.

Is this down to a design choice or an implementation impossibility?

Yeah, but are your graphs normative in order to judge whether there is a glitch or a bug? :wink:

The point I'm making: many of us apparently haven't seen this happening before, so it's quite presumptuous, in my opinion, to judge it a non-problem just because you haven't confirmed the issue, or to categorize it as a different problem than it likely is. At least some explanation would be helpful in that case, so as not to make some SNOs feel alienated.

So again:

  • Why did the us1 satellite not report for over 24 hours? Is this normal now and then, and we just didn't notice it before? Or was there a special reason (a restart of the satellite or something)?
  • According to this source, the tallies are done every hour. So why this glitch (or whatever we should call it)?

There are several filewalkers: