Reported trash size stuck at 7 TB although the trash actually occupies only 0.8 TB on the disk

Setup: Windows 10, bare metal, node age 57 months, v1.104.5
Dashboard reports

Used: 6.71 TB
Free: 12.24 GB
Trash: 7.28 TB
Overused: 0 B

But in fact, the folder e:\storj\storage\trash\ takes only 0.8 TB on disk. The disk is 14 TB, there is no other data on it, it serves solely this single Storj node, and there is no other node on the PC. Suspension and audit scores are all fine at 100%.
I wouldn't care, but the node is not taking any more incoming traffic since it thinks the disk is full, although there is 6.4 TB of free space on the disk. I first noticed it some two weeks ago, and it has not changed since then. Before that, there was about 12 TB used and less than 1 TB of trash.
A month ago I manually deleted leftovers from the decommissioned satellites. Now the content of e:\storj\storage\trash\ is only the following 4 folders:

  - pmw6tvzmf2jv6giyybmmvl4o2ahqlaldsaeha4yx74n5aaaaaaaa
  - qstuylguhrn2ozjv4h2c6xpxykd622gtgurhql2k7k75wqaaaaaa
  - ukfu6bhbboxilvt7jrwlqk7y2tapb5d2r2tsmj2sjxvw5qaaaaaa
  - v4weeab67sbgvnbwd5z7tweqsqqun7qox2agpbxy44mqqaaaaaaa

and the file .trash-uses-day-dirs-indicator
All these folders contain only subfolders named 2024-05-27 and later, nothing else, nothing older.
Troubleshooting done: I checked the log; no errors except lost races on uploads/downloads, and the lazy filewalker completed successfully. I tried to set storage.allocated-disk-space: 16 TB to force it to use 2 more TB of the free space, but the node dashboard still shows Total Space 14 TB. It apparently checks the physical disk size and does not allow me to set more; whatever number higher than 14 I enter, it does not follow.
I had storage2.piece-scan-on-startup: false active for some time, but a week ago I removed it and nothing changed.
So, the question is: how can I reset the reported trash amount back to reality so the node uses the free space again?



Hello @xsys,
Welcome back!

You need to allow the used-space filewalker to finish so it can update the state in the databases.
If you didn't disable the startup scan (or if you do not know what I am talking about), it should update the databases after a restart (it may take days, depending on your system).
If you disabled it, please enable it back or comment the line out, save the config, and restart the node.
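For a Windows GUI/installer node, a minimal sketch of that could look like the following (the config path and the service name `storagenode` are the installer defaults and are assumptions about your particular setup):

```powershell
# Sketch only, assuming the default Windows installer layout.
# In "C:\Program Files\Storj\Storage Node\config.yaml" either delete the line
#   storage2.piece-scan-on-startup: false
# or change it to
#   storage2.piece-scan-on-startup: true
# then restart the service so the used-space filewalker runs on the next start:
Restart-Service storagenode
Get-Service storagenode   # confirm it came back up as "Running"
```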

Oh, that's an awesomely fast reply, thanks a lot :slight_smile:
Yes, I removed storage2.piece-scan-on-startup: false from my config almost a week ago. Is it possible that it takes this long? What message will I see in the log when it is completed?

It depends. If you use a VM or network-remote storage, it's quite possible to spend at least a week on the scan alone (and another one on the deletion…).
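If you want to watch the log for it, a rough sketch along these lines could work (the log path here is the installer default and the exact completion message differs between versions, so both are assumptions to adapt to your setup):

```powershell
# Sketch: follow the node log and filter for used-space filewalker activity.
# The path and the matched phrases are assumptions; adjust them to your node/version.
Get-Content "C:\Program Files\Storj\Storage Node\storagenode.log" -Tail 5000 -Wait |
    Select-String -Pattern "used-space-filewalker" |
    Select-String -Pattern "completed|finished|failed"
```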

OK, thanks, I will give it more time.


Well, unfortunately it seems the piece scan never ends. It has been running for a month. Since the machine reboots once a month for updates, it will never complete its job. The HDD is having a hard time at constant 100% utilization; it is the warmest HDD I have running, and from the sound and vibrations I can tell it is seeking, the heads jumping from one part of the platter to another all the time, which may significantly shorten the HDD's life. All in all, I decided to shut down the node and repurpose the disk…
From other discussions here I would say the filewalker needs to be reworked to access all those millions of tiny files in a different way, at least on Windows.
I know Storj recommends Linux, but I'm set on Windows and I'm using what I already have.

This sounds like NTFS. If so, you need to check and fix errors on this disk and perform a defragmentation as soon as possible.
The next steps are:

  1. NTFS Disable 8dot3name (see the command sketch after this list)
  2. [Solved] Win10 20GB Ram Usage - #17 by arrogantrabbit
  3. Some have reported that disabling the indexing service for the disk can improve performance a little, too
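A hedged command sketch for items 1 and 3, assuming E: is the node disk and an elevated PowerShell prompt:

```powershell
# Sketch only; run elevated, and double-check the drive letter first.
# Item 1: disable 8.3 short-name generation on the node volume:
fsutil 8dot3name set E: 1
# Related tweak often done together with it: disable last-access timestamp updates (system-wide):
fsutil behavior set disablelastaccess 1
# Item 3: untick "Allow files on this drive to have contents indexed..." in the
# drive's Properties, or stop the Windows Search service entirely:
Stop-Service WSearch
Set-Service WSearch -StartupType Disabled
```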

If you still have issues after that, then, well, you may improve things by adding an SSD as a storage tier using PowerShell, or by using PrimoCache.
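For the tiering route, a very rough sketch of a tiered Storage Space follows (all names, sizes, and the drive letter here are made up, and creating the pool wipes the disks that go into it, so treat this only as an outline to adapt, not a recipe):

```powershell
# Sketch: build a pool from the poolable disks, define an SSD and an HDD tier,
# then create a tiered NTFS volume for the node. Destroys data on the pooled disks!
$disks = Get-PhysicalDisk -CanPool $true
New-StoragePool -FriendlyName "StorjPool" -StorageSubSystemFriendlyName "Windows Storage*" -PhysicalDisks $disks
$ssd = New-StorageTier -StoragePoolFriendlyName "StorjPool" -FriendlyName "SSDTier" -MediaType SSD
$hdd = New-StorageTier -StoragePoolFriendlyName "StorjPool" -FriendlyName "HDDTier" -MediaType HDD
New-Volume -StoragePoolFriendlyName "StorjPool" -FriendlyName "Storj" -FileSystem NTFS `
    -DriveLetter E -StorageTiers $ssd, $hdd -StorageTierSizes 200GB, 11TB
```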

Do not forget to re-enable automatic defragmentation for this disk if you disabled it (it's enabled by default).
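To check that quickly, a small sketch (the drive letter is an assumption):

```powershell
# Sketch: verify the built-in scheduled optimization task is still enabled,
# and optionally run a traditional defrag pass on the node volume by hand.
Get-ScheduledTask -TaskPath "\Microsoft\Windows\Defrag\" -TaskName "ScheduledDefrag"
Optimize-Volume -DriveLetter E -Defrag -Verbose
```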

Yes, it is Windows 10 + NTFS + 32 GB RAM, and all the recommendations above were applied even before the node was installed: 8.3 names, last-access timestamps, and indexing are disabled. Moreover, I run defrag weekly (which takes days to complete and also heavily stresses the HDD), the app, the databases, and storagenode.log live on an SSD, and the disk is a 12 TB WD Gold accessed exclusively by the storagenode.

Now I only have two nodes running on a Synology DS923+ with an SSD cache; they hold less than 1 TB each so far, so no big deal yet. We will see how they perform when storing 10 TB and more.
I will give Windows one more chance when I have a spare SSD to build tiered storage.

I already have more than ten such nodes that have stopped accepting new data due to problems with unaccounted garbage.
I understand that I need to run the filewalker.

There are already more than 80 million files on each node. Running the filewalker is simply NOT POSSIBLE. I tried to run it on one node: for more than 10 days now the hard drive has been abused, and there is no end in sight.

This whole situation with the unfinished filewalker is already starting to make me extremely angry. Two months of nothing but talk about this key function, and nothing has changed.
MEANWHILE, THE NODES STOP RECEIVING DATA because it is not possible to walk 80 million files.
Developers, where are you? @littleskunk, @heunland? What kind of space expansion are you talking about?

You are doing great. Keep venting. That's fine.


Version 1.108 is arriving with some kind of cache for the used-space filewalker. Let's grit our teeth.

Folks, please make an effort to remain respectful in the wording of your comments; there is no need to descend into using inappropriate language on this forum.


It is experimental. That means even I wouldn't risk it. It also comes with some downsides: it doesn't run in combination with the lazy filewalker, and the first run will be a lot more expensive. So better not to jump on this one early; let others run some tests with it first. Once we have some numbers, I can still ask what would be needed to make it compatible with the lazy filewalker to get the best of both worlds. That might be the moment it is ready for you to use. Unless you are eager to run some tests yourself early on; in that case, I should be able to find out how to configure it.


You may also try to enable the write cache for the disk in its policy (both checkboxes) if you have a managed UPS.

It's possible. But it requires either optimizing the disk subsystem, or disabling the lazy mode and restarting the node with the scan on startup enabled (it's enabled by default). The filewalker with a normal IO priority can finish its work. Moreover, this is even required for VMs (because the host is not aware of low-IO-priority processes, so they would never get a chance).
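A sketch of the relevant config.yaml lines for that (the key names are as I recall them from recent storagenode releases, so verify them against your own config.yaml before relying on this):

```powershell
# Sketch, assuming the default Windows service name. In config.yaml set:
#   pieces.enable-lazy-filewalker: false    # run the filewalker at normal IO priority
#   storage2.piece-scan-on-startup: true    # the default, shown here for clarity
# then restart the node so the scan starts with normal priority:
Restart-Service storagenode
```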

It is enabled by default; I always have it enabled everywhere, while having no UPS anywhere (I even run some laptops with no battery inserted), and no issue has ever happened…
My observation: it uses about 2 GB of buffer/cache for writing; the first 2 GB go at insane speed, then it drops to the speed of the HDD itself. Copying a file smaller than 2 GB is instant.

But it only helps for writing: piecestore uploads become a piece of cake, yet it still doesn't help the filewalkers at all. Windows is hopeless in this case :frowning:
I'm saying that as a Windows admin since 1994 and Windows 3.11, and Windows has been making me rich ever since :innocent:

That's true, it only helps to offload uploads from the customers and free up more IOPS for the filewalkers. The alternative is to disable the lazy mode, and that's a requirement if you use a VM (because the host is not aware of the low IO priority of any process in the guest). It works especially badly for a Windows VM on VMware on a Linux host, as confirmed by many threads on the forum.