Disk usage discrepancy?

I am seeing the same thing on my node; it's about 40 TB and running v1.105.4. I see "ERROR lazyfilewalker.used-space-filewalker failed to start subprocess {"Process": "storagenode", "satelliteID": "***", "error": "context canceled"}", and only one of the satellites has finished. When I run the info command it says it failed to load the identity; I have attached a screenshot. I have confirmed that the container is able to access all of the identity files.

EDIT: My drives are part of an mdadm array; they have been running 24/7 for months with no reported disk errors.

EDIT: If I include the config and identity directory options in the command, it executes successfully.
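In case it helps anyone hitting the same identity error, this is roughly what the working command looks like; the container name and mount paths below are the usual docker defaults (an assumption), so adjust them to your own setup:

# sketch only, assumes the default /app/config and /app/identity mounts
docker exec -it storagenode ./storagenode info --config-dir /app/config --identity-dir /app/identity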

Just set the lazy FW off. It will let the FW run the old way.
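For reference, the option in question is pieces.enable-lazy-filewalker; a minimal sketch of the config.yaml setting, assuming a standard setup:

# config.yaml: disable the lazy filewalker so the scan runs the old (non-lazy) way
pieces.enable-lazy-filewalker: false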


I will try it, then write back whether it helps and how fast.

Interesting that after a restart this node got some free space back, but not from the trash amount. It looks like TTL data only shows up as freed on the node after a restart (still just a theory).

I started my node with both:
storage2.piece-scan-on-startup=true
pieces.enable-lazy-filewalker=true
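(If it helps anyone on docker: the same options can be appended after the image name instead of editing config.yaml. A rough sketch, with the usual container/image names assumed and the other mounts/flags omitted:)

docker run -d --name storagenode ... storjlabs/storagenode:latest \
  --storage2.piece-scan-on-startup=true \
  --pieces.enable-lazy-filewalker=true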

Using htop, I can see it has finished scanning the blobs folders. However, the usage according to the graph and df -h remains the same. Using --si puts me at 20 TB used.
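(Side note on the units: df -h reports binary units (TiB) while df --si reports decimal units (TB), which alone accounts for roughly a 10% difference at this scale. The mount point below is just a placeholder:)

df -h   /mnt/storagenode   # binary units (TiB)
df --si /mnt/storagenode   # decimal units (TB)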

I do get these kinds of warnings, but I don't think they're related:
WARN console:service unable to get Satellite URL {“Process”: “storagenode”, “Satellite ID”: “12tRQrMTWUWwzwGh18i7Fqs67kmdhH9t6aToeiwbo5mfS2rUmo”, “error”: “console: trust: satellite is untrusted”, “errorVerbose”: “console: trust: satellite is untrusted\n\tstorj.io/storj/storagenode/trust.init:29\n\truntime.doInit1:6740\n\truntime.doInit:6707\n\truntime.main:249”

Had these warnings as well. Followed this guide: How To Forget Untrusted Satellites

Turns out 2 old satellites were still using 0.5 TB. Great.
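(For anyone finding this later: the guide boils down to running the storagenode forget-satellite command against the untrusted satellite IDs. A rough sketch from memory of how it looks on a docker node; double-check the linked guide for the exact flags before running it:)

# sketch only, verify against the guide; removes data of satellites no longer in the trust list
docker exec -it storagenode ./storagenode forget-satellite --all-untrusted --config-dir config --identity-dir identity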

Stop using lazy mode. How many times do I have to keep telling you guys? But you don't listen, and then complain about problems.

I think I'm patient, but that SNOs have to handle the deletion of decommissioned satellites manually is a joke… why can't the SNO software do this itself?
Same joke with the undeleted data, which seems to stay there forever even though TTL data should be deleted. Same joke with trash and the filewalkers…

But let's see, I hope they will fix all this :wink:

One bug and it might wipe the wrong satellite. That was the reason we didn't implement something that could potentially destroy the entire network with just one update.


You're not seeing things; of course you're spot on with your understanding of your NTFS volume. I'm guessing 64k clusters?

Quick explanation of your node history vs. now
By my recollection, my average file size has been falling: from a peak of about 700 KB in 2023, down to roughly 500 KB in late 2023 (the death of the 125 GB free tier), and after the free-tier data was recently killed off, all the way down to about 250 KB in early 2024. The current test data is perhaps a 125-135 KB distribution. You've got at least a 32 TB disk there, ergo a 16 KB or greater cluster size, probably 64 KB. Your overhead now is over 16% (37 TB on disk for ~32 TB of data), likely a quarter of how efficient it used to be. Going forward, with this new workload, maybe go back to 16 or 32 KB clusters.

There are no bloom filters for nodes your size, so I'd also suggest, especially if it's true there's a bug in deleting TTL data, that you go back to 16 TB nodes, maybe even 4 KB clusters, maybe even with compression on; that's your only option to try it. Personally I use 32 KB, it's a decent match. If you examine the recent 'test' data, it's polarized between 3 KB and 2 MB files; imagine an abundance of 3 KB pieces sitting in 64 KB clusters in this new reality. Multiple smaller nodes would also sequence data better, inflow/outflow wise, without buggy, restart-prone filewalker runs that take unhealthy lengths of time.
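To put rough numbers on that cluster overhead (illustrative piece sizes only, nothing measured from a real node), each piece occupies ceil(size / cluster) * cluster bytes on disk:

# quick awk sketch: 64 KiB clusters vs. example piece sizes of 3 KB, 128 KB, 500 KB
awk 'BEGIN {
  cluster = 65536
  split("3072 128000 512000", sizes, " ")
  for (i in sizes) {
    s = sizes[i]
    used = int((s + cluster - 1) / cluster) * cluster
    printf "piece %6d B -> %6d B on disk (%.0f%% overhead)\n", s, used, (used - s) * 100 / s
  }
}'

The tiny 3 KB pieces are the real killer on 64 KB clusters (over 20x their size on disk), while the larger pieces barely notice.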

a quick 2 cents for you.


Like that one unfortunate SNO did?

Rename first → 30 days grace period → delete.
That would be my idea of automatic deletion with a safety net.

Even with lazy mode disabled, the issue persists. The graph in the web UI is showing only a third of the storage that is actually on disk.

Did the used-space filewalker finish its job?


It completed after roughly 8 hours:
lazyfilewalker.used-space-filewalker.subprocess used-space-filewalker completed {“Process”: “storagenode”, “satelliteID”: “12L9ZFwhzVpuEKMUNUqkaTLGzwY9G24tbiigLiXpmZWKwmcNDDs”, “piecesContentSize”: 2065679422720, “Process”: “storagenode”, “piecesTotal”: 2066841298688}

EDIT: I just noticed this is a single satellite. The next one started after it. I'll remain patient.

Guys, has anyone found a fix for this?

Make sure all the filewalkers finish successfully, then restart the node (the latter part was necessary for me but may not be for you). No other solution is known.
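A rough way to check whether they've all finished is to search the logs for the used-space-filewalker lines, e.g. on a docker node (container name assumed, adjust to your setup; on Windows you'd search storagenode.log for the same strings):

docker logs storagenode 2>&1 | grep used-space-filewalker | grep -E "completed|failed" | tail -n 20

One "completed" line per satellite means the scan has finished everywhere.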


I think a hard refresh on the storagenode / multinode dashboard would do the trick as well.

How do I check the status of the filewalkers? I'm running on Windows.
Or how do I make them re-run?


oh wait, shoot, is that the normal lazy behavior, one filewalker pass per satellite, so I need to wait for four of them?

Man it’s already taken 3 days for the first one, this is gonna be rough.