High Trash usage - 27 TB - no more uploads

Hello,
I’m running 5 nodes (with different IPs) and I noticed that one has stopped receiving uploads even though it has space available, and I see a lot of Trash space on all my nodes





How is it possible that I have something like 27 TB used as Trash?
Why am I not getting uploads on this one with 3 TB free:
/STORJ5/STORJ_NFS4 11T 7.4T 3.1T 71% /STORJ

The screenshot related to it is node1; it says 0 free space, with 7.3 TB used and 5.91 TB Trash.

So, why is this huge amount of space in trash?

Thanks

Is it really NFS? It might be worth checking whether you really have that much trash on one of the nodes. Look to see whether the folders inside the four satellite folders in trash have old dates, older than 7 days.

Have you disabled the filewalkers on startup?

Hello,
why should I do that ?

My storj instances are built via docker compose with these parameters:

  • STORJ_PIECES_ENABLE_LAZY_FILEWALKER=true
  • STORJ_STORAGE2_PIECE_SCAN_ON_STARTUP=false

Best regards

@mcanto73, try docker compose down on one of the nodes and set STORJ_STORAGE2_PIECE_SCAN_ON_STARTUP=true to allow the filewalker to run at startup.
You can set it back to false afterwards
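
With docker compose, the same change is made in the compose file’s environment block and then applied by recreating the container. A hypothetical fragment (the service name and image tag are assumptions; match them to your own compose file):

```yaml
services:
  storagenode:
    image: storjlabs/storagenode:latest
    environment:
      # disable the lazy walker and force a full used-space scan at startup
      - STORJ_PIECES_ENABLE_LAZY_FILEWALKER=false
      - STORJ_STORAGE2_PIECE_SCAN_ON_STARTUP=true
```

After editing, `docker compose up -d` recreates the container with the new environment.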

There were so many bugs regarding trash in the last months, and your config didn’t help either.
To correct the problem, just set lazy to false and startup piece scan to true.
Stop, rm and start the nodes. Let the scan finish for all sats and wait a few hours before changing back or restarting the node (if you want).

1 Like

agree with snorkel’s advice. you may have missed the excitement over the summer, but there was a HUGE amount of test data that came in and filled up nodes, and it has now all been trashed. your trash should have been deleted by now.

So turn your filewalker back on (STORJ_STORAGE2_PIECE_SCAN_ON_STARTUP=true) and give it a few days to finish.

in addition, you can go into your data folders and the "trash" folders and see if you have anything older than 7 days. if you do, you could always delete it manually, but the storj node should be doing it itself. the trash deletion will also probably take multiple days; you are talking millions of files.

2 Likes
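
The manual check described above can be sketched as a small shell helper. The layout trash/&lt;satellite-id&gt;/&lt;YYYY-MM-DD&gt;/… matches recent node versions, and the example path is an assumption; adjust it to your own storage location:

```shell
# Sketch: list trash date-folders whose mtime is older than 7 days.
# A date-folder is created once by garbage collection and then left alone,
# so its mtime roughly approximates when the pieces were trashed.
stale_trash() {
    # $1 = path to the node's trash directory
    find "$1" -mindepth 2 -maxdepth 2 -type d -name '20??-??-??' -mtime +7
}

# Example (path is an assumption): stale_trash /STORJ5/storage/trash
```

If this prints nothing, everything in trash is still within the 7-day retention window and the numbers on the dashboard are the thing to investigate instead.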

Hello,
I will try to change node parameters.
What do you mean by rm?
Do you mean manually removing files from trash? How? Is there a single folder?
Can this break something? Maybe the node would still believe it has this data?

Thanks

No no no! To change a docker command parameter, you stop the container, remove the container and start it with the new command. Maybe it’s different in docker compose, I don’t use it.
You don’t delete/remove any file!

1 Like

if you go into the trash folder there are four subfolders with really long IDs, one for each storj satellite:

Satellite info (Address, ID, Blobs folder, Hex) - Node Operators / FAQ - Storj Community Forum (official)

for our purposes that doesn’t really matter.

inside each one, they will be broken down by date. they were already shuffled there by the garbage collection job (taken out of "blobs" and put into "trash").

if the date is older than seven days, it shouldn’t be there. maybe the trash-emptying job has failed or is just waaay behind.

So you shouldn’t have to delete folders older than seven days, but it won’t hurt anything if you do. And again, these may have millions of files, so a simple "rm" or Windows delete command could take many, many hours.

@snorkel means the docker rm storagenode command, to remove the previously stopped container and then run it again with all your parameters.
However, for docker compose it’s as simple as docker compose up -d, and it will do everything automatically.

Hi All,
I have deleted all trash files that are more than 7 days old.

cd /STORJ5/storage/trash
for n in $(ls) ; do echo $n; /usr/bin/find $n -type f -mtime +8 -name "*" -print0 | xargs -r0 rm -v --; done

the space moved from
11T 7.4T 3.1T 71% /STORJ → 11T 6.8T 3.7T 65% /STORJ, so more or less 0.6 TB was in the Trash

unfortunately still no uploads coming, and nothing seems to have changed in the node report (see attached file)… I restarted a lot of times; now I have disabled lazy and enabled piece scan.

Thanks

(attachments)

What is the version of your node with no uploads?

it’s 1.114.6, like the other 3 nodes. Another one is 1.115.5

for sure this is not the issue; the problem is that it is reporting 0 free space!
see my previous post

Best regards

You won’t see any change until the used-space scan finishes + 2 hours (I mean after the scan is finished, wait 2 more hours before doing anything). Restarting doesn’t do anything for the reported space. Only the scan rectifies the dashboard.

2 Likes

You need to enable the scan on startup if you disabled it (it’s enabled by default) and restart the node. The used-space-filewalker will calculate the actual usage and will update the databases.

by the way, this was a mistake:

You also deleted the newest trash, maybe almost all of it.
See

$ sudo stat /mnt/w/storagenode5/storage/trash/ukfu6bhbboxilvt7jrwlqk7y2tapb5d2r2tsmj2sjxvw5qaaaaaa/2024-11-04/3q/cexe7papfxruuyaqdckgifbkhmldnaji2bbkiidqdzc6363f4a.sj1

  File: /mnt/w/storagenode5/storage/trash/ukfu6bhbboxilvt7jrwlqk7y2tapb5d2r2tsmj2sjxvw5qaaaaaa/2024-11-04/3q/cexe7papfxruuyaqdckgifbkhmldnaji2bbkiidqdzc6363f4a.sj1
  Size: 4864            Blocks: 16         IO Block: 512    regular file
Device: 57h/87d Inode: 3659174698552822  Links: 1
Access: (0644/-rw-r--r--)  Uid: (    0/    root)   Gid: (    0/    root)
Access: 2024-10-27 14:11:33.583839900 +0300
Modify: 2024-10-27 14:11:33.583839900 +0300
Change: 2024-11-04 05:30:51.999237300 +0300
 Birth: -

You need to use -ctime, not -mtime.

And you can use the simplest script

for n in $(ls) ; do echo $n; /usr/bin/find $n -type f -ctime +8 -print -delete; done

Hello,
I’m running with these

  • STORJ_PIECES_ENABLE_LAZY_FILEWALKER=false
  • STORJ_STORAGE2_PIECE_SCAN_ON_STARTUP=true

Still no luck

It reports 0 free space and no uploads

Best regards

Did it finish the scan?

docker logs storagenode 2>&1 | grep "\sused-space" | grep -E "started|completed"

Hello,
my node has been running for 34 hours with the lazy filewalker disabled and piece scan enabled:
storjlabs/storagenode:latest "/entrypoint" 34 hours ago

These are my logs after the parameters change:
ubuntu@hpool:/STORJ_LOCAL-5/LOG$ zcat node.log.1.gz | grep "\sused-space" | grep -E "started|completed"
2024-11-09T08:16:05Z INFO pieces used-space-filewalker started {"Process": "storagenode", "Satellite ID": "12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S"}
2024-11-09T09:17:17Z INFO pieces used-space-filewalker started {"Process": "storagenode", "Satellite ID": "1wFTAgs9DP5RSnCqKV1eLf6N9wtk4EAtmN5DpSxcs8EjT69tGE"}
2024-11-09T09:17:17Z INFO pieces used-space-filewalker started {"Process": "storagenode", "Satellite ID": "12L9ZFwhzVpuEKMUNUqkaTLGzwY9G24tbiigLiXpmZWKwmcNDDs"}
2024-11-09T09:17:17Z INFO pieces used-space-filewalker started {"Process": "storagenode", "Satellite ID": "121RTSDpyNZVcEU84Ticf2L1ntiuUimbWgfATz21tuvgk3vzoA6"}
2024-11-09T09:17:25Z INFO pieces used-space-filewalker started {"Process": "storagenode", "Satellite ID": "12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S"}
2024-11-09T15:59:59Z INFO pieces used-space-filewalker started {"Process": "storagenode", "Satellite ID": "1wFTAgs9DP5RSnCqKV1eLf6N9wtk4EAtmN5DpSxcs8EjT69tGE"}
2024-11-09T15:59:59Z INFO pieces used-space-filewalker started {"Process": "storagenode", "Satellite ID": "12L9ZFwhzVpuEKMUNUqkaTLGzwY9G24tbiigLiXpmZWKwmcNDDs"}
2024-11-09T15:59:59Z INFO pieces used-space-filewalker started {"Process": "storagenode", "Satellite ID": "121RTSDpyNZVcEU84Ticf2L1ntiuUimbWgfATz21tuvgk3vzoA6"}
2024-11-09T16:00:08Z INFO pieces used-space-filewalker started {"Process": "storagenode", "Satellite ID": "12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S"}

ubuntu@hpool:/STORJ_LOCAL-5/LOG$ cat node.log | grep "\sused-space" | grep -E "started|completed"
2024-11-10T04:39:27Z INFO pieces used-space-filewalker completed {"Process": "storagenode", "Satellite ID": "12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S", "Lazy File Walker": false, "Total Pieces Size": 4924677327714, "Total Pieces Content Size": 4910494774626, "Total Pieces Count": 27700299, "Duration": "12h39m18.037612122s"}
2024-11-10T04:39:27Z INFO pieces used-space-filewalker started {"Process": "storagenode", "Satellite ID": "1wFTAgs9DP5RSnCqKV1eLf6N9wtk4EAtmN5DpSxcs8EjT69tGE"}
2024-11-10T04:45:21Z INFO pieces used-space-filewalker completed {"Process": "storagenode", "Satellite ID": "1wFTAgs9DP5RSnCqKV1eLf6N9wtk4EAtmN5DpSxcs8EjT69tGE", "Lazy File Walker": false, "Total Pieces Size": 10708280576, "Total Pieces Content Size": 10701173504, "Total Pieces Count": 13881, "Duration": "5m54.47502696s"}
2024-11-10T04:45:21Z INFO pieces used-space-filewalker started {"Process": "storagenode", "Satellite ID": "12L9ZFwhzVpuEKMUNUqkaTLGzwY9G24tbiigLiXpmZWKwmcNDDs"}
2024-11-10T13:45:17Z INFO pieces used-space-filewalker started {"Process": "storagenode", "Satellite ID": "121RTSDpyNZVcEU84Ticf2L1ntiuUimbWgfATz21tuvgk3vzoA6"}
2024-11-10T13:45:33Z INFO pieces used-space-filewalker started {"Process": "storagenode", "Satellite ID": "12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S"}
2024-11-10T13:45:41Z INFO pieces used-space-filewalker started {"Process": "storagenode", "Satellite ID": "1wFTAgs9DP5RSnCqKV1eLf6N9wtk4EAtmN5DpSxcs8EjT69tGE"}
2024-11-10T13:45:41Z INFO pieces used-space-filewalker started {"Process": "storagenode", "Satellite ID": "12L9ZFwhzVpuEKMUNUqkaTLGzwY9G24tbiigLiXpmZWKwmcNDDs"}
2024-11-10T13:45:41Z INFO pieces used-space-filewalker started {"Process": "storagenode", "Satellite ID": "121RTSDpyNZVcEU84Ticf2L1ntiuUimbWgfATz21tuvgk3vzoA6"}
2024-11-10T13:45:47Z INFO pieces used-space-filewalker started {"Process": "storagenode", "Satellite ID": "12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S"}

I can see a lot of "started" and just a couple of "completed".

Best regards

Never parse the output of ls. Never. Especially if deleting data is involved. Further reading: ParsingLs - Greg's Wiki

I thought about fixing the script for you, but it is a minefield in its entirety. Throw it away and write a much simpler one.

1 Like
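
Putting the advice from this thread together, a simpler ls-free version might look like this. The path layout and the 8-day cutoff mirror the earlier posts, and the example path is an assumption; drop -delete first to do a dry run:

```shell
# Sketch of an ls-free cleanup: let find walk the satellite subfolders
# itself, and match on ctime (when the piece was moved into trash),
# not mtime (when the piece was originally written).
purge_old_trash() {
    # $1 = path to the node's trash directory
    find "$1" -mindepth 2 -type f -ctime +8 -print -delete
}

# Example (path is an assumption): purge_old_trash /STORJ5/storage/trash
```

Because find does the traversal, there is no word splitting of file names and no xargs pipeline to get wrong, and -delete only removes what -ctime actually matched.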