Disk usage discrepancy?

It's right there: remove the checkmark and hit OK (it may take a long time).

How much space is used by the index?

A rule of thumb is that the index will be less than 10 percent of the size of the indexed files.

Source:
Search indexing in Windows 10: FAQ - Microsoft Support.

2 Likes

Thank you very much. I will check that and let you know :muscle::muscle:

Already scanned pieces should be moved to the trash by the retain process.

Please do not delete the databases or their parts if they are not corrupted.
If you want to lose the stats and history and start the scans from scratch, you may do so, but you need to proceed differently:
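(A minimal sketch of that alternative, not official instructions: it assumes a docker node named storagenode with its data at /mnt/storj/storagenode and the databases in the default location; adapt the names and paths to your setup.)

  # stop the node so the databases are no longer in use
  docker stop -t 300 storagenode
  # move (do not delete) the database files out of the storage directory, so they can be put back if needed
  mkdir -p /mnt/storj/db-backup
  mv /mnt/storj/storagenode/storage/*.db* /mnt/storj/db-backup/
  # start the node again; it recreates empty databases, losing dashboard stats and history but not the stored pieces
  docker start storagenode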

Hi @Alexey / All,
it has been running for a few days and I'm a bit confused by the different states on the nodes; see the logs below.

On some nodes I can see 4 “Prepared” and 2 “Moved” entries. I suppose this is the expected behavior and it is still running, and that it will finish once I get 4 “Moved” log lines.

On the master node (Local) I only have 1 “Prepared” and 1 “Moved”… Why?

Node 3 seems not to have started at all, although the Docker configuration is the same: it was started from the same docker-compose file. Why?

When will the space be reclaimed once the files are moved?

Best regards

Node 1 - Local
2024-02-18T05:13:06Z INFO retain Prepared to run a Retain request. {“process”: “storagenode”, “Created Before”: “2024-02-13T17:59:59Z”, “Filter Size”: 3374704, “Satellite ID”: “12L9ZFwhzVpuEKMUNUqkaTLGzwY9G24tbiigLiXpmZWKwmcNDDs”}
2024-02-18T13:25:10Z INFO retain Moved pieces to trash during retain {“process”: “storagenode”, “Deleted pieces”: 239700, “Failed to delete”: 0, “Pieces failed to read”: 0, “Pieces count”: 5921470, “Satellite ID”: “12L9ZFwhzVpuEKMUNUqkaTLGzwY9G24tbiigLiXpmZWKwmcNDDs”, “Duration”: “8h12m3.41778396s”, “Retain Status”: “enabled”}

Node 2
2024-02-16T14:36:20Z INFO retain Prepared to run a Retain request. {“process”: “storagenode”, “Created Before”: “2024-02-12T17:59:59Z”, “Filter Size”: 310044, “Satellite ID”: “121RTSDpyNZVcEU84Ticf2L1ntiuUimbWgfATz21tuvgk3vzoA6”}
2024-02-16T16:23:22Z INFO retain Moved pieces to trash during retain {“process”: “storagenode”, “Deleted pieces”: 17272, “Failed to delete”: 0, “Pieces failed to read”: 0, “Pieces count”: 539495, “Satellite ID”: “121RTSDpyNZVcEU84Ticf2L1ntiuUimbWgfATz21tuvgk3vzoA6”, “Duration”: “1h47m1.982820251s”, “Retain Status”: “enabled”}
2024-02-16T20:40:44Z INFO retain Prepared to run a Retain request. {“process”: “storagenode”, “Created Before”: “2024-02-07T17:59:59Z”, “Filter Size”: 4100003, “Satellite ID”: “12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S”}
2024-02-17T01:34:45Z INFO retain Prepared to run a Retain request. {“process”: “storagenode”, “Created Before”: “2024-02-13T17:59:42Z”, “Filter Size”: 49782, “Satellite ID”: “1wFTAgs9DP5RSnCqKV1eLf6N9wtk4EAtmN5DpSxcs8EjT69tGE”}
2024-02-17T01:44:21Z INFO retain Moved pieces to trash during retain {“process”: “storagenode”, “Deleted pieces”: 368, “Failed to delete”: 0, “Pieces failed to read”: 0, “Pieces count”: 83766, “Satellite ID”: “1wFTAgs9DP5RSnCqKV1eLf6N9wtk4EAtmN5DpSxcs8EjT69tGE”, “Duration”: “9m36.020607763s”, “Retain Status”: “enabled”}
2024-02-17T23:30:35Z INFO retain Prepared to run a Retain request. {“process”: “storagenode”, “Created Before”: “2024-02-13T17:59:59Z”, “Filter Size”: 1392674, “Satellite ID”: “12L9ZFwhzVpuEKMUNUqkaTLGzwY9G24tbiigLiXpmZWKwmcNDDs”}

Node 3:
Nothing logged; it seems the retain process has not started.

Node 4:
2024-02-16T09:28:38Z INFO retain Prepared to run a Retain request. {“process”: “storagenode”, “Created Before”: “2024-02-07T17:59:59Z”, “Filter Size”: 4100003, “Satellite ID”: “12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S”}
2024-02-16T18:52:38Z INFO retain Prepared to run a Retain request. {“process”: “storagenode”, “Created Before”: “2024-02-12T17:59:59Z”, “Filter Size”: 435782, “Satellite ID”: “121RTSDpyNZVcEU84Ticf2L1ntiuUimbWgfATz21tuvgk3vzoA6”}
2024-02-16T19:46:10Z INFO retain Moved pieces to trash during retain {“process”: “storagenode”, “Deleted pieces”: 13169, “Failed to delete”: 0, “Pieces failed to read”: 0, “Pieces count”: 744658, “Satellite ID”: “121RTSDpyNZVcEU84Ticf2L1ntiuUimbWgfATz21tuvgk3vzoA6”, “Duration”: “53m31.6848825s”, “Retain Status”: “enabled”}
2024-02-17T06:10:51Z INFO retain Prepared to run a Retain request. {“process”: “storagenode”, “Created Before”: “2024-02-13T17:59:42Z”, “Filter Size”: 66393, “Satellite ID”: “1wFTAgs9DP5RSnCqKV1eLf6N9wtk4EAtmN5DpSxcs8EjT69tGE”}
2024-02-17T06:21:43Z INFO retain Moved pieces to trash during retain {“process”: “storagenode”, “Deleted pieces”: 340, “Failed to delete”: 0, “Pieces failed to read”: 0, “Pieces count”: 111427, “Satellite ID”: “1wFTAgs9DP5RSnCqKV1eLf6N9wtk4EAtmN5DpSxcs8EjT69tGE”, “Duration”: “10m51.572616618s”, “Retain Status”: “enabled”}
2024-02-18T05:09:59Z INFO retain Prepared to run a Retain request. {“process”: “storagenode”, “Created Before”: “2024-02-13T17:59:59Z”, “Filter Size”: 2313224, “Satellite ID”: “12L9ZFwhzVpuEKMUNUqkaTLGzwY9G24tbiigLiXpmZWKwmcNDDs”}
2024-02-18T12:28:45Z INFO retain Moved pieces to trash during retain {“process”: “storagenode”, “Deleted pieces”: 204184, “Failed to delete”: 0, “Pieces failed to read”: 0, “Pieces count”: 4089826, “Satellite ID”: “12L9ZFwhzVpuEKMUNUqkaTLGzwY9G24tbiigLiXpmZWKwmcNDDs”, “Duration”: “7h18m46.617034996s”, “Retain Status”: “enabled”}

Node 5:
2024-02-16T12:41:11Z INFO retain Prepared to run a Retain request. {“process”: “storagenode”, “Created Before”: “2024-02-12T17:59:59Z”, “Filter Size”: 414835, “Satellite ID”: “121RTSDpyNZVcEU84Ticf2L1ntiuUimbWgfATz21tuvgk3vzoA6”}
2024-02-16T15:14:08Z INFO retain Moved pieces to trash during retain {“process”: “storagenode”, “num deleted”: 12334, “Retain Status”: “enabled”}
2024-02-16T20:25:15Z INFO retain Prepared to run a Retain request. {“process”: “storagenode”, “Created Before”: “2024-02-07T17:59:59Z”, “Filter Size”: 4100003, “Satellite ID”: “12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S”}
2024-02-17T05:06:21Z INFO retain Prepared to run a Retain request. {“process”: “storagenode”, “Created Before”: “2024-02-13T17:59:42Z”, “Filter Size”: 65095, “Satellite ID”: “1wFTAgs9DP5RSnCqKV1eLf6N9wtk4EAtmN5DpSxcs8EjT69tGE”}
2024-02-17T06:55:52Z INFO retain Moved pieces to trash during retain {“process”: “storagenode”, “num deleted”: 611, “Retain Status”: “enabled”}
2024-02-18T06:42:44Z INFO retain Prepared to run a Retain request. {“process”: “storagenode”, “Created Before”: “2024-02-13T17:59:59Z”, “Filter Size”: 2118367, “Satellite ID”: “12L9ZFwhzVpuEKMUNUqkaTLGzwY9G24tbiigLiXpmZWKwmcNDDs”}

They all run at different times; it depends on when you restarted the node and on its NodeID, and they also use different disks with different speeds.
They just need time. Within a week they should all have at least started their filewalkers (when they finish depends on whether they were restarted, how slow the disks are, and so on).

They are moved to the trash when GC has finished (first phase: find the pieces to delete; next phase: delete them, i.e. move them to the trash). There is no moving back.
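If you want to see where each node currently stands, a quick check (a sketch, assuming docker with the default logging and a container named storagenode; adjust for your setup) is to filter the log for retain events:

  docker logs storagenode 2>&1 | grep retain | grep -E "Prepared|Moved"

Each satellite should eventually produce a “Moved pieces to trash” line matching its “Prepared to run a Retain request” line.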

1 Like

Would this also be a possible optimization? Moving pieces to the trash during the walk, so that an interruption doesn't render the whole walk useless?

1 Like

Or at least mark the pieces that need deletion during the walk, and then let one of the walkers move them to the trash, even if the GC is interrupted.
On my slowest nodes, which share 1 GB of RAM, I rebooted the machine for maintenance during a GC run that had been going for 1-2 days. At least they should not be hammered for nothing.

1 Like

There is a plan to make the bloom filter persistent; @clement is working on it:

But I agree, deleting the files during the walk should be considered. The IO pressure would be more balanced with this approach (instead of a bigger spike at the end).

7 Likes

It's not even about that. Right now a memory-starved system has to bring the directory entries and inodes into memory twice: first during the scan, then again for the actual deletion. By the time the deletion phase starts, the metadata caches have been trashed, making the process slower than necessary.

The old pre-lazy implementation was better in this regard…

2 Likes

That plan is in a very preliminary stage, I see, as it still needs an estimation. Meaning this plan could even be trashed during the next walk? Or at the end of it?

That would explain the bad cache hit rates I observed, which went up after disabling the lazy filewalker.

It's more important that a sprint is assigned to it. If there is a sprint, it has already been decided that it is supposed to be done within two weeks (though that is not always possible). The estimation is not always added at the beginning…

So it’s definitely on the short-list… :crossed_fingers:

3 Likes

Thank you for the explanation; it was also kind of a joke :wink:

Hi STORJ,

As I understand it, there are two numbers when it comes to maximum storage: the HDD's real disk space and the disk space number I provide (when running the docker command). I don't think these two can ever be a perfect match. What would happen if:

  1. The HDD's disk space is 5 TB, I provide 8 TB, and my node is approaching 5 TB?
  2. The HDD's disk space is 8 TB, I provide 5 TB, and my node is approaching 5 TB (does it continue past 5 TB)?

I would really appreciate it if you could shed some light on this.

Thank you very much!

Hello @kocoten1992,
Welcome to the forum!

This is related to filewalkers.
Do you have any errors related to FATAL, walk and error|failed, retain and error|failed, or piece:trash and error|failed?
If you do, you are likely on a VM, and you need to:

  1. Stop the node
  2. Perform a disk check and fix any errors
  3. Run a defragmentation, if you use Windows.
  4. Re-enable automatic defragmentation for this drive, if you disabled it and you use Windows.
  5. If you use NTFS under Linux or exFAT under any OS, it's time to back up your data, reformat the drive to a native FS for your OS (NTFS for Windows, ext4 for Linux), and restore the data.

If you are not on a VM, the steps are the same, and they really can help. If you use a VM, well, it's better NOT to use a VM and to run docker directly on the host OS instead.
If you use any network filesystem such as, but not limited to, NFS/SMB/CIFS/SSHFS/etc.: storagenode is not compatible with any of them. It may work until it doesn't, and it seems you have reached that point… So migrate your node directly to a file server/NAS, or use iSCSI.

Check your logs after the changes are made; you should not see any FATAL errors at any time.
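A quick way to check for those errors (a sketch, assuming docker and a container named storagenode; if your log is redirected to a file, grep that file instead):

  docker logs storagenode 2>&1 | grep FATAL
  docker logs storagenode 2>&1 | grep -E "walk|retain|piece:trash" | grep -iE "error|failed"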

1 Like

Satellites stop sending data when the occupied storage space is equal to or greater than the allocated storage space, with a delay of 1 hour at most.
That is why you need to leave at least 10% of the disk free (unallocated) for overhead.
So, for the situations you mentioned:

  1. Your node will crash and you will get disqualified, because it will try to store more than 5 TB of data.
  2. It will stop accepting pieces at 5 TB plus some GB (the overhead).

The satellites don't know how much space you have on your HDD; only you provide that limit.
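For reference, the allocation is just the STORAGE value passed to the container. A sketch (all values and paths are illustrative; here roughly 90% of a 5 TB drive):

  docker run -d --restart unless-stopped --stop-timeout 300 \
    -p 28967:28967/tcp -p 28967:28967/udp -p 127.0.0.1:14002:14002 \
    -e WALLET="0x..." -e EMAIL="you@example.com" -e ADDRESS="your.host.example:28967" \
    -e STORAGE="4.5TB" \
    --mount type=bind,source=/mnt/storj/identity,destination=/app/identity \
    --mount type=bind,source=/mnt/storj/storagenode,destination=/app/config \
    --name storagenode storjlabs/storagenode:latest

The node fills up to the STORAGE value (plus the small overhead mentioned above); the physical disk size never reaches the satellites.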

2 Likes

Thank you very much, this is the information I was looking for <3.

But one thing doesn't feel right: the 10% overhead. Why do I need 10% free storage when the delay is only 1 hour? For example, if I have a 16 TB HDD, 10% would be 1.6 TB, but in 1 hour the satellites will send me at most about 20 GB (realistically more like 5 GB), so why is a 10% overhead needed?

Wouldn't a smaller value in the config make more sense, 1% maybe? And do you know whether STORJ calculates in terabytes or tebibytes?

Thank you very much!

The 10% is more or less a universal requirement for rotational drives in use.

It's there to account for overhead, fragmentation, and whatever else…

…long story short, it prevents you from getting stuck with an unmanageable drive.

If a big (> 10 TB) node drive is nearly full, you can fine-tune it to keep around 1 TB free.
Example:
my 20 TB drive is 18.1 TB formatted, and I set 16.3 TB for the node from the start.

The safer one of the two :slight_smile:

PS: even in SSDs there is a "safety space", but it's "hidden" from the user, and you can only make it bigger.
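A quick way to arrive at a number like that (a sketch, assuming GNU df and the disk mounted at /mnt/storj; the 0.9 factor is the 10% rule of thumb from above):

  # formatted size in decimal units
  df -H --output=size /mnt/storj
  # e.g. 18.1 TB formatted  ->  18.1 * 0.9 ≈ 16.3 TB allocated to the node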

2 Likes