Data remains from satellite.stefan-benten.de satellite

Hi,

I was migrating my storage node between hard disks and noticed that there is still data on the disk from the satellite.stefan-benten.de satellite.

du -sh  data/storage/blobs/abforhuxbzyd35blusvrifvdwmfx4hmocsva4vmpp3rgqaaaaaaa
210M    data/storage/blobs/abforhuxbzyd35blusvrifvdwmfx4hmocsva4vmpp3rgqaaaaaaa

Will this data get deleted now that the satellite has shut down?

Thanks!

Was your node offline for some time before the shutdown of the satellite?

Thanks for getting back to me.

Looking at my logs, I see about 4 periods of 4-12h each over the last 4 months where it might have been down.

I think it’s inevitable that some data remains on pretty much all nodes, especially considering that the bloom filters used by garbage collection don’t clean up everything in one go. It will probably be an insignificant amount for most nodes though. It’s just under 3GB on my node (which I consider insignificant compared to how much was stored by this satellite). Downtime has been very low on my end over the past few months, so that’s not the cause, but there was some cleanup of older zombie segments for which some garbage may have stuck around when the satellite shut down.

It would be nice to get some guidance on what to do. But for now, I’m just leaving it unless told otherwise. I have more than enough free space anyway. And what’s 3GB out of 28TB of total available space?

I wonder if it’s possible to mark everything irrelevant during directory traversal and move it to the garbage folder, as this is literally garbage. SNOs should be allowed to manually delete data only from the garbage folder, to make sure Storj hasn’t marked custom scripts or copies of config.yaml as irrelevant.

SNOs are not supposed to delete the garbage folder themselves!

Let’s assume the bloom filter that hit your node was missing some bits and accidentally caused valid files to be marked as garbage.
If you then delete the data directly, we have no way to recover it for you. If we no longer cared about those files, we would not need the garbage folder at all.

Those files end up in the trash folder, and I was referring to the garbage folder. I do know, and also recommend to others, not to delete any data manually unless strictly instructed by a Storjling. I suggested manual deletion only for the data inside the garbage folder.

Could you please explain which files are sent to the garbage folder?

I believe that is an old folder that was once meant for garbage but is no longer used, kind of like the old blob folder some nodes might still have. As with everything, don’t remove anything unless specifically instructed by a Storjling.

There are many folders/files from the old (alpha+beta) days, which is why I suggested something to tidy up the whole folder.

I thought so too, but this made me think otherwise. I also have older blob and garbage folders. Blob has 6GB of files :frowning:

eventually i’m sure they will figure out a way to completely get rid of data from dead satellites, but as you say it’s really insignificant, saw one guy complaining of 250mb left by the stefanbenten sat…

personally i haven’t bothered checking because… whats the point and i don’t like the idea of starting to delete stuff manually, even tho the data seems irrelevant one can’t really know if it will adversely affect the node long term… maybe when it finally starts to want to clean it, if the files are not there it could affect what happens to the databases… ofc most likely the databases are cleaned and only what was missed is left…

does kinda make me curious how much is left on my node… xD

not sure if this shows everything left… but when i run this i get 13MB xD
seems like it was doing some rather extensive garbage collection right when i shut it down to finish my migration… only spotted it after tho… so it will have to wait until it’s up again on the new pool

du -sh /temptank/storj/storage/blobs/abforhuxbzyd35blusvrifvdwmfx4hmocsva4vmpp3rgqaaaaaaa
13M /temptank/storj/storage/blobs/abforhuxbzyd35blusvrifvdwmfx4hmocsva4vmpp3rgqaaaaaaa
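For anyone who wants to check every satellite folder in one go instead of running du per folder, here is a minimal Python sketch. The function names are made up for illustration, and it only assumes the layout shown in the du commands in this thread (one folder per satellite under storage/blobs). It sums apparent file sizes, so the numbers can differ a little from what du -sh reports, and it only reads, it never deletes anything.

```python
import os
import sys
from pathlib import Path


def dir_size_bytes(root: Path) -> int:
    """Sum the sizes of all files below root (apparent size, not disk blocks)."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            try:
                total += os.lstat(os.path.join(dirpath, name)).st_size
            except OSError:
                # A running node may remove a piece while we are walking.
                pass
    return total


def sizes_per_satellite(blobs_dir: Path) -> dict[str, int]:
    """Map each folder directly under blobs_dir to its total size in bytes."""
    return {
        entry.name: dir_size_bytes(entry)
        for entry in sorted(blobs_dir.iterdir())
        if entry.is_dir()
    }


# Hypothetical usage: python blob_sizes.py /path/to/storage/blobs
if __name__ == "__main__" and len(sys.argv) > 1:
    for name, size in sizes_per_satellite(Path(sys.argv[1])).items():
        print(f"{size / 1e6:12.1f} MB  {name}")
```

Running it every now and then and comparing the numbers would show whether a folder for a dead satellite stays flat while the others keep changing.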

I agree this is a very small percentage of the ~7.3TB of data my node has. For me the bigger question is: will there be data in the other satellites’ folders that does not get deleted, and will it get bigger over time?

nothing you will need to worry about, also storj will eventually be better at cleaning up after the satellites… but it’s almost impossible to avoid getting data on a drive that isn’t used for anything…
like say if you damage your database… then it might lack a few files when restored, how would it keep track of those… at least until its programming is fixed to somehow clean up after even those rare cases.

i just migrated my node, and then i launched it incorrectly because it was running when i started rsync and i had forgotten, so the database was malformed… so i had to shut it down again… rsync it again… and didn’t ask it to delete the files it had gotten while running… so in theory those will not be in the old database… but all the other files would be, and thus a few minutes of uptime wouldn’t really take much space…

but i would rather leave the files than ask rsync to delete them, because then eventually it would most likely cause failed audits…
i got a 14tb node, but the older a node gets the more lost files will be around… but the better storj will also get at removing them…

and really harddisk space is evolving fast enough that the amount of space lost, is and will always be so little that cleaning it up is… like… for my 7 month node it was 13mb… for brightsilence his was 3 gb i think he said, and his node is significantly older than mine and about the same size.

so it seems maybe storj already got better at cleaning it up… really we are talking about maybe 2 or 3gb a year max… on a 14tb node, sure it’s space… but it’s like nothing, it will maybe cost you $1 over 10 years… so is it worth worrying over…

@stefanbenten when will you announce what the new purpose of your satellite is?

No, satellites that are still online do regular garbage collection. They will send a bloom filter to the node that the node can use to do cleanup. Anything that doesn’t match the bloom filter will get cleaned up by the node. This will include any garbage data. The data only remains for @stefanbenten’s satellite because it’s no longer sending bloom filters, so the node never goes through that cleanup procedure.
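To make the mechanism a bit more concrete, here is a toy sketch in Python. It is not Storj’s actual code: the BloomFilter class, the collect_garbage function, the hash choice and the sizes are all made up for illustration. The satellite adds every piece it still knows about to the filter, and the node cleans up whatever does not match.

```python
import hashlib


class BloomFilter:
    """Toy bloom filter: no false negatives, occasional false positives."""

    def __init__(self, size_bits: int, num_hashes: int) -> None:
        self.size_bits = size_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((size_bits + 7) // 8)

    def _positions(self, item: bytes):
        # Derive num_hashes bit positions by salting SHA-256 with an index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(i.to_bytes(4, "big") + item).digest()
            yield int.from_bytes(digest[:8], "big") % self.size_bits

    def add(self, item: bytes) -> None:
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item: bytes) -> bool:
        return all(
            self.bits[pos // 8] & (1 << (pos % 8))
            for pos in self._positions(item)
        )


def collect_garbage(stored_pieces: set[bytes], keep_filter: BloomFilter) -> set[bytes]:
    """Return the piece IDs a node would clean up: those not matching the filter."""
    return {piece for piece in stored_pieces if piece not in keep_filter}


# Satellite side (hypothetical piece IDs): build a filter of pieces still in use.
live = {b"piece-1", b"piece-2", b"piece-3"}
keep = BloomFilter(size_bits=1 << 16, num_hashes=4)
for piece_id in live:
    keep.add(piece_id)

# Node side: it also holds pieces the satellite no longer knows about.
garbage = {b"zombie-1", b"zombie-2"}
to_clean = collect_garbage(live | garbage, keep)
```

A piece the satellite added always matches, so live data is never selected for cleanup. Garbage can occasionally match by coincidence (a false positive), which is why one pass does not necessarily catch everything. And if no filter ever arrives, as with a satellite that has shut down, collect_garbage simply never runs for that satellite’s folder.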

Thanks for explaining! I need to improve my understanding of the internal workings of Storj.

I see that, since this can’t happen on other satellites, the small amount of data from Stefan’s satellite is insignificant. I only spotted this because I was migrating to a bigger disk, so it will become an even smaller fraction of the data over time :grinning:

What about a system bloom filter which at least deletes the satellite directory after a satellite has been shut down?

My Stefan Benten directory has not been used since 3 September and the satellite does not exist anymore, so what is the point of not getting rid of that directory?

Could not agree more. It could also collect anything left over after a (failed) GE.
