Upload "Remote Address" use docker Client public IP and not "ADDRESS=" endpoint

No, my logs don’t go back that far. Current logs start on Sep 4th.

So, at the current reported average piece size of 328 kB, that would be around 144 GB, or 0.05 USD at a 1.5 USD/TB/month rate assuming that it would survive a week on average.
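
(To spell out the arithmetic: 144 GB at 328 kB per piece corresponds to roughly 440,000 pieces, and storing 0.144 TB for a week at 1.5 USD/TB/month works out to about 0.144 × 1.5 × 7/30 ≈ 0.05 USD.)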

Frankly, I’m not sure I’d bother, especially if GC would clean it up anyway with less I/O involved. On the other hand, if we had an observation suggesting that these files somehow survive GC, that would be worrisome.


Good news first: the piece is no longer in the blobs folder. I don’t know where it has gone or what removed it.

The extent of the problem is purely random: larger pieces, more satellites, more uploads, and it scales accordingly.
Why not move a piece that we know we don’t need to the trash immediately when the log line of the failed upload gets written, instead of filling the nodes up with garbage? It is my understanding that moving does not cost I/O. Of course, deleting it immediately would be better, but if it were moved to trash, we could at least be sure that it will be deleted after 7 days.
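Roughly what I have in mind, as a sketch only (the directory layout, the file extension, and the moveToTrash helper here are made up for illustration; the actual storagenode code will differ). A move within the same filesystem is a single rename call and copies no data:

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// moveToTrash renames a piece file from blobs/ into trash/ on the same
// filesystem, so only directory metadata changes and no data is copied.
// Paths and naming are illustrative, not the actual storagenode layout.
func moveToTrash(storageDir, satelliteID, pieceID string) error {
	src := filepath.Join(storageDir, "blobs", satelliteID, pieceID+".sj1")
	dstDir := filepath.Join(storageDir, "trash", satelliteID)
	if err := os.MkdirAll(dstDir, 0o755); err != nil {
		return err
	}
	// os.Rename is a single rename(2) syscall when source and destination
	// are on the same filesystem.
	return os.Rename(src, filepath.Join(dstDir, pieceID+".sj1"))
}

func main() {
	// Hypothetical call right after the "upload failed / lost race" log line.
	if err := moveToTrash("/app/config/storage", "satellite-id", "piece-id"); err != nil {
		fmt.Println("move to trash failed:", err)
	}
}
```

Whether even this rename is cheap enough at the scale of hundreds of thousands of pieces is the open question.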
BTW, that is not the only garbage nodes get filled up with. There are also the unpaid trash and the temp folder:

The latter doesn’t even get cleaned up by the software at all.
I would certainly prefer to see the storagenode software be less messy and space-wasting than it is today.

Assuming ext4, and assuming that the source and target directory pages are already in cache, which I think is reasonable here, that’s two writes to directories + a write to the journal.

Multiplying 3 by 439561 and dividing by 250 IOPS, that’s 1.5 hours of busywork for the HDD.
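
(Spelled out: 3 × 439,561 = 1,318,683 extra writes; 1,318,683 ÷ 250 IOPS ≈ 5,275 seconds, or roughly 1.5 hours.)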

Immediate removal is worse: a write to the source directory, removal of an inode, marking data blocks as free in the block bitmap, and maybe (though unlikely for storage nodes, as files are usually not fragmented much) removal of a separate extent tree.

GC makes it better because some of these writes can be batched, making a single write cover many files. And it is done in a low-priority thread, as it should be. Though the current implementation of that low-priority thread is suboptimal, so I’m not sure whether these gains are realized :confused:


That’s an interesting calculation.

But what writes are you referring to here? Isn’t a move just a rename, so all that has to be changed is metadata?

But even if 3 more operations were required, the question is whether this is relevant at all. How many I/O operations have already been performed per piece, starting with writing out buffers to temp files as they arrive, moving the file from temp to blobs, etc.?
I can hardly believe this would increase the stress on the drive significantly.

You need to write to the source directory metadata that the file is no longer there, then write to the destination directory metadata that the file is there.

I wrote down my guesswork here. Assuming an upload failed due to a lost race, but was still fast enough to be moved to the blobs directory, we’d skip the database update and the orders file update; neither is synced, so they are likely coalesced across multiple uploads.

And I do, because the numbers say so. 1.5 hours is non-trivial, and 3 additional seeks on top of a best-case 10 seeks is 30% more.