HDD after moving to hashstore
Fragmentation is 99.94%
It seems to be a consequence of migrating millions of small files into thousands of large ones. Do you have an image of a node that ran on hashstore from scratch?
<flashbacks to the soothing sights and sounds of Win3.11 defragging a 250MB IDE HDD>
I totally know what I'm looking at, of course, but there might be less technically savvy people in this forum. Could you explain it in more verbose terms for them?
The file system is totally fragmented. It turns sequential file reads into random reads, and read speed drops sharply (~20 MB/s average on a modern HDD).
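To make the sequential-vs-random point concrete, here is a minimal Go sketch (my own illustration, nothing from the node code) that times reading the same large file once in order and once in shuffled chunk order; on a fragmented HDD, every "sequential" read degenerates into the second pattern. Use a file larger than RAM, or drop the page cache between passes, or caching will hide the difference.

```go
// Minimal sketch: sequential vs. random reads of the same file.
// Pass the path of a large file (ideally bigger than RAM) on the HDD.
package main

import (
	"fmt"
	"math/rand"
	"os"
	"time"
)

func main() {
	const chunk = 1 << 20 // 1 MiB per read
	f, err := os.Open(os.Args[1])
	if err != nil {
		panic(err)
	}
	defer f.Close()
	st, _ := f.Stat()
	n := int(st.Size() / chunk)
	buf := make([]byte, chunk)

	// Sequential pass: one long run of the disk head.
	start := time.Now()
	for i := 0; i < n; i++ {
		f.ReadAt(buf, int64(i)*chunk)
	}
	seq := time.Since(start)

	// Random pass: every read needs a seek, which is what a heavily
	// fragmented file forces on the drive even for "sequential" reads.
	start = time.Now()
	for _, i := range rand.Perm(n) {
		f.ReadAt(buf, int64(i)*chunk)
	}
	fmt.Printf("sequential: %v, random: %v\n", seq, time.Since(start))
}
```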
I literally have no acceptable words to describe this. I know why and how it happened; it is just one big item on the TODO list after the conversion.
I don't understand why this is such a big problem. Run defragmentation software once and you should be golden, no?
I'd still much rather have a couple of thousand 2 GB files than trillions of very small files.
I imagined this would happen, and I am pretty sure it happens on ext4 too. Smaller files, as in piecestore, are easier for the filesystem to place without fragmentation than bigger ones, as in hashstore. As I said, hashstore is a bad move; better to stick with piecestore and badger.
There is nothing you could do to prevent this, and running defrag 24/7 isn't going to help either.
It is already running; it will just take another week, one disk at a time. I have 8 of them. I can't even imagine how to move all the other 100 nodes to hashstore; it will take an enormous amount of time.
But it will bring a big advantage: once I want to move a node to a bigger HDD, the copy will take several times less time. Right now, moving the data runs at about 2-5 MB/s.
Output from e4defrag on a disk (50% filled) for a hashstore folder of US1 satellite (./s0/00 to be exact):
e4defrag 1.47.0 (5-Feb-2023)
<...>
Total/best extents 1121/11
Average size per extent 10295 KB
Fragmentation score 0
[0-30 no problem: 31-55 a little bit fragmented: 56- needs defrag]
This directory (/mnt/storj/node-1/storage/hashstore/<us satellite>/s0/00) does not need defragmentation.
Done.
It doesn't seem to be a problem for the moment.
It has 50% of its space to breathe. When it is almost full, I doubt that this score will remain the same.
For me badger does not work very well; every Windows update ends with the node failing to start because of a cache error.
I can say for sure: if your HDD has bad sectors, don't migrate to hashstore. Piecestore was fine.
Say goodbye to the old HDD that was only on life support and is now going to kill my node as it dies peacefully.
Why is that? Access to a trillion individual files can be optimized by the filesystem. Access to a trillion data pieces inside huge blobs cannot. It's just another filesystem, but worse.
Some of the hardware I use is vastly inferior to what you're running. It breaks all the time, RAID volumes degrade, and I just have a habit of moving data around for the heck of it.
The increase in file size alone is very welcome to me, because it will make my node migrations take much less time, which will significantly increase the number of tests I can run.
Did you get to the bottom of this? Perhaps bad cables? Bad RAM? It's hard to imagine any hardware made in the last 30 years "breaking all the time".
Mine is literally old garbage nobody wanted, which I got for pennies from a recycler…
A few comments here:
Time for another update: the memtable implementation will be merged soon. Here is how it will work.
There will be a flag on the storage node to switch between hashtable and memtable. The next time compact runs, it will migrate. It can migrate in both directions, so if you feel that memtable doesn't work for you, you can switch back to hashtable.
The migration doesn't rewrite LOG files. Compact does two things. First, it updates the metadata for all LOG files; let's say it puts down a few thousand trash flags across all LOG files. Then there is the LOG rewrite part for some of the LOG files. The migration will be part of the first step. This basically means that even with zero LOG files to rewrite, it will migrate all the metadata into the configured format. A rough sketch of that flow follows below.
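To make the two steps easier to follow, here is a hypothetical Go sketch of the compact flow; the types, names, and threshold are illustrative assumptions, not the actual storj/storj hashstore code.

```go
// Hypothetical sketch of the compact flow described above.
package hashstore

type Format int

const (
	Hashtable Format = iota
	Memtable
)

type LogFile struct {
	TrashBytes, TotalBytes int64
}

type Store struct {
	Logs   []LogFile
	Format Format
}

// Compact runs the two steps from the post.
func (s *Store) Compact(target Format, rewriteThreshold float64) error {
	// Step 1: metadata pass over ALL log files (e.g. writing trash
	// flags). The hashtable<->memtable migration is part of this step,
	// so it happens even when zero log files get rewritten.
	if err := s.rewriteMetadata(target); err != nil {
		return err
	}
	s.Format = target

	// Step 2: rewrite only log files with enough dead space to be
	// worth the I/O; all other log files stay untouched on disk.
	for _, lf := range s.Logs {
		if float64(lf.TrashBytes) > rewriteThreshold*float64(lf.TotalBytes) {
			if err := s.rewriteLog(lf); err != nil {
				return err
			}
		}
	}
	return nil
}

// Stubs standing in for the real metadata and log-rewrite logic.
func (s *Store) rewriteMetadata(target Format) error { return nil }
func (s *Store) rewriteLog(lf LogFile) error         { return nil }
```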
There are two hashstores per satellite namespace, so for a full migration both of them have to run compact. That might take a few days; not because compaction itself takes that long, but because of the waiting until compact gets triggered on both.
The memtable migration will create the hint files on disk that are required to rebuild the memtable on the next restart. We don't know how long the rebuild will take. Best would be to use an SSD in combination with memtable to reduce the startup penalty.
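To illustrate what such a rebuild could look like, here is a hypothetical Go sketch; the actual hint-file format is not documented in this thread, so the record layout below (an 8-byte key hash plus a 9-byte log location) is purely an assumption made for the example.

```go
// Hypothetical sketch: replay fixed-size hint records into an in-memory
// table on startup. The on-disk layout here is an assumption, not the
// real storj/storj format.
package main

import (
	"encoding/binary"
	"io"
	"os"
)

// Assumed 9-byte piece location: log file ID, offset, length.
type location struct {
	logID  uint8
	offset uint32
	length uint32
}

func rebuild(path string) (map[uint64]location, error) {
	f, err := os.Open(path)
	if err != nil {
		return nil, err
	}
	defer f.Close()

	table := make(map[uint64]location)
	rec := make([]byte, 17) // assumed: 8-byte key hash + 9-byte location
	for {
		if _, err := io.ReadFull(f, rec); err == io.EOF {
			break
		} else if err != nil {
			return nil, err
		}
		key := binary.LittleEndian.Uint64(rec[0:8])
		table[key] = location{
			logID:  rec[8],
			offset: binary.LittleEndian.Uint32(rec[9:13]),
			length: binary.LittleEndian.Uint32(rec[13:17]),
		}
	}
	return table, nil
}

func main() {}
```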
We expect the memtable to consume about 9 bytes per piece plus another 9 bytes reserved for empty entries (hashtable load factor). So for my own Orange Pi 5 with 32 GB of RAM and 8 HDDs, that would mean about 200 million pieces per drive. I can't wait to run it, but first it needs to get merged.
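A quick back-of-the-envelope check of those numbers (my arithmetic, treating the 32 GB as 32 GiB and assuming all of it went to memtables):

```go
// Sanity check of the estimate above: 9 bytes per piece + 9 bytes of
// load-factor headroom = 18 bytes per slot.
package main

import "fmt"

func main() {
	const bytesPerSlot = 9 + 9              // piece entry + empty-slot reserve
	const ramPerDrive = (32 << 30) / 8      // 32 GiB split across 8 HDDs = 4 GiB
	fmt.Println(ramPerDrive / bytesPerSlot) // 238609294, i.e. ~200 million pieces
}
```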
The current theory in terms of best performance would be (top to bottom):
^ This list is just a theory at this point. Further benchmark tests are needed to find out which setup works best in which situation. This ranking might change.
It would seem less confusing if written as "… of memory (SSD space) per 10 TiB of space"
&
Still needs about 1 GiB of memory (HDD space) per 10 TiB of space
Example: let's consider a 10 TiB node.
Node with enough RAM: 10 GiB of memory (RAM) for a 10 TiB node
Node with less RAM but with an SSD: 1 GiB of SSD space for a 10 TiB node
Node with less RAM and no SSD: 1 GiB of HDD space for a 10 TiB node
Did I interpret it correctly?
After migrating to hashstore, my node shows 2x the amount of data; the real amount of data and the amount reported by the satellite are more or less equal.