I’m still not seeing any of my nodes with these symptoms, but they are all running from docker compose - if you are running them by “hand” it could make a difference, as compose removes the container with the “docker compose stop” command and recreates it with “docker compose up -d”?
Could be a difference worth taking into consideration when trying to figure this out.
If you don’t use docker compose down, but docker compose stop, then the container will not be removed, it will be stopped as with docker stop.
There could be just a timing issue, i.e. if you modify files before restarting the container, it may reset the filesystem state, because a bind mount doesn’t expect modifications from outside of the container.
So, doing docker stop, then modifying, then docker start would probably be enough to flush the filesystem before the modification. However, I prefer not only to stop the container but also to remove it, to make sure that everything is flushed and reset and there is no unexpected behavior.
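In practice that sequence would look something like this (the container name storagenode is just an example; use your own):
docker stop -t 300 storagenode   # give the node time to shut down gracefully
docker rm storagenode            # remove the container so the bind mount is fully released
# ... modify the files in the storage location on the host ...
# then recreate the container with your usual docker run command for storjlabs/storagenode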
I believe it’s not the code, but docker.
You confirmed that:
By the way, maybe the database reset happened for the same reason. So, you need to not only stop the container but also remove it before touching a storage location outside of the container.
But why is it not consistent then?
Only the .migrate_chore files had their timestamp rewritten every hour. The .migrate files that are in the same directory and had also been changed did not.
I do not know why the caching system of the Docker layered filesystem chooses these files. What I do know is that a bind mount in Docker containers sometimes produces such behavior.
So, it’s recommended to stop and remove the container before modifying files in the storage location.
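With docker compose, the equivalent would be (assuming the node is the only service in that compose project):
docker compose down      # stops and removes the container, like docker stop + docker rm
# ... modify the files in the storage location on the host ...
docker compose up -d     # recreates the container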
I have seen some questions around moving the hashtable to an SSD.
I want to start with a fair warning.
No safeguards. Node might start with an empty hashtable. Disqualification would be the end result. Even if you detect this issue in time there is no way to merge hashtables. Please add extra steps to minimize this risk.
Performance gain overrated. Use memtable when possible. Memtable has better performance.
Ok, now that this is done, here is my way to do it:
First let’s do some additional safety steps. Let’s stop all incoming uploads:
sed -i 's/storage.allocated-disk-space: .*/storage.allocated-disk-space: 0 B/g' /mnt/sn1/storagenode1/storagenode/config.yaml
sudo systemctl restart storagenode1
This should set the allocated space to 0 bytes. Wait until the node has contacted all satellites in order to tell them that this node is full and doesn’t want any further uploads. Wait until upload activity dies down.
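One way to watch the activity die down, assuming the node logs go to journald under the storagenode1 unit (if log.output in config.yaml points to a file, tail that file instead):
journalctl -u storagenode1 -f | grep -i upload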
If you see an error message about allocated space and don’t know how to avoid it, you might want to stop running these steps. If you do want to continue, I will give you only one hint: storage2.monitor.minimum-disk-space: 0 B
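If you truly understand the risk and want to proceed, one way to apply that hint would be to mirror the sed approach above, assuming the setting is not yet present in your config.yaml:
echo 'storage2.monitor.minimum-disk-space: 0 B' >> /mnt/sn1/storagenode1/storagenode/config.yaml
sudo systemctl restart storagenode1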
Still unable to avoid the allocated space safety check? Please don’t continue. Seriously, this is a high-risk operation and you should practice with something else first.
You might also want to disable any piecestore-to-hashstore migration if you haven’t finished the migration already. This step is not included in my script.
Danger zone. Do not continue unless you understand the risks. Start with a small test node first.
Run your safety checks. I believe I double-checked all 8 directories to make sure all the hashtable metadata had been copied successfully. Last but not least, I deleted the s0/meta and s1/meta folders from the HDD and restarted the node one more time to see if downloads were still working. In case of any mistakes I would stop the node as quickly as possible to avoid disqualification and take my time to find out what my mistake was. If there is no sign of any mistake and the node looks healthy, I would set the allocated space value back, restart the node and check for upload errors.
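As an illustration of such a safety check, here is a sketch with purely hypothetical paths for the HDD original and the SSD copy (the layout on your node may differ):
# compare each satellite's s0/meta and s1/meta between the HDD original and the SSD copy
for sat in /mnt/hdd/storagenode/storage/hashstore/*/; do
  for s in s0 s1; do
    diff -r "${sat}${s}/meta" "/mnt/ssd/hashstore/$(basename "$sat")/$s/meta" \
      && echo "OK: $(basename "$sat")/$s" \
      || echo "MISMATCH: $(basename "$sat")/$s"
  done
done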
I was running one test node like this for a few days, just to find out that there is no performance difference to my other memtable nodes. In the end I moved this node back to the HDD. Keeping everything on the HDD allows me to swap out HDDs and get them back online on a different machine. With the hashtable on an SSD this would get a bit more complicated. Why risk that if memtable on HDD already gives me the best performance?
Just 2 days ago I had a power outage and the SSD with all the DBs became unreadable, so I lost the databases of all 18 nodes. If those had been the hashtables, I would have lost all the nodes.
During the migration I had a lot of wrong node size problems: it showed overused space, but on the HDD there were 500 GB free. After the migration ended it was easy to fix, or it even fixed itself, but that was 2 weeks without ingress.
I wonder whether my node has enough RAM if I activate ‘memtbl’.
There are other tasks and services running on my server, so if at any given moment those other services use more RAM, I suppose what memtbl keeps in RAM gets dropped?
Does this mean that everything has to be rebuilt or put back into ‘memtbl’?
How does this work? Is it done gradually? Do the parts that are read more often remain in RAM?
Wouldn’t it be more useful to write what is in RAM to the hard drive, i.e. a hybrid system that writes to RAM and, if there is not enough RAM available, switches to using hashtbl? That is, if this isn’t already the case.
memtbl is a confusing name; it’s not in-memory only. It’s a hybrid system where you have an on-disk structure and an in-memory index. The on-disk structure is more sequential, while the in-memory structure works like an index and helps to find the interesting part of the persisted data.
On the other hand, hashtbl is more like a fully persisted version, which means each update requires a disk seek to the right place. (Also: the same bytes can be rewritten multiple times, which is less SSD-friendly on paper. We monitor our SSDs and haven’t noticed any problems or quick degradation so far.)
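If you want to keep an eye on SSD wear yourself, smartmontools shows the relevant counters; the device names below are just examples:
sudo smartctl -A /dev/sda      # SATA SSD: look at attributes like Wear_Leveling_Count or Media_Wearout_Indicator
sudo smartctl -a /dev/nvme0    # NVMe SSD: look at "Percentage Used" and "Data Units Written"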
So now we are building systems for storagenodes?
SSD, RAM, filesystem optimisations… isn’t it supposed to run on what we have, on the systems we already use daily for our usual activities?