I migrated a node from Windows to Linux and now the docker container eats up the RAM (16GB) in about an hour. From reading other posts, it appears I have a DB file that’s probably been corrupted. The last time I tried fixing a DB file I failed miserably and lost the node. Any suggestions before I attempt this again?
In most cases this indicates a problem with the disk subsystem: if it cannot keep up, the process uses more RAM to handle the load.
A broken DB will not lead to disqualification; you would just lose some stats, nothing more.
Hi Alexey, thanks for your continuing support! I did copy all the data from the Windows machine across the network to my Ubuntu box, which has a hardware R10 subsystem and an ext4 filesystem. After the original problem, I’m thinking that maybe there was some funny stuff being sorted out, because now all is working fine. If I restart the container, RAM usage climbs to 12GB, and then after the node runs for a while it goes back down to about 3GB. The node size is a little over 6TB.
R10 seems to be RAID10. That could explain the slow disk operations. The node runs a filewalker process to calculate usage and fill the databases if they were empty or inconsistent, so it seems that is when you saw the high RAM usage. But 3GB during normal operation doesn’t look right.
Normal usage without disk subsystem problems is usually no more than 300MB. In a very active period it could use more, but not 3GB, let alone 12GB.
Please check the usage with docker stats.
It is also worth checking the databases: https://support.storj.io/hc/en-us/articles/360029309111-How-to-fix-a-database-disk-image-is-malformed-
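For anyone nervous about touching the databases, here is a minimal read-only sketch of an integrity check in Python. It assumes the node's databases are ordinary SQLite `*.db` files in one directory and that the node is stopped first; the function name `check_databases` and the directory argument are hypothetical, and this is not a replacement for the procedure in the linked article.

```python
import sqlite3
from pathlib import Path


def check_databases(db_dir):
    """Run PRAGMA integrity_check on every *.db file in db_dir.

    Returns a dict mapping file name to the list of result rows;
    a healthy database yields ["ok"].
    """
    results = {}
    for db_path in sorted(Path(db_dir).glob("*.db")):
        try:
            # Open read-only so the check itself cannot modify the file.
            uri = db_path.resolve().as_uri() + "?mode=ro"
            conn = sqlite3.connect(uri, uri=True)
            try:
                rows = conn.execute("PRAGMA integrity_check;").fetchall()
                results[db_path.name] = [row[0] for row in rows]
            finally:
                conn.close()
        except sqlite3.DatabaseError as exc:
            # A badly damaged file may fail before the check even runs.
            results[db_path.name] = [f"error: {exc}"]
    return results
```

Anything other than `["ok"]` for a file suggests that database needs the repair steps from the article. Because the files are opened read-only, running this should not make an existing problem worse.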
CONTAINER ID   NAME           CPU %   MEM USAGE / LIMIT     MEM %   NET I/O           BLOCK I/O   PIDS
7cbadcd58343   storagenode2   3.42%   172.1MiB / 24.81GiB   0.68%   61.8GB / 34.5GB   0B / 0B     40
cf135a75547e   storagenode5   0.02%   99.73MiB / 24.81GiB   0.39%   793MB / 10.2GB    0B / 0B     34
de6dbb76212e   watchtower     0.00%   13.35MiB / 24.81GiB   0.05%   1.65kB / 0B       0B / 0B     15
storagenode5 is full, but storagenode2 is not (and has 3.5TB free).
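To compare figures like these against the roughly 300MB mentioned earlier in the thread, a small Python sketch can convert the MEM USAGE values and flag anything above a limit. The helper names `mem_to_mib` and `over_threshold` are hypothetical, the 300 default is just the rule of thumb from this thread, and the unit list assumes the binary units (B, KiB, MiB, GiB) seen in the output above.

```python
import re

# Multipliers for the units that appear in the MEM USAGE column.
_UNITS = {"B": 1, "KiB": 1024, "MiB": 1024**2, "GiB": 1024**3}


def mem_to_mib(text):
    """Convert a memory string such as '172.1MiB' to a float in MiB."""
    match = re.fullmatch(r"([\d.]+)\s*(B|KiB|MiB|GiB)", text.strip())
    if match is None:
        raise ValueError(f"unrecognised memory value: {text!r}")
    value, unit = match.groups()
    return float(value) * _UNITS[unit] / 1024**2


def over_threshold(rows, limit_mib=300.0):
    """Return names of containers whose memory usage exceeds limit_mib.

    rows is an iterable of (name, mem_usage) pairs, i.e. the NAME column
    and the first half of the MEM USAGE / LIMIT column.
    """
    return [name for name, usage in rows if mem_to_mib(usage) > limit_mib]


# Values copied from the docker stats output in this thread.
rows = [
    ("storagenode2", "172.1MiB"),
    ("storagenode5", "99.73MiB"),
    ("watchtower", "13.35MiB"),
]
flagged = over_threshold(rows)
```

With the values posted here nothing is flagged, which matches the observation that the nodes look normal once the initial startup work has finished.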
If the databases are OK and you use locally attached storage, then only the storage itself could be the reason.
RAID10 should work faster than RAID5, but it could work slower than a single HDD if some disks are slow. What types of disks do you use? Are they SMR?
I’m using 8TB SAS Enterprise drives with a 12Gb/s interface on an Avago 9361 controller, so performance shouldn’t be an issue. I dunno; since the system releases most of the RAM after it’s been on for a while, I’m OK with the unexplained issue. If you can think of anything else I should try, let me know.
The only known reason for high memory usage is very slow disk writes (high latency), for example when using SMB/NFS, SMR disks, or the BTRFS filesystem (except on Synology, which customized BTRFS so that it works normally).
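If you want a rough feel for write latency on the storage directory, something like the following Python sketch times a handful of small synchronous writes. It is only an illustrative probe under simple assumptions (4 KiB writes, one temporary file, a hypothetical function name `fsync_latencies`), not an official diagnostic, and the numbers are mainly useful for comparing one disk or mount against another.

```python
import os
import tempfile
import time


def fsync_latencies(directory, count=20, size=4096):
    """Time `count` small synchronous writes in `directory`.

    Returns a list of per-write latencies in seconds.
    """
    payload = os.urandom(size)
    latencies = []
    # The temporary file is created in the directory under test and
    # removed again at the end.
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        for _ in range(count):
            start = time.perf_counter()
            os.write(fd, payload)
            os.fsync(fd)  # push the data to the device, not just the page cache
            latencies.append(time.perf_counter() - start)
    finally:
        os.close(fd)
        os.remove(path)
    return latencies
```

For example, `max(fsync_latencies("/path/to/storage"))` gives the slowest of the sampled writes, where the path is a placeholder for your own storage location. Consistently high values on a network share or an SMR disk, compared with a local CMR disk, would fit the explanation above.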