So I hope this doesn’t devolve into a finger-pointing match between Docker on Mac and Storj, but over the last couple of weeks my computer has been locking up. I’ve been running a storagenode in this configuration for over a year, with container uptimes in the weeks range – now I can’t get through a day.
While troubleshooting this, I saw that the memory used by the Mac’s mount handler, com.docker.osxfs, grows quickly over time, and once the computer runs out of swap space it essentially freezes up.
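For anybody who wants to watch the same thing, here is a sketch of how I sample the process memory. The helper name `mem_of` is mine, not anything from Docker or Storj; run the sample periodically (cron, `watch`, or a loop) to see the trend.

```shell
# mem_of NAME: print the resident set size (RSS, in KB) of the first process
# whose ps line contains NAME. Helper name is my own invention.
mem_of() {
  ps -axo rss=,comm= | awk -v n="$1" 'index($0, n) {print $1; exit}'
}

# one sample of the osxfs file-sharing daemon's memory:
mem_of com.docker.osxfs
```

Logging the output with a timestamp every few minutes made the growth obvious on my machine.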
I tried downgrading Docker from the current release (2.3.0.3) to the previous minor line (2.2.0.5), but it didn’t help.
I don’t really want to roll back the storj/storagenode image version, as I’m sure that hasn’t been tested and would likely hose my configuration. I’m running 1.6.4.
I was wondering if anybody else is seeing this issue on a Mac? Also, please don’t tell me to run Docker inside a Linux VM. The current workaround is to restart Docker twice a day to keep things running.
Here are some numbers to back this up, with a timeline and some data points on memory use. I also noticed (using df inside the container) that the overlay filesystem is growing slowly, but I’m not sure that’s related, since a restart shouldn’t change the underlying filesystem contents of the same container instance.
Perhaps the instability of Docker Desktop on Windows in versions newer than 2.1.0.5 extends to the Mac as well?
However, when we have seen high memory consumption, the cause was usually either network-attached storage or corrupted databases.
Could you please check your databases?
That is why I tried rolling Docker back to the 2.2.x line before posting this.
I ran the DB check even though the logs do not show the “database disk image is malformed” message:
$ sqlite3 /Volumes/Storj/StorjV3/storage/bandwidth.db "PRAGMA integrity_check;"
ok
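bandwidth.db is only one of the node’s databases; the rest can be checked the same way. A small sketch – the `check_dbs` helper is my own, and the /Volumes path is my mount point, so substitute yours:

```shell
# check_dbs DIR: run SQLite's integrity check on every .db file in DIR.
check_dbs() {
  for db in "$1"/*.db; do
    [ -e "$db" ] || continue      # glob matched nothing: skip
    printf '%s: ' "$db"
    sqlite3 "$db" "PRAGMA integrity_check;"
  done
}

# my node's storage mount -- adjust the path to your own:
check_dbs /Volumes/Storj/StorjV3/storage
```

Every database reported `ok` for me.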
Like I said, it’s been running with 6TB of used storage for almost a year with uptimes in the weeks – and the only change (other than the minor Docker updates) has been the version of storagenode.
So it really feels like a recent change in the code may be the culprit.
Is it possible to revert the running version to something earlier than 1.6.4 without corrupting the underlying storage/node, so I can see if that helps?
I do not have a Mac, unfortunately. Perhaps it’s specific to Docker Desktop on Mac.
If you run docker stats, I believe you will clearly see that there is no significant RAM usage by the storagenode container itself.
Agreed – I don’t think it is the application specifically, but an interaction with the shared-filesystem process. Is this something we can escalate to engineering and/or the testing team to try to reproduce?
Something must have changed that is making this bad all of a sudden after a year of no issues.
Already checked the disks. Checked open bugs on docker-for-mac, but didn’t see anything related recently. Since I didn’t write the storagenode code, I’m hardly in a position to explain what the app is doing that might be causing this issue.
Since we’ve rolled Docker back two versions and both have the problem – again, after working fine for a year – it seems more like an issue with the storagenode software and its use of Docker file mounts. Can we at least ask somebody in the org who can easily create tokens, and who doesn’t risk blowing up a production node, to test the current and older versions?
Has anybody else in the community running on a Mac seen this process’s memory grow over time? You might not notice it easily, as it seems to quietly consume swap space. I can’t be the ONLY Mac storagenode operator?
Can you try creating a new identity and starting a second, small node?
You can reduce the monitoring check threshold to zero to allow it to start with a reasonably small storage allocation. Does it show the same behavior or not?
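For reference, a sketch of what that could look like in the test node’s configuration – the option names below are what I believe the storagenode config uses, but double-check them against the config.yaml your version generates before relying on them:

```yaml
# config.yaml for the second (test) node -- verify these option names
# against your own config.yaml; they may differ between versions
storage.allocated-disk-space: 500.00 GB
storage2.monitor.minimum-disk-space: 0 B
```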
Using a new identity/node, it still grows, but at a slower pace.
There are times I see it drop slightly, but the overall trend is upward (time -> size):
Quick update for anybody who finds this in the future: no progress. I still believe there is a problem with Docker volume mounts on Macs, given storagenode’s heavy use of SQLite databases through the existing sharing mechanism (com.docker.osxfs), on nodes storing a lot of data (6TB in my case).
I did try a beta build with an alternate implementation, linked here. It didn’t have the same memory issue, but it eventually crashed the computer anyway – no idea whether it was just leaking somewhere less visible – ¯\_(ツ)_/¯
The workaround for now (short of just exiting the node and leaving the Storj storagenode family) was to install a crontab entry that restarts the whole Docker system (which auto-restarts the same container) every 4 hours. I tried 6 hours, but that pushed RAM/swap too high under load on my 16GB machine.
Including my crontab for anybody who needs to do the same:
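Roughly an entry along these lines – treat it as a sketch, since the exact AppleScript target and the sleep delay are assumptions that may need adjusting for your Docker Desktop version:

```
# crontab: restart Docker Desktop every 4 hours (sketch; adjust app name/delay)
0 */4 * * * osascript -e 'quit app "Docker"' && sleep 60 && open -a Docker
```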
Hi. My hunch is that it has something to do with Kubernetes. I think everything was fine until I enabled Kubernetes and then disabled it. Since then, I’ve been experiencing all kinds of memory issues and stalls coming from the Docker process. But it could be a coincidence, of course.
If you mean that you activated Kubernetes in Docker Desktop, then you can deactivate it from there, and everything related to Kubernetes will be removed.
No k8s running in my Docker Desktop. Also, it appears they changed the AppleScript handle and I can’t quite figure out the new BundleIdentifier, so I’m trying something different. The restart crontab is now this:
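Something along these lines, killing the app directly instead of going through AppleScript – again a sketch, since the app path and delay are assumptions:

```
# crontab v2: restart Docker Desktop every 4 hours without AppleScript
0 */4 * * * killall Docker && sleep 60 && open -a /Applications/Docker.app
```

The node logs confirm the container gets terminated and comes back up on the restart: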
2020-12-13T17:15:00.763Z INFO Got a signal from the OS: "terminated"
2020-12-13T17:15:33.636Z INFO Configuration loaded {"Location": "/app/config/config.yaml"}