Docker eats all memory over time on mac

So I hope this doesn’t devolve into a finger-pointing match between Docker on Mac and Storj, but over the last couple of weeks my computer has been locking up. I’ve been running a storagenode for over a year in this configuration, with container uptimes in the range of weeks – now I can’t get through a day.

In trying to troubleshoot this, I saw that the memory used by the Mac’s mount handler, com.docker.osxfs, grows quickly over time, and once the computer runs out of swap space it more or less freezes up.

I tried downgrading Docker from the current version (2.3.0.3) to the previous minor line (2.2.0.5), but that didn’t help.

I don’t really want to revert the version of the storj/storagenode image, as I’m sure that hasn’t been tested and would likely hose my configuration. I’m running 1.6.4.

Is anybody else seeing this issue running on a Mac? Also, please don’t tell me to run Docker on Linux in a VM. The current workaround is to restart Docker twice a day to keep things running.

Here are some numbers to back things up, with a timeline and some data points on memory use. I also noticed that the container’s overlay filesystem is growing slowly (per df in the container), but I’m not sure that is related, since a restart shouldn’t change the underlying filesystem contents of the same container instance.

7/22/20 4:22PM CT
- restarted docker for mac
- com.docker.osxfs: 4.2 MB

7/22/20 4:24PM CT
- started storjnode docker image 1.6.4
- storj dashboard and docker logs -f also running
- com.docker.osxfs: 51.8 MB

7/22/20 7:33pm CT (3h 10m)
- com.docker.osxfs: 3.89 GB

7/22/20 8:40pm CT (4h 17m)
- com.docker.osxfs: 6.10 GB

8:47pm CT, checking internal disk already 1.1G used
$ docker exec -ti storagenode df /
Filesystem           1K-blocks      Used Available Use% Mounted on
overlay               61255492   1174936  56939232   2% /

8:54pm CT
- stopped dashboard and docker logs -f
- com.docker.osxfs: 6.61 GB

8:56pm CT
- com.docker.osxfs: 6.66 GB
$ docker exec -ti storagenode df /
Filesystem           1K-blocks      Used Available Use% Mounted on
overlay               61255492   1175020  56939148   2% /

8:57pm CT
- com.docker.osxfs: 6.72 GB
- restart docker

8:57pm CT
- com.docker.osxfs: 6.39 MB
- continues to grow
- started storj-dashboard


install docker 2.2.0.5

9:08pm CT
- start storjnode, dashboard, and tail logs
- com.docker.osxfs: 51.7 MB

9:24pm CT (16m)
- com.docker.osxfs: 493.0 MB


9:42AM CT (12h 34m)
- com.docker.osxfs: 10.03 GB (11.36GB swap on computer used)
- reinstalling latest (2.3.0.3)

9:44am CT (45s)
- com.docker.osxfs: 71.5 MB (1.94GB swap on computer used)

12:07pm (2h 23m)
- com.docker.osxfs: 1.03GB (1.50GB swap on computer used)
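
For anybody who wants to collect readings like these automatically, here is a minimal sketch. It assumes the resident set size (RSS) reported by ps is a usable stand-in for the figure shown in Activity Monitor, which it may not be once memory has been compressed or swapped out. The function name rss_mb and the log file name osxfs-mem.log are made up for illustration.

```shell
# Sum the resident memory (RSS, which ps reports in KiB) of every process
# whose command line contains the given pattern, and print the total in MB.
#   $1:    substring to look for, e.g. com.docker.osxfs
#   stdin: lines of "<rss_kib> <command>", e.g. from `ps -ax -o rss= -o comm=`
rss_mb() {
    awk -v pat="$1" '
        index($0, pat) { total += $1 }
        END { printf "%.1f\n", total / 1024 }
    '
}

# Hypothetical usage: append a timestamped reading every 10 minutes.
# Left commented out so that pasting the block does not start an endless loop.
# while true; do
#     printf '%s %s MB\n' "$(date '+%Y-%m-%d %H:%M:%S')" \
#         "$(ps -ax -o rss= -o comm= | rss_mb com.docker.osxfs)" >> osxfs-mem.log
#     sleep 600
# done
```

Matching on a substring rather than the exact name is deliberate, since ps on macOS prints the full path of the executable.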

Perhaps the instability of Docker Desktop on Windows in versions newer than 2.1.0.5 also affects Mac?

However, when we have seen high memory consumption, the cause was usually either network-attached storage or corrupted databases.
Could you please check your databases?

That is why I tried rolling docker back to the 2.2.X line before posting this.

I did the DB check even though the logs do not show the “database disk image is malformed” message:

$ sqlite3 /Volumes/Storj/StorjV3/storage/bandwidth.db "PRAGMA integrity_check;"
ok

Like I said, it’s been running with 6TB of used storage for almost a year, with uptimes in the weeks – and the only change (other than the minor Docker stuff) has been the version of storagenode.

So it really feels like a recent change in the code may be the culprit.

Is it possible to revert the running version to something earlier than 1.6.4 without corrupting the underlying storage/node, so I can see if that helps?

Try to downgrade to 2.1.0.5

Please, do it for all databases, not only bandwidth.db

yes

Without corrupting it: no.
So it is better not to do it.

Running tests with 2.1.0.3 now.

Regarding other databases, the only other *.db file is revocations.db, which sqlite3 says is “not a database”:

$ sqlite3 /Volumes/Storj/StorjV3/revocations.db "PRAGMA integrity_check;"
Error: file is not a database
$ ls -l /Volumes/Storj/StorjV3/revocations.db
-rw-------  1 shoffman  admin  32768 Jul 24 14:21 revocations.db

Are there other files I’m just not finding?

Version 2.1.0.5 is recommended.
The only databases are the ones in the storage location.

Sorry, I mistyped: 2.1.0.5 is what I’m using. Memory still seems to be growing pretty quickly:

docker 2.1.0.5
- com.docker.osxfs: 4.1 MB

starting 1.6.4
7/24/20 2:22pm (1m)
- com.docker.osxfs: 57.5 MB

7/24/20 2:31pm (10m)
- com.docker.osxfs: 397.8 MB

7/24/20 2:51pm (30m)
- com.docker.osxfs: 902.8 MB

Will shut down and try the other DBs…

All databases check out OK. Some are rather large and I suspect piecestore.db isn’t used anymore based on the timestamp:

$ ls -lh *.db
-rw-r--r--  1 shoffman  admin    16M Jul 24 14:53 bandwidth.db
-rw-r--r--  1 shoffman  admin    40K Jul 24 14:53 heldamount.db
-rw-r--r--  1 shoffman  admin    16K Jul 24 14:53 info.db
-rw-r--r--  1 shoffman  admin    32K Jul 24 14:53 notifications.db
-rw-r--r--  1 shoffman  admin   615M Jul 24 14:53 orders.db
-rw-r--r--  1 shoffman  admin    68K Jul 24 14:53 piece_expiration.db
-rw-r--r--  1 shoffman  admin    24K Jul 24 14:53 piece_spaced_used.db
-rw-r--r--  1 shoffman  admin   110M Jul 24 14:53 pieceinfo.db
-rw-r--r--  1 shoffman  admin    11M Apr 24  2019 piecestore.db
-rw-r--r--  1 shoffman  admin    24K Jul 24 14:53 pricing.db
-rw-r--r--  1 shoffman  admin    24K Jul 24 14:53 reputation.db
-rw-r--r--  1 shoffman  admin    32K Jul 24 14:53 satellites.db
-rw-r--r--  1 shoffman  admin   300K Jul 24 14:53 storage_usage.db
-rw-r--r--  1 shoffman  admin    80M Jul 24 14:53 used_serial.db
$ cat foo
sqlite3 /Volumes/Storj/StorjV3/storage/bandwidth.db "PRAGMA integrity_check;"
sqlite3 /Volumes/Storj/StorjV3/storage/heldamount.db "PRAGMA integrity_check;"
sqlite3 /Volumes/Storj/StorjV3/storage/info.db "PRAGMA integrity_check;"
sqlite3 /Volumes/Storj/StorjV3/storage/notifications.db "PRAGMA integrity_check;"
sqlite3 /Volumes/Storj/StorjV3/storage/orders.db "PRAGMA integrity_check;"
sqlite3 /Volumes/Storj/StorjV3/storage/piece_expiration.db "PRAGMA integrity_check;"
sqlite3 /Volumes/Storj/StorjV3/storage/piece_spaced_used.db "PRAGMA integrity_check;"
sqlite3 /Volumes/Storj/StorjV3/storage/pieceinfo.db "PRAGMA integrity_check;"
sqlite3 /Volumes/Storj/StorjV3/storage/piecestore.db "PRAGMA integrity_check;"
sqlite3 /Volumes/Storj/StorjV3/storage/pricing.db "PRAGMA integrity_check;"
sqlite3 /Volumes/Storj/StorjV3/storage/reputation.db "PRAGMA integrity_check;"
sqlite3 /Volumes/Storj/StorjV3/storage/satellites.db "PRAGMA integrity_check;"
sqlite3 /Volumes/Storj/StorjV3/storage/storage_usage.db "PRAGMA integrity_check;"
sqlite3 /Volumes/Storj/StorjV3/storage/used_serial.db "PRAGMA integrity_check;"
$ sh ./foo
ok
ok
ok
ok
ok
ok
ok
ok
ok
ok
ok
ok
ok
ok
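
The same check can be written as a loop so that no database gets missed when the file list changes. This is only a sketch: the function name check_dbs is made up, and it assumes the sqlite3 command-line tool is on the PATH and that the node is stopped while the check runs.

```shell
# Run PRAGMA integrity_check on every *.db file in a directory and print one
# "<file>: <result>" line per database.
# Returns non-zero if any result is something other than "ok".
check_dbs() {
    dir=$1
    status=0
    for db in "$dir"/*.db; do
        [ -e "$db" ] || continue    # the glob matched nothing
        # Capture stderr too, so errors such as "file is not a database" are shown.
        result=$(sqlite3 "$db" "PRAGMA integrity_check;" 2>&1) || true
        printf '%s: %s\n' "${db##*/}" "$result"
        [ "$result" = "ok" ] || status=1
    done
    return "$status"
}

# Path taken from the commands above:
# check_dbs /Volumes/Storj/StorjV3/storage
```

Printing the file name next to each result makes it obvious which database a failure belongs to, which the bare list of "ok" lines does not.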

Any other ideas @Alexey? Right now I’m just restarting docker every few hours and :crossed_fingers:

Are you able to see com.docker.osxfs memory grow over time?

I do not have a Mac, unfortunately. Perhaps it’s specific to Docker Desktop on Mac.
If you run docker stats, I believe you will clearly see that the RAM is not being used by the storagenode itself.

Agreed, I don’t think it is the application specifically, but an interaction with the shared filesystem process. Is this something we can kick up to engineering and/or the testing team to try and reproduce?

Something must have changed that is making this bad all of a sudden after a year of no issues.

I believe it’s better to ask on Docker’s forums. Perhaps they already have a bug on their backlog and can fix it eventually.

I would also suggest checking your disk for errors, just to be sure.

Already checked the disks. Checked open bugs on docker-for-mac, but didn’t see anything related recently. Since I didn’t write the storagenode code, I’m hardly in a position to explain what the app is doing which might be causing this issue.

Since we’ve rolled Docker back 2 versions and both have the problem – again, after working fine for a year – it seems more like an issue with the storagenode software and its use of the Docker file mounts. Can we at least ask somebody in the org who can easily create tokens, and who doesn’t risk blowing up a production node, to test current and older versions?

Has anybody else in the community running on Mac seen this process’s memory grow over time? You might not notice it easily, as it seems to quietly consume swap space. I can’t be the ONLY Mac storagenode operator?

Can you try to create a new identity and start a second, small node?
You can reduce the monitoring check threshold to zero so that it can start with a reasonably small storage allocation. Does it show the same behavior or not?

I have no tickets on the subject so far

I’ll try and set up a test system later.

I did find this issue which could be related - I’ve asked if they see this process’s memory grow over time.

Using a new identity/node, it still grows but at a slower pace.
There are times I see it drop slightly, but the overall trend is upward (time -> size):

0m -> 17.5 MB
13m -> 84.1 MB
32m -> 79.1 MB
4h 54m -> 167.5 MB
6h 26m -> 195.0 MB
16h 33m -> 365.2 MB
18h 42m -> 420 MB

total storage on this new node is now 7.6 GB – my regular node is 6TB+

So there does seem to be some evidence that the more data managed, the faster this grows.
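
To put a rough number on that, the first and last readings of each run can be turned into an average growth rate. This is a back-of-the-envelope sketch only: the function name growth_mb_per_hour is made up, the growth is not really linear, and the GB reading is converted assuming 1 GB = 1024 MB.

```shell
# Average growth in MB per hour between two readings.
#   $1: first reading (MB)   $2: last reading (MB)   $3: elapsed minutes
growth_mb_per_hour() {
    awk -v a="$1" -v b="$2" -v m="$3" \
        'BEGIN { printf "%.1f\n", (b - a) / (m / 60) }'
}

# New test node: 17.5 MB -> 420 MB over 18h 42m (1122 minutes)
growth_mb_per_hour 17.5 420 1122        # prints 21.5

# 6TB node on 7/22: 51.8 MB -> 6.10 GB (6246.4 MB) over 4h 17m (257 minutes)
growth_mb_per_hour 51.8 6246.4 257      # prints 1446.2
```

That is roughly 21.5 MB per hour on the small node against about 1.4 GB per hour on the 6TB node.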

Quick update for anybody who finds this in the future: no progress on this. I still feel like there is a problem with Docker volume mounts on Macs when the existing sharing mechanism, com.docker.osxfs, has to handle Storj’s heavy use of SQLite DBs on nodes storing lots of data (6TB in my case).

I did try a beta build with an alternate implementation linked here. It didn’t have the same memory issue, but eventually crashed the computer anyway – no idea if it was just leaking in something less visible – ¯\_(ツ)_/¯

The workaround for now (short of just exiting the node and leaving the Storj storagenode family) was to install a crontab entry that restarts the whole Docker system (which auto-restarts the same container) every 4 hours. I tried 6 hours, but that pushed RAM/swap use too high under load on my 16GB machine.

Including my crontab for anybody that needs to do the same:

0 0,4,8,12,16,20 * * * /usr/bin/osascript -e 'quit app "Docker"' && /usr/bin/open -a Docker

Not ideal, but it keeps my node running – for now.
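
A possible variation on the crontab above is to restart only once the process has actually grown past a limit, rather than on a fixed schedule. The sketch below is untested on a real node. The function names and any limit passed in are made up, and it assumes the RSS reported by ps keeps climbing with the leak, which may stop being true once the machine starts swapping.

```shell
# Print the total RSS, in whole MB, of processes matching com.docker.osxfs.
# stdin: lines of "<rss_kib> <command>", e.g. from `ps -ax -o rss= -o comm=`
osxfs_rss_mb() {
    awk 'index($0, "com.docker.osxfs") { t += $1 } END { print int(t / 1024) }'
}

# Succeeds when the current reading ($1) has reached the limit ($2), both in MB.
needs_restart() {
    [ "$1" -ge "$2" ]
}

# Restart Docker with the same commands as the crontab entry, but only when
# the limit has been reached. Intended to be run from cron every few minutes.
#   $1: limit in MB (an arbitrary value chosen by the operator)
restart_docker_if_needed() {
    limit_mb=$1
    current_mb=$(ps -ax -o rss= -o comm= | osxfs_rss_mb)
    if needs_restart "$current_mb" "$limit_mb"; then
        /usr/bin/osascript -e 'quit app "Docker"' && /usr/bin/open -a Docker
    fi
}
```

Keeping the measurement and the decision in separate functions means both can be checked with canned input before anything is allowed to restart Docker.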
