I have a third-party application that our pipeline runs on a build server, inside a dedicated Docker container that is spun up fresh for each job. The application is very memory- and CPU-intensive, spawns many processes, and runs for ~20 minutes. When run one at a time (non-concurrently), it completes successfully every time.
The problem occurs when the pipeline runs two or more instances of this application container concurrently on the same server. At certain points in the execution, apparently at random but seemingly drawn from a consistent, finite set of locations, one or more of the application processes crashes with a segmentation fault, usually with an error referring to an invalid free or an invalid realloc.
I've tried numerous techniques to debug the segfault, but no debugging info ships with the application, the logs are insufficient, and something prevents `strace` from running (the application crashes immediately at startup when run under `strace`).
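Without `strace`, one signal I can still collect is the kernel's own segfault log lines in `dmesg` (assuming the kernel logs unhandled signals, which most distros enable by default). A rough Python sketch that tallies which mapped object the faulting instruction pointer falls in:

```python
"""Scan dmesg for kernel segfault reports and count faults per mapped object.
Assumes lines of the usual form:
  name[pid]: segfault at ADDR ip IP sp SP error N in OBJECT[base+size]
May need root, depending on the kernel's dmesg restrictions."""
import re
import subprocess
from collections import Counter

SEGFAULT_RE = re.compile(
    r"segfault at (?P<addr>[0-9a-f]+) ip (?P<ip>[0-9a-f]+) .* in (?P<obj>\S+)\["
)

dmesg = subprocess.run(["dmesg"], check=True, capture_output=True, text=True)
hits = Counter(m.group("obj") for m in SEGFAULT_RE.finditer(dmesg.stdout))
for obj, count in hits.most_common():
    print(f"{count:4d} segfaults in {obj}")
```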
I've been monitoring the server's memory consumption and it is not running out of memory.
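A minimal sketch of the kind of check I've been running, sampling `MemAvailable` from `/proc/meminfo` and tracking the lowest value seen:

```python
"""Sample MemAvailable while the containers run, keeping the lowest value."""
import time

def mem_available_kb():
    """Read MemAvailable (in kB) from /proc/meminfo."""
    with open("/proc/meminfo") as meminfo:
        for line in meminfo:
            if line.startswith("MemAvailable:"):
                return int(line.split()[1])
    raise RuntimeError("MemAvailable not found in /proc/meminfo")

# The lowest value seen stays well above zero throughout the runs,
# so the host is not exhausting memory.
lowest = mem_available_kb()
while True:
    available = mem_available_kb()
    lowest = min(lowest, available)
    print(f"MemAvailable: {available} kB (lowest so far: {lowest} kB)")
    time.sleep(5)
```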
Although there are other possibilities, my current hypothesis is that the application uses shared memory within a dynamic library, and that because the containers are built from the same Docker image layers, every process in every container is implicitly sharing this memory with the others, due to how Docker/Linux efficiently handle shared libraries that are resident on the same inode: https://stackoverflow.com/a/40096194
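One way I can think of to test the shared-inode part of this is to compare the device:inode pairs backing the shared-object mappings of one application process from each container, read from `/proc/<pid>/maps` on the host. Matching pairs would mean the kernel is backing both containers' mappings from the same page cache. A sketch (the `.so` filter is an assumption; adjust for whatever the application actually loads):

```python
"""Compare device:inode pairs of shared-object mappings for two PIDs."""
import sys

def lib_mappings(pid):
    """Return {pathname: (device, inode)} for file-backed .so mappings."""
    mappings = {}
    with open(f"/proc/{pid}/maps") as maps:
        for line in maps:
            # /proc/<pid>/maps fields: address perms offset dev inode pathname
            fields = line.split()
            if len(fields) < 6:
                continue  # anonymous mapping, no backing file
            dev, inode, path = fields[3], fields[4], fields[5]
            if ".so" in path:
                mappings[path] = (dev, inode)
    return mappings

if __name__ == "__main__":
    a = lib_mappings(sys.argv[1])
    b = lib_mappings(sys.argv[2])
    for path in sorted(set(a) & set(b)):
        marker = "SHARED" if a[path] == b[path] else "distinct"
        print(f"{marker:8s} {a[path]} {b[path]} {path}")
```

The perms column in `maps` (the second field) would additionally show whether any shared mapping is writable, which is what would actually matter for cross-container corruption.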
I'm looking for a way to isolate these Docker containers so that they do not implicitly share memory between them in this way.
Note that if I run the application directly on the server, without Docker, I see the same behaviour: multiple concurrent instances eventually result in one or more segfaults. The application is clearly not designed to run concurrently on the same host, which is why I'd like to isolate it within Docker, if possible.
I understand that if I make an actual copy of the Docker image, the shared objects in each container will come from different inodes, which might be enough to prove or disprove the hypothesis, but I don't know how to copy or flatten a Docker image so that it uses entirely new files internally.
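If piping `docker export` into `docker import` flattens a container's filesystem into a brand-new single-layer image, that should give every file a fresh inode; something like the sketch below is what I have in mind (image names are placeholders for my actual image):

```python
"""Flatten a Docker image by exporting a container's filesystem and
re-importing it as a single-layer image, so every file in the copy gets
a fresh inode. A sketch driving the Docker CLI."""
import subprocess

SRC_IMAGE = "myapp:latest"  # placeholder: the original image tag
FLAT_IMAGE = "myapp:flat"   # placeholder: name for the flattened copy

# Create (but do not start) a throwaway container whose filesystem we can export.
container_id = subprocess.run(
    ["docker", "create", SRC_IMAGE],
    check=True, capture_output=True, text=True,
).stdout.strip()

try:
    # docker export streams a tar of the container's merged filesystem;
    # docker import reads it back as a brand-new single-layer image.
    exporter = subprocess.Popen(["docker", "export", container_id],
                                stdout=subprocess.PIPE)
    subprocess.run(["docker", "import", "-", FLAT_IMAGE],
                   stdin=exporter.stdout, check=True)
    exporter.stdout.close()
    exporter.wait()
finally:
    # Remove the throwaway container.
    subprocess.run(["docker", "rm", container_id],
                   check=True, capture_output=True)
```

Note that `docker import` discards image metadata (ENTRYPOINT, CMD, ENV, and so on), so that would have to be re-applied with `--change` options or a small Dockerfile on top.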
EDIT: For completeness, I should mention that I can run the same application concurrently on two different hosts with no problems. The instances only crash when run on the same host, whether inside or outside of a Docker container.