Score:1

How to diagnose a quietly dying Java server application?

We run a cluster of Java Spring server apps on AWS EC2 instances running CentOS 7. We have health monitors on them, and occasionally an alarm goes off and we find that the Java process has quietly disappeared. We can find nothing in any of the logs, either our own or the system logs. We have an outer "catch Throwable" around our own code that logs whatever it catches, but we run Tomcat, which has many threads of its own. We've added extra logging to try to capture the moment the process disappears, but so far that has yielded no information.

I've looked over this question: How to find out why a Java process died without a trace in Linux. I see nothing helpful there.

We currently can't involve the launcher of these processes in a solution. It's a long story. Trust me that we've tried to go down that road.

Any suggestions? I'm wondering if maybe I should wrap the Java process in an outer parent process that carefully monitors and logs all signals from the Java child process. I'm wondering if there's such an off-the-shelf solution that I haven't found yet. Any ideas would be greatly appreciated.
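The wrapper idea above could be sketched roughly as follows. This is a minimal illustration, not an off-the-shelf tool: the names `describe_exit` and `run_and_report` and the example command line are hypothetical. A parent process cannot see every signal delivered to its child; it can only see how the child ended, either as an exit status or as the signal that killed it.

```python
import logging
import signal
import subprocess
import sys

log = logging.getLogger("wrapper")


def describe_exit(returncode):
    """Turn a subprocess return code into a human-readable reason."""
    if returncode < 0:
        # On POSIX, subprocess reports death-by-signal as the negated signal number.
        signum = -returncode
        try:
            name = signal.Signals(signum).name
        except ValueError:
            name = "unknown"
        return f"killed by signal {signum} ({name})"
    return f"exited with status {returncode}"


def run_and_report(cmd):
    """Start cmd as a child, wait for it to end, and log how it ended."""
    log.info("starting %s", cmd)
    proc = subprocess.Popen(cmd)
    returncode = proc.wait()
    reason = describe_exit(returncode)
    log.warning("child pid %d %s", proc.pid, reason)
    return reason


if __name__ == "__main__" and len(sys.argv) > 1:
    logging.basicConfig(
        level=logging.INFO,
        format="%(asctime)s %(levelname)s %(message)s",
    )
    # Hypothetical usage: python wrapper.py java -jar app.jar
    run_and_report(sys.argv[1:])
```

If the log shows "killed by signal 9 (SIGKILL)", something outside the JVM killed it, and the JVM had no chance to log anything. A JVM that receives SIGTERM usually runs its shutdown hooks and exits on its own, so that case tends to show up as exit status 143 rather than as a signal.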

Michael Hampton
How exactly are you starting these apps?
CryptoFool
We're using Chef Habitat, but we're in the middle of switching to something else and we don't want to touch its setup. Although Habitat is supposed to handle process management, it was so bad at it that we disabled all of its process management features. I don't want to go there. I can stop the official running server and then run my own version manually or via another process manager if necessary. I don't know if such a setup would exhibit the same problem. If not, we'd at least then be more suspicious of Habitat itself.
CryptoFool
I have considered looking into what systemd can do for me. At first glance, that seemed complicated and not necessarily helpful. I know there are other process managers out there. I'm hoping to find one meant for debugging and/or troubleshooting situations like mine. I'm a programmer, not a sysadmin, so I'm pretty new to all of this.
Michael Hampton
https://docs.spring.io/spring-boot/docs/current/reference/html/deployment.html#deployment.installing.nix-services.system-d
CryptoFool
@MichaelHampton - thanks, but my question isn't how to install a service under systemd. I know how to do that. The question is whether letting systemd manage the lifetime of my app/service can give me some indication of why it died that I'm not going to get from existing sources. If it can, what configuration do I need to get the most possible information out of systemd when my app disappears?
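As a rough illustration of what systemd records about a dead service: `systemctl show` exposes the properties `ExecMainCode`, `ExecMainStatus`, and `Result` for a unit. The sketch below parses them into a one-line summary. It assumes `ExecMainCode` is printed as the numeric `si_code` (1 = exited, 2 = killed, 3 = dumped core), which may vary by systemd version. The unit name `myapp.service` and the function names are hypothetical.

```python
import subprocess


def parse_properties(text):
    """Parse the Key=Value lines printed by `systemctl show`."""
    props = {}
    for line in text.splitlines():
        key, sep, value = line.partition("=")
        if sep:
            props[key.strip()] = value.strip()
    return props


def summarize(props):
    """Describe how the unit's main process ended, based on its properties."""
    # Assumption: ExecMainCode is the numeric si_code (CLD_EXITED=1,
    # CLD_KILLED=2, CLD_DUMPED=3) and 0 while the process is still running.
    code = int(props.get("ExecMainCode", "0"))
    status = int(props.get("ExecMainStatus", "0"))
    result = props.get("Result", "unknown")
    if code == 0:
        return f"main process has not exited (Result={result})"
    if code == 1:
        return f"main process exited with status {status} (Result={result})"
    if code == 2:
        return f"main process was killed by signal {status} (Result={result})"
    if code == 3:
        return f"main process dumped core on signal {status} (Result={result})"
    return f"main process ended with code {code}, status {status} (Result={result})"


def query_unit(unit):
    """Ask systemd for the exit-related properties of a unit."""
    out = subprocess.run(
        ["systemctl", "show", unit,
         "-p", "ExecMainCode", "-p", "ExecMainStatus", "-p", "Result"],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_properties(out)


# Hypothetical usage on a host where the app runs as a systemd unit:
#   print(summarize(query_unit("myapp.service")))
```

The same information normally appears in the journal for the unit as well, so this only shows which fields to look at, not a replacement for reading the journal.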