I don’t know why my pod is crashing – Investigating pod crashes


Working with OpenShift on a daily basis, I come across several situations where a pod crashes. Given my background in Java, I will focus on Java here.

Let me list a few situations and the corresponding next steps:

- Pod crashes with an OOME: the Java process used more heap than it was allowed. The JVM can generate a heap dump on the OOME, but the container might exit right away if it runs with -XX:+ExitOnOutOfMemoryError (see the sketch after this list).
- Pod crashes via the OOM killer: check the OCP node's dmesg output and look for OOM-killer messages.
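
For reference, this is roughly what those JVM flags look like on a container (a minimal sketch; the JAVA_OPTS variable name and the dump path are assumptions, they depend on your base image):

    JAVA_OPTS="-Xmx512m \
      -XX:+ExitOnOutOfMemoryError \
      -XX:+HeapDumpOnOutOfMemoryError \
      -XX:HeapDumpPath=/tmp/dumps"

With -XX:+HeapDumpOnOutOfMemoryError the JVM writes the heap dump before -XX:+ExitOnOutOfMemoryError terminates it, so you should still get an artifact to analyze.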

How do I know why my pod is crashing?

Let’s pretend you don’t know why the Java pod is crashing (a Java pod here means a pod with a single container running Java). The first step would be to determine whether the pod is hitting an OOME (inside the JVM) or suffering from the OOM killer.
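
A quick way to tell the two apart is the container's last termination state (a sketch; <pod-name> is a placeholder):

    oc get pod <pod-name> -o jsonpath='{.status.containerStatuses[0].lastState.terminated.reason}'
    oc logs --previous <pod-name>

If the reason is OOMKilled (exit code 137), the kernel killed the container from the outside; if the JVM exited on its own OOME, you will typically find the java.lang.OutOfMemoryError stack trace in the previous container's logs instead.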

An OOME is handled by the JVM itself; however, because containers usually run with -XX:+ExitOnOutOfMemoryError, the container will exit on it, which prompts the orchestrator to respawn the pod after a certain backoff period.
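
You can see that behavior locally with a minimal Java sketch (the class name and heap size are just for illustration):

    import java.util.ArrayList;
    import java.util.List;

    public class OomDemo {
        public static void main(String[] args) {
            // Keep strong references so the GC cannot reclaim anything,
            // exhausting the heap and triggering an OutOfMemoryError.
            List<byte[]> hog = new ArrayList<>();
            while (true) {
                hog.add(new byte[1024 * 1024]); // 1 MiB per iteration
            }
        }
    }

Run it with a small heap, e.g. java -Xmx64m -XX:+ExitOnOutOfMemoryError OomDemo: instead of the error propagating inside the application, the whole JVM terminates. In a container that means the pod restarts, and if it keeps happening you will see it go into CrashLoopBackOff.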

The OOM killer, on the other hand, is an external agent (the kernel on the OCP node, enforcing the cgroup limits) terminating the container when a certain condition is met, such as lack of resources: if OCP (the kubelet) needs to spawn a pod but the node doesn’t have resources, it may terminate the BestEffort QoS pods in favor of the Guaranteed ones.
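
To verify that on an OCP node, something along these lines should work (a sketch; <node-name> is a placeholder):

    oc debug node/<node-name>
    chroot /host
    dmesg -T | grep -iE 'killed process|out of memory'

A line like "Out of memory: Killed process 12345 (java)" confirms the kernel, not the JVM, ended the container.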

Or it can be a native allocation breaching the cgroup limit, causing the container to exit by being killed.
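
That is why it usually pays to leave headroom between the heap and the container limit. A sketch of the idea, assuming the image honors JAVA_OPTS (the values are illustrative):

    resources:
      limits:
        memory: "1Gi"
    env:
    - name: JAVA_OPTS
      value: "-XX:MaxRAMPercentage=75.0"

With -XX:MaxRAMPercentage=75.0 on a container-aware JDK, the heap takes roughly 75% of the 1Gi cgroup limit, leaving the rest for metaspace, thread stacks, and other native allocations that would otherwise breach the limit.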
