Working with OpenShift on a daily basis, I run into several situations where a pod crashes. Given my background in Java, I will talk about Java here.
Let me list a few situations and the next steps:
| Situation | Next steps |
| --- | --- |
| Pod crashes with OOME | The Java process uses more heap than it was supposed to. It would generate a heap dump on the OOME (with -XX:+HeapDumpOnOutOfMemoryError), but it might exit right away if -XX:+ExitOnOutOfMemoryError is set. |
| Pod crashes with the OOM killer | Check dmesg on the OCP node and look for OOM killer messages. |
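To make the first row concrete, here is a minimal sketch (the class name is hypothetical, not from the post) that exhausts the heap. Run with the HotSpot flags mentioned above, the JVM writes a heap dump and exits on the first OOME, which terminates the container.

```java
import java.util.ArrayList;
import java.util.List;

// Minimal sketch: allocate until the heap is exhausted.
// Example run:
//   java -Xmx128m -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/tmp \
//        -XX:+ExitOnOutOfMemoryError HeapExhaustion
// On the first OutOfMemoryError the JVM dumps the heap to /tmp and exits,
// so the container dies and the orchestrator restarts the pod.
public class HeapExhaustion {
    public static void main(String[] args) {
        List<byte[]> leak = new ArrayList<>();
        while (true) {
            leak.add(new byte[1024 * 1024]); // keep 1 MiB chunks reachable forever
        }
    }
}
```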
How to know why my pod is crashing?
Let's pretend you don't know why the Java pod is crashing (a Java pod here meaning a pod with a single container running a Java process). The first step is to find out whether the pod is hitting an OOME inside the JVM or being terminated by the OOM killer.
An OOME is handled by the JVM itself; however, because containers usually run with ExitOnOutOfMemoryError, the JVM exits on the first OOME, the container terminates, and the orchestrator respawns a new pod after a back-off period. A quick way to tell the two cases apart is the container's last exit code (visible in `oc describe pod`): an OOM kill shows up as exit code 137 with reason OOMKilled, while a JVM-initiated exit reports a different code.
The OOM killer, on the other hand, is an external agent (the kubelet on the OCP node, or the kernel enforcing cgroup limits) acting on the container and terminating it when a certain condition is met. One example is lack of node resources: if the kubelet needs to schedule a pod but the node has no resources left, it may evict BestEffort QoS pods rather than fail to place Guaranteed ones.
Another cause is native (off-heap) allocation breaching the cgroup memory limit, in which case the kernel kills the process and the container exits.
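As an illustration of that native-allocation case, here is a small sketch (again a hypothetical class name, not from the post) that grows off-heap memory through direct ByteBuffers. The heap itself stays nearly empty, so the heap-related flags above never fire; but if the container's memory limit sits close to -Xmx (which is roughly the default cap for direct memory), total cgroup usage can cross the limit and the process gets OOM-killed before any Java-level error appears. Setting -XX:MaxDirectMemorySize explicitly is one common mitigation.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

// Minimal sketch: grow native (off-heap) memory with direct buffers.
// Example run inside a container with a 512Mi memory limit:
//   java -Xmx384m NativeMemoryGrowth
// Each direct buffer is allocated outside the Java heap but is still
// counted by the cgroup. If total usage crosses the container limit
// before the JVM's own direct-memory cap is reached, the kernel
// OOM-kills the process and the container exits with code 137
// (no heap dump, no Java stack trace).
public class NativeMemoryGrowth {
    public static void main(String[] args) throws InterruptedException {
        List<ByteBuffer> buffers = new ArrayList<>();
        while (true) {
            buffers.add(ByteBuffer.allocateDirect(16 * 1024 * 1024)); // 16 MiB of native memory
            Thread.sleep(100); // slow the growth so it is easy to observe
        }
    }
}
```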