Back on DG/EAP 7 we used to use templates for Source-to-Image deployments. In fact I liked template deployments; of course you need to import the templates and so on, but I don't think they are bad from a functional perspective. And the Operator is so simple that it gives the user much more time to focus on what matters.
I think Helm charts will streamline this process, replacing templates. Helm is pretty interesting and very flexible in some ways.
## to install
helm install <release_name> <chart> <flags> --> flags can be in the middle as well
## to upgrade
helm upgrade <release_name> <chart> <flags> --> flags can be in the middle as well
## to uninstall
helm uninstall <release_name>
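As a sketch, a full cycle might look like the following. The repository URL, chart, and release names here are illustrative assumptions, not something from this post:

```shell
# Add a chart repository (URL assumed for illustration)
helm repo add jboss-eap https://jbossas.github.io/eap-charts/
helm repo update

# Install: flags can come before or after the chart reference
helm install my-eap jboss-eap/eap8 --set build.uri=https://github.com/example/app.git

# Upgrade the same release with a changed value
helm upgrade my-eap jboss-eap/eap8 --set deploy.replicas=3

# Remove the release and the objects it created
helm uninstall my-eap
```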
It is interesting to handle the pod's YAML directly on the StatefulSet/Route instead of through the custom resources, which is the default with the Operator. It brings a sense of flexibility, and of responsibility as well, given there is much more room for the user to make mistakes.
I love Avatar, both of them; great movies. Made by a Canadian, with astonishing images. The first came out years ago but still has impressive visuals. Pandora is amazing.
However, the second has one main problem: it is 100% plot driven, not character driven. That's not a spoiler, but be aware that the characters won't make hard choices that set up a path for adventures. It is pretty much the opposite: the plot happens despite the characters' choices.
Basically there is only one character that drives the plot: the whale, and maybe the "bad guys" as well, with one or two difficult choices. There is no big dilemma, no small elements that make the characters' choices critical to the ending.
The movie is not bad, though, but it probably won't end up as the best of the Avatar series.
At some point I gave up on the plot and was just watching the images, like the visualizations Windows Media Player used to create for music.
I mean, arguably, one can imagine several scenarios and how each character should behave, and the end result might not even be better than the final cut. But by changing some parts of the script, not the fundamentals, the result would be twice as good and the characters would be much more relatable. Because we relate to characters that make choices, even bad ones.
But the explanation is likely that there will be several more movies, so this one is just laying stepping stones so that in the next movies there will be dilemmas and hard choices.
Working with OpenShift on a daily basis, I run into several situations where a pod crashes. Given my background in Java, I will talk about Java here.
Let me list a few situations and the next steps:
- Pod crashes with OOME (in the JVM): the Java process uses more heap than it was supposed to. It would normally throw an OutOfMemoryError and generate a heap dump, but the container might exit right away given -XX:+ExitOnOutOfMemoryError.
- Pod crashes via the OOM killer: check the OCP node's dmesg and verify whether there are OOM-killer messages.
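Checking the node for OOM-killer activity can be sketched like this (the pod and node names are placeholders):

```shell
# Find which node ran the crashed pod
oc get pod <pod_name> -o wide

# Open a debug shell on that node and search the kernel ring buffer
oc debug node/<node_name>
chroot /host
dmesg -T | grep -iE 'out of memory|oom-kill|killed process'
```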
How do I know why my pod is crashing?
Let's pretend you don't know why the Java pod is crashing (a Java pod here meaning a pod with one container running Java). The first step would be to determine whether the pod is hitting OOME (in the JVM) or suffering from the OOM killer.
An OOME will be handled by the JVM itself; however, because containers usually run with ExitOnOutOfMemoryError, the container will exit, which will prompt the orchestrator to respawn new pods after a certain timeout period.
The OOM killer, on the other hand, is an external agent (the OCP node, or cgroups) acting out and terminating the container under a certain condition, such as lack of resources: if the kubelet needs to schedule a certain pod but doesn't have resources, it might terminate the BestEffort QoS pods in order to spawn Guaranteed ones.
Or it can be a native allocation breaching the cgroup limits and causing the container to exit by being killed.
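A quick way to tell the two cases apart is the container's last termination state; a sketch (pod name is a placeholder):

```shell
# "OOMKilled" means the cgroup memory limit was breached;
# "Error" plus a JVM exit code usually means the JVM exited itself
# (e.g. via ExitOnOutOfMemoryError)
oc get pod <pod_name> \
  -o jsonpath='{.status.containerStatuses[0].lastState.terminated.reason}'

# The exit code tells a similar story: 137 = SIGKILL, often the OOM killer
oc get pod <pod_name> \
  -o jsonpath='{.status.containerStatuses[0].lastState.terminated.exitCode}'
```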
Complementary to jcmd VM.native_memory, the jcmd command VM.info, which I have discussed a few times on this blog, can be an awesome tool for investigating (native) leaks. If I'm not mistaken, this feature requires 8u222 or later.
In fact, for containers, I would just get jcmd VM.info directly, since it already includes the jcmd VM.native_memory output. So VM.info can easily be used instead; the native memory info will be in the VM.info output.
jcmd PID VM.info
VM.info shows a detailed summary of the VM and the OS, including the native details and shared libraries. The native details will only appear if a Native Memory Tracking flag is set (-XX:NativeMemoryTracking=summary or -XX:NativeMemoryTracking=details); otherwise VM.info won't display that section, but the other sections will be there regardless.
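As a sketch, the whole flow inside a container might look like this (flag placement and file names are illustrative):

```shell
# NMT must be enabled at JVM startup for the native memory section to appear
JAVA_OPTS="-XX:NativeMemoryTracking=summary"   # plus your usual flags

# Inside the container: list running JVMs, then dump VM.info
jcmd                         # with no arguments, lists JVM PIDs
jcmd <PID> VM.info > vm_info.txt

# VM.native_memory is still available on its own if you only want NMT data
jcmd <PID> VM.native_memory summary
```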
I was reading the book Hybrid Cloud Apps with OCP again. That's an awesome book by Michael Elder et al., and the only comment I would add is about networking/DNS: the book could say more about networking and how it changed from OCP 3.x to OCP 4.x.
In scenarios where the Kubernetes/OCP nodes lack resources, the pod Quality of Service (QoS) can play a role in the node eviction process and in how pods get terminated. The QoS classes set on pods are Guaranteed, Burstable, and BestEffort.
We set that on the Deployment/DeploymentConfig for usual deployments (no Operator involved).
In an Operator deployment, for example with the Infinispan Operator, setting the container specs in the Infinispan CR with the same values for cpu and memory will make the pod Guaranteed.
And this setting can have a huge impact on the stability of critical workloads in case the OCP nodes decide to start killing pods: BestEffort pods will be first on the list, followed by Burstable, and Guaranteed pods will be last on the kill list.
Although a small setting, this can have huge consequences for OOM kills (or for avoiding them).
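As a minimal sketch (names and values here are illustrative), equal requests and limits on every container is what yields the Guaranteed class:

```yaml
# Requests == limits on all containers -> QoS class "Guaranteed".
# Requests < limits -> "Burstable"; no requests/limits at all -> "BestEffort".
apiVersion: v1
kind: Pod
metadata:
  name: example-app
spec:
  containers:
  - name: app
    image: quay.io/example/app:latest
    resources:
      requests:
        cpu: "500m"
        memory: "1Gi"
      limits:
        cpu: "500m"
        memory: "1Gi"
```

You can confirm the assigned class on the running pod with `oc get pod example-app -o jsonpath='{.status.qosClass}'`.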
Recently I've been working on some interesting OCP deployments with Service Mesh. It is a very powerful and, I'd say, complicated subject; even for experts on the matter it doesn't seem trivial.
The context here is Istio, just to be clear, so I'm talking about the Cloud Native Computing Foundation project. Service Mesh is basically an extension of OCP that provides customizable features. In this matter, Service Mesh adds a lot of flexibility and enables centralized control for handling microservices.
Features of Service Mesh include load balancing, full/automatic authentication, canary releases, access control, and even end-to-end authentication (via Istio mTLS). Everything in one place.
Objectively, Service Mesh adds a transparent transport layer, all without any application change. To do that, Service Mesh captures/intercepts traffic between services and can modify, redirect, or create new requests to other services.
To do this interception/capture of requests, Service Mesh relies on the Envoy sidecar, which is a container running alongside the application in the same pod, hence a sidecar.
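In OpenShift Service Mesh, injecting that Envoy sidecar is opt-in per workload, via an annotation on the pod template. A sketch (workload and image names are illustrative):

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: eap-app
spec:
  selector:
    matchLabels:
      app: eap-app
  template:
    metadata:
      labels:
        app: eap-app
      annotations:
        # Asks the mesh to inject the Envoy proxy container into each pod
        sidecar.istio.io/inject: "true"
    spec:
      containers:
      - name: eap-app
        image: quay.io/example/eap-app:latest
```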
For deployments such as JBoss EAP/WildFly this can be very interesting, to be able to control communication and establish a level of network control beyond what the services (e.g. eap-app-ping for clustering) already provide.
On the other hand, some architectures are coming up that use Istio without the sidecar, called sidecarless; one example is Ambient Mesh. Sidecarless implementations can be useful for environments where instrumenting the pod increases its complexity (deployment and instrumentation) and where it can be simpler not to instrument the pod at all.
Get the namespace inspect for troubleshooting OCP issues.
One of the most useful tools in OCP, together with must-gather, I think, is the inspect.
DevOps in OCP can be chaotic sometimes, with so many pods and operators and so on. But that's exactly why the inspect can be a core component for debugging pods/services/deployments in OCP.
So, to avoid all of this going over your head, just get the inspect first. From there you can do a top-down approach, starting with the Deployment(Config) and moving to the services and pods, or bottom-up, meaning the pods' YAML/logs first, then moving to the DeploymentConfig.
I mean, I'm trying to lay out the idea of getting the inspect – via oc adm inspect ns/$namespace – so it can be a leading indicator for several issues: pod crashes, application issues, pod resource starvation. What happens if the application logs are OK, but the YAML shows the service's label is wrong?
This avoids, for example, only seeing the pod logs and forgetting about the resource allocations, in terms of CPU and memory.
It enables a more global review: pod YAML, core ConfigMaps, services, deployments, everything at once.
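Collecting and browsing an inspect is basically a one-liner; a sketch (namespace name is a placeholder, and the exact directory layout may vary by OCP version):

```shell
# Dump the whole namespace: pod yamls, logs, events, services, deployments...
oc adm inspect ns/my-namespace --dest-dir=./inspect.my-namespace

# Pod definitions, logs and events end up under namespaces/<name>/
ls ./inspect.my-namespace/namespaces/my-namespace/
```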
For deployments that are more and more complex, with several components, and sometimes with Service Mesh, the Istio sidecar will be inside the pod, and the user can see the sidecar containers and access logs (set on the SMCP, the Service Mesh Control Plane). I will write some presentations on this topic.
I'm sure the above is not consensus; some people will opt for getting the pod YAMLs or just the pod logs first, and only then get the inspect. But for OCP problems I start with the inspect, because I can see the complete deployment, see all pods in the namespace, and do the work once. So almost by definition you will have an overview of the data, which is much better than a narrow view of only the pod logs.
But also because of its awesome performance for large heaps, and we are talking about 50 GB+ heaps, for example. It is also very simple to understand: the pause times don't scale up with larger heaps, and so on.
However, as I've explained before, Shenandoah (in its non-generational form) is not applicable to all situations and workloads; there will be workloads whose performance is hurt more than helped by Shenandoah, given it is not generational. Being non-generational is a core part of the algorithm and helps considerably in several aspects, but can hurt in other, more specific ones.
An example is when a high number of very short-lived objects is created at random periods, which leads to all the GC threads kicking in and running at the same time and can lead to several subsequent full pauses in a row. For those cases a generational collector, like G1GC or Parallel, would likely handle the situation better, by splitting the heap into generations. For those (generational) workloads Amazon (Corretto) is developing its Generational Shenandoah.
On this topic as well, I've seen some comments/discussions that Shenandoah will eventually surpass everything and should take over G1GC/Parallel workloads, similar to how G1GC replaced CMS. That wouldn't be the case, given some workloads perform better with generational collectors. In this aspect, Shenandoah is not necessarily an "improved" G1GC, so I won't suggest that all workloads be moved to Shenandoah.
Consequently, there needs to be due diligence from the development team to verify how a non-generational collector is handling the workload, in terms of latency, throughput, and last (but not least) footprint, which is the aspect most often sacrificed when developing in Java or any garbage-collected environment.
But this can be generalized to pretty much anything in the JVM/Java: no magic JVM flag will cut the latency in half (except in very specific cases, for example where a certain collector is more adequate than another).
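For completeness, turning Shenandoah on is just a flag (on JDK 15+; older JDKs also needed -XX:+UnlockExperimentalVMOptions), and GC logging is how you would do that due diligence. A sketch, with an assumed application jar name:

```shell
# Enable Shenandoah and log GC activity to measure the latency/throughput impact
java -XX:+UseShenandoahGC \
     -Xlog:gc*:file=gc.log:time,uptime,level,tags \
     -jar app.jar
```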
During the pandemic I watched a lot of Petter's content on the Mentour Pilot YouTube channel. The channel covers several aspects of aviation: deep technical, procedural, and behavioural analysis, all within aviation themes.
I can say Petter taught me several important life lessons: beware of confirmation bias and prejudice, verify your assumptions, keep learning always, and trust the team.
One important thing I've learned from him was the use of the PIOSEE decision model and how pilots use it in critical situations. I show it below:
- Problem: swiftly identify the problem at hand.
- Information: gather information about the problem that is occurring.
- Options: with the gathered information, you and your team generate options to solve the problem.
- Select: select an option after efficiently evaluating the alternatives.
- Execute: options are worthless without swift and effective execution.
- Evaluate: after execution, you and your team evaluate the process, noting places for improvement.
PIOSEE Model – PIOSEE is similar to the FORDEC model, given it has the same number of stages.
This decision-making model can be very useful in several situations and can be applied to IT troubleshooting as well – from war rooms (where an actual system malfunction has taken the system off-line) to upgrade and migration procedures.
First, defining the problem can be very useful and is the first step towards understanding it. A well-defined problem will be troubleshooted much better and faster. Sometimes the problem definition can be much harder than finding the actual solution. Knowing the problem, we will be able to know which resources (human and material/IT) are needed to solve it.
Then comes collecting the right information: it can be an inspect from OpenShift (oc adm inspect), a server report (from Infinispan), or even a few heap/thread dumps or VM.info outputs for Java applications (deployed on Kubernetes or not). Or even collecting custom resources, in case we need to see the API/resources created by some Operator (Service Mesh, Data Grid Operator, MTA Operator), and so on. Knowing what data to collect for each situation will result in a much faster troubleshooting phase.
Later comes the analysis of the data, provided all the information is collected. This can be a top-down approach, from custom resources (for example, in case an Operator is used) down to the very low level, which can be kernel tracing data, kernel audit logs, or even heap-dump-specific interpretation. This leads to the analysis of the options we have: restart, reboot, upgrade, downgrade, remove certain JVM flags, add certain JVM flags, rewrite the system.
The selection of an option, with its trade-offs, is the next step in the decision model. One needs to understand the data, interpret it, and then select the option. I think it is very important to consider two aspects at this stage: the trade-offs and the time to implement. If an option has too many trade-offs, other options should be considered. Once the candidate options are all listed, the selection should be done at once.
Later, the execution of the option should be done thoroughly, with the right resources, following the procedures (with or without checklists). Of course it is better if those can be tested, but sometimes the procedure is sui generis: it is the first time it is happening and it might not have been tested/prepared before.
Finally comes the evaluation of the system after the procedure. This includes visual references; in Java particularly, jcmd thread/heap data will provide enough information. Even if the data is not conclusive, those references will provide clues on whether the system is performing well or not. If more information is required, more data can be collected from the system, and this process can be iterated until the (initial) problem is 100% solved.
I think trying to establish procedures, methods, and preparations for critical situations can help considerably here. In this matter, the QA/QE of a system/Java application can avoid problems, and it is very useful, if not essential, before deploying to production. Moreover, having defined in advance which procedures to run and how long they should take can put the system back online much faster.