The Importance of Node Rotations, Image Tags, and a Good Piece of Cake

Imagine being a picky eater, and one day you discover a restaurant that serves the perfect cake, so good it feels like it was made just for you. The cake is a symphony of flavors: its sweetness and richness are in balance, and the layers are beautifully crafted, making it a true masterpiece.

Until one day…

You order that same beloved cake, with the same name and description. But as you take that first anticipated bite, something feels off. The flavor is different; they changed the ingredients! It still looks exactly the same, but that small tweak in the recipe ruins everything.
This is what happened to me one day in my cluster.

I had been relying on the “latest” tag for some of my Docker images for months without even noticing. Things were working smoothly until a problem in the cluster forced me to rotate one of the nodes. After the node was reinitialized and my apps were redeployed, one of my applications suddenly started producing a wave of new bugs, including some compatibility issues, and I wasted a lot of time trying to understand why.

Here’s what happened:

When a node is rotated, the Kubelet (the agent on each node responsible for managing containers, among other things) needs to pull the Docker images for all the pods scheduled to run on that node. If an image is already cached on the node and the image pull policy allows it, the Kubelet simply reuses the cached copy.
However, when the node is fresh or has been reset, there is no cache, so the Kubelet has to pull every image again from the container registry.
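
To make the pull decision concrete, here is a minimal Python sketch of it. It is a simplified model, not the Kubelet's actual code: the function names are hypothetical, and "contacts the registry" stands in for the Kubelet resolving the image against the registry. The defaulting rules follow the Kubernetes documentation: when no policy is set, an image tagged “latest” or with no tag gets Always, while any other tag or a digest gets IfNotPresent.

```python
from typing import Optional, Tuple


def split_image_ref(image: str) -> Tuple[str, Optional[str], Optional[str]]:
    """Split 'name[:tag][@digest]' into (name, tag, digest)."""
    name, _, digest = image.partition("@")
    # A colon only marks a tag if it comes after the last slash;
    # otherwise it belongs to a registry port such as 'host:5000/app'.
    if name.rfind(":") > name.rfind("/"):
        name, _, tag = name.rpartition(":")
        return name, tag, digest or None
    return name, None, digest or None


def default_pull_policy(image: str) -> str:
    """Policy Kubernetes assigns when imagePullPolicy is omitted."""
    _, tag, digest = split_image_ref(image)
    if digest:
        return "IfNotPresent"
    if tag is None or tag == "latest":
        return "Always"
    return "IfNotPresent"


def contacts_registry(image: str, cached_on_node: bool,
                      policy: Optional[str] = None) -> bool:
    """Simplified model: True if starting a container goes to the registry."""
    policy = policy or default_pull_policy(image)
    if policy == "Always":
        return True
    if policy == "Never":
        return False
    return not cached_on_node  # IfNotPresent


print(contacts_registry("myapp:1.4.2", cached_on_node=True))   # False
print(contacts_registry("myapp:1.4.2", cached_on_node=False))  # True
print(contacts_registry("myapp:latest", cached_on_node=True))  # True
```

The second call is the node rotation case: a fresh node has an empty cache, so even a pinned image has to come from the registry. The difference is that a pinned tag should still resolve to the same image.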

Understanding node rotations

Node rotation is a critical part of maintaining a healthy and resilient Kubernetes cluster. But what exactly does it involve?

A Kubernetes cluster is composed of multiple nodes, each acting as a worker machine that runs your application workloads in the form of containers. Over time, these nodes might need to be replaced or upgraded for various reasons—such as applying security patches, updating the underlying OS, or simply replacing an aging server. This process is what we refer to as node rotation.

When you rotate a node, it essentially means you’re removing an old node from the cluster and adding a new one. This new node will take over the workloads that were previously running on the old node. While this sounds straightforward, the implications for your applications can be significant.

The Kubelet’s role during node rotation:

Each node in a Kubernetes cluster runs an agent called the Kubelet. The Kubelet is responsible for ensuring that the containers defined in your pods are running as expected. When a node is rotated, the Kubelet on the new node takes over the responsibility of pulling the necessary Docker images and running the containers.

Here’s where things can get tricky: If your deployment is using the “latest” tag for your Docker images, the Kubelet will pull the most recent version of that image from the container registry. But what happens if that “latest” version has changed since the last time it was pulled? You might end up with a different version of your application running on the new node, leading to inconsistencies across your cluster.

In my case, that’s exactly what happened. The new node pulled an updated “latest” image, which had changes I wasn’t aware of. As a result, my application started exhibiting unexpected behavior—just like that cake that suddenly didn’t taste the way it used to.
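
The drift can be reproduced with a toy model. The sketch below is purely illustrative: the `Registry` class, the image names, and the digests are all made up, and the only behavior it models is that pushing to an existing tag re-points that tag.

```python
class Registry:
    """Toy registry: maps a tag to whatever digest it currently points at."""

    def __init__(self):
        self.tags = {}

    def push(self, image, digest):
        self.tags[image] = digest  # pushing to an existing tag re-points it

    def resolve(self, image):
        return self.tags[image]


def start_pod(registry, image):
    """A fresh node has no cache, so the tag is resolved at start time."""
    return registry.resolve(image)


registry = Registry()
registry.push("myapp:latest", "sha256:aaa")  # hypothetical digests
registry.push("myapp:1.4.2", "sha256:aaa")

old_node = start_pod(registry, "myapp:latest")

# Later, a new build is pushed under the same "latest" tag.
registry.push("myapp:latest", "sha256:bbb")
registry.push("myapp:1.5.0", "sha256:bbb")

new_node = start_pod(registry, "myapp:latest")
pinned_new_node = start_pod(registry, "myapp:1.4.2")

print(old_node == new_node)         # False: same tag, different image
print(old_node == pinned_new_node)  # True: the pinned tag still matches
```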

Why node rotations matter

Node rotations aren’t just routine maintenance tasks; they are vital for the overall health and stability of your cluster. Regularly rotating nodes helps ensure that your infrastructure stays up to date with the latest security patches, software updates, and hardware improvements. It also helps prevent issues like resource exhaustion or degradation over time.

However, node rotations also come with risks. If your deployment strategy isn’t carefully planned, a node rotation can inadvertently introduce new bugs or destabilize your applications. This is particularly true when it comes to image tags. Using “latest” might seem convenient, but it can lead to the kind of surprises you don’t want in a production environment.

Best practices for node rotations and image tags

To avoid these pitfalls, here are some best practices for handling node rotations:

Use Specific Image Tags: This is the most important best practice. Avoid using the “latest” tag. Instead, use specific tags that correspond to tested and stable versions of your application. This ensures that even when a node is rotated, the Kubelet pulls the exact version of the image that you expect.
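Implement Image Pull Policies: Use Kubernetes image pull policies to control when images are pulled. For example, you can set the policy to IfNotPresent to use the locally cached image if available, or to Always to have the Kubelet check the registry every time a container starts. If you omit the policy, Kubernetes defaults to Always for images tagged “latest” or with no tag, and to IfNotPresent for any other tag. Choose a policy that aligns with your deployment strategy.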
Test Before Deployment: It may sound obvious, but always test your application with the specific image version you intend to deploy. This allows you to catch any issues before they reach production.
Automate Node Rotations: Use automation tools to manage node rotations in a controlled manner. This can help minimize downtime and ensure that rotations are performed consistently across your cluster.
Monitor and Rollback: Keep a close eye on your cluster during and after node rotations. If something goes wrong, have a rollback plan to quickly revert to a known good state.
Integrate CI/CD Practices: Incorporate Continuous Integration and Continuous Deployment (CI/CD) pipelines into your workflow to automate the building, testing, and deployment of your applications. This helps with:

Automated Image Tagging: When dealing with internal customized images, CI/CD pipelines can assign unique version numbers or commit hashes as image tags during the build process, ensuring consistency and traceability.
Consistent Deployments: Automated pipelines reduce the risk of human error, ensuring that the correct versions of your applications are deployed every time.
Streamlined Testing: Integrate automated testing into your CI/CD pipeline to catch issues early, preventing them from reaching production.
Efficient Rollbacks: CI/CD tools can facilitate quick rollbacks in case of deployment issues, which is crucial during node rotations.
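
Two of these practices are easy to sketch in a few lines of Python: generating a unique tag at build time, and checking a pod spec for images that are not pinned. This is a minimal sketch under stated assumptions. Both function names are hypothetical, the version-plus-short-hash tag format is just one possible convention, and the pod spec is a plain dictionary that mirrors the `containers` and `initContainers` fields of a Kubernetes pod spec.

```python
from typing import Dict, List


def build_image_tag(version: str, commit_sha: str) -> str:
    """Combine a version and a short commit hash into a unique tag."""
    return f"{version}-{commit_sha[:7]}"


def find_unpinned_images(pod_spec: Dict) -> List[str]:
    """Return images that use 'latest' or no tag and have no digest."""
    unpinned = []
    containers = pod_spec.get("initContainers", []) + pod_spec.get("containers", [])
    for container in containers:
        image = container["image"]
        name, _, digest = image.partition("@")
        if digest:
            continue  # a digest always identifies one exact image
        # A colon before the last slash is a registry port, not a tag.
        has_tag = name.rfind(":") > name.rfind("/")
        tag = name.rpartition(":")[2] if has_tag else None
        if tag is None or tag == "latest":
            unpinned.append(image)
    return unpinned


# Hypothetical registry, image names, and commit hash.
pod_spec = {
    "containers": [
        {"name": "web",
         "image": "registry.local:5000/web:" + build_image_tag("1.4.2", "9f3c2ab71d")},
        {"name": "worker", "image": "registry.local:5000/worker:latest"},
        {"name": "cache", "image": "redis"},
    ]
}

print(find_unpinned_images(pod_spec))
# ['registry.local:5000/worker:latest', 'redis']
```

A check like this could run as a pipeline step that fails the build whenever the returned list is not empty, so an unpinned image never reaches the cluster.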

Conclusion:

Just like with your favorite cake, consistency is key. Node rotations are essential for maintaining a robust and secure Kubernetes cluster, but they require careful management. By using specific image tags and following best practices, you can ensure that your applications remain stable and reliable, no matter how many times you rotate your nodes.

So, next time you’re about to deploy that “latest” image, think of that perfect cake and remember: a little extra care with your tags can save you from a world of hurt.
