Container design for Kubernetes

The weakest link

There is a saying that “a chain is only as strong as its weakest link”. One of the proverbial “links” in a Kubernetes-deployed application is the container inside which the application runs. But how does the container impact the way the application is deployed and the way it operates on a Kubernetes cluster? This article is about exactly that.

The Dockerfile

The Dockerfile is the “design plan” of the container. A sample Dockerfile may look like this:

FROM ubuntu

COPY app /
RUN chmod +x /app

CMD ["/app"]


FROM ubuntu

This means that the base image of the container will be ubuntu, meaning, the ubuntu container image will be downloaded when building the container, and everything else will be appended to it.


COPY app /

The binary app will be copied into an additional container layer and appended to the ubuntu layer.


RUN chmod +x /app

After copying the binary app to the container, the execute permission will be applied to it so it may be run inside the container.


CMD ["/app"]

This specifies the command which will be executed when the container starts.


The above Dockerfile acts as a stable source of truth that is executed every time the container image is built, resulting in a stable and reliable container design-wise, with the only variable being the binary (/app), which changes throughout the different versions of the application.


Building a container image from a Dockerfile is quite simple; running the following command from inside the folder in which the Dockerfile resides will build the image (note the trailing dot, which sets the build context to the current directory):


docker build -t myrepository/examplesdirectory/mycontainer:version-2 .

Base Image

One of the main discussions regarding building containers is about which image to use as a base image.

The base image determines the filesystem structure inside the container in which the application will run – nothing more and nothing less. Since the processes inside a container run using the kernel of the node on which they are scheduled, the only thing a process running inside a container actually consumes from the image is its filesystem structure and files.


The file system structure inside the container may be used for:

  • Configuration
    For example, the file /etc/resolv.conf can be used to define a DNS search suffix for the application.
  • Loading libraries
    For example, an application which requires loading a shared library (such as libpcap).
  • Serving static content
    For example, HTML files stored inside the container (caveat: never write data inside a container).
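Each of these uses maps to files baked into the image at build time. As an illustrative sketch (the file and directory names here are assumed, not taken from the original):

```dockerfile
FROM ubuntu

# configuration: a resolver config defining a DNS search suffix
COPY resolv.conf /etc/resolv.conf

# static content: HTML files served read-only by the application
COPY public/ /var/www/html/

COPY app /
RUN chmod +x /app

CMD ["/app"]
```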


As a rule of thumb, the fewer dependencies the application running inside the container has on its filesystem, the more stable it will remain throughout the version progression of the application.

Security considerations

A container built upon a base image will contain, at the very least, everything included in that base image. For example, if the base image ships a vulnerable library, that library will also exist in any container built on top of it (unless specifically deleted in one of the layers).


The above makes choosing a base image a very important task security-wise as well: choosing a base image which contains vulnerable binaries or libraries means those vulnerabilities will exist in the container running your application.


Another very important thing to remember is that even non-vulnerable binaries present in the base image may be used by an attacker to attack other containers inside your cluster (tools such as netcat (nc), curl, dig, etc.).


Hardening a container is a mandatory stage, even after choosing the most secure base image available at build time, in order to minimize future vulnerability exposure and reduce the chances of an attacker being able to attack other containers inside the cluster.
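What hardening looks like varies per base image; as a minimal sketch against an ubuntu base (the paths removed here are illustrative – adjust to what your image actually ships):

```dockerfile
FROM ubuntu

# remove tools an attacker could abuse if the container is compromised
RUN rm -f /usr/bin/curl /usr/bin/wget /bin/nc /usr/bin/dig

COPY app /
RUN chmod +x /app

# run the application as a non-root user
USER 1001

CMD ["/app"]
```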


The Pod – container relationship

In Kubernetes, containers run “inside” Pods. The reason for the quotes around “inside” is that it is not exactly inside: the division is done by kernel namespaces and cgroups, which is a bit too deep a topic for this article, but as an architectural consideration we can treat containers as running inside Pods. The smallest entity Kubernetes deals with is a Pod, and it is the only mortal entity in Kubernetes.


Pods have one or more containers running inside them, and the networking of a Pod is shared by all of its containers. The Pod’s IP is used to access all of its containers remotely, which also means that exposing a port inside a container actually exposes that port on the Pod’s IP address.


That means that if two containers running in the same Pod expose the same port – for example, if both containers try to bind to port 8088 – the first container to bind the port will succeed and the second will fail with an error message stating that it cannot bind the port, leaving one container in the Pod not running.
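The clash can be sketched as a Pod manifest (broken by design – names and image are illustrative, reusing the image built earlier):

```yaml
# both containers try to bind port 8088 inside the Pod's shared
# network namespace – whichever binds second fails at runtime
apiVersion: v1
kind: Pod
metadata:
  name: port-clash
spec:
  containers:
  - name: first
    image: myrepository/examplesdirectory/mycontainer:version-2
    ports:
    - containerPort: 8088
  - name: second
    image: myrepository/examplesdirectory/mycontainer:version-2
    ports:
    - containerPort: 8088
```

Note that declaring containerPort is informational; the actual failure happens when the second process calls bind() on the already-taken port.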

Single process inside a container

Kubernetes queries the container runtime periodically using the PLEG (Pod Lifecycle Event Generator) mechanism. PLEG lets the kubelet process running on every Kubernetes node know the state of all the containers running on that node.

If a container dies, Kubernetes needs to know about it and react accordingly – for example, changing the Pod to a NotReady state while the container restarts.


How does the above relate to a single process inside a container?

Imagine a container which starts process1, which in turn runs process2 – the latter being the actual application.


The container will keep running and will not stop as long as process1 is still running – but what happens if process2 crashes, rendering the application running inside the container useless?

PLEG will poll the container state periodically and will have no indication that the application is, in fact, useless. The container will not restart, and the Pod will remain in the Running state even though the application is not functioning.


Having a single main process inside a container is the best practice for running containers on Kubernetes. This does not mean that child processes should not be used – merely that if the application’s functionality is compromised, the main process should exit, allowing the container to be restarted and Kubernetes to be notified (via PLEG) about the change in state.
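The pattern above can be sketched in code. This is a minimal, hypothetical supervisor in Go (assuming the application is a compiled binary, as the scratch-container section below implies); the child command here is a stand-in shell snippet, not the real application:

```go
package main

import (
	"fmt"
	"os"
	"os/exec"
)

// supervise starts the application as a child process and blocks until
// it exits, so the container's main process lives and dies with it.
func supervise(path string, args ...string) error {
	cmd := exec.Command(path, args...)
	cmd.Stdout = os.Stdout
	cmd.Stderr = os.Stderr
	return cmd.Run() // Run = Start + Wait
}

func main() {
	// Stand-in child: a shell command that exits with an error,
	// simulating the real application (e.g. /app) crashing.
	if err := supervise("sh", "-c", "exit 1"); err != nil {
		fmt.Println("worker died:", err)
		// In a real container the main process would now exit with a
		// non-zero code, so Kubernetes (via PLEG) sees the container
		// die and can restart it.
	}
}
```

Because the parent blocks in Wait and exits when the child does, the child’s crash is never hidden from the kubelet the way it would be with a lingering wrapper process.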


Scratch containers

Scratch containers mean exactly that: the base image is set to scratch, meaning there is no filesystem in the container to begin with. A Dockerfile using a scratch base image might look like this:

FROM scratch

COPY --chmod=0755 main /

CMD ["/main"]


Note that RUN cannot be used here: a scratch image contains no shell to run chmod in, so the execute permission is applied at COPY time instead (COPY --chmod requires BuildKit, the default builder in current Docker versions).
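Because a scratch image also contains no libraries, the binary has to be statically linked. A common pattern for producing such an image is a multi-stage build – a sketch assuming a Go application (stage name and paths are illustrative):

```dockerfile
# build stage: compile a static binary (CGO disabled, no libc needed)
FROM golang:1.22 AS build
WORKDIR /src
COPY . .
RUN CGO_ENABLED=0 go build -o /main .

# final stage: nothing but the binary itself
FROM scratch
COPY --from=build /main /
CMD ["/main"]
```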


As containers do not really need a filesystem to run – they are actually run by the node’s kernel – having no filesystem is much more secure and much more stable, if it can be achieved for the application in question.


An application inside a container which is not dependent on external files or configurations will not be affected by any changes in base image.

For example, the ubuntu 20.04 base image might be the choice you went with for your container, but what will be the effect of upgrading the base image to ubuntu 22.04? Or of being forced to change the base image due to regulatory or company policy, or even for a specific customer?


Security-wise, scratch containers are as secure as a container can get: if you have no filesystem, you have no binaries or libraries inside the container.


Using a scratch container does not mean you cannot mount PVCs, Secrets and even ConfigMaps onto the container – it just means that by default there is no filesystem, and the only thing inside the container will be your binary.
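For instance, a scratch-based container can still receive its configuration from a ConfigMap mounted as a volume – a sketch (resource names and mount path are illustrative, reusing the image name from earlier):

```yaml
# a scratch-based container whose only files, besides the binary,
# come from a ConfigMap mounted at /etc/app
apiVersion: v1
kind: Pod
metadata:
  name: example-scratch
spec:
  containers:
  - name: app
    image: myrepository/examplesdirectory/mycontainer:version-2
    volumeMounts:
    - name: config
      mountPath: /etc/app
  volumes:
  - name: config
    configMap:
      name: example-config
```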


The caveat of using a scratch container is the same as its advantage: there is no filesystem inside the container, hence there are no binaries – not even a basic shell, let alone tools such as curl, dig and others – so take that into account in your debugging process.