Containers Deconstructed

Virtualization has taken the world by storm. Much of the industry has moved away from dedicated physical machines to virtual environments, whether in the cloud or in labs.

How does virtualization help?

Virtualization helps you use your resources more efficiently. Most of the time, when a workload runs alone on a physical machine, much of the compute power, memory, and so on remains unused and is therefore wasted. In the virtualization world, we use the same physical infrastructure to run multiple virtual machines (VMs) in parallel. Each VM has its own guest OS and set of apps running on it, providing an isolated environment for users on the same machine. So the benefits are multifold:

1. Efficient use of compute, memory, network, and other resources

2. Isolated environments

“You’ll never reach perfection because there’s always room for improvement. Yet get along the way to perfection, you’ll learn to get better.” ― Hlovate, Versus.

Life was good with VMs until the day it dawned on mankind that there could be a lighter-weight solution to the problem: CONTAINERS.

Let’s dig a little deeper and explore the what and how of containers.

Container vs VM : What’s the difference

The above figure elucidates the major difference between VMs and containers:

VMs focus on virtualizing hardware: multiple OSes run on top of the same hardware.

Containers focus on virtualizing the OS: multiple workloads run on top of a single OS.

How are containers implemented?

I started with the Linux source code, searching for LXC (Linux containers). But to my surprise, I found nothing.

First learning: CONTAINERS are not Linux native constructs.

A little more research (synonymous with googling), and here it comes:

Namespaces
Control groups

Namespaces

The first thing to achieve when looking for virtualization is ‘isolation’.

A process running in my virtualized environment should be separated from the rest of the processes.

Linux namespaces are well suited to this purpose. A namespace defines the boundaries of a process’ “awareness” of what else is running around it.

Each process running on your Linux machine is identified by a process ID (PID), and each process belongs to a PID namespace. Processes in the same namespace can see one another because their PIDs are resolved within that namespace.

A Namespace experiment

Create a new PID namespace
Run bash in the new namespace
Observe how many processes are running in the new namespace

Major takeaway from this experiment:

Processes in one namespace don’t have access outside that namespace. PIDs in different namespaces are unable to interact with one another by default because they are running in a different context, or namespace.

Containers use this to provide isolation. Processes running in a “container” under one namespace cannot access information outside that container or inside a different container.

Control Groups

Control groups (cgroups) let you limit and meter physical resources per process or group of processes.

For those interested in the nitty-gritty of cgroups, refer to the kernel documentation here.

cgroups are hierarchical in structure. Unlike the Linux process hierarchy tree, which is always rooted at the ‘init’ process, cgroups can have multiple independent hierarchy trees. In fact, each subsystem has a tree of its own: cpu has its own tree, memory has its own, and so on. Shown below are the different subsystems. You can find more details about each subsystem here.

Interaction with cgroups is usually done through the cgroup virtual filesystem mounted at /sys/fs/cgroup.

A cgroup experiment

Let’s create a ‘hello-world’ C program, run it and observe the CPU cores on which it runs.

Following is the program that we are going to run.

#include <stdio.h>

int main() {
    // Infinite while loop
    while (1)
        printf("Hello\n");
}

Observe the PSR value in the output, which shows the current CPU core.

Next we will try to restrict the program to CPU core 3 using cgroups.

First, we create a new cgroup named ‘testGrp’ in the cpuset subsystem.
Next, we assign the relevant parameter to the new cgroup, namely cpuset.cpus=<core number>.
Finally, we assign the hello-world program to that cgroup and execute it.

We observe that ‘testGrp’ has been created in the cgroup directory under sysfs and that the PSR holds steady at 3.

Similarly, various other constraints can be applied to processes, depending on the subsystem settings.

Major takeaways from this experiment:

The cgroup hierarchy is represented by directories in the sysfs file system.
A process can be assigned to a cgroup by writing its PID to the ‘tasks’ file in that cgroup’s directory. For example, in our case the file was /sys/fs/cgroup/cpuset/testGrp/tasks, and cgexec wrote the PID value to it.

Namespaces and cgroups are the major constructs used for containers, but they are not the only ones. Other Linux native constructs, such as seccomp-bpf, are used for more advanced container features.

Interesting fact

When the system boots up it has a PID namespace in which all the processes are rooted at PID 1 (init).

The cgroup hierarchy has a single node, and all processes are part of this cgroup.

So it turns out that even though no explicit containers are defined, the system itself is running in a container, albeit one with no restrictions on system resources.

Containerization platforms like Docker use these same principles to let users containerize software with ease.
