Glossary on Linux container technology, runtimes and orchestrators

Microservices
Lots of "slim" and "autonomous" processes, separately scalable and, in the best case, replicable.

Pro: scalability and resilience
Con: complexity and traceability/transparency (Zipkin → tracing tool)
Container / VMs
Container technologies (here: Docker, rkt, etc.) use Linux namespaces to enable runtime isolation for processes on the underlying OS (read: the underlying kernel; LXC, runc, etc.).
Example Namespaces are:
– mnt
– pid
– network
– user
– etc. (uts, ipc)

A little bit like chroot (→ mnt), but much more granular.
Different from, e.g., VMs (hypervisor on ring X + dedicated OS, separate kernel) or JBoss-style "containers" (runtime environments that sit somewhere in between on the OS, with isolation through standard OS utilities).

Pro containers: saves resources, replicable (e.g. images built via a pipeline)
Pro VMs: security
Namespaces + cgroups
Linux namespaces:
"quasi pseudo virtual machines" (an overlay for the filesystem (mnt), network, PIDs, etc.).
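A quick way to see which namespaces your current shell lives in is a look at /proc (lsns ships with util-linux):

ls -l /proc/$$/ns    # each entry is a symlink to a namespace inode (mnt, pid, net, user, ...)
sudo lsns            # lists all namespaces on the system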

build one yourself:
"unshare --fork --pid --mount-proc bash"
(forks into a new PID namespace via the unshare syscall and starts a bash there)

there you go:
PID 1 in a new space (open a second shell to see the first one with ps).
"ps aux" → note the pts and call "ps aux" from outside the new namespace to see its "real" PID.
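A minimal sketch of the whole experiment (the forked bash is a child of the unshare process, which is one way to spot its "real" PID on the host):

sudo unshare --fork --pid --mount-proc bash    # new PID namespace with its own /proc
echo $$                                        # prints 1
ps aux                                         # shows only the processes of the new namespace

# in a second, normal shell on the host:
ps -ef --forest | grep -A1 "unshare --fork"    # the bash right below unshare carries the host PID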

nsenter:
runs a program in the namespace of another process (from the outside, read: from the host OS; just look up the PID and go ahead).
Example:

"sudo nsenter -t 13582 --pid htop"
(-t = PID of the "namespace bash", --pid = type of namespace to enter, followed by the program to run)
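nsenter can also enter several namespace types at once; a sketch (13582 is again just the example PID of the namespaced bash):

sudo nsenter -t 13582 --pid --mount bash    # enter its PID and mount namespaces and start a shell there
ps aux                                      # inside, ps only sees what lives in that namespace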

cgroups:
they limit and/or prioritize the resources of the respective space (group of processes)
(similar to nice or quota).
Example:
"cgcreate -a nkalle -g memory:memgrp"
(creates a cgroup; -a = allowed user, -g = controller (in this case: memory):path (pick your own))
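A small sketch with the libcgroup tools (cgroup v1 controller names; group name and limit are arbitrary):

sudo cgcreate -a nkalle -g memory:memgrp          # create the group, user nkalle may manage it
sudo cgset -r memory.limit_in_bytes=64M memgrp    # cap the group at 64 MiB
sudo cgexec -g memory:memgrp bash                 # everything started from this bash obeys the limit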

seccomp-bpf:
Use it to limit the syscalls available in your namespace (read, write, kill, etc.).
In Docker it is exposed via "--security-opt" and in KVM/QEMU through "--sandbox".
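In Docker that looks roughly like this (profile.json stands for a custom seccomp profile; without the option, Docker's default profile applies):

docker run --rm -it --security-opt seccomp=profile.json debian bash    # custom profile
docker run --rm -it --security-opt seccomp=unconfined debian bash      # no seccomp filtering (not recommended)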
Container Runtime
There are several types of "container runtime".
In principle, anything that can run a container can be called a "container runtime".

But:
There are some standards, though, created by the OCI (Open Container Initiative).
For example: runc (the reference implementation of the OCI runtime specification).

Docker's contribution to containerization is "only" the ease of running containers through more or less standardization, one could say.
Anyway:

Runtimes can be divided into high-level and low-level runtimes.

Low-level ("real" runtimes): lxc, runc ("run" containers based on namespaces and cgroups)
High-level: CRI-O (implements the Container Runtime Interface), containerd (Docker) → they add features on top, like APIs (Docker uses containerd for features like downloading and unpacking images, etc.; for actually running containers the high-level runtimes themselves use runc, though.)
rkt could be considered high-level (it implements lots of features and is said to be more secure than Docker).
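To get a feel for what a low-level runtime actually does, runc can be driven by hand; a rough sketch (it assumes a prepared root filesystem, here borrowed from a Docker image):

mkdir -p mycontainer/rootfs
docker export $(docker create debian) | tar -C mycontainer/rootfs -xf -    # borrow a rootfs
cd mycontainer
runc spec                       # writes a default config.json (OCI runtime spec)
sudo runc run mycontainerid     # runs the container described by config.json + rootfs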

build your own "container runtime" with standard Linux tools (see the sketch below):
create a cgroup (cgcreate)
set attributes/limits for the cgroup (cgset)
execute commands inside it (sudo cgexec -g memory:memgrp bash)
move it into its own namespace (unshare)
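A sketch of those steps glued together (the memory limit is arbitrary):

sudo cgcreate -a $USER -g memory:memgrp             # 1. create the cgroup
sudo cgset -r memory.limit_in_bytes=128M memgrp     # 2. set an attribute (memory limit)
# 3. + 4. a shell inside the cgroup and inside its own PID + mount namespaces
sudo cgexec -g memory:memgrp unshare --fork --pid --mount-proc bash
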
Docker
Docker consists of:
dockerd (image building, e.g. "docker build" with a Dockerfile)
containerd ((image) management, e.g. "docker image list", etc.) (the high-level runtime), https://containerd.io/img/architecture.png
runc (the library/tool containerd uses to spawn containers, etc.)
Docker images consist of layers (e.g. a Debian base image).
So the base image may be the same for all containers, but the applications may differ.
For example: all applications may use the same libs (write-protected layer) but also bring their own libs ("open" layer).
If something must be written to one of the underlying layers, this happens via copy-on-write in the container's own writable layer.
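The layers of any image can be inspected directly (debian:stretch is just an example):

docker history debian:stretch          # one line per layer: its size and the instruction that created it
docker image inspect debian:stretch    # the "RootFS.Layers" section lists the layer digests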

important commands:
docker import (imports an image, e.g. from a tarball)
docker image ls (shows all images)
docker tag (names/tags the image)
docker run -i -t stretchbase bash (runs a command (bash) in a new container based on the image stretchbase and attaches you to that bash)
docker run -d --name "apache-perl-test" --network test-net -p 8082:80 apacheperl
docker login (logs you into the respective Docker registry, default is Docker Hub)
docker push (pushes the image to the respective registry)
docker build (builds a new image, based on the base image and a Dockerfile)
docker exec (executes a command in a running container, good for debugging)
docker info (status information)
docker logs (shows the logs of a specific container)
docker inspect (shows properties of a specific container)
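A typical round trip with these commands could look like this (registry, image name and tag are made-up placeholders):

docker build -t registry.example.com/nkalle/apacheperl:latest .    # Dockerfile in the current directory
docker login registry.example.com
docker push registry.example.com/nkalle/apacheperl:latest
docker run -d --name apache-perl-test -p 8082:80 registry.example.com/nkalle/apacheperl:latest
docker logs apache-perl-test
docker exec -it apache-perl-test bash
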
Kubernetes
Could be defined as:
a super-high-level container runtime with CRI (Container Runtime Interface) as the bridge between the kubelet ("the heart" of Kubernetes: the agent on masters and nodes that manages everything) and the runtime.

A container runtime that wants to work with Kubernetes must support CRI.
https://storage.googleapis.com/static.ianlewis.org/prod/img/772/CRI.png
supported (CRI) runtimes:
Docker
containerd (this is Docker, more or less)
CRI-O

CRI Specs:
CRI is a gRPC API (a modern RPC framework) using a Google-specific protocol
(Protocol Buffers: https://developers.google.com/protocol-buffers/) → a serialization format, comparable in purpose to XML.

The kubelet uses CRI to communicate with the runtime (RPC calls, e.g.: pull image, start/stop container/pod, etc.).
This can also be done manually via crictl.
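A few illustrative crictl calls (crictl has to be pointed at the node's runtime socket, e.g. via /etc/crictl.yaml):

crictl info                   # runtime name, version and status, fetched via CRI
crictl images                 # images the runtime knows about
crictl pull debian:stretch    # ask the runtime to pull an image
crictl pods                   # pod sandboxes on this node
crictl ps -a                  # all containers, running or not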

Kubernetes consists of Masters and Workers.

One (or more) Masters contain the management tools:
API-Server (connects the components of the cluster)
Scheduler (assigns the components (e.g. applications, pods, etc.) to the workers)
Controller-Manager (manages cluster outages, replication, worker status, etc.)
etcd (distributed data store, which always holds the cluster status as a whole)

One (or more) Workers contain the Pods and/or Applications:
Docker (or another runtime) runs there (it may also run on the masters)
Kubelet (the agent responsible for communication between master and worker; it manages the containers, etc.)
Kube-Proxy (network and application proxy)

How does it work?
To run an application (maybe consisting of several microservices), it must be described (yaml manifest) and published to the API.

The manifest contains all information about the components/applications, how they relate to each other (e.g. pods), the workers they should run on (optional), how many replicas there should be (optional) and much more (more on the topic and formats in the API reference linked below).

The Scheduler manages which Container group (Pod) should run on which Node (monitoring the cluster resources and doing some magic).

The Kubelet, for example, tells the runtimes to download images, execute the Pods and much more.

etcd continuously holds the current status of the cluster.
And in case a component fails (e.g. a node dies), the affected pods are restarted on another node by the controller manager and kubelet.

important Commands:
kubectl (main command for Mgmt., Settings, Deployments, etc.)

kubectl cluster-info (status info)
kubectl cluster-info dump (dumps etcd content to the console)
kubectl get nodes (lists all nodes)
kubectl describe node $nodename (returns properties of the node: CPU, memory, storage, OS, etc.)
kubectl get pods (lists all pods) (tip: -o wide)
(tip: -o yaml returns the pod description as yaml, useful for defining new pods)
kubectl get services (lists exposed pods/replica sets = services)
kubectl delete (service|pod|etc.) (deletes)
kubectl port-forward $mypod 8888:8080 (forwards localhost:8888 to port 8080 in the pod, useful for debugging)
kubectl get po --show-labels or -L $labelname (shows pods with their labels, or a specific label as a column)

how to create pods manually (not recommended, except for testing)

kubectl run blabb --image=gltest01.server.lan:4567/nkalle/jacsd-test/jacsd:master
--port=8081 --generator=run-pod/v1 --replicas=1 (creates a replication controller with 1 replica; without --replicas it is a normal pod)

kubectl scale rc blabb --replicas=3 (scales the replicas manually)

kubectl expose (rc|pod) --type=(ClusterIP|LoadBalancer|etc.) \
--name blabb-http --(external|load-balancer)-ip=10.88.6.90 --target-port=8080 --port=6887 (target-port is inside the container, port is on the external IP)
(exposes the pod/service outside the cluster/node)

API-Doku: https://kubernetes.io/docs/reference/

Pods
Pods are groups of containers that share the same namespaces and run on one worker.
If a pod contains several containers, they all run on the same worker (a pod never spans multiple workers).
A process should run inside one container, and a container should run inside a pod.
Everything that should be able to scale separately should also get its own pod.
(Example: Apache frontend / MySQL backend → 2 pods.)

create Pods with yaml

Usually pods aren't created manually with the above commands, but through yaml files which are published to the Kubernetes API.
Read more about it in the API reference linked above.

"kubectl get pods $podname -o yaml" is a good starting point for creating a new pod based on an already existing one.

important parts of the yaml description (see the sketch below):
apiVersion: which API version to use (v1 = stable, v1beta... = beta; check the link above)
metadata (name (name:), namespace, labels, info)
spec (content/containers (containers:), volumes, data)
status (current state, internal IP, basic info; not writable, filled in by Kubernetes!)
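A minimal mypoddefinition.yaml sketch covering these parts (name, labels and port are placeholders; the image is the test image from above):

cat > mypoddefinition.yaml <<'EOF'
apiVersion: v1
kind: Pod
metadata:
  name: blabb
  labels:
    app: blabb
spec:
  containers:
  - name: blabb
    image: gltest01.server.lan:4567/nkalle/jacsd-test/jacsd:master
    ports:
    - containerPort: 8081
EOF
# status is omitted on purpose; Kubernetes fills it in, you never write it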

For orientation, check the API reference above, but also use kubectl explain:
kubectl explain pods (describes all attributes of the object)
kubectl explain pods.spec (describes all fields of the attribute spec)
you get the idea.

kubectl create -f mypoddefinition.yaml (API is used through kubectl)

Volumes:
A Volume may be used by the entire Pod, but has to be mounted first.
popular Storage Solutions out there: nfs, cinder, cephfs, glusterfs, gitRepo – but there are way more…

Without a volume for shared use and persistence of Data, most Pods are useless.
For Testing, one may use emptyDir, which adds volatile storage to the pod.

For real persistence though, one will need one of the storage solutions mentioned above (nfs for example as it doesn’t take much effort to set it up).
This is called a Persistent Volume (or PV).
A PV can be configured for the whole cluster via the API:
– PV: kind: PersistentVolume

Next, a "Persistent Volume Claim" has to be made:
– PVC: kind: PersistentVolumeClaim

Now one may create a pod that references the PVC.
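A rough NFS-backed sketch of all three steps in one go (server address, export path and sizes are placeholders):

kubectl create -f - <<'EOF'
apiVersion: v1
kind: PersistentVolume
metadata:
  name: pv-nfs
spec:
  capacity:
    storage: 5Gi
  accessModes: ["ReadWriteMany"]
  nfs:
    server: 10.88.6.10
    path: /exports/data
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: pvc-nfs
spec:
  accessModes: ["ReadWriteMany"]
  resources:
    requests:
      storage: 5Gi
---
apiVersion: v1
kind: Pod
metadata:
  name: blabb-with-volume
spec:
  containers:
  - name: blabb
    image: gltest01.server.lan:4567/nkalle/jacsd-test/jacsd:master
    volumeMounts:
    - name: data
      mountPath: /data
  volumes:
  - name: data
    persistentVolumeClaim:
      claimName: pvc-nfs
EOF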

As with everything in Kubernetes, this is highly customizable, and volumes may be claimed and provisioned dynamically or statically.

Read this:
https://kubernetes.io/docs/concepts/storage/persistent-volumes/
https://kubernetes.io/docs/tasks/configure-pod-container/configure-persistent-volume-storage/
