Monitoring with Prometheus

Infrastructure needs to be monitored, and there are several tools for the task, not least because the term "monitoring" is rather fuzzy.
However, two great tools for this task are Graphite and Prometheus.
Both have their pros and cons: with Graphite it is much simpler to keep data for long-term analysis, while Prometheus shines with its powerful query language, PromQL.
There are many more tools, though, which I won't discuss here; which one to use depends mostly on one's needs and general preferences.
So…

What is Prometheus?

In short:
Prometheus is a set of tools for monitoring, some of which are optional.
At its core, the Prometheus server collects and stores metrics in a highly performant time series database and makes them available for further processing (like querying them or sending alerts).
The metrics are mostly scraped from so-called exporters.
Exporters are one of Prometheus' strengths: the ones it ships are already powerful, the community provides a ton of exporters for all kinds of services, and it is fairly simple to implement custom ones.

Optional components are the Alertmanager, the Pushgateway and third party dashboards like Grafana.

What’s this article about?

I'm going to set up a Prometheus server, an Alertmanager and a Grafana server with an example dashboard, scrape some metrics, fiddle with PromQL and display the results in a Grafana dashboard.

What’s needed?
  • 3 VMs with Debian Buster (bullseye should work too)
Let’s go… basic setup

I'll name the three VMs "prometheus", "grafana" and "alertmanager".
Make sure that prometheus is able to reach them all and that grafana is able to connect to prometheus.
On prometheus install the prometheus package:

root@prometheus:~# apt-get install prometheus 

On alertmanager install the prometheus-alertmanager package:

root@alertmanager:~# apt-get install prometheus-alertmanager

On grafana install the grafana package.
With grafana it really is best to use the package they provide.

apt-get install -y apt-transport-https software-properties-common wget gnupg

echo "deb https://packages.grafana.com/oss/deb stable main" >> /etc/apt/sources.list.d/grafana.list

wget -q -O - https://packages.grafana.com/gpg.key | apt-key add -

apt-get update
apt-get install grafana

On all hosts install the prometheus-node-exporter:

apt-get install prometheus-node-exporter

If everything worked, you should have a basic install by now and should be able to reach the web interfaces of the services.

  • prometheus.your.domain:9090
  • alertmanager.your.domain:9093
  • grafana.your.domain:3000

The initial credentials are admin:admin.

Check if the exporters are working in general.
To do so, run on any (or all) of the hosts:

curl localhost:9100/metrics
curl oneofthehosts.your.domain:9100/metrics

You should see lots of metrics like this, more on that later.

# TYPE apt_upgrades_pending gauge
apt_upgrades_pending{arch="",origin=""} 0
# HELP go_gc_duration_seconds A summary of the GC invocation durations.
# TYPE go_gc_duration_seconds summary
go_gc_duration_seconds{quantile="0"} 9.929e-06
go_gc_duration_seconds{quantile="0.25"} 1.63e-05
go_gc_duration_seconds{quantile="0.5"} 2.249e-05
go_gc_duration_seconds{quantile="0.75"} 4.0521e-05
go_gc_duration_seconds{quantile="1"} 0.00410069

root@prometheus:~# curl -s localhost:9100/metrics | grep node_disk_written_bytes_total
# HELP node_disk_written_bytes_total The total number of bytes written successfully.
# TYPE node_disk_written_bytes_total counter
node_disk_written_bytes_total{device="sr0"} 0
node_disk_written_bytes_total{device="vda"} 8.2030592e+07

Basic installation is finished and working.
Let’s check the…

prometheus configuration

A quick overview:

The Prometheus server may be started with lots of arguments.
They fall into four categories:

  • config
    • the config file, defaults to: /etc/prometheus/prometheus.yml.
  • storage
    • parameters on where to store tsdb data and how to handle it
  • web
    • web parameters, like the URL under which prometheus is reachable, API endpoints, etc.
  • query
    • query parameters, like timeouts but also max-samples, etc.

On a Debian system, these parameters are defined in /etc/default/prometheus, and for this setup the defaults are sufficient.

The prometheus.yml can be split into the following sections:

  • global
    • global parameters for all other sections, like scrape and evaluation intervals, etc.; they may be overridden in the specific configurations.
  • scrape_configs
    • the actual job definitions on what to scrape, where and how
  • alerting
    • parameters for the alert-manager
  • rule_files
    • files that contain the recording and alerting rules
  • remote_read + remote_write
    • parameters for working with long term storage like thanos and/or federation

Rule files are periodically evaluated for changes, and the Prometheus server itself may be sent a SIGHUP to gracefully reload its config (/etc/prometheus/prometheus.yml).
This can also be done via an API endpoint (/-/reload) if the lifecycle API is enabled.
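For example, assuming the lifecycle API was enabled with --web.enable-lifecycle (otherwise stick to the signal):

# reload via signal
killall -HUP prometheus

# or, if enabled, via the lifecycle endpoint
curl -X POST http://prometheus.your.domain:9090/-/reload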


Let’s build a useful config-file to scrape our hosts.

This one, aptly named “node”, just scrapes the exporter on localhost (prometheus) and grafana.your.domain.
But, for the purpose of demonstration, I'll override some parameters in job context, like scrape_interval, etc.
sample_limit is important here, because it marks the scrape as failed if too many samples are collected. This also helps prevent what is called a "cardinality explosion" in Prometheus (more on this later).
The job config itself is static, which is OK if one has a defined set of targets that doesn't change too often and gets distributed and reloaded by some mechanism.
We’ll stick to that for now.

scrape_configs:
# The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
  - job_name: 'node'

# Override the global default and scrape targets 
    scrape_interval: 15s
    scrape_timeout: 10s
    sample_limit: 1000

    # metrics_path defaults to '/metrics'
    # scheme defaults to 'http'.

    static_configs:
      - targets: ['localhost:9100']
      - targets: ['grafana:9100']

Now that there is an initial working config, one may check the scrape targets via the web interface of the Prometheus server:

http://prometheus.your.domain:9090/targets

One may create a graph from those metrics now, by clicking on “Graph” and starting to type. For Example:
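For instance (node_load1 is just an arbitrary pick, any node_* metric will do):

node_load1
# or already narrowed down by labels:
node_load1{job="node", instance="grafana:9100"}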

Execute it and you’ll get a list of Instances and Jobs with this metric.
Choose the one you are looking for (or all), copy it to the search bar and click on “Graph”.

Good, now that this is working, a few words on the scrape config and especially labels.
Prometheus uses labels for almost everything and has some powerful functions to manipulate and thus work with them.
Labels are the key/value pairs associated with a certain metric.
In the example above it's instance="grafana:9100" and job="node"; placed in {}, they are attached to a metric.
Of course these are basic labels and one may add lots more.
That's where one needs to be careful, because each unique combination of metric and labels adds a new time series to the database, eventually leading to the aforementioned "cardinality explosion".
So, choose labels wisely and never ever use dynamic, unbounded labels, like user ids and such.
Read more on this topic: here.

Alright, let’s check out how relabeling works:
Add a new label (this would also update an existing label foo with the value bar):

  - job_name: node
    static_configs:
      - targets: ['grafana:9100']
      - targets: ['localhost:9100']
    relabel_configs:
      - target_label: "foo"
        replacement: "bar"

One can also rename and drop metrics that match regexes, and chain those rules; you may want to check out this excellent site with examples.
However, don’t confuse relabel_configs with metric_relabel_configs.
metric_relabel_configs is applied after a metric was collected, but before it is written to storage, while relabel_configs is run before a scrape is performed.
You may use this to drop specific metrics until problems are fixed on a client, for example.
This one drops all the metrics for sidekiq_jobs_completion_seconds_bucket{job="gitlab-sidekiq"} from my gitlab server.

  - job_name: gitlab-sidekiq
    metric_relabel_configs:
      - source_labels: [ __name__ ]
        regex: sidekiq_jobs_completion_seconds_bucket.+
        action: drop

You could find such a problematic metric with this query, which will list the top 20 time series (check out this blog on the topic):

topk(20, count by (__name__, job)({__name__=~".+"}))

Grafana

Now that we have a rudimentary understanding of the config and are scraping the exporters on two of our hosts, let's connect Grafana, so that we can paint nice graphs with powerful PromQL.

After login, find the configuration symbol on the left side and choose Data sources.
It is sufficient to enter the URL of your Prometheus server here.

Of course, there is a ton of options, and Grafana itself can be used as a frontend for many data sources like databases, Elasticsearch, and so on.
If you are configuring Grafana for an Organization, you might also want to check out the User- and Org Settings.
For the sake of this Tutorial, we are done with the configuration.
Let’s build some graphs.

Add a new dashboard:

Click on "Add a new panel".

Our Data source “Prometheus” should be visible.

CPU graphs are always good, so let's add one for system and one for user CPU time.

avg(irate(node_cpu_seconds_total{mode="system", job="node",instance="grafana:9100"}[5m]))

avg(irate(node_cpu_seconds_total{mode="user", job="node",instance="grafana:9100"}[5m]))

As you type, Grafana will provide you with options for auto completion.

Don’t forget to save and click Apply.

What has been done here?
I started to type node_ which provides me with a list of all node_* metrics and chose node_cpu_seconds_total.
Then the labels are added, within {}.
If you know that you want mode, just start typing it (you may also use the explorer on the left side first, or just look through the exporter output on host:9100, or the respective port if you use another one).
Also, we want it for the resources defined in job=node and a specific instance.
Then, the function irate is applied.
As stated, Prometheus brings some very good functions to deal with time series data, but there are so many of them that it's best to read about them here.
rate and irate, though, are used often (at least by me) and easily confused.
They are pretty similar, but work differently.
As a rule of thumb: irate reacts better to data changes while rate gives more of a trend (see link above on how they really work).
Both get us an “average” (calculated differently) per second rate of events.
For short term CPU Load, I’ll use irate [5m].
(See the picture below for the difference, when using rate)
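For comparison, the same expression with rate instead of irate (labels as in the scrape config above):

avg(rate(node_cpu_seconds_total{mode="system", job="node", instance="grafana:9100"}[5m]))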

Lastly (on the outside), I used the avg aggregator function to calculate an average over the CPU cores (there are two of them).
Read about aggregator functions here.
Check out how it looks without avg.

Of course there’s much more to Grafana.
For example: if you click on the “Preferences” Icon in the upper right of the new Dashboard, it is possible to define variables, which (e.g.) can replace mode, job and instance values, helping you to create dynamic and easy to navigate dashboards with drop down fields.
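A sketch of such a variable: a query variable named instance, populated via the label_values helper of Grafana's Prometheus data source (metric and job are just the ones used above):

label_values(node_cpu_seconds_total{job="node"}, instance)

The variable can then be referenced in panel queries as $instance.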
Also worth noting: you don't need to build every single dashboard yourself.
There are many well maintained and production ready dashboards on grafana.com/dashboards.


Alertmanager

The Alertmanager is prometheus’ “Alert Router”.
It receives alerts generated by Prometheus' alerting rules through its API and converts them into notifications of all kinds: Slack messages, emails, and so on.
In a production environment one would set up more than the single instance we are building for the sake of this article.
Alertmanager can be clustered; it replicates its state across its nodes and also de-duplicates alerts.

How does it work?
In short:
Whenever an alert rule fires, Prometheus sends an event to the Alertmanager API (JSON) and keeps doing so as long as the rule matches.
The Alertmanager then dispatches these alerts into alert groups (defined, as everything in Prometheus, by labels, e.g. alertname); here grouping and de-duplication are done to avoid unnecessary alert spamming, and alerts may be categorized.
From here, each group will trigger the notification pipeline.
First is the Inhibition Phase.
This basically allows for dependencies between alerts to be mapped (think of: a switch fails and every rule for connected devices would start alerting; this won't happen if configured correctly).
Mapping is done in the alertmanager.yml and thus requires reloading.
Second is the Silencer Phase.
It does what its name says: it silences alarms, either by directly matching labels or by a regex (be careful with that, though).
Silences can be configured through the web interface, by clicking on Silences (http://alertmanager.your.domain:9093/#/silences).
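Silences can also be created on the command line with amtool, which ships with the Debian prometheus-alertmanager package (the matcher, duration and URL here are just examples):

amtool silence add alertname=SomeAlert --duration=2h --comment="planned maintenance" --alertmanager.url=http://alertmanager.your.domain:9093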
If the alert was not handled by one of the previous phases, it gets routed.
Basically, the alert is sent to a receiver, which has to be configured in the route section of alertmanager.yml.
Several receivers are pre-configured with example configs.

The configuration:
On prometheus.your.domain, add the Alertmanager section:

# Alertmanager configuration
alerting:
  alertmanagers:
  - static_configs:
      - targets: 
          - 'alertmanager.your.domain:9093'

Also, tell prometheus which rule files to evaluate:

# Load rules once and periodically evaluate them according to the global 'evaluation_interval'.
rule_files:   
  - "alerting_rules.yml"

Create an alerting rule in alerting_rules.yml.
This one is pretty straightforward.
Raise an alert "NodeExporterDown" if "up" for our job=node is not 1 for one minute, and label it as critical.
Also annotations such as links and other useful stuff may be defined.

groups:
- name: alerting_rules
  rules:

  - alert: NodeExporterDown
    expr: up{job="node"} != 1
    for: 1m
    labels:
      severity: "critical"
    annotations:
      description: "Node exporter {{ .Labels.instance }} is down."
      link: "https://example.com"

Now we need to configure the alertmanager itself.
On alertmanager.your.domain edit /etc/prometheus/alertmanager.yml.
As stated, there are several pre-configured receivers; let's just use email.

global:
  smtp_smarthost: 'smtp.whatever.domain:25'
  smtp_from: 'alertmanager@your.domain'
  smtp_auth_username: 'user'
  smtp_auth_password: 'pass'
  smtp_require_tls: true

define a route:

route:
  receiver: operations
  group_by: ['alertname', 'job']
  group_wait: 10s
  group_interval: 10s
  repeat_interval: 3m

define the receiver:

receivers:
- name: 'operations'
  email_configs:
  - to: 'your@ops-email.de'
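The Alertmanager config can be checked in a similar way with amtool:

amtool check-config /etc/prometheus/alertmanager.yml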

So now, if we stop the prometheus-node-exporter on grafana.your.domain, the rule gets evaluated and the alert enters the pending state for one minute to see if the service comes up again.

If it does not, the service is marked as down:

and the prometheus alert starts firing:

We can check for it on the alertmanager.your.domain web interface:

and also get an email:

That's it. Prometheus with some basic rules, Grafana with a basic dashboard, and Alertmanager with basic alerting are up and running.


Of course there is much more to know, especially about alerting and recording rules, promql and dynamic configuration, as well as cloud configs and operators, long term storage and so on.
I’d recommend reading the excellent Infrastructure Monitoring with Prometheus as well as the official documentation on the prometheus website.

Glossary on Linux container technology, runtimes and orchestrators

Microservices
Lots of "slim" and "autonomous" processes, scalable separately and, in the best case, replicable.

Pro: scalability and resilience
Con: complexity and traceability/transparency (Zipkin → Tracetool)
Container / VMs
Container technologies (here: Docker, rkt, etc.) use Linux namespaces to enable runtime isolation for processes on the underlying OS (read: the underlying kernel; LXC, runc, etc.).
Example Namespaces are:
– mnt
– pid
– network
– user
– etc. (uts, ipc)

A little bit like chroot (→ mnt), but much more granular.
Different from (e.g.) VMs (hypervisor on ring X + dedicated OS, separate kernel) or JBoss-style "containers" (runtime environments located somewhere in between on the OS, isolation through standard OS utils).

Pro "containers": saves resources, replicable (e.g. images with a pipeline)
Pro VMs: security

Namespaces + cgroups
Linux namespaces:
“quasi pseudo virtual machines” (overlay for Filesystem (mnt), Network, Pid, etc.).

Build them yourself:
unshare --fork --pid --mount-proc bash
(forks a new PID namespace through the unshare syscall and starts a bash)

There you go:
PID 1 in a new namespace (and a second process to see the first one with ps).
"ps aux" → note the pts, and call "ps aux" from outside the new namespace to see its "real" PID.

nsenter:
Runs a program in the respective namespace (from the outside, read: the OS; just look up the PID and go ahead).
Example:

sudo nsenter -t 13582 --pid htop
(-t = PID of the "namespace bash", --pid = type of namespace, followed by the program)

Cgroups:
They limit and/or prioritize the resources of the respective namespace (similar to nice or quota).
Example:
cgcreate -a nkalle -g memory:memgrp
(create a cgroup; -a = allowed user, -g = controller (in this case: memory):path (specify your own))
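To actually enforce a limit and run something inside the group, a sketch using the cgroup-tools package (cgroups v1; the 512M value is arbitrary):

cgset -r memory.limit_in_bytes=512M memgrp
cgexec -g memory:memgrp bash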

Seccomp-bpf:
Use it to limit syscalls in your namespace (read, write, kill, etc.).
In Docker it is implemented via "--security-opt" and in KVM through "--sandbox".
Container Runtime
There exist several types of "container runtime".
In principle, everything that runs containers can be called a "container runtime".

But:
There are some standards, though, created by the OCI (Open Container Initiative).
For example: the runc library (the reference implementation of the OCI runtime specification).

Docker's contribution to containerization is "only" the ease of running containers through more or less standardization, one could say.
Anyway:

Runtimes can be divided into high-level and low-level runtimes.

Low ("real" runtimes): LXC, runc ("run" containers based on namespaces and cgroups)
High: CRI-O (Container Runtime Interface), containerd (Docker) → implement additional features like APIs (Docker uses containerd to implement features like downloading and unpacking images, etc.; the high-level runtimes themselves use runc, though)
rkt could be considered high-level (it uses runc, implements lots of features, and is said to be more secure than Docker)

Build your own "container runtime" with standard Linux tools:
create a cgroup (cgcreate)
set attributes for the cgroup (cgset)
execute commands in it (sudo cgexec -g memory:memgrp bash)
move it into its own namespace (unshare)
Docker
Docker consists of:
dockerd (image building, e.g.: "docker build" with a Dockerfile)
containerd ((image) management, e.g.: "docker image list", etc.) (high-level runtime), https://containerd.io/img/architecture.png
runc (the library containerd uses to spawn containers, etc.)
Docker images consist of layers (e.g.: a Debian base image).
So the base image may be the same for all containers, but the applications may be different.
For example: all applications may use the same libs (write-protected layer) but also bring their own libs ("open" layer).
If something must be written to one of the underlying layers, it is done in a copy of that layer, which resides in memory.

important commands:
docker import (imports an Image – from a Tarball for example)
docker image ls (shows all Images)
docker tag (names the image)
docker run -i -t stretchbase bash (runs a command (bash) in a container from the image stretchbase and attaches you to that bash)
docker run -d --name "apache-perl-test" --network test-net -p 8082:80 apacheperl
docker login (logs you into the respective docker repo, default is Docker Hub)
docker push (pushes the Image to the respective hub)
docker build (builds the new Image, based on the base image and a dockerfile)
docker exec (executes a command in an existing container, good for debugging)
docker info (status information)
docker logs (shows the Logs of a specific Container)
docker inspect (shows Properties of a specific Container)
Kubernetes
Could be defined as:
a super-high-level container runtime, with CRI (Container Runtime Interface) as the bridge between the kubelet ("the heart" of Kubernetes: the agent on masters and nodes that manages everything) and the runtime.

A container runtime that wants to work with Kubernetes must support CRI.
https://storage.googleapis.com/static.ianlewis.org/prod/img/772/CRI.png
Supported (CRI) runtimes:
Docker
containerd (this is docker more or less)
CRI-O

CRI Specs:
CRI is a gRPC API (modern RPC) with a Google-specific protocol.
(Protocol Buffers: https://developers.google.com/protocol-buffers/) → a serialization format à la XML

Kubelet uses CRI to communicate with the Runtime (RPC: e.g.: pull Image, start/stop Image/Pod, etc.)
This can also be done manually via crictl.
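A few typical crictl calls, assuming crictl is installed and pointed at the runtime's CRI socket:

crictl ps                  # list running containers
crictl images              # list images
crictl pull debian:stable  # pull an image directly via CRI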

Kubernetes consists of Masters and Workers.

One (or more) Masters contain the management tools:
API-Server (connects the components of the cluster)
Scheduler (assigns the components (e.g. applications, pods, etc.) to the workers)
Controller-Manager (manages cluster outages, replication, worker status, etc.)
etcd (distributed data storage, which always contains the cluster status as a whole)

One (or more) Workers contain the Pods and/or Applications:
Docker (or another Runtime is running there (may also run on the masters))
Kubelet (the Agent, responsible for communication between master and worker, which manages the containers, etc.)
Kube-Proxy (Network- and Application Proxy)

How does it work?
To run an application (maybe consisting of several micro services), it must be described (yaml manifest) and published to the API.

The Manifest contains all Information about the Components/Applications, how they relate to each other (e.g. Pods), the workers they should be running on (optional), how many Replicas there should be (optional) and much more (more on the Topic and Formats: here).

The Scheduler manages which Container group (Pod) should run on which Node (monitoring the cluster resources and doing some magic).

The Kubelet, for example, tells the runtimes to download images, execute the Pods and much more.

ETCD continually tracks and monitors the status of the cluster.
And, in case of malfunction of a component (e.g.: death of a node), a Pod would be restarted on another node by the controller manager and kubelet.

important Commands:
kubectl (main command for Mgmt., Settings, Deployments, etc.)

kubectl cluster-info (statusinfo)
kubectl cluster-info dump (dumps etcd content to the Console)
kubectl get nodes (lists all nodes)
kubectl describe node $nodename (returns Properties for the Node, CPU, Mem, Storage, OS, etc. pp.)
kubectl get pods (lists all pods) (tip: -o wide)
(tip: -o yaml returns the pod description as yaml, useful for defining new pods)
kubectl get services (lists exposed pods/replicasets = services)
kubectl delete (service|pod|etc) (deletes)
kubectl port-forward $mypod 8888:8080 (forwards localhost:8888 to port 8080 in the pod, useful for debugging)
kubectl get po --show-labels or kubectl get po -L $labelname (shows pods with their labels / a specific label column)

how to create pods manually (not recommended, except for testing)

kubectl run blabb --image=gltest01.server.lan:4567/nkalle/jacsd-test/jacsd:master \
--port=8081 --generator=run-pod/v1 --replicas=1 (creates a replication controller with 1 replica; without --replicas it's a normal pod)

kubectl scale rc blabb --replicas=3 (scales replicas manually)

kubectl expose (rc|pod) --type=(ClusterIP|LoadBalancer|etc.) \
--name blabb-http --(external|load-balancer)-ip=10.88.6.90 --target-port=8080 --port=6887 (target-port is inside the container, port is on the external IP)
(exposes a pod/service outside the cluster/node)

API-Doku: https://kubernetes.io/docs/reference/

Pods
Pods are groups of containers, which exist in the same namespace and on one worker.
If a pod contains several containers, they run on the same worker (a pod may never use multiple workers).
A process should run inside one container, and a container should run inside a pod.
Everything that should be able to scale separately should also get its own pod.
(Example: Apache Frontend/Mysql-Backend → 2 Pods).

create Pods with yaml

Usually Pods aren’t created manually with the above commands, but through yaml Files which are published to the kubernetes API.
Read more about it here.

"kubectl get pods $podname -o yaml" is a good start to create a new pod based on an already existing one.

important parts of the yaml description:
apiVersion: which API version to use (v1 = stable, v1beta = beta; check the link above).
metadata (name (name:), namespace, labels, infos)
spec (content/containers (containers:), volumes, data)
status (Status, internal IP, basics, not writable, Kubernetes Info!)

For orientation check the above API reference, but also use kubectl explain:
kubectl explain pods (describes all Attributes of the Object)
kubectl explain pods.spec (describes all Fields of the Attribute: spec)
you get the idea.

kubectl create -f mypoddefinition.yaml (API is used through kubectl)
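A minimal mypoddefinition.yaml could look like this (a sketch; name and image are placeholders):

apiVersion: v1
kind: Pod
metadata:
  name: webtest
  labels:
    app: webtest
spec:
  containers:
  - name: webtest
    image: nginx:stable
    ports:
    - containerPort: 80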

Volumes:
A Volume may be used by the entire Pod, but has to be mounted first.
popular Storage Solutions out there: nfs, cinder, cephfs, glusterfs, gitRepo – but there are way more…

Without a volume for shared use and persistence of Data, most Pods are useless.
For Testing, one may use emptyDir, which adds volatile storage to the pod.

For real persistence though, one will need one of the storage solutions mentioned above (nfs for example as it doesn’t take much effort to set it up).
This is called a Persistent Volume (or PV).
A PV can be configured for the whole cluster via the API:
– PV: kind: PersistentVolume

Next a “Persistent Volume Claim” has to be made.
– PVC: kind: PersistentVolumeClaim

Now one may create a pod that references the PVC.

As with everything in Kubernetes, this is highly customizable, and volumes may be claimed and provided dynamically or statically.
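A statically provisioned NFS example could look like this (a sketch; server, path and size are placeholders):

apiVersion: v1
kind: PersistentVolume
metadata:
  name: pv-nfs
spec:
  capacity:
    storage: 5Gi
  accessModes:
    - ReadWriteMany
  nfs:
    server: nfs.your.domain
    path: /exports/data
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: pvc-nfs
spec:
  accessModes:
    - ReadWriteMany
  resources:
    requests:
      storage: 5Gi

The pod then references the claim in its volumes: section via persistentVolumeClaim: claimName: pvc-nfs.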

Read this:
https://kubernetes.io/docs/concepts/storage/persistent-volumes/
https://kubernetes.io/docs/tasks/configure-pod-container/configure-persistent-volume-storage/

Notes on Openstack

What is OpenStack and why should I need it?

In short terms: OpenStack is a collection of software, made to control resources (e.g. CPU, Harddisks, RAM, etc.) on many hosts efficiently. In other words: it’s a cloud management/orchestration system.
I use it to provision VMs.

Do you need it?
That depends.
If you are planning to go cloud native and develop all your software solely with DevOps principles in mind (e.g. lean and agile), fully stateless and REST-ified, then you might just jump to Kubernetes.
If you think another management layer underneath it won't hurt, so that you can manage (e.g.) several Kubernetes clusters on VMs as well as separate VMs (with a well-developed service catalogue, self-service functions, etc.), you could give it a try.
Also be aware that this adds another level of complexity to your infrastructure (and I mean it…).

What’s this about?

In this post I’ll cover what I’ve learned about OpenStack during the installation.
This is not a HowTo per se.
An excellent HowTo (the one I worked with) can be found on the OpenStack Site:
https://docs.openstack.org/newton/install-guide-debian/
These notes are meant to be used with the above HowTo and to generally learn about OpenStack.
Also, they pretty much compile the (for me, at least at the moment) more relevant parts of the setup, so one gets to a working installation more quickly.

Pre-requisites

  • one will need at least two nodes (controller and worker) for a working setup
  • I recommend at least four (redundant master and workers)
  • the setup of block-/objectstorage nodes is not covered here

Installation (the Basics)

These are the most basic packages needed, categorized by node type.

Category | Item | Host | Note
Basic | 2 NICs (mgmt./traffic) | all | One interface should be reserved exclusively for the communication/synchronization of the hosts; the other is used for "normal" traffic.
Basic | chrony | all | Install chrony on all hosts and make sure NTP sync works and all hosts have the same time.
Basic | python-openstackclient | all | openstackclient is needed on every host (read here how to install it: https://docs.openstack.org/newton/install-guide-debian/environment-packages.html)
Basic | mariadb | Master | The master needs a database; I use MariaDB, but you may also use Postgres.
Basic | rabbitmq-server | Master | A message broker is needed; others may be used as well. Note your username and password. https://docs.openstack.org/newton/install-guide-debian/environment-messaging.html
Basic | memcached | Master | memcached is needed on the master to store auth tokens. https://docs.openstack.org/newton/install-guide-debian/environment-memcached.html

Installation (ID-Service)

Now that the basic packages are installed, the Installation and Configuration of OpenStacks Services may begin. As mentioned above, OS consists of several modular Services of which some are mandatory and others optional.
The ID-Service is mandatory, runs on the master and is needed to authenticate and authorize (Tool-)users, it also manages the service catalogue (more on this later).
Its name is Keystone.
The relevant part of OpenStack's documentation is:
https://docs.openstack.org/newton/install-guide-debian/keystone-install.html

ID-Service | MySQL config | Master | CREATE DB keystone, GRANT
ID-Service | keystone | Master | Install the keystone packages (keystone itself consists of several components).
ID-Service | keystone conf | Master | Configure the DB and the fernet token provider. The fernet tokens are bearer tokens used for authentication (more info: https://docs.openstack.org/keystone/pike/admin/identity-fernet-token-faq.html)
ID-Service | conf apache, enable wsgi site | Master | Configure apache + enable mod_wsgi. In my case the correct module was not installed automatically, so you might need to "apt-get install libapache2-mod-wsgi". Also there were problems related to python3 on the system (I'm writing this from my notes), so if it does not work out of the box you might want to check python versions and dependencies.
ID-Service | exports (check this again!) | Master | Login config: add a new user for this or just do it as root.
ID-Service | create configs and shell aliases for admin and user access | Master | For now I've just added several echo "export FOO=" lines to /etc/admin_openrc_cmd and /etc/demo_openrc_cmd, plus some aliases: alias demo-openrc='eval $(/etc/demo_openrc_cmd)'
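The content of such a helper script is basically just echoed exports of the standard OpenStack client variables, so the alias above can eval them (a sketch; all values are placeholders for your own deployment):

echo "export OS_USERNAME=admin"
echo "export OS_PASSWORD=secret"
echo "export OS_PROJECT_NAME=admin"
echo "export OS_USER_DOMAIN_NAME=Default"
echo "export OS_PROJECT_DOMAIN_NAME=Default"
echo "export OS_AUTH_URL=http://controller:5000/v3"
echo "export OS_IDENTITY_API_VERSION=3"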

Installation (Image Service)

Next is the image service (named Glance); it enables our users (self-service) to discover, register and retrieve virtual machine images.
https://docs.openstack.org/newton/install-guide-debian/glance-install.html

Imageservice | configure a file backend | Master | In productive environments this should be on separate storage; for testing purposes a separate filesystem/directory is sufficient. Citation: "Storage repository for image files: Various repository types are supported including normal file systems (or any filesystem mounted on the glance-api controller node), Object Storage, RADOS block devices, VMware datastore, and HTTP. Note that some repositories will only support read-only usage."
Imageservice | create DB | Master | glance DB user: username, pw: password
Imageservice | create user endpoints | Master | Define groups and users that may use (parts of) the image service.
Imageservice | install glance | Master | glance from jessie-backports + configs + glance services, glance-api (restart)
Imageservice | upload image (debian) | Master | I just used some Debian image in qcow2 format.
Installation (Compute Service)

The compute service is the core of this IaaS setup; it interacts with the identity and image services for auth and deployments, as well as with dashboards, and it also manages the hypervisor on the nodes.
The Name is: Nova.
https://docs.openstack.org/newton/install-guide-debian/nova-controller-install.html
https://docs.openstack.org/newton/install-guide-debian/common/get-started-compute.html

Computeservice | DB config | Master | Create the nova and nova_api DBs, user config, etc.
Computeservice | install Nova services | Master | nova-api nova-conductor nova-consoleauth nova-consoleproxy nova-scheduler
Computeservice | make the DBs known in the Nova config | Master | /etc/nova/nova.conf (mysql config)
Computeservice | RabbitMQ | Master | Configure RabbitMQ, read the docs: https://docs.openstack.org/newton/install-guide-debian/nova-controller-install.html
Computeservice | compute DB | Master | Execute the scripts to initialize the compute DB.
Computeservice | restart Nova services | Master | restart
Computeservice | install nova-compute | ComputeNode | Install the nova-compute services (provides the hypervisor, etc.)
Computeservice | configure nova-compute (VNC, etc.) | ComputeNode | https://docs.openstack.org/newton/install-guide-debian/nova-compute-install.html
Computeservice | firewalling | ComputeNode | 5672 out
Computeservice | firewalling | Master | 5672 in

Installation (Networking Service)

Finally, the networking service must be installed and configured.
Here one has to make a decision, even for the test setup.
There are two modes: provider and self-service.
Provider leaves it up to you to configure the networks for services/VMs, and self-service is what the name says.
Read more on the concepts:
https://docs.openstack.org/newton/install-guide-debian/neutron-concepts.html
Also, check the installation guide:
https://docs.openstack.org/newton/install-guide-debian/neutron-controller-install.html

Networkservice | create DB | Master | DB name: neutron, user: neutron, pw: password
Networkservice | create openstack user: (team/your) user | Master | Use the script created above and create the user.
Networkservice | add the openstack admin role to the neutron user | Master | Use the script, follow the HowTo.
Networkservice | create the neutron service | Master | Use the script, follow the HowTo.
Networkservice | create network (provider) | Master | As stated above: 2 possibilities, provider and self-service. For testing, provider is probably the best choice, as one would need an appropriate infrastructure to make sense of user-defined routing/IPs and the like.
Networkservice | install the neutron packages | Master | Follow the HowTo.
Networkservice | configure the networking server | Master | (The server contains: DB, message queue, auth, topology change, plug-ins, etc.) /etc/neutron/neutron.conf (keystone, auth, ml2 plugin, rabbitmq)
Networkservice | configure the ML2 plugin | Master | /etc/neutron/plugins/ml2/ml2_conf.ini
Networkservice | configure the linuxbridge agent | Master | /etc/neutron/plugins/ml2/linuxbridge_agent.ini. It's recommended to configure the firewall driver; make sure to either integrate or disable an already existing firewall.
Networkservice | configure the dhcp agent | Master | /etc/neutron/dhcp_agent.ini
Networkservice | configure the metadata agent | Master | /etc/neutron/metadata_agent.ini, metadata_secret=mysecret
Networkservice | configure the compute service to use the networking service | Master | /etc/nova/nova.conf
Networkservice | initialize the DB | Master | neutron-db-manage (just follow the HowTo)
Networkservice | install the compute nodes | ComputeNode | (just follow the HowTo)

Conclusion

Installation is possible, though problematic because of Python 2/3 dependencies of some packages (OpenStack is written mostly in Python).
Also, the documentation is quite excellent.
I fear updating it though, because it depends very much on versioned APIs. I guess at least a second cluster should be built – one for testing, one for production – and if testing is stable enough, the clusters should be switched, with the old production cluster becoming the new testing cluster, and so on.

How To set up a DevOps Pipeline with gitlab and kubernetes

What’s a pipeline?

In short: A Pipeline helps you to get your code (or in general: service) from Development (stage) to production (stage) in a short time, while providing the ability for automatic tests in a consistent environment.

What’s this about?

I'll build a pipeline that builds a web server in a docker container, tests its functionality and finally deploys it into a production environment.

What’s needed?
Let’s go

The gitlab container registry has to be enabled, from 12.5 on it is enabled by default.
If you would like to put it on its own URL, you may set:

registry_external_url 'https://name.domain.tld' in /etc/gitlab/gitlab.rb (for an omnibus installation).

Build a basic dockerimage with an installed lighttpd (checkout docker and docker file)

Setup a new project in gitlab and name it webserver, for example (checkout gitlab setup).

The Docker image

Create a Dockerfile with the following content (mkdir dockerfile, cd dockerfile, vi Dockerfile):

# Dockerfile for simple Lighttpd Server
FROM "gitlab.your.domain:4567/nikster/webserver/lighttpd:latest"

LABEL maintainer "yourmail@yourprovider.tld"
LABEL description "simple Webserver"

COPY lighttpd.conf /etc/lighttpd/
COPY start.sh /usr/local/bin/

CMD ["/usr/local/bin/start.sh"]

The important parts are:

FROM: it tells docker from where to get the base image (could be dockerhub as well, if you follow my docker tutorial)
COPY: we copy the lighttpd.conf to its location.
This is the part of this setup that is most likely to change often and every time it does, we’ll commit it, push it and run it through our pipeline to be deployed in production.
Also we copy our own startscript and tell docker to execute it.

vi lighttpd.conf

server.modules              = (
            "mod_access",
            "mod_alias",
            "mod_accesslog",
            "mod_compress",
            "mod_status",
            "mod_redirect",
)

server.document-root		= "/var/www/"
server.upload-dirs		= ( "/var/cache/lighttpd/uploads" )
server.errorlog            	= "/var/log/lighttpd/error.log"
server.pid-file			= "/var/run/lighttpd.pid"
server.username			= "www-data"
server.groupname		= "www-data"

index-file.names           	= ( "index.php", "index.html", "index.lighttpd.html" )

accesslog.filename         	= "/var/log/lighttpd/access.log"
url.access-deny            	= ( "~", ".inc" )
static-file.exclude-extensions = ( ".php", ".pl", ".fcgi" )

## Use ipv6 only if available.
include_shell "/usr/share/lighttpd/use-ipv6.pl"

## virtual directory listings
dir-listing.encoding        = "utf-8"
server.dir-listing          = "enable"

#### compress module
compress.cache-dir          = "/var/cache/lighttpd/compress/"
compress.filetype           = ("text/plain", "text/html", "application/x-javascript", "text/css")

#include_shell "/usr/share/lighttpd/use-ipv6.pl " + server.port
include_shell "/usr/share/lighttpd/create-mime.assign.pl"
include_shell "/usr/share/lighttpd/include-conf-enabled.pl"

$HTTP["remoteip"] =~ "127.0.0.1" {
	alias.url += (
		"/doc/" => "/usr/share/doc/",
		"/images/" => "/usr/share/images/"
	)
	$HTTP["url"] =~ "^/doc/|^/images/" {
		dir-listing.activate = "enable"
	}
}
server.port			= 8080
server.bind			= ""
#$HTTP["host"] == "your.server.tld" {
#	dir-listing.activate = "disable"
#        server.document-root = "/var/www/whatever/"
#        accesslog.filename = "/var/log/lighttpd/webserver-access.log"
#}
$SERVER["socket"] == "127.0.0.1:80" {
        status.status-url = "/server-status"
}

This one should work, as you can see it’s listening on port 8080.

Now create the start script:

vi start.sh

#!/bin/sh

#NK: Source of this script: https://github.com/spujadas/lighttpd-docker/

tail -F /var/log/lighttpd/access.log 2>/dev/null &
tail -F /var/log/lighttpd/error.log 2>/dev/null 1>&2 &
/usr/sbin/lighttpd -D -f /etc/lighttpd/lighttpd.conf

it will start the server.

We'll also create a small test script to do some QA in our pipeline:

vi script.sh

#!/usr/bin/env bash

set -e

echo-run() {
    echo "===== Testing if the webserver is running ===== $1"
    echo "$($1)"
    echo
}

declare MYHOSTNAME="$(hostname)"

echo-run "/etc/init.d/lighttpd start"
echo-run "hostname"
echo-run "netstat -antup"
echo-run "pwd"
echo-run "ls -al --color=auto ."
echo "curl -i http://${MYHOSTNAME}/"

Now may be a good time to commit and push.

The kubernetes setup

Kubernetes must be up and running (check out kubernetes cluster) so we can plug it into our gitlab.

In your project, click on:

  • Operations
  • Kubernetes
  • add kubernetes cluster
  • add existing cluster

You will be asked to fill in some information:

  • Cluster Name – choose a name you like
  • api url (find it on your cluster):
kubectl cluster-info | grep 'Kubernetes master' | awk '/http/ {print $NF}' 
  • CA Certificate:
kubectl get secret <secret name> -o jsonpath="{['data']['ca\.crt']}" | base64 --decode

You’ll also need a token (if you followed the kubernetes cluster howto, fetch it like this):

kubectl -n kube-system describe secret $(kubectl -n kube-system get secret | grep gitlab-admin | awk '{print $1}')

If not or if something does not work, read this:
https://docs.gitlab.com/ee/user/project/clusters/add_remove_clusters.html#existing-gke-cluster

OK, now that kubernetes is plugged into gitlab, we need to tell it what services we want to deploy, how they can be accessed, and so on.
We do this with yaml files, that we’ll add to our gitlab project:

mkdir manifests 
vi deployment.yaml

apiVersion: apps/v1
kind: Deployment
metadata:
  name: __CI_BUILD_REF_SLUG__
  namespace: kube-system
  labels:
    app: __CI_BUILD_REF_SLUG__
    track: stable
spec:
  replicas: 2
  selector:
    matchLabels:
      app: __CI_BUILD_REF_SLUG__
  template:
    metadata:
      labels:
        app: __CI_BUILD_REF_SLUG__
        track: stable
    spec:
      imagePullSecrets:
        - name: gitlab-admin
      containers:
      - name: app
        image: gitlab.your.domain:4567/user/webserver/webserver:__VERSION__
        imagePullPolicy: Always
        volumeMounts:
        - name: firmware
          mountPath: /var/www/firmware
        - name: balance
          mountPath: /var/www/balance
        - name: logvol
          mountPath: /var/log/lighttpd
        ports:
        - containerPort: 8080
        readinessProbe:
          tcpSocket:
            port: 8080
          initialDelaySeconds: 5
          periodSeconds: 10
        livenessProbe:
          tcpSocket:
            port: 8080
          initialDelaySeconds: 15
          periodSeconds: 20
      volumes:
      - name: firmware
        emptyDir: {}    
      - name: balance
        emptyDir: {}    
      - name: logvol
        emptyDir: {}    

This will create a deployment object for kubernetes, based on our docker image, with 2 replicas and 3 volumes (in this case of the type empty dir: non persistent storage!).
Read more about storage here, if you are interested: https://kubernetes.io/docs/concepts/storage/persistent-volumes/
For simplicity the kube-system namespace is used here.

Now, let’s define the Service:

vi service.yaml

apiVersion: v1
kind: Service
metadata:
  name: webserver-__CI_BUILD_REF_SLUG__
  namespace: kube-system
  labels:
    app: __CI_BUILD_REF_SLUG__
spec:
  type: ClusterIP
  externalIPs:
  - 10.88.6.90
  externalTrafficPolicy: Cluster
  ports:
  - nodePort: 31808
    port: 8080
    protocol: TCP
    targetPort: 8080
  selector:
    app: __CI_BUILD_REF_SLUG__
  sessionAffinity: None
  type: LoadBalancer
status:
  loadBalancer: {}

Important parts here are:

  • type ClusterIP – the IP the Service should be available at
  • type LoadBalancer – as the name already states: Service will be behind the built in loadbalancer

Last but not least, we could create an ingress object (the service would also run without it, but one would need a separate IP for every load balancer; ingress routes requests by host and service name (URI)).
For this, an ingress controller needs to be deployed on kubernetes.
There are several: best known are nginx, haproxy and traefik.
(This isn’t necessary to get your first service online but cool, I’ve played with nginx and haproxy so far).
Read about how to deploy an haproxy ingress here: https://haproxy-ingress.github.io/docs/getting-started/

apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  name: webserver-__CI_BUILD_REF_SLUG__
  namespace: kube-system
  labels:
    app: __CI_BUILD_REF_SLUG__
  #annotations:
   # kubernetes.io/tls-acme: "true"
   # kubernetes.io/ingress.class: "nginx"
spec:
  tls:
  - hosts:
    - __CI_BUILD_REF_SLUG__-gitlab.your.domain
    # the secret used here is an unsigned wildcard cert for demo purposes
    # use your own or comment this out
    secretName: tls-wildcard-demo
  rules:
  - host: __CI_BUILD_REF_SLUG__-gitlab.your.domain
    http:
      paths:
      - path: /
        backend:
          serviceName: webserver-__CI_BUILD_REF_SLUG__
          servicePort: 8080

Time to commit and push and then build the pipeline itself.

The Pipeline

Now, we need to glue it all together.
To do so, we create our pipeline definition for the project (make sure this file is in the root directory of your project):

vi .gitlab-ci.yml

image: "gitlab.your.domain:4567/nikster/webserver/lighttpd:latest"

before_script:
  - docker login -u gitlab+deploy-token-1 -p <pass> gitlab.your.domain:4567
  - apt-get update && apt-get install -y -o Dpkg::Options::=--force-confold net-tools lighttpd

stages:
  - test
  - build
  - deploy

my_tests:
  stage: test
  script:
  - echo "Running my tests in Environment $CI_JOB_STAGE"
  - echo "CI-BUILD-REF-SLUG $CI_BUILD_REF_SLUG"
  - ./script.sh

image_build:
  stage: build
  image: "gitlab.your.domain:4567/nikster/webserver/lighttpd:latest"
  script:
    - echo "Entering Environment $CI_JOB_STAGE"
    - echo "CI-BUILD-REF-SLUG $CI_BUILD_REF_SLUG"
#    - mkdir /etc/docker 
#    - cp daemon.json /etc/docker/daemon.json 
#    - cp docker_default /etc/default/docker 
    - docker info
    - docker login -u gitlab-ci-token -p ${CI_JOB_TOKEN} gitlab.your.domain:4567
    - cat /etc/resolv.conf
    - docker build --no-cache -f dockerfile/Dockerfile -t gitlab.your.domain:4567/nikster/webserver/webserver:${CI_COMMIT_REF_NAME} .
    - docker tag gitlab.your.domain:4567/nikster/webserver/webserver:${CI_COMMIT_REF_NAME} gitlab.your.domain:4567/nikster/webserver/webserver:${CI_COMMIT_REF_NAME}
    - test ! -z "${CI_COMMIT_TAG}" && docker push gitlab.your.domain:4567/nikster/webserver/webserver:${CI_COMMIT_REF_NAME}
    - docker push gitlab.your.domain:4567/nikster/webserver/webserver:${CI_COMMIT_REF_NAME}

deploy_live:
  stage: deploy
  image: "gitlab.your.domain:4567/nikster/webserver/webserver:${CI_COMMIT_REF_NAME}"
  environment:
    name: live
    url: https://yourkubernetesclusterapi.your.domain
#environment:
#  only:
#    - master
#    - tags
#  when: manual
  script:
    - echo "CI_COMMIT_REF_NAME $CI_COMMIT_REF_NAME"
    - echo "Entering Environment $CI_JOB_STAGE"
    - echo "CI-BUILD-REF-SLUG $CI_BUILD_REF_SLUG"
    - mkdir ~/.kube
    - cp admin.conf ~/.kube/admin.conf 
    - export KUBECONFIG=~/.kube/admin.conf
    - kubectl version
    - cd manifests/
    - sed -i "s/__CI_BUILD_REF_SLUG__/${CI_ENVIRONMENT_SLUG}/" deployment.yaml ingress.yaml service.yaml
    - sed -i "s/__VERSION__/${CI_COMMIT_REF_NAME}/" deployment.yaml ingress.yaml service.yaml
    - echo "VERSION $VERSION"
    - kubectl apply -f deployment.yaml
    - kubectl apply -f service.yaml
#   - kubectl apply -f ingress.yaml
#   - kubectl rollout status -f deployment.yaml
    - kubectl get all,ing -l app=${CI_ENVIRONMENT_SLUG}
  • image: which docker image to use
  • before_script: logs into our docker registry on gitlab (I found that works best) and installs/updates the packages needed inside the image.
  • stage: here we define our three stages: TEST, BUILD and DEPLOY (maybe better called PROD)
    In the TEST stage, the pipeline executes the basic test script we wrote, to check if the service comes up and is available.
    If QA passes, the new container image is BUILT and pushed to our registry.
    Then it is DEPLOYed on kubernetes.

    I’ve also added lots of debug output to see what the pipeline does.

If you commit and push your changes, from now on the pipeline will be triggered on every commit.
It will fetch the latest image, update it, test whether lighttpd runs, build a new image and then deploy a 2x replicated and load-balanced service on kubernetes!

my project now looks like this (everything needed is described here):

Pipelines look like this:

Our deployment is the last of the three: the replicaset.apps/thingy
Have fun playing around with this stuff! I did.


Tips

At some point you might want or need to use a private gitlab registry for your docker images. Then you'll need kubernetes to log into gitlab and pull the image (this works with private registries on Docker Hub as well). Here's how to do it:

Check if a serviceaccount exists for your namespace, otherwise create one yourself:

kubectl get sa -n webservice-live
NAME                              SECRETS   AGE
default                           1         9h
webservice-live-service-account   1         9h

if none exists (better not use default):

vi my-service-account.yml

apiVersion: v1
kind: ServiceAccount
metadata:
  name: webservice-live-service-account
  namespace: webservice-live

kubectl apply -f my-service-account.yml

Now we’ll need a secret for our private registry, attached to our namespace:

kubectl create secret docker-registry gitlab-reg --docker-server='https://gitlab.my.domain:4567' --docker-username='validprojectusername' --docker-password='Validprojectuserpassword' --docker-email='myaddress@email.tld' -n webservice-live

secret/gitlab-reg created

check the secret like this:

kubectl get secret gitlab-reg --output=yaml -n webservice-live             
apiVersion: v1
data:
  .dockerconfigjson: eyJhdxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxlYWVdsMCJ9fX0=
kind: Secret
metadata:
  creationTimestamp: "2020-02-27T20:52:45Z"
  name: gitlab-reg
  namespace: webservice-live
  resourceVersion: "355279"
  selfLink: /api/v1/namespaces/webservice-live/secrets/gitlab-reg
  uid: 82f9675c-1967-4be9-9a5b-ac1c9e18b09a
type: kubernetes.io/dockerconfigjson

The .dockerconfigjson object contains the base64-encoded credentials; if you encounter problems, check them like this:

root@kubernetes-master1:~# echo "eyJhdxxxxxxxxxxxxxxxxxxxxxxxxxxxxlYWVdsMCJ9fX0=" | base64 -d
{"auths":{"https://gitlab.my.domain:4567":{"username":"validgitlabprojectuser","password":"validgitlabprojectpassword","email":"mymail@email.tld","auth":"base64credsagain"}}}

now we may patch the secret into our serviceaccount:

kubectl patch serviceaccount webservice-live-service-account -p '{"imagePullSecrets": [{"name": "gitlab-reg"}]}' -n webservice-live

we see that the account now contains the secret named gitlab-reg:

kubectl describe sa webservice-live-service-account -n webservice-live
Name:                webservice-live-service-account
Namespace:           webservice-live
Labels:              <none>
Annotations:         <none>
Image pull secrets:  gitlab-reg
Mountable secrets:   webservice-live-service-account-token-9pdx6
Tokens:              webservice-live-service-account-token-9pdx6
                     webservice-live-token
Events:              <none>

It's usable in a manifest (e.g. a deployment) now:

spec:
      serviceAccountName: webservice-live-service-account
      containers:
       - name: webservice
         image: gitlab.my.domain:4567/path/to/image:__VERSION__
         ports:
          - containerPort: 80
      imagePullSecrets:
       - name: gitlab-reg

Technically you'll need either serviceAccountName or imagePullSecrets in your deployment.yaml; both ways are shown here.

How to set up a kubernetes Cluster

What is kubernetes and why should I need it?

Simply put, kubernetes is a tool for managing computing resources. It does this very efficiently by abstracting your hardware into one (or more, if you like) big computing resource, allowing highly efficient use of your hardware with very little overhead, unlike virtual machines for example (which doesn't mean there is no use case for them anymore; as with every technology, there are pros and cons).
However, kubernetes mostly manages containers like docker, rkt or podman, which allow you to build cost-efficient and highly available (microservice) architectures.

What’s this about

In this post I’ll cover how to set up a kubernetes cluster (master and worker) and the most basic commands.

Pre-requisites

  • I'm using 3 VMs with Debian 9 and 4 GB RAM; this is sufficient for a master with two workers.
  • I’m using docker for containers in this setup
  • Don’t format the Filesystem on your vm with xfs (or do it with -d), use ext4 for example. Docker uses an overlayfs (needed even if you don’t plan to keep files locally), it uses AUFS by default but that isn’t supported anymore in kernels > 4.x, so we’ll use overlay2, which has said limitations.
  • Disable swap on installation, or you will have to disable it later (see the commands after this list)
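If swap was not disabled during installation, it can be turned off later like this (a sketch; adjust the fstab pattern to your system):

swapoff -a
# keep it off after reboots by commenting out the swap entry in /etc/fstab
sed -i '/ swap / s/^/#/' /etc/fstab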

Installation

Let’s install the hosts (kubernetes-master1, kubernetes-node1 ):

apt-get install apt-transport-https ca-certificates curl gnupg2 software-properties-common

We need some basic packages, so that we can work with external repositories.
(If you are behind a proxy, see instructions for working behind a proxy in the tips section at the end of this post).

curl -fsSL https://download.docker.com/linux/debian/gpg | sudo apt-key add -
add-apt-repository "deb [arch=amd64] https://download.docker.com/linux/debian $(lsb_release -cs) stable"
apt-get update
apt-get install docker-ce docker-ce-cli containerd.io

Now docker is installed on our master and our node(s)

vi /etc/default/grub : GRUB_CMDLINE_LINUX="cgroup_enable=memory swapaccount=1"

Edit your grub config and enable the memory control group (nodes only); run update-grub afterwards so the change takes effect on the next boot.

vi /etc/sysctl.conf
net.ipv4.conf.all.forwarding=1
net.ipv4.ip_nonlocal_bind=1
net.bridge.bridge-nf-call-iptables=1

net.ipv6.conf.all.disable_ipv6=1
net.ipv6.conf.default.disable_ipv6=1
net.ipv6.conf.lo.disable_ipv6=1

On your node(s) only, edit your sysctl to enable the use of the docker firewall, forwarding and the binding of non-local ips, so that we may expose our services via kubernetes later. Also I found it convenient to disable ipv6 (didn’t get it to work with ipv6 enabled).

vi /etc/hosts: 
192.168.1.10     kubernetes-master1.your.domain     kubernetes-master1
192.168.1.11     kubernetes-node1.your.domain     kubernetes-node1

I also add master and nodes to /etc/hosts here, but you may skip this if you trust in the reliability of your local dns (or use something like dnsmasq).

vi /etc/docker/daemon.json
{
  "exec-opts": ["native.cgroupdriver=systemd"],
  "log-driver": "json-file",
  "log-opts": {
    "max-size": "100m"
  },
  "storage-driver": "overlay2",
  "dns": ["192.168.1.33","192.168.1.34"]
}

This Part is important (at least at the moment, when working with debian).
We change the cgroup driver to systemd, so docker cgroups are handled by systemd.
Also, we set the storage driver to "overlay2" (docker needs an overlay filesystem to handle local filesystem access; here we use overlay2 because the old AUFS isn't supported anymore by recent Linux kernels).
You may also set things like dns here, but that’s not really necessary under normal circumstances (here it is though…).

mkdir -p /etc/systemd/system/docker.service.d

Because debian (at least atm) leaves it up to the user to handle the configuration of docker with systemd, we have to create this directory (also see: https://kubernetes.io/docs/setup/cri/).

systemctl daemon-reload
systemctl restart docker

docker should be running now on your master and nodes.

Troubleshooting Tips:
rm -f /var/lib/docker/network/files/local-kv.db

If docker won’t start because of network/virtual ip problems, delete the above file as it was created during the installation but may contain wrong information now, because of our configurations. It will get recreated on restart with the correct information.

ip link add name docker0 type bridge && ip addr add dev docker0 192.168.5.1/24

If that doesn't help, create a virtual IP address as above, delete the file mentioned before and then restart (the virtual IP will get re-created on reboot, because systemd handles docker and docker then has the correct info). You may not run into this, but I encountered this problem several times.

installing and configuring kubernetes

Like with the docker installation, everything here must be done on the master and the nodes (I’ll mark the few exceptions explicitly).

curl -s https://packages.cloud.google.com/apt/doc/apt-key.gpg | sudo apt-key add -
add-apt-repository "deb http://apt.kubernetes.io/ kubernetes-xenial main"
apt-get update

Configure the kubernetes sources/repositories, at the moment these are the xenial sources (they work for debian too of course).

deb-src [arch=amd64] https://download.docker.com/linux/debian stretch stable
deb http://apt.kubernetes.io/ kubernetes-xenial main

Your sources.list (or whatever you use) should now contain the above entries.

apt-get install ebtables socat conntrack

Install some dependencies first.

apt-get install -t kubernetes-xenial -y kubelet kubeadm kubectl

Now install kubernetes.

vi /etc/systemd/system/kubelet.service.d/10-kubeadm.conf

Environment="KUBELET_CGROUP_ARGS=--cgroup-driver=systemd"
ExecStart=/usr/bin/kubelet $KUBELET_KUBECONFIG_ARGS $KUBELET_CONFIG_ARGS $KUBELET_KUBEADM_ARGS $KUBELET_EXTRA_ARGS $KUBELET_CGROUP_ARGS

Here we go again: the cgroup driver has to be configured, and the corresponding variable has to be added to the ExecStart parameters.

systemctl daemon-reload
systemctl restart kubelet

Issue the above commands wherever you changed the kubelet configuration (master and nodes).

kubeadm init --pod-network-cidr=10.244.0.0/16 --apiserver-advertise-address=192.168.1.90 --kubernetes-version "1.14.0"

Now execute the above command on your master to initialize the control plane, passing the API IP to advertise (the IP of your master node) and a range for your pod network (the virtual network your services will use to communicate). Make sure the pod network CIDR is exactly the one written above, otherwise flannel won’t work.
One may pass a lot of parameters here (https://kubernetes.io/docs/reference/setup-tools/kubeadm/kubeadm-init/), but for now this should be sufficient.

This generates a token (and prints the entire join command) that we can use to join nodes to the cluster after deploying a pod network. Let’s use flannel because it’s simple:

kubectl apply -f https://raw.githubusercontent.com/coreos/flannel/master/Documentation/kube-flannel.yml
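
Before joining the nodes you can check that the flannel (and the other control plane) pods come up; this assumes kubectl is already usable on the master, e.g. via the kubeconfig tip in the troubleshooting section below:

kubectl get pods --all-namespaces -o wide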

Now it’s time to join our nodes with the previously generated token:

kubeadm join 192.168.1.90:6443 --token j54q85.kxpvl5hwlssvcd3  --discovery-token-ca-cert-hash sha256:dde35c22122fa8fbc0c16ddd448dc98ce1e22341e50a1463ae0dda99974cfd9f 

Kubernetes should now be up and running (for now this is kubelet, which is the kubernetes node agent that runs on every cluster node, be it master or worker, read more about it, here: https://kubernetes.io/docs/reference/command-line-tools-reference/kubelet/).


Your kubernetes cluster should be up and running!
(See the section below for some troubleshooting tips, e.g. if the join didn’t work, you are behind a proxy, or the pod network didn’t start.)
Also, let’s deploy something: the kubernetes dashboard (which is extremely useful as a graphical user interface and for a general overview of your cluster).


Tips and Troubleshooting

Check if your cluster is working as expected (do it to see if your join worked):

kubectl get nodes
root@kubmastertest:~# kubectl get nodes
NAME                 STATUS     ROLES    AGE     VERSION
kubmastertest   Ready      master   17h     v1.14.0
kubnodetest01     NotReady   <none>   4m23s   v1.14.0

Obviously everything should be in state “Ready”. Here it is not, and in the end I fixed it by manually creating the flannel CNI config on the node:

mkdir -p /etc/cni/net.d
vi /etc/cni/net.d/10-flannel.conflist
{
  "name": "cbr0",
  "plugins": [
    {
      "type": "flannel",
      "delegate": {
        "hairpinMode": true,
        "isDefaultGateway": true
      }
    },
    {
      "type": "portmap",
      "capabilities": {
        "portMappings": true
      }
    }
  ]
}

If you “forget” the join command, just look it up on your master with:

kubeadm token create --print-join-command

If you are behind a proxy, configure the following to let docker and kubernetes communicate (though you might encounter several issues, as these things are not built to be run behind a proxy…):

vi /etc/environment
http_proxy="http://192.168.255.1:3128"
https_proxy="http://192.168.255.1:3128"
no_proxy="127.0.0.1, localhost, alltheserversyouwanttobeexcluded"

Edit /etc/environment, which is the OS default location for proxy settings.
One has to exclude servers by single IPs, because the no_proxy directive ignores netmasks.
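
For example, with the hosts used throughout this setup (hypothetical addresses, adjust to your own):

no_proxy="127.0.0.1,localhost,192.168.1.90,192.168.1.10,192.168.1.11,gitlab.your.domain"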

vi /etc/apt/apt.conf.d/99HttpProxy
Acquire::http::Proxy "http://192.168.255.1:3128";
Acquire::https::Proxy "http://192.168.255.1:3128";
Acquire::http::Proxy::my.debianrepo.de "DIRECT";
Acquire::http::Proxy::corporate.debian.repo.de "DIRECT";

Excluding repositories is done via “DIRECT”, so one can use internal repos and the external ones, mentioned above.

vi /etc/systemd/system/docker.service.d/http-proxy.conf
[Service]
Environment="HTTP_PROXY=http://192.168.255.1:3128"
Environment="NO_PROXY=127.0.0.1, localhost, alltheserversyouwanttoexclude"

vi /etc/systemd/system/docker.service.d/https-proxy.conf
[Service]
Environment="HTTPS_PROXY=http://192.168.255.1:3128"
Environment="NO_PROXY=127.0.0.1, localhost, alltheserversyouwanttoexclude"

Proxy settings for docker.
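
Don’t forget to reload systemd and restart docker afterwards, otherwise the drop-ins are not picked up:

systemctl daemon-reload
systemctl restart docker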


kubeadm complains about missing kubeconfig or the like:

mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config

useradd -U -G users,sudo -m -s /bin/bash kubeadm
vi /etc/sudoers
kubeadm ALL=(ALL)       NOPASSWD:ALL

Try adding a kubeadm user with the kubeconfig set up as above, or just do everything directly as root.



Kubernetes dashboard deployment

We use RBAC (Role-Based Access Control) with our kubernetes cluster, which allows us to manage access and rights on a granular level. First we need to create a service account:

vi dashboard-adminuser.yaml

apiVersion: v1
kind: ServiceAccount
metadata:
  name: admin-user
  namespace: kube-system

In case the service account “admin-user” already exists, just add a ClusterRoleBinding:

vi dashboard-adminuser.yaml

apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: admin-user
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: cluster-admin
subjects:
- kind: ServiceAccount
  name: admin-user
  namespace: kube-system
kubectl apply -f dashboard-adminuser.yaml

Anyway, create it with the above command.
Now find out the auth token for the user, we will need it to log into the dashboard (unfortunately LDAP is not supported as of yet):

kubectl -n kube-system describe secret $(kubectl -n kube-system get secret | grep admin-user | awk '{print $1}')

Now deploy the dashboard:

kubectl apply -f https://raw.githubusercontent.com/kubernetes/dashboard/v2.0.0-rc5/aio/deploy/recommended.yaml

http://192.168.1.90:8001/api/v1/namespaces/kube-system/services/https:kubernetes-dashboard:/proxy/

The dashboard should be accessible now on your api server address.
With newer dashboards you’ll need a client certificate (if you use the method described above and access it through the api); you may extract the necessary information from your .kube/config.

grep 'client-certificate-data' $HOME/.kube/config | awk '{print $2}' | base64 -d >> admin.crt

grep 'client-key-data' $HOME/.kube/config | awk '{print $2}' | base64 -d >> admin.key

openssl pkcs12 -export -in admin.crt -inkey admin.key -out kub_client.p12

Extract the key and the cert and put them together as a pkcs12 file. You may now import it into Firefox and access the dashboard through the api (don’t forget to enter the token).
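
To quickly test the extracted credentials from the shell before touching the browser, a curl sketch against the API server (the path mirrors the proxy URL above, port 6443 is the API server port from the join command, -k skips verification of the cluster CA):

curl -k --cert admin.crt --key admin.key https://192.168.1.90:6443/api/v1/namespaces/kube-system/services/https:kubernetes-dashboard:/proxy/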


(Read everything about the dashboard here: https://github.com/kubernetes/dashboard)
You should have a fully working and manageable kubernetes cluster by now!


some closing words and Resources

This post covers the manual setup of a kubernetes cluster, which is good for understanding the basics; for a more production-ready setup you may want to take a look at common automation tools (puppet and ansible):

  • https://forge.puppet.com/puppetlabs/kubernetes
  • https://kubespray.io/#/

Also, because the topic is huge, you might want to consider reading a book about it:

  • https://www.hanser-fachbuch.de/buch/Kubernetes+in+Action/9783446455108

It’s good; I’m reading (parts of) it at the moment, though it uses minikube for most of the examples.

How to set up a puppet infrastructure

What is puppet and why would I need it?

Puppet is an open source configuration management tool that can help you manage lots of servers without writing customized setup and maintenance scripts for each one or each group of them.
It’s very powerful and already comes with lots of modules for nearly every task you face.

What’s this about

In this post I’ll cover how to set up a “puppet master” (control server), manage different environments on it (for teams and/or test/production) with r10k, write a simple module, connect a host to the master and configure it automatically.
The master will be puppet 5, so we can fully utilize the power of hiera, which allows us to separate data (configs) from code (modules).

Pre-requisites

  • I’m using a VM with Debian 9 and 4GB RAM; once you start connecting many servers though, you’ll need much more RAM.
  • with puppet it is important to have ntp set up correctly on all hosts (apt-get install ntp should take care of everything)
  • make sure the master knows itself (/etc/hosts: 127.0.1.1 puppetmaster.your.domain puppetmaster)

Installation

echo "deb http://apt.puppetlabs.com stretch puppet" >> /etc/apt/sources.list.d/puppet.list
wget apt.puppetlabs.com/pubkey.gpg
apt-key add pubkey.gpg
apt-get update
apt-get install puppet-master-passenger

We’ll use puppetmaster with apache passenger here, this is more than sufficient for even big environments.
puppetserver is the future though, as it is more scalable in very large environments.

Setup

root@puppetmastertest:~# ls /etc/puppet/
auth.conf  code  hiera.yaml  puppet.conf

You’ll find the above structure in /etc/puppet.
hiera.yaml basically describes the hierarchy puppet should search to look up host (group) information. Delete it for now, we’ll use it in our environments later.
auth.conf is used to control which catalog and ssl information your hosts are allowed to access (leave it, it should be sufficient for now).
puppet.conf contains your log and ssl paths, as well as the facter path and more (it can be extended, see: https://puppet.com/docs/puppet/5.3/config_file_main.html); important for now are the log and ssl locations and the path for facter.
(Enter the example from the next box into your puppet.conf.)
The code directory contains modules and hieradata. Normally everything is stored just there, but we’ll use several environments, so we first have to create the right structure.

[main]
logdir=/var/log/puppet
vardir=/var/lib/puppet
ssldir=/var/lib/puppet/ssl
rundir=/var/run/puppet
factpath=$vardir/lib/facter

[master]
vardir = /var/lib/puppet
cadir  = /var/lib/puppet/ssl/ca
ssl_client_header = SSL_CLIENT_S_DN
ssl_client_verify_header = SSL_CLIENT_VERIFY
certname = puppetmaster
dns_alt_names = puppetmaster, puppetserver.your.domain

The [main] Section is pretty self explanatory, logs are stored in logdir and most other “stuff” related to the function of puppet is stored in $vardir or related directories.
SSL-Certificates (as well as requests) from the clients are stored in $ssldir (We’ll come to that later), the reports for the puppetruns are stored under $vardir/state and so on. $factpath is where puppet will look for custom facts.
The main section can be the same for master and clients, but it doesn’t have to be.

The [master] section is specifically for the master itself.
Important here are dns_alt_names and certname, because the puppetmaster brings its own CA (you can also use your own, but we won’t cover that here); so set dns_alt_names especially to whatever names you’ll use when calling the master from the clients.

connecting a client

echo "deb http://apt.puppetlabs.com stretch puppet" >> /etc/apt/sources.list.d/puppet.list
wget apt.puppetlabs.com/pubkey.gpg
apt-key add pubkey.gpg
apt-get update
apt-get install puppet

Let’s connect our first client and leave the directories for now. We’ll cover them once this works.

[main]
server = puppetmaster.your.domain
certname = puppettest1.your.domain
environment = production

Delete the master section.
server is the fqdn of our puppetmaster.
certname is the name we want for our ssl server certificate. This is also a fact and will be important when working with facts later (e.g. in hiera.yaml).
environment is production for now, because as long as we haven’t configured the master otherwise, this is the only environment available. We’ll configure at least two later though: a testing and a production environment.

root@puppettest1:~# puppet agent -vot
Info: Caching certificate for ca
Info: csr_attributes file loading from /etc/puppet/csr_attributes.yaml
Info: Creating a new SSL certificate request for puppettest1.your.domain
Info: Certificate Request fingerprint (SHA256): E5:71:47:CE:94:9B:A9:DC:DC:37:B7:92:89:BA:DD:78:75:D5:CC:72:06:A5:32:AF:83:8D:B0:5A:E9:81:3F:88
Info: Caching certificate for ca
Exiting; no certificate found and waitforcert is disabled

Use the above command to connect the client to the puppetmaster. puppet automatically creates a csr.
You can view and sign it on the master like this:

puppet cert list
  "puppettest1.your.domain" (SHA256) E5:71:47:CE:94:9B:A9:DC:DC:37:B7:92:89:BA:DD:78:75:D5:CC:72:06:A5:32:AF:83:8D:B0:5A:E9:81:3F:88

root@puppetmaster:~# puppet cert sign puppettest1.your.domain
Signing Certificate Request for:
  "puppettest1.your.domain" (SHA256) E5:71:47:CE:94:9B:A9:DC:DC:37:B7:92:89:BA:DD:78:75:D5:CC:72:06:A5:32:AF:83:8D:B0:5A:E9:81:3F:88
Notice: Signed certificate request for puppettest1.your.domain
Notice: Removing file Puppet::SSL::CertificateRequest puppettest1.your.domain at '/var/lib/puppet/ssl/ca/requests/puppettest1.your.domain.pem'

If something goes wrong with signing (or you need to migrate a node), you may issue “puppet cert clean $certname” and delete the certificate files in $ssldir on the client. Then repeat the above.
Tip: In large environments you will want to autosign certificates (policybased or just everything is up to you), read here on how to do this: https://puppet.com/docs/puppet/5.3/ssl_autosign.html
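
A naive catch-all sketch (assuming the Debian default confdir /etc/puppet; policy-based autosigning from the link above is the safer choice):

vi /etc/puppet/autosign.conf
*.your.domain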

However, you should have a connected node by now. Let’s move on, filling it with stuff.

configuring the master

I’d recommend doing all configuration of the environments in a git repository (see the gitlab how-to further below), leaving the management up to r10k.
However, you can also just take the examples and put them into the respective directories:

  • /etc/puppet/code/environments/test
  • /etc/puppet/code/environments/production

Whatever you do, you should have at least two environments. One for testing and one for production.

In your repository (or in prod/test) create the following:

vi environment.conf
modulepath          = site-modules:modules:$basemodulepath
config_version      = 'scripts/config_version.sh $environmentpath $environment'

This contains the path where puppet should look for modules and the path for config_version.sh (needed to manage the environments with r10k).
You may look these up with:

puppet config print basemodulepath
/etc/puppet/code/modules:/usr/share/puppet/modules

puppet config print environment
production

puppet config print environmentpath
/etc/puppet/code/environments

Create a Puppetfile in your repository and either use prod and test branch, or just create (for now) identical files in prod/test folders:

forge "http://forge.puppetlabs.com"
moduledir = 'modules'

# Get a specific release from GitHub
#mod 'puppet-gitlab',
#   :git    => 'https://github.com/voxpupuli/puppet-gitlab' ,
#   :ref    => '3.0.2'

#mod 'helloworld', :local => true
#mod 'base', :local => true
#mod 'docker', :local => true
#mod 'kubernetes', :local => true
#mod "puppetlabs/inifile"
#mod "puppetlabs/stdlib"
#mod "puppetlabs/apt"

The above is just an example, but a good one to walk you through it.

  • forge is the location of an external url to get (forge-based) modules from; I use the official puppetlabs forge.
  • moduledir tells puppet where to install these modules (relative to path)
  • mod contains all attributes of the respective module.
    Let’s take the first block: the name of the module is ‘puppet-gitlab’, but I use another git source than the standard one (:git => ‘http://$modulegit’), and I also want a specific version (:ref => ‘3.0.2’).
    In everyday work you’ll most probably want to use specific versions in the production environment, after you’ve tested them, and “latest” (which is fetched by default if you don’t provide a version) in test environments.
    Try it out, with hiera properly configured (later), you’ll get a fully configured and operational gitlab server in no time.
    However, when working with r10k, it will wipe everything that is not referenced (so don’t forget to commit your changes and push them before running it). So there is one more important parameter:
  • :local => true
    This is to indicate that r10k should not try to download this module from somewhere, but also should not wipe it, because (most likely) it’s one of your self-written modules (I’ll cover helloworld, base and docker later, so that you get the idea of how to write basic modules and how to combine them into sets).
    Best create a git repo for those modules and later clone/pull them separately if needed.
  • I commented out nearly everything for now, but you may already uncomment the last two modules:
    “puppetlabs/stdlib” and “puppetlabs/apt”, as they are pretty useful and I use them in every setup.
    Check it out: https://forge.puppet.com/puppetlabs/stdlib. inifile is useful when using the gitlab module.

Create the hiera.yaml in your repo:

vi hiera.yaml

---
version: 5
defaults:
  data_hash: yaml_data
  datadir: hieradata
hierarchy:
  - name: nodes
    path: nodes/%{trusted.certname}.yaml
  - name: Common
    path: common.yaml

This tells hiera that it should use yaml (data_hash); it could be json as well, but I haven’t tried that.
It also references the directory where hiera should look for configuration data (this is where our cluster/node resource configuration goes).
Important is the hierarchy, because it tells hiera how and where to look for data.
It’s best to let hiera look from the most to the least specific definition.
In this example we create a simple hierarchy (under hieradata, which is a folder in our repo):

mkdir -p hieradata/nodes
touch hieradata/nodes/yourserver.your.domain.yaml

This way, when the agent connects to the master, the master looks under nodes (as configured) and finds a yaml file matching the trusted certname of the client. Hiera stops at this most specific match and starts applying whatever config we put into nodes/yourserver.your.domain.yaml.
Next, create a file named common.yaml under hieradata.

touch hieradata/common.yaml


This file will contain the configuration for everything that has no more specific match.
As you can see, you may extend and improve this structure according to your needs, maybe naming your servers/clusters FunctionDepartmentLocationVlan.your.domain or whatever and have different paths for different combinations of what may be implied by this structure at your company.
You may also specify different merge strategies for hiera like stop at first match or merge most and least specific, etc.
You get the idea. Let’s stick with our example for now, but if you are interested, read here: https://puppet.com/docs/puppet/4.10/hiera_merging.html

vi r10k.yaml

cachedir: '/var/cache/r10k'
sources:
  yourdepartment:
    basedir: '/etc/puppet/code/environments'
    remote: 'https://gitlab.your.domain/youruser/puppetcontrolrepo.git'
    prefix: true

Create the above file in your git repo (if you’ve decided not to use git, which is not recommended, skip this).
This tells r10k where to fetch the environments (I named it yourdepartment, you may configure as much as you like, to allow different teams to build their own puppet environment for example or just to separate things from each other).
prefix is the important option here, because it tells r10k to prefix the environments with the respective branch names.
So if you have a production and a test branch for “yourdepartment” in your gitlab repo, they will be checked out separately.
This allows you to set the environment of your nodes (either in /etc/../puppet.conf or on the commandline with --environment=yourdepartment_master or --environment=yourdepartment_test) so that you can use them for testing or production, just as you need.
I recommend you keep this file in your git repo under version control, but you need to copy it to /etc/r10k.yaml on your master (if you like, create another repo for that).
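
Copying it over is just (assuming you are inside your checked-out repo):

cp r10k.yaml /etc/r10k.yaml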

mkdir modules
mkdir manifests

Our initial structure is complete. Commit and push your code into git (no git? ignore this) and do a “git checkout -b test” to create a test branch while you’re at it.
Then install r10k on your master.

gem install r10k

Now clone your repo initially (so that r10k.yaml exists, you may also just copy it there)

git clone https://gitlab.your.domain/youruser/puppetcontrolrepo /etc/puppet/code/environments/yourdepartment_master

Your master branch is checked out. From now on r10k will take care of updating the environment, installing modules and so on (after you’ve pushed to git and run the appropriate commands, of course).
I would advise running r10k either regularly via cron, or on demand with git as a trigger or some ssh command, when running the master for several teams.
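
A simple cron sketch for the master (the r10k path may differ depending on your ruby/gem setup, check with “which r10k”):

crontab -e
*/30 * * * * /usr/local/bin/r10k deploy environment -v -p >> /var/log/r10k.log 2>&1
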
However, let’s check out the important commands, like checking the syntax of your puppetfile.

/etc/puppet/code/environments/yourdepartment_master# r10k puppetfile check
Syntax OK

Then let’s see what r10k would do (dry-run):

/etc/puppet/code/environments/yourdepartment_master# r10k deploy display -v
---
:sources:
- :name: :yourdepartment
  :basedir: "/etc/puppet/code/environments"
  :prefix: true
  :remote: https://gitlab.your.domain/youruser/puppetcontrolrepo.git
  :environments: []

If everything seems right, let’s deploy our environment:

r10k deploy environment -v -p

root@puppetmastertest:/etc/puppet/code/environments# r10k deploy environment -v -p
WARN     -> The r10k configuration file at /etc/r10k.yaml is deprecated.
WARN     -> Please move your r10k configuration to /etc/puppetlabs/r10k/r10k.yaml.
INFO     -> Using Puppetfile '/etc/puppet/code/environments/test_master/Puppetfile'
INFO     -> Using Puppetfile '/etc/puppet/code/environments/test_test/Puppetfile'
INFO     -> Deploying environment /etc/puppet/code/environments/test_master
INFO     -> Environment test_master is now at 7d56685293dae8a68673b7ba1009e8ceea51571a
INFO     -> Deploying Puppetfile content /etc/puppet/code/environments/test_master/modules/helloworld
[...]

Ignore the warnings, they tell you that we’re not using the “official” puppet package from puppetlabs but the one from the debian repository (one uses /etc/puppet and the other /etc/puppetlabs/).

You now have r10k management for your puppetmaster! Let’s go on and write some basic modules.

writing basic modules

Either develop your modules in the modules folder you created in your git puppet repository or use a different one (or several), that depends on your setup and company structure.
Create a site.pp file in your manifests folder.
In the days of old these contained node configurations, but here we tell hiera how to look up and merge all the classes it finds and put them into the catalog (this is the merge strategy mentioned further above).
More on the topic: https://puppet.com/docs/puppet/5.3/hiera_automatic.html

notify { "Using $environment" :
    message => "Processing catalog from the $environment environment." ,
}

lookup ('classes', Array[String], 'unique').include

The interesting part is the lookup statement.
The notify statement is just for debug purposes. It always shows the environment used on the client.
Let’s create a base module which we can use on all of our hosts:

mkdir -p modules/base/manifests
mkdir -p modules/base/files
mkdir -p modules/base/manifests/openssh

Our base module will contain an openssh module and a bashrc module (you can download much better modules from puppetlabs, this is just to get the idea)

vi modules/base/manifests/openssh.pp

class base::openssh {
   class { base::openssh::install: }
}

vi modules/base/manifests/openssh/install.pp

class base::openssh::install {
   package { "openssh-client":
      ensure => present,
   }
   package { "openssh-server":
      ensure => present,
   }
}

In our base module we create an openssh.pp file which contains a reference to the openssh::install class. We could have defined everything in one file, but later, if you extend this rather rudimentary openssh module or create your own, you’ll want to design it as modular as possible. So you could add e.g. a configure.pp and so on. However, this makes sure that the ssh client and server packages are installed on our node (“ensure => present”; use “latest” if you always want the newest version, while in prod you may want to pin a specific version).

root@puppettest1:~# puppet resource package openssh-client
package { 'openssh-client':
  ensure => '1:6.7p1-5+deb8u8',
}

On a host where you have some package installed you can let puppet tell you about its attributes and then write your code accordingly.

root@puppettest1:~# puppet resource user root
user { 'root':
  ensure           => 'present',
  comment          => 'root',
  gid              => '0',
  home             => '/root',
  password         => '$........onbZvbm0',
  password_max_age => '99999',
  password_min_age => '0',
  shell            => '/bin/bash',
  uid              => '0',
}

https://puppet.com/docs/puppet/5.3/type.html (or use puppet describe $resourcename)

vi modules/base/manifests/bashrc.pp

class base::bashrc {

  file { '/root/.bashrc':
    ensure => 'present',
    group  => '0',
    mode   => '0644',
    owner  => '0',
    source => "puppet:///modules/base/bashrc/bashrc",
  }
}

vi modules/base/files/bashrc/bashrc
#just copy a bashrc that you like and add:
#additional paths
PATH=$PATH:/opt/puppetlabs/puppet/bin

This is our second module inside the base module; it creates a bashrc on our nodes. Notice the source parameter. We could write the content directly (the parameter is “content”), but we store the file separately and tell puppet where to find it.
This translates to “modules/base/files/bashrc/bashrc”; puppet skips “files” in the path.

From now on puppet will be in your PATH on all of your hosts.
Let’s put it together:

vi modules/base/manifests/init.pp

class base {
   class { base::openssh: }
   class { base::bashrc: }
}

Every module needs an init.pp! Here we define a class “base” that consists of our two new classes. Even if you are developing a new module, I’d recommend putting only the basics into init.pp and keeping everything as modular as possible.
Now let’s define our nodes.

defining a node

vi hieradata/common.yaml

---
classes:
   - base

Let’s put our base class in our common.yaml file that we’ve touched earlier (don’t forget to commit/push). We want ssh and bashrc on all of our hosts so it fits best in our least specific definition.

puppet agent -vv --test --environment yourdepartment_test

Info: Using configured environment 'yourdepartment_test'
Info: Retrieving pluginfacts
Info: Retrieving plugin
Info: Loading facts
Info: Caching catalog for yourhost.your.domain
Info: Applying configuration version 'puppetserver-yourdepartment_test-caab813140e'
Notice: Processing catalog from the yourdepartment_test environment.
Notice: /Stage[main]/Base/Notify[Using Base Class!]/message: defined 'message' as 'Using Base Class!'
Notice: /Stage[main]/Base::Openssh/Notify[Using Openssh Test class!]/message: defined 'message' as 
Notice: /Stage[main]/Base::Bashrc/Notify[Using bashrc class!]/message: defined 'message' as 'Using bashrc class!'
Info: Stage[main]: Unscheduling all events on Stage[main]
Notice: Applied catalog in 0.62 seconds

Your node now has a new bashrc, and the ssh packages are provisioned.
For a more specific node (hieradata/nodes/yourserver.your.domain.yaml) you could use, for example, the gitlab class from puppetlabs:

---
classes:
    - gitlab

gitlab::external_url: 'http://gitlabmaster.your.domain'
gitlab::gitlab_rails:
  time_zone: 'Europe/Berlin'
  gitlab_email_enabled: false
  gitlab_default_theme: 4
  gitlab_email_display_name: 'Gitlab'
gitlab::sidekiq:
  shutdown_timeout: 5

From now on, do all your configuration changes in your git test branch, pull them onto the master automatically with r10k, try them on your test nodes, merge into master if appropriate, run the agent, and always be sure about what’s configured where and about the state of your nodes!


Tips:

  • create cronjobs on the master for r10k and on the clients for puppetruns.

I’ve learned a lot from these books I read ages ago (they should be free on the web by now but also greatly outdated):

“Pro Puppet”, James Turnbull and Jeffrey McCune, APress
“Learning puppet 4”, Jo Rhett, O’reilly

How to install and connect a gitlab runner

What’s a gitlab runner and what do I need it for?

gitlab runners are worker nodes that can be connected to gitlab to run jobs on.
I use the docker executor a lot to build images, but you can run any sort of job on them, like shell/$language scripts for testing, building, etc., whatever you configure in your project’s (CI/CD) pipeline (by using .gitlab-ci.yml or the auto dev-ops pipeline).

What’s this about?

In this post I’ll cover how to set up a runner and connect it to gitlab.

apt-get install -y apt-transport-https ca-certificates software-properties-common

make sure to install the latest ca-certificates and the packages needed to conveniently manage apt repositories.

curl -L https://packages.gitlab.com/install/repositories/runner/gitlab-runner/script.deb.sh | bash

execute the script.deb.sh provided by gitlab. This basically identifies your OS version (in my case: debian stretch), inserts the correct repository into your sources.list and imports the gpg key. If it doesn’t work for some reason, just do it manually: apt-add-repository, wget gpg.key, apt-key add gpg.key.

apt-get -y install gitlab-runner

install the gitlab-runner package

gitlab-runner register

and register the runner.
This will ask for:

  • your gitlab url
  • a token

To get the token:

  • log in to your gitlab admin account (standard: root)
  • click the little tool icon (admin area) on the top left/middle.
  • select runners on the left sidebar
  • copy the token

That’s it!
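
If you would rather script the registration than answer the prompts, a non-interactive sketch (URL and token are placeholders; executor and image are just examples):

gitlab-runner register \
  --non-interactive \
  --url "https://gitlab.your.domain/" \
  --registration-token "YOURTOKEN" \
  --executor "docker" \
  --docker-image "debian:stretch" \
  --description "docker-runner"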


If you came here from “how to set up gitlab and work with it” and used the self-signed certificates from that how-to, then you should import your CA certificate on the runner:

cp ca-root.crt /usr/local/share/ca-certificates/ca-root.crt
update-ca-certificates
systemctl daemon-reload
systemctl restart gitlab-runner

Tips

If you plan to use your runner for building docker images (e.g. you want to build a pipeline), you’ll need to configure some additional things:

vi /etc/gitlab-runner/config.toml

[[runners]]
  executor = "docker"
    environment = ["DOCKER_AUTH_CONFIG={\"auths\": { \"gitlab.my.domain\": { \"auth\": \"base64encodedcreds\" }}}"]

[runners.docker]
    tls-ca-file = "/etc/ssl/certs/ca-root.pem"
    tls_verify = false
    volumes = ["/var/run/docker.sock:/var/run/docker.sock","/cache"]
    dns = ["192.168.178.33"]

DOCKER_AUTH_CONFIG is available under ~/.docker/config.json after you have issued:

docker login -u username https://gitlab.your.domain:4567

tls-ca-file might prove useful if you are working with self-signed certs.
tls_verify = false disables verification
dns tells docker to use a specific dns server, this helps if you encounter “cannot resolve..” errors.
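
After editing config.toml, restarting the runner and checking the registration is a quick sanity test:

systemctl restart gitlab-runner
gitlab-runner verify    # checks that the registered runners can authenticate against gitlab
gitlab-runner list      # lists the runners configured in config.toml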


Troubleshooting Tips

  • check your firewall
  • check the config: (everything gitlab related and important is configured in /etc/gitlab-runner/config.toml)
  • if docker still complains about dns, try to add dns servers to /etc/docker/daemon.json
"dns": ["192.168.178.33","192.168.178.1"]

How to install gitlab and work with it

What is git?

Git is a modern distributed revision control system, in a way pretty much like svn (or cvs, rcs), which are centralized. But git is much more powerful, especially when it comes to working with branches.
It’s a full blown devops-tool nowadays and besides keeping track of your code, you can configure build- and deploy-pipelines, connect it to kubernetes and whatnot.

What’s this about?

In this post I’ll cover the basics of how to install a gitlab server “manually” with the omnibus installer (this is pretty straightforward, as it brings its own chef installer) and how to get started with basic commands and workflows.
If you despise “manual” installations entirely, you may take a look at the puppetlabs module (which also uses omnibus, which in turn utilizes chef).
However, I for one strongly believe in doing things myself at least once, for a better understanding of how they work.

Prerequisites

  • a VM/Server with at least 6GB RAM and debian9

Installation

apt-get install -y curl openssh-server ca-certificates

especially ca-certificates are important

echo postfix postfix/main_mailer_type select "Internet Site" | debconf-set-selections
echo postfix postfix/mailname select `hostname -f` | debconf-set-selections
echo unattended-upgrades unattended-upgrades/enable_auto_updates boolean true | debconf-set-selections

Gitlab requires postfix, and postfix (even in this very basic setup) requires some input.
That’s what debconf is used for here: pre-defining the needed values for an unattended installation (not strictly needed in our case).

apt-get -y install postfix unattended-upgrades

Install postfix.

curl -sS https://packages.gitlab.com/install/repositories/gitlab/gitlab-ce/script.deb.sh | bash

The above script detects your OS, adds the correct repository to your sources.list and imports the gpg key.
You may as well fetch the key with wget and then import it with apt-key add.

apt-get install -y gitlab-ce

Let’s install gitlab. This will take some time while chef does its thing.

vi /etc/hosts
127.0.1.1       gitlab.your.domain        gitlab
192.168.122.123 gitlab.your.domain        gitlab

Double check that your /etc/hosts entries fit, gitlab relies on it.

cp /home/user/gitlab.your.domain-key.pem /etc/gitlab/ssl/gitlab.your.domain.key
cp /home/user/gitlab.your.domain-pub.pem /etc/gitlab/ssl/gitlab.your.domain.crt

Skip this step if your certs are not self signed:

mv /home/user/ca-root.pem /usr/local/share/ca-certificates/ca-root.crt
update-ca-certificates 

Now get yourself an ssl certificate for your domain, either self signed or maybe through a service like letsencrypt and copy it to /etc/gitlab/ssl.

vi /etc/gitlab/gitlab.rb

external_url 'https://gitlab.your.domain'

gitlab-ctl reconfigure
gitlab-ctl restart

You should now be able to access gitlab with a browser on https://gitlab.your.domain.
Create yourself a password for the root account and then create a user.

If something does not work, here are some troubleshooting tips:

gitlab-rake gitlab:check SANITIZE=true

does a config check

gitlab-ctl status

shows the status of the various gitlab processes

tail -f /var/log/gitlab/*

gitlab is pretty verbose about its actions, though it can be somewhat hard to figure out which component is failing because there are lots of them.


working with git – the basics

I recommend using git (from the commandline) with ssh, therefore you should:

  • Login as your user via the webinterface
  • click on the icon in the top right corner and then settings
  • on the sidebar to the left, click on SSH KEYS
  • upload an ssh key
  • test your connection by logging in via ssh (ssh git@gitlab.your.domain)

creating a new project

Commandline:

mkdir myproject
cd myproject
vi some_fancy_code.ending
git init 
Initialized empty Git repository in /home/user/myproject/.git/
git add --all                                                    # No Output, tells git to track the file, in this case all of them
git commit -m "initial commit" # commits to the local repository
[master (root-commit) ec268cc] initial commit
 1 file changed, 1 insertion(+)
 create mode 100644 some_fancy_code.ending
git remote add origin git@gitlab.your.domain:user/myproject.git # adds a remote repo on your currently created git server, no Output
git push origin master                                          # pushes our changes to the master branch 
Enumerating objects: 3, done.
Counting objects: 100% (3/3), done.
Writing objects: 100% (3/3), 210 bytes | 210.00 KiB/s, done.
Total 3 (delta 0), reused 0 (delta 0)
remote: 
remote: The private project user/myproject was successfully created.
remote: 
remote: To configure the remote, run:
remote:   git remote add origin git@gitlab.your.domain:user/myproject.git
remote: 
remote: To view the project, visit:
remote:   https://gitlab.your.domain/user/myproject
remote: 
To gitlab.user.domain:user/myproject.git
 * [new branch]      master -> master

You may also just stay logged in with your browser and click on the “Plus” in the top right “corner” and then select project.
This way you will easily create a new project, but you will first have to check it out in order to work with it.

checking out the project

git clone git@gitlab.your.domain:user/myproject.git

If you want to check out your project elsewhere or even another project, use git clone.
this works with https too (as do all commands described here): git clone https://gitlab.your.domain/user/myproject.git

updating an already cloned project from the remote server

cd myproject
git pull

this pulls the changes from your remote server for the checked out branch (at the moment this should be master).
It’s best to do this every time you enter the local repository (your directory), if only to avoid merge conflicts as much as possible.

using branches

~/myproject $ git checkout -b testbranch
Switched to a new branch 'testbranch'

Branching is a great way to work on a copy of your project and change things, without breaking your “working” or “production” branch, ideally master (but branch as much as you like).
It is good practice (in teams even more than when working alone, but it’s always useful) to create a branch of your working code, work on it, commit it, test it on (e.g.) a test system, have it reviewed by others or whatever, and only merge it into your stable branch when you are sure it works (or at least doesn’t break anything).

~/myproject $ vi some_fancy_code.ending # add some more text
git add some_fancy_code.ending
git commit -m "add more text"
git push --set-upstream origin testbranch

Enumerating objects: 5, done.
Counting objects: 100% (5/5), done.
Writing objects: 100% (3/3), 244 bytes | 22.00 KiB/s, done.
Total 3 (delta 0), reused 0 (delta 0)
remote: 
remote: To create a merge request for testbranch, visit:
remote:   https://gitlab.your.domain/user/myproject/merge_requests/new?merge_request%5Bsource_branch%5D=testbranch
remote: 
To gitlab.user.domain:user/myproject.git
 * [new branch]      testbranch -> testbranch
Branch 'testbranch' set up to track remote branch 'testbranch' from 'origin'.

In your new branch you made some changes to the file, re-added it, committed it with a message that clearly says what you just did, and then pushed it to a new branch (git will complain if you just use push, because there is no upstream branch yet).
It then created a merge request, which you can view in your browser and merge into your master branch, after you are sure that everything works as expected.
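
If you prefer the commandline over the merge request UI, the merge itself would look something like this (sketch):

git checkout master      # switch back to the stable branch
git pull                 # make sure master is up to date
git merge testbranch     # merge the reviewed changes
git push origin master   # publish the merged state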

That’s basically it. You have a functional gitlab server and a first project to work on.
(automation pipelines and runner-configuration as well as kubernetes connection will be covered in separate posts).


Tips:

git status
On branch testbranch
Your branch is up to date with 'origin/testbranch'.

nothing to commit, working tree clean

use this often to be confident about your environment; this command tells you most things you might otherwise forget about.

source ~/.git-completion.bash
PS1='\[\033[0;32m\]\[\033[0m\033[0;32m\]\u\[\033[0;36m\]@ \w\[\033[0;32m\] [$(git branch 2>/dev/null | grep "^*" | colrm 1 2)\[\033[0;32m\]]\[\033[0m\033[0;32m\]\$\[\033[0m\033[0;32m\]\[\033[0m\]'

You may even install the bash completion for git (e.g. apt-get install bash-completion) and then enter something like the above in your .bashrc.
After installation you’ll have to manually copy the file from /usr/share/bash-completion/completions/git to (e.g.) ~/.git-completion.bash.

This gives you autocompletion for git commands and a nice command prompt, which always show the branch you are currently in.

Set up a gitlab runner to build and test your code / configs

How to set up SSL certificates for your services

What is SSL, what’s a CA and what do I need it for?

SSL, or nowadays TLS, is used to encrypt communication and validate the identity of a service provider.
The CA is what establishes the trust in a certificate.
In short:
if the certificate presented (e.g. by the server) and the CA certificate (that you trust, e.g. in your client) fit together, one can be sure that the server’s certificate was signed by the CA, which itself is trusted to verify at least that the domain the (server) certificate is for belongs to the requester (the one who put up the csr).

What’s this about?

I’ll cover how to set up your own CA with openssl and sign your own certificates (good for when you control all the clients) and how to use letsencrypt (which is pretty easy and trusted in all major browsers).

openssl CA

apt-get install openssl
mkdir my_ca
cd my_ca
openssl genrsa -aes256 -out ca-key.pem 4096

Install openssl, create a folder for your CA and generate a private key for it.

openssl req -x509 -new -nodes -extensions v3_ca -key ca-key.pem -days 1024 -out ca-root.pem -sha512

Fill in the questions and your password – there is your root CA Certificate. You may import this in your browser or OS cert store (debian: cp ca-root.pem /usr/local/share/ca-certificates/ && update-ca-certificates).
Your CA is ready to be used.

openssl genrsa -out servercert-key.pem 4096

generate a key for the server you want a certificate for.
a good practice is to name keys and certs like “server_example_com-key.pem” and “server_example_com-cert.pem”

openssl req -new -key servercert-key.pem -out servercert.csr -sha512

use the key to create a csr (certificate signing request).
Important:
CommonName = your fqdn (e.g. server.example.com)
Also leave the password empty or you will have to manually enter it with every start of your service.

openssl x509 -req -in servercert.csr -CA ca-root.pem -CAkey ca-key.pem -CAcreateserial -out servercert-pub.pem -days 365 -sha512

Now create the servers certificate.
If you are using a webserver like apache, copy the servercert-pub.pem and the servercert-key.pem (and, if not already done, the ca-root.pem) to the appropriate folder (e.g. /etc/ssl), configure apache to use them and you are done.
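
One caveat: modern browsers ignore the CommonName and want a SubjectAltName extension; you can add one while signing by passing an extension file (the name san.ext is arbitrary):

printf "subjectAltName=DNS:server.example.com\n" > san.ext
openssl x509 -req -in servercert.csr -CA ca-root.pem -CAkey ca-key.pem -CAcreateserial -out servercert-pub.pem -days 365 -sha512 -extfile san.ext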


letsencrypt

wget https://dl.eff.org/certbot-auto
mv /home/user/certbot-auto /usr/local/bin/certbot-auto
chmod 0755 /usr/local/bin/certbot-auto

Letsencrypt is pretty easy to set up and maintain.
Download their certbot and move it to /usr/local/bin/ (at least under debian there is no package for it).
Make sure that your firewall allows access on port 443 (or the port your service listens on; I have only used it with apache so far), otherwise letsencrypt’s check that you are the domain owner fails.

/usr/local/bin/certbot-auto --apache

You may want to back up your vhost configs if you are going to use the above commands!

The above command automatically requests and downloads Certificates for your server and edits your vhost config automatically (never broke anything on my systems).
Your SSL-Certs are now configured.
You may call it with “certonly --apache” to only get the certificates and do the config yourself.

/usr/local/bin/certbot-auto renew --dry-run

if the above works, install a cronjob to let certbot take care of the automatic renewal of certs for you.

crontab -e

0 0,12 * * * python -c 'import random; import time; time.sleep(random.random() * 3600)' && /usr/local/bin/certbot-auto renew

Install a cronjob and never again worry about certificate renewal!

How to build a webservice with docker

To start building a new Service, you’ll need an image first.
Our Service will be an apache Server which delivers an application.

So get an image from dockerhub or build your own (as described in my last post).

We’ll start by creating a working directory for our files:

mkdir docker_perl_hello_world 
cd docker_perl_hello_world

Let’s add a simple Perlscript, that prints “Hello World!” (I had this lying around already, but use whatever language you prefer).

vi mywebsite/index.pl

#!/usr/bin/perl
print "Content-type: text/html\n\n";
print <<HTML;
<html>
 <head>
  <title>Simple Perl Script</title>
 </head>
  <body>
   <h1>Perl says:</h1>
   <p>Hello World!</p>
  </body>
</html>
HTML
exit;

Don’t forget to make this executable:

chmod +x mywebsite/index.pl

Now, let’s create a simple vhost config for apache, so that it knows what to do with our script:

vi 001-perlcgi.conf

<VirtualHost *:80>
      ServerName localhost      
      DocumentRoot /var/www/html/        
        <Directory "/var/www/html/perl">
                  AllowOverride None
                  Options +ExecCGI -MultiViews +SymLinksIfOwnerMatch
                  Order allow,deny
                  Allow from all
                  AddHandler cgi-script .pl
        </Directory>
      ErrorLog /var/log/apache2/perlcgi-error.log
      CustomLog /var/log/apache2/perlcgi-access.log combined
 </VirtualHost>

The following Dockerfile is the heart of our build and requires some explanation:

vi Dockerfile

# Dockerfile for Apache Server 
FROM stretchbase:latest 
LABEL maintainer "nkalle@nikster.de" 
LABEL description "Apache Webserver with perl cgi" 

# install Apache and remove unneeded data 
RUN apt-get update && apt-get install -y apache2 libapache2-mod-perl2 && apt-get -y clean && rm -rf /var/cache/apt /var/lib/apt/lists/* 

# copy the vhost definition for our service 
COPY 001-perlcgi.conf /etc/apache2/sites-available/ 

# enable the default and the perl-script page 
RUN a2ensite 001-perlcgi 
RUN a2enmod cgi 
EXPOSE 80 

#copy the script to its destination 
RUN mkdir /var/www/html/perl 
COPY mywebsite/ /var/www/html/perl 
CMD ["/usr/sbin/apache2ctl", "-D", "FOREGROUND"]

We have created the Dockerfile, this contains all the information the Docker daemon needs to assemble our new container, and within it, our service.

“FROM” tells docker which baseimage it should use (in this case the image that we built in the last post).
“LABEL” adds additional info.
The “RUN” keyword allows us to run all kinds of commands inside the container, thus configuring it.
Here, I’m updating the apt repo information, so that the latest version of apache and libapache-mod-perl2 can be installed.
Afterwards I’m cleaning up, to keep the Image as small as possible.
Now we’ll “COPY” our vhost.conf to the right place. And then enable it and the cgi Module with the “RUN” Keyword.
“EXPOSE” tells docker which port to expose (with apache that’s usually 80 and/or 443), this can be mapped to what you like later.
Last we have “CMD”, which is similar to “ENTRYPOINT” (CMD sets the default command and can be overridden at run time, while ENTRYPOINT stays fixed unless explicitly overridden). It tells the container which command to execute (we’ll run apache).
The Syntax is important here.

It’s time to assemble our new container:

docker build -f Dockerfile -t apacheperl .

docker build -f tells Docker to use this File in particular and -t tells it to immediately tag the image after the build (last post we did this in a separate step).

After a few minutes the build should be finished and you should see your new image with:

docker image ls
REPOSITORY              TAG                 IMAGE ID            CREATED             SIZE
apacheperl              latest              57b87ae05e5a        24 minutes ago      361MB

Now, run it with:

docker run -d --name "apache-perl-test" --network test-net -p 8082:80 apacheperl

And then access it via:

http://localhost:8082/perl/index.pl

You have built a containerized web service!
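
A quick check from the shell (assuming the container came up; otherwise the logs usually tell you why not):

curl -s http://localhost:8082/perl/index.pl   # should print the "Hello World!" page
docker logs apache-perl-test                  # apache output in case something went wrong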


Remark on service start: there are more convenient ways to bring up and maintain your service, with docker compose or kubernetes for example.


Remark on the Dockerfile:
If we were using official images, we could do some of the configuration that we put into the vhost definition via “ENV” variables.
“ENV” variables are environment variables one can use to configure some aspects of the container. The only way I found to look up all possible variables of an image is to check out its documentation on dockerhub (for example: mariadb).
One can also set one’s own variables at build time using ARG; they can then be picked up by ENV:

ENV APACHE_RUN_USER=www-data \    
APACHE_RUN_GROUP=www-data \
APACHE_LOG_DIR=/var/log/apache2
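
Such a build-time value is passed with --build-arg; a sketch assuming the Dockerfile declares a matching "ARG APACHE_LOG_DIR" that is then mapped to ENV:

docker build -f Dockerfile -t apacheperl --build-arg APACHE_LOG_DIR=/var/log/apache2 .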