Some Kubernetes/Container things demystified

Kubernetes Jun 09, 2020

Taking the confusion out of the hottest infrastructure tool on the market

The Humble Container

A standard unit of software --> a sandboxed process

Without getting too graphic, let's talk a bit about how containers are made.

Packaging

A container starts from an image, say, ubuntu. That "image" is actually just a compressed archive (a stack of tarballs, technically) containing the packages the publisher installed in it.

When you run a container, your computer downloads ("pulls") it from a container registry. The most popular public registry is Docker Hub, but there are many others. In fact, the registry API is just a spec, and you can build/host your own if you want to. Many organizations do this.

Aaaanyways, "pulling" just means your computer downloads (and caches) that archive, extracts it into some location on your local machine, and starts the container up.
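You can see the "it's just an archive" idea without any container tooling at all. Here's a minimal sketch using plain tar (the filenames and greeting are made up for illustration):

```shell
# Sketch: an image layer is just a tarball. "Pulling" boils down to
# downloading archives like this and extracting them into a root filesystem.
mkdir -p layer rootfs
echo "hello from the image" > layer/greeting.txt
tar -cf layer.tar -C layer .    # publisher side: package up the filesystem
tar -xf layer.tar -C rootfs     # your side: extract the cached archive
cat rootfs/greeting.txt
```

Real images add a JSON manifest and several stacked layers on top of this, but the extraction step is the same idea.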

But how does this work? And what is it starting up, exactly? A teeny VM?

So, first off, no, it's not a VM. Not unless you're talking Kata Containers, anyway. Those actually are lightweight VMs that act like containers to solve a very specific security problem.

Anyways, back to regular containers.

There are two features of the Linux kernel that make containers possible: Namespaces and Control Groups. Of note: there are also Windows containers out there...but, that's another topic.

Namespace

Resource partitions

This Linux kernel feature partitions what a process can see: process IDs, mount points, network interfaces, hostnames, and so on. Processes in the same namespace generally see the same view of the system.

These do several cool things. For one, they allow you to lie to a process about the context it's running in.

For example, you can map ./html to /var/www/html, and ./log to /var/log for your Nginx container.

Pretty convenient for local development, eh?

They also let you create virtual network interfaces between isolated processes.

For example, if you want to use Nginx as a reverse proxy in front of an Apache server running a LAMP application, you can connect the Nginx proxy only to the Apache server without letting it see the network traffic going between Apache and the database.

Pretty cool for security, eh?
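Both tricks above can be sketched in one hypothetical docker-compose.yml (Compose is the easiest place to see namespaces doing their thing; all names and images here are illustrative):

```yaml
# Bind mounts (mount namespace) + private networks (network namespace).
services:
  nginx:
    image: nginx
    volumes:
      - ./html:/var/www/html      # lie to nginx about where its files live
    networks: [frontend]
  apache:
    image: php:apache
    networks: [frontend, backend] # apache bridges the two networks
  db:
    image: mariadb
    networks: [backend]           # nginx never sees this traffic
networks:
  frontend:
  backend:
```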

Control Group

Permissions for your code

Control groups (cgroups) limit and meter the resources a process can consume: CPU time, memory, disk I/O, and more. They're a separate kernel feature from namespaces, and containers lean on both.

For example, you can limit the amount of CPU and RAM a hungry Java application (💥🔥☕️) can consume!

(Just kidding Java)
(..sorta!) 😜
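When you get to Kubernetes (coming up below), those cgroup limits surface as a resources block on a container. A hypothetical fragment (the image name is made up):

```yaml
# These limits become cgroup settings on whatever node runs the container.
containers:
  - name: app
    image: my-java-app        # hypothetical image name
    resources:
      requests:
        cpu: "250m"           # a quarter of a core, reserved for scheduling
        memory: "256Mi"
      limits:
        cpu: "1"              # throttled above one core
        memory: "512Mi"       # OOM-killed above this
```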


Container terms

Docker

The most popular container engine

People tend to associate containers with Docker, a popular container management tool ("engine") based on the containerd runtime.

That's kinda fair. Docker is to containers as Kleenex is to tissues. Others exist, but since the death of the rkt ("Rocket") project, Docker is more or less the big name in the game.

containerd

The common runtime of containers

If Docker is a popular distro (say, Ubuntu), containerd is the Debian upstream it's built from. Docker is just one of several tools built on top of containerd; Kubernetes can talk to it directly, too.

Runtime

The code that runs your code

Nobody wants to program in 1's and 0's, so the runtime is the support code that translates whatever high-level language you've chosen into them.

To be honest, I hate the word "runtime". It's unspecific and confusing, and lots of "runtimes" bundle code that isn't runtime-related at all, like a package manager (*cough*, Node, *cough*).

I think Java has the most apt analogy for this, calling its runtime the "Java Virtual Machine." That's kinda what it is: a mini VM your code runs in.

"Stateless"

A lie you will hear a lot.

You can't actually have stateless everything. You need something for persistent storage. Otherwise, what are you doing building a backend? "Stateless" actually just means "Someone Else's Problem"!

There are a bunch of options for this.

If you're kuberneting on-prem, you'll have to set up some kind of network storage solution like NFS or StorageOS.

If you're in the cloud, your provider will give you "managed" storage, which leads us to...

"Managed"

Someone else's problem.

Someone (probably your cloud provider) sets up and maintains a service for you. Probably another Kubernetes cluster, actually. But you don't have to think about that, 'cause they just give you a set of creds and say go.


The Hot-Sh*t Kubernetes Cluster

Kubernetes

The Linux of the Cloud

Kubernetes is how you run backend applications in production, at scale.

It's not the only way...but it sure feels like it these days.

What it does for you is maintain the infrastructure you order it to. Want three replicas of your app running connected to two MySQL instances for high availability? You got it! Just tell Kubernetes, and it'll stand them up for you and make sure they stay up.
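For example, those three replicas might be declared in a hypothetical manifest like this (the app and image names are made up):

```yaml
# "I want three replicas of my app. Stand them up and keep them up."
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app
spec:
  replicas: 3                # Kubernetes restarts or reschedules as needed
  selector:
    matchLabels:
      app: my-app
  template:
    metadata:
      labels:
        app: my-app
    spec:
      containers:
        - name: my-app
          image: my-app:1.0  # hypothetical image
```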

Pod

The unit your app scales by

When you tell Kubernetes you want those three replicas of your application, pods are what they become. They consist of one or more containers. Usually just one, but...

If, say, you're running a CI/CD service that needs multiple containers working together to run a pipeline's tests concurrently, you can put those containers in a pod. However...

Don't put multiple services in a pod!

You wouldn't want to run your Wordpress instance in the same pod as your MariaDB server. If you did this, each instance of Wordpress would be connecting to a different database - that would be a disaster!

Instead, create separate...⬇️

Services

The definition of service that you're used to...but highly available.

A running thing with an IP address. Maybe a domain if you're going all fancy with it. That's it, really.

Can you hit it with requests? If so, it's probably a service.

All Kubernetes does is make them more reliable...if you use it right, that is.
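A hypothetical Service manifest; the selector is what ties it to the pods behind it (names here are made up):

```yaml
# One stable name and IP in front of all the my-app pods.
apiVersion: v1
kind: Service
metadata:
  name: my-app
spec:
  selector:
    app: my-app          # matches the pods' labels
  ports:
    - port: 80           # the stable address other things hit
      targetPort: 8080   # where the container actually listens
```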

Master nodes

The machines with all the wisdom

These machines make up the control plane for your Kubernetes cluster. One is fine for development, but in prod you may want to use multiple for high availability. They can talk to each other automagically and do that replication without too much hassle, kinda like domain controllers.

If you're using a managed Kubernetes service, your cloud provider will typically give you these for free and just charge you for the...⬇️

Worker nodes

They are totally obedient, taking any order without question

These are the rank-and-file clone soldiers that do the grunt work. They do as their Kube masters command, running your actual applications.

Typically they're just VMs in the cloud somewhere - but any Linux machine will do.

kubectl

The command-line frontend for managing Kubernetes clusters

kubectl is nothing more than a friendly CLI that fires off HTTP requests to the master node. You can install it on your laptop and make requests to any Kubernetes cluster you control, or you can ssh into your server and do it directly from there. Where you kubectl is up to you.

Telling Kubernetes what kind of infrastructure you want is a little complicated, and you don't wanna spend hours typing out long, hard-to-reproduce commands, which is why we have...⬇️

something.y(a)ml

The config files you feed into kubectl

These are great 'cause you can version-control them and do everything that entails. Yes, you can CI/CD your entire production infrastructure simply by pushing to master now. Pretty neat, huh?

They're not actually read directly by the master node, however; they just get serialized and blasted into the...⬇️

API server

How the whole damn cluster works

This is exactly what it sounds like. It's a REST API running on the master node that actually does the control work.

Now, as we all know, APIs don't work without a database, which brings us to...⬇️

etcd

The Linux Registry. Sigh. Yes, one now exists.

Named sentimentally after the classic UNIX /etc settings folder (plus a "d" for "distributed"), etcd is the key-value database the API server hits to store and retrieve the cluster's settings.

You care about etcd.

It's a big red juicy target with "hack me" spraypainted all over it. By default, this is where Secrets get stored, and they're only base64-encoded, which is to say: stored in plaintext.

 _   _            _      __  __      
| | | | __ _  ___| | __ |  \/  | ___ 
| |_| |/ _` |/ __| |/ / | |\/| |/ _ \
|  _  | (_| | (__|   <  | |  | |  __/
|_| |_|\__,_|\___|_|\_\ |_|  |_|\___|
What hackers see looking at etcd
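A quick demo of why that matters: base64 is an encoding, not encryption, so anyone who can read etcd can read the secret. (Sketch assumes GNU coreutils' base64; the password is, of course, made up.)

```shell
# What the API server writes into etcd is just the base64 of the value...
encoded=$(printf 'hunter2' | base64)
echo "$encoded"                        # -> aHVudGVyMg==
# ...and anyone with read access to etcd reverses it trivially.
printf '%s' "$encoded" | base64 -d     # -> hunter2
```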

Secrets

You need somewhere to store those passwords & API keys!

Secrets are exactly what they sound like. If you have something to hide (hint, if you're running Kubernetes, you do) it's a secret.

You could keep storing your production Postgres password in plaintext in a configuration file on disk...

But that's...err...what's the phrase...?

...a really f*cking bad idea?

Yeah, that!

You got options for this. Each cloud provider has its own in-house solution, but there are others you can use on-prem, like HashiCorp's Vault.
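The built-in option is the Secret object. A hypothetical manifest (stringData is convenience sugar; the API server base64-encodes it into etcd, hence the warnings above):

```yaml
# Treat this as plaintext at rest unless you've set up encryption at rest.
apiVersion: v1
kind: Secret
metadata:
  name: postgres-creds
type: Opaque
stringData:
  POSTGRES_PASSWORD: hunter2   # hypothetical; don't commit real ones!
```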

Ingress Controller

A Kubernetes-optimized reverse proxy

If you've ever used a reverse proxy for load balancing or handling TLS...you already basically know how this works. The ingress controller ingests incoming requests (or plain old TCP packets, if you're playing on L4) and funnels them into your containers as you wish.

Some big names here are Traefik, Nginx, Istio and HAProxy. Each of these provides a specialized build of its respective package optimized to tango with the Kubernetes API.
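A hypothetical Ingress routing a domain into a Service (names are made up; whichever ingress controller you install does the actual proxying):

```yaml
# Route HTTP traffic for example.com into the my-app Service.
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: my-app
spec:
  rules:
    - host: example.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: my-app   # hypothetical Service name
                port:
                  number: 80
```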

Network plugin

They're not like Wordpress plugins, I promise!

Kubernetes uses software-defined networking to facilitate communication between containers. Network plugins are basically virtual routers/firewalls that you install in your cluster to let the containers talk to each other.
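The firewall half of that job is exposed as NetworkPolicy objects, which the network plugin enforces. A hypothetical policy echoing the earlier Nginx/Apache/database example (labels are made up):

```yaml
# Only pods labeled app=apache may open connections to the database pods.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: db-ingress
spec:
  podSelector:
    matchLabels:
      app: db
  ingress:
    - from:
        - podSelector:
            matchLabels:
              app: apache
```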

Helm

The Kubernetes package manager

Remember how I told you before that "stateless" was a lie? Well, here you go. We've come full circle, from apt install $packagename to helm install $packagename. Go figure.

Helm packages, called charts, make deploying prebuilt applications on Kubernetes quick.

Noah Williams

Software Engineer
