Taking the confusion out of the hottest infrastructure tool on the market
The Humble Container
A standard unit of software --> a sandboxed process
Without getting too graphic, let's talk a bit about how containers are made.
A container starts from an image, like ubuntu. That "image" is actually just a stack of tarballs containing the packages the publisher installed in it.
When you run a container, your computer downloads ("pulls") its image from a container registry. The most popular public registry is Docker Hub, but there are many others. In fact, the registry itself is just a spec, and you can build/host your own if you want to. Many organizations do this.
Aaaanyways, once you've pulled the image, the computer caches it, extracts it into some location on your local machine, and starts it up.
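The whole pull-extract-run dance looks like this in practice (a sketch, assuming you have Docker installed and an internet connection):

```shell
# Download ("pull") the ubuntu image from Docker Hub, the default registry
docker pull ubuntu:22.04

# Extract it, set up the sandbox, and start a shell inside it
docker run -it ubuntu:22.04 bash
```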
But how does this work? And what is it starting up, exactly? A teeny VM?
So, first off, no, it's not a VM. Not unless you're talking Kata Containers, anyway. Those actually are lightweight VMs that act like containers to solve a very specific security problem.
Anyways, back to regular containers.
There are two features of the Linux kernel that make containers possible: Namespaces and Control Groups. Of note: there are also Windows containers out there...but, that's another topic.
Namespaces are the kernel feature that lets you control which processes can see and talk to one another. Processes in the same namespace can generally see each other; processes in different ones generally can't.
These do several cool things. For one, they allow you to lie to a process about the context it's running in.
For example, you can map a folder on your laptop to /var/log for your Nginx container, so the logs end up somewhere you can actually read them.
Pretty convenient for local development, eh?
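As a sketch (assuming Docker and the stock nginx image, which writes its logs under /var/log/nginx):

```shell
# Map a local folder onto the container's log directory, so Nginx's
# access and error logs land on your laptop where you can tail them
docker run -d -p 8080:80 -v "$(pwd)/logs:/var/log/nginx" nginx
```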
They also let you create virtual network interfaces between isolated processes.
For example, if you want to use Nginx as a reverse proxy in front of an Apache server running a LAMP application, you can connect the Nginx proxy only to the Apache server without letting it see the network traffic going between Apache and the database.
Pretty cool for security, eh?
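Here's a minimal sketch of that topology using Docker networks (the my-lamp-app image name is hypothetical):

```shell
# Two isolated networks: one for proxy<->app, one for app<->db
docker network create frontend
docker network create backend

docker run -d --name db --network backend \
  -e MARIADB_RANDOM_ROOT_PASSWORD=yes mariadb
docker run -d --name apache --network backend my-lamp-app
docker network connect frontend apache

# The proxy only joins "frontend", so it never sees db traffic
docker run -d --name proxy --network frontend -p 80:80 nginx
```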
Permissions for your code
Control groups (cgroups) allow you to restrict what resources a process can access. They govern CPU, memory, disk I/O, and a bunch of other stuff.
For example, you can limit the amount of CPU and RAM a hungry Java application (💥🔥☕️) can consume!
(Just kidding Java)
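A sketch of what that limiting looks like through Docker, which just translates these flags into cgroup settings:

```shell
# Cap the container at half a CPU core and 512 MiB of RAM
docker run --cpus="0.5" --memory="512m" eclipse-temurin:17 java -version
```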
The most popular container engine
People tend to associate containers with Docker, a popular container management tool ("engine") based on the containerd runtime.
That's kinda fair. Docker is to containers as Kleenex is to tissues. Others exist, but since the death of the rkt (Rocket) project, Docker is more or less the big name in the game.
The common runtime of containers
If Docker is a popular distro (say, Ubuntu),
containerd is the Debian upstream it's built from. Docker is just one of several tools built on this shared runtime.
The code that runs your code
Nobody wants to program in 1s and 0s, so the runtime is the support code that runs alongside your program, bridging whatever high-level language you've chosen and the machine underneath.
To be honest, I hate the word "runtime". It's unspecific and confusing, and lots of "runtimes" include a lot of code that isn't runtime-related, like a package manager (*cough*, Node, *cough*).
I think Java has the most apt analogy for this, calling its runtime the "Java Virtual Machine." That's...kinda what it is: a mini VM your code runs in.
A lie you will hear a lot.
You can't actually have stateless everything. You need something for persistent storage. Otherwise, what are you even doing building a backend? "Stateless" actually just means "Someone Else's Problem"!
There are a bunch of options for this.
If you're kuberneting on-prem, you'll have to set up some kind of network file storage solution like NFS or StorageOS.
If you're in the cloud, your provider will give you "managed" storage, which leads us to...
Someone else's problem.
Someone (probably your cloud provider) sets up and maintains a service for you. Probably another Kubernetes cluster, actually. But you don't have to think about that, 'cause they just give you a set of creds and say go.
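From your side, asking for that managed storage is just another request to the cluster. A sketch (the my-app-data name is made up):

```yaml
# "Give me 10Gi of storage; I don't care where it comes from"
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: my-app-data
spec:
  accessModes: ["ReadWriteOnce"]
  resources:
    requests:
      storage: 10Gi
```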
The Hot-Sh*t Kubernetes Cluster
The Linux of the Cloud
Kubernetes is how you run backend applications in production, at scale.
It's not the only way...but it sure feels like it these days.
What it does for you is maintain the infrastructure you order it to. Want three replicas of your app running connected to two MySQL instances for high availability? You got it! Just tell Kubernetes, and it'll stand them up for you and make sure they stay up.
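Asking for those three replicas looks roughly like this (a sketch; the my-app name and image are placeholders):

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app
spec:
  replicas: 3          # "keep three of these running, always"
  selector:
    matchLabels:
      app: my-app
  template:
    metadata:
      labels:
        app: my-app
    spec:
      containers:
        - name: my-app
          image: my-app:1.0   # hypothetical image
```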
Service scaling group
When you tell Kubernetes you want those three replicas of your application, pods are what they become. They consist of one or more containers. Usually just one, but...
If, say, you're running a CI/CD service that needs multiple containers working together to execute a set of concurrent tests for a pipeline, you can put those containers in a pod. However...
Don't put multiple services in a pod!
You wouldn't want to run your WordPress instance in the same pod as your MariaDB server. If you did, each instance of WordPress would be connecting to a different database - that would be a disaster!
Instead, create separate...⬇️
The definition of service that you're used to...but highly available.
A running thing with an IP address. Maybe a domain if you're going all fancy with it. That's it, really.
Can you hit it with requests? If so, it's probably a service.
All Kubernetes does is make them more reliable...if you use it right, that is.
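A minimal sketch of a Service definition (names are placeholders): it gets a stable address and spreads traffic across every pod matching its selector.

```yaml
apiVersion: v1
kind: Service
metadata:
  name: my-app
spec:
  selector:
    app: my-app        # route to any pod carrying this label
  ports:
    - port: 80         # the Service's port
      targetPort: 8080 # the container's port
```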
The machines with all the wisdom
These machines make up the control plane for your Kubernetes cluster. One is fine for development, but in prod you may want to use multiple for high availability. They can talk to each other automagically and do that replication without too much hassle, kinda like domain controllers.
If you're using a managed Kubernetes service, your cloud provider will typically give you these for free and just charge you for the...⬇️
They are totally obedient, taking any order without question
These are the rank-and-file clone soldiers that do the grunt work. They do as their Kube masters command, running your actual applications.
Typically they're just VMs in the cloud somewhere - but any Linux machine will do.
The command-line frontend for managing Kubernetes clusters
kubectl is nothing more than a friendly CLI that fires off HTTP requests to the master node. You can install it on your laptop and make requests to any Kubernetes cluster you control, or you can ssh into your server and do it directly from there. Where you kubectl is up to you.
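For a taste (the deployment name is hypothetical), each of these is just an HTTP request to the API server, dressed up:

```shell
kubectl get pods                    # list running pods
kubectl describe deployment my-app  # show one resource in detail
kubectl get nodes                   # list your worker machines
```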
Telling Kubernetes what kind of infrastructure you want is a little complicated, and you don't wanna spend hours typing out long, hard-to-reproduce commands- which is why we have...⬇️
The config files you feed into kubectl
These are great 'cause you can version-control them and do everything that entails. Yes, you can CI/CD your entire production infrastructure simply by pushing to master now. Pretty neat, huh?
They're not actually read directly by the master node, however; they just get serialized and blasted into the...⬇️
How the whole damn cluster works
This is exactly what it sounds like. It's a REST API running on the master node that actually does the control work.
Now, as we all know, APIs don't work without a database, which brings us to...⬇️
The Linux Registry. Sigh. Yes, one now exists.
Named sentimentally after the classic UNIX
/etc settings folder,
etcd is the key-value database the API server hits to store and retrieve the cluster's settings.
You care about etcd.
It's a big red juicy target with "hack me" spraypainted all over it. By default, this is where
secrets get stored - in plaintext, no less - so you really don't wanna rely on the defaults.
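To be clear, "plaintext" isn't an exaggeration: by default, Kubernetes Secrets are merely base64-encoded, and base64 is an encoding, not encryption. A quick sketch with a dummy password:

```shell
# Anyone who can read the stored value can trivially reverse it
printf 'hunter2' | base64         # prints aHVudGVyMg==
printf 'aHVudGVyMg==' | base64 -d # prints hunter2
```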
You need somewhere to store those passwords & API keys!
Secrets are exactly what they sound like. If you have something to hide (hint, if you're running Kubernetes, you do) it's a secret.
You could keep storing your production Postgres password in plaintext in a configuration file on disk...
But that's...err...what's the phrase...?
...a really f*cking bad idea?
You got options for this. Each cloud provider has its own in-house solution, but there exist others you can use on-prem, like HashiCorp's Vault.
A Kubernetes-optimized reverse proxy
If you've ever used a reverse proxy for load balancing or handling TLS...you already basically know how this works. The ingress controller ingests incoming requests (or plain old TCP packets, if you're playing on L4) and funnels them into your containers as you wish.
Some big names here are Traefik, Nginx, Istio, and HAProxy. Each of these provides a specialized build of its respective package, optimized to tango with the Kubernetes API.
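A sketch of an Ingress rule (host and service names are made up) that hands traffic for one hostname off to a Service:

```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: my-app
spec:
  rules:
    - host: myapp.example.com   # hypothetical domain
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: my-app    # a Service in your cluster
                port:
                  number: 80
```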
They're not like Wordpress plugins, I promise!
Kubernetes uses software-defined networking to facilitate communication between containers. Network plugins are basically virtual routers/firewalls that you install in your cluster to let the containers talk to each other.
The Kubernetes package manager
Remember how I told you before that "stateless" was a lie? Well, here you go. We've come full circle, from
apt install $packagename to
helm install $packagename. Go figure.
Helm packages, called
charts, help with deploying prebuilt applications on Kubernetes quickly.
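The parallel is pretty literal. A sketch, assuming Helm is installed and using Bitnami's public chart repository:

```shell
# Add a chart repository, then install WordPress plus everything it needs
helm repo add bitnami https://charts.bitnami.com/bitnami
helm install my-blog bitnami/wordpress
```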