Wait, How Do I Store Stuff In Kubernetes Again?

So you're new to Kubernetes, and wondering where your files go...

Let go of the concept of "files" for a sec, and think about what digital storage is.

On the lowest software layer, it's made up of these things called blocks. We're not gonna get into how these physically work - just imagine raw, fixed-size chunks of space on a drive, each full of bits set to one or zero.

The filesystem is a software abstraction for those blocks that makes them easier to work with. It usually includes some fault-tolerance and performance optimization features.

On top of the filesystem we can build other storage protocols like S3, an application that serves "objects" (practically speaking, files) over HTTP.

As a cross-platform container orchestrator, Kubernetes has different StorageClasses for different use cases and different cloud providers. We'll be focusing on AWS today to keep the scope of this article manageable.

So, ask yourself what you want to store.

We need a place to store our database files. Do we just mount /var/lib/mysql to a folder on one of the hosts like Docker does, or...?

That's gonna be block storage, since you need to give the database engine direct access to the disk for optimal performance. On AWS, this disk is an EBS volume, served by the awsElasticBlockStore volume plugin (the StorageClass provisioner is kubernetes.io/aws-ebs).
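In practice that looks something like this - a StorageClass pointing at the in-tree EBS provisioner, plus a PersistentVolumeClaim your database pod can mount. This is a minimal sketch; the names ("gp2-mysql", "mysql-data"), volume type, and size are all made up for illustration.

```yaml
kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: gp2-mysql
provisioner: kubernetes.io/aws-ebs   # in-tree EBS provisioner
parameters:
  type: gp2        # general-purpose SSD
  fsType: ext4
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: mysql-data
spec:
  accessModes:
    - ReadWriteOnce      # an EBS volume attaches to one node at a time
  storageClassName: gp2-mysql
  resources:
    requests:
      storage: 20Gi
```

Your MySQL pod then references the `mysql-data` claim in its `volumes` section and mounts it at /var/lib/mysql - no host folders involved.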

Ok, how about bulk S3-style object storage for my app's user-uploaded content?

In this case, it's actually a real S3 bucket. Kubernetes talks to the AWS API to dynamically provision resources using a set of drivers called AWS Controllers for Kubernetes (ACK). Here are some code samples. In fact, here's the config file for the pod the driver runs in.
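With the ACK S3 controller installed, you declare the bucket as a custom resource and the controller calls AWS to create the real thing. A rough sketch - the resource names are invented, and since ACK is in preview the API group/version may well shift:

```yaml
apiVersion: s3.services.k8s.aws/v1alpha1
kind: Bucket
metadata:
  name: user-uploads        # the name inside your cluster
spec:
  name: my-app-user-uploads # the actual bucket name in S3
```

`kubectl apply` that, and the controller reconciles it into a live bucket; `kubectl delete` tears it down.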

Note: At time of writing, the ACK drivers are still in developer preview. This is a bit of a head-scratcher to me; one would think AWS would want to make it easier to spend money on their services, but...ok Bezos! 🤷

You'll also notice the ACK docs make no mention of the awsElasticBlockStore StorageClass. That's because support for EBS volumes, GCEPersistentDisks, AzureDisks , and several other provider-specific storage configs are actually baked into the core Kubernetes project. This seems a bit nonsensical to me, and I'd like to see them broken out in the future, but...alas.

But what about super-secret data? We can't possibly be storing our API keys in plaintext config files...can we?

No, we can't, and Kubernetes needs to improve here. By default, our secrets get stored in plaintext in etcd, the master node's settings database. As of v1.13, you can at least opt to encrypt your secrets...but you still need a place to store the encryption keys.
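That opt-in looks like an encryption config file handed to the API server via its --encryption-provider-config flag. A minimal sketch, with a placeholder where a real base64-encoded 32-byte key would go:

```yaml
apiVersion: apiserver.config.k8s.io/v1
kind: EncryptionConfiguration
resources:
  - resources:
      - secrets
    providers:
      - aescbc:
          keys:
            - name: key1
              secret: <base64-encoded 32-byte key>   # placeholder, not a real key
      - identity: {}   # fallback so existing unencrypted secrets can still be read
```

Notice the problem, though: that `secret:` key sits in plaintext on the master's disk. We've just moved the goalposts.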

This is where you slot in another cloud-specific plugin, the Key Management Service (KMS) provider. Basically, your cloud hangs onto the encryption keys for you, and you just trust them. Here are the generic docs to familiarize yourself with the concepts, and here's a good tutorial for Amazon's KMS implementation.
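Wiring that up means swapping the local key in the encryption config for a `kms` provider block that points at the plugin's local UNIX socket. The plugin name and socket path below are illustrative, not canonical:

```yaml
apiVersion: apiserver.config.k8s.io/v1
kind: EncryptionConfiguration
resources:
  - resources:
      - secrets
    providers:
      - kms:
          name: aws-encryption-provider      # the KMS plugin's registered name
          endpoint: unix:///var/run/kmsplugin/socket.sock
          cachesize: 1000    # number of decrypted data keys cached in memory
          timeout: 3s
      - identity: {}
```

Now the API server never holds the master key at all - it just asks the plugin to encrypt and decrypt on its behalf.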

But how do the secrets get passed around, mechanically?

Glad you asked - that's actually one of the cool parts. Since the Kubernetes master is a VM, AWS can bolt on UNIX sockets as it sees fit. It mounts one for the external KMS service and the master to share, and they communicate over gRPC, a zippy HTTP/2-based RPC framework.

Hope this clarified a few things! Let me know in the comments if this was helpful, or if I missed anything, k? ⬇️

Noah Williams

Software Engineer | InfoSec & Infrastructure