Which CRI should I use to replace Docker for my Kubernetes cluster?

Ulrich Giraud
6 min read · Dec 11, 2020


So, the deprecation of Docker by Kubernetes has made some waves and worried a good part of the community, but Docker isn't the only implementation of the CRI.

"Alright then, which other CRI should I use?" you may ask. That is exactly the question I had, and it is the purpose of this story.
I've tested the following CRIs and tried to get an answer by benchmarking them:

  • dockershim
  • containerd
  • crio

For CRI-O, two runtime backends have been tested, runc and crun, to test against cgroups v2 and see the impact, if any.
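
Switching the runtime on a node mostly comes down to pointing the kubelet at a different CRI socket. As a rough sketch (the socket paths are the usual defaults, and the crio.conf snippet below is an assumption about your packaging, so adjust to your distro):

# join a worker with an explicit CRI socket (with the Docker socket present,
# kubeadm falls back to dockershim by default)
kubeadm join <master-ip>:6443 --cri-socket unix:///run/containerd/containerd.sock --token <token> --discovery-token-ca-cert-hash <hash>

# or set the endpoint directly on the kubelet:
#   containerd : unix:///run/containerd/containerd.sock
#   CRI-O      : unix:///var/run/crio/crio.sock
KUBELET_EXTRA_ARGS="--container-runtime=remote --container-runtime-endpoint=unix:///var/run/crio/crio.sock"

# switch CRI-O from runc to crun (the binary path is an assumption)
cat >> /etc/crio/crio.conf <<'EOF'
[crio.runtime]
default_runtime = "crun"

[crio.runtime.runtimes.crun]
runtime_path = "/usr/bin/crun"
EOF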

Test environment

The test environment is a Kubernetes 1.19.4 cluster, created using Ansible and the roles from https://gitlab.com/incubateur-pe (https://galaxy.ansible.com/incubateurpe)

The servers run on KVM and are configured as follows:

  • master: CentOS 7, 2 vCPUs / 2 GB RAM
  • crio-crun node: Fedora 32, 2 vCPUs / 4 GB RAM
  • other nodes: CentOS 7, 2 vCPUs / 2 GB RAM

The underlying hardware is an i7-9700K with 64 GB of RAM and an MP510 NVMe drive.

Cluster creation

Pretty straightforward: I used molecule to spin up a cluster and set it to use a different CRI on each worker node. The source of the cluster is here: https://gitlab.com/incubateur-pe/kubernetes-bare-metal/-/tree/dev/molecule/criBench
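
Reproducing it boils down to something like this (the branch and scenario name are taken from the repository link above, so treat it as a sketch):

git clone -b dev https://gitlab.com/incubateur-pe/kubernetes-bare-metal.git
cd kubernetes-bare-metal
# creates the KVM guests and converges the cluster, one CRI per worker
molecule converge -s criBench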

So let’s run a “molecule converge” and, 10 minutes later, we’ve got ourselves the following cluster:

Ok! time to bench it!

First bench : bucketbench

Bucketbench (https://github.com/estesp/bucketbench) is a tool for executing a sequence of actions against a container engine, which is perfect for getting an idea of the performance of each of the nodes above.

The scenario is simple (a sketch of the invocation follows the list):

  • 3 threads
  • 15 iterations
  • run/stop/delete
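
The exact field names below are approximated from bucketbench's examples (check the project's examples/ directory for the real schema), so treat this as a sketch:

# run a benchmark definition against the local engine
sudo bucketbench run --benchmark cri-bench.yaml

# cri-bench.yaml, roughly (one driver entry per engine/node):
#   name: criBench
#   image: busybox:latest
#   detached: true
#   drivers:
#     - type: Docker
#       threads: 3
#       iterations: 15
#   commands:
#     - run
#     - stop
#     - remove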

And the results are (in ms):

Alright! we have quite a difference in performance!

But, wait a minute, what is this docker-shim? And why are there 5 instances tested but only 4 servers?

Ok, let's dig down: Docker, as we use it through the docker client, is not what Kubernetes talks to. The kubelet's dockershim translates CRI calls into Docker API calls and exposes a socket which is callable like any other CRI socket.
So, the difference here is (a quick sketch follows):
- docker-shim: bench via the CRI socket
- docker-cli: bench via the docker client
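
To make that concrete, here is roughly what the two paths look like from a node (the dockershim socket path is the usual default; adjust it if yours differs):

# "docker-shim": talk to the CRI socket exposed by the kubelet's dockershim,
# for instance with crictl, the CRI debugging client
crictl --runtime-endpoint unix:///var/run/dockershim.sock ps -a

# "docker-cli": go through the Docker client and the Docker daemon API
docker ps -a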

As you can see, there is a tremendous difference in performance when going through the CLI, and in fact Docker isn't as bad as I thought: it is actually faster than CRI-O in this test.

But the clear winner here seems to be containerd.

Second bench: kubernetes

While running the tests above, I had the feeling that this was not the whole story: how do these runtimes behave when driven by Kubernetes? Is there more than just run/stop/delete to test? Does this difference in performance mean anything on a real cluster?

And, in fact, I was right: those 3 verbs are far from the whole story!
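
Besides creating and removing containers, the kubelet keeps polling the runtime through the CRI for sandboxes, containers, statuses, images and stats. crictl exposes the same kind of calls, for example:

crictl pods                 # ListPodSandbox
crictl ps -a                # ListContainers
crictl inspect <container>  # ContainerStatus
crictl images               # ListImages
crictl stats                # ListContainerStats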

Let's dive in, this time with Prometheus in the cluster, Grafana to visualize the metrics, a custom dashboard to get a clear view (https://gitlab.com/ulrich.giraud/bench-cri/-/blob/master/dashboard/dashboard_bench.json) and something to deploy in the cluster.

As the test targets the runtime only, and not the workload, what I deployed in the cluster is a DaemonSet (so that it lands on each node every time) of a busybox container running "sleep infinity":

---
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: benchds-replaceme
  namespace: benchds
  labels:
    k8s-app: benchds
spec:
  selector:
    matchLabels:
      name: benchds
  template:
    metadata:
      labels:
        name: benchds
    spec:
      containers:
      - name: benchds
        image: busybox:latest
        command:
        - sleep
        - infinity
        resources:
          limits:
            memory: 20Mi
          requests:
            cpu: 10m
            memory: 20Mi

This DaemonSet will be deployed, each time under a unique name (a sketch of the deployment loop follows the list):

  • a hundred times with some delay between 2 creations
  • a hundred times in bulk
  • a thousand times in bulk
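
The real drivers live in the bench repo linked above; as a minimal sketch, assuming the manifest is saved as benchds.yaml and the benchds namespace already exists, the slow-creation case looks like this:

# a hundred DaemonSets, each with a unique name, with a pause between creations
# (the 5 second delay is an assumption, the original scripts may differ)
for i in $(seq 1 100); do
  sed "s/replaceme/$i/" benchds.yaml | kubectl apply -f -
  sleep 5
done
# the burst cases are the same loop without the sleep (up to 1000 for the last one)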

Grafana gave me the following results, which are great, but not really easy to read and compare:

Ok, let's see the results, reworked and regraphed (source: https://docs.google.com/spreadsheets/d/1FcRjVlqwy-1kTHfSxSv3e0LDxidrdGSx-b_-a7gkpE4/edit?usp=sharing)

Slow creation of a hundred daemonsets:

Burst creation of a hundred daemonsets:

Burst creation of a thousand daemonsets:

Alright, now that we have all the numbers, time to see what we can extract from them:

CRI-O/runc: surprisingly, the slowest of all on creations/deletions, but average at everything else
CRI-O/crun: not great on creations/deletions, but the best at everything else
containerd: fast in almost all circumstances, with a pretty good response to heavy loads
Docker: faster than CRI-O on creations/deletions, but the slowest on status/list requests

The status/list requests are the most frequent requests on the CRI, so this is where performance matters most, and CRI-O seems to be the better choice here, followed really closely by containerd.

Containerd is really performant across all metrics, and seems to be the most balanced choice.

Docker, on the other hand, doesn't get great results, but is consistent across the board, regardless of the load.

Conclusion

Alright, there are really good alternatives to Docker in pure performance terms, and our clusters won't suffer any penalty from this Docker deprecation.
In fact, it is a good thing that the communication around Kubernetes deprecating Docker made some folks realize that Docker isn't the only CRI available, and, in fact, not even the only tool to build images.

Docker is still, in my opinion, the project that made the whole containerization movement go forward, and it remains a great tool. It's also great to see other projects focus on other use cases.

But I haven't answered my initial question, which was: which CRI should I use for my k8s cluster?
Since Docker won't be a choice anymore, the answer is, in fact, any of the others, depending on your constraints and use case. Performance isn't the whole story, and while I knew that from the beginning, this benchmark has been a way to test each runtime, to understand how they work and how they integrate with k8s.

But you still haven't given a clear answer!

True, so here is my personal choice: containerd. It is fast, easy to configure, pretty reliable and secure. CRI-O, on the other hand, already supports cgroups v2 and would be my choice on Fedora or CentOS 8.
