A couple of tips for beginner K8s operator developers

Vitaly Elyashev
6 min read · Dec 20, 2021

I have been designing K8s operators for the last 4 years or so, but never had a chance to actually develop an operator myself. It has always been one of the highest priority items on my learning list. A couple of months ago I was finally able to start spending some time learning how to implement an operator. I defined a high-level list of goals and requirements. As always, the learning process was not as smooth as I had hoped, and I needed to resolve some issues along the way. I'd like to share a couple of those issues and their solutions here. It's not that there isn't enough documentation and examples out there, but I always found the examples either too basic (like the famous Operator SDK memcached operator), which cannot be used to create a real operator, or way too complex, which is not really good for beginner learners like myself. The documentation was also not always good enough, at least for me.

Note: this post will probably be too basic for experienced operator developers.

As I said previously, I defined a couple of high-level goals for myself:

  1. If I'm going to learn how to create operators, I'll do it in the recommended way, which means in Golang. It's also a good opportunity to improve my Golang skills.
  2. I needed to choose the framework I'd use to create the operator. After some (really quick) thinking, I decided to use the Operator SDK. I'm not sure whether it was the best choice, but it turned out to be really convenient and useful.
  3. I needed to define a good use case. I had several use cases in mind, but decided to implement the simplest one: a Couchbase index creation operator. Couchbase is one of the most popular NoSQL document databases, and it actually already has its own really mature Couchbase Autonomous Operator. But that operator is missing some very important functionality: creating query indexes. Couchbase supports a SQL-like language and has indexes (similar to relational database indexes). The Couchbase team has its own reasons and priorities, and I hope that one day they will implement index creation as part of their operator. But the fact that I need to find some creative way to create indexes as part of my microservices architecture is really annoying, so I decided to try and implement it myself. I'll get to some of the requirements and solutions below.

Ok, so after this long and boring preface, let's get to the tips. You can find the sources in my GitHub repo.

Print Column

So I created the operator according to the guidelines and best practices. I even tried my best and implemented status reporting using Conditions. Then I deployed the operator, created a test index resource and applied it to the cluster. As always, it didn't work on the first try, and I started analyzing to find the issue. So I ran "kubectl get" on my resource and got only general information. Of course I could use "describe" and get all the information, but that is not really a convenient way to see a summary of my resource and its status, especially when there are many resources. I saw that "get" on other custom resources provides much more information, and I wanted the same for my operator… It took me some time to understand what exactly I was looking for, but eventually I found it: it is called a "print column" and can be configured really easily:

You need to edit your types file (in the api folder). Find your main resource structure and add a printcolumn marker with your desired column name and the full path to the property you want kubectl to print:
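For illustration, here is a minimal sketch of what those markers look like. The CouchbaseIndex type and the spec field names are placeholders for a hypothetical CRD, not necessarily the ones in my repo:

```go
package v1alpha1

import (
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// The +kubebuilder:printcolumn markers below add extra columns to
// the "kubectl get" output for this custom resource.
// +kubebuilder:object:root=true
// +kubebuilder:subresource:status
// +kubebuilder:printcolumn:name="Bucket",type="string",JSONPath=".spec.bucketName"
// +kubebuilder:printcolumn:name="Index",type="string",JSONPath=".spec.indexName"
type CouchbaseIndex struct {
	metav1.TypeMeta   `json:",inline"`
	metav1.ObjectMeta `json:"metadata,omitempty"`

	Spec   CouchbaseIndexSpec   `json:"spec,omitempty"`
	Status CouchbaseIndexStatus `json:"status,omitempty"`
}
```

After changing the markers, the CRD manifests need to be regenerated (make manifests in an Operator SDK project) and redeployed to the cluster.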

And this is how it looks in the command line when I run "kubectl get":
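Something along these lines (the resource and column values here are just illustrative, matching the hypothetical CRD above):

```
$ kubectl get couchbaseindexes
NAME            BUCKET      INDEX
my-test-index   my-bucket   idx_user_email
```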

Handle Resource Deletion

Another topic with somewhat confusing documentation and examples is handling resource deletion: what happens when the CR is removed from the cluster? Since in my use case I'm managing Couchbase indexes, when the CR is deleted the corresponding index should be deleted from the Couchbase cluster accordingly… As with any other resource-related event, a reconciliation loop is initiated on resource deletion. But by then it is too late: the resource no longer exists in the cluster, and we don't know which index should be deleted…

I'll get directly to the bottom line (which took me a couple of days to figure out):

The way to handle resource deletion in K8s is to use finalizers. The trick here is that you need to add the finalizer dynamically when the resource is created and remove it once the deletion is handled. This is how I did it in my operator:
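Here is a minimal sketch of that pattern, assuming a controller-runtime based reconciler scaffolded by the Operator SDK. The CouchbaseIndex type, the module path and the finalizeIndex helper are illustrative names, not necessarily the ones in my repo:

```go
package controllers

import (
	"context"

	ctrl "sigs.k8s.io/controller-runtime"
	"sigs.k8s.io/controller-runtime/pkg/client"
	"sigs.k8s.io/controller-runtime/pkg/controller/controllerutil"

	cachev1alpha1 "example.com/couchbase-index-operator/api/v1alpha1"
)

// CouchbaseIndexReconciler reconciles CouchbaseIndex resources.
type CouchbaseIndexReconciler struct {
	client.Client
}

func (r *CouchbaseIndexReconciler) Reconcile(ctx context.Context, req ctrl.Request) (ctrl.Result, error) {
	index := &cachev1alpha1.CouchbaseIndex{}
	if err := r.Get(ctx, req.NamespacedName, index); err != nil {
		// The resource may already be gone; nothing to do in that case.
		return ctrl.Result{}, client.IgnoreNotFound(err)
	}

	if index.ObjectMeta.DeletionTimestamp.IsZero() {
		// The resource is alive: make sure our finalizer is attached,
		// so deletion will be blocked until we have cleaned up the index.
		if !controllerutil.ContainsFinalizer(index, cachev1alpha1.IndexFinalizer) {
			controllerutil.AddFinalizer(index, cachev1alpha1.IndexFinalizer)
			if err := r.Update(ctx, index); err != nil {
				return ctrl.Result{}, err
			}
		}
	} else {
		// The resource is marked for deletion: run cleanup and remove the finalizer.
		return r.finalizeIndex(ctx, index)
	}

	// ... normal create/update reconciliation continues here ...
	return ctrl.Result{}, nil
}
```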

As you can see, I first check whether the finalizer exists, and if not, I add one with a unique key. cachev1alpha1.IndexFinalizer is a constant referencing the string with the key value; by convention, I put it in the api file.

When finalizers are used, the resource is not deleted on a delete request, only marked for deletion. So I also check whether the resource is marked for deletion, and if it is, I call the method that handles the deletion, which looks like this:
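Continuing the sketch above (same imports and hypothetical names), the deletion handler could look roughly like this:

```go
// finalizeIndex runs cleanup for a CR that is marked for deletion and
// then removes the finalizer so K8s can actually delete the resource.
func (r *CouchbaseIndexReconciler) finalizeIndex(ctx context.Context, index *cachev1alpha1.CouchbaseIndex) (ctrl.Result, error) {
	if controllerutil.ContainsFinalizer(index, cachev1alpha1.IndexFinalizer) {
		// ... drop the corresponding index on the Couchbase cluster here ...

		// Cleanup succeeded: remove the finalizer and persist the change.
		controllerutil.RemoveFinalizer(index, cachev1alpha1.IndexFinalizer)
		if err := r.Update(ctx, index); err != nil {
			return ctrl.Result{}, err
		}
	}
	return ctrl.Result{}, nil
}
```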

I skipped the actual deletion logic and kept only the finalizer removal handling, which is pretty self-explanatory.

Note: pay attention that if you don't remove the finalizer, the resource will not be removed. The same happens if, for some reason, the finalizer could not be deleted. You won't even be able to remove the operator in this case, so pay attention to your finalizer cleanup logic…

BTW, the way to overcome this issue is to manually remove the finalizer from the resource: edit the resource, remove the finalizer and save it. The resource will then be removed automatically…
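For example, with a one-liner (the resource kind and name are illustrative):

```
kubectl patch couchbaseindex my-test-index --type=merge -p '{"metadata":{"finalizers":null}}'
```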

Handle Resource Update

What happens when somebody updates a resource and submits it again? The obvious answer is that our operator should identify that the resource has changed and react to that change… In our Couchbase index use case, it means that on each pass of the reconcile loop (if the custom resource is in the correct status, Ready in this case) we need to check whether the index parameters have changed, and if they have, update the index.

So what is the best approach to identify that the index parameters have changed? In theory, if the operator were the only one allowed to update indexes in Couchbase (which is not true for indexes, but is true for all the resources controlled by the Couchbase Autonomous Operator), the easiest and best-performing way would be to cache all the indexes with all their definitions somewhere (a local or distributed cache) and compare against the cache each time. So yes, this approach works and is pretty effective, but it goes against the best practices for creating operators, which state very clearly that operators should be stateless. So in my case I have to identify whether the index was updated the hard and long way: fetch the index definition from Couchbase itself and compare it against the new definition.
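A rough sketch of that comparison, assuming the Couchbase gocb v2 SDK (the function and parameter names are mine; note also that Couchbase may store index keys in a normalized form, e.g. wrapped in backticks, so a real comparison would probably need to normalize both sides first):

```go
package controllers

import (
	"sort"

	"github.com/couchbase/gocb/v2"
)

// indexNeedsUpdate reports whether the index keys stored in Couchbase
// differ from the keys declared in the custom resource spec.
func indexNeedsUpdate(cluster *gocb.Cluster, bucketName, indexName string, desiredKeys []string) (bool, error) {
	indexes, err := cluster.QueryIndexes().GetAllIndexes(bucketName, nil)
	if err != nil {
		return false, err
	}
	for _, idx := range indexes {
		if idx.Name != indexName {
			continue
		}
		// Compare existing and desired index keys, ignoring order.
		actual := append([]string(nil), idx.IndexKey...)
		desired := append([]string(nil), desiredKeys...)
		sort.Strings(actual)
		sort.Strings(desired)
		if len(actual) != len(desired) {
			return true, nil
		}
		for i := range actual {
			if actual[i] != desired[i] {
				return true, nil
			}
		}
		return false, nil
	}
	// The index does not exist at all, so it needs to be (re)created.
	return true, nil
}
```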

This approach works fine for my use case, and the performance overhead is not really significant here, as indexes are not changed very frequently and there are not supposed to be too many indexes in the system… I believe this will also be the situation for most operator use cases. But I can imagine scenarios where the picture is different and caching would be required…

So what now? I know that the index changed and would like to apply the change back to Couchbase… I actually already started implementing it, but at the last moment I realized that I'm not sure it is really a good approach to run an index update on every resource change in Couchbase (or in any other database, for that matter). The reason is that an index update can take a significant amount of time and resources… And unlike index creation or deletion, an index is supposed to be in use both before and after the update. It just doesn't feel right to me to risk getting the whole system stuck as a result of a small yaml update… A much better approach in this case is to create the new index, wait until it is built, and then drop the old one… At least this is what I'm thinking of doing now…
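That "create new, wait, drop old" flow could look roughly like this with gocb v2 (again, the names and the timeout are illustrative):

```go
package controllers

import (
	"time"

	"github.com/couchbase/gocb/v2"
)

// rotateIndex replaces an index without leaving queries index-less:
// build the replacement first, wait until it is online, then drop the old one.
func rotateIndex(cluster *gocb.Cluster, bucketName, oldName, newName string, keys []string) error {
	mgr := cluster.QueryIndexes()

	// 1. Create the replacement index under a new name.
	if err := mgr.CreateIndex(bucketName, newName, keys, nil); err != nil {
		return err
	}

	// 2. Wait until the new index is built and online.
	if err := mgr.WatchIndexes(bucketName, []string{newName}, 10*time.Minute, nil); err != nil {
		return err
	}

	// 3. Only now drop the old index.
	return mgr.DropIndex(bucketName, oldName, nil)
}
```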

That's all for now. I'll update this post with the outcomes of my resource update implementation and other tips…
