My Istio adoption lessons learned — part 1

Vitaly Elyashev
6 min read · Feb 1, 2021

There is no doubt that service mesh in general, and Istio specifically, brings a lot to cloud native development: it offloads many non-functional aspects from the functional code and supports much improved testability (error and delay injection, for example) and complex deployment scenarios (canary releases, for example). There has been a lot of buzz in the industry around Istio over the last couple of years, and it looks like confidence keeps growing with newer Istio versions. I also have to admit that the Istio development team did an amazing job documenting all the functionality, with tons of examples. But despite all of that, Istio adoption can be more complex and time consuming than it looks. So I'd like to share some lessons that we learned the hard way, as well as some issues that are still open or that we only worked around…
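Just to illustrate the testability point, here is a minimal sketch of fault injection with a VirtualService; the service name, host and percentages are made up for illustration and are not part of our actual setup:

# Hypothetical VirtualService that delays 10% of requests by 5s
# and aborts 5% of them with HTTP 503 — handy for resilience testing.
apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: my-service-faults        # illustrative name
spec:
  hosts:
  - my-service                   # illustrative in-mesh host
  http:
  - fault:
      delay:
        percentage:
          value: 10
        fixedDelay: 5s
      abort:
        percentage:
          value: 5
        httpStatus: 503
    route:
    - destination:
        host: my-service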

Deployment of Istio

You can deploy Istio in several different ways: using istioctl, using Helm and, finally, using the Istio Operator. I am personally a big fan of Kubernetes Operators, so I suggested from the beginning that we deploy Istio with the Operator. I'll elaborate on it a bit more in the next items, but in general, using the Operator looks like it was the right decision for us and I strongly recommend it.
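For reference, a minimal IstioOperator custom resource looks roughly like this (the resource name is illustrative; the rest of this post shows the kinds of settings we added to it):

apiVersion: install.istio.io/v1alpha1
kind: IstioOperator
metadata:
  namespace: istio-system
  name: istio-control-plane   # illustrative name
spec:
  profile: default            # built-in profile to start from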

Istio CA

One of the biggest values of using Istio, in my eyes, is the fact that, in theory, you don't need to take care of certificate management for your microservices or for third parties deployed in K8s. By default Istio uses a self-signed certificate and provides an option to replace it with your own custom CA. I don't think that using a self-signed CA is a really useful approach for production, but maybe there are use cases that justify it. Either way, we tried to see what our alternatives were… The problem with injecting a custom CA is that you need to create a secret in the Istio namespace (istio-system by convention) containing your CA's public and private keys, the certificate chain and the root CA public key. It means that everybody/everything that has access to this namespace also has access to the CA. There is an additional (experimental) option to use the K8s CA. It is supposed to be a better option security wise, but we have not tried it yet.
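For illustration, the custom CA is plugged in as a secret named cacerts in istio-system with the keys below; the base64 values here are placeholders, not real key material:

apiVersion: v1
kind: Secret
metadata:
  name: cacerts                  # name Istio expects for a plugged-in CA
  namespace: istio-system
data:
  ca-cert.pem: <base64 intermediate CA certificate>     # placeholder
  ca-key.pem: <base64 intermediate CA private key>      # placeholder
  root-cert.pem: <base64 root CA certificate>           # placeholder
  cert-chain.pem: <base64 certificate chain>            # placeholder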

And one last point — you probably need to create the secret first and only later deploy Istio in the namespace. So if you decide to use a custom CA, take into account that if you already have Istio up and running with the self-signed certificate, it will probably be challenging to change it without redeploying Istio…

K8s Service port name

Ok, how exactly is this relevant to Istio, you may ask — I'm using K8s services to expose my pods internally regardless of Istio. We put these services in front of our microservices in order to simplify DNS resolution for other pods running in the cluster (the default configuration is ClusterIP). Well, this is what we assumed too, and we ignored the existing services completely. Until we tried to enable JWT-based AuthN and it didn't work for us… After some investigation time, and claims of "you see, Istio is not really working" from many directions :), we found that the problem is that Istio uses the port name to decide what the underlying protocol is. For example, if you are using the HTTP2 protocol — you need to start the port name with "http2" and can add whatever you want after it. By default, Istio assumes that the protocol is TCP. So this is the reason JWT AuthN didn't work — there are no HTTP headers to look for a JWT in when the protocol is TCP. Actually, we could have realized something was wrong when we saw that Kiali identified the traffic as TCP and not HTTP, but we were new to Kiali and didn't connect the dots… So yes — when you introduce Istio into an existing K8s environment, you need to go through all your services and make sure that the port names are defined correctly:

ports:
  - port: {{ .Values.service.port }}
    targetPort: {{ .Values.service.port }}
    protocol: TCP        # the K8s protocol field stays TCP
    name: http2-my       # the "http2" prefix tells Istio the application protocol

Set imagePullPolicy correctly

This is another item that is not really directly connected to Istio, but is still worth mentioning here. So what is this configuration and why is it important to discuss: according to the K8s documentation, imagePullPolicy tells the kubelet whether the image should be pulled or whether pulling should be skipped if the image already exists. Per the same documentation, the default policy is IfNotPresent, which means that the image will be pulled only if it is not present. I don't know if we did something wrong, or if there is something specific to Istio — but when we deployed Istio initially, imagePullPolicy was set to Always. So each time a new pod was deployed to the cluster, the kubelet pulled the istio-proxy image from the source instead of using the existing image. In our case the source was Docker Hub (which is another lesson learned :) ). We found this out only when our Docker Hub quota limit was reached and we were not able to deploy any pods anymore. It took some time, but we found out what the problem was and what the solution is. So my strong advice — just configure the image pull policy in your IstioOperator custom resource, and you'll probably save yourself an unnecessary potential headache:

values:
  global:
    imagePullPolicy: IfNotPresent

Istio security auditing

Auditing, and specifically security auditing, can be interpreted in many different ways. What I mean when I say "security auditing" is the auditing of security events, like authentication and authorization of incoming calls — both successful and failed attempts. Since with Istio we no longer need to authorize/authenticate in functional code, it also makes sense that Istio should take care of the security auditing. Well, Istio doesn't have such a feature, at least not under that name. The feature is called "Envoy access logs". As you can see, the Istio official site is very light on details about the access log. It just explains how you can configure this feature in the IstioOperator custom resource, and not much more… If you'd like to publish your logs as JSON — you can find this information as well:

spec:
  meshConfig:
    accessLogEncoding: JSON
    accessLogFile: "/dev/stdout"

So basically, once you define a path for the access logs, every istio-proxy sidecar will start to write information about the incoming calls to its own log, which can later be redirected to the log infrastructure of your choice. This is a pretty useful feature, as you can see the full process flow inside the mesh, starting from the IngressGateway (if you are using one). The problem is that the default access log is missing some important information. For example, it is missing the response_code_details value, which is pretty useful when a request fails… It is also missing traceability information (again, if you use some infra for it). We assumed that since the Istio documentation does not describe how to customize the list of fields — it is not possible. Well, as one of our architects found out, it is possible, and the documentation is simply located in a different place — the Envoy access log documentation. That makes perfect sense, as istio-proxy is built on top of the Envoy proxy, but one small link from the Istio documentation site to the Envoy documentation site could make a poor developer's life a bit easier… As you can see at that link, the access log can be easily configured and you can define many more fields than just the defaults… And by the way, in Envoy 1.18 it is much easier to work with this configuration than in previous versions — so I strongly recommend taking a look at it, even if you are already happily configuring access logs with a previous version… Example from our IstioOperator CR:

spec:
  meshConfig:
    accessLogEncoding: JSON
    accessLogFile: "/dev/stdout"
    accessLogFormat: |
      {
        "ota_trace": "%REQ(x-b3-traceid)%",
        "start_time": "%START_TIME%",
        "route_name": "%ROUTE_NAME%",
        "method": "%REQ(:METHOD)%",
        "path": "%REQ(X-ENVOY-ORIGINAL-PATH?:PATH)%",
        "protocol": "%PROTOCOL%",
        "response_code": "%RESPONSE_CODE%",
        "response_flags": "%RESPONSE_FLAGS%",
        "response_code_details": "%RESPONSE_CODE_DETAILS%",
        "local_reply": "%LOCAL_REPLY_BODY%",
        "bytes_received": "%BYTES_RECEIVED%",
        "bytes_sent": "%BYTES_SENT%",
        "duration": "%DURATION%",
        "upstream_service_time": "%RESP(X-ENVOY-UPSTREAM-SERVICE-TIME)%",
        "x_forwarded_for": "%REQ(X-FORWARDED-FOR)%",
        "user_agent": "%REQ(USER-AGENT)%",
        "request_id": "%REQ(X-REQUEST-ID)%",
        "authority": "%REQ(:AUTHORITY)%",
        "upstream_host": "%UPSTREAM_HOST%",
        "upstream_cluster": "%UPSTREAM_CLUSTER%",
        "upstream_local_address": "%UPSTREAM_LOCAL_ADDRESS%",
        "downstream_local_address": "%DOWNSTREAM_LOCAL_ADDRESS%",
        "downstream_remote_address": "%DOWNSTREAM_REMOTE_ADDRESS%",
        "requested_server_name": "%REQUESTED_SERVER_NAME%",
        "upstream_transport_failure_reason": "%UPSTREAM_TRANSPORT_FAILURE_REASON%"
      }

And by the way, since Envoy writes some other logs and not only access logs — if you'd like to have all the other logs in JSON format as well, you need to configure that too in the IstioOperator CR:

values:
  global:
    logAsJson: true
