My Istio adoption lessons learned — part 2

Mar 1, 2021

Istio is a real lifesaver when it comes to securing K8s services. We just define which namespaces are controlled by Istio and specify STRICT mode for mTLS, and that’s it: all the services start getting certificates issued by Istio, and all internal traffic is encrypted and authenticated using mTLS:

apiVersion: "security.istio.io/v1beta1"
kind: "PeerAuthentication"
metadata:
  name: "peer-authentication-all"
spec:
  mtls:
    mode: STRICT

So life inside the mesh looks really good, right? But what about communication with external services (outside of the mesh)? And what if you need to authenticate external users or services using OIDC tokens (JWT)? In this chapter of my lessons learned I’d like to discuss mostly the communication and security of external traffic, both incoming (Ingress) and outgoing (Egress). So if your use case doesn’t include any of this, you probably don’t need to continue reading and can invest your time in some more relevant articles :)

One of the greatest benefits of using K8s (at least in my eyes) is its really great isolation, which improves the security of your services if you use it correctly. You completely control the way your services communicate with the outside world. There are many good practices for how exactly to do it. My favourite is restricting incoming and outgoing channels to the minimum. There are many ways to do this in plain K8s. One of them is to put an API Gateway component as the only gate into the cluster and let it deal with authentication, authorization and routing. For the outgoing direction, you can define egress rules in NetworkPolicies.
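
To make the plain K8s approach a bit more concrete, here is a minimal sketch of a NetworkPolicy that restricts egress. The policy name, the namespace and the IP range are placeholders, and the DNS rule assumes that your cluster DNS runs in kube-system and that namespaces carry the kubernetes.io/metadata.name label (set automatically only in recent Kubernetes versions):

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: restrict-egress          # hypothetical name
  namespace: your-namespace
spec:
  podSelector: {}                # applies to every pod in the namespace
  policyTypes:
  - Egress
  egress:
  # Allow DNS lookups (assumption: cluster DNS lives in kube-system)
  - to:
    - namespaceSelector:
        matchLabels:
          kubernetes.io/metadata.name: kube-system
    ports:
    - protocol: UDP
      port: 53
    - protocol: TCP
      port: 53
  # Allow HTTPS to one external range only (placeholder documentation range)
  - to:
    - ipBlock:
        cidr: 203.0.113.0/24
    ports:
    - protocol: TCP
      port: 443
```

Everything that is not listed is dropped, provided that the CNI plugin in your cluster enforces NetworkPolicies. Note that the rules work with IP ranges and not host names, which is part of why this approach is not that convenient.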

With Istio it’s even easier to control and secure incoming and outgoing traffic. While Istio has more than one option for allowing incoming and outgoing traffic, I’d like to concentrate on the Ingress Gateway for incoming and the Egress Gateway for outgoing. Under the hood, both use exactly the same K8s resource: the Istio Gateway. As you can see in the documentation: “Gateway describes a load balancer operating at the edge of the mesh receiving incoming or outgoing HTTP/TCP connections.” Pay special attention to the fact that a Gateway actually is a load balancer, or at least it has to use one in order to operate. The fact that you don’t provide a load balancer explicitly when you define a Gateway doesn’t mean that there isn’t one… So maybe you do want to specify which one should be used, especially if you are running in a managed public cloud environment…

One really confusing thing about Gateways is that there are two completely different places where you need to configure them. Actually, you are configuring two completely different things there, both called “Gateway” :) Maybe it’s just me, but it’s really hard to have a discussion about the gateway, as you always need to explain which “Gateway” you are talking about.

So what are these two Gateways, and why do we need two?

The Load Balancer itself. As part of the Istio Operator CR (remember I said that the operator really simplifies your life with Istio), you can define one or more Gateways, both Ingress and Egress. Why would you need more than one? Well, excellent question… I personally think that one Ingress and one Egress is more than enough, but sometimes you want to separate your workloads and/or use multiple underlying load balancers. Please be really careful and read some documentation before you create multiple Ingress Gateways. There are plenty of resources on the pros and cons of using several gateways, but most of them recommend using only one, unless that is just not enough for you…

The Istio Gateway custom resource. There is a dedicated CRD in Istio with “kind: Gateway”. Here you configure how you use the Load Balancer: which hosts you want to expose through the gateway, how you secure them and so on… This resource can be defined in any namespace and referenced later, for example in VirtualServices. It is used to configure both ingress and egress gateways, and you can create several different resources for each load balancer.
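
To illustrate the second kind of “Gateway”, here is a minimal sketch of a Gateway CR bound to the default ingress gateway and a VirtualService that references it. All the names, the host, the secret and the backend service are assumptions made up for the example:

```yaml
apiVersion: networking.istio.io/v1beta1
kind: Gateway
metadata:
  name: example-ingress-gateway      # hypothetical name
  namespace: your-namespace
spec:
  selector:
    istio: ingressgateway            # label of the default ingress gateway pods
  servers:
  - port:
      number: 443
      name: https
      protocol: HTTPS
    hosts:
    - my.example.net
    tls:
      mode: SIMPLE
      credentialName: example-tls-cert   # hypothetical secret, must exist in the namespace of the gateway pods
---
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: example-routes               # hypothetical name
  namespace: your-namespace
spec:
  hosts:
  - my.example.net
  gateways:
  - example-ingress-gateway          # the Gateway CR above, not the load balancer
  http:
  - route:
    - destination:
        host: example-service        # hypothetical in-mesh service
        port:
          number: 8080
```

The selector is the only link between the two “Gateways”: it picks the gateway pods (the load balancer side) that should apply this configuration.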

OK, so after this long preface, let’s see some lessons learned:

You can define different types of incoming traffic with Ingress Gateway

When I started to use the Ingress Gateway, I somehow missed the fact that it’s not necessarily connected to the Ingress resource/controller in K8s. As we all know, there are several ways to expose services in K8s: Ingress, LoadBalancer, NodePort, and even a ClusterIP service can be exposed as well. So when I heard “Ingress” in IngressGateway, I assumed that it’s just Istio’s Ingress controller implementation… So when one of our architects asked me “What if I cannot use an Ingress controller for some reason? Or what if I want to use NodePort instead?”, I said that I don’t know, and that probably you cannot use Istio for incoming communication. Which is both a wrong answer and a really weird one :) In his use case, he wanted to do some local tests with “Kind”, which doesn’t support Ingress controllers at all (at least this is what he said). Perfectly valid use case, right? Actually, when you define an IngressGateway (the first kind, in the Operator :)), you can define exactly what type of service it uses. This configuration is hiding here:

components:
    ingressGateways:
      - name: istio-ingressgateway
        k8s:
          service:
            type: NodePort

Besides NodePort, you can use the ClusterIP, ExternalName and LoadBalancer values. And of course, you need to configure the rest of the service settings correctly.
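
For context, this is roughly how the fragment above could sit inside a complete IstioOperator CR. It is only a sketch: the resource name and the nodePort numbers are assumptions, and the targetPort values assume the default gateway container ports:

```yaml
apiVersion: install.istio.io/v1alpha1
kind: IstioOperator
metadata:
  name: example-istiocontrolplane    # hypothetical name
  namespace: istio-system
spec:
  components:
    ingressGateways:
    - name: istio-ingressgateway
      enabled: true
      k8s:
        service:
          type: NodePort
          ports:
          - name: http2
            port: 80
            targetPort: 8080         # assumed default gateway container port
            nodePort: 30080          # assumed free port in the NodePort range
          - name: https
            port: 443
            targetPort: 8443         # assumed default gateway container port
            nodePort: 30443          # assumed free port in the NodePort range
```

With fixed nodePort values, a local cluster such as Kind can map these ports to the host, so the gateway stays reachable without any cloud load balancer.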

JWKS URI issue

If I want to authenticate external users or software, the easiest and most standard way is to use a JWT and verify it inside the mesh. I will not go into all the bits and bytes of how exactly it works, but in general: there is a provider that issues the JWT and signs it with its own private key. Then the public key is exposed as a JWKS (JSON Web Key Set) and used to verify the JWT signature. In Istio, in order to enable end-user authentication with JWT, we need to provide a RequestAuthentication CR. Really straightforward and easy to use… except for the JWKS stuff. There are two options for providing the JWKS as part of the RequestAuthentication CR:

Embedding it inside the CR (the jwks field).
Providing jwksUri: the URI of the service that serves the JWKS.
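
For reference, the first option could look roughly like the sketch below. The resource name and the key material are placeholders, not real values:

```yaml
apiVersion: security.istio.io/v1beta1
kind: RequestAuthentication
metadata:
  name: jwt-inline-jwks              # hypothetical name
  namespace: your-namespace
spec:
  jwtRules:
  - issuer: exampleIssuer
    # The whole key set is embedded as a JSON string (placeholder values below)
    jwks: |
      {"keys": [{"kty": "RSA", "kid": "example-key-id", "e": "AQAB", "n": "<base64url-encoded modulus>"}]}
```

Every key rotation means editing each CR that carries a copy of the key set.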

The first option is not really useful, as it requires putting the JWKS in each and every CR, which means maintaining many CRs with the same keys. And actually we were not able to find many examples of it. So the second option looks much more convenient: you just need to provide the URI (the service that signs the JWT will probably also expose this URI). And you are done… Well, it depends. If you have your own service, or use some third-party service, deployed inside K8s, and you put the in-cluster FQDN of this URI in your CR, you are in trouble:

apiVersion: security.istio.io/v1beta1
kind: RequestAuthentication
spec:
  jwtRules:
  - forwardOriginalToken: true
    issuer: exampleIssuer
    jwksUri: https://keys.myns.example.local/jwks.json

So how does it work? Istiod reads this CR and tries to access the URI in order to fetch the JWKS itself and publish it later to the relevant istio-proxies. But for some weird and unclear reason, Istiod does not succeed in performing the mTLS authentication against the service exposing the URI. And if you change https above to http, it won’t work for you either (assuming that you are using STRICT mode for the service), just with a different error. There are plenty of open issues about it. So how is it supposed to work then? Actually, all the examples use only external DNS names. And yes, when you use an external DNS name without mTLS, there is no issue at all. Again, the problem is only with mTLS inside the cluster. And it’s not possible to define plain TLS only for service authentication (peer authentication). Or at least, I don’t know how to do it... Yes, it is really weird and annoying behaviour. And, unfortunately, we have not been able to resolve it properly so far… So we did a not really nice workaround: instead of defining the internal URI, we exposed this URI externally (using the IngressGateway) and defined the external URI in the RequestAuthentication CR:

apiVersion: security.istio.io/v1beta1
kind: RequestAuthentication
spec:
  jwtRules:
  - forwardOriginalToken: true
    issuer: exampleIssuer
    jwksUri: https://my.example.net/keys/jwks.json

In this case only simple TLS is used from Istiod to the IngressGateway. Then TLS is terminated and a new mTLS connection is initiated from the IngressGateway to the service. Here it works just fine. We understand that what we did is not really nice and needs to be fixed one day, but at least we are now able to authenticate external traffic…
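
A likely explanation for the failure, stated as an assumption, is that Istiod runs without a sidecar proxy and therefore cannot present a mesh client certificate. If that is the case, an untested alternative to the external detour might be to relax mTLS only on the port that serves the keys. The sketch below assumes that the key service pods carry the label app: keys and serve the JWKS on port 8080; both are made up for the example:

```yaml
apiVersion: security.istio.io/v1beta1
kind: PeerAuthentication
metadata:
  name: keys-jwks-port               # hypothetical name
  namespace: myns
spec:
  selector:
    matchLabels:
      app: keys                      # assumed label of the key service pods
  mtls:
    mode: STRICT                     # everything else stays strict
  portLevelMtls:
    8080:                            # assumed workload port serving the JWKS
      mode: PERMISSIVE               # plain-text requests are accepted on this port only
```

With this in place, the jwksUri would have to point to the plain http address of the service on that port. A JWKS contains only public keys, so serving it in plain text inside the cluster may be an acceptable trade-off, but that is a decision for your own threat model.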

Egress traffic

So we know how to restrict the incoming traffic, and it looks pretty easy and straightforward. But what about the outgoing traffic? There is nothing specific for it in plain K8s: you just need to define egress rules in NetworkPolicies. Not a really easy approach, I must say, but it works… So what about Istio? We already touched on Egress Gateways above, so let’s dive a bit deeper… Spoiler: there are many more CRs that we need to configure if we’d like to control our outgoing traffic…

So let’s start with the use case: let’s assume that we have a microservice that needs to communicate securely with some external resource. Let’s assume that the traffic is over HTTP (not that it really matters, it can be TCP as well). Let’s also assume that we’d like to make the encryption transparent to the microservice, so from the developer’s perspective the communication with external services is not encrypted. One of the benefits is offloading the non-functional concerns. We don’t want to deal with anything related to security in the microservices, remember? (Well, at least we want to deal with security in the microservices as little as possible.) But there is an additional benefit: if we talk to the istio-proxy in plain text and let Istio deal with the encryption, the istio-proxy can see the whole payload and can provide us with much more information…

The Istio documentation has a lot of information about how to do Egress, but this information is spread across different pages, and you need to invest a substantial amount of time to implement our easy and straightforward use case. Or at least I did. So let me save you some time here:

First, we need to define the configuration below in the IstioOperator CR:

spec:
  meshConfig:
    outboundTrafficPolicy: 
      mode: REGISTRY_ONLY

When we have this configuration, “then the Istio proxy blocks any host without an HTTP service or service entry defined within the mesh”. This is how we can restrict egress traffic to allowed destinations only and prevent malicious code from sending sensitive information outside…

Now we need to define a ServiceEntry for our API:

apiVersion: networking.istio.io/v1beta1
kind: ServiceEntry
metadata:
  name: external-https
  namespace: your-namespace
spec:
  hosts:
  - some.external.api.com
  - another.provider.com
  ports:
  - number: 80
    name: http
    protocol: HTTP
  - number: 443
    name: https
    protocol: HTTPS
  resolution: DNS

A couple of lessons learned here:

Wildcards are not really working in ServiceEntries (at least we were not able to define them).
Pay attention that there are both ports 80 and 443 and both HTTP and HTTPS: yes, we really need both of them, and no, we don’t want to allow unencrypted external communication here. The reason is that there are several different places where we are going to use this configuration: the functional microservice needs HTTP and port 80, as it’s not aware of the encryption, remember? And the Egress Gateway needs HTTPS and port 443, as it’s going to initiate the encrypted traffic.
You can use several different hosts in the same ServiceEntry, or even all the hosts. Yes, it works. I still think that it’s better to have several ServiceEntries to separate different workloads…
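
One possible explanation for the wildcard problem, offered as an assumption and not verified against the setup above: with resolution: DNS the proxy has to resolve the host name itself, and a wildcard cannot be resolved. A wildcard entry without DNS resolution might look like this sketch (the name is hypothetical, the host is the wildcard used later in the Gateway CR):

```yaml
apiVersion: networking.istio.io/v1beta1
kind: ServiceEntry
metadata:
  name: external-wildcard            # hypothetical name
  namespace: your-namespace
spec:
  hosts:
  - "*.using.wildcard.com"
  ports:
  - number: 443
    name: https
    protocol: HTTPS
  resolution: NONE                   # the proxy forwards to the address the application resolved
```

With resolution: NONE the connection goes to whatever IP the application looked up, so this shape fits services that open HTTPS connections themselves. It is not a drop-in replacement for the TLS origination flow described in this article.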

Let’s now configure the Egress Gateway CR.

Note that, as with Ingress Gateways, we are talking about two different things when we discuss the Egress Gateway. I don’t want to discuss the load balancer here, only the CR:

apiVersion: networking.istio.io/v1beta1
kind: Gateway
metadata:
  name: istio-egressgateway
  namespace: your-namespace
spec:
  selector:
    istio: egressgateway
  servers:
  - port:
      number: 80
      name: https-port-for-tls-origination
      protocol: HTTPS
    hosts:
    - some.external.api.com
    - another.provider.com
    - "*.using.wildcard.com"
    tls:
      mode: ISTIO_MUTUAL

Some lessons and clarifications here as well:

I will be honest with you: it took me some time to understand that when we define an Egress Gateway, all the configuration describes the traffic from the mesh to the Gateway and not from the Gateway to the external service. Maybe it’s just me, but I find this a bit confusing.
Why port 80 and protocol HTTPS? Well, remember that the microservice is not aware of the fact that the traffic is encrypted? So we configure our microservice to connect to the external URL on port 80. Then the istio-proxy originates TLS to the Egress Gateway using the original port. The Egress Gateway terminates this TLS and initiates a new TLS connection, this time on port 443. More about the details a bit later…
Pay attention that the TLS mode here is ISTIO_MUTUAL. So we let Istio deal with securing the traffic to the Egress Gateway.
As with ServiceEntries, we can define several hosts. Here we can also use wildcards.

Now we need to define TWO DestinationRules.

One for the traffic from the microservice and another for the traffic from the Egress Gateway:

apiVersion: networking.istio.io/v1beta1
kind: DestinationRule
metadata:
  name: egressgateway-for-external
  namespace: your-namespace
spec:
  host: istio-egressgateway.istio-system.svc.cluster.local
  subsets:
  - name: external-https
    trafficPolicy:
      loadBalancer:
        simple: ROUND_ROBIN
      portLevelSettings:
      - port:
          number: 80
        tls:
          mode: ISTIO_MUTUAL
          sni: some.external.api.com
---
apiVersion: networking.istio.io/v1beta1
kind: DestinationRule
metadata:
  name: originate-tls
  namespace: your-namespace
spec:
  host: some.external.api.com
  trafficPolicy:
    loadBalancer:
      simple: ROUND_ROBIN
    portLevelSettings:
    - port:
        number: 443
      tls:
        mode: SIMPLE 
        sni: some.external.api.com

More clarifications and lessons learned:

The first DestinationRule defines the traffic from the microservice to the Egress Gateway. You can see that it matches the Gateway CR pretty well, including the port and the TLS mode.
Pay attention: if you copy-paste from the Istio documentation when defining the first DestinationRule (like I did), you’ll find that sni is not defined there. This is what I did in the beginning, and it even worked for me in some use cases. But then it didn’t. We spent a whole day debugging to understand what we did wrong, until we noticed that we did define sni for the second DestinationRule, so we tried it on the first as well. And it worked… Since then we define sni in all our DestinationRules, and this is also what I’d recommend you do.
The second DestinationRule defines the traffic from the Egress Gateway to the external service. You can see here that the port is 443, the host is the real host, and the TLS mode is SIMPLE, as we are not using mTLS authentication with the external service, just encrypting the traffic.
According to the Istio documentation, DestinationRules support wildcards. But it doesn’t always work; it depends on the certificates. So be careful here, and if something is not working for you when you use a wildcard, well, maybe you cannot use it.
Not only can we not use one DestinationRule CR for multiple different hosts (as we can with Egress Gateways and ServiceEntries), we even need two DestinationRules per host :) It’s pretty annoying, but I was not able to find another way…

And the last CR for now — VirtualService

apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: virtualservice-external
  namespace: your-namespace
spec:
  hosts:
  - some.external.api.com
  gateways:
  - mesh
  - istio-egressgateway
  http:
  - match:
    - gateways:
      - mesh
      port: 80
    route:
    - destination:
        host: istio-egressgateway.istio-system.svc.cluster.local
        subset: external-https
        port:
          number: 80
  - match:
    - gateways:
      - istio-egressgateway
      port: 80
    route:
    - destination:
        host: some.external.api.com
        port:
          number: 443
      weight: 100

Let’s see some clarifications and lessons learned here:

As with DestinationRules, we need to define two matches here: one for the traffic from the istio-proxy (for some reason you need to refer to it as a gateway named mesh) and another one for the traffic from the Egress Gateway to the external service.
As with DestinationRules, in theory you can use wildcards, but I’m not sure it is going to work for you in all use cases…
Again, as with DestinationRules, we were not able to use the same VirtualService for more than one host. Here the problem is with the second match: when we want to route the traffic from the gateway, it’s not clear how we can define multiple destinations…

Summary about External Traffic configuration:

The fact that we really need to summarise the external traffic configuration is kind of alarming by itself. If you follow the Istio documentation and define all the CRs, you need to work hard for every single external API… And if you have many such APIs? Well, it is not a really convenient way to spend your developers’ time, right? And we are here to save their time, right? Maybe I’m too new to Istio and don’t know some magic way to make this boilerplate work easier, but since all of this came from the official Istio documentation, I’m not sure we have any other option…

Or maybe we have… Since all the CRs look pretty much the same for different hosts, we can probably have our own operator. We can define one CRD with multiple hosts and generate/update all the Istio-specific resources on the fly. Of course, this assumes that all the use cases are similar. Or maybe our operator can be a bit more sophisticated… Or we can provide some build-time utility that generates all of the resources from metadata. I’m thinking about writing and contributing such an operator, but if there is already an existing operator out there, or there are other, better ideas, I’ll be happy to use the wisdom of the crowd.
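
Just to make the idea tangible, the input for such an operator could be a single resource like the sketch below. To be clear, this API group, kind and all the fields are purely hypothetical: nothing like this exists in Istio, it only shows the shape of the idea:

```yaml
apiVersion: egress.example.com/v1alpha1   # hypothetical API group
kind: ExternalApi                          # hypothetical kind
metadata:
  name: external-apis
  namespace: your-namespace
spec:
  # Egress gateway service that all the generated resources would point to
  egressGateway: istio-egressgateway.istio-system.svc.cluster.local
  hosts:
  # For every host the operator would generate the ServiceEntry hosts,
  # the Gateway hosts, two DestinationRules and one VirtualService
  - name: some.external.api.com
    plainPort: 80                          # port the microservice talks to
    tlsPort: 443                           # port used for TLS origination
  - name: another.provider.com
    plainPort: 80
    tlsPort: 443
```

Adding a new external API would then be a two-line change instead of four or five new CRs.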

Written by Vitaly Elyashev