Do Developers need Service Mesh?
This article is sponsored by Allianz Direct.
Service Mesh sounds like something focused on infrastructure automation. Its features around traffic management, observability and security are definitely exciting for any Infrastructure Engineer. But what about developers? Is there any reason for them to mess around with the Service Mesh? Let’s find out in this article.
The first question to answer is: should you, as a developer, even care about having a service mesh? We already gave the answer in our introduction to the service mesh - I highly recommend watching it before proceeding with this article. In short, you will only benefit from the service mesh if you have many different services talking to each other.
The important point here is that they do need to talk to each other. Even if you have 100 applications, if all of them are completely independent from each other and belong to different teams or different parts of the business, then, as a developer of one of these applications, you might not benefit much from the service mesh. You cannot build a mesh from 2 containers, so to speak.
Now, let’s assume you have a system, or application, that consists of roughly a dozen microservices that communicate with each other in one way or another. This is the kind of setup where you, as a developer, might see a benefit of a service mesh.
Now let’s look at the 3 pillars of the service mesh - that is, observability, traffic management and security - through the developer lens. We will provide examples in Linkerd, but everything we are going to look at is applicable to any other service mesh.
One of the immediate benefits that you get from the service mesh is its observability features. Once you install the mesh and make all of your microservices part of it, you immediately get a set of industry-standard metrics and dashboards, most importantly the so-called golden metrics - that is, latency, success rate and request rate.
For example, I can view those metrics directly in the Linkerd dashboard, or in Grafana. I can also see from which sources traffic comes to this particular service, as well as per-route metrics. I can go deeper and inspect the live traffic with the Linkerd Tap feature and inspect particular requests.
The benefit here is not just the metrics themselves, but the automation of those particular metrics across all microservices in the mesh, and the ability to investigate ongoing issues end to end. You can add distributed tracing on top of this to get an even higher level of observability.
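To make the golden metrics less abstract, here is a minimal Python sketch of how they can be derived from raw request records. This is an illustration only, not how Linkerd computes them: the `Request` record, the `golden_metrics` function and the nearest-rank percentile are all assumptions made for this example.

```python
import math
from dataclasses import dataclass


@dataclass
class Request:
    # Hypothetical record of one observed request (not a Linkerd data format).
    latency_ms: float
    success: bool


def percentile(sorted_values: list[float], p: float) -> float:
    # Nearest-rank percentile over an already sorted, non-empty list.
    rank = max(1, math.ceil(p / 100 * len(sorted_values)))
    return sorted_values[rank - 1]


def golden_metrics(requests: list[Request], window_seconds: float) -> dict[str, float]:
    # Summarise one time window into success rate, request rate and latency.
    if not requests or window_seconds <= 0:
        raise ValueError("need at least one request and a positive window")
    latencies = sorted(r.latency_ms for r in requests)
    successes = sum(1 for r in requests if r.success)
    return {
        "success_rate": successes / len(requests),
        "requests_per_second": len(requests) / window_seconds,
        "p50_latency_ms": percentile(latencies, 50),
        "p95_latency_ms": percentile(latencies, 95),
    }


# Four made-up requests observed during a two-second window.
window = [Request(10, True), Request(20, True), Request(30, False), Request(40, True)]
print(golden_metrics(window, window_seconds=2))
```

The point of the mesh is that nobody has to write this per service: the proxies sit on every request path, so the same three numbers come out for every microservice without touching application code.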
The downside of such standardisation of metrics and dashboards is that if you want to get any application-specific telemetry, you still need to implement it on your own or with the help of your infrastructure or platform team.
In other words, for the developer, the service mesh will cover 70-80% of your needs in terms of observability, but to get the proper end to end overview of what’s happening, you still need to introduce more tools and automations on top of the mesh.
Let’s say that observability alone is not the reason to care about service mesh in the first place, but if you do have the mesh, you will get some nice default observability features out of the box.
Another important note here is that the service mesh gives you observability inside the mesh, and, for example, inside Linkerd you cannot see any outgoing traffic at all. What happens in the mesh, stays only in the mesh.
The next pillar of the service mesh is traffic management. This is the part that is most beneficial to developers, but also the one that highly depends on which particular mesh you are using.
The most obvious feature is the ability to split the traffic between two versions of the application. You can roll out a new deployment to only a small percentage of your users, and then gradually increase this percentage.
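Conceptually, a traffic split is just a weighted random choice made for every request. The Python sketch below simulates that idea; it is not a mesh configuration, and the backend names `app-v1` and `app-v2`, the weights and the `pick_backend` helper are hypothetical.

```python
import random


def pick_backend(weights: dict[str, int], rng: random.Random) -> str:
    # Pick one backend at random, proportionally to its weight.
    backends = list(weights)
    return rng.choices(backends, weights=[weights[b] for b in backends], k=1)[0]


# Hypothetical rollout: the share of the new version grows step by step.
rollout_steps = [
    {"app-v1": 95, "app-v2": 5},
    {"app-v1": 50, "app-v2": 50},
    {"app-v1": 0, "app-v2": 100},
]

rng = random.Random(42)  # fixed seed so the simulation is repeatable
for weights in rollout_steps:
    sample = [pick_backend(weights, rng) for _ in range(10_000)]
    share = sample.count("app-v2") / len(sample)
    print(f"{weights} -> app-v2 received {share:.1%} of requests")
```

In a real mesh you would only declare the weights and the proxies would do the picking, so moving from one rollout step to the next is a configuration change rather than a code change.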
You can also configure retries inside the service mesh. If it happens that one of your endpoints breaks every now and then, you can delegate to the service mesh retrying requests to this endpoint automatically. It’s up to you if you actually want those retries to happen outside of your application’s code.
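To show what is being delegated, here is a rough Python sketch of the retry loop that would otherwise live in your application. It is a simplification under stated assumptions: `call_with_retries` and `make_flaky_endpoint` are hypothetical helpers, any 5xx status is treated as retryable, and real meshes typically add limits on top of this so that retries cannot overload a struggling service.

```python
from typing import Callable


def call_with_retries(send: Callable[[], int], max_attempts: int = 3) -> tuple[int, int]:
    # Call send() until it returns a non-5xx status or attempts run out.
    # Returns (last status code, number of attempts made).
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    status = send()
    attempts = 1
    while status >= 500 and attempts < max_attempts:
        status = send()
        attempts += 1
    return status, attempts


def make_flaky_endpoint(failures_before_success: int) -> Callable[[], int]:
    # Hypothetical endpoint that answers 503 a few times, then 200.
    remaining = [failures_before_success]

    def send() -> int:
        if remaining[0] > 0:
            remaining[0] -= 1
            return 503
        return 200

    return send


print(call_with_retries(make_flaky_endpoint(2)))  # (200, 3)
```

One thing to keep in mind when deciding where retries should live: repeating a request is only safe if the endpoint is idempotent, and only you as the developer know which of your routes are.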
In the case of Linkerd, this is as far as traffic management goes - traffic splits and retries. In the case of other service meshes, like Istio, you can do way more advanced things, including fault injection, header manipulation, traffic mirroring and many more.
Same as with observability, traffic management features of the service mesh work only inside the service mesh. The load balancer of your Kubernetes cluster needs to be part of the service mesh for traffic splits and other things to work. This is something your infrastructure team has to provide, and if not, then you can only manage traffic between microservices, and not between your users and your applications.
In any case, if your microservices require some advanced routing or splitting of the traffic, then Service Mesh is the easiest way to get it - you won’t need to modify your application code and you can change traffic policies with a simple YAML configuration.
With this power comes great responsibility - if you, as a developer, configure some advanced routing inside the mesh, and production traffic suffers as a result of those changes, your infrastructure team might have a hard time debugging what happened, unless they were involved in the code review of those changes.
This is the general concept of service mesh when it comes to DevOps: developers get an easier way to configure operations-related functionality, and thus there is a requirement for proper communication and collaboration between them and the people who maintain the cluster and the mesh, most often the platform team.
Now let’s talk about security.
Service Mesh security consists of two interrelated parts.
First is mTLS. With the service mesh, you can encrypt all the traffic between your services. Do you, as a developer, sincerely care if all the internal traffic is encrypted? Most likely you don’t. The security team probably cares, as well as the infrastructure team. But an application developer? It’s probably a nice feeling that even if your microservice happens to send sensitive information to another microservice in plain text, the service mesh proxies will still make sure this information is encrypted in transit. But in day-to-day business, you as a developer probably don’t care that much about that.
The second part of the security features is authorisation. Service Mesh allows you to restrict which other services can access your application. In the case of Linkerd, policies are very simple - you can allow or deny all traffic on some port. With Istio, you can further restrict which particular routes access is allowed or denied to.
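The difference between port-level and route-level rules can be sketched in a few lines of Python. This is a toy model of the idea and not the policy format of Linkerd or Istio: the `AllowRule` type, the `is_allowed` function and the service names are hypothetical, and the sketch assumes a default-deny setup.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class AllowRule:
    # Hypothetical rule: which client may reach which port,
    # optionally limited to paths starting with a given prefix.
    client: str
    port: int
    path_prefix: Optional[str] = None  # None means the whole port is allowed


def is_allowed(rules: list[AllowRule], client: str, port: int, path: str) -> bool:
    # Default deny: a request passes only if some rule matches it.
    for rule in rules:
        if rule.client != client or rule.port != port:
            continue
        if rule.path_prefix is None or path.startswith(rule.path_prefix):
            return True
    return False


rules = [
    AllowRule(client="web", port=8080),                            # port-level rule
    AllowRule(client="batch", port=8080, path_prefix="/reports"),  # route-level rule
]

print(is_allowed(rules, "web", 8080, "/orders/42"))      # True
print(is_allowed(rules, "batch", 8080, "/reports/2024")) # True
print(is_allowed(rules, "batch", 8080, "/admin"))        # False
```

Notice that the rules only know about services, ports and paths. Anything that depends on who the end user is or what they are allowed to see still has to be decided by the application.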
The big question here is whether the service mesh is the right place to do advanced authorisation and authentication. After all, it either means that you completely remove this part from your code, or you split those features between the mesh and the application code. In the first case, your application is one YAML configuration away from being exposed.
In the second case, you might as well just do the whole thing inside the application code. The question then is how you handle encryption if you already have the service mesh and you still want to do authentication and authorisation in the app. This brings us to the final part of this article.
If you happen to have many microservices and Kubernetes, and your platform team installs a Service Mesh, you have an opportunity to delegate some of the features around traffic management and security to the mesh.
You need to decide if those features are part of the infrastructure, or if they are part of the application layer.
One can argue that things like retries and traffic splits obviously belong to the infrastructure and there is no reason to pollute your application code with this.
On the other hand, authentication and authorisation can be very application-specific, and it can be very tricky, or even dangerous, to delegate them to the service mesh completely.
Once the service mesh is there, you need to work together with your infrastructure team to evaluate which features of the mesh you should use and include in your application’s Helm Chart, which ones you don’t really care about, and which ones you’d rather implement in the source code.
And even if you decide that none of the features are really useful for your particular environment, at least you still get some extra metrics and dashboards across the board.
If your company needs help evaluating, selecting and integrating service mesh, please contact us and we will schedule a call to discuss your needs and how we can help you with them.